high-throughput sequencing technologies and genomics...

180
UNIVERSITÀ CATTOLICA DEL SACRO CUORE Sede di Piacenza Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School on the Agro-Food System cycle XXVII S.S.D: AGR/17 High-throughput sequencing technologies and genomics: insights into the bovine species Candidate: Marco Milanesi Matr. n.: 4011148 Academic Year 2013/2014

Upload: others

Post on 07-Mar-2021

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School

UNIVERSITAgrave CATTOLICA DEL SACRO CUORE Sede di Piacenza

Scuola di Dottorato per il Sistema Agro-alimentare

Doctoral School on the Agro-Food System

cycle XXVII

SSD AGR17

High-throughput sequencing technologies and genomics insights into the bovine species

Candidate Marco Milanesi Matr n 4011148

Academic Year 20132014

Scuola di Dottorato per il Sistema Agro-alimentare

Doctoral School on the Agro-Food System

cycle XXVII

SSD AGR17

High-throughput sequencing technologies and genomics insights into the bovine species

Coordinator Chmo Prof Antonio Albanese

_______________________________________

Candidate Marco Milanesi Matriculation n 4011148

Tutor Prof Paolo AjmoneMarsan PhD Lorenzo Bomba PhD Stefano Capomaccio PhD Ezequiel Louis Nicolazzi

Academic Year 20132014

Alla$mia$famiglia$

A$mio$nonno$Mario$

INDEXamp

Chapteramp1ndashIntroduction 3

Chapteramp2Relativeextendedhaplotypehomozygosity(rEHH)signalsacrossbreedsrevealdairyandbeefspecificsignaturesofselection

11

Chapteramp3SearchingnewsignalsforproductivetraitsthroughgenebasedassociationanalysisinthreeItaliancattlebreeds

49

Chapteramp4ampMUGBASaspeciesfreegenebasedassociationsuite

81

Chapteramp5Imputationaccuracyisrobusttocattlereferencegenomeupdates

89

Chapteramp6QuestfordeleteriousrecessivevariantsinItalianHolsteinbullscombiningexomesequencingandhighdensitySNPanalysis

99

Chapteramp7ndashConclusion 167

amp

Acknowledgmentsampamp 173

CHAPTER1

Introduction

Background

The bovine species counts almost 15 billion of heads classified in some 800

breeds worldwide (httpfaostat3faoorg) The species is adapted to a wide

varietyofdifferentenvironmentandhusbandrysystemsandisoftenrearedfor

multiple purposes (Taberlet et al 2008) draft power milk meat leather

productionandotheruses(Elsiketal2009)

European cattle (Bos$ taurus) derive from a single domestication event of the

auroch (Bos$primigenius)occurred in theFertileCrescent around10000years

ago(Taberletetal2011)FollowingdomesticationcattlefollowedtheNeolithic

expansionofagricultureandcolonisedEurope(seeAjmoneMMarsanetal2010

for a review) Many thousand years of natural and artificial selection have

shaped cattle phenotype and genotype to meet environmental conditions and

human needs and together with demographic events and genetic drift have

started thedifferentiation of the bovine species in diverse and locally adapted

populations This differentiation process accelerated in the last two centuries

whenthebreedconceptwasinventedsothatnowthisspeciescountshundreds

of different breeds with high diversity in coat colour body size production

aptitudeadaptationtoclimate tolerancetodiseaseetc(Taberletetal2008

Elsiketal2009TheBovineHapMapConsortium2009)

Thisdiversityisnowbeingprogressivelylostwiththeextinctionofmanylocal

populationsmainlyforeconomicreasonsSelectioneffortshavethereforebeen

focussedonafewoutperformingindustrialbreedsthatarepresentlyselectedfor

specialiseddairyorbeefproductionandsometimesdoublepurpose

In these breeds breeding schemes are well developed and have been

traditionally based on the genetic evaluation of sires and dams using data

collectedfromperformanceandprogenytestingandunbiasedlinearprediction

modelsThesearefoundedontheFisherinfinitesimalmodellingofquantitative

trait variation and need no genomic information to estimate breeding values

(EBVs) The genetic progress recorded using this approach has been relevant

particularly for traits having medium to high heritability In Italian Holstein

genetic trends for themain production traits (milk protein and fat yield) are

positiveHoweveraccurateestimateofbreedingvalueshasahighcostinterms

4

of investment and time so that 5 to 6 years are needed to have a dairy bullprogenytested(wwwanafiit)Theadventofgenomicselection(Meuwissenetal2001)markedamilestoneindairycattlebreedingTheideawasdevelopedbeforethemoleculartoolswereinplaceforimplementitinpracticeandhasbecomearealityonlyrecentlyInthisapproach progeny tested bulls are genotyped at many thousand loci and thevalue of their EBVs used to estimate the value of each allele at each locusgenotyped These values are then used to predict the genetic value of younganimals on the basis of their genotype As a result animals from the newgeneration receive an EBV with the same accuracy of estimates obtained byprogenytestingbutmuchearlierThislowerthegenerationintervalandgreatlyincreasestherateofgeneticprogress indairypopulationsThe initial ideawasso good that nowadays genomic selection is widely applied in industrialcountriestothemajordairycattlebreedsWiththedecreaseofgenotypingcostit is likely that its use will be soon expanded to animals selected for otherpurposes (eg beef cattle) and other species (eg pigs) However genomicselection is an extension of traditional selection It is extremely useful inbreedingforcomplextraitsbutgivesnobiologicalinformationonthegenesthatcontrolthemIfthetraditionalselectionviewsasiregenomeasabigblackboxhaving a value but no clue why genomic selection views the same genomedisassembled inmany small boxes but still black and stillwithno clueon thereasonoftheirvalueIt is reasonable to think that a deeper understanding of the biologicalmechanisms controlling traits may further increase the accuracy of genomicevaluationsIndeedthenewtechnologiesoffernewopportunitiesinthissensebyloweringcostsandincreasingprecisionofdataproductionThis thesis is exploring the use of these technologies for retrieving biologicalinformationfromgenomicdataItusesmethodsnowconsideredldquotraditionalrdquoingenomics (GWAS and analysis of selection sweeps) and less traditional ones(gene centered GWAS and exome analysis) All data used here have beenproducedwithin twonationalprojectson farmanimalgenomics fundedby theItalianMinistryofAgriculture

5

M SELMOL ldquoRicerca e innovazione nelle attivitagrave di miglioramento genetico

animalemediante tecniche di geneticamolecolare per la competitivitagrave del

sistemazootecniconazionaleIrdquo(ResearchandInnovationinanimalbreeding

sheepgoatbuffalohorseanddonkeyI)

M INNOVAGENldquoRicercaeinnovazionenelleattivitagravedimiglioramentogenetico

animalemediante tecniche di geneticamolecolare per la competitivitagrave del

sistema zootecnico nazionale IIrdquo (Research and Innovation in animal

breedingsheepgoatbuffalohorseanddonkeyII)

In these projects the research group I work with M Istituto di Zootecnica at

Universitagrave Cattolica del Sacro Cuore di Piacenza M coordinated all the research

activitiesconcerningtheapplicationofgenomicstechniquesindairycattle

Contentsofthethesis

The thesis is structured into a short introduction (this first chapter) 5 core

chapters(Chapter2to6)andashortConclusion(Chapter7)Withtheexception

of Chapter 6 still in preparation the other core chapters containmanuscripts

that at time of writing are under review for their publication in international

peerMreviewedjournalsormanuscript

InChapter2selectionsignatureindairyandbeefcattleareinvestigatedwiththe

Illumina BovineSNP50 Genotyping BeadChip Five Italian cattle breeds (Italian

Holstein Italian Brown Italian Simmental Marchigiana and Piedmontese) are

analyzedwiththerEHHmethod(Sabetietal2002)tofindsignaturesofrecent

selectionCandidategenesinthevicinityofsignaturessharedbyeitherdairyor

beefbreeds are identified and theirpossible role indairy andbeefproduction

discussed

In Chapter 3 a GWAS analysis is run to investigate production traits in three

Italian dairy cattle breeds (Italian Holstein Italian Brown Italian Simmental)

Data are analysed with a well established method (Amin et al 2007)

6

implemented in theGenABELRpackage (Aulchenkoet al 2007) andwith the

newgeneMbasedmethodAssociationisinvestigatedbetween50KSNPsandmilk

production(milkyield)andqualitytraits(percentageofmilkproteinandfat)

InChapter4thegeneMbasedmethodusedinthepreviouschapter3inspiredby

theVEGASapproach(Liuetal2010)isdescribedThesoftwarewasdeveloped

andreMcodedinourlaboratoryinordertopermittheanalysisofanyspeciesin

additiontohumanMultiMtestingcorrectionandplotrepresentationcompletethe

toolsmadeavailable

InChapter5wetesttheimpactoftheuseofdifferentreferencesequencesinthe

imputation step Italian Simmental data are used to test variation in genomeM

wide inputation accuracy using different cattle reference sequences (BTAU42

UMD31 and BTAU46) To increase the analyses reliability four different

inputationsoftwaresaretested

InChapter6Holsteindataareanalysedcombiningexomesequencesand800K

SNPs data to seek deleterious recessive variantsWhile in natural populations

the destiny of these variants is to disappear in livestock breeds they can be

maintained and disseminated if carried by a highly productive sire The

intersection of deleterious mutations identified by sequencing and lack of

homozygousregionsidentifiedbytheanalysisofthepopulationwiththeHDSNP

chipidentifiedgenescandidatetocarryrecessivedeleteriousvariants

Objective

Themainobjectiveofthethesiswastoinvestigatethebovinegenomewiththe

aid new highMthroughput technology and to explore new strategies for linking

biology (genes and their function) to traits This linkage once validated may

increase the accuracy of genomic prediction and find application in selection

schemesegineradicatingthemostdeleteriousvariantsandorintheplanning

ofmatings

7

ReferencesAjmoneMMarsanPGarciaJFLenstraJA2010OntheoriginofcattleHow

aurochsbecamecattleandcolonizedtheworldEvolAnthropol19148ndash157doi101002evan20267

AminNvanDuijnCMAulchenkoYS2007AGenomicBackgroundBasedMethodforAssociationAnalysisinRelatedIndividualsPLoSONE2e1274doi101371journalpone0001274

AulchenkoYSKoningDMJdeHaleyC2007GenomewideRapidAssociationUsingMixedModelandRegressionAFastandSimpleMethodForGenomewidePedigreeMBasedQuantitativeTraitLociAssociationAnalysisGenetics177577ndash585doi101534genetics107075614

ElsikCGTellamRLWorleyKC2009TheGenomeSequenceofTaurineCattleAWindowtoRuminantBiologyandEvolutionScience324522ndash528doi101126science1169588

LiuJZMcraeAFNyholtDRMedlandSEWrayNRBrownKMHaywardNKMontgomeryGWVisscherPMMartinNGMacgregorS2010AVersatileGeneMBasedTestforGenomeMwideAssociationStudiesTheAmericanJournalofHumanGenetics87139ndash145doi101016jajhg201006009

MeuwissenTHHayesBJGoddardME2001PredictionoftotalgeneticvalueusinggenomeMwidedensemarkermapsGenetics1571819ndash1829

SabetiPCReichDEHigginsJMLevineHZPRichterDJSchaffnerSFGabrielSBPlatkoJVPattersonNJMcDonaldGJAckermanHCCampbellSJAltshulerDCooperRKwiatkowskiDWardRLanderES2002DetectingrecentpositiveselectioninthehumangenomefromhaplotypestructureNature419832ndash837doi101038nature01140

TaberletPCoissacEPansuJPompanonF2011ConservationgeneticsofcattlesheepandgoatsCRBiol334247ndash254doi101016jcrvi201012007

8

TaberletPValentiniARezaeiHRNaderiSPompanonFNegriniR

AjmoneMMarsanP2008Arecattlesheepandgoatsendangered

speciesMolEcol17275ndash284doi101111j1365M294X200703475x

TheBovineHapMapConsortium2009GenomeMWideSurveyofSNPVariation

UncoverstheGeneticStructureofCattleBreedsScience324528ndash532

doi101126science1167936

9

CHAPTER2

Relative(extendedamphaplotype(homozygosity)(rEHH)ampsignalsampacrossampbreedsamprevealampdairyampand$beef$specificampsignaturesampofampselection

LBomba1ELNicolazzi2MMilanesi1etal1IstitutodiZootecnicaUCSCviaEmiliaParmense8429122PiacenzaItaly2FondazioneParcoTecnologicoPadanoViaEinsteinLocCascinaCodazza26900LodiItalyTheseauthorscontributedequallytothiswork

____________________SubmittedtoGeneticSelectionEvolution

Abstract

Background

Nowadays many methods have been implemented to scan for signatures of

selection at genomescale level within and across breeds Extended haplotype

homozygosity is a proficient approach to detect genome regions under recent

selective pressure within breeds The objective of this study is to identify

common regions under positive recent selection shared by most represented

dairyandbeefItaliancattlebreeds

Results

A total of 3220 animals from Italian Holstein (2179) Italian Brown (775)

Simmental (493) Marchigiana (485) and Piedmontese (379) breeds were

genotyped with the Illumina BovineSNP50 BeadChip v1 in the frame of the

ItalianlivestockgenomicprojectsldquoSelMolrdquoandldquoProzoordquoAfterstandardcleaning

proceduresgenotypeswerephasedandcorehaplotypesidentifiedThedecayof

Linkage Disequilibrium (LD) for each core haplotype was assessedmeasuring

the extended haplotype homozygosity (EHH) To correct for the lack of good

estimatesof local recombinationrates therelativeEHH(rEHH)wascalculated

for each corehaplotypeGenomic regionsunder recentpositive selectionwere

identifiedasthosehighlyfrequentcorehaplotypeswithsignificantrEHHvalues

Significant independentcorehaplotypeswerealignedacrossbreeds to identify

signalsofrecentselectionsharedbydairyandsharedbybeefbreedsOverall82

and 87 common regions under selectionwere detected among dairy and beef

cattlebreedsrespectivelyBioinformaticsanalysisidentified244and232genes

mapping in these common genomic regions Pathway and network analysis of

significant genes revealed molecular functions biologically related to signal of

selectionsdetectedinmilkormeatproductiontypes

Conclusions

Resultssuggest that themultibreedapproach isamethodto identifyselection

signatures in cattle breeds under selection for the same production goal

12

allowing to better pinpoint the genomic regions of interest in dairy and beef

production

Background

The advent of genomic technologies and the consequent availability of single

nucleotidepolymorphism(SNP)datahaveenabledthestudyofselectionevents

on the cattle genome (Hayes et al 2008 Qanbari et al 2010) Such selection

signals arise from environmental or anthropogenic pressures and help to

understand the causes that led to breed formation These studies are usually

addressedinaldquotopdownrdquoapproach(ShahzadandLoor2012)fromgenotypeto

phenotypeinwhichgenomicdataisstatisticallyevaluatedtodetectdirectional

selectionThephenotypeneededtorunselectionsignaturesisintendedinavery

generalsenseabreedaproductionaptitudeorevenageographicareaoforigin

Thisapproachholdsthepotentialto investigatetraitsasadaptationtoextreme

climatesanddifferentfeedingandhusbandrysystemsresiliencetodiseasesetc

thatareveryexpensivedifficultandsometimesimpossibletostudywithclassic

GWAS (GenomeWide Association Study) approaches Therefore it is expected

thatgenomic informationretrievedinthesestudiesonlypartiallyoverlapsthat

identified by GWAS on specific traits Searching for these ldquosignatures of

selectionrdquo is of interest for understanding molecular mechanisms underlying

important biological processes and providing new insights to functional

biologicalinformation(egdiseaseandorimportanteconomictraits(Nielsenet

al 2007)) To date many methods have been implemented to scan these

selectionsignaturesatthegenomiclevelMethodsdifferwithrespecttoseveral

properties (Lenstra et al 2012) there are methods that perform analyses

withinbreedoracrossbreedsthatcomparealleleorhaplotypefrequenciesand

size that detect recent or ancient selection events either still segregating or

fixatedintopopulations(Lenstraetal2012OrsquoBrienetal2014Utsunomiyaet

al2013)Differentmethodshavedifferentsensitivityandrobustnesseg they

maybeinfluencedatadifferentextentbymarkerascertainmentbiasanduneven

distributionofrecombinationhotspotsalongthegenome

13

Extended haplotype homozygosity (EHH) is a method that has been used in

many animal species including cattle (Pan et al 2013 Qanbari et al 2010

Zhangetal2006)TheEHHmethodwasfirstdevelopedbySabetietal(2002)

for human population analysis The main objective of this methodology is to

identifylongrangehaplotypesinheritedwithoutrecombinationeventsMorein

detailunderaneutralevolutionmodelchangesinallelefrequenciesaredriven

onlybygeneticdriftInthisscenarioanewvariantwouldrequirealongtimeto

reach high frequency in the population and the Linkage Disequilibrium (LD)

surroundingitwoulddecayduetorecombinationevents(Kimura1984)Inthe

caseofpositiveselectionaquickriseinfrequencyofabeneficialmutationina

relatively short time will preserve the original haplotype structure as the

number of recombination events would be limited Therefore a signature of

positiveselectionisdefinedasaregionwithanalleleinlongrangeLDlocatedin

anuncommonlyhighfrequencyhaplotype

One of themost interesting features of thismethod is that it is able to detect

genomic regions that are under recent selection These recent selection

signatures can be therefore used to identify the genomic regions involved in

breedformationInadditionEHHmethoddoesnotrequireanydefinitionofan

ancestralallele(asneededintheintegratedhaplotypescoreiHS(Voightetal

2006) and is suited to be applied to SNP data because it is less sensitive to

ascertainmentbiasthanothermethods(Nielsenetal2007)HowevertheEHH

methodgeneratesa largenumberof falsepositiveandnegative resultsdue to

heterogeneous recombination rates along the genome (Qanbari et al 2010)

Moreover another drawback not specific of EHH but shared by all selection

signaturemethod is the lack of robust inferences able to discern true signals

fromthosegeneratedbychance(Kemperetal2014)

To partially account for these limitations Sabeti et al (2002) developed the

relative extended haplotype homozygosity (rEHH) including an empirical

methodologytoobtainsignificantsignalsTherEHHofaputativecorehaplotype

compares its original EHH valuewith that of other haplotypes at that specific

locus using all other haplotypes as control for local variation in the

recombination rate Therefore it identifies genomic regions carrying variants

14

under selection that are still segregating in the population analysed Although

thesemethodswere conceived for human populations theywere successfully

appliedto livestockspeciessuchaspig(Maetal2012)andcattle(Qanbariet

al2010)

After domestication about 10000 years ago in the fertile crescent taurine

cattle have colonized Europe and Africa and have been selected for different

human needs (Taberlet et al 2011) Particularly in the last centuries the

anthropic pressure led to the formation of hundreds of specialized breeds

adapted to different environmental conditions and linked to local tradition

constitutingagenepooldeservingattentionforconservation(AjmoneMarsanet

al2010)Someofthesebreedshavebeenselectedspecificallyfordairybeefor

bothproductiontypesfollowingstrongartificialselectionforthesetraits(Hayes

etal2009)Thepresentstudyaimsat identifyingsignalsof recentdirectional

selectionusingrEHHmethodindairyandbeefproductiontypesusingdatafrom

five Italian dairy beef and dual purpose cattle breeds We focused on the

significant core haplotypes scanned by rEHH that are shared among breeds of

thesameproductiontypeThenweusedabioinformaticsapproachto identify

positionalcandidategeneslyingwithinthegenomicregionsunderselectionand

investigatetheirbiologicalrole

Methods

SampleAnimalsandGenotyping

A total of 4311 bulls of five Italian dairy beef anddual purpose breedswere

genotyped with the Illumina BovineSNP50 BeadChip v1 (Illumina San Diego

CA)joiningthegenotypingeffortsoftwoItalianprojects(namelyldquoSelMolrdquoand

ldquoProzoordquo)Thedatasetincluded101replicatesand773siresonspairsusedfor

downstreamqualitycheckofdataproducedIndetailgenotypesof2954dairy

(2179 ItalianHolsteinand775 ItalianBrown)864beef (485Marchigianaand

379 Piedmontese) and 493 dual purpose (Italian Simmental) bulls were

availableDataqualitycontrol(QC)wasperformedintwostagesafirststageon

15

animals independently for each breed applying the same methods and

thresholdsandasecondstageappliedonmarkersacrossall theindividualsof

thedatasetThefirststageexcludedindividualswithunexpectedlyhigh(gt02)

Mendelian errors for fatherson couples andwith low call rates (lt95)The

secondstageexcludedi)SNPswithle25missingvaluesinthewholedataset

orcompletelymissing inonebreed ii)SNPswithminorallele frequencyle5

and iii) SNPs in sexual chromosomes and with unassigned chromosome or

physicalposition

EstimationofrEHH

HaplotypeswereobtainedusingfastPHASEsoftwareconsideringdefaultoptions

(ScheetandStephens2006)andwererunbreedwiseandchromosomewisein

eachbreedPedigreeinformationforallbullswereprovidedbytheirrespective

breederassociations andwereused to filteroutdirect relatives (in fatherson

pairsthesonwasmaintainedinthedatasetandthefatherdiscarded)andover

representedfamilies(amaximumof5randomlychosenindividualsperhalfsib

familywasallowed)Thefinaldatasetcontainingtheseldquolessrelatedrdquoanimalswill

behenceforthcalled ldquononredundantrdquodatasetThisnonredundantdatasetwas

used also to calculate within breed pairwise wholegenome Linkage

Disequilibrium(LD)Ther2statisticforallpairsofmarkerswasobtainedusing

PLINK software v107 (Purcell et al 2007)Thedecayof LDwas assessedby

averagingr2valuesupto1Mbdistance

To test if the population structure influenced the detection of rEHH we

replicated all analyses on the whole Italian Holstein dataset without any

population correction (ie ldquoredundantrdquo dataset) focusing on genes or gene

clustersthatarewellknowntobeunderrecentselectionincattle(ieldquocandidate

regionsrdquo) In particularwe focused on the casein polled gene cluster and coat

colourgenes(MC1RandKIT(Kemperetal2014Qanbarietal2010))

The EHH and rEHH calculations were performed using Sweep v11 software

(Sabetietal2002)Somedefaultprogramsettingshadtobemodifiedtoadapt

theseanalyses to thebovinegenomeas thesoftwarewasoriginallydeveloped

forhumangeneticanalysesSpecificallylocalrecombinationratesbetweenSNPs

16

wereapproximatedto1cMperMbEHHandrEHHcalculationswereperformed

breedwise and chromosomewise using an automatic core selection with

defaultoptions(ieconsideringlongestnonoverlappingcoresandlimitingcores

toatleast3andnomorethan20SNPs)asinQanbarietal(2010)AlthoughEHH

and rEHH values were obtained for all core haplotypes only those with

frequency ge 25 were retained for further analyses The (empirical) rEHH

significancethresholdwasobtainedbydividingtherEHHvaluesinto20binsof

5 frequency range each logtransforming withinbin values to achieve

normality Core haplotypes with pvalues lt 005 were considered significant

DetectionofcommonselectionsignatureswasobtainedcomparingrEHHvalues

acrossgenomicregionsacrossbreeds

Breedgroupingaccordingtoproductiontype

Toexamine thesignals common toeachof the twoproduction types (iedairy

andbeef) corehaplotypes thatsharedoneormoreSNP inat least twobreeds

from the same production type were selected The dual purpose Italian

Simmentalwas included inbothdairy (ItalianHolsteinand ItalianBrown)and

beef groups (Piedmontese and Marchigiana) since potentially it possesses

haplotypes selected for both production types All downstream analyses were

performedseparatelyforthedairyandthebeefproductiontypes

Detectionandannotationofcandidategenes

Genome annotation was performed using Biomart

(httpwwwbiomartorgindexhtml) a comprehensive source of gene

annotationformanyspeciesprovidedbyEnsembl(wwwensemblorg)Thelist

ofsharedhaplotypesregions(inbp)fordairyandbeefwasusedasinputfilein

theBiomartwebinterfaceThesetofgenesretrievedbyBiomartwasthenused

as input for theCanonicalpathwayandregulatorynetworkanalyses Ingenuity

PathwayAnalysis toolversion80(IPA IngenuityregSystems IncRedwoodCity

CA httpwwwingenuitycom) and a substantial examination of published

literaturewasusedtoexaminethefunctionalrelationshipsamongtheresulting

genes The IPA operates with a proprietary knowledge database providing

complementarypathwayanalysisforseveralspeciesincludingcattleIntheIPA

17

analyses the significance of each biological function identified was estimated

using Fisherrsquos exact test A Benjamini and Hochberg correction for multiple

testingwas calculated for function and canonical pathways For eachNetwork

IPAcomputedascore(pscore=log10(pvalue))fittingthesetstudiedgenesand

a list of biological functionspresent in the IPAknowledgedatabase The score

considered thenumberofgenes in thenetworkand thesizeof thenetwork to

estimatehowrelevantthisnetworkwasaccordingtothelistofgenesprovided

Thenthescorewasusedtorankthenetworks

Results

DatasetQC

Therepeatabilityobtainedfromthe101replicatespresentinthewholedataset

washigherthan998AfterthetwoQCstages105individualsand9730SNPs

were discarded Moreover after phasing 1292 individuals were removed to

reducethe largenumberofsibfamiliespresent intheredundantdatasetAfter

quality control procedures the final dataset had 44271 SNPs and 1132 514

393 410 and 364 individuals from Italian Holstein Italian Brown Italian

SimmentalMarchigianaandPiedmontesebullsrespectively(Table1)

18

Table1JNumberofanimalsgenotypedbeforeandafterQCTotalnumberofanimalgenotypedandrelativenumberofanimaldiscardedafterQCanalysis

Detectionofselectionsignatures

Sweepv11 softwaredetectedrEHHsignalswitha frequencyhigher than25overlapping the test candidate regions in both datasets corrected or not forpopulation structure In the redundant dataset we found only one significantrEHHsignal(caseincluster)whereasinthenonredundantdatasetweidentified5significantrEHHsignalsoverlappingallthetestcandidateregionsconsidered(caseinclusterpolledclusterMC1R)(Table2)

BreedTotal

Genotyped

ED15

misAN

ED1

REPL

ED1

MENDCleaned

HOL 2179 40 31 5 2093

BRW 775 6 16 4 749

SIM 493 6 6 2 479

MAR 485 37 38 410

PIE 379 5 10 364

19

Table2JComparisonofcandidateregionrEHHinnonJredundantandredundantdatasetrEHH overlapping candidate regions in both corrected (nonredundant) and not corrected

(redundant)datasetsforpopulation

Redundant NonRedundant

Candidate

geneBTA1

Closest

SNP(bp)CH2range CH2Frequency

rEHHJ

log(pval)CH2Frequency

rEHHJ

log(pval)

POLLED

Cluster1 1981154 18974181981154 H1079 093037 H1078 167174

MC1R 18 13657912 1331772014007505 H1069 120096 H1069 077167

KIT_BOVIN 6 72821175 7250492172821175 H1054 030030 H1054 033048

Casein

Cluster6 88427760 8835009888452835 H1047 094139 H1046 148133

Casein

Cluster6 88427760 8835009888452835 H2032 002003 H2033 002003

In total5526567847724388and4049corehaplotypeswitha frequency

higher than 25 were detected on Italian Holstein Italian Brown Italian

Simmental Marchigiana and Piedmontese breeds respectively A total of 838

866 740 692 and 613 core haplotypes were found significant outliers (see

Materials and Methods) in the aforementioned breeds Table 3 shows the

distribution of the total and significant core haplotypes per chromosome and

breed

20

Tableamp3amp(ampCoreamphaplotypesampdistributionampDistributionoftotalandsignificantcorehaplotypesperchromosomeandbreed

ampamp ItalianampampHolsteinamp

ItalianampampBrownamp

ItalianampSimmentalamp Marchigianaamp Piedmonteseamp

BTA1amp

Hampgtamp252amp

SignampH3amp

Hampgtamp252amp

SignampH3amp

Hampgtamp252amp

SignampH3amp

Hampgtamp252amp

SignampH3amp

Hampgtamp252amp

SignampH3amp

1amp 389 66 389 68 335 49 310 51 288 46

2amp 312 47 314 52 267 44 257 42 238 33

3amp 292 48 313 62 264 46 236 34 227 34

4amp 268 46 279 50 245 41 237 39 229 39

5amp 223 32 231 29 200 30 183 27 171 22

6amp 291 51 307 40 265 45 252 38 233 42

7amp 249 37 253 46 214 39 195 33 204 27

8amp 268 39 270 38 235 35 223 32 209 31

9amp 227 37 228 41 210 37 165 22 187 30

10amp 219 36 234 34 192 25 196 26 168 25

11amp 248 39 246 34 209 30 205 33 187 25

12amp 162 24 176 18 148 21 148 25 135 22

13amp 205 32 200 19 155 18 178 35 135 19

14amp 175 17 189 25 168 29 143 22 129 19

15amp 180 31 185 29 148 23 139 13 130 19

16amp 176 28 192 29 160 25 149 19 127 15

17amp 170 27 183 29 168 31 135 22 123 16

18amp 133 18 153 19 119 14 108 10 82 12

19amp 137 22 130 23 118 16 99 18 92 15

20amp 184 35 172 33 145 23 124 23 124 17

21amp 145 23 147 30 113 18 110 21 94 13

22amp 140 24 145 19 115 21 112 18 104 13

23amp 106 13 100 16 67 10 64 9 57 8

24amp 129 11 140 20 124 17 97 23 101 16

25amp 91 12 98 11 68 10 77 12 70 6

26amp 125 7 114 14 93 12 96 15 90 16

27amp 91 12 91 15 75 11 84 12 60 10

28amp 88 9 96 10 70 10 66 9 55 11

29amp 103 15 103 13 82 10 72 9 67 12

TOTamp 5526 838 5678 866 4772 740 4388 692 4049 613

1BostaurusAutosomes2Detectedcorehaplotypeswithafrequencyinthebreedhigherthan253Significantcorehaplotypesatplt005

Comparisonamptoamppreviouslyampreportedampdataamp

AnumberofstudieshavesearchedselectionsweepsinHolstein(Qanbarietal

2010) Brown (Qanbari et al 2011) and Simmental (Fan et al 2014) breeds

Since different methods are expected to identify signatures having different

characteristicsthecomparisonwithpreviousstudiesislimitedtothosestudies

using thesamemethodand thesamebreed(s)analysed inourstudyOnlyone

study analysed (German) HolsteinRFriesian cattle with rEHH (Qanbari et al

2010)Qanbarietal(2010)reportedcandidategenesandgeneclustersunder

Marco Milanesi
21

recent positive selection that we compared with our rEHH in dairy breeds

dataset(Table4)

Table4JCorehaplotypesincandidategenesfollowingQanbarietal2011

1BosTaurusChromosome2Corehaplotype

Thenumberofcorehaplotypesfoundinourstudyinthecandidateregionswas

lowerpreviouslyreportedWhenconsideringHolsteinsthetwomostsignificant

candidateregionsinbothstudiesmatch(caseinclusterandthesomatostatinSST

gene)althoughitisimpossibletodetermineifthehaplotypeunderselectionis

the same as this information was not provided in Qanbari et al (2010)

However other genes considered significant in Qanbari et al (with pvalues

ranging from 004 to 010) were not significant in our studyWhen using the

same loose significant threshold (pvalues le 010) ldquosignificantldquo signals were

foundalsoinItalianBrownfortheCaseinCluster(log10pvalue=109)andin

ItalianSimmentalfortheSSTgene(log10pvalue=126)

HOLSTEIN BROWN SIMMENTAL

Candidate

geneBTA1

Closest

SNP

(bp)

CH2rangeCH2

Frequency

rEHH

Hlog(p)CH2range

CH2

Frequency

rEHH

Hlog(p)CH2range

CH2

Frequency

rEHH

Hlog(p)

DGAT1 14 444963236653443936

H1057 011443936763332

H1042 0130007

Casein

Cluster6 88391612

8835009888452835

H1046 1481338832601288452835

H2068 0801098835009888452835

H1044 037022

8835009888452835

H2033 018030

8835009888452835

H3034 022045

GH 19 49652377 GHR 20 33908597

SST 1 813769568128358581376961

H1036 200189 8131845181376961

H1042 126053

8128358581376961

H2029 00630084 8131845181376961

H2042 006027

IGFH1 5 71169823

ABCG2 6 37374911 3731702038256889

H1044 031027

3731702038256889

H1040 029031

Leptin 4 95715500

LPR 3 855692038549710885594551

H1047 0910728549710885794693

H1068 0800638549710885794693

H1063 002005

8549710885594551

H2041 027025

PITH1 1 35756434

22

Signaturessharedbydairyandbybeefbreeds

Significantcorehaplotypeswerealignedacrossbreedsto identifythoseshared

among dairy or beef production types Since breeds can be considered

independentsetsofobservationssharedsignaturesaremorelikelytorepresent

real effects rather than false positives A total of 123 and 142 significant core

haplotypesweresharedbetweenatleasttwodairyorbeefbreedsrespectively

(File S1) Only 82 and 87 of those significant core haplotypes contained genes

considered positional candidates under positive selection In total 22 and

17 of the genomewas under selection for dairy and beef production types

respectivelyTherEHHaveragelengthswere216932and190994bpfordairy

andbeefrespectively

Genesetannotationandnetworkpathwayanalysis

Atotalof244and232annotatedgenesfellwithintheselectedregionsfordairy

andbeefproductiontypesrespectively(Table5)

23

Table5JStatisticsonsharedsignificantcorehaplotypesindairyandbeefbreedsStatisticsonsharedsignificantcorehaplotypesindairyandbeefbreeds

DairyCattle BeefCattle

BTA SelSig

n

Genes1 Sum

SelSignsize

(bp)

Mean

SelSign

size(bp)

SelSi

gn

Genes1 Sum

SelSignsize

(bp)

Mean

SelSign

size(bp)

1 7 11 1521608 138328 7 13 2263213 174093

2 5 9 2216685 246298 8 11 2658549 2416863 2 5 1619336 323867 7 21 3755847 1788504 7 21 5956755 283655 6 11 1761288 1601175 4 17 4506119 265066 5 11 1251127 1137396 2 4 1467738 366934 5 12 5149692 4291417 6 30 4842969 161432 3 28 11889191 4246148 0 0 0 0 4 12 3512215 2926859 4 7 1554863 222123 2 6 1106768 18446110 4 17 8839762 519986 2 3 573138 19104611 6 8 1841802 230225 1 1 100439 10043912 2 8 4244133 530517 1 1 536461 53646113 3 8 1933026 241628 2 8 985376 12317214 0 0 0 0 3 9 692397 7693315 5 9 1415405 157267 2 2 189094 9454716 3 6 1302506 217084 4 6 905719 15095317 3 4 471046 117762 2 9 1551010 17233418 0 0 0 0 3 23 2961718 12877019 3 23 5744143 249745 2 2 116938 5846920 1 2 148886 74443 2 4 627971 15699321 3 7 1930206 275744 1 5 1150760 23015222 3 5 620674 124135 3 10 2411751 24117523 0 0 0 0 1 1 68421 6842124 2 18 9166004 509222 2 4 803770 20094225 2 11 2016602 183327 1 3 329199 10973326 2 4 466134 116534 4 7 776080 11086927 1 1 631792 631792 2 3 1004717 33490628 0 0 0 0 1 1 93464 9346429 2 9 935209 103912 1 5 798300 159660TOT 82 244 65393403 216932 87 232 50024613 190994

1GenesidentifiedGenesintheregionssharedbyallthreedairybreeds(8genes)orallthreebeefbreeds(11genes)foramoredetailedanalysiswereselected(Fig1)

24

Figure1JGenomiclocationoftheselectionsignaturessharedamongstudiedbreedsa)GenesinensembletracksaredisplayedasaredboxcorehaplotypesandSNPsarecolouredinorange(MarchigianaMAR)inviolet(PiedmontesePIE)andpink(SimmentalSIM)b)Genesinensemble tracks are displayed as a red box core haplotypes and SNPs are coloured in blue(HolsteinHOL)ingreen(ItalianBrownBRW)andpink(SimmentalSIM)

All the genes identified were submitted to downstream pathwaynetwork

analyses(Table5Figure1FileS1)Interestinglygenesintheregionssharedby

allthreedairyorbeefbreedsweremanifestedinthemostlikelynetworksforthe

two production aptitudes The IPA analysis detected a total of 12 and 15

networksindairyandbeefbreedsrespectively

Indairybreeds network scores ranged from46 to2 (File S2) Eightnetworks

obtainedscoreshigherthan20andincludedmorethan14moleculeseachThey

wereassociatedtoi)geneexpressionRNAdamageandrepairproteinsynthesis

andposttranslationalmodificationii)cellassemblyorganizationfunctionand

maintenance iii) cell cycle proliferation and cancer iv) carbohydrate

metabolism v) gastrointestinal and neurological disease developmental and

endocrine system disorders A total of 23moleculeswere associatedwith the

highest ranked network (score = 46) In this the immunoglobulins mitogen

25

activatedproteinkinasesandNFkBcomplexesformacentralhublinkinggenes

involved in celltocell signaling and interaction hematological system

developmentandfunctionandimmunecelltraffickingfunctions(Figure2)

Figure2JThemostlikelyIPAnetworkindairycattlebreedsThefunctionsassociatedtothenetworkarecelltocellsignallingandinteractionhematologicalsystem development and function immune cell trafficking Nodes in red correspond to genesidentifiedincorehaplotypesoverlappinginallthethreebreedsofeachproductiontypewhereasthoseingreendepictoverlappingcorehaplotypesinatleasttwoofthosebreeds

Breast cancer antiNestrogen resistance gene 3 (BARC3) andpituitary glutaminyl

cyclase gene (QPCT) genes were common to all three dairy breeds and are

directlyconnectedwithmammaryglandmetabolisms(Ezuraetal2004GSun

et al 2012) The solute carrier family 2 member 5 (SLC2A5) gene facilitates

glucosefructose transport (Zhao et al 1998) and the zetaNchain (TCR)

26

associated protein kinase 70kDa (ZAP70) gene plays a critical role in Tcell

signaling (Bonnefont et al 2011) Calpain is another important complex that

togetherwithcalpainN3(CAPN3)mediatesepithelialcelldeathduringmammary

gland involution (Wilde et al 1997) Furthermore RAS guanyl nucleotide

releasingprotein(RASGRP1)activatestheErkMAPkinasecascaderegulatesT

cellsandBcellsdevelopmenthomeostasisanddifferentiationandisinvolvedin

the regulation of breast cancer cell (Madani et al 2004 Yasuda et al 2007

Wickramasingheetal2009)Genesinvolvedintheothernetworksarelistedin

FileS2

The scoreof thebeef cattlebreedsnetwork rangedbetween46and2 and the

numberofmoleculespresentineachnetworkrangedfrom23to1(FileS3)Asin

dairy breeds eight networks obtained a score higher than 20 and 13 ormore

molecules each These networkswere associated to i) carbohydrate lipid and

nucleic acidsmetabolism energyproduction and smallmoleculebiochemistry

ii) cardiovascular system development and function iii) cellular and tissue

developmentcellassemblyandorganizationfunctionandmaintenanceandcell

and organmorphology iv) cell cycle celltocell signaling and interaction v)

DNA replication recombination and repair and RNA posttranscriptional

modification vi) dermatological diseases and conditions infectious disease

organismal injury and abnormalities In the network with the highest score

(score=46)chondroitinsulfateproteoglycan4(CSPG4)andsnurportinN1(SNUPN)

genes were shared among all beef cattle breeds investigated CSPG4 gene is

relatedtomeattendernesswhileSNUPN isanimprintedgeneimportantinthe

embryodevelopmentandinvolvedinhumanmuscleatrophy(Narayananetal

2002) (Figure 3) All the genes included in this network were involved in

carbohydratemetabolismsmallmoleculesbiochemistryandcellcycleTheother

networkswere related to important signaling andmetabolic process essential

foranimalgrowth(FileS3)

27

Figure3JThemostlikelyIPAnetworkinbeefcattlebreedsThe functions associated to the network are carbohydrate metabolism small molecule

biochemistry and cell cycle Nodes in red correspond to genes identified in core haplotypes

overlapping in all the three breeds of each production type whereas those in green depict

overlappingcorehaplotypesinatleasttwoofthosebreeds

A total of 6 and 9 statistically significant Canonical Pathways (FDR le 005

log10(FDR) ge 13) were identified using IPA for dairy and beef breeds

respectively(Figure4)

28

Figure4JBarplotofstatisticallysignificantCanonicalPathwaysPvalues were corrected for multiple testing using BenjaminiHochberg method and arepresented in thegraphas log(pvalue)Thebar represents thepercentageof genes in a givenpathway that meet cutoff criteria within the total number of molecules that belong to thefunctiona)BarplotofstatisticallysignificantCanonicalPathways indairycattlebreedsb)BarplotofstatisticallysignificantCanonicalPathwaysinbeefcattlebreeds

Themost significant Canonical Pathway in dairywas for purinemetabolism (

log10(FDR) = 26) supporting the highly synthetic processes in the mammary

epithelium(Figure5)

29

Figure5JGenesdetectedunderrecentpositiveselectionindairycattleinvolvedinpurinemetabolismcanonicalpathwayNodes in red correspond to genes identified in core haplotypes overlapping in all the threebreedsofeachproductiontypewhereasthoseingreendepictoverlappingcorehaplotypesinatleasttwoofthosebreeds

Inbeefproductiontheephrinreceptorsignal(log10(FDR)=27)wasthemost

significant Canonical Pathway (Figure 6) Among other functions ephrin is

knowntopromotemuscleprogenitorcellmigrationbeforemitoticactivation(Li

andJohnson2013)AlltheotherCanonicalPathwaysarereportedinFileS4

30

Figure6JGenesdetectedunderrecentpositiveselectioninbeefcattleinvolvedinEphrinmetabolismcanonicalpathwayNodes in red correspond to genes identified in core haplotypes overlapping in all the threebreedsofeachproductiontypewhereasthoseingreendepictoverlappingcorehaplotypesinatleasttwoofthosebreeds

Discussion

In this work more than 4000 genotyped bulls of five dairy beef and dualpurposeItalianbreedswerescreenedsearchingforselectionsignaturesindairyand beef production types A strict data quality controlwas applied to reducepossiblesourcesofbiassuchastechnicalgenotypingproblemsandpopulationstructurethatcouldhavelargeimpactonrEHHresultsInfactsincetheimpactofpopulationstructureonrEHHvaluesisstillunexploredwereplicatedpartofouranalyseswithandwithoutcloserelativesinanattempttostudytheeffectofpopulationstructureonthisselectionsignaturemethodInprinciplepopulationstratificationshouldleadtoanoverrepresentationofcertainhaplotypessimplyduetothepresenceoftightpedigreelinks(egsirespassinghalfoftheirgeneticmaterialtotheirsons)ratherthanactualselectionprocessesForthisreasonfor

31

the full analyses all siresons couples were removed after haplotype phasing(retaining only the younger animal) and halfsib familieswere restricted to amaximum of five randomly chosen individuals in an attempt to reduce overrepresentation This assessment was restricted to four regions known to beunderselectioninItalianHolsteinasthecaseinclusterthepolledgeneclusterMC1R and KIT Italian Holstein was selected for two reasons i) it is a highlystructuredbreedaccordingtoourdata(abouthalfofindividualsremovedwereclose relatives) ii) allowed us to compare our results with a previous studyAlthoughtheanalysesconductedonbothredundantandnonredundantdatasetswereable to identify rEHHsignals in these regions thenonredundantdatasetproduced five significant rEHH signals rather than just one in the redundantdataset(Table2)Thisresulthighlightstheconfoundingeffectofthepresenceofcloserelativesinthedatasetandconsequentlytheimprovedabilitytodetectedsignificant selection signature while correcting for population structureInterestingly our results only partially overlap those found by Quanbari et al(2010) These authors detected significant signals at the casein cluster andsomatostatinSSTaswedid but also at ahighernumberof candidate regionsThese inconsistencies may be the effect of different sires included in theanalyses of the different dataset size and to the close relative trimmingprocedure we adopted to decrease the effect of population structure andconsequent bias (Table 4) However poor correlation across investigations isoftenobservedalsointhedeeplyinvestigatedhumanspeciesThisisduetotheuse of different within and between breed statistics that identify selectionsignatureshavingdifferent characteristics (ancientrecent segregatingfixatedunder directionalbalancing selection) to the likely high rate of falsepositivesnegativesresultsandalsotothedifferentconsiderationofpopulationstructureandbackgroundselection(Enardetal2014)The number of total and significant core haplotypes identified by the SweepsoftwarewasthehighestinItalianBrownandthelowestinPiedmontesebreedSinceEHHisheavilybasedonpopulationLDtheaverageLDleveloverdistancewascalculatedineachbreed(Figure7)

32

Figure7JMultiJbreedaveragelinkagedisequilibriumagainstphysicaldistance(inkb)Marchigiana Piedmontese and Italian Simmental breeds present a lower persistence of LD allargerdistancethanItalianHolsteinandItalianBrownbreedsThisanalysishighlightedageneralpositivecorrelationbetweenthe levelofLDover distance and the number of total and significant core haplotypes foundHoweverconsideringthatrEHHisarelativemeasure it ismorelikelythatthehighernumberofsignificantcorehaplotypesidentifiedindairybreedswasduemostlytoahigherselectivepressureindairycomparedtobeefbreedsSignificance tests used in detecting selection signatures should measure theprobability of a statistic to be anoutlier compared to its expecteddistributionunder a neutral model However no reliable neutral model has so far beendevelopedincattlebecauseofthecomplexityofthedemographichistoryofthisspecies (Kemper et al 2014) As a consequence empiric rather than modelbasedsignificancetestaregenerallyusedinthedetectionofselectionsignaturesAccordinglyweconsideredasoutliersvaluesas thosesimply falling in the5plusvarianttailoftherEHHdistributionWekeptthewithinbreedsignificancethresholdloosenoncorrectingitformultipletestingbutconsideredonlysignals

33

eitherexceedingthisthresholdbyatleasttwoordersofmagnitudeorsharedby

twoormoreindependentbreeds

Parallel comparison of results from independent analyses of different breeds

allowed the achievement of a double objective i) to identify putative regions

under(recent)selectioninbreedsbelongingtotwoproductiontypeswhichwas

the main objective of this study and ii) reducing false positives since multi

breed analyses served as internal control Since the rEHH method does not

considerphenotypicinformationasignificantsignalmightarisebecausei)the

core haplotype is actually under a selective pressure ii) the result is a false

positive ie causedby chance population structureor anyotherdriving force

However even considering an unrealistic scenario with no false positives a

proportionofthetotalamountofsignalswouldactuallybeselectionsignatures

due to a selectionpressurenot ondairy or beef traits This becausedairy and

beefbreedsinvestigatedareonlyafewandshareanumberoftraitsnotdirectly

related to dairybeef production Even considering this limitation to our

knowledge this is the first dairy and beef multibreed study applying this

strategytoreducefalsepositivesatthecostofpossiblelossofinformationdue

tohighfalsenegativeratesSignificantsignalsharedbydairyandbybeefbreeds

wereused fordownstreamnetworkanalysesonpositional candidategenes to

search forabiological justification towhatwasobtainedstudying thegenomic

information Only the top network for dairy and the top network for beef

productionarediscussedindetail

Dairybreeds

ThemostsignificantnetworkdetectedindairybreedsisassociatedwithCellTo

CellSignalingandInteractionHematologicalSystemDevelopmentandFunction

andImmuneCellTraffickingfunctions(Figure2)ImmunoglobulinandNFkBact

aspivotalgeneswithinthisnetworkInterestinglyBARC3andQPCTgeneswere

sharedamongallthethreedairybreedsBothgeneshavenotyetbeenstudiedin

cattlealthoughinhumanstudiesthesegenesarewellknowntobelinkedtothe

mammaryglandmetobolismandthecalciumregulationTheformerisinvolved

in integrinmediated cell adhesions and signaling which is required for

mammary gland development and functions (G Sun et al 2012) The latter is

34

associatedwithlowradialbodymineraldensity(BMD)inadultwomen(Ezuraet

al2004)AnotherimportantplayeronthisnetworkisSLC2A5genealsoknown

asGLUT5 SLC2A5 gene acts as fructose transporter in the intestine and has a

significantroleinenergybalanceofdairycows(Burantetal1992)Findingthis

gene in adairy cattle specificnetwork seems surprising since there shouldbe

little need for transporting glucose andor fructose in the ruminant intestinal

tractassimplecarbohydratesaredegradedintovolatilefattyacids(VFA)inthe

rumen (Iqbal et al 2009) However it is often the case that large amount of

starchbypasstherumenincowsfedwithdietsrichincerealgrains(Zhaoetal

1998)Thisbypassedstarchneedstobedigestedinthesmallintestinetoavoid

highlevelsofglucoseconcentrationinthelargeintestine

WedetectedcalpaincomplexandcalpainN3(CAPN3)genesunderpositiverecent

selection in the dairy group as reported also by Utsunomiya et al (2013)

AltoughCalpainisknowntobeinvolvedinpostmortemmeattendernizationitis

also related to dairy metabolism as the muscle breakdown promoted by the

calpain gene provides an energy source for milk production especially at the

beginningoflactation(Kuhlaetal2011)InadditionasreportedbyWildeetal

(1997) calpainNcalpastatin system is related to programmed cell death of

alveolar secretory epithelial cells during lactation Zap70 gene is a cytoplasmic

proteintyrosinekinaseItisrelatedtotheimmunesystemasacomponentofthe

Tcell receptor playing a central role in Tcell responses (Wang et al 2010)

Bonnefontet al (2011) found thatZap70 genewasup regulated (FoldChange

(FC)=82FDR=277E12)onmilksomaticcellsofsheepinfectedbySaureus

andSepidermidisshowinganassociationwithmastitisresistance

Purinemetabolismwasthemostsignificantcanonicalpathwayindairybreeds

In a study conducted in human mammary epthelium Maningat et al (2008)

conductedageneexpressionanalysisonbreastmilkfatglobulesidentifyingthe

PurinemetabolismasthemostsignificantpathwaySynthesisandbreakdownof

purine is essential in the metabolism of the tissues of many organisms

particularlyofthemammaryglandduringlactation

Another interesting canonical pathway was endothelin signaling (Figure 5)

endothelin functions as vasocontrictor and is secreted by endothelial cells

35

(Nussdorferetal1999)Acostaetal(1999)reportedthatincattleendothelinsareinvolvedinthefollicularproductionofprostaglandinsandtheregulationofsteroidogenesis in the mature follicle In a recent study Puglisi et al (2013)confirmed the implication of endothelin in particular EDNRA and endothelinconvertin enzyme 1 in reproductive disorder in bovine They classified theEDNRAaspotentialbiomarkerforfertilityincow

Beefbreeds

Inbeefcattlebreedsthefocaladhesionkinase(FAK)togetherwithintegrinandERK12genesplayacentralroleashubsinthemostsignificantnetwork(Figure3)Theyareinvolvedincellularadhesionandcellmotilityprocessesparticularlyduringskeletalmyogenesis(Nguyenetal2014)Moreovertheoverexpressionof FAK can determine a reduced apoptosis and an increase risk of metastaticcancers(MehlenandPuisieux2006)Thesehubsarenotdirectlyinvolvedinourstudybut sharemany interactors thatare located in region found tobeunderselction such as CSPG4 RB1CC1 MGAT3 CIRPB CSPG4 gene belongs tochondroitin sulfate proteoglycans (CSPGs) family CSPGs are proteoglycansconsistingofaproteincoreandachondroitinsulfatesidechainTheyareknownto be structural components of a variety of tissues includingmuscle andplaykey roles inneuraldevelopmentandglial scar formationTheyare involved incellular processes such as cell adhesion cell growth receptor binding cellmigrationandinteractionswithotherextracellularmatrixconstituents

Manystudiesreporttheroleofproteoglycansinmeattextureofseveralmusclesfromthebovinespecies(Nishimuraetal1996)Dubostetal(2013)highlightadirect involvement of proteoglycans with respect to cooked meat juicinessRB1CC1geneplaysacrucialroleinmusculardifferentiationanditsactivationisa essential for myogenic differentiation (Watanabe et al 2005 p 1)Monoacylglycerol acyltransferase (MGAT3) gene cause the synthesis ofdiacylglycerol (DAG)using2monoacylglyceroland fattyacyl coenzymeAThisenzymaticreactionis fundamental fortheabsorptionofdietaryfat inthesmallintestineSunetal(2012p3)reportedinastudyonfiveChinesecattlebreedsthatMGAT3gene isassociatedwithgrowthtraitsCIRPBgenemaybepartofacompensatorymechanism inmuscles undergoing atrophy It preservesmuscle

36

tissuemassduringcoldshockresponsesaginganddisuse(DupontVersteegden

et al 2008) SNUPN is an imprinted gene and is expressed monoallelically

dependingonitsparentaloriginSNUPNplayimportantrolesinembryosurvival

andpostnatal growth regulation (Smithet al 2012Wanget al 2012)Ephrin

receptorsignalingwasthetopcanonicalpathwayproducebyIPAandhasalsoan

interestingbiologicalinterpretationformeatproduction(Figure6)Indeedthis

pathaway is important for the muscle tissue growth and regeneration

participating in correct positioning and formation of the neural muscular

junction(LiandJohnson2013)

Conclusions

In this study we analysed the selection signatures at genomewide level in 5

Italian cattle breeds Then we used a multibreed approach to identify the

genomicregionssharedamongcattlebreedsselectedfordairyorbeefpurpose

Thisapproachincreasedthepowertopinpointregionsofthegenomethatplay

important roles in economically relevant traits in both dairy and beef cattle

Moreover network and pathways analyses were used to display a detailed

functional characterization of genes mapping in the regions detected under

recentpositiveselection

Particularly in dairy cattle genes under directional selection are related to

feeding adaptation (increasing level of starch in the diet) mammary gland

metabolismandresistencetomastitisThebiologicalfeaturesunderselectionin

beef cattle breeds are involved in animal growth meat texture and juiceness

These results have a logical biological explanation However the frequent

approach of interpreting selection signatures on the basis of the function of

geneslocatednearbythepeaksignalofthestatisticusedispresentlychallenged

Novelinformationinhumanssuggeststhatmostselectedvariationisnotlocated

within genes and coding regions but in regulatory sites identified by the

ENCODEproject(Enardetal2014)Thesemaycontroltheexpressionofentire

genomicregionsorofgeneslocatedatarelavantdistancefromtheselectedsite

makingbiologicalinterpretationmorecomplex

37

InanycasefurtherstudiesusingdenserSNPchipsorwholegenomesequencing

producinginformationnotsubjectedtoascertainmentbias(Qanbarietal2014)

may increase the resolution of our analysis and togetherwith new knowledge

arisingonthecontrolofgeneexpressionvalidateorcorrectourresults

Acknowledgements

ANAFI (Italian Association of Holstein Friesian Breeders) ANAPRI (Italian

Association Simmental Breeders) ANABORAPI (National Association of

PiedmonteseBreeders)ANABIC(AssociationofItalianBeefCattleBreeders)and

ANARB (Italian Brown Cattle Breeders Association) are acknowledged for

providing the biological samples and the necessary information for this work

The Authors are also grateful to SelMol project (MiPAAF Ministero delle

PoliticheAgricoleAlimentarieFortestali)andbytheProZooproject(Regione

LombardiandashDGAgricoltura Fondazione Cariplo Fondazione Banca Popolare di

Lodi)forfunding

Thefundershadnoroleinstudydesigndatacollectionandanalysisdecisionto

publishorpreparationofthemanuscript

References

Acosta TJ Berisha B Ozawa T Sato K Schams D Miyamoto A 1999

Evidence for a Local EndothelinAngiotensinAtrial Natriuretic Peptide

SysteminBovineMature Follicles In Vitro Effects on SteroidHormones

and Prostaglandin Secretion Biol Reprod 61 1419ndash1425

doi101095biolreprod6161419

AjmoneMarsan P Garcia JF Lenstra JA 2010 On the origin of cattle how

aurochs became cattle and colonized theworld Evol Anthropol Issues

NewsRev19148ndash157

38

BonnefontCToufeerMCaubetCFoulonETascaCAurelMRBergonier

D Boullier S RobertGranieacute C Foucras G 2011 Transcriptomic

analysisofmilksomaticcells inmastitisresistantandsusceptiblesheep

upon challenge with Staphylococcus epidermidis and Staphylococcus

aureusBMCGenomics12208

BurantCFTakedaJBrotLarocheEBellGIDavidsonNO1992Fructose

transporter inhumanspermatozoaandsmall intestine isGLUT5 JBiol

Chem26714523ndash14526

DubostAMicolDPicardBLethiasCAnduezaDBauchartDListratA

2013Structuralandbiochemicalcharacteristicsofbovineintramuscular

connective tissue and beef quality Meat Sci 95 555ndash561

doi101016jmeatsci201305040

DupontVersteegden EE Nagarajan R Beggs ML Bearden ED Simpson

PMPetersonCA2008IdentificationofcoldshockproteinRBM3asa

possible regulator of skeletal muscle size through expression profiling

AJP Regul Integr Comp Physiol 295 R1263ndashR1273

doi101152ajpregu904552008

Enard D Messer PW Petrov DA 2014 Genomewide signals of positive

selection in human evolution Genome Res gr164822113

doi101101gr164822113

EzuraYKajitaMIshidaRYoshidaSYoshidaHSuzukiTHosoiTInoue

SShirakiMOrimoHEmiM2004Associationofmultiplenucleotide

variationsinthepituitaryglutaminylcyclasegene(QPCT)withlowradial

BMDinadultwomenJBoneMinRes191296ndash301

Fan H Wu Y Qi X Zhang J Li J Gao X Zhang L Li J Gao H 2014

Genomewide detection of selective signatures in Simmental cattle J

ApplGenet1ndash9doi101007s1335301402006

HayesBJChamberlainAJMaceachernSSavinKMcPartlanHMacLeodI

SethuramanLGoddardME2009Agenomemapofdivergentartificial

selectionbetweenBostaurusdairycattleandBostaurusbeefcattleAnim

Genet40176ndash184doi101111j13652052200801815x

Hayes BJ Lien S Nilsen H Olsen HG Berg P Maceachern S Potter S

Meuwissen THE 2008 The origin of selection signatures on bovine

39

chromosome 6 Anim Genet 39 105ndash111 doi101111j1365

2052200701683x

IqbalSZebeliQMazzolariABertoniGDunnSMYangWZAmetajBN

2009 Feeding barley grain steeped in lactic acid modulates rumen

fermentation patterns and increases milk fat content in dairy cows J

DairySci926023ndash6032doi103168jds20092380

KemperKESaxtonSJBolormaaSHayesBJGoddardME2014Selection

for complex traits leaves littleornoclassic signaturesof selectionBMC

Genomics15246doi1011861471216415246

KimuraM1984Theneutraltheoryofmolecularevolution

KuhlaBNuumlrnbergGAlbrechtDGoumlrsSHammonHMMetgesCC2011InvolvementofSkeletalMuscleProteinGlycogenandFatMetabolismin

the Adaptation on Early Lactation of Dairy Cows J Proteome Res 10

4252ndash4262doi101021pr200425h

Lenstra JAGroeneveld LF EdingHKantanen JWilliams JL Taberlet P

NicolazziELSolknerJSimianerHCianiEGarciaJFBrufordMW

AjmoneMarsan P Weigend S 2012 Molecular tools and analytical

approaches for the characterization of farm animal genetic diversity

AnimGenet43483ndash502

Li J Johnson SE 2013 EphrinA5 promotes bovine muscle progenitor cell

migration before mitotic activation J Anim Sci 91 1086ndash1093

doi102527jas20125728

Ma YL Zhang Q Ding XD 2012 [Detecting selection signatures on X

chromosome in pig through high density SNPs] Yi Chuan Hered

ZhongguoYiChuanXueHuiBianJi341251ndash1260

Madani S Hichami A CharkaouiMalki M Khan NA 2004 Diacylglycerols

Containing Omega 3 and Omega 6 Fatty Acids Bind to RasGRP and

Modulate MAP Kinase Activation J Biol Chem 279 1176ndash1183

doi101074jbcM306252200

Maningat PD Sen P Rijnkels M Sunehag AL Hadsell DL Bray M

Haymond MW 2008 Gene expression in the human mammary

epitheliumduring lactation themilk fat globule transcriptome Physiol

Genomics3712ndash22doi101152physiolgenomics903412008

40

Mehlen P Puisieux A 2006Metastasis a question of life or deathNat Rev

Cancer6449ndash458doi101038nrc1886

NarayananUOspinaJKFreyMRHebertMDMateraAG2002SMNthe

Spinal Muscular Atrophy Protein Forms a PreImport Snrnp Complex

withSnurportin1andImportin HumMolGenet111785ndash1795

Nguyen N Yi JS Park H Lee JS Ko YG 2014 Mitsugumin 53 (MG53)

LigaseUbiquitinatesFocalAdhesionKinaseduringSkeletalMyogenesisJ

BiolChem2893209ndash3216doi101074jbcM113525154

NielsenRHellmannIHubiszMBustamanteCClarkAG2007Recentand

ongoing selection in the human genome Nat Rev Genet 8 857ndash868

doi101038nrg2187

NishimuraTHattoriATakahashiK1996Relationshipbetweendegradation

of proteoglycans andweakening of the intramuscular connective tissue

duringpostmortemageingofbeefMeatSci42251ndash260

Nussdorfer GG Rossi GPMalendowicz LKMazzocchi G 1999Autocrine

ParacrineEndothelinSysteminthePhysiologyandPathologyofSteroid

SecretingTissuesPharmacolRev51403ndash438

OrsquoBrienAMPUtsunomiyaYTMeacuteszaacuterosGBickhartDM LiuGE Tassell

CPV Sonstegard TS Silva MVD Garcia JF Soumllkner J 2014

Assessing signatures of selection through variation in linkage

disequilibrium between taurine and indicine cattle Genet Sel Evol 46

19doi101186129796864619

Pan D Zhang S Jiang J Jiang L Zhang Q Liu J 2013 GenomeWide

DetectionofSelectiveSignatureinChineseHolsteinPLoSONE8e60440

doi101371journalpone0060440

PuglisiRCambuliCCapoferriRGianninoLLukajADuchiRLazzariG

Galli C Feligini M Galli A Bongioni G 2013 Differential gene

expressionincumulusoocytecomplexescollectedbyovumpickupfrom

repeat breeder and normally fertile Holstein Friesian heifers Anim

ReprodSci14126ndash33doi101016janireprosci201307003

Purcell S Neale B ToddBrown K Thomas L Ferreira MAR Bender D

MallerJSklarPdeBakkerPIWDalyMJShamPC2007PLINKa

41

tool set for wholegenome association and populationbased linkageanalysesAmJHumGenet81559ndash575doi101086519795

Qanbari S Gianola D Hayes B Schenkel F Miller S Moore S Thaller GSimianer H 2011 Application of site and haplotypefrequency basedapproachesfordetectingselectionsignaturesincattleBMCGenomics12318doi1011861471216412318

Qanbari S Pausch H Jansen S Somel M Strom TM Fries R Nielsen RSimianer H 2014 Classic Selective Sweeps Revealed by MassiveSequencinginCattlePLoSGenet10doi101371journalpgen1004148

Qanbari S Pimentel ECG Tetens J Thaller G Lichtner P Sharifi ARSimianerH2010AgenomewidescanforsignaturesofrecentselectioninHolsteincattleAnimGenetdoi101111j13652052200902016x

Sabeti PC Reich DE Higgins JM Levine HZ Richter DJ Schaffner SFGabriel SB Platko JV Patterson NJ McDonald GJ Ackerman HCCampbellSJAltshulerDCooperRKwiatkowskiDWardRLanderES2002DetectingrecentpositiveselectioninthehumangenomefromhaplotypestructureNature419832ndash7

ScheetPStephensM2006Afastandflexiblestatisticalmodelforlargescalepopulation genotype data applications to inferring missing genotypesandhaplotypicphaseAmJHumGenet78629ndash44

Shahzad K Loor JJ 2012 Application of topdown and bottomup systemsapproaches in ruminantphysiologyandmetabolismCurrGenomics13379

SmithLSuzuki JGoffAFilionFTherrien JMurphyBKohanGhadrHLefebvreRBrisvilleABuczinskiSFecteauGPerecinFMeirellesF2012DevelopmentalandEpigeneticAnomaliesinClonedCattleReprodDomestAnim47107ndash114doi101111j14390531201202063x

Sun G Cheng SYS Chen M Lim CJ Pallen CJ 2012 Protein TyrosinePhosphatase Phosphotyrosyl789 Binds BCAR3 To Position Cas forActivationatIntegrinMediatedFocalAdhesionsMolCellBiol323776ndash3789doi101128MCB0021412

Sun J Zhang C Lan X Lei C ChenH 2012 Exploring polymorphisms andassociationsofthebovineMOGAT3genewithgrowthtraitsGenomeNatl

42

Res Counc Can Geacutenome Cons Natl Rech Can 55 56ndash62doi101139g11077

TaberletPCoissacEPansu J PompanonF 2011Conservationgeneticsofcattle sheep and goats C R Biol On the trail of domesticationsmigrationsandinvasionsinagricultureDominiqueJobGeorgesPelletierJeanClaudePernollet334247ndash254doi101016jcrvi201012007

Utsunomiya YT Perez OrsquoBrien AM Sonstegard TS Van Tassell CP doCarmo AS Meszaros G Solkner J Garcia JF 2013 Detecting lociunder recent positive selection in dairy and beef cattle by combiningdifferentgenomewidescanmethodsPLoSOne8e64280

Voight BF Kudaravalli S Wen X Pritchard JK 2006 A Map of RecentPositive Selection in the Human Genome PLoS Biol 4 e72doi101371journalpbio0040072

WangHKadlecekTAAuYeungBBGoodfellowHESHsuLYFreedmanTSWeissA2010ZAP70AnEssentialKinaseinTcellSignalingColdSpringHarbPerspectBiol2doi101101cshperspecta002279

WangMZhangXKangLJiangCJiangY2012MolecularcharacterizationofporcineNECD SNRPNandUBE3Agenesand imprinting status in theskeletal muscle of neonate pigs Mol Biol Rep 39 9415ndash9422doi101007s1103301218066

WatanabeRChanoTInoueHIsonoTKoiwaiOOkabeH2005Rb1cc1iscritical for myoblast differentiation through Rb1 regulation VirchowsArchIntJPathol447643ndash648doi101007s0042800411831

WickramasingheNSManavalanTTDoughertySMRiggsKALiYKlingeCM 2009 Estradiol downregulates miR21 expression and increasesmiR21targetgeneexpressioninMCF7breastcancercellsNucleicAcidsRes372584ndash2595doi101093nargkp117

WildeCJAddeyCVLiPFernigDG1997Programmedcelldeathinbovinemammary tissue during lactation and involution Exp Physiol 82 943ndash953

YasudaSStevensRLTeradaTTakedaMHashimotoTFukaeJHoritaTKataoka H Atsumi T Koike T 2007 Defective Expression of Ras

43

Guanyl NucleotideReleasing Protein 1 in a Subset of Patients with

SystemicLupusErythematosusJImmunol1794890ndash4900

ZhangCBaileyDKAwadTLiuGXingGCaoMValmeekamVRetiefJ

Matsuzaki H Taub M Seielstad M Kennedy GC 2006 A whole

genome longrange haplotype (WGLRH) test for detecting imprints of

positiveselectioninhumanpopulationsBioinformatics222122ndash8

ZhaoFQOkineEKCheesemanCIShiraziBeecheySPKennellyJJ1998

Glucose transporter gene expression in lactating bovine gastrointestinal

tractJAnimSci762921ndash2929

44

Supplementaryfiles

YoucanfindChapterʹsupplementaryfilesat

- DocTa(DoctoralThesisArchive)(httptesionlineunicattit)

- GoogleDrive(httptinyurlcomPhDMMChapter02)orscantheQRcode

SupplementaryfileS1ndashSignificantcorehaplotypesandgenessharedin

dairyandbeef

Significantcorehaplotypes(pvaluele005haplotypefrequencyge025)shared

indairyandbeefcattlebreedsandrelativegenesintersectingwiththemThefile

isdividedinto4tablesThefirst2sheets(sheet1and2)shownthesignificant

corehaplotypesfordairyandbeefThelast2sheets(sheet3and4)showngenes

intersectingwiththesignificantcorehaplotypesfordairyandbeef

SupplementaryfileS2ndashNetworksrankingindairybreeds

Networksrankingindairybreedswithrelativemoleculessymbolslistscore

valuenumberofmoleculesbelongingtothedataset(focusmolecules)andtop

functionforeachnetwork

SupplementaryfileS3ndashNetworksrankinginbeefbreeds

Networksrankinginbeefbreedswithrelativemoleculessymbolslistscore

valuenumberofmoleculesbelongingtothedataset(focusmolecules)andtop

functionforeachnetwork

45

SupplementaryfileS4ndashCanonicalpathwaysrankingindairyorbeefcattle

breeds

Canonicalpathwaysrankingindairy(sheet1)orbeef(sheet2)cattlebreedswithrelativemoleculessymbolslistratioandndashlog10ofthepvaluesforeachcanonicalpathway

46

CHAPTER3Searchingnewsignalsforproductivetraitsthroughgenebasedassociationanalysisin

threeItaliancattlebreedsSCapomaccio1MMilanesi1etal1IstitutodiZootecnicaUCSCviaEmiliaParmense8429122PiacenzaItalyTheseauthorscontributedequallytothiswork

____________________SubmittedtoAnimalGenetics

Abstract

Genome wide association studies (GWAS) have been widely applied to

disentanglethegeneticbasisofcomplextraits IncattlebreedsclassicalGWAS

approach with medium density markers panels are far to be conclusive

especiallyforcomplextraitsThisisduetotheintrinsiclimitationsofGWASand

to the choices that are made to step from the associationrsquos signals to the

functionalunits

Here we applied a geneHbased association strategy to prioritize genotypeH

phenotype associations found with classical approaches in three Italian dairy

cattlebreedswithdifferentsamplesizes(ItalianHolsteinn=2058ItalianBrown

n=745ItalianSimmentaln=477)formilkproductionandqualitytraits

WhileclassicalsinglemarkerregressionfailedtorevealgenomeHwidesignificant

genotypeHphenotype association except for Italian Holstein the gene based

approach found specific genes in each breed that are associated with milk

physiologyandmammaryglanddevelopment

Since no established method is yet implemented to step from variation to

functional units our strategy may contribute to reveal new genes that play

significant roles in complex traits like thosehere investigated condensing low

signalsofassociationinageneHcentricfashionapproach

Background

Modern cattle breeds have nomore than 200 years of history (Taberlet et al

2008) a time spanwhere selection reproductive isolationand the consequent

geneticdriftshapedallelefrequenciesandlinkagedisequilibriumthroughoutthe

genome This is particularly true for industrial dairy breeds where the joint

application of reproductive technologies and modern genetic evaluation

methodshave imposedahigh selectionpressureona limitednumberof traits

(Taberletetal2008)involvedinmilkquantityandquality

Indeed the milk production industry has been historically ready to acquire

technicalimprovementsfromdifferentfieldsstatisticsbiologyandengineering

50

ForexampleBLUPbasedstrategiesandhighHdensitygenotyping throughSNPHchipsarenowroutinelyincludedinbullevaluationshavingsignificantimpactonselectionefficiencyandpopulationstructureThese techniques have dramatically improved milk yield and compositionwithout knowing the biological basis of the traits under selection To datedespite many efforts and many QTL mapped only a few genes have beeneffectivelyassociatedwithmilkyieldandcomposition(Grisartetal2002Blottetal2003CohenHZinderetal2005)In thiscontextGenomeWideAssociationStudies(GWAS) is themostcommonmethodfordissectingthebiologythatunderliescomplextraits(McCarthyetal2008)TheunderlyingideaofGWASissimplefindmarkerHtraitassociationsexploitingthe linkage disequilibrium (LD) that exists between the causative mutation ndashwhichweignorendashandoneormoreanalyzedmarkerswhichwewellknowInthiswayonecanpinpointgenomicregionscarryingcausalvariantsforanytrait

Although a number of GWAS have been conducted in cattle productive andmorphological traits (some examples (Jiang et al 2010 Cole et al 2011Meredith et al 2012)) and found several genomic regions associated to thesetraits none focused their attention directly on genes involved inmilk biologyMoreoveroneof themajordrawbacksofGWASisrelatedtothescarceresultsportability across populations due to a variety of confounding factorspopulationstructuredifferentialLD levelsbreedspecific selection targetsandeven SNPs ascertainment bias (Clark et al 2005) For instance significantassociationsonfatpercentagewereidentifiedinmanyregionsoftheBostaurusgenome in different breeds Except for genes of major effect there is littleagreement on which regions and H more importantly H genes are actuallyinvolvedinthecontrolofthetraitinvestigatedThisissomehownotsurprisingone favorable allele may segregate in one breed and be fixated in a differentbreed the same allele segregates in both breeds but allelesmay differ or thesegregates in both breeds but the genetic background masks the effect ofsegregation

51

AnotherlimitationinGWASisthestringentsignificancethresholdoftenapplied

tocorrectformultipletestingAsaresulta largeproportionofgeneswithlow

effectaredisregardedwithconsequentoverestimationof thehigheffectgenes

variance(Xu2003)Thisbiasisparticularlyworryingwheninvestigatingtraits

controlledbya largenumberofgeneswithsmalleffectand inhighLDasmilk

yield(GIANOLAETAL2013)

To overcome this kind of limitation different strategies have been invented

usingprior knowledge Interesting ldquocandidatepathwayrdquo approacheshavebeen

conducted(Ravenetal2013)testingforassociationbetweenaphenotypeand

genes inpathwaysthatare likelytobe involvedinthecontrolofthetraitThis

approach reduces thenumberofmarkers tobe tested increases the statistical

poweroftheexperimentrestrictingtheanalysistoexistingknowledge

Other methods based on statistical manipulations or particular experimental

designs have been developed to prioritize or validate genotypeHphenotype

associations For example geneHbased association strategies while restricting

GWA study only to genes and neighboring genomic regionsmay identify new

candidategenesdeservingpostHGWASanalysis(Akulaetal2011Cantoretal

2010 Liu et al 2010) Moreover it is true that classical GWAS downstream

bioinformatics analyses concentrate their efforts in findingmeaningful results

usinggenesasfinaltarget

The aim of ourworkwas to find new candidate genes and novel information

associated to complex traits as milk production (yiled) and quality (fat and

protein percentage) using a geneHcentric genome wide association in three

differentdairybreedsItalianHolsteinItalianBrownandItalianSimmental

Methods

BiologicalSamples

Bulls from the three most represented dairy breeds in Italy (Italian Holstein

ItalianBrownandItalianSimmental)wereselectedfromArtificialInsemination

(AI) bulls progeny tested in Italy until 2010 and forwhich biological samples

52

wereavailableItalianHolsteinbullswerechosenaccordingtofollowingcriteria

i) having the national Selection Index namely production functionality type

(PFT)withreliabilitygt75andii)havingthelowestpossiblerelationshipwith

theothersiresinthesetThiswasachievedbymaximizingthevariabilitywithin

the group On the contrary considering the low number of available Italian

BrownandItalianSimmentalbullstheywereallincludedintheanalysesAfter

thisprocedureatotalof2139ItalianHolstein775ItalianBrownand486Italian

Simmental samples (including quality control replicates)were genotypedwith

theBovineSNP50BeadChipv1(IlluminaSanDiegoCA)

Traitsconsideredintheanalysis

Italian Holstein (ANAFI) Italian Brown (ANARB) and Italian Simmental

(ANAPRI)breedersassociationsprovidedderegressedproofs(DRP)ordaughter

yielddeviations (DYD)hereafter referredasphenotype forall traitsanalyzed

Traitsanalyzedweremilkyieldfatpercentageandproteinpercentage

Genotypingandqualitycontrol

Quality controls (QC) were run independently for each breed using the same

pipelineandthresholds IndetailQCexcluded i)thereplicatesamplewiththe

highernumberofmissinggenotypesii)sampleswithunexpectedlyhigh(ge100)

mendelianerrors(forfatherHsonpairs)iii)sampleswithmorethan5missing

genotypes iv) SNPs with ge 25 missing data v) SNPs with minor allele

frequency(MAF)le5vi)SNPswithmendelianerrorsge25andvii)SNPsout

ofHardyHWeinbergequilibriumtestedbyFisherexacttest(ple0001Bonferroni

corrected)

Analysisofpopulationstructure

MultiDimensionScaling(MDS)wasperformedusingPLINK107(Purcelletal

2007)andresultsplottedwithRscriptsThisanalysisshowninsupplementary

material (FigureS1) exploredpopulationsubstructureandverified thegenetic

homogeneityofthedatasetpriortofurtheranalyses

53

PairwiseLDwascalculatedusingPLINK107(Purcelletal2007)LDdecayand

LD structure on specific regions were calculated with in house R scripts and

snpplotterRpackage(LunaandNicodemus2007)

GWAS

GenomeHwide association analysis was made with the GenABEL package in R

(Team 2014) using GRAMMARHCG (Genome wide Association using Mixed

Model and Regression H Genomic Control) approach (Amin et al 2007

Aulchenkoetal2007)

Basingonpriorknowledge(Minozzietal2013) in ItalianHolsteintheDGAT1

regionwas also treated as fixed effect inGenABEL analysis of the three traits

ResultsusingthismodelarereferredtoasldquoHolsteinDGATrdquo

Manhattan andquantileHquantileplots for each traitwereproduced to allowa

thoroughevaluationoftheGWASresultsAllcalculatedpHvaluesfromtheGWAS

analysiswere retained for the downstream prioritizationwith the geneHbased

association

Workingongenomecoordinates

Coordinatesof themarkerswere liftedover fromBTA40(Elsiketal2009)to

UMD31assembly(Ziminetal2009)usingSnpChimpv1(Nicolazzietal2014)

andgenecoordinateswereretrievedusingBioMartEnsembl

QTLcoordinatesweredownloadedfromtheAnimalQTLdb(Huetal2013) in

bedformat

Operations on genomic intervalswereperformedusing thebedtools suite ver

2162(QuinlanandHall2010)

GeneBasedAssociationanalysis

Toreducetheantagonismbetweensignificancethresholdandfalsepositiverate

of a classical GWA study (Gianola et al 2013) and to facilitate selection

signature comparison across the three breeds investigated we used a geneH

54

centricstatisticthatcollapsesinasinglevalueallthepHvaluesfromSNPsrelated

toagenecorrectingforLDinformation

Our approachderivesdirectly from the softwareVEGAS (VersatileGeneHBased

TestforGenomeHwideAssociationStudies)(Liuetal2010)adaptedtopursue

meaningfulresultsinanyspeciesbeyondhumans

Briefly the geneHwise test statistic is the sum of the n upper tail chiHsquared

valuesofaSNPsubset withonedegreeof freedom(wheren is thenumberof

SNPmappinginthegivengene)WithperfectlinkageequilibriumthegeneHwise

pHvaluewouldbetheoneHsidepHvalueofachiHsquaredistributionwithndegrees

of freedom Otherwise more likely the distribution of the pHvalues has to be

evaluated(withsimulations)andweighted(withLDvalues)(Liuetal2010)

The VEGAS speciesHfree procedure was implemented in a UNIX environment

usingBashandRlanguagesusingEnsemblannotationasinputfilesPLINKdata

typeandpHvaluesforeachSNPfromthepreviouslycitedGRAMMARprocedure

WemappedSNPsonBostaurusgenes(UMD31)andtheirboundaries(100000

bpupHanddownHstream)toaccountforregulatoryregionsOnlygeneswithat

least 2 SNPs mapped were retained Significant entries from the geneHbased

associationweremappedinthecattleQTLdatabasewithbedtools

ResultsandDiscussion

In this study a gene based association strategy was applied to identify new

genomicregionsboostingGWASanalysisresultsformilkproductionandquality

fromthreeItaliandairybreeds

Complextraitslikethosehereevaluatedareaffectedbyafewmajorgeneswith

large effects andmany otherswithmoderate to low effects the latter are not

easily identifiedby genomewide scans inmodern cattle breedsduemainly to

samplesizelimitationwhilesignalfromtheformersarelostingenomescansH

withsomeexceptionsHduerapidfixations

InthisscenariowithmediumdensitySNPpanelsliketheoneusedinthisstudy

associationbasedonsingleSNPregressionorotherldquoclassicalrdquomethodscouldbe

inconclusiveintermsofresults

55

AftertheQCprocess7452058and477samplesand3575938535and38574SNPs were retained for Italian Brown Italian Holstein and Italian Simmentalbulls respectively (Table S1) HolsteinDGAT analysis retained 38534markersoneSNPlessthantheotherHolsteinpanelDescriptive statistics on genotypes for each breed (histogram and barplot forheterozygosis and call rate per individual Figure S7 and multi dimensionalscalingFigureS1)revealedabsenceofconfoundingeffectsWiththesedatasetsatthenominalgenomeHwidepHvaluesignificancethresholdof 005 (ple125 x 10H6 after Bonferroni correction (Burton et al 2007Dudbridge and Gusnanto 2008)) significant associations were found only inItalianHolsteinbetweenmilkyieldandfatpercentageand26SNPsmappingintheproximalregionofBTA14nearbyDGAT1(Table1)

56

Table1PHvaluesoftheSNPsovercomingthegenomewidesignificancethresholdintheHolsteinbreedformilkyieldandfatpercentage(notaccountingforDGAT1effect)SNPsaresortedbypositionsintheUMD31assembly

SNPID BTA Position(Bp) POvalue(Fat) POValue(Milk)

Hapmap30383OBTCO005848 14 1489496 112EH12 361EH12

ARSOBFGLONGSO57820 14 1651311 247EH46 161EH16

ARSOBFGLONGSO34135 14 1675278 857EH17

ARSOBFGLONGSO94706 14 1696470 665EH16

ARSOBFGLONGSO4939 14 1801116 534EH49 106EH20

Hapmap52798Oss46526455 14 1923292 915EH21

UAOIFASAO6878 14 2002873 382EH24

ARSOBFGLONGSO107379 14 2054457 118EH33 664EH15

ARSOBFGLONGSO18365 14 2117455 250EH16

Hapmap30922OBTCO002021 14 2138926 439EH13

Hapmap25384OBTCO001997 14 2217163 390EH10 697EH09

Hapmap24715OBTCO001973 14 2239085 108EH09 369EH09

BTAO35941OnoOrs 14 2276443 402EH13

Hapmap30374OBTCO002159 14 2468020 528EH12

Hapmap30086OBTCO002066 14 2524432 798EH11

ARSOBFGLONGSO74378 14 3640788 161EH08

ARSOBFGLOBACO1511 14 3765019 375EH09

UAOIFASAO9288 14 3956956 811EH09

ARSOBFGLONGSO100480 14 4364952 133EH09

UAOIFASAO5306 14 4468478 458EH08

ThisislikelyduetotheeffectoftheK232ADGAT1mutationthatissegregatingin

thisbreed(Fontanesietal2014)NootherSNPwassignificantlyassociatedto

anyothertraitinthe3breedsinvestigated

SomeSNPswereunderasuggestivesignificancethresholdofple5x10H5Details

on these markers are in supplementary material (Table S3) but they are not

discussedherePlotsdescribingatglancetheassociationresultsrevealingalow

signaltonoiseratioareavailableintheonlinesupplementarymaterial(Figure

S2S3S4andS5respectively for ItalianBrown ItalianHolsteinHolsteinDGAT

andItalianSimmental)

57

Asperthegenecentricanalysiswemappedatotalof192371989919894and

19844 genes on 24616 available on Ensembl (release version 73) in Italian

BrownItalianHolsteinHolsteinDGATandItalianSimmentalrespectivelyGenes

with a geneHwisepHvalue at least 10H5were considered significantly associated

withthestudiedtraitThesignificanceappearedtobeadequatetothehighrate

ofredundancyof theannotationfeaturesandto figureoutthetopassociations

with a reasonable level of confidence Results on these associations per breed

andpertraitaresummarizedinTable2Table3andinsupplementarymaterials

(TableS2andTableS4)ThemajorityofsignificantgenesfoundwiththegeneH

centricanalysismapped inpreviouslycharacterizedQTL thatareassociated to

milkproductionorqualityThesearesummarizedintheTableS5

Table2Numberofsignificantlyassociatedgenesforeachbreedineachtraitinthegenebasedassociationtest

Fat Milkyield Protein

Brown 3 1

Holstein 92 80 1

HolsteinDGAT 1 4 1

Simmental 1 13

58

Tableamp3FeaturessignificantlyassociatedperbreedtraitHolsteinanalysisnotaccountingforDGAT1isshow

ninTableS3

Breedamp

Traitamp

EnsemblampIDamp

GeneWiseampPValueamp

SNPsInGeneamp

BestSNPamp

BestSNP_PValueamp

BTAamp

Startamp(bp)amp

Endamp(bp)amp

GeneampNam

eamp

BROWNamp

FAT

Nosignificantassociationfound

MILK

ENSBTAG00000019237

500EG05

4Hapmap53770Gss46526325

204EG04

23

47333403

47348212

TXNDC5

ENSBTAG00000043918

500EG05

4Hapmap53770Gss46526325

204EG04

23

47337650

47337787

5S_rRNA

ENSBTAG00000046839

500EG05

4Hapmap53770Gss46526325

204EG04

23

47328131

47352621

UncharacterizedProtein

PROTEIN

ENSBTAG00000001260

700EG05

3ARSGBFGLGNGSG116222

441EG03

18

52120387

52124418

PINLYP

amp

HOLSTEINDGATamp

FAT

ENSBTAG00000000369

100EG05

3Hapmap53294Grs29016908

333EG05

594673755

94749093

EPS8

MILK

ENSBTAG00000026896

130EG05

5ARSGBFGLGNGSG104949

385EG04

23

51549721

51553563

FOXF2

ENSBTAG00000005260

400EG05

5Hapmap33288GBTCG033751

120EG03

638120578

38127577

SPP1

ENSBTAG00000006579

500EG05

4ARSGBFGLGBACG2207

632EG05

15

54418559

54459500

P4HA3

ENSBTAG00000007013

700EG05

6UAGIFASAG7665

396EG05

19

5221507

5283308

TOM1L1

PROTEIN

ENSBTAG00000006638

600EG05

3ARSGBFGLGNGSG104774

176EG04

18

56523439

56530499

BCL2L12

amp

SIMMENTHALamp

FAT

ENSBTAG00000037649

900EG05

6Hapmap30001GBTAG142716

529EG05

4120427915

120494453

VIPR2

MILK

ENSBTAG00000017538

100EG05

6ARSGBFGLGBACG13093

587EG06

12

85630428

85630912

Pseudogene

ENSBTAG00000000856

200EG05

2ARSGBFGLGNGSG4939

413EG06

14

1766767

1769754

FBXL6

ENSBTAG00000000857

200EG05

2ARSGBFGLGNGSG4939

413EG06

14

1763994

1766621

GPR172B

ENSBTAG00000026356

200EG05

2ARSGBFGLGNGSG4939

413EG06

14

1795351

1804562

DGAT1

ENSBTAG00000035158

200EG05

2ARSGBFGLGNGSG4939

413EG06

14

1770660

1772329

TMEM

249

ENSBTAG00000046208

200EG05

2ARSGBFGLGNGSG4939

413EG06

14

1782901

1788087

SCRT1

ENSBTAG00000009811

400EG05

2ARSGBFGLGNGSG4939

413EG06

14

1825982

1844587

UncharacterizedProtein

ENSBTAG00000009816

400EG05

2ARSGBFGLGNGSG4939

413EG06

14

1838135

1840081

UncharacterizedProtein

ENSBTAG00000014458

400EG05

2ARSGBFGLGNGSG4939

413EG06

14

1844664

1894424

MROH1

ENSBTAG00000020751

400EG05

2ARSGBFGLGNGSG4939

413EG06

14

1806081

1825793

HSF1

ENSBTAG00000044406

400EG05

2ARSGBFGLGNGSG4939

413EG06

14

1883849

1883971

SCARNA15

ENSBTAG00000046889

400EG05

5ARSGBFGLGBACG13093

587EG06

12

85610279

85610762

UncharacterizedProtein

ENSBTAG00000020117

800EG05

3BTAG78494GnoGrs

188EG04

721189879

21196263

ZBTB7A

PROTEIN

Nosignificantassociationfound

amp

59

1 ItalianHolstein

The$ large$ number$ of$ significant$ signals$ was$ observed$ in$ the$ Italian$ Holstein$

especially$ for$ milk$ yield$ and$ fat$ percentage$ As$ shown$ in$ Table$ 2$ the$ strong$

effect$ of$ DGAT1$ in$ Italian$ Holsteins$ observed$ with$ individual$ SNPs$ was$

confirmed$by$the$geneBcentric$approach$In$this$region$almost$the$totality$of$the$

geneBwise$pBvalues$under$the$chosen$threshold$appeared$to$be$influenced$by$this$

major$ gene$ and$ by$ the$ high$ level$ of$ linkage$ disequilibrium$ that$ characterizes$

cosmopolitan$bovine$breeds$(see$Figure$S6)$Within$this$pool$of$genes$it$is$worth$

to$mention$PLEC$ (plectin)$ a$ cell$ surface$ receptor$ gene$ in$ the$RNASE5$ pathway$

(Angiogenin$ encoded$ by$ angiogenin$ ribonuclease$ RNase$ A$ family$ 5)$ Proteins$

encoded$ by$ these$ genes$ that$ are$ present$ in$ milk$ are$ involved$ in$ protein$

synthesis$ and$ act$ as$ growth$ factors$ in$ vitro$Moreover$RNASE5$ seems$ to$ have$

other$ functions$ such$ as$ regulation$ of$ stress$ response$ and$ apoptosis$ and$ even$

angiogenesis$ through$ which$ it$ may$ also$ contribute$ to$ the$ regulate$ milk$

production$(Raven$et$al$2013)$

The$ exclusion$ of$DGAT1$ effect$ (HolsteinDGAT$ results)$ highlighted$ that$ all$ the$

significant$ effects$ on$ fat$ percentage$ found$ for$ genes$ located$ within$

approximately$25$Mb$up$or$downstream$from$the$excluded$SNP$(ARS5BFGL5NGS5

4939)$were$ redundant$ signals$ of$DGAT1$ (Table3$ and$Table$ S2$ and$ S3)$ This$ is$

also$observed$for$milk$yield$due$to$the$genetic$correlations$between$fat$and$milk$$

Also$ PLEC$ was$ not$ significant$ in$ our$ analyses$ when$ taking$ into$ account$ the$

DGAT1$effect$contradicting$previous$findings$in$Holstein$that$report$association$

of$PLEC$gene$with$production$traits$either$accounting$or$not$for$the$effect$of$the$

diglyceride$ acyltransferase$ (Raven$ et$ al$ 2013)$ A$ possible$ explanation$ for$ the$

different$results$between$these$studies$is$a$difference$between$genetic$structures$

of$the$populations$analyzed$To$reduce$false$positive$signals$we$only$evaluated$

the$genes$based$on$HolsteinDGAT$results$for$the$Italian$Holstein$$

11 Milkyield

We$found$genes$associated$with$this$trait$in$Holstein$in$particular$SPP1$P4HA3=

FOXF2$and$TOM1L1$genes$

Among$ these$ the$ osteopontin$ SPP1$ (secreted$ phosphoprotein$ 1)$ is$ directly$

related$to$milk$production$(de$Mello$et$al$2012)$ It$ is$ located$ in$BTA6$nearby$

60

$

ABCG2$ and$ there$ is$ evidence$ that$ polymorphisms$ in$ this$ region$ are$ strongly$

correlated$with$high$production$ levels$of$milk$ in$various$breeds$(CohenBZinder$

et$ al$ 2005)$Osteopontin$SPP1$ is$ a$ secreted$glycophosphoprotein$ that$plays$ a$

crucial$ role$ in$ mammary$ gland$ development$ having$ an$ essential$ role$ also$ in$

mammary$gland$differentiation$and$branching$of$the$mammary$epithelial$ductal$

system$ This$ gene$has$ also$ been$ implicated$ in$many$ types$ of$ cancer$ including$

breast$cancer$ for$ the$potentiality$of$proliferation$and$alveologenesis$ (Hubbard$

et$ al$ 2013)$ It$ is$worth$mentioning$ that$ the$most$ significant$ SNPs$within$ this$

gene$was$Hapmap332885BTC5033751=with=a$pBvalue$of$00012$ largely$over$ the$

genomeBwide$significance$ threshold$This$signal$would$be$ ignored$ in$ the$single$

SNP$approach$

Another$gene$revealed$by$the$geneBbased$association$is$P4HA3$encoding$for$the$

prolyl$4Bhydroxylase$α$polypeptide$III$involved$in$the$correct$folding$of$nascent$

procollagen$ chains$ It$ is$ known$ that$ stromalBepithelial$ interactions$ are$ critical$

for$mammary$gland$development$and$defective$deposition$or$reorganization$of$

extracellular$ matrix$ can$ lead$ to$ incorrect$ mammary$ epithelial$ morphogenesis$

(Gillette$ et$ al$ 2013)$ highlighting$ the$ role$ of$ this$ gene$ in$ the$ healthy$ breast$

tissue=

FOXF2=gene$ (Forkhead$Box$F2)$ is$ a$member$of$ the$ forkhead$box$ transcription$

factors$ known$ to$ be$ involved$ in$ tissue$ development$ as$ part$ of$ the$ ldquoHedgehog$

signaling$pathwayrdquo$This$key$cascade$is$essential$for$the$correct$segmentation$of$

tissues$during$embryonic$development$but$has$been$recently$demonstrated$that$

is$active$in$adult$stem$cells$proliferation$in$mammary$tissue$(Liu$et$al$2006)$In$

particular$ FOXF2$ plays$ a$ key$ role$ in$ extracellular$ matrix$ synthesis$ and$

epithelialBmesenchymal$ interactions$ and$ could$ therefore$ affect$ the$ proper$

development$of$the$mammary$stroma$In$addition$low$levels$of$transcription$of$

this$gene$are$correlated$ to$early$onset$metastasis$ in$breast$cancer$ (Kong$et$al$

2013)$ underlining$ a$ specific$ role$ in$ the$ mammary$ gland$

Lastly$ TOM1L1$ (Target$ Of$ Myb1$ ChickenBLike$ 1)$ is$ another$ gene$ potentially$

involved$ with$ the$ milk$ physiology$ as$ is$ known$ its$ implication$ with$ the$ early$

endosome$ sorting$ coupled$ with$ EGFR= (epidermal$ growth$ factor$ receptor)$

another$ important$player$ in$ lactation$(Fowler$et$al$1995)$TOM1L1$belongs$to$

61

$

the$ VHS$ domain$ containing$ proteins$ that$ are$ involved$ in$ the$ postBGolgi$

trafficking$and$signaling$(Wang$et$al$2010)$

12 Fatpercentage

In$ Italian$ Holstein$ breed$ we$ found$ a$ significant$ association$ with$ or$ without$

DGAT1$ effect$ correction$ to$ the$EPS8$ gene$ located$on$BTA5$at$around$95$Mbp$

This$gene$encodes$for$a$substrate$of$the$already$cited$EGFR$and$is$involved$in$the$

fatty$acid$metabolism$(Fazioli$et$al$1993)$A$recent$study$on$Holstein$Friesian$

(Wang$et$al$2012)$found$strong$association$with$a$variant$located$in$the$second$

intron$of$the$epidermal$growth$factor$receptor$pathway$substrate$enforcing$the$

hypothesis$of$a$true$involvement$of$this$gene$on$fat$percentage$

13 Proteinpercentage

The$BCL2L12$gene$(BclB2Blike$protein$12)$associated$with$protein$percentage$in$

Italian$Holstein$breed$represents$another$interesting$result$The$product$of$this$

gene= is$ an$ important$ oncoprotein$ in$ two$ fundamental$ cell$ cycle$ pathways$

namely$ p53$ and$ cytoplasmic$ caspase$ signaling$ BCL2L12= resides$ in$ both$

cytoplasm$and$nucleus$inhibiting$caspases$3$and$7$and$forming$a$complex$with$

p53$thereby$ inhibiting$p53Bdirected$transcriptomic$changes$upon$DNA$damage$

(Stegh$ and$DePinho$ 2011)$ The$members$ of$ BCL2$ family$ are$ considered$ antiB

apoptotic$genes$and$it$is$obvious$that$dysfunctional$programmed$cell$death$plays$

an$ important$ role$ in$ the$ pathogenesis$ and$ in$ the$ progression$ of$ cancer$Many$

members$of$this$family$were$found$to$be$modulated$in$various$malignancies$and$

BCL2L12= in$ particular$ was$ expressed$ in$ mammary$ gland$ and$ proposed$ as$

prognostic$ cancer$ biomarker$ (Thomadaki$ et$ al$ 2007)$ In$ dairy$ cows$ the$

members$of$BCL2$family$are$involved$in$the$key$events$ in$the$mammary$tissue$

development$ and$ remodeling$ during$ lactation$ and$ involution$ (Stefanon$ et$ al$

2002$p$200)$Members$of$BCL2$family$regulate$the$turnover$of$cell$population$

in$the$mammary$gland$during$lactation$and$its$overexpression$was$found$to$be$

linearly$related$to$milk$production$in$Holstein$cow$(Colitti$et$al$2004)$

$

62

$

2 ItalianSimmental

In$ the$ Italian$ Simmental$ breed$ we$ found$ one$ association$ on$ BTA4$ for$ fat$

percentage$(VIPR2)$and$two$associations$ for$milk$yield$one$ in$BTA7$(ZBTB7A)$

and$ the$ other$ in$ BTA14$ related$ to$ the$ DGAT1$ No$ association$ was$ found$ for$

protein$percentage$

21 Fatpercentage

Among$the$genes$found$significantly$associated$with$production$traits$in$Italian$

Simmental$VIPR2=a$ gene$ encoding$ a$ receptor$ for$ a$ small$ vasoactive$ intestinal$

neuropeptide$ was$ the$ single$ signal$ in$ BTA4$ for$ fat$ percentage$ Vasoactive$

intestinal$ peptide$ (VIP)$ is$ synthesized$ in$ the$pituitary$ gland$ and$ among$other$

functions$ stimulates$ prolactin$ secretion$ and$ is$ involved$ in$ proliferation$

survival$and$differentiation$in$human$breast$cancer$cells$(Valdehita$et$al$2010)$$

An$interesting$novel$finding$is$the$effect$of$immunosuppression$(through$direct$

effect$of$IL56)$on$prolactin$cells$and$its$ability$to$regulate$prolactin$by$acting$on$

pituitary$ VIP$ through$ VIP$ receptors$ (Blanco$ et$ al$ 2013)$ These$ evidences$

suggest$ that$VIPR2$ could$ be$ an$ important$ candidate$ for$ further$ investigations$

through$post$GWAS$approaches$such$as$reBsequencing$or$fine$mapping$

22 Milkyield

Surprisingly$the$DGAT1$gene$was$significantly$associated$with$milk$yield$but$not$

with$fat$percentage$in$Italian$Simmental$breed$Indeed$the$most$significant$SNP$

for$the$milk$yield$trait$in$this$breed$was$the$ARS5BFGL5NGS54939$with$a$pBvalue$

of$ 413EB06$ The$ importance$ of$ this$ gene$ in$ the$ lactation$ is$ largely$ known$

(Grisart$et$al$2002)$$

To$ investigate$DGAT1$ signal$ in$ Simmental$ we$ analyzed$ the$ fine$ LD$ structure$

genomeBwide$ and$ in$ BTA14$ proximal$ region$ in$ all$ the$ three$ breeds$ revealing$

different$patterns$

Observed$LD$decay$on$ the$ three$breeds$genomes$revealed$similar$behavior$ for$

Italian$Holstein$and$Italian$Brown$with$r2$gt$010$$until$ around$ 250$ Kbp$ while$

Italian$Simmental$showed$lower$LD$Distances$higher$that$400$Kbp$revealed$half$

level$of$LD$in$Simmental$(Figure$S6)$

63

$

In$DGAT1$region$LD$behavior$is$very$similar$between$Italian$Holstein$and$Italian$

Brown$ (Figure$ S8)$ while$ association$ values$ are$ really$ different$ because$ ARS5

BFGL5NGS54939=is$fixed$in$Italian$Brown$and$purged$by$QC$process$Interestingly$

association$ patterns$ become$ similar$ when$ DGAT1$ effect$ is$ removed$ from$ the$

Holstein$ panel$ (HolsteinDGAT$ Figure$ S8)$ In$ Simmental$ we$ found$ a$ different$

situation$ very$ low$ LD$ level$ and$ significant$ association$ levels$ with$ ARS5BFGL5

NGS54939$with$ the$milk$ trait$This$ is$probably$due$ to$ the$ low$ frequency$of$ the$

p232K$ allele$ in$ Italian$ Simmental$ (Scotti$ et$ al$ 2010)$ and$ to$ the$ different$

selection$strategies$applied$in$these$breeds$

The$ relatively$ high$ pBvalue$ of$ this$ SNP$ is$ responsible$ for$ the$ statistical$

significance$of$many$of$the$13$genes$found$significant$in$this$breed$for$this$trait$

as$ shown$ in$ Table$ 3$ therefore$ reducing$ the$ gene$ set$ to$ three$ locations$ two$

overlapping$features$indicating$a$pseudogene$in$BTA12$and$a$zinc$finger$protein$

ZBTB7A$in$BTA7$DGAT1$role$and$function$will$be$not$discussed$here$as$already$

done$elsewhere$(Grisart$et$al$2002)$

ZBTB7A=is$part$of$zinc$finger$and$BTB$proteins$(Broad$complex$Tramtrack$and$

Bricabrac)$ transcription$ factors$ with$ emerging$ importance$ for$ their$

involvement$ in$ oncogenesis$ and$ cell$ differentiation$by$ repressing$key$ genes$of$

cell$cycle$regulation$as$retinoblastoma$(Rb)$and=p53=(ldquoJeon$et$al$B$2008$B$ProtoB

oncogene$ FBIB1$ (PokemonZBTB7A)$ Represses$ Trpdfrdquo$ nd)= ZBTB7A$ over$

expression$ is$ shown$ in$primary$ and$ recurrent$ breast$ cancer$ tissues$ leading$ to$

disease$progression$and$poor$survival$(Zu$et$al$2011)$Link$between$this$gene$

polymorphism$and$milk$production$might$be$in$the$proper$cellular$development$

of$ the$ mammary$ gland$ This$ transcription$ factor$ is$ also$ known$ as$ a$ master$

regulator$ of$ B$ and$ TBcell$ lineage$ (Lee$ and$ Maeda$ 2012)$ enforcing$ the$ link$

between$ milk$ production$ and$ immune$ protection$ as$ already$ outlined$ by$ of$

Lemay$ and$ colleagues$ with$ proteomic$ studies$ where$ the$ ldquoactivation$ of$ the$

innate$ immune$systemrdquo$resulted$as$one$of$ the$most$represented$gene$ontology$

categories$(Lemay$et$al$2009)$

$

64

$

3 ItalianBrown

In$ the$ Italian$Brown$breed$we$ found$ one$ association$ on$BTA23$ (TXNDC5)$ for$

milk$yield$and$one$in$BTA18$for$protein$percentage$(PINLYP)$The$three$features$

listed$in$Table$3$for$milk$yield$in$Italian$Brown$were$collapsed$due$to$overlap$$

31 Milkyield

Thioredoxin$ domainBcontaining$ protein$ 5$ (TXNDC5)$ belongs$ to$ thioredoxin$

family$small$ubiquitous$proteins$containing$disulphide$domain$affecting$protein$

folding$with$chaperone$activity$preventing$protein$misBfolding$within$the$lumen$

of$ the$endoplasmic$reticulum$This$multifunctional$organelle$have$a$key$role$ in$

several$ processes$ including$ lipid$ catabolism$ and$ protein$ synthesis$ (RamiacuterezB

Torres$ et$ al$ 2012)$ Because$ of$ its$ disulphide$ bonds$ thioredoxin$ facilitate$ the$

reduction$of$other$proteins$by$cysteine$thiol$disulfide$exchange$This$mechanism$

is$essential$for$detossification$

During$ lactation$ indeed$ an$ enhanced$ detoxification$ level$ is$ functional$ to$

preserve$ milk$ quality$ against$ increased$ metabolic$ activities$ Higher$ metabolic$

levels$ lead$ to$ free$ radicals$ production$ making$ necessary$ the$ maintenance$ of$

redox$ homeostasis$ that$ include$ among$ others$ the$ reduction$ and$ oxidation$ of$

peroxiredoxin$and$thioredoxin$(Lu$and$Holmgren$2014)$

32 Proteinpercentage

For$ protein$ percentage$ in$ Italian$ Brown$ there$ is$ only$ one$ associated$ gene$

PINLYP=is$encoding$a$phospholipase$A2$inhibitor$but$this$protein$product$is$not$

yet$well$studied$ $Many$types$of$phospholipase$A2$were$characterized$and$ it$ is$

known$ that$ they$ are$ mediator$ of$ inflammation$ atherosclerosis$ and$ cancer$ in$

mammals$and$that$plays$an$important$role$in$the$innate$immune$response$(Qiu$

and$Lai$2013)$It$may$be$reasonable$to$imagine$a$role$for$PINLYP$in$the$lactation$

in$light$of$previously$discussed$results$that$link$milk$to$innate$immune$response$

(Lemay$et$al$2009)$

$

Although$ limited$ by$ the$ size$ of$ the$ study$ and$ the$ number$ of$ the$ included$

markers$these$results$represent$both$a$confirmation$and$a$step$forward$towards$

the$comprehension$of$the$biological$bases$of$milk$yield$and$quality$$

65

$

New$ genes$ potentially$ involved$ with$ production$ traits$ in$ dairy$ cows$ were$

tracked$when$GWAS$signals$resulted$too$weak$to$be$considered$in$a$classical$SNP$

based$mapping$with$the$ldquoclosest$gene$to$peakrdquo$or$the$windowBbased$strategy$

It$ is$ worth$ mentioning$ that$ none$ of$ the$ analyzed$ SNPs$ (not$ considering$ the$

Italian$Holstein$with$ARS5BFGL5NGS54939= included$ in$ the$ analysis)$were$ above$

the$ genomeBwide$ significance$ threshold$ in$ single$ SNP$ analysis$ meaning$ that$

important$information$is$often$lost$in$such$studies$$

In$addition$ to$ the$newly$ identified$ loci$we$observed$a$confirmation$of$existing$

associations$ with$ genes$ related$ to$ milk$ production$ not$ only$ dissecting$ gene$

specific$ functions$ but$ from$ mapping$ these$ genes$ in$ previously$ characterized$

regions$(Table$S5)$This$ is$not$trivial$when$using$new$data$analysis$techniques$

especially$ in$ high$ noise$ to$ signal$ ratio$ datasets$ like$ those$ here$ presented$We$

believe$ that$ this$ approach$will$ benefit$ of$data$ from$denser$marker$panels$ (eg$

BovineHD)$ to$ exploit$ local$ linkage$ disequilibrium$ and$ increase$ the$ number$ of$

mapped$genes$with$more$than$two$SNPs$

Results$ obtained$ with$ the$ gene$ centric$ approach$ although$ having$ intrinsic$

limitations$are$not$ influenced$by$a$number$of$subjective$choices$often$made$in$

GWAS$ For$ instance$when$ ldquowindow$mappingrdquo$ is$ used$ to$decrease$ the$noise$ to$

signal$ ratio$ eg$ due$ to$ nearby$ markers$ having$ different$ allele$ frequencies$

windows$size$ is$ to$be$decided$This$can$be$based$on$equal$number$of$markers$

equal$number$of$base$pairs$or$equal$map$distance$in$cM$per$window$The$three$

choices$ do$ not$ coincide$ because$ markers$ are$ not$ equally$ spread$ along$ the$

chromosomes$ and$ recombination$ differs$ in$ different$ genomic$ regions$ In$

addition$ windows$ can$ be$ chosen$ nonBoverlapping$ or$ with$ different$ degree$ of$

overlap$Finally$when$stepping$ from$significant$SNPs$or$windows$ to$candidate$

genes$the$size$of$the$flanking$region$from$which$picking$up$candidate$genes$is$to$

be$decided$as$well$Sometimes$the$gene$closer$to$the$signal$peak$is$selected$ in$

other$cases$all$genes$within$certain$boundaries$are$included$in$further$analyses$

All$ these$ choices$ may$ generate$ misleading$ results$ For$ example$ not$ truly$

associated$genes$may$be$retained$for$further$analyses$disregarding$the$correct$

ones$ In$addition$ a$ large$number$of$ false$positives$can$be$ included$ to$be$ later$

interpreted$with$ the$help$of$network$and$pathway$analysis$As$ a$ consequence$

these$decisions$have$a$direct$impact$in$terms$of$production$and$interpretation$of$

66

$

results$leading$to$conclusions$heavily$based$on$the$enrichment$analysis$and$preB

existing$knowledge$$

With$ the$ gene$ based$ association$ we$ also$ noticed$ some$ advantages$ in$ the$

downstream$data$mining$First$genes$are$functional$units$and$the$golden$targets$

of$ the$ ldquoclassical$ postBGWASrdquo$ bioinformatics$ pipelines$ Hence$ the$ gene$ based$

association$leads$to$genes$with$an$embedded$statistically$robust$framework$This$

is$ a$ ldquoplusrdquo$ in$ studying$ complex$ traits$ because$ one$ can$ concentrate$ on$ specific$

signals$ rather$ than$ cherry$ picking$meaningful$ features$ from$ a$ list$ constructed$

with$the$previously$cited$approaches$$

Moreover$it$is$easier$to$track$back$gene$clusters$to$major$gene$effects$B$like$for$

example$DGAT1$in$a$gene$rich$genome$chunk$many$genes$may$result$significant$

due$to$the$same$SNP$(Table$S2$and$Table$3)$With$the$VEGAS$method$finding$the$

ldquotruerdquo$association$is$facilitated$by$following$the$ldquobest$SNPrdquo$in$the$region$

In$addition$we$found$phenotype$coherent$signals$in$such$situation$where$only$a$

few$SNPs$B$ if$none$ndash$are$above$an$acceptable$statistical$genomeBwide$threshold$

using$a$single$SNP$regression$method$

The$gene$centric$approach$ is$a$ first$and$slight$ improvement$ in$ livestock$GWAS$

analyses$since$we$still$have$a$partial$genome$annotation$ In$addition$ it$ is$now$

known$that$ the$definition$of$gene$ is$ to$be$revised$many$genes$do$not$code$ for$

proteins$ have$ alternative$ starting$ sites$ have$ antisense$ transcripts$ host$

regulatory$ RNA$ are$ subjected$ to$ alternative$ splicing$ and$may$ be$ regulated$ by$

sequences$quite$ far$ away$ from$ their$ coding$ regions$ (Consortium$2012)$Other$

limitations$are$the$need$for$larger$number$of$animals$in$these$type$of$studies$as$

in$this$study$and$the$still$relatively$high$cost$of$sequencing$that$cannot$be$used$

routinely$in$large$experiments$at$least$not$yet$$

$

$

Conclusions

This$study$underlines$the$importance$of$the$implementation$of$new$methods$to$

analyze$complex$traits$with$genomic$data$No$golden$path(s)$is$discovered$yet$to$

walk$from$variation$to$functional$units$The$geneBcentric$analysis$may$contribute$

67

$

to$ reveal$ new$ signals$ throughout$ the$ bovine$ genome$ condensing$ weak$

association$signals$in$chromosome$bits$having$functional$significance$

$

$

Acknowledgements

The$Authors$would$ like$ to$ thank$Dr$ Jimmy$ Liu$ for$ his$ advices$ in$ building$ the$

software$AIA$ANAFI$ANARB$and$ANAPRI$breedersrsquo$associations$ for$ their$help$

and$ support$during$ the$data$analysis$ SelMol$ and$ProZoo$projects$ that$provide$

samples$ The$ Authors$ are$ also$ grateful$ to$MIPAAF$ and$ Cariplo$ Foundation$ for$

funding$

$

$

$

References

Akula$ N$ Baranova$ A$ Seto$ D$ Solka$ J$ Nalls$MA$ Singleton$ A$ Ferrucci$ L$

Tanaka$T$Bandinelli$ S$Cho$YS$Kim$YJ$Lee$ JBY$Han$BBG$Bipolar$

Disorder$ Genome$ Study$ (BiGS)$ Consortium$ The$ Wellcome$ Trust$ CaseB

Control$Consortium$McMahon$FJ$2011$A$NetworkBBased$Approach$to$

Prioritize$ Results$ from$ GenomeBWide$ Association$ Studies$ PLoS$ ONE$ 6$

e24220$doi101371journalpone0024220$

Amin$N$ van$Duijn$ CM$ Aulchenko$ YS$ 2007$ A$Genomic$Background$Based$

Method$ for$ Association$ Analysis$ in$ Related$ Individuals$ PLoS$ ONE$ 2$

e1274$doi101371journalpone0001274$

Aulchenko$YS$de$Koning$DBJ$Haley$C$2007$Genomewide$Rapid$Association$

Using$ Mixed$ Model$ and$ Regression$ A$ Fast$ and$ Simple$ Method$ For$

Genomewide$PedigreeBBased$Quantitative$Trait$Loci$Association$Analysis$

Genetics$177$577ndash585$doi101534genetics107075614$

Blanco$ EJ$ CarreteroBHernaacutendez$ M$ GarciacuteaBBarrado$ J$ IglesiasBOsma$ MC$

Carretero$M$Herrero$JJ$Rubio$M$Riesco$JM$Carretero$J$2013$The$

activity$and$proliferation$of$pituitary$prolactinBpositive$cells$and$pituitary$

68

$

VIPBpositive$ cells$ are$ regulated$by$ interleukin$6$Histol$Histopathol$ 28$

1595ndash1604$

Blott$S$Kim$JBJ$Moisio$S$SchmidtBKuntzel$A$Cornet$A$Berzi$P$Cambisano$

N$Ford$C$Grisart$B$Johnson$D$Karim$L$Simon$P$Snell$R$Spelman$

R$ Wong$ J$ Vilkki$ J$ Georges$ M$ Farnir$ F$ Coppieters$ W$ 2003$

Molecular$ dissection$ of$ a$ quantitative$ trait$ locus$ a$ phenylalanineBtoB

tyrosine$substitution$in$the$transmembrane$domain$of$the$bovine$growth$

hormone$ receptor$ is$ associated$ with$ a$ major$ effect$ on$ milk$ yield$ and$

composition$Genetics$163$253ndash266$

Burton$PR$Clayton$DG$Cardon$LR$Craddock$N$Deloukas$P$Duncanson$A$

Kwiatkowski$ DP$ McCarthy$ MI$ Ouwehand$ WH$ Samani$ NJ$ Todd$

JA$ Donnelly$ P$ Barrett$ JC$ Burton$ PR$ Davison$ D$ Donnelly$ P$

Easton$ D$ Evans$ D$ Leung$ HBT$ Marchini$ JL$ Morris$ AP$ Spencer$

CCA$Tobin$MD$Cardon$LR$Clayton$DG$Attwood$AP$Boorman$JP$

Cant$B$Everson$U$Hussey$JM$Jolley$JD$Knight$AS$Koch$K$Meech$

E$ Nutland$ S$ Prowse$ CV$ Stevens$ HE$ Taylor$ NC$ Walters$ GR$

Walker$NM$Watkins$NA$Winzer$T$Todd$JA$Ouwehand$WH$Jones$

RW$ McArdle$ WL$ Ring$ SM$ Strachan$ DP$ Pembrey$ M$ Breen$ G$

Clair$ DS$ Caesar$ S$ GordonBSmith$ K$ Jones$ L$ Fraser$ C$ Green$ EK$

Grozeva$D$Hamshere$ML$Holmans$PA$Jones$IR$Kirov$G$Moskvina$

V$ Nikolov$ I$ OrsquoDonovan$ MC$ Owen$ MJ$ Craddock$ N$ Collier$ DA$

Elkin$A$Farmer$A$Williamson$R$McGuffin$P$Young$AH$Ferrier$IN$

Ball$SG$Balmforth$AJ$Barrett$JH$Bishop$DT$Iles$MM$Maqbool$A$

Yuldasheva$N$Hall$AS$Braund$PS$Burton$PR$Dixon$RJ$Mangino$

M$ Stevens$ S$ Tobin$ MD$ Thompson$ JR$ Samani$ NJ$ Bredin$ F$

Tremelling$ M$ Parkes$ M$ Drummond$ H$ Lees$ CW$ Nimmo$ ER$

Satsangi$ J$Fisher$SA$Forbes$A$Lewis$CM$Onnie$CM$Prescott$NJ$

Sanderson$J$Mathew$CG$Barbour$J$Mohiuddin$MK$Todhunter$CE$

Mansfield$JC$Ahmad$T$Cummings$FR$Jewell$DP$Webster$J$Brown$

MJ$ Clayton$DG$ Lathrop$GM$ Connell$ J$Dominiczak$A$ Samani$NJ$

Marcano$CAB$Burke$B$Dobson$R$Gungadoo$J$Lee$KL$Munroe$PB$

Newhouse$SJ$Onipinla$A$Wallace$C$Xue$M$Caulfield$M$Farrall$M$

Barton$A$ (braggs)$TB$ in$RG$and$G$Bruce$ IN$Donovan$H$Eyre$S$

69

$

Gilbert$ PD$ Hider$ SL$ Hinks$ AM$ John$ SL$ Potter$ C$ Silman$ AJ$Symmons$ DPM$ Thomson$ W$ Worthington$ J$ Clayton$ DG$ Dunger$DB$ Nutland$ S$ Stevens$ HE$ Walker$ NM$ Widmer$ B$ Todd$ JA$Frayling$TM$Freathy$RM$Lango$H$Perry$JRB$Shields$BM$Weedon$MN$ Hattersley$ AT$ Hitman$ GA$ Walker$ M$ Elliott$ KS$ Groves$ CJ$Lindgren$ CM$ Rayner$ NW$ Timpson$ NJ$ Zeggini$ E$ McCarthy$ MI$Newport$M$Sirugo$G$Lyons$E$Vannberg$F$Hill$AVS$Bradbury$LA$Farrar$ C$ Pointon$ JJ$ Wordsworth$ P$ Brown$ MA$ Franklyn$ JA$Heward$JM$Simmonds$MJ$Gough$SCL$Seal$S$(uk)$BCSC$Stratton$MR$Rahman$N$Ban$M$Goris$A$Sawcer$SJ$Compston$A$Conway$D$Jallow$ M$ Newport$ M$ Sirugo$ G$ Rockett$ KA$ Kwiatkowski$ DP$Bumpstead$ SJ$ Chaney$A$Downes$K$ Ghori$MJR$ Gwilliam$R$Hunt$SE$Inouye$M$Keniry$A$King$E$McGinnis$R$Potter$S$Ravindrarajah$R$ Whittaker$ P$ Widden$ C$ Withers$ D$ Deloukas$ P$ Leung$ HBT$Nutland$S$Stevens$HE$Walker$NM$Todd$JA$Easton$D$Clayton$DG$Burton$PR$Tobin$MD$Barrett$JC$Evans$D$Morris$AP$Cardon$LR$Cardin$NJ$Davison$D$Ferreira$T$PereiraBGale$ J$Hallgrimsdoacutettir$ IB$Howie$ BN$Marchini$ JL$ Spencer$ CCA$ Su$ Z$ Teo$ YY$ Vukcevic$ D$Donnelly$P$Bentley$D$Brown$MA$Cardon$LR$Caulfield$M$Clayton$DG$ Compston$ A$ Craddock$ N$ Deloukas$ P$ Donnelly$ P$ Farrall$ M$Gough$ SCL$ Hall$ AS$ Hattersley$ AT$ Hill$ AVS$ Kwiatkowski$ DP$Mathew$ CG$McCarthy$MI$ Ouwehand$WH$ Parkes$M$ Pembrey$M$Rahman$N$Samani$NJ$Stratton$MR$Todd$ JA$Worthington$ J$2007$GenomeBwide$ association$ study$ of$ 14000$ cases$ of$ seven$ common$diseases$ and$ 3000$ shared$ controls$ Nature$ 447$ 661ndash678$doi101038nature05911$

Cantor$ RM$ Lange$ K$ Sinsheimer$ JS$ 2010$ Prioritizing$ GWAS$ Results$ A$Review$ of$ Statistical$ Methods$ and$ Recommendations$ for$ Their$Application$Am$J$Hum$Genet$86$6ndash22$doi101016jajhg200911017$

Clark$ AG$ Hubisz$ MJ$ Bustamante$ CD$ Williamson$ SH$ Nielsen$ R$ 2005$Ascertainment$ bias$ in$ studies$ of$ human$ genomeBwide$ polymorphism$Genome$Res$15$1496ndash1502$doi101101gr4107905$

70

$

CohenBZinder$M$ Seroussi$ E$ Larkin$ DM$ Loor$ JJ$Wind$ AE$ der$ Lee$ JBH$Drackley$ JK$Band$MR$Hernandez$AG$Shani$M$Lewin$HA$Weller$JI$ Ron$ M$ 2005$ Identification$ of$ a$ missense$ mutation$ in$ the$ bovine$ABCG2$gene$with$a$major$effect$on$ the$QTL$on$ chromosome$6$affecting$milk$yield$and$composition$in$Holstein$cattle$Genome$Res$15$936ndash944$doi101101gr3806705$

Cole$JB$Wiggans$GR$Ma$L$Sonstegard$TS$Lawlor$TJ$Crooker$BA$Tassell$CPV$ Yang$ J$ Wang$ S$ Matukumalli$ LK$ Da$ Y$ 2011$ GenomeBwide$association$ analysis$ of$ thirty$ one$ production$ health$ reproduction$ and$body$ conformation$ traits$ in$ contemporary$ US$ Holstein$ cows$ BMC$Genomics$12$408$doi1011861471B2164B12B408$

Colitti$M$Wilde$CJ$Stefanon$B$2004$Functional$expression$of$bclB2$protein$family$and$AIF$in$bovine$mammary$tissue$in$early$lactation$J$Dairy$Res$71$20ndash27$

Consortium$ TEP$ 2012$ An$ integrated$ encyclopedia$ of$ DNA$ elements$ in$ the$human$genome$Nature$489$57ndash74$doi101038nature11247$

De$Mello$F$Cobuci$JA$Martins$MF$Silva$MVGB$Neto$JB$2012$Association$of$the$polymorphism$g8514CgtT$in$the$osteopontin$gene$(SPP1)$with$milk$yield$ in$ the$ dairy$ cattle$ breed$ Girolando$ Anim$ Genet$ 43$ 647ndash648$doi101111j1365B2052201102312x$

Dudbridge$ F$ Gusnanto$ A$ 2008$ Estimation$ of$ significance$ thresholds$ for$genomewide$ association$ scans$ Genet$ Epidemiol$ 32$ 227ndash234$doi101002gepi20297$

Elsik$ CG$ Tellam$ RL$ Worley$ KC$ 2009$ The$ Genome$ Sequence$ of$ Taurine$Cattle$ A$window$ to$ ruminant$ biology$ and$ evolution$ Science$ 324$ 522ndash528$doi101126science1169588$

Fazioli$ F$ Minichiello$ L$ Matoska$ V$ Castagnino$ P$ Miki$ T$ Wong$ WT$ Di$Fiore$ PP$ 1993$ Eps8$ a$ substrate$ for$ the$ epidermal$ growth$ factor$receptor$kinase$enhances$EGFBdependent$mitogenic$signals$EMBO$J$12$3799ndash3808$

Fontanesi$ L$ Calograve$ DG$ Galimberti$ G$ Negrini$ R$ Marino$ R$ Nardone$ A$AjmoneBMarsan$ P$ Russo$ V$ 2014$ A$ candidate$ gene$ association$ study$

71

$

for$ nine$ economically$ important$ traits$ in$ Italian$ Holstein$ cattle$ Anim$

Genet$nandashna$doi101111age12164$

Fowler$KJ$Walker$F$Alexander$W$Hibbs$ML$Nice$EC$Bohmer$RM$Mann$

GB$ Thumwood$ C$ Maglitto$ R$ Danks$ JA$ 1995$ A$ mutation$ in$ the$

epidermal$growth$factor$receptor$in$wavedB2$mice$has$a$profound$effect$

on$ receptor$ biochemistry$ that$ results$ in$ impaired$ lactation$ Proc$ Natl$

Acad$Sci$92$1465ndash1469$doi101073pnas9251465$

Gianola$ D$ Hospital$ F$ Verrier$ E$ 2013$ Contribution$ of$ an$ additive$ locus$ to$

genetic$variance$when$inheritance$is$multiBfactorial$with$implications$on$

interpretation$ of$ GWAS$ Theor$ Appl$ Genet$ 126$ 1457ndash1472$

doi101007s00122B013B2064B2$

Gillette$ M$ Bray$ K$ Blumenthaler$ A$ VargoBGogola$ T$ 2013$ P190B$ RhoGAP$

Overexpression$ in$ the$ Developing$Mammary$ Epithelium$ Induces$ TGFβB

dependent$ Fibroblast$ Activation$ PLoS$ ONE$ 8$ e65105$

doi101371journalpone0065105$

Grisart$B$Coppieters$W$Farnir$F$Karim$L$Ford$C$Berzi$P$Cambisano$N$

Mni$ M$ Reid$ S$ Simon$ P$ Spelman$ R$ Georges$ M$ Snell$ R$ 2002$

Positional$Candidate$Cloning$of$a$QTL$ in$Dairy$Cattle$ Identification$of$a$

Missense$Mutation$ in$the$Bovine$DGAT1$Gene$with$Major$Effect$on$Milk$

Yield$ and$ Composition$ Genome$ Res$ 12$ 222ndash231$

doi101101gr224202$

Hubbard$NE$Chen$QJ$Sickafoose$LK$Wood$MB$Gregg$ JP$Abrahamsson$

NM$ Engelberg$ JA$ Walls$ JE$ Borowsky$ AD$ 2013$ Transgenic$

Mammary$Epithelial$Osteopontin$(Spp1)$Expression$Induces$Proliferation$

and$ Alveologenesis$ Genes$ Cancer$ 4$ 201ndash212$

doi1011771947601913496813$

Hu$ ZBL$ Park$ CA$ Wu$ XBL$ Reecy$ JM$ 2013$ Animal$ QTLdb$ an$ improved$

database$tool$for$livestock$animal$QTLassociation$data$dissemination$in$

the$ postBgenome$ era$ Nucleic$ Acids$ Res$ 41$ D871ndashD879$

doi101093nargks1150$

Jeon$et$al$ B$2008$ B$ProtoBoncogene$FBIB1$ (PokemonZBTB7A)$Represses$Trpdf$

nd$

72

$

Jiang$ L$ Liu$ J$ Sun$ D$ Ma$ P$ Ding$ X$ Yu$ Y$ Zhang$ Q$ 2010$ Genome$Wide$

Association$ Studies$ for$ Milk$ Production$ Traits$ in$ Chinese$ Holstein$

Population$PLoS$ONE$5$e13661$doi101371journalpone0013661$

Kong$ PBZ$ Yang$ F$ Li$ L$ Li$ XBQ$ Feng$ YBM$ 2013$Decreased$FOXF2$mRNA$

Expression$ Indicates$ EarlyBOnset$ Metastasis$ and$ Poor$ Prognosis$ for$

Breast$ Cancer$ Patients$ with$ Histological$ Grade$ II$ Tumor$ PLoS$ ONE$ 8$

e61591$doi101371journalpone0061591$

Lee$SBU$Maeda$T$2012$POKZBTB$proteins$an$emerging$ family$of$proteins$

that$ regulate$ lymphoid$ development$ and$ function$ Immunol$ Rev$ 247$

107ndash119$

Lemay$ DG$ Lynn$ DJ$ Martin$ WF$ Neville$ MC$ Casey$ TM$ Rincon$ G$

Kriventseva$ EV$ Barris$ WC$ Hinrichs$ AS$ Molenaar$ AJ$ 2009$ The$

bovine$lactation$genome$ insights$ into$the$evolution$of$mammalian$milk$

Genome$Biol$10$R43$

Liu$JZ$Mcrae$AF$Nyholt$DR$Medland$SE$Wray$NR$Brown$KM$2010$A$

versatile$ geneBbased$ test$ for$ genomeBwide$ association$ studies$ Am$ J$

Hum$Genet$87$139$

Liu$S$Dontu$G$Mantle$ID$Patel$S$Ahn$N$Jackson$KW$Suri$P$Wicha$MS$

2006$Hedgehog$Signaling$and$BmiB1$Regulate$SelfBrenewal$of$Normal$and$

Malignant$ Human$ Mammary$ Stem$ Cells$ Cancer$ Res$ 66$ 6063ndash6071$

doi1011580008B5472CANB06B0054$

Lu$ J$ Holmgren$ A$ 2014$ The$ thioredoxin$ superfamily$ in$ oxidative$ protein$

folding$ Antioxid$ Redox$ Signal$ 140202091634000$

doi101089ars20145849$

Luna$ A$ Nicodemus$ KK$ 2007$ snpplotter$ an$ RBbased$ SNPhaplotype$

association$and$linkage$disequilibrium$plotting$package$Bioinforma$Oxf$

Engl$23$774ndash776$doi101093bioinformaticsbtl657$

McCarthy$ MI$ Abecasis$ GR$ Cardon$ LR$ Goldstein$ DB$ Little$ J$ Ioannidis$

JPA$ Hirschhorn$ JN$ 2008$ GenomeBwide$ association$ studies$ for$

complex$traits$consensus$uncertainty$and$challenges$Nat$Rev$Genet$9$

356ndash369$doi101038nrg2344$

Meredith$ BK$ Kearney$ FJ$ Finlay$ EK$ Bradley$ DG$ Fahey$ AG$ Berry$ DP$

Lynn$ DJ$ 2012$ GenomeBwide$ associations$ for$ milk$ production$ and$

73

$

somatic$cell$score$in$HolsteinBFriesian$cattle$in$Ireland$BMC$Genet$13$21$doi1011861471B2156B13B21$

Minozzi$ G$Nicolazzi$ EL$ Stella$ A$ Biffani$ S$Negrini$ R$ Lazzari$ B$ AjmoneBMarsan$ P$Williams$ JL$ 2013$ Genome$Wide$ Analysis$ of$ Fertility$ and$Production$ Traits$ in$ Italian$ Holstein$ Cattle$ PLoS$ ONE$ 8$ e80219$doi101371journalpone0080219$

Nicolazzi$EL$Picciolini$M$Strozzi$F$Schnabel$RD$Lawley$C$Pirani$A$Brew$F$ Stella$ A$ 2014$ SNPchiMp$ a$ database$ to$ disentangle$ the$ SNPchip$jungle$ in$ bovine$ livestock$ BMC$ Genomics$ 15$ 123$ doi1011861471B2164B15B123$

Purcell$ S$ Neale$ B$ ToddBBrown$ K$ Thomas$ L$ Ferreira$ MAR$ Bender$ D$Maller$J$Sklar$P$de$Bakker$PIW$Daly$MJ$Sham$PC$2007$PLINK$a$tool$ set$ for$ wholeBgenome$ association$ and$ populationBbased$ linkage$analyses$Am$J$Hum$Genet$81$559ndash575$doi101086519795$

Qiu$ S$ Lai$ L$ 2013$ Antibacterial$ properties$ of$ recombinant$ human$ nonBpancreatic$secretory$phospholipase$A2$Biochem$Biophys$Res$Commun$441$453ndash456$doi101016jbbrc201310092$

Quinlan$AR$Hall$IM$2010$BEDTools$a$flexible$suite$of$utilities$for$comparing$genomic$ features$ Bioinformatics$ 26$ 841ndash842$doi101093bioinformaticsbtq033$

RamiacuterezBTorres$ A$ BarceloacuteBBatllori$ S$ MartiacutenezBBeamonte$ R$ Navarro$ MA$Surra$ JC$ Arnal$ C$ Guilleacuten$N$ Aciacuten$ S$ Osada$ J$ 2012$ Proteomics$ and$gene$ expression$ analyses$ of$ squaleneBsupplemented$ mice$ identify$microsomal$thioredoxin$domainBcontaining$protein$5$changes$associated$with$ hepatic$ steatosis$ J$ Proteomics$ 77$ 27ndash39$doi101016jjprot201207001$

Raven$ LBA$ Cocks$ BG$ Pryce$ JE$ Cottrell$ JJ$ Hayes$ BJ$ 2013$ Genes$ of$ the$RNASE5$pathway$ contain$ SNP$ associated$with$milk$ production$ traits$ in$dairy$cattle$Genet$Sel$Evol$45$25$doi1011861297B9686B45B25$

Scotti$E$Fontanesi$L$Schiavini$F$La$Mattina$V$Bagnato$A$Russo$V$2010$DGAT1$ pK232A$ polymorphism$ in$ dairy$ and$ dual$ purpose$ Italian$ cattle$breeds$Ital$J$Anim$Sci$9$doi104081ijas2010e16$

74

$

Stefanon$ B$ Colitti$ M$ Gabai$ G$ Knight$ CH$ Wilde$ CJ$ 2002$ Mammary$

apoptosis$and$lactation$persistency$in$dairy$animals$J$Dairy$Res$69$37ndash

52$

Stegh$ AH$ DePinho$ RA$ 2011$ Beyond$ effector$ caspase$ inhibition$ Bcl2L12$

neutralizes$ p53$ signaling$ in$ glioblastoma$ Cell$ Cycle$ 10$ 33ndash38$

doi104161cc10114365$

Taberlet$ P$ Valentini$ A$ Rezaei$ HR$ Naderi$ S$ Pompanon$ F$ Negrini$ R$

AjmoneBMarsan$ P$ 2008$ Are$ cattle$ sheep$ and$ goats$ endangered$

species$Mol$Ecol$17$275ndash284$doi101111j1365B294X200703475x$

Team$RC$ 2014$ R$ A$ Language$ and$Environment$ for$ Statistical$ Computing$ R$

Foundation$for$Statistical$Computing$Vienna$Austria$

Thomadaki$H$ Talieri$M$ Scorilas$ A$ 2007$ Prognostic$ value$ of$ the$ apoptosis$

related$genes$BCL2$and$BCL2L12$in$breast$cancer$Cancer$Lett$247$48ndash

55$doi101016jcanlet200603016$

Valdehita$ A$ Bajo$ AM$ FernaacutendezBMartiacutenez$ AB$ Arenas$ MI$ Vacas$ E$

Valenzuela$ P$ RuiacutezBVillaespesa$ A$ Prieto$ JC$ Carmena$ MJ$ 2010$

Nuclear$ localization$ of$ vasoactive$ intestinal$ peptide$ (VIP)$ receptors$ in$

human$ breast$ cancer$ Peptides$ 31$ 2035ndash2045$

doi101016jpeptides201007024$

Wang$T$Liu$NS$Seet$LBF$Hong$W$2010$The$Emerging$Role$of$VHS$DomainB

Containing$Tom1$Tom1L1$and$Tom1L2$in$Membrane$Trafficking$Traffic$

11$1119ndash1128$doi101111j1600B0854201001098x$

Wang$ X$Wurmser$ C$ Pausch$ H$ Jung$ S$ Reinhardt$ F$ Tetens$ J$ Thaller$ G$

Fries$R$2012$Identification$and$Dissection$of$Four$Major$QTL$Affecting$

Milk$Fat$Content$in$the$German$HolsteinBFriesian$Population$PLoS$ONE$7$

e40711$doi101371journalpone0040711$

Xu$S$2003$Theoretical$Basis$of$the$Beavis$Effect$Genetics$165$2259ndash2268$

Zimin$AV$Delcher$AL$Florea$L$Kelley$DR$Schatz$MC$Puiu$D$Hanrahan$

F$ Pertea$ G$ Tassell$ CPV$ Sonstegard$ TS$ Marccedilais$ G$ Roberts$ M$

Subramanian$ P$ Yorke$ JA$ Salzberg$ SL$ 2009$ A$ wholeBgenome$

assembly$ of$ the$ domestic$ cow$ Bos$ taurus$ Genome$ Biol$ 10$ R42$

doi101186gbB2009B10B4Br42$

75

$

Zu$X$Ma$J$Liu$H$Liu$F$Tan$C$Yu$L$Wang$J$Xie$Z$Cao$D$Jiang$Y$2011$ProBoncogene$ Pokemon$ promotes$ breast$ cancer$ progression$ by$upregulating$ survivin$ expression$ Breast$ Cancer$ Res$ 13$ R26$doi101186bcr2843$ $

76

$

Supplementaryfiles$

You$can$find$Chapter$͵$supplementary$files$at$

- DocTa$(Doctoral$Thesis$Archive)$(httptesionlineunicattit)$

- Google$Drive$(httptinyurlcomPhDMMChapter03)$or$scan$the$QR$code$

$

$

SupplementaryFigureS1Multidimensionalscaling

Spatial$distribution$of$genetic$structure$in$the$analyzed$breeds$$

$

Supplementary$FigureS2GWASplot

Manhattan$plots$and$QQBplots$respectively$for$Fat$percentage$Milk$yield$and$

Protein$percentage$in$the$Italian$Brown$

$

Supplementary$FigureS3GWASplot

Manhattan$plots$and$QQBplots$respectively$for$Fat$percentage$Milk$yield$and$

Protein$percentage$in$the$Italian$Holstein$

$

Supplementary$FigureS4GWASplot

Manhattan$plots$and$QQBplots$respectively$for$Fat$percentage$Milk$yield$and$

Protein$percentage$in$the$HolsteinDGAT$

$

Supplementary$FigureS5GWASplot

Manhattan$plots$and$QQBplots$respectively$for$Fat$percentage$Milk$yield$and$

Protein$percentage$in$the$Italian$Simmental$

77

$

$Supplementary$FigureS6LDdecayGenomeBwide$LD$decay$for$the$three$breeds$(Holstein$Simmental$Brown)$$Supplementary$FigureS7DescriptiveplotsDescriptive$statistics$for$the$Italian$Brown$Italian$Holstein$and$Italian$Simmental$$Supplementary$FigureS8DGAT1LDFocus$on$the$LD$of$the$DGAT1$region$in$all$the$datasets$(Brown$Holstein$HolsteinDGAT$Simmental)$$Supplementary$TableS1SNPfactNumber$of$subjects$and$SNPs$after$QC$$Supplementary$TableS2GeneBasedAssociationresultsFull$results$of$the$significant$genes$associated$with$the$traits$in$all$the$datasets$$ Sheet$1$Brown$ndash$fat$percentage$$ Sheet$2$Brown$ndash$milk$yield$$ Sheet$3$Brown$ndash$protein$percentage$$ Sheet$4$Holstein$ndash$fat$percentage$$ Sheet$5$Holstein$ndash$milk$yield$$ Sheet$6$Holstein$ndash$protein$percentage$$ Sheet$7$HolsteinDGAT$ndash$fat$percentage$$ Sheet$8$HolsteinDGAT$ndash$milk$yield$$ Sheet$9$HolsteinDGAT$ndash$protein$percentage$$ Sheet$10$Simmental$ndash$fat$percentage$$ Sheet$11$Simmental$ndash$milk$yield$$ Sheet$12$Simmental$ndash$protein$percentage$$Supplementary$TableS3SingleSNPAssociationresultsFull$results$of$the$SNP$that$overcome$the$suggestive$threshold$$$ Sheet$1$Brown$ndash$fat$percentage$

78

$

$ Sheet$2$Brown$ndash$milk$yield$$ Sheet$3$Brown$ndash$protein$percentage$$ Sheet$4$Holstein$ndash$fat$percentage$$ Sheet$5$Holstein$ndash$milk$yield$$ Sheet$6$Holstein$ndash$protein$percentage$$ Sheet$7$HolsteinDGAT$ndash$fat$percentage$$ Sheet$8$HolsteinDGAT$ndash$milk$yield$$ Sheet$9$HolsteinDGAT$ndash$protein$percentage$$ Sheet$10$Simmental$ndash$fat$percentage$$ Sheet$11$Simmental$ndash$milk$yield$$ Sheet$12$Simmental$ndash$protein$percentage$$Supplementary$TableS4GenesandtraitsGeneWise$pBvalues$of$the$discussed$genes$in$all$the$analysis$In$grey$are$evidenced$the$pBvalues$that$overcome$the$significance$threshold$Genes$are$alphabetically$listed$$Supplementary$TableS5GenesinQTLsGenes$from$the$geneBbased$association$mapping$into$known$QTLs$$$$$

79

CHAPTER(4(

MUGBAS(a(species(free(gene(based(association(suite(

S(Capomaccio1(M(Milanesi1(L(Bomba1(et(al(1IstitutodiZootecnicaUniversitagraveCattolicadelSacroCuoreviaEmiliaParmense84Piacenza29122ItalyTheseauthorscontributedequallytothiswork

____________________ApplicationnotesubmittedtoBioinformatics

Abstract(Theassociationstudiesbetweengenomewidemarkersandphenotypes(GWAS)

arenowwidelyusedinmodelandnonmodelspeciesHowevertoolsthatmerge

singlemarkerresultsandtheirprobabilityofbeingassociatedtofunctionalunits

(typicallygenes)areseldomdevelopedforspeciesotherthanhuman

Herewe presentMUGBAS a suite that calculates a generegionbased pvalue

fromagivenannotationsinglemarkerGWAresultandgenotypedatawiththe

objective to ease postGWAS analysis The software is species and annotation

independentfasthighlyparallelizedandreadyforhighdensitymarkerstudies

Availabilityandimplementationhttpsbitbucketorgcapemastermugbas

Introduction(With the availability of highdensity SNP panels Genome Wide Association

Studies (GWAS) have become the gold standard for dissecting the biology of

complextraits(McCarthyetal2008)Thisistrueforhumansandotherspecies

TheunderlyingideaofGWASissimplefindmarkertraitassociationsexploiting

the linkage disequilibrium (LD) that exists between the causative mutation ndash

whichweignorendashandoneormoremarkersofknownchromosomalposition

Whileanumberofwellestablishedapproachescanbeeasilyusedinastandard

GWAS (Aulchenko et al 2007 Kang et al 2010 Purcell et al 2007) to find

marker(s) associated to a particular phenotype several issues have to be

accountedforduringtheupanddownstreamanalyses

Firstly genome wide studies have to take into account multiple testing If

stringent statistical thresholds are applied a proportion ofmoderate or small

effectsactuallyassociatedwiththetraitwillbedisregardedThereforereducing

the number of tests while maintaining all the information provided by high

densitypanelswouldbeaclearadvantageInadditiononceasignificantsignalis

detected different strategiesmaybeused to tag the candidate causative gene

Sometimesthegeneclosertothepeaksignalisselectedinothercasesallgenes

withinaflankingregionofanarbitrarysizeofthesignificantSNP(s)areincluded

82

infurtheranalysesThesechoicesmayendupinbiasresultseitherretaininga

largenumberofnottrulyassociatedgenesordisregardingthecorrectones

Somemethodshaverecentlybeenproposedtoovercometheselimitationsusing

aprioriknowledgeastheldquocandidatepathwayrdquoanalysis(Ravenetal2013)and

the ldquogenebasedrdquo association strategies (Akula et al 2011 Cantor et al 2010

Liuetal2010)TheseapproachesrestrictGWAstudiesonlytoknowngenesor

genesubsetsresizingthemultipletestingproblemandincreasingthepowerof

theanalyses

Usually bioinformatics pipelines are developed for the analysis of human and

modelorganismsandarenotadaptedtononmodelorganismThisisthecaseof

the VEGAS software (Liu et al 2010 Yang et al 2014) developed only for

humansandonlyforgenebasedanalyses

Here we introduce MUGBAS [MUlti species GeneBased Association Suite] a

softwarebasedonVEGASthatusesGWASdataandagivenannotation(typically

genesbutanyregionmaybesuitable)toestimategenebasedandregionbased

associationpvaluesThesuite is speciesandannotation free ready toanalyze

highdensitydatafromanyexperimentaldesigns

Methods(MUGBASisasuiteofscriptsbuiltinBashRPythonandPerlThesuiterequires

several prerequisites that can be easily fulfilled in a LinuxUnixMac

environmentwith a few command lines Further information on how tomeet

theserequirementsaredetailedintheonlinerepository

MUGBAS suite can be invoked launching the Bash wrapper that checks

dependenciesandstoresuserchoices

Required input files are MAPPED files in PLINK format with genotype data

from(atleast200individualsarerecommendedforaproperLDcalculation)an

annotation file inBED format (chromosome startingposition endingposition

nameandusercustomfields)ofthedesiredgenomefeatureandtheGWASresult

file with mandatory headers ldquoSNPNAMErdquo and ldquoPVALUErdquo Multiple association

resultscanbetestedatthesametime ifprovidedinseparatefiles inthesame

directoryAtrialdatasetisdownloadablealongwiththesoftwarereleasePlease

83

notethatforsimplicityweprovideageneannotationbutanyBEDfilewithany

desiredfeaturecanbeused

The program starts the analysis assigning SNPs to any feature of the given

annotationusingbedtools(QuinlanandHall2010)Thisstepiscustomizableby

user defined boundaries (up and downstream) in order to catch regulatory

regionsthatcouldbeinhighLDwiththecurrentgeneregion

Inordertospeeduptheanalysistheprogramsubsamples200individualsfrom

theinputdatasetWethenimplementedtwolevelsofparallelizationoneforthe

LDandonefortheldquogenewiserdquoorldquoregionwiserdquopvaluecalculationThelatteris

particularlyusefulanalyzingmorethanoneGWAresultsfromthesamedataset

BrieflyMUGBASfirstsplitstheLDcalculationinncores(asdefinedbytheuser)

using the foreach doParallel packages (Analytics andWeston 2014a 2014b)

and then distributes traits across cores using GNU Parallel and foreach

doParallel

SinceLDcalculationisaslowprocessVEGASbenefitsofHapMapphase2datato

speed up the processwhile preserving the option for a user defined (human)

population using PLINK In order to expand these analysis to any species LD

calculationinMUGBASisimplementedwithther2fast()functionfromGenABEL

package(Aulchenkoetal2007)whilethegenewisepvalueiscalculatedasin

theVEGASapproachBrieflythegenewiseteststatisticisthesumofthenupper

tailchisquaredvaluesofaSNPsubsetwithonedegreeoffreedom(wheren is

thenumberofSNPmappinginthegivengene)Withperfectlinkageequilibrium

the genewise pvalue would be the onetailed pvalue of a chisquare

distributionwithndegreesof freedomOtherwisemorelikelythedistribution

of the pvalues has to be evaluated (with simulations) andweighted (with LD

values)(Liuetal2010)

Once every generegion has its ldquogenewiserdquo pvalue a FDR qvalue (False

Discovery Rate) statistics is calculated with the base R function padjust()

Results are stored and enrichedwith ldquomanhattanrdquo and ldquoundergroundrdquo plots of

theldquogenewiserdquopvalueandtheqvaluerespectively

84

Results(MUGBAS output is generegionoriented with useful information about the

location of the analyzed feature the ldquoBest SNPrdquo within that feature with its

genomiccoordinatesandtheoriginalpvaluethegeneorregionwisestatistics

(pvalue number of simulations and qvalue) the custom annotation of that

particularentryandthepositionoftheSNPwithrespecttothefeatureandthe

decidedboundariesAll this informationareuseful to track theeffectofhighly

significant markers that could map into multiple region of interest and

effectively pinpoint the most probable location to focus on with data mining

processes

A typical analysis performed with a highdensity panel (gt600K markers)

distributed in 10 midperformance cores (Intel Xeon X5675 307 GHz) on

threetraitssimultaneouslytakesaboutfourhours

Discussion(Generegionbased methods are now widely used and accepted in genomic

research However non modelspecies suffer for the lack of availability of

specifictoolstoenhancediscoveryfromgenomewidedataMUGBASprovidesa

robust and accepted statistics to researchers dealing with species other than

humantoeasilyjumpfromassociatedmarkerstothedesiredfunctionalunits

Acknowledgements(((WewouldliketothankDrJimmyLiuforhisadvicesinbuildingthesuite

Reference(Akula N Baranova A Seto D Solka J NallsMA Singleton A Ferrucci L

TanakaTBandinelli SChoYSKimYJLee JYHanBGBipolar

Disorder Genome Study (BiGS) Consortium The Wellcome Trust Case

85

ControlConsortiumMcMahonFJ2011ANetworkBasedApproachtoPrioritize Results from GenomeWide Association Studies PLoS ONE 6e24220doi101371journalpone0024220

Analytics R Weston S 2014a doParallel Foreach parallel adaptor for theparallelpackage

AnalyticsRWestonS2014bforeachForeachloopingconstructforRAulchenkoYSdeKoningDJHaleyC2007GenomewideRapidAssociation

Using Mixed Model and Regression A Fast and Simple Method ForGenomewidePedigreeBasedQuantitativeTraitLociAssociationAnalysisGenetics177577ndash585doi101534genetics107075614

Cantor RM Lange K Sinsheimer JS 2010 Prioritizing GWAS Results AReview of Statistical Methods and Recommendations for TheirApplicationAmJHumGenet866ndash22doi101016jajhg200911017

KangHMSulJHServiceSKZaitlenNAKongSFreimerNBSabattiCEskin E 2010 Variance component model to account for samplestructure in genomewide association studies Nat Genet 42 348ndash354doi101038ng548

LiuJZMcraeAFNyholtDRMedlandSEWrayNRBrownKM2010Aversatile genebased test for genomewide association studies Am JHumGenet87139

McCarthy MI Abecasis GR Cardon LR Goldstein DB Little J IoannidisJPA Hirschhorn JN 2008 Genomewide association studies forcomplextraitsconsensusuncertaintyandchallengesNatRevGenet9356ndash369doi101038nrg2344

Purcell S Neale B ToddBrown K Thomas L Ferreira MAR Bender DMallerJSklarPdeBakkerPIWDalyMJShamPC2007PLINKatool set for wholegenome association and populationbased linkageanalysesAmJHumGenet81559ndash575doi101086519795

QuinlanARHallIM2010BEDToolsaflexiblesuiteofutilitiesforcomparinggenomic features Bioinformatics 26 841ndash842doi101093bioinformaticsbtq033

86

Raven LA Cocks BG Pryce JE Cottrell JJ Hayes BJ 2013 Genes of the

RNASE5pathway contain SNP associatedwithmilk production traits in

dairycattleGenetSelEvol4525doi101186129796864525

87

CHAPTER5

Imputationaccuracyisrobusttocattlereferencegenomeupdates

MMilanesi1etal1IstitutodiZootecnicaUniversitagraveCattolicadelSacroCuoreviaEmiliaParmense84Piacenza29122Italy

____________________ShortCommunicationacceptedinAnimalGenetics

SummaryGenotype imputation is routinely applied in a large number of cattle breeds Imputation

hasbecomeaneedduetothelargenumberofSNParrayswithvariabledensity(currently

from2900to777962SNPs)Althoughmanyauthorshavestudiedtheeffectofdifferent

statisticalmethodsonimputationaccuracytheimpactofa(likely)changeinthereference

genomeassemblyonimputationfromlowertohigherdensityhasnotbeendeterminedso

far In this work 1021 Italian Simmental SNP genotypes were reJmapped on the three

mostrecentreferencegenomeassembliesFour imputationmethodswereusedtoassess

theimpactofanupdateinthereferencegenomeAsexpectedthefourmethodsbehaved

differentlywith largedifferences in termsof accuracyUpdatingSNPcoordinateson the

three tested cattle reference genome assemblies determined only a slight variation on

imputationresultswithinmethod

KeywordsImputationreferencegenomeassemblySNPbovine

MaintextFromthepublicationof the firstbovinesinglenucleotidepolymorphism(SNP)array the

wholegenomicscenarioofthisspeciesevolvedrapidlyNowadaysseveralSNPpanelshave

been designed with a range of densities mapping on different reference genome

assemblies and produced by different manufacturers When running analyses with SNP

arrays of different densities or from different commercial companies (eg Illumina and

Affymetrix) an imputation step is often required The impact of different statistical

methods on imputation accuracy has already been assessed (Pei et al 2008) (Marchini

andHowie2010)(Maetal2013)Surprisinglyuptonowtherearenotavailablestudies

investigating the impact of an update on the reference genome assembly on imputation

accuracy This assessment would be of fundamental importance considering that reJ

mappingSNPsdeterminesarearrangementofthehaplotypestructureofthegenotypes(in

somecasesSNPsarereJmappedtoadifferentchromosome)andthatthebovinereference

assembly is likely toberepeatedlyupdatedover timeThisworkwasaimedatassessing

the effect of different reference genome assemblies in cattle As test casewe compared

four different imputation methods from low (Illumina BovineLDv10 Beadchip) to

90

mediumhigh (Illumina BovineSNP50v2 Beadchip) density in the Italian Simmental

populationForeachmethodweassessedthe impacton imputationaccuracyofmapping

SNPs on the BTAU42 BTAU46 (wwwhgscbcmeduotherJmammalsbovineJgenomeJ

project)orUMD31(Ziminetal2009)cattlereferencegenomeassemblies

Atotalof900and121SimmentalbullsweregenotypedwiththeIlluminaBovineSNP50v1

and v2 Beadchips (Illumina Inc San Diego CA) respectively Illumina BovineSNP50v2

Beadchip (hereinafter named 54k) the official reference chip for the Italian Simmental

breedersassociation(ANAPRI)wasconsideredasreferenceSNPpanelTheSNPchiMpv1

tool(Nicolazzietal2014)wasusedtoobtainSNPcoordinatesonBTAU42UMD31and

BTAU46genomeassembliesForeachassemblySNPsnotcontainedinthe54kunmapped

orinthesexualchromosomeswerediscardedThefinaldatasetcontained5118952886

and51290SNPsinBTAU42UMD31andBTAU46respectively

FourimputationmethodswereconsideredPedImpute(Nicolazzietal2013)FindHapv2

(VanRadenetal2011)FImpute(Sargolzaeietal2014)andBeaglev332(Browningand

Browning2007)Thefirstthreeusepedigreeandpopulation(ie linkagedisequilibrium

LD) informationwhereas the fourth uses only LD information Performance of the four

methods testedacrossgenomicassemblieswas comparedusingwrongly imputedalleles

(ldquoerrorsrdquo)andallelicsquaredcorrelation(ldquoallelicJR2rdquo)statisticsAllelicJR2wasdefined

asthesquaredcorrelationbetweenimputedandtrue(eggenotyped)allelesmultipliedby

100 as in Browning amp Browning (2009) The 878 oldest sires were used as training

population whereas the 143 youngest bulls were used as test population SNPs not in

commonwith the lowdensity panelwere set tomissing in the test population The test

population was split in four scenarios individuals with sire and maternal grandsire

genotypedinthereferencepopulation(scenarioAJ84animals)individualswithonlythe

sire genotyped (scenario B J 34 animals) individuals with only the maternal grandsire

genotyped(scenarioCJ13animals)ornoneofcloserelativesgenotyped(scenarioDJ12

animals)Thepopulationusedwashighlystructuredwithnearlyhalfofthebullspresent

in the dataset (490) sired by 115 genotyped sires In addition 35 out of these 115

genotypedsiresweresiredby22bulls(eggrandsires)alsopresentinthedataset

DifferencesacrossassembliesarenothomogeneousFromBTAU42toBTAU4646SNPs

weremapped to different chromosomes and the average difference in physical position

withinchromosomeswasnearly025MbpHowevertherewasnosubstantialdifferencein

the order of markers In fact not considering unmapped SNPs on either of the

aforementionedassembliesadifferenceinthesequentialorderwasobservedforonly134

91

single (ie spread throughout the genome) SNPs On the contrary from UMD31 to

BTAU46 222 SNPsweremapped to different chromosomes and amuch larger average

differenceinSNPsphysicalpositionwasobserved(iemorethan2Mbp)Alsothenumber

ofSNPswithdifferentsequentialorderwasmuchlarger(ie1893SNPs)Howevermore

than50oftherearrangementsinvolvedblocksfrom3tonearly100markersAsaresult

slight differences in overall imputation accuracy from BTA46 to UMD31 rather than

between the two BTAU assemblies were expected across methods On the contrary if

rearrangements are due to issues in the assembly of scaffolds theymight have a direct

(negative) impact on local imputation performances In fact a preliminary analysis on

BTA1consideringlocalerroracrossassembliesandmethodssuggestatleasttwosuch

SNPsclustersat44and138MbponUMD31(FiguresS1S2andS3)

Considering allmethods and scenarios amaximum average variation of 02 inerrors

(Table1)andof09inallelicJR2(Table2)wasobservedacrossreferenceassembliesThe

standarddeviation(SD)oftheimputationaccuracywasstableacrossreferenceassemblies

withinmethodsOnthecontraryalargervariabilitywasobservedacrossmethods

PedImpute and FindHap showed a relatively large average allelicJR2 variability across

scenarios evaluated over the same reference assembly Only PedImpute showed a large

variation in SD across scenarios in errors and allelicJR2 with high variability on

scenarios C and D showing a low performance of thismethodwhen there are no close

relatives(egsiredam)genotypedinthereferencedatasetFImputeinsteadachievedthe

highestaccuracyacrossallreferencegenomesandscenariossurprisinglyobtainingitsbest

resultsJerrorsof14(06SD)andallelicJR2of955(26SD)JinscenarioCwhereonly

onerelativelycloserelativeisgenotyped(maternalgrandJsire)FImputeappeartobeable

todetectshorterhaplotypesfromdistantrelativesmoreaccuratelythananyothermethod

tested in this study thus it is lessdependenton theavailabilityofpedigree information

thantheothertwopedigreeJbasedmethods(Sargolzaeietal2010)

The large variability across methods within assembly was expected although the

differences obtained among PedImpute FindHap and Beagle were larger in this study

comparedtoasimilarstudyinItalianHolstein(Nicolazzietal2013)Thiscoulddependon

a number of factors First the sample size of the test population is small (especially for

groups C and D where less than 15 individuals were present) Second the interaction

between population structure and imputationmethodsmay have a role PedImpute and

FindHap rely much more heavily on pedigree information than FImpute and Beagle

assumesall individualsareunrelatedHowever the lattertwomethods indirectlybenefit

92

greatly of from highly structured populations with large number of relatives ndash close or

distantndashpresent inthereferencepopulation(Sargolzaeietal2014)Anotherinteresting

factor that influences the imputation accuracy may be the persistence of LD at long

distances this is much lower in the Italian Simmental than in the Italian Holstein

population(FigureS4)Informationonrelatives(eitherpedigreeinformationorgenotyped

relativesinthereferencepopulation)isexpectedtoplayabiggerroleifLDisconservedat

shorter distances This hypothesis is supported by the fact that although there is high

variability acrossmethods better performances of the pedigreeJbasedmethods are still

observedinscenariosAandB(andCforFImpute)

Theslightdifferencesinimputationaccuracyobservedacrossreferenceassembliesmaybe

explained by the aforementioned small variation in the SNP blocks across reference

assemblies irrespectively of the rearrangements of single SNPs Considering that SNP

blocksaremostlymaintainedweexpectthatbreedswithlargerLDpersistenceatlonger

distances (ie Holstein Brown) will obtain even better imputation accuracy across

assembliescomparedtowhatreportedinthisstudyinItalianSimmentalHoweverastill

relativelyunexploredaspectof imputationperformance is thedistributionof imputation

errors across the genomeNot considering genotyping errors (which are expected to be

randomly distributed in the genome) a cluster of consecutive markers with high

percentage of imputation error across methods are most probably identifying missJ

assembledregions in thegenome(FiguresS1S2andS3)Althoughthe impactonoverall

imputation accuracy is limited as results in this study show these errorsmight have a

large impact on any analyses relying on specific genomic coordinates (eg genomeJwide

associationstudies)Morecomprehensiveresultsincludingannotationstudiesandwhole

genomesequencingdataarerequiredtoconfirmandinvestigatetheseobservations

Toconcludeonlyasmalleffectofupdatingthereferencegenomeassemblyonimputation

accuracywas observed in Italian Simmental On the other hand as already reported by

other studies the variability of imputation accuracy estimates is much larger across

imputation methods The choice of the most suitable imputation method based on the

breedgeneticbackgrounditscharacteristicsandontheavailabledataisthereforecritical

AcknowledgementsThe research leading to these results has received funding from the European Unions

Seventh Framework Programme (FP72007J2013) under grant agreement ndeg 289592 ndash

93

Gene2FarmPartofthegenotypicdatausedinthisstudywasprovidedbyprojectldquoSelMolrdquo

fundedbytheItalianMinistryofAgriculture(MiPAAF)MMwassupportedbytheDoctoral

SchoolontheAgroJFoodSystem(Agrisystem)of theUniversitagraveCattolicadelSacroCuore

(Italy) FB was supported by the Marie Curie European Reintegration Grant

ldquoNEUTRADAPTrdquo(FP7JPEOPLEJ2009JRG)

ReferencesBrowningSRBrowningBL2007RapidandAccurateHaplotypePhasingandMissingJ

Data Inference for WholeJGenome Association Studies By Use of Localized

HaplotypeClusteringAmJHumGenet811084ndash1097

MaPBroslashndumRFZhangQLundMSSuG2013Comparisonofdifferentmethods

forimputinggenomeJwidemarkergenotypesinSwedishandFinnishRedCattleJ

DairySci964666ndash4677doi103168jds2012J6316

Marchini J Howie B 2010 Genotype imputation for genomeJwide association studies

NatRevGenet11499ndash511doi101038nrg2796

NicolazziELBiffaniSJansenG2013ShortcommunicationImputinggenotypesusing

PedImputefastalgorithmcombiningpedigreeandpopulationinformationJournal

ofDairyScience962649ndash2653doi103168jds2012J6062

NicolazziELPiccioliniMStrozziFSchnabelRDLawleyCPiraniABrewFStella

A 2014 SNPchiMp a database to disentangle the SNPchip jungle in bovine

livestockBMCGenomics15123doi1011861471J2164J15J123

PeiYLiJZhangLPapasianCJDengH2008Analysesandcomparisonofaccuracyof

differentgenotypeimputationmethodsPLoSONE3551

VanRadenPMOrsquoConnellJRWiggansGRWeigelKA2011Genomicevaluationswith

manymoregenotypesGenetSelEvol43

94

Tableamp1ampIm

putationampaccuracyampacrossampassem

bliesampforampPedImputeampFindH

apampFImputeampandampBeagleamppercentageampofampwronglyampim

putedamp

allelesamp

PedImputeamp

FindHapamp

FImputeamp

Beagleamp

BTAU

amp42amp

UMDamp31amp

BTAU

amp46amp

BTAU

amp42amp

UMDamp31amp

BTAU

amp46amp

BTAU

amp42amp

UMDamp31amp

BTAU

amp46amp

BTAU

amp42amp

UMDamp31amp

BTAU

amp46amp

Scenarioamp

Namp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

A1amp

84

20

07

21

07

21

07

32

09

32

09

32

09

15

09

15

09

15

09

24

11

25

11

25

11

B2amp

34

23

08

23

09

22

08

36

11

36

12

36

11

15

07

15

09

14

06

23

08

23

08

23

08

C3amp

13

98

41

98

41

98

41

51

10

52

10

51

10

14

06

14

06

14

06

23

06

23

06

24

06

D4 amp

12

88

49

89

49

88

49

54

11

56

12

54

11

16

11

17

12

16

11

26

15

26

15

26

14

1 Sireandm

aternalgrandsiregenotyped

2 Siregenotypedmate

rnalgrandsireno

tgenotyped

3 Sireno

tgenotypedmate

rnalgrandsiregenotyped

4 Sireandm

aternalgrandsireno

tgenotyped

Tableamp2ampIm

putationampaccuracyampacrossampassem

bliesampforampPedImputeampFindH

apampFImputeampandampBeagleampallelicJR2amp

PedImputeamp

FindHapamp

FImputeamp

Beagleamp

BTAU

amp42amp

UMDamp31amp

BTAU

amp46amp

BTAU

amp42amp

UMDamp31amp

BTAU

amp46amp

BTAU

amp42amp

UMDamp31amp

BTAU

amp46amp

BTAU

amp42amp

UMDamp31amp

BTAU

amp46amp

Scenarioamp

Namp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

A1amp

84

941

20

940

20

941

20

907

24

908

25

907

24

952

26

952

27

952

27

925

33

923

33

925

33

B2amp

34

935

24

934

25

935

24

896

30

896

34

895

32

954

20

954

21

954

20

929

24

928

25

928

24

C3amp

13

759

89

761

88

758

87

848

33

845

30

848

32

955

18

955

19

955

19

931

19

928

19

927

19

D4 amp

12

782

113

777

113

781

112

841

32

832

35

839

33

949

34

947

38

949

35

921

44

919

45

921

44

1 Sireandm

aternalgrandsiregenotyped

2 Siregenotypedmate

rnalgrandsireno

tgenotyped

3 Sireno

tgenotypedmate

rnalgrandsiregenotyped

4 Sireandm

aternalgrandsireno

tgenotyped

95

Supplementaryfiles

YoucanfindChapterͷsupplementaryfilesat

- DocTa(DoctoralThesisArchive)(httptesionlineunicattit)

- GoogleDrive(httptinyurlcomPhDMMChapter05)orscantheQRcode

SupplementaryFigureS1ErrorpercentageacrossmethodsforBTAU42onBTA1

SupplementaryFigureS2ErrorpercentageacrossmethodsforUMD31onBTA1

SupplementaryFigureS3ErrorpercentageacrossmethodsforBTAU46onBTA1

SupplementaryFigureS4Linkagedisequilibrium(LD)decayoverdistanceinItalian

SimmentalandHolstein

ThedottedlinewithblackdotsindicatesLDdecayoverdistanceinItalianHolstein

whereasthedashedlinewithwhitediamondsindicatesthesamevaluesforItalian

SimmentalItalianHolsteingenotypeswereasubsetofthedatausedinNicolazzietal

(2013)

96

CHAPTER6

QuestfordeleteriousrecessivevariantsinItalianHolsteinbullscombiningexomesequencingandhighdensitySNPanalysis

MMilanesi1etal

1IstitutodiZootecnicaUCSCviaEmiliaParmense8429122PiacenzaItaly

____________________Manuscript

Abstract

Outbreed species carry a genetic load of deleterious recessive variants Those

that are lethal survive only in the heterozygous state in the population while

otherssimplyreducethefitnessofindividualshomozygousforthevariantThe

long=termdestinyofthesemutationsistodecreaseinfrequencyduetonegative

selectionandeventuallydisappearbecauseofgeneticdriftHowever theymay

bemaintainedinlivestockpopulationsandevenincreaseinfrequencywhenin

linkage with favourable alleles for traits under selection or occurring in the

genomeofsireshavinghighgeneticvalueforproductivetraitsWesearchedfor

these variants combining exome sequence data from 20 Italian Holstein bulls

sampledfromoppositetailsofthemaleandfemalefertilityEBVdistributionthe

HighDensity(HD)SNPgenotypingof1009progenytestedItalianHolsteinbulls

and information existing in public databases Exome sequencing detected

202956 variants Among these 5167 were classified as ldquodeleteriousrdquo when

annotatedwithSIFTandAnnovarThesewerefilteredbypickingthosemapping

in regions out of equilibrium and lack of homozygotes in the progeny tested

Holstein bulls or having an asymmetric distribution in bulls plus=variant and

minus=variant for fertility EBV The resulting 193 variants in 163 genes were

furtherassessedbycomparativegenomicshaplotypeanalysisandfunctionFor

eachassessmentapositiveornegativescorewasgiventoevidenceinfavouror

against thevariant tobedeleteriousA totalof35variantshadapositive total

score These genes are involved in development cell motility and immune

responseSomeof themareknowntobeexpressed in the testiclesandappear

importantforspermfertilityinmodelspeciesothersareresponsibleforgenetic

defectsinhumanandmousecarryingmutationssimilartotheonesdetectedin

cattleThesegenevariantsarecandidatetobeamongthosethatconstitutethe

ldquogeneticloadrdquoofdeleteriousrecessivegenesintheHolsteinpopulation

100

Introduction

The presence of deleterious recessive variants is well tolerated by outbred

speciesWhenthesevariantsappeartheymayspreadandsurvivealongtimein

the population in the heterozygous state (Simmons and Crow 1977) It is

estimatedthatoutbredindividualshumanincludedcarryonaverageagenetic

loadofmorethan100loss=of=functionvariantsintheirgenome(MacArthuretal

2012)

Theeffectofdeleterious recessivevariantsappearswhendemographicevents

eggeneticbottlenecksreducethegeneticvariabilityofthepopulationorwhen

relatedindividualsareintentionallymatedforbreedingpurposesInthesecases

recessive deleterious reduce the fitness of individuals carrying them in the

homozygous state and in extreme case induce very severe genetic defects

premature death and abortion (McClure et al 2014 Sahana et al 2013

VanRadenetal2011)Theythereforecontributetoexplainthephenomenonof

inbreeding depression and its negative effect on fertility (Charlesworth and

Willis2009)

The search for deleterious recessive variants is particularly interesting in

Holstein cattle Thisbreed is experiencing anegative genetic trend for fertility

traits (Jamrozik et al 2005) it is worth noting that this is occurring while

inbreedingisunderstrictmonitoringandistheoreticallyincreasingataverylow

pace(about005peryear=Mrodeetal2009)Inadditionmatingisplannedto

minimise relationship between animals and the additive genetic variance for

mosttraitsndashincludingmilkproduction=isapparentlynotdecreasinginspiteof

the selection pressure applied by modern schemes (Biffani et al 2002)

indicatingthatoveralldiversityisonlyslightlydecreasing

SelectiontoincreasefertilityischallengingSelectioncriteriagenerallyadopted

to improve this trait are far from the biological mechanisms that govern

reproduction Inaddition thesemechanismsarecomplexandverysensitive to

environmental(management)conditions(Laneetal2013Wathesetal2014)

Asa result fertility traitshave loworvery lowheritability (Pryceetal1997)

andthereforelowresponsetoselectionandslowgeneticprogress

101

Forthesereasons thequest formolecularvariants influencing fertility traits is

on=goingsincetheavailabilityofmoleculartoolsforDNAanalysisAnumberof

investigations tested the effect of functional candidate genes running genome

wide association analyses to find QTLs for fertility traits using hundreds

microsatellite markers in structured families (Ashwell et al 2004) or tens of

thousand SNP panels in whole populations (Aston and Carrell 2009) The

availability of medium density SNP panels allowed the search for deleterious

recessivesevenwithouttheuseofphenotypictraitsbyscanningthegenomein

searchofregionslackingoneofthetwohomozygousclasses(Sahanaetal2013

VanRaden et al 2011) This approach has highlighted a number of genomic

regions likely carrying deleterious recessives This information has immediate

application inmolecular assisted breeding and a relevant scientific interest in

thesearchofcausativevariantsandassessmentoftheirfunctions

Thedevelopmentofnewsequencingtechnologieshasgreatlydecreasedthecost

of sequencing so that nowadays the sequencing of few target individuals is

becoming the gold standard to seek for causal mutations for rare monogenic

defectsinhumansandotherspecies(ChunandFay2009)Oftenonlytheexome

the protein coding part of the genome is sequenced searching for those

mutationsthatseverelyalterproteinstructureandorfunction(Bamshadetal

2011 Li et al 2013) A typical mammalian exome consists in a few tens of

millionnucleotides(around2ofthereferencegenome)comparedtothefew

billionofthewholegenomeThereforewiththesamenumberofreadsexome

sequencinghastheadvantageofamoreaccurategenotypingdueto thehigher

sequencing depth an easier data management lower computation load and

easier interpretation (Bieseckeretal2011Majewskietal2011) Indeed the

numbervariants found inexomesareone=twoordersofmagnitude lowerthan

thatdetectedinwholegenomesandmutationsaredetectedinannotatedgenes

orimmediatelynearbyOntheotherhandtheexomeapproachcanonlyidentify

variantson annotatedgenesVariants inunknownexonsor that occur innon=

coding regulatory features will be missed (Bamshad et al 2011) In humans

these are turning out to be of outmost importance for gene regulation and

downstreamphenotypic effects (ENCODEConsortium2012) but they are still

verypoorlyannotatedinlivestock

102

Herewesequencedtheexomesof20Holsteinbullsprogenytestedselectedfrom

the tails of male and female fertility EBVs (Estimated Breeding Value) Since

livestockfertilityisacomplextraitandcausativemutationscannotbesoughtby

the approach used in monogenic diseases we integrated the sequence

information with high density SNP chip data obtained on a much larger

populationofItalianHolsteinbullstodetectgenescarryingallelescandidateto

be deleterious recessives and confirm their likely deleterious effect by

comparativegenomics

Materialsandmethods

1SamplingandDNAextraction

Semen(straws)andbloodsampleswereprovidedbyItaliancertificatedartificial

insemination centres (UE Directive 88407CEE) thus ethics committee

approvalforthisstudyisnotrequired

A total of 20 ItalianHolstein progeny tested bullswere sampled for complete

exome sequencing Among them 19were chosen from the tails of female and

male fertilityEBV(EstimatedBreedingValue)distribution (Table1) tohave5

animals fromeach tail One animalwas to be in theminus=variant tail of both

traitsBullswereselectedasunrelatedaspossibleSamplingwaspossiblethanks

tothecollaborationwithANAFI(ItalianHolsteinBreederAssociation)

ANAFIdevelopedageneticevaluationfor fertilitycombiningdifferenttraitsas

explained by Biffani and colleagues (Biffani et al 2005) The traits are ldquodays

from calving to first inseminationrdquo ldquocalving intervalrdquo ldquofirst=service non return

rateto56daysrdquoldquoangularityrdquoandldquomatureequivalentmilkyieldat305daysrdquo

MalefertilityisexpressedasanEstimatedRelativeConceptionRate(ERCR)and

isameasureofconceptionrateofaservicesirerelativetoservicesiresofherd

mates(Soggiuetal2013)

103

Table1FemaleandmalefertilityEBVsofthe20animalssequenced

Animal EBVFemaleFertility EBVMaleFertility GroupfriA01 97 =111 NAfriA02 98 =166 MalLfriA03 115 0 FemHfriA04 114 0 FemHfriA05 114 0 FemHfriA06 86 0 FemLfriA07 98 195 MalHfriA08 105 213 MalHfriA09 95 =154 MalLfriA10 95 149 MalHfriA11 102 264 MalHfriA12 91 =475 FemLMalLfriA13 105 =179 MalLfriA14 102 207 MalHfriA15 99 =405 MalLfriA16 85 0 FemLfriA17 87 0 FemLfriA18 121 0 FemHfriA19 117 0 FemHfriA20 88 0 FemL

A total of 1009 Italian Holstein bulls were chosen from the entire population

balancing theirpresence in the tailsof theEBV forkgprotein (PROT) somatic

cell count (SCC) fertility (FERT) and functional longevity (LONG) for high

densitygenotyping(TableS1)

DNAwasisolatedfromwholebloodandsemenstrawssamplesinLGSlaboratory

(CremonaItalywwwlgscrit)usingtheGenomixkitaccordingtoLGSinternal

protocolDNAsforexomecaptureandsequencingwereextractedindependently

tosatisfy therequirement indicatedby IGA technologies (Udine Italy)at least

12mgofDNAataconcentration100ngulwithR260280higherthan18and

R260230near19

2Moleculardataproductionandqualitychecking

21Exomecaptureandsequencing

104

Exomes were captured using the ldquoSureSelected All exon Bovinerdquo (Agilent

Technologies) kit in early version Probes coordinates are based on UMD31

bovinereferencesequence(Ziminetal2009)anddesignedagainstNCBIexons

as well as predicted exons UTRs and miRNAs Number and distribution of

probesperchromosomearedetailedinTableS2

rdquoOvation Ultralow Library Systemsrdquo protocol was followed for library

preparation After quantification with Qubit (Life technologies) and quality

controlwith2100Bioanalyzer(AgilentTechnologies)DNAwassonicatedwith

Bioruptor (Diagenode) to obtain fragments between 150 to 200 bp in length

blunt end=repaired adenylated ligated with paired=end adaptors and purified

followingtheprotocolprovidedbythemanufacturer

LibrarieswerehybridizedwithldquoSureSelectedAllexonBovinerdquoprobesusingthe

Encore Target Capture Module (Nugen) SureSelect Target Enrichment and

Herculase II FusionEnzymewithdNTPCombo (AgilentTechnology) kit Then

capturedDNAwasamplifiedandtaggedwithlibraryspecificindexesQualityof

capturedsampleswastestedwithQubitand2100Bioanalyzer

Paired=end 2X100bp sequencing was performed on Illumina HiSeq2500

(IlluminaSanDiegoCA)atanexpectedcoverageof50X

22Sequencealignment

Adapters and reads shorter than 36bp were trimmed using Trimmomatic

(version032)(Bolgeretal2014)ForeachFastqfilethealignmentandpairing

of the reads to theUMD31 reference sequencewas performedwithBurrows=

WheelerAligner(BWA)(version077=r441)(LiandDurbin2009)applyingalso

trimmingoption (=q5) SAM files resultedwere converted intoBAM file using

SAMtools (version0118) (Lietal2009)Foreach individualBAM fileswere

merged in a single one with Picard (version 1111)

(httpsgithubcombroadinstitutepicard)Duplicatereadswereidentifiedand

markedwithPicardFinallyindividualBAMfileswereindexedwithSAMtools

105

Sequencedepthwasevaluatedsimultaneouslyonall20sequencedanimalsand

onlyonregionstargetedbyprobesusingDepthOfCoverageofGATK(McKennaet

al2010)

23ExomeVariantcallingandqualitycheck

Singlenucleotidepolymorphisms(SNPs)insertionsanddeletions(InDels)were

calledsimultaneouslyonall20sequencedanimalsandonlyonregionstargeted

byprobesusingUnifiedGenotyperofGATK(DePristoetal2011)

All variantswere filteredusingVariantFiltration ofGATK to removeoneswith

lowmappingandorsequencingqualityThechoiceoffiltersandthresholdswas

made in accordance to ldquoGATK best practicerdquo for exome sequencing and

distributionoffiltervalues

bull ldquoSNPclusterrdquoexcludedvariantifmorethan3variantswerefoundin10

bp

bull ldquoBaseQualityRankSumTestrdquoexcludedifitrsquoslowerthen=5ormorethan

5

bull ldquoMappingQualityRankSumTestrdquoexcludedif itrsquoslowerthen=7ormore

than6

bull ldquoRead Position Rank Sum Testrdquo excluded if itrsquos lower then =4 or more

than4

bull ldquoFisherStrandrdquoexcludedifitrsquosmorethan30

bull ldquoDepthrdquoexcluded if itrsquos lower then60X(ieaminimumvalueof3Xper

animals)ormorethan3500X

bull ldquoQualityrdquoexcludedifitrsquoslowerthen40

bull ldquoMappingqualityrdquoexcludedifitrsquoslowerthen30

Filters were applied to all variants but only bi=allelic SNPs were used in the

following analyses Genotypes with ldquoGenotype Depthrdquo lower than 5 and

ldquoGenotype Qualityrdquo lower than 20 were discarded The remaining ones were

converted intoldquoABrdquocode(A=referencealleleandB=alternativeallele)Then

data were transformed in PLINK (Purcell et al 2007) format using a home=

made python script To prepare the working dataset the final quality control

106

removedvariantswithmore than25missingdata (that iswithmore than5

animals out of 20 without genotypes) monomorphic (minor allele frequency

MAFlt1)ornotmappinginautosomesBasicpopulationgeneticsparameters

werecalculatedtoevaluatethepresenceofcrypticstructureinthedatathatmay

influencetheoutcomeofthedownstreamanalyses

In total 24390 SNPs identified in the exomes were also included in the HD

SNPchip panel HD genotype from nineteen out to twenty of the sequenced

animals sequenced was used to compare technologies Genotypes at common

locationswerecomparedacrosstechnologies toevaluatesequencequalityand

theeffectofsequencedepthongenotypecalling

24BovineHDGenotypingBeadChipdataandqualitycheck

AllanimalsweregenotypedusingtheBovineHDGenotypingBeadChip(Illumina

San Diego CA) Genotypes were quality controlled and filtered with standard

threshold Markers with more than 5 missing data and MAF lt 1 were

excluded Animals with more than 5missing data were also excluded Only

autosomal SNPs were retained Thereafter genotypes were phased and the

remainingmissingdata imputedusingBeaglev332(BrowningandBrowning

2007)

Ofthe20animalssequenced12hadalsoBovineHDGenotypingBeadChipand7

BovineSNP50 Genotyping BeadChip version 2 genotypes available For

concordance and haplotype analysis we imputed the 50K data to 800K as

previouslydescribedabove

3Strategyfordetectingcandidatedeleteriousvariants

Wesearchedforcandidatedeleteriousvariantsfollowinganintegratedapproach

that joined the analysis of i) exomes of a few Italian Holstein bulls having

plusvariant and minusvariant EBVs for a target trait ii) HD genotyping of a

population of 1009 progeny tested Italian Holstein bulls iii) the comparative

genomic analysis of 100 vertebrate species The strategy followed for the

107

analysisisshowninFigure1Inthefirststep(Search)welookedforcandidate

deleterious variants in the exomes In step2 (Filter)we filtered candidatesby

assessingwhichof thesevariantseitherhadanon=randomdistributionamong

animals having extreme EBVs or co=mapped with genomic regions carrying

outliersignals inthelargepopulationofprogenytestedbulls Instep3(Asses)

candidateswere further evaluated by i) assessing their conservation across a

panel of 100 vertebrate species ii) searching for associated haplotypes and

evaluating their frequencies in the population of progeny tested bulls iii)

evaluatingtheirassociationwithgeneticdefectsinhumanandmodelspeciesiv)

examining their expression profile and function when known Methods are

belowdescribedindetail

Figure1Strategyfollowedtoidentifydeleteriousvariants

108

31Step1Searchforcandidatedeleteriousvariants

Quest for candidate deleterious variants in exome sequence data by

variantannotation

WeannotatedallvariantsusingANNOVAR(Wangetal2010)andVEP(Variant

Effect Predictor) (McClure et al 2014) software They use the Ensembl gene

annotation database (UMD31 reference sequence release 74) to investigate

locationandpotentialeffectoncodingsequencesofallvariantsVEPintegrates

the SIFT algorithm (Kumar et al 2009) to predict the effect of amino acid

substitutiononproteinfunctionForeachsequencechangesequencehomology

and physic=chemical similarity between the reference amino acid and the

alternative is considered providing a score that evaluates the mutation

tolerabilityVariantswithscorelowerthan005wereclassifiedasldquodeleteriousrdquo

inaccordingtosoftwarerecommendations

SIFTalsocomparedallvariants found in theexomeswith thosepresent in the

dbSNPdatabase(version140)toidentifythosethatwerenovel

32Step2Filteringofcandidatedeleteriousvariants

2a Analysis of deleterious variant distribution among animals having

extremesEBVvaluesformaleandfemalefertilitytraits

Association between variants and the classification in the tails of the EBV

distributionwas estimated byFisherexact testwhich compares frequencies of

alleles (a vs A) in the two tails not assumingH=W equilibriumgenotypic test

that compares the frequency of genotypes (aa vs Aa vs AA) and the recessive

geneactiontestwhichtestsforthedistributionofonehomozygousclassversus

theothers(aavsaAAA)AlltestsareimplementedinPLINKfunctionsThe5

significance threshold was set empirically by permutation (N=100000) either

point=wise that is not corrected for multiple testing (EMP1 = Empirical

Probability1)andthereafterfamily=wisecorrectedformultipletesting(EMP2=

EmpiricalProbability2)

109

GenesexceedingEMP1were considered suggestive and thoseexceedingEMP2

significant candidates to be deleterious recessives and their characteristics

conservationacrossspeciesandfunctionfurtherinvestigated

2b Quest for regions candidate to carry deleterious variants in the

populationanalysedwiththeHDBeadChip

Basic population genetics parameters and Multi Dimensional Scaling were

calculated with GenABEL R package (Aulchenko et al 2007) to evaluate the

presenceofcrypticstructure inthedatathatmayinfluencetheoutcomeofthe

downstreamanalyses

ExpectedandobservedallelicandgenotypicfrequenciesandFisherexacttestof

Hardy=Weinberg proportions (Janis E Wigginton 2005) were calculated with

PLINKon theHDSNPdatasetValueswere averaged in slidingwindowsof10

markersThisvalue(10markers)waschosenaftertestingdifferentwindowsize

as represent a good compromise between noise reduction and resolution (not

shown)

Candidate regions containing deleterious mutations were identified following

twodifferentapproaches

In one approach hereafter refereed to as ldquoOut of Hardy=Weinbergrdquo approach

(OHW)markerssignificantlydeviatingfromofHardy=WeinbergequilibriumatP

le84710=8 (nominalvaluePle005Bonferronicorrected formultiple testing)

werefirstidentifiedWindowswithatleastthreesignificantmarkersbelowthis

thresholdandhavingahomozygotedeficitwereconsideredcandidatestocarry

deleterious mutations In this OHW approach we applied very stringent

constraints to reduce false positive discoveries due to the Wahlund effect

inducedbypopulationsubstructure(seeresultsandFigureS5)

In thesecondapproachhereafterreferred toas ldquoLackOfHomozygotesrdquo (LOH)

approach the differences between observed and expected frequencies of

homozygotesfortheminorallelewerecomputedandaveragedacrossmarkers

included in each sliding window Values were standardized and only regions

carrying three overlapping windows below =3Z score value were considered

110

candidates to carry deleterious mutations Thereafter significant windows

carryingatleastonemarkerincommonwerejoinedtodefineregionboundaries

ExomesequencingknowledgeandHDBeadChipdatawerecrossed inorder to

identifygenescarryingseverevariantslocatedwithinsignificantregionsortheir

proximity (50Kb upstream or downstream) These genes were considered

candidates to be deleterious recessives and their characteristics conservation

acrossspeciesandfunctionwasfurtherinvestigated

33Step3Assessmentofcandidatedeleteriousvariants

Genes carrying variations classified as deleterious by SIFT or ANNOVAR and

eitherassuggestivebyfertilityEBVtailsanalysisorassignificantbyOHWeLOH

analyses of the Holstein bull population were submitted to comparative

genomics population genetics and functional analyses In order to prioritize

genesvariants discoveredwe scored each evaluation using a subjective scale

ranging from =3 to 3 to measuring the evidence in favour (scores 1 to 3) or

against(scores=1to=3)thedeleteriouseffectoftheSNPsThescore0inallcases

indicated the absence of evidence or that a specific assessment failed The

completescorescalethereforerangedfrom=9to9

3aComparativegenomics

For comparative genomics purposes the position of eachmutationwas ldquolifted

overrdquo from UMD31 bovine coordinates to hg19 human coordinates using the

specifictoolatUCSC(Kentetal2002=httpgenomeucscedu)Orthologous

proteins were aligned using the UCSC genome browser and amino acid

conservation assessed in a set of 100 vertebrate species that included human

primates (11 species) euarchontoglires (egmouse14 species) laurasiatheria

(egsheep25species)afrotheira(egelephant6species)othermammals(eg

platypus5species)birds(egchicken14species)sarcopterygii(egturtle8

species) fish (eg zebrafish 16 species) Results of amino acid conservation

analysiswererecordedonlyifhomologousproteinfromatleast50speciescould

beretrievedandalignedThereafteraminoacidconservationwasscored3ifthe

111

aminoacidwasconservedinallvertebrates2ifconservedinallmammalsand1

ifconservedinthelargemajority(atleast90)ofmammalsAscoreof=1was

given when the amino acid was not conserved among species and a score =3

when both variants observed in cattle were carried by other species Finally

wheneithertheliftoverprocessortheproteinalignmentfailedthecomparison

wasconsiderednoninformativeandgivenascore0

3bSearchofhaplotypesassociatedtodeleteriousvariantsandanalysisof

HDpopulationdata

To acquire further evidence we searched for the haplotypes that carry the

deleterious variations and estimated their frequency in the

homozygousheterozygousstatuswithinthebullpopulationgenotypedwiththe

HDBeadChip

Weusedamultistepapproachtofindhaplotypescarryingdeleteriousvariantsi)

from the phased HD BeadChip data we extracted the haplotypes of the 20

sequencedanimalsintheregionofthevariant(100markersoneachside)ii)in

these 20 animals we searched the shorter haplotype able to distinguish

homozygous andor heterozygous animals for the deleterious variant from all

other haplotypes iii) we assessed the frequency of homozygotes and

heterozygotes for thehaplotype in thewholepopulationofbulls characterised

withtheHDBeadChip

Scoresinthiscasewere=3inthecaseofuniquehaplotypesshowinganumberof

homozygotesnonsignificantlydifferentfromthenumberexpectedfromHardy=

Weinbergproportions3inthecaseofuniquehaplotypesshowinganumberof

homozygotes significantly lower than expected 1 in the case of unique

haplotypesneverhomozygousandrare0whenevernouniquehaplotypecould

befound

3cFunctionalanalyses

With PantherDB (Mi et al 2010) resources we classified our gene list into

functionalcategoriesaccordingtotheGeneOntologyclassificationAllsuggestive

112

andcandidateEnsemblGeneIDsubmittedtothebulkanalysiswererecognizedand annotated Beyond the GO term characterization in order to understandpotentialdetrimentaleffectsofmutationsandalsopinpointpotentialcandidatesfor fertility traits we retrieved information for each gene according to thefollowing parameters i) the involvement in known Mendelian or complexdiseases fromOMIMdatabase ii) theavailabilityofknock=outmice iii) and theeventual expression in reproductive tissues from the Gene Expression Atlas(httpwwwebiacukgxa)(Table4)

Alsoforthesedataweusedthesamescaleattributingarangefrom=3pointstogenes having no relation with fertility viability and genetic defects and norelevant phenotypic effects in knock=out mouse models to 3 to gene closelyrelated to fertility and viability associated to genetic defects in human andaffectingfertilityorsurvivalwhenknocked=outInparticularascoreranging=1to+1wasattributedtogenesassociatedtohumansyndromes(=1=noassociationrecordedinOMIMdatabase0=noinformationinOMIM1=associationinOMIM)A score with same range to information collected from knock=out mousephenotypes(=1=knock=outmouseshowsnoimpairedphenotype0=noknockoutmouse has been produced 1= knock=out mouse shows impaired phenotype)Thesamescorewasattributedtomolecularfunctions(=1=functionunrelatedtofertilityandviability0=functionindirectlyrelatedasimmunityandmetabolism1=functiondirectlyrelatedasfertilityanddevelopment)

Results

Exomecaptureandsequencing

A totalof225414probeswereused for capturing thecattleexome (5355Mbabout 2 of UMD31 reference genomes) Mitochondrial (MT) and Ychromosome were not included in probe design The number of probes perchromosomespannedfrom2500onBTA27toalmost12000inBTA3(TableS2)

113

The average exome sequencing depth was 4058 slightly lower than the

expected50XAllanimalshadatleast24Xaveragegenomessequencedbutina

same animal the sequencing depth varied across chromosomes and regions

withinchromosomesThelowestchromosomeaveragevalueobservedwas10X

inBTA25oftheshallowersequencedanimalAveragecoverageperanimaland

chromosomeandoverallmeansareshowninFiguresS1

In total1613461402 raw100bppair=ends sequence readswereproducedA

total of 1425514112 100bp reads were mapped and properly paired to the

reference genome Paired reads per animal ranged between 41973510 and

138840392

Insynthesissequencingdepthwasvariableasexpectedandalmostallanimals

hadsomeregionsofscarceornosequencedepthbutoverallallexomeregions

were sufficiently covered to permit a comprehensive analysis of molecular

variantsoftheanimalsinvestigated

Step1Searchforcandidatedeleteriousvariants

Variantdetectionandannotation

Thesequencingdepthofallvariantscalledonthealignmentofallreadsacross

animalsfollowedaChi=squaredistributionAveragedepthwas79694spanning

from2Xto5000Xwithpeakfrequencyclassat150=250X(FigureS2)

A multistep approach was followed to clean the exome sequence dataset

Cleaning process steps and the number onmarkers retained in each step are

describedinTable2

114

Table2Typeandnumberofmarkersdetectedandretainedafterdatasetcleaning

TypeofvariationNumberofvariations(N)

Rawdata

Aftervariantcleaning

Aftergenotypecleaning(onlyautosomal)

Complex 1213 963 0Deletion 6938 5031 0Insertion 5510 3656 0MultiallelicComplex 5710 3995 0MultiallelicSNP 803 512 0SNP 242988 203079 164277Total 263162 217236 164277whencomparedtothereferencegenome

Atotalof263162variantswereidentifiedAmongthese217236(8255)wereretainedafterapplyingthevariantfilters(seematerialsandmethodssection)

According to GATK classification the vast majority (9348) of variationsidentified were biallelic SNPs followed by insertions and deletions (InDels40) more complex variations and multiallelic SNPs (252) Indels andcomplexvariationsalthoughlessfrequentthanSNPsaffectedarelevantportionoftheexome(49589bp)Thenumberofvariantsdetectedperchromosomewassignificantly correlated to the number of probes used (r=08761P=228x10=10)andofbasescaptured(r=08424P=53x10=19)perchromosomehoweverBTA23BTA27 BTAX had an outlier behaviour the former two containing highervariation(450and445variantsperKb)andthelatterlower(084variantsperKb)thantheaverage(322variantsperKb)

Bialleic SNPswere further filtered for genotype quality (sequence depth gt 5Xandsequencequalitygt20)andBTAXmarkersThefinalautosomalSNPdatasetcounted164277SNPsAmongthese17829biallelicSNPs(109)describedforthefirsttimehavebeensubmittedtodbSNP

Variantsrsquo annotation is shown in Table 3a Variantswere detected in differentgenedomains(exonsintrons3rsquountranslatedregions)aswellasupstreamanddownstreamgenesMorethan31ofthevariationsweredetectedinexonsandamong them more than 41 have the potential to have an effect on genefunctionbyaffectingsplicingoralterproteinstructure(Table3b)

115

Table 3 A) number of autosomal SNPs annotated in each gene region B) number and

classificationofautosomalSNPsalteringgenefunction

A)

Generegion NofSNPs(autosomes)Downstream 1170Exonic 51293Intergenic 49901Intronic 58116ncRNA_exonic 160ncRNA_intronic 3ncRNA_splicing 2Splicing 184Upstream 1445Upstreamdownstream 54UTR3 1345UTR5 604Total 164277

B)

EffectofexonicSNPs NofSNPs(autosomes)Nonsynonymoussinglenucleotidevariation 20839Stopgainsinglenucleotidevariation 213Stoplosssinglenucleotidevariation 10Synonymoussinglenucleotidevariation 30226Unknown 5Total 51293

whencomparedtothereferencegenome

Amongthe51293exonicSNPs4166in2978geneswereclassifiedldquodeleteriousrdquo

by SIFT Among these 3210 in 2356 genes were observed at least twice

ANNOVARidentified223stopgain(213)orloss(10)variantsin217genes

Theaverageconcordanceat24390SNPs in commonbetweenexomesand the

HD Beadchip was 9716 (Table S3) Discordance was due to either alleles

detected by sequencing andnot detected by genotyping (exome=heterozygote

Beadchip=homozygote) and vice=versa (exome=homozygote

Beadchip=heterozygote) Discordances were randomly distributed among

markers and inversely correlated to sequencing depth (Figure S3) indicating

that most errors are likely occurring during sequencing even if occasional

116

Beadchip failure indetectingsomeallelecannotbeexcludedTwoanimalshad

anoutlierbehaviouronehaving846andthesecond719concordanceBoth

animalswereretainedforvariantdiscoverywhiletheywereexcludedfromthe

analysesofEBVtailsWithoutthesetwooutlieranimalsthemeanconcordance

increasesto9938

Multidimensional scaling confirmed the absence of a substructure in animals

sequenced(FigureS4)

Step2Filteringofcandidatedeleteriousvariants

AnalysisofEBVtails

Fisherexact test isgenerallyused to test for theassociationbetweenavariant

and a Mendelian disease Samples are divided in cases and controls and the

association between alleles or genotypes and the disease is tested under

different inheritance models Here we used the same method to test for

association between allelesgenotypes and animals from opposite tails of the

EBVdistributionfori)malefertility(N=4high+4low)ii)femalefertility(N=5

high+4low)andiii)maleandfemalefertility(N=9high+8low)

Genotypes were tested under all inheritance models implemented in PLINK

Giventhelownumberofsamplesnovariantwassignificantaftercorrectionfor

multiple comparisons (EMP2) and only suggestive variants could be found

(significant at 5 at EMP1)We considered suggestive of carrying deleterious

allelesalistof101genescarrying112variants(TableS4)

Detectionofregionscarryingdeleteriousmutationfrom800KSNPdata

After applying quality control parameters 26653 and 146991 markers were

removed from the original dataset for excessive missing data and MAF

respectivelyInaddition17individualsremovedforlowgenotypingrateAswith

exomes only autosomal SNPswere retained for the analyses thus obtaining a

workingdatasetof590680SNPsand992animals

117

MDS(FigureS5)analysis indicates that thedatasethassomesubstructure that

mightaffectdownstreamanalysesWeaccountedfortheriskoffindingmarkers

outofHardy=WeinbergproportionsduetotheWahlundeffectbychoosingvery

stringentsignificancethresholdintheOHWanalyses

To detect regions candidate to carry deleterious recessivemutations we used

two approaches the ldquoOutside Hardy Weinbergrdquo (OHW) and the ldquoLack Of

Homozygotesrdquo (LOH) approaches (see materials and methods) The two

approaches complement each other in the detection of regions carrying less

homozygotes than expected OHW identifies regions in which a few markers

have extreme homozygote deficit while LOH detects regions in which many

markershaveamoderatedeficit

In the SNP dataset 1884 markers were significantly out of Hardy=Weinberg

equilibrium(Ple84710=8)Atotalof485windowswereidentifiedwithatleast

3 markers out of 10 exceeding this threshold Among these 84 contained

markerssignificantbecauseofhomozygotedeficitThese84windowsidentified

11OHWcandidateregionsin9chromosomes(TableS5)

TheLOHapproachidentified2335windowsbelow=3Zscoreofthestandardised

differencebetweenobservedandexpected frequencyofhomozygotes forMAF

Theseidentified326LOHcandidateregionsspreadonallautosomes(TableS6)

Onlyregionsformedbyatleast3singlewindowswereretainedreducingLOH

candidateregionsto240

Deleterious variants (SIFT deleterious and stop codon gainedlost) were

intersectedwithsignificantOHWandLOHregionSevenofthesemappedwithin

OHW sliding windows 36 within LOH sliding windows Among these 5 were

located in regions significant in both OHW and LOH Other 51 deleterious

variants mapped in the immediate vicinity (+=50Kb) of significant windows

These 83 variants affect 66 unique genes candidate to carry deleterious

recessivealleles(TableS7)

Five regionswere common toOHWandLOHoneonBTA4 (LOH=74959610=

75151417 OHW=74970653=75151417) two on BTA7 (LOH=9626435=

10099512 OHW=9628735=10099512 LOH=10575691=11171132

118

OHW=10486424=11087441) one on BTA15 (LOH=80299924=80928311

OHW=80299924=80921274) and one on BTA18 (LOH=28122373=

28185525OHW=28106022=28164693)

Step3Assessingcandidatedeleteriousvariants

After the two filtering step previously described a total of 191 SNPs were

retained out of the 3433 classified as deleterious by SIFT and Annovar and

analysed further These are carried by 163 genes Eighteen genes carried 2

candidateSNPsthreegenes3SNPsandonegene5SNPsInaddition4SNPsin

four geneswere identified as potential candidates in both the analysis of EBV

tailsandofbullpopulation

A totalof12SNPs introducedstopcodonswhile the remaining inducedamino

acid substitutionsor splicing variants In the filtereddataset theproportionof

stop codons on total variation (63) was only slightly higher than the

proportion found in the entire dataset (51 225 stop codon out of 4391

deleteriousSNPs)

Comparativegenomics

Proteinmultiplealignmentwassuccessfulin146outof191comparisonsIn26

casesthemostfrequentaminoacidincattlewasveryhighlyconservedeitherin

allvertebrates(N=9)orinallmammals(N=17)Inother23casestheaminoacid

was conserved in at least 90 of mammalian species In the last class were

generally included amino acids that changed in mammalian species

phylogenetically verydistant from cattle (eg platypus and related species) In

97 instances the amino acid could change quite freely and occasionally both

variantsobservedincattlewerecarriedbyotherspecies

HaplotypeanalysisofHDpopulationdata

The search for haplotypes associated to the191 variants for further testing in

the population identified 101 unique and invariant haplotypes common to all

119

carriersofatargetvariantanddifferentfromallthosefoundinnoncarriersIn

additional36casesanoncompletelyinvarianthaplotypecouldbeassociatedto

themutationConverselyin54instancesareliablehaplotypecouldnotbefound

(table S8) In total 38 haplotypes had no homozygous individuals in the

population Thiswas expected in the case of 33 rather rare haplotypes (fewer

than 100 heterozygotes observed in the population) Conversely 5 haplotypes

were rather frequent (from 170 to almost 300 heterozygotes observed in the

population)andtheexpectationwastofind78to222homozygousindividuals

in thepopulation insteadof thenone foundAnothervarianthadaremarkable

high frequency (35) and a clear outlier behaviour Only 11 homozygous

individualswereobservedoutofthe121expected

Functionalanalysisofcandidategenes

To assess the potential deleterious effects of the variants identified we also

retrieved functional information for each gene according to the following

parametersi)associationtoknownMendelianorcomplexdiseaseslistedinthe

OMIM database ii) availability and effect of gene knock=out in mice iii)

expression profileswith particular attention in reproductive tissues from the

GeneExpressionAtlas(Table4)

Twenty genes out of 163 have a recognized phenotype in humans These are

mostlyseveresyndromesUnfortunatelyonlyafewgeneshaveanull=miceline

producedwithphenotypicdataavailable

According to the expression data available from Ensembl these genes are

generally (with someexceptions)expressed inavarietyof tissueswitha clear

tendencytobehighlyexpressedintestis

120

Tableamp4Informationon163genesanalyzedconcerningeventualphenotypesinmanorknockoutsinmouseHeaderampcodesAbrainBcolonCheartDkidney

EliverFlungGskeletalmuscleHspleenItestisColoursrepresentthegenelevelofexpressionindifferenttissuefromlow(whitegrey)tohigh(blue)NCin

FunctioncolumnmeansldquoNotClassifiedrdquo

Geneamp

Geneampnam

eampHum

anampdefectamp

Knock5OutsampMouseamp

MouseampPhenotypeamp

AampBamp

CampDamp

EampFamp

GampHamp

IampFunctionamp

ENSBTAG00000000079

CCSAP

Cells

9

1

03

1

07

2

06

2

2

NC

ENSBTAG00000000149

USP40

Cells

6

7

2

7

5

7

2

8

9

fertility

ENSBTAG00000000414

FUT5

Nomatch

37

1

cellular

process

ENSBTAG00000000918

ERMARD

Periventricular

nodularheterotopia

6Xlinked

periventricular

heterotopia6q

terminaldeletion

syndrome

Periventricular

nodularheterotopia

Nomatch

5

4

4

9

6

7

4

9

17

cellular

process

ENSBTAG00000000998

SLC39A6

Cells

6

9

2

7

4

10

1

10

9

cellular

process

ENSBTAG00000001133

VWA8

Mice

2

3

6

7

10

3

3

4

4

blood

metabolism

ENSBTAG00000001140

IAH1

httpswwwmousephenotypeor

gdatagenesMGI1914982sect

ionassociations

BOTHSEXadiposetissue

behaviourneurological

growthsizebodyFEMALE

homeostasismetabolismMALE

limbsdigitstailskeleton

9

18

30

60

23

19

12

17

30

cellular

process

ENSBTAG00000001262

IRGQ

Cells

6

1

5

3

06

2

5

04

2

immunity

ENSBTAG00000001286

ELMO3

httpswwwmousephenotypeor

gdatagenesMGI2679007sect

ionassociations

Decreasedstartlereflex

02

27

14

7

7

02

1

fertility

ENSBTAG00000001291

OR8H3

Nomatch

fertility

ENSBTAG00000001500

FIGNL1

Cells

08

2

01

05

07

08

04

2

3

cellular

process

ENSBTAG00000001505

GRK4

Cells

10

7

09

4

6

3

2

6

218

signalling

ENSBTAG00000002220

SPTLC1

Hereditarysensory

andautonomic

neuropathytype1

No

17

27

3

28

12

28

5

19

6

metabolism

ENSBTAG00000002288

NT5DC4

Nomatch

02

cellular

process

ENSBTAG00000002289

NLRP14

No

04

cellular

process

ENSBTAG00000002660

CCDC11

Situsinversustotalis

Nomatch

3

1

06

2

07

3

03

2

54

fertility

121

Geneamp

Geneampnam

eampHum

anampdefectamp

Knock5OutsampMouseamp

MouseampPhenotypeamp

AampBamp

CampDamp

EampFamp

GampHamp

IampFunctionamp

ENSBTAG00000002809

CAPN10

AutosomalRecessive

MentalRetardation

DIABETESMELLITUS

NONINSULIN

DEPEND

ENT1

Cells

2

13

41

33

29

immunity

ENSBTAG00000002966

DNAJC13

Youngadultonset

Parkinsonism

No

5

52

94

62

84

fertility

ENSBTAG00000003231

LIM2

Pulverulentcataract

Cells

01

02

01

02

cellular

process

ENSBTAG00000003508

CFAP69

httpswwwmousephenotypeor

gdatagenesMGI2443778sect

ionassociations

Abnormalcirculatinginsulinlevel

MaleInfertilityDecreasedmean

plateletvolum

eIncreasedleanbody

mass

203

07

1

01

7NC

ENSBTAG00000003523

UGT2B15

Nomatch

34

metabolism

ENSBTAG00000004081

FAT3

No

2

03

01

01

05

signalling

ENSBTAG00000004555

LRP2

DONN

AIBARROW

SYND

ROME

Intellectualdisability

Cells

56

3

01

disease

ENSBTAG00000004772

THEM

4

Cells

8

16

295

27

52

63

metabolism

ENSBTAG00000004860

SLC27A6

Cells

6

02

10

602

207

01

immunity

ENSBTAG00000005021

SEMA5A

Monosom

y5p

Cells

9

04

89

13

08

08

6immunity

ENSBTAG00000005170

GPR114

Mice

101

02

01

09

6

01

fertility

ENSBTAG00000005248

OFCC1

No

01

01

03

01

01

01

03

NC

ENSBTAG00000005340

SERGEF

Cells

7

10

35

15

58

23

cellular

process

ENSBTAG00000005357

VPS11

Mice

14

13

619

717

815

15

cellular

process

ENSBTAG00000005466

C8H8orf58

Nomatch

05

01

08

02

103

02

04

NC

ENSBTAG00000005514

TRIO

Cells

17

66

10

210

88

7immunity

ENSBTAG00000005633

RGNEF

httpswwwmousephenotypeor

gdatagenesMGI1346016sect

ionassociations

Vertebralfusion

36

214

05

801

08

1cellular

process

ENSBTAG00000006021

CEP250

Mice

3

22

43

41

88

metabolism

ENSBTAG00000006027

USP34

Mice

11

63

74

88

912

fertility

ENSBTAG00000006532

CDK5RAP2

Autosomalrecessive

primary

microcephaly

httpswwwmousephenotypeor

gdatagenesMGI2384875sect

ionassociations

skeletonimmunesystem

growthsizebodyhem

atopoietic

system

5

34

92

74

784

developm

ent

ENSBTAG00000007001

SLC26A1

No

03

01

02

32

903

01

07

metabolism

ENSBTAG00000007340

LOC101906349

Nomatch

NC

ENSBTAG00000007568

MEPE

Cells

cellular

process

ENSBTAG00000007910

EXOC7

Cells

13

24

23

22

12

22

15

22

28

cellular

process

122

Geneamp

Geneampnam

eampHum

anampdefectamp

Knock5OutsampMouseamp

MouseampPhenotypeamp

AampBamp

CampDamp

EampFamp

GampHamp

IampFunctionamp

ENSBTAG00000008650

CAMK1D

Cells

24

2

01

06

1

2

03

7

3

signalling

ENSBTAG00000008797

Uncharacterized

Nomatch

09

04

07

04

NC

ENSBTAG00000008871

DDX4

Cells

02

01

01

132

cellular

process

ENSBTAG00000008996

SFI1

Mice

2

1

06

3

04

3

2

5

NC

ENSBTAG00000009014

UPK1B

Mice

2

2

2

5

4

4

2

7

3

fertility

ENSBTAG00000009030

NOX5

Nomatch

04

01

8

01

immunity

ENSBTAG00000009033

TEX37

httpswwwmousephenotypeor

gdatagenesMGI1921471sect

ionassociations

Abnormalstartlereflex

03

83

NC

ENSBTAG00000009064

CSF2RB

Congenitalpulmonary

alveolarproteinosis

Mice

01

2

04

06

2

23

03

17

05

signalling

ENSBTAG00000009144

BPIFA2A

Nomatch

6

06

05

metabolism

ENSBTAG00000009337

PNMAL1

Cells

8

05

01

3

02

03

03

04

3

Immunity

ENSBTAG00000009444

SLC15A5

Cells

01

cellular

process

ENSBTAG00000009568

KANK2

Cells

1

17

30

9

3

31

9

10

3

cellular

process

ENSBTAG00000009569

DOCK6

AdamsOliver

syndrome

Cells

2

5

10

12

5

15

5

6

21

metabolism

ENSBTAG00000009836

CHGA

Mice

140

555

01

2

02

01

07

1

metabolism

ENSBTAG00000010522

Uncharacterized

Nomatch

01

02

02

03

2

metabolism

ENSBTAG00000010601

DOLK

Familialisolated

dilated

cardiomyopathy

No

4

8

6

22

9

6

5

5

12

metabolism

ENSBTAG00000010847

FOXN4

Cells

01

01

metabolism

ENSBTAG00000010954

ART3

Cells

02

1

55

01

08

2

10

06

71

metabolism

ENSBTAG00000011011

SSH2

No

3

09

5

4

2

2

6

7

10

cellular

process

ENSBTAG00000011387

NAT9

Cells

6

7

3

8

6

8

4

6

6

metabolism

ENSBTAG00000011420

CA9

Cells

04

3

02

2

03

1

01

2

56

metabolism

ENSBTAG00000011433

RGP1

httpswwwmousephenotypeor

gdatagenesMGI1915956sect

ionassociations

Completepreweaninglethality

7

19

11

18

8

12

9

8

25

cellular

process

ENSBTAG00000011622

C18orf21

Mice

9

10

6

6

2

10

5

10

34

NC

ENSBTAG00000011683

NUMB

Cells

21

25

8

35

17

47

6

8

19

immunity

ENSBTAG00000011684

RAB11FIP1

Cells

6

01

8

09

11

02

1

4

NC

ENSBTAG00000012053

ACCSL

No

03

01

cellular

process

ENSBTAG00000012314

LDLR

Homozygousfamilial

hypercholesterolemia

Cells

8

33

11

21

17

24

2

15

2

cellular

process

123

Geneamp

Geneampnam

eampHum

anampdefectamp

Knock5OutsampMouseamp

MouseampPhenotypeamp

AampBamp

CampDamp

EampFamp

GampHamp

IampFunctionamp

ENSBTAG00000012326

LOC101906218

Nomatch

11

01

01

09

01

NC

ENSBTAG00000012458

PSAPL1

Cells

01

NC

ENSBTAG00000012565

PCNX

No

3

32

32

33

48

developm

ent

ENSBTAG00000012833

ATG2B

No

4

21

21

21

29

NC

ENSBTAG00000012921

KIF24

Mice

05

07

03

08

02

202

214

metabolism

ENSBTAG00000013568

UEVLD

Cells

5

41

53

42

214

metabolism

ENSBTAG00000013685

KRT31

Mice

cellular

process

ENSBTAG00000013753

DDRGK1

Mice

22

32

18

56

55

25

14

22

37

NC

ENSBTAG00000013789

OR6K6

No

NC

ENSBTAG00000013838

ALLC

Mice

02

01

78

metabolism

ENSBTAG00000013916

HERC2

Mentalretardation

autosomalrecessive

38SKINHAIREYE

PIGM

ENTATION

VARIATIONIN1

Developm

entaldelay

withautismspectrum

disorderandgait

instability

No

8

74

83

74

65

metabolism

ENSBTAG00000014001

MAP1A

Cells

136

23

22

202

43

352

cellular

process

ENSBTAG00000014058

LDB2

No

28

33

12

825

410

3cellular

process

ENSBTAG00000014284

ALPK2

No

01

10

01

304

02

cellular

process

ENSBTAG00000014750

EPB41L4A

httpswwwmousephenotypeor

gdatagenesMGI103007secti

onassociations

Decreasedcirculatingglucoselevel

Abnormalconeelectrophysiology

Immunesystem

s1

22

20

07

12

13

43

NC

ENSBTAG00000014752

AKAP11

httpswwwmousephenotypeor

gdatagenesMGI2684060sect

ionassociations

Nopheno

24

72

63

306

68

cellular

process

ENSBTAG00000014768

ZNF786

Cells

3

23

42

48

33

cellular

process

ENSBTAG00000014794

SCYL3

Cells

6

10

59

613

516

26

cellular

process

ENSBTAG00000015027

Clusterin

httpswwwmousephenotypeor

gdatagenesMGI88423sectio

nassociations

Aggressionvertebralfusion

05

03

03

02

05

3NC

124

Geneamp

Geneampnam

eampHum

anampdefectamp

Knock5OutsampMouseamp

MouseampPhenotypeamp

AampBamp

CampDamp

EampFamp

GampHamp

IampFunctionamp

ENSBTAG00000015210

LOXHD1

DEAFNESS

AUTOSOMAL

RECESSIVE77

Autosomalrecessive

nonsyndromic

sensorineuDominant

lateonsetFuchs

cornealdystrophy

deafnessautosomal

recessivetype77

No

3

03

NC

ENSBTAG00000015294

COX10

Leighsyndrome

Mitochondrial

complexIVdeficiency

(MTC4D)

Cells

4

6

24

5

2

3

14

3

16

cellular

process

ENSBTAG00000015335

BAI3

No

38

03

8

01

01

2

signalling

ENSBTAG00000015490

HS1BP3

Cells

1

10

5

9

4

8

2

1

37

cellular

process

ENSBTAG00000015602

C7H5orf45

Nomatch

10

cellular

process

ENSBTAG00000015839

MAP4

No

27

85

104

35

11

100

82

46

61

cellular

process

ENSBTAG00000015912

DMKN

No

01

04

03

NC

ENSBTAG00000015980

FASN

AutosomalRecessive

MentalRetardation

Cells

19

29

7

12

6

39

6

7

22

metabolism

ENSBTAG00000015985

GPR20

httpswwwmousephenotypeor

gdatagenesMGI2441803sect

ionassociations

Anatomicaldefectsvarioustissues

03

02

08

01

09

signalling

ENSBTAG00000016495

LINS

AutosomalRecessive

MentalRetardation

Cells

6

3

1

5

3

3

1

5

12

NC

ENSBTAG00000016691

LACC1

Mice

2

9

1

9

9

8

08

7

27

NC

ENSBTAG00000017096

ERICH1

Cells

6

4

2

4

2

5

2

5

13

metabolism

ENSBTAG00000017165

MATN2

No

2

7

2

20

08

3

2

2

7

cellular

process

ENSBTAG00000017418

RRP1B

Cells

6

4

4

5

3

7

3

12

159

cellular

process

ENSBTAG00000017436

ALS2CR11

Cells

01

37

disease

ENSBTAG00000017505

PAXIP1

Cells

2

3

2

4

2

4

2

3

12

signalling

ENSBTAG00000017576

GPR144

Nomatch

02

signalling

ENSBTAG00000018528

USP45

No

1

1

04

1

05

06

02

2

04

cellular

process

ENSBTAG00000018810

THBS2

INTERVERTEBRAL

DISCDISEASE

Cells

1

2

04

09

1

09

04

1

2

fertility

ENSBTAG00000019072

PSD4

Cells

7

8

1

1

7

02

6

4

cellular

process

125

Geneamp

Geneampnam

eampHum

anampdefectamp

Knock5OutsampMouseamp

MouseampPhenotypeamp

AampBamp

CampDamp

EampFamp

GampHamp

IampFunctionamp

ENSBTAG00000019265

AGK

Sengerssyndrom

e

Congenitalcataract

hypertrophic

cardiomyopathy

mitochondrial

myopathy

Cells

7

36

43

52

65

metabolism

ENSBTAG00000019752

BPIFA2B

Nomatch

01

1metabolism

ENSBTAG00000019799

FCAM

R

Cells

07

01

02

01

2

7immunity

ENSBTAG00000019961

ATP4B

Cells

12

03

1

05

cellular

process

ENSBTAG00000020414

DHX37

No

3

33

73

73

413

cellular

process

ENSBTAG00000021073

KIAA1549

PILOCYTIC

ASTROCYTOM

ASOMATIC

httpswwwmousephenotypeor

gdatagenesMGI2669829sect

ionassociations

behaviourneurological

homeostasism

etabolism

hematopoieticsystem

3

04

23

01

21

04

3NC

ENSBTAG00000021233

OR5B17

No

fertility

ENSBTAG00000021375

OR10Q1

No

fertility

ENSBTAG00000021643

EFCAB12

Cells

01

1

9cellular

process

ENSBTAG00000021645

MBD4

Cells

8

43

82

51

10

6cellular

process

ENSBTAG00000022509

DNAH

9

No

02

5

05

5metabolism

ENSBTAG00000022858

LOC783561

Nomatch

fertility

ENSBTAG00000026247

PKHD1L1

Cells

02

01

cellular

process

ENSBTAG00000026323

LYSB

Nomatch

68

metabolism

ENSBTAG00000026350

SPATC1

Cells

05

06

01

01

01

39

NC

ENSBTAG00000027390

RTFDC1

Mice

28

19

21

20

922

13

20

49

NC

ENSBTAG00000031115

Uncharacterized

Nomatch

01

cellular

process

ENSBTAG00000031718

OGFR

Mice

6

17

612

517

39

1fertility

ENSBTAG00000032264

Uncharacterized

Nomatch

52

4

22

NC

ENSBTAG00000032481

DAPL1

No

1

6

07

01

13

10

cellular

process

126

Geneamp

Geneampnam

eampHum

anampdefectamp

Knock5OutsampMouseamp

MouseampPhenotypeamp

AampBamp

CampDamp

EampFamp

GampHamp

IampFunctionamp

ENSBTAG00000032527

ERCC6

CEREBROOCULOFACI

OSKELETAL

SYND

ROME1

COCKAYNE

SYND

ROMEBDe

SanctisCacchione

syndromeMACULAR

DEGENERATION

AGERELATED5UV

SENSITIVE

SYND

ROME1COFS

syndromeCockayne

syndrometype1

Cockaynesyndrome

type2Cockayne

syndrometype3UV

sensitivesyndrome

Cockaynesyndrome

typeB(CSB)De

SanctisCacchione

syndrome(DSC)UV

sensitivesyndrome

(UVS)cerebrooculo

facioskeletal

syndrometype1

(COFS1)

Mice

3

205

22

207

42

cellular

process

ENSBTAG00000032821

SCEL

No

01

02

17

04

NC

ENSBTAG00000033954

PRPS1L1

No

35

metabolism

ENSBTAG00000034991

SLX4IP

Cells

6

21

32

207

33

NC

ENSBTAG00000035226

TOR1AIP1

Cells

9

12

513

816

716

83

NC

ENSBTAG00000035309

OR4F15

No

fertility

ENSBTAG00000035708

LOC527795

Nomatch

01

01

23

metabolism

ENSBTAG00000035985

LOC101909743

OR8K1

No

fertility

ENSBTAG00000036019

RIPPLY3

httpswwwmousephenotypeor

gdatagenesMGI2181192sect

ionassociations

hemopoieticsystem

01

04

01

9

01

malformation

ENSBTAG00000037399

Uncharacterized

Nomatch

2

92

4

24

05

5signalling

ENSBTAG00000038606

DBDR

No

02

01

signalling

ENSBTAG00000038831

PHYHD1

No

2

18

21

23

30

18

16

15

1metabolism

ENSBTAG00000039110

OR8U1

No

fertility

ENSBTAG00000039117

FAM179B

Cells

14

206

31

308

33

cellular

process

ENSBTAG00000039520

SIRPB1

Cells

4

08

32

516

05

12

01

signalling

127

Geneamp

Geneampnam

eampHum

anampdefectamp

Knock5OutsampMouseamp

MouseampPhenotypeamp

AampBamp

CampDamp

EampFamp

GampHamp

IampFunctionamp

ENSBTAG00000040568

Uncharacterized

Nomatch

4

307

54

33

25

cellular

process

ENSBTAG00000045558

LOC100299084

Nomatch

1

22

110

10

09

52

fertility

ENSBTAG00000045588

LOC100298356

Nomatch

metabolism

ENSBTAG00000045709

LOC514057

Nomatch

fertility

ENSBTAG00000046175

Uncharacterized

Nomatch

01

01

cellular

process

ENSBTAG00000046228

Olfactory

receptor

Nomatch

fertility

ENSBTAG00000046239

C10orf12

No

03

104

07

108

06

209

cellular

process

ENSBTAG00000046285

OR8I2

No

fertility

ENSBTAG00000046360

Uncharacterized

Nomatch

NC

ENSBTAG00000046423

LOC618007

Nomatch

NC

ENSBTAG00000046527

LOC100139830

Nomatch

fertility

ENSBTAG00000046590

FAM181A

Cells

09

44

NC

ENSBTAG00000046593

Uncharacterized

Nomatch

20

metabolism

ENSBTAG00000046727

Olfactory

receptor

Nomatch

04

fertility

ENSBTAG00000047150

RTEL

Cells

4

23

64

55

616

cellular

process

ENSBTAG00000047166

SH2D7

Mice

03

101

01

1

02

02

2immunity

ENSBTAG00000047174

Uncharacterized

Nomatch

03

7105

02

01

271

05

03

NC

ENSBTAG00000047317

CL43

Nomatch

01

120

01

immunity

ENSBTAG00000047587

Uncharacterized

Nomatch

22

09

09

35

03

4

06

2cellular

process

ENSBTAG00000047661

FABP12

httpswwwmousephenotypeor

gdatagenesMGI1922747sect

ionassociations

Growthsizebody

01

02

69

metabolism

ENSBTAG00000048021

PNMAL2

No

04

03

2

03

03

02

NC

ENSBTAG00000048153

Uncharacterized

Nomatch

2

05

01

02

01

cellular

process

128

Discussion(

Deleterious recessive variants arenaturally present in outbred animal species

Theydonotcreateproblemsaslongastheyremainatlowfrequencyinrandom

mating populations Under these conditions these variants remain in the

heterozygousstateanddonotproducephenotypiceffectsHoweverassoonas

their frequency increases andmatingoccursbetween related individuals their

deleteriouseffectsbecomeapparentandhighlyaffectthewelfareandviabilityof

animalscarryingtheminthehomozygousstate

In industrialdairy cattlebreedscrossbreeding isgenerallyavoidedandbreeds

areinpracticegeneticisolatesinwhichartificialinsemination(AI)iswidelyused

for rapidly spreading thegeneticprogress in thepopulationAsa consequence

elitesireshavemanythousandandsometimeshundredsofthousanddaughters

Under these conditions mating between relatives becomes practically

unavoidable in the longterm In fact inspiteof theclosecontrolof inbreeding

andofthematingsystemgeneticdefectsregularlyappeartothesurfaceSome

examples are the Complex Vertebral Malformation (CVM) and the Bovine

Leucocyte Adhesion Deficiency (BLAD) in Holstein (Schuumltz et al 2008) and

WeaverdiseaseinBrown(McClureetal2013)Inthesecasestheeffectsofthe

recessivedeleteriousallelesareclearandmoleculardiagnostictoolshavebeen

developedtoavoidtheirfurtherspreadingandtoinitiatetheeradicationprocess

However recessive deleterious alleles may produce their effects in a more

deceitful way for example by affecting fertility inducing early lethality or

influence disease susceptibility All these effects are difficult to trackwith the

current methods of phenotype measurements but have an impact on the

economyofthelivestocksectorandonanimalwelfare

InthispaperwesoughtfordeleteriousrecessiveallelesinItalianHolsteincattle

This breed is recently experiencing a decrease in the genetic trend of fertility

traits (Pryce et al 2014) therefore we started our search from the exome

sequences of 20 sires 19 of which having extreme EBV values for male and

femalefertility

130

The choiceof sequencingexomeswasa compromisebetween completenessof

informationaccuracyofvariantandgenotypedetectionandcostExomesclearly

give incomplete information since they miss all variation in regulatory sites

outside gene boundaries (ENCODE Consortium 2012) These may be very

relevantforthecontroloftraitvariationbuttheyarestillverypoorlyannotated

in livestock On the other hand sequencing exomes yields millions instead of

billionsbasepairsadeepersequencingoftargetregionsandaconsequentmore

reliablevariantandgenotypecalling(FigureS1andS2)Theanalysisofexome

data is also less demanding in terms of computer time (thousands instead of

millions variants are to be analysed) and easier in terms of biological

interpretation Exome sequencing revealed to be very effective in the

identificationofanumberofmutationsaffectinghumanhealth(Bamshadetal

2011 Yang et al 2013) All thesewere defects caused bymutations in single

genes producing clear phenotypic effects havingMendelian inheritance These

conditionsdonotholdinourinvestigationthereforewehadtodevisestrategies

to integrate exome data with other information collected from the Holstein

populationandotherspecies

Theoverallprocedureforidentifythecandidatedeleteriousrecessivewasbased

on the rationale that deleterious recessive alleles in coding sequences are

expectedtobei)causedbyseverealterationoftheproteinsencodedbygenes

ii)mostlyifnotexclusivelycarriedattheheterozygousstateinthepopulation

iii)unevenlydistributedinanimalsofhighandlowfertilityiv)likelyconserved

indifferentspeciesbecauseofselectionconstraintv)likelyassociatedtogenetic

defectsinanyspecieswhenseverelydamaged

A stepwise procedure was set up for reducing candidates to a number

manageable for functional analysis (Figure1) In the first stepmutationswere

discoveredinthesecondfilteredandinthirdassessed

Thisprocedureidentifiesonlysomeofthedeleteriousrecessivesexistinginthe

populationanalysedSincewesequencedonlyexomesandnotwholegenomes

ouranalysesislimitedtocodingsequencesandatthemomenttoSNPvariation

Wethereforemissregulatoryvariationsoutsidegenesand theeffectsof indels

andmorecomplexvariations

131

ThenumberofanimalssequencedislimitedDeleteriousrecessivesareexpected

toberareinthepopulationandwehavesampledonly20animalsfromasubset

ofsiresauthorizedtoAI inItalyAlthoughwehavetakenamongthemthose in

theminus tail of fertility trait thatmighthave ahigher genetic load itrsquos likely

thatwehavemissedothervariantsexistinginthepopulationOntheotherhand

variantscarriedbyAIauthorizedbullsaretheonesthatwillhavealargerimpact

onthepopulationifwidelyspreadbyelitebullsandthereforetheyarethemost

interestingfromabreedingpointofview

Most deleterious variants are expected to be very rare and therefore not

detected by the approach here adopted or detected and afterwards discarded

duringthefilteringprocesses

Sequencing is prone to errors Some real variants may have been discarded

duringQCofsequencedataduetopoorsequencequalityorpoorcoverageofthe

region

Someofthevariationsclassifiedasldquonondeleteriousrdquobythesoftwareusedmay

indeedhave an effect In fact synonymousmutation arenon`neutralwhenever

thedifferenttRNAsforasameaminoacidhavedifferentrelativeabundanceand

translationrates(Goymer2007)

1(Searching(candidate(deleterious(variants(in(exomes(

ExomesproducedafairamountofdataAlltogether5355Mbweresequenced

These represent 9998 of the exome sequences targeted by probes Among

theseonly35hadasequencingdepthacrossanimalslt60XSequencingdepth

is a key factor for variant detection and even more so when the genotype of

sequencedanimalsistobeinferredInvestigatingtheconcordancebetweenthe

genotypes detected at more than 20K SNP by both Beadchip and exome

sequencingwe observed a clear inverse relationship between sequence depth

andnumberofdiscordances(TableS3andFigureS3)Theaveragesequencing

depth expected (50X) and realised in practice (4058) in this investigation

yieldedratheraccurategenotypessincetheaverageconcordanceobservedwas

over97and increased toover99whentwooutlieranimalswereremoved

132

from the dataset The reason for the outlier behaviour of these samples waslikely a suboptimal quality of DNA submitted to sequencing Overall almost215000 variants passed quality controls and among them 192440 autosomalSNPs have been investigated in detail As found in other investigations mostvariants were rare (Peloso et al 2014) For example the MAF distributionobservedin46904SNPsclassifiedneutralhasameanof0197andamedianof0167values(FigureS6)

ThefirststepinourresearchwastoidentifySNPscausingseverealterationsinthe proteins coded by genes These were relevant changes in conformationalterationsofsplicingsitesortruncationsofproteinsToaccomplishthistaskwereliedontwosoftwareSIFTandANNOVARthatidentified4902suchvariationsin3423genes Interestingly the subsetofdeleteriousvariantshas significantlylowermean (0156)andmedian (010) compared toneutral variations (FigureS6) This subset is therefore at least enriched in variants under purifyingselectionhavingadeleteriouseffectatthepopulationlevel

Thenumberofdeleteriousallelesdetectedishighanditisunlikelytheyallhavea relevant phenotypic effect even if most variants are rare In fact variantsdeleterious for a protein are not automatically deleterious for an organism asthe protein change induced by the variant although relevant may not affectproteinfunctionvariationmayoccurinonememberofagenefamilysothatitsfunction may be compensated by other family members the protein functionmay be compensated by other proteins or the affected metabolic pathwaycompensated by alternative pathways The challenge was therefore to filter asubsetofvariantslikelyhavingaphenotypiceffecttobeproposedforvalidationatalargerscaleToaccomplishthistaskwesoughtforadditionalinformationinexomesequencesandinalargepopulationofHolsteinsiresgenotypedwiththebovineHDBeadchip

( (

133

2(Filtering(deleterious(variants(

The second step consisted in filtering the several thousand deleterious SNPsidentified in exomes to search for SNPs and genes that worth deeperinvestigationWeusedtwostrategiestoaccomplishthistask

21$ Distribution$ of$ deleterious$ variants$ in$ fertility$ EBV$ plus9variant$ and$

minus9variant$animals$

Fertility traits are controlled by a high number of genes and are highlygenetically heterogeneous Even using a selected subset of variants identifiedfrom exome sequencing classic association analyses would need sample sizesmuch larger that the one available in this investigation to have a reasonablestatistical power (Stitziel et al 2011) The first approachwe appliedwas theanalysisof SNPallelic andgenotypicdistributionbetween theminusandplus`variant tails of male and female fertility EBVs Given the very low number ofanimals sequencedwedecided to use a simplemodel to exclude from furtheranalyses all markers showing no evidence of association with the EBV tailsratherthantofindsignificantassociations

We divided samples in cases (minus`variant animals for fertility EBV) andcontrols (plus`variant animals) and tested all possible models of inheritanceimplemented in thePlinkpackage for tackling simple genetic defectsWe thenused an empirical point`wise significant threshold based on permutations tohighlight SNPs having alleles or genotypes showing an uneven distributionbetweencasesandcontrolsandconsideredtheseasworthfurtherinvestigationAtotalof112SNPsin101genespassedthisfilteringprocess

$

22$Detection$of$regions$carrying$deleterious$mutation$from$800K$SNP$data

The analysis of exomes alone was therefore insufficient to tag recessivedeleteriousgenesinItalianHolsteinandwesoughtforadditionalinformationin1009ItalianHolsteinbullsgenotypedathighdensitythatarepartofthegenomicselection program run by ANAFI The rationale of the approach is that

134

deleterious recessives are expected to be absent or rare in the homozygous

status in the population Hence they may be tagged searching for genome

regions having markers or haplotypes lacking or having significantly less

homozygotesthanexpectedunderaneutralmodel

We used two complementary approaches to identify these candidate regions

One(OHW)searchedforverystrongsignalsproducedbysinglemarkersoutof

Hardy`Weinberg equilibrium at Ple847x10`8 A minimum of three significant

markersinaslidingwindowsof10wassettoreachsignificancetocapturemore

reliablesignalsavoidexcessivesinglemarkernoiseandaccount forapossible

Wahlund effect due to presence of the genetic structure in the population

difficulttoavoidinthesetofprogenytestedbullsanalysedThesecondapproach

(LOH) captured more moderate signals but from multiple markers in larger

regions Both approaches were designed to detect signals in regions in which

deleteriousvariantsarestillinsegregationandarenotexcessivelyrare

Asexpectedthetwoapproachesidentifiedregionsthatonlypartiallyoverlapped

(TablesS5S6eS7)Inparticular5regionsoutofthe11identifiedbyOHWwere

in common to the 240 identified by LOH Two of these regions overlap with

previous investigations that used the BovineSNP50 Beadchip Sahana et al

(2013) found a region candidate to carrypotential lethal haplotypes inNordic

Holstein on BTA7 in the region 6708007`10907840 This interval overlaps

withbothBTA7regionsdetectedinthisstudyVanRadenetal(2011)detected

inUSHolsteinalargeregiononBTA15spanning76Mb`82M(onBTA30)that

includedLRP4 gene responsible for themulefootdefectOuranalyseswith the

HDBeadchipindicatesontheonesidethatthelargeregiondetectedbySahana

etal(2013)mayconsistintwonearbyregionsOntheothersidethesignalwe

detectedonBTA15doesnot includetheLRP4 locusthat iscompletely fixed in

theItalianbullspopulationanalysed

In total83mutations in66genes fell in eitherorbothOHWandLOHregions

These were joined to the SNPs output of the EBV tail analyses for further

assessment

(

135

3(Assessing(filtered(deleterious(variants(

Different characteristics of the filtered deleterious variants were assessed to

identify among them strong candidates to submit to a large`scale validation

studyTheassessmentphasepermittedtogainadditionalinformationinfavour

or against SNP and gene candidature to be deleterious We quantified this

evidenceusingasubjectivescore

31$Comparative$genomics$analysis$

In the case of comparative genomics the score quantified the level of

conservationof thealternativeaminoacidresidues inducedbySNPvariants in

ItalianHolsteincattleComparisonwasrunagainstaset100vertebratespecies

that included 62 mammals The level of conservation is a good proxy for the

expected severity of a mutation In the case of the 46 SNPs that induced the

change of an amino acid highly conserved in the species investigatedwemay

hypothesize a strong selection pressure in favour of the maintenance or the

invariant amino acid at that position Therefore the amino acid change or the

stopcodonweobserved incattlemayaffectnotonlytheproteinstructureand

function but also the phenotype of homozygous animals Interestingly in all

thesecasestheSNPallelecodingforthenonconservedvariantswastheminor

allele

Converselyinallcasesofhighvariabilityofthetargetaminoacidacrossspecies

(asmanyassevendifferentaminoacidshavesometimesbeenobserved)poorly

supports the presence of phenotypic effects due to its change Sometimes the

informationcollectedisagainstthepresenceofadeleteriousmutationasinthe

case of the presence in different species of both alleles identified by exome

sequencingThis suggests thatneither of these is greatly affecting the viability

andfitnessofanimalscarryingthem

Wealsoobservedanumberoffailures(N=45)intheliftoverofSNPcoordinates

fromcattletohumanorinproteinalignmentThishappenedforexampleinthe

caseofuncharacterized cattleproteins thathavenoorthologous in thehuman

genome and of someolfactory receptors that belong to a highly variable gene

136

family Even if the failure of alignment is an indication of poor proteinconservationacrossspeciestheconservationofthetargetaminoacidcouldnotbeassessesandinthesecaseswepreferredtoconsidertheinformationmissingandtoscoreitaccordingly

32$Haplotype$analysis$

Asecondassessmentofthefilteredcandidateswasontheirbehaviourinthesirepopulation The ideal test would be the direct genotyping of the SNPs in thepopulationand theevaluationof theirallelicandgenotypicproportions In theabsenceofthisinformationwefirstinferredtheHDhaplotypeassociatedtoSNPallelesandassessedthebehaviourofthehaplotypeinthepopulationassumingthisasasurrogateoftheSNPbehaviourSincedeleteriousallelesareexpectedtoberareandthis isalsowhatweobserved fromcomparativegenomicanalysesweinvestigatethefrequencyonlyofthehaplotypecarryingtheminorSNPalleleagainsttheexpectationto findthecorrespondinghaplotypeneverhomozygousor a number of times much lower than expected from its frequency in thepopulation The strategy yielded some interesting confirmation but was onlypartiallysuccessful

Firstofalltheidentificationofauniquehaplotypecarryingtherarevariantwasnot always possible or straightforward In some cases carriers of a same rarevariantdidnotshareexactlythesamehaplotypeWeacceptedsomevariationifthe haplotype remained clearly distinguishable from those of animals notcarryingtherarevariantbutatthesametimereducedthestringencyofthetestInquiteanumberofcasesasamehaplotypecarriedbothcandidatedeleteriousSNP variants This may occur for example in the case of recent mutations IntheselastcasesthetestcouldnotbecarriedoutFinallymostrarevariantswereassociatedtorarehaplotypesthatarenotlikelytohavehomozygousindividualsinthepopulationindependentlyoftheSNPeffect

Thisassessmentwas thereforeonly significant fora fewhaplotypes thathadalarge number of heterozygous individuals and no or only a very fewhomozygotesandforthosethathadalargenumberofhomozygotes

137

33$Functional$analysis$

The final assessment was the analysis of the function of genes carrying the

candidate deleterious SNP This has some limitation since knowledge of gene

function is only partial and mainly gained from investigations on species

different from cattle In any case we considered the association to genetic

defectsinhumanandtheinductionofseverephenotypesinmouseknockoutas

a positive indication of the likely phenotypic effect of the severe protein

alterationweobservedincattleInadditiongenefunctionandexpressionprofile

was evaluated in search for evidence of gene involvement in fertility

developmentorprocessesfundamentalforcellandorganismviability

Amonggenesidentified22areassociatedtohumansyndromes(Table4)Eleven

syndromesareneurologicaldiseasesaffectingbraindevelopmentand inducing

mental retardation (eg Donnay`Barrow Syndrome and Autosomal recessive

primary microcephaly) the others influence development (eg

Cerebrooculofacioskeletal Syndrome 1 and Adams`Oliver syndrome) or organ

function(egCongenitalpulmonaryalveolarproteinosis)

Most of the same defects translated into Holstein would induce physical and

behavioural anomalies likely causingyoungbulls tobe excluded fromprogeny

testing Itshould in factbeconsideredthattheanimalsweanalysedareonthe

onesidethosethatwillmostinfluencethegeneticmakeoffuturegenerationbut

are not a random sample of Holstein animals These bulls have been progeny

testedandauthorisedtobeusedinartificialinseminationinItalyCandidatesto

progenytestingderivefromthematingofthebest1siresandbest2damsof

the Italian Holstein population At the age of 4 months they are taken to the

Italian Holstein breeder association (ANAFI) test station where they are

subjected to performance test sanitary controls and behavioural tests

Thereafterthequalityoftheirsemenischeckedbeforeauthorizationisgranted

toenterinprogenytestInthissampleofanimalsitisthereforeexpectedtofind

never at the homozygous state mutations causing lethality physical

malformation abnormal behaviour and semen infertility Also those having a

138

majoreffectonsemenqualityandhighsusceptibilitytodiseasesareexpectedto

berarelycarriedatthehomozygousstatus

Association with human syndromes is a strong evidence of the potential

phenotypiceffectofmutationsalteringthegenestructureHoweveritshouldbe

considered that variants found in humans are different fromvariants found in

cattle and that the severity of the phenotype induced by a deleterious gene is

often variable even within species Therefore we evaluated this information

togetherwiththeotherindirectevidencesofvariantstobedeleterious

WealsoenquiredthewebsiteoftheInternationalMousePhenotypeConsortium

(httpswwwmousephenotypeorg) to search for the effects of null alleles in

mice The few genes forwhichnull`mice line is available are involved in basic

functionsfortheorganismthatspanfromvision(IAH1andEPB41L4A)toaging

(RGP1)andtissuesdevelopment(FABP12RIPPLY3RGNEF)orfertility(CFAP96)

Aswithgenesassociatedtohumansyndromesmostoftheabnormalphenotypes

observedinknockmousewouldhamperyoungbullstoenterprogenytesting

Gene Ontology (GO) was evaluated on the entire gene set No enrichment on

particularcategoriesGOtermswasobservedThemostrepresentedcategories

in Biological Processes were as expected ldquometabolic processrdquo and ldquocellular

processrdquo however lower level terms ldquoreproductionrdquo and ldquodevelopmental

processrdquo are represented with meaningful hits The analysis of Molecular

Function indicates thatmany of the hits are enzymes (ldquocatalytic activityrdquo) and

haveregulatoryroles(ldquobindingrdquoandldquoenzymeregulatoractivityrdquo)Thisresultcan

be interpreted indifferentwaysOn theone sidedeleterious allelesmay exert

their deleterious effect in a number of different cellular and developmental

processes the correct process is one while there is many options to make it

wrongAlsothenumberofgenesinvestigatedisrelativelylowandformanyof

them there is little or no information available (eg 11 genes code for a

completelyuncharacterizedprotein)Inadditionsomeofthemmayturnoutnot

tobereallydeleteriousandaddnoisetotheGOanalysis

As a final step all the previous information (GO expression patterns known

function)havebeenused to estimate a scoremeasuring the importanceof the

gene function forcattle survivalandreproductionThescoringof functionwas

139

particularlydifficultGOcategoriesweregeneralandallfunctionstaggedhavearelevant importance for animal survival andproductionTherefore in this casewedidnrsquotusethewholescorerangeplanned(from`1to1)andgavenonegativescoresApositivescoreof1wasattributedonly togenesplayingan importantroleinfertilityanddevelopment

$

34$Strong$candidate$deleterious$variants$

Thescoringprocessyielded21variantswithatotalscore=035variantswithpositive and 135 with negative values (Figure 2 and Table 5) Although theprocesswasnoterrorfreeasitreliedonincompleteinformationandhadsomedegreeofsubjectivityparticularlyinthechoiceofrelativeweightsitidentifiedaset of variants that may be considered strong candidates to have deleteriouseffects in the Italian Holstein breed Variants were evaluated individually andthose in a same genes had sometimes different scores For example the fivevariantscarriedbyALPK2hadscoresrangingfrom`7to`2

140

Figure2Graphicalrepresentationoftotalscoreforeachvariants

minus505

Variation

Score

1_645923001_645985561_1381698111_1464490851_150972144

2_7478962_270362092_270455392_375933832_555609862_904645543_75418123_110403993_110404003_188945763_1137877703_1203356984_53767714_265405974_265406204_749514994_1033779064_1057246734_1130233264_1178417155_445557055_445557425_757447955_940308955_940309556_207762566_382803486_863518016_927092906_1075769256_1078851146_1090916426_1166055226_1186534167_13339757_56553577_97149147_167941797_168358027_168621947_197404097_197409057_263293538_602606098_603324848_658345648_704101558_760297748_770028908_872798628_1117618168_1128416959_83888819_83889249_237928389_510133929_1047396559_10515702910_171251510_1584300310_2802235210_2802311010_8290312110_8517371111_4630576711_4675390311_4740957711_5979141011_7832201011_8794387511_9548610911_9945389411_9946100112_1190618112_1247843312_1248084512_1413689712_5304908112_9076120913_381566413_1178793413_1178793813_5247168413_5393493013_5393494713_5393985313_5454042513_5504774013_5998820513_6304575013_6316961013_6539670814_197749414_364069514_3556894514_4688575014_5718402214_6863536315_18267915_18331715_18340415_248692

15_3018920915_3504878815_4630194115_7503314015_7503764515_8024448915_8025571015_8028875915_8038974515_8048559315_8055593115_8094968915_8275455615_8292533216_456201716_3831963616_6257435517_5309539517_5311265517_6609153117_7237753218_2562355018_3495651318_4637204118_5215480618_5408127718_5413095518_5781976819_2144410319_3107215219_3111397219_3279084319_4225178519_5139519419_5619117519_5726625120_763980120_2341043120_5888423620_6417173921_623911921_2034587821_2034628921_2683773321_3100359921_5527419521_5582646321_5817338421_5912071921_6048248421_6048251021_6284314222_5241595922_5242173422_5242204622_5242610422_5242666822_5695165722_5697484722_5779767423_4588505224_2132460424_2145336424_4676490924_5032719524_5815817824_5815862024_5815878424_5820400324_5820447925_3743318026_1811648726_2571599626_2576457027_503107827_3283255728_28422828_169040328_3571880728_4408176529_198588129_753052329_753066329_26397931

141

Table5SingleanalysesscoreandtotalforeachcandidatevariantsChrchrom

osom

ePosphysicalposition

Mutation

Score

Chr

Pos

Gene

Genenam

eHaplotypes

Conservation

Hum

andefect

Mousemodel

Function

TotalScore

164592300

ENSBTAG00000009014

UPK1B

02

I1

01

2

164598556

ENSBTAG00000009014

UPK1B

0I1

I1

00

I2

1138169811

ENSBTAG00000002966

DNAJC13

03

10

15

1146449085

ENSBTAG00000017418

RRP1B

I3

I1

I1

00

I5

1150972144

ENSBTAG00000036019

RIPPLY3

I3

I1

I1

I1

1I5

2747896

ENSBTAG00000013916

HERC2

30

10

04

227036209

ENSBTAG00000004555

LRP2

I3

I1

10

0I3

227045539

ENSBTAG00000004555

LRP2

I3

31

00

1

237593383

ENSBTAG00000032481

DAPL1

00

I1

00

I1

255560986

ENSBTAG00000032264

Uncharacterized

3I3

I1

00

I1

290464554

ENSBTAG00000017436

ALS2CR11

0I1

I1

00

I2

37541812

ENSBTAG00000046175

Uncharacterized

00

00

00

311040399

ENSBTAG00000013789

OR6K6

I3

I3

I1

00

I7

311040400

ENSBTAG00000013789

OR6K6

I3

I3

I1

00

I7

318894576

ENSBTAG00000004772

THEM

40

I3

I1

00

I4

3113787770

ENSBTAG00000000149

USP40

11

I1

01

2

3120335698

ENSBTAG00000002809

CAPN10

I3

21

00

0

45376771

ENSBTAG00000001500

FIGNL1

01

I1

00

0

426540597

ENSBTAG00000033954

PRPS1L1

1I3

I1

00

I3

426540620

ENSBTAG00000033954

PRPS1L1

1I1

I1

00

I1

474951499

ENSBTAG00000003508

CFAP69

I3

I3

I1

10

I6

4103377906

ENSBTAG00000021073

KIAA1549

0I1

11

01

4105724673

ENSBTAG00000019265

AGK

01

10

02

142

Mutation

Score

Chr

Pos

Gene

Genenam

eHaplotypes

Conservation

Hum

andefect

Mousemodel

Function

TotalScore

4113023326

ENSBTAG00000014768

ZNF786

I3

I3

I1

00

I7

4117841715

ENSBTAG00000017505

PAXIP1

0I1

I1

00

I2

544555705

ENSBTAG00000026323

LYSB

I3

3I1

00

I1

544555742

ENSBTAG00000026323

LYSB

I3

0I1

00

I4

575744795

ENSBTAG00000009064

CSF2RB

I3

I3

10

0I5

594030895

ENSBTAG00000009444

SLC15A5

I3

I1

I1

00

I5

594030955

ENSBTAG00000009444

SLC15A5

I3

I3

I1

00

I7

620776256

ENSBTAG00000008797

Uncharacterized

I3

I3

00

0I6

638280348

ENSBTAG00000007568

MEPE

0I3

I1

00

I4

686351801

ENSBTAG00000003523

UGT2B15

1I1

I1

00

I1

692709290

ENSBTAG00000010954

ART3

I3

2I1

00

I2

6107576925

ENSBTAG00000038606

DBDR

0I3

I1

00

I4

6107885114

ENSBTAG00000001505

GRK4

I3

1I1

00

I3

6109091642

ENSBTAG00000007001

SLC26A1

1I3

I1

00

I3

6116605522

ENSBTAG00000014058

LDB2

01

I1

00

0

6118653416

ENSBTAG00000012458

PSAPL1

I3

I1

I1

00

I5

71333975

ENSBTAG00000015602

C7H5orf45

I3

I3

I1

00

I7

75655357

ENSBTAG00000045588

LOC100298356

1I3

I1

00

I3

79714914

ENSBTAG00000046423

LOC618007

00

I1

00

I1

716794179

ENSBTAG00000012314

LDLR

02

10

03

716835802

ENSBTAG00000009568

KANK2

01

I1

00

0

716862194

ENSBTAG00000009569

DOCK6

01

10

02

719740409

ENSBTAG00000000414

FUT5

I3

I3

I1

00

I7

719740905

ENSBTAG00000000414

FUT5

I3

I3

I1

00

I7

143

Mutation

Score

Chr

Pos

Gene

Genenam

eHaplotypes

Conservation

Hum

andefect

Mousemodel

Function

TotalScore

726329353

ENSBTAG00000004860

SLC27A6

0I1

I1

00

I2

860260609

ENSBTAG00000011420

CA9

I3

I1

I1

00

I5

860332484

ENSBTAG00000011433

RGP1

I3

0I1

10

I3

865834564

ENSBTAG00000035708

LOC527795

00

I1

00

I1

870410155

ENSBTAG00000005466

C8H8orf58

I3

I3

I1

00

I7

876029774

ENSBTAG00000015027

Clusterin

00

I1

10

0

877002890

ENSBTAG00000012921

KIF24

12

I1

00

2

887279862

ENSBTAG00000002220

SPTLC1

00

10

01

8111761816

ENSBTAG00000006532

CDK5RAP2

I3

I3

11

1I3

8112841695

ENSBTAG00000013838

ALLC

01

I1

00

0

98388881

ENSBTAG00000015335

BAI3

00

I1

00

I1

98388924

ENSBTAG00000015335

BAI3

00

I1

00

I1

923792838

ENSBTAG00000046593

Uncharacterized

03

00

03

951013392

ENSBTAG00000018528

USP45

02

I1

00

1

9104739655

ENSBTAG00000018810

THBS2

03

10

15

9105157029

ENSBTAG00000000918

ERMARD

I3

21

00

0

10

1712515

ENSBTAG00000014750

EPB41L4A

0I1

I1

I1

0I3

10

15843003

ENSBTAG00000009030

NOX5

I3

I3

I1

00

I7

10

28022352

ENSBTAG00000035309

OR4F15

I3

1I1

00

I3

10

28023110

ENSBTAG00000035309

OR4F15

I3

I1

I1

00

I5

10

82903121

ENSBTAG00000012565

PCNX

I3

2I1

01

I1

10

85173711

ENSBTAG00000011683

NUM

BI3

0I1

00

I4

11

46305767

ENSBTAG00000002288

NT5DC4

0I3

I1

00

I4

11

46753903

ENSBTAG00000019072

PSD4

0I1

I1

00

I2

144

Mutation

Score

Chr

Pos

Gene

Genenam

eHaplotypes

Conservation

Hum

andefect

Mousemodel

Function

TotalScore

11

47409577

ENSBTAG00000009033

TEX37

I3

I2

I1

10

I5

11

59791410

ENSBTAG00000006027

USP34

02

I1

01

211

78322010

ENSBTAG00000015490

HS1BP3

00

I1

00

I1

11

87943875

ENSBTAG00000001140

IAH1

I3

3I1

10

011

95486109

ENSBTAG00000017576

GPR144

I3

I3

I1

00

I7

11

99453894

ENSBTAG00000038831

PHYHD1

11

I1

00

111

99461001

ENSBTAG00000010601

DOLK

12

10

04

12

11906181

ENSBTAG00000001133

VWA8

I3

1I1

00

I3

12

12478433

ENSBTAG00000014752

AKAP11

1I3

I1

I1

0I4

12

12480845

ENSBTAG00000014752

AKAP11

1I1

I1

I1

0I2

12

14136897

ENSBTAG00000016691

LACC1

I3

I3

I1

00

I7

12

53049081

ENSBTAG00000032821

SCEL

I3

I3

I1

00

I7

12

90761209

ENSBTAG00000019961

ATP4B

I3

I3

I1

00

I7

13

3815664

ENSBTAG00000034991

SLX4IP

00

I1

00

I1

13

11787934

ENSBTAG00000008650

CAMK

1D

I3

0I1

00

I4

13

11787938

ENSBTAG00000008650

CAMK

1D

I3

0I1

00

I4

13

52471684

ENSBTAG00000013753

DDRGK1

02

I1

00

113

53934930

ENSBTAG00000039520

SIRPB1

0I3

I1

00

I4

13

53934947

ENSBTAG00000039520

SIRPB1

00

I1

00

I1

13

53939853

ENSBTAG00000039520

SIRPB1

0I3

I1

00

I4

13

54540425

ENSBTAG00000047150

RTEL

I3

0I1

00

I4

13

55047740

ENSBTAG00000031718

OGFR

01

I1

00

013

59988205

ENSBTAG00000027390

RTFDC1

01

I1

00

013

63045750

ENSBTAG00000009144

BPIFA2A

0I3

I1

00

I4

145

Mutation

Score

Chr

Pos

Gene

Genenam

eHaplotypes

Conservation

Hum

andefect

Mousemodel

Function

TotalScore

13

63169610

ENSBTAG00000019752

BPIFA2B

0I1

I1

00

I2

13

65396708

ENSBTAG00000006021

CEP250

0I3

I1

00

I4

14

1977494

ENSBTAG00000026350

SPATC1

I3

I3

I1

00

I7

14

3640695

ENSBTAG00000015985

GPR20

00

I1

00

I1

14

35568945

ENSBTAG00000037399

Uncharacterized

30

I1

00

2

14

46885750

ENSBTAG00000047661

FABP12

0I1

I1

00

I2

14

57184022

ENSBTAG00000026247

PKHD1L1

I3

3I1

00

I1

14

68635363

ENSBTAG00000017165

MATN2

01

I1

00

0

15

182679

ENSBTAG00000046228

Olfactoryreceptor

00

I1

0o

I1

15

183317

ENSBTAG00000046228

Olfactoryreceptor

00

I1

00

I1

15

183404

ENSBTAG00000046228

Olfactoryreceptor

00

I1

00

I1

15

248692

ENSBTAG00000045558

LOC100299084

00

I1

0o

I1

15

30189209

ENSBTAG00000005357

VPS11

02

I1

00

1

15

35048788

ENSBTAG00000005340

SERGEF

0I3

I1

00

I4

15

46301941

ENSBTAG00000002289

NLRP14

I3

I3

I1

00

I7

15

75033140

ENSBTAG00000012053

ACCSL

1I3

I1

00

I3

15

75037645

ENSBTAG00000012053

ACCSL

I3

I3

I1

00

I7

15

80244489

ENSBTAG00000046527

LOC100139830

I3

0I1

00

I4

15

80255710

ENSBTAG00000046285

OR8I2

10

I1

00

0

15

80288759

ENSBTAG00000001291

OR8H3

1I1

I1

00

I1

15

80389745

ENSBTAG00000045709

LOC514057

13

I1

00

3

15

80485593

ENSBTAG00000035985

LOC101909743OR8K1

1I1

I1

00

I1

15

80555931

ENSBTAG00000022858

LOC783561

I3

0I1

00

I4

15

80949689

ENSBTAG00000039110

OR8U1

1I3

I1

00

I3

146

Mutation

Score

Chr

Pos

Gene

Genenam

eHaplotypes

Conservation

Hum

andefect

Mousemodel

Function

TotalScore

15

82754556

ENSBTAG00000021375

OR10Q1

01

I1

00

015

82925332

ENSBTAG00000021233

OR5B17

I3

0I1

00

I4

16

4562017

ENSBTAG00000019799

FCAM

RI3

I3

I1

00

I7

16

38319636

ENSBTAG00000014794

SCYL3

I3

0I1

00

I4

16

62574355

ENSBTAG00000035226

TOR1AIP1

00

I1

00

I1

17

53095395

ENSBTAG00000020414

DHX37

1I1

I1

00

I1

17

53112655

ENSBTAG00000020414

DHX37

0I3

I1

00

I4

17

66091531

ENSBTAG00000010847

FOXN4

0I1

I1

00

I2

17

72377532

ENSBTAG00000008996

SFI1

0I3

I1

00

I4

18

25623550

ENSBTAG00000005170

GPR114

0I3

I1

01

I3

18

34956513

ENSBTAG00000001286

ELMO

33

3I1

11

718

46372041

ENSBTAG00000015912

DMKN

0

1I1

00

018

52154806

ENSBTAG00000001262

IRGQ

I3

1I1

00

I3

18

54081277

ENSBTAG00000009337

PNMA

L1

0I3

I1

00

I4

18

54130955

ENSBTAG00000048021

PNMA

L2

I3

1I1

00

I3

18

57819768

ENSBTAG00000003231

LIM2

1

I3

10

0I1

19

21444103

ENSBTAG00000011011

SSH2

I3

I3

I1

00

I7

19

31072152

ENSBTAG00000022509

DNAH

9I3

0I1

00

I4

19

31113972

ENSBTAG00000022509

DNAH

90

0I1

00

I1

19

32790843

ENSBTAG00000015294

COX10

12

10

04

19

42251785

ENSBTAG00000013685

KRT31

I3

0I1

00

I4

19

51395194

ENSBTAG00000015980

FASN

1I1

10

01

19

56191175

ENSBTAG00000007910

EXOC7

3I3

I1

00

I1

19

57266251

ENSBTAG00000011387

NAT9

12

I1

00

2

147

Mutation

Score

Chr

Pos

Gene

Genenam

eHaplotypes

Conservation

Hum

andefect

Mousemodel

Function

TotalScore

20

7639801

ENSBTAG00000005633

RGNEF

1I1

I1

10

0

20

23410431

ENSBTAG00000008871

DDX4

1I3

I1

00

I3

20

58884236

ENSBTAG00000005514

TRIO

I3

0I1

00

I4

20

64171739

ENSBTAG00000005021

SEMA5A

01

10

02

21

6239119

ENSBTAG00000016495

LINS

02

10

03

21

20345878

ENSBTAG00000012326

LOC101906218

00

I1

00

I1

21

20346289

ENSBTAG00000012326

LOC101906218

I3

0I1

00

I4

21

26837733

ENSBTAG00000047587

Uncharacterized

10

00

01

21

31003599

ENSBTAG00000047166

SH2D7

0I3

I1

00

I4

21

55274195

ENSBTAG00000039117

FAM179B

0I3

I1

00

I4

21

55826463

ENSBTAG00000014001

MAP1A

0I3

I1

00

I4

21

58173384

ENSBTAG00000009836

CHGA

I3

I1

I1

00

I5

21

59120719

ENSBTAG00000046590

FAM181A

01

I1

00

0

21

60482484

ENSBTAG00000017096

ERICH1

0I3

I1

00

I4

21

60482510

ENSBTAG00000017096

ERICH1

01

I1

00

0

21

62843142

ENSBTAG00000012833

ATG2B

0I3

I1

00

I4

22

52415959

ENSBTAG00000015839

MAP4

00

I1

00

I1

22

52421734

ENSBTAG00000015839

MAP4

1I1

I1

00

I1

22

52422046

ENSBTAG00000015839

MAP4

I3

I3

I1

00

I7

22

52426104

ENSBTAG00000047174

Uncharacterized

I3

00

00

I3

22

52426668

ENSBTAG00000047174

Uncharacterized

10

00

01

22

56951657

ENSBTAG00000021645

MBD4

1I1

I1

00

I1

22

56974847

ENSBTAG00000021643

EFCAB12

0I1

I1

00

I2

22

57797674

ENSBTAG00000031115

Uncharacterized

01

00

01

148

Mutation

Score

Chr

Pos

Gene

Genenam

eHaplotypes

Conservation

Hum

andefect

Mousemodel

Function

TotalScore

23

45885052

ENSBTAG00000005248

OFCC1

00

I1

00

I1

24

21324604

ENSBTAG00000000998

SLC39A6

02

I1

00

1

24

21453364

ENSBTAG00000011622

C18orf21

I3

I3

I1

00

I7

24

46764909

ENSBTAG00000015210

LOXHD1

0I3

10

0I2

24

50327195

ENSBTAG00000002660

CCDC11

I3

11

01

0

24

58158178

ENSBTAG00000014284

ALPK2

I3

I1

I1

00

I5

24

58158620

ENSBTAG00000014284

ALPK2

0I1

I1

00

I2

24

58158784

ENSBTAG00000014284

ALPK2

I3

I3

I1

00

I7

24

58204003

ENSBTAG00000014284

ALPK2

I3

I3

I1

00

I7

24

58204479

ENSBTAG00000014284

ALPK2

I3

2I1

00

I2

25

37433180

ENSBTAG00000040568

Uncharacterized

00

I1

00

I1

26

18116487

ENSBTAG00000046239

C10orf12

0I1

I1

00

I2

26

25715996

ENSBTAG00000010522

Uncharacterized

10

00

01

26

25764570

ENSBTAG00000046727

Olfactoryreceptor

00

I1

00

I1

27

5031078

ENSBTAG00000007340

LOC101906349

0I1

I1

00

I2

27

32832557

ENSBTAG00000011684

RAB11FIP1

10

I1

00

0

28

284228

ENSBTAG00000000079

CCSAP

0I1

I1

00

I2

28

1690403

ENSBTAG00000048153

Uncharacterized

10

00

01

28

35718807

ENSBTAG00000047317

CL43

1I1

I1

00

I1

28

44081765

ENSBTAG00000032527

ERCC6

I3

01

00

I2

29

1985881

ENSBTAG00000004081

FAT3

30

I1

00

2

29

7530523

ENSBTAG00000046360

Uncharacterized

01

00

01

29

7530663

ENSBTAG00000046360

Uncharacterized

1I1

00

00

29

26397931

ENSBTAG00000013568

UEVLD

I3

I3

I1

00

I7

149

Interestingly13ofthe22genesassociatedtohumansyndromescarryvariants

appearing in the short list of the 35 positives and only 6 in that of 135 with

negative values This in spite of the low weight (from 1 to 1) given to this

parameterEightofthe35positivevariantsareinproteinsstilluncharacterized

Sixvariantshadascoreof4orhigherThehighestscore(totalscore=7)wasfor

ELMO3 (Engulfment and cell motility gene 3) This gene is involved in cell

motility and migration ELMO3 is also a homolog of C( elegans Ced12 that is

required for many developmental processes included gonadal morphogenesis

Mutant for this gene in C( elegans suffer from abnormal development usually

with lethal consequences or sublethal developmental defects including small

embryosdelayeddevelopmentandreducedfertility(Gumiennyetal2001)In

human it is expressed in testis prostate and ovary

(httpwwwgenecardsorgcgibincarddispplgene=ELMO3)

ThegeneDNAJC13(DnaJ(Hsp40)homologsubfamilyCmember13)hadatotal

score of 5 This gene is involved in the onset of Parkinson disease DNAJC13

regulates the dynamics of clathrin coats on early endosomes Cellular analysis

shows that the mutation Arg855Ser identified by VilarintildeoGuumlell et al (2014)

confers a gainoffunction that impairs endosomal transportApossible roleof

DNAJC13geneinTourettesyndromechronicticdisorderisunderinvestigation

(Sundarametal2011)

A second genehad a scoreof 5THBS2 (Thrombospondin2) It belongs to the

thrombospondin family and is a powerful inhibitor of tumour growth and

angiogenesis It is highly expressed in testis in fetal Leydig cell Hirose et al

(2008) found a significant association between an intronic SNP in the THBS2

gene and lumbar disc herniation Thrombospondin2 is also a matricellular

proteinfoundinhumanserumDeletionofTSP2causesagedependentdilated

cardiomyopathy (Hanatani et al 2014) Mutant mice for this gene display a

variety of mutant phenotype Kyriakides et al 1998 suggest that THBS2

modulates the cell surface properties of mesenchymal cells affecting cell

functionssuchasadhesionandmigration

150

Three genes had a score of 4 all three are associated to human syndromes

Mutation in HERC2 (HECT and RLD domain containing E3 ubiquitin protein

ligase2)isinvolvedinautosomalrecessivementalretardation38(Puffenberger

etal2012)Jietal(2000)identifiedinmutantmouseneuromuscularsecretory

vesicledefectsandspermacrosomedefectsMoreovergeneticvariationsinthis

gene are associated with skinhaireye pigmentation variability Mutations in

DOLK gene (dolichol kinase) are associated with dolichol kinase deficiency

(Lefeberetal2011)COX10(cytochromecoxidaseassemblyhomolog10)gene

is associated to mitochondrial complex IV deficiency causing Leigh syndrome

(Antonickaetal2003)

Conclusions)

Exomesequencingthereforeprovidedvaluableinformationoncodingsequence

variants and on their potential effect on protein function As a group variants

classifiedasdeleteriouswererarerthanvariantsclassifiedasneutralSincethis

isanexpectedeffectofpurifyingselectionthissetofvariantwaslikelyenriched

forvariantshavingaphenotypiceffect

Theinclusionintheanalysesofplusandminusvariantanimalsforfertilitytraits

likelypermittedtosequencethemostimportantvariantsinfluencingthesetraits

butthelownumberofanimalssequencedcombinedwiththecomplexityofthe

traitpermittedtofindonlyasuggestiveassociationbetweenSNPsandtraitsItrsquos

likely that amuch larger number of individuals is to be sequenced if genome

wideassociationisthestrategyoneswanttopursue

We explored an alternative strategy to confirm suggestivedeleterious variants

based on indirect evidence of the variant effect by assessing i) evolutionary

constraintsassociatedtotheaminoacidresiduecodedbythevariantii)variant

behaviour in a large population In the absence of direct genotyping of the

candidatevariantsweusedhaplotypeinformationassurrogateiii)functionand

effectsinotherspecies

151

We attempted to quantify these criteria through a score subjective and only

indicativebutpermitted tohighlighted35genes so farneglected in cattlebut

deservingfurtherattentionandpotentiallyhavingdeleteriouseffectsinHolstein

andothercattlebreedsThesemaybepartoftherecessivedeleterioussetthat

constitutethegeneticloadofItalianHolstein

References)AntonickaHLearySCGuercinGHAgarJNHorvathRKennawayNG

HardingCOJakschMShoubridgeEA2003MutationsinCOX10

resultinadefectinmitochondrialhemeAbiosynthesisandaccountfor

multipleearlyonsetclinicalphenotypesassociatedwithisolatedCOX

deficiencyHumMolGenet122693ndash2702doi101093hmgddg284

AshwellMSHeyenDWSonstegardTSVanTassellCPDaYVanRaden

PMRonMWellerJILewinHA2004DetectionofQuantitativeTrait

LociAffectingMilkProductionHealthandReproductiveTraitsin

HolsteinCattleJournalofDairyScience87468ndash475

doi103168jdsS00220302(04)731860

AstonKICarrellDT2009GenomeWideStudyofSingleNucleotide

PolymorphismsAssociatedWithAzoospermiaandSevere

OligozoospermiaJournalofAndrology30711ndash725

doi102164jandrol109007971

AulchenkoYSRipkeSIsaacsADuijnCMvan2007GenABELanRlibrary

forgenomewideassociationanalysisBioinformatics231294ndash1296

doi101093bioinformaticsbtm108

BamshadMJNgSBBighamAWTaborHKEmondMJNickersonDA

ShendureJ2011ExomesequencingasatoolforMendeliandisease

genediscoveryNatRevGenet12745ndash755doi101038nrg3031

BieseckerLGShiannaKVMullikinJC2011Exomesequencingtheexpert

viewGenomeBiology12128doi101186gb2011129128

152

BiffaniSMarusiMBiscariniFCanavesiF2005Developingagenetic

evaluationforfertilityusingangularityandmilkyieldascorrelatedtraits

InterbullBulletin063

BiffaniSSamoreacuteABCanavesiF2002Inbreedingdepressionforproduction

reproductionandfunctionaltraitsinItalianHolsteincattleInstitut

NationaldelaRechercheAgronomique(INRA)pp0ndash4

BolgerAMLohseMUsadelB2014TrimmomaticAflexibletrimmerfor

IlluminaSequenceDataBioinformaticsbtu170

doi101093bioinformaticsbtu170

BrowningSRBrowningBL2007RapidandAccurateHaplotypePhasingand

MissingDataInferenceforWholeGenomeAssociationStudiesByUseof

LocalizedHaplotypeClusteringAmJHumGenet811084ndash1097

CharlesworthDWillisJH2009ThegeneticsofinbreedingdepressionNat

RevGenet10783ndash796doi101038nrg2664

ChunSFayJC2009Identificationofdeleteriousmutationswithinthree

humangenomesGenomeRes191553ndash1561

doi101101gr092619109

ConsortiumTEP2012AnintegratedencyclopediaofDNAelementsinthe

humangenomeNature48957ndash74doi101038nature11247

DePristoMABanksEPoplinRGarimellaKVMaguireJRHartlC

PhilippakisAAdelAngelGRivasMAHannaMMcKennaA

FennellTJKernytskyAMSivachenkoAYCibulskisKGabrielSB

AltshulerDDalyMJ2011Aframeworkforvariationdiscoveryand

genotypingusingnextgenerationDNAsequencingdataNatGenet43

491ndash498doi101038ng806

GoymerP2007SynonymousmutationsbreaktheirsilenceNatRevGenet8

92ndash92doi101038nrg2056

GumiennyTLBrugneraEToselloTrampontACKinchenJMHaneyLB

NishiwakiKWalkSFNemergutMEMacaraIGFrancisRSchedl

TQinYVanAelstLHengartnerMORavichandranKS2001CED

12ELMOaNovelMemberoftheCrkIIDock180RacPathwayIs

RequiredforPhagocytosisandCellMigrationCell10727ndash41

doi101016S00928674(01)005207

153

HanataniSIzumiyaYTakashioSKimuraYArakiSRokutandaTTsujita

KYamamotoETanakaTYamamuroMKojimaSTayamaS

KaikitaKHokimotoSOgawaH2014Circulatingthrombospondin2

reflectsdiseaseseverityandpredictsoutcomeofheartfailurewith

reducedejectionfractionCircJ78903ndash910

HiroseYChibaKKarasugiTNakajimaMKawaguchiYMikamiY

FuruichiTMioFMiyakeAMiyamotoTOzakiKTakahashiA

MizutaHKuboTKimuraTTanakaTToyamaYIkegawaS2008

AfunctionalpolymorphisminTHBS2thataffectsalternativesplicingand

MMPbindingisassociatedwithlumbardischerniationAmJHum

Genet821122ndash1129doi101016jajhg200803013

httpsgenomeucsceduindexhtml[WWWDocument]nd

JamrozikJFatehiJKistemakerGJSchaefferLR2005EstimatesofGenetic

ParametersforCanadianHolsteinFemaleReproductionTraitsJournalof

DairyScience882199ndash2208doi103168jdsS00220302(05)728952

JanisEWiggintonDJC2005AnoteonexacttestsofHardyWeinberg

equilibriumAmericanjournalofhumangenetics76887ndash93

doi101086429864

JiYRebertNAJoslinJMHigginsMJSchultzRANichollsRD2000

StructureoftheHighlyConservedHERC2GeneandofMultiplePartially

DuplicatedParalogsinHumanGenomeRes10319ndash329

KentWJSugnetCWFureyTSRoskinKMPringleTHZahlerAM

HausslerandD2002TheHumanGenomeBrowseratUCSCGenome

Res12996ndash1006doi101101gr229102

KumarPHenikoffSNgPC2009Predictingtheeffectsofcodingnon

synonymousvariantsonproteinfunctionusingtheSIFTalgorithmNat

Protoc41073ndash1081doi101038nprot200986

KyriakidesTRZhuYHSmithLTBainSDYangZLinMTDanielson

KGIozzoRVLaMarcaMMcKinneyCEGinnsEIBornsteinP

1998Micethatlackthrombospondin2displayconnectivetissue

abnormalitiesthatareassociatedwithdisorderedcollagenfibrillogenesis

anincreasedvasculardensityandableedingdiathesisJCellBiol140

419ndash430

154

LaneEACroweMABeltmanMEMoreSJ2013Theinfluenceofcowand

managementfactorsonreproductiveperformanceofIrishseasonal

calvingdairycowsAnimReprodSci14134ndash41

doi101016janireprosci201306019

LefeberDJdeBrouwerAPMMoravaERiemersmaMSchuurs

HoeijmakersJHMAbsmannerBVerrijpKvandenAkkerWMR

HuijbenKSteenbergenGvanReeuwijkJJozwiakAZuckerN

LorberALammensMKnopfCvanBokhovenHGruumlnewaldSLehle

LKapustaLMandelHWeversRA2011AutosomalRecessive

DilatedCardiomyopathyduetoDOLKMutationsResultsfromAbnormal

DystroglycanOMannosylationPLoSGenet7e1002427

doi101371journalpgen1002427

LiHDurbinR2009FastandaccurateshortreadalignmentwithBurrows

WheelertransformBioinformatics251754ndash1760

doi101093bioinformaticsbtp324

LiHHandsakerBWysokerAFennellTRuanJHomerNMarthG

AbecasisGDurbinR1000GenomeProjectDataProcessingSubgroup

2009TheSequenceAlignmentMapformatandSAMtoolsBioinformatics

252078ndash2079doi101093bioinformaticsbtp352

LiMXKwanJSHBaoSYYangWHoSLSongYQShamPC2013

PredictingMendelianDiseaseCausingNonSynonymousSingle

NucleotideVariantsinExomeSequencingStudiesPLoSGenet9

e1003143doi101371journalpgen1003143

MacArthurDGBalasubramanianSFrankishAHuangNMorrisJWalter

KJostinsLHabeggerLPickrellJKMontgomerySBAlbersCA

ZhangZConradDFLunterGZhengHAyubQDePristoMA

BanksEHuMHandsakerRERosenfeldJFromerMJinMMuXJ

KhuranaEYeKKayMSaundersGISunerMMHuntTBarnes

IHAAmidCCarvalhoSilvaDRBignellAHSnowCYngvadottirB

BumpsteadSCooperDNXueYRomeroIGWangJLiYGibbs

RAMcCarrollSADermitzakisETPritchardJKBarrettJCHarrow

JHurlesMEGersteinMBTylerSmithC2012Asystematicsurvey

155

oflossoffunctionvariantsinhumanproteincodinggenesScience335

823ndash828doi101126science1215040

MajewskiJSchwartzentruberJLalondeEMontpetitAJabadoN2011

WhatcanexomesequencingdoforyouJMedGenet48580ndash589

doi101136jmedgenet2011100223

McClureMCBickhartDNullDVanRadenPXuLWiggansGLiuG

SchroederSGlasscockJArmstrongJColeJBVanTassellCP

SonstegardTS2014BovineExomeSequenceAnalysisandTargeted

SNPGenotypingofRecessiveFertilityDefectsBH1HH2andHH3Reveal

aPutativeCausativeMutationinSMC2forHH3PLoSONE9e92769

doi101371journalpone0092769

McClureMKimEBickhartDNullDCooperTColeJWiggansGAjmone

MarsanPColliLSantusELiuGESchroederSMatukumalliLVan

TassellCSonstegardT2013FineMappingforWeaverSyndromein

BrownSwissCattleandtheIdentificationof41ConcordantMutations

acrossNRCAMPNPLA8andCTTNBP2PLoSONE8e59251

doi101371journalpone0059251

McKennaAHannaMBanksESivachenkoACibulskisKKernytskyA

GarimellaKAltshulerDGabrielSDalyMDePristoMA2010The

GenomeAnalysisToolkitAMapReduceframeworkforanalyzingnext

generationDNAsequencingdataGenomeRes201297ndash1303

doi101101gr107524110

MiHDongQMuruganujanAGaudetPLewisSThomasPD2010

PANTHERversion7improvedphylogenetictreesorthologsand

collaborationwiththeGeneOntologyConsortiumNuclAcidsRes38

D204ndashD210doi101093nargkp1019

MrodeRKearneyJFBiffaniSCoffeyMCanavesiF2009Short

communicationGeneticrelationshipsbetweentheHolsteincow

populationsofthreeEuropeandairycountriesJournalofDairyScience

925760ndash5764doi103168jds20081931

PelosoGMAuerPLBisJCVoormanAMorrisonACStitzielNOBrody

JAKhetarpalSACrosbyJRFornageMIsaacsAJakobsdottirJ

FeitosaMFDaviesGHuffmanJEManichaikulADavisBLohman

156

KJoonAYSmithAVGroveMLZanoniPRedonVDemissieS

LawsonKPetersUCarlsonCJacksonRDRyckmanKKMackey

RHRobinsonJGSiscovickDSSchreinerPJMychaleckyjJC

PankowJSHofmanAUitterlindenAGHarrisTBTaylorKD

StaffordJMReynoldsLMMarioniREDehghanAFrancoOHPatel

APLuYHindyGGottesmanOBottingerEPMelanderOOrho

MelanderMLoosRJFDugaSMerliniPAFarrallMGoelA

AsseltaRGirelliDMartinelliNShahSHKrausWELiMRader

DJReillyMPMcPhersonRWatkinsHArdissinoDZhangQWang

JTsaiMYTaylorHACorreaAGriswoldMELangeLAStarrJM

RudanIEiriksdottirGLaunerLJOrdovasJMLevyDChenYDI

ReinerAPHaywardCPolasekODearyIJBoreckiIBLiuY

GudnasonVWilsonJGvanDuijnCMKooperbergCRichSSPsaty

BMRotterJIOrsquoDonnellCJRiceKBoerwinkleEKathiresanS

CupplesLA2014AssociationofLowFrequencyandRareCoding

SequenceVariantswithBloodLipidsandCoronaryHeartDiseasein

56000WhitesandBlacksTheAmericanJournalofHumanGenetics94

223ndash232doi101016jajhg201401009

PryceJEVeerkampRFThompsonRHillWGSimmG1997Genetic

aspectsofcommonhealthdisordersandmeasuresoffertilityinHolstein

FriesiandairycattleAnimalScience65353ndash360

doi101017S1357729800008559

PryceJEWoolastonRBerryDPWallEWintersMButlerRShafferM

2014WorldTrendsinDairyCowFertilityin10ThWorldCongressof

GeneticsAppliedtoLivestockProduction

PuffenbergerEGJinksRNWangHXinBFiorentiniCShermanEA

DegrazioDShawCSougnezCCibulskisKGabrielSKelleyRI

MortonDHStraussKA2012Ahomozygousmissensemutationin

HERC2associatedwithglobaldevelopmentaldelayandautismspectrum

disorderHumMutat331639ndash1646doi101002humu22237

PurcellSNealeBToddBrownKThomasLFerreiraMARBenderD

MallerJSklarPdeBakkerPIWDalyMJShamPC2007PLINKA

157

ToolSetforWholeGenomeAssociationandPopulationBasedLinkage

AnalysesAmJHumGenet81559ndash575

SahanaGNielsenUSAamandGPLundMSGuldbrandtsenB2013Novel

HarmfulRecessiveHaplotypesIdentifiedforFertilityTraitsinNordic

HolsteinCattlePLoSONE8e82909doi101371journalpone0082909

SchuumltzEScharfensteinMBrenigB2008Implicationofcomplexvertebral

malformationandbovineleukocyteadhesiondeficiencyDNAbased

testingondiseasefrequencyintheHolsteinpopulationJDairySci91

4854ndash4859doi103168jds20081154

SimmonsMJCrowJF1977MutationsAffectingFitnessinDrosophila

PopulationsAnnualReviewofGenetics1149ndash78

doi101146annurevge11120177000405

SoggiuAPirasCHusseinHACanioMDGaviraghiAGalliAUrbaniA

BonizziLRoncadaP2013UnravellingthebullfertilityproteomeMol

BioSyst91188ndash1195doi101039C3MB25494A

StitzielNOKiezunASunyaevS2011Computationalandstatistical

approachestoanalyzingvariantsidentifiedbyexomesequencing

GenomeBiology12227doi101186gb2011129227

SundaramSKHuqAMSunZYuWBennettLWilsonBJBehenME

ChuganiHT2011ExomesequencingofapedigreewithTourette

syndromeorchronicticdisorderAnnNeurol69901ndash904

doi101002ana22398

VanRadenPMOlsonKMNullDJHutchisonJL2011Harmfulrecessive

effectsonfertilitydetectedbyabsenceofhomozygoushaplotypesJournal

ofDairyScience946153ndash6161doi103168jds20114624

VilarintildeoGuumlellCRajputAMilnerwoodAJShahBSzuTuCTrinhJYuI

EncarnacionMMunsieLNTapiaLGustavssonEKChouP

TatarnikovIEvansDMPishottaFTVoltaMBeccanoKellyD

ThompsonCLinMKShermanHEHanHJGuentherBL

WassermanWWBernardVRossCJAppelCresswellSStoesslAJ

RobinsonCADicksonDWRossOAWszolekZKAaslyJOWuR

MHentatiFGibsonRAMcPhersonPSGirardMRajputMRajput

158

AHFarrerMJ2014DNAJC13mutationsinParkinsondiseaseHumMolGenet231794ndash1801doi101093hmgddt570

WangKLiMHakonarsonH2010ANNOVARfunctionalannotationofgeneticvariantsfromhighthroughputsequencingdataNuclAcidsRes38e164ndashe164doi101093nargkq603

WathesDCPollottGEJohnsonKFRichardsonHCookeJS2014HeiferfertilityandcarryoverconsequencesforlifetimeproductionindairyandbeefcattleAnimal8Suppl191ndash104doi101017S1751731114000755

YangYMuznyDMReidJGBainbridgeMNWillisAWardPABraxtonABeutenJXiaFNiuZHardisonMPersonRBekheirniaMRLeducMSKirbyAPhamPScullJWangMDingYPlonSELupskiJRBeaudetALGibbsRAEngCM2013ClinicalWhole

ExomeSequencingfortheDiagnosisofMendelianDisordersNewEnglandJournalofMedicine3691502ndash1511doi101056NEJMoa1306555

ZiminAVDelcherALFloreaLKelleyDRSchatzMCPuiuDHanrahanFPerteaGTassellCPVSonstegardTSMarccedilaisGRobertsMSubramanianPYorkeJASalzbergSL2009Awholegenome

assemblyofthedomesticcowBostaurusGenomeBiology10R42doi101186gb2009104r42

159

Supplementary)files)

YoucanfindChapter6supplementaryfilesat

- DocTa(DoctoralThesisArchive)(httptesionlineunicattit)

- GoogleDrive(httptinyurlcomPhDMMChapter06)orscantheQRcode

)

Supplementary)Figure)S1)Coverage)analyses)

A)Averagecoverageperanimals

B)Boxplotofchromosomescoverageperanimal

C)BoxplotofanimalcoverageperchromosomeBluelinerepresenttheexpected

50Xcoverage

Supplementary)Figure)S2)Distribution)of)variant)sequencing)depth)

)

Supplementary) Figure) S3) Correlation) between) sequencing) depth) and)

number) of) discordant) genotypes) In) red) values) for) two) outlier) animals)

having)concordance)lt846))

)

Supplementary)Figure)S4)Multidimensional)scaling)of)Italian)Holstein)bulls)

sequenced))

)

Supplementary)Figure)S5)Multidimensional)scaling)of)Italian)Holstein)bulls)

analysed)with)the)800K)SNPchip))

160

Supplementary) Figure) S6) MAF) distribution) on) neutral) (blue)) and)

deleterious)(red))variants))

)

Supplementary) Table) S1) EBV) distribution) on) Italian) Holstein) population)

and)SNPchip)dataset)

Sheet1EBVthresholdvaluetodefineplusandminustailforPROT(kgprotein)

SCC(somaticcellcount)FERT(fertility)andLONG(functional longevity) All

ItalianHolsteinFAbullswereincludeinthisanalysis

Sheet2Numberofdatasetanimalsineachtailforeachtrait

Supplementary)Table)S2)Exome)probes)statistics)number)and)bp)covered)

per)chromosomes)

Supplementary) Table) S3) Concordance) between) animal) genotypes) in)

sequencing)and)SNPchip)data)

Error1 HD data homozygotes ndash Exome data heterozygotes Error2 HD data

heterozygotesndashExomedatahomozygotes

Supplementary) Table) S4) List) of) suggestive) genes) deriving) from) the)

analysis)of)exomes)of)animals) in)the)tails)of)male)and)female)fertility)EBV)

distributions))

Header(code

bull Bfreq_totFrequencyofBalleleinentiredataset

bull Bfreq_MHFrequencyofBalleleinhighmalefertilitydataset

bull Bfreq_MLFrequencyofBalleleinhighfemalefertilitydataset

bull Bfreq_FHFrequencyofBalleleinlowmalefertilitydataset

bull Bfreq_FLFrequencyofBalleleinlowmalefertilitydataset

bull CallRate_totSNPcallratendashentiredataset

bull CallRate_MHSNPcallratendashhighmalefertilitydataset

bull CallRate_MLSNPcallratendashlowmalefertilitydataset

bull CallRate_FHSNPcallratendashhighfemalefertilitydataset

bull CallRate_FLSNPcallratendashhighmalefertilitydataset

bull HWeqHardyWeinbergequilibriumpvalue

bull Geno_HomoMinorNumberofanimalswithhomozygotegenotypeforminorallele

bull Geno_HetNumberofanimalswithheterozygotegenotype

161

bull Geno_HomoMajorNumberofanimalswithhomozygotegenotypeformajorallele

bull LOW_MinorAlleleminoralleleforlowfertility(maleandfemale)dataset

bull LOW_MAFminorallelefrequencyfromlowfertility(maleandfemale)dataset

bull LOW_CRSNPcallratefromlowfertility(maleandfemale)dataset

bull LOW_HomoMinorNumberofanimalsfromlowfertility(maleandfemale)datasetwith

homozygotegenotypeforminorallele

bull LOW_Het Number of animals from low fertility (male and female) dataset with

heterozygotegenotype

bull LOW_HomoMajorNumberofanimalsfromlowfertility(maleandfemale)datasetwith

homozygotegenotypeformajorallele

bull LOW_HWeqPvalue HardyWeinberg equilibrium pvalue from low fertility (male and

female)dataset

bull HIGH_MinorAlleleminorallelefromhighfertility(maleandfemale)dataset

bull HIGH_MAFminorallelefrequencyfromhighfertility(maleandfemale)dataset

bull HIGH_CRSNPcallratefromhighfertility(maleandfemale)dataset

bull HIGH_HomoMinorNumberofanimalsfromhighfertility(maleandfemale)datasetwith

homozygotegenotypeforminorallele

bull HIGH_Het Number of animals from high fertility (male and female) dataset with

heterozygotegenotype

bull HIGH_HomoMajorNumberofanimalsfromhighfertility(maleandfemale)datasetwith

homozygotegenotypeformajorallele

bull HIGH_HWeqPvalueHardyWeinberg equilibriumpvalue fromhigh fertility (male and

female)dataset

bull Male_TREND inheritance models implemented in PLINK trend test on male fertility

dataset

bull Male_GENOTYPES inheritancemodels implemented inPLINKgenotypic testonmale

fertilitydataset

bull Male_ RECESSIVE inheritance models implemented in PLINK recessive test on male

fertilitydataset

bull Female_TRENDinheritancemodelsimplementedinPLINKtrendtestonfemalefertility

dataset

bull Female_GENOTYPES inheritance models implemented in PLINK genotypic test on

femalefertilitydataset

bull Female_RECESSIVEinheritancemodelsimplementedinPLINKrecessivetestonfemale

fertilitydataset

bull MaleampFemale_TREND inheritance models implemented in PLINK trend test on male

andfemalefertilitydataset

bull MaleampFemale_GENOTYPES inheritancemodels implemented in PLINK genotypic test

onmaleandfemalefertilitydataset

162

bull MaleampFemale_RECESSIVEinheritancemodelsimplementedinPLINKrecessiveteston

maleandfemalefertilitydataset

bull EMP1_MaleEmpiricalpvalueonmalefertilitydataset

bull EMP1_FemaleEmpiricalpvalueonfemalefertilitydataset

bull EMP1_MaleampFemaleEmpiricalpvalueonmaleandfemalefertilitydataset

Supplementary)Table) S5) Regions) candidate) to) carry) deleterious) variants)

identified)by)OHW)approach))

Supplementary)Table) S6) Regions) candidate) to) carry) deleterious) variants)

identify)by)LOH)approach))

Ingreyregionsremovedinfurtheranalysis

Supplementary)Table)S7)List)of)candidate)deleterious)variants)mapping)in)

OHW)and)LOH)regions))Header(code

bull Bfreq_totFrequencyofBalleleinentiredataset

bull Bfreq_MHFrequencyofBalleleinhighmalefertilitydataset

bull Bfreq_MLFrequencyofBalleleinhighfemalefertilitydataset

bull Bfreq_FHFrequencyofBalleleinlowmalefertilitydataset

bull Bfreq_FLFrequencyofBalleleinlowmalefertilitydataset

bull CallRate_totSNPcallratendashentiredataset

bull CallRate_MHSNPcallratendashhighmalefertilitydataset

bull CallRate_MLSNPcallratendashlowmalefertilitydataset

bull CallRate_FHSNPcallratendashhighfemalefertilitydataset

bull CallRate_FLSNPcallratendashhighmalefertilitydataset

bull HWeqHardyWeinbergequilibriumpvalue

bull Geno_HomoMinorNumberofanimalswithhomozygotegenotypeforminorallele

bull Geno_HetNumberofanimalswithheterozygotegenotype

bull Geno_HomoMajorNumberofanimalswithhomozygotegenotypeformajorallele

bull HW_insideDeleteriousmutationsinsideldquoOutofHardyWeinbergrdquoregion

bull HW_outsideDeleteriousmutationsoutsideldquoOutofHardyWeinbergrdquoregion

bull DH_insideDeleteriousmutationsinsideldquoLackofHomozygosityrdquoregion

bull DH_outsideDeleteriousmutationsoutsideldquoLackofHomozygosityrdquoregion

163

bull GeneClass Classifications of gene 1 deleterious mutations inside OHW and LOHregion2deleteriousmutationsinsideOHWorLOHregion

bull 2deleteriousmutationsoutsideOHWandorLOHregionbull LOW_MinorAlleleminoralleleforlowfertility(maleandfemale)datasetbull LOW_MAFminorallelefrequencyfromlowfertility(maleandfemale)datasetbull LOW_CRSNPcallratefromlowfertility(maleandfemale)datasetbull LOW_HomoMinorNumberofanimalsfromlowfertility(maleandfemale)datasetwith

homozygotegenotypeforminorallelebull LOW_Het Number of animals from low fertility (male and female) dataset with

heterozygotegenotypebull LOW_HomoMajorNumberofanimalsfromlowfertility(maleandfemale)datasetwith

homozygotegenotypeformajorallelebull LOW_HWeqPvalue HardyWeinberg equilibrium pvalue from low fertility (male and

female)datasetbull HIGH_MinorAlleleminorallelefromhighfertility(maleandfemale)datasetbull HIGH_MAFminorallelefrequencyfromhighfertility(maleandfemale)datasetbull HIGH_CRSNPcallratefromhighfertility(maleandfemale)datasetbull HIGH_HomoMinorNumberofanimalsfromhighfertility(maleandfemale)datasetwith

homozygotegenotypeforminorallelebull HIGH_Het Number of animals from high fertility (male and female) dataset with

heterozygotegenotypebull HIGH_HomoMajorNumberofanimalsfromhighfertility(maleandfemale)datasetwith

homozygotegenotypeformajorallelebull HIGH_HWeqPvalueHardyWeinberg equilibriumpvalue fromhigh fertility (male and

female)datasetbull Male_TREND inheritance models implemented in PLINK trend test on male fertility

datasetbull Male_GENOTYPES inheritancemodels implemented inPLINKgenotypic testonmale

fertilitydatasetbull Male_ RECESSIVE inheritance models implemented in PLINK recessive test on male

fertilitydatasetbull Female_TRENDinheritancemodelsimplementedinPLINKtrendtestonfemalefertility

datasetbull Female_GENOTYPES inheritance models implemented in PLINK genotypic test on

femalefertilitydatasetbull Female_RECESSIVEinheritancemodelsimplementedinPLINKrecessivetestonfemale

fertilitydatasetbull MaleampFemale_TREND inheritance models implemented in PLINK trend test on male

andfemalefertilitydataset

164

bull MaleampFemale_GENOTYPES inheritancemodels implemented in PLINK genotypic test

onmaleandfemalefertilitydataset

bull MaleampFemale_RECESSIVEinheritancemodelsimplementedinPLINKrecessiveteston

maleandfemalefertilitydataset

bull EMP1_MaleEmpiricalpvalueonmalefertilitydataset

bull EMP1_FemaleEmpiricalpvalueonfemalefertilitydataset

bull EMP1_MaleampFemaleEmpiricalpvalueonmaleandfemalefertilitydataset

Supplementary)Table)S8)Results)of)haplotype)analysis)))Header(code

bull Chromchromosome

bull PosphysicalpositiononUMD31genome

bull Exome_homonumberofhomozygotefordeleteriousmutationsequencedanimals

bull Exome_hetnumberofheterozygotesequencedanimalsfordeleteriousmutation

bull Outcome NotFound It was impossible to associated a haplotype to the mutation

NotOpt Non completely invariant haplotype could be associated to themutationOK

Putativehaplotype associate tomutationwas found In few case alternativehaplotype

wasreported

bull LenlengthofhaplotypeinnumberofSNPs

bull Pop_numnumbersofselectedhaplotypedetectedinHolsteinpopulation

bull Pop_homo numbers of homozygote animals in Holstein population for selected

haplotype

bull Pop_het numbers of heterozygotes animals in Holstein population for selected

haplotype

bull ExpExpectednumberofhomozygoteanimal

bull Alternative if only one heterozygote sequenced animals for deleteriousmutationwas

foundalltwopossiblehaplotypeswheretested

Supplementary)Table)S9)Descriptive)results)for)the)Biological)Process)(BP)))

NumberofgenesbelongingtoBiologicalProcessGOterms)

Supplementary) Table) S10) Descriptive) results) for) the)Molecular) Function)

(MF))

NumberofgenesbelongingtoMolecularFunctionGOterms

165

CHAPTER7

Conclusion

Since the domestication event occurred in the Fertile Crescent around 10000yearsagothebovinespecieswasselectedbyhumankindtofulfilhisneedsThecreationofbreedsoccurredaround200yearsagointensifiedthedifferentiationbetween populations and promoted the development of intensive selectionschemesTraditionalAandnowadaysgenomicAselectionsareproducingpositivegenetictrendsinmanyproductivetraitsdespitethestillpoorknowledgeaboutphenotype biology A deeper understanding of genome organization and genebiology would increase the accuracy of genomic evaluation by incorporatingpriorknowledgeThisthesisexploresthebovinegenomebyhighAthroughputtechnologiesAsuchasNextGenerationSequencingandHighADensityGenotypingAwithestablishedand innovative procedures to investigate the biology of complex traits and toprovidenewdataandmoleculartoolstobovinebreedingThese goals have been achieved by complementary approaches and differentmethodsInChapter2selectionsignatureswereinvestigatedindependentlyintofive Italian breeds using Illumina BovineSNP50 BeadChip Then amultiAbreedapproach was applied clustering breeds into dairy (Italian Holstein ItalianBrownand Italian Simmental) andbeef (Marchigiana Piedmontese and ItalianSimmental) groups Regions under recent positive selection shared by breedswithasameproductiveaptitudewereinvestigatedinfurtherdetailbyretrievinggenes in the region and analysing their function by ontology and pathwayanalysis In the dairy group selection signals appear nearby genes related tomammarygland(metabolismandresistancetomastitis)andfeedingadaptationIn thebeefgroupgenes inselectedregionsare involved inanimalgrowthandmeatquality(textureandjuiciness)Hencecandidategenesidentifiedbelongtonetworks and pathways important in directing a breed towards either beef ordairyproductionIn Chapter 3 a GWAS approach was used in dairy cattle to correlate SNP tocomplex phenotypes having an economic value Three independent ldquoclassicalrdquosingleAmarkerregressionswereruninthreeItaliandairycattle(ItalianHolsteinItalianBrownandItalianSimmental)UsingthetraditionalsingleSNPapproach

168

only few SNPs were above the nominal genomeAwide significance thresholdwhile applying a geneAbased method identified significant candidate genesassociatedwithmilkphysiology andmammary glanddevelopment in all threepopulationsInterestinglygenesfoundindifferentbreedshavesimilarfunctionbut are not the same suggesting that breed history demography selectioncriteriaandstochasticeventsasgeneticdrifthaveanimpactonthedefinitionofthemainmoleculartargetofselectionschemesCurrent intensive selection strategies rapidly spread the genetic gain in thepopulationbutholdtheassociatedriskofdisseminatingdefectivegenescausinggenetic defects (eg mulefoot complex vertebral malformation and others) ordecreasing fertility In Chapter 6 different technologies and approaches werecombined to obtain filter and prioritize genes candidate to be deleteriousExome sequence datawere exploited to identify deleterious variants affectingItalianHolsteincattleinasetof20animalsextremeforfertilitytraitsMorethan5000 candidate deleterious SNP variants were identified To bypass the smallnumberlimitationandthelackofpowerofanystatisticanalysison20animalsastepwise approachwas used to first filter and then prioritize these candidatedeleterious variants Polymorphisms were retained if having asymmetricdistribution in bulls from opposite tails of fertility EBV distributions andormapping in genomic regions outlier for lack of homozygosity in a 1009 bullpopulation genotyped with the Illumina BovineHD BeadChip A total of 191variants in163genespassed theprocessesThesewereassessedby searchingadditional evidences in favour or against their deleterious effect throughdifferentapproachesi)checkingthevariantconservationacross100vertebratespecies by comparative genomics ii) finding haplotypes associated withcandidatemutationsandevaluatingtheirfrequencyinHolsteinsiii)integratinghumanandmouse functionalannotationsA scorewasproposed toweight theresult of these assessments and rank the variants 35 of them had positivescores These occurred in genes involved in basic biological mechanisms asfertilitydevelopmentand immunityandresultedenriched ingenesassociatedtogeneticdefectsreportedinhumanThesemutationsarestrongcandidatestobe recessive deleterious variants worth to be further investigated in a larger

169

populationtobetterunderstandtheirbiologyimpactandpossibleapplicationinbreedingTo increase the reliability of results based on genomic data and to reduce theimpact of subjective choices in data analysis new toolsapproaches weredeveloped(iethemultiAbreedapproachinSelectionSignaturendashChapter2thepostAGWAS geneAbased approach ndash Chapters 3 and 4) and the existent onescriticallyevaluated(imputationmethodsndashChapter5)In Chapter 4 the postAGWAS geneAbased method used in Chapter 3 wasimplemented in the MUGBAS (MUlti species GeneABased Association Suite)software In contrast to existing software MUGBAS is species and annotationfree AmultipleAtesting correctionwas included in the package in addition tocomputing parallelization for faster processing and plot representation forbetterunderstandingThesoftwareusessingleAmarkerGWASdataandagivenannotation to estimate geneAregionAwise association pAvalues As previouslymentioned in Chapter 3 the ldquoclassicalrdquo GWAS approach (single SNP) foundsignals only in ItalianHolsteinwhileMUGBAS geneAbased approach identifiedsignificantsignalsinallbreedsinvestigatedIn Chapter 5 the impact of different bovine reference sequences on genotypeimputation was investigated In this work Illumina BovineSNP50 BeadChipgenotypesof ItalianSimmentalbullswerereAmappedonthethreemostrecentreferencegenomeassembliesWecomparedfourdifferent imputationmethodsfromlow(IlluminaBovineLDv10Beadchip)tohighdensityThetestpopulationwasdividedintofoursubpopulationsonthebasisofrelationshipandgenotypedataavailabilityUpdatingSNPcoordinatesonthe three testedcattlereferencegenome assemblies determined only a slight variation on imputation resultswithinmethodsOntheotherhandlargedifferencesintermsofaccuracywereobservedamongthefourimputationmethodstestedGenetic structure of industrial breeds was evaluated assessing LinkageDisequilibrium (LD) Since LD decay is correlated to intensity of selection weexpected different behaviours in the breeds analysed Indeed as evident inChapter 2 ndash Figure 7 breeds subjected to lower selection intensity (ie

170

MarchigianaPiedmonteseandItalianSimmental)displayalowerpersistenceof

LDatgreaterdistancescomparedtothosesubjectedtohigherselectionpressure

(ieItalianHolsteinandItalianBrown)Chapter3investigatesLDintheDGAT1

region of dairy cattle breeds In Italian Simmental LD is lower compared to

Italian Holstein and Italian Brown In spite of this significant association

betweentheDGAT1regionandmilktraitswasfoundbothinItalianSimmental

andinItalianHolsteinInItalianBrownnoassociationwasfoundbecauseDGAT1

isfixed

Data and results presented in this thesis represent a clear example of the

progress of molecular technologies occurred in last few years spanning from

mediumdensitygenotypingtonextgenerationsequencing

Thebiologylandscapesofsomecomplextraits(iemilkproductionandquality

fertility)were improved finding candidate genes andpathwaysnotpreviously

associated to the phenotype in the bovine species Moreover insights on

deleterious mutations were inferred through a deep and integrated analysis

usingNGSdata

This new knowledge could be used to improve cattle breeding For example

deleteriousrecessivevariantscouldbeincludedinbullevaluationsbybreedersrsquo

associations to reduce the genetic load of deleterious genes in parallel to

improvingproductivity

While technological progress has been impressivewe are still only scratching

thesurfaceofanimalbiologyNewfrontiersarebeingopenedbywholegenome

sequencing of many animals (eg in the 1000 bull genome project) and by

stepping beyond sequencing (a number of epigenomic projects in animals are

following the footprints of the human ENCODE project) The trend is towards

unravelling complexity and data production ability is to be flanked by data

storingandprocessingandaboveallbynewmethodsandtoolsfordataanalysis

171

ACKNOWLEDGMENTS

ldquoNon egrave la destinazione ma il viaggio che contardquo Questi tre anni sono stati un gran bel viaggio di crescita personale e ricchi diesperienze Non egrave stato un viaggio solitarioma condiviso con vecchie e nuoveconoscenzechemisembradoverosomenzionareInnanzitutto vorrei ringraziare il prof Paolo AjmoneMarsan chemi ha dato lapossibilitagrave di fare questa esperienza Grazie per gli insegnamenti e per leopportunitagravechemihaoffertoSperodinonavertraditolesueaspettativeAgli attuali colleghi Lorenzo Stefano Licia Elisa Riccardo ed Elia e a quellipassati Nicola Fatima Raffaele Ezequiel e Barbara va il mio piugrave sincero edaffettuoso ringraziamento per aver condiviso anche solo in parte questomiopercorso Grazie per avermi sopportato (e non egrave poco) guidato (non solo nelcampolavorativo)efattovivereinunmodospecialelrsquoambientedilavoroUnsincerograzievaatuttiimieicolleghidelXXVIIcicloAgrisystemRicorderogravesempretuttiimomentichehopotutocondividereconvoi UngrazieancheachihafattopartedellamiaesperienzasvedeseGraziedicuoreaCarlFlavioChristianEricaatuttiiragazziitalianiatuttiicolleghildquosvedesirdquoea tutte le altre persone che ho conosciuto Mi avete fatto vivere appienolrsquoesperienzaesteraefattotornareacasaconqualcheconsapevolezzainpiugrave UnsentitoringraziamentoatuttiglialtricolleghiconcuihocollaboratoinquestianniRingrazio tutti gli amici e conoscenti che mi sono stati vicino per avermisupportatoesopportatoognunoamodoproprio Ultimomanonperimportanza ilringraziamentochevaatuttalamiafamigliaper avermi aiutato spronato sostenuto sempre e in particolare in questaesperienzaSemplicementegrazie PS Adirlatuttaancheladestinazionenonegravepoicosigravemale

173

  • INDEX
  • CHAPTER 1 - Introduction
  • CHAPTER 2 - Relative extended haplotype homozygosity (rEHH) signals across breeds reveal dairy and beef specific signatures of selection
  • CHAPTER 3 - Searching new signals for productive traits in three Italian cattle breeds
  • CHAPTER 4 - MUGBAS a species free gene based ased association suite
  • CHAPTER 5 - Imputation accuracy is robust to cattle reference genome updates
  • CHAPTER 6 - Quest for deleterious recessive variants in Italian Holstein bulls combining exome sequencing and high density SNP analysis
  • CHAPTER 7 - Conclusion
  • ACKNOWLEDGMENTS
Page 2: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School

Scuola di Dottorato per il Sistema Agro-alimentare

Doctoral School on the Agro-Food System

cycle XXVII

SSD AGR17

High-throughput sequencing technologies and genomics insights into the bovine species

Coordinator Chmo Prof Antonio Albanese

_______________________________________

Candidate Marco Milanesi Matriculation n 4011148

Tutor Prof Paolo AjmoneMarsan PhD Lorenzo Bomba PhD Stefano Capomaccio PhD Ezequiel Louis Nicolazzi

Academic Year 20132014

Alla$mia$famiglia$

A$mio$nonno$Mario$

INDEXamp

Chapteramp1ndashIntroduction 3

Chapteramp2Relativeextendedhaplotypehomozygosity(rEHH)signalsacrossbreedsrevealdairyandbeefspecificsignaturesofselection

11

Chapteramp3SearchingnewsignalsforproductivetraitsthroughgenebasedassociationanalysisinthreeItaliancattlebreeds

49

Chapteramp4ampMUGBASaspeciesfreegenebasedassociationsuite

81

Chapteramp5Imputationaccuracyisrobusttocattlereferencegenomeupdates

89

Chapteramp6QuestfordeleteriousrecessivevariantsinItalianHolsteinbullscombiningexomesequencingandhighdensitySNPanalysis

99

Chapteramp7ndashConclusion 167

amp

Acknowledgmentsampamp 173

CHAPTER1

Introduction

Background

The bovine species counts almost 15 billion of heads classified in some 800

breeds worldwide (httpfaostat3faoorg) The species is adapted to a wide

varietyofdifferentenvironmentandhusbandrysystemsandisoftenrearedfor

multiple purposes (Taberlet et al 2008) draft power milk meat leather

productionandotheruses(Elsiketal2009)

European cattle (Bos$ taurus) derive from a single domestication event of the

auroch (Bos$primigenius)occurred in theFertileCrescent around10000years

ago(Taberletetal2011)FollowingdomesticationcattlefollowedtheNeolithic

expansionofagricultureandcolonisedEurope(seeAjmoneMMarsanetal2010

for a review) Many thousand years of natural and artificial selection have

shaped cattle phenotype and genotype to meet environmental conditions and

human needs and together with demographic events and genetic drift have

started thedifferentiation of the bovine species in diverse and locally adapted

populations This differentiation process accelerated in the last two centuries

whenthebreedconceptwasinventedsothatnowthisspeciescountshundreds

of different breeds with high diversity in coat colour body size production

aptitudeadaptationtoclimate tolerancetodiseaseetc(Taberletetal2008

Elsiketal2009TheBovineHapMapConsortium2009)

Thisdiversityisnowbeingprogressivelylostwiththeextinctionofmanylocal

populationsmainlyforeconomicreasonsSelectioneffortshavethereforebeen

focussedonafewoutperformingindustrialbreedsthatarepresentlyselectedfor

specialiseddairyorbeefproductionandsometimesdoublepurpose

In these breeds breeding schemes are well developed and have been

traditionally based on the genetic evaluation of sires and dams using data

collectedfromperformanceandprogenytestingandunbiasedlinearprediction

modelsThesearefoundedontheFisherinfinitesimalmodellingofquantitative

trait variation and need no genomic information to estimate breeding values

(EBVs) The genetic progress recorded using this approach has been relevant

particularly for traits having medium to high heritability In Italian Holstein

genetic trends for themain production traits (milk protein and fat yield) are

positiveHoweveraccurateestimateofbreedingvalueshasahighcostinterms

4

of investment and time so that 5 to 6 years are needed to have a dairy bullprogenytested(wwwanafiit)Theadventofgenomicselection(Meuwissenetal2001)markedamilestoneindairycattlebreedingTheideawasdevelopedbeforethemoleculartoolswereinplaceforimplementitinpracticeandhasbecomearealityonlyrecentlyInthisapproach progeny tested bulls are genotyped at many thousand loci and thevalue of their EBVs used to estimate the value of each allele at each locusgenotyped These values are then used to predict the genetic value of younganimals on the basis of their genotype As a result animals from the newgeneration receive an EBV with the same accuracy of estimates obtained byprogenytestingbutmuchearlierThislowerthegenerationintervalandgreatlyincreasestherateofgeneticprogress indairypopulationsThe initial ideawasso good that nowadays genomic selection is widely applied in industrialcountriestothemajordairycattlebreedsWiththedecreaseofgenotypingcostit is likely that its use will be soon expanded to animals selected for otherpurposes (eg beef cattle) and other species (eg pigs) However genomicselection is an extension of traditional selection It is extremely useful inbreedingforcomplextraitsbutgivesnobiologicalinformationonthegenesthatcontrolthemIfthetraditionalselectionviewsasiregenomeasabigblackboxhaving a value but no clue why genomic selection views the same genomedisassembled inmany small boxes but still black and stillwithno clueon thereasonoftheirvalueIt is reasonable to think that a deeper understanding of the biologicalmechanisms controlling traits may further increase the accuracy of genomicevaluationsIndeedthenewtechnologiesoffernewopportunitiesinthissensebyloweringcostsandincreasingprecisionofdataproductionThis thesis is exploring the use of these technologies for retrieving biologicalinformationfromgenomicdataItusesmethodsnowconsideredldquotraditionalrdquoingenomics (GWAS and analysis of selection sweeps) and less traditional ones(gene centered GWAS and exome analysis) All data used here have beenproducedwithin twonationalprojectson farmanimalgenomics fundedby theItalianMinistryofAgriculture

5

M SELMOL ldquoRicerca e innovazione nelle attivitagrave di miglioramento genetico

animalemediante tecniche di geneticamolecolare per la competitivitagrave del

sistemazootecniconazionaleIrdquo(ResearchandInnovationinanimalbreeding

sheepgoatbuffalohorseanddonkeyI)

M INNOVAGENldquoRicercaeinnovazionenelleattivitagravedimiglioramentogenetico

animalemediante tecniche di geneticamolecolare per la competitivitagrave del

sistema zootecnico nazionale IIrdquo (Research and Innovation in animal

breedingsheepgoatbuffalohorseanddonkeyII)

In these projects the research group I work with M Istituto di Zootecnica at

Universitagrave Cattolica del Sacro Cuore di Piacenza M coordinated all the research

activitiesconcerningtheapplicationofgenomicstechniquesindairycattle

Contentsofthethesis

The thesis is structured into a short introduction (this first chapter) 5 core

chapters(Chapter2to6)andashortConclusion(Chapter7)Withtheexception

of Chapter 6 still in preparation the other core chapters containmanuscripts

that at time of writing are under review for their publication in international

peerMreviewedjournalsormanuscript

InChapter2selectionsignatureindairyandbeefcattleareinvestigatedwiththe

Illumina BovineSNP50 Genotyping BeadChip Five Italian cattle breeds (Italian

Holstein Italian Brown Italian Simmental Marchigiana and Piedmontese) are

analyzedwiththerEHHmethod(Sabetietal2002)tofindsignaturesofrecent

selectionCandidategenesinthevicinityofsignaturessharedbyeitherdairyor

beefbreeds are identified and theirpossible role indairy andbeefproduction

discussed

In Chapter 3 a GWAS analysis is run to investigate production traits in three

Italian dairy cattle breeds (Italian Holstein Italian Brown Italian Simmental)

Data are analysed with a well established method (Amin et al 2007)

6

implemented in theGenABELRpackage (Aulchenkoet al 2007) andwith the

newgeneMbasedmethodAssociationisinvestigatedbetween50KSNPsandmilk

production(milkyield)andqualitytraits(percentageofmilkproteinandfat)

InChapter4thegeneMbasedmethodusedinthepreviouschapter3inspiredby

theVEGASapproach(Liuetal2010)isdescribedThesoftwarewasdeveloped

andreMcodedinourlaboratoryinordertopermittheanalysisofanyspeciesin

additiontohumanMultiMtestingcorrectionandplotrepresentationcompletethe

toolsmadeavailable

InChapter5wetesttheimpactoftheuseofdifferentreferencesequencesinthe

imputation step Italian Simmental data are used to test variation in genomeM

wide inputation accuracy using different cattle reference sequences (BTAU42

UMD31 and BTAU46) To increase the analyses reliability four different

inputationsoftwaresaretested

InChapter6Holsteindataareanalysedcombiningexomesequencesand800K

SNPs data to seek deleterious recessive variantsWhile in natural populations

the destiny of these variants is to disappear in livestock breeds they can be

maintained and disseminated if carried by a highly productive sire The

intersection of deleterious mutations identified by sequencing and lack of

homozygousregionsidentifiedbytheanalysisofthepopulationwiththeHDSNP

chipidentifiedgenescandidatetocarryrecessivedeleteriousvariants

Objective

Themainobjectiveofthethesiswastoinvestigatethebovinegenomewiththe

aid new highMthroughput technology and to explore new strategies for linking

biology (genes and their function) to traits This linkage once validated may

increase the accuracy of genomic prediction and find application in selection

schemesegineradicatingthemostdeleteriousvariantsandorintheplanning

ofmatings

7

ReferencesAjmoneMMarsanPGarciaJFLenstraJA2010OntheoriginofcattleHow

aurochsbecamecattleandcolonizedtheworldEvolAnthropol19148ndash157doi101002evan20267

AminNvanDuijnCMAulchenkoYS2007AGenomicBackgroundBasedMethodforAssociationAnalysisinRelatedIndividualsPLoSONE2e1274doi101371journalpone0001274

AulchenkoYSKoningDMJdeHaleyC2007GenomewideRapidAssociationUsingMixedModelandRegressionAFastandSimpleMethodForGenomewidePedigreeMBasedQuantitativeTraitLociAssociationAnalysisGenetics177577ndash585doi101534genetics107075614

ElsikCGTellamRLWorleyKC2009TheGenomeSequenceofTaurineCattleAWindowtoRuminantBiologyandEvolutionScience324522ndash528doi101126science1169588

LiuJZMcraeAFNyholtDRMedlandSEWrayNRBrownKMHaywardNKMontgomeryGWVisscherPMMartinNGMacgregorS2010AVersatileGeneMBasedTestforGenomeMwideAssociationStudiesTheAmericanJournalofHumanGenetics87139ndash145doi101016jajhg201006009

MeuwissenTHHayesBJGoddardME2001PredictionoftotalgeneticvalueusinggenomeMwidedensemarkermapsGenetics1571819ndash1829

SabetiPCReichDEHigginsJMLevineHZPRichterDJSchaffnerSFGabrielSBPlatkoJVPattersonNJMcDonaldGJAckermanHCCampbellSJAltshulerDCooperRKwiatkowskiDWardRLanderES2002DetectingrecentpositiveselectioninthehumangenomefromhaplotypestructureNature419832ndash837doi101038nature01140

TaberletPCoissacEPansuJPompanonF2011ConservationgeneticsofcattlesheepandgoatsCRBiol334247ndash254doi101016jcrvi201012007

8

TaberletPValentiniARezaeiHRNaderiSPompanonFNegriniR

AjmoneMMarsanP2008Arecattlesheepandgoatsendangered

speciesMolEcol17275ndash284doi101111j1365M294X200703475x

TheBovineHapMapConsortium2009GenomeMWideSurveyofSNPVariation

UncoverstheGeneticStructureofCattleBreedsScience324528ndash532

doi101126science1167936

9

CHAPTER2

Relative(extendedamphaplotype(homozygosity)(rEHH)ampsignalsampacrossampbreedsamprevealampdairyampand$beef$specificampsignaturesampofampselection

LBomba1ELNicolazzi2MMilanesi1etal1IstitutodiZootecnicaUCSCviaEmiliaParmense8429122PiacenzaItaly2FondazioneParcoTecnologicoPadanoViaEinsteinLocCascinaCodazza26900LodiItalyTheseauthorscontributedequallytothiswork

____________________SubmittedtoGeneticSelectionEvolution

Abstract

Background

Nowadays many methods have been implemented to scan for signatures of

selection at genomescale level within and across breeds Extended haplotype

homozygosity is a proficient approach to detect genome regions under recent

selective pressure within breeds The objective of this study is to identify

common regions under positive recent selection shared by most represented

dairyandbeefItaliancattlebreeds

Results

A total of 3220 animals from Italian Holstein (2179) Italian Brown (775)

Simmental (493) Marchigiana (485) and Piedmontese (379) breeds were

genotyped with the Illumina BovineSNP50 BeadChip v1 in the frame of the

ItalianlivestockgenomicprojectsldquoSelMolrdquoandldquoProzoordquoAfterstandardcleaning

proceduresgenotypeswerephasedandcorehaplotypesidentifiedThedecayof

Linkage Disequilibrium (LD) for each core haplotype was assessedmeasuring

the extended haplotype homozygosity (EHH) To correct for the lack of good

estimatesof local recombinationrates therelativeEHH(rEHH)wascalculated

for each corehaplotypeGenomic regionsunder recentpositive selectionwere

identifiedasthosehighlyfrequentcorehaplotypeswithsignificantrEHHvalues

Significant independentcorehaplotypeswerealignedacrossbreeds to identify

signalsofrecentselectionsharedbydairyandsharedbybeefbreedsOverall82

and 87 common regions under selectionwere detected among dairy and beef

cattlebreedsrespectivelyBioinformaticsanalysisidentified244and232genes

mapping in these common genomic regions Pathway and network analysis of

significant genes revealed molecular functions biologically related to signal of

selectionsdetectedinmilkormeatproductiontypes

Conclusions

Resultssuggest that themultibreedapproach isamethodto identifyselection

signatures in cattle breeds under selection for the same production goal

12

allowing to better pinpoint the genomic regions of interest in dairy and beef

production

Background

The advent of genomic technologies and the consequent availability of single

nucleotidepolymorphism(SNP)datahaveenabledthestudyofselectionevents

on the cattle genome (Hayes et al 2008 Qanbari et al 2010) Such selection

signals arise from environmental or anthropogenic pressures and help to

understand the causes that led to breed formation These studies are usually

addressedinaldquotopdownrdquoapproach(ShahzadandLoor2012)fromgenotypeto

phenotypeinwhichgenomicdataisstatisticallyevaluatedtodetectdirectional

selectionThephenotypeneededtorunselectionsignaturesisintendedinavery

generalsenseabreedaproductionaptitudeorevenageographicareaoforigin

Thisapproachholdsthepotentialto investigatetraitsasadaptationtoextreme

climatesanddifferentfeedingandhusbandrysystemsresiliencetodiseasesetc

thatareveryexpensivedifficultandsometimesimpossibletostudywithclassic

GWAS (GenomeWide Association Study) approaches Therefore it is expected

thatgenomic informationretrievedinthesestudiesonlypartiallyoverlapsthat

identified by GWAS on specific traits Searching for these ldquosignatures of

selectionrdquo is of interest for understanding molecular mechanisms underlying

important biological processes and providing new insights to functional

biologicalinformation(egdiseaseandorimportanteconomictraits(Nielsenet

al 2007)) To date many methods have been implemented to scan these

selectionsignaturesatthegenomiclevelMethodsdifferwithrespecttoseveral

properties (Lenstra et al 2012) there are methods that perform analyses

withinbreedoracrossbreedsthatcomparealleleorhaplotypefrequenciesand

size that detect recent or ancient selection events either still segregating or

fixatedintopopulations(Lenstraetal2012OrsquoBrienetal2014Utsunomiyaet

al2013)Differentmethodshavedifferentsensitivityandrobustnesseg they

maybeinfluencedatadifferentextentbymarkerascertainmentbiasanduneven

distributionofrecombinationhotspotsalongthegenome

13

Extended haplotype homozygosity (EHH) is a method that has been used in

many animal species including cattle (Pan et al 2013 Qanbari et al 2010

Zhangetal2006)TheEHHmethodwasfirstdevelopedbySabetietal(2002)

for human population analysis The main objective of this methodology is to

identifylongrangehaplotypesinheritedwithoutrecombinationeventsMorein

detailunderaneutralevolutionmodelchangesinallelefrequenciesaredriven

onlybygeneticdriftInthisscenarioanewvariantwouldrequirealongtimeto

reach high frequency in the population and the Linkage Disequilibrium (LD)

surroundingitwoulddecayduetorecombinationevents(Kimura1984)Inthe

caseofpositiveselectionaquickriseinfrequencyofabeneficialmutationina

relatively short time will preserve the original haplotype structure as the

number of recombination events would be limited Therefore a signature of

positiveselectionisdefinedasaregionwithanalleleinlongrangeLDlocatedin

anuncommonlyhighfrequencyhaplotype

One of themost interesting features of thismethod is that it is able to detect

genomic regions that are under recent selection These recent selection

signatures can be therefore used to identify the genomic regions involved in

breedformationInadditionEHHmethoddoesnotrequireanydefinitionofan

ancestralallele(asneededintheintegratedhaplotypescoreiHS(Voightetal

2006) and is suited to be applied to SNP data because it is less sensitive to

ascertainmentbiasthanothermethods(Nielsenetal2007)HowevertheEHH

methodgeneratesa largenumberof falsepositiveandnegative resultsdue to

heterogeneous recombination rates along the genome (Qanbari et al 2010)

Moreover another drawback not specific of EHH but shared by all selection

signaturemethod is the lack of robust inferences able to discern true signals

fromthosegeneratedbychance(Kemperetal2014)

To partially account for these limitations Sabeti et al (2002) developed the

relative extended haplotype homozygosity (rEHH) including an empirical

methodologytoobtainsignificantsignalsTherEHHofaputativecorehaplotype

compares its original EHH valuewith that of other haplotypes at that specific

locus using all other haplotypes as control for local variation in the

recombination rate Therefore it identifies genomic regions carrying variants

14

under selection that are still segregating in the population analysed Although

thesemethodswere conceived for human populations theywere successfully

appliedto livestockspeciessuchaspig(Maetal2012)andcattle(Qanbariet

al2010)

After domestication about 10000 years ago in the fertile crescent taurine

cattle have colonized Europe and Africa and have been selected for different

human needs (Taberlet et al 2011) Particularly in the last centuries the

anthropic pressure led to the formation of hundreds of specialized breeds

adapted to different environmental conditions and linked to local tradition

constitutingagenepooldeservingattentionforconservation(AjmoneMarsanet

al2010)Someofthesebreedshavebeenselectedspecificallyfordairybeefor

bothproductiontypesfollowingstrongartificialselectionforthesetraits(Hayes

etal2009)Thepresentstudyaimsat identifyingsignalsof recentdirectional

selectionusingrEHHmethodindairyandbeefproductiontypesusingdatafrom

five Italian dairy beef and dual purpose cattle breeds We focused on the

significant core haplotypes scanned by rEHH that are shared among breeds of

thesameproductiontypeThenweusedabioinformaticsapproachto identify

positionalcandidategeneslyingwithinthegenomicregionsunderselectionand

investigatetheirbiologicalrole

Methods

SampleAnimalsandGenotyping

A total of 4311 bulls of five Italian dairy beef anddual purpose breedswere

genotyped with the Illumina BovineSNP50 BeadChip v1 (Illumina San Diego

CA)joiningthegenotypingeffortsoftwoItalianprojects(namelyldquoSelMolrdquoand

ldquoProzoordquo)Thedatasetincluded101replicatesand773siresonspairsusedfor

downstreamqualitycheckofdataproducedIndetailgenotypesof2954dairy

(2179 ItalianHolsteinand775 ItalianBrown)864beef (485Marchigianaand

379 Piedmontese) and 493 dual purpose (Italian Simmental) bulls were

availableDataqualitycontrol(QC)wasperformedintwostagesafirststageon

15

animals independently for each breed applying the same methods and

thresholdsandasecondstageappliedonmarkersacrossall theindividualsof

thedatasetThefirststageexcludedindividualswithunexpectedlyhigh(gt02)

Mendelian errors for fatherson couples andwith low call rates (lt95)The

secondstageexcludedi)SNPswithle25missingvaluesinthewholedataset

orcompletelymissing inonebreed ii)SNPswithminorallele frequencyle5

and iii) SNPs in sexual chromosomes and with unassigned chromosome or

physicalposition

EstimationofrEHH

HaplotypeswereobtainedusingfastPHASEsoftwareconsideringdefaultoptions

(ScheetandStephens2006)andwererunbreedwiseandchromosomewisein

eachbreedPedigreeinformationforallbullswereprovidedbytheirrespective

breederassociations andwereused to filteroutdirect relatives (in fatherson

pairsthesonwasmaintainedinthedatasetandthefatherdiscarded)andover

representedfamilies(amaximumof5randomlychosenindividualsperhalfsib

familywasallowed)Thefinaldatasetcontainingtheseldquolessrelatedrdquoanimalswill

behenceforthcalled ldquononredundantrdquodatasetThisnonredundantdatasetwas

used also to calculate within breed pairwise wholegenome Linkage

Disequilibrium(LD)Ther2statisticforallpairsofmarkerswasobtainedusing

PLINK software v107 (Purcell et al 2007)Thedecayof LDwas assessedby

averagingr2valuesupto1Mbdistance

To test if the population structure influenced the detection of rEHH we

replicated all analyses on the whole Italian Holstein dataset without any

population correction (ie ldquoredundantrdquo dataset) focusing on genes or gene

clustersthatarewellknowntobeunderrecentselectionincattle(ieldquocandidate

regionsrdquo) In particularwe focused on the casein polled gene cluster and coat

colourgenes(MC1RandKIT(Kemperetal2014Qanbarietal2010))

The EHH and rEHH calculations were performed using Sweep v11 software

(Sabetietal2002)Somedefaultprogramsettingshadtobemodifiedtoadapt

theseanalyses to thebovinegenomeas thesoftwarewasoriginallydeveloped

forhumangeneticanalysesSpecificallylocalrecombinationratesbetweenSNPs

16

wereapproximatedto1cMperMbEHHandrEHHcalculationswereperformed

breedwise and chromosomewise using an automatic core selection with

defaultoptions(ieconsideringlongestnonoverlappingcoresandlimitingcores

toatleast3andnomorethan20SNPs)asinQanbarietal(2010)AlthoughEHH

and rEHH values were obtained for all core haplotypes only those with

frequency ge 25 were retained for further analyses The (empirical) rEHH

significancethresholdwasobtainedbydividingtherEHHvaluesinto20binsof

5 frequency range each logtransforming withinbin values to achieve

normality Core haplotypes with pvalues lt 005 were considered significant

DetectionofcommonselectionsignatureswasobtainedcomparingrEHHvalues

acrossgenomicregionsacrossbreeds

Breedgroupingaccordingtoproductiontype

Toexamine thesignals common toeachof the twoproduction types (iedairy

andbeef) corehaplotypes thatsharedoneormoreSNP inat least twobreeds

from the same production type were selected The dual purpose Italian

Simmentalwas included inbothdairy (ItalianHolsteinand ItalianBrown)and

beef groups (Piedmontese and Marchigiana) since potentially it possesses

haplotypes selected for both production types All downstream analyses were

performedseparatelyforthedairyandthebeefproductiontypes

Detectionandannotationofcandidategenes

Genome annotation was performed using Biomart

(httpwwwbiomartorgindexhtml) a comprehensive source of gene

annotationformanyspeciesprovidedbyEnsembl(wwwensemblorg)Thelist

ofsharedhaplotypesregions(inbp)fordairyandbeefwasusedasinputfilein

theBiomartwebinterfaceThesetofgenesretrievedbyBiomartwasthenused

as input for theCanonicalpathwayandregulatorynetworkanalyses Ingenuity

PathwayAnalysis toolversion80(IPA IngenuityregSystems IncRedwoodCity

CA httpwwwingenuitycom) and a substantial examination of published

literaturewasusedtoexaminethefunctionalrelationshipsamongtheresulting

genes The IPA operates with a proprietary knowledge database providing

complementarypathwayanalysisforseveralspeciesincludingcattleIntheIPA

17

analyses the significance of each biological function identified was estimated

using Fisherrsquos exact test A Benjamini and Hochberg correction for multiple

testingwas calculated for function and canonical pathways For eachNetwork

IPAcomputedascore(pscore=log10(pvalue))fittingthesetstudiedgenesand

a list of biological functionspresent in the IPAknowledgedatabase The score

considered thenumberofgenes in thenetworkand thesizeof thenetwork to

estimatehowrelevantthisnetworkwasaccordingtothelistofgenesprovided

Thenthescorewasusedtorankthenetworks

Results

DatasetQC

Therepeatabilityobtainedfromthe101replicatespresentinthewholedataset

washigherthan998AfterthetwoQCstages105individualsand9730SNPs

were discarded Moreover after phasing 1292 individuals were removed to

reducethe largenumberofsibfamiliespresent intheredundantdatasetAfter

quality control procedures the final dataset had 44271 SNPs and 1132 514

393 410 and 364 individuals from Italian Holstein Italian Brown Italian

SimmentalMarchigianaandPiedmontesebullsrespectively(Table1)

18

Table1JNumberofanimalsgenotypedbeforeandafterQCTotalnumberofanimalgenotypedandrelativenumberofanimaldiscardedafterQCanalysis

Detectionofselectionsignatures

Sweepv11 softwaredetectedrEHHsignalswitha frequencyhigher than25overlapping the test candidate regions in both datasets corrected or not forpopulation structure In the redundant dataset we found only one significantrEHHsignal(caseincluster)whereasinthenonredundantdatasetweidentified5significantrEHHsignalsoverlappingallthetestcandidateregionsconsidered(caseinclusterpolledclusterMC1R)(Table2)

BreedTotal

Genotyped

ED15

misAN

ED1

REPL

ED1

MENDCleaned

HOL 2179 40 31 5 2093

BRW 775 6 16 4 749

SIM 493 6 6 2 479

MAR 485 37 38 410

PIE 379 5 10 364

19

Table2JComparisonofcandidateregionrEHHinnonJredundantandredundantdatasetrEHH overlapping candidate regions in both corrected (nonredundant) and not corrected

(redundant)datasetsforpopulation

Redundant NonRedundant

Candidate

geneBTA1

Closest

SNP(bp)CH2range CH2Frequency

rEHHJ

log(pval)CH2Frequency

rEHHJ

log(pval)

POLLED

Cluster1 1981154 18974181981154 H1079 093037 H1078 167174

MC1R 18 13657912 1331772014007505 H1069 120096 H1069 077167

KIT_BOVIN 6 72821175 7250492172821175 H1054 030030 H1054 033048

Casein

Cluster6 88427760 8835009888452835 H1047 094139 H1046 148133

Casein

Cluster6 88427760 8835009888452835 H2032 002003 H2033 002003

In total5526567847724388and4049corehaplotypeswitha frequency

higher than 25 were detected on Italian Holstein Italian Brown Italian

Simmental Marchigiana and Piedmontese breeds respectively A total of 838

866 740 692 and 613 core haplotypes were found significant outliers (see

Materials and Methods) in the aforementioned breeds Table 3 shows the

distribution of the total and significant core haplotypes per chromosome and

breed

20

Tableamp3amp(ampCoreamphaplotypesampdistributionampDistributionoftotalandsignificantcorehaplotypesperchromosomeandbreed

ampamp ItalianampampHolsteinamp

ItalianampampBrownamp

ItalianampSimmentalamp Marchigianaamp Piedmonteseamp

BTA1amp

Hampgtamp252amp

SignampH3amp

Hampgtamp252amp

SignampH3amp

Hampgtamp252amp

SignampH3amp

Hampgtamp252amp

SignampH3amp

Hampgtamp252amp

SignampH3amp

1amp 389 66 389 68 335 49 310 51 288 46

2amp 312 47 314 52 267 44 257 42 238 33

3amp 292 48 313 62 264 46 236 34 227 34

4amp 268 46 279 50 245 41 237 39 229 39

5amp 223 32 231 29 200 30 183 27 171 22

6amp 291 51 307 40 265 45 252 38 233 42

7amp 249 37 253 46 214 39 195 33 204 27

8amp 268 39 270 38 235 35 223 32 209 31

9amp 227 37 228 41 210 37 165 22 187 30

10amp 219 36 234 34 192 25 196 26 168 25

11amp 248 39 246 34 209 30 205 33 187 25

12amp 162 24 176 18 148 21 148 25 135 22

13amp 205 32 200 19 155 18 178 35 135 19

14amp 175 17 189 25 168 29 143 22 129 19

15amp 180 31 185 29 148 23 139 13 130 19

16amp 176 28 192 29 160 25 149 19 127 15

17amp 170 27 183 29 168 31 135 22 123 16

18amp 133 18 153 19 119 14 108 10 82 12

19amp 137 22 130 23 118 16 99 18 92 15

20amp 184 35 172 33 145 23 124 23 124 17

21amp 145 23 147 30 113 18 110 21 94 13

22amp 140 24 145 19 115 21 112 18 104 13

23amp 106 13 100 16 67 10 64 9 57 8

24amp 129 11 140 20 124 17 97 23 101 16

25amp 91 12 98 11 68 10 77 12 70 6

26amp 125 7 114 14 93 12 96 15 90 16

27amp 91 12 91 15 75 11 84 12 60 10

28amp 88 9 96 10 70 10 66 9 55 11

29amp 103 15 103 13 82 10 72 9 67 12

TOTamp 5526 838 5678 866 4772 740 4388 692 4049 613

1BostaurusAutosomes2Detectedcorehaplotypeswithafrequencyinthebreedhigherthan253Significantcorehaplotypesatplt005

Comparisonamptoamppreviouslyampreportedampdataamp

AnumberofstudieshavesearchedselectionsweepsinHolstein(Qanbarietal

2010) Brown (Qanbari et al 2011) and Simmental (Fan et al 2014) breeds

Since different methods are expected to identify signatures having different

characteristicsthecomparisonwithpreviousstudiesislimitedtothosestudies

using thesamemethodand thesamebreed(s)analysed inourstudyOnlyone

study analysed (German) HolsteinRFriesian cattle with rEHH (Qanbari et al

2010)Qanbarietal(2010)reportedcandidategenesandgeneclustersunder

Marco Milanesi
21

recent positive selection that we compared with our rEHH in dairy breeds

dataset(Table4)

Table4JCorehaplotypesincandidategenesfollowingQanbarietal2011

1BosTaurusChromosome2Corehaplotype

Thenumberofcorehaplotypesfoundinourstudyinthecandidateregionswas

lowerpreviouslyreportedWhenconsideringHolsteinsthetwomostsignificant

candidateregionsinbothstudiesmatch(caseinclusterandthesomatostatinSST

gene)althoughitisimpossibletodetermineifthehaplotypeunderselectionis

the same as this information was not provided in Qanbari et al (2010)

However other genes considered significant in Qanbari et al (with pvalues

ranging from 004 to 010) were not significant in our studyWhen using the

same loose significant threshold (pvalues le 010) ldquosignificantldquo signals were

foundalsoinItalianBrownfortheCaseinCluster(log10pvalue=109)andin

ItalianSimmentalfortheSSTgene(log10pvalue=126)

HOLSTEIN BROWN SIMMENTAL

Candidate

geneBTA1

Closest

SNP

(bp)

CH2rangeCH2

Frequency

rEHH

Hlog(p)CH2range

CH2

Frequency

rEHH

Hlog(p)CH2range

CH2

Frequency

rEHH

Hlog(p)

DGAT1 14 444963236653443936

H1057 011443936763332

H1042 0130007

Casein

Cluster6 88391612

8835009888452835

H1046 1481338832601288452835

H2068 0801098835009888452835

H1044 037022

8835009888452835

H2033 018030

8835009888452835

H3034 022045

GH 19 49652377 GHR 20 33908597

SST 1 813769568128358581376961

H1036 200189 8131845181376961

H1042 126053

8128358581376961

H2029 00630084 8131845181376961

H2042 006027

IGFH1 5 71169823

ABCG2 6 37374911 3731702038256889

H1044 031027

3731702038256889

H1040 029031

Leptin 4 95715500

LPR 3 855692038549710885594551

H1047 0910728549710885794693

H1068 0800638549710885794693

H1063 002005

8549710885594551

H2041 027025

PITH1 1 35756434

22

Signaturessharedbydairyandbybeefbreeds

Significantcorehaplotypeswerealignedacrossbreedsto identifythoseshared

among dairy or beef production types Since breeds can be considered

independentsetsofobservationssharedsignaturesaremorelikelytorepresent

real effects rather than false positives A total of 123 and 142 significant core

haplotypesweresharedbetweenatleasttwodairyorbeefbreedsrespectively

(File S1) Only 82 and 87 of those significant core haplotypes contained genes

considered positional candidates under positive selection In total 22 and

17 of the genomewas under selection for dairy and beef production types

respectivelyTherEHHaveragelengthswere216932and190994bpfordairy

andbeefrespectively

Genesetannotationandnetworkpathwayanalysis

Atotalof244and232annotatedgenesfellwithintheselectedregionsfordairy

andbeefproductiontypesrespectively(Table5)

23

Table5JStatisticsonsharedsignificantcorehaplotypesindairyandbeefbreedsStatisticsonsharedsignificantcorehaplotypesindairyandbeefbreeds

DairyCattle BeefCattle

BTA SelSig

n

Genes1 Sum

SelSignsize

(bp)

Mean

SelSign

size(bp)

SelSi

gn

Genes1 Sum

SelSignsize

(bp)

Mean

SelSign

size(bp)

1 7 11 1521608 138328 7 13 2263213 174093

2 5 9 2216685 246298 8 11 2658549 2416863 2 5 1619336 323867 7 21 3755847 1788504 7 21 5956755 283655 6 11 1761288 1601175 4 17 4506119 265066 5 11 1251127 1137396 2 4 1467738 366934 5 12 5149692 4291417 6 30 4842969 161432 3 28 11889191 4246148 0 0 0 0 4 12 3512215 2926859 4 7 1554863 222123 2 6 1106768 18446110 4 17 8839762 519986 2 3 573138 19104611 6 8 1841802 230225 1 1 100439 10043912 2 8 4244133 530517 1 1 536461 53646113 3 8 1933026 241628 2 8 985376 12317214 0 0 0 0 3 9 692397 7693315 5 9 1415405 157267 2 2 189094 9454716 3 6 1302506 217084 4 6 905719 15095317 3 4 471046 117762 2 9 1551010 17233418 0 0 0 0 3 23 2961718 12877019 3 23 5744143 249745 2 2 116938 5846920 1 2 148886 74443 2 4 627971 15699321 3 7 1930206 275744 1 5 1150760 23015222 3 5 620674 124135 3 10 2411751 24117523 0 0 0 0 1 1 68421 6842124 2 18 9166004 509222 2 4 803770 20094225 2 11 2016602 183327 1 3 329199 10973326 2 4 466134 116534 4 7 776080 11086927 1 1 631792 631792 2 3 1004717 33490628 0 0 0 0 1 1 93464 9346429 2 9 935209 103912 1 5 798300 159660TOT 82 244 65393403 216932 87 232 50024613 190994

1GenesidentifiedGenesintheregionssharedbyallthreedairybreeds(8genes)orallthreebeefbreeds(11genes)foramoredetailedanalysiswereselected(Fig1)

24

Figure1JGenomiclocationoftheselectionsignaturessharedamongstudiedbreedsa)GenesinensembletracksaredisplayedasaredboxcorehaplotypesandSNPsarecolouredinorange(MarchigianaMAR)inviolet(PiedmontesePIE)andpink(SimmentalSIM)b)Genesinensemble tracks are displayed as a red box core haplotypes and SNPs are coloured in blue(HolsteinHOL)ingreen(ItalianBrownBRW)andpink(SimmentalSIM)

All the genes identified were submitted to downstream pathwaynetwork

analyses(Table5Figure1FileS1)Interestinglygenesintheregionssharedby

allthreedairyorbeefbreedsweremanifestedinthemostlikelynetworksforthe

two production aptitudes The IPA analysis detected a total of 12 and 15

networksindairyandbeefbreedsrespectively

Indairybreeds network scores ranged from46 to2 (File S2) Eightnetworks

obtainedscoreshigherthan20andincludedmorethan14moleculeseachThey

wereassociatedtoi)geneexpressionRNAdamageandrepairproteinsynthesis

andposttranslationalmodificationii)cellassemblyorganizationfunctionand

maintenance iii) cell cycle proliferation and cancer iv) carbohydrate

metabolism v) gastrointestinal and neurological disease developmental and

endocrine system disorders A total of 23moleculeswere associatedwith the

highest ranked network (score = 46) In this the immunoglobulins mitogen

25

activatedproteinkinasesandNFkBcomplexesformacentralhublinkinggenes

involved in celltocell signaling and interaction hematological system

developmentandfunctionandimmunecelltraffickingfunctions(Figure2)

Figure2JThemostlikelyIPAnetworkindairycattlebreedsThefunctionsassociatedtothenetworkarecelltocellsignallingandinteractionhematologicalsystem development and function immune cell trafficking Nodes in red correspond to genesidentifiedincorehaplotypesoverlappinginallthethreebreedsofeachproductiontypewhereasthoseingreendepictoverlappingcorehaplotypesinatleasttwoofthosebreeds

Breast cancer antiNestrogen resistance gene 3 (BARC3) andpituitary glutaminyl

cyclase gene (QPCT) genes were common to all three dairy breeds and are

directlyconnectedwithmammaryglandmetabolisms(Ezuraetal2004GSun

et al 2012) The solute carrier family 2 member 5 (SLC2A5) gene facilitates

glucosefructose transport (Zhao et al 1998) and the zetaNchain (TCR)

26

associated protein kinase 70kDa (ZAP70) gene plays a critical role in Tcell

signaling (Bonnefont et al 2011) Calpain is another important complex that

togetherwithcalpainN3(CAPN3)mediatesepithelialcelldeathduringmammary

gland involution (Wilde et al 1997) Furthermore RAS guanyl nucleotide

releasingprotein(RASGRP1)activatestheErkMAPkinasecascaderegulatesT

cellsandBcellsdevelopmenthomeostasisanddifferentiationandisinvolvedin

the regulation of breast cancer cell (Madani et al 2004 Yasuda et al 2007

Wickramasingheetal2009)Genesinvolvedintheothernetworksarelistedin

FileS2

The scoreof thebeef cattlebreedsnetwork rangedbetween46and2 and the

numberofmoleculespresentineachnetworkrangedfrom23to1(FileS3)Asin

dairy breeds eight networks obtained a score higher than 20 and 13 ormore

molecules each These networkswere associated to i) carbohydrate lipid and

nucleic acidsmetabolism energyproduction and smallmoleculebiochemistry

ii) cardiovascular system development and function iii) cellular and tissue

developmentcellassemblyandorganizationfunctionandmaintenanceandcell

and organmorphology iv) cell cycle celltocell signaling and interaction v)

DNA replication recombination and repair and RNA posttranscriptional

modification vi) dermatological diseases and conditions infectious disease

organismal injury and abnormalities In the network with the highest score

(score=46)chondroitinsulfateproteoglycan4(CSPG4)andsnurportinN1(SNUPN)

genes were shared among all beef cattle breeds investigated CSPG4 gene is

relatedtomeattendernesswhileSNUPN isanimprintedgeneimportantinthe

embryodevelopmentandinvolvedinhumanmuscleatrophy(Narayananetal

2002) (Figure 3) All the genes included in this network were involved in

carbohydratemetabolismsmallmoleculesbiochemistryandcellcycleTheother

networkswere related to important signaling andmetabolic process essential

foranimalgrowth(FileS3)

27

Figure3JThemostlikelyIPAnetworkinbeefcattlebreedsThe functions associated to the network are carbohydrate metabolism small molecule

biochemistry and cell cycle Nodes in red correspond to genes identified in core haplotypes

overlapping in all the three breeds of each production type whereas those in green depict

overlappingcorehaplotypesinatleasttwoofthosebreeds

A total of 6 and 9 statistically significant Canonical Pathways (FDR le 005

log10(FDR) ge 13) were identified using IPA for dairy and beef breeds

respectively(Figure4)

28

Figure4JBarplotofstatisticallysignificantCanonicalPathwaysPvalues were corrected for multiple testing using BenjaminiHochberg method and arepresented in thegraphas log(pvalue)Thebar represents thepercentageof genes in a givenpathway that meet cutoff criteria within the total number of molecules that belong to thefunctiona)BarplotofstatisticallysignificantCanonicalPathways indairycattlebreedsb)BarplotofstatisticallysignificantCanonicalPathwaysinbeefcattlebreeds

Themost significant Canonical Pathway in dairywas for purinemetabolism (

log10(FDR) = 26) supporting the highly synthetic processes in the mammary

epithelium(Figure5)

29

Figure5JGenesdetectedunderrecentpositiveselectionindairycattleinvolvedinpurinemetabolismcanonicalpathwayNodes in red correspond to genes identified in core haplotypes overlapping in all the threebreedsofeachproductiontypewhereasthoseingreendepictoverlappingcorehaplotypesinatleasttwoofthosebreeds

Inbeefproductiontheephrinreceptorsignal(log10(FDR)=27)wasthemost

significant Canonical Pathway (Figure 6) Among other functions ephrin is

knowntopromotemuscleprogenitorcellmigrationbeforemitoticactivation(Li

andJohnson2013)AlltheotherCanonicalPathwaysarereportedinFileS4

30

Figure6JGenesdetectedunderrecentpositiveselectioninbeefcattleinvolvedinEphrinmetabolismcanonicalpathwayNodes in red correspond to genes identified in core haplotypes overlapping in all the threebreedsofeachproductiontypewhereasthoseingreendepictoverlappingcorehaplotypesinatleasttwoofthosebreeds

Discussion

In this work more than 4000 genotyped bulls of five dairy beef and dualpurposeItalianbreedswerescreenedsearchingforselectionsignaturesindairyand beef production types A strict data quality controlwas applied to reducepossiblesourcesofbiassuchastechnicalgenotypingproblemsandpopulationstructurethatcouldhavelargeimpactonrEHHresultsInfactsincetheimpactofpopulationstructureonrEHHvaluesisstillunexploredwereplicatedpartofouranalyseswithandwithoutcloserelativesinanattempttostudytheeffectofpopulationstructureonthisselectionsignaturemethodInprinciplepopulationstratificationshouldleadtoanoverrepresentationofcertainhaplotypessimplyduetothepresenceoftightpedigreelinks(egsirespassinghalfoftheirgeneticmaterialtotheirsons)ratherthanactualselectionprocessesForthisreasonfor

31

the full analyses all siresons couples were removed after haplotype phasing(retaining only the younger animal) and halfsib familieswere restricted to amaximum of five randomly chosen individuals in an attempt to reduce overrepresentation This assessment was restricted to four regions known to beunderselectioninItalianHolsteinasthecaseinclusterthepolledgeneclusterMC1R and KIT Italian Holstein was selected for two reasons i) it is a highlystructuredbreedaccordingtoourdata(abouthalfofindividualsremovedwereclose relatives) ii) allowed us to compare our results with a previous studyAlthoughtheanalysesconductedonbothredundantandnonredundantdatasetswereable to identify rEHHsignals in these regions thenonredundantdatasetproduced five significant rEHH signals rather than just one in the redundantdataset(Table2)Thisresulthighlightstheconfoundingeffectofthepresenceofcloserelativesinthedatasetandconsequentlytheimprovedabilitytodetectedsignificant selection signature while correcting for population structureInterestingly our results only partially overlap those found by Quanbari et al(2010) These authors detected significant signals at the casein cluster andsomatostatinSSTaswedid but also at ahighernumberof candidate regionsThese inconsistencies may be the effect of different sires included in theanalyses of the different dataset size and to the close relative trimmingprocedure we adopted to decrease the effect of population structure andconsequent bias (Table 4) However poor correlation across investigations isoftenobservedalsointhedeeplyinvestigatedhumanspeciesThisisduetotheuse of different within and between breed statistics that identify selectionsignatureshavingdifferent characteristics (ancientrecent segregatingfixatedunder directionalbalancing selection) to the likely high rate of falsepositivesnegativesresultsandalsotothedifferentconsiderationofpopulationstructureandbackgroundselection(Enardetal2014)The number of total and significant core haplotypes identified by the SweepsoftwarewasthehighestinItalianBrownandthelowestinPiedmontesebreedSinceEHHisheavilybasedonpopulationLDtheaverageLDleveloverdistancewascalculatedineachbreed(Figure7)

32

Figure7JMultiJbreedaveragelinkagedisequilibriumagainstphysicaldistance(inkb)Marchigiana Piedmontese and Italian Simmental breeds present a lower persistence of LD allargerdistancethanItalianHolsteinandItalianBrownbreedsThisanalysishighlightedageneralpositivecorrelationbetweenthe levelofLDover distance and the number of total and significant core haplotypes foundHoweverconsideringthatrEHHisarelativemeasure it ismorelikelythatthehighernumberofsignificantcorehaplotypesidentifiedindairybreedswasduemostlytoahigherselectivepressureindairycomparedtobeefbreedsSignificance tests used in detecting selection signatures should measure theprobability of a statistic to be anoutlier compared to its expecteddistributionunder a neutral model However no reliable neutral model has so far beendevelopedincattlebecauseofthecomplexityofthedemographichistoryofthisspecies (Kemper et al 2014) As a consequence empiric rather than modelbasedsignificancetestaregenerallyusedinthedetectionofselectionsignaturesAccordinglyweconsideredasoutliersvaluesas thosesimply falling in the5plusvarianttailoftherEHHdistributionWekeptthewithinbreedsignificancethresholdloosenoncorrectingitformultipletestingbutconsideredonlysignals

33

eitherexceedingthisthresholdbyatleasttwoordersofmagnitudeorsharedby

twoormoreindependentbreeds

Parallel comparison of results from independent analyses of different breeds

allowed the achievement of a double objective i) to identify putative regions

under(recent)selectioninbreedsbelongingtotwoproductiontypeswhichwas

the main objective of this study and ii) reducing false positives since multi

breed analyses served as internal control Since the rEHH method does not

considerphenotypicinformationasignificantsignalmightarisebecausei)the

core haplotype is actually under a selective pressure ii) the result is a false

positive ie causedby chance population structureor anyotherdriving force

However even considering an unrealistic scenario with no false positives a

proportionofthetotalamountofsignalswouldactuallybeselectionsignatures

due to a selectionpressurenot ondairy or beef traits This becausedairy and

beefbreedsinvestigatedareonlyafewandshareanumberoftraitsnotdirectly

related to dairybeef production Even considering this limitation to our

knowledge this is the first dairy and beef multibreed study applying this

strategytoreducefalsepositivesatthecostofpossiblelossofinformationdue

tohighfalsenegativeratesSignificantsignalsharedbydairyandbybeefbreeds

wereused fordownstreamnetworkanalysesonpositional candidategenes to

search forabiological justification towhatwasobtainedstudying thegenomic

information Only the top network for dairy and the top network for beef

productionarediscussedindetail

Dairybreeds

ThemostsignificantnetworkdetectedindairybreedsisassociatedwithCellTo

CellSignalingandInteractionHematologicalSystemDevelopmentandFunction

andImmuneCellTraffickingfunctions(Figure2)ImmunoglobulinandNFkBact

aspivotalgeneswithinthisnetworkInterestinglyBARC3andQPCTgeneswere

sharedamongallthethreedairybreedsBothgeneshavenotyetbeenstudiedin

cattlealthoughinhumanstudiesthesegenesarewellknowntobelinkedtothe

mammaryglandmetobolismandthecalciumregulationTheformerisinvolved

in integrinmediated cell adhesions and signaling which is required for

mammary gland development and functions (G Sun et al 2012) The latter is

34

associatedwithlowradialbodymineraldensity(BMD)inadultwomen(Ezuraet

al2004)AnotherimportantplayeronthisnetworkisSLC2A5genealsoknown

asGLUT5 SLC2A5 gene acts as fructose transporter in the intestine and has a

significantroleinenergybalanceofdairycows(Burantetal1992)Findingthis

gene in adairy cattle specificnetwork seems surprising since there shouldbe

little need for transporting glucose andor fructose in the ruminant intestinal

tractassimplecarbohydratesaredegradedintovolatilefattyacids(VFA)inthe

rumen (Iqbal et al 2009) However it is often the case that large amount of

starchbypasstherumenincowsfedwithdietsrichincerealgrains(Zhaoetal

1998)Thisbypassedstarchneedstobedigestedinthesmallintestinetoavoid

highlevelsofglucoseconcentrationinthelargeintestine

WedetectedcalpaincomplexandcalpainN3(CAPN3)genesunderpositiverecent

selection in the dairy group as reported also by Utsunomiya et al (2013)

AltoughCalpainisknowntobeinvolvedinpostmortemmeattendernizationitis

also related to dairy metabolism as the muscle breakdown promoted by the

calpain gene provides an energy source for milk production especially at the

beginningoflactation(Kuhlaetal2011)InadditionasreportedbyWildeetal

(1997) calpainNcalpastatin system is related to programmed cell death of

alveolar secretory epithelial cells during lactation Zap70 gene is a cytoplasmic

proteintyrosinekinaseItisrelatedtotheimmunesystemasacomponentofthe

Tcell receptor playing a central role in Tcell responses (Wang et al 2010)

Bonnefontet al (2011) found thatZap70 genewasup regulated (FoldChange

(FC)=82FDR=277E12)onmilksomaticcellsofsheepinfectedbySaureus

andSepidermidisshowinganassociationwithmastitisresistance

Purinemetabolismwasthemostsignificantcanonicalpathwayindairybreeds

In a study conducted in human mammary epthelium Maningat et al (2008)

conductedageneexpressionanalysisonbreastmilkfatglobulesidentifyingthe

PurinemetabolismasthemostsignificantpathwaySynthesisandbreakdownof

purine is essential in the metabolism of the tissues of many organisms

particularlyofthemammaryglandduringlactation

Another interesting canonical pathway was endothelin signaling (Figure 5)

endothelin functions as vasocontrictor and is secreted by endothelial cells

35

(Nussdorferetal1999)Acostaetal(1999)reportedthatincattleendothelinsareinvolvedinthefollicularproductionofprostaglandinsandtheregulationofsteroidogenesis in the mature follicle In a recent study Puglisi et al (2013)confirmed the implication of endothelin in particular EDNRA and endothelinconvertin enzyme 1 in reproductive disorder in bovine They classified theEDNRAaspotentialbiomarkerforfertilityincow

Beefbreeds

Inbeefcattlebreedsthefocaladhesionkinase(FAK)togetherwithintegrinandERK12genesplayacentralroleashubsinthemostsignificantnetwork(Figure3)Theyareinvolvedincellularadhesionandcellmotilityprocessesparticularlyduringskeletalmyogenesis(Nguyenetal2014)Moreovertheoverexpressionof FAK can determine a reduced apoptosis and an increase risk of metastaticcancers(MehlenandPuisieux2006)Thesehubsarenotdirectlyinvolvedinourstudybut sharemany interactors thatare located in region found tobeunderselction such as CSPG4 RB1CC1 MGAT3 CIRPB CSPG4 gene belongs tochondroitin sulfate proteoglycans (CSPGs) family CSPGs are proteoglycansconsistingofaproteincoreandachondroitinsulfatesidechainTheyareknownto be structural components of a variety of tissues includingmuscle andplaykey roles inneuraldevelopmentandglial scar formationTheyare involved incellular processes such as cell adhesion cell growth receptor binding cellmigrationandinteractionswithotherextracellularmatrixconstituents

Manystudiesreporttheroleofproteoglycansinmeattextureofseveralmusclesfromthebovinespecies(Nishimuraetal1996)Dubostetal(2013)highlightadirect involvement of proteoglycans with respect to cooked meat juicinessRB1CC1geneplaysacrucialroleinmusculardifferentiationanditsactivationisa essential for myogenic differentiation (Watanabe et al 2005 p 1)Monoacylglycerol acyltransferase (MGAT3) gene cause the synthesis ofdiacylglycerol (DAG)using2monoacylglyceroland fattyacyl coenzymeAThisenzymaticreactionis fundamental fortheabsorptionofdietaryfat inthesmallintestineSunetal(2012p3)reportedinastudyonfiveChinesecattlebreedsthatMGAT3gene isassociatedwithgrowthtraitsCIRPBgenemaybepartofacompensatorymechanism inmuscles undergoing atrophy It preservesmuscle

36

tissuemassduringcoldshockresponsesaginganddisuse(DupontVersteegden

et al 2008) SNUPN is an imprinted gene and is expressed monoallelically

dependingonitsparentaloriginSNUPNplayimportantrolesinembryosurvival

andpostnatal growth regulation (Smithet al 2012Wanget al 2012)Ephrin

receptorsignalingwasthetopcanonicalpathwayproducebyIPAandhasalsoan

interestingbiologicalinterpretationformeatproduction(Figure6)Indeedthis

pathaway is important for the muscle tissue growth and regeneration

participating in correct positioning and formation of the neural muscular

junction(LiandJohnson2013)

Conclusions

In this study we analysed the selection signatures at genomewide level in 5

Italian cattle breeds Then we used a multibreed approach to identify the

genomicregionssharedamongcattlebreedsselectedfordairyorbeefpurpose

Thisapproachincreasedthepowertopinpointregionsofthegenomethatplay

important roles in economically relevant traits in both dairy and beef cattle

Moreover network and pathways analyses were used to display a detailed

functional characterization of genes mapping in the regions detected under

recentpositiveselection

Particularly in dairy cattle genes under directional selection are related to

feeding adaptation (increasing level of starch in the diet) mammary gland

metabolismandresistencetomastitisThebiologicalfeaturesunderselectionin

beef cattle breeds are involved in animal growth meat texture and juiceness

These results have a logical biological explanation However the frequent

approach of interpreting selection signatures on the basis of the function of

geneslocatednearbythepeaksignalofthestatisticusedispresentlychallenged

Novelinformationinhumanssuggeststhatmostselectedvariationisnotlocated

within genes and coding regions but in regulatory sites identified by the

ENCODEproject(Enardetal2014)Thesemaycontroltheexpressionofentire

genomicregionsorofgeneslocatedatarelavantdistancefromtheselectedsite

makingbiologicalinterpretationmorecomplex

37

InanycasefurtherstudiesusingdenserSNPchipsorwholegenomesequencing

producinginformationnotsubjectedtoascertainmentbias(Qanbarietal2014)

may increase the resolution of our analysis and togetherwith new knowledge

arisingonthecontrolofgeneexpressionvalidateorcorrectourresults

Acknowledgements

ANAFI (Italian Association of Holstein Friesian Breeders) ANAPRI (Italian

Association Simmental Breeders) ANABORAPI (National Association of

PiedmonteseBreeders)ANABIC(AssociationofItalianBeefCattleBreeders)and

ANARB (Italian Brown Cattle Breeders Association) are acknowledged for

providing the biological samples and the necessary information for this work

The Authors are also grateful to SelMol project (MiPAAF Ministero delle

PoliticheAgricoleAlimentarieFortestali)andbytheProZooproject(Regione

LombardiandashDGAgricoltura Fondazione Cariplo Fondazione Banca Popolare di

Lodi)forfunding

Thefundershadnoroleinstudydesigndatacollectionandanalysisdecisionto

publishorpreparationofthemanuscript

References

Acosta TJ Berisha B Ozawa T Sato K Schams D Miyamoto A 1999

Evidence for a Local EndothelinAngiotensinAtrial Natriuretic Peptide

SysteminBovineMature Follicles In Vitro Effects on SteroidHormones

and Prostaglandin Secretion Biol Reprod 61 1419ndash1425

doi101095biolreprod6161419

AjmoneMarsan P Garcia JF Lenstra JA 2010 On the origin of cattle how

aurochs became cattle and colonized theworld Evol Anthropol Issues

NewsRev19148ndash157

38

BonnefontCToufeerMCaubetCFoulonETascaCAurelMRBergonier

D Boullier S RobertGranieacute C Foucras G 2011 Transcriptomic

analysisofmilksomaticcells inmastitisresistantandsusceptiblesheep

upon challenge with Staphylococcus epidermidis and Staphylococcus

aureusBMCGenomics12208

BurantCFTakedaJBrotLarocheEBellGIDavidsonNO1992Fructose

transporter inhumanspermatozoaandsmall intestine isGLUT5 JBiol

Chem26714523ndash14526

DubostAMicolDPicardBLethiasCAnduezaDBauchartDListratA

2013Structuralandbiochemicalcharacteristicsofbovineintramuscular

connective tissue and beef quality Meat Sci 95 555ndash561

doi101016jmeatsci201305040

DupontVersteegden EE Nagarajan R Beggs ML Bearden ED Simpson

PMPetersonCA2008IdentificationofcoldshockproteinRBM3asa

possible regulator of skeletal muscle size through expression profiling

AJP Regul Integr Comp Physiol 295 R1263ndashR1273

doi101152ajpregu904552008

Enard D Messer PW Petrov DA 2014 Genomewide signals of positive

selection in human evolution Genome Res gr164822113

doi101101gr164822113

EzuraYKajitaMIshidaRYoshidaSYoshidaHSuzukiTHosoiTInoue

SShirakiMOrimoHEmiM2004Associationofmultiplenucleotide

variationsinthepituitaryglutaminylcyclasegene(QPCT)withlowradial

BMDinadultwomenJBoneMinRes191296ndash301

Fan H Wu Y Qi X Zhang J Li J Gao X Zhang L Li J Gao H 2014

Genomewide detection of selective signatures in Simmental cattle J

ApplGenet1ndash9doi101007s1335301402006

HayesBJChamberlainAJMaceachernSSavinKMcPartlanHMacLeodI

SethuramanLGoddardME2009Agenomemapofdivergentartificial

selectionbetweenBostaurusdairycattleandBostaurusbeefcattleAnim

Genet40176ndash184doi101111j13652052200801815x

Hayes BJ Lien S Nilsen H Olsen HG Berg P Maceachern S Potter S

Meuwissen THE 2008 The origin of selection signatures on bovine

39

chromosome 6 Anim Genet 39 105ndash111 doi101111j1365

2052200701683x

IqbalSZebeliQMazzolariABertoniGDunnSMYangWZAmetajBN

2009 Feeding barley grain steeped in lactic acid modulates rumen

fermentation patterns and increases milk fat content in dairy cows J

DairySci926023ndash6032doi103168jds20092380

KemperKESaxtonSJBolormaaSHayesBJGoddardME2014Selection

for complex traits leaves littleornoclassic signaturesof selectionBMC

Genomics15246doi1011861471216415246

KimuraM1984Theneutraltheoryofmolecularevolution

KuhlaBNuumlrnbergGAlbrechtDGoumlrsSHammonHMMetgesCC2011InvolvementofSkeletalMuscleProteinGlycogenandFatMetabolismin

the Adaptation on Early Lactation of Dairy Cows J Proteome Res 10

4252ndash4262doi101021pr200425h

Lenstra JAGroeneveld LF EdingHKantanen JWilliams JL Taberlet P

NicolazziELSolknerJSimianerHCianiEGarciaJFBrufordMW

AjmoneMarsan P Weigend S 2012 Molecular tools and analytical

approaches for the characterization of farm animal genetic diversity

AnimGenet43483ndash502

Li J Johnson SE 2013 EphrinA5 promotes bovine muscle progenitor cell

migration before mitotic activation J Anim Sci 91 1086ndash1093

doi102527jas20125728

Ma YL Zhang Q Ding XD 2012 [Detecting selection signatures on X

chromosome in pig through high density SNPs] Yi Chuan Hered

ZhongguoYiChuanXueHuiBianJi341251ndash1260

Madani S Hichami A CharkaouiMalki M Khan NA 2004 Diacylglycerols

Containing Omega 3 and Omega 6 Fatty Acids Bind to RasGRP and

Modulate MAP Kinase Activation J Biol Chem 279 1176ndash1183

doi101074jbcM306252200

Maningat PD Sen P Rijnkels M Sunehag AL Hadsell DL Bray M

Haymond MW 2008 Gene expression in the human mammary

epitheliumduring lactation themilk fat globule transcriptome Physiol

Genomics3712ndash22doi101152physiolgenomics903412008

40

Mehlen P Puisieux A 2006Metastasis a question of life or deathNat Rev

Cancer6449ndash458doi101038nrc1886

NarayananUOspinaJKFreyMRHebertMDMateraAG2002SMNthe

Spinal Muscular Atrophy Protein Forms a PreImport Snrnp Complex

withSnurportin1andImportin HumMolGenet111785ndash1795

Nguyen N Yi JS Park H Lee JS Ko YG 2014 Mitsugumin 53 (MG53)

LigaseUbiquitinatesFocalAdhesionKinaseduringSkeletalMyogenesisJ

BiolChem2893209ndash3216doi101074jbcM113525154

NielsenRHellmannIHubiszMBustamanteCClarkAG2007Recentand

ongoing selection in the human genome Nat Rev Genet 8 857ndash868

doi101038nrg2187

NishimuraTHattoriATakahashiK1996Relationshipbetweendegradation

of proteoglycans andweakening of the intramuscular connective tissue

duringpostmortemageingofbeefMeatSci42251ndash260

Nussdorfer GG Rossi GPMalendowicz LKMazzocchi G 1999Autocrine

ParacrineEndothelinSysteminthePhysiologyandPathologyofSteroid

SecretingTissuesPharmacolRev51403ndash438

OrsquoBrienAMPUtsunomiyaYTMeacuteszaacuterosGBickhartDM LiuGE Tassell

CPV Sonstegard TS Silva MVD Garcia JF Soumllkner J 2014

Assessing signatures of selection through variation in linkage

disequilibrium between taurine and indicine cattle Genet Sel Evol 46

19doi101186129796864619

Pan D Zhang S Jiang J Jiang L Zhang Q Liu J 2013 GenomeWide

DetectionofSelectiveSignatureinChineseHolsteinPLoSONE8e60440

doi101371journalpone0060440

PuglisiRCambuliCCapoferriRGianninoLLukajADuchiRLazzariG

Galli C Feligini M Galli A Bongioni G 2013 Differential gene

expressionincumulusoocytecomplexescollectedbyovumpickupfrom

repeat breeder and normally fertile Holstein Friesian heifers Anim

ReprodSci14126ndash33doi101016janireprosci201307003

Purcell S Neale B ToddBrown K Thomas L Ferreira MAR Bender D

MallerJSklarPdeBakkerPIWDalyMJShamPC2007PLINKa

41

tool set for wholegenome association and populationbased linkageanalysesAmJHumGenet81559ndash575doi101086519795

Qanbari S Gianola D Hayes B Schenkel F Miller S Moore S Thaller GSimianer H 2011 Application of site and haplotypefrequency basedapproachesfordetectingselectionsignaturesincattleBMCGenomics12318doi1011861471216412318

Qanbari S Pausch H Jansen S Somel M Strom TM Fries R Nielsen RSimianer H 2014 Classic Selective Sweeps Revealed by MassiveSequencinginCattlePLoSGenet10doi101371journalpgen1004148

Qanbari S Pimentel ECG Tetens J Thaller G Lichtner P Sharifi ARSimianerH2010AgenomewidescanforsignaturesofrecentselectioninHolsteincattleAnimGenetdoi101111j13652052200902016x

Sabeti PC Reich DE Higgins JM Levine HZ Richter DJ Schaffner SFGabriel SB Platko JV Patterson NJ McDonald GJ Ackerman HCCampbellSJAltshulerDCooperRKwiatkowskiDWardRLanderES2002DetectingrecentpositiveselectioninthehumangenomefromhaplotypestructureNature419832ndash7

ScheetPStephensM2006Afastandflexiblestatisticalmodelforlargescalepopulation genotype data applications to inferring missing genotypesandhaplotypicphaseAmJHumGenet78629ndash44

Shahzad K Loor JJ 2012 Application of topdown and bottomup systemsapproaches in ruminantphysiologyandmetabolismCurrGenomics13379

SmithLSuzuki JGoffAFilionFTherrien JMurphyBKohanGhadrHLefebvreRBrisvilleABuczinskiSFecteauGPerecinFMeirellesF2012DevelopmentalandEpigeneticAnomaliesinClonedCattleReprodDomestAnim47107ndash114doi101111j14390531201202063x

Sun G Cheng SYS Chen M Lim CJ Pallen CJ 2012 Protein TyrosinePhosphatase Phosphotyrosyl789 Binds BCAR3 To Position Cas forActivationatIntegrinMediatedFocalAdhesionsMolCellBiol323776ndash3789doi101128MCB0021412

Sun J Zhang C Lan X Lei C ChenH 2012 Exploring polymorphisms andassociationsofthebovineMOGAT3genewithgrowthtraitsGenomeNatl

42

Res Counc Can Geacutenome Cons Natl Rech Can 55 56ndash62doi101139g11077

TaberletPCoissacEPansu J PompanonF 2011Conservationgeneticsofcattle sheep and goats C R Biol On the trail of domesticationsmigrationsandinvasionsinagricultureDominiqueJobGeorgesPelletierJeanClaudePernollet334247ndash254doi101016jcrvi201012007

Utsunomiya YT Perez OrsquoBrien AM Sonstegard TS Van Tassell CP doCarmo AS Meszaros G Solkner J Garcia JF 2013 Detecting lociunder recent positive selection in dairy and beef cattle by combiningdifferentgenomewidescanmethodsPLoSOne8e64280

Voight BF Kudaravalli S Wen X Pritchard JK 2006 A Map of RecentPositive Selection in the Human Genome PLoS Biol 4 e72doi101371journalpbio0040072

WangHKadlecekTAAuYeungBBGoodfellowHESHsuLYFreedmanTSWeissA2010ZAP70AnEssentialKinaseinTcellSignalingColdSpringHarbPerspectBiol2doi101101cshperspecta002279

WangMZhangXKangLJiangCJiangY2012MolecularcharacterizationofporcineNECD SNRPNandUBE3Agenesand imprinting status in theskeletal muscle of neonate pigs Mol Biol Rep 39 9415ndash9422doi101007s1103301218066

WatanabeRChanoTInoueHIsonoTKoiwaiOOkabeH2005Rb1cc1iscritical for myoblast differentiation through Rb1 regulation VirchowsArchIntJPathol447643ndash648doi101007s0042800411831

WickramasingheNSManavalanTTDoughertySMRiggsKALiYKlingeCM 2009 Estradiol downregulates miR21 expression and increasesmiR21targetgeneexpressioninMCF7breastcancercellsNucleicAcidsRes372584ndash2595doi101093nargkp117

WildeCJAddeyCVLiPFernigDG1997Programmedcelldeathinbovinemammary tissue during lactation and involution Exp Physiol 82 943ndash953

YasudaSStevensRLTeradaTTakedaMHashimotoTFukaeJHoritaTKataoka H Atsumi T Koike T 2007 Defective Expression of Ras

43

Guanyl NucleotideReleasing Protein 1 in a Subset of Patients with

SystemicLupusErythematosusJImmunol1794890ndash4900

ZhangCBaileyDKAwadTLiuGXingGCaoMValmeekamVRetiefJ

Matsuzaki H Taub M Seielstad M Kennedy GC 2006 A whole

genome longrange haplotype (WGLRH) test for detecting imprints of

positiveselectioninhumanpopulationsBioinformatics222122ndash8

ZhaoFQOkineEKCheesemanCIShiraziBeecheySPKennellyJJ1998

Glucose transporter gene expression in lactating bovine gastrointestinal

tractJAnimSci762921ndash2929

44

Supplementaryfiles

YoucanfindChapterʹsupplementaryfilesat

- DocTa(DoctoralThesisArchive)(httptesionlineunicattit)

- GoogleDrive(httptinyurlcomPhDMMChapter02)orscantheQRcode

SupplementaryfileS1ndashSignificantcorehaplotypesandgenessharedin

dairyandbeef

Significantcorehaplotypes(pvaluele005haplotypefrequencyge025)shared

indairyandbeefcattlebreedsandrelativegenesintersectingwiththemThefile

isdividedinto4tablesThefirst2sheets(sheet1and2)shownthesignificant

corehaplotypesfordairyandbeefThelast2sheets(sheet3and4)showngenes

intersectingwiththesignificantcorehaplotypesfordairyandbeef

SupplementaryfileS2ndashNetworksrankingindairybreeds

Networksrankingindairybreedswithrelativemoleculessymbolslistscore

valuenumberofmoleculesbelongingtothedataset(focusmolecules)andtop

functionforeachnetwork

SupplementaryfileS3ndashNetworksrankinginbeefbreeds

Networksrankinginbeefbreedswithrelativemoleculessymbolslistscore

valuenumberofmoleculesbelongingtothedataset(focusmolecules)andtop

functionforeachnetwork

45

SupplementaryfileS4ndashCanonicalpathwaysrankingindairyorbeefcattle

breeds

Canonicalpathwaysrankingindairy(sheet1)orbeef(sheet2)cattlebreedswithrelativemoleculessymbolslistratioandndashlog10ofthepvaluesforeachcanonicalpathway

46

CHAPTER3Searchingnewsignalsforproductivetraitsthroughgenebasedassociationanalysisin

threeItaliancattlebreedsSCapomaccio1MMilanesi1etal1IstitutodiZootecnicaUCSCviaEmiliaParmense8429122PiacenzaItalyTheseauthorscontributedequallytothiswork

____________________SubmittedtoAnimalGenetics

Abstract

Genome wide association studies (GWAS) have been widely applied to

disentanglethegeneticbasisofcomplextraits IncattlebreedsclassicalGWAS

approach with medium density markers panels are far to be conclusive

especiallyforcomplextraitsThisisduetotheintrinsiclimitationsofGWASand

to the choices that are made to step from the associationrsquos signals to the

functionalunits

Here we applied a geneHbased association strategy to prioritize genotypeH

phenotype associations found with classical approaches in three Italian dairy

cattlebreedswithdifferentsamplesizes(ItalianHolsteinn=2058ItalianBrown

n=745ItalianSimmentaln=477)formilkproductionandqualitytraits

WhileclassicalsinglemarkerregressionfailedtorevealgenomeHwidesignificant

genotypeHphenotype association except for Italian Holstein the gene based

approach found specific genes in each breed that are associated with milk

physiologyandmammaryglanddevelopment

Since no established method is yet implemented to step from variation to

functional units our strategy may contribute to reveal new genes that play

significant roles in complex traits like thosehere investigated condensing low

signalsofassociationinageneHcentricfashionapproach

Background

Modern cattle breeds have nomore than 200 years of history (Taberlet et al

2008) a time spanwhere selection reproductive isolationand the consequent

geneticdriftshapedallelefrequenciesandlinkagedisequilibriumthroughoutthe

genome This is particularly true for industrial dairy breeds where the joint

application of reproductive technologies and modern genetic evaluation

methodshave imposedahigh selectionpressureona limitednumberof traits

(Taberletetal2008)involvedinmilkquantityandquality

Indeed the milk production industry has been historically ready to acquire

technicalimprovementsfromdifferentfieldsstatisticsbiologyandengineering

50

ForexampleBLUPbasedstrategiesandhighHdensitygenotyping throughSNPHchipsarenowroutinelyincludedinbullevaluationshavingsignificantimpactonselectionefficiencyandpopulationstructureThese techniques have dramatically improved milk yield and compositionwithout knowing the biological basis of the traits under selection To datedespite many efforts and many QTL mapped only a few genes have beeneffectivelyassociatedwithmilkyieldandcomposition(Grisartetal2002Blottetal2003CohenHZinderetal2005)In thiscontextGenomeWideAssociationStudies(GWAS) is themostcommonmethodfordissectingthebiologythatunderliescomplextraits(McCarthyetal2008)TheunderlyingideaofGWASissimplefindmarkerHtraitassociationsexploitingthe linkage disequilibrium (LD) that exists between the causative mutation ndashwhichweignorendashandoneormoreanalyzedmarkerswhichwewellknowInthiswayonecanpinpointgenomicregionscarryingcausalvariantsforanytrait

Although a number of GWAS have been conducted in cattle productive andmorphological traits (some examples (Jiang et al 2010 Cole et al 2011Meredith et al 2012)) and found several genomic regions associated to thesetraits none focused their attention directly on genes involved inmilk biologyMoreoveroneof themajordrawbacksofGWASisrelatedtothescarceresultsportability across populations due to a variety of confounding factorspopulationstructuredifferentialLD levelsbreedspecific selection targetsandeven SNPs ascertainment bias (Clark et al 2005) For instance significantassociationsonfatpercentagewereidentifiedinmanyregionsoftheBostaurusgenome in different breeds Except for genes of major effect there is littleagreement on which regions and H more importantly H genes are actuallyinvolvedinthecontrolofthetraitinvestigatedThisissomehownotsurprisingone favorable allele may segregate in one breed and be fixated in a differentbreed the same allele segregates in both breeds but allelesmay differ or thesegregates in both breeds but the genetic background masks the effect ofsegregation

51

AnotherlimitationinGWASisthestringentsignificancethresholdoftenapplied

tocorrectformultipletestingAsaresulta largeproportionofgeneswithlow

effectaredisregardedwithconsequentoverestimationof thehigheffectgenes

variance(Xu2003)Thisbiasisparticularlyworryingwheninvestigatingtraits

controlledbya largenumberofgeneswithsmalleffectand inhighLDasmilk

yield(GIANOLAETAL2013)

To overcome this kind of limitation different strategies have been invented

usingprior knowledge Interesting ldquocandidatepathwayrdquo approacheshavebeen

conducted(Ravenetal2013)testingforassociationbetweenaphenotypeand

genes inpathwaysthatare likelytobe involvedinthecontrolofthetraitThis

approach reduces thenumberofmarkers tobe tested increases the statistical

poweroftheexperimentrestrictingtheanalysistoexistingknowledge

Other methods based on statistical manipulations or particular experimental

designs have been developed to prioritize or validate genotypeHphenotype

associations For example geneHbased association strategies while restricting

GWA study only to genes and neighboring genomic regionsmay identify new

candidategenesdeservingpostHGWASanalysis(Akulaetal2011Cantoretal

2010 Liu et al 2010) Moreover it is true that classical GWAS downstream

bioinformatics analyses concentrate their efforts in findingmeaningful results

usinggenesasfinaltarget

The aim of ourworkwas to find new candidate genes and novel information

associated to complex traits as milk production (yiled) and quality (fat and

protein percentage) using a geneHcentric genome wide association in three

differentdairybreedsItalianHolsteinItalianBrownandItalianSimmental

Methods

BiologicalSamples

Bulls from the three most represented dairy breeds in Italy (Italian Holstein

ItalianBrownandItalianSimmental)wereselectedfromArtificialInsemination

(AI) bulls progeny tested in Italy until 2010 and forwhich biological samples

52

wereavailableItalianHolsteinbullswerechosenaccordingtofollowingcriteria

i) having the national Selection Index namely production functionality type

(PFT)withreliabilitygt75andii)havingthelowestpossiblerelationshipwith

theothersiresinthesetThiswasachievedbymaximizingthevariabilitywithin

the group On the contrary considering the low number of available Italian

BrownandItalianSimmentalbullstheywereallincludedintheanalysesAfter

thisprocedureatotalof2139ItalianHolstein775ItalianBrownand486Italian

Simmental samples (including quality control replicates)were genotypedwith

theBovineSNP50BeadChipv1(IlluminaSanDiegoCA)

Traitsconsideredintheanalysis

Italian Holstein (ANAFI) Italian Brown (ANARB) and Italian Simmental

(ANAPRI)breedersassociationsprovidedderegressedproofs(DRP)ordaughter

yielddeviations (DYD)hereafter referredasphenotype forall traitsanalyzed

Traitsanalyzedweremilkyieldfatpercentageandproteinpercentage

Genotypingandqualitycontrol

Quality controls (QC) were run independently for each breed using the same

pipelineandthresholds IndetailQCexcluded i)thereplicatesamplewiththe

highernumberofmissinggenotypesii)sampleswithunexpectedlyhigh(ge100)

mendelianerrors(forfatherHsonpairs)iii)sampleswithmorethan5missing

genotypes iv) SNPs with ge 25 missing data v) SNPs with minor allele

frequency(MAF)le5vi)SNPswithmendelianerrorsge25andvii)SNPsout

ofHardyHWeinbergequilibriumtestedbyFisherexacttest(ple0001Bonferroni

corrected)

Analysisofpopulationstructure

MultiDimensionScaling(MDS)wasperformedusingPLINK107(Purcelletal

2007)andresultsplottedwithRscriptsThisanalysisshowninsupplementary

material (FigureS1) exploredpopulationsubstructureandverified thegenetic

homogeneityofthedatasetpriortofurtheranalyses

53

PairwiseLDwascalculatedusingPLINK107(Purcelletal2007)LDdecayand

LD structure on specific regions were calculated with in house R scripts and

snpplotterRpackage(LunaandNicodemus2007)

GWAS

GenomeHwide association analysis was made with the GenABEL package in R

(Team 2014) using GRAMMARHCG (Genome wide Association using Mixed

Model and Regression H Genomic Control) approach (Amin et al 2007

Aulchenkoetal2007)

Basingonpriorknowledge(Minozzietal2013) in ItalianHolsteintheDGAT1

regionwas also treated as fixed effect inGenABEL analysis of the three traits

ResultsusingthismodelarereferredtoasldquoHolsteinDGATrdquo

Manhattan andquantileHquantileplots for each traitwereproduced to allowa

thoroughevaluationoftheGWASresultsAllcalculatedpHvaluesfromtheGWAS

analysiswere retained for the downstream prioritizationwith the geneHbased

association

Workingongenomecoordinates

Coordinatesof themarkerswere liftedover fromBTA40(Elsiketal2009)to

UMD31assembly(Ziminetal2009)usingSnpChimpv1(Nicolazzietal2014)

andgenecoordinateswereretrievedusingBioMartEnsembl

QTLcoordinatesweredownloadedfromtheAnimalQTLdb(Huetal2013) in

bedformat

Operations on genomic intervalswereperformedusing thebedtools suite ver

2162(QuinlanandHall2010)

GeneBasedAssociationanalysis

Toreducetheantagonismbetweensignificancethresholdandfalsepositiverate

of a classical GWA study (Gianola et al 2013) and to facilitate selection

signature comparison across the three breeds investigated we used a geneH

54

centricstatisticthatcollapsesinasinglevalueallthepHvaluesfromSNPsrelated

toagenecorrectingforLDinformation

Our approachderivesdirectly from the softwareVEGAS (VersatileGeneHBased

TestforGenomeHwideAssociationStudies)(Liuetal2010)adaptedtopursue

meaningfulresultsinanyspeciesbeyondhumans

Briefly the geneHwise test statistic is the sum of the n upper tail chiHsquared

valuesofaSNPsubset withonedegreeof freedom(wheren is thenumberof

SNPmappinginthegivengene)WithperfectlinkageequilibriumthegeneHwise

pHvaluewouldbetheoneHsidepHvalueofachiHsquaredistributionwithndegrees

of freedom Otherwise more likely the distribution of the pHvalues has to be

evaluated(withsimulations)andweighted(withLDvalues)(Liuetal2010)

The VEGAS speciesHfree procedure was implemented in a UNIX environment

usingBashandRlanguagesusingEnsemblannotationasinputfilesPLINKdata

typeandpHvaluesforeachSNPfromthepreviouslycitedGRAMMARprocedure

WemappedSNPsonBostaurusgenes(UMD31)andtheirboundaries(100000

bpupHanddownHstream)toaccountforregulatoryregionsOnlygeneswithat

least 2 SNPs mapped were retained Significant entries from the geneHbased

associationweremappedinthecattleQTLdatabasewithbedtools

ResultsandDiscussion

In this study a gene based association strategy was applied to identify new

genomicregionsboostingGWASanalysisresultsformilkproductionandquality

fromthreeItaliandairybreeds

Complextraitslikethosehereevaluatedareaffectedbyafewmajorgeneswith

large effects andmany otherswithmoderate to low effects the latter are not

easily identifiedby genomewide scans inmodern cattle breedsduemainly to

samplesizelimitationwhilesignalfromtheformersarelostingenomescansH

withsomeexceptionsHduerapidfixations

InthisscenariowithmediumdensitySNPpanelsliketheoneusedinthisstudy

associationbasedonsingleSNPregressionorotherldquoclassicalrdquomethodscouldbe

inconclusiveintermsofresults

55

AftertheQCprocess7452058and477samplesand3575938535and38574SNPs were retained for Italian Brown Italian Holstein and Italian Simmentalbulls respectively (Table S1) HolsteinDGAT analysis retained 38534markersoneSNPlessthantheotherHolsteinpanelDescriptive statistics on genotypes for each breed (histogram and barplot forheterozygosis and call rate per individual Figure S7 and multi dimensionalscalingFigureS1)revealedabsenceofconfoundingeffectsWiththesedatasetsatthenominalgenomeHwidepHvaluesignificancethresholdof 005 (ple125 x 10H6 after Bonferroni correction (Burton et al 2007Dudbridge and Gusnanto 2008)) significant associations were found only inItalianHolsteinbetweenmilkyieldandfatpercentageand26SNPsmappingintheproximalregionofBTA14nearbyDGAT1(Table1)

56

Table1PHvaluesoftheSNPsovercomingthegenomewidesignificancethresholdintheHolsteinbreedformilkyieldandfatpercentage(notaccountingforDGAT1effect)SNPsaresortedbypositionsintheUMD31assembly

SNPID BTA Position(Bp) POvalue(Fat) POValue(Milk)

Hapmap30383OBTCO005848 14 1489496 112EH12 361EH12

ARSOBFGLONGSO57820 14 1651311 247EH46 161EH16

ARSOBFGLONGSO34135 14 1675278 857EH17

ARSOBFGLONGSO94706 14 1696470 665EH16

ARSOBFGLONGSO4939 14 1801116 534EH49 106EH20

Hapmap52798Oss46526455 14 1923292 915EH21

UAOIFASAO6878 14 2002873 382EH24

ARSOBFGLONGSO107379 14 2054457 118EH33 664EH15

ARSOBFGLONGSO18365 14 2117455 250EH16

Hapmap30922OBTCO002021 14 2138926 439EH13

Hapmap25384OBTCO001997 14 2217163 390EH10 697EH09

Hapmap24715OBTCO001973 14 2239085 108EH09 369EH09

BTAO35941OnoOrs 14 2276443 402EH13

Hapmap30374OBTCO002159 14 2468020 528EH12

Hapmap30086OBTCO002066 14 2524432 798EH11

ARSOBFGLONGSO74378 14 3640788 161EH08

ARSOBFGLOBACO1511 14 3765019 375EH09

UAOIFASAO9288 14 3956956 811EH09

ARSOBFGLONGSO100480 14 4364952 133EH09

UAOIFASAO5306 14 4468478 458EH08

ThisislikelyduetotheeffectoftheK232ADGAT1mutationthatissegregatingin

thisbreed(Fontanesietal2014)NootherSNPwassignificantlyassociatedto

anyothertraitinthe3breedsinvestigated

SomeSNPswereunderasuggestivesignificancethresholdofple5x10H5Details

on these markers are in supplementary material (Table S3) but they are not

discussedherePlotsdescribingatglancetheassociationresultsrevealingalow

signaltonoiseratioareavailableintheonlinesupplementarymaterial(Figure

S2S3S4andS5respectively for ItalianBrown ItalianHolsteinHolsteinDGAT

andItalianSimmental)

57

Asperthegenecentricanalysiswemappedatotalof192371989919894and

19844 genes on 24616 available on Ensembl (release version 73) in Italian

BrownItalianHolsteinHolsteinDGATandItalianSimmentalrespectivelyGenes

with a geneHwisepHvalue at least 10H5were considered significantly associated

withthestudiedtraitThesignificanceappearedtobeadequatetothehighrate

ofredundancyof theannotationfeaturesandto figureoutthetopassociations

with a reasonable level of confidence Results on these associations per breed

andpertraitaresummarizedinTable2Table3andinsupplementarymaterials

(TableS2andTableS4)ThemajorityofsignificantgenesfoundwiththegeneH

centricanalysismapped inpreviouslycharacterizedQTL thatareassociated to

milkproductionorqualityThesearesummarizedintheTableS5

Table2Numberofsignificantlyassociatedgenesforeachbreedineachtraitinthegenebasedassociationtest

Fat Milkyield Protein

Brown 3 1

Holstein 92 80 1

HolsteinDGAT 1 4 1

Simmental 1 13

58

Tableamp3FeaturessignificantlyassociatedperbreedtraitHolsteinanalysisnotaccountingforDGAT1isshow

ninTableS3

Breedamp

Traitamp

EnsemblampIDamp

GeneWiseampPValueamp

SNPsInGeneamp

BestSNPamp

BestSNP_PValueamp

BTAamp

Startamp(bp)amp

Endamp(bp)amp

GeneampNam

eamp

BROWNamp

FAT

Nosignificantassociationfound

MILK

ENSBTAG00000019237

500EG05

4Hapmap53770Gss46526325

204EG04

23

47333403

47348212

TXNDC5

ENSBTAG00000043918

500EG05

4Hapmap53770Gss46526325

204EG04

23

47337650

47337787

5S_rRNA

ENSBTAG00000046839

500EG05

4Hapmap53770Gss46526325

204EG04

23

47328131

47352621

UncharacterizedProtein

PROTEIN

ENSBTAG00000001260

700EG05

3ARSGBFGLGNGSG116222

441EG03

18

52120387

52124418

PINLYP

amp

HOLSTEINDGATamp

FAT

ENSBTAG00000000369

100EG05

3Hapmap53294Grs29016908

333EG05

594673755

94749093

EPS8

MILK

ENSBTAG00000026896

130EG05

5ARSGBFGLGNGSG104949

385EG04

23

51549721

51553563

FOXF2

ENSBTAG00000005260

400EG05

5Hapmap33288GBTCG033751

120EG03

638120578

38127577

SPP1

ENSBTAG00000006579

500EG05

4ARSGBFGLGBACG2207

632EG05

15

54418559

54459500

P4HA3

ENSBTAG00000007013

700EG05

6UAGIFASAG7665

396EG05

19

5221507

5283308

TOM1L1

PROTEIN

ENSBTAG00000006638

600EG05

3ARSGBFGLGNGSG104774

176EG04

18

56523439

56530499

BCL2L12

amp

SIMMENTHALamp

FAT

ENSBTAG00000037649

900EG05

6Hapmap30001GBTAG142716

529EG05

4120427915

120494453

VIPR2

MILK

ENSBTAG00000017538

100EG05

6ARSGBFGLGBACG13093

587EG06

12

85630428

85630912

Pseudogene

ENSBTAG00000000856

200EG05

2ARSGBFGLGNGSG4939

413EG06

14

1766767

1769754

FBXL6

ENSBTAG00000000857

200EG05

2ARSGBFGLGNGSG4939

413EG06

14

1763994

1766621

GPR172B

ENSBTAG00000026356

200EG05

2ARSGBFGLGNGSG4939

413EG06

14

1795351

1804562

DGAT1

ENSBTAG00000035158

200EG05

2ARSGBFGLGNGSG4939

413EG06

14

1770660

1772329

TMEM

249

ENSBTAG00000046208

200EG05

2ARSGBFGLGNGSG4939

413EG06

14

1782901

1788087

SCRT1

ENSBTAG00000009811

400EG05

2ARSGBFGLGNGSG4939

413EG06

14

1825982

1844587

UncharacterizedProtein

ENSBTAG00000009816

400EG05

2ARSGBFGLGNGSG4939

413EG06

14

1838135

1840081

UncharacterizedProtein

ENSBTAG00000014458

400EG05

2ARSGBFGLGNGSG4939

413EG06

14

1844664

1894424

MROH1

ENSBTAG00000020751

400EG05

2ARSGBFGLGNGSG4939

413EG06

14

1806081

1825793

HSF1

ENSBTAG00000044406

400EG05

2ARSGBFGLGNGSG4939

413EG06

14

1883849

1883971

SCARNA15

ENSBTAG00000046889

400EG05

5ARSGBFGLGBACG13093

587EG06

12

85610279

85610762

UncharacterizedProtein

ENSBTAG00000020117

800EG05

3BTAG78494GnoGrs

188EG04

721189879

21196263

ZBTB7A

PROTEIN

Nosignificantassociationfound

amp

59

1 ItalianHolstein

The$ large$ number$ of$ significant$ signals$ was$ observed$ in$ the$ Italian$ Holstein$

especially$ for$ milk$ yield$ and$ fat$ percentage$ As$ shown$ in$ Table$ 2$ the$ strong$

effect$ of$ DGAT1$ in$ Italian$ Holsteins$ observed$ with$ individual$ SNPs$ was$

confirmed$by$the$geneBcentric$approach$In$this$region$almost$the$totality$of$the$

geneBwise$pBvalues$under$the$chosen$threshold$appeared$to$be$influenced$by$this$

major$ gene$ and$ by$ the$ high$ level$ of$ linkage$ disequilibrium$ that$ characterizes$

cosmopolitan$bovine$breeds$(see$Figure$S6)$Within$this$pool$of$genes$it$is$worth$

to$mention$PLEC$ (plectin)$ a$ cell$ surface$ receptor$ gene$ in$ the$RNASE5$ pathway$

(Angiogenin$ encoded$ by$ angiogenin$ ribonuclease$ RNase$ A$ family$ 5)$ Proteins$

encoded$ by$ these$ genes$ that$ are$ present$ in$ milk$ are$ involved$ in$ protein$

synthesis$ and$ act$ as$ growth$ factors$ in$ vitro$Moreover$RNASE5$ seems$ to$ have$

other$ functions$ such$ as$ regulation$ of$ stress$ response$ and$ apoptosis$ and$ even$

angiogenesis$ through$ which$ it$ may$ also$ contribute$ to$ the$ regulate$ milk$

production$(Raven$et$al$2013)$

The$ exclusion$ of$DGAT1$ effect$ (HolsteinDGAT$ results)$ highlighted$ that$ all$ the$

significant$ effects$ on$ fat$ percentage$ found$ for$ genes$ located$ within$

approximately$25$Mb$up$or$downstream$from$the$excluded$SNP$(ARS5BFGL5NGS5

4939)$were$ redundant$ signals$ of$DGAT1$ (Table3$ and$Table$ S2$ and$ S3)$ This$ is$

also$observed$for$milk$yield$due$to$the$genetic$correlations$between$fat$and$milk$$

Also$ PLEC$ was$ not$ significant$ in$ our$ analyses$ when$ taking$ into$ account$ the$

DGAT1$effect$contradicting$previous$findings$in$Holstein$that$report$association$

of$PLEC$gene$with$production$traits$either$accounting$or$not$for$the$effect$of$the$

diglyceride$ acyltransferase$ (Raven$ et$ al$ 2013)$ A$ possible$ explanation$ for$ the$

different$results$between$these$studies$is$a$difference$between$genetic$structures$

of$the$populations$analyzed$To$reduce$false$positive$signals$we$only$evaluated$

the$genes$based$on$HolsteinDGAT$results$for$the$Italian$Holstein$$

11 Milkyield

We$found$genes$associated$with$this$trait$in$Holstein$in$particular$SPP1$P4HA3=

FOXF2$and$TOM1L1$genes$

Among$ these$ the$ osteopontin$ SPP1$ (secreted$ phosphoprotein$ 1)$ is$ directly$

related$to$milk$production$(de$Mello$et$al$2012)$ It$ is$ located$ in$BTA6$nearby$

60

$

ABCG2$ and$ there$ is$ evidence$ that$ polymorphisms$ in$ this$ region$ are$ strongly$

correlated$with$high$production$ levels$of$milk$ in$various$breeds$(CohenBZinder$

et$ al$ 2005)$Osteopontin$SPP1$ is$ a$ secreted$glycophosphoprotein$ that$plays$ a$

crucial$ role$ in$ mammary$ gland$ development$ having$ an$ essential$ role$ also$ in$

mammary$gland$differentiation$and$branching$of$the$mammary$epithelial$ductal$

system$ This$ gene$has$ also$ been$ implicated$ in$many$ types$ of$ cancer$ including$

breast$cancer$ for$ the$potentiality$of$proliferation$and$alveologenesis$ (Hubbard$

et$ al$ 2013)$ It$ is$worth$mentioning$ that$ the$most$ significant$ SNPs$within$ this$

gene$was$Hapmap332885BTC5033751=with=a$pBvalue$of$00012$ largely$over$ the$

genomeBwide$significance$ threshold$This$signal$would$be$ ignored$ in$ the$single$

SNP$approach$

Another$gene$revealed$by$the$geneBbased$association$is$P4HA3$encoding$for$the$

prolyl$4Bhydroxylase$α$polypeptide$III$involved$in$the$correct$folding$of$nascent$

procollagen$ chains$ It$ is$ known$ that$ stromalBepithelial$ interactions$ are$ critical$

for$mammary$gland$development$and$defective$deposition$or$reorganization$of$

extracellular$ matrix$ can$ lead$ to$ incorrect$ mammary$ epithelial$ morphogenesis$

(Gillette$ et$ al$ 2013)$ highlighting$ the$ role$ of$ this$ gene$ in$ the$ healthy$ breast$

tissue=

FOXF2=gene$ (Forkhead$Box$F2)$ is$ a$member$of$ the$ forkhead$box$ transcription$

factors$ known$ to$ be$ involved$ in$ tissue$ development$ as$ part$ of$ the$ ldquoHedgehog$

signaling$pathwayrdquo$This$key$cascade$is$essential$for$the$correct$segmentation$of$

tissues$during$embryonic$development$but$has$been$recently$demonstrated$that$

is$active$in$adult$stem$cells$proliferation$in$mammary$tissue$(Liu$et$al$2006)$In$

particular$ FOXF2$ plays$ a$ key$ role$ in$ extracellular$ matrix$ synthesis$ and$

epithelialBmesenchymal$ interactions$ and$ could$ therefore$ affect$ the$ proper$

development$of$the$mammary$stroma$In$addition$low$levels$of$transcription$of$

this$gene$are$correlated$ to$early$onset$metastasis$ in$breast$cancer$ (Kong$et$al$

2013)$ underlining$ a$ specific$ role$ in$ the$ mammary$ gland$

Lastly$ TOM1L1$ (Target$ Of$ Myb1$ ChickenBLike$ 1)$ is$ another$ gene$ potentially$

involved$ with$ the$ milk$ physiology$ as$ is$ known$ its$ implication$ with$ the$ early$

endosome$ sorting$ coupled$ with$ EGFR= (epidermal$ growth$ factor$ receptor)$

another$ important$player$ in$ lactation$(Fowler$et$al$1995)$TOM1L1$belongs$to$

61

$

the$ VHS$ domain$ containing$ proteins$ that$ are$ involved$ in$ the$ postBGolgi$

trafficking$and$signaling$(Wang$et$al$2010)$

12 Fatpercentage

In$ Italian$ Holstein$ breed$ we$ found$ a$ significant$ association$ with$ or$ without$

DGAT1$ effect$ correction$ to$ the$EPS8$ gene$ located$on$BTA5$at$around$95$Mbp$

This$gene$encodes$for$a$substrate$of$the$already$cited$EGFR$and$is$involved$in$the$

fatty$acid$metabolism$(Fazioli$et$al$1993)$A$recent$study$on$Holstein$Friesian$

(Wang$et$al$2012)$found$strong$association$with$a$variant$located$in$the$second$

intron$of$the$epidermal$growth$factor$receptor$pathway$substrate$enforcing$the$

hypothesis$of$a$true$involvement$of$this$gene$on$fat$percentage$

13 Proteinpercentage

The$BCL2L12$gene$(BclB2Blike$protein$12)$associated$with$protein$percentage$in$

Italian$Holstein$breed$represents$another$interesting$result$The$product$of$this$

gene= is$ an$ important$ oncoprotein$ in$ two$ fundamental$ cell$ cycle$ pathways$

namely$ p53$ and$ cytoplasmic$ caspase$ signaling$ BCL2L12= resides$ in$ both$

cytoplasm$and$nucleus$inhibiting$caspases$3$and$7$and$forming$a$complex$with$

p53$thereby$ inhibiting$p53Bdirected$transcriptomic$changes$upon$DNA$damage$

(Stegh$ and$DePinho$ 2011)$ The$members$ of$ BCL2$ family$ are$ considered$ antiB

apoptotic$genes$and$it$is$obvious$that$dysfunctional$programmed$cell$death$plays$

an$ important$ role$ in$ the$ pathogenesis$ and$ in$ the$ progression$ of$ cancer$Many$

members$of$this$family$were$found$to$be$modulated$in$various$malignancies$and$

BCL2L12= in$ particular$ was$ expressed$ in$ mammary$ gland$ and$ proposed$ as$

prognostic$ cancer$ biomarker$ (Thomadaki$ et$ al$ 2007)$ In$ dairy$ cows$ the$

members$of$BCL2$family$are$involved$in$the$key$events$ in$the$mammary$tissue$

development$ and$ remodeling$ during$ lactation$ and$ involution$ (Stefanon$ et$ al$

2002$p$200)$Members$of$BCL2$family$regulate$the$turnover$of$cell$population$

in$the$mammary$gland$during$lactation$and$its$overexpression$was$found$to$be$

linearly$related$to$milk$production$in$Holstein$cow$(Colitti$et$al$2004)$

$

62

$

2 ItalianSimmental

In$ the$ Italian$ Simmental$ breed$ we$ found$ one$ association$ on$ BTA4$ for$ fat$

percentage$(VIPR2)$and$two$associations$ for$milk$yield$one$ in$BTA7$(ZBTB7A)$

and$ the$ other$ in$ BTA14$ related$ to$ the$ DGAT1$ No$ association$ was$ found$ for$

protein$percentage$

21 Fatpercentage

Among$the$genes$found$significantly$associated$with$production$traits$in$Italian$

Simmental$VIPR2=a$ gene$ encoding$ a$ receptor$ for$ a$ small$ vasoactive$ intestinal$

neuropeptide$ was$ the$ single$ signal$ in$ BTA4$ for$ fat$ percentage$ Vasoactive$

intestinal$ peptide$ (VIP)$ is$ synthesized$ in$ the$pituitary$ gland$ and$ among$other$

functions$ stimulates$ prolactin$ secretion$ and$ is$ involved$ in$ proliferation$

survival$and$differentiation$in$human$breast$cancer$cells$(Valdehita$et$al$2010)$$

An$interesting$novel$finding$is$the$effect$of$immunosuppression$(through$direct$

effect$of$IL56)$on$prolactin$cells$and$its$ability$to$regulate$prolactin$by$acting$on$

pituitary$ VIP$ through$ VIP$ receptors$ (Blanco$ et$ al$ 2013)$ These$ evidences$

suggest$ that$VIPR2$ could$ be$ an$ important$ candidate$ for$ further$ investigations$

through$post$GWAS$approaches$such$as$reBsequencing$or$fine$mapping$

22 Milkyield

Surprisingly$the$DGAT1$gene$was$significantly$associated$with$milk$yield$but$not$

with$fat$percentage$in$Italian$Simmental$breed$Indeed$the$most$significant$SNP$

for$the$milk$yield$trait$in$this$breed$was$the$ARS5BFGL5NGS54939$with$a$pBvalue$

of$ 413EB06$ The$ importance$ of$ this$ gene$ in$ the$ lactation$ is$ largely$ known$

(Grisart$et$al$2002)$$

To$ investigate$DGAT1$ signal$ in$ Simmental$ we$ analyzed$ the$ fine$ LD$ structure$

genomeBwide$ and$ in$ BTA14$ proximal$ region$ in$ all$ the$ three$ breeds$ revealing$

different$patterns$

Observed$LD$decay$on$ the$ three$breeds$genomes$revealed$similar$behavior$ for$

Italian$Holstein$and$Italian$Brown$with$r2$gt$010$$until$ around$ 250$ Kbp$ while$

Italian$Simmental$showed$lower$LD$Distances$higher$that$400$Kbp$revealed$half$

level$of$LD$in$Simmental$(Figure$S6)$

63

$

In$DGAT1$region$LD$behavior$is$very$similar$between$Italian$Holstein$and$Italian$

Brown$ (Figure$ S8)$ while$ association$ values$ are$ really$ different$ because$ ARS5

BFGL5NGS54939=is$fixed$in$Italian$Brown$and$purged$by$QC$process$Interestingly$

association$ patterns$ become$ similar$ when$ DGAT1$ effect$ is$ removed$ from$ the$

Holstein$ panel$ (HolsteinDGAT$ Figure$ S8)$ In$ Simmental$ we$ found$ a$ different$

situation$ very$ low$ LD$ level$ and$ significant$ association$ levels$ with$ ARS5BFGL5

NGS54939$with$ the$milk$ trait$This$ is$probably$due$ to$ the$ low$ frequency$of$ the$

p232K$ allele$ in$ Italian$ Simmental$ (Scotti$ et$ al$ 2010)$ and$ to$ the$ different$

selection$strategies$applied$in$these$breeds$

The$ relatively$ high$ pBvalue$ of$ this$ SNP$ is$ responsible$ for$ the$ statistical$

significance$of$many$of$the$13$genes$found$significant$in$this$breed$for$this$trait$

as$ shown$ in$ Table$ 3$ therefore$ reducing$ the$ gene$ set$ to$ three$ locations$ two$

overlapping$features$indicating$a$pseudogene$in$BTA12$and$a$zinc$finger$protein$

ZBTB7A$in$BTA7$DGAT1$role$and$function$will$be$not$discussed$here$as$already$

done$elsewhere$(Grisart$et$al$2002)$

ZBTB7A=is$part$of$zinc$finger$and$BTB$proteins$(Broad$complex$Tramtrack$and$

Bricabrac)$ transcription$ factors$ with$ emerging$ importance$ for$ their$

involvement$ in$ oncogenesis$ and$ cell$ differentiation$by$ repressing$key$ genes$of$

cell$cycle$regulation$as$retinoblastoma$(Rb)$and=p53=(ldquoJeon$et$al$B$2008$B$ProtoB

oncogene$ FBIB1$ (PokemonZBTB7A)$ Represses$ Trpdfrdquo$ nd)= ZBTB7A$ over$

expression$ is$ shown$ in$primary$ and$ recurrent$ breast$ cancer$ tissues$ leading$ to$

disease$progression$and$poor$survival$(Zu$et$al$2011)$Link$between$this$gene$

polymorphism$and$milk$production$might$be$in$the$proper$cellular$development$

of$ the$ mammary$ gland$ This$ transcription$ factor$ is$ also$ known$ as$ a$ master$

regulator$ of$ B$ and$ TBcell$ lineage$ (Lee$ and$ Maeda$ 2012)$ enforcing$ the$ link$

between$ milk$ production$ and$ immune$ protection$ as$ already$ outlined$ by$ of$

Lemay$ and$ colleagues$ with$ proteomic$ studies$ where$ the$ ldquoactivation$ of$ the$

innate$ immune$systemrdquo$resulted$as$one$of$ the$most$represented$gene$ontology$

categories$(Lemay$et$al$2009)$

$

64

$

3 ItalianBrown

In$ the$ Italian$Brown$breed$we$ found$ one$ association$ on$BTA23$ (TXNDC5)$ for$

milk$yield$and$one$in$BTA18$for$protein$percentage$(PINLYP)$The$three$features$

listed$in$Table$3$for$milk$yield$in$Italian$Brown$were$collapsed$due$to$overlap$$

31 Milkyield

Thioredoxin$ domainBcontaining$ protein$ 5$ (TXNDC5)$ belongs$ to$ thioredoxin$

family$small$ubiquitous$proteins$containing$disulphide$domain$affecting$protein$

folding$with$chaperone$activity$preventing$protein$misBfolding$within$the$lumen$

of$ the$endoplasmic$reticulum$This$multifunctional$organelle$have$a$key$role$ in$

several$ processes$ including$ lipid$ catabolism$ and$ protein$ synthesis$ (RamiacuterezB

Torres$ et$ al$ 2012)$ Because$ of$ its$ disulphide$ bonds$ thioredoxin$ facilitate$ the$

reduction$of$other$proteins$by$cysteine$thiol$disulfide$exchange$This$mechanism$

is$essential$for$detossification$

During$ lactation$ indeed$ an$ enhanced$ detoxification$ level$ is$ functional$ to$

preserve$ milk$ quality$ against$ increased$ metabolic$ activities$ Higher$ metabolic$

levels$ lead$ to$ free$ radicals$ production$ making$ necessary$ the$ maintenance$ of$

redox$ homeostasis$ that$ include$ among$ others$ the$ reduction$ and$ oxidation$ of$

peroxiredoxin$and$thioredoxin$(Lu$and$Holmgren$2014)$

32 Proteinpercentage

For$ protein$ percentage$ in$ Italian$ Brown$ there$ is$ only$ one$ associated$ gene$

PINLYP=is$encoding$a$phospholipase$A2$inhibitor$but$this$protein$product$is$not$

yet$well$studied$ $Many$types$of$phospholipase$A2$were$characterized$and$ it$ is$

known$ that$ they$ are$ mediator$ of$ inflammation$ atherosclerosis$ and$ cancer$ in$

mammals$and$that$plays$an$important$role$in$the$innate$immune$response$(Qiu$

and$Lai$2013)$It$may$be$reasonable$to$imagine$a$role$for$PINLYP$in$the$lactation$

in$light$of$previously$discussed$results$that$link$milk$to$innate$immune$response$

(Lemay$et$al$2009)$

$

Although$ limited$ by$ the$ size$ of$ the$ study$ and$ the$ number$ of$ the$ included$

markers$these$results$represent$both$a$confirmation$and$a$step$forward$towards$

the$comprehension$of$the$biological$bases$of$milk$yield$and$quality$$

65

$

New$ genes$ potentially$ involved$ with$ production$ traits$ in$ dairy$ cows$ were$

tracked$when$GWAS$signals$resulted$too$weak$to$be$considered$in$a$classical$SNP$

based$mapping$with$the$ldquoclosest$gene$to$peakrdquo$or$the$windowBbased$strategy$

It$ is$ worth$ mentioning$ that$ none$ of$ the$ analyzed$ SNPs$ (not$ considering$ the$

Italian$Holstein$with$ARS5BFGL5NGS54939= included$ in$ the$ analysis)$were$ above$

the$ genomeBwide$ significance$ threshold$ in$ single$ SNP$ analysis$ meaning$ that$

important$information$is$often$lost$in$such$studies$$

In$addition$ to$ the$newly$ identified$ loci$we$observed$a$confirmation$of$existing$

associations$ with$ genes$ related$ to$ milk$ production$ not$ only$ dissecting$ gene$

specific$ functions$ but$ from$ mapping$ these$ genes$ in$ previously$ characterized$

regions$(Table$S5)$This$ is$not$trivial$when$using$new$data$analysis$techniques$

especially$ in$ high$ noise$ to$ signal$ ratio$ datasets$ like$ those$ here$ presented$We$

believe$ that$ this$ approach$will$ benefit$ of$data$ from$denser$marker$panels$ (eg$

BovineHD)$ to$ exploit$ local$ linkage$ disequilibrium$ and$ increase$ the$ number$ of$

mapped$genes$with$more$than$two$SNPs$

Results$ obtained$ with$ the$ gene$ centric$ approach$ although$ having$ intrinsic$

limitations$are$not$ influenced$by$a$number$of$subjective$choices$often$made$in$

GWAS$ For$ instance$when$ ldquowindow$mappingrdquo$ is$ used$ to$decrease$ the$noise$ to$

signal$ ratio$ eg$ due$ to$ nearby$ markers$ having$ different$ allele$ frequencies$

windows$size$ is$ to$be$decided$This$can$be$based$on$equal$number$of$markers$

equal$number$of$base$pairs$or$equal$map$distance$in$cM$per$window$The$three$

choices$ do$ not$ coincide$ because$ markers$ are$ not$ equally$ spread$ along$ the$

chromosomes$ and$ recombination$ differs$ in$ different$ genomic$ regions$ In$

addition$ windows$ can$ be$ chosen$ nonBoverlapping$ or$ with$ different$ degree$ of$

overlap$Finally$when$stepping$ from$significant$SNPs$or$windows$ to$candidate$

genes$the$size$of$the$flanking$region$from$which$picking$up$candidate$genes$is$to$

be$decided$as$well$Sometimes$the$gene$closer$to$the$signal$peak$is$selected$ in$

other$cases$all$genes$within$certain$boundaries$are$included$in$further$analyses$

All$ these$ choices$ may$ generate$ misleading$ results$ For$ example$ not$ truly$

associated$genes$may$be$retained$for$further$analyses$disregarding$the$correct$

ones$ In$addition$ a$ large$number$of$ false$positives$can$be$ included$ to$be$ later$

interpreted$with$ the$help$of$network$and$pathway$analysis$As$ a$ consequence$

these$decisions$have$a$direct$impact$in$terms$of$production$and$interpretation$of$

66

$

results$leading$to$conclusions$heavily$based$on$the$enrichment$analysis$and$preB

existing$knowledge$$

With$ the$ gene$ based$ association$ we$ also$ noticed$ some$ advantages$ in$ the$

downstream$data$mining$First$genes$are$functional$units$and$the$golden$targets$

of$ the$ ldquoclassical$ postBGWASrdquo$ bioinformatics$ pipelines$ Hence$ the$ gene$ based$

association$leads$to$genes$with$an$embedded$statistically$robust$framework$This$

is$ a$ ldquoplusrdquo$ in$ studying$ complex$ traits$ because$ one$ can$ concentrate$ on$ specific$

signals$ rather$ than$ cherry$ picking$meaningful$ features$ from$ a$ list$ constructed$

with$the$previously$cited$approaches$$

Moreover$it$is$easier$to$track$back$gene$clusters$to$major$gene$effects$B$like$for$

example$DGAT1$in$a$gene$rich$genome$chunk$many$genes$may$result$significant$

due$to$the$same$SNP$(Table$S2$and$Table$3)$With$the$VEGAS$method$finding$the$

ldquotruerdquo$association$is$facilitated$by$following$the$ldquobest$SNPrdquo$in$the$region$

In$addition$we$found$phenotype$coherent$signals$in$such$situation$where$only$a$

few$SNPs$B$ if$none$ndash$are$above$an$acceptable$statistical$genomeBwide$threshold$

using$a$single$SNP$regression$method$

The$gene$centric$approach$ is$a$ first$and$slight$ improvement$ in$ livestock$GWAS$

analyses$since$we$still$have$a$partial$genome$annotation$ In$addition$ it$ is$now$

known$that$ the$definition$of$gene$ is$ to$be$revised$many$genes$do$not$code$ for$

proteins$ have$ alternative$ starting$ sites$ have$ antisense$ transcripts$ host$

regulatory$ RNA$ are$ subjected$ to$ alternative$ splicing$ and$may$ be$ regulated$ by$

sequences$quite$ far$ away$ from$ their$ coding$ regions$ (Consortium$2012)$Other$

limitations$are$the$need$for$larger$number$of$animals$in$these$type$of$studies$as$

in$this$study$and$the$still$relatively$high$cost$of$sequencing$that$cannot$be$used$

routinely$in$large$experiments$at$least$not$yet$$

$

$

Conclusions

This$study$underlines$the$importance$of$the$implementation$of$new$methods$to$

analyze$complex$traits$with$genomic$data$No$golden$path(s)$is$discovered$yet$to$

walk$from$variation$to$functional$units$The$geneBcentric$analysis$may$contribute$

67

$

to$ reveal$ new$ signals$ throughout$ the$ bovine$ genome$ condensing$ weak$

association$signals$in$chromosome$bits$having$functional$significance$

$

$

Acknowledgements

The$Authors$would$ like$ to$ thank$Dr$ Jimmy$ Liu$ for$ his$ advices$ in$ building$ the$

software$AIA$ANAFI$ANARB$and$ANAPRI$breedersrsquo$associations$ for$ their$help$

and$ support$during$ the$data$analysis$ SelMol$ and$ProZoo$projects$ that$provide$

samples$ The$ Authors$ are$ also$ grateful$ to$MIPAAF$ and$ Cariplo$ Foundation$ for$

funding$

$

$

$

References

Akula$ N$ Baranova$ A$ Seto$ D$ Solka$ J$ Nalls$MA$ Singleton$ A$ Ferrucci$ L$

Tanaka$T$Bandinelli$ S$Cho$YS$Kim$YJ$Lee$ JBY$Han$BBG$Bipolar$

Disorder$ Genome$ Study$ (BiGS)$ Consortium$ The$ Wellcome$ Trust$ CaseB

Control$Consortium$McMahon$FJ$2011$A$NetworkBBased$Approach$to$

Prioritize$ Results$ from$ GenomeBWide$ Association$ Studies$ PLoS$ ONE$ 6$

e24220$doi101371journalpone0024220$

Amin$N$ van$Duijn$ CM$ Aulchenko$ YS$ 2007$ A$Genomic$Background$Based$

Method$ for$ Association$ Analysis$ in$ Related$ Individuals$ PLoS$ ONE$ 2$

e1274$doi101371journalpone0001274$

Aulchenko$YS$de$Koning$DBJ$Haley$C$2007$Genomewide$Rapid$Association$

Using$ Mixed$ Model$ and$ Regression$ A$ Fast$ and$ Simple$ Method$ For$

Genomewide$PedigreeBBased$Quantitative$Trait$Loci$Association$Analysis$

Genetics$177$577ndash585$doi101534genetics107075614$

Blanco$ EJ$ CarreteroBHernaacutendez$ M$ GarciacuteaBBarrado$ J$ IglesiasBOsma$ MC$

Carretero$M$Herrero$JJ$Rubio$M$Riesco$JM$Carretero$J$2013$The$

activity$and$proliferation$of$pituitary$prolactinBpositive$cells$and$pituitary$

68

$

VIPBpositive$ cells$ are$ regulated$by$ interleukin$6$Histol$Histopathol$ 28$

1595ndash1604$

Blott$S$Kim$JBJ$Moisio$S$SchmidtBKuntzel$A$Cornet$A$Berzi$P$Cambisano$

N$Ford$C$Grisart$B$Johnson$D$Karim$L$Simon$P$Snell$R$Spelman$

R$ Wong$ J$ Vilkki$ J$ Georges$ M$ Farnir$ F$ Coppieters$ W$ 2003$

Molecular$ dissection$ of$ a$ quantitative$ trait$ locus$ a$ phenylalanineBtoB

tyrosine$substitution$in$the$transmembrane$domain$of$the$bovine$growth$

hormone$ receptor$ is$ associated$ with$ a$ major$ effect$ on$ milk$ yield$ and$

composition$Genetics$163$253ndash266$

Burton$PR$Clayton$DG$Cardon$LR$Craddock$N$Deloukas$P$Duncanson$A$

Kwiatkowski$ DP$ McCarthy$ MI$ Ouwehand$ WH$ Samani$ NJ$ Todd$

JA$ Donnelly$ P$ Barrett$ JC$ Burton$ PR$ Davison$ D$ Donnelly$ P$

Easton$ D$ Evans$ D$ Leung$ HBT$ Marchini$ JL$ Morris$ AP$ Spencer$

CCA$Tobin$MD$Cardon$LR$Clayton$DG$Attwood$AP$Boorman$JP$

Cant$B$Everson$U$Hussey$JM$Jolley$JD$Knight$AS$Koch$K$Meech$

E$ Nutland$ S$ Prowse$ CV$ Stevens$ HE$ Taylor$ NC$ Walters$ GR$

Walker$NM$Watkins$NA$Winzer$T$Todd$JA$Ouwehand$WH$Jones$

RW$ McArdle$ WL$ Ring$ SM$ Strachan$ DP$ Pembrey$ M$ Breen$ G$

Clair$ DS$ Caesar$ S$ GordonBSmith$ K$ Jones$ L$ Fraser$ C$ Green$ EK$

Grozeva$D$Hamshere$ML$Holmans$PA$Jones$IR$Kirov$G$Moskvina$

V$ Nikolov$ I$ OrsquoDonovan$ MC$ Owen$ MJ$ Craddock$ N$ Collier$ DA$

Elkin$A$Farmer$A$Williamson$R$McGuffin$P$Young$AH$Ferrier$IN$

Ball$SG$Balmforth$AJ$Barrett$JH$Bishop$DT$Iles$MM$Maqbool$A$

Yuldasheva$N$Hall$AS$Braund$PS$Burton$PR$Dixon$RJ$Mangino$

M$ Stevens$ S$ Tobin$ MD$ Thompson$ JR$ Samani$ NJ$ Bredin$ F$

Tremelling$ M$ Parkes$ M$ Drummond$ H$ Lees$ CW$ Nimmo$ ER$

Satsangi$ J$Fisher$SA$Forbes$A$Lewis$CM$Onnie$CM$Prescott$NJ$

Sanderson$J$Mathew$CG$Barbour$J$Mohiuddin$MK$Todhunter$CE$

Mansfield$JC$Ahmad$T$Cummings$FR$Jewell$DP$Webster$J$Brown$

MJ$ Clayton$DG$ Lathrop$GM$ Connell$ J$Dominiczak$A$ Samani$NJ$

Marcano$CAB$Burke$B$Dobson$R$Gungadoo$J$Lee$KL$Munroe$PB$

Newhouse$SJ$Onipinla$A$Wallace$C$Xue$M$Caulfield$M$Farrall$M$

Barton$A$ (braggs)$TB$ in$RG$and$G$Bruce$ IN$Donovan$H$Eyre$S$

69

$

Gilbert$ PD$ Hider$ SL$ Hinks$ AM$ John$ SL$ Potter$ C$ Silman$ AJ$Symmons$ DPM$ Thomson$ W$ Worthington$ J$ Clayton$ DG$ Dunger$DB$ Nutland$ S$ Stevens$ HE$ Walker$ NM$ Widmer$ B$ Todd$ JA$Frayling$TM$Freathy$RM$Lango$H$Perry$JRB$Shields$BM$Weedon$MN$ Hattersley$ AT$ Hitman$ GA$ Walker$ M$ Elliott$ KS$ Groves$ CJ$Lindgren$ CM$ Rayner$ NW$ Timpson$ NJ$ Zeggini$ E$ McCarthy$ MI$Newport$M$Sirugo$G$Lyons$E$Vannberg$F$Hill$AVS$Bradbury$LA$Farrar$ C$ Pointon$ JJ$ Wordsworth$ P$ Brown$ MA$ Franklyn$ JA$Heward$JM$Simmonds$MJ$Gough$SCL$Seal$S$(uk)$BCSC$Stratton$MR$Rahman$N$Ban$M$Goris$A$Sawcer$SJ$Compston$A$Conway$D$Jallow$ M$ Newport$ M$ Sirugo$ G$ Rockett$ KA$ Kwiatkowski$ DP$Bumpstead$ SJ$ Chaney$A$Downes$K$ Ghori$MJR$ Gwilliam$R$Hunt$SE$Inouye$M$Keniry$A$King$E$McGinnis$R$Potter$S$Ravindrarajah$R$ Whittaker$ P$ Widden$ C$ Withers$ D$ Deloukas$ P$ Leung$ HBT$Nutland$S$Stevens$HE$Walker$NM$Todd$JA$Easton$D$Clayton$DG$Burton$PR$Tobin$MD$Barrett$JC$Evans$D$Morris$AP$Cardon$LR$Cardin$NJ$Davison$D$Ferreira$T$PereiraBGale$ J$Hallgrimsdoacutettir$ IB$Howie$ BN$Marchini$ JL$ Spencer$ CCA$ Su$ Z$ Teo$ YY$ Vukcevic$ D$Donnelly$P$Bentley$D$Brown$MA$Cardon$LR$Caulfield$M$Clayton$DG$ Compston$ A$ Craddock$ N$ Deloukas$ P$ Donnelly$ P$ Farrall$ M$Gough$ SCL$ Hall$ AS$ Hattersley$ AT$ Hill$ AVS$ Kwiatkowski$ DP$Mathew$ CG$McCarthy$MI$ Ouwehand$WH$ Parkes$M$ Pembrey$M$Rahman$N$Samani$NJ$Stratton$MR$Todd$ JA$Worthington$ J$2007$GenomeBwide$ association$ study$ of$ 14000$ cases$ of$ seven$ common$diseases$ and$ 3000$ shared$ controls$ Nature$ 447$ 661ndash678$doi101038nature05911$

Cantor$ RM$ Lange$ K$ Sinsheimer$ JS$ 2010$ Prioritizing$ GWAS$ Results$ A$Review$ of$ Statistical$ Methods$ and$ Recommendations$ for$ Their$Application$Am$J$Hum$Genet$86$6ndash22$doi101016jajhg200911017$

Clark$ AG$ Hubisz$ MJ$ Bustamante$ CD$ Williamson$ SH$ Nielsen$ R$ 2005$Ascertainment$ bias$ in$ studies$ of$ human$ genomeBwide$ polymorphism$Genome$Res$15$1496ndash1502$doi101101gr4107905$

70

$

CohenBZinder$M$ Seroussi$ E$ Larkin$ DM$ Loor$ JJ$Wind$ AE$ der$ Lee$ JBH$Drackley$ JK$Band$MR$Hernandez$AG$Shani$M$Lewin$HA$Weller$JI$ Ron$ M$ 2005$ Identification$ of$ a$ missense$ mutation$ in$ the$ bovine$ABCG2$gene$with$a$major$effect$on$ the$QTL$on$ chromosome$6$affecting$milk$yield$and$composition$in$Holstein$cattle$Genome$Res$15$936ndash944$doi101101gr3806705$

Cole$JB$Wiggans$GR$Ma$L$Sonstegard$TS$Lawlor$TJ$Crooker$BA$Tassell$CPV$ Yang$ J$ Wang$ S$ Matukumalli$ LK$ Da$ Y$ 2011$ GenomeBwide$association$ analysis$ of$ thirty$ one$ production$ health$ reproduction$ and$body$ conformation$ traits$ in$ contemporary$ US$ Holstein$ cows$ BMC$Genomics$12$408$doi1011861471B2164B12B408$

Colitti$M$Wilde$CJ$Stefanon$B$2004$Functional$expression$of$bclB2$protein$family$and$AIF$in$bovine$mammary$tissue$in$early$lactation$J$Dairy$Res$71$20ndash27$

Consortium$ TEP$ 2012$ An$ integrated$ encyclopedia$ of$ DNA$ elements$ in$ the$human$genome$Nature$489$57ndash74$doi101038nature11247$

De$Mello$F$Cobuci$JA$Martins$MF$Silva$MVGB$Neto$JB$2012$Association$of$the$polymorphism$g8514CgtT$in$the$osteopontin$gene$(SPP1)$with$milk$yield$ in$ the$ dairy$ cattle$ breed$ Girolando$ Anim$ Genet$ 43$ 647ndash648$doi101111j1365B2052201102312x$

Dudbridge$ F$ Gusnanto$ A$ 2008$ Estimation$ of$ significance$ thresholds$ for$genomewide$ association$ scans$ Genet$ Epidemiol$ 32$ 227ndash234$doi101002gepi20297$

Elsik$ CG$ Tellam$ RL$ Worley$ KC$ 2009$ The$ Genome$ Sequence$ of$ Taurine$Cattle$ A$window$ to$ ruminant$ biology$ and$ evolution$ Science$ 324$ 522ndash528$doi101126science1169588$

Fazioli$ F$ Minichiello$ L$ Matoska$ V$ Castagnino$ P$ Miki$ T$ Wong$ WT$ Di$Fiore$ PP$ 1993$ Eps8$ a$ substrate$ for$ the$ epidermal$ growth$ factor$receptor$kinase$enhances$EGFBdependent$mitogenic$signals$EMBO$J$12$3799ndash3808$

Fontanesi$ L$ Calograve$ DG$ Galimberti$ G$ Negrini$ R$ Marino$ R$ Nardone$ A$AjmoneBMarsan$ P$ Russo$ V$ 2014$ A$ candidate$ gene$ association$ study$

71

$

for$ nine$ economically$ important$ traits$ in$ Italian$ Holstein$ cattle$ Anim$

Genet$nandashna$doi101111age12164$

Fowler$KJ$Walker$F$Alexander$W$Hibbs$ML$Nice$EC$Bohmer$RM$Mann$

GB$ Thumwood$ C$ Maglitto$ R$ Danks$ JA$ 1995$ A$ mutation$ in$ the$

epidermal$growth$factor$receptor$in$wavedB2$mice$has$a$profound$effect$

on$ receptor$ biochemistry$ that$ results$ in$ impaired$ lactation$ Proc$ Natl$

Acad$Sci$92$1465ndash1469$doi101073pnas9251465$

Gianola$ D$ Hospital$ F$ Verrier$ E$ 2013$ Contribution$ of$ an$ additive$ locus$ to$

genetic$variance$when$inheritance$is$multiBfactorial$with$implications$on$

interpretation$ of$ GWAS$ Theor$ Appl$ Genet$ 126$ 1457ndash1472$

doi101007s00122B013B2064B2$

Gillette$ M$ Bray$ K$ Blumenthaler$ A$ VargoBGogola$ T$ 2013$ P190B$ RhoGAP$

Overexpression$ in$ the$ Developing$Mammary$ Epithelium$ Induces$ TGFβB

dependent$ Fibroblast$ Activation$ PLoS$ ONE$ 8$ e65105$

doi101371journalpone0065105$

Grisart$B$Coppieters$W$Farnir$F$Karim$L$Ford$C$Berzi$P$Cambisano$N$

Mni$ M$ Reid$ S$ Simon$ P$ Spelman$ R$ Georges$ M$ Snell$ R$ 2002$

Positional$Candidate$Cloning$of$a$QTL$ in$Dairy$Cattle$ Identification$of$a$

Missense$Mutation$ in$the$Bovine$DGAT1$Gene$with$Major$Effect$on$Milk$

Yield$ and$ Composition$ Genome$ Res$ 12$ 222ndash231$

doi101101gr224202$

Hubbard$NE$Chen$QJ$Sickafoose$LK$Wood$MB$Gregg$ JP$Abrahamsson$

NM$ Engelberg$ JA$ Walls$ JE$ Borowsky$ AD$ 2013$ Transgenic$

Mammary$Epithelial$Osteopontin$(Spp1)$Expression$Induces$Proliferation$

and$ Alveologenesis$ Genes$ Cancer$ 4$ 201ndash212$

doi1011771947601913496813$

Hu$ ZBL$ Park$ CA$ Wu$ XBL$ Reecy$ JM$ 2013$ Animal$ QTLdb$ an$ improved$

database$tool$for$livestock$animal$QTLassociation$data$dissemination$in$

the$ postBgenome$ era$ Nucleic$ Acids$ Res$ 41$ D871ndashD879$

doi101093nargks1150$

Jeon$et$al$ B$2008$ B$ProtoBoncogene$FBIB1$ (PokemonZBTB7A)$Represses$Trpdf$

nd$

72

$

Jiang$ L$ Liu$ J$ Sun$ D$ Ma$ P$ Ding$ X$ Yu$ Y$ Zhang$ Q$ 2010$ Genome$Wide$

Association$ Studies$ for$ Milk$ Production$ Traits$ in$ Chinese$ Holstein$

Population$PLoS$ONE$5$e13661$doi101371journalpone0013661$

Kong$ PBZ$ Yang$ F$ Li$ L$ Li$ XBQ$ Feng$ YBM$ 2013$Decreased$FOXF2$mRNA$

Expression$ Indicates$ EarlyBOnset$ Metastasis$ and$ Poor$ Prognosis$ for$

Breast$ Cancer$ Patients$ with$ Histological$ Grade$ II$ Tumor$ PLoS$ ONE$ 8$

e61591$doi101371journalpone0061591$

Lee$SBU$Maeda$T$2012$POKZBTB$proteins$an$emerging$ family$of$proteins$

that$ regulate$ lymphoid$ development$ and$ function$ Immunol$ Rev$ 247$

107ndash119$

Lemay$ DG$ Lynn$ DJ$ Martin$ WF$ Neville$ MC$ Casey$ TM$ Rincon$ G$

Kriventseva$ EV$ Barris$ WC$ Hinrichs$ AS$ Molenaar$ AJ$ 2009$ The$

bovine$lactation$genome$ insights$ into$the$evolution$of$mammalian$milk$

Genome$Biol$10$R43$

Liu$JZ$Mcrae$AF$Nyholt$DR$Medland$SE$Wray$NR$Brown$KM$2010$A$

versatile$ geneBbased$ test$ for$ genomeBwide$ association$ studies$ Am$ J$

Hum$Genet$87$139$

Liu$S$Dontu$G$Mantle$ID$Patel$S$Ahn$N$Jackson$KW$Suri$P$Wicha$MS$

2006$Hedgehog$Signaling$and$BmiB1$Regulate$SelfBrenewal$of$Normal$and$

Malignant$ Human$ Mammary$ Stem$ Cells$ Cancer$ Res$ 66$ 6063ndash6071$

doi1011580008B5472CANB06B0054$

Lu$ J$ Holmgren$ A$ 2014$ The$ thioredoxin$ superfamily$ in$ oxidative$ protein$

folding$ Antioxid$ Redox$ Signal$ 140202091634000$

doi101089ars20145849$

Luna$ A$ Nicodemus$ KK$ 2007$ snpplotter$ an$ RBbased$ SNPhaplotype$

association$and$linkage$disequilibrium$plotting$package$Bioinforma$Oxf$

Engl$23$774ndash776$doi101093bioinformaticsbtl657$

McCarthy$ MI$ Abecasis$ GR$ Cardon$ LR$ Goldstein$ DB$ Little$ J$ Ioannidis$

JPA$ Hirschhorn$ JN$ 2008$ GenomeBwide$ association$ studies$ for$

complex$traits$consensus$uncertainty$and$challenges$Nat$Rev$Genet$9$

356ndash369$doi101038nrg2344$

Meredith$ BK$ Kearney$ FJ$ Finlay$ EK$ Bradley$ DG$ Fahey$ AG$ Berry$ DP$

Lynn$ DJ$ 2012$ GenomeBwide$ associations$ for$ milk$ production$ and$

73

$

somatic$cell$score$in$HolsteinBFriesian$cattle$in$Ireland$BMC$Genet$13$21$doi1011861471B2156B13B21$

Minozzi$ G$Nicolazzi$ EL$ Stella$ A$ Biffani$ S$Negrini$ R$ Lazzari$ B$ AjmoneBMarsan$ P$Williams$ JL$ 2013$ Genome$Wide$ Analysis$ of$ Fertility$ and$Production$ Traits$ in$ Italian$ Holstein$ Cattle$ PLoS$ ONE$ 8$ e80219$doi101371journalpone0080219$

Nicolazzi$EL$Picciolini$M$Strozzi$F$Schnabel$RD$Lawley$C$Pirani$A$Brew$F$ Stella$ A$ 2014$ SNPchiMp$ a$ database$ to$ disentangle$ the$ SNPchip$jungle$ in$ bovine$ livestock$ BMC$ Genomics$ 15$ 123$ doi1011861471B2164B15B123$

Purcell$ S$ Neale$ B$ ToddBBrown$ K$ Thomas$ L$ Ferreira$ MAR$ Bender$ D$Maller$J$Sklar$P$de$Bakker$PIW$Daly$MJ$Sham$PC$2007$PLINK$a$tool$ set$ for$ wholeBgenome$ association$ and$ populationBbased$ linkage$analyses$Am$J$Hum$Genet$81$559ndash575$doi101086519795$

Qiu$ S$ Lai$ L$ 2013$ Antibacterial$ properties$ of$ recombinant$ human$ nonBpancreatic$secretory$phospholipase$A2$Biochem$Biophys$Res$Commun$441$453ndash456$doi101016jbbrc201310092$

Quinlan$AR$Hall$IM$2010$BEDTools$a$flexible$suite$of$utilities$for$comparing$genomic$ features$ Bioinformatics$ 26$ 841ndash842$doi101093bioinformaticsbtq033$

RamiacuterezBTorres$ A$ BarceloacuteBBatllori$ S$ MartiacutenezBBeamonte$ R$ Navarro$ MA$Surra$ JC$ Arnal$ C$ Guilleacuten$N$ Aciacuten$ S$ Osada$ J$ 2012$ Proteomics$ and$gene$ expression$ analyses$ of$ squaleneBsupplemented$ mice$ identify$microsomal$thioredoxin$domainBcontaining$protein$5$changes$associated$with$ hepatic$ steatosis$ J$ Proteomics$ 77$ 27ndash39$doi101016jjprot201207001$

Raven$ LBA$ Cocks$ BG$ Pryce$ JE$ Cottrell$ JJ$ Hayes$ BJ$ 2013$ Genes$ of$ the$RNASE5$pathway$ contain$ SNP$ associated$with$milk$ production$ traits$ in$dairy$cattle$Genet$Sel$Evol$45$25$doi1011861297B9686B45B25$

Scotti$E$Fontanesi$L$Schiavini$F$La$Mattina$V$Bagnato$A$Russo$V$2010$DGAT1$ pK232A$ polymorphism$ in$ dairy$ and$ dual$ purpose$ Italian$ cattle$breeds$Ital$J$Anim$Sci$9$doi104081ijas2010e16$

74

$

Stefanon$ B$ Colitti$ M$ Gabai$ G$ Knight$ CH$ Wilde$ CJ$ 2002$ Mammary$

apoptosis$and$lactation$persistency$in$dairy$animals$J$Dairy$Res$69$37ndash

52$

Stegh$ AH$ DePinho$ RA$ 2011$ Beyond$ effector$ caspase$ inhibition$ Bcl2L12$

neutralizes$ p53$ signaling$ in$ glioblastoma$ Cell$ Cycle$ 10$ 33ndash38$

doi104161cc10114365$

Taberlet$ P$ Valentini$ A$ Rezaei$ HR$ Naderi$ S$ Pompanon$ F$ Negrini$ R$

AjmoneBMarsan$ P$ 2008$ Are$ cattle$ sheep$ and$ goats$ endangered$

species$Mol$Ecol$17$275ndash284$doi101111j1365B294X200703475x$

Team$RC$ 2014$ R$ A$ Language$ and$Environment$ for$ Statistical$ Computing$ R$

Foundation$for$Statistical$Computing$Vienna$Austria$

Thomadaki$H$ Talieri$M$ Scorilas$ A$ 2007$ Prognostic$ value$ of$ the$ apoptosis$

related$genes$BCL2$and$BCL2L12$in$breast$cancer$Cancer$Lett$247$48ndash

55$doi101016jcanlet200603016$

Valdehita$ A$ Bajo$ AM$ FernaacutendezBMartiacutenez$ AB$ Arenas$ MI$ Vacas$ E$

Valenzuela$ P$ RuiacutezBVillaespesa$ A$ Prieto$ JC$ Carmena$ MJ$ 2010$

Nuclear$ localization$ of$ vasoactive$ intestinal$ peptide$ (VIP)$ receptors$ in$

human$ breast$ cancer$ Peptides$ 31$ 2035ndash2045$

doi101016jpeptides201007024$

Wang$T$Liu$NS$Seet$LBF$Hong$W$2010$The$Emerging$Role$of$VHS$DomainB

Containing$Tom1$Tom1L1$and$Tom1L2$in$Membrane$Trafficking$Traffic$

11$1119ndash1128$doi101111j1600B0854201001098x$

Wang$ X$Wurmser$ C$ Pausch$ H$ Jung$ S$ Reinhardt$ F$ Tetens$ J$ Thaller$ G$

Fries$R$2012$Identification$and$Dissection$of$Four$Major$QTL$Affecting$

Milk$Fat$Content$in$the$German$HolsteinBFriesian$Population$PLoS$ONE$7$

e40711$doi101371journalpone0040711$

Xu$S$2003$Theoretical$Basis$of$the$Beavis$Effect$Genetics$165$2259ndash2268$

Zimin$AV$Delcher$AL$Florea$L$Kelley$DR$Schatz$MC$Puiu$D$Hanrahan$

F$ Pertea$ G$ Tassell$ CPV$ Sonstegard$ TS$ Marccedilais$ G$ Roberts$ M$

Subramanian$ P$ Yorke$ JA$ Salzberg$ SL$ 2009$ A$ wholeBgenome$

assembly$ of$ the$ domestic$ cow$ Bos$ taurus$ Genome$ Biol$ 10$ R42$

doi101186gbB2009B10B4Br42$

75

$

Zu$X$Ma$J$Liu$H$Liu$F$Tan$C$Yu$L$Wang$J$Xie$Z$Cao$D$Jiang$Y$2011$ProBoncogene$ Pokemon$ promotes$ breast$ cancer$ progression$ by$upregulating$ survivin$ expression$ Breast$ Cancer$ Res$ 13$ R26$doi101186bcr2843$ $

76

$

Supplementaryfiles$

You$can$find$Chapter$͵$supplementary$files$at$

- DocTa$(Doctoral$Thesis$Archive)$(httptesionlineunicattit)$

- Google$Drive$(httptinyurlcomPhDMMChapter03)$or$scan$the$QR$code$

$

$

SupplementaryFigureS1Multidimensionalscaling

Spatial$distribution$of$genetic$structure$in$the$analyzed$breeds$$

$

Supplementary$FigureS2GWASplot

Manhattan$plots$and$QQBplots$respectively$for$Fat$percentage$Milk$yield$and$

Protein$percentage$in$the$Italian$Brown$

$

Supplementary$FigureS3GWASplot

Manhattan$plots$and$QQBplots$respectively$for$Fat$percentage$Milk$yield$and$

Protein$percentage$in$the$Italian$Holstein$

$

Supplementary$FigureS4GWASplot

Manhattan$plots$and$QQBplots$respectively$for$Fat$percentage$Milk$yield$and$

Protein$percentage$in$the$HolsteinDGAT$

$

Supplementary$FigureS5GWASplot

Manhattan$plots$and$QQBplots$respectively$for$Fat$percentage$Milk$yield$and$

Protein$percentage$in$the$Italian$Simmental$

77

$

$Supplementary$FigureS6LDdecayGenomeBwide$LD$decay$for$the$three$breeds$(Holstein$Simmental$Brown)$$Supplementary$FigureS7DescriptiveplotsDescriptive$statistics$for$the$Italian$Brown$Italian$Holstein$and$Italian$Simmental$$Supplementary$FigureS8DGAT1LDFocus$on$the$LD$of$the$DGAT1$region$in$all$the$datasets$(Brown$Holstein$HolsteinDGAT$Simmental)$$Supplementary$TableS1SNPfactNumber$of$subjects$and$SNPs$after$QC$$Supplementary$TableS2GeneBasedAssociationresultsFull$results$of$the$significant$genes$associated$with$the$traits$in$all$the$datasets$$ Sheet$1$Brown$ndash$fat$percentage$$ Sheet$2$Brown$ndash$milk$yield$$ Sheet$3$Brown$ndash$protein$percentage$$ Sheet$4$Holstein$ndash$fat$percentage$$ Sheet$5$Holstein$ndash$milk$yield$$ Sheet$6$Holstein$ndash$protein$percentage$$ Sheet$7$HolsteinDGAT$ndash$fat$percentage$$ Sheet$8$HolsteinDGAT$ndash$milk$yield$$ Sheet$9$HolsteinDGAT$ndash$protein$percentage$$ Sheet$10$Simmental$ndash$fat$percentage$$ Sheet$11$Simmental$ndash$milk$yield$$ Sheet$12$Simmental$ndash$protein$percentage$$Supplementary$TableS3SingleSNPAssociationresultsFull$results$of$the$SNP$that$overcome$the$suggestive$threshold$$$ Sheet$1$Brown$ndash$fat$percentage$

78

$

$ Sheet$2$Brown$ndash$milk$yield$$ Sheet$3$Brown$ndash$protein$percentage$$ Sheet$4$Holstein$ndash$fat$percentage$$ Sheet$5$Holstein$ndash$milk$yield$$ Sheet$6$Holstein$ndash$protein$percentage$$ Sheet$7$HolsteinDGAT$ndash$fat$percentage$$ Sheet$8$HolsteinDGAT$ndash$milk$yield$$ Sheet$9$HolsteinDGAT$ndash$protein$percentage$$ Sheet$10$Simmental$ndash$fat$percentage$$ Sheet$11$Simmental$ndash$milk$yield$$ Sheet$12$Simmental$ndash$protein$percentage$$Supplementary$TableS4GenesandtraitsGeneWise$pBvalues$of$the$discussed$genes$in$all$the$analysis$In$grey$are$evidenced$the$pBvalues$that$overcome$the$significance$threshold$Genes$are$alphabetically$listed$$Supplementary$TableS5GenesinQTLsGenes$from$the$geneBbased$association$mapping$into$known$QTLs$$$$$

79

CHAPTER(4(

MUGBAS(a(species(free(gene(based(association(suite(

S(Capomaccio1(M(Milanesi1(L(Bomba1(et(al(1IstitutodiZootecnicaUniversitagraveCattolicadelSacroCuoreviaEmiliaParmense84Piacenza29122ItalyTheseauthorscontributedequallytothiswork

____________________ApplicationnotesubmittedtoBioinformatics

Abstract(Theassociationstudiesbetweengenomewidemarkersandphenotypes(GWAS)

arenowwidelyusedinmodelandnonmodelspeciesHowevertoolsthatmerge

singlemarkerresultsandtheirprobabilityofbeingassociatedtofunctionalunits

(typicallygenes)areseldomdevelopedforspeciesotherthanhuman

Herewe presentMUGBAS a suite that calculates a generegionbased pvalue

fromagivenannotationsinglemarkerGWAresultandgenotypedatawiththe

objective to ease postGWAS analysis The software is species and annotation

independentfasthighlyparallelizedandreadyforhighdensitymarkerstudies

Availabilityandimplementationhttpsbitbucketorgcapemastermugbas

Introduction(With the availability of highdensity SNP panels Genome Wide Association

Studies (GWAS) have become the gold standard for dissecting the biology of

complextraits(McCarthyetal2008)Thisistrueforhumansandotherspecies

TheunderlyingideaofGWASissimplefindmarkertraitassociationsexploiting

the linkage disequilibrium (LD) that exists between the causative mutation ndash

whichweignorendashandoneormoremarkersofknownchromosomalposition

Whileanumberofwellestablishedapproachescanbeeasilyusedinastandard

GWAS (Aulchenko et al 2007 Kang et al 2010 Purcell et al 2007) to find

marker(s) associated to a particular phenotype several issues have to be

accountedforduringtheupanddownstreamanalyses

Firstly genome wide studies have to take into account multiple testing If

stringent statistical thresholds are applied a proportion ofmoderate or small

effectsactuallyassociatedwiththetraitwillbedisregardedThereforereducing

the number of tests while maintaining all the information provided by high

densitypanelswouldbeaclearadvantageInadditiononceasignificantsignalis

detected different strategiesmaybeused to tag the candidate causative gene

Sometimesthegeneclosertothepeaksignalisselectedinothercasesallgenes

withinaflankingregionofanarbitrarysizeofthesignificantSNP(s)areincluded

82

infurtheranalysesThesechoicesmayendupinbiasresultseitherretaininga

largenumberofnottrulyassociatedgenesordisregardingthecorrectones

Somemethodshaverecentlybeenproposedtoovercometheselimitationsusing

aprioriknowledgeastheldquocandidatepathwayrdquoanalysis(Ravenetal2013)and

the ldquogenebasedrdquo association strategies (Akula et al 2011 Cantor et al 2010

Liuetal2010)TheseapproachesrestrictGWAstudiesonlytoknowngenesor

genesubsetsresizingthemultipletestingproblemandincreasingthepowerof

theanalyses

Usually bioinformatics pipelines are developed for the analysis of human and

modelorganismsandarenotadaptedtononmodelorganismThisisthecaseof

the VEGAS software (Liu et al 2010 Yang et al 2014) developed only for

humansandonlyforgenebasedanalyses

Here we introduce MUGBAS [MUlti species GeneBased Association Suite] a

softwarebasedonVEGASthatusesGWASdataandagivenannotation(typically

genesbutanyregionmaybesuitable)toestimategenebasedandregionbased

associationpvaluesThesuite is speciesandannotation free ready toanalyze

highdensitydatafromanyexperimentaldesigns

Methods(MUGBASisasuiteofscriptsbuiltinBashRPythonandPerlThesuiterequires

several prerequisites that can be easily fulfilled in a LinuxUnixMac

environmentwith a few command lines Further information on how tomeet

theserequirementsaredetailedintheonlinerepository

MUGBAS suite can be invoked launching the Bash wrapper that checks

dependenciesandstoresuserchoices

Required input files are MAPPED files in PLINK format with genotype data

from(atleast200individualsarerecommendedforaproperLDcalculation)an

annotation file inBED format (chromosome startingposition endingposition

nameandusercustomfields)ofthedesiredgenomefeatureandtheGWASresult

file with mandatory headers ldquoSNPNAMErdquo and ldquoPVALUErdquo Multiple association

resultscanbetestedatthesametime ifprovidedinseparatefiles inthesame

directoryAtrialdatasetisdownloadablealongwiththesoftwarereleasePlease

83

notethatforsimplicityweprovideageneannotationbutanyBEDfilewithany

desiredfeaturecanbeused

The program starts the analysis assigning SNPs to any feature of the given

annotationusingbedtools(QuinlanandHall2010)Thisstepiscustomizableby

user defined boundaries (up and downstream) in order to catch regulatory

regionsthatcouldbeinhighLDwiththecurrentgeneregion

Inordertospeeduptheanalysistheprogramsubsamples200individualsfrom

theinputdatasetWethenimplementedtwolevelsofparallelizationoneforthe

LDandonefortheldquogenewiserdquoorldquoregionwiserdquopvaluecalculationThelatteris

particularlyusefulanalyzingmorethanoneGWAresultsfromthesamedataset

BrieflyMUGBASfirstsplitstheLDcalculationinncores(asdefinedbytheuser)

using the foreach doParallel packages (Analytics andWeston 2014a 2014b)

and then distributes traits across cores using GNU Parallel and foreach

doParallel

SinceLDcalculationisaslowprocessVEGASbenefitsofHapMapphase2datato

speed up the processwhile preserving the option for a user defined (human)

population using PLINK In order to expand these analysis to any species LD

calculationinMUGBASisimplementedwithther2fast()functionfromGenABEL

package(Aulchenkoetal2007)whilethegenewisepvalueiscalculatedasin

theVEGASapproachBrieflythegenewiseteststatisticisthesumofthenupper

tailchisquaredvaluesofaSNPsubsetwithonedegreeoffreedom(wheren is

thenumberofSNPmappinginthegivengene)Withperfectlinkageequilibrium

the genewise pvalue would be the onetailed pvalue of a chisquare

distributionwithndegreesof freedomOtherwisemorelikelythedistribution

of the pvalues has to be evaluated (with simulations) andweighted (with LD

values)(Liuetal2010)

Once every generegion has its ldquogenewiserdquo pvalue a FDR qvalue (False

Discovery Rate) statistics is calculated with the base R function padjust()

Results are stored and enrichedwith ldquomanhattanrdquo and ldquoundergroundrdquo plots of

theldquogenewiserdquopvalueandtheqvaluerespectively

84

Results(MUGBAS output is generegionoriented with useful information about the

location of the analyzed feature the ldquoBest SNPrdquo within that feature with its

genomiccoordinatesandtheoriginalpvaluethegeneorregionwisestatistics

(pvalue number of simulations and qvalue) the custom annotation of that

particularentryandthepositionoftheSNPwithrespecttothefeatureandthe

decidedboundariesAll this informationareuseful to track theeffectofhighly

significant markers that could map into multiple region of interest and

effectively pinpoint the most probable location to focus on with data mining

processes

A typical analysis performed with a highdensity panel (gt600K markers)

distributed in 10 midperformance cores (Intel Xeon X5675 307 GHz) on

threetraitssimultaneouslytakesaboutfourhours

Discussion(Generegionbased methods are now widely used and accepted in genomic

research However non modelspecies suffer for the lack of availability of

specifictoolstoenhancediscoveryfromgenomewidedataMUGBASprovidesa

robust and accepted statistics to researchers dealing with species other than

humantoeasilyjumpfromassociatedmarkerstothedesiredfunctionalunits

Acknowledgements(((WewouldliketothankDrJimmyLiuforhisadvicesinbuildingthesuite

Reference(Akula N Baranova A Seto D Solka J NallsMA Singleton A Ferrucci L

TanakaTBandinelli SChoYSKimYJLee JYHanBGBipolar

Disorder Genome Study (BiGS) Consortium The Wellcome Trust Case

85

ControlConsortiumMcMahonFJ2011ANetworkBasedApproachtoPrioritize Results from GenomeWide Association Studies PLoS ONE 6e24220doi101371journalpone0024220

Analytics R Weston S 2014a doParallel Foreach parallel adaptor for theparallelpackage

AnalyticsRWestonS2014bforeachForeachloopingconstructforRAulchenkoYSdeKoningDJHaleyC2007GenomewideRapidAssociation

Using Mixed Model and Regression A Fast and Simple Method ForGenomewidePedigreeBasedQuantitativeTraitLociAssociationAnalysisGenetics177577ndash585doi101534genetics107075614

Cantor RM Lange K Sinsheimer JS 2010 Prioritizing GWAS Results AReview of Statistical Methods and Recommendations for TheirApplicationAmJHumGenet866ndash22doi101016jajhg200911017

KangHMSulJHServiceSKZaitlenNAKongSFreimerNBSabattiCEskin E 2010 Variance component model to account for samplestructure in genomewide association studies Nat Genet 42 348ndash354doi101038ng548

LiuJZMcraeAFNyholtDRMedlandSEWrayNRBrownKM2010Aversatile genebased test for genomewide association studies Am JHumGenet87139

McCarthy MI Abecasis GR Cardon LR Goldstein DB Little J IoannidisJPA Hirschhorn JN 2008 Genomewide association studies forcomplextraitsconsensusuncertaintyandchallengesNatRevGenet9356ndash369doi101038nrg2344

Purcell S Neale B ToddBrown K Thomas L Ferreira MAR Bender DMallerJSklarPdeBakkerPIWDalyMJShamPC2007PLINKatool set for wholegenome association and populationbased linkageanalysesAmJHumGenet81559ndash575doi101086519795

QuinlanARHallIM2010BEDToolsaflexiblesuiteofutilitiesforcomparinggenomic features Bioinformatics 26 841ndash842doi101093bioinformaticsbtq033

86

Raven LA Cocks BG Pryce JE Cottrell JJ Hayes BJ 2013 Genes of the

RNASE5pathway contain SNP associatedwithmilk production traits in

dairycattleGenetSelEvol4525doi101186129796864525

87

CHAPTER5

Imputationaccuracyisrobusttocattlereferencegenomeupdates

MMilanesi1etal1IstitutodiZootecnicaUniversitagraveCattolicadelSacroCuoreviaEmiliaParmense84Piacenza29122Italy

____________________ShortCommunicationacceptedinAnimalGenetics

SummaryGenotype imputation is routinely applied in a large number of cattle breeds Imputation

hasbecomeaneedduetothelargenumberofSNParrayswithvariabledensity(currently

from2900to777962SNPs)Althoughmanyauthorshavestudiedtheeffectofdifferent

statisticalmethodsonimputationaccuracytheimpactofa(likely)changeinthereference

genomeassemblyonimputationfromlowertohigherdensityhasnotbeendeterminedso

far In this work 1021 Italian Simmental SNP genotypes were reJmapped on the three

mostrecentreferencegenomeassembliesFour imputationmethodswereusedtoassess

theimpactofanupdateinthereferencegenomeAsexpectedthefourmethodsbehaved

differentlywith largedifferences in termsof accuracyUpdatingSNPcoordinateson the

three tested cattle reference genome assemblies determined only a slight variation on

imputationresultswithinmethod

KeywordsImputationreferencegenomeassemblySNPbovine

MaintextFromthepublicationof the firstbovinesinglenucleotidepolymorphism(SNP)array the

wholegenomicscenarioofthisspeciesevolvedrapidlyNowadaysseveralSNPpanelshave

been designed with a range of densities mapping on different reference genome

assemblies and produced by different manufacturers When running analyses with SNP

arrays of different densities or from different commercial companies (eg Illumina and

Affymetrix) an imputation step is often required The impact of different statistical

methods on imputation accuracy has already been assessed (Pei et al 2008) (Marchini

andHowie2010)(Maetal2013)Surprisinglyuptonowtherearenotavailablestudies

investigating the impact of an update on the reference genome assembly on imputation

accuracy This assessment would be of fundamental importance considering that reJ

mappingSNPsdeterminesarearrangementofthehaplotypestructureofthegenotypes(in

somecasesSNPsarereJmappedtoadifferentchromosome)andthatthebovinereference

assembly is likely toberepeatedlyupdatedover timeThisworkwasaimedatassessing

the effect of different reference genome assemblies in cattle As test casewe compared

four different imputation methods from low (Illumina BovineLDv10 Beadchip) to

90

mediumhigh (Illumina BovineSNP50v2 Beadchip) density in the Italian Simmental

populationForeachmethodweassessedthe impacton imputationaccuracyofmapping

SNPs on the BTAU42 BTAU46 (wwwhgscbcmeduotherJmammalsbovineJgenomeJ

project)orUMD31(Ziminetal2009)cattlereferencegenomeassemblies

Atotalof900and121SimmentalbullsweregenotypedwiththeIlluminaBovineSNP50v1

and v2 Beadchips (Illumina Inc San Diego CA) respectively Illumina BovineSNP50v2

Beadchip (hereinafter named 54k) the official reference chip for the Italian Simmental

breedersassociation(ANAPRI)wasconsideredasreferenceSNPpanelTheSNPchiMpv1

tool(Nicolazzietal2014)wasusedtoobtainSNPcoordinatesonBTAU42UMD31and

BTAU46genomeassembliesForeachassemblySNPsnotcontainedinthe54kunmapped

orinthesexualchromosomeswerediscardedThefinaldatasetcontained5118952886

and51290SNPsinBTAU42UMD31andBTAU46respectively

FourimputationmethodswereconsideredPedImpute(Nicolazzietal2013)FindHapv2

(VanRadenetal2011)FImpute(Sargolzaeietal2014)andBeaglev332(Browningand

Browning2007)Thefirstthreeusepedigreeandpopulation(ie linkagedisequilibrium

LD) informationwhereas the fourth uses only LD information Performance of the four

methods testedacrossgenomicassemblieswas comparedusingwrongly imputedalleles

(ldquoerrorsrdquo)andallelicsquaredcorrelation(ldquoallelicJR2rdquo)statisticsAllelicJR2wasdefined

asthesquaredcorrelationbetweenimputedandtrue(eggenotyped)allelesmultipliedby

100 as in Browning amp Browning (2009) The 878 oldest sires were used as training

population whereas the 143 youngest bulls were used as test population SNPs not in

commonwith the lowdensity panelwere set tomissing in the test population The test

population was split in four scenarios individuals with sire and maternal grandsire

genotypedinthereferencepopulation(scenarioAJ84animals)individualswithonlythe

sire genotyped (scenario B J 34 animals) individuals with only the maternal grandsire

genotyped(scenarioCJ13animals)ornoneofcloserelativesgenotyped(scenarioDJ12

animals)Thepopulationusedwashighlystructuredwithnearlyhalfofthebullspresent

in the dataset (490) sired by 115 genotyped sires In addition 35 out of these 115

genotypedsiresweresiredby22bulls(eggrandsires)alsopresentinthedataset

DifferencesacrossassembliesarenothomogeneousFromBTAU42toBTAU4646SNPs

weremapped to different chromosomes and the average difference in physical position

withinchromosomeswasnearly025MbpHowevertherewasnosubstantialdifferencein

the order of markers In fact not considering unmapped SNPs on either of the

aforementionedassembliesadifferenceinthesequentialorderwasobservedforonly134

91

single (ie spread throughout the genome) SNPs On the contrary from UMD31 to

BTAU46 222 SNPsweremapped to different chromosomes and amuch larger average

differenceinSNPsphysicalpositionwasobserved(iemorethan2Mbp)Alsothenumber

ofSNPswithdifferentsequentialorderwasmuchlarger(ie1893SNPs)Howevermore

than50oftherearrangementsinvolvedblocksfrom3tonearly100markersAsaresult

slight differences in overall imputation accuracy from BTA46 to UMD31 rather than

between the two BTAU assemblies were expected across methods On the contrary if

rearrangements are due to issues in the assembly of scaffolds theymight have a direct

(negative) impact on local imputation performances In fact a preliminary analysis on

BTA1consideringlocalerroracrossassembliesandmethodssuggestatleasttwosuch

SNPsclustersat44and138MbponUMD31(FiguresS1S2andS3)

Considering allmethods and scenarios amaximum average variation of 02 inerrors

(Table1)andof09inallelicJR2(Table2)wasobservedacrossreferenceassembliesThe

standarddeviation(SD)oftheimputationaccuracywasstableacrossreferenceassemblies

withinmethodsOnthecontraryalargervariabilitywasobservedacrossmethods

PedImpute and FindHap showed a relatively large average allelicJR2 variability across

scenarios evaluated over the same reference assembly Only PedImpute showed a large

variation in SD across scenarios in errors and allelicJR2 with high variability on

scenarios C and D showing a low performance of thismethodwhen there are no close

relatives(egsiredam)genotypedinthereferencedatasetFImputeinsteadachievedthe

highestaccuracyacrossallreferencegenomesandscenariossurprisinglyobtainingitsbest

resultsJerrorsof14(06SD)andallelicJR2of955(26SD)JinscenarioCwhereonly

onerelativelycloserelativeisgenotyped(maternalgrandJsire)FImputeappeartobeable

todetectshorterhaplotypesfromdistantrelativesmoreaccuratelythananyothermethod

tested in this study thus it is lessdependenton theavailabilityofpedigree information

thantheothertwopedigreeJbasedmethods(Sargolzaeietal2010)

The large variability across methods within assembly was expected although the

differences obtained among PedImpute FindHap and Beagle were larger in this study

comparedtoasimilarstudyinItalianHolstein(Nicolazzietal2013)Thiscoulddependon

a number of factors First the sample size of the test population is small (especially for

groups C and D where less than 15 individuals were present) Second the interaction

between population structure and imputationmethodsmay have a role PedImpute and

FindHap rely much more heavily on pedigree information than FImpute and Beagle

assumesall individualsareunrelatedHowever the lattertwomethods indirectlybenefit

92

greatly of from highly structured populations with large number of relatives ndash close or

distantndashpresent inthereferencepopulation(Sargolzaeietal2014)Anotherinteresting

factor that influences the imputation accuracy may be the persistence of LD at long

distances this is much lower in the Italian Simmental than in the Italian Holstein

population(FigureS4)Informationonrelatives(eitherpedigreeinformationorgenotyped

relativesinthereferencepopulation)isexpectedtoplayabiggerroleifLDisconservedat

shorter distances This hypothesis is supported by the fact that although there is high

variability acrossmethods better performances of the pedigreeJbasedmethods are still

observedinscenariosAandB(andCforFImpute)

Theslightdifferencesinimputationaccuracyobservedacrossreferenceassembliesmaybe

explained by the aforementioned small variation in the SNP blocks across reference

assemblies irrespectively of the rearrangements of single SNPs Considering that SNP

blocksaremostlymaintainedweexpectthatbreedswithlargerLDpersistenceatlonger

distances (ie Holstein Brown) will obtain even better imputation accuracy across

assembliescomparedtowhatreportedinthisstudyinItalianSimmentalHoweverastill

relativelyunexploredaspectof imputationperformance is thedistributionof imputation

errors across the genomeNot considering genotyping errors (which are expected to be

randomly distributed in the genome) a cluster of consecutive markers with high

percentage of imputation error across methods are most probably identifying missJ

assembledregions in thegenome(FiguresS1S2andS3)Althoughthe impactonoverall

imputation accuracy is limited as results in this study show these errorsmight have a

large impact on any analyses relying on specific genomic coordinates (eg genomeJwide

associationstudies)Morecomprehensiveresultsincludingannotationstudiesandwhole

genomesequencingdataarerequiredtoconfirmandinvestigatetheseobservations

Toconcludeonlyasmalleffectofupdatingthereferencegenomeassemblyonimputation

accuracywas observed in Italian Simmental On the other hand as already reported by

other studies the variability of imputation accuracy estimates is much larger across

imputation methods The choice of the most suitable imputation method based on the

breedgeneticbackgrounditscharacteristicsandontheavailabledataisthereforecritical

AcknowledgementsThe research leading to these results has received funding from the European Unions

Seventh Framework Programme (FP72007J2013) under grant agreement ndeg 289592 ndash

93

Gene2FarmPartofthegenotypicdatausedinthisstudywasprovidedbyprojectldquoSelMolrdquo

fundedbytheItalianMinistryofAgriculture(MiPAAF)MMwassupportedbytheDoctoral

SchoolontheAgroJFoodSystem(Agrisystem)of theUniversitagraveCattolicadelSacroCuore

(Italy) FB was supported by the Marie Curie European Reintegration Grant

ldquoNEUTRADAPTrdquo(FP7JPEOPLEJ2009JRG)

ReferencesBrowningSRBrowningBL2007RapidandAccurateHaplotypePhasingandMissingJ

Data Inference for WholeJGenome Association Studies By Use of Localized

HaplotypeClusteringAmJHumGenet811084ndash1097

MaPBroslashndumRFZhangQLundMSSuG2013Comparisonofdifferentmethods

forimputinggenomeJwidemarkergenotypesinSwedishandFinnishRedCattleJ

DairySci964666ndash4677doi103168jds2012J6316

Marchini J Howie B 2010 Genotype imputation for genomeJwide association studies

NatRevGenet11499ndash511doi101038nrg2796

NicolazziELBiffaniSJansenG2013ShortcommunicationImputinggenotypesusing

PedImputefastalgorithmcombiningpedigreeandpopulationinformationJournal

ofDairyScience962649ndash2653doi103168jds2012J6062

NicolazziELPiccioliniMStrozziFSchnabelRDLawleyCPiraniABrewFStella

A 2014 SNPchiMp a database to disentangle the SNPchip jungle in bovine

livestockBMCGenomics15123doi1011861471J2164J15J123

PeiYLiJZhangLPapasianCJDengH2008Analysesandcomparisonofaccuracyof

differentgenotypeimputationmethodsPLoSONE3551

VanRadenPMOrsquoConnellJRWiggansGRWeigelKA2011Genomicevaluationswith

manymoregenotypesGenetSelEvol43

94

Tableamp1ampIm

putationampaccuracyampacrossampassem

bliesampforampPedImputeampFindH

apampFImputeampandampBeagleamppercentageampofampwronglyampim

putedamp

allelesamp

PedImputeamp

FindHapamp

FImputeamp

Beagleamp

BTAU

amp42amp

UMDamp31amp

BTAU

amp46amp

BTAU

amp42amp

UMDamp31amp

BTAU

amp46amp

BTAU

amp42amp

UMDamp31amp

BTAU

amp46amp

BTAU

amp42amp

UMDamp31amp

BTAU

amp46amp

Scenarioamp

Namp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

A1amp

84

20

07

21

07

21

07

32

09

32

09

32

09

15

09

15

09

15

09

24

11

25

11

25

11

B2amp

34

23

08

23

09

22

08

36

11

36

12

36

11

15

07

15

09

14

06

23

08

23

08

23

08

C3amp

13

98

41

98

41

98

41

51

10

52

10

51

10

14

06

14

06

14

06

23

06

23

06

24

06

D4 amp

12

88

49

89

49

88

49

54

11

56

12

54

11

16

11

17

12

16

11

26

15

26

15

26

14

1 Sireandm

aternalgrandsiregenotyped

2 Siregenotypedmate

rnalgrandsireno

tgenotyped

3 Sireno

tgenotypedmate

rnalgrandsiregenotyped

4 Sireandm

aternalgrandsireno

tgenotyped

Tableamp2ampIm

putationampaccuracyampacrossampassem

bliesampforampPedImputeampFindH

apampFImputeampandampBeagleampallelicJR2amp

PedImputeamp

FindHapamp

FImputeamp

Beagleamp

BTAU

amp42amp

UMDamp31amp

BTAU

amp46amp

BTAU

amp42amp

UMDamp31amp

BTAU

amp46amp

BTAU

amp42amp

UMDamp31amp

BTAU

amp46amp

BTAU

amp42amp

UMDamp31amp

BTAU

amp46amp

Scenarioamp

Namp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

A1amp

84

941

20

940

20

941

20

907

24

908

25

907

24

952

26

952

27

952

27

925

33

923

33

925

33

B2amp

34

935

24

934

25

935

24

896

30

896

34

895

32

954

20

954

21

954

20

929

24

928

25

928

24

C3amp

13

759

89

761

88

758

87

848

33

845

30

848

32

955

18

955

19

955

19

931

19

928

19

927

19

D4 amp

12

782

113

777

113

781

112

841

32

832

35

839

33

949

34

947

38

949

35

921

44

919

45

921

44

1 Sireandm

aternalgrandsiregenotyped

2 Siregenotypedmate

rnalgrandsireno

tgenotyped

3 Sireno

tgenotypedmate

rnalgrandsiregenotyped

4 Sireandm

aternalgrandsireno

tgenotyped

95

Supplementaryfiles

YoucanfindChapterͷsupplementaryfilesat

- DocTa(DoctoralThesisArchive)(httptesionlineunicattit)

- GoogleDrive(httptinyurlcomPhDMMChapter05)orscantheQRcode

SupplementaryFigureS1ErrorpercentageacrossmethodsforBTAU42onBTA1

SupplementaryFigureS2ErrorpercentageacrossmethodsforUMD31onBTA1

SupplementaryFigureS3ErrorpercentageacrossmethodsforBTAU46onBTA1

SupplementaryFigureS4Linkagedisequilibrium(LD)decayoverdistanceinItalian

SimmentalandHolstein

ThedottedlinewithblackdotsindicatesLDdecayoverdistanceinItalianHolstein

whereasthedashedlinewithwhitediamondsindicatesthesamevaluesforItalian

SimmentalItalianHolsteingenotypeswereasubsetofthedatausedinNicolazzietal

(2013)

96

CHAPTER6

QuestfordeleteriousrecessivevariantsinItalianHolsteinbullscombiningexomesequencingandhighdensitySNPanalysis

MMilanesi1etal

1IstitutodiZootecnicaUCSCviaEmiliaParmense8429122PiacenzaItaly

____________________Manuscript

Abstract

Outbreed species carry a genetic load of deleterious recessive variants Those

that are lethal survive only in the heterozygous state in the population while

otherssimplyreducethefitnessofindividualshomozygousforthevariantThe

long=termdestinyofthesemutationsistodecreaseinfrequencyduetonegative

selectionandeventuallydisappearbecauseofgeneticdriftHowever theymay

bemaintainedinlivestockpopulationsandevenincreaseinfrequencywhenin

linkage with favourable alleles for traits under selection or occurring in the

genomeofsireshavinghighgeneticvalueforproductivetraitsWesearchedfor

these variants combining exome sequence data from 20 Italian Holstein bulls

sampledfromoppositetailsofthemaleandfemalefertilityEBVdistributionthe

HighDensity(HD)SNPgenotypingof1009progenytestedItalianHolsteinbulls

and information existing in public databases Exome sequencing detected

202956 variants Among these 5167 were classified as ldquodeleteriousrdquo when

annotatedwithSIFTandAnnovarThesewerefilteredbypickingthosemapping

in regions out of equilibrium and lack of homozygotes in the progeny tested

Holstein bulls or having an asymmetric distribution in bulls plus=variant and

minus=variant for fertility EBV The resulting 193 variants in 163 genes were

furtherassessedbycomparativegenomicshaplotypeanalysisandfunctionFor

eachassessmentapositiveornegativescorewasgiventoevidenceinfavouror

against thevariant tobedeleteriousA totalof35variantshadapositive total

score These genes are involved in development cell motility and immune

responseSomeof themareknowntobeexpressed in the testiclesandappear

importantforspermfertilityinmodelspeciesothersareresponsibleforgenetic

defectsinhumanandmousecarryingmutationssimilartotheonesdetectedin

cattleThesegenevariantsarecandidatetobeamongthosethatconstitutethe

ldquogeneticloadrdquoofdeleteriousrecessivegenesintheHolsteinpopulation

100

Introduction

The presence of deleterious recessive variants is well tolerated by outbred

speciesWhenthesevariantsappeartheymayspreadandsurvivealongtimein

the population in the heterozygous state (Simmons and Crow 1977) It is

estimatedthatoutbredindividualshumanincludedcarryonaverageagenetic

loadofmorethan100loss=of=functionvariantsintheirgenome(MacArthuretal

2012)

Theeffectofdeleterious recessivevariantsappearswhendemographicevents

eggeneticbottlenecksreducethegeneticvariabilityofthepopulationorwhen

relatedindividualsareintentionallymatedforbreedingpurposesInthesecases

recessive deleterious reduce the fitness of individuals carrying them in the

homozygous state and in extreme case induce very severe genetic defects

premature death and abortion (McClure et al 2014 Sahana et al 2013

VanRadenetal2011)Theythereforecontributetoexplainthephenomenonof

inbreeding depression and its negative effect on fertility (Charlesworth and

Willis2009)

The search for deleterious recessive variants is particularly interesting in

Holstein cattle Thisbreed is experiencing anegative genetic trend for fertility

traits (Jamrozik et al 2005) it is worth noting that this is occurring while

inbreedingisunderstrictmonitoringandistheoreticallyincreasingataverylow

pace(about005peryear=Mrodeetal2009)Inadditionmatingisplannedto

minimise relationship between animals and the additive genetic variance for

mosttraitsndashincludingmilkproduction=isapparentlynotdecreasinginspiteof

the selection pressure applied by modern schemes (Biffani et al 2002)

indicatingthatoveralldiversityisonlyslightlydecreasing

SelectiontoincreasefertilityischallengingSelectioncriteriagenerallyadopted

to improve this trait are far from the biological mechanisms that govern

reproduction Inaddition thesemechanismsarecomplexandverysensitive to

environmental(management)conditions(Laneetal2013Wathesetal2014)

Asa result fertility traitshave loworvery lowheritability (Pryceetal1997)

andthereforelowresponsetoselectionandslowgeneticprogress

101

Forthesereasons thequest formolecularvariants influencing fertility traits is

on=goingsincetheavailabilityofmoleculartoolsforDNAanalysisAnumberof

investigations tested the effect of functional candidate genes running genome

wide association analyses to find QTLs for fertility traits using hundreds

microsatellite markers in structured families (Ashwell et al 2004) or tens of

thousand SNP panels in whole populations (Aston and Carrell 2009) The

availability of medium density SNP panels allowed the search for deleterious

recessivesevenwithouttheuseofphenotypictraitsbyscanningthegenomein

searchofregionslackingoneofthetwohomozygousclasses(Sahanaetal2013

VanRaden et al 2011) This approach has highlighted a number of genomic

regions likely carrying deleterious recessives This information has immediate

application inmolecular assisted breeding and a relevant scientific interest in

thesearchofcausativevariantsandassessmentoftheirfunctions

Thedevelopmentofnewsequencingtechnologieshasgreatlydecreasedthecost

of sequencing so that nowadays the sequencing of few target individuals is

becoming the gold standard to seek for causal mutations for rare monogenic

defectsinhumansandotherspecies(ChunandFay2009)Oftenonlytheexome

the protein coding part of the genome is sequenced searching for those

mutationsthatseverelyalterproteinstructureandorfunction(Bamshadetal

2011 Li et al 2013) A typical mammalian exome consists in a few tens of

millionnucleotides(around2ofthereferencegenome)comparedtothefew

billionofthewholegenomeThereforewiththesamenumberofreadsexome

sequencinghastheadvantageofamoreaccurategenotypingdueto thehigher

sequencing depth an easier data management lower computation load and

easier interpretation (Bieseckeretal2011Majewskietal2011) Indeed the

numbervariants found inexomesareone=twoordersofmagnitude lowerthan

thatdetectedinwholegenomesandmutationsaredetectedinannotatedgenes

orimmediatelynearbyOntheotherhandtheexomeapproachcanonlyidentify

variantson annotatedgenesVariants inunknownexonsor that occur innon=

coding regulatory features will be missed (Bamshad et al 2011) In humans

these are turning out to be of outmost importance for gene regulation and

downstreamphenotypic effects (ENCODEConsortium2012) but they are still

verypoorlyannotatedinlivestock

102

Herewesequencedtheexomesof20Holsteinbullsprogenytestedselectedfrom

the tails of male and female fertility EBVs (Estimated Breeding Value) Since

livestockfertilityisacomplextraitandcausativemutationscannotbesoughtby

the approach used in monogenic diseases we integrated the sequence

information with high density SNP chip data obtained on a much larger

populationofItalianHolsteinbullstodetectgenescarryingallelescandidateto

be deleterious recessives and confirm their likely deleterious effect by

comparativegenomics

Materialsandmethods

1SamplingandDNAextraction

Semen(straws)andbloodsampleswereprovidedbyItaliancertificatedartificial

insemination centres (UE Directive 88407CEE) thus ethics committee

approvalforthisstudyisnotrequired

A total of 20 ItalianHolstein progeny tested bullswere sampled for complete

exome sequencing Among them 19were chosen from the tails of female and

male fertilityEBV(EstimatedBreedingValue)distribution (Table1) tohave5

animals fromeach tail One animalwas to be in theminus=variant tail of both

traitsBullswereselectedasunrelatedaspossibleSamplingwaspossiblethanks

tothecollaborationwithANAFI(ItalianHolsteinBreederAssociation)

ANAFIdevelopedageneticevaluationfor fertilitycombiningdifferenttraitsas

explained by Biffani and colleagues (Biffani et al 2005) The traits are ldquodays

from calving to first inseminationrdquo ldquocalving intervalrdquo ldquofirst=service non return

rateto56daysrdquoldquoangularityrdquoandldquomatureequivalentmilkyieldat305daysrdquo

MalefertilityisexpressedasanEstimatedRelativeConceptionRate(ERCR)and

isameasureofconceptionrateofaservicesirerelativetoservicesiresofherd

mates(Soggiuetal2013)

103

Table1FemaleandmalefertilityEBVsofthe20animalssequenced

Animal EBVFemaleFertility EBVMaleFertility GroupfriA01 97 =111 NAfriA02 98 =166 MalLfriA03 115 0 FemHfriA04 114 0 FemHfriA05 114 0 FemHfriA06 86 0 FemLfriA07 98 195 MalHfriA08 105 213 MalHfriA09 95 =154 MalLfriA10 95 149 MalHfriA11 102 264 MalHfriA12 91 =475 FemLMalLfriA13 105 =179 MalLfriA14 102 207 MalHfriA15 99 =405 MalLfriA16 85 0 FemLfriA17 87 0 FemLfriA18 121 0 FemHfriA19 117 0 FemHfriA20 88 0 FemL

A total of 1009 Italian Holstein bulls were chosen from the entire population

balancing theirpresence in the tailsof theEBV forkgprotein (PROT) somatic

cell count (SCC) fertility (FERT) and functional longevity (LONG) for high

densitygenotyping(TableS1)

DNAwasisolatedfromwholebloodandsemenstrawssamplesinLGSlaboratory

(CremonaItalywwwlgscrit)usingtheGenomixkitaccordingtoLGSinternal

protocolDNAsforexomecaptureandsequencingwereextractedindependently

tosatisfy therequirement indicatedby IGA technologies (Udine Italy)at least

12mgofDNAataconcentration100ngulwithR260280higherthan18and

R260230near19

2Moleculardataproductionandqualitychecking

21Exomecaptureandsequencing

104

Exomes were captured using the ldquoSureSelected All exon Bovinerdquo (Agilent

Technologies) kit in early version Probes coordinates are based on UMD31

bovinereferencesequence(Ziminetal2009)anddesignedagainstNCBIexons

as well as predicted exons UTRs and miRNAs Number and distribution of

probesperchromosomearedetailedinTableS2

rdquoOvation Ultralow Library Systemsrdquo protocol was followed for library

preparation After quantification with Qubit (Life technologies) and quality

controlwith2100Bioanalyzer(AgilentTechnologies)DNAwassonicatedwith

Bioruptor (Diagenode) to obtain fragments between 150 to 200 bp in length

blunt end=repaired adenylated ligated with paired=end adaptors and purified

followingtheprotocolprovidedbythemanufacturer

LibrarieswerehybridizedwithldquoSureSelectedAllexonBovinerdquoprobesusingthe

Encore Target Capture Module (Nugen) SureSelect Target Enrichment and

Herculase II FusionEnzymewithdNTPCombo (AgilentTechnology) kit Then

capturedDNAwasamplifiedandtaggedwithlibraryspecificindexesQualityof

capturedsampleswastestedwithQubitand2100Bioanalyzer

Paired=end 2X100bp sequencing was performed on Illumina HiSeq2500

(IlluminaSanDiegoCA)atanexpectedcoverageof50X

22Sequencealignment

Adapters and reads shorter than 36bp were trimmed using Trimmomatic

(version032)(Bolgeretal2014)ForeachFastqfilethealignmentandpairing

of the reads to theUMD31 reference sequencewas performedwithBurrows=

WheelerAligner(BWA)(version077=r441)(LiandDurbin2009)applyingalso

trimmingoption (=q5) SAM files resultedwere converted intoBAM file using

SAMtools (version0118) (Lietal2009)Foreach individualBAM fileswere

merged in a single one with Picard (version 1111)

(httpsgithubcombroadinstitutepicard)Duplicatereadswereidentifiedand

markedwithPicardFinallyindividualBAMfileswereindexedwithSAMtools

105

Sequencedepthwasevaluatedsimultaneouslyonall20sequencedanimalsand

onlyonregionstargetedbyprobesusingDepthOfCoverageofGATK(McKennaet

al2010)

23ExomeVariantcallingandqualitycheck

Singlenucleotidepolymorphisms(SNPs)insertionsanddeletions(InDels)were

calledsimultaneouslyonall20sequencedanimalsandonlyonregionstargeted

byprobesusingUnifiedGenotyperofGATK(DePristoetal2011)

All variantswere filteredusingVariantFiltration ofGATK to removeoneswith

lowmappingandorsequencingqualityThechoiceoffiltersandthresholdswas

made in accordance to ldquoGATK best practicerdquo for exome sequencing and

distributionoffiltervalues

bull ldquoSNPclusterrdquoexcludedvariantifmorethan3variantswerefoundin10

bp

bull ldquoBaseQualityRankSumTestrdquoexcludedifitrsquoslowerthen=5ormorethan

5

bull ldquoMappingQualityRankSumTestrdquoexcludedif itrsquoslowerthen=7ormore

than6

bull ldquoRead Position Rank Sum Testrdquo excluded if itrsquos lower then =4 or more

than4

bull ldquoFisherStrandrdquoexcludedifitrsquosmorethan30

bull ldquoDepthrdquoexcluded if itrsquos lower then60X(ieaminimumvalueof3Xper

animals)ormorethan3500X

bull ldquoQualityrdquoexcludedifitrsquoslowerthen40

bull ldquoMappingqualityrdquoexcludedifitrsquoslowerthen30

Filters were applied to all variants but only bi=allelic SNPs were used in the

following analyses Genotypes with ldquoGenotype Depthrdquo lower than 5 and

ldquoGenotype Qualityrdquo lower than 20 were discarded The remaining ones were

converted intoldquoABrdquocode(A=referencealleleandB=alternativeallele)Then

data were transformed in PLINK (Purcell et al 2007) format using a home=

made python script To prepare the working dataset the final quality control

106

removedvariantswithmore than25missingdata (that iswithmore than5

animals out of 20 without genotypes) monomorphic (minor allele frequency

MAFlt1)ornotmappinginautosomesBasicpopulationgeneticsparameters

werecalculatedtoevaluatethepresenceofcrypticstructureinthedatathatmay

influencetheoutcomeofthedownstreamanalyses

In total 24390 SNPs identified in the exomes were also included in the HD

SNPchip panel HD genotype from nineteen out to twenty of the sequenced

animals sequenced was used to compare technologies Genotypes at common

locationswerecomparedacrosstechnologies toevaluatesequencequalityand

theeffectofsequencedepthongenotypecalling

24BovineHDGenotypingBeadChipdataandqualitycheck

AllanimalsweregenotypedusingtheBovineHDGenotypingBeadChip(Illumina

San Diego CA) Genotypes were quality controlled and filtered with standard

threshold Markers with more than 5 missing data and MAF lt 1 were

excluded Animals with more than 5missing data were also excluded Only

autosomal SNPs were retained Thereafter genotypes were phased and the

remainingmissingdata imputedusingBeaglev332(BrowningandBrowning

2007)

Ofthe20animalssequenced12hadalsoBovineHDGenotypingBeadChipand7

BovineSNP50 Genotyping BeadChip version 2 genotypes available For

concordance and haplotype analysis we imputed the 50K data to 800K as

previouslydescribedabove

3Strategyfordetectingcandidatedeleteriousvariants

Wesearchedforcandidatedeleteriousvariantsfollowinganintegratedapproach

that joined the analysis of i) exomes of a few Italian Holstein bulls having

plusvariant and minusvariant EBVs for a target trait ii) HD genotyping of a

population of 1009 progeny tested Italian Holstein bulls iii) the comparative

genomic analysis of 100 vertebrate species The strategy followed for the

107

analysisisshowninFigure1Inthefirststep(Search)welookedforcandidate

deleterious variants in the exomes In step2 (Filter)we filtered candidatesby

assessingwhichof thesevariantseitherhadanon=randomdistributionamong

animals having extreme EBVs or co=mapped with genomic regions carrying

outliersignals inthelargepopulationofprogenytestedbulls Instep3(Asses)

candidateswere further evaluated by i) assessing their conservation across a

panel of 100 vertebrate species ii) searching for associated haplotypes and

evaluating their frequencies in the population of progeny tested bulls iii)

evaluatingtheirassociationwithgeneticdefectsinhumanandmodelspeciesiv)

examining their expression profile and function when known Methods are

belowdescribedindetail

Figure1Strategyfollowedtoidentifydeleteriousvariants

108

31Step1Searchforcandidatedeleteriousvariants

Quest for candidate deleterious variants in exome sequence data by

variantannotation

WeannotatedallvariantsusingANNOVAR(Wangetal2010)andVEP(Variant

Effect Predictor) (McClure et al 2014) software They use the Ensembl gene

annotation database (UMD31 reference sequence release 74) to investigate

locationandpotentialeffectoncodingsequencesofallvariantsVEPintegrates

the SIFT algorithm (Kumar et al 2009) to predict the effect of amino acid

substitutiononproteinfunctionForeachsequencechangesequencehomology

and physic=chemical similarity between the reference amino acid and the

alternative is considered providing a score that evaluates the mutation

tolerabilityVariantswithscorelowerthan005wereclassifiedasldquodeleteriousrdquo

inaccordingtosoftwarerecommendations

SIFTalsocomparedallvariants found in theexomeswith thosepresent in the

dbSNPdatabase(version140)toidentifythosethatwerenovel

32Step2Filteringofcandidatedeleteriousvariants

2a Analysis of deleterious variant distribution among animals having

extremesEBVvaluesformaleandfemalefertilitytraits

Association between variants and the classification in the tails of the EBV

distributionwas estimated byFisherexact testwhich compares frequencies of

alleles (a vs A) in the two tails not assumingH=W equilibriumgenotypic test

that compares the frequency of genotypes (aa vs Aa vs AA) and the recessive

geneactiontestwhichtestsforthedistributionofonehomozygousclassversus

theothers(aavsaAAA)AlltestsareimplementedinPLINKfunctionsThe5

significance threshold was set empirically by permutation (N=100000) either

point=wise that is not corrected for multiple testing (EMP1 = Empirical

Probability1)andthereafterfamily=wisecorrectedformultipletesting(EMP2=

EmpiricalProbability2)

109

GenesexceedingEMP1were considered suggestive and thoseexceedingEMP2

significant candidates to be deleterious recessives and their characteristics

conservationacrossspeciesandfunctionfurtherinvestigated

2b Quest for regions candidate to carry deleterious variants in the

populationanalysedwiththeHDBeadChip

Basic population genetics parameters and Multi Dimensional Scaling were

calculated with GenABEL R package (Aulchenko et al 2007) to evaluate the

presenceofcrypticstructure inthedatathatmayinfluencetheoutcomeofthe

downstreamanalyses

ExpectedandobservedallelicandgenotypicfrequenciesandFisherexacttestof

Hardy=Weinberg proportions (Janis E Wigginton 2005) were calculated with

PLINKon theHDSNPdatasetValueswere averaged in slidingwindowsof10

markersThisvalue(10markers)waschosenaftertestingdifferentwindowsize

as represent a good compromise between noise reduction and resolution (not

shown)

Candidate regions containing deleterious mutations were identified following

twodifferentapproaches

In one approach hereafter refereed to as ldquoOut of Hardy=Weinbergrdquo approach

(OHW)markerssignificantlydeviatingfromofHardy=WeinbergequilibriumatP

le84710=8 (nominalvaluePle005Bonferronicorrected formultiple testing)

werefirstidentifiedWindowswithatleastthreesignificantmarkersbelowthis

thresholdandhavingahomozygotedeficitwereconsideredcandidatestocarry

deleterious mutations In this OHW approach we applied very stringent

constraints to reduce false positive discoveries due to the Wahlund effect

inducedbypopulationsubstructure(seeresultsandFigureS5)

In thesecondapproachhereafterreferred toas ldquoLackOfHomozygotesrdquo (LOH)

approach the differences between observed and expected frequencies of

homozygotesfortheminorallelewerecomputedandaveragedacrossmarkers

included in each sliding window Values were standardized and only regions

carrying three overlapping windows below =3Z score value were considered

110

candidates to carry deleterious mutations Thereafter significant windows

carryingatleastonemarkerincommonwerejoinedtodefineregionboundaries

ExomesequencingknowledgeandHDBeadChipdatawerecrossed inorder to

identifygenescarryingseverevariantslocatedwithinsignificantregionsortheir

proximity (50Kb upstream or downstream) These genes were considered

candidates to be deleterious recessives and their characteristics conservation

acrossspeciesandfunctionwasfurtherinvestigated

33Step3Assessmentofcandidatedeleteriousvariants

Genes carrying variations classified as deleterious by SIFT or ANNOVAR and

eitherassuggestivebyfertilityEBVtailsanalysisorassignificantbyOHWeLOH

analyses of the Holstein bull population were submitted to comparative

genomics population genetics and functional analyses In order to prioritize

genesvariants discoveredwe scored each evaluation using a subjective scale

ranging from =3 to 3 to measuring the evidence in favour (scores 1 to 3) or

against(scores=1to=3)thedeleteriouseffectoftheSNPsThescore0inallcases

indicated the absence of evidence or that a specific assessment failed The

completescorescalethereforerangedfrom=9to9

3aComparativegenomics

For comparative genomics purposes the position of eachmutationwas ldquolifted

overrdquo from UMD31 bovine coordinates to hg19 human coordinates using the

specifictoolatUCSC(Kentetal2002=httpgenomeucscedu)Orthologous

proteins were aligned using the UCSC genome browser and amino acid

conservation assessed in a set of 100 vertebrate species that included human

primates (11 species) euarchontoglires (egmouse14 species) laurasiatheria

(egsheep25species)afrotheira(egelephant6species)othermammals(eg

platypus5species)birds(egchicken14species)sarcopterygii(egturtle8

species) fish (eg zebrafish 16 species) Results of amino acid conservation

analysiswererecordedonlyifhomologousproteinfromatleast50speciescould

beretrievedandalignedThereafteraminoacidconservationwasscored3ifthe

111

aminoacidwasconservedinallvertebrates2ifconservedinallmammalsand1

ifconservedinthelargemajority(atleast90)ofmammalsAscoreof=1was

given when the amino acid was not conserved among species and a score =3

when both variants observed in cattle were carried by other species Finally

wheneithertheliftoverprocessortheproteinalignmentfailedthecomparison

wasconsiderednoninformativeandgivenascore0

3bSearchofhaplotypesassociatedtodeleteriousvariantsandanalysisof

HDpopulationdata

To acquire further evidence we searched for the haplotypes that carry the

deleterious variations and estimated their frequency in the

homozygousheterozygousstatuswithinthebullpopulationgenotypedwiththe

HDBeadChip

Weusedamultistepapproachtofindhaplotypescarryingdeleteriousvariantsi)

from the phased HD BeadChip data we extracted the haplotypes of the 20

sequencedanimalsintheregionofthevariant(100markersoneachside)ii)in

these 20 animals we searched the shorter haplotype able to distinguish

homozygous andor heterozygous animals for the deleterious variant from all

other haplotypes iii) we assessed the frequency of homozygotes and

heterozygotes for thehaplotype in thewholepopulationofbulls characterised

withtheHDBeadChip

Scoresinthiscasewere=3inthecaseofuniquehaplotypesshowinganumberof

homozygotesnonsignificantlydifferentfromthenumberexpectedfromHardy=

Weinbergproportions3inthecaseofuniquehaplotypesshowinganumberof

homozygotes significantly lower than expected 1 in the case of unique

haplotypesneverhomozygousandrare0whenevernouniquehaplotypecould

befound

3cFunctionalanalyses

With PantherDB (Mi et al 2010) resources we classified our gene list into

functionalcategoriesaccordingtotheGeneOntologyclassificationAllsuggestive

112

andcandidateEnsemblGeneIDsubmittedtothebulkanalysiswererecognizedand annotated Beyond the GO term characterization in order to understandpotentialdetrimentaleffectsofmutationsandalsopinpointpotentialcandidatesfor fertility traits we retrieved information for each gene according to thefollowing parameters i) the involvement in known Mendelian or complexdiseases fromOMIMdatabase ii) theavailabilityofknock=outmice iii) and theeventual expression in reproductive tissues from the Gene Expression Atlas(httpwwwebiacukgxa)(Table4)

Alsoforthesedataweusedthesamescaleattributingarangefrom=3pointstogenes having no relation with fertility viability and genetic defects and norelevant phenotypic effects in knock=out mouse models to 3 to gene closelyrelated to fertility and viability associated to genetic defects in human andaffectingfertilityorsurvivalwhenknocked=outInparticularascoreranging=1to+1wasattributedtogenesassociatedtohumansyndromes(=1=noassociationrecordedinOMIMdatabase0=noinformationinOMIM1=associationinOMIM)A score with same range to information collected from knock=out mousephenotypes(=1=knock=outmouseshowsnoimpairedphenotype0=noknockoutmouse has been produced 1= knock=out mouse shows impaired phenotype)Thesamescorewasattributedtomolecularfunctions(=1=functionunrelatedtofertilityandviability0=functionindirectlyrelatedasimmunityandmetabolism1=functiondirectlyrelatedasfertilityanddevelopment)

Results

Exomecaptureandsequencing

A totalof225414probeswereused for capturing thecattleexome (5355Mbabout 2 of UMD31 reference genomes) Mitochondrial (MT) and Ychromosome were not included in probe design The number of probes perchromosomespannedfrom2500onBTA27toalmost12000inBTA3(TableS2)

113

The average exome sequencing depth was 4058 slightly lower than the

expected50XAllanimalshadatleast24Xaveragegenomessequencedbutina

same animal the sequencing depth varied across chromosomes and regions

withinchromosomesThelowestchromosomeaveragevalueobservedwas10X

inBTA25oftheshallowersequencedanimalAveragecoverageperanimaland

chromosomeandoverallmeansareshowninFiguresS1

In total1613461402 raw100bppair=ends sequence readswereproducedA

total of 1425514112 100bp reads were mapped and properly paired to the

reference genome Paired reads per animal ranged between 41973510 and

138840392

Insynthesissequencingdepthwasvariableasexpectedandalmostallanimals

hadsomeregionsofscarceornosequencedepthbutoverallallexomeregions

were sufficiently covered to permit a comprehensive analysis of molecular

variantsoftheanimalsinvestigated

Step1Searchforcandidatedeleteriousvariants

Variantdetectionandannotation

Thesequencingdepthofallvariantscalledonthealignmentofallreadsacross

animalsfollowedaChi=squaredistributionAveragedepthwas79694spanning

from2Xto5000Xwithpeakfrequencyclassat150=250X(FigureS2)

A multistep approach was followed to clean the exome sequence dataset

Cleaning process steps and the number onmarkers retained in each step are

describedinTable2

114

Table2Typeandnumberofmarkersdetectedandretainedafterdatasetcleaning

TypeofvariationNumberofvariations(N)

Rawdata

Aftervariantcleaning

Aftergenotypecleaning(onlyautosomal)

Complex 1213 963 0Deletion 6938 5031 0Insertion 5510 3656 0MultiallelicComplex 5710 3995 0MultiallelicSNP 803 512 0SNP 242988 203079 164277Total 263162 217236 164277whencomparedtothereferencegenome

Atotalof263162variantswereidentifiedAmongthese217236(8255)wereretainedafterapplyingthevariantfilters(seematerialsandmethodssection)

According to GATK classification the vast majority (9348) of variationsidentified were biallelic SNPs followed by insertions and deletions (InDels40) more complex variations and multiallelic SNPs (252) Indels andcomplexvariationsalthoughlessfrequentthanSNPsaffectedarelevantportionoftheexome(49589bp)Thenumberofvariantsdetectedperchromosomewassignificantly correlated to the number of probes used (r=08761P=228x10=10)andofbasescaptured(r=08424P=53x10=19)perchromosomehoweverBTA23BTA27 BTAX had an outlier behaviour the former two containing highervariation(450and445variantsperKb)andthelatterlower(084variantsperKb)thantheaverage(322variantsperKb)

Bialleic SNPswere further filtered for genotype quality (sequence depth gt 5Xandsequencequalitygt20)andBTAXmarkersThefinalautosomalSNPdatasetcounted164277SNPsAmongthese17829biallelicSNPs(109)describedforthefirsttimehavebeensubmittedtodbSNP

Variantsrsquo annotation is shown in Table 3a Variantswere detected in differentgenedomains(exonsintrons3rsquountranslatedregions)aswellasupstreamanddownstreamgenesMorethan31ofthevariationsweredetectedinexonsandamong them more than 41 have the potential to have an effect on genefunctionbyaffectingsplicingoralterproteinstructure(Table3b)

115

Table 3 A) number of autosomal SNPs annotated in each gene region B) number and

classificationofautosomalSNPsalteringgenefunction

A)

Generegion NofSNPs(autosomes)Downstream 1170Exonic 51293Intergenic 49901Intronic 58116ncRNA_exonic 160ncRNA_intronic 3ncRNA_splicing 2Splicing 184Upstream 1445Upstreamdownstream 54UTR3 1345UTR5 604Total 164277

B)

EffectofexonicSNPs NofSNPs(autosomes)Nonsynonymoussinglenucleotidevariation 20839Stopgainsinglenucleotidevariation 213Stoplosssinglenucleotidevariation 10Synonymoussinglenucleotidevariation 30226Unknown 5Total 51293

whencomparedtothereferencegenome

Amongthe51293exonicSNPs4166in2978geneswereclassifiedldquodeleteriousrdquo

by SIFT Among these 3210 in 2356 genes were observed at least twice

ANNOVARidentified223stopgain(213)orloss(10)variantsin217genes

Theaverageconcordanceat24390SNPs in commonbetweenexomesand the

HD Beadchip was 9716 (Table S3) Discordance was due to either alleles

detected by sequencing andnot detected by genotyping (exome=heterozygote

Beadchip=homozygote) and vice=versa (exome=homozygote

Beadchip=heterozygote) Discordances were randomly distributed among

markers and inversely correlated to sequencing depth (Figure S3) indicating

that most errors are likely occurring during sequencing even if occasional

116

Beadchip failure indetectingsomeallelecannotbeexcludedTwoanimalshad

anoutlierbehaviouronehaving846andthesecond719concordanceBoth

animalswereretainedforvariantdiscoverywhiletheywereexcludedfromthe

analysesofEBVtailsWithoutthesetwooutlieranimalsthemeanconcordance

increasesto9938

Multidimensional scaling confirmed the absence of a substructure in animals

sequenced(FigureS4)

Step2Filteringofcandidatedeleteriousvariants

AnalysisofEBVtails

Fisherexact test isgenerallyused to test for theassociationbetweenavariant

and a Mendelian disease Samples are divided in cases and controls and the

association between alleles or genotypes and the disease is tested under

different inheritance models Here we used the same method to test for

association between allelesgenotypes and animals from opposite tails of the

EBVdistributionfori)malefertility(N=4high+4low)ii)femalefertility(N=5

high+4low)andiii)maleandfemalefertility(N=9high+8low)

Genotypes were tested under all inheritance models implemented in PLINK

Giventhelownumberofsamplesnovariantwassignificantaftercorrectionfor

multiple comparisons (EMP2) and only suggestive variants could be found

(significant at 5 at EMP1)We considered suggestive of carrying deleterious

allelesalistof101genescarrying112variants(TableS4)

Detectionofregionscarryingdeleteriousmutationfrom800KSNPdata

After applying quality control parameters 26653 and 146991 markers were

removed from the original dataset for excessive missing data and MAF

respectivelyInaddition17individualsremovedforlowgenotypingrateAswith

exomes only autosomal SNPswere retained for the analyses thus obtaining a

workingdatasetof590680SNPsand992animals

117

MDS(FigureS5)analysis indicates that thedatasethassomesubstructure that

mightaffectdownstreamanalysesWeaccountedfortheriskoffindingmarkers

outofHardy=WeinbergproportionsduetotheWahlundeffectbychoosingvery

stringentsignificancethresholdintheOHWanalyses

To detect regions candidate to carry deleterious recessivemutations we used

two approaches the ldquoOutside Hardy Weinbergrdquo (OHW) and the ldquoLack Of

Homozygotesrdquo (LOH) approaches (see materials and methods) The two

approaches complement each other in the detection of regions carrying less

homozygotes than expected OHW identifies regions in which a few markers

have extreme homozygote deficit while LOH detects regions in which many

markershaveamoderatedeficit

In the SNP dataset 1884 markers were significantly out of Hardy=Weinberg

equilibrium(Ple84710=8)Atotalof485windowswereidentifiedwithatleast

3 markers out of 10 exceeding this threshold Among these 84 contained

markerssignificantbecauseofhomozygotedeficitThese84windowsidentified

11OHWcandidateregionsin9chromosomes(TableS5)

TheLOHapproachidentified2335windowsbelow=3Zscoreofthestandardised

differencebetweenobservedandexpected frequencyofhomozygotes forMAF

Theseidentified326LOHcandidateregionsspreadonallautosomes(TableS6)

Onlyregionsformedbyatleast3singlewindowswereretainedreducingLOH

candidateregionsto240

Deleterious variants (SIFT deleterious and stop codon gainedlost) were

intersectedwithsignificantOHWandLOHregionSevenofthesemappedwithin

OHW sliding windows 36 within LOH sliding windows Among these 5 were

located in regions significant in both OHW and LOH Other 51 deleterious

variants mapped in the immediate vicinity (+=50Kb) of significant windows

These 83 variants affect 66 unique genes candidate to carry deleterious

recessivealleles(TableS7)

Five regionswere common toOHWandLOHoneonBTA4 (LOH=74959610=

75151417 OHW=74970653=75151417) two on BTA7 (LOH=9626435=

10099512 OHW=9628735=10099512 LOH=10575691=11171132

118

OHW=10486424=11087441) one on BTA15 (LOH=80299924=80928311

OHW=80299924=80921274) and one on BTA18 (LOH=28122373=

28185525OHW=28106022=28164693)

Step3Assessingcandidatedeleteriousvariants

After the two filtering step previously described a total of 191 SNPs were

retained out of the 3433 classified as deleterious by SIFT and Annovar and

analysed further These are carried by 163 genes Eighteen genes carried 2

candidateSNPsthreegenes3SNPsandonegene5SNPsInaddition4SNPsin

four geneswere identified as potential candidates in both the analysis of EBV

tailsandofbullpopulation

A totalof12SNPs introducedstopcodonswhile the remaining inducedamino

acid substitutionsor splicing variants In the filtereddataset theproportionof

stop codons on total variation (63) was only slightly higher than the

proportion found in the entire dataset (51 225 stop codon out of 4391

deleteriousSNPs)

Comparativegenomics

Proteinmultiplealignmentwassuccessfulin146outof191comparisonsIn26

casesthemostfrequentaminoacidincattlewasveryhighlyconservedeitherin

allvertebrates(N=9)orinallmammals(N=17)Inother23casestheaminoacid

was conserved in at least 90 of mammalian species In the last class were

generally included amino acids that changed in mammalian species

phylogenetically verydistant from cattle (eg platypus and related species) In

97 instances the amino acid could change quite freely and occasionally both

variantsobservedincattlewerecarriedbyotherspecies

HaplotypeanalysisofHDpopulationdata

The search for haplotypes associated to the191 variants for further testing in

the population identified 101 unique and invariant haplotypes common to all

119

carriersofatargetvariantanddifferentfromallthosefoundinnoncarriersIn

additional36casesanoncompletelyinvarianthaplotypecouldbeassociatedto

themutationConverselyin54instancesareliablehaplotypecouldnotbefound

(table S8) In total 38 haplotypes had no homozygous individuals in the

population Thiswas expected in the case of 33 rather rare haplotypes (fewer

than 100 heterozygotes observed in the population) Conversely 5 haplotypes

were rather frequent (from 170 to almost 300 heterozygotes observed in the

population)andtheexpectationwastofind78to222homozygousindividuals

in thepopulation insteadof thenone foundAnothervarianthadaremarkable

high frequency (35) and a clear outlier behaviour Only 11 homozygous

individualswereobservedoutofthe121expected

Functionalanalysisofcandidategenes

To assess the potential deleterious effects of the variants identified we also

retrieved functional information for each gene according to the following

parametersi)associationtoknownMendelianorcomplexdiseaseslistedinthe

OMIM database ii) availability and effect of gene knock=out in mice iii)

expression profileswith particular attention in reproductive tissues from the

GeneExpressionAtlas(Table4)

Twenty genes out of 163 have a recognized phenotype in humans These are

mostlyseveresyndromesUnfortunatelyonlyafewgeneshaveanull=miceline

producedwithphenotypicdataavailable

According to the expression data available from Ensembl these genes are

generally (with someexceptions)expressed inavarietyof tissueswitha clear

tendencytobehighlyexpressedintestis

120

Tableamp4Informationon163genesanalyzedconcerningeventualphenotypesinmanorknockoutsinmouseHeaderampcodesAbrainBcolonCheartDkidney

EliverFlungGskeletalmuscleHspleenItestisColoursrepresentthegenelevelofexpressionindifferenttissuefromlow(whitegrey)tohigh(blue)NCin

FunctioncolumnmeansldquoNotClassifiedrdquo

Geneamp

Geneampnam

eampHum

anampdefectamp

Knock5OutsampMouseamp

MouseampPhenotypeamp

AampBamp

CampDamp

EampFamp

GampHamp

IampFunctionamp

ENSBTAG00000000079

CCSAP

Cells

9

1

03

1

07

2

06

2

2

NC

ENSBTAG00000000149

USP40

Cells

6

7

2

7

5

7

2

8

9

fertility

ENSBTAG00000000414

FUT5

Nomatch

37

1

cellular

process

ENSBTAG00000000918

ERMARD

Periventricular

nodularheterotopia

6Xlinked

periventricular

heterotopia6q

terminaldeletion

syndrome

Periventricular

nodularheterotopia

Nomatch

5

4

4

9

6

7

4

9

17

cellular

process

ENSBTAG00000000998

SLC39A6

Cells

6

9

2

7

4

10

1

10

9

cellular

process

ENSBTAG00000001133

VWA8

Mice

2

3

6

7

10

3

3

4

4

blood

metabolism

ENSBTAG00000001140

IAH1

httpswwwmousephenotypeor

gdatagenesMGI1914982sect

ionassociations

BOTHSEXadiposetissue

behaviourneurological

growthsizebodyFEMALE

homeostasismetabolismMALE

limbsdigitstailskeleton

9

18

30

60

23

19

12

17

30

cellular

process

ENSBTAG00000001262

IRGQ

Cells

6

1

5

3

06

2

5

04

2

immunity

ENSBTAG00000001286

ELMO3

httpswwwmousephenotypeor

gdatagenesMGI2679007sect

ionassociations

Decreasedstartlereflex

02

27

14

7

7

02

1

fertility

ENSBTAG00000001291

OR8H3

Nomatch

fertility

ENSBTAG00000001500

FIGNL1

Cells

08

2

01

05

07

08

04

2

3

cellular

process

ENSBTAG00000001505

GRK4

Cells

10

7

09

4

6

3

2

6

218

signalling

ENSBTAG00000002220

SPTLC1

Hereditarysensory

andautonomic

neuropathytype1

No

17

27

3

28

12

28

5

19

6

metabolism

ENSBTAG00000002288

NT5DC4

Nomatch

02

cellular

process

ENSBTAG00000002289

NLRP14

No

04

cellular

process

ENSBTAG00000002660

CCDC11

Situsinversustotalis

Nomatch

3

1

06

2

07

3

03

2

54

fertility

121

Geneamp

Geneampnam

eampHum

anampdefectamp

Knock5OutsampMouseamp

MouseampPhenotypeamp

AampBamp

CampDamp

EampFamp

GampHamp

IampFunctionamp

ENSBTAG00000002809

CAPN10

AutosomalRecessive

MentalRetardation

DIABETESMELLITUS

NONINSULIN

DEPEND

ENT1

Cells

2

13

41

33

29

immunity

ENSBTAG00000002966

DNAJC13

Youngadultonset

Parkinsonism

No

5

52

94

62

84

fertility

ENSBTAG00000003231

LIM2

Pulverulentcataract

Cells

01

02

01

02

cellular

process

ENSBTAG00000003508

CFAP69

httpswwwmousephenotypeor

gdatagenesMGI2443778sect

ionassociations

Abnormalcirculatinginsulinlevel

MaleInfertilityDecreasedmean

plateletvolum

eIncreasedleanbody

mass

203

07

1

01

7NC

ENSBTAG00000003523

UGT2B15

Nomatch

34

metabolism

ENSBTAG00000004081

FAT3

No

2

03

01

01

05

signalling

ENSBTAG00000004555

LRP2

DONN

AIBARROW

SYND

ROME

Intellectualdisability

Cells

56

3

01

disease

ENSBTAG00000004772

THEM

4

Cells

8

16

295

27

52

63

metabolism

ENSBTAG00000004860

SLC27A6

Cells

6

02

10

602

207

01

immunity

ENSBTAG00000005021

SEMA5A

Monosom

y5p

Cells

9

04

89

13

08

08

6immunity

ENSBTAG00000005170

GPR114

Mice

101

02

01

09

6

01

fertility

ENSBTAG00000005248

OFCC1

No

01

01

03

01

01

01

03

NC

ENSBTAG00000005340

SERGEF

Cells

7

10

35

15

58

23

cellular

process

ENSBTAG00000005357

VPS11

Mice

14

13

619

717

815

15

cellular

process

ENSBTAG00000005466

C8H8orf58

Nomatch

05

01

08

02

103

02

04

NC

ENSBTAG00000005514

TRIO

Cells

17

66

10

210

88

7immunity

ENSBTAG00000005633

RGNEF

httpswwwmousephenotypeor

gdatagenesMGI1346016sect

ionassociations

Vertebralfusion

36

214

05

801

08

1cellular

process

ENSBTAG00000006021

CEP250

Mice

3

22

43

41

88

metabolism

ENSBTAG00000006027

USP34

Mice

11

63

74

88

912

fertility

ENSBTAG00000006532

CDK5RAP2

Autosomalrecessive

primary

microcephaly

httpswwwmousephenotypeor

gdatagenesMGI2384875sect

ionassociations

skeletonimmunesystem

growthsizebodyhem

atopoietic

system

5

34

92

74

784

developm

ent

ENSBTAG00000007001

SLC26A1

No

03

01

02

32

903

01

07

metabolism

ENSBTAG00000007340

LOC101906349

Nomatch

NC

ENSBTAG00000007568

MEPE

Cells

cellular

process

ENSBTAG00000007910

EXOC7

Cells

13

24

23

22

12

22

15

22

28

cellular

process

122

Geneamp

Geneampnam

eampHum

anampdefectamp

Knock5OutsampMouseamp

MouseampPhenotypeamp

AampBamp

CampDamp

EampFamp

GampHamp

IampFunctionamp

ENSBTAG00000008650

CAMK1D

Cells

24

2

01

06

1

2

03

7

3

signalling

ENSBTAG00000008797

Uncharacterized

Nomatch

09

04

07

04

NC

ENSBTAG00000008871

DDX4

Cells

02

01

01

132

cellular

process

ENSBTAG00000008996

SFI1

Mice

2

1

06

3

04

3

2

5

NC

ENSBTAG00000009014

UPK1B

Mice

2

2

2

5

4

4

2

7

3

fertility

ENSBTAG00000009030

NOX5

Nomatch

04

01

8

01

immunity

ENSBTAG00000009033

TEX37

httpswwwmousephenotypeor

gdatagenesMGI1921471sect

ionassociations

Abnormalstartlereflex

03

83

NC

ENSBTAG00000009064

CSF2RB

Congenitalpulmonary

alveolarproteinosis

Mice

01

2

04

06

2

23

03

17

05

signalling

ENSBTAG00000009144

BPIFA2A

Nomatch

6

06

05

metabolism

ENSBTAG00000009337

PNMAL1

Cells

8

05

01

3

02

03

03

04

3

Immunity

ENSBTAG00000009444

SLC15A5

Cells

01

cellular

process

ENSBTAG00000009568

KANK2

Cells

1

17

30

9

3

31

9

10

3

cellular

process

ENSBTAG00000009569

DOCK6

AdamsOliver

syndrome

Cells

2

5

10

12

5

15

5

6

21

metabolism

ENSBTAG00000009836

CHGA

Mice

140

555

01

2

02

01

07

1

metabolism

ENSBTAG00000010522

Uncharacterized

Nomatch

01

02

02

03

2

metabolism

ENSBTAG00000010601

DOLK

Familialisolated

dilated

cardiomyopathy

No

4

8

6

22

9

6

5

5

12

metabolism

ENSBTAG00000010847

FOXN4

Cells

01

01

metabolism

ENSBTAG00000010954

ART3

Cells

02

1

55

01

08

2

10

06

71

metabolism

ENSBTAG00000011011

SSH2

No

3

09

5

4

2

2

6

7

10

cellular

process

ENSBTAG00000011387

NAT9

Cells

6

7

3

8

6

8

4

6

6

metabolism

ENSBTAG00000011420

CA9

Cells

04

3

02

2

03

1

01

2

56

metabolism

ENSBTAG00000011433

RGP1

httpswwwmousephenotypeor

gdatagenesMGI1915956sect

ionassociations

Completepreweaninglethality

7

19

11

18

8

12

9

8

25

cellular

process

ENSBTAG00000011622

C18orf21

Mice

9

10

6

6

2

10

5

10

34

NC

ENSBTAG00000011683

NUMB

Cells

21

25

8

35

17

47

6

8

19

immunity

ENSBTAG00000011684

RAB11FIP1

Cells

6

01

8

09

11

02

1

4

NC

ENSBTAG00000012053

ACCSL

No

03

01

cellular

process

ENSBTAG00000012314

LDLR

Homozygousfamilial

hypercholesterolemia

Cells

8

33

11

21

17

24

2

15

2

cellular

process

123

Geneamp

Geneampnam

eampHum

anampdefectamp

Knock5OutsampMouseamp

MouseampPhenotypeamp

AampBamp

CampDamp

EampFamp

GampHamp

IampFunctionamp

ENSBTAG00000012326

LOC101906218

Nomatch

11

01

01

09

01

NC

ENSBTAG00000012458

PSAPL1

Cells

01

NC

ENSBTAG00000012565

PCNX

No

3

32

32

33

48

developm

ent

ENSBTAG00000012833

ATG2B

No

4

21

21

21

29

NC

ENSBTAG00000012921

KIF24

Mice

05

07

03

08

02

202

214

metabolism

ENSBTAG00000013568

UEVLD

Cells

5

41

53

42

214

metabolism

ENSBTAG00000013685

KRT31

Mice

cellular

process

ENSBTAG00000013753

DDRGK1

Mice

22

32

18

56

55

25

14

22

37

NC

ENSBTAG00000013789

OR6K6

No

NC

ENSBTAG00000013838

ALLC

Mice

02

01

78

metabolism

ENSBTAG00000013916

HERC2

Mentalretardation

autosomalrecessive

38SKINHAIREYE

PIGM

ENTATION

VARIATIONIN1

Developm

entaldelay

withautismspectrum

disorderandgait

instability

No

8

74

83

74

65

metabolism

ENSBTAG00000014001

MAP1A

Cells

136

23

22

202

43

352

cellular

process

ENSBTAG00000014058

LDB2

No

28

33

12

825

410

3cellular

process

ENSBTAG00000014284

ALPK2

No

01

10

01

304

02

cellular

process

ENSBTAG00000014750

EPB41L4A

httpswwwmousephenotypeor

gdatagenesMGI103007secti

onassociations

Decreasedcirculatingglucoselevel

Abnormalconeelectrophysiology

Immunesystem

s1

22

20

07

12

13

43

NC

ENSBTAG00000014752

AKAP11

httpswwwmousephenotypeor

gdatagenesMGI2684060sect

ionassociations

Nopheno

24

72

63

306

68

cellular

process

ENSBTAG00000014768

ZNF786

Cells

3

23

42

48

33

cellular

process

ENSBTAG00000014794

SCYL3

Cells

6

10

59

613

516

26

cellular

process

ENSBTAG00000015027

Clusterin

httpswwwmousephenotypeor

gdatagenesMGI88423sectio

nassociations

Aggressionvertebralfusion

05

03

03

02

05

3NC

124

Geneamp

Geneampnam

eampHum

anampdefectamp

Knock5OutsampMouseamp

MouseampPhenotypeamp

AampBamp

CampDamp

EampFamp

GampHamp

IampFunctionamp

ENSBTAG00000015210

LOXHD1

DEAFNESS

AUTOSOMAL

RECESSIVE77

Autosomalrecessive

nonsyndromic

sensorineuDominant

lateonsetFuchs

cornealdystrophy

deafnessautosomal

recessivetype77

No

3

03

NC

ENSBTAG00000015294

COX10

Leighsyndrome

Mitochondrial

complexIVdeficiency

(MTC4D)

Cells

4

6

24

5

2

3

14

3

16

cellular

process

ENSBTAG00000015335

BAI3

No

38

03

8

01

01

2

signalling

ENSBTAG00000015490

HS1BP3

Cells

1

10

5

9

4

8

2

1

37

cellular

process

ENSBTAG00000015602

C7H5orf45

Nomatch

10

cellular

process

ENSBTAG00000015839

MAP4

No

27

85

104

35

11

100

82

46

61

cellular

process

ENSBTAG00000015912

DMKN

No

01

04

03

NC

ENSBTAG00000015980

FASN

AutosomalRecessive

MentalRetardation

Cells

19

29

7

12

6

39

6

7

22

metabolism

ENSBTAG00000015985

GPR20

httpswwwmousephenotypeor

gdatagenesMGI2441803sect

ionassociations

Anatomicaldefectsvarioustissues

03

02

08

01

09

signalling

ENSBTAG00000016495

LINS

AutosomalRecessive

MentalRetardation

Cells

6

3

1

5

3

3

1

5

12

NC

ENSBTAG00000016691

LACC1

Mice

2

9

1

9

9

8

08

7

27

NC

ENSBTAG00000017096

ERICH1

Cells

6

4

2

4

2

5

2

5

13

metabolism

ENSBTAG00000017165

MATN2

No

2

7

2

20

08

3

2

2

7

cellular

process

ENSBTAG00000017418

RRP1B

Cells

6

4

4

5

3

7

3

12

159

cellular

process

ENSBTAG00000017436

ALS2CR11

Cells

01

37

disease

ENSBTAG00000017505

PAXIP1

Cells

2

3

2

4

2

4

2

3

12

signalling

ENSBTAG00000017576

GPR144

Nomatch

02

signalling

ENSBTAG00000018528

USP45

No

1

1

04

1

05

06

02

2

04

cellular

process

ENSBTAG00000018810

THBS2

INTERVERTEBRAL

DISCDISEASE

Cells

1

2

04

09

1

09

04

1

2

fertility

ENSBTAG00000019072

PSD4

Cells

7

8

1

1

7

02

6

4

cellular

process

125

Geneamp

Geneampnam

eampHum

anampdefectamp

Knock5OutsampMouseamp

MouseampPhenotypeamp

AampBamp

CampDamp

EampFamp

GampHamp

IampFunctionamp

ENSBTAG00000019265

AGK

Sengerssyndrom

e

Congenitalcataract

hypertrophic

cardiomyopathy

mitochondrial

myopathy

Cells

7

36

43

52

65

metabolism

ENSBTAG00000019752

BPIFA2B

Nomatch

01

1metabolism

ENSBTAG00000019799

FCAM

R

Cells

07

01

02

01

2

7immunity

ENSBTAG00000019961

ATP4B

Cells

12

03

1

05

cellular

process

ENSBTAG00000020414

DHX37

No

3

33

73

73

413

cellular

process

ENSBTAG00000021073

KIAA1549

PILOCYTIC

ASTROCYTOM

ASOMATIC

httpswwwmousephenotypeor

gdatagenesMGI2669829sect

ionassociations

behaviourneurological

homeostasism

etabolism

hematopoieticsystem

3

04

23

01

21

04

3NC

ENSBTAG00000021233

OR5B17

No

fertility

ENSBTAG00000021375

OR10Q1

No

fertility

ENSBTAG00000021643

EFCAB12

Cells

01

1

9cellular

process

ENSBTAG00000021645

MBD4

Cells

8

43

82

51

10

6cellular

process

ENSBTAG00000022509

DNAH

9

No

02

5

05

5metabolism

ENSBTAG00000022858

LOC783561

Nomatch

fertility

ENSBTAG00000026247

PKHD1L1

Cells

02

01

cellular

process

ENSBTAG00000026323

LYSB

Nomatch

68

metabolism

ENSBTAG00000026350

SPATC1

Cells

05

06

01

01

01

39

NC

ENSBTAG00000027390

RTFDC1

Mice

28

19

21

20

922

13

20

49

NC

ENSBTAG00000031115

Uncharacterized

Nomatch

01

cellular

process

ENSBTAG00000031718

OGFR

Mice

6

17

612

517

39

1fertility

ENSBTAG00000032264

Uncharacterized

Nomatch

52

4

22

NC

ENSBTAG00000032481

DAPL1

No

1

6

07

01

13

10

cellular

process

126

Geneamp

Geneampnam

eampHum

anampdefectamp

Knock5OutsampMouseamp

MouseampPhenotypeamp

AampBamp

CampDamp

EampFamp

GampHamp

IampFunctionamp

ENSBTAG00000032527

ERCC6

CEREBROOCULOFACI

OSKELETAL

SYND

ROME1

COCKAYNE

SYND

ROMEBDe

SanctisCacchione

syndromeMACULAR

DEGENERATION

AGERELATED5UV

SENSITIVE

SYND

ROME1COFS

syndromeCockayne

syndrometype1

Cockaynesyndrome

type2Cockayne

syndrometype3UV

sensitivesyndrome

Cockaynesyndrome

typeB(CSB)De

SanctisCacchione

syndrome(DSC)UV

sensitivesyndrome

(UVS)cerebrooculo

facioskeletal

syndrometype1

(COFS1)

Mice

3

205

22

207

42

cellular

process

ENSBTAG00000032821

SCEL

No

01

02

17

04

NC

ENSBTAG00000033954

PRPS1L1

No

35

metabolism

ENSBTAG00000034991

SLX4IP

Cells

6

21

32

207

33

NC

ENSBTAG00000035226

TOR1AIP1

Cells

9

12

513

816

716

83

NC

ENSBTAG00000035309

OR4F15

No

fertility

ENSBTAG00000035708

LOC527795

Nomatch

01

01

23

metabolism

ENSBTAG00000035985

LOC101909743

OR8K1

No

fertility

ENSBTAG00000036019

RIPPLY3

httpswwwmousephenotypeor

gdatagenesMGI2181192sect

ionassociations

hemopoieticsystem

01

04

01

9

01

malformation

ENSBTAG00000037399

Uncharacterized

Nomatch

2

92

4

24

05

5signalling

ENSBTAG00000038606

DBDR

No

02

01

signalling

ENSBTAG00000038831

PHYHD1

No

2

18

21

23

30

18

16

15

1metabolism

ENSBTAG00000039110

OR8U1

No

fertility

ENSBTAG00000039117

FAM179B

Cells

14

206

31

308

33

cellular

process

ENSBTAG00000039520

SIRPB1

Cells

4

08

32

516

05

12

01

signalling

127

Geneamp

Geneampnam

eampHum

anampdefectamp

Knock5OutsampMouseamp

MouseampPhenotypeamp

AampBamp

CampDamp

EampFamp

GampHamp

IampFunctionamp

ENSBTAG00000040568

Uncharacterized

Nomatch

4

307

54

33

25

cellular

process

ENSBTAG00000045558

LOC100299084

Nomatch

1

22

110

10

09

52

fertility

ENSBTAG00000045588

LOC100298356

Nomatch

metabolism

ENSBTAG00000045709

LOC514057

Nomatch

fertility

ENSBTAG00000046175

Uncharacterized

Nomatch

01

01

cellular

process

ENSBTAG00000046228

Olfactory

receptor

Nomatch

fertility

ENSBTAG00000046239

C10orf12

No

03

104

07

108

06

209

cellular

process

ENSBTAG00000046285

OR8I2

No

fertility

ENSBTAG00000046360

Uncharacterized

Nomatch

NC

ENSBTAG00000046423

LOC618007

Nomatch

NC

ENSBTAG00000046527

LOC100139830

Nomatch

fertility

ENSBTAG00000046590

FAM181A

Cells

09

44

NC

ENSBTAG00000046593

Uncharacterized

Nomatch

20

metabolism

ENSBTAG00000046727

Olfactory

receptor

Nomatch

04

fertility

ENSBTAG00000047150

RTEL

Cells

4

23

64

55

616

cellular

process

ENSBTAG00000047166

SH2D7

Mice

03

101

01

1

02

02

2immunity

ENSBTAG00000047174

Uncharacterized

Nomatch

03

7105

02

01

271

05

03

NC

ENSBTAG00000047317

CL43

Nomatch

01

120

01

immunity

ENSBTAG00000047587

Uncharacterized

Nomatch

22

09

09

35

03

4

06

2cellular

process

ENSBTAG00000047661

FABP12

httpswwwmousephenotypeor

gdatagenesMGI1922747sect

ionassociations

Growthsizebody

01

02

69

metabolism

ENSBTAG00000048021

PNMAL2

No

04

03

2

03

03

02

NC

ENSBTAG00000048153

Uncharacterized

Nomatch

2

05

01

02

01

cellular

process

128

Discussion(

Deleterious recessive variants arenaturally present in outbred animal species

Theydonotcreateproblemsaslongastheyremainatlowfrequencyinrandom

mating populations Under these conditions these variants remain in the

heterozygousstateanddonotproducephenotypiceffectsHoweverassoonas

their frequency increases andmatingoccursbetween related individuals their

deleteriouseffectsbecomeapparentandhighlyaffectthewelfareandviabilityof

animalscarryingtheminthehomozygousstate

In industrialdairy cattlebreedscrossbreeding isgenerallyavoidedandbreeds

areinpracticegeneticisolatesinwhichartificialinsemination(AI)iswidelyused

for rapidly spreading thegeneticprogress in thepopulationAsa consequence

elitesireshavemanythousandandsometimeshundredsofthousanddaughters

Under these conditions mating between relatives becomes practically

unavoidable in the longterm In fact inspiteof theclosecontrolof inbreeding

andofthematingsystemgeneticdefectsregularlyappeartothesurfaceSome

examples are the Complex Vertebral Malformation (CVM) and the Bovine

Leucocyte Adhesion Deficiency (BLAD) in Holstein (Schuumltz et al 2008) and

WeaverdiseaseinBrown(McClureetal2013)Inthesecasestheeffectsofthe

recessivedeleteriousallelesareclearandmoleculardiagnostictoolshavebeen

developedtoavoidtheirfurtherspreadingandtoinitiatetheeradicationprocess

However recessive deleterious alleles may produce their effects in a more

deceitful way for example by affecting fertility inducing early lethality or

influence disease susceptibility All these effects are difficult to trackwith the

current methods of phenotype measurements but have an impact on the

economyofthelivestocksectorandonanimalwelfare

InthispaperwesoughtfordeleteriousrecessiveallelesinItalianHolsteincattle

This breed is recently experiencing a decrease in the genetic trend of fertility

traits (Pryce et al 2014) therefore we started our search from the exome

sequences of 20 sires 19 of which having extreme EBV values for male and

femalefertility

130

The choiceof sequencingexomeswasa compromisebetween completenessof

informationaccuracyofvariantandgenotypedetectionandcostExomesclearly

give incomplete information since they miss all variation in regulatory sites

outside gene boundaries (ENCODE Consortium 2012) These may be very

relevantforthecontroloftraitvariationbuttheyarestillverypoorlyannotated

in livestock On the other hand sequencing exomes yields millions instead of

billionsbasepairsadeepersequencingoftargetregionsandaconsequentmore

reliablevariantandgenotypecalling(FigureS1andS2)Theanalysisofexome

data is also less demanding in terms of computer time (thousands instead of

millions variants are to be analysed) and easier in terms of biological

interpretation Exome sequencing revealed to be very effective in the

identificationofanumberofmutationsaffectinghumanhealth(Bamshadetal

2011 Yang et al 2013) All thesewere defects caused bymutations in single

genes producing clear phenotypic effects havingMendelian inheritance These

conditionsdonotholdinourinvestigationthereforewehadtodevisestrategies

to integrate exome data with other information collected from the Holstein

populationandotherspecies

Theoverallprocedureforidentifythecandidatedeleteriousrecessivewasbased

on the rationale that deleterious recessive alleles in coding sequences are

expectedtobei)causedbyseverealterationoftheproteinsencodedbygenes

ii)mostlyifnotexclusivelycarriedattheheterozygousstateinthepopulation

iii)unevenlydistributedinanimalsofhighandlowfertilityiv)likelyconserved

indifferentspeciesbecauseofselectionconstraintv)likelyassociatedtogenetic

defectsinanyspecieswhenseverelydamaged

A stepwise procedure was set up for reducing candidates to a number

manageable for functional analysis (Figure1) In the first stepmutationswere

discoveredinthesecondfilteredandinthirdassessed

Thisprocedureidentifiesonlysomeofthedeleteriousrecessivesexistinginthe

populationanalysedSincewesequencedonlyexomesandnotwholegenomes

ouranalysesislimitedtocodingsequencesandatthemomenttoSNPvariation

Wethereforemissregulatoryvariationsoutsidegenesand theeffectsof indels

andmorecomplexvariations

131

ThenumberofanimalssequencedislimitedDeleteriousrecessivesareexpected

toberareinthepopulationandwehavesampledonly20animalsfromasubset

ofsiresauthorizedtoAI inItalyAlthoughwehavetakenamongthemthose in

theminus tail of fertility trait thatmighthave ahigher genetic load itrsquos likely

thatwehavemissedothervariantsexistinginthepopulationOntheotherhand

variantscarriedbyAIauthorizedbullsaretheonesthatwillhavealargerimpact

onthepopulationifwidelyspreadbyelitebullsandthereforetheyarethemost

interestingfromabreedingpointofview

Most deleterious variants are expected to be very rare and therefore not

detected by the approach here adopted or detected and afterwards discarded

duringthefilteringprocesses

Sequencing is prone to errors Some real variants may have been discarded

duringQCofsequencedataduetopoorsequencequalityorpoorcoverageofthe

region

Someofthevariationsclassifiedasldquonondeleteriousrdquobythesoftwareusedmay

indeedhave an effect In fact synonymousmutation arenon`neutralwhenever

thedifferenttRNAsforasameaminoacidhavedifferentrelativeabundanceand

translationrates(Goymer2007)

1(Searching(candidate(deleterious(variants(in(exomes(

ExomesproducedafairamountofdataAlltogether5355Mbweresequenced

These represent 9998 of the exome sequences targeted by probes Among

theseonly35hadasequencingdepthacrossanimalslt60XSequencingdepth

is a key factor for variant detection and even more so when the genotype of

sequencedanimalsistobeinferredInvestigatingtheconcordancebetweenthe

genotypes detected at more than 20K SNP by both Beadchip and exome

sequencingwe observed a clear inverse relationship between sequence depth

andnumberofdiscordances(TableS3andFigureS3)Theaveragesequencing

depth expected (50X) and realised in practice (4058) in this investigation

yieldedratheraccurategenotypessincetheaverageconcordanceobservedwas

over97and increased toover99whentwooutlieranimalswereremoved

132

from the dataset The reason for the outlier behaviour of these samples waslikely a suboptimal quality of DNA submitted to sequencing Overall almost215000 variants passed quality controls and among them 192440 autosomalSNPs have been investigated in detail As found in other investigations mostvariants were rare (Peloso et al 2014) For example the MAF distributionobservedin46904SNPsclassifiedneutralhasameanof0197andamedianof0167values(FigureS6)

ThefirststepinourresearchwastoidentifySNPscausingseverealterationsinthe proteins coded by genes These were relevant changes in conformationalterationsofsplicingsitesortruncationsofproteinsToaccomplishthistaskwereliedontwosoftwareSIFTandANNOVARthatidentified4902suchvariationsin3423genes Interestingly the subsetofdeleteriousvariantshas significantlylowermean (0156)andmedian (010) compared toneutral variations (FigureS6) This subset is therefore at least enriched in variants under purifyingselectionhavingadeleteriouseffectatthepopulationlevel

Thenumberofdeleteriousallelesdetectedishighanditisunlikelytheyallhavea relevant phenotypic effect even if most variants are rare In fact variantsdeleterious for a protein are not automatically deleterious for an organism asthe protein change induced by the variant although relevant may not affectproteinfunctionvariationmayoccurinonememberofagenefamilysothatitsfunction may be compensated by other family members the protein functionmay be compensated by other proteins or the affected metabolic pathwaycompensated by alternative pathways The challenge was therefore to filter asubsetofvariantslikelyhavingaphenotypiceffecttobeproposedforvalidationatalargerscaleToaccomplishthistaskwesoughtforadditionalinformationinexomesequencesandinalargepopulationofHolsteinsiresgenotypedwiththebovineHDBeadchip

( (

133

2(Filtering(deleterious(variants(

The second step consisted in filtering the several thousand deleterious SNPsidentified in exomes to search for SNPs and genes that worth deeperinvestigationWeusedtwostrategiestoaccomplishthistask

21$ Distribution$ of$ deleterious$ variants$ in$ fertility$ EBV$ plus9variant$ and$

minus9variant$animals$

Fertility traits are controlled by a high number of genes and are highlygenetically heterogeneous Even using a selected subset of variants identifiedfrom exome sequencing classic association analyses would need sample sizesmuch larger that the one available in this investigation to have a reasonablestatistical power (Stitziel et al 2011) The first approachwe appliedwas theanalysisof SNPallelic andgenotypicdistributionbetween theminusandplus`variant tails of male and female fertility EBVs Given the very low number ofanimals sequencedwedecided to use a simplemodel to exclude from furtheranalyses all markers showing no evidence of association with the EBV tailsratherthantofindsignificantassociations

We divided samples in cases (minus`variant animals for fertility EBV) andcontrols (plus`variant animals) and tested all possible models of inheritanceimplemented in thePlinkpackage for tackling simple genetic defectsWe thenused an empirical point`wise significant threshold based on permutations tohighlight SNPs having alleles or genotypes showing an uneven distributionbetweencasesandcontrolsandconsideredtheseasworthfurtherinvestigationAtotalof112SNPsin101genespassedthisfilteringprocess

$

22$Detection$of$regions$carrying$deleterious$mutation$from$800K$SNP$data

The analysis of exomes alone was therefore insufficient to tag recessivedeleteriousgenesinItalianHolsteinandwesoughtforadditionalinformationin1009ItalianHolsteinbullsgenotypedathighdensitythatarepartofthegenomicselection program run by ANAFI The rationale of the approach is that

134

deleterious recessives are expected to be absent or rare in the homozygous

status in the population Hence they may be tagged searching for genome

regions having markers or haplotypes lacking or having significantly less

homozygotesthanexpectedunderaneutralmodel

We used two complementary approaches to identify these candidate regions

One(OHW)searchedforverystrongsignalsproducedbysinglemarkersoutof

Hardy`Weinberg equilibrium at Ple847x10`8 A minimum of three significant

markersinaslidingwindowsof10wassettoreachsignificancetocapturemore

reliablesignalsavoidexcessivesinglemarkernoiseandaccount forapossible

Wahlund effect due to presence of the genetic structure in the population

difficulttoavoidinthesetofprogenytestedbullsanalysedThesecondapproach

(LOH) captured more moderate signals but from multiple markers in larger

regions Both approaches were designed to detect signals in regions in which

deleteriousvariantsarestillinsegregationandarenotexcessivelyrare

Asexpectedthetwoapproachesidentifiedregionsthatonlypartiallyoverlapped

(TablesS5S6eS7)Inparticular5regionsoutofthe11identifiedbyOHWwere

in common to the 240 identified by LOH Two of these regions overlap with

previous investigations that used the BovineSNP50 Beadchip Sahana et al

(2013) found a region candidate to carrypotential lethal haplotypes inNordic

Holstein on BTA7 in the region 6708007`10907840 This interval overlaps

withbothBTA7regionsdetectedinthisstudyVanRadenetal(2011)detected

inUSHolsteinalargeregiononBTA15spanning76Mb`82M(onBTA30)that

includedLRP4 gene responsible for themulefootdefectOuranalyseswith the

HDBeadchipindicatesontheonesidethatthelargeregiondetectedbySahana

etal(2013)mayconsistintwonearbyregionsOntheothersidethesignalwe

detectedonBTA15doesnot includetheLRP4 locusthat iscompletely fixed in

theItalianbullspopulationanalysed

In total83mutations in66genes fell in eitherorbothOHWandLOHregions

These were joined to the SNPs output of the EBV tail analyses for further

assessment

(

135

3(Assessing(filtered(deleterious(variants(

Different characteristics of the filtered deleterious variants were assessed to

identify among them strong candidates to submit to a large`scale validation

studyTheassessmentphasepermittedtogainadditionalinformationinfavour

or against SNP and gene candidature to be deleterious We quantified this

evidenceusingasubjectivescore

31$Comparative$genomics$analysis$

In the case of comparative genomics the score quantified the level of

conservationof thealternativeaminoacidresidues inducedbySNPvariants in

ItalianHolsteincattleComparisonwasrunagainstaset100vertebratespecies

that included 62 mammals The level of conservation is a good proxy for the

expected severity of a mutation In the case of the 46 SNPs that induced the

change of an amino acid highly conserved in the species investigatedwemay

hypothesize a strong selection pressure in favour of the maintenance or the

invariant amino acid at that position Therefore the amino acid change or the

stopcodonweobserved incattlemayaffectnotonlytheproteinstructureand

function but also the phenotype of homozygous animals Interestingly in all

thesecasestheSNPallelecodingforthenonconservedvariantswastheminor

allele

Converselyinallcasesofhighvariabilityofthetargetaminoacidacrossspecies

(asmanyassevendifferentaminoacidshavesometimesbeenobserved)poorly

supports the presence of phenotypic effects due to its change Sometimes the

informationcollectedisagainstthepresenceofadeleteriousmutationasinthe

case of the presence in different species of both alleles identified by exome

sequencingThis suggests thatneither of these is greatly affecting the viability

andfitnessofanimalscarryingthem

Wealsoobservedanumberoffailures(N=45)intheliftoverofSNPcoordinates

fromcattletohumanorinproteinalignmentThishappenedforexampleinthe

caseofuncharacterized cattleproteins thathavenoorthologous in thehuman

genome and of someolfactory receptors that belong to a highly variable gene

136

family Even if the failure of alignment is an indication of poor proteinconservationacrossspeciestheconservationofthetargetaminoacidcouldnotbeassessesandinthesecaseswepreferredtoconsidertheinformationmissingandtoscoreitaccordingly

32$Haplotype$analysis$

Asecondassessmentofthefilteredcandidateswasontheirbehaviourinthesirepopulation The ideal test would be the direct genotyping of the SNPs in thepopulationand theevaluationof theirallelicandgenotypicproportions In theabsenceofthisinformationwefirstinferredtheHDhaplotypeassociatedtoSNPallelesandassessedthebehaviourofthehaplotypeinthepopulationassumingthisasasurrogateoftheSNPbehaviourSincedeleteriousallelesareexpectedtoberareandthis isalsowhatweobserved fromcomparativegenomicanalysesweinvestigatethefrequencyonlyofthehaplotypecarryingtheminorSNPalleleagainsttheexpectationto findthecorrespondinghaplotypeneverhomozygousor a number of times much lower than expected from its frequency in thepopulation The strategy yielded some interesting confirmation but was onlypartiallysuccessful

Firstofalltheidentificationofauniquehaplotypecarryingtherarevariantwasnot always possible or straightforward In some cases carriers of a same rarevariantdidnotshareexactlythesamehaplotypeWeacceptedsomevariationifthe haplotype remained clearly distinguishable from those of animals notcarryingtherarevariantbutatthesametimereducedthestringencyofthetestInquiteanumberofcasesasamehaplotypecarriedbothcandidatedeleteriousSNP variants This may occur for example in the case of recent mutations IntheselastcasesthetestcouldnotbecarriedoutFinallymostrarevariantswereassociatedtorarehaplotypesthatarenotlikelytohavehomozygousindividualsinthepopulationindependentlyoftheSNPeffect

Thisassessmentwas thereforeonly significant fora fewhaplotypes thathadalarge number of heterozygous individuals and no or only a very fewhomozygotesandforthosethathadalargenumberofhomozygotes

137

33$Functional$analysis$

The final assessment was the analysis of the function of genes carrying the

candidate deleterious SNP This has some limitation since knowledge of gene

function is only partial and mainly gained from investigations on species

different from cattle In any case we considered the association to genetic

defectsinhumanandtheinductionofseverephenotypesinmouseknockoutas

a positive indication of the likely phenotypic effect of the severe protein

alterationweobservedincattleInadditiongenefunctionandexpressionprofile

was evaluated in search for evidence of gene involvement in fertility

developmentorprocessesfundamentalforcellandorganismviability

Amonggenesidentified22areassociatedtohumansyndromes(Table4)Eleven

syndromesareneurologicaldiseasesaffectingbraindevelopmentand inducing

mental retardation (eg Donnay`Barrow Syndrome and Autosomal recessive

primary microcephaly) the others influence development (eg

Cerebrooculofacioskeletal Syndrome 1 and Adams`Oliver syndrome) or organ

function(egCongenitalpulmonaryalveolarproteinosis)

Most of the same defects translated into Holstein would induce physical and

behavioural anomalies likely causingyoungbulls tobe excluded fromprogeny

testing Itshould in factbeconsideredthattheanimalsweanalysedareonthe

onesidethosethatwillmostinfluencethegeneticmakeoffuturegenerationbut

are not a random sample of Holstein animals These bulls have been progeny

testedandauthorisedtobeusedinartificialinseminationinItalyCandidatesto

progenytestingderivefromthematingofthebest1siresandbest2damsof

the Italian Holstein population At the age of 4 months they are taken to the

Italian Holstein breeder association (ANAFI) test station where they are

subjected to performance test sanitary controls and behavioural tests

Thereafterthequalityoftheirsemenischeckedbeforeauthorizationisgranted

toenterinprogenytestInthissampleofanimalsitisthereforeexpectedtofind

never at the homozygous state mutations causing lethality physical

malformation abnormal behaviour and semen infertility Also those having a

138

majoreffectonsemenqualityandhighsusceptibilitytodiseasesareexpectedto

berarelycarriedatthehomozygousstatus

Association with human syndromes is a strong evidence of the potential

phenotypiceffectofmutationsalteringthegenestructureHoweveritshouldbe

considered that variants found in humans are different fromvariants found in

cattle and that the severity of the phenotype induced by a deleterious gene is

often variable even within species Therefore we evaluated this information

togetherwiththeotherindirectevidencesofvariantstobedeleterious

WealsoenquiredthewebsiteoftheInternationalMousePhenotypeConsortium

(httpswwwmousephenotypeorg) to search for the effects of null alleles in

mice The few genes forwhichnull`mice line is available are involved in basic

functionsfortheorganismthatspanfromvision(IAH1andEPB41L4A)toaging

(RGP1)andtissuesdevelopment(FABP12RIPPLY3RGNEF)orfertility(CFAP96)

Aswithgenesassociatedtohumansyndromesmostoftheabnormalphenotypes

observedinknockmousewouldhamperyoungbullstoenterprogenytesting

Gene Ontology (GO) was evaluated on the entire gene set No enrichment on

particularcategoriesGOtermswasobservedThemostrepresentedcategories

in Biological Processes were as expected ldquometabolic processrdquo and ldquocellular

processrdquo however lower level terms ldquoreproductionrdquo and ldquodevelopmental

processrdquo are represented with meaningful hits The analysis of Molecular

Function indicates thatmany of the hits are enzymes (ldquocatalytic activityrdquo) and

haveregulatoryroles(ldquobindingrdquoandldquoenzymeregulatoractivityrdquo)Thisresultcan

be interpreted indifferentwaysOn theone sidedeleterious allelesmay exert

their deleterious effect in a number of different cellular and developmental

processes the correct process is one while there is many options to make it

wrongAlsothenumberofgenesinvestigatedisrelativelylowandformanyof

them there is little or no information available (eg 11 genes code for a

completelyuncharacterizedprotein)Inadditionsomeofthemmayturnoutnot

tobereallydeleteriousandaddnoisetotheGOanalysis

As a final step all the previous information (GO expression patterns known

function)havebeenused to estimate a scoremeasuring the importanceof the

gene function forcattle survivalandreproductionThescoringof functionwas

139

particularlydifficultGOcategoriesweregeneralandallfunctionstaggedhavearelevant importance for animal survival andproductionTherefore in this casewedidnrsquotusethewholescorerangeplanned(from`1to1)andgavenonegativescoresApositivescoreof1wasattributedonly togenesplayingan importantroleinfertilityanddevelopment

$

34$Strong$candidate$deleterious$variants$

Thescoringprocessyielded21variantswithatotalscore=035variantswithpositive and 135 with negative values (Figure 2 and Table 5) Although theprocesswasnoterrorfreeasitreliedonincompleteinformationandhadsomedegreeofsubjectivityparticularlyinthechoiceofrelativeweightsitidentifiedaset of variants that may be considered strong candidates to have deleteriouseffects in the Italian Holstein breed Variants were evaluated individually andthose in a same genes had sometimes different scores For example the fivevariantscarriedbyALPK2hadscoresrangingfrom`7to`2

140

Figure2Graphicalrepresentationoftotalscoreforeachvariants

minus505

Variation

Score

1_645923001_645985561_1381698111_1464490851_150972144

2_7478962_270362092_270455392_375933832_555609862_904645543_75418123_110403993_110404003_188945763_1137877703_1203356984_53767714_265405974_265406204_749514994_1033779064_1057246734_1130233264_1178417155_445557055_445557425_757447955_940308955_940309556_207762566_382803486_863518016_927092906_1075769256_1078851146_1090916426_1166055226_1186534167_13339757_56553577_97149147_167941797_168358027_168621947_197404097_197409057_263293538_602606098_603324848_658345648_704101558_760297748_770028908_872798628_1117618168_1128416959_83888819_83889249_237928389_510133929_1047396559_10515702910_171251510_1584300310_2802235210_2802311010_8290312110_8517371111_4630576711_4675390311_4740957711_5979141011_7832201011_8794387511_9548610911_9945389411_9946100112_1190618112_1247843312_1248084512_1413689712_5304908112_9076120913_381566413_1178793413_1178793813_5247168413_5393493013_5393494713_5393985313_5454042513_5504774013_5998820513_6304575013_6316961013_6539670814_197749414_364069514_3556894514_4688575014_5718402214_6863536315_18267915_18331715_18340415_248692

15_3018920915_3504878815_4630194115_7503314015_7503764515_8024448915_8025571015_8028875915_8038974515_8048559315_8055593115_8094968915_8275455615_8292533216_456201716_3831963616_6257435517_5309539517_5311265517_6609153117_7237753218_2562355018_3495651318_4637204118_5215480618_5408127718_5413095518_5781976819_2144410319_3107215219_3111397219_3279084319_4225178519_5139519419_5619117519_5726625120_763980120_2341043120_5888423620_6417173921_623911921_2034587821_2034628921_2683773321_3100359921_5527419521_5582646321_5817338421_5912071921_6048248421_6048251021_6284314222_5241595922_5242173422_5242204622_5242610422_5242666822_5695165722_5697484722_5779767423_4588505224_2132460424_2145336424_4676490924_5032719524_5815817824_5815862024_5815878424_5820400324_5820447925_3743318026_1811648726_2571599626_2576457027_503107827_3283255728_28422828_169040328_3571880728_4408176529_198588129_753052329_753066329_26397931

141

Table5SingleanalysesscoreandtotalforeachcandidatevariantsChrchrom

osom

ePosphysicalposition

Mutation

Score

Chr

Pos

Gene

Genenam

eHaplotypes

Conservation

Hum

andefect

Mousemodel

Function

TotalScore

164592300

ENSBTAG00000009014

UPK1B

02

I1

01

2

164598556

ENSBTAG00000009014

UPK1B

0I1

I1

00

I2

1138169811

ENSBTAG00000002966

DNAJC13

03

10

15

1146449085

ENSBTAG00000017418

RRP1B

I3

I1

I1

00

I5

1150972144

ENSBTAG00000036019

RIPPLY3

I3

I1

I1

I1

1I5

2747896

ENSBTAG00000013916

HERC2

30

10

04

227036209

ENSBTAG00000004555

LRP2

I3

I1

10

0I3

227045539

ENSBTAG00000004555

LRP2

I3

31

00

1

237593383

ENSBTAG00000032481

DAPL1

00

I1

00

I1

255560986

ENSBTAG00000032264

Uncharacterized

3I3

I1

00

I1

290464554

ENSBTAG00000017436

ALS2CR11

0I1

I1

00

I2

37541812

ENSBTAG00000046175

Uncharacterized

00

00

00

311040399

ENSBTAG00000013789

OR6K6

I3

I3

I1

00

I7

311040400

ENSBTAG00000013789

OR6K6

I3

I3

I1

00

I7

318894576

ENSBTAG00000004772

THEM

40

I3

I1

00

I4

3113787770

ENSBTAG00000000149

USP40

11

I1

01

2

3120335698

ENSBTAG00000002809

CAPN10

I3

21

00

0

45376771

ENSBTAG00000001500

FIGNL1

01

I1

00

0

426540597

ENSBTAG00000033954

PRPS1L1

1I3

I1

00

I3

426540620

ENSBTAG00000033954

PRPS1L1

1I1

I1

00

I1

474951499

ENSBTAG00000003508

CFAP69

I3

I3

I1

10

I6

4103377906

ENSBTAG00000021073

KIAA1549

0I1

11

01

4105724673

ENSBTAG00000019265

AGK

01

10

02

142

Mutation

Score

Chr

Pos

Gene

Genenam

eHaplotypes

Conservation

Hum

andefect

Mousemodel

Function

TotalScore

4113023326

ENSBTAG00000014768

ZNF786

I3

I3

I1

00

I7

4117841715

ENSBTAG00000017505

PAXIP1

0I1

I1

00

I2

544555705

ENSBTAG00000026323

LYSB

I3

3I1

00

I1

544555742

ENSBTAG00000026323

LYSB

I3

0I1

00

I4

575744795

ENSBTAG00000009064

CSF2RB

I3

I3

10

0I5

594030895

ENSBTAG00000009444

SLC15A5

I3

I1

I1

00

I5

594030955

ENSBTAG00000009444

SLC15A5

I3

I3

I1

00

I7

620776256

ENSBTAG00000008797

Uncharacterized

I3

I3

00

0I6

638280348

ENSBTAG00000007568

MEPE

0I3

I1

00

I4

686351801

ENSBTAG00000003523

UGT2B15

1I1

I1

00

I1

692709290

ENSBTAG00000010954

ART3

I3

2I1

00

I2

6107576925

ENSBTAG00000038606

DBDR

0I3

I1

00

I4

6107885114

ENSBTAG00000001505

GRK4

I3

1I1

00

I3

6109091642

ENSBTAG00000007001

SLC26A1

1I3

I1

00

I3

6116605522

ENSBTAG00000014058

LDB2

01

I1

00

0

6118653416

ENSBTAG00000012458

PSAPL1

I3

I1

I1

00

I5

71333975

ENSBTAG00000015602

C7H5orf45

I3

I3

I1

00

I7

75655357

ENSBTAG00000045588

LOC100298356

1I3

I1

00

I3

79714914

ENSBTAG00000046423

LOC618007

00

I1

00

I1

716794179

ENSBTAG00000012314

LDLR

02

10

03

716835802

ENSBTAG00000009568

KANK2

01

I1

00

0

716862194

ENSBTAG00000009569

DOCK6

01

10

02

719740409

ENSBTAG00000000414

FUT5

I3

I3

I1

00

I7

719740905

ENSBTAG00000000414

FUT5

I3

I3

I1

00

I7

143

Mutation

Score

Chr

Pos

Gene

Genenam

eHaplotypes

Conservation

Hum

andefect

Mousemodel

Function

TotalScore

726329353

ENSBTAG00000004860

SLC27A6

0I1

I1

00

I2

860260609

ENSBTAG00000011420

CA9

I3

I1

I1

00

I5

860332484

ENSBTAG00000011433

RGP1

I3

0I1

10

I3

865834564

ENSBTAG00000035708

LOC527795

00

I1

00

I1

870410155

ENSBTAG00000005466

C8H8orf58

I3

I3

I1

00

I7

876029774

ENSBTAG00000015027

Clusterin

00

I1

10

0

877002890

ENSBTAG00000012921

KIF24

12

I1

00

2

887279862

ENSBTAG00000002220

SPTLC1

00

10

01

8111761816

ENSBTAG00000006532

CDK5RAP2

I3

I3

11

1I3

8112841695

ENSBTAG00000013838

ALLC

01

I1

00

0

98388881

ENSBTAG00000015335

BAI3

00

I1

00

I1

98388924

ENSBTAG00000015335

BAI3

00

I1

00

I1

923792838

ENSBTAG00000046593

Uncharacterized

03

00

03

951013392

ENSBTAG00000018528

USP45

02

I1

00

1

9104739655

ENSBTAG00000018810

THBS2

03

10

15

9105157029

ENSBTAG00000000918

ERMARD

I3

21

00

0

10

1712515

ENSBTAG00000014750

EPB41L4A

0I1

I1

I1

0I3

10

15843003

ENSBTAG00000009030

NOX5

I3

I3

I1

00

I7

10

28022352

ENSBTAG00000035309

OR4F15

I3

1I1

00

I3

10

28023110

ENSBTAG00000035309

OR4F15

I3

I1

I1

00

I5

10

82903121

ENSBTAG00000012565

PCNX

I3

2I1

01

I1

10

85173711

ENSBTAG00000011683

NUM

BI3

0I1

00

I4

11

46305767

ENSBTAG00000002288

NT5DC4

0I3

I1

00

I4

11

46753903

ENSBTAG00000019072

PSD4

0I1

I1

00

I2

144

Mutation

Score

Chr

Pos

Gene

Genenam

eHaplotypes

Conservation

Hum

andefect

Mousemodel

Function

TotalScore

11

47409577

ENSBTAG00000009033

TEX37

I3

I2

I1

10

I5

11

59791410

ENSBTAG00000006027

USP34

02

I1

01

211

78322010

ENSBTAG00000015490

HS1BP3

00

I1

00

I1

11

87943875

ENSBTAG00000001140

IAH1

I3

3I1

10

011

95486109

ENSBTAG00000017576

GPR144

I3

I3

I1

00

I7

11

99453894

ENSBTAG00000038831

PHYHD1

11

I1

00

111

99461001

ENSBTAG00000010601

DOLK

12

10

04

12

11906181

ENSBTAG00000001133

VWA8

I3

1I1

00

I3

12

12478433

ENSBTAG00000014752

AKAP11

1I3

I1

I1

0I4

12

12480845

ENSBTAG00000014752

AKAP11

1I1

I1

I1

0I2

12

14136897

ENSBTAG00000016691

LACC1

I3

I3

I1

00

I7

12

53049081

ENSBTAG00000032821

SCEL

I3

I3

I1

00

I7

12

90761209

ENSBTAG00000019961

ATP4B

I3

I3

I1

00

I7

13

3815664

ENSBTAG00000034991

SLX4IP

00

I1

00

I1

13

11787934

ENSBTAG00000008650

CAMK

1D

I3

0I1

00

I4

13

11787938

ENSBTAG00000008650

CAMK

1D

I3

0I1

00

I4

13

52471684

ENSBTAG00000013753

DDRGK1

02

I1

00

113

53934930

ENSBTAG00000039520

SIRPB1

0I3

I1

00

I4

13

53934947

ENSBTAG00000039520

SIRPB1

00

I1

00

I1

13

53939853

ENSBTAG00000039520

SIRPB1

0I3

I1

00

I4

13

54540425

ENSBTAG00000047150

RTEL

I3

0I1

00

I4

13

55047740

ENSBTAG00000031718

OGFR

01

I1

00

013

59988205

ENSBTAG00000027390

RTFDC1

01

I1

00

013

63045750

ENSBTAG00000009144

BPIFA2A

0I3

I1

00

I4

145

Mutation

Score

Chr

Pos

Gene

Genenam

eHaplotypes

Conservation

Hum

andefect

Mousemodel

Function

TotalScore

13

63169610

ENSBTAG00000019752

BPIFA2B

0I1

I1

00

I2

13

65396708

ENSBTAG00000006021

CEP250

0I3

I1

00

I4

14

1977494

ENSBTAG00000026350

SPATC1

I3

I3

I1

00

I7

14

3640695

ENSBTAG00000015985

GPR20

00

I1

00

I1

14

35568945

ENSBTAG00000037399

Uncharacterized

30

I1

00

2

14

46885750

ENSBTAG00000047661

FABP12

0I1

I1

00

I2

14

57184022

ENSBTAG00000026247

PKHD1L1

I3

3I1

00

I1

14

68635363

ENSBTAG00000017165

MATN2

01

I1

00

0

15

182679

ENSBTAG00000046228

Olfactoryreceptor

00

I1

0o

I1

15

183317

ENSBTAG00000046228

Olfactoryreceptor

00

I1

00

I1

15

183404

ENSBTAG00000046228

Olfactoryreceptor

00

I1

00

I1

15

248692

ENSBTAG00000045558

LOC100299084

00

I1

0o

I1

15

30189209

ENSBTAG00000005357

VPS11

02

I1

00

1

15

35048788

ENSBTAG00000005340

SERGEF

0I3

I1

00

I4

15

46301941

ENSBTAG00000002289

NLRP14

I3

I3

I1

00

I7

15

75033140

ENSBTAG00000012053

ACCSL

1I3

I1

00

I3

15

75037645

ENSBTAG00000012053

ACCSL

I3

I3

I1

00

I7

15

80244489

ENSBTAG00000046527

LOC100139830

I3

0I1

00

I4

15

80255710

ENSBTAG00000046285

OR8I2

10

I1

00

0

15

80288759

ENSBTAG00000001291

OR8H3

1I1

I1

00

I1

15

80389745

ENSBTAG00000045709

LOC514057

13

I1

00

3

15

80485593

ENSBTAG00000035985

LOC101909743OR8K1

1I1

I1

00

I1

15

80555931

ENSBTAG00000022858

LOC783561

I3

0I1

00

I4

15

80949689

ENSBTAG00000039110

OR8U1

1I3

I1

00

I3

146

Mutation

Score

Chr

Pos

Gene

Genenam

eHaplotypes

Conservation

Hum

andefect

Mousemodel

Function

TotalScore

15

82754556

ENSBTAG00000021375

OR10Q1

01

I1

00

015

82925332

ENSBTAG00000021233

OR5B17

I3

0I1

00

I4

16

4562017

ENSBTAG00000019799

FCAM

RI3

I3

I1

00

I7

16

38319636

ENSBTAG00000014794

SCYL3

I3

0I1

00

I4

16

62574355

ENSBTAG00000035226

TOR1AIP1

00

I1

00

I1

17

53095395

ENSBTAG00000020414

DHX37

1I1

I1

00

I1

17

53112655

ENSBTAG00000020414

DHX37

0I3

I1

00

I4

17

66091531

ENSBTAG00000010847

FOXN4

0I1

I1

00

I2

17

72377532

ENSBTAG00000008996

SFI1

0I3

I1

00

I4

18

25623550

ENSBTAG00000005170

GPR114

0I3

I1

01

I3

18

34956513

ENSBTAG00000001286

ELMO

33

3I1

11

718

46372041

ENSBTAG00000015912

DMKN

0

1I1

00

018

52154806

ENSBTAG00000001262

IRGQ

I3

1I1

00

I3

18

54081277

ENSBTAG00000009337

PNMA

L1

0I3

I1

00

I4

18

54130955

ENSBTAG00000048021

PNMA

L2

I3

1I1

00

I3

18

57819768

ENSBTAG00000003231

LIM2

1

I3

10

0I1

19

21444103

ENSBTAG00000011011

SSH2

I3

I3

I1

00

I7

19

31072152

ENSBTAG00000022509

DNAH

9I3

0I1

00

I4

19

31113972

ENSBTAG00000022509

DNAH

90

0I1

00

I1

19

32790843

ENSBTAG00000015294

COX10

12

10

04

19

42251785

ENSBTAG00000013685

KRT31

I3

0I1

00

I4

19

51395194

ENSBTAG00000015980

FASN

1I1

10

01

19

56191175

ENSBTAG00000007910

EXOC7

3I3

I1

00

I1

19

57266251

ENSBTAG00000011387

NAT9

12

I1

00

2

147

Mutation

Score

Chr

Pos

Gene

Genenam

eHaplotypes

Conservation

Hum

andefect

Mousemodel

Function

TotalScore

20

7639801

ENSBTAG00000005633

RGNEF

1I1

I1

10

0

20

23410431

ENSBTAG00000008871

DDX4

1I3

I1

00

I3

20

58884236

ENSBTAG00000005514

TRIO

I3

0I1

00

I4

20

64171739

ENSBTAG00000005021

SEMA5A

01

10

02

21

6239119

ENSBTAG00000016495

LINS

02

10

03

21

20345878

ENSBTAG00000012326

LOC101906218

00

I1

00

I1

21

20346289

ENSBTAG00000012326

LOC101906218

I3

0I1

00

I4

21

26837733

ENSBTAG00000047587

Uncharacterized

10

00

01

21

31003599

ENSBTAG00000047166

SH2D7

0I3

I1

00

I4

21

55274195

ENSBTAG00000039117

FAM179B

0I3

I1

00

I4

21

55826463

ENSBTAG00000014001

MAP1A

0I3

I1

00

I4

21

58173384

ENSBTAG00000009836

CHGA

I3

I1

I1

00

I5

21

59120719

ENSBTAG00000046590

FAM181A

01

I1

00

0

21

60482484

ENSBTAG00000017096

ERICH1

0I3

I1

00

I4

21

60482510

ENSBTAG00000017096

ERICH1

01

I1

00

0

21

62843142

ENSBTAG00000012833

ATG2B

0I3

I1

00

I4

22

52415959

ENSBTAG00000015839

MAP4

00

I1

00

I1

22

52421734

ENSBTAG00000015839

MAP4

1I1

I1

00

I1

22

52422046

ENSBTAG00000015839

MAP4

I3

I3

I1

00

I7

22

52426104

ENSBTAG00000047174

Uncharacterized

I3

00

00

I3

22

52426668

ENSBTAG00000047174

Uncharacterized

10

00

01

22

56951657

ENSBTAG00000021645

MBD4

1I1

I1

00

I1

22

56974847

ENSBTAG00000021643

EFCAB12

0I1

I1

00

I2

22

57797674

ENSBTAG00000031115

Uncharacterized

01

00

01

148

Mutation

Score

Chr

Pos

Gene

Genenam

eHaplotypes

Conservation

Hum

andefect

Mousemodel

Function

TotalScore

23

45885052

ENSBTAG00000005248

OFCC1

00

I1

00

I1

24

21324604

ENSBTAG00000000998

SLC39A6

02

I1

00

1

24

21453364

ENSBTAG00000011622

C18orf21

I3

I3

I1

00

I7

24

46764909

ENSBTAG00000015210

LOXHD1

0I3

10

0I2

24

50327195

ENSBTAG00000002660

CCDC11

I3

11

01

0

24

58158178

ENSBTAG00000014284

ALPK2

I3

I1

I1

00

I5

24

58158620

ENSBTAG00000014284

ALPK2

0I1

I1

00

I2

24

58158784

ENSBTAG00000014284

ALPK2

I3

I3

I1

00

I7

24

58204003

ENSBTAG00000014284

ALPK2

I3

I3

I1

00

I7

24

58204479

ENSBTAG00000014284

ALPK2

I3

2I1

00

I2

25

37433180

ENSBTAG00000040568

Uncharacterized

00

I1

00

I1

26

18116487

ENSBTAG00000046239

C10orf12

0I1

I1

00

I2

26

25715996

ENSBTAG00000010522

Uncharacterized

10

00

01

26

25764570

ENSBTAG00000046727

Olfactoryreceptor

00

I1

00

I1

27

5031078

ENSBTAG00000007340

LOC101906349

0I1

I1

00

I2

27

32832557

ENSBTAG00000011684

RAB11FIP1

10

I1

00

0

28

284228

ENSBTAG00000000079

CCSAP

0I1

I1

00

I2

28

1690403

ENSBTAG00000048153

Uncharacterized

10

00

01

28

35718807

ENSBTAG00000047317

CL43

1I1

I1

00

I1

28

44081765

ENSBTAG00000032527

ERCC6

I3

01

00

I2

29

1985881

ENSBTAG00000004081

FAT3

30

I1

00

2

29

7530523

ENSBTAG00000046360

Uncharacterized

01

00

01

29

7530663

ENSBTAG00000046360

Uncharacterized

1I1

00

00

29

26397931

ENSBTAG00000013568

UEVLD

I3

I3

I1

00

I7

149

Interestingly13ofthe22genesassociatedtohumansyndromescarryvariants

appearing in the short list of the 35 positives and only 6 in that of 135 with

negative values This in spite of the low weight (from 1 to 1) given to this

parameterEightofthe35positivevariantsareinproteinsstilluncharacterized

Sixvariantshadascoreof4orhigherThehighestscore(totalscore=7)wasfor

ELMO3 (Engulfment and cell motility gene 3) This gene is involved in cell

motility and migration ELMO3 is also a homolog of C( elegans Ced12 that is

required for many developmental processes included gonadal morphogenesis

Mutant for this gene in C( elegans suffer from abnormal development usually

with lethal consequences or sublethal developmental defects including small

embryosdelayeddevelopmentandreducedfertility(Gumiennyetal2001)In

human it is expressed in testis prostate and ovary

(httpwwwgenecardsorgcgibincarddispplgene=ELMO3)

ThegeneDNAJC13(DnaJ(Hsp40)homologsubfamilyCmember13)hadatotal

score of 5 This gene is involved in the onset of Parkinson disease DNAJC13

regulates the dynamics of clathrin coats on early endosomes Cellular analysis

shows that the mutation Arg855Ser identified by VilarintildeoGuumlell et al (2014)

confers a gainoffunction that impairs endosomal transportApossible roleof

DNAJC13geneinTourettesyndromechronicticdisorderisunderinvestigation

(Sundarametal2011)

A second genehad a scoreof 5THBS2 (Thrombospondin2) It belongs to the

thrombospondin family and is a powerful inhibitor of tumour growth and

angiogenesis It is highly expressed in testis in fetal Leydig cell Hirose et al

(2008) found a significant association between an intronic SNP in the THBS2

gene and lumbar disc herniation Thrombospondin2 is also a matricellular

proteinfoundinhumanserumDeletionofTSP2causesagedependentdilated

cardiomyopathy (Hanatani et al 2014) Mutant mice for this gene display a

variety of mutant phenotype Kyriakides et al 1998 suggest that THBS2

modulates the cell surface properties of mesenchymal cells affecting cell

functionssuchasadhesionandmigration

150

Three genes had a score of 4 all three are associated to human syndromes

Mutation in HERC2 (HECT and RLD domain containing E3 ubiquitin protein

ligase2)isinvolvedinautosomalrecessivementalretardation38(Puffenberger

etal2012)Jietal(2000)identifiedinmutantmouseneuromuscularsecretory

vesicledefectsandspermacrosomedefectsMoreovergeneticvariationsinthis

gene are associated with skinhaireye pigmentation variability Mutations in

DOLK gene (dolichol kinase) are associated with dolichol kinase deficiency

(Lefeberetal2011)COX10(cytochromecoxidaseassemblyhomolog10)gene

is associated to mitochondrial complex IV deficiency causing Leigh syndrome

(Antonickaetal2003)

Conclusions)

Exomesequencingthereforeprovidedvaluableinformationoncodingsequence

variants and on their potential effect on protein function As a group variants

classifiedasdeleteriouswererarerthanvariantsclassifiedasneutralSincethis

isanexpectedeffectofpurifyingselectionthissetofvariantwaslikelyenriched

forvariantshavingaphenotypiceffect

Theinclusionintheanalysesofplusandminusvariantanimalsforfertilitytraits

likelypermittedtosequencethemostimportantvariantsinfluencingthesetraits

butthelownumberofanimalssequencedcombinedwiththecomplexityofthe

traitpermittedtofindonlyasuggestiveassociationbetweenSNPsandtraitsItrsquos

likely that amuch larger number of individuals is to be sequenced if genome

wideassociationisthestrategyoneswanttopursue

We explored an alternative strategy to confirm suggestivedeleterious variants

based on indirect evidence of the variant effect by assessing i) evolutionary

constraintsassociatedtotheaminoacidresiduecodedbythevariantii)variant

behaviour in a large population In the absence of direct genotyping of the

candidatevariantsweusedhaplotypeinformationassurrogateiii)functionand

effectsinotherspecies

151

We attempted to quantify these criteria through a score subjective and only

indicativebutpermitted tohighlighted35genes so farneglected in cattlebut

deservingfurtherattentionandpotentiallyhavingdeleteriouseffectsinHolstein

andothercattlebreedsThesemaybepartoftherecessivedeleterioussetthat

constitutethegeneticloadofItalianHolstein

References)AntonickaHLearySCGuercinGHAgarJNHorvathRKennawayNG

HardingCOJakschMShoubridgeEA2003MutationsinCOX10

resultinadefectinmitochondrialhemeAbiosynthesisandaccountfor

multipleearlyonsetclinicalphenotypesassociatedwithisolatedCOX

deficiencyHumMolGenet122693ndash2702doi101093hmgddg284

AshwellMSHeyenDWSonstegardTSVanTassellCPDaYVanRaden

PMRonMWellerJILewinHA2004DetectionofQuantitativeTrait

LociAffectingMilkProductionHealthandReproductiveTraitsin

HolsteinCattleJournalofDairyScience87468ndash475

doi103168jdsS00220302(04)731860

AstonKICarrellDT2009GenomeWideStudyofSingleNucleotide

PolymorphismsAssociatedWithAzoospermiaandSevere

OligozoospermiaJournalofAndrology30711ndash725

doi102164jandrol109007971

AulchenkoYSRipkeSIsaacsADuijnCMvan2007GenABELanRlibrary

forgenomewideassociationanalysisBioinformatics231294ndash1296

doi101093bioinformaticsbtm108

BamshadMJNgSBBighamAWTaborHKEmondMJNickersonDA

ShendureJ2011ExomesequencingasatoolforMendeliandisease

genediscoveryNatRevGenet12745ndash755doi101038nrg3031

BieseckerLGShiannaKVMullikinJC2011Exomesequencingtheexpert

viewGenomeBiology12128doi101186gb2011129128

152

BiffaniSMarusiMBiscariniFCanavesiF2005Developingagenetic

evaluationforfertilityusingangularityandmilkyieldascorrelatedtraits

InterbullBulletin063

BiffaniSSamoreacuteABCanavesiF2002Inbreedingdepressionforproduction

reproductionandfunctionaltraitsinItalianHolsteincattleInstitut

NationaldelaRechercheAgronomique(INRA)pp0ndash4

BolgerAMLohseMUsadelB2014TrimmomaticAflexibletrimmerfor

IlluminaSequenceDataBioinformaticsbtu170

doi101093bioinformaticsbtu170

BrowningSRBrowningBL2007RapidandAccurateHaplotypePhasingand

MissingDataInferenceforWholeGenomeAssociationStudiesByUseof

LocalizedHaplotypeClusteringAmJHumGenet811084ndash1097

CharlesworthDWillisJH2009ThegeneticsofinbreedingdepressionNat

RevGenet10783ndash796doi101038nrg2664

ChunSFayJC2009Identificationofdeleteriousmutationswithinthree

humangenomesGenomeRes191553ndash1561

doi101101gr092619109

ConsortiumTEP2012AnintegratedencyclopediaofDNAelementsinthe

humangenomeNature48957ndash74doi101038nature11247

DePristoMABanksEPoplinRGarimellaKVMaguireJRHartlC

PhilippakisAAdelAngelGRivasMAHannaMMcKennaA

FennellTJKernytskyAMSivachenkoAYCibulskisKGabrielSB

AltshulerDDalyMJ2011Aframeworkforvariationdiscoveryand

genotypingusingnextgenerationDNAsequencingdataNatGenet43

491ndash498doi101038ng806

GoymerP2007SynonymousmutationsbreaktheirsilenceNatRevGenet8

92ndash92doi101038nrg2056

GumiennyTLBrugneraEToselloTrampontACKinchenJMHaneyLB

NishiwakiKWalkSFNemergutMEMacaraIGFrancisRSchedl

TQinYVanAelstLHengartnerMORavichandranKS2001CED

12ELMOaNovelMemberoftheCrkIIDock180RacPathwayIs

RequiredforPhagocytosisandCellMigrationCell10727ndash41

doi101016S00928674(01)005207

153

HanataniSIzumiyaYTakashioSKimuraYArakiSRokutandaTTsujita

KYamamotoETanakaTYamamuroMKojimaSTayamaS

KaikitaKHokimotoSOgawaH2014Circulatingthrombospondin2

reflectsdiseaseseverityandpredictsoutcomeofheartfailurewith

reducedejectionfractionCircJ78903ndash910

HiroseYChibaKKarasugiTNakajimaMKawaguchiYMikamiY

FuruichiTMioFMiyakeAMiyamotoTOzakiKTakahashiA

MizutaHKuboTKimuraTTanakaTToyamaYIkegawaS2008

AfunctionalpolymorphisminTHBS2thataffectsalternativesplicingand

MMPbindingisassociatedwithlumbardischerniationAmJHum

Genet821122ndash1129doi101016jajhg200803013

httpsgenomeucsceduindexhtml[WWWDocument]nd

JamrozikJFatehiJKistemakerGJSchaefferLR2005EstimatesofGenetic

ParametersforCanadianHolsteinFemaleReproductionTraitsJournalof

DairyScience882199ndash2208doi103168jdsS00220302(05)728952

JanisEWiggintonDJC2005AnoteonexacttestsofHardyWeinberg

equilibriumAmericanjournalofhumangenetics76887ndash93

doi101086429864

JiYRebertNAJoslinJMHigginsMJSchultzRANichollsRD2000

StructureoftheHighlyConservedHERC2GeneandofMultiplePartially

DuplicatedParalogsinHumanGenomeRes10319ndash329

KentWJSugnetCWFureyTSRoskinKMPringleTHZahlerAM

HausslerandD2002TheHumanGenomeBrowseratUCSCGenome

Res12996ndash1006doi101101gr229102

KumarPHenikoffSNgPC2009Predictingtheeffectsofcodingnon

synonymousvariantsonproteinfunctionusingtheSIFTalgorithmNat

Protoc41073ndash1081doi101038nprot200986

KyriakidesTRZhuYHSmithLTBainSDYangZLinMTDanielson

KGIozzoRVLaMarcaMMcKinneyCEGinnsEIBornsteinP

1998Micethatlackthrombospondin2displayconnectivetissue

abnormalitiesthatareassociatedwithdisorderedcollagenfibrillogenesis

anincreasedvasculardensityandableedingdiathesisJCellBiol140

419ndash430

154

LaneEACroweMABeltmanMEMoreSJ2013Theinfluenceofcowand

managementfactorsonreproductiveperformanceofIrishseasonal

calvingdairycowsAnimReprodSci14134ndash41

doi101016janireprosci201306019

LefeberDJdeBrouwerAPMMoravaERiemersmaMSchuurs

HoeijmakersJHMAbsmannerBVerrijpKvandenAkkerWMR

HuijbenKSteenbergenGvanReeuwijkJJozwiakAZuckerN

LorberALammensMKnopfCvanBokhovenHGruumlnewaldSLehle

LKapustaLMandelHWeversRA2011AutosomalRecessive

DilatedCardiomyopathyduetoDOLKMutationsResultsfromAbnormal

DystroglycanOMannosylationPLoSGenet7e1002427

doi101371journalpgen1002427

LiHDurbinR2009FastandaccurateshortreadalignmentwithBurrows

WheelertransformBioinformatics251754ndash1760

doi101093bioinformaticsbtp324

LiHHandsakerBWysokerAFennellTRuanJHomerNMarthG

AbecasisGDurbinR1000GenomeProjectDataProcessingSubgroup

2009TheSequenceAlignmentMapformatandSAMtoolsBioinformatics

252078ndash2079doi101093bioinformaticsbtp352

LiMXKwanJSHBaoSYYangWHoSLSongYQShamPC2013

PredictingMendelianDiseaseCausingNonSynonymousSingle

NucleotideVariantsinExomeSequencingStudiesPLoSGenet9

e1003143doi101371journalpgen1003143

MacArthurDGBalasubramanianSFrankishAHuangNMorrisJWalter

KJostinsLHabeggerLPickrellJKMontgomerySBAlbersCA

ZhangZConradDFLunterGZhengHAyubQDePristoMA

BanksEHuMHandsakerRERosenfeldJFromerMJinMMuXJ

KhuranaEYeKKayMSaundersGISunerMMHuntTBarnes

IHAAmidCCarvalhoSilvaDRBignellAHSnowCYngvadottirB

BumpsteadSCooperDNXueYRomeroIGWangJLiYGibbs

RAMcCarrollSADermitzakisETPritchardJKBarrettJCHarrow

JHurlesMEGersteinMBTylerSmithC2012Asystematicsurvey

155

oflossoffunctionvariantsinhumanproteincodinggenesScience335

823ndash828doi101126science1215040

MajewskiJSchwartzentruberJLalondeEMontpetitAJabadoN2011

WhatcanexomesequencingdoforyouJMedGenet48580ndash589

doi101136jmedgenet2011100223

McClureMCBickhartDNullDVanRadenPXuLWiggansGLiuG

SchroederSGlasscockJArmstrongJColeJBVanTassellCP

SonstegardTS2014BovineExomeSequenceAnalysisandTargeted

SNPGenotypingofRecessiveFertilityDefectsBH1HH2andHH3Reveal

aPutativeCausativeMutationinSMC2forHH3PLoSONE9e92769

doi101371journalpone0092769

McClureMKimEBickhartDNullDCooperTColeJWiggansGAjmone

MarsanPColliLSantusELiuGESchroederSMatukumalliLVan

TassellCSonstegardT2013FineMappingforWeaverSyndromein

BrownSwissCattleandtheIdentificationof41ConcordantMutations

acrossNRCAMPNPLA8andCTTNBP2PLoSONE8e59251

doi101371journalpone0059251

McKennaAHannaMBanksESivachenkoACibulskisKKernytskyA

GarimellaKAltshulerDGabrielSDalyMDePristoMA2010The

GenomeAnalysisToolkitAMapReduceframeworkforanalyzingnext

generationDNAsequencingdataGenomeRes201297ndash1303

doi101101gr107524110

MiHDongQMuruganujanAGaudetPLewisSThomasPD2010

PANTHERversion7improvedphylogenetictreesorthologsand

collaborationwiththeGeneOntologyConsortiumNuclAcidsRes38

D204ndashD210doi101093nargkp1019

MrodeRKearneyJFBiffaniSCoffeyMCanavesiF2009Short

communicationGeneticrelationshipsbetweentheHolsteincow

populationsofthreeEuropeandairycountriesJournalofDairyScience

925760ndash5764doi103168jds20081931

PelosoGMAuerPLBisJCVoormanAMorrisonACStitzielNOBrody

JAKhetarpalSACrosbyJRFornageMIsaacsAJakobsdottirJ

FeitosaMFDaviesGHuffmanJEManichaikulADavisBLohman

156

KJoonAYSmithAVGroveMLZanoniPRedonVDemissieS

LawsonKPetersUCarlsonCJacksonRDRyckmanKKMackey

RHRobinsonJGSiscovickDSSchreinerPJMychaleckyjJC

PankowJSHofmanAUitterlindenAGHarrisTBTaylorKD

StaffordJMReynoldsLMMarioniREDehghanAFrancoOHPatel

APLuYHindyGGottesmanOBottingerEPMelanderOOrho

MelanderMLoosRJFDugaSMerliniPAFarrallMGoelA

AsseltaRGirelliDMartinelliNShahSHKrausWELiMRader

DJReillyMPMcPhersonRWatkinsHArdissinoDZhangQWang

JTsaiMYTaylorHACorreaAGriswoldMELangeLAStarrJM

RudanIEiriksdottirGLaunerLJOrdovasJMLevyDChenYDI

ReinerAPHaywardCPolasekODearyIJBoreckiIBLiuY

GudnasonVWilsonJGvanDuijnCMKooperbergCRichSSPsaty

BMRotterJIOrsquoDonnellCJRiceKBoerwinkleEKathiresanS

CupplesLA2014AssociationofLowFrequencyandRareCoding

SequenceVariantswithBloodLipidsandCoronaryHeartDiseasein

56000WhitesandBlacksTheAmericanJournalofHumanGenetics94

223ndash232doi101016jajhg201401009

PryceJEVeerkampRFThompsonRHillWGSimmG1997Genetic

aspectsofcommonhealthdisordersandmeasuresoffertilityinHolstein

FriesiandairycattleAnimalScience65353ndash360

doi101017S1357729800008559

PryceJEWoolastonRBerryDPWallEWintersMButlerRShafferM

2014WorldTrendsinDairyCowFertilityin10ThWorldCongressof

GeneticsAppliedtoLivestockProduction

PuffenbergerEGJinksRNWangHXinBFiorentiniCShermanEA

DegrazioDShawCSougnezCCibulskisKGabrielSKelleyRI

MortonDHStraussKA2012Ahomozygousmissensemutationin

HERC2associatedwithglobaldevelopmentaldelayandautismspectrum

disorderHumMutat331639ndash1646doi101002humu22237

PurcellSNealeBToddBrownKThomasLFerreiraMARBenderD

MallerJSklarPdeBakkerPIWDalyMJShamPC2007PLINKA

157

ToolSetforWholeGenomeAssociationandPopulationBasedLinkage

AnalysesAmJHumGenet81559ndash575

SahanaGNielsenUSAamandGPLundMSGuldbrandtsenB2013Novel

HarmfulRecessiveHaplotypesIdentifiedforFertilityTraitsinNordic

HolsteinCattlePLoSONE8e82909doi101371journalpone0082909

SchuumltzEScharfensteinMBrenigB2008Implicationofcomplexvertebral

malformationandbovineleukocyteadhesiondeficiencyDNAbased

testingondiseasefrequencyintheHolsteinpopulationJDairySci91

4854ndash4859doi103168jds20081154

SimmonsMJCrowJF1977MutationsAffectingFitnessinDrosophila

PopulationsAnnualReviewofGenetics1149ndash78

doi101146annurevge11120177000405

SoggiuAPirasCHusseinHACanioMDGaviraghiAGalliAUrbaniA

BonizziLRoncadaP2013UnravellingthebullfertilityproteomeMol

BioSyst91188ndash1195doi101039C3MB25494A

StitzielNOKiezunASunyaevS2011Computationalandstatistical

approachestoanalyzingvariantsidentifiedbyexomesequencing

GenomeBiology12227doi101186gb2011129227

SundaramSKHuqAMSunZYuWBennettLWilsonBJBehenME

ChuganiHT2011ExomesequencingofapedigreewithTourette

syndromeorchronicticdisorderAnnNeurol69901ndash904

doi101002ana22398

VanRadenPMOlsonKMNullDJHutchisonJL2011Harmfulrecessive

effectsonfertilitydetectedbyabsenceofhomozygoushaplotypesJournal

ofDairyScience946153ndash6161doi103168jds20114624

VilarintildeoGuumlellCRajputAMilnerwoodAJShahBSzuTuCTrinhJYuI

EncarnacionMMunsieLNTapiaLGustavssonEKChouP

TatarnikovIEvansDMPishottaFTVoltaMBeccanoKellyD

ThompsonCLinMKShermanHEHanHJGuentherBL

WassermanWWBernardVRossCJAppelCresswellSStoesslAJ

RobinsonCADicksonDWRossOAWszolekZKAaslyJOWuR

MHentatiFGibsonRAMcPhersonPSGirardMRajputMRajput

158

AHFarrerMJ2014DNAJC13mutationsinParkinsondiseaseHumMolGenet231794ndash1801doi101093hmgddt570

WangKLiMHakonarsonH2010ANNOVARfunctionalannotationofgeneticvariantsfromhighthroughputsequencingdataNuclAcidsRes38e164ndashe164doi101093nargkq603

WathesDCPollottGEJohnsonKFRichardsonHCookeJS2014HeiferfertilityandcarryoverconsequencesforlifetimeproductionindairyandbeefcattleAnimal8Suppl191ndash104doi101017S1751731114000755

YangYMuznyDMReidJGBainbridgeMNWillisAWardPABraxtonABeutenJXiaFNiuZHardisonMPersonRBekheirniaMRLeducMSKirbyAPhamPScullJWangMDingYPlonSELupskiJRBeaudetALGibbsRAEngCM2013ClinicalWhole

ExomeSequencingfortheDiagnosisofMendelianDisordersNewEnglandJournalofMedicine3691502ndash1511doi101056NEJMoa1306555

ZiminAVDelcherALFloreaLKelleyDRSchatzMCPuiuDHanrahanFPerteaGTassellCPVSonstegardTSMarccedilaisGRobertsMSubramanianPYorkeJASalzbergSL2009Awholegenome

assemblyofthedomesticcowBostaurusGenomeBiology10R42doi101186gb2009104r42

159

Supplementary)files)

YoucanfindChapter6supplementaryfilesat

- DocTa(DoctoralThesisArchive)(httptesionlineunicattit)

- GoogleDrive(httptinyurlcomPhDMMChapter06)orscantheQRcode

)

Supplementary)Figure)S1)Coverage)analyses)

A)Averagecoverageperanimals

B)Boxplotofchromosomescoverageperanimal

C)BoxplotofanimalcoverageperchromosomeBluelinerepresenttheexpected

50Xcoverage

Supplementary)Figure)S2)Distribution)of)variant)sequencing)depth)

)

Supplementary) Figure) S3) Correlation) between) sequencing) depth) and)

number) of) discordant) genotypes) In) red) values) for) two) outlier) animals)

having)concordance)lt846))

)

Supplementary)Figure)S4)Multidimensional)scaling)of)Italian)Holstein)bulls)

sequenced))

)

Supplementary)Figure)S5)Multidimensional)scaling)of)Italian)Holstein)bulls)

analysed)with)the)800K)SNPchip))

160

Supplementary) Figure) S6) MAF) distribution) on) neutral) (blue)) and)

deleterious)(red))variants))

)

Supplementary) Table) S1) EBV) distribution) on) Italian) Holstein) population)

and)SNPchip)dataset)

Sheet1EBVthresholdvaluetodefineplusandminustailforPROT(kgprotein)

SCC(somaticcellcount)FERT(fertility)andLONG(functional longevity) All

ItalianHolsteinFAbullswereincludeinthisanalysis

Sheet2Numberofdatasetanimalsineachtailforeachtrait

Supplementary)Table)S2)Exome)probes)statistics)number)and)bp)covered)

per)chromosomes)

Supplementary) Table) S3) Concordance) between) animal) genotypes) in)

sequencing)and)SNPchip)data)

Error1 HD data homozygotes ndash Exome data heterozygotes Error2 HD data

heterozygotesndashExomedatahomozygotes

Supplementary) Table) S4) List) of) suggestive) genes) deriving) from) the)

analysis)of)exomes)of)animals) in)the)tails)of)male)and)female)fertility)EBV)

distributions))

Header(code

bull Bfreq_totFrequencyofBalleleinentiredataset

bull Bfreq_MHFrequencyofBalleleinhighmalefertilitydataset

bull Bfreq_MLFrequencyofBalleleinhighfemalefertilitydataset

bull Bfreq_FHFrequencyofBalleleinlowmalefertilitydataset

bull Bfreq_FLFrequencyofBalleleinlowmalefertilitydataset

bull CallRate_totSNPcallratendashentiredataset

bull CallRate_MHSNPcallratendashhighmalefertilitydataset

bull CallRate_MLSNPcallratendashlowmalefertilitydataset

bull CallRate_FHSNPcallratendashhighfemalefertilitydataset

bull CallRate_FLSNPcallratendashhighmalefertilitydataset

bull HWeqHardyWeinbergequilibriumpvalue

bull Geno_HomoMinorNumberofanimalswithhomozygotegenotypeforminorallele

bull Geno_HetNumberofanimalswithheterozygotegenotype

161

bull Geno_HomoMajorNumberofanimalswithhomozygotegenotypeformajorallele

bull LOW_MinorAlleleminoralleleforlowfertility(maleandfemale)dataset

bull LOW_MAFminorallelefrequencyfromlowfertility(maleandfemale)dataset

bull LOW_CRSNPcallratefromlowfertility(maleandfemale)dataset

bull LOW_HomoMinorNumberofanimalsfromlowfertility(maleandfemale)datasetwith

homozygotegenotypeforminorallele

bull LOW_Het Number of animals from low fertility (male and female) dataset with

heterozygotegenotype

bull LOW_HomoMajorNumberofanimalsfromlowfertility(maleandfemale)datasetwith

homozygotegenotypeformajorallele

bull LOW_HWeqPvalue HardyWeinberg equilibrium pvalue from low fertility (male and

female)dataset

bull HIGH_MinorAlleleminorallelefromhighfertility(maleandfemale)dataset

bull HIGH_MAFminorallelefrequencyfromhighfertility(maleandfemale)dataset

bull HIGH_CRSNPcallratefromhighfertility(maleandfemale)dataset

bull HIGH_HomoMinorNumberofanimalsfromhighfertility(maleandfemale)datasetwith

homozygotegenotypeforminorallele

bull HIGH_Het Number of animals from high fertility (male and female) dataset with

heterozygotegenotype

bull HIGH_HomoMajorNumberofanimalsfromhighfertility(maleandfemale)datasetwith

homozygotegenotypeformajorallele

bull HIGH_HWeqPvalueHardyWeinberg equilibriumpvalue fromhigh fertility (male and

female)dataset

bull Male_TREND inheritance models implemented in PLINK trend test on male fertility

dataset

bull Male_GENOTYPES inheritancemodels implemented inPLINKgenotypic testonmale

fertilitydataset

bull Male_ RECESSIVE inheritance models implemented in PLINK recessive test on male

fertilitydataset

bull Female_TRENDinheritancemodelsimplementedinPLINKtrendtestonfemalefertility

dataset

bull Female_GENOTYPES inheritance models implemented in PLINK genotypic test on

femalefertilitydataset

bull Female_RECESSIVEinheritancemodelsimplementedinPLINKrecessivetestonfemale

fertilitydataset

bull MaleampFemale_TREND inheritance models implemented in PLINK trend test on male

andfemalefertilitydataset

bull MaleampFemale_GENOTYPES inheritancemodels implemented in PLINK genotypic test

onmaleandfemalefertilitydataset

162

bull MaleampFemale_RECESSIVEinheritancemodelsimplementedinPLINKrecessiveteston

maleandfemalefertilitydataset

bull EMP1_MaleEmpiricalpvalueonmalefertilitydataset

bull EMP1_FemaleEmpiricalpvalueonfemalefertilitydataset

bull EMP1_MaleampFemaleEmpiricalpvalueonmaleandfemalefertilitydataset

Supplementary)Table) S5) Regions) candidate) to) carry) deleterious) variants)

identified)by)OHW)approach))

Supplementary)Table) S6) Regions) candidate) to) carry) deleterious) variants)

identify)by)LOH)approach))

Ingreyregionsremovedinfurtheranalysis

Supplementary)Table)S7)List)of)candidate)deleterious)variants)mapping)in)

OHW)and)LOH)regions))Header(code

bull Bfreq_totFrequencyofBalleleinentiredataset

bull Bfreq_MHFrequencyofBalleleinhighmalefertilitydataset

bull Bfreq_MLFrequencyofBalleleinhighfemalefertilitydataset

bull Bfreq_FHFrequencyofBalleleinlowmalefertilitydataset

bull Bfreq_FLFrequencyofBalleleinlowmalefertilitydataset

bull CallRate_totSNPcallratendashentiredataset

bull CallRate_MHSNPcallratendashhighmalefertilitydataset

bull CallRate_MLSNPcallratendashlowmalefertilitydataset

bull CallRate_FHSNPcallratendashhighfemalefertilitydataset

bull CallRate_FLSNPcallratendashhighmalefertilitydataset

bull HWeqHardyWeinbergequilibriumpvalue

bull Geno_HomoMinorNumberofanimalswithhomozygotegenotypeforminorallele

bull Geno_HetNumberofanimalswithheterozygotegenotype

bull Geno_HomoMajorNumberofanimalswithhomozygotegenotypeformajorallele

bull HW_insideDeleteriousmutationsinsideldquoOutofHardyWeinbergrdquoregion

bull HW_outsideDeleteriousmutationsoutsideldquoOutofHardyWeinbergrdquoregion

bull DH_insideDeleteriousmutationsinsideldquoLackofHomozygosityrdquoregion

bull DH_outsideDeleteriousmutationsoutsideldquoLackofHomozygosityrdquoregion

163

bull GeneClass Classifications of gene 1 deleterious mutations inside OHW and LOHregion2deleteriousmutationsinsideOHWorLOHregion

bull 2deleteriousmutationsoutsideOHWandorLOHregionbull LOW_MinorAlleleminoralleleforlowfertility(maleandfemale)datasetbull LOW_MAFminorallelefrequencyfromlowfertility(maleandfemale)datasetbull LOW_CRSNPcallratefromlowfertility(maleandfemale)datasetbull LOW_HomoMinorNumberofanimalsfromlowfertility(maleandfemale)datasetwith

homozygotegenotypeforminorallelebull LOW_Het Number of animals from low fertility (male and female) dataset with

heterozygotegenotypebull LOW_HomoMajorNumberofanimalsfromlowfertility(maleandfemale)datasetwith

homozygotegenotypeformajorallelebull LOW_HWeqPvalue HardyWeinberg equilibrium pvalue from low fertility (male and

female)datasetbull HIGH_MinorAlleleminorallelefromhighfertility(maleandfemale)datasetbull HIGH_MAFminorallelefrequencyfromhighfertility(maleandfemale)datasetbull HIGH_CRSNPcallratefromhighfertility(maleandfemale)datasetbull HIGH_HomoMinorNumberofanimalsfromhighfertility(maleandfemale)datasetwith

homozygotegenotypeforminorallelebull HIGH_Het Number of animals from high fertility (male and female) dataset with

heterozygotegenotypebull HIGH_HomoMajorNumberofanimalsfromhighfertility(maleandfemale)datasetwith

homozygotegenotypeformajorallelebull HIGH_HWeqPvalueHardyWeinberg equilibriumpvalue fromhigh fertility (male and

female)datasetbull Male_TREND inheritance models implemented in PLINK trend test on male fertility

datasetbull Male_GENOTYPES inheritancemodels implemented inPLINKgenotypic testonmale

fertilitydatasetbull Male_ RECESSIVE inheritance models implemented in PLINK recessive test on male

fertilitydatasetbull Female_TRENDinheritancemodelsimplementedinPLINKtrendtestonfemalefertility

datasetbull Female_GENOTYPES inheritance models implemented in PLINK genotypic test on

femalefertilitydatasetbull Female_RECESSIVEinheritancemodelsimplementedinPLINKrecessivetestonfemale

fertilitydatasetbull MaleampFemale_TREND inheritance models implemented in PLINK trend test on male

andfemalefertilitydataset

164

bull MaleampFemale_GENOTYPES inheritancemodels implemented in PLINK genotypic test

onmaleandfemalefertilitydataset

bull MaleampFemale_RECESSIVEinheritancemodelsimplementedinPLINKrecessiveteston

maleandfemalefertilitydataset

bull EMP1_MaleEmpiricalpvalueonmalefertilitydataset

bull EMP1_FemaleEmpiricalpvalueonfemalefertilitydataset

bull EMP1_MaleampFemaleEmpiricalpvalueonmaleandfemalefertilitydataset

Supplementary)Table)S8)Results)of)haplotype)analysis)))Header(code

bull Chromchromosome

bull PosphysicalpositiononUMD31genome

bull Exome_homonumberofhomozygotefordeleteriousmutationsequencedanimals

bull Exome_hetnumberofheterozygotesequencedanimalsfordeleteriousmutation

bull Outcome NotFound It was impossible to associated a haplotype to the mutation

NotOpt Non completely invariant haplotype could be associated to themutationOK

Putativehaplotype associate tomutationwas found In few case alternativehaplotype

wasreported

bull LenlengthofhaplotypeinnumberofSNPs

bull Pop_numnumbersofselectedhaplotypedetectedinHolsteinpopulation

bull Pop_homo numbers of homozygote animals in Holstein population for selected

haplotype

bull Pop_het numbers of heterozygotes animals in Holstein population for selected

haplotype

bull ExpExpectednumberofhomozygoteanimal

bull Alternative if only one heterozygote sequenced animals for deleteriousmutationwas

foundalltwopossiblehaplotypeswheretested

Supplementary)Table)S9)Descriptive)results)for)the)Biological)Process)(BP)))

NumberofgenesbelongingtoBiologicalProcessGOterms)

Supplementary) Table) S10) Descriptive) results) for) the)Molecular) Function)

(MF))

NumberofgenesbelongingtoMolecularFunctionGOterms

165

CHAPTER7

Conclusion

Since the domestication event occurred in the Fertile Crescent around 10000yearsagothebovinespecieswasselectedbyhumankindtofulfilhisneedsThecreationofbreedsoccurredaround200yearsagointensifiedthedifferentiationbetween populations and promoted the development of intensive selectionschemesTraditionalAandnowadaysgenomicAselectionsareproducingpositivegenetictrendsinmanyproductivetraitsdespitethestillpoorknowledgeaboutphenotype biology A deeper understanding of genome organization and genebiology would increase the accuracy of genomic evaluation by incorporatingpriorknowledgeThisthesisexploresthebovinegenomebyhighAthroughputtechnologiesAsuchasNextGenerationSequencingandHighADensityGenotypingAwithestablishedand innovative procedures to investigate the biology of complex traits and toprovidenewdataandmoleculartoolstobovinebreedingThese goals have been achieved by complementary approaches and differentmethodsInChapter2selectionsignatureswereinvestigatedindependentlyintofive Italian breeds using Illumina BovineSNP50 BeadChip Then amultiAbreedapproach was applied clustering breeds into dairy (Italian Holstein ItalianBrownand Italian Simmental) andbeef (Marchigiana Piedmontese and ItalianSimmental) groups Regions under recent positive selection shared by breedswithasameproductiveaptitudewereinvestigatedinfurtherdetailbyretrievinggenes in the region and analysing their function by ontology and pathwayanalysis In the dairy group selection signals appear nearby genes related tomammarygland(metabolismandresistancetomastitis)andfeedingadaptationIn thebeefgroupgenes inselectedregionsare involved inanimalgrowthandmeatquality(textureandjuiciness)Hencecandidategenesidentifiedbelongtonetworks and pathways important in directing a breed towards either beef ordairyproductionIn Chapter 3 a GWAS approach was used in dairy cattle to correlate SNP tocomplex phenotypes having an economic value Three independent ldquoclassicalrdquosingleAmarkerregressionswereruninthreeItaliandairycattle(ItalianHolsteinItalianBrownandItalianSimmental)UsingthetraditionalsingleSNPapproach

168

only few SNPs were above the nominal genomeAwide significance thresholdwhile applying a geneAbased method identified significant candidate genesassociatedwithmilkphysiology andmammary glanddevelopment in all threepopulationsInterestinglygenesfoundindifferentbreedshavesimilarfunctionbut are not the same suggesting that breed history demography selectioncriteriaandstochasticeventsasgeneticdrifthaveanimpactonthedefinitionofthemainmoleculartargetofselectionschemesCurrent intensive selection strategies rapidly spread the genetic gain in thepopulationbutholdtheassociatedriskofdisseminatingdefectivegenescausinggenetic defects (eg mulefoot complex vertebral malformation and others) ordecreasing fertility In Chapter 6 different technologies and approaches werecombined to obtain filter and prioritize genes candidate to be deleteriousExome sequence datawere exploited to identify deleterious variants affectingItalianHolsteincattleinasetof20animalsextremeforfertilitytraitsMorethan5000 candidate deleterious SNP variants were identified To bypass the smallnumberlimitationandthelackofpowerofanystatisticanalysison20animalsastepwise approachwas used to first filter and then prioritize these candidatedeleterious variants Polymorphisms were retained if having asymmetricdistribution in bulls from opposite tails of fertility EBV distributions andormapping in genomic regions outlier for lack of homozygosity in a 1009 bullpopulation genotyped with the Illumina BovineHD BeadChip A total of 191variants in163genespassed theprocessesThesewereassessedby searchingadditional evidences in favour or against their deleterious effect throughdifferentapproachesi)checkingthevariantconservationacross100vertebratespecies by comparative genomics ii) finding haplotypes associated withcandidatemutationsandevaluatingtheirfrequencyinHolsteinsiii)integratinghumanandmouse functionalannotationsA scorewasproposed toweight theresult of these assessments and rank the variants 35 of them had positivescores These occurred in genes involved in basic biological mechanisms asfertilitydevelopmentand immunityandresultedenriched ingenesassociatedtogeneticdefectsreportedinhumanThesemutationsarestrongcandidatestobe recessive deleterious variants worth to be further investigated in a larger

169

populationtobetterunderstandtheirbiologyimpactandpossibleapplicationinbreedingTo increase the reliability of results based on genomic data and to reduce theimpact of subjective choices in data analysis new toolsapproaches weredeveloped(iethemultiAbreedapproachinSelectionSignaturendashChapter2thepostAGWAS geneAbased approach ndash Chapters 3 and 4) and the existent onescriticallyevaluated(imputationmethodsndashChapter5)In Chapter 4 the postAGWAS geneAbased method used in Chapter 3 wasimplemented in the MUGBAS (MUlti species GeneABased Association Suite)software In contrast to existing software MUGBAS is species and annotationfree AmultipleAtesting correctionwas included in the package in addition tocomputing parallelization for faster processing and plot representation forbetterunderstandingThesoftwareusessingleAmarkerGWASdataandagivenannotation to estimate geneAregionAwise association pAvalues As previouslymentioned in Chapter 3 the ldquoclassicalrdquo GWAS approach (single SNP) foundsignals only in ItalianHolsteinwhileMUGBAS geneAbased approach identifiedsignificantsignalsinallbreedsinvestigatedIn Chapter 5 the impact of different bovine reference sequences on genotypeimputation was investigated In this work Illumina BovineSNP50 BeadChipgenotypesof ItalianSimmentalbullswerereAmappedonthethreemostrecentreferencegenomeassembliesWecomparedfourdifferent imputationmethodsfromlow(IlluminaBovineLDv10Beadchip)tohighdensityThetestpopulationwasdividedintofoursubpopulationsonthebasisofrelationshipandgenotypedataavailabilityUpdatingSNPcoordinatesonthe three testedcattlereferencegenome assemblies determined only a slight variation on imputation resultswithinmethodsOntheotherhandlargedifferencesintermsofaccuracywereobservedamongthefourimputationmethodstestedGenetic structure of industrial breeds was evaluated assessing LinkageDisequilibrium (LD) Since LD decay is correlated to intensity of selection weexpected different behaviours in the breeds analysed Indeed as evident inChapter 2 ndash Figure 7 breeds subjected to lower selection intensity (ie

170

MarchigianaPiedmonteseandItalianSimmental)displayalowerpersistenceof

LDatgreaterdistancescomparedtothosesubjectedtohigherselectionpressure

(ieItalianHolsteinandItalianBrown)Chapter3investigatesLDintheDGAT1

region of dairy cattle breeds In Italian Simmental LD is lower compared to

Italian Holstein and Italian Brown In spite of this significant association

betweentheDGAT1regionandmilktraitswasfoundbothinItalianSimmental

andinItalianHolsteinInItalianBrownnoassociationwasfoundbecauseDGAT1

isfixed

Data and results presented in this thesis represent a clear example of the

progress of molecular technologies occurred in last few years spanning from

mediumdensitygenotypingtonextgenerationsequencing

Thebiologylandscapesofsomecomplextraits(iemilkproductionandquality

fertility)were improved finding candidate genes andpathwaysnotpreviously

associated to the phenotype in the bovine species Moreover insights on

deleterious mutations were inferred through a deep and integrated analysis

usingNGSdata

This new knowledge could be used to improve cattle breeding For example

deleteriousrecessivevariantscouldbeincludedinbullevaluationsbybreedersrsquo

associations to reduce the genetic load of deleterious genes in parallel to

improvingproductivity

While technological progress has been impressivewe are still only scratching

thesurfaceofanimalbiologyNewfrontiersarebeingopenedbywholegenome

sequencing of many animals (eg in the 1000 bull genome project) and by

stepping beyond sequencing (a number of epigenomic projects in animals are

following the footprints of the human ENCODE project) The trend is towards

unravelling complexity and data production ability is to be flanked by data

storingandprocessingandaboveallbynewmethodsandtoolsfordataanalysis

171

ACKNOWLEDGMENTS

ldquoNon egrave la destinazione ma il viaggio che contardquo Questi tre anni sono stati un gran bel viaggio di crescita personale e ricchi diesperienze Non egrave stato un viaggio solitarioma condiviso con vecchie e nuoveconoscenzechemisembradoverosomenzionareInnanzitutto vorrei ringraziare il prof Paolo AjmoneMarsan chemi ha dato lapossibilitagrave di fare questa esperienza Grazie per gli insegnamenti e per leopportunitagravechemihaoffertoSperodinonavertraditolesueaspettativeAgli attuali colleghi Lorenzo Stefano Licia Elisa Riccardo ed Elia e a quellipassati Nicola Fatima Raffaele Ezequiel e Barbara va il mio piugrave sincero edaffettuoso ringraziamento per aver condiviso anche solo in parte questomiopercorso Grazie per avermi sopportato (e non egrave poco) guidato (non solo nelcampolavorativo)efattovivereinunmodospecialelrsquoambientedilavoroUnsincerograzievaatuttiimieicolleghidelXXVIIcicloAgrisystemRicorderogravesempretuttiimomentichehopotutocondividereconvoi UngrazieancheachihafattopartedellamiaesperienzasvedeseGraziedicuoreaCarlFlavioChristianEricaatuttiiragazziitalianiatuttiicolleghildquosvedesirdquoea tutte le altre persone che ho conosciuto Mi avete fatto vivere appienolrsquoesperienzaesteraefattotornareacasaconqualcheconsapevolezzainpiugrave UnsentitoringraziamentoatuttiglialtricolleghiconcuihocollaboratoinquestianniRingrazio tutti gli amici e conoscenti che mi sono stati vicino per avermisupportatoesopportatoognunoamodoproprio Ultimomanonperimportanza ilringraziamentochevaatuttalamiafamigliaper avermi aiutato spronato sostenuto sempre e in particolare in questaesperienzaSemplicementegrazie PS Adirlatuttaancheladestinazionenonegravepoicosigravemale

173

  • INDEX
  • CHAPTER 1 - Introduction
  • CHAPTER 2 - Relative extended haplotype homozygosity (rEHH) signals across breeds reveal dairy and beef specific signatures of selection
  • CHAPTER 3 - Searching new signals for productive traits in three Italian cattle breeds
  • CHAPTER 4 - MUGBAS a species free gene based ased association suite
  • CHAPTER 5 - Imputation accuracy is robust to cattle reference genome updates
  • CHAPTER 6 - Quest for deleterious recessive variants in Italian Holstein bulls combining exome sequencing and high density SNP analysis
  • CHAPTER 7 - Conclusion
  • ACKNOWLEDGMENTS
Page 3: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School

Alla$mia$famiglia$

A$mio$nonno$Mario$

INDEXamp

Chapteramp1ndashIntroduction 3

Chapteramp2Relativeextendedhaplotypehomozygosity(rEHH)signalsacrossbreedsrevealdairyandbeefspecificsignaturesofselection

11

Chapteramp3SearchingnewsignalsforproductivetraitsthroughgenebasedassociationanalysisinthreeItaliancattlebreeds

49

Chapteramp4ampMUGBASaspeciesfreegenebasedassociationsuite

81

Chapteramp5Imputationaccuracyisrobusttocattlereferencegenomeupdates

89

Chapteramp6QuestfordeleteriousrecessivevariantsinItalianHolsteinbullscombiningexomesequencingandhighdensitySNPanalysis

99

Chapteramp7ndashConclusion 167

amp

Acknowledgmentsampamp 173

CHAPTER1

Introduction

Background

The bovine species counts almost 15 billion of heads classified in some 800

breeds worldwide (httpfaostat3faoorg) The species is adapted to a wide

varietyofdifferentenvironmentandhusbandrysystemsandisoftenrearedfor

multiple purposes (Taberlet et al 2008) draft power milk meat leather

productionandotheruses(Elsiketal2009)

European cattle (Bos$ taurus) derive from a single domestication event of the

auroch (Bos$primigenius)occurred in theFertileCrescent around10000years

ago(Taberletetal2011)FollowingdomesticationcattlefollowedtheNeolithic

expansionofagricultureandcolonisedEurope(seeAjmoneMMarsanetal2010

for a review) Many thousand years of natural and artificial selection have

shaped cattle phenotype and genotype to meet environmental conditions and

human needs and together with demographic events and genetic drift have

started thedifferentiation of the bovine species in diverse and locally adapted

populations This differentiation process accelerated in the last two centuries

whenthebreedconceptwasinventedsothatnowthisspeciescountshundreds

of different breeds with high diversity in coat colour body size production

aptitudeadaptationtoclimate tolerancetodiseaseetc(Taberletetal2008

Elsiketal2009TheBovineHapMapConsortium2009)

Thisdiversityisnowbeingprogressivelylostwiththeextinctionofmanylocal

populationsmainlyforeconomicreasonsSelectioneffortshavethereforebeen

focussedonafewoutperformingindustrialbreedsthatarepresentlyselectedfor

specialiseddairyorbeefproductionandsometimesdoublepurpose

In these breeds breeding schemes are well developed and have been

traditionally based on the genetic evaluation of sires and dams using data

collectedfromperformanceandprogenytestingandunbiasedlinearprediction

modelsThesearefoundedontheFisherinfinitesimalmodellingofquantitative

trait variation and need no genomic information to estimate breeding values

(EBVs) The genetic progress recorded using this approach has been relevant

particularly for traits having medium to high heritability In Italian Holstein

genetic trends for themain production traits (milk protein and fat yield) are

positiveHoweveraccurateestimateofbreedingvalueshasahighcostinterms

4

of investment and time so that 5 to 6 years are needed to have a dairy bullprogenytested(wwwanafiit)Theadventofgenomicselection(Meuwissenetal2001)markedamilestoneindairycattlebreedingTheideawasdevelopedbeforethemoleculartoolswereinplaceforimplementitinpracticeandhasbecomearealityonlyrecentlyInthisapproach progeny tested bulls are genotyped at many thousand loci and thevalue of their EBVs used to estimate the value of each allele at each locusgenotyped These values are then used to predict the genetic value of younganimals on the basis of their genotype As a result animals from the newgeneration receive an EBV with the same accuracy of estimates obtained byprogenytestingbutmuchearlierThislowerthegenerationintervalandgreatlyincreasestherateofgeneticprogress indairypopulationsThe initial ideawasso good that nowadays genomic selection is widely applied in industrialcountriestothemajordairycattlebreedsWiththedecreaseofgenotypingcostit is likely that its use will be soon expanded to animals selected for otherpurposes (eg beef cattle) and other species (eg pigs) However genomicselection is an extension of traditional selection It is extremely useful inbreedingforcomplextraitsbutgivesnobiologicalinformationonthegenesthatcontrolthemIfthetraditionalselectionviewsasiregenomeasabigblackboxhaving a value but no clue why genomic selection views the same genomedisassembled inmany small boxes but still black and stillwithno clueon thereasonoftheirvalueIt is reasonable to think that a deeper understanding of the biologicalmechanisms controlling traits may further increase the accuracy of genomicevaluationsIndeedthenewtechnologiesoffernewopportunitiesinthissensebyloweringcostsandincreasingprecisionofdataproductionThis thesis is exploring the use of these technologies for retrieving biologicalinformationfromgenomicdataItusesmethodsnowconsideredldquotraditionalrdquoingenomics (GWAS and analysis of selection sweeps) and less traditional ones(gene centered GWAS and exome analysis) All data used here have beenproducedwithin twonationalprojectson farmanimalgenomics fundedby theItalianMinistryofAgriculture

5

M SELMOL ldquoRicerca e innovazione nelle attivitagrave di miglioramento genetico

animalemediante tecniche di geneticamolecolare per la competitivitagrave del

sistemazootecniconazionaleIrdquo(ResearchandInnovationinanimalbreeding

sheepgoatbuffalohorseanddonkeyI)

M INNOVAGENldquoRicercaeinnovazionenelleattivitagravedimiglioramentogenetico

animalemediante tecniche di geneticamolecolare per la competitivitagrave del

sistema zootecnico nazionale IIrdquo (Research and Innovation in animal

breedingsheepgoatbuffalohorseanddonkeyII)

In these projects the research group I work with M Istituto di Zootecnica at

Universitagrave Cattolica del Sacro Cuore di Piacenza M coordinated all the research

activitiesconcerningtheapplicationofgenomicstechniquesindairycattle

Contentsofthethesis

The thesis is structured into a short introduction (this first chapter) 5 core

chapters(Chapter2to6)andashortConclusion(Chapter7)Withtheexception

of Chapter 6 still in preparation the other core chapters containmanuscripts

that at time of writing are under review for their publication in international

peerMreviewedjournalsormanuscript

InChapter2selectionsignatureindairyandbeefcattleareinvestigatedwiththe

Illumina BovineSNP50 Genotyping BeadChip Five Italian cattle breeds (Italian

Holstein Italian Brown Italian Simmental Marchigiana and Piedmontese) are

analyzedwiththerEHHmethod(Sabetietal2002)tofindsignaturesofrecent

selectionCandidategenesinthevicinityofsignaturessharedbyeitherdairyor

beefbreeds are identified and theirpossible role indairy andbeefproduction

discussed

In Chapter 3 a GWAS analysis is run to investigate production traits in three

Italian dairy cattle breeds (Italian Holstein Italian Brown Italian Simmental)

Data are analysed with a well established method (Amin et al 2007)

6

implemented in theGenABELRpackage (Aulchenkoet al 2007) andwith the

newgeneMbasedmethodAssociationisinvestigatedbetween50KSNPsandmilk

production(milkyield)andqualitytraits(percentageofmilkproteinandfat)

InChapter4thegeneMbasedmethodusedinthepreviouschapter3inspiredby

theVEGASapproach(Liuetal2010)isdescribedThesoftwarewasdeveloped

andreMcodedinourlaboratoryinordertopermittheanalysisofanyspeciesin

additiontohumanMultiMtestingcorrectionandplotrepresentationcompletethe

toolsmadeavailable

InChapter5wetesttheimpactoftheuseofdifferentreferencesequencesinthe

imputation step Italian Simmental data are used to test variation in genomeM

wide inputation accuracy using different cattle reference sequences (BTAU42

UMD31 and BTAU46) To increase the analyses reliability four different

inputationsoftwaresaretested

InChapter6Holsteindataareanalysedcombiningexomesequencesand800K

SNPs data to seek deleterious recessive variantsWhile in natural populations

the destiny of these variants is to disappear in livestock breeds they can be

maintained and disseminated if carried by a highly productive sire The

intersection of deleterious mutations identified by sequencing and lack of

homozygousregionsidentifiedbytheanalysisofthepopulationwiththeHDSNP

chipidentifiedgenescandidatetocarryrecessivedeleteriousvariants

Objective

Themainobjectiveofthethesiswastoinvestigatethebovinegenomewiththe

aid new highMthroughput technology and to explore new strategies for linking

biology (genes and their function) to traits This linkage once validated may

increase the accuracy of genomic prediction and find application in selection

schemesegineradicatingthemostdeleteriousvariantsandorintheplanning

ofmatings

7

ReferencesAjmoneMMarsanPGarciaJFLenstraJA2010OntheoriginofcattleHow

aurochsbecamecattleandcolonizedtheworldEvolAnthropol19148ndash157doi101002evan20267

AminNvanDuijnCMAulchenkoYS2007AGenomicBackgroundBasedMethodforAssociationAnalysisinRelatedIndividualsPLoSONE2e1274doi101371journalpone0001274

AulchenkoYSKoningDMJdeHaleyC2007GenomewideRapidAssociationUsingMixedModelandRegressionAFastandSimpleMethodForGenomewidePedigreeMBasedQuantitativeTraitLociAssociationAnalysisGenetics177577ndash585doi101534genetics107075614

ElsikCGTellamRLWorleyKC2009TheGenomeSequenceofTaurineCattleAWindowtoRuminantBiologyandEvolutionScience324522ndash528doi101126science1169588

LiuJZMcraeAFNyholtDRMedlandSEWrayNRBrownKMHaywardNKMontgomeryGWVisscherPMMartinNGMacgregorS2010AVersatileGeneMBasedTestforGenomeMwideAssociationStudiesTheAmericanJournalofHumanGenetics87139ndash145doi101016jajhg201006009

MeuwissenTHHayesBJGoddardME2001PredictionoftotalgeneticvalueusinggenomeMwidedensemarkermapsGenetics1571819ndash1829

SabetiPCReichDEHigginsJMLevineHZPRichterDJSchaffnerSFGabrielSBPlatkoJVPattersonNJMcDonaldGJAckermanHCCampbellSJAltshulerDCooperRKwiatkowskiDWardRLanderES2002DetectingrecentpositiveselectioninthehumangenomefromhaplotypestructureNature419832ndash837doi101038nature01140

TaberletPCoissacEPansuJPompanonF2011ConservationgeneticsofcattlesheepandgoatsCRBiol334247ndash254doi101016jcrvi201012007

8

TaberletPValentiniARezaeiHRNaderiSPompanonFNegriniR

AjmoneMMarsanP2008Arecattlesheepandgoatsendangered

speciesMolEcol17275ndash284doi101111j1365M294X200703475x

TheBovineHapMapConsortium2009GenomeMWideSurveyofSNPVariation

UncoverstheGeneticStructureofCattleBreedsScience324528ndash532

doi101126science1167936

9

CHAPTER2

Relative(extendedamphaplotype(homozygosity)(rEHH)ampsignalsampacrossampbreedsamprevealampdairyampand$beef$specificampsignaturesampofampselection

LBomba1ELNicolazzi2MMilanesi1etal1IstitutodiZootecnicaUCSCviaEmiliaParmense8429122PiacenzaItaly2FondazioneParcoTecnologicoPadanoViaEinsteinLocCascinaCodazza26900LodiItalyTheseauthorscontributedequallytothiswork

____________________SubmittedtoGeneticSelectionEvolution

Abstract

Background

Nowadays many methods have been implemented to scan for signatures of

selection at genomescale level within and across breeds Extended haplotype

homozygosity is a proficient approach to detect genome regions under recent

selective pressure within breeds The objective of this study is to identify

common regions under positive recent selection shared by most represented

dairyandbeefItaliancattlebreeds

Results

A total of 3220 animals from Italian Holstein (2179) Italian Brown (775)

Simmental (493) Marchigiana (485) and Piedmontese (379) breeds were

genotyped with the Illumina BovineSNP50 BeadChip v1 in the frame of the

ItalianlivestockgenomicprojectsldquoSelMolrdquoandldquoProzoordquoAfterstandardcleaning

proceduresgenotypeswerephasedandcorehaplotypesidentifiedThedecayof

Linkage Disequilibrium (LD) for each core haplotype was assessedmeasuring

the extended haplotype homozygosity (EHH) To correct for the lack of good

estimatesof local recombinationrates therelativeEHH(rEHH)wascalculated

for each corehaplotypeGenomic regionsunder recentpositive selectionwere

identifiedasthosehighlyfrequentcorehaplotypeswithsignificantrEHHvalues

Significant independentcorehaplotypeswerealignedacrossbreeds to identify

signalsofrecentselectionsharedbydairyandsharedbybeefbreedsOverall82

and 87 common regions under selectionwere detected among dairy and beef

cattlebreedsrespectivelyBioinformaticsanalysisidentified244and232genes

mapping in these common genomic regions Pathway and network analysis of

significant genes revealed molecular functions biologically related to signal of

selectionsdetectedinmilkormeatproductiontypes

Conclusions

Resultssuggest that themultibreedapproach isamethodto identifyselection

signatures in cattle breeds under selection for the same production goal

12

allowing to better pinpoint the genomic regions of interest in dairy and beef

production

Background

The advent of genomic technologies and the consequent availability of single

nucleotidepolymorphism(SNP)datahaveenabledthestudyofselectionevents

on the cattle genome (Hayes et al 2008 Qanbari et al 2010) Such selection

signals arise from environmental or anthropogenic pressures and help to

understand the causes that led to breed formation These studies are usually

addressedinaldquotopdownrdquoapproach(ShahzadandLoor2012)fromgenotypeto

phenotypeinwhichgenomicdataisstatisticallyevaluatedtodetectdirectional

selectionThephenotypeneededtorunselectionsignaturesisintendedinavery

generalsenseabreedaproductionaptitudeorevenageographicareaoforigin

Thisapproachholdsthepotentialto investigatetraitsasadaptationtoextreme

climatesanddifferentfeedingandhusbandrysystemsresiliencetodiseasesetc

thatareveryexpensivedifficultandsometimesimpossibletostudywithclassic

GWAS (GenomeWide Association Study) approaches Therefore it is expected

thatgenomic informationretrievedinthesestudiesonlypartiallyoverlapsthat

identified by GWAS on specific traits Searching for these ldquosignatures of

selectionrdquo is of interest for understanding molecular mechanisms underlying

important biological processes and providing new insights to functional

biologicalinformation(egdiseaseandorimportanteconomictraits(Nielsenet

al 2007)) To date many methods have been implemented to scan these

selectionsignaturesatthegenomiclevelMethodsdifferwithrespecttoseveral

properties (Lenstra et al 2012) there are methods that perform analyses

withinbreedoracrossbreedsthatcomparealleleorhaplotypefrequenciesand

size that detect recent or ancient selection events either still segregating or

fixatedintopopulations(Lenstraetal2012OrsquoBrienetal2014Utsunomiyaet

al2013)Differentmethodshavedifferentsensitivityandrobustnesseg they

maybeinfluencedatadifferentextentbymarkerascertainmentbiasanduneven

distributionofrecombinationhotspotsalongthegenome

13

Extended haplotype homozygosity (EHH) is a method that has been used in

many animal species including cattle (Pan et al 2013 Qanbari et al 2010

Zhangetal2006)TheEHHmethodwasfirstdevelopedbySabetietal(2002)

for human population analysis The main objective of this methodology is to

identifylongrangehaplotypesinheritedwithoutrecombinationeventsMorein

detailunderaneutralevolutionmodelchangesinallelefrequenciesaredriven

onlybygeneticdriftInthisscenarioanewvariantwouldrequirealongtimeto

reach high frequency in the population and the Linkage Disequilibrium (LD)

surroundingitwoulddecayduetorecombinationevents(Kimura1984)Inthe

caseofpositiveselectionaquickriseinfrequencyofabeneficialmutationina

relatively short time will preserve the original haplotype structure as the

number of recombination events would be limited Therefore a signature of

positiveselectionisdefinedasaregionwithanalleleinlongrangeLDlocatedin

anuncommonlyhighfrequencyhaplotype

One of themost interesting features of thismethod is that it is able to detect

genomic regions that are under recent selection These recent selection

signatures can be therefore used to identify the genomic regions involved in

breedformationInadditionEHHmethoddoesnotrequireanydefinitionofan

ancestralallele(asneededintheintegratedhaplotypescoreiHS(Voightetal

2006) and is suited to be applied to SNP data because it is less sensitive to

ascertainmentbiasthanothermethods(Nielsenetal2007)HowevertheEHH

methodgeneratesa largenumberof falsepositiveandnegative resultsdue to

heterogeneous recombination rates along the genome (Qanbari et al 2010)

Moreover another drawback not specific of EHH but shared by all selection

signaturemethod is the lack of robust inferences able to discern true signals

fromthosegeneratedbychance(Kemperetal2014)

To partially account for these limitations Sabeti et al (2002) developed the

relative extended haplotype homozygosity (rEHH) including an empirical

methodologytoobtainsignificantsignalsTherEHHofaputativecorehaplotype

compares its original EHH valuewith that of other haplotypes at that specific

locus using all other haplotypes as control for local variation in the

recombination rate Therefore it identifies genomic regions carrying variants

14

under selection that are still segregating in the population analysed Although

thesemethodswere conceived for human populations theywere successfully

appliedto livestockspeciessuchaspig(Maetal2012)andcattle(Qanbariet

al2010)

After domestication about 10000 years ago in the fertile crescent taurine

cattle have colonized Europe and Africa and have been selected for different

human needs (Taberlet et al 2011) Particularly in the last centuries the

anthropic pressure led to the formation of hundreds of specialized breeds

adapted to different environmental conditions and linked to local tradition

constitutingagenepooldeservingattentionforconservation(AjmoneMarsanet

al2010)Someofthesebreedshavebeenselectedspecificallyfordairybeefor

bothproductiontypesfollowingstrongartificialselectionforthesetraits(Hayes

etal2009)Thepresentstudyaimsat identifyingsignalsof recentdirectional

selectionusingrEHHmethodindairyandbeefproductiontypesusingdatafrom

five Italian dairy beef and dual purpose cattle breeds We focused on the

significant core haplotypes scanned by rEHH that are shared among breeds of

thesameproductiontypeThenweusedabioinformaticsapproachto identify

positionalcandidategeneslyingwithinthegenomicregionsunderselectionand

investigatetheirbiologicalrole

Methods

SampleAnimalsandGenotyping

A total of 4311 bulls of five Italian dairy beef anddual purpose breedswere

genotyped with the Illumina BovineSNP50 BeadChip v1 (Illumina San Diego

CA)joiningthegenotypingeffortsoftwoItalianprojects(namelyldquoSelMolrdquoand

ldquoProzoordquo)Thedatasetincluded101replicatesand773siresonspairsusedfor

downstreamqualitycheckofdataproducedIndetailgenotypesof2954dairy

(2179 ItalianHolsteinand775 ItalianBrown)864beef (485Marchigianaand

379 Piedmontese) and 493 dual purpose (Italian Simmental) bulls were

availableDataqualitycontrol(QC)wasperformedintwostagesafirststageon

15

animals independently for each breed applying the same methods and

thresholdsandasecondstageappliedonmarkersacrossall theindividualsof

thedatasetThefirststageexcludedindividualswithunexpectedlyhigh(gt02)

Mendelian errors for fatherson couples andwith low call rates (lt95)The

secondstageexcludedi)SNPswithle25missingvaluesinthewholedataset

orcompletelymissing inonebreed ii)SNPswithminorallele frequencyle5

and iii) SNPs in sexual chromosomes and with unassigned chromosome or

physicalposition

EstimationofrEHH

HaplotypeswereobtainedusingfastPHASEsoftwareconsideringdefaultoptions

(ScheetandStephens2006)andwererunbreedwiseandchromosomewisein

eachbreedPedigreeinformationforallbullswereprovidedbytheirrespective

breederassociations andwereused to filteroutdirect relatives (in fatherson

pairsthesonwasmaintainedinthedatasetandthefatherdiscarded)andover

representedfamilies(amaximumof5randomlychosenindividualsperhalfsib

familywasallowed)Thefinaldatasetcontainingtheseldquolessrelatedrdquoanimalswill

behenceforthcalled ldquononredundantrdquodatasetThisnonredundantdatasetwas

used also to calculate within breed pairwise wholegenome Linkage

Disequilibrium(LD)Ther2statisticforallpairsofmarkerswasobtainedusing

PLINK software v107 (Purcell et al 2007)Thedecayof LDwas assessedby

averagingr2valuesupto1Mbdistance

To test if the population structure influenced the detection of rEHH we

replicated all analyses on the whole Italian Holstein dataset without any

population correction (ie ldquoredundantrdquo dataset) focusing on genes or gene

clustersthatarewellknowntobeunderrecentselectionincattle(ieldquocandidate

regionsrdquo) In particularwe focused on the casein polled gene cluster and coat

colourgenes(MC1RandKIT(Kemperetal2014Qanbarietal2010))

The EHH and rEHH calculations were performed using Sweep v11 software

(Sabetietal2002)Somedefaultprogramsettingshadtobemodifiedtoadapt

theseanalyses to thebovinegenomeas thesoftwarewasoriginallydeveloped

forhumangeneticanalysesSpecificallylocalrecombinationratesbetweenSNPs

16

wereapproximatedto1cMperMbEHHandrEHHcalculationswereperformed

breedwise and chromosomewise using an automatic core selection with

defaultoptions(ieconsideringlongestnonoverlappingcoresandlimitingcores

toatleast3andnomorethan20SNPs)asinQanbarietal(2010)AlthoughEHH

and rEHH values were obtained for all core haplotypes only those with

frequency ge 25 were retained for further analyses The (empirical) rEHH

significancethresholdwasobtainedbydividingtherEHHvaluesinto20binsof

5 frequency range each logtransforming withinbin values to achieve

normality Core haplotypes with pvalues lt 005 were considered significant

DetectionofcommonselectionsignatureswasobtainedcomparingrEHHvalues

acrossgenomicregionsacrossbreeds

Breedgroupingaccordingtoproductiontype

Toexamine thesignals common toeachof the twoproduction types (iedairy

andbeef) corehaplotypes thatsharedoneormoreSNP inat least twobreeds

from the same production type were selected The dual purpose Italian

Simmentalwas included inbothdairy (ItalianHolsteinand ItalianBrown)and

beef groups (Piedmontese and Marchigiana) since potentially it possesses

haplotypes selected for both production types All downstream analyses were

performedseparatelyforthedairyandthebeefproductiontypes

Detectionandannotationofcandidategenes

Genome annotation was performed using Biomart

(httpwwwbiomartorgindexhtml) a comprehensive source of gene

annotationformanyspeciesprovidedbyEnsembl(wwwensemblorg)Thelist

ofsharedhaplotypesregions(inbp)fordairyandbeefwasusedasinputfilein

theBiomartwebinterfaceThesetofgenesretrievedbyBiomartwasthenused

as input for theCanonicalpathwayandregulatorynetworkanalyses Ingenuity

PathwayAnalysis toolversion80(IPA IngenuityregSystems IncRedwoodCity

CA httpwwwingenuitycom) and a substantial examination of published

literaturewasusedtoexaminethefunctionalrelationshipsamongtheresulting

genes The IPA operates with a proprietary knowledge database providing

complementarypathwayanalysisforseveralspeciesincludingcattleIntheIPA

17

analyses the significance of each biological function identified was estimated

using Fisherrsquos exact test A Benjamini and Hochberg correction for multiple

testingwas calculated for function and canonical pathways For eachNetwork

IPAcomputedascore(pscore=log10(pvalue))fittingthesetstudiedgenesand

a list of biological functionspresent in the IPAknowledgedatabase The score

considered thenumberofgenes in thenetworkand thesizeof thenetwork to

estimatehowrelevantthisnetworkwasaccordingtothelistofgenesprovided

Thenthescorewasusedtorankthenetworks

Results

DatasetQC

Therepeatabilityobtainedfromthe101replicatespresentinthewholedataset

washigherthan998AfterthetwoQCstages105individualsand9730SNPs

were discarded Moreover after phasing 1292 individuals were removed to

reducethe largenumberofsibfamiliespresent intheredundantdatasetAfter

quality control procedures the final dataset had 44271 SNPs and 1132 514

393 410 and 364 individuals from Italian Holstein Italian Brown Italian

SimmentalMarchigianaandPiedmontesebullsrespectively(Table1)

18

Table1JNumberofanimalsgenotypedbeforeandafterQCTotalnumberofanimalgenotypedandrelativenumberofanimaldiscardedafterQCanalysis

Detectionofselectionsignatures

Sweepv11 softwaredetectedrEHHsignalswitha frequencyhigher than25overlapping the test candidate regions in both datasets corrected or not forpopulation structure In the redundant dataset we found only one significantrEHHsignal(caseincluster)whereasinthenonredundantdatasetweidentified5significantrEHHsignalsoverlappingallthetestcandidateregionsconsidered(caseinclusterpolledclusterMC1R)(Table2)

BreedTotal

Genotyped

ED15

misAN

ED1

REPL

ED1

MENDCleaned

HOL 2179 40 31 5 2093

BRW 775 6 16 4 749

SIM 493 6 6 2 479

MAR 485 37 38 410

PIE 379 5 10 364

19

Table2JComparisonofcandidateregionrEHHinnonJredundantandredundantdatasetrEHH overlapping candidate regions in both corrected (nonredundant) and not corrected

(redundant)datasetsforpopulation

Redundant NonRedundant

Candidate

geneBTA1

Closest

SNP(bp)CH2range CH2Frequency

rEHHJ

log(pval)CH2Frequency

rEHHJ

log(pval)

POLLED

Cluster1 1981154 18974181981154 H1079 093037 H1078 167174

MC1R 18 13657912 1331772014007505 H1069 120096 H1069 077167

KIT_BOVIN 6 72821175 7250492172821175 H1054 030030 H1054 033048

Casein

Cluster6 88427760 8835009888452835 H1047 094139 H1046 148133

Casein

Cluster6 88427760 8835009888452835 H2032 002003 H2033 002003

In total5526567847724388and4049corehaplotypeswitha frequency

higher than 25 were detected on Italian Holstein Italian Brown Italian

Simmental Marchigiana and Piedmontese breeds respectively A total of 838

866 740 692 and 613 core haplotypes were found significant outliers (see

Materials and Methods) in the aforementioned breeds Table 3 shows the

distribution of the total and significant core haplotypes per chromosome and

breed

20

Tableamp3amp(ampCoreamphaplotypesampdistributionampDistributionoftotalandsignificantcorehaplotypesperchromosomeandbreed

ampamp ItalianampampHolsteinamp

ItalianampampBrownamp

ItalianampSimmentalamp Marchigianaamp Piedmonteseamp

BTA1amp

Hampgtamp252amp

SignampH3amp

Hampgtamp252amp

SignampH3amp

Hampgtamp252amp

SignampH3amp

Hampgtamp252amp

SignampH3amp

Hampgtamp252amp

SignampH3amp

1amp 389 66 389 68 335 49 310 51 288 46

2amp 312 47 314 52 267 44 257 42 238 33

3amp 292 48 313 62 264 46 236 34 227 34

4amp 268 46 279 50 245 41 237 39 229 39

5amp 223 32 231 29 200 30 183 27 171 22

6amp 291 51 307 40 265 45 252 38 233 42

7amp 249 37 253 46 214 39 195 33 204 27

8amp 268 39 270 38 235 35 223 32 209 31

9amp 227 37 228 41 210 37 165 22 187 30

10amp 219 36 234 34 192 25 196 26 168 25

11amp 248 39 246 34 209 30 205 33 187 25

12amp 162 24 176 18 148 21 148 25 135 22

13amp 205 32 200 19 155 18 178 35 135 19

14amp 175 17 189 25 168 29 143 22 129 19

15amp 180 31 185 29 148 23 139 13 130 19

16amp 176 28 192 29 160 25 149 19 127 15

17amp 170 27 183 29 168 31 135 22 123 16

18amp 133 18 153 19 119 14 108 10 82 12

19amp 137 22 130 23 118 16 99 18 92 15

20amp 184 35 172 33 145 23 124 23 124 17

21amp 145 23 147 30 113 18 110 21 94 13

22amp 140 24 145 19 115 21 112 18 104 13

23amp 106 13 100 16 67 10 64 9 57 8

24amp 129 11 140 20 124 17 97 23 101 16

25amp 91 12 98 11 68 10 77 12 70 6

26amp 125 7 114 14 93 12 96 15 90 16

27amp 91 12 91 15 75 11 84 12 60 10

28amp 88 9 96 10 70 10 66 9 55 11

29amp 103 15 103 13 82 10 72 9 67 12

TOTamp 5526 838 5678 866 4772 740 4388 692 4049 613

1BostaurusAutosomes2Detectedcorehaplotypeswithafrequencyinthebreedhigherthan253Significantcorehaplotypesatplt005

Comparisonamptoamppreviouslyampreportedampdataamp

AnumberofstudieshavesearchedselectionsweepsinHolstein(Qanbarietal

2010) Brown (Qanbari et al 2011) and Simmental (Fan et al 2014) breeds

Since different methods are expected to identify signatures having different

characteristicsthecomparisonwithpreviousstudiesislimitedtothosestudies

using thesamemethodand thesamebreed(s)analysed inourstudyOnlyone

study analysed (German) HolsteinRFriesian cattle with rEHH (Qanbari et al

2010)Qanbarietal(2010)reportedcandidategenesandgeneclustersunder

Marco Milanesi
21

recent positive selection that we compared with our rEHH in dairy breeds

dataset(Table4)

Table4JCorehaplotypesincandidategenesfollowingQanbarietal2011

1BosTaurusChromosome2Corehaplotype

Thenumberofcorehaplotypesfoundinourstudyinthecandidateregionswas

lowerpreviouslyreportedWhenconsideringHolsteinsthetwomostsignificant

candidateregionsinbothstudiesmatch(caseinclusterandthesomatostatinSST

gene)althoughitisimpossibletodetermineifthehaplotypeunderselectionis

the same as this information was not provided in Qanbari et al (2010)

However other genes considered significant in Qanbari et al (with pvalues

ranging from 004 to 010) were not significant in our studyWhen using the

same loose significant threshold (pvalues le 010) ldquosignificantldquo signals were

foundalsoinItalianBrownfortheCaseinCluster(log10pvalue=109)andin

ItalianSimmentalfortheSSTgene(log10pvalue=126)

HOLSTEIN BROWN SIMMENTAL

Candidate

geneBTA1

Closest

SNP

(bp)

CH2rangeCH2

Frequency

rEHH

Hlog(p)CH2range

CH2

Frequency

rEHH

Hlog(p)CH2range

CH2

Frequency

rEHH

Hlog(p)

DGAT1 14 444963236653443936

H1057 011443936763332

H1042 0130007

Casein

Cluster6 88391612

8835009888452835

H1046 1481338832601288452835

H2068 0801098835009888452835

H1044 037022

8835009888452835

H2033 018030

8835009888452835

H3034 022045

GH 19 49652377 GHR 20 33908597

SST 1 813769568128358581376961

H1036 200189 8131845181376961

H1042 126053

8128358581376961

H2029 00630084 8131845181376961

H2042 006027

IGFH1 5 71169823

ABCG2 6 37374911 3731702038256889

H1044 031027

3731702038256889

H1040 029031

Leptin 4 95715500

LPR 3 855692038549710885594551

H1047 0910728549710885794693

H1068 0800638549710885794693

H1063 002005

8549710885594551

H2041 027025

PITH1 1 35756434

22

Signaturessharedbydairyandbybeefbreeds

Significantcorehaplotypeswerealignedacrossbreedsto identifythoseshared

among dairy or beef production types Since breeds can be considered

independentsetsofobservationssharedsignaturesaremorelikelytorepresent

real effects rather than false positives A total of 123 and 142 significant core

haplotypesweresharedbetweenatleasttwodairyorbeefbreedsrespectively

(File S1) Only 82 and 87 of those significant core haplotypes contained genes

considered positional candidates under positive selection In total 22 and

17 of the genomewas under selection for dairy and beef production types

respectivelyTherEHHaveragelengthswere216932and190994bpfordairy

andbeefrespectively

Genesetannotationandnetworkpathwayanalysis

Atotalof244and232annotatedgenesfellwithintheselectedregionsfordairy

andbeefproductiontypesrespectively(Table5)

23

Table5JStatisticsonsharedsignificantcorehaplotypesindairyandbeefbreedsStatisticsonsharedsignificantcorehaplotypesindairyandbeefbreeds

DairyCattle BeefCattle

BTA SelSig

n

Genes1 Sum

SelSignsize

(bp)

Mean

SelSign

size(bp)

SelSi

gn

Genes1 Sum

SelSignsize

(bp)

Mean

SelSign

size(bp)

1 7 11 1521608 138328 7 13 2263213 174093

2 5 9 2216685 246298 8 11 2658549 2416863 2 5 1619336 323867 7 21 3755847 1788504 7 21 5956755 283655 6 11 1761288 1601175 4 17 4506119 265066 5 11 1251127 1137396 2 4 1467738 366934 5 12 5149692 4291417 6 30 4842969 161432 3 28 11889191 4246148 0 0 0 0 4 12 3512215 2926859 4 7 1554863 222123 2 6 1106768 18446110 4 17 8839762 519986 2 3 573138 19104611 6 8 1841802 230225 1 1 100439 10043912 2 8 4244133 530517 1 1 536461 53646113 3 8 1933026 241628 2 8 985376 12317214 0 0 0 0 3 9 692397 7693315 5 9 1415405 157267 2 2 189094 9454716 3 6 1302506 217084 4 6 905719 15095317 3 4 471046 117762 2 9 1551010 17233418 0 0 0 0 3 23 2961718 12877019 3 23 5744143 249745 2 2 116938 5846920 1 2 148886 74443 2 4 627971 15699321 3 7 1930206 275744 1 5 1150760 23015222 3 5 620674 124135 3 10 2411751 24117523 0 0 0 0 1 1 68421 6842124 2 18 9166004 509222 2 4 803770 20094225 2 11 2016602 183327 1 3 329199 10973326 2 4 466134 116534 4 7 776080 11086927 1 1 631792 631792 2 3 1004717 33490628 0 0 0 0 1 1 93464 9346429 2 9 935209 103912 1 5 798300 159660TOT 82 244 65393403 216932 87 232 50024613 190994

1GenesidentifiedGenesintheregionssharedbyallthreedairybreeds(8genes)orallthreebeefbreeds(11genes)foramoredetailedanalysiswereselected(Fig1)

24

Figure1JGenomiclocationoftheselectionsignaturessharedamongstudiedbreedsa)GenesinensembletracksaredisplayedasaredboxcorehaplotypesandSNPsarecolouredinorange(MarchigianaMAR)inviolet(PiedmontesePIE)andpink(SimmentalSIM)b)Genesinensemble tracks are displayed as a red box core haplotypes and SNPs are coloured in blue(HolsteinHOL)ingreen(ItalianBrownBRW)andpink(SimmentalSIM)

All the genes identified were submitted to downstream pathwaynetwork

analyses(Table5Figure1FileS1)Interestinglygenesintheregionssharedby

allthreedairyorbeefbreedsweremanifestedinthemostlikelynetworksforthe

two production aptitudes The IPA analysis detected a total of 12 and 15

networksindairyandbeefbreedsrespectively

Indairybreeds network scores ranged from46 to2 (File S2) Eightnetworks

obtainedscoreshigherthan20andincludedmorethan14moleculeseachThey

wereassociatedtoi)geneexpressionRNAdamageandrepairproteinsynthesis

andposttranslationalmodificationii)cellassemblyorganizationfunctionand

maintenance iii) cell cycle proliferation and cancer iv) carbohydrate

metabolism v) gastrointestinal and neurological disease developmental and

endocrine system disorders A total of 23moleculeswere associatedwith the

highest ranked network (score = 46) In this the immunoglobulins mitogen

25

activatedproteinkinasesandNFkBcomplexesformacentralhublinkinggenes

involved in celltocell signaling and interaction hematological system

developmentandfunctionandimmunecelltraffickingfunctions(Figure2)

Figure2JThemostlikelyIPAnetworkindairycattlebreedsThefunctionsassociatedtothenetworkarecelltocellsignallingandinteractionhematologicalsystem development and function immune cell trafficking Nodes in red correspond to genesidentifiedincorehaplotypesoverlappinginallthethreebreedsofeachproductiontypewhereasthoseingreendepictoverlappingcorehaplotypesinatleasttwoofthosebreeds

Breast cancer antiNestrogen resistance gene 3 (BARC3) andpituitary glutaminyl

cyclase gene (QPCT) genes were common to all three dairy breeds and are

directlyconnectedwithmammaryglandmetabolisms(Ezuraetal2004GSun

et al 2012) The solute carrier family 2 member 5 (SLC2A5) gene facilitates

glucosefructose transport (Zhao et al 1998) and the zetaNchain (TCR)

26

associated protein kinase 70kDa (ZAP70) gene plays a critical role in Tcell

signaling (Bonnefont et al 2011) Calpain is another important complex that

togetherwithcalpainN3(CAPN3)mediatesepithelialcelldeathduringmammary

gland involution (Wilde et al 1997) Furthermore RAS guanyl nucleotide

releasingprotein(RASGRP1)activatestheErkMAPkinasecascaderegulatesT

cellsandBcellsdevelopmenthomeostasisanddifferentiationandisinvolvedin

the regulation of breast cancer cell (Madani et al 2004 Yasuda et al 2007

Wickramasingheetal2009)Genesinvolvedintheothernetworksarelistedin

FileS2

The scoreof thebeef cattlebreedsnetwork rangedbetween46and2 and the

numberofmoleculespresentineachnetworkrangedfrom23to1(FileS3)Asin

dairy breeds eight networks obtained a score higher than 20 and 13 ormore

molecules each These networkswere associated to i) carbohydrate lipid and

nucleic acidsmetabolism energyproduction and smallmoleculebiochemistry

ii) cardiovascular system development and function iii) cellular and tissue

developmentcellassemblyandorganizationfunctionandmaintenanceandcell

and organmorphology iv) cell cycle celltocell signaling and interaction v)

DNA replication recombination and repair and RNA posttranscriptional

modification vi) dermatological diseases and conditions infectious disease

organismal injury and abnormalities In the network with the highest score

(score=46)chondroitinsulfateproteoglycan4(CSPG4)andsnurportinN1(SNUPN)

genes were shared among all beef cattle breeds investigated CSPG4 gene is

relatedtomeattendernesswhileSNUPN isanimprintedgeneimportantinthe

embryodevelopmentandinvolvedinhumanmuscleatrophy(Narayananetal

2002) (Figure 3) All the genes included in this network were involved in

carbohydratemetabolismsmallmoleculesbiochemistryandcellcycleTheother

networkswere related to important signaling andmetabolic process essential

foranimalgrowth(FileS3)

27

Figure3JThemostlikelyIPAnetworkinbeefcattlebreedsThe functions associated to the network are carbohydrate metabolism small molecule

biochemistry and cell cycle Nodes in red correspond to genes identified in core haplotypes

overlapping in all the three breeds of each production type whereas those in green depict

overlappingcorehaplotypesinatleasttwoofthosebreeds

A total of 6 and 9 statistically significant Canonical Pathways (FDR le 005

log10(FDR) ge 13) were identified using IPA for dairy and beef breeds

respectively(Figure4)

28

Figure4JBarplotofstatisticallysignificantCanonicalPathwaysPvalues were corrected for multiple testing using BenjaminiHochberg method and arepresented in thegraphas log(pvalue)Thebar represents thepercentageof genes in a givenpathway that meet cutoff criteria within the total number of molecules that belong to thefunctiona)BarplotofstatisticallysignificantCanonicalPathways indairycattlebreedsb)BarplotofstatisticallysignificantCanonicalPathwaysinbeefcattlebreeds

Themost significant Canonical Pathway in dairywas for purinemetabolism (

log10(FDR) = 26) supporting the highly synthetic processes in the mammary

epithelium(Figure5)

29

Figure5JGenesdetectedunderrecentpositiveselectionindairycattleinvolvedinpurinemetabolismcanonicalpathwayNodes in red correspond to genes identified in core haplotypes overlapping in all the threebreedsofeachproductiontypewhereasthoseingreendepictoverlappingcorehaplotypesinatleasttwoofthosebreeds

Inbeefproductiontheephrinreceptorsignal(log10(FDR)=27)wasthemost

significant Canonical Pathway (Figure 6) Among other functions ephrin is

knowntopromotemuscleprogenitorcellmigrationbeforemitoticactivation(Li

andJohnson2013)AlltheotherCanonicalPathwaysarereportedinFileS4

30

Figure6JGenesdetectedunderrecentpositiveselectioninbeefcattleinvolvedinEphrinmetabolismcanonicalpathwayNodes in red correspond to genes identified in core haplotypes overlapping in all the threebreedsofeachproductiontypewhereasthoseingreendepictoverlappingcorehaplotypesinatleasttwoofthosebreeds

Discussion

In this work more than 4000 genotyped bulls of five dairy beef and dualpurposeItalianbreedswerescreenedsearchingforselectionsignaturesindairyand beef production types A strict data quality controlwas applied to reducepossiblesourcesofbiassuchastechnicalgenotypingproblemsandpopulationstructurethatcouldhavelargeimpactonrEHHresultsInfactsincetheimpactofpopulationstructureonrEHHvaluesisstillunexploredwereplicatedpartofouranalyseswithandwithoutcloserelativesinanattempttostudytheeffectofpopulationstructureonthisselectionsignaturemethodInprinciplepopulationstratificationshouldleadtoanoverrepresentationofcertainhaplotypessimplyduetothepresenceoftightpedigreelinks(egsirespassinghalfoftheirgeneticmaterialtotheirsons)ratherthanactualselectionprocessesForthisreasonfor

31

the full analyses all siresons couples were removed after haplotype phasing(retaining only the younger animal) and halfsib familieswere restricted to amaximum of five randomly chosen individuals in an attempt to reduce overrepresentation This assessment was restricted to four regions known to beunderselectioninItalianHolsteinasthecaseinclusterthepolledgeneclusterMC1R and KIT Italian Holstein was selected for two reasons i) it is a highlystructuredbreedaccordingtoourdata(abouthalfofindividualsremovedwereclose relatives) ii) allowed us to compare our results with a previous studyAlthoughtheanalysesconductedonbothredundantandnonredundantdatasetswereable to identify rEHHsignals in these regions thenonredundantdatasetproduced five significant rEHH signals rather than just one in the redundantdataset(Table2)Thisresulthighlightstheconfoundingeffectofthepresenceofcloserelativesinthedatasetandconsequentlytheimprovedabilitytodetectedsignificant selection signature while correcting for population structureInterestingly our results only partially overlap those found by Quanbari et al(2010) These authors detected significant signals at the casein cluster andsomatostatinSSTaswedid but also at ahighernumberof candidate regionsThese inconsistencies may be the effect of different sires included in theanalyses of the different dataset size and to the close relative trimmingprocedure we adopted to decrease the effect of population structure andconsequent bias (Table 4) However poor correlation across investigations isoftenobservedalsointhedeeplyinvestigatedhumanspeciesThisisduetotheuse of different within and between breed statistics that identify selectionsignatureshavingdifferent characteristics (ancientrecent segregatingfixatedunder directionalbalancing selection) to the likely high rate of falsepositivesnegativesresultsandalsotothedifferentconsiderationofpopulationstructureandbackgroundselection(Enardetal2014)The number of total and significant core haplotypes identified by the SweepsoftwarewasthehighestinItalianBrownandthelowestinPiedmontesebreedSinceEHHisheavilybasedonpopulationLDtheaverageLDleveloverdistancewascalculatedineachbreed(Figure7)

32

Figure7JMultiJbreedaveragelinkagedisequilibriumagainstphysicaldistance(inkb)Marchigiana Piedmontese and Italian Simmental breeds present a lower persistence of LD allargerdistancethanItalianHolsteinandItalianBrownbreedsThisanalysishighlightedageneralpositivecorrelationbetweenthe levelofLDover distance and the number of total and significant core haplotypes foundHoweverconsideringthatrEHHisarelativemeasure it ismorelikelythatthehighernumberofsignificantcorehaplotypesidentifiedindairybreedswasduemostlytoahigherselectivepressureindairycomparedtobeefbreedsSignificance tests used in detecting selection signatures should measure theprobability of a statistic to be anoutlier compared to its expecteddistributionunder a neutral model However no reliable neutral model has so far beendevelopedincattlebecauseofthecomplexityofthedemographichistoryofthisspecies (Kemper et al 2014) As a consequence empiric rather than modelbasedsignificancetestaregenerallyusedinthedetectionofselectionsignaturesAccordinglyweconsideredasoutliersvaluesas thosesimply falling in the5plusvarianttailoftherEHHdistributionWekeptthewithinbreedsignificancethresholdloosenoncorrectingitformultipletestingbutconsideredonlysignals

33

eitherexceedingthisthresholdbyatleasttwoordersofmagnitudeorsharedby

twoormoreindependentbreeds

Parallel comparison of results from independent analyses of different breeds

allowed the achievement of a double objective i) to identify putative regions

under(recent)selectioninbreedsbelongingtotwoproductiontypeswhichwas

the main objective of this study and ii) reducing false positives since multi

breed analyses served as internal control Since the rEHH method does not

considerphenotypicinformationasignificantsignalmightarisebecausei)the

core haplotype is actually under a selective pressure ii) the result is a false

positive ie causedby chance population structureor anyotherdriving force

However even considering an unrealistic scenario with no false positives a

proportionofthetotalamountofsignalswouldactuallybeselectionsignatures

due to a selectionpressurenot ondairy or beef traits This becausedairy and

beefbreedsinvestigatedareonlyafewandshareanumberoftraitsnotdirectly

related to dairybeef production Even considering this limitation to our

knowledge this is the first dairy and beef multibreed study applying this

strategytoreducefalsepositivesatthecostofpossiblelossofinformationdue

tohighfalsenegativeratesSignificantsignalsharedbydairyandbybeefbreeds

wereused fordownstreamnetworkanalysesonpositional candidategenes to

search forabiological justification towhatwasobtainedstudying thegenomic

information Only the top network for dairy and the top network for beef

productionarediscussedindetail

Dairybreeds

ThemostsignificantnetworkdetectedindairybreedsisassociatedwithCellTo

CellSignalingandInteractionHematologicalSystemDevelopmentandFunction

andImmuneCellTraffickingfunctions(Figure2)ImmunoglobulinandNFkBact

aspivotalgeneswithinthisnetworkInterestinglyBARC3andQPCTgeneswere

sharedamongallthethreedairybreedsBothgeneshavenotyetbeenstudiedin

cattlealthoughinhumanstudiesthesegenesarewellknowntobelinkedtothe

mammaryglandmetobolismandthecalciumregulationTheformerisinvolved

in integrinmediated cell adhesions and signaling which is required for

mammary gland development and functions (G Sun et al 2012) The latter is

34

associatedwithlowradialbodymineraldensity(BMD)inadultwomen(Ezuraet

al2004)AnotherimportantplayeronthisnetworkisSLC2A5genealsoknown

asGLUT5 SLC2A5 gene acts as fructose transporter in the intestine and has a

significantroleinenergybalanceofdairycows(Burantetal1992)Findingthis

gene in adairy cattle specificnetwork seems surprising since there shouldbe

little need for transporting glucose andor fructose in the ruminant intestinal

tractassimplecarbohydratesaredegradedintovolatilefattyacids(VFA)inthe

rumen (Iqbal et al 2009) However it is often the case that large amount of

starchbypasstherumenincowsfedwithdietsrichincerealgrains(Zhaoetal

1998)Thisbypassedstarchneedstobedigestedinthesmallintestinetoavoid

highlevelsofglucoseconcentrationinthelargeintestine

WedetectedcalpaincomplexandcalpainN3(CAPN3)genesunderpositiverecent

selection in the dairy group as reported also by Utsunomiya et al (2013)

AltoughCalpainisknowntobeinvolvedinpostmortemmeattendernizationitis

also related to dairy metabolism as the muscle breakdown promoted by the

calpain gene provides an energy source for milk production especially at the

beginningoflactation(Kuhlaetal2011)InadditionasreportedbyWildeetal

(1997) calpainNcalpastatin system is related to programmed cell death of

alveolar secretory epithelial cells during lactation Zap70 gene is a cytoplasmic

proteintyrosinekinaseItisrelatedtotheimmunesystemasacomponentofthe

Tcell receptor playing a central role in Tcell responses (Wang et al 2010)

Bonnefontet al (2011) found thatZap70 genewasup regulated (FoldChange

(FC)=82FDR=277E12)onmilksomaticcellsofsheepinfectedbySaureus

andSepidermidisshowinganassociationwithmastitisresistance

Purinemetabolismwasthemostsignificantcanonicalpathwayindairybreeds

In a study conducted in human mammary epthelium Maningat et al (2008)

conductedageneexpressionanalysisonbreastmilkfatglobulesidentifyingthe

PurinemetabolismasthemostsignificantpathwaySynthesisandbreakdownof

purine is essential in the metabolism of the tissues of many organisms

particularlyofthemammaryglandduringlactation

Another interesting canonical pathway was endothelin signaling (Figure 5)

endothelin functions as vasocontrictor and is secreted by endothelial cells

35

(Nussdorferetal1999)Acostaetal(1999)reportedthatincattleendothelinsareinvolvedinthefollicularproductionofprostaglandinsandtheregulationofsteroidogenesis in the mature follicle In a recent study Puglisi et al (2013)confirmed the implication of endothelin in particular EDNRA and endothelinconvertin enzyme 1 in reproductive disorder in bovine They classified theEDNRAaspotentialbiomarkerforfertilityincow

Beefbreeds

Inbeefcattlebreedsthefocaladhesionkinase(FAK)togetherwithintegrinandERK12genesplayacentralroleashubsinthemostsignificantnetwork(Figure3)Theyareinvolvedincellularadhesionandcellmotilityprocessesparticularlyduringskeletalmyogenesis(Nguyenetal2014)Moreovertheoverexpressionof FAK can determine a reduced apoptosis and an increase risk of metastaticcancers(MehlenandPuisieux2006)Thesehubsarenotdirectlyinvolvedinourstudybut sharemany interactors thatare located in region found tobeunderselction such as CSPG4 RB1CC1 MGAT3 CIRPB CSPG4 gene belongs tochondroitin sulfate proteoglycans (CSPGs) family CSPGs are proteoglycansconsistingofaproteincoreandachondroitinsulfatesidechainTheyareknownto be structural components of a variety of tissues includingmuscle andplaykey roles inneuraldevelopmentandglial scar formationTheyare involved incellular processes such as cell adhesion cell growth receptor binding cellmigrationandinteractionswithotherextracellularmatrixconstituents

Manystudiesreporttheroleofproteoglycansinmeattextureofseveralmusclesfromthebovinespecies(Nishimuraetal1996)Dubostetal(2013)highlightadirect involvement of proteoglycans with respect to cooked meat juicinessRB1CC1geneplaysacrucialroleinmusculardifferentiationanditsactivationisa essential for myogenic differentiation (Watanabe et al 2005 p 1)Monoacylglycerol acyltransferase (MGAT3) gene cause the synthesis ofdiacylglycerol (DAG)using2monoacylglyceroland fattyacyl coenzymeAThisenzymaticreactionis fundamental fortheabsorptionofdietaryfat inthesmallintestineSunetal(2012p3)reportedinastudyonfiveChinesecattlebreedsthatMGAT3gene isassociatedwithgrowthtraitsCIRPBgenemaybepartofacompensatorymechanism inmuscles undergoing atrophy It preservesmuscle

36

tissuemassduringcoldshockresponsesaginganddisuse(DupontVersteegden

et al 2008) SNUPN is an imprinted gene and is expressed monoallelically

dependingonitsparentaloriginSNUPNplayimportantrolesinembryosurvival

andpostnatal growth regulation (Smithet al 2012Wanget al 2012)Ephrin

receptorsignalingwasthetopcanonicalpathwayproducebyIPAandhasalsoan

interestingbiologicalinterpretationformeatproduction(Figure6)Indeedthis

pathaway is important for the muscle tissue growth and regeneration

participating in correct positioning and formation of the neural muscular

junction(LiandJohnson2013)

Conclusions

In this study we analysed the selection signatures at genomewide level in 5

Italian cattle breeds Then we used a multibreed approach to identify the

genomicregionssharedamongcattlebreedsselectedfordairyorbeefpurpose

Thisapproachincreasedthepowertopinpointregionsofthegenomethatplay

important roles in economically relevant traits in both dairy and beef cattle

Moreover network and pathways analyses were used to display a detailed

functional characterization of genes mapping in the regions detected under

recentpositiveselection

Particularly in dairy cattle genes under directional selection are related to

feeding adaptation (increasing level of starch in the diet) mammary gland

metabolismandresistencetomastitisThebiologicalfeaturesunderselectionin

beef cattle breeds are involved in animal growth meat texture and juiceness

These results have a logical biological explanation However the frequent

approach of interpreting selection signatures on the basis of the function of

geneslocatednearbythepeaksignalofthestatisticusedispresentlychallenged

Novelinformationinhumanssuggeststhatmostselectedvariationisnotlocated

within genes and coding regions but in regulatory sites identified by the

ENCODEproject(Enardetal2014)Thesemaycontroltheexpressionofentire

genomicregionsorofgeneslocatedatarelavantdistancefromtheselectedsite

makingbiologicalinterpretationmorecomplex

37

InanycasefurtherstudiesusingdenserSNPchipsorwholegenomesequencing

producinginformationnotsubjectedtoascertainmentbias(Qanbarietal2014)

may increase the resolution of our analysis and togetherwith new knowledge

arisingonthecontrolofgeneexpressionvalidateorcorrectourresults

Acknowledgements

ANAFI (Italian Association of Holstein Friesian Breeders) ANAPRI (Italian

Association Simmental Breeders) ANABORAPI (National Association of

PiedmonteseBreeders)ANABIC(AssociationofItalianBeefCattleBreeders)and

ANARB (Italian Brown Cattle Breeders Association) are acknowledged for

providing the biological samples and the necessary information for this work

The Authors are also grateful to SelMol project (MiPAAF Ministero delle

PoliticheAgricoleAlimentarieFortestali)andbytheProZooproject(Regione

LombardiandashDGAgricoltura Fondazione Cariplo Fondazione Banca Popolare di

Lodi)forfunding

Thefundershadnoroleinstudydesigndatacollectionandanalysisdecisionto

publishorpreparationofthemanuscript

References

Acosta TJ Berisha B Ozawa T Sato K Schams D Miyamoto A 1999

Evidence for a Local EndothelinAngiotensinAtrial Natriuretic Peptide

SysteminBovineMature Follicles In Vitro Effects on SteroidHormones

and Prostaglandin Secretion Biol Reprod 61 1419ndash1425

doi101095biolreprod6161419

AjmoneMarsan P Garcia JF Lenstra JA 2010 On the origin of cattle how

aurochs became cattle and colonized theworld Evol Anthropol Issues

NewsRev19148ndash157

38

BonnefontCToufeerMCaubetCFoulonETascaCAurelMRBergonier

D Boullier S RobertGranieacute C Foucras G 2011 Transcriptomic

analysisofmilksomaticcells inmastitisresistantandsusceptiblesheep

upon challenge with Staphylococcus epidermidis and Staphylococcus

aureusBMCGenomics12208

BurantCFTakedaJBrotLarocheEBellGIDavidsonNO1992Fructose

transporter inhumanspermatozoaandsmall intestine isGLUT5 JBiol

Chem26714523ndash14526

DubostAMicolDPicardBLethiasCAnduezaDBauchartDListratA

2013Structuralandbiochemicalcharacteristicsofbovineintramuscular

connective tissue and beef quality Meat Sci 95 555ndash561

doi101016jmeatsci201305040

DupontVersteegden EE Nagarajan R Beggs ML Bearden ED Simpson

PMPetersonCA2008IdentificationofcoldshockproteinRBM3asa

possible regulator of skeletal muscle size through expression profiling

AJP Regul Integr Comp Physiol 295 R1263ndashR1273

doi101152ajpregu904552008

Enard D Messer PW Petrov DA 2014 Genomewide signals of positive

selection in human evolution Genome Res gr164822113

doi101101gr164822113

EzuraYKajitaMIshidaRYoshidaSYoshidaHSuzukiTHosoiTInoue

SShirakiMOrimoHEmiM2004Associationofmultiplenucleotide

variationsinthepituitaryglutaminylcyclasegene(QPCT)withlowradial

BMDinadultwomenJBoneMinRes191296ndash301

Fan H Wu Y Qi X Zhang J Li J Gao X Zhang L Li J Gao H 2014

Genomewide detection of selective signatures in Simmental cattle J

ApplGenet1ndash9doi101007s1335301402006

HayesBJChamberlainAJMaceachernSSavinKMcPartlanHMacLeodI

SethuramanLGoddardME2009Agenomemapofdivergentartificial

selectionbetweenBostaurusdairycattleandBostaurusbeefcattleAnim

Genet40176ndash184doi101111j13652052200801815x

Hayes BJ Lien S Nilsen H Olsen HG Berg P Maceachern S Potter S

Meuwissen THE 2008 The origin of selection signatures on bovine

39

chromosome 6 Anim Genet 39 105ndash111 doi101111j1365

2052200701683x

IqbalSZebeliQMazzolariABertoniGDunnSMYangWZAmetajBN

2009 Feeding barley grain steeped in lactic acid modulates rumen

fermentation patterns and increases milk fat content in dairy cows J

DairySci926023ndash6032doi103168jds20092380

KemperKESaxtonSJBolormaaSHayesBJGoddardME2014Selection

for complex traits leaves littleornoclassic signaturesof selectionBMC

Genomics15246doi1011861471216415246

KimuraM1984Theneutraltheoryofmolecularevolution

KuhlaBNuumlrnbergGAlbrechtDGoumlrsSHammonHMMetgesCC2011InvolvementofSkeletalMuscleProteinGlycogenandFatMetabolismin

the Adaptation on Early Lactation of Dairy Cows J Proteome Res 10

4252ndash4262doi101021pr200425h

Lenstra JAGroeneveld LF EdingHKantanen JWilliams JL Taberlet P

NicolazziELSolknerJSimianerHCianiEGarciaJFBrufordMW

AjmoneMarsan P Weigend S 2012 Molecular tools and analytical

approaches for the characterization of farm animal genetic diversity

AnimGenet43483ndash502

Li J Johnson SE 2013 EphrinA5 promotes bovine muscle progenitor cell

migration before mitotic activation J Anim Sci 91 1086ndash1093

doi102527jas20125728

Ma YL Zhang Q Ding XD 2012 [Detecting selection signatures on X

chromosome in pig through high density SNPs] Yi Chuan Hered

ZhongguoYiChuanXueHuiBianJi341251ndash1260

Madani S Hichami A CharkaouiMalki M Khan NA 2004 Diacylglycerols

Containing Omega 3 and Omega 6 Fatty Acids Bind to RasGRP and

Modulate MAP Kinase Activation J Biol Chem 279 1176ndash1183

doi101074jbcM306252200

Maningat PD Sen P Rijnkels M Sunehag AL Hadsell DL Bray M

Haymond MW 2008 Gene expression in the human mammary

epitheliumduring lactation themilk fat globule transcriptome Physiol

Genomics3712ndash22doi101152physiolgenomics903412008

40

Mehlen P Puisieux A 2006Metastasis a question of life or deathNat Rev

Cancer6449ndash458doi101038nrc1886

NarayananUOspinaJKFreyMRHebertMDMateraAG2002SMNthe

Spinal Muscular Atrophy Protein Forms a PreImport Snrnp Complex

withSnurportin1andImportin HumMolGenet111785ndash1795

Nguyen N Yi JS Park H Lee JS Ko YG 2014 Mitsugumin 53 (MG53)

LigaseUbiquitinatesFocalAdhesionKinaseduringSkeletalMyogenesisJ

BiolChem2893209ndash3216doi101074jbcM113525154

NielsenRHellmannIHubiszMBustamanteCClarkAG2007Recentand

ongoing selection in the human genome Nat Rev Genet 8 857ndash868

doi101038nrg2187

NishimuraTHattoriATakahashiK1996Relationshipbetweendegradation

of proteoglycans andweakening of the intramuscular connective tissue

duringpostmortemageingofbeefMeatSci42251ndash260

Nussdorfer GG Rossi GPMalendowicz LKMazzocchi G 1999Autocrine

ParacrineEndothelinSysteminthePhysiologyandPathologyofSteroid

SecretingTissuesPharmacolRev51403ndash438

OrsquoBrienAMPUtsunomiyaYTMeacuteszaacuterosGBickhartDM LiuGE Tassell

CPV Sonstegard TS Silva MVD Garcia JF Soumllkner J 2014

Assessing signatures of selection through variation in linkage

disequilibrium between taurine and indicine cattle Genet Sel Evol 46

19doi101186129796864619

Pan D Zhang S Jiang J Jiang L Zhang Q Liu J 2013 GenomeWide

DetectionofSelectiveSignatureinChineseHolsteinPLoSONE8e60440

doi101371journalpone0060440

PuglisiRCambuliCCapoferriRGianninoLLukajADuchiRLazzariG

Galli C Feligini M Galli A Bongioni G 2013 Differential gene

expressionincumulusoocytecomplexescollectedbyovumpickupfrom

repeat breeder and normally fertile Holstein Friesian heifers Anim

ReprodSci14126ndash33doi101016janireprosci201307003

Purcell S Neale B ToddBrown K Thomas L Ferreira MAR Bender D

MallerJSklarPdeBakkerPIWDalyMJShamPC2007PLINKa

41

tool set for wholegenome association and populationbased linkageanalysesAmJHumGenet81559ndash575doi101086519795

Qanbari S Gianola D Hayes B Schenkel F Miller S Moore S Thaller GSimianer H 2011 Application of site and haplotypefrequency basedapproachesfordetectingselectionsignaturesincattleBMCGenomics12318doi1011861471216412318

Qanbari S Pausch H Jansen S Somel M Strom TM Fries R Nielsen RSimianer H 2014 Classic Selective Sweeps Revealed by MassiveSequencinginCattlePLoSGenet10doi101371journalpgen1004148

Qanbari S Pimentel ECG Tetens J Thaller G Lichtner P Sharifi ARSimianerH2010AgenomewidescanforsignaturesofrecentselectioninHolsteincattleAnimGenetdoi101111j13652052200902016x

Sabeti PC Reich DE Higgins JM Levine HZ Richter DJ Schaffner SFGabriel SB Platko JV Patterson NJ McDonald GJ Ackerman HCCampbellSJAltshulerDCooperRKwiatkowskiDWardRLanderES2002DetectingrecentpositiveselectioninthehumangenomefromhaplotypestructureNature419832ndash7

ScheetPStephensM2006Afastandflexiblestatisticalmodelforlargescalepopulation genotype data applications to inferring missing genotypesandhaplotypicphaseAmJHumGenet78629ndash44

Shahzad K Loor JJ 2012 Application of topdown and bottomup systemsapproaches in ruminantphysiologyandmetabolismCurrGenomics13379

SmithLSuzuki JGoffAFilionFTherrien JMurphyBKohanGhadrHLefebvreRBrisvilleABuczinskiSFecteauGPerecinFMeirellesF2012DevelopmentalandEpigeneticAnomaliesinClonedCattleReprodDomestAnim47107ndash114doi101111j14390531201202063x

Sun G Cheng SYS Chen M Lim CJ Pallen CJ 2012 Protein TyrosinePhosphatase Phosphotyrosyl789 Binds BCAR3 To Position Cas forActivationatIntegrinMediatedFocalAdhesionsMolCellBiol323776ndash3789doi101128MCB0021412

Sun J Zhang C Lan X Lei C ChenH 2012 Exploring polymorphisms andassociationsofthebovineMOGAT3genewithgrowthtraitsGenomeNatl

42

Res Counc Can Geacutenome Cons Natl Rech Can 55 56ndash62doi101139g11077

TaberletPCoissacEPansu J PompanonF 2011Conservationgeneticsofcattle sheep and goats C R Biol On the trail of domesticationsmigrationsandinvasionsinagricultureDominiqueJobGeorgesPelletierJeanClaudePernollet334247ndash254doi101016jcrvi201012007

Utsunomiya YT Perez OrsquoBrien AM Sonstegard TS Van Tassell CP doCarmo AS Meszaros G Solkner J Garcia JF 2013 Detecting lociunder recent positive selection in dairy and beef cattle by combiningdifferentgenomewidescanmethodsPLoSOne8e64280

Voight BF Kudaravalli S Wen X Pritchard JK 2006 A Map of RecentPositive Selection in the Human Genome PLoS Biol 4 e72doi101371journalpbio0040072

WangHKadlecekTAAuYeungBBGoodfellowHESHsuLYFreedmanTSWeissA2010ZAP70AnEssentialKinaseinTcellSignalingColdSpringHarbPerspectBiol2doi101101cshperspecta002279

WangMZhangXKangLJiangCJiangY2012MolecularcharacterizationofporcineNECD SNRPNandUBE3Agenesand imprinting status in theskeletal muscle of neonate pigs Mol Biol Rep 39 9415ndash9422doi101007s1103301218066

WatanabeRChanoTInoueHIsonoTKoiwaiOOkabeH2005Rb1cc1iscritical for myoblast differentiation through Rb1 regulation VirchowsArchIntJPathol447643ndash648doi101007s0042800411831

WickramasingheNSManavalanTTDoughertySMRiggsKALiYKlingeCM 2009 Estradiol downregulates miR21 expression and increasesmiR21targetgeneexpressioninMCF7breastcancercellsNucleicAcidsRes372584ndash2595doi101093nargkp117

WildeCJAddeyCVLiPFernigDG1997Programmedcelldeathinbovinemammary tissue during lactation and involution Exp Physiol 82 943ndash953

YasudaSStevensRLTeradaTTakedaMHashimotoTFukaeJHoritaTKataoka H Atsumi T Koike T 2007 Defective Expression of Ras

43

Guanyl NucleotideReleasing Protein 1 in a Subset of Patients with

SystemicLupusErythematosusJImmunol1794890ndash4900

ZhangCBaileyDKAwadTLiuGXingGCaoMValmeekamVRetiefJ

Matsuzaki H Taub M Seielstad M Kennedy GC 2006 A whole

genome longrange haplotype (WGLRH) test for detecting imprints of

positiveselectioninhumanpopulationsBioinformatics222122ndash8

ZhaoFQOkineEKCheesemanCIShiraziBeecheySPKennellyJJ1998

Glucose transporter gene expression in lactating bovine gastrointestinal

tractJAnimSci762921ndash2929

44

Supplementaryfiles

YoucanfindChapterʹsupplementaryfilesat

- DocTa(DoctoralThesisArchive)(httptesionlineunicattit)

- GoogleDrive(httptinyurlcomPhDMMChapter02)orscantheQRcode

SupplementaryfileS1ndashSignificantcorehaplotypesandgenessharedin

dairyandbeef

Significantcorehaplotypes(pvaluele005haplotypefrequencyge025)shared

indairyandbeefcattlebreedsandrelativegenesintersectingwiththemThefile

isdividedinto4tablesThefirst2sheets(sheet1and2)shownthesignificant

corehaplotypesfordairyandbeefThelast2sheets(sheet3and4)showngenes

intersectingwiththesignificantcorehaplotypesfordairyandbeef

SupplementaryfileS2ndashNetworksrankingindairybreeds

Networksrankingindairybreedswithrelativemoleculessymbolslistscore

valuenumberofmoleculesbelongingtothedataset(focusmolecules)andtop

functionforeachnetwork

SupplementaryfileS3ndashNetworksrankinginbeefbreeds

Networksrankinginbeefbreedswithrelativemoleculessymbolslistscore

valuenumberofmoleculesbelongingtothedataset(focusmolecules)andtop

functionforeachnetwork

45

SupplementaryfileS4ndashCanonicalpathwaysrankingindairyorbeefcattle

breeds

Canonicalpathwaysrankingindairy(sheet1)orbeef(sheet2)cattlebreedswithrelativemoleculessymbolslistratioandndashlog10ofthepvaluesforeachcanonicalpathway

46

CHAPTER3Searchingnewsignalsforproductivetraitsthroughgenebasedassociationanalysisin

threeItaliancattlebreedsSCapomaccio1MMilanesi1etal1IstitutodiZootecnicaUCSCviaEmiliaParmense8429122PiacenzaItalyTheseauthorscontributedequallytothiswork

____________________SubmittedtoAnimalGenetics

Abstract

Genome wide association studies (GWAS) have been widely applied to

disentanglethegeneticbasisofcomplextraits IncattlebreedsclassicalGWAS

approach with medium density markers panels are far to be conclusive

especiallyforcomplextraitsThisisduetotheintrinsiclimitationsofGWASand

to the choices that are made to step from the associationrsquos signals to the

functionalunits

Here we applied a geneHbased association strategy to prioritize genotypeH

phenotype associations found with classical approaches in three Italian dairy

cattlebreedswithdifferentsamplesizes(ItalianHolsteinn=2058ItalianBrown

n=745ItalianSimmentaln=477)formilkproductionandqualitytraits

WhileclassicalsinglemarkerregressionfailedtorevealgenomeHwidesignificant

genotypeHphenotype association except for Italian Holstein the gene based

approach found specific genes in each breed that are associated with milk

physiologyandmammaryglanddevelopment

Since no established method is yet implemented to step from variation to

functional units our strategy may contribute to reveal new genes that play

significant roles in complex traits like thosehere investigated condensing low

signalsofassociationinageneHcentricfashionapproach

Background

Modern cattle breeds have nomore than 200 years of history (Taberlet et al

2008) a time spanwhere selection reproductive isolationand the consequent

geneticdriftshapedallelefrequenciesandlinkagedisequilibriumthroughoutthe

genome This is particularly true for industrial dairy breeds where the joint

application of reproductive technologies and modern genetic evaluation

methodshave imposedahigh selectionpressureona limitednumberof traits

(Taberletetal2008)involvedinmilkquantityandquality

Indeed the milk production industry has been historically ready to acquire

technicalimprovementsfromdifferentfieldsstatisticsbiologyandengineering

50

ForexampleBLUPbasedstrategiesandhighHdensitygenotyping throughSNPHchipsarenowroutinelyincludedinbullevaluationshavingsignificantimpactonselectionefficiencyandpopulationstructureThese techniques have dramatically improved milk yield and compositionwithout knowing the biological basis of the traits under selection To datedespite many efforts and many QTL mapped only a few genes have beeneffectivelyassociatedwithmilkyieldandcomposition(Grisartetal2002Blottetal2003CohenHZinderetal2005)In thiscontextGenomeWideAssociationStudies(GWAS) is themostcommonmethodfordissectingthebiologythatunderliescomplextraits(McCarthyetal2008)TheunderlyingideaofGWASissimplefindmarkerHtraitassociationsexploitingthe linkage disequilibrium (LD) that exists between the causative mutation ndashwhichweignorendashandoneormoreanalyzedmarkerswhichwewellknowInthiswayonecanpinpointgenomicregionscarryingcausalvariantsforanytrait

Although a number of GWAS have been conducted in cattle productive andmorphological traits (some examples (Jiang et al 2010 Cole et al 2011Meredith et al 2012)) and found several genomic regions associated to thesetraits none focused their attention directly on genes involved inmilk biologyMoreoveroneof themajordrawbacksofGWASisrelatedtothescarceresultsportability across populations due to a variety of confounding factorspopulationstructuredifferentialLD levelsbreedspecific selection targetsandeven SNPs ascertainment bias (Clark et al 2005) For instance significantassociationsonfatpercentagewereidentifiedinmanyregionsoftheBostaurusgenome in different breeds Except for genes of major effect there is littleagreement on which regions and H more importantly H genes are actuallyinvolvedinthecontrolofthetraitinvestigatedThisissomehownotsurprisingone favorable allele may segregate in one breed and be fixated in a differentbreed the same allele segregates in both breeds but allelesmay differ or thesegregates in both breeds but the genetic background masks the effect ofsegregation

51

AnotherlimitationinGWASisthestringentsignificancethresholdoftenapplied

tocorrectformultipletestingAsaresulta largeproportionofgeneswithlow

effectaredisregardedwithconsequentoverestimationof thehigheffectgenes

variance(Xu2003)Thisbiasisparticularlyworryingwheninvestigatingtraits

controlledbya largenumberofgeneswithsmalleffectand inhighLDasmilk

yield(GIANOLAETAL2013)

To overcome this kind of limitation different strategies have been invented

usingprior knowledge Interesting ldquocandidatepathwayrdquo approacheshavebeen

conducted(Ravenetal2013)testingforassociationbetweenaphenotypeand

genes inpathwaysthatare likelytobe involvedinthecontrolofthetraitThis

approach reduces thenumberofmarkers tobe tested increases the statistical

poweroftheexperimentrestrictingtheanalysistoexistingknowledge

Other methods based on statistical manipulations or particular experimental

designs have been developed to prioritize or validate genotypeHphenotype

associations For example geneHbased association strategies while restricting

GWA study only to genes and neighboring genomic regionsmay identify new

candidategenesdeservingpostHGWASanalysis(Akulaetal2011Cantoretal

2010 Liu et al 2010) Moreover it is true that classical GWAS downstream

bioinformatics analyses concentrate their efforts in findingmeaningful results

usinggenesasfinaltarget

The aim of ourworkwas to find new candidate genes and novel information

associated to complex traits as milk production (yiled) and quality (fat and

protein percentage) using a geneHcentric genome wide association in three

differentdairybreedsItalianHolsteinItalianBrownandItalianSimmental

Methods

BiologicalSamples

Bulls from the three most represented dairy breeds in Italy (Italian Holstein

ItalianBrownandItalianSimmental)wereselectedfromArtificialInsemination

(AI) bulls progeny tested in Italy until 2010 and forwhich biological samples

52

wereavailableItalianHolsteinbullswerechosenaccordingtofollowingcriteria

i) having the national Selection Index namely production functionality type

(PFT)withreliabilitygt75andii)havingthelowestpossiblerelationshipwith

theothersiresinthesetThiswasachievedbymaximizingthevariabilitywithin

the group On the contrary considering the low number of available Italian

BrownandItalianSimmentalbullstheywereallincludedintheanalysesAfter

thisprocedureatotalof2139ItalianHolstein775ItalianBrownand486Italian

Simmental samples (including quality control replicates)were genotypedwith

theBovineSNP50BeadChipv1(IlluminaSanDiegoCA)

Traitsconsideredintheanalysis

Italian Holstein (ANAFI) Italian Brown (ANARB) and Italian Simmental

(ANAPRI)breedersassociationsprovidedderegressedproofs(DRP)ordaughter

yielddeviations (DYD)hereafter referredasphenotype forall traitsanalyzed

Traitsanalyzedweremilkyieldfatpercentageandproteinpercentage

Genotypingandqualitycontrol

Quality controls (QC) were run independently for each breed using the same

pipelineandthresholds IndetailQCexcluded i)thereplicatesamplewiththe

highernumberofmissinggenotypesii)sampleswithunexpectedlyhigh(ge100)

mendelianerrors(forfatherHsonpairs)iii)sampleswithmorethan5missing

genotypes iv) SNPs with ge 25 missing data v) SNPs with minor allele

frequency(MAF)le5vi)SNPswithmendelianerrorsge25andvii)SNPsout

ofHardyHWeinbergequilibriumtestedbyFisherexacttest(ple0001Bonferroni

corrected)

Analysisofpopulationstructure

MultiDimensionScaling(MDS)wasperformedusingPLINK107(Purcelletal

2007)andresultsplottedwithRscriptsThisanalysisshowninsupplementary

material (FigureS1) exploredpopulationsubstructureandverified thegenetic

homogeneityofthedatasetpriortofurtheranalyses

53

PairwiseLDwascalculatedusingPLINK107(Purcelletal2007)LDdecayand

LD structure on specific regions were calculated with in house R scripts and

snpplotterRpackage(LunaandNicodemus2007)

GWAS

GenomeHwide association analysis was made with the GenABEL package in R

(Team 2014) using GRAMMARHCG (Genome wide Association using Mixed

Model and Regression H Genomic Control) approach (Amin et al 2007

Aulchenkoetal2007)

Basingonpriorknowledge(Minozzietal2013) in ItalianHolsteintheDGAT1

regionwas also treated as fixed effect inGenABEL analysis of the three traits

ResultsusingthismodelarereferredtoasldquoHolsteinDGATrdquo

Manhattan andquantileHquantileplots for each traitwereproduced to allowa

thoroughevaluationoftheGWASresultsAllcalculatedpHvaluesfromtheGWAS

analysiswere retained for the downstream prioritizationwith the geneHbased

association

Workingongenomecoordinates

Coordinatesof themarkerswere liftedover fromBTA40(Elsiketal2009)to

UMD31assembly(Ziminetal2009)usingSnpChimpv1(Nicolazzietal2014)

andgenecoordinateswereretrievedusingBioMartEnsembl

QTLcoordinatesweredownloadedfromtheAnimalQTLdb(Huetal2013) in

bedformat

Operations on genomic intervalswereperformedusing thebedtools suite ver

2162(QuinlanandHall2010)

GeneBasedAssociationanalysis

Toreducetheantagonismbetweensignificancethresholdandfalsepositiverate

of a classical GWA study (Gianola et al 2013) and to facilitate selection

signature comparison across the three breeds investigated we used a geneH

54

centricstatisticthatcollapsesinasinglevalueallthepHvaluesfromSNPsrelated

toagenecorrectingforLDinformation

Our approachderivesdirectly from the softwareVEGAS (VersatileGeneHBased

TestforGenomeHwideAssociationStudies)(Liuetal2010)adaptedtopursue

meaningfulresultsinanyspeciesbeyondhumans

Briefly the geneHwise test statistic is the sum of the n upper tail chiHsquared

valuesofaSNPsubset withonedegreeof freedom(wheren is thenumberof

SNPmappinginthegivengene)WithperfectlinkageequilibriumthegeneHwise

pHvaluewouldbetheoneHsidepHvalueofachiHsquaredistributionwithndegrees

of freedom Otherwise more likely the distribution of the pHvalues has to be

evaluated(withsimulations)andweighted(withLDvalues)(Liuetal2010)

The VEGAS speciesHfree procedure was implemented in a UNIX environment

usingBashandRlanguagesusingEnsemblannotationasinputfilesPLINKdata

typeandpHvaluesforeachSNPfromthepreviouslycitedGRAMMARprocedure

WemappedSNPsonBostaurusgenes(UMD31)andtheirboundaries(100000

bpupHanddownHstream)toaccountforregulatoryregionsOnlygeneswithat

least 2 SNPs mapped were retained Significant entries from the geneHbased

associationweremappedinthecattleQTLdatabasewithbedtools

ResultsandDiscussion

In this study a gene based association strategy was applied to identify new

genomicregionsboostingGWASanalysisresultsformilkproductionandquality

fromthreeItaliandairybreeds

Complextraitslikethosehereevaluatedareaffectedbyafewmajorgeneswith

large effects andmany otherswithmoderate to low effects the latter are not

easily identifiedby genomewide scans inmodern cattle breedsduemainly to

samplesizelimitationwhilesignalfromtheformersarelostingenomescansH

withsomeexceptionsHduerapidfixations

InthisscenariowithmediumdensitySNPpanelsliketheoneusedinthisstudy

associationbasedonsingleSNPregressionorotherldquoclassicalrdquomethodscouldbe

inconclusiveintermsofresults

55

AftertheQCprocess7452058and477samplesand3575938535and38574SNPs were retained for Italian Brown Italian Holstein and Italian Simmentalbulls respectively (Table S1) HolsteinDGAT analysis retained 38534markersoneSNPlessthantheotherHolsteinpanelDescriptive statistics on genotypes for each breed (histogram and barplot forheterozygosis and call rate per individual Figure S7 and multi dimensionalscalingFigureS1)revealedabsenceofconfoundingeffectsWiththesedatasetsatthenominalgenomeHwidepHvaluesignificancethresholdof 005 (ple125 x 10H6 after Bonferroni correction (Burton et al 2007Dudbridge and Gusnanto 2008)) significant associations were found only inItalianHolsteinbetweenmilkyieldandfatpercentageand26SNPsmappingintheproximalregionofBTA14nearbyDGAT1(Table1)

56

Table1PHvaluesoftheSNPsovercomingthegenomewidesignificancethresholdintheHolsteinbreedformilkyieldandfatpercentage(notaccountingforDGAT1effect)SNPsaresortedbypositionsintheUMD31assembly

SNPID BTA Position(Bp) POvalue(Fat) POValue(Milk)

Hapmap30383OBTCO005848 14 1489496 112EH12 361EH12

ARSOBFGLONGSO57820 14 1651311 247EH46 161EH16

ARSOBFGLONGSO34135 14 1675278 857EH17

ARSOBFGLONGSO94706 14 1696470 665EH16

ARSOBFGLONGSO4939 14 1801116 534EH49 106EH20

Hapmap52798Oss46526455 14 1923292 915EH21

UAOIFASAO6878 14 2002873 382EH24

ARSOBFGLONGSO107379 14 2054457 118EH33 664EH15

ARSOBFGLONGSO18365 14 2117455 250EH16

Hapmap30922OBTCO002021 14 2138926 439EH13

Hapmap25384OBTCO001997 14 2217163 390EH10 697EH09

Hapmap24715OBTCO001973 14 2239085 108EH09 369EH09

BTAO35941OnoOrs 14 2276443 402EH13

Hapmap30374OBTCO002159 14 2468020 528EH12

Hapmap30086OBTCO002066 14 2524432 798EH11

ARSOBFGLONGSO74378 14 3640788 161EH08

ARSOBFGLOBACO1511 14 3765019 375EH09

UAOIFASAO9288 14 3956956 811EH09

ARSOBFGLONGSO100480 14 4364952 133EH09

UAOIFASAO5306 14 4468478 458EH08

ThisislikelyduetotheeffectoftheK232ADGAT1mutationthatissegregatingin

thisbreed(Fontanesietal2014)NootherSNPwassignificantlyassociatedto

anyothertraitinthe3breedsinvestigated

SomeSNPswereunderasuggestivesignificancethresholdofple5x10H5Details

on these markers are in supplementary material (Table S3) but they are not

discussedherePlotsdescribingatglancetheassociationresultsrevealingalow

signaltonoiseratioareavailableintheonlinesupplementarymaterial(Figure

S2S3S4andS5respectively for ItalianBrown ItalianHolsteinHolsteinDGAT

andItalianSimmental)

57

Asperthegenecentricanalysiswemappedatotalof192371989919894and

19844 genes on 24616 available on Ensembl (release version 73) in Italian

BrownItalianHolsteinHolsteinDGATandItalianSimmentalrespectivelyGenes

with a geneHwisepHvalue at least 10H5were considered significantly associated

withthestudiedtraitThesignificanceappearedtobeadequatetothehighrate

ofredundancyof theannotationfeaturesandto figureoutthetopassociations

with a reasonable level of confidence Results on these associations per breed

andpertraitaresummarizedinTable2Table3andinsupplementarymaterials

(TableS2andTableS4)ThemajorityofsignificantgenesfoundwiththegeneH

centricanalysismapped inpreviouslycharacterizedQTL thatareassociated to

milkproductionorqualityThesearesummarizedintheTableS5

Table2Numberofsignificantlyassociatedgenesforeachbreedineachtraitinthegenebasedassociationtest

Fat Milkyield Protein

Brown 3 1

Holstein 92 80 1

HolsteinDGAT 1 4 1

Simmental 1 13

58

Tableamp3FeaturessignificantlyassociatedperbreedtraitHolsteinanalysisnotaccountingforDGAT1isshow

ninTableS3

Breedamp

Traitamp

EnsemblampIDamp

GeneWiseampPValueamp

SNPsInGeneamp

BestSNPamp

BestSNP_PValueamp

BTAamp

Startamp(bp)amp

Endamp(bp)amp

GeneampNam

eamp

BROWNamp

FAT

Nosignificantassociationfound

MILK

ENSBTAG00000019237

500EG05

4Hapmap53770Gss46526325

204EG04

23

47333403

47348212

TXNDC5

ENSBTAG00000043918

500EG05

4Hapmap53770Gss46526325

204EG04

23

47337650

47337787

5S_rRNA

ENSBTAG00000046839

500EG05

4Hapmap53770Gss46526325

204EG04

23

47328131

47352621

UncharacterizedProtein

PROTEIN

ENSBTAG00000001260

700EG05

3ARSGBFGLGNGSG116222

441EG03

18

52120387

52124418

PINLYP

amp

HOLSTEINDGATamp

FAT

ENSBTAG00000000369

100EG05

3Hapmap53294Grs29016908

333EG05

594673755

94749093

EPS8

MILK

ENSBTAG00000026896

130EG05

5ARSGBFGLGNGSG104949

385EG04

23

51549721

51553563

FOXF2

ENSBTAG00000005260

400EG05

5Hapmap33288GBTCG033751

120EG03

638120578

38127577

SPP1

ENSBTAG00000006579

500EG05

4ARSGBFGLGBACG2207

632EG05

15

54418559

54459500

P4HA3

ENSBTAG00000007013

700EG05

6UAGIFASAG7665

396EG05

19

5221507

5283308

TOM1L1

PROTEIN

ENSBTAG00000006638

600EG05

3ARSGBFGLGNGSG104774

176EG04

18

56523439

56530499

BCL2L12

amp

SIMMENTHALamp

FAT

ENSBTAG00000037649

900EG05

6Hapmap30001GBTAG142716

529EG05

4120427915

120494453

VIPR2

MILK

ENSBTAG00000017538

100EG05

6ARSGBFGLGBACG13093

587EG06

12

85630428

85630912

Pseudogene

ENSBTAG00000000856

200EG05

2ARSGBFGLGNGSG4939

413EG06

14

1766767

1769754

FBXL6

ENSBTAG00000000857

200EG05

2ARSGBFGLGNGSG4939

413EG06

14

1763994

1766621

GPR172B

ENSBTAG00000026356

200EG05

2ARSGBFGLGNGSG4939

413EG06

14

1795351

1804562

DGAT1

ENSBTAG00000035158

200EG05

2ARSGBFGLGNGSG4939

413EG06

14

1770660

1772329

TMEM

249

ENSBTAG00000046208

200EG05

2ARSGBFGLGNGSG4939

413EG06

14

1782901

1788087

SCRT1

ENSBTAG00000009811

400EG05

2ARSGBFGLGNGSG4939

413EG06

14

1825982

1844587

UncharacterizedProtein

ENSBTAG00000009816

400EG05

2ARSGBFGLGNGSG4939

413EG06

14

1838135

1840081

UncharacterizedProtein

ENSBTAG00000014458

400EG05

2ARSGBFGLGNGSG4939

413EG06

14

1844664

1894424

MROH1

ENSBTAG00000020751

400EG05

2ARSGBFGLGNGSG4939

413EG06

14

1806081

1825793

HSF1

ENSBTAG00000044406

400EG05

2ARSGBFGLGNGSG4939

413EG06

14

1883849

1883971

SCARNA15

ENSBTAG00000046889

400EG05

5ARSGBFGLGBACG13093

587EG06

12

85610279

85610762

UncharacterizedProtein

ENSBTAG00000020117

800EG05

3BTAG78494GnoGrs

188EG04

721189879

21196263

ZBTB7A

PROTEIN

Nosignificantassociationfound

amp

59

1 ItalianHolstein

The$ large$ number$ of$ significant$ signals$ was$ observed$ in$ the$ Italian$ Holstein$

especially$ for$ milk$ yield$ and$ fat$ percentage$ As$ shown$ in$ Table$ 2$ the$ strong$

effect$ of$ DGAT1$ in$ Italian$ Holsteins$ observed$ with$ individual$ SNPs$ was$

confirmed$by$the$geneBcentric$approach$In$this$region$almost$the$totality$of$the$

geneBwise$pBvalues$under$the$chosen$threshold$appeared$to$be$influenced$by$this$

major$ gene$ and$ by$ the$ high$ level$ of$ linkage$ disequilibrium$ that$ characterizes$

cosmopolitan$bovine$breeds$(see$Figure$S6)$Within$this$pool$of$genes$it$is$worth$

to$mention$PLEC$ (plectin)$ a$ cell$ surface$ receptor$ gene$ in$ the$RNASE5$ pathway$

(Angiogenin$ encoded$ by$ angiogenin$ ribonuclease$ RNase$ A$ family$ 5)$ Proteins$

encoded$ by$ these$ genes$ that$ are$ present$ in$ milk$ are$ involved$ in$ protein$

synthesis$ and$ act$ as$ growth$ factors$ in$ vitro$Moreover$RNASE5$ seems$ to$ have$

other$ functions$ such$ as$ regulation$ of$ stress$ response$ and$ apoptosis$ and$ even$

angiogenesis$ through$ which$ it$ may$ also$ contribute$ to$ the$ regulate$ milk$

production$(Raven$et$al$2013)$

The$ exclusion$ of$DGAT1$ effect$ (HolsteinDGAT$ results)$ highlighted$ that$ all$ the$

significant$ effects$ on$ fat$ percentage$ found$ for$ genes$ located$ within$

approximately$25$Mb$up$or$downstream$from$the$excluded$SNP$(ARS5BFGL5NGS5

4939)$were$ redundant$ signals$ of$DGAT1$ (Table3$ and$Table$ S2$ and$ S3)$ This$ is$

also$observed$for$milk$yield$due$to$the$genetic$correlations$between$fat$and$milk$$

Also$ PLEC$ was$ not$ significant$ in$ our$ analyses$ when$ taking$ into$ account$ the$

DGAT1$effect$contradicting$previous$findings$in$Holstein$that$report$association$

of$PLEC$gene$with$production$traits$either$accounting$or$not$for$the$effect$of$the$

diglyceride$ acyltransferase$ (Raven$ et$ al$ 2013)$ A$ possible$ explanation$ for$ the$

different$results$between$these$studies$is$a$difference$between$genetic$structures$

of$the$populations$analyzed$To$reduce$false$positive$signals$we$only$evaluated$

the$genes$based$on$HolsteinDGAT$results$for$the$Italian$Holstein$$

11 Milkyield

We$found$genes$associated$with$this$trait$in$Holstein$in$particular$SPP1$P4HA3=

FOXF2$and$TOM1L1$genes$

Among$ these$ the$ osteopontin$ SPP1$ (secreted$ phosphoprotein$ 1)$ is$ directly$

related$to$milk$production$(de$Mello$et$al$2012)$ It$ is$ located$ in$BTA6$nearby$

60

$

ABCG2$ and$ there$ is$ evidence$ that$ polymorphisms$ in$ this$ region$ are$ strongly$

correlated$with$high$production$ levels$of$milk$ in$various$breeds$(CohenBZinder$

et$ al$ 2005)$Osteopontin$SPP1$ is$ a$ secreted$glycophosphoprotein$ that$plays$ a$

crucial$ role$ in$ mammary$ gland$ development$ having$ an$ essential$ role$ also$ in$

mammary$gland$differentiation$and$branching$of$the$mammary$epithelial$ductal$

system$ This$ gene$has$ also$ been$ implicated$ in$many$ types$ of$ cancer$ including$

breast$cancer$ for$ the$potentiality$of$proliferation$and$alveologenesis$ (Hubbard$

et$ al$ 2013)$ It$ is$worth$mentioning$ that$ the$most$ significant$ SNPs$within$ this$

gene$was$Hapmap332885BTC5033751=with=a$pBvalue$of$00012$ largely$over$ the$

genomeBwide$significance$ threshold$This$signal$would$be$ ignored$ in$ the$single$

SNP$approach$

Another$gene$revealed$by$the$geneBbased$association$is$P4HA3$encoding$for$the$

prolyl$4Bhydroxylase$α$polypeptide$III$involved$in$the$correct$folding$of$nascent$

procollagen$ chains$ It$ is$ known$ that$ stromalBepithelial$ interactions$ are$ critical$

for$mammary$gland$development$and$defective$deposition$or$reorganization$of$

extracellular$ matrix$ can$ lead$ to$ incorrect$ mammary$ epithelial$ morphogenesis$

(Gillette$ et$ al$ 2013)$ highlighting$ the$ role$ of$ this$ gene$ in$ the$ healthy$ breast$

tissue=

FOXF2=gene$ (Forkhead$Box$F2)$ is$ a$member$of$ the$ forkhead$box$ transcription$

factors$ known$ to$ be$ involved$ in$ tissue$ development$ as$ part$ of$ the$ ldquoHedgehog$

signaling$pathwayrdquo$This$key$cascade$is$essential$for$the$correct$segmentation$of$

tissues$during$embryonic$development$but$has$been$recently$demonstrated$that$

is$active$in$adult$stem$cells$proliferation$in$mammary$tissue$(Liu$et$al$2006)$In$

particular$ FOXF2$ plays$ a$ key$ role$ in$ extracellular$ matrix$ synthesis$ and$

epithelialBmesenchymal$ interactions$ and$ could$ therefore$ affect$ the$ proper$

development$of$the$mammary$stroma$In$addition$low$levels$of$transcription$of$

this$gene$are$correlated$ to$early$onset$metastasis$ in$breast$cancer$ (Kong$et$al$

2013)$ underlining$ a$ specific$ role$ in$ the$ mammary$ gland$

Lastly$ TOM1L1$ (Target$ Of$ Myb1$ ChickenBLike$ 1)$ is$ another$ gene$ potentially$

involved$ with$ the$ milk$ physiology$ as$ is$ known$ its$ implication$ with$ the$ early$

endosome$ sorting$ coupled$ with$ EGFR= (epidermal$ growth$ factor$ receptor)$

another$ important$player$ in$ lactation$(Fowler$et$al$1995)$TOM1L1$belongs$to$

61

$

the$ VHS$ domain$ containing$ proteins$ that$ are$ involved$ in$ the$ postBGolgi$

trafficking$and$signaling$(Wang$et$al$2010)$

12 Fatpercentage

In$ Italian$ Holstein$ breed$ we$ found$ a$ significant$ association$ with$ or$ without$

DGAT1$ effect$ correction$ to$ the$EPS8$ gene$ located$on$BTA5$at$around$95$Mbp$

This$gene$encodes$for$a$substrate$of$the$already$cited$EGFR$and$is$involved$in$the$

fatty$acid$metabolism$(Fazioli$et$al$1993)$A$recent$study$on$Holstein$Friesian$

(Wang$et$al$2012)$found$strong$association$with$a$variant$located$in$the$second$

intron$of$the$epidermal$growth$factor$receptor$pathway$substrate$enforcing$the$

hypothesis$of$a$true$involvement$of$this$gene$on$fat$percentage$

13 Proteinpercentage

The$BCL2L12$gene$(BclB2Blike$protein$12)$associated$with$protein$percentage$in$

Italian$Holstein$breed$represents$another$interesting$result$The$product$of$this$

gene= is$ an$ important$ oncoprotein$ in$ two$ fundamental$ cell$ cycle$ pathways$

namely$ p53$ and$ cytoplasmic$ caspase$ signaling$ BCL2L12= resides$ in$ both$

cytoplasm$and$nucleus$inhibiting$caspases$3$and$7$and$forming$a$complex$with$

p53$thereby$ inhibiting$p53Bdirected$transcriptomic$changes$upon$DNA$damage$

(Stegh$ and$DePinho$ 2011)$ The$members$ of$ BCL2$ family$ are$ considered$ antiB

apoptotic$genes$and$it$is$obvious$that$dysfunctional$programmed$cell$death$plays$

an$ important$ role$ in$ the$ pathogenesis$ and$ in$ the$ progression$ of$ cancer$Many$

members$of$this$family$were$found$to$be$modulated$in$various$malignancies$and$

BCL2L12= in$ particular$ was$ expressed$ in$ mammary$ gland$ and$ proposed$ as$

prognostic$ cancer$ biomarker$ (Thomadaki$ et$ al$ 2007)$ In$ dairy$ cows$ the$

members$of$BCL2$family$are$involved$in$the$key$events$ in$the$mammary$tissue$

development$ and$ remodeling$ during$ lactation$ and$ involution$ (Stefanon$ et$ al$

2002$p$200)$Members$of$BCL2$family$regulate$the$turnover$of$cell$population$

in$the$mammary$gland$during$lactation$and$its$overexpression$was$found$to$be$

linearly$related$to$milk$production$in$Holstein$cow$(Colitti$et$al$2004)$

$

62

$

2 ItalianSimmental

In$ the$ Italian$ Simmental$ breed$ we$ found$ one$ association$ on$ BTA4$ for$ fat$

percentage$(VIPR2)$and$two$associations$ for$milk$yield$one$ in$BTA7$(ZBTB7A)$

and$ the$ other$ in$ BTA14$ related$ to$ the$ DGAT1$ No$ association$ was$ found$ for$

protein$percentage$

21 Fatpercentage

Among$the$genes$found$significantly$associated$with$production$traits$in$Italian$

Simmental$VIPR2=a$ gene$ encoding$ a$ receptor$ for$ a$ small$ vasoactive$ intestinal$

neuropeptide$ was$ the$ single$ signal$ in$ BTA4$ for$ fat$ percentage$ Vasoactive$

intestinal$ peptide$ (VIP)$ is$ synthesized$ in$ the$pituitary$ gland$ and$ among$other$

functions$ stimulates$ prolactin$ secretion$ and$ is$ involved$ in$ proliferation$

survival$and$differentiation$in$human$breast$cancer$cells$(Valdehita$et$al$2010)$$

An$interesting$novel$finding$is$the$effect$of$immunosuppression$(through$direct$

effect$of$IL56)$on$prolactin$cells$and$its$ability$to$regulate$prolactin$by$acting$on$

pituitary$ VIP$ through$ VIP$ receptors$ (Blanco$ et$ al$ 2013)$ These$ evidences$

suggest$ that$VIPR2$ could$ be$ an$ important$ candidate$ for$ further$ investigations$

through$post$GWAS$approaches$such$as$reBsequencing$or$fine$mapping$

22 Milkyield

Surprisingly$the$DGAT1$gene$was$significantly$associated$with$milk$yield$but$not$

with$fat$percentage$in$Italian$Simmental$breed$Indeed$the$most$significant$SNP$

for$the$milk$yield$trait$in$this$breed$was$the$ARS5BFGL5NGS54939$with$a$pBvalue$

of$ 413EB06$ The$ importance$ of$ this$ gene$ in$ the$ lactation$ is$ largely$ known$

(Grisart$et$al$2002)$$

To$ investigate$DGAT1$ signal$ in$ Simmental$ we$ analyzed$ the$ fine$ LD$ structure$

genomeBwide$ and$ in$ BTA14$ proximal$ region$ in$ all$ the$ three$ breeds$ revealing$

different$patterns$

Observed$LD$decay$on$ the$ three$breeds$genomes$revealed$similar$behavior$ for$

Italian$Holstein$and$Italian$Brown$with$r2$gt$010$$until$ around$ 250$ Kbp$ while$

Italian$Simmental$showed$lower$LD$Distances$higher$that$400$Kbp$revealed$half$

level$of$LD$in$Simmental$(Figure$S6)$

63

$

In$DGAT1$region$LD$behavior$is$very$similar$between$Italian$Holstein$and$Italian$

Brown$ (Figure$ S8)$ while$ association$ values$ are$ really$ different$ because$ ARS5

BFGL5NGS54939=is$fixed$in$Italian$Brown$and$purged$by$QC$process$Interestingly$

association$ patterns$ become$ similar$ when$ DGAT1$ effect$ is$ removed$ from$ the$

Holstein$ panel$ (HolsteinDGAT$ Figure$ S8)$ In$ Simmental$ we$ found$ a$ different$

situation$ very$ low$ LD$ level$ and$ significant$ association$ levels$ with$ ARS5BFGL5

NGS54939$with$ the$milk$ trait$This$ is$probably$due$ to$ the$ low$ frequency$of$ the$

p232K$ allele$ in$ Italian$ Simmental$ (Scotti$ et$ al$ 2010)$ and$ to$ the$ different$

selection$strategies$applied$in$these$breeds$

The$ relatively$ high$ pBvalue$ of$ this$ SNP$ is$ responsible$ for$ the$ statistical$

significance$of$many$of$the$13$genes$found$significant$in$this$breed$for$this$trait$

as$ shown$ in$ Table$ 3$ therefore$ reducing$ the$ gene$ set$ to$ three$ locations$ two$

overlapping$features$indicating$a$pseudogene$in$BTA12$and$a$zinc$finger$protein$

ZBTB7A$in$BTA7$DGAT1$role$and$function$will$be$not$discussed$here$as$already$

done$elsewhere$(Grisart$et$al$2002)$

ZBTB7A=is$part$of$zinc$finger$and$BTB$proteins$(Broad$complex$Tramtrack$and$

Bricabrac)$ transcription$ factors$ with$ emerging$ importance$ for$ their$

involvement$ in$ oncogenesis$ and$ cell$ differentiation$by$ repressing$key$ genes$of$

cell$cycle$regulation$as$retinoblastoma$(Rb)$and=p53=(ldquoJeon$et$al$B$2008$B$ProtoB

oncogene$ FBIB1$ (PokemonZBTB7A)$ Represses$ Trpdfrdquo$ nd)= ZBTB7A$ over$

expression$ is$ shown$ in$primary$ and$ recurrent$ breast$ cancer$ tissues$ leading$ to$

disease$progression$and$poor$survival$(Zu$et$al$2011)$Link$between$this$gene$

polymorphism$and$milk$production$might$be$in$the$proper$cellular$development$

of$ the$ mammary$ gland$ This$ transcription$ factor$ is$ also$ known$ as$ a$ master$

regulator$ of$ B$ and$ TBcell$ lineage$ (Lee$ and$ Maeda$ 2012)$ enforcing$ the$ link$

between$ milk$ production$ and$ immune$ protection$ as$ already$ outlined$ by$ of$

Lemay$ and$ colleagues$ with$ proteomic$ studies$ where$ the$ ldquoactivation$ of$ the$

innate$ immune$systemrdquo$resulted$as$one$of$ the$most$represented$gene$ontology$

categories$(Lemay$et$al$2009)$

$

64

$

3 ItalianBrown

In$ the$ Italian$Brown$breed$we$ found$ one$ association$ on$BTA23$ (TXNDC5)$ for$

milk$yield$and$one$in$BTA18$for$protein$percentage$(PINLYP)$The$three$features$

listed$in$Table$3$for$milk$yield$in$Italian$Brown$were$collapsed$due$to$overlap$$

31 Milkyield

Thioredoxin$ domainBcontaining$ protein$ 5$ (TXNDC5)$ belongs$ to$ thioredoxin$

family$small$ubiquitous$proteins$containing$disulphide$domain$affecting$protein$

folding$with$chaperone$activity$preventing$protein$misBfolding$within$the$lumen$

of$ the$endoplasmic$reticulum$This$multifunctional$organelle$have$a$key$role$ in$

several$ processes$ including$ lipid$ catabolism$ and$ protein$ synthesis$ (RamiacuterezB

Torres$ et$ al$ 2012)$ Because$ of$ its$ disulphide$ bonds$ thioredoxin$ facilitate$ the$

reduction$of$other$proteins$by$cysteine$thiol$disulfide$exchange$This$mechanism$

is$essential$for$detossification$

During$ lactation$ indeed$ an$ enhanced$ detoxification$ level$ is$ functional$ to$

preserve$ milk$ quality$ against$ increased$ metabolic$ activities$ Higher$ metabolic$

levels$ lead$ to$ free$ radicals$ production$ making$ necessary$ the$ maintenance$ of$

redox$ homeostasis$ that$ include$ among$ others$ the$ reduction$ and$ oxidation$ of$

peroxiredoxin$and$thioredoxin$(Lu$and$Holmgren$2014)$

32 Proteinpercentage

For$ protein$ percentage$ in$ Italian$ Brown$ there$ is$ only$ one$ associated$ gene$

PINLYP=is$encoding$a$phospholipase$A2$inhibitor$but$this$protein$product$is$not$

yet$well$studied$ $Many$types$of$phospholipase$A2$were$characterized$and$ it$ is$

known$ that$ they$ are$ mediator$ of$ inflammation$ atherosclerosis$ and$ cancer$ in$

mammals$and$that$plays$an$important$role$in$the$innate$immune$response$(Qiu$

and$Lai$2013)$It$may$be$reasonable$to$imagine$a$role$for$PINLYP$in$the$lactation$

in$light$of$previously$discussed$results$that$link$milk$to$innate$immune$response$

(Lemay$et$al$2009)$

$

Although$ limited$ by$ the$ size$ of$ the$ study$ and$ the$ number$ of$ the$ included$

markers$these$results$represent$both$a$confirmation$and$a$step$forward$towards$

the$comprehension$of$the$biological$bases$of$milk$yield$and$quality$$

65

$

New$ genes$ potentially$ involved$ with$ production$ traits$ in$ dairy$ cows$ were$

tracked$when$GWAS$signals$resulted$too$weak$to$be$considered$in$a$classical$SNP$

based$mapping$with$the$ldquoclosest$gene$to$peakrdquo$or$the$windowBbased$strategy$

It$ is$ worth$ mentioning$ that$ none$ of$ the$ analyzed$ SNPs$ (not$ considering$ the$

Italian$Holstein$with$ARS5BFGL5NGS54939= included$ in$ the$ analysis)$were$ above$

the$ genomeBwide$ significance$ threshold$ in$ single$ SNP$ analysis$ meaning$ that$

important$information$is$often$lost$in$such$studies$$

In$addition$ to$ the$newly$ identified$ loci$we$observed$a$confirmation$of$existing$

associations$ with$ genes$ related$ to$ milk$ production$ not$ only$ dissecting$ gene$

specific$ functions$ but$ from$ mapping$ these$ genes$ in$ previously$ characterized$

regions$(Table$S5)$This$ is$not$trivial$when$using$new$data$analysis$techniques$

especially$ in$ high$ noise$ to$ signal$ ratio$ datasets$ like$ those$ here$ presented$We$

believe$ that$ this$ approach$will$ benefit$ of$data$ from$denser$marker$panels$ (eg$

BovineHD)$ to$ exploit$ local$ linkage$ disequilibrium$ and$ increase$ the$ number$ of$

mapped$genes$with$more$than$two$SNPs$

Results$ obtained$ with$ the$ gene$ centric$ approach$ although$ having$ intrinsic$

limitations$are$not$ influenced$by$a$number$of$subjective$choices$often$made$in$

GWAS$ For$ instance$when$ ldquowindow$mappingrdquo$ is$ used$ to$decrease$ the$noise$ to$

signal$ ratio$ eg$ due$ to$ nearby$ markers$ having$ different$ allele$ frequencies$

windows$size$ is$ to$be$decided$This$can$be$based$on$equal$number$of$markers$

equal$number$of$base$pairs$or$equal$map$distance$in$cM$per$window$The$three$

choices$ do$ not$ coincide$ because$ markers$ are$ not$ equally$ spread$ along$ the$

chromosomes$ and$ recombination$ differs$ in$ different$ genomic$ regions$ In$

addition$ windows$ can$ be$ chosen$ nonBoverlapping$ or$ with$ different$ degree$ of$

overlap$Finally$when$stepping$ from$significant$SNPs$or$windows$ to$candidate$

genes$the$size$of$the$flanking$region$from$which$picking$up$candidate$genes$is$to$

be$decided$as$well$Sometimes$the$gene$closer$to$the$signal$peak$is$selected$ in$

other$cases$all$genes$within$certain$boundaries$are$included$in$further$analyses$

All$ these$ choices$ may$ generate$ misleading$ results$ For$ example$ not$ truly$

associated$genes$may$be$retained$for$further$analyses$disregarding$the$correct$

ones$ In$addition$ a$ large$number$of$ false$positives$can$be$ included$ to$be$ later$

interpreted$with$ the$help$of$network$and$pathway$analysis$As$ a$ consequence$

these$decisions$have$a$direct$impact$in$terms$of$production$and$interpretation$of$

66

$

results$leading$to$conclusions$heavily$based$on$the$enrichment$analysis$and$preB

existing$knowledge$$

With$ the$ gene$ based$ association$ we$ also$ noticed$ some$ advantages$ in$ the$

downstream$data$mining$First$genes$are$functional$units$and$the$golden$targets$

of$ the$ ldquoclassical$ postBGWASrdquo$ bioinformatics$ pipelines$ Hence$ the$ gene$ based$

association$leads$to$genes$with$an$embedded$statistically$robust$framework$This$

is$ a$ ldquoplusrdquo$ in$ studying$ complex$ traits$ because$ one$ can$ concentrate$ on$ specific$

signals$ rather$ than$ cherry$ picking$meaningful$ features$ from$ a$ list$ constructed$

with$the$previously$cited$approaches$$

Moreover$it$is$easier$to$track$back$gene$clusters$to$major$gene$effects$B$like$for$

example$DGAT1$in$a$gene$rich$genome$chunk$many$genes$may$result$significant$

due$to$the$same$SNP$(Table$S2$and$Table$3)$With$the$VEGAS$method$finding$the$

ldquotruerdquo$association$is$facilitated$by$following$the$ldquobest$SNPrdquo$in$the$region$

In$addition$we$found$phenotype$coherent$signals$in$such$situation$where$only$a$

few$SNPs$B$ if$none$ndash$are$above$an$acceptable$statistical$genomeBwide$threshold$

using$a$single$SNP$regression$method$

The$gene$centric$approach$ is$a$ first$and$slight$ improvement$ in$ livestock$GWAS$

analyses$since$we$still$have$a$partial$genome$annotation$ In$addition$ it$ is$now$

known$that$ the$definition$of$gene$ is$ to$be$revised$many$genes$do$not$code$ for$

proteins$ have$ alternative$ starting$ sites$ have$ antisense$ transcripts$ host$

regulatory$ RNA$ are$ subjected$ to$ alternative$ splicing$ and$may$ be$ regulated$ by$

sequences$quite$ far$ away$ from$ their$ coding$ regions$ (Consortium$2012)$Other$

limitations$are$the$need$for$larger$number$of$animals$in$these$type$of$studies$as$

in$this$study$and$the$still$relatively$high$cost$of$sequencing$that$cannot$be$used$

routinely$in$large$experiments$at$least$not$yet$$

$

$

Conclusions

This$study$underlines$the$importance$of$the$implementation$of$new$methods$to$

analyze$complex$traits$with$genomic$data$No$golden$path(s)$is$discovered$yet$to$

walk$from$variation$to$functional$units$The$geneBcentric$analysis$may$contribute$

67

$

to$ reveal$ new$ signals$ throughout$ the$ bovine$ genome$ condensing$ weak$

association$signals$in$chromosome$bits$having$functional$significance$

$

$

Acknowledgements

The$Authors$would$ like$ to$ thank$Dr$ Jimmy$ Liu$ for$ his$ advices$ in$ building$ the$

software$AIA$ANAFI$ANARB$and$ANAPRI$breedersrsquo$associations$ for$ their$help$

and$ support$during$ the$data$analysis$ SelMol$ and$ProZoo$projects$ that$provide$

samples$ The$ Authors$ are$ also$ grateful$ to$MIPAAF$ and$ Cariplo$ Foundation$ for$

funding$

$

$

$

References

Akula$ N$ Baranova$ A$ Seto$ D$ Solka$ J$ Nalls$MA$ Singleton$ A$ Ferrucci$ L$

Tanaka$T$Bandinelli$ S$Cho$YS$Kim$YJ$Lee$ JBY$Han$BBG$Bipolar$

Disorder$ Genome$ Study$ (BiGS)$ Consortium$ The$ Wellcome$ Trust$ CaseB

Control$Consortium$McMahon$FJ$2011$A$NetworkBBased$Approach$to$

Prioritize$ Results$ from$ GenomeBWide$ Association$ Studies$ PLoS$ ONE$ 6$

e24220$doi101371journalpone0024220$

Amin$N$ van$Duijn$ CM$ Aulchenko$ YS$ 2007$ A$Genomic$Background$Based$

Method$ for$ Association$ Analysis$ in$ Related$ Individuals$ PLoS$ ONE$ 2$

e1274$doi101371journalpone0001274$

Aulchenko$YS$de$Koning$DBJ$Haley$C$2007$Genomewide$Rapid$Association$

Using$ Mixed$ Model$ and$ Regression$ A$ Fast$ and$ Simple$ Method$ For$

Genomewide$PedigreeBBased$Quantitative$Trait$Loci$Association$Analysis$

Genetics$177$577ndash585$doi101534genetics107075614$

Blanco$ EJ$ CarreteroBHernaacutendez$ M$ GarciacuteaBBarrado$ J$ IglesiasBOsma$ MC$

Carretero$M$Herrero$JJ$Rubio$M$Riesco$JM$Carretero$J$2013$The$

activity$and$proliferation$of$pituitary$prolactinBpositive$cells$and$pituitary$

68

$

VIPBpositive$ cells$ are$ regulated$by$ interleukin$6$Histol$Histopathol$ 28$

1595ndash1604$

Blott$S$Kim$JBJ$Moisio$S$SchmidtBKuntzel$A$Cornet$A$Berzi$P$Cambisano$

N$Ford$C$Grisart$B$Johnson$D$Karim$L$Simon$P$Snell$R$Spelman$

R$ Wong$ J$ Vilkki$ J$ Georges$ M$ Farnir$ F$ Coppieters$ W$ 2003$

Molecular$ dissection$ of$ a$ quantitative$ trait$ locus$ a$ phenylalanineBtoB

tyrosine$substitution$in$the$transmembrane$domain$of$the$bovine$growth$

hormone$ receptor$ is$ associated$ with$ a$ major$ effect$ on$ milk$ yield$ and$

composition$Genetics$163$253ndash266$

Burton$PR$Clayton$DG$Cardon$LR$Craddock$N$Deloukas$P$Duncanson$A$

Kwiatkowski$ DP$ McCarthy$ MI$ Ouwehand$ WH$ Samani$ NJ$ Todd$

JA$ Donnelly$ P$ Barrett$ JC$ Burton$ PR$ Davison$ D$ Donnelly$ P$

Easton$ D$ Evans$ D$ Leung$ HBT$ Marchini$ JL$ Morris$ AP$ Spencer$

CCA$Tobin$MD$Cardon$LR$Clayton$DG$Attwood$AP$Boorman$JP$

Cant$B$Everson$U$Hussey$JM$Jolley$JD$Knight$AS$Koch$K$Meech$

E$ Nutland$ S$ Prowse$ CV$ Stevens$ HE$ Taylor$ NC$ Walters$ GR$

Walker$NM$Watkins$NA$Winzer$T$Todd$JA$Ouwehand$WH$Jones$

RW$ McArdle$ WL$ Ring$ SM$ Strachan$ DP$ Pembrey$ M$ Breen$ G$

Clair$ DS$ Caesar$ S$ GordonBSmith$ K$ Jones$ L$ Fraser$ C$ Green$ EK$

Grozeva$D$Hamshere$ML$Holmans$PA$Jones$IR$Kirov$G$Moskvina$

V$ Nikolov$ I$ OrsquoDonovan$ MC$ Owen$ MJ$ Craddock$ N$ Collier$ DA$

Elkin$A$Farmer$A$Williamson$R$McGuffin$P$Young$AH$Ferrier$IN$

Ball$SG$Balmforth$AJ$Barrett$JH$Bishop$DT$Iles$MM$Maqbool$A$

Yuldasheva$N$Hall$AS$Braund$PS$Burton$PR$Dixon$RJ$Mangino$

M$ Stevens$ S$ Tobin$ MD$ Thompson$ JR$ Samani$ NJ$ Bredin$ F$

Tremelling$ M$ Parkes$ M$ Drummond$ H$ Lees$ CW$ Nimmo$ ER$

Satsangi$ J$Fisher$SA$Forbes$A$Lewis$CM$Onnie$CM$Prescott$NJ$

Sanderson$J$Mathew$CG$Barbour$J$Mohiuddin$MK$Todhunter$CE$

Mansfield$JC$Ahmad$T$Cummings$FR$Jewell$DP$Webster$J$Brown$

MJ$ Clayton$DG$ Lathrop$GM$ Connell$ J$Dominiczak$A$ Samani$NJ$

Marcano$CAB$Burke$B$Dobson$R$Gungadoo$J$Lee$KL$Munroe$PB$

Newhouse$SJ$Onipinla$A$Wallace$C$Xue$M$Caulfield$M$Farrall$M$

Barton$A$ (braggs)$TB$ in$RG$and$G$Bruce$ IN$Donovan$H$Eyre$S$

69

$

Gilbert$ PD$ Hider$ SL$ Hinks$ AM$ John$ SL$ Potter$ C$ Silman$ AJ$Symmons$ DPM$ Thomson$ W$ Worthington$ J$ Clayton$ DG$ Dunger$DB$ Nutland$ S$ Stevens$ HE$ Walker$ NM$ Widmer$ B$ Todd$ JA$Frayling$TM$Freathy$RM$Lango$H$Perry$JRB$Shields$BM$Weedon$MN$ Hattersley$ AT$ Hitman$ GA$ Walker$ M$ Elliott$ KS$ Groves$ CJ$Lindgren$ CM$ Rayner$ NW$ Timpson$ NJ$ Zeggini$ E$ McCarthy$ MI$Newport$M$Sirugo$G$Lyons$E$Vannberg$F$Hill$AVS$Bradbury$LA$Farrar$ C$ Pointon$ JJ$ Wordsworth$ P$ Brown$ MA$ Franklyn$ JA$Heward$JM$Simmonds$MJ$Gough$SCL$Seal$S$(uk)$BCSC$Stratton$MR$Rahman$N$Ban$M$Goris$A$Sawcer$SJ$Compston$A$Conway$D$Jallow$ M$ Newport$ M$ Sirugo$ G$ Rockett$ KA$ Kwiatkowski$ DP$Bumpstead$ SJ$ Chaney$A$Downes$K$ Ghori$MJR$ Gwilliam$R$Hunt$SE$Inouye$M$Keniry$A$King$E$McGinnis$R$Potter$S$Ravindrarajah$R$ Whittaker$ P$ Widden$ C$ Withers$ D$ Deloukas$ P$ Leung$ HBT$Nutland$S$Stevens$HE$Walker$NM$Todd$JA$Easton$D$Clayton$DG$Burton$PR$Tobin$MD$Barrett$JC$Evans$D$Morris$AP$Cardon$LR$Cardin$NJ$Davison$D$Ferreira$T$PereiraBGale$ J$Hallgrimsdoacutettir$ IB$Howie$ BN$Marchini$ JL$ Spencer$ CCA$ Su$ Z$ Teo$ YY$ Vukcevic$ D$Donnelly$P$Bentley$D$Brown$MA$Cardon$LR$Caulfield$M$Clayton$DG$ Compston$ A$ Craddock$ N$ Deloukas$ P$ Donnelly$ P$ Farrall$ M$Gough$ SCL$ Hall$ AS$ Hattersley$ AT$ Hill$ AVS$ Kwiatkowski$ DP$Mathew$ CG$McCarthy$MI$ Ouwehand$WH$ Parkes$M$ Pembrey$M$Rahman$N$Samani$NJ$Stratton$MR$Todd$ JA$Worthington$ J$2007$GenomeBwide$ association$ study$ of$ 14000$ cases$ of$ seven$ common$diseases$ and$ 3000$ shared$ controls$ Nature$ 447$ 661ndash678$doi101038nature05911$

Cantor$ RM$ Lange$ K$ Sinsheimer$ JS$ 2010$ Prioritizing$ GWAS$ Results$ A$Review$ of$ Statistical$ Methods$ and$ Recommendations$ for$ Their$Application$Am$J$Hum$Genet$86$6ndash22$doi101016jajhg200911017$

Clark$ AG$ Hubisz$ MJ$ Bustamante$ CD$ Williamson$ SH$ Nielsen$ R$ 2005$Ascertainment$ bias$ in$ studies$ of$ human$ genomeBwide$ polymorphism$Genome$Res$15$1496ndash1502$doi101101gr4107905$

70

$

CohenBZinder$M$ Seroussi$ E$ Larkin$ DM$ Loor$ JJ$Wind$ AE$ der$ Lee$ JBH$Drackley$ JK$Band$MR$Hernandez$AG$Shani$M$Lewin$HA$Weller$JI$ Ron$ M$ 2005$ Identification$ of$ a$ missense$ mutation$ in$ the$ bovine$ABCG2$gene$with$a$major$effect$on$ the$QTL$on$ chromosome$6$affecting$milk$yield$and$composition$in$Holstein$cattle$Genome$Res$15$936ndash944$doi101101gr3806705$

Cole$JB$Wiggans$GR$Ma$L$Sonstegard$TS$Lawlor$TJ$Crooker$BA$Tassell$CPV$ Yang$ J$ Wang$ S$ Matukumalli$ LK$ Da$ Y$ 2011$ GenomeBwide$association$ analysis$ of$ thirty$ one$ production$ health$ reproduction$ and$body$ conformation$ traits$ in$ contemporary$ US$ Holstein$ cows$ BMC$Genomics$12$408$doi1011861471B2164B12B408$

Colitti$M$Wilde$CJ$Stefanon$B$2004$Functional$expression$of$bclB2$protein$family$and$AIF$in$bovine$mammary$tissue$in$early$lactation$J$Dairy$Res$71$20ndash27$

Consortium$ TEP$ 2012$ An$ integrated$ encyclopedia$ of$ DNA$ elements$ in$ the$human$genome$Nature$489$57ndash74$doi101038nature11247$

De$Mello$F$Cobuci$JA$Martins$MF$Silva$MVGB$Neto$JB$2012$Association$of$the$polymorphism$g8514CgtT$in$the$osteopontin$gene$(SPP1)$with$milk$yield$ in$ the$ dairy$ cattle$ breed$ Girolando$ Anim$ Genet$ 43$ 647ndash648$doi101111j1365B2052201102312x$

Dudbridge$ F$ Gusnanto$ A$ 2008$ Estimation$ of$ significance$ thresholds$ for$genomewide$ association$ scans$ Genet$ Epidemiol$ 32$ 227ndash234$doi101002gepi20297$

Elsik$ CG$ Tellam$ RL$ Worley$ KC$ 2009$ The$ Genome$ Sequence$ of$ Taurine$Cattle$ A$window$ to$ ruminant$ biology$ and$ evolution$ Science$ 324$ 522ndash528$doi101126science1169588$

Fazioli$ F$ Minichiello$ L$ Matoska$ V$ Castagnino$ P$ Miki$ T$ Wong$ WT$ Di$Fiore$ PP$ 1993$ Eps8$ a$ substrate$ for$ the$ epidermal$ growth$ factor$receptor$kinase$enhances$EGFBdependent$mitogenic$signals$EMBO$J$12$3799ndash3808$

Fontanesi$ L$ Calograve$ DG$ Galimberti$ G$ Negrini$ R$ Marino$ R$ Nardone$ A$AjmoneBMarsan$ P$ Russo$ V$ 2014$ A$ candidate$ gene$ association$ study$

71

$

for$ nine$ economically$ important$ traits$ in$ Italian$ Holstein$ cattle$ Anim$

Genet$nandashna$doi101111age12164$

Fowler$KJ$Walker$F$Alexander$W$Hibbs$ML$Nice$EC$Bohmer$RM$Mann$

GB$ Thumwood$ C$ Maglitto$ R$ Danks$ JA$ 1995$ A$ mutation$ in$ the$

epidermal$growth$factor$receptor$in$wavedB2$mice$has$a$profound$effect$

on$ receptor$ biochemistry$ that$ results$ in$ impaired$ lactation$ Proc$ Natl$

Acad$Sci$92$1465ndash1469$doi101073pnas9251465$

Gianola$ D$ Hospital$ F$ Verrier$ E$ 2013$ Contribution$ of$ an$ additive$ locus$ to$

genetic$variance$when$inheritance$is$multiBfactorial$with$implications$on$

interpretation$ of$ GWAS$ Theor$ Appl$ Genet$ 126$ 1457ndash1472$

doi101007s00122B013B2064B2$

Gillette$ M$ Bray$ K$ Blumenthaler$ A$ VargoBGogola$ T$ 2013$ P190B$ RhoGAP$

Overexpression$ in$ the$ Developing$Mammary$ Epithelium$ Induces$ TGFβB

dependent$ Fibroblast$ Activation$ PLoS$ ONE$ 8$ e65105$

doi101371journalpone0065105$

Grisart$B$Coppieters$W$Farnir$F$Karim$L$Ford$C$Berzi$P$Cambisano$N$

Mni$ M$ Reid$ S$ Simon$ P$ Spelman$ R$ Georges$ M$ Snell$ R$ 2002$

Positional$Candidate$Cloning$of$a$QTL$ in$Dairy$Cattle$ Identification$of$a$

Missense$Mutation$ in$the$Bovine$DGAT1$Gene$with$Major$Effect$on$Milk$

Yield$ and$ Composition$ Genome$ Res$ 12$ 222ndash231$

doi101101gr224202$

Hubbard$NE$Chen$QJ$Sickafoose$LK$Wood$MB$Gregg$ JP$Abrahamsson$

NM$ Engelberg$ JA$ Walls$ JE$ Borowsky$ AD$ 2013$ Transgenic$

Mammary$Epithelial$Osteopontin$(Spp1)$Expression$Induces$Proliferation$

and$ Alveologenesis$ Genes$ Cancer$ 4$ 201ndash212$

doi1011771947601913496813$

Hu$ ZBL$ Park$ CA$ Wu$ XBL$ Reecy$ JM$ 2013$ Animal$ QTLdb$ an$ improved$

database$tool$for$livestock$animal$QTLassociation$data$dissemination$in$

the$ postBgenome$ era$ Nucleic$ Acids$ Res$ 41$ D871ndashD879$

doi101093nargks1150$

Jeon$et$al$ B$2008$ B$ProtoBoncogene$FBIB1$ (PokemonZBTB7A)$Represses$Trpdf$

nd$

72

$

Jiang$ L$ Liu$ J$ Sun$ D$ Ma$ P$ Ding$ X$ Yu$ Y$ Zhang$ Q$ 2010$ Genome$Wide$

Association$ Studies$ for$ Milk$ Production$ Traits$ in$ Chinese$ Holstein$

Population$PLoS$ONE$5$e13661$doi101371journalpone0013661$

Kong$ PBZ$ Yang$ F$ Li$ L$ Li$ XBQ$ Feng$ YBM$ 2013$Decreased$FOXF2$mRNA$

Expression$ Indicates$ EarlyBOnset$ Metastasis$ and$ Poor$ Prognosis$ for$

Breast$ Cancer$ Patients$ with$ Histological$ Grade$ II$ Tumor$ PLoS$ ONE$ 8$

e61591$doi101371journalpone0061591$

Lee$SBU$Maeda$T$2012$POKZBTB$proteins$an$emerging$ family$of$proteins$

that$ regulate$ lymphoid$ development$ and$ function$ Immunol$ Rev$ 247$

107ndash119$

Lemay$ DG$ Lynn$ DJ$ Martin$ WF$ Neville$ MC$ Casey$ TM$ Rincon$ G$

Kriventseva$ EV$ Barris$ WC$ Hinrichs$ AS$ Molenaar$ AJ$ 2009$ The$

bovine$lactation$genome$ insights$ into$the$evolution$of$mammalian$milk$

Genome$Biol$10$R43$

Liu$JZ$Mcrae$AF$Nyholt$DR$Medland$SE$Wray$NR$Brown$KM$2010$A$

versatile$ geneBbased$ test$ for$ genomeBwide$ association$ studies$ Am$ J$

Hum$Genet$87$139$

Liu$S$Dontu$G$Mantle$ID$Patel$S$Ahn$N$Jackson$KW$Suri$P$Wicha$MS$

2006$Hedgehog$Signaling$and$BmiB1$Regulate$SelfBrenewal$of$Normal$and$

Malignant$ Human$ Mammary$ Stem$ Cells$ Cancer$ Res$ 66$ 6063ndash6071$

doi1011580008B5472CANB06B0054$

Lu$ J$ Holmgren$ A$ 2014$ The$ thioredoxin$ superfamily$ in$ oxidative$ protein$

folding$ Antioxid$ Redox$ Signal$ 140202091634000$

doi101089ars20145849$

Luna$ A$ Nicodemus$ KK$ 2007$ snpplotter$ an$ RBbased$ SNPhaplotype$

association$and$linkage$disequilibrium$plotting$package$Bioinforma$Oxf$

Engl$23$774ndash776$doi101093bioinformaticsbtl657$

McCarthy$ MI$ Abecasis$ GR$ Cardon$ LR$ Goldstein$ DB$ Little$ J$ Ioannidis$

JPA$ Hirschhorn$ JN$ 2008$ GenomeBwide$ association$ studies$ for$

complex$traits$consensus$uncertainty$and$challenges$Nat$Rev$Genet$9$

356ndash369$doi101038nrg2344$

Meredith$ BK$ Kearney$ FJ$ Finlay$ EK$ Bradley$ DG$ Fahey$ AG$ Berry$ DP$

Lynn$ DJ$ 2012$ GenomeBwide$ associations$ for$ milk$ production$ and$

73

$

somatic$cell$score$in$HolsteinBFriesian$cattle$in$Ireland$BMC$Genet$13$21$doi1011861471B2156B13B21$

Minozzi$ G$Nicolazzi$ EL$ Stella$ A$ Biffani$ S$Negrini$ R$ Lazzari$ B$ AjmoneBMarsan$ P$Williams$ JL$ 2013$ Genome$Wide$ Analysis$ of$ Fertility$ and$Production$ Traits$ in$ Italian$ Holstein$ Cattle$ PLoS$ ONE$ 8$ e80219$doi101371journalpone0080219$

Nicolazzi$EL$Picciolini$M$Strozzi$F$Schnabel$RD$Lawley$C$Pirani$A$Brew$F$ Stella$ A$ 2014$ SNPchiMp$ a$ database$ to$ disentangle$ the$ SNPchip$jungle$ in$ bovine$ livestock$ BMC$ Genomics$ 15$ 123$ doi1011861471B2164B15B123$

Purcell$ S$ Neale$ B$ ToddBBrown$ K$ Thomas$ L$ Ferreira$ MAR$ Bender$ D$Maller$J$Sklar$P$de$Bakker$PIW$Daly$MJ$Sham$PC$2007$PLINK$a$tool$ set$ for$ wholeBgenome$ association$ and$ populationBbased$ linkage$analyses$Am$J$Hum$Genet$81$559ndash575$doi101086519795$

Qiu$ S$ Lai$ L$ 2013$ Antibacterial$ properties$ of$ recombinant$ human$ nonBpancreatic$secretory$phospholipase$A2$Biochem$Biophys$Res$Commun$441$453ndash456$doi101016jbbrc201310092$

Quinlan$AR$Hall$IM$2010$BEDTools$a$flexible$suite$of$utilities$for$comparing$genomic$ features$ Bioinformatics$ 26$ 841ndash842$doi101093bioinformaticsbtq033$

RamiacuterezBTorres$ A$ BarceloacuteBBatllori$ S$ MartiacutenezBBeamonte$ R$ Navarro$ MA$Surra$ JC$ Arnal$ C$ Guilleacuten$N$ Aciacuten$ S$ Osada$ J$ 2012$ Proteomics$ and$gene$ expression$ analyses$ of$ squaleneBsupplemented$ mice$ identify$microsomal$thioredoxin$domainBcontaining$protein$5$changes$associated$with$ hepatic$ steatosis$ J$ Proteomics$ 77$ 27ndash39$doi101016jjprot201207001$

Raven$ LBA$ Cocks$ BG$ Pryce$ JE$ Cottrell$ JJ$ Hayes$ BJ$ 2013$ Genes$ of$ the$RNASE5$pathway$ contain$ SNP$ associated$with$milk$ production$ traits$ in$dairy$cattle$Genet$Sel$Evol$45$25$doi1011861297B9686B45B25$

Scotti$E$Fontanesi$L$Schiavini$F$La$Mattina$V$Bagnato$A$Russo$V$2010$DGAT1$ pK232A$ polymorphism$ in$ dairy$ and$ dual$ purpose$ Italian$ cattle$breeds$Ital$J$Anim$Sci$9$doi104081ijas2010e16$

74

$

Stefanon$ B$ Colitti$ M$ Gabai$ G$ Knight$ CH$ Wilde$ CJ$ 2002$ Mammary$

apoptosis$and$lactation$persistency$in$dairy$animals$J$Dairy$Res$69$37ndash

52$

Stegh$ AH$ DePinho$ RA$ 2011$ Beyond$ effector$ caspase$ inhibition$ Bcl2L12$

neutralizes$ p53$ signaling$ in$ glioblastoma$ Cell$ Cycle$ 10$ 33ndash38$

doi104161cc10114365$

Taberlet$ P$ Valentini$ A$ Rezaei$ HR$ Naderi$ S$ Pompanon$ F$ Negrini$ R$

AjmoneBMarsan$ P$ 2008$ Are$ cattle$ sheep$ and$ goats$ endangered$

species$Mol$Ecol$17$275ndash284$doi101111j1365B294X200703475x$

Team$RC$ 2014$ R$ A$ Language$ and$Environment$ for$ Statistical$ Computing$ R$

Foundation$for$Statistical$Computing$Vienna$Austria$

Thomadaki$H$ Talieri$M$ Scorilas$ A$ 2007$ Prognostic$ value$ of$ the$ apoptosis$

related$genes$BCL2$and$BCL2L12$in$breast$cancer$Cancer$Lett$247$48ndash

55$doi101016jcanlet200603016$

Valdehita$ A$ Bajo$ AM$ FernaacutendezBMartiacutenez$ AB$ Arenas$ MI$ Vacas$ E$

Valenzuela$ P$ RuiacutezBVillaespesa$ A$ Prieto$ JC$ Carmena$ MJ$ 2010$

Nuclear$ localization$ of$ vasoactive$ intestinal$ peptide$ (VIP)$ receptors$ in$

human$ breast$ cancer$ Peptides$ 31$ 2035ndash2045$

doi101016jpeptides201007024$

Wang$T$Liu$NS$Seet$LBF$Hong$W$2010$The$Emerging$Role$of$VHS$DomainB

Containing$Tom1$Tom1L1$and$Tom1L2$in$Membrane$Trafficking$Traffic$

11$1119ndash1128$doi101111j1600B0854201001098x$

Wang$ X$Wurmser$ C$ Pausch$ H$ Jung$ S$ Reinhardt$ F$ Tetens$ J$ Thaller$ G$

Fries$R$2012$Identification$and$Dissection$of$Four$Major$QTL$Affecting$

Milk$Fat$Content$in$the$German$HolsteinBFriesian$Population$PLoS$ONE$7$

e40711$doi101371journalpone0040711$

Xu$S$2003$Theoretical$Basis$of$the$Beavis$Effect$Genetics$165$2259ndash2268$

Zimin$AV$Delcher$AL$Florea$L$Kelley$DR$Schatz$MC$Puiu$D$Hanrahan$

F$ Pertea$ G$ Tassell$ CPV$ Sonstegard$ TS$ Marccedilais$ G$ Roberts$ M$

Subramanian$ P$ Yorke$ JA$ Salzberg$ SL$ 2009$ A$ wholeBgenome$

assembly$ of$ the$ domestic$ cow$ Bos$ taurus$ Genome$ Biol$ 10$ R42$

doi101186gbB2009B10B4Br42$

75

$

Zu$X$Ma$J$Liu$H$Liu$F$Tan$C$Yu$L$Wang$J$Xie$Z$Cao$D$Jiang$Y$2011$ProBoncogene$ Pokemon$ promotes$ breast$ cancer$ progression$ by$upregulating$ survivin$ expression$ Breast$ Cancer$ Res$ 13$ R26$doi101186bcr2843$ $

76

$

Supplementaryfiles$

You$can$find$Chapter$͵$supplementary$files$at$

- DocTa$(Doctoral$Thesis$Archive)$(httptesionlineunicattit)$

- Google$Drive$(httptinyurlcomPhDMMChapter03)$or$scan$the$QR$code$

$

$

SupplementaryFigureS1Multidimensionalscaling

Spatial$distribution$of$genetic$structure$in$the$analyzed$breeds$$

$

Supplementary$FigureS2GWASplot

Manhattan$plots$and$QQBplots$respectively$for$Fat$percentage$Milk$yield$and$

Protein$percentage$in$the$Italian$Brown$

$

Supplementary$FigureS3GWASplot

Manhattan$plots$and$QQBplots$respectively$for$Fat$percentage$Milk$yield$and$

Protein$percentage$in$the$Italian$Holstein$

$

Supplementary$FigureS4GWASplot

Manhattan$plots$and$QQBplots$respectively$for$Fat$percentage$Milk$yield$and$

Protein$percentage$in$the$HolsteinDGAT$

$

Supplementary$FigureS5GWASplot

Manhattan$plots$and$QQBplots$respectively$for$Fat$percentage$Milk$yield$and$

Protein$percentage$in$the$Italian$Simmental$

77

$

$Supplementary$FigureS6LDdecayGenomeBwide$LD$decay$for$the$three$breeds$(Holstein$Simmental$Brown)$$Supplementary$FigureS7DescriptiveplotsDescriptive$statistics$for$the$Italian$Brown$Italian$Holstein$and$Italian$Simmental$$Supplementary$FigureS8DGAT1LDFocus$on$the$LD$of$the$DGAT1$region$in$all$the$datasets$(Brown$Holstein$HolsteinDGAT$Simmental)$$Supplementary$TableS1SNPfactNumber$of$subjects$and$SNPs$after$QC$$Supplementary$TableS2GeneBasedAssociationresultsFull$results$of$the$significant$genes$associated$with$the$traits$in$all$the$datasets$$ Sheet$1$Brown$ndash$fat$percentage$$ Sheet$2$Brown$ndash$milk$yield$$ Sheet$3$Brown$ndash$protein$percentage$$ Sheet$4$Holstein$ndash$fat$percentage$$ Sheet$5$Holstein$ndash$milk$yield$$ Sheet$6$Holstein$ndash$protein$percentage$$ Sheet$7$HolsteinDGAT$ndash$fat$percentage$$ Sheet$8$HolsteinDGAT$ndash$milk$yield$$ Sheet$9$HolsteinDGAT$ndash$protein$percentage$$ Sheet$10$Simmental$ndash$fat$percentage$$ Sheet$11$Simmental$ndash$milk$yield$$ Sheet$12$Simmental$ndash$protein$percentage$$Supplementary$TableS3SingleSNPAssociationresultsFull$results$of$the$SNP$that$overcome$the$suggestive$threshold$$$ Sheet$1$Brown$ndash$fat$percentage$

78

$

$ Sheet$2$Brown$ndash$milk$yield$$ Sheet$3$Brown$ndash$protein$percentage$$ Sheet$4$Holstein$ndash$fat$percentage$$ Sheet$5$Holstein$ndash$milk$yield$$ Sheet$6$Holstein$ndash$protein$percentage$$ Sheet$7$HolsteinDGAT$ndash$fat$percentage$$ Sheet$8$HolsteinDGAT$ndash$milk$yield$$ Sheet$9$HolsteinDGAT$ndash$protein$percentage$$ Sheet$10$Simmental$ndash$fat$percentage$$ Sheet$11$Simmental$ndash$milk$yield$$ Sheet$12$Simmental$ndash$protein$percentage$$Supplementary$TableS4GenesandtraitsGeneWise$pBvalues$of$the$discussed$genes$in$all$the$analysis$In$grey$are$evidenced$the$pBvalues$that$overcome$the$significance$threshold$Genes$are$alphabetically$listed$$Supplementary$TableS5GenesinQTLsGenes$from$the$geneBbased$association$mapping$into$known$QTLs$$$$$

79

CHAPTER(4(

MUGBAS(a(species(free(gene(based(association(suite(

S(Capomaccio1(M(Milanesi1(L(Bomba1(et(al(1IstitutodiZootecnicaUniversitagraveCattolicadelSacroCuoreviaEmiliaParmense84Piacenza29122ItalyTheseauthorscontributedequallytothiswork

____________________ApplicationnotesubmittedtoBioinformatics

Abstract(Theassociationstudiesbetweengenomewidemarkersandphenotypes(GWAS)

arenowwidelyusedinmodelandnonmodelspeciesHowevertoolsthatmerge

singlemarkerresultsandtheirprobabilityofbeingassociatedtofunctionalunits

(typicallygenes)areseldomdevelopedforspeciesotherthanhuman

Herewe presentMUGBAS a suite that calculates a generegionbased pvalue

fromagivenannotationsinglemarkerGWAresultandgenotypedatawiththe

objective to ease postGWAS analysis The software is species and annotation

independentfasthighlyparallelizedandreadyforhighdensitymarkerstudies

Availabilityandimplementationhttpsbitbucketorgcapemastermugbas

Introduction(With the availability of highdensity SNP panels Genome Wide Association

Studies (GWAS) have become the gold standard for dissecting the biology of

complextraits(McCarthyetal2008)Thisistrueforhumansandotherspecies

TheunderlyingideaofGWASissimplefindmarkertraitassociationsexploiting

the linkage disequilibrium (LD) that exists between the causative mutation ndash

whichweignorendashandoneormoremarkersofknownchromosomalposition

Whileanumberofwellestablishedapproachescanbeeasilyusedinastandard

GWAS (Aulchenko et al 2007 Kang et al 2010 Purcell et al 2007) to find

marker(s) associated to a particular phenotype several issues have to be

accountedforduringtheupanddownstreamanalyses

Firstly genome wide studies have to take into account multiple testing If

stringent statistical thresholds are applied a proportion ofmoderate or small

effectsactuallyassociatedwiththetraitwillbedisregardedThereforereducing

the number of tests while maintaining all the information provided by high

densitypanelswouldbeaclearadvantageInadditiononceasignificantsignalis

detected different strategiesmaybeused to tag the candidate causative gene

Sometimesthegeneclosertothepeaksignalisselectedinothercasesallgenes

withinaflankingregionofanarbitrarysizeofthesignificantSNP(s)areincluded

82

infurtheranalysesThesechoicesmayendupinbiasresultseitherretaininga

largenumberofnottrulyassociatedgenesordisregardingthecorrectones

Somemethodshaverecentlybeenproposedtoovercometheselimitationsusing

aprioriknowledgeastheldquocandidatepathwayrdquoanalysis(Ravenetal2013)and

the ldquogenebasedrdquo association strategies (Akula et al 2011 Cantor et al 2010

Liuetal2010)TheseapproachesrestrictGWAstudiesonlytoknowngenesor

genesubsetsresizingthemultipletestingproblemandincreasingthepowerof

theanalyses

Usually bioinformatics pipelines are developed for the analysis of human and

modelorganismsandarenotadaptedtononmodelorganismThisisthecaseof

the VEGAS software (Liu et al 2010 Yang et al 2014) developed only for

humansandonlyforgenebasedanalyses

Here we introduce MUGBAS [MUlti species GeneBased Association Suite] a

softwarebasedonVEGASthatusesGWASdataandagivenannotation(typically

genesbutanyregionmaybesuitable)toestimategenebasedandregionbased

associationpvaluesThesuite is speciesandannotation free ready toanalyze

highdensitydatafromanyexperimentaldesigns

Methods(MUGBASisasuiteofscriptsbuiltinBashRPythonandPerlThesuiterequires

several prerequisites that can be easily fulfilled in a LinuxUnixMac

environmentwith a few command lines Further information on how tomeet

theserequirementsaredetailedintheonlinerepository

MUGBAS suite can be invoked launching the Bash wrapper that checks

dependenciesandstoresuserchoices

Required input files are MAPPED files in PLINK format with genotype data

from(atleast200individualsarerecommendedforaproperLDcalculation)an

annotation file inBED format (chromosome startingposition endingposition

nameandusercustomfields)ofthedesiredgenomefeatureandtheGWASresult

file with mandatory headers ldquoSNPNAMErdquo and ldquoPVALUErdquo Multiple association

resultscanbetestedatthesametime ifprovidedinseparatefiles inthesame

directoryAtrialdatasetisdownloadablealongwiththesoftwarereleasePlease

83

notethatforsimplicityweprovideageneannotationbutanyBEDfilewithany

desiredfeaturecanbeused

The program starts the analysis assigning SNPs to any feature of the given

annotationusingbedtools(QuinlanandHall2010)Thisstepiscustomizableby

user defined boundaries (up and downstream) in order to catch regulatory

regionsthatcouldbeinhighLDwiththecurrentgeneregion

Inordertospeeduptheanalysistheprogramsubsamples200individualsfrom

theinputdatasetWethenimplementedtwolevelsofparallelizationoneforthe

LDandonefortheldquogenewiserdquoorldquoregionwiserdquopvaluecalculationThelatteris

particularlyusefulanalyzingmorethanoneGWAresultsfromthesamedataset

BrieflyMUGBASfirstsplitstheLDcalculationinncores(asdefinedbytheuser)

using the foreach doParallel packages (Analytics andWeston 2014a 2014b)

and then distributes traits across cores using GNU Parallel and foreach

doParallel

SinceLDcalculationisaslowprocessVEGASbenefitsofHapMapphase2datato

speed up the processwhile preserving the option for a user defined (human)

population using PLINK In order to expand these analysis to any species LD

calculationinMUGBASisimplementedwithther2fast()functionfromGenABEL

package(Aulchenkoetal2007)whilethegenewisepvalueiscalculatedasin

theVEGASapproachBrieflythegenewiseteststatisticisthesumofthenupper

tailchisquaredvaluesofaSNPsubsetwithonedegreeoffreedom(wheren is

thenumberofSNPmappinginthegivengene)Withperfectlinkageequilibrium

the genewise pvalue would be the onetailed pvalue of a chisquare

distributionwithndegreesof freedomOtherwisemorelikelythedistribution

of the pvalues has to be evaluated (with simulations) andweighted (with LD

values)(Liuetal2010)

Once every generegion has its ldquogenewiserdquo pvalue a FDR qvalue (False

Discovery Rate) statistics is calculated with the base R function padjust()

Results are stored and enrichedwith ldquomanhattanrdquo and ldquoundergroundrdquo plots of

theldquogenewiserdquopvalueandtheqvaluerespectively

84

Results(MUGBAS output is generegionoriented with useful information about the

location of the analyzed feature the ldquoBest SNPrdquo within that feature with its

genomiccoordinatesandtheoriginalpvaluethegeneorregionwisestatistics

(pvalue number of simulations and qvalue) the custom annotation of that

particularentryandthepositionoftheSNPwithrespecttothefeatureandthe

decidedboundariesAll this informationareuseful to track theeffectofhighly

significant markers that could map into multiple region of interest and

effectively pinpoint the most probable location to focus on with data mining

processes

A typical analysis performed with a highdensity panel (gt600K markers)

distributed in 10 midperformance cores (Intel Xeon X5675 307 GHz) on

threetraitssimultaneouslytakesaboutfourhours

Discussion(Generegionbased methods are now widely used and accepted in genomic

research However non modelspecies suffer for the lack of availability of

specifictoolstoenhancediscoveryfromgenomewidedataMUGBASprovidesa

robust and accepted statistics to researchers dealing with species other than

humantoeasilyjumpfromassociatedmarkerstothedesiredfunctionalunits

Acknowledgements(((WewouldliketothankDrJimmyLiuforhisadvicesinbuildingthesuite

Reference(Akula N Baranova A Seto D Solka J NallsMA Singleton A Ferrucci L

TanakaTBandinelli SChoYSKimYJLee JYHanBGBipolar

Disorder Genome Study (BiGS) Consortium The Wellcome Trust Case

85

ControlConsortiumMcMahonFJ2011ANetworkBasedApproachtoPrioritize Results from GenomeWide Association Studies PLoS ONE 6e24220doi101371journalpone0024220

Analytics R Weston S 2014a doParallel Foreach parallel adaptor for theparallelpackage

AnalyticsRWestonS2014bforeachForeachloopingconstructforRAulchenkoYSdeKoningDJHaleyC2007GenomewideRapidAssociation

Using Mixed Model and Regression A Fast and Simple Method ForGenomewidePedigreeBasedQuantitativeTraitLociAssociationAnalysisGenetics177577ndash585doi101534genetics107075614

Cantor RM Lange K Sinsheimer JS 2010 Prioritizing GWAS Results AReview of Statistical Methods and Recommendations for TheirApplicationAmJHumGenet866ndash22doi101016jajhg200911017

KangHMSulJHServiceSKZaitlenNAKongSFreimerNBSabattiCEskin E 2010 Variance component model to account for samplestructure in genomewide association studies Nat Genet 42 348ndash354doi101038ng548

LiuJZMcraeAFNyholtDRMedlandSEWrayNRBrownKM2010Aversatile genebased test for genomewide association studies Am JHumGenet87139

McCarthy MI Abecasis GR Cardon LR Goldstein DB Little J IoannidisJPA Hirschhorn JN 2008 Genomewide association studies forcomplextraitsconsensusuncertaintyandchallengesNatRevGenet9356ndash369doi101038nrg2344

Purcell S Neale B ToddBrown K Thomas L Ferreira MAR Bender DMallerJSklarPdeBakkerPIWDalyMJShamPC2007PLINKatool set for wholegenome association and populationbased linkageanalysesAmJHumGenet81559ndash575doi101086519795

QuinlanARHallIM2010BEDToolsaflexiblesuiteofutilitiesforcomparinggenomic features Bioinformatics 26 841ndash842doi101093bioinformaticsbtq033

86

Raven LA Cocks BG Pryce JE Cottrell JJ Hayes BJ 2013 Genes of the

RNASE5pathway contain SNP associatedwithmilk production traits in

dairycattleGenetSelEvol4525doi101186129796864525

87

CHAPTER5

Imputationaccuracyisrobusttocattlereferencegenomeupdates

MMilanesi1etal1IstitutodiZootecnicaUniversitagraveCattolicadelSacroCuoreviaEmiliaParmense84Piacenza29122Italy

____________________ShortCommunicationacceptedinAnimalGenetics

SummaryGenotype imputation is routinely applied in a large number of cattle breeds Imputation

hasbecomeaneedduetothelargenumberofSNParrayswithvariabledensity(currently

from2900to777962SNPs)Althoughmanyauthorshavestudiedtheeffectofdifferent

statisticalmethodsonimputationaccuracytheimpactofa(likely)changeinthereference

genomeassemblyonimputationfromlowertohigherdensityhasnotbeendeterminedso

far In this work 1021 Italian Simmental SNP genotypes were reJmapped on the three

mostrecentreferencegenomeassembliesFour imputationmethodswereusedtoassess

theimpactofanupdateinthereferencegenomeAsexpectedthefourmethodsbehaved

differentlywith largedifferences in termsof accuracyUpdatingSNPcoordinateson the

three tested cattle reference genome assemblies determined only a slight variation on

imputationresultswithinmethod

KeywordsImputationreferencegenomeassemblySNPbovine

MaintextFromthepublicationof the firstbovinesinglenucleotidepolymorphism(SNP)array the

wholegenomicscenarioofthisspeciesevolvedrapidlyNowadaysseveralSNPpanelshave

been designed with a range of densities mapping on different reference genome

assemblies and produced by different manufacturers When running analyses with SNP

arrays of different densities or from different commercial companies (eg Illumina and

Affymetrix) an imputation step is often required The impact of different statistical

methods on imputation accuracy has already been assessed (Pei et al 2008) (Marchini

andHowie2010)(Maetal2013)Surprisinglyuptonowtherearenotavailablestudies

investigating the impact of an update on the reference genome assembly on imputation

accuracy This assessment would be of fundamental importance considering that reJ

mappingSNPsdeterminesarearrangementofthehaplotypestructureofthegenotypes(in

somecasesSNPsarereJmappedtoadifferentchromosome)andthatthebovinereference

assembly is likely toberepeatedlyupdatedover timeThisworkwasaimedatassessing

the effect of different reference genome assemblies in cattle As test casewe compared

four different imputation methods from low (Illumina BovineLDv10 Beadchip) to

90

mediumhigh (Illumina BovineSNP50v2 Beadchip) density in the Italian Simmental

populationForeachmethodweassessedthe impacton imputationaccuracyofmapping

SNPs on the BTAU42 BTAU46 (wwwhgscbcmeduotherJmammalsbovineJgenomeJ

project)orUMD31(Ziminetal2009)cattlereferencegenomeassemblies

Atotalof900and121SimmentalbullsweregenotypedwiththeIlluminaBovineSNP50v1

and v2 Beadchips (Illumina Inc San Diego CA) respectively Illumina BovineSNP50v2

Beadchip (hereinafter named 54k) the official reference chip for the Italian Simmental

breedersassociation(ANAPRI)wasconsideredasreferenceSNPpanelTheSNPchiMpv1

tool(Nicolazzietal2014)wasusedtoobtainSNPcoordinatesonBTAU42UMD31and

BTAU46genomeassembliesForeachassemblySNPsnotcontainedinthe54kunmapped

orinthesexualchromosomeswerediscardedThefinaldatasetcontained5118952886

and51290SNPsinBTAU42UMD31andBTAU46respectively

FourimputationmethodswereconsideredPedImpute(Nicolazzietal2013)FindHapv2

(VanRadenetal2011)FImpute(Sargolzaeietal2014)andBeaglev332(Browningand

Browning2007)Thefirstthreeusepedigreeandpopulation(ie linkagedisequilibrium

LD) informationwhereas the fourth uses only LD information Performance of the four

methods testedacrossgenomicassemblieswas comparedusingwrongly imputedalleles

(ldquoerrorsrdquo)andallelicsquaredcorrelation(ldquoallelicJR2rdquo)statisticsAllelicJR2wasdefined

asthesquaredcorrelationbetweenimputedandtrue(eggenotyped)allelesmultipliedby

100 as in Browning amp Browning (2009) The 878 oldest sires were used as training

population whereas the 143 youngest bulls were used as test population SNPs not in

commonwith the lowdensity panelwere set tomissing in the test population The test

population was split in four scenarios individuals with sire and maternal grandsire

genotypedinthereferencepopulation(scenarioAJ84animals)individualswithonlythe

sire genotyped (scenario B J 34 animals) individuals with only the maternal grandsire

genotyped(scenarioCJ13animals)ornoneofcloserelativesgenotyped(scenarioDJ12

animals)Thepopulationusedwashighlystructuredwithnearlyhalfofthebullspresent

in the dataset (490) sired by 115 genotyped sires In addition 35 out of these 115

genotypedsiresweresiredby22bulls(eggrandsires)alsopresentinthedataset

DifferencesacrossassembliesarenothomogeneousFromBTAU42toBTAU4646SNPs

weremapped to different chromosomes and the average difference in physical position

withinchromosomeswasnearly025MbpHowevertherewasnosubstantialdifferencein

the order of markers In fact not considering unmapped SNPs on either of the

aforementionedassembliesadifferenceinthesequentialorderwasobservedforonly134

91

single (ie spread throughout the genome) SNPs On the contrary from UMD31 to

BTAU46 222 SNPsweremapped to different chromosomes and amuch larger average

differenceinSNPsphysicalpositionwasobserved(iemorethan2Mbp)Alsothenumber

ofSNPswithdifferentsequentialorderwasmuchlarger(ie1893SNPs)Howevermore

than50oftherearrangementsinvolvedblocksfrom3tonearly100markersAsaresult

slight differences in overall imputation accuracy from BTA46 to UMD31 rather than

between the two BTAU assemblies were expected across methods On the contrary if

rearrangements are due to issues in the assembly of scaffolds theymight have a direct

(negative) impact on local imputation performances In fact a preliminary analysis on

BTA1consideringlocalerroracrossassembliesandmethodssuggestatleasttwosuch

SNPsclustersat44and138MbponUMD31(FiguresS1S2andS3)

Considering allmethods and scenarios amaximum average variation of 02 inerrors

(Table1)andof09inallelicJR2(Table2)wasobservedacrossreferenceassembliesThe

standarddeviation(SD)oftheimputationaccuracywasstableacrossreferenceassemblies

withinmethodsOnthecontraryalargervariabilitywasobservedacrossmethods

PedImpute and FindHap showed a relatively large average allelicJR2 variability across

scenarios evaluated over the same reference assembly Only PedImpute showed a large

variation in SD across scenarios in errors and allelicJR2 with high variability on

scenarios C and D showing a low performance of thismethodwhen there are no close

relatives(egsiredam)genotypedinthereferencedatasetFImputeinsteadachievedthe

highestaccuracyacrossallreferencegenomesandscenariossurprisinglyobtainingitsbest

resultsJerrorsof14(06SD)andallelicJR2of955(26SD)JinscenarioCwhereonly

onerelativelycloserelativeisgenotyped(maternalgrandJsire)FImputeappeartobeable

todetectshorterhaplotypesfromdistantrelativesmoreaccuratelythananyothermethod

tested in this study thus it is lessdependenton theavailabilityofpedigree information

thantheothertwopedigreeJbasedmethods(Sargolzaeietal2010)

The large variability across methods within assembly was expected although the

differences obtained among PedImpute FindHap and Beagle were larger in this study

comparedtoasimilarstudyinItalianHolstein(Nicolazzietal2013)Thiscoulddependon

a number of factors First the sample size of the test population is small (especially for

groups C and D where less than 15 individuals were present) Second the interaction

between population structure and imputationmethodsmay have a role PedImpute and

FindHap rely much more heavily on pedigree information than FImpute and Beagle

assumesall individualsareunrelatedHowever the lattertwomethods indirectlybenefit

92

greatly of from highly structured populations with large number of relatives ndash close or

distantndashpresent inthereferencepopulation(Sargolzaeietal2014)Anotherinteresting

factor that influences the imputation accuracy may be the persistence of LD at long

distances this is much lower in the Italian Simmental than in the Italian Holstein

population(FigureS4)Informationonrelatives(eitherpedigreeinformationorgenotyped

relativesinthereferencepopulation)isexpectedtoplayabiggerroleifLDisconservedat

shorter distances This hypothesis is supported by the fact that although there is high

variability acrossmethods better performances of the pedigreeJbasedmethods are still

observedinscenariosAandB(andCforFImpute)

Theslightdifferencesinimputationaccuracyobservedacrossreferenceassembliesmaybe

explained by the aforementioned small variation in the SNP blocks across reference

assemblies irrespectively of the rearrangements of single SNPs Considering that SNP

blocksaremostlymaintainedweexpectthatbreedswithlargerLDpersistenceatlonger

distances (ie Holstein Brown) will obtain even better imputation accuracy across

assembliescomparedtowhatreportedinthisstudyinItalianSimmentalHoweverastill

relativelyunexploredaspectof imputationperformance is thedistributionof imputation

errors across the genomeNot considering genotyping errors (which are expected to be

randomly distributed in the genome) a cluster of consecutive markers with high

percentage of imputation error across methods are most probably identifying missJ

assembledregions in thegenome(FiguresS1S2andS3)Althoughthe impactonoverall

imputation accuracy is limited as results in this study show these errorsmight have a

large impact on any analyses relying on specific genomic coordinates (eg genomeJwide

associationstudies)Morecomprehensiveresultsincludingannotationstudiesandwhole

genomesequencingdataarerequiredtoconfirmandinvestigatetheseobservations

Toconcludeonlyasmalleffectofupdatingthereferencegenomeassemblyonimputation

accuracywas observed in Italian Simmental On the other hand as already reported by

other studies the variability of imputation accuracy estimates is much larger across

imputation methods The choice of the most suitable imputation method based on the

breedgeneticbackgrounditscharacteristicsandontheavailabledataisthereforecritical

AcknowledgementsThe research leading to these results has received funding from the European Unions

Seventh Framework Programme (FP72007J2013) under grant agreement ndeg 289592 ndash

93

Gene2FarmPartofthegenotypicdatausedinthisstudywasprovidedbyprojectldquoSelMolrdquo

fundedbytheItalianMinistryofAgriculture(MiPAAF)MMwassupportedbytheDoctoral

SchoolontheAgroJFoodSystem(Agrisystem)of theUniversitagraveCattolicadelSacroCuore

(Italy) FB was supported by the Marie Curie European Reintegration Grant

ldquoNEUTRADAPTrdquo(FP7JPEOPLEJ2009JRG)

ReferencesBrowningSRBrowningBL2007RapidandAccurateHaplotypePhasingandMissingJ

Data Inference for WholeJGenome Association Studies By Use of Localized

HaplotypeClusteringAmJHumGenet811084ndash1097

MaPBroslashndumRFZhangQLundMSSuG2013Comparisonofdifferentmethods

forimputinggenomeJwidemarkergenotypesinSwedishandFinnishRedCattleJ

DairySci964666ndash4677doi103168jds2012J6316

Marchini J Howie B 2010 Genotype imputation for genomeJwide association studies

NatRevGenet11499ndash511doi101038nrg2796

NicolazziELBiffaniSJansenG2013ShortcommunicationImputinggenotypesusing

PedImputefastalgorithmcombiningpedigreeandpopulationinformationJournal

ofDairyScience962649ndash2653doi103168jds2012J6062

NicolazziELPiccioliniMStrozziFSchnabelRDLawleyCPiraniABrewFStella

A 2014 SNPchiMp a database to disentangle the SNPchip jungle in bovine

livestockBMCGenomics15123doi1011861471J2164J15J123

PeiYLiJZhangLPapasianCJDengH2008Analysesandcomparisonofaccuracyof

differentgenotypeimputationmethodsPLoSONE3551

VanRadenPMOrsquoConnellJRWiggansGRWeigelKA2011Genomicevaluationswith

manymoregenotypesGenetSelEvol43

94

Tableamp1ampIm

putationampaccuracyampacrossampassem

bliesampforampPedImputeampFindH

apampFImputeampandampBeagleamppercentageampofampwronglyampim

putedamp

allelesamp

PedImputeamp

FindHapamp

FImputeamp

Beagleamp

BTAU

amp42amp

UMDamp31amp

BTAU

amp46amp

BTAU

amp42amp

UMDamp31amp

BTAU

amp46amp

BTAU

amp42amp

UMDamp31amp

BTAU

amp46amp

BTAU

amp42amp

UMDamp31amp

BTAU

amp46amp

Scenarioamp

Namp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

A1amp

84

20

07

21

07

21

07

32

09

32

09

32

09

15

09

15

09

15

09

24

11

25

11

25

11

B2amp

34

23

08

23

09

22

08

36

11

36

12

36

11

15

07

15

09

14

06

23

08

23

08

23

08

C3amp

13

98

41

98

41

98

41

51

10

52

10

51

10

14

06

14

06

14

06

23

06

23

06

24

06

D4 amp

12

88

49

89

49

88

49

54

11

56

12

54

11

16

11

17

12

16

11

26

15

26

15

26

14

1 Sireandm

aternalgrandsiregenotyped

2 Siregenotypedmate

rnalgrandsireno

tgenotyped

3 Sireno

tgenotypedmate

rnalgrandsiregenotyped

4 Sireandm

aternalgrandsireno

tgenotyped

Tableamp2ampIm

putationampaccuracyampacrossampassem

bliesampforampPedImputeampFindH

apampFImputeampandampBeagleampallelicJR2amp

PedImputeamp

FindHapamp

FImputeamp

Beagleamp

BTAU

amp42amp

UMDamp31amp

BTAU

amp46amp

BTAU

amp42amp

UMDamp31amp

BTAU

amp46amp

BTAU

amp42amp

UMDamp31amp

BTAU

amp46amp

BTAU

amp42amp

UMDamp31amp

BTAU

amp46amp

Scenarioamp

Namp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

A1amp

84

941

20

940

20

941

20

907

24

908

25

907

24

952

26

952

27

952

27

925

33

923

33

925

33

B2amp

34

935

24

934

25

935

24

896

30

896

34

895

32

954

20

954

21

954

20

929

24

928

25

928

24

C3amp

13

759

89

761

88

758

87

848

33

845

30

848

32

955

18

955

19

955

19

931

19

928

19

927

19

D4 amp

12

782

113

777

113

781

112

841

32

832

35

839

33

949

34

947

38

949

35

921

44

919

45

921

44

1 Sireandm

aternalgrandsiregenotyped

2 Siregenotypedmate

rnalgrandsireno

tgenotyped

3 Sireno

tgenotypedmate

rnalgrandsiregenotyped

4 Sireandm

aternalgrandsireno

tgenotyped

95

Supplementaryfiles

YoucanfindChapterͷsupplementaryfilesat

- DocTa(DoctoralThesisArchive)(httptesionlineunicattit)

- GoogleDrive(httptinyurlcomPhDMMChapter05)orscantheQRcode

SupplementaryFigureS1ErrorpercentageacrossmethodsforBTAU42onBTA1

SupplementaryFigureS2ErrorpercentageacrossmethodsforUMD31onBTA1

SupplementaryFigureS3ErrorpercentageacrossmethodsforBTAU46onBTA1

SupplementaryFigureS4Linkagedisequilibrium(LD)decayoverdistanceinItalian

SimmentalandHolstein

ThedottedlinewithblackdotsindicatesLDdecayoverdistanceinItalianHolstein

whereasthedashedlinewithwhitediamondsindicatesthesamevaluesforItalian

SimmentalItalianHolsteingenotypeswereasubsetofthedatausedinNicolazzietal

(2013)

96

CHAPTER6

QuestfordeleteriousrecessivevariantsinItalianHolsteinbullscombiningexomesequencingandhighdensitySNPanalysis

MMilanesi1etal

1IstitutodiZootecnicaUCSCviaEmiliaParmense8429122PiacenzaItaly

____________________Manuscript

Abstract

Outbreed species carry a genetic load of deleterious recessive variants Those

that are lethal survive only in the heterozygous state in the population while

otherssimplyreducethefitnessofindividualshomozygousforthevariantThe

long=termdestinyofthesemutationsistodecreaseinfrequencyduetonegative

selectionandeventuallydisappearbecauseofgeneticdriftHowever theymay

bemaintainedinlivestockpopulationsandevenincreaseinfrequencywhenin

linkage with favourable alleles for traits under selection or occurring in the

genomeofsireshavinghighgeneticvalueforproductivetraitsWesearchedfor

these variants combining exome sequence data from 20 Italian Holstein bulls

sampledfromoppositetailsofthemaleandfemalefertilityEBVdistributionthe

HighDensity(HD)SNPgenotypingof1009progenytestedItalianHolsteinbulls

and information existing in public databases Exome sequencing detected

202956 variants Among these 5167 were classified as ldquodeleteriousrdquo when

annotatedwithSIFTandAnnovarThesewerefilteredbypickingthosemapping

in regions out of equilibrium and lack of homozygotes in the progeny tested

Holstein bulls or having an asymmetric distribution in bulls plus=variant and

minus=variant for fertility EBV The resulting 193 variants in 163 genes were

furtherassessedbycomparativegenomicshaplotypeanalysisandfunctionFor

eachassessmentapositiveornegativescorewasgiventoevidenceinfavouror

against thevariant tobedeleteriousA totalof35variantshadapositive total

score These genes are involved in development cell motility and immune

responseSomeof themareknowntobeexpressed in the testiclesandappear

importantforspermfertilityinmodelspeciesothersareresponsibleforgenetic

defectsinhumanandmousecarryingmutationssimilartotheonesdetectedin

cattleThesegenevariantsarecandidatetobeamongthosethatconstitutethe

ldquogeneticloadrdquoofdeleteriousrecessivegenesintheHolsteinpopulation

100

Introduction

The presence of deleterious recessive variants is well tolerated by outbred

speciesWhenthesevariantsappeartheymayspreadandsurvivealongtimein

the population in the heterozygous state (Simmons and Crow 1977) It is

estimatedthatoutbredindividualshumanincludedcarryonaverageagenetic

loadofmorethan100loss=of=functionvariantsintheirgenome(MacArthuretal

2012)

Theeffectofdeleterious recessivevariantsappearswhendemographicevents

eggeneticbottlenecksreducethegeneticvariabilityofthepopulationorwhen

relatedindividualsareintentionallymatedforbreedingpurposesInthesecases

recessive deleterious reduce the fitness of individuals carrying them in the

homozygous state and in extreme case induce very severe genetic defects

premature death and abortion (McClure et al 2014 Sahana et al 2013

VanRadenetal2011)Theythereforecontributetoexplainthephenomenonof

inbreeding depression and its negative effect on fertility (Charlesworth and

Willis2009)

The search for deleterious recessive variants is particularly interesting in

Holstein cattle Thisbreed is experiencing anegative genetic trend for fertility

traits (Jamrozik et al 2005) it is worth noting that this is occurring while

inbreedingisunderstrictmonitoringandistheoreticallyincreasingataverylow

pace(about005peryear=Mrodeetal2009)Inadditionmatingisplannedto

minimise relationship between animals and the additive genetic variance for

mosttraitsndashincludingmilkproduction=isapparentlynotdecreasinginspiteof

the selection pressure applied by modern schemes (Biffani et al 2002)

indicatingthatoveralldiversityisonlyslightlydecreasing

SelectiontoincreasefertilityischallengingSelectioncriteriagenerallyadopted

to improve this trait are far from the biological mechanisms that govern

reproduction Inaddition thesemechanismsarecomplexandverysensitive to

environmental(management)conditions(Laneetal2013Wathesetal2014)

Asa result fertility traitshave loworvery lowheritability (Pryceetal1997)

andthereforelowresponsetoselectionandslowgeneticprogress

101

Forthesereasons thequest formolecularvariants influencing fertility traits is

on=goingsincetheavailabilityofmoleculartoolsforDNAanalysisAnumberof

investigations tested the effect of functional candidate genes running genome

wide association analyses to find QTLs for fertility traits using hundreds

microsatellite markers in structured families (Ashwell et al 2004) or tens of

thousand SNP panels in whole populations (Aston and Carrell 2009) The

availability of medium density SNP panels allowed the search for deleterious

recessivesevenwithouttheuseofphenotypictraitsbyscanningthegenomein

searchofregionslackingoneofthetwohomozygousclasses(Sahanaetal2013

VanRaden et al 2011) This approach has highlighted a number of genomic

regions likely carrying deleterious recessives This information has immediate

application inmolecular assisted breeding and a relevant scientific interest in

thesearchofcausativevariantsandassessmentoftheirfunctions

Thedevelopmentofnewsequencingtechnologieshasgreatlydecreasedthecost

of sequencing so that nowadays the sequencing of few target individuals is

becoming the gold standard to seek for causal mutations for rare monogenic

defectsinhumansandotherspecies(ChunandFay2009)Oftenonlytheexome

the protein coding part of the genome is sequenced searching for those

mutationsthatseverelyalterproteinstructureandorfunction(Bamshadetal

2011 Li et al 2013) A typical mammalian exome consists in a few tens of

millionnucleotides(around2ofthereferencegenome)comparedtothefew

billionofthewholegenomeThereforewiththesamenumberofreadsexome

sequencinghastheadvantageofamoreaccurategenotypingdueto thehigher

sequencing depth an easier data management lower computation load and

easier interpretation (Bieseckeretal2011Majewskietal2011) Indeed the

numbervariants found inexomesareone=twoordersofmagnitude lowerthan

thatdetectedinwholegenomesandmutationsaredetectedinannotatedgenes

orimmediatelynearbyOntheotherhandtheexomeapproachcanonlyidentify

variantson annotatedgenesVariants inunknownexonsor that occur innon=

coding regulatory features will be missed (Bamshad et al 2011) In humans

these are turning out to be of outmost importance for gene regulation and

downstreamphenotypic effects (ENCODEConsortium2012) but they are still

verypoorlyannotatedinlivestock

102

Herewesequencedtheexomesof20Holsteinbullsprogenytestedselectedfrom

the tails of male and female fertility EBVs (Estimated Breeding Value) Since

livestockfertilityisacomplextraitandcausativemutationscannotbesoughtby

the approach used in monogenic diseases we integrated the sequence

information with high density SNP chip data obtained on a much larger

populationofItalianHolsteinbullstodetectgenescarryingallelescandidateto

be deleterious recessives and confirm their likely deleterious effect by

comparativegenomics

Materialsandmethods

1SamplingandDNAextraction

Semen(straws)andbloodsampleswereprovidedbyItaliancertificatedartificial

insemination centres (UE Directive 88407CEE) thus ethics committee

approvalforthisstudyisnotrequired

A total of 20 ItalianHolstein progeny tested bullswere sampled for complete

exome sequencing Among them 19were chosen from the tails of female and

male fertilityEBV(EstimatedBreedingValue)distribution (Table1) tohave5

animals fromeach tail One animalwas to be in theminus=variant tail of both

traitsBullswereselectedasunrelatedaspossibleSamplingwaspossiblethanks

tothecollaborationwithANAFI(ItalianHolsteinBreederAssociation)

ANAFIdevelopedageneticevaluationfor fertilitycombiningdifferenttraitsas

explained by Biffani and colleagues (Biffani et al 2005) The traits are ldquodays

from calving to first inseminationrdquo ldquocalving intervalrdquo ldquofirst=service non return

rateto56daysrdquoldquoangularityrdquoandldquomatureequivalentmilkyieldat305daysrdquo

MalefertilityisexpressedasanEstimatedRelativeConceptionRate(ERCR)and

isameasureofconceptionrateofaservicesirerelativetoservicesiresofherd

mates(Soggiuetal2013)

103

Table1FemaleandmalefertilityEBVsofthe20animalssequenced

Animal EBVFemaleFertility EBVMaleFertility GroupfriA01 97 =111 NAfriA02 98 =166 MalLfriA03 115 0 FemHfriA04 114 0 FemHfriA05 114 0 FemHfriA06 86 0 FemLfriA07 98 195 MalHfriA08 105 213 MalHfriA09 95 =154 MalLfriA10 95 149 MalHfriA11 102 264 MalHfriA12 91 =475 FemLMalLfriA13 105 =179 MalLfriA14 102 207 MalHfriA15 99 =405 MalLfriA16 85 0 FemLfriA17 87 0 FemLfriA18 121 0 FemHfriA19 117 0 FemHfriA20 88 0 FemL

A total of 1009 Italian Holstein bulls were chosen from the entire population

balancing theirpresence in the tailsof theEBV forkgprotein (PROT) somatic

cell count (SCC) fertility (FERT) and functional longevity (LONG) for high

densitygenotyping(TableS1)

DNAwasisolatedfromwholebloodandsemenstrawssamplesinLGSlaboratory

(CremonaItalywwwlgscrit)usingtheGenomixkitaccordingtoLGSinternal

protocolDNAsforexomecaptureandsequencingwereextractedindependently

tosatisfy therequirement indicatedby IGA technologies (Udine Italy)at least

12mgofDNAataconcentration100ngulwithR260280higherthan18and

R260230near19

2Moleculardataproductionandqualitychecking

21Exomecaptureandsequencing

104

Exomes were captured using the ldquoSureSelected All exon Bovinerdquo (Agilent

Technologies) kit in early version Probes coordinates are based on UMD31

bovinereferencesequence(Ziminetal2009)anddesignedagainstNCBIexons

as well as predicted exons UTRs and miRNAs Number and distribution of

probesperchromosomearedetailedinTableS2

rdquoOvation Ultralow Library Systemsrdquo protocol was followed for library

preparation After quantification with Qubit (Life technologies) and quality

controlwith2100Bioanalyzer(AgilentTechnologies)DNAwassonicatedwith

Bioruptor (Diagenode) to obtain fragments between 150 to 200 bp in length

blunt end=repaired adenylated ligated with paired=end adaptors and purified

followingtheprotocolprovidedbythemanufacturer

LibrarieswerehybridizedwithldquoSureSelectedAllexonBovinerdquoprobesusingthe

Encore Target Capture Module (Nugen) SureSelect Target Enrichment and

Herculase II FusionEnzymewithdNTPCombo (AgilentTechnology) kit Then

capturedDNAwasamplifiedandtaggedwithlibraryspecificindexesQualityof

capturedsampleswastestedwithQubitand2100Bioanalyzer

Paired=end 2X100bp sequencing was performed on Illumina HiSeq2500

(IlluminaSanDiegoCA)atanexpectedcoverageof50X

22Sequencealignment

Adapters and reads shorter than 36bp were trimmed using Trimmomatic

(version032)(Bolgeretal2014)ForeachFastqfilethealignmentandpairing

of the reads to theUMD31 reference sequencewas performedwithBurrows=

WheelerAligner(BWA)(version077=r441)(LiandDurbin2009)applyingalso

trimmingoption (=q5) SAM files resultedwere converted intoBAM file using

SAMtools (version0118) (Lietal2009)Foreach individualBAM fileswere

merged in a single one with Picard (version 1111)

(httpsgithubcombroadinstitutepicard)Duplicatereadswereidentifiedand

markedwithPicardFinallyindividualBAMfileswereindexedwithSAMtools

105

Sequencedepthwasevaluatedsimultaneouslyonall20sequencedanimalsand

onlyonregionstargetedbyprobesusingDepthOfCoverageofGATK(McKennaet

al2010)

23ExomeVariantcallingandqualitycheck

Singlenucleotidepolymorphisms(SNPs)insertionsanddeletions(InDels)were

calledsimultaneouslyonall20sequencedanimalsandonlyonregionstargeted

byprobesusingUnifiedGenotyperofGATK(DePristoetal2011)

All variantswere filteredusingVariantFiltration ofGATK to removeoneswith

lowmappingandorsequencingqualityThechoiceoffiltersandthresholdswas

made in accordance to ldquoGATK best practicerdquo for exome sequencing and

distributionoffiltervalues

bull ldquoSNPclusterrdquoexcludedvariantifmorethan3variantswerefoundin10

bp

bull ldquoBaseQualityRankSumTestrdquoexcludedifitrsquoslowerthen=5ormorethan

5

bull ldquoMappingQualityRankSumTestrdquoexcludedif itrsquoslowerthen=7ormore

than6

bull ldquoRead Position Rank Sum Testrdquo excluded if itrsquos lower then =4 or more

than4

bull ldquoFisherStrandrdquoexcludedifitrsquosmorethan30

bull ldquoDepthrdquoexcluded if itrsquos lower then60X(ieaminimumvalueof3Xper

animals)ormorethan3500X

bull ldquoQualityrdquoexcludedifitrsquoslowerthen40

bull ldquoMappingqualityrdquoexcludedifitrsquoslowerthen30

Filters were applied to all variants but only bi=allelic SNPs were used in the

following analyses Genotypes with ldquoGenotype Depthrdquo lower than 5 and

ldquoGenotype Qualityrdquo lower than 20 were discarded The remaining ones were

converted intoldquoABrdquocode(A=referencealleleandB=alternativeallele)Then

data were transformed in PLINK (Purcell et al 2007) format using a home=

made python script To prepare the working dataset the final quality control

106

removedvariantswithmore than25missingdata (that iswithmore than5

animals out of 20 without genotypes) monomorphic (minor allele frequency

MAFlt1)ornotmappinginautosomesBasicpopulationgeneticsparameters

werecalculatedtoevaluatethepresenceofcrypticstructureinthedatathatmay

influencetheoutcomeofthedownstreamanalyses

In total 24390 SNPs identified in the exomes were also included in the HD

SNPchip panel HD genotype from nineteen out to twenty of the sequenced

animals sequenced was used to compare technologies Genotypes at common

locationswerecomparedacrosstechnologies toevaluatesequencequalityand

theeffectofsequencedepthongenotypecalling

24BovineHDGenotypingBeadChipdataandqualitycheck

AllanimalsweregenotypedusingtheBovineHDGenotypingBeadChip(Illumina

San Diego CA) Genotypes were quality controlled and filtered with standard

threshold Markers with more than 5 missing data and MAF lt 1 were

excluded Animals with more than 5missing data were also excluded Only

autosomal SNPs were retained Thereafter genotypes were phased and the

remainingmissingdata imputedusingBeaglev332(BrowningandBrowning

2007)

Ofthe20animalssequenced12hadalsoBovineHDGenotypingBeadChipand7

BovineSNP50 Genotyping BeadChip version 2 genotypes available For

concordance and haplotype analysis we imputed the 50K data to 800K as

previouslydescribedabove

3Strategyfordetectingcandidatedeleteriousvariants

Wesearchedforcandidatedeleteriousvariantsfollowinganintegratedapproach

that joined the analysis of i) exomes of a few Italian Holstein bulls having

plusvariant and minusvariant EBVs for a target trait ii) HD genotyping of a

population of 1009 progeny tested Italian Holstein bulls iii) the comparative

genomic analysis of 100 vertebrate species The strategy followed for the

107

analysisisshowninFigure1Inthefirststep(Search)welookedforcandidate

deleterious variants in the exomes In step2 (Filter)we filtered candidatesby

assessingwhichof thesevariantseitherhadanon=randomdistributionamong

animals having extreme EBVs or co=mapped with genomic regions carrying

outliersignals inthelargepopulationofprogenytestedbulls Instep3(Asses)

candidateswere further evaluated by i) assessing their conservation across a

panel of 100 vertebrate species ii) searching for associated haplotypes and

evaluating their frequencies in the population of progeny tested bulls iii)

evaluatingtheirassociationwithgeneticdefectsinhumanandmodelspeciesiv)

examining their expression profile and function when known Methods are

belowdescribedindetail

Figure1Strategyfollowedtoidentifydeleteriousvariants

108

31Step1Searchforcandidatedeleteriousvariants

Quest for candidate deleterious variants in exome sequence data by

variantannotation

WeannotatedallvariantsusingANNOVAR(Wangetal2010)andVEP(Variant

Effect Predictor) (McClure et al 2014) software They use the Ensembl gene

annotation database (UMD31 reference sequence release 74) to investigate

locationandpotentialeffectoncodingsequencesofallvariantsVEPintegrates

the SIFT algorithm (Kumar et al 2009) to predict the effect of amino acid

substitutiononproteinfunctionForeachsequencechangesequencehomology

and physic=chemical similarity between the reference amino acid and the

alternative is considered providing a score that evaluates the mutation

tolerabilityVariantswithscorelowerthan005wereclassifiedasldquodeleteriousrdquo

inaccordingtosoftwarerecommendations

SIFTalsocomparedallvariants found in theexomeswith thosepresent in the

dbSNPdatabase(version140)toidentifythosethatwerenovel

32Step2Filteringofcandidatedeleteriousvariants

2a Analysis of deleterious variant distribution among animals having

extremesEBVvaluesformaleandfemalefertilitytraits

Association between variants and the classification in the tails of the EBV

distributionwas estimated byFisherexact testwhich compares frequencies of

alleles (a vs A) in the two tails not assumingH=W equilibriumgenotypic test

that compares the frequency of genotypes (aa vs Aa vs AA) and the recessive

geneactiontestwhichtestsforthedistributionofonehomozygousclassversus

theothers(aavsaAAA)AlltestsareimplementedinPLINKfunctionsThe5

significance threshold was set empirically by permutation (N=100000) either

point=wise that is not corrected for multiple testing (EMP1 = Empirical

Probability1)andthereafterfamily=wisecorrectedformultipletesting(EMP2=

EmpiricalProbability2)

109

GenesexceedingEMP1were considered suggestive and thoseexceedingEMP2

significant candidates to be deleterious recessives and their characteristics

conservationacrossspeciesandfunctionfurtherinvestigated

2b Quest for regions candidate to carry deleterious variants in the

populationanalysedwiththeHDBeadChip

Basic population genetics parameters and Multi Dimensional Scaling were

calculated with GenABEL R package (Aulchenko et al 2007) to evaluate the

presenceofcrypticstructure inthedatathatmayinfluencetheoutcomeofthe

downstreamanalyses

ExpectedandobservedallelicandgenotypicfrequenciesandFisherexacttestof

Hardy=Weinberg proportions (Janis E Wigginton 2005) were calculated with

PLINKon theHDSNPdatasetValueswere averaged in slidingwindowsof10

markersThisvalue(10markers)waschosenaftertestingdifferentwindowsize

as represent a good compromise between noise reduction and resolution (not

shown)

Candidate regions containing deleterious mutations were identified following

twodifferentapproaches

In one approach hereafter refereed to as ldquoOut of Hardy=Weinbergrdquo approach

(OHW)markerssignificantlydeviatingfromofHardy=WeinbergequilibriumatP

le84710=8 (nominalvaluePle005Bonferronicorrected formultiple testing)

werefirstidentifiedWindowswithatleastthreesignificantmarkersbelowthis

thresholdandhavingahomozygotedeficitwereconsideredcandidatestocarry

deleterious mutations In this OHW approach we applied very stringent

constraints to reduce false positive discoveries due to the Wahlund effect

inducedbypopulationsubstructure(seeresultsandFigureS5)

In thesecondapproachhereafterreferred toas ldquoLackOfHomozygotesrdquo (LOH)

approach the differences between observed and expected frequencies of

homozygotesfortheminorallelewerecomputedandaveragedacrossmarkers

included in each sliding window Values were standardized and only regions

carrying three overlapping windows below =3Z score value were considered

110

candidates to carry deleterious mutations Thereafter significant windows

carryingatleastonemarkerincommonwerejoinedtodefineregionboundaries

ExomesequencingknowledgeandHDBeadChipdatawerecrossed inorder to

identifygenescarryingseverevariantslocatedwithinsignificantregionsortheir

proximity (50Kb upstream or downstream) These genes were considered

candidates to be deleterious recessives and their characteristics conservation

acrossspeciesandfunctionwasfurtherinvestigated

33Step3Assessmentofcandidatedeleteriousvariants

Genes carrying variations classified as deleterious by SIFT or ANNOVAR and

eitherassuggestivebyfertilityEBVtailsanalysisorassignificantbyOHWeLOH

analyses of the Holstein bull population were submitted to comparative

genomics population genetics and functional analyses In order to prioritize

genesvariants discoveredwe scored each evaluation using a subjective scale

ranging from =3 to 3 to measuring the evidence in favour (scores 1 to 3) or

against(scores=1to=3)thedeleteriouseffectoftheSNPsThescore0inallcases

indicated the absence of evidence or that a specific assessment failed The

completescorescalethereforerangedfrom=9to9

3aComparativegenomics

For comparative genomics purposes the position of eachmutationwas ldquolifted

overrdquo from UMD31 bovine coordinates to hg19 human coordinates using the

specifictoolatUCSC(Kentetal2002=httpgenomeucscedu)Orthologous

proteins were aligned using the UCSC genome browser and amino acid

conservation assessed in a set of 100 vertebrate species that included human

primates (11 species) euarchontoglires (egmouse14 species) laurasiatheria

(egsheep25species)afrotheira(egelephant6species)othermammals(eg

platypus5species)birds(egchicken14species)sarcopterygii(egturtle8

species) fish (eg zebrafish 16 species) Results of amino acid conservation

analysiswererecordedonlyifhomologousproteinfromatleast50speciescould

beretrievedandalignedThereafteraminoacidconservationwasscored3ifthe

111

aminoacidwasconservedinallvertebrates2ifconservedinallmammalsand1

ifconservedinthelargemajority(atleast90)ofmammalsAscoreof=1was

given when the amino acid was not conserved among species and a score =3

when both variants observed in cattle were carried by other species Finally

wheneithertheliftoverprocessortheproteinalignmentfailedthecomparison

wasconsiderednoninformativeandgivenascore0

3bSearchofhaplotypesassociatedtodeleteriousvariantsandanalysisof

HDpopulationdata

To acquire further evidence we searched for the haplotypes that carry the

deleterious variations and estimated their frequency in the

homozygousheterozygousstatuswithinthebullpopulationgenotypedwiththe

HDBeadChip

Weusedamultistepapproachtofindhaplotypescarryingdeleteriousvariantsi)

from the phased HD BeadChip data we extracted the haplotypes of the 20

sequencedanimalsintheregionofthevariant(100markersoneachside)ii)in

these 20 animals we searched the shorter haplotype able to distinguish

homozygous andor heterozygous animals for the deleterious variant from all

other haplotypes iii) we assessed the frequency of homozygotes and

heterozygotes for thehaplotype in thewholepopulationofbulls characterised

withtheHDBeadChip

Scoresinthiscasewere=3inthecaseofuniquehaplotypesshowinganumberof

homozygotesnonsignificantlydifferentfromthenumberexpectedfromHardy=

Weinbergproportions3inthecaseofuniquehaplotypesshowinganumberof

homozygotes significantly lower than expected 1 in the case of unique

haplotypesneverhomozygousandrare0whenevernouniquehaplotypecould

befound

3cFunctionalanalyses

With PantherDB (Mi et al 2010) resources we classified our gene list into

functionalcategoriesaccordingtotheGeneOntologyclassificationAllsuggestive

112

andcandidateEnsemblGeneIDsubmittedtothebulkanalysiswererecognizedand annotated Beyond the GO term characterization in order to understandpotentialdetrimentaleffectsofmutationsandalsopinpointpotentialcandidatesfor fertility traits we retrieved information for each gene according to thefollowing parameters i) the involvement in known Mendelian or complexdiseases fromOMIMdatabase ii) theavailabilityofknock=outmice iii) and theeventual expression in reproductive tissues from the Gene Expression Atlas(httpwwwebiacukgxa)(Table4)

Alsoforthesedataweusedthesamescaleattributingarangefrom=3pointstogenes having no relation with fertility viability and genetic defects and norelevant phenotypic effects in knock=out mouse models to 3 to gene closelyrelated to fertility and viability associated to genetic defects in human andaffectingfertilityorsurvivalwhenknocked=outInparticularascoreranging=1to+1wasattributedtogenesassociatedtohumansyndromes(=1=noassociationrecordedinOMIMdatabase0=noinformationinOMIM1=associationinOMIM)A score with same range to information collected from knock=out mousephenotypes(=1=knock=outmouseshowsnoimpairedphenotype0=noknockoutmouse has been produced 1= knock=out mouse shows impaired phenotype)Thesamescorewasattributedtomolecularfunctions(=1=functionunrelatedtofertilityandviability0=functionindirectlyrelatedasimmunityandmetabolism1=functiondirectlyrelatedasfertilityanddevelopment)

Results

Exomecaptureandsequencing

A totalof225414probeswereused for capturing thecattleexome (5355Mbabout 2 of UMD31 reference genomes) Mitochondrial (MT) and Ychromosome were not included in probe design The number of probes perchromosomespannedfrom2500onBTA27toalmost12000inBTA3(TableS2)

113

The average exome sequencing depth was 4058 slightly lower than the

expected50XAllanimalshadatleast24Xaveragegenomessequencedbutina

same animal the sequencing depth varied across chromosomes and regions

withinchromosomesThelowestchromosomeaveragevalueobservedwas10X

inBTA25oftheshallowersequencedanimalAveragecoverageperanimaland

chromosomeandoverallmeansareshowninFiguresS1

In total1613461402 raw100bppair=ends sequence readswereproducedA

total of 1425514112 100bp reads were mapped and properly paired to the

reference genome Paired reads per animal ranged between 41973510 and

138840392

Insynthesissequencingdepthwasvariableasexpectedandalmostallanimals

hadsomeregionsofscarceornosequencedepthbutoverallallexomeregions

were sufficiently covered to permit a comprehensive analysis of molecular

variantsoftheanimalsinvestigated

Step1Searchforcandidatedeleteriousvariants

Variantdetectionandannotation

Thesequencingdepthofallvariantscalledonthealignmentofallreadsacross

animalsfollowedaChi=squaredistributionAveragedepthwas79694spanning

from2Xto5000Xwithpeakfrequencyclassat150=250X(FigureS2)

A multistep approach was followed to clean the exome sequence dataset

Cleaning process steps and the number onmarkers retained in each step are

describedinTable2

114

Table2Typeandnumberofmarkersdetectedandretainedafterdatasetcleaning

TypeofvariationNumberofvariations(N)

Rawdata

Aftervariantcleaning

Aftergenotypecleaning(onlyautosomal)

Complex 1213 963 0Deletion 6938 5031 0Insertion 5510 3656 0MultiallelicComplex 5710 3995 0MultiallelicSNP 803 512 0SNP 242988 203079 164277Total 263162 217236 164277whencomparedtothereferencegenome

Atotalof263162variantswereidentifiedAmongthese217236(8255)wereretainedafterapplyingthevariantfilters(seematerialsandmethodssection)

According to GATK classification the vast majority (9348) of variationsidentified were biallelic SNPs followed by insertions and deletions (InDels40) more complex variations and multiallelic SNPs (252) Indels andcomplexvariationsalthoughlessfrequentthanSNPsaffectedarelevantportionoftheexome(49589bp)Thenumberofvariantsdetectedperchromosomewassignificantly correlated to the number of probes used (r=08761P=228x10=10)andofbasescaptured(r=08424P=53x10=19)perchromosomehoweverBTA23BTA27 BTAX had an outlier behaviour the former two containing highervariation(450and445variantsperKb)andthelatterlower(084variantsperKb)thantheaverage(322variantsperKb)

Bialleic SNPswere further filtered for genotype quality (sequence depth gt 5Xandsequencequalitygt20)andBTAXmarkersThefinalautosomalSNPdatasetcounted164277SNPsAmongthese17829biallelicSNPs(109)describedforthefirsttimehavebeensubmittedtodbSNP

Variantsrsquo annotation is shown in Table 3a Variantswere detected in differentgenedomains(exonsintrons3rsquountranslatedregions)aswellasupstreamanddownstreamgenesMorethan31ofthevariationsweredetectedinexonsandamong them more than 41 have the potential to have an effect on genefunctionbyaffectingsplicingoralterproteinstructure(Table3b)

115

Table 3 A) number of autosomal SNPs annotated in each gene region B) number and

classificationofautosomalSNPsalteringgenefunction

A)

Generegion NofSNPs(autosomes)Downstream 1170Exonic 51293Intergenic 49901Intronic 58116ncRNA_exonic 160ncRNA_intronic 3ncRNA_splicing 2Splicing 184Upstream 1445Upstreamdownstream 54UTR3 1345UTR5 604Total 164277

B)

EffectofexonicSNPs NofSNPs(autosomes)Nonsynonymoussinglenucleotidevariation 20839Stopgainsinglenucleotidevariation 213Stoplosssinglenucleotidevariation 10Synonymoussinglenucleotidevariation 30226Unknown 5Total 51293

whencomparedtothereferencegenome

Amongthe51293exonicSNPs4166in2978geneswereclassifiedldquodeleteriousrdquo

by SIFT Among these 3210 in 2356 genes were observed at least twice

ANNOVARidentified223stopgain(213)orloss(10)variantsin217genes

Theaverageconcordanceat24390SNPs in commonbetweenexomesand the

HD Beadchip was 9716 (Table S3) Discordance was due to either alleles

detected by sequencing andnot detected by genotyping (exome=heterozygote

Beadchip=homozygote) and vice=versa (exome=homozygote

Beadchip=heterozygote) Discordances were randomly distributed among

markers and inversely correlated to sequencing depth (Figure S3) indicating

that most errors are likely occurring during sequencing even if occasional

116

Beadchip failure indetectingsomeallelecannotbeexcludedTwoanimalshad

anoutlierbehaviouronehaving846andthesecond719concordanceBoth

animalswereretainedforvariantdiscoverywhiletheywereexcludedfromthe

analysesofEBVtailsWithoutthesetwooutlieranimalsthemeanconcordance

increasesto9938

Multidimensional scaling confirmed the absence of a substructure in animals

sequenced(FigureS4)

Step2Filteringofcandidatedeleteriousvariants

AnalysisofEBVtails

Fisherexact test isgenerallyused to test for theassociationbetweenavariant

and a Mendelian disease Samples are divided in cases and controls and the

association between alleles or genotypes and the disease is tested under

different inheritance models Here we used the same method to test for

association between allelesgenotypes and animals from opposite tails of the

EBVdistributionfori)malefertility(N=4high+4low)ii)femalefertility(N=5

high+4low)andiii)maleandfemalefertility(N=9high+8low)

Genotypes were tested under all inheritance models implemented in PLINK

Giventhelownumberofsamplesnovariantwassignificantaftercorrectionfor

multiple comparisons (EMP2) and only suggestive variants could be found

(significant at 5 at EMP1)We considered suggestive of carrying deleterious

allelesalistof101genescarrying112variants(TableS4)

Detectionofregionscarryingdeleteriousmutationfrom800KSNPdata

After applying quality control parameters 26653 and 146991 markers were

removed from the original dataset for excessive missing data and MAF

respectivelyInaddition17individualsremovedforlowgenotypingrateAswith

exomes only autosomal SNPswere retained for the analyses thus obtaining a

workingdatasetof590680SNPsand992animals

117

MDS(FigureS5)analysis indicates that thedatasethassomesubstructure that

mightaffectdownstreamanalysesWeaccountedfortheriskoffindingmarkers

outofHardy=WeinbergproportionsduetotheWahlundeffectbychoosingvery

stringentsignificancethresholdintheOHWanalyses

To detect regions candidate to carry deleterious recessivemutations we used

two approaches the ldquoOutside Hardy Weinbergrdquo (OHW) and the ldquoLack Of

Homozygotesrdquo (LOH) approaches (see materials and methods) The two

approaches complement each other in the detection of regions carrying less

homozygotes than expected OHW identifies regions in which a few markers

have extreme homozygote deficit while LOH detects regions in which many

markershaveamoderatedeficit

In the SNP dataset 1884 markers were significantly out of Hardy=Weinberg

equilibrium(Ple84710=8)Atotalof485windowswereidentifiedwithatleast

3 markers out of 10 exceeding this threshold Among these 84 contained

markerssignificantbecauseofhomozygotedeficitThese84windowsidentified

11OHWcandidateregionsin9chromosomes(TableS5)

TheLOHapproachidentified2335windowsbelow=3Zscoreofthestandardised

differencebetweenobservedandexpected frequencyofhomozygotes forMAF

Theseidentified326LOHcandidateregionsspreadonallautosomes(TableS6)

Onlyregionsformedbyatleast3singlewindowswereretainedreducingLOH

candidateregionsto240

Deleterious variants (SIFT deleterious and stop codon gainedlost) were

intersectedwithsignificantOHWandLOHregionSevenofthesemappedwithin

OHW sliding windows 36 within LOH sliding windows Among these 5 were

located in regions significant in both OHW and LOH Other 51 deleterious

variants mapped in the immediate vicinity (+=50Kb) of significant windows

These 83 variants affect 66 unique genes candidate to carry deleterious

recessivealleles(TableS7)

Five regionswere common toOHWandLOHoneonBTA4 (LOH=74959610=

75151417 OHW=74970653=75151417) two on BTA7 (LOH=9626435=

10099512 OHW=9628735=10099512 LOH=10575691=11171132

118

OHW=10486424=11087441) one on BTA15 (LOH=80299924=80928311

OHW=80299924=80921274) and one on BTA18 (LOH=28122373=

28185525OHW=28106022=28164693)

Step3Assessingcandidatedeleteriousvariants

After the two filtering step previously described a total of 191 SNPs were

retained out of the 3433 classified as deleterious by SIFT and Annovar and

analysed further These are carried by 163 genes Eighteen genes carried 2

candidateSNPsthreegenes3SNPsandonegene5SNPsInaddition4SNPsin

four geneswere identified as potential candidates in both the analysis of EBV

tailsandofbullpopulation

A totalof12SNPs introducedstopcodonswhile the remaining inducedamino

acid substitutionsor splicing variants In the filtereddataset theproportionof

stop codons on total variation (63) was only slightly higher than the

proportion found in the entire dataset (51 225 stop codon out of 4391

deleteriousSNPs)

Comparativegenomics

Proteinmultiplealignmentwassuccessfulin146outof191comparisonsIn26

casesthemostfrequentaminoacidincattlewasveryhighlyconservedeitherin

allvertebrates(N=9)orinallmammals(N=17)Inother23casestheaminoacid

was conserved in at least 90 of mammalian species In the last class were

generally included amino acids that changed in mammalian species

phylogenetically verydistant from cattle (eg platypus and related species) In

97 instances the amino acid could change quite freely and occasionally both

variantsobservedincattlewerecarriedbyotherspecies

HaplotypeanalysisofHDpopulationdata

The search for haplotypes associated to the191 variants for further testing in

the population identified 101 unique and invariant haplotypes common to all

119

carriersofatargetvariantanddifferentfromallthosefoundinnoncarriersIn

additional36casesanoncompletelyinvarianthaplotypecouldbeassociatedto

themutationConverselyin54instancesareliablehaplotypecouldnotbefound

(table S8) In total 38 haplotypes had no homozygous individuals in the

population Thiswas expected in the case of 33 rather rare haplotypes (fewer

than 100 heterozygotes observed in the population) Conversely 5 haplotypes

were rather frequent (from 170 to almost 300 heterozygotes observed in the

population)andtheexpectationwastofind78to222homozygousindividuals

in thepopulation insteadof thenone foundAnothervarianthadaremarkable

high frequency (35) and a clear outlier behaviour Only 11 homozygous

individualswereobservedoutofthe121expected

Functionalanalysisofcandidategenes

To assess the potential deleterious effects of the variants identified we also

retrieved functional information for each gene according to the following

parametersi)associationtoknownMendelianorcomplexdiseaseslistedinthe

OMIM database ii) availability and effect of gene knock=out in mice iii)

expression profileswith particular attention in reproductive tissues from the

GeneExpressionAtlas(Table4)

Twenty genes out of 163 have a recognized phenotype in humans These are

mostlyseveresyndromesUnfortunatelyonlyafewgeneshaveanull=miceline

producedwithphenotypicdataavailable

According to the expression data available from Ensembl these genes are

generally (with someexceptions)expressed inavarietyof tissueswitha clear

tendencytobehighlyexpressedintestis

120

Tableamp4Informationon163genesanalyzedconcerningeventualphenotypesinmanorknockoutsinmouseHeaderampcodesAbrainBcolonCheartDkidney

EliverFlungGskeletalmuscleHspleenItestisColoursrepresentthegenelevelofexpressionindifferenttissuefromlow(whitegrey)tohigh(blue)NCin

FunctioncolumnmeansldquoNotClassifiedrdquo

Geneamp

Geneampnam

eampHum

anampdefectamp

Knock5OutsampMouseamp

MouseampPhenotypeamp

AampBamp

CampDamp

EampFamp

GampHamp

IampFunctionamp

ENSBTAG00000000079

CCSAP

Cells

9

1

03

1

07

2

06

2

2

NC

ENSBTAG00000000149

USP40

Cells

6

7

2

7

5

7

2

8

9

fertility

ENSBTAG00000000414

FUT5

Nomatch

37

1

cellular

process

ENSBTAG00000000918

ERMARD

Periventricular

nodularheterotopia

6Xlinked

periventricular

heterotopia6q

terminaldeletion

syndrome

Periventricular

nodularheterotopia

Nomatch

5

4

4

9

6

7

4

9

17

cellular

process

ENSBTAG00000000998

SLC39A6

Cells

6

9

2

7

4

10

1

10

9

cellular

process

ENSBTAG00000001133

VWA8

Mice

2

3

6

7

10

3

3

4

4

blood

metabolism

ENSBTAG00000001140

IAH1

httpswwwmousephenotypeor

gdatagenesMGI1914982sect

ionassociations

BOTHSEXadiposetissue

behaviourneurological

growthsizebodyFEMALE

homeostasismetabolismMALE

limbsdigitstailskeleton

9

18

30

60

23

19

12

17

30

cellular

process

ENSBTAG00000001262

IRGQ

Cells

6

1

5

3

06

2

5

04

2

immunity

ENSBTAG00000001286

ELMO3

httpswwwmousephenotypeor

gdatagenesMGI2679007sect

ionassociations

Decreasedstartlereflex

02

27

14

7

7

02

1

fertility

ENSBTAG00000001291

OR8H3

Nomatch

fertility

ENSBTAG00000001500

FIGNL1

Cells

08

2

01

05

07

08

04

2

3

cellular

process

ENSBTAG00000001505

GRK4

Cells

10

7

09

4

6

3

2

6

218

signalling

ENSBTAG00000002220

SPTLC1

Hereditarysensory

andautonomic

neuropathytype1

No

17

27

3

28

12

28

5

19

6

metabolism

ENSBTAG00000002288

NT5DC4

Nomatch

02

cellular

process

ENSBTAG00000002289

NLRP14

No

04

cellular

process

ENSBTAG00000002660

CCDC11

Situsinversustotalis

Nomatch

3

1

06

2

07

3

03

2

54

fertility

121

Geneamp

Geneampnam

eampHum

anampdefectamp

Knock5OutsampMouseamp

MouseampPhenotypeamp

AampBamp

CampDamp

EampFamp

GampHamp

IampFunctionamp

ENSBTAG00000002809

CAPN10

AutosomalRecessive

MentalRetardation

DIABETESMELLITUS

NONINSULIN

DEPEND

ENT1

Cells

2

13

41

33

29

immunity

ENSBTAG00000002966

DNAJC13

Youngadultonset

Parkinsonism

No

5

52

94

62

84

fertility

ENSBTAG00000003231

LIM2

Pulverulentcataract

Cells

01

02

01

02

cellular

process

ENSBTAG00000003508

CFAP69

httpswwwmousephenotypeor

gdatagenesMGI2443778sect

ionassociations

Abnormalcirculatinginsulinlevel

MaleInfertilityDecreasedmean

plateletvolum

eIncreasedleanbody

mass

203

07

1

01

7NC

ENSBTAG00000003523

UGT2B15

Nomatch

34

metabolism

ENSBTAG00000004081

FAT3

No

2

03

01

01

05

signalling

ENSBTAG00000004555

LRP2

DONN

AIBARROW

SYND

ROME

Intellectualdisability

Cells

56

3

01

disease

ENSBTAG00000004772

THEM

4

Cells

8

16

295

27

52

63

metabolism

ENSBTAG00000004860

SLC27A6

Cells

6

02

10

602

207

01

immunity

ENSBTAG00000005021

SEMA5A

Monosom

y5p

Cells

9

04

89

13

08

08

6immunity

ENSBTAG00000005170

GPR114

Mice

101

02

01

09

6

01

fertility

ENSBTAG00000005248

OFCC1

No

01

01

03

01

01

01

03

NC

ENSBTAG00000005340

SERGEF

Cells

7

10

35

15

58

23

cellular

process

ENSBTAG00000005357

VPS11

Mice

14

13

619

717

815

15

cellular

process

ENSBTAG00000005466

C8H8orf58

Nomatch

05

01

08

02

103

02

04

NC

ENSBTAG00000005514

TRIO

Cells

17

66

10

210

88

7immunity

ENSBTAG00000005633

RGNEF

httpswwwmousephenotypeor

gdatagenesMGI1346016sect

ionassociations

Vertebralfusion

36

214

05

801

08

1cellular

process

ENSBTAG00000006021

CEP250

Mice

3

22

43

41

88

metabolism

ENSBTAG00000006027

USP34

Mice

11

63

74

88

912

fertility

ENSBTAG00000006532

CDK5RAP2

Autosomalrecessive

primary

microcephaly

httpswwwmousephenotypeor

gdatagenesMGI2384875sect

ionassociations

skeletonimmunesystem

growthsizebodyhem

atopoietic

system

5

34

92

74

784

developm

ent

ENSBTAG00000007001

SLC26A1

No

03

01

02

32

903

01

07

metabolism

ENSBTAG00000007340

LOC101906349

Nomatch

NC

ENSBTAG00000007568

MEPE

Cells

cellular

process

ENSBTAG00000007910

EXOC7

Cells

13

24

23

22

12

22

15

22

28

cellular

process

122

Geneamp

Geneampnam

eampHum

anampdefectamp

Knock5OutsampMouseamp

MouseampPhenotypeamp

AampBamp

CampDamp

EampFamp

GampHamp

IampFunctionamp

ENSBTAG00000008650

CAMK1D

Cells

24

2

01

06

1

2

03

7

3

signalling

ENSBTAG00000008797

Uncharacterized

Nomatch

09

04

07

04

NC

ENSBTAG00000008871

DDX4

Cells

02

01

01

132

cellular

process

ENSBTAG00000008996

SFI1

Mice

2

1

06

3

04

3

2

5

NC

ENSBTAG00000009014

UPK1B

Mice

2

2

2

5

4

4

2

7

3

fertility

ENSBTAG00000009030

NOX5

Nomatch

04

01

8

01

immunity

ENSBTAG00000009033

TEX37

httpswwwmousephenotypeor

gdatagenesMGI1921471sect

ionassociations

Abnormalstartlereflex

03

83

NC

ENSBTAG00000009064

CSF2RB

Congenitalpulmonary

alveolarproteinosis

Mice

01

2

04

06

2

23

03

17

05

signalling

ENSBTAG00000009144

BPIFA2A

Nomatch

6

06

05

metabolism

ENSBTAG00000009337

PNMAL1

Cells

8

05

01

3

02

03

03

04

3

Immunity

ENSBTAG00000009444

SLC15A5

Cells

01

cellular

process

ENSBTAG00000009568

KANK2

Cells

1

17

30

9

3

31

9

10

3

cellular

process

ENSBTAG00000009569

DOCK6

AdamsOliver

syndrome

Cells

2

5

10

12

5

15

5

6

21

metabolism

ENSBTAG00000009836

CHGA

Mice

140

555

01

2

02

01

07

1

metabolism

ENSBTAG00000010522

Uncharacterized

Nomatch

01

02

02

03

2

metabolism

ENSBTAG00000010601

DOLK

Familialisolated

dilated

cardiomyopathy

No

4

8

6

22

9

6

5

5

12

metabolism

ENSBTAG00000010847

FOXN4

Cells

01

01

metabolism

ENSBTAG00000010954

ART3

Cells

02

1

55

01

08

2

10

06

71

metabolism

ENSBTAG00000011011

SSH2

No

3

09

5

4

2

2

6

7

10

cellular

process

ENSBTAG00000011387

NAT9

Cells

6

7

3

8

6

8

4

6

6

metabolism

ENSBTAG00000011420

CA9

Cells

04

3

02

2

03

1

01

2

56

metabolism

ENSBTAG00000011433

RGP1

httpswwwmousephenotypeor

gdatagenesMGI1915956sect

ionassociations

Completepreweaninglethality

7

19

11

18

8

12

9

8

25

cellular

process

ENSBTAG00000011622

C18orf21

Mice

9

10

6

6

2

10

5

10

34

NC

ENSBTAG00000011683

NUMB

Cells

21

25

8

35

17

47

6

8

19

immunity

ENSBTAG00000011684

RAB11FIP1

Cells

6

01

8

09

11

02

1

4

NC

ENSBTAG00000012053

ACCSL

No

03

01

cellular

process

ENSBTAG00000012314

LDLR

Homozygousfamilial

hypercholesterolemia

Cells

8

33

11

21

17

24

2

15

2

cellular

process

123

Geneamp

Geneampnam

eampHum

anampdefectamp

Knock5OutsampMouseamp

MouseampPhenotypeamp

AampBamp

CampDamp

EampFamp

GampHamp

IampFunctionamp

ENSBTAG00000012326

LOC101906218

Nomatch

11

01

01

09

01

NC

ENSBTAG00000012458

PSAPL1

Cells

01

NC

ENSBTAG00000012565

PCNX

No

3

32

32

33

48

developm

ent

ENSBTAG00000012833

ATG2B

No

4

21

21

21

29

NC

ENSBTAG00000012921

KIF24

Mice

05

07

03

08

02

202

214

metabolism

ENSBTAG00000013568

UEVLD

Cells

5

41

53

42

214

metabolism

ENSBTAG00000013685

KRT31

Mice

cellular

process

ENSBTAG00000013753

DDRGK1

Mice

22

32

18

56

55

25

14

22

37

NC

ENSBTAG00000013789

OR6K6

No

NC

ENSBTAG00000013838

ALLC

Mice

02

01

78

metabolism

ENSBTAG00000013916

HERC2

Mentalretardation

autosomalrecessive

38SKINHAIREYE

PIGM

ENTATION

VARIATIONIN1

Developm

entaldelay

withautismspectrum

disorderandgait

instability

No

8

74

83

74

65

metabolism

ENSBTAG00000014001

MAP1A

Cells

136

23

22

202

43

352

cellular

process

ENSBTAG00000014058

LDB2

No

28

33

12

825

410

3cellular

process

ENSBTAG00000014284

ALPK2

No

01

10

01

304

02

cellular

process

ENSBTAG00000014750

EPB41L4A

httpswwwmousephenotypeor

gdatagenesMGI103007secti

onassociations

Decreasedcirculatingglucoselevel

Abnormalconeelectrophysiology

Immunesystem

s1

22

20

07

12

13

43

NC

ENSBTAG00000014752

AKAP11

httpswwwmousephenotypeor

gdatagenesMGI2684060sect

ionassociations

Nopheno

24

72

63

306

68

cellular

process

ENSBTAG00000014768

ZNF786

Cells

3

23

42

48

33

cellular

process

ENSBTAG00000014794

SCYL3

Cells

6

10

59

613

516

26

cellular

process

ENSBTAG00000015027

Clusterin

httpswwwmousephenotypeor

gdatagenesMGI88423sectio

nassociations

Aggressionvertebralfusion

05

03

03

02

05

3NC

124

Geneamp

Geneampnam

eampHum

anampdefectamp

Knock5OutsampMouseamp

MouseampPhenotypeamp

AampBamp

CampDamp

EampFamp

GampHamp

IampFunctionamp

ENSBTAG00000015210

LOXHD1

DEAFNESS

AUTOSOMAL

RECESSIVE77

Autosomalrecessive

nonsyndromic

sensorineuDominant

lateonsetFuchs

cornealdystrophy

deafnessautosomal

recessivetype77

No

3

03

NC

ENSBTAG00000015294

COX10

Leighsyndrome

Mitochondrial

complexIVdeficiency

(MTC4D)

Cells

4

6

24

5

2

3

14

3

16

cellular

process

ENSBTAG00000015335

BAI3

No

38

03

8

01

01

2

signalling

ENSBTAG00000015490

HS1BP3

Cells

1

10

5

9

4

8

2

1

37

cellular

process

ENSBTAG00000015602

C7H5orf45

Nomatch

10

cellular

process

ENSBTAG00000015839

MAP4

No

27

85

104

35

11

100

82

46

61

cellular

process

ENSBTAG00000015912

DMKN

No

01

04

03

NC

ENSBTAG00000015980

FASN

AutosomalRecessive

MentalRetardation

Cells

19

29

7

12

6

39

6

7

22

metabolism

ENSBTAG00000015985

GPR20

httpswwwmousephenotypeor

gdatagenesMGI2441803sect

ionassociations

Anatomicaldefectsvarioustissues

03

02

08

01

09

signalling

ENSBTAG00000016495

LINS

AutosomalRecessive

MentalRetardation

Cells

6

3

1

5

3

3

1

5

12

NC

ENSBTAG00000016691

LACC1

Mice

2

9

1

9

9

8

08

7

27

NC

ENSBTAG00000017096

ERICH1

Cells

6

4

2

4

2

5

2

5

13

metabolism

ENSBTAG00000017165

MATN2

No

2

7

2

20

08

3

2

2

7

cellular

process

ENSBTAG00000017418

RRP1B

Cells

6

4

4

5

3

7

3

12

159

cellular

process

ENSBTAG00000017436

ALS2CR11

Cells

01

37

disease

ENSBTAG00000017505

PAXIP1

Cells

2

3

2

4

2

4

2

3

12

signalling

ENSBTAG00000017576

GPR144

Nomatch

02

signalling

ENSBTAG00000018528

USP45

No

1

1

04

1

05

06

02

2

04

cellular

process

ENSBTAG00000018810

THBS2

INTERVERTEBRAL

DISCDISEASE

Cells

1

2

04

09

1

09

04

1

2

fertility

ENSBTAG00000019072

PSD4

Cells

7

8

1

1

7

02

6

4

cellular

process

125

Geneamp

Geneampnam

eampHum

anampdefectamp

Knock5OutsampMouseamp

MouseampPhenotypeamp

AampBamp

CampDamp

EampFamp

GampHamp

IampFunctionamp

ENSBTAG00000019265

AGK

Sengerssyndrom

e

Congenitalcataract

hypertrophic

cardiomyopathy

mitochondrial

myopathy

Cells

7

36

43

52

65

metabolism

ENSBTAG00000019752

BPIFA2B

Nomatch

01

1metabolism

ENSBTAG00000019799

FCAM

R

Cells

07

01

02

01

2

7immunity

ENSBTAG00000019961

ATP4B

Cells

12

03

1

05

cellular

process

ENSBTAG00000020414

DHX37

No

3

33

73

73

413

cellular

process

ENSBTAG00000021073

KIAA1549

PILOCYTIC

ASTROCYTOM

ASOMATIC

httpswwwmousephenotypeor

gdatagenesMGI2669829sect

ionassociations

behaviourneurological

homeostasism

etabolism

hematopoieticsystem

3

04

23

01

21

04

3NC

ENSBTAG00000021233

OR5B17

No

fertility

ENSBTAG00000021375

OR10Q1

No

fertility

ENSBTAG00000021643

EFCAB12

Cells

01

1

9cellular

process

ENSBTAG00000021645

MBD4

Cells

8

43

82

51

10

6cellular

process

ENSBTAG00000022509

DNAH

9

No

02

5

05

5metabolism

ENSBTAG00000022858

LOC783561

Nomatch

fertility

ENSBTAG00000026247

PKHD1L1

Cells

02

01

cellular

process

ENSBTAG00000026323

LYSB

Nomatch

68

metabolism

ENSBTAG00000026350

SPATC1

Cells

05

06

01

01

01

39

NC

ENSBTAG00000027390

RTFDC1

Mice

28

19

21

20

922

13

20

49

NC

ENSBTAG00000031115

Uncharacterized

Nomatch

01

cellular

process

ENSBTAG00000031718

OGFR

Mice

6

17

612

517

39

1fertility

ENSBTAG00000032264

Uncharacterized

Nomatch

52

4

22

NC

ENSBTAG00000032481

DAPL1

No

1

6

07

01

13

10

cellular

process

126

Geneamp

Geneampnam

eampHum

anampdefectamp

Knock5OutsampMouseamp

MouseampPhenotypeamp

AampBamp

CampDamp

EampFamp

GampHamp

IampFunctionamp

ENSBTAG00000032527

ERCC6

CEREBROOCULOFACI

OSKELETAL

SYND

ROME1

COCKAYNE

SYND

ROMEBDe

SanctisCacchione

syndromeMACULAR

DEGENERATION

AGERELATED5UV

SENSITIVE

SYND

ROME1COFS

syndromeCockayne

syndrometype1

Cockaynesyndrome

type2Cockayne

syndrometype3UV

sensitivesyndrome

Cockaynesyndrome

typeB(CSB)De

SanctisCacchione

syndrome(DSC)UV

sensitivesyndrome

(UVS)cerebrooculo

facioskeletal

syndrometype1

(COFS1)

Mice

3

205

22

207

42

cellular

process

ENSBTAG00000032821

SCEL

No

01

02

17

04

NC

ENSBTAG00000033954

PRPS1L1

No

35

metabolism

ENSBTAG00000034991

SLX4IP

Cells

6

21

32

207

33

NC

ENSBTAG00000035226

TOR1AIP1

Cells

9

12

513

816

716

83

NC

ENSBTAG00000035309

OR4F15

No

fertility

ENSBTAG00000035708

LOC527795

Nomatch

01

01

23

metabolism

ENSBTAG00000035985

LOC101909743

OR8K1

No

fertility

ENSBTAG00000036019

RIPPLY3

httpswwwmousephenotypeor

gdatagenesMGI2181192sect

ionassociations

hemopoieticsystem

01

04

01

9

01

malformation

ENSBTAG00000037399

Uncharacterized

Nomatch

2

92

4

24

05

5signalling

ENSBTAG00000038606

DBDR

No

02

01

signalling

ENSBTAG00000038831

PHYHD1

No

2

18

21

23

30

18

16

15

1metabolism

ENSBTAG00000039110

OR8U1

No

fertility

ENSBTAG00000039117

FAM179B

Cells

14

206

31

308

33

cellular

process

ENSBTAG00000039520

SIRPB1

Cells

4

08

32

516

05

12

01

signalling

127

Geneamp

Geneampnam

eampHum

anampdefectamp

Knock5OutsampMouseamp

MouseampPhenotypeamp

AampBamp

CampDamp

EampFamp

GampHamp

IampFunctionamp

ENSBTAG00000040568

Uncharacterized

Nomatch

4

307

54

33

25

cellular

process

ENSBTAG00000045558

LOC100299084

Nomatch

1

22

110

10

09

52

fertility

ENSBTAG00000045588

LOC100298356

Nomatch

metabolism

ENSBTAG00000045709

LOC514057

Nomatch

fertility

ENSBTAG00000046175

Uncharacterized

Nomatch

01

01

cellular

process

ENSBTAG00000046228

Olfactory

receptor

Nomatch

fertility

ENSBTAG00000046239

C10orf12

No

03

104

07

108

06

209

cellular

process

ENSBTAG00000046285

OR8I2

No

fertility

ENSBTAG00000046360

Uncharacterized

Nomatch

NC

ENSBTAG00000046423

LOC618007

Nomatch

NC

ENSBTAG00000046527

LOC100139830

Nomatch

fertility

ENSBTAG00000046590

FAM181A

Cells

09

44

NC

ENSBTAG00000046593

Uncharacterized

Nomatch

20

metabolism

ENSBTAG00000046727

Olfactory

receptor

Nomatch

04

fertility

ENSBTAG00000047150

RTEL

Cells

4

23

64

55

616

cellular

process

ENSBTAG00000047166

SH2D7

Mice

03

101

01

1

02

02

2immunity

ENSBTAG00000047174

Uncharacterized

Nomatch

03

7105

02

01

271

05

03

NC

ENSBTAG00000047317

CL43

Nomatch

01

120

01

immunity

ENSBTAG00000047587

Uncharacterized

Nomatch

22

09

09

35

03

4

06

2cellular

process

ENSBTAG00000047661

FABP12

httpswwwmousephenotypeor

gdatagenesMGI1922747sect

ionassociations

Growthsizebody

01

02

69

metabolism

ENSBTAG00000048021

PNMAL2

No

04

03

2

03

03

02

NC

ENSBTAG00000048153

Uncharacterized

Nomatch

2

05

01

02

01

cellular

process

128

Discussion(

Deleterious recessive variants arenaturally present in outbred animal species

Theydonotcreateproblemsaslongastheyremainatlowfrequencyinrandom

mating populations Under these conditions these variants remain in the

heterozygousstateanddonotproducephenotypiceffectsHoweverassoonas

their frequency increases andmatingoccursbetween related individuals their

deleteriouseffectsbecomeapparentandhighlyaffectthewelfareandviabilityof

animalscarryingtheminthehomozygousstate

In industrialdairy cattlebreedscrossbreeding isgenerallyavoidedandbreeds

areinpracticegeneticisolatesinwhichartificialinsemination(AI)iswidelyused

for rapidly spreading thegeneticprogress in thepopulationAsa consequence

elitesireshavemanythousandandsometimeshundredsofthousanddaughters

Under these conditions mating between relatives becomes practically

unavoidable in the longterm In fact inspiteof theclosecontrolof inbreeding

andofthematingsystemgeneticdefectsregularlyappeartothesurfaceSome

examples are the Complex Vertebral Malformation (CVM) and the Bovine

Leucocyte Adhesion Deficiency (BLAD) in Holstein (Schuumltz et al 2008) and

WeaverdiseaseinBrown(McClureetal2013)Inthesecasestheeffectsofthe

recessivedeleteriousallelesareclearandmoleculardiagnostictoolshavebeen

developedtoavoidtheirfurtherspreadingandtoinitiatetheeradicationprocess

However recessive deleterious alleles may produce their effects in a more

deceitful way for example by affecting fertility inducing early lethality or

influence disease susceptibility All these effects are difficult to trackwith the

current methods of phenotype measurements but have an impact on the

economyofthelivestocksectorandonanimalwelfare

InthispaperwesoughtfordeleteriousrecessiveallelesinItalianHolsteincattle

This breed is recently experiencing a decrease in the genetic trend of fertility

traits (Pryce et al 2014) therefore we started our search from the exome

sequences of 20 sires 19 of which having extreme EBV values for male and

femalefertility

130

The choiceof sequencingexomeswasa compromisebetween completenessof

informationaccuracyofvariantandgenotypedetectionandcostExomesclearly

give incomplete information since they miss all variation in regulatory sites

outside gene boundaries (ENCODE Consortium 2012) These may be very

relevantforthecontroloftraitvariationbuttheyarestillverypoorlyannotated

in livestock On the other hand sequencing exomes yields millions instead of

billionsbasepairsadeepersequencingoftargetregionsandaconsequentmore

reliablevariantandgenotypecalling(FigureS1andS2)Theanalysisofexome

data is also less demanding in terms of computer time (thousands instead of

millions variants are to be analysed) and easier in terms of biological

interpretation Exome sequencing revealed to be very effective in the

identificationofanumberofmutationsaffectinghumanhealth(Bamshadetal

2011 Yang et al 2013) All thesewere defects caused bymutations in single

genes producing clear phenotypic effects havingMendelian inheritance These

conditionsdonotholdinourinvestigationthereforewehadtodevisestrategies

to integrate exome data with other information collected from the Holstein

populationandotherspecies

Theoverallprocedureforidentifythecandidatedeleteriousrecessivewasbased

on the rationale that deleterious recessive alleles in coding sequences are

expectedtobei)causedbyseverealterationoftheproteinsencodedbygenes

ii)mostlyifnotexclusivelycarriedattheheterozygousstateinthepopulation

iii)unevenlydistributedinanimalsofhighandlowfertilityiv)likelyconserved

indifferentspeciesbecauseofselectionconstraintv)likelyassociatedtogenetic

defectsinanyspecieswhenseverelydamaged

A stepwise procedure was set up for reducing candidates to a number

manageable for functional analysis (Figure1) In the first stepmutationswere

discoveredinthesecondfilteredandinthirdassessed

Thisprocedureidentifiesonlysomeofthedeleteriousrecessivesexistinginthe

populationanalysedSincewesequencedonlyexomesandnotwholegenomes

ouranalysesislimitedtocodingsequencesandatthemomenttoSNPvariation

Wethereforemissregulatoryvariationsoutsidegenesand theeffectsof indels

andmorecomplexvariations

131

ThenumberofanimalssequencedislimitedDeleteriousrecessivesareexpected

toberareinthepopulationandwehavesampledonly20animalsfromasubset

ofsiresauthorizedtoAI inItalyAlthoughwehavetakenamongthemthose in

theminus tail of fertility trait thatmighthave ahigher genetic load itrsquos likely

thatwehavemissedothervariantsexistinginthepopulationOntheotherhand

variantscarriedbyAIauthorizedbullsaretheonesthatwillhavealargerimpact

onthepopulationifwidelyspreadbyelitebullsandthereforetheyarethemost

interestingfromabreedingpointofview

Most deleterious variants are expected to be very rare and therefore not

detected by the approach here adopted or detected and afterwards discarded

duringthefilteringprocesses

Sequencing is prone to errors Some real variants may have been discarded

duringQCofsequencedataduetopoorsequencequalityorpoorcoverageofthe

region

Someofthevariationsclassifiedasldquonondeleteriousrdquobythesoftwareusedmay

indeedhave an effect In fact synonymousmutation arenon`neutralwhenever

thedifferenttRNAsforasameaminoacidhavedifferentrelativeabundanceand

translationrates(Goymer2007)

1(Searching(candidate(deleterious(variants(in(exomes(

ExomesproducedafairamountofdataAlltogether5355Mbweresequenced

These represent 9998 of the exome sequences targeted by probes Among

theseonly35hadasequencingdepthacrossanimalslt60XSequencingdepth

is a key factor for variant detection and even more so when the genotype of

sequencedanimalsistobeinferredInvestigatingtheconcordancebetweenthe

genotypes detected at more than 20K SNP by both Beadchip and exome

sequencingwe observed a clear inverse relationship between sequence depth

andnumberofdiscordances(TableS3andFigureS3)Theaveragesequencing

depth expected (50X) and realised in practice (4058) in this investigation

yieldedratheraccurategenotypessincetheaverageconcordanceobservedwas

over97and increased toover99whentwooutlieranimalswereremoved

132

from the dataset The reason for the outlier behaviour of these samples waslikely a suboptimal quality of DNA submitted to sequencing Overall almost215000 variants passed quality controls and among them 192440 autosomalSNPs have been investigated in detail As found in other investigations mostvariants were rare (Peloso et al 2014) For example the MAF distributionobservedin46904SNPsclassifiedneutralhasameanof0197andamedianof0167values(FigureS6)

ThefirststepinourresearchwastoidentifySNPscausingseverealterationsinthe proteins coded by genes These were relevant changes in conformationalterationsofsplicingsitesortruncationsofproteinsToaccomplishthistaskwereliedontwosoftwareSIFTandANNOVARthatidentified4902suchvariationsin3423genes Interestingly the subsetofdeleteriousvariantshas significantlylowermean (0156)andmedian (010) compared toneutral variations (FigureS6) This subset is therefore at least enriched in variants under purifyingselectionhavingadeleteriouseffectatthepopulationlevel

Thenumberofdeleteriousallelesdetectedishighanditisunlikelytheyallhavea relevant phenotypic effect even if most variants are rare In fact variantsdeleterious for a protein are not automatically deleterious for an organism asthe protein change induced by the variant although relevant may not affectproteinfunctionvariationmayoccurinonememberofagenefamilysothatitsfunction may be compensated by other family members the protein functionmay be compensated by other proteins or the affected metabolic pathwaycompensated by alternative pathways The challenge was therefore to filter asubsetofvariantslikelyhavingaphenotypiceffecttobeproposedforvalidationatalargerscaleToaccomplishthistaskwesoughtforadditionalinformationinexomesequencesandinalargepopulationofHolsteinsiresgenotypedwiththebovineHDBeadchip

( (

133

2(Filtering(deleterious(variants(

The second step consisted in filtering the several thousand deleterious SNPsidentified in exomes to search for SNPs and genes that worth deeperinvestigationWeusedtwostrategiestoaccomplishthistask

21$ Distribution$ of$ deleterious$ variants$ in$ fertility$ EBV$ plus9variant$ and$

minus9variant$animals$

Fertility traits are controlled by a high number of genes and are highlygenetically heterogeneous Even using a selected subset of variants identifiedfrom exome sequencing classic association analyses would need sample sizesmuch larger that the one available in this investigation to have a reasonablestatistical power (Stitziel et al 2011) The first approachwe appliedwas theanalysisof SNPallelic andgenotypicdistributionbetween theminusandplus`variant tails of male and female fertility EBVs Given the very low number ofanimals sequencedwedecided to use a simplemodel to exclude from furtheranalyses all markers showing no evidence of association with the EBV tailsratherthantofindsignificantassociations

We divided samples in cases (minus`variant animals for fertility EBV) andcontrols (plus`variant animals) and tested all possible models of inheritanceimplemented in thePlinkpackage for tackling simple genetic defectsWe thenused an empirical point`wise significant threshold based on permutations tohighlight SNPs having alleles or genotypes showing an uneven distributionbetweencasesandcontrolsandconsideredtheseasworthfurtherinvestigationAtotalof112SNPsin101genespassedthisfilteringprocess

$

22$Detection$of$regions$carrying$deleterious$mutation$from$800K$SNP$data

The analysis of exomes alone was therefore insufficient to tag recessivedeleteriousgenesinItalianHolsteinandwesoughtforadditionalinformationin1009ItalianHolsteinbullsgenotypedathighdensitythatarepartofthegenomicselection program run by ANAFI The rationale of the approach is that

134

deleterious recessives are expected to be absent or rare in the homozygous

status in the population Hence they may be tagged searching for genome

regions having markers or haplotypes lacking or having significantly less

homozygotesthanexpectedunderaneutralmodel

We used two complementary approaches to identify these candidate regions

One(OHW)searchedforverystrongsignalsproducedbysinglemarkersoutof

Hardy`Weinberg equilibrium at Ple847x10`8 A minimum of three significant

markersinaslidingwindowsof10wassettoreachsignificancetocapturemore

reliablesignalsavoidexcessivesinglemarkernoiseandaccount forapossible

Wahlund effect due to presence of the genetic structure in the population

difficulttoavoidinthesetofprogenytestedbullsanalysedThesecondapproach

(LOH) captured more moderate signals but from multiple markers in larger

regions Both approaches were designed to detect signals in regions in which

deleteriousvariantsarestillinsegregationandarenotexcessivelyrare

Asexpectedthetwoapproachesidentifiedregionsthatonlypartiallyoverlapped

(TablesS5S6eS7)Inparticular5regionsoutofthe11identifiedbyOHWwere

in common to the 240 identified by LOH Two of these regions overlap with

previous investigations that used the BovineSNP50 Beadchip Sahana et al

(2013) found a region candidate to carrypotential lethal haplotypes inNordic

Holstein on BTA7 in the region 6708007`10907840 This interval overlaps

withbothBTA7regionsdetectedinthisstudyVanRadenetal(2011)detected

inUSHolsteinalargeregiononBTA15spanning76Mb`82M(onBTA30)that

includedLRP4 gene responsible for themulefootdefectOuranalyseswith the

HDBeadchipindicatesontheonesidethatthelargeregiondetectedbySahana

etal(2013)mayconsistintwonearbyregionsOntheothersidethesignalwe

detectedonBTA15doesnot includetheLRP4 locusthat iscompletely fixed in

theItalianbullspopulationanalysed

In total83mutations in66genes fell in eitherorbothOHWandLOHregions

These were joined to the SNPs output of the EBV tail analyses for further

assessment

(

135

3(Assessing(filtered(deleterious(variants(

Different characteristics of the filtered deleterious variants were assessed to

identify among them strong candidates to submit to a large`scale validation

studyTheassessmentphasepermittedtogainadditionalinformationinfavour

or against SNP and gene candidature to be deleterious We quantified this

evidenceusingasubjectivescore

31$Comparative$genomics$analysis$

In the case of comparative genomics the score quantified the level of

conservationof thealternativeaminoacidresidues inducedbySNPvariants in

ItalianHolsteincattleComparisonwasrunagainstaset100vertebratespecies

that included 62 mammals The level of conservation is a good proxy for the

expected severity of a mutation In the case of the 46 SNPs that induced the

change of an amino acid highly conserved in the species investigatedwemay

hypothesize a strong selection pressure in favour of the maintenance or the

invariant amino acid at that position Therefore the amino acid change or the

stopcodonweobserved incattlemayaffectnotonlytheproteinstructureand

function but also the phenotype of homozygous animals Interestingly in all

thesecasestheSNPallelecodingforthenonconservedvariantswastheminor

allele

Converselyinallcasesofhighvariabilityofthetargetaminoacidacrossspecies

(asmanyassevendifferentaminoacidshavesometimesbeenobserved)poorly

supports the presence of phenotypic effects due to its change Sometimes the

informationcollectedisagainstthepresenceofadeleteriousmutationasinthe

case of the presence in different species of both alleles identified by exome

sequencingThis suggests thatneither of these is greatly affecting the viability

andfitnessofanimalscarryingthem

Wealsoobservedanumberoffailures(N=45)intheliftoverofSNPcoordinates

fromcattletohumanorinproteinalignmentThishappenedforexampleinthe

caseofuncharacterized cattleproteins thathavenoorthologous in thehuman

genome and of someolfactory receptors that belong to a highly variable gene

136

family Even if the failure of alignment is an indication of poor proteinconservationacrossspeciestheconservationofthetargetaminoacidcouldnotbeassessesandinthesecaseswepreferredtoconsidertheinformationmissingandtoscoreitaccordingly

32$Haplotype$analysis$

Asecondassessmentofthefilteredcandidateswasontheirbehaviourinthesirepopulation The ideal test would be the direct genotyping of the SNPs in thepopulationand theevaluationof theirallelicandgenotypicproportions In theabsenceofthisinformationwefirstinferredtheHDhaplotypeassociatedtoSNPallelesandassessedthebehaviourofthehaplotypeinthepopulationassumingthisasasurrogateoftheSNPbehaviourSincedeleteriousallelesareexpectedtoberareandthis isalsowhatweobserved fromcomparativegenomicanalysesweinvestigatethefrequencyonlyofthehaplotypecarryingtheminorSNPalleleagainsttheexpectationto findthecorrespondinghaplotypeneverhomozygousor a number of times much lower than expected from its frequency in thepopulation The strategy yielded some interesting confirmation but was onlypartiallysuccessful

Firstofalltheidentificationofauniquehaplotypecarryingtherarevariantwasnot always possible or straightforward In some cases carriers of a same rarevariantdidnotshareexactlythesamehaplotypeWeacceptedsomevariationifthe haplotype remained clearly distinguishable from those of animals notcarryingtherarevariantbutatthesametimereducedthestringencyofthetestInquiteanumberofcasesasamehaplotypecarriedbothcandidatedeleteriousSNP variants This may occur for example in the case of recent mutations IntheselastcasesthetestcouldnotbecarriedoutFinallymostrarevariantswereassociatedtorarehaplotypesthatarenotlikelytohavehomozygousindividualsinthepopulationindependentlyoftheSNPeffect

Thisassessmentwas thereforeonly significant fora fewhaplotypes thathadalarge number of heterozygous individuals and no or only a very fewhomozygotesandforthosethathadalargenumberofhomozygotes

137

33$Functional$analysis$

The final assessment was the analysis of the function of genes carrying the

candidate deleterious SNP This has some limitation since knowledge of gene

function is only partial and mainly gained from investigations on species

different from cattle In any case we considered the association to genetic

defectsinhumanandtheinductionofseverephenotypesinmouseknockoutas

a positive indication of the likely phenotypic effect of the severe protein

alterationweobservedincattleInadditiongenefunctionandexpressionprofile

was evaluated in search for evidence of gene involvement in fertility

developmentorprocessesfundamentalforcellandorganismviability

Amonggenesidentified22areassociatedtohumansyndromes(Table4)Eleven

syndromesareneurologicaldiseasesaffectingbraindevelopmentand inducing

mental retardation (eg Donnay`Barrow Syndrome and Autosomal recessive

primary microcephaly) the others influence development (eg

Cerebrooculofacioskeletal Syndrome 1 and Adams`Oliver syndrome) or organ

function(egCongenitalpulmonaryalveolarproteinosis)

Most of the same defects translated into Holstein would induce physical and

behavioural anomalies likely causingyoungbulls tobe excluded fromprogeny

testing Itshould in factbeconsideredthattheanimalsweanalysedareonthe

onesidethosethatwillmostinfluencethegeneticmakeoffuturegenerationbut

are not a random sample of Holstein animals These bulls have been progeny

testedandauthorisedtobeusedinartificialinseminationinItalyCandidatesto

progenytestingderivefromthematingofthebest1siresandbest2damsof

the Italian Holstein population At the age of 4 months they are taken to the

Italian Holstein breeder association (ANAFI) test station where they are

subjected to performance test sanitary controls and behavioural tests

Thereafterthequalityoftheirsemenischeckedbeforeauthorizationisgranted

toenterinprogenytestInthissampleofanimalsitisthereforeexpectedtofind

never at the homozygous state mutations causing lethality physical

malformation abnormal behaviour and semen infertility Also those having a

138

majoreffectonsemenqualityandhighsusceptibilitytodiseasesareexpectedto

berarelycarriedatthehomozygousstatus

Association with human syndromes is a strong evidence of the potential

phenotypiceffectofmutationsalteringthegenestructureHoweveritshouldbe

considered that variants found in humans are different fromvariants found in

cattle and that the severity of the phenotype induced by a deleterious gene is

often variable even within species Therefore we evaluated this information

togetherwiththeotherindirectevidencesofvariantstobedeleterious

WealsoenquiredthewebsiteoftheInternationalMousePhenotypeConsortium

(httpswwwmousephenotypeorg) to search for the effects of null alleles in

mice The few genes forwhichnull`mice line is available are involved in basic

functionsfortheorganismthatspanfromvision(IAH1andEPB41L4A)toaging

(RGP1)andtissuesdevelopment(FABP12RIPPLY3RGNEF)orfertility(CFAP96)

Aswithgenesassociatedtohumansyndromesmostoftheabnormalphenotypes

observedinknockmousewouldhamperyoungbullstoenterprogenytesting

Gene Ontology (GO) was evaluated on the entire gene set No enrichment on

particularcategoriesGOtermswasobservedThemostrepresentedcategories

in Biological Processes were as expected ldquometabolic processrdquo and ldquocellular

processrdquo however lower level terms ldquoreproductionrdquo and ldquodevelopmental

processrdquo are represented with meaningful hits The analysis of Molecular

Function indicates thatmany of the hits are enzymes (ldquocatalytic activityrdquo) and

haveregulatoryroles(ldquobindingrdquoandldquoenzymeregulatoractivityrdquo)Thisresultcan

be interpreted indifferentwaysOn theone sidedeleterious allelesmay exert

their deleterious effect in a number of different cellular and developmental

processes the correct process is one while there is many options to make it

wrongAlsothenumberofgenesinvestigatedisrelativelylowandformanyof

them there is little or no information available (eg 11 genes code for a

completelyuncharacterizedprotein)Inadditionsomeofthemmayturnoutnot

tobereallydeleteriousandaddnoisetotheGOanalysis

As a final step all the previous information (GO expression patterns known

function)havebeenused to estimate a scoremeasuring the importanceof the

gene function forcattle survivalandreproductionThescoringof functionwas

139

particularlydifficultGOcategoriesweregeneralandallfunctionstaggedhavearelevant importance for animal survival andproductionTherefore in this casewedidnrsquotusethewholescorerangeplanned(from`1to1)andgavenonegativescoresApositivescoreof1wasattributedonly togenesplayingan importantroleinfertilityanddevelopment

$

34$Strong$candidate$deleterious$variants$

Thescoringprocessyielded21variantswithatotalscore=035variantswithpositive and 135 with negative values (Figure 2 and Table 5) Although theprocesswasnoterrorfreeasitreliedonincompleteinformationandhadsomedegreeofsubjectivityparticularlyinthechoiceofrelativeweightsitidentifiedaset of variants that may be considered strong candidates to have deleteriouseffects in the Italian Holstein breed Variants were evaluated individually andthose in a same genes had sometimes different scores For example the fivevariantscarriedbyALPK2hadscoresrangingfrom`7to`2

140

Figure2Graphicalrepresentationoftotalscoreforeachvariants

minus505

Variation

Score

1_645923001_645985561_1381698111_1464490851_150972144

2_7478962_270362092_270455392_375933832_555609862_904645543_75418123_110403993_110404003_188945763_1137877703_1203356984_53767714_265405974_265406204_749514994_1033779064_1057246734_1130233264_1178417155_445557055_445557425_757447955_940308955_940309556_207762566_382803486_863518016_927092906_1075769256_1078851146_1090916426_1166055226_1186534167_13339757_56553577_97149147_167941797_168358027_168621947_197404097_197409057_263293538_602606098_603324848_658345648_704101558_760297748_770028908_872798628_1117618168_1128416959_83888819_83889249_237928389_510133929_1047396559_10515702910_171251510_1584300310_2802235210_2802311010_8290312110_8517371111_4630576711_4675390311_4740957711_5979141011_7832201011_8794387511_9548610911_9945389411_9946100112_1190618112_1247843312_1248084512_1413689712_5304908112_9076120913_381566413_1178793413_1178793813_5247168413_5393493013_5393494713_5393985313_5454042513_5504774013_5998820513_6304575013_6316961013_6539670814_197749414_364069514_3556894514_4688575014_5718402214_6863536315_18267915_18331715_18340415_248692

15_3018920915_3504878815_4630194115_7503314015_7503764515_8024448915_8025571015_8028875915_8038974515_8048559315_8055593115_8094968915_8275455615_8292533216_456201716_3831963616_6257435517_5309539517_5311265517_6609153117_7237753218_2562355018_3495651318_4637204118_5215480618_5408127718_5413095518_5781976819_2144410319_3107215219_3111397219_3279084319_4225178519_5139519419_5619117519_5726625120_763980120_2341043120_5888423620_6417173921_623911921_2034587821_2034628921_2683773321_3100359921_5527419521_5582646321_5817338421_5912071921_6048248421_6048251021_6284314222_5241595922_5242173422_5242204622_5242610422_5242666822_5695165722_5697484722_5779767423_4588505224_2132460424_2145336424_4676490924_5032719524_5815817824_5815862024_5815878424_5820400324_5820447925_3743318026_1811648726_2571599626_2576457027_503107827_3283255728_28422828_169040328_3571880728_4408176529_198588129_753052329_753066329_26397931

141

Table5SingleanalysesscoreandtotalforeachcandidatevariantsChrchrom

osom

ePosphysicalposition

Mutation

Score

Chr

Pos

Gene

Genenam

eHaplotypes

Conservation

Hum

andefect

Mousemodel

Function

TotalScore

164592300

ENSBTAG00000009014

UPK1B

02

I1

01

2

164598556

ENSBTAG00000009014

UPK1B

0I1

I1

00

I2

1138169811

ENSBTAG00000002966

DNAJC13

03

10

15

1146449085

ENSBTAG00000017418

RRP1B

I3

I1

I1

00

I5

1150972144

ENSBTAG00000036019

RIPPLY3

I3

I1

I1

I1

1I5

2747896

ENSBTAG00000013916

HERC2

30

10

04

227036209

ENSBTAG00000004555

LRP2

I3

I1

10

0I3

227045539

ENSBTAG00000004555

LRP2

I3

31

00

1

237593383

ENSBTAG00000032481

DAPL1

00

I1

00

I1

255560986

ENSBTAG00000032264

Uncharacterized

3I3

I1

00

I1

290464554

ENSBTAG00000017436

ALS2CR11

0I1

I1

00

I2

37541812

ENSBTAG00000046175

Uncharacterized

00

00

00

311040399

ENSBTAG00000013789

OR6K6

I3

I3

I1

00

I7

311040400

ENSBTAG00000013789

OR6K6

I3

I3

I1

00

I7

318894576

ENSBTAG00000004772

THEM

40

I3

I1

00

I4

3113787770

ENSBTAG00000000149

USP40

11

I1

01

2

3120335698

ENSBTAG00000002809

CAPN10

I3

21

00

0

45376771

ENSBTAG00000001500

FIGNL1

01

I1

00

0

426540597

ENSBTAG00000033954

PRPS1L1

1I3

I1

00

I3

426540620

ENSBTAG00000033954

PRPS1L1

1I1

I1

00

I1

474951499

ENSBTAG00000003508

CFAP69

I3

I3

I1

10

I6

4103377906

ENSBTAG00000021073

KIAA1549

0I1

11

01

4105724673

ENSBTAG00000019265

AGK

01

10

02

142

Mutation

Score

Chr

Pos

Gene

Genenam

eHaplotypes

Conservation

Hum

andefect

Mousemodel

Function

TotalScore

4113023326

ENSBTAG00000014768

ZNF786

I3

I3

I1

00

I7

4117841715

ENSBTAG00000017505

PAXIP1

0I1

I1

00

I2

544555705

ENSBTAG00000026323

LYSB

I3

3I1

00

I1

544555742

ENSBTAG00000026323

LYSB

I3

0I1

00

I4

575744795

ENSBTAG00000009064

CSF2RB

I3

I3

10

0I5

594030895

ENSBTAG00000009444

SLC15A5

I3

I1

I1

00

I5

594030955

ENSBTAG00000009444

SLC15A5

I3

I3

I1

00

I7

620776256

ENSBTAG00000008797

Uncharacterized

I3

I3

00

0I6

638280348

ENSBTAG00000007568

MEPE

0I3

I1

00

I4

686351801

ENSBTAG00000003523

UGT2B15

1I1

I1

00

I1

692709290

ENSBTAG00000010954

ART3

I3

2I1

00

I2

6107576925

ENSBTAG00000038606

DBDR

0I3

I1

00

I4

6107885114

ENSBTAG00000001505

GRK4

I3

1I1

00

I3

6109091642

ENSBTAG00000007001

SLC26A1

1I3

I1

00

I3

6116605522

ENSBTAG00000014058

LDB2

01

I1

00

0

6118653416

ENSBTAG00000012458

PSAPL1

I3

I1

I1

00

I5

71333975

ENSBTAG00000015602

C7H5orf45

I3

I3

I1

00

I7

75655357

ENSBTAG00000045588

LOC100298356

1I3

I1

00

I3

79714914

ENSBTAG00000046423

LOC618007

00

I1

00

I1

716794179

ENSBTAG00000012314

LDLR

02

10

03

716835802

ENSBTAG00000009568

KANK2

01

I1

00

0

716862194

ENSBTAG00000009569

DOCK6

01

10

02

719740409

ENSBTAG00000000414

FUT5

I3

I3

I1

00

I7

719740905

ENSBTAG00000000414

FUT5

I3

I3

I1

00

I7

143

Mutation

Score

Chr

Pos

Gene

Genenam

eHaplotypes

Conservation

Hum

andefect

Mousemodel

Function

TotalScore

726329353

ENSBTAG00000004860

SLC27A6

0I1

I1

00

I2

860260609

ENSBTAG00000011420

CA9

I3

I1

I1

00

I5

860332484

ENSBTAG00000011433

RGP1

I3

0I1

10

I3

865834564

ENSBTAG00000035708

LOC527795

00

I1

00

I1

870410155

ENSBTAG00000005466

C8H8orf58

I3

I3

I1

00

I7

876029774

ENSBTAG00000015027

Clusterin

00

I1

10

0

877002890

ENSBTAG00000012921

KIF24

12

I1

00

2

887279862

ENSBTAG00000002220

SPTLC1

00

10

01

8111761816

ENSBTAG00000006532

CDK5RAP2

I3

I3

11

1I3

8112841695

ENSBTAG00000013838

ALLC

01

I1

00

0

98388881

ENSBTAG00000015335

BAI3

00

I1

00

I1

98388924

ENSBTAG00000015335

BAI3

00

I1

00

I1

923792838

ENSBTAG00000046593

Uncharacterized

03

00

03

951013392

ENSBTAG00000018528

USP45

02

I1

00

1

9104739655

ENSBTAG00000018810

THBS2

03

10

15

9105157029

ENSBTAG00000000918

ERMARD

I3

21

00

0

10

1712515

ENSBTAG00000014750

EPB41L4A

0I1

I1

I1

0I3

10

15843003

ENSBTAG00000009030

NOX5

I3

I3

I1

00

I7

10

28022352

ENSBTAG00000035309

OR4F15

I3

1I1

00

I3

10

28023110

ENSBTAG00000035309

OR4F15

I3

I1

I1

00

I5

10

82903121

ENSBTAG00000012565

PCNX

I3

2I1

01

I1

10

85173711

ENSBTAG00000011683

NUM

BI3

0I1

00

I4

11

46305767

ENSBTAG00000002288

NT5DC4

0I3

I1

00

I4

11

46753903

ENSBTAG00000019072

PSD4

0I1

I1

00

I2

144

Mutation

Score

Chr

Pos

Gene

Genenam

eHaplotypes

Conservation

Hum

andefect

Mousemodel

Function

TotalScore

11

47409577

ENSBTAG00000009033

TEX37

I3

I2

I1

10

I5

11

59791410

ENSBTAG00000006027

USP34

02

I1

01

211

78322010

ENSBTAG00000015490

HS1BP3

00

I1

00

I1

11

87943875

ENSBTAG00000001140

IAH1

I3

3I1

10

011

95486109

ENSBTAG00000017576

GPR144

I3

I3

I1

00

I7

11

99453894

ENSBTAG00000038831

PHYHD1

11

I1

00

111

99461001

ENSBTAG00000010601

DOLK

12

10

04

12

11906181

ENSBTAG00000001133

VWA8

I3

1I1

00

I3

12

12478433

ENSBTAG00000014752

AKAP11

1I3

I1

I1

0I4

12

12480845

ENSBTAG00000014752

AKAP11

1I1

I1

I1

0I2

12

14136897

ENSBTAG00000016691

LACC1

I3

I3

I1

00

I7

12

53049081

ENSBTAG00000032821

SCEL

I3

I3

I1

00

I7

12

90761209

ENSBTAG00000019961

ATP4B

I3

I3

I1

00

I7

13

3815664

ENSBTAG00000034991

SLX4IP

00

I1

00

I1

13

11787934

ENSBTAG00000008650

CAMK

1D

I3

0I1

00

I4

13

11787938

ENSBTAG00000008650

CAMK

1D

I3

0I1

00

I4

13

52471684

ENSBTAG00000013753

DDRGK1

02

I1

00

113

53934930

ENSBTAG00000039520

SIRPB1

0I3

I1

00

I4

13

53934947

ENSBTAG00000039520

SIRPB1

00

I1

00

I1

13

53939853

ENSBTAG00000039520

SIRPB1

0I3

I1

00

I4

13

54540425

ENSBTAG00000047150

RTEL

I3

0I1

00

I4

13

55047740

ENSBTAG00000031718

OGFR

01

I1

00

013

59988205

ENSBTAG00000027390

RTFDC1

01

I1

00

013

63045750

ENSBTAG00000009144

BPIFA2A

0I3

I1

00

I4

145

Mutation

Score

Chr

Pos

Gene

Genenam

eHaplotypes

Conservation

Hum

andefect

Mousemodel

Function

TotalScore

13

63169610

ENSBTAG00000019752

BPIFA2B

0I1

I1

00

I2

13

65396708

ENSBTAG00000006021

CEP250

0I3

I1

00

I4

14

1977494

ENSBTAG00000026350

SPATC1

I3

I3

I1

00

I7

14

3640695

ENSBTAG00000015985

GPR20

00

I1

00

I1

14

35568945

ENSBTAG00000037399

Uncharacterized

30

I1

00

2

14

46885750

ENSBTAG00000047661

FABP12

0I1

I1

00

I2

14

57184022

ENSBTAG00000026247

PKHD1L1

I3

3I1

00

I1

14

68635363

ENSBTAG00000017165

MATN2

01

I1

00

0

15

182679

ENSBTAG00000046228

Olfactoryreceptor

00

I1

0o

I1

15

183317

ENSBTAG00000046228

Olfactoryreceptor

00

I1

00

I1

15

183404

ENSBTAG00000046228

Olfactoryreceptor

00

I1

00

I1

15

248692

ENSBTAG00000045558

LOC100299084

00

I1

0o

I1

15

30189209

ENSBTAG00000005357

VPS11

02

I1

00

1

15

35048788

ENSBTAG00000005340

SERGEF

0I3

I1

00

I4

15

46301941

ENSBTAG00000002289

NLRP14

I3

I3

I1

00

I7

15

75033140

ENSBTAG00000012053

ACCSL

1I3

I1

00

I3

15

75037645

ENSBTAG00000012053

ACCSL

I3

I3

I1

00

I7

15

80244489

ENSBTAG00000046527

LOC100139830

I3

0I1

00

I4

15

80255710

ENSBTAG00000046285

OR8I2

10

I1

00

0

15

80288759

ENSBTAG00000001291

OR8H3

1I1

I1

00

I1

15

80389745

ENSBTAG00000045709

LOC514057

13

I1

00

3

15

80485593

ENSBTAG00000035985

LOC101909743OR8K1

1I1

I1

00

I1

15

80555931

ENSBTAG00000022858

LOC783561

I3

0I1

00

I4

15

80949689

ENSBTAG00000039110

OR8U1

1I3

I1

00

I3

146

Mutation

Score

Chr

Pos

Gene

Genenam

eHaplotypes

Conservation

Hum

andefect

Mousemodel

Function

TotalScore

15

82754556

ENSBTAG00000021375

OR10Q1

01

I1

00

015

82925332

ENSBTAG00000021233

OR5B17

I3

0I1

00

I4

16

4562017

ENSBTAG00000019799

FCAM

RI3

I3

I1

00

I7

16

38319636

ENSBTAG00000014794

SCYL3

I3

0I1

00

I4

16

62574355

ENSBTAG00000035226

TOR1AIP1

00

I1

00

I1

17

53095395

ENSBTAG00000020414

DHX37

1I1

I1

00

I1

17

53112655

ENSBTAG00000020414

DHX37

0I3

I1

00

I4

17

66091531

ENSBTAG00000010847

FOXN4

0I1

I1

00

I2

17

72377532

ENSBTAG00000008996

SFI1

0I3

I1

00

I4

18

25623550

ENSBTAG00000005170

GPR114

0I3

I1

01

I3

18

34956513

ENSBTAG00000001286

ELMO

33

3I1

11

718

46372041

ENSBTAG00000015912

DMKN

0

1I1

00

018

52154806

ENSBTAG00000001262

IRGQ

I3

1I1

00

I3

18

54081277

ENSBTAG00000009337

PNMA

L1

0I3

I1

00

I4

18

54130955

ENSBTAG00000048021

PNMA

L2

I3

1I1

00

I3

18

57819768

ENSBTAG00000003231

LIM2

1

I3

10

0I1

19

21444103

ENSBTAG00000011011

SSH2

I3

I3

I1

00

I7

19

31072152

ENSBTAG00000022509

DNAH

9I3

0I1

00

I4

19

31113972

ENSBTAG00000022509

DNAH

90

0I1

00

I1

19

32790843

ENSBTAG00000015294

COX10

12

10

04

19

42251785

ENSBTAG00000013685

KRT31

I3

0I1

00

I4

19

51395194

ENSBTAG00000015980

FASN

1I1

10

01

19

56191175

ENSBTAG00000007910

EXOC7

3I3

I1

00

I1

19

57266251

ENSBTAG00000011387

NAT9

12

I1

00

2

147

Mutation

Score

Chr

Pos

Gene

Genenam

eHaplotypes

Conservation

Hum

andefect

Mousemodel

Function

TotalScore

20

7639801

ENSBTAG00000005633

RGNEF

1I1

I1

10

0

20

23410431

ENSBTAG00000008871

DDX4

1I3

I1

00

I3

20

58884236

ENSBTAG00000005514

TRIO

I3

0I1

00

I4

20

64171739

ENSBTAG00000005021

SEMA5A

01

10

02

21

6239119

ENSBTAG00000016495

LINS

02

10

03

21

20345878

ENSBTAG00000012326

LOC101906218

00

I1

00

I1

21

20346289

ENSBTAG00000012326

LOC101906218

I3

0I1

00

I4

21

26837733

ENSBTAG00000047587

Uncharacterized

10

00

01

21

31003599

ENSBTAG00000047166

SH2D7

0I3

I1

00

I4

21

55274195

ENSBTAG00000039117

FAM179B

0I3

I1

00

I4

21

55826463

ENSBTAG00000014001

MAP1A

0I3

I1

00

I4

21

58173384

ENSBTAG00000009836

CHGA

I3

I1

I1

00

I5

21

59120719

ENSBTAG00000046590

FAM181A

01

I1

00

0

21

60482484

ENSBTAG00000017096

ERICH1

0I3

I1

00

I4

21

60482510

ENSBTAG00000017096

ERICH1

01

I1

00

0

21

62843142

ENSBTAG00000012833

ATG2B

0I3

I1

00

I4

22

52415959

ENSBTAG00000015839

MAP4

00

I1

00

I1

22

52421734

ENSBTAG00000015839

MAP4

1I1

I1

00

I1

22

52422046

ENSBTAG00000015839

MAP4

I3

I3

I1

00

I7

22

52426104

ENSBTAG00000047174

Uncharacterized

I3

00

00

I3

22

52426668

ENSBTAG00000047174

Uncharacterized

10

00

01

22

56951657

ENSBTAG00000021645

MBD4

1I1

I1

00

I1

22

56974847

ENSBTAG00000021643

EFCAB12

0I1

I1

00

I2

22

57797674

ENSBTAG00000031115

Uncharacterized

01

00

01

148

Mutation

Score

Chr

Pos

Gene

Genenam

eHaplotypes

Conservation

Hum

andefect

Mousemodel

Function

TotalScore

23

45885052

ENSBTAG00000005248

OFCC1

00

I1

00

I1

24

21324604

ENSBTAG00000000998

SLC39A6

02

I1

00

1

24

21453364

ENSBTAG00000011622

C18orf21

I3

I3

I1

00

I7

24

46764909

ENSBTAG00000015210

LOXHD1

0I3

10

0I2

24

50327195

ENSBTAG00000002660

CCDC11

I3

11

01

0

24

58158178

ENSBTAG00000014284

ALPK2

I3

I1

I1

00

I5

24

58158620

ENSBTAG00000014284

ALPK2

0I1

I1

00

I2

24

58158784

ENSBTAG00000014284

ALPK2

I3

I3

I1

00

I7

24

58204003

ENSBTAG00000014284

ALPK2

I3

I3

I1

00

I7

24

58204479

ENSBTAG00000014284

ALPK2

I3

2I1

00

I2

25

37433180

ENSBTAG00000040568

Uncharacterized

00

I1

00

I1

26

18116487

ENSBTAG00000046239

C10orf12

0I1

I1

00

I2

26

25715996

ENSBTAG00000010522

Uncharacterized

10

00

01

26

25764570

ENSBTAG00000046727

Olfactoryreceptor

00

I1

00

I1

27

5031078

ENSBTAG00000007340

LOC101906349

0I1

I1

00

I2

27

32832557

ENSBTAG00000011684

RAB11FIP1

10

I1

00

0

28

284228

ENSBTAG00000000079

CCSAP

0I1

I1

00

I2

28

1690403

ENSBTAG00000048153

Uncharacterized

10

00

01

28

35718807

ENSBTAG00000047317

CL43

1I1

I1

00

I1

28

44081765

ENSBTAG00000032527

ERCC6

I3

01

00

I2

29

1985881

ENSBTAG00000004081

FAT3

30

I1

00

2

29

7530523

ENSBTAG00000046360

Uncharacterized

01

00

01

29

7530663

ENSBTAG00000046360

Uncharacterized

1I1

00

00

29

26397931

ENSBTAG00000013568

UEVLD

I3

I3

I1

00

I7

149

Interestingly13ofthe22genesassociatedtohumansyndromescarryvariants

appearing in the short list of the 35 positives and only 6 in that of 135 with

negative values This in spite of the low weight (from 1 to 1) given to this

parameterEightofthe35positivevariantsareinproteinsstilluncharacterized

Sixvariantshadascoreof4orhigherThehighestscore(totalscore=7)wasfor

ELMO3 (Engulfment and cell motility gene 3) This gene is involved in cell

motility and migration ELMO3 is also a homolog of C( elegans Ced12 that is

required for many developmental processes included gonadal morphogenesis

Mutant for this gene in C( elegans suffer from abnormal development usually

with lethal consequences or sublethal developmental defects including small

embryosdelayeddevelopmentandreducedfertility(Gumiennyetal2001)In

human it is expressed in testis prostate and ovary

(httpwwwgenecardsorgcgibincarddispplgene=ELMO3)

ThegeneDNAJC13(DnaJ(Hsp40)homologsubfamilyCmember13)hadatotal

score of 5 This gene is involved in the onset of Parkinson disease DNAJC13

regulates the dynamics of clathrin coats on early endosomes Cellular analysis

shows that the mutation Arg855Ser identified by VilarintildeoGuumlell et al (2014)

confers a gainoffunction that impairs endosomal transportApossible roleof

DNAJC13geneinTourettesyndromechronicticdisorderisunderinvestigation

(Sundarametal2011)

A second genehad a scoreof 5THBS2 (Thrombospondin2) It belongs to the

thrombospondin family and is a powerful inhibitor of tumour growth and

angiogenesis It is highly expressed in testis in fetal Leydig cell Hirose et al

(2008) found a significant association between an intronic SNP in the THBS2

gene and lumbar disc herniation Thrombospondin2 is also a matricellular

proteinfoundinhumanserumDeletionofTSP2causesagedependentdilated

cardiomyopathy (Hanatani et al 2014) Mutant mice for this gene display a

variety of mutant phenotype Kyriakides et al 1998 suggest that THBS2

modulates the cell surface properties of mesenchymal cells affecting cell

functionssuchasadhesionandmigration

150

Three genes had a score of 4 all three are associated to human syndromes

Mutation in HERC2 (HECT and RLD domain containing E3 ubiquitin protein

ligase2)isinvolvedinautosomalrecessivementalretardation38(Puffenberger

etal2012)Jietal(2000)identifiedinmutantmouseneuromuscularsecretory

vesicledefectsandspermacrosomedefectsMoreovergeneticvariationsinthis

gene are associated with skinhaireye pigmentation variability Mutations in

DOLK gene (dolichol kinase) are associated with dolichol kinase deficiency

(Lefeberetal2011)COX10(cytochromecoxidaseassemblyhomolog10)gene

is associated to mitochondrial complex IV deficiency causing Leigh syndrome

(Antonickaetal2003)

Conclusions)

Exomesequencingthereforeprovidedvaluableinformationoncodingsequence

variants and on their potential effect on protein function As a group variants

classifiedasdeleteriouswererarerthanvariantsclassifiedasneutralSincethis

isanexpectedeffectofpurifyingselectionthissetofvariantwaslikelyenriched

forvariantshavingaphenotypiceffect

Theinclusionintheanalysesofplusandminusvariantanimalsforfertilitytraits

likelypermittedtosequencethemostimportantvariantsinfluencingthesetraits

butthelownumberofanimalssequencedcombinedwiththecomplexityofthe

traitpermittedtofindonlyasuggestiveassociationbetweenSNPsandtraitsItrsquos

likely that amuch larger number of individuals is to be sequenced if genome

wideassociationisthestrategyoneswanttopursue

We explored an alternative strategy to confirm suggestivedeleterious variants

based on indirect evidence of the variant effect by assessing i) evolutionary

constraintsassociatedtotheaminoacidresiduecodedbythevariantii)variant

behaviour in a large population In the absence of direct genotyping of the

candidatevariantsweusedhaplotypeinformationassurrogateiii)functionand

effectsinotherspecies

151

We attempted to quantify these criteria through a score subjective and only

indicativebutpermitted tohighlighted35genes so farneglected in cattlebut

deservingfurtherattentionandpotentiallyhavingdeleteriouseffectsinHolstein

andothercattlebreedsThesemaybepartoftherecessivedeleterioussetthat

constitutethegeneticloadofItalianHolstein

References)AntonickaHLearySCGuercinGHAgarJNHorvathRKennawayNG

HardingCOJakschMShoubridgeEA2003MutationsinCOX10

resultinadefectinmitochondrialhemeAbiosynthesisandaccountfor

multipleearlyonsetclinicalphenotypesassociatedwithisolatedCOX

deficiencyHumMolGenet122693ndash2702doi101093hmgddg284

AshwellMSHeyenDWSonstegardTSVanTassellCPDaYVanRaden

PMRonMWellerJILewinHA2004DetectionofQuantitativeTrait

LociAffectingMilkProductionHealthandReproductiveTraitsin

HolsteinCattleJournalofDairyScience87468ndash475

doi103168jdsS00220302(04)731860

AstonKICarrellDT2009GenomeWideStudyofSingleNucleotide

PolymorphismsAssociatedWithAzoospermiaandSevere

OligozoospermiaJournalofAndrology30711ndash725

doi102164jandrol109007971

AulchenkoYSRipkeSIsaacsADuijnCMvan2007GenABELanRlibrary

forgenomewideassociationanalysisBioinformatics231294ndash1296

doi101093bioinformaticsbtm108

BamshadMJNgSBBighamAWTaborHKEmondMJNickersonDA

ShendureJ2011ExomesequencingasatoolforMendeliandisease

genediscoveryNatRevGenet12745ndash755doi101038nrg3031

BieseckerLGShiannaKVMullikinJC2011Exomesequencingtheexpert

viewGenomeBiology12128doi101186gb2011129128

152

BiffaniSMarusiMBiscariniFCanavesiF2005Developingagenetic

evaluationforfertilityusingangularityandmilkyieldascorrelatedtraits

InterbullBulletin063

BiffaniSSamoreacuteABCanavesiF2002Inbreedingdepressionforproduction

reproductionandfunctionaltraitsinItalianHolsteincattleInstitut

NationaldelaRechercheAgronomique(INRA)pp0ndash4

BolgerAMLohseMUsadelB2014TrimmomaticAflexibletrimmerfor

IlluminaSequenceDataBioinformaticsbtu170

doi101093bioinformaticsbtu170

BrowningSRBrowningBL2007RapidandAccurateHaplotypePhasingand

MissingDataInferenceforWholeGenomeAssociationStudiesByUseof

LocalizedHaplotypeClusteringAmJHumGenet811084ndash1097

CharlesworthDWillisJH2009ThegeneticsofinbreedingdepressionNat

RevGenet10783ndash796doi101038nrg2664

ChunSFayJC2009Identificationofdeleteriousmutationswithinthree

humangenomesGenomeRes191553ndash1561

doi101101gr092619109

ConsortiumTEP2012AnintegratedencyclopediaofDNAelementsinthe

humangenomeNature48957ndash74doi101038nature11247

DePristoMABanksEPoplinRGarimellaKVMaguireJRHartlC

PhilippakisAAdelAngelGRivasMAHannaMMcKennaA

FennellTJKernytskyAMSivachenkoAYCibulskisKGabrielSB

AltshulerDDalyMJ2011Aframeworkforvariationdiscoveryand

genotypingusingnextgenerationDNAsequencingdataNatGenet43

491ndash498doi101038ng806

GoymerP2007SynonymousmutationsbreaktheirsilenceNatRevGenet8

92ndash92doi101038nrg2056

GumiennyTLBrugneraEToselloTrampontACKinchenJMHaneyLB

NishiwakiKWalkSFNemergutMEMacaraIGFrancisRSchedl

TQinYVanAelstLHengartnerMORavichandranKS2001CED

12ELMOaNovelMemberoftheCrkIIDock180RacPathwayIs

RequiredforPhagocytosisandCellMigrationCell10727ndash41

doi101016S00928674(01)005207

153

HanataniSIzumiyaYTakashioSKimuraYArakiSRokutandaTTsujita

KYamamotoETanakaTYamamuroMKojimaSTayamaS

KaikitaKHokimotoSOgawaH2014Circulatingthrombospondin2

reflectsdiseaseseverityandpredictsoutcomeofheartfailurewith

reducedejectionfractionCircJ78903ndash910

HiroseYChibaKKarasugiTNakajimaMKawaguchiYMikamiY

FuruichiTMioFMiyakeAMiyamotoTOzakiKTakahashiA

MizutaHKuboTKimuraTTanakaTToyamaYIkegawaS2008

AfunctionalpolymorphisminTHBS2thataffectsalternativesplicingand

MMPbindingisassociatedwithlumbardischerniationAmJHum

Genet821122ndash1129doi101016jajhg200803013

httpsgenomeucsceduindexhtml[WWWDocument]nd

JamrozikJFatehiJKistemakerGJSchaefferLR2005EstimatesofGenetic

ParametersforCanadianHolsteinFemaleReproductionTraitsJournalof

DairyScience882199ndash2208doi103168jdsS00220302(05)728952

JanisEWiggintonDJC2005AnoteonexacttestsofHardyWeinberg

equilibriumAmericanjournalofhumangenetics76887ndash93

doi101086429864

JiYRebertNAJoslinJMHigginsMJSchultzRANichollsRD2000

StructureoftheHighlyConservedHERC2GeneandofMultiplePartially

DuplicatedParalogsinHumanGenomeRes10319ndash329

KentWJSugnetCWFureyTSRoskinKMPringleTHZahlerAM

HausslerandD2002TheHumanGenomeBrowseratUCSCGenome

Res12996ndash1006doi101101gr229102

KumarPHenikoffSNgPC2009Predictingtheeffectsofcodingnon

synonymousvariantsonproteinfunctionusingtheSIFTalgorithmNat

Protoc41073ndash1081doi101038nprot200986

KyriakidesTRZhuYHSmithLTBainSDYangZLinMTDanielson

KGIozzoRVLaMarcaMMcKinneyCEGinnsEIBornsteinP

1998Micethatlackthrombospondin2displayconnectivetissue

abnormalitiesthatareassociatedwithdisorderedcollagenfibrillogenesis

anincreasedvasculardensityandableedingdiathesisJCellBiol140

419ndash430

154

LaneEACroweMABeltmanMEMoreSJ2013Theinfluenceofcowand

managementfactorsonreproductiveperformanceofIrishseasonal

calvingdairycowsAnimReprodSci14134ndash41

doi101016janireprosci201306019

LefeberDJdeBrouwerAPMMoravaERiemersmaMSchuurs

HoeijmakersJHMAbsmannerBVerrijpKvandenAkkerWMR

HuijbenKSteenbergenGvanReeuwijkJJozwiakAZuckerN

LorberALammensMKnopfCvanBokhovenHGruumlnewaldSLehle

LKapustaLMandelHWeversRA2011AutosomalRecessive

DilatedCardiomyopathyduetoDOLKMutationsResultsfromAbnormal

DystroglycanOMannosylationPLoSGenet7e1002427

doi101371journalpgen1002427

LiHDurbinR2009FastandaccurateshortreadalignmentwithBurrows

WheelertransformBioinformatics251754ndash1760

doi101093bioinformaticsbtp324

LiHHandsakerBWysokerAFennellTRuanJHomerNMarthG

AbecasisGDurbinR1000GenomeProjectDataProcessingSubgroup

2009TheSequenceAlignmentMapformatandSAMtoolsBioinformatics

252078ndash2079doi101093bioinformaticsbtp352

LiMXKwanJSHBaoSYYangWHoSLSongYQShamPC2013

PredictingMendelianDiseaseCausingNonSynonymousSingle

NucleotideVariantsinExomeSequencingStudiesPLoSGenet9

e1003143doi101371journalpgen1003143

MacArthurDGBalasubramanianSFrankishAHuangNMorrisJWalter

KJostinsLHabeggerLPickrellJKMontgomerySBAlbersCA

ZhangZConradDFLunterGZhengHAyubQDePristoMA

BanksEHuMHandsakerRERosenfeldJFromerMJinMMuXJ

KhuranaEYeKKayMSaundersGISunerMMHuntTBarnes

IHAAmidCCarvalhoSilvaDRBignellAHSnowCYngvadottirB

BumpsteadSCooperDNXueYRomeroIGWangJLiYGibbs

RAMcCarrollSADermitzakisETPritchardJKBarrettJCHarrow

JHurlesMEGersteinMBTylerSmithC2012Asystematicsurvey

155

oflossoffunctionvariantsinhumanproteincodinggenesScience335

823ndash828doi101126science1215040

MajewskiJSchwartzentruberJLalondeEMontpetitAJabadoN2011

WhatcanexomesequencingdoforyouJMedGenet48580ndash589

doi101136jmedgenet2011100223

McClureMCBickhartDNullDVanRadenPXuLWiggansGLiuG

SchroederSGlasscockJArmstrongJColeJBVanTassellCP

SonstegardTS2014BovineExomeSequenceAnalysisandTargeted

SNPGenotypingofRecessiveFertilityDefectsBH1HH2andHH3Reveal

aPutativeCausativeMutationinSMC2forHH3PLoSONE9e92769

doi101371journalpone0092769

McClureMKimEBickhartDNullDCooperTColeJWiggansGAjmone

MarsanPColliLSantusELiuGESchroederSMatukumalliLVan

TassellCSonstegardT2013FineMappingforWeaverSyndromein

BrownSwissCattleandtheIdentificationof41ConcordantMutations

acrossNRCAMPNPLA8andCTTNBP2PLoSONE8e59251

doi101371journalpone0059251

McKennaAHannaMBanksESivachenkoACibulskisKKernytskyA

GarimellaKAltshulerDGabrielSDalyMDePristoMA2010The

GenomeAnalysisToolkitAMapReduceframeworkforanalyzingnext

generationDNAsequencingdataGenomeRes201297ndash1303

doi101101gr107524110

MiHDongQMuruganujanAGaudetPLewisSThomasPD2010

PANTHERversion7improvedphylogenetictreesorthologsand

collaborationwiththeGeneOntologyConsortiumNuclAcidsRes38

D204ndashD210doi101093nargkp1019

MrodeRKearneyJFBiffaniSCoffeyMCanavesiF2009Short

communicationGeneticrelationshipsbetweentheHolsteincow

populationsofthreeEuropeandairycountriesJournalofDairyScience

925760ndash5764doi103168jds20081931

PelosoGMAuerPLBisJCVoormanAMorrisonACStitzielNOBrody

JAKhetarpalSACrosbyJRFornageMIsaacsAJakobsdottirJ

FeitosaMFDaviesGHuffmanJEManichaikulADavisBLohman

156

KJoonAYSmithAVGroveMLZanoniPRedonVDemissieS

LawsonKPetersUCarlsonCJacksonRDRyckmanKKMackey

RHRobinsonJGSiscovickDSSchreinerPJMychaleckyjJC

PankowJSHofmanAUitterlindenAGHarrisTBTaylorKD

StaffordJMReynoldsLMMarioniREDehghanAFrancoOHPatel

APLuYHindyGGottesmanOBottingerEPMelanderOOrho

MelanderMLoosRJFDugaSMerliniPAFarrallMGoelA

AsseltaRGirelliDMartinelliNShahSHKrausWELiMRader

DJReillyMPMcPhersonRWatkinsHArdissinoDZhangQWang

JTsaiMYTaylorHACorreaAGriswoldMELangeLAStarrJM

RudanIEiriksdottirGLaunerLJOrdovasJMLevyDChenYDI

ReinerAPHaywardCPolasekODearyIJBoreckiIBLiuY

GudnasonVWilsonJGvanDuijnCMKooperbergCRichSSPsaty

BMRotterJIOrsquoDonnellCJRiceKBoerwinkleEKathiresanS

CupplesLA2014AssociationofLowFrequencyandRareCoding

SequenceVariantswithBloodLipidsandCoronaryHeartDiseasein

56000WhitesandBlacksTheAmericanJournalofHumanGenetics94

223ndash232doi101016jajhg201401009

PryceJEVeerkampRFThompsonRHillWGSimmG1997Genetic

aspectsofcommonhealthdisordersandmeasuresoffertilityinHolstein

FriesiandairycattleAnimalScience65353ndash360

doi101017S1357729800008559

PryceJEWoolastonRBerryDPWallEWintersMButlerRShafferM

2014WorldTrendsinDairyCowFertilityin10ThWorldCongressof

GeneticsAppliedtoLivestockProduction

PuffenbergerEGJinksRNWangHXinBFiorentiniCShermanEA

DegrazioDShawCSougnezCCibulskisKGabrielSKelleyRI

MortonDHStraussKA2012Ahomozygousmissensemutationin

HERC2associatedwithglobaldevelopmentaldelayandautismspectrum

disorderHumMutat331639ndash1646doi101002humu22237

PurcellSNealeBToddBrownKThomasLFerreiraMARBenderD

MallerJSklarPdeBakkerPIWDalyMJShamPC2007PLINKA

157

ToolSetforWholeGenomeAssociationandPopulationBasedLinkage

AnalysesAmJHumGenet81559ndash575

SahanaGNielsenUSAamandGPLundMSGuldbrandtsenB2013Novel

HarmfulRecessiveHaplotypesIdentifiedforFertilityTraitsinNordic

HolsteinCattlePLoSONE8e82909doi101371journalpone0082909

SchuumltzEScharfensteinMBrenigB2008Implicationofcomplexvertebral

malformationandbovineleukocyteadhesiondeficiencyDNAbased

testingondiseasefrequencyintheHolsteinpopulationJDairySci91

4854ndash4859doi103168jds20081154

SimmonsMJCrowJF1977MutationsAffectingFitnessinDrosophila

PopulationsAnnualReviewofGenetics1149ndash78

doi101146annurevge11120177000405

SoggiuAPirasCHusseinHACanioMDGaviraghiAGalliAUrbaniA

BonizziLRoncadaP2013UnravellingthebullfertilityproteomeMol

BioSyst91188ndash1195doi101039C3MB25494A

StitzielNOKiezunASunyaevS2011Computationalandstatistical

approachestoanalyzingvariantsidentifiedbyexomesequencing

GenomeBiology12227doi101186gb2011129227

SundaramSKHuqAMSunZYuWBennettLWilsonBJBehenME

ChuganiHT2011ExomesequencingofapedigreewithTourette

syndromeorchronicticdisorderAnnNeurol69901ndash904

doi101002ana22398

VanRadenPMOlsonKMNullDJHutchisonJL2011Harmfulrecessive

effectsonfertilitydetectedbyabsenceofhomozygoushaplotypesJournal

ofDairyScience946153ndash6161doi103168jds20114624

VilarintildeoGuumlellCRajputAMilnerwoodAJShahBSzuTuCTrinhJYuI

EncarnacionMMunsieLNTapiaLGustavssonEKChouP

TatarnikovIEvansDMPishottaFTVoltaMBeccanoKellyD

ThompsonCLinMKShermanHEHanHJGuentherBL

WassermanWWBernardVRossCJAppelCresswellSStoesslAJ

RobinsonCADicksonDWRossOAWszolekZKAaslyJOWuR

MHentatiFGibsonRAMcPhersonPSGirardMRajputMRajput

158

AHFarrerMJ2014DNAJC13mutationsinParkinsondiseaseHumMolGenet231794ndash1801doi101093hmgddt570

WangKLiMHakonarsonH2010ANNOVARfunctionalannotationofgeneticvariantsfromhighthroughputsequencingdataNuclAcidsRes38e164ndashe164doi101093nargkq603

WathesDCPollottGEJohnsonKFRichardsonHCookeJS2014HeiferfertilityandcarryoverconsequencesforlifetimeproductionindairyandbeefcattleAnimal8Suppl191ndash104doi101017S1751731114000755

YangYMuznyDMReidJGBainbridgeMNWillisAWardPABraxtonABeutenJXiaFNiuZHardisonMPersonRBekheirniaMRLeducMSKirbyAPhamPScullJWangMDingYPlonSELupskiJRBeaudetALGibbsRAEngCM2013ClinicalWhole

ExomeSequencingfortheDiagnosisofMendelianDisordersNewEnglandJournalofMedicine3691502ndash1511doi101056NEJMoa1306555

ZiminAVDelcherALFloreaLKelleyDRSchatzMCPuiuDHanrahanFPerteaGTassellCPVSonstegardTSMarccedilaisGRobertsMSubramanianPYorkeJASalzbergSL2009Awholegenome

assemblyofthedomesticcowBostaurusGenomeBiology10R42doi101186gb2009104r42

159

Supplementary)files)

YoucanfindChapter6supplementaryfilesat

- DocTa(DoctoralThesisArchive)(httptesionlineunicattit)

- GoogleDrive(httptinyurlcomPhDMMChapter06)orscantheQRcode

)

Supplementary)Figure)S1)Coverage)analyses)

A)Averagecoverageperanimals

B)Boxplotofchromosomescoverageperanimal

C)BoxplotofanimalcoverageperchromosomeBluelinerepresenttheexpected

50Xcoverage

Supplementary)Figure)S2)Distribution)of)variant)sequencing)depth)

)

Supplementary) Figure) S3) Correlation) between) sequencing) depth) and)

number) of) discordant) genotypes) In) red) values) for) two) outlier) animals)

having)concordance)lt846))

)

Supplementary)Figure)S4)Multidimensional)scaling)of)Italian)Holstein)bulls)

sequenced))

)

Supplementary)Figure)S5)Multidimensional)scaling)of)Italian)Holstein)bulls)

analysed)with)the)800K)SNPchip))

160

Supplementary) Figure) S6) MAF) distribution) on) neutral) (blue)) and)

deleterious)(red))variants))

)

Supplementary) Table) S1) EBV) distribution) on) Italian) Holstein) population)

and)SNPchip)dataset)

Sheet1EBVthresholdvaluetodefineplusandminustailforPROT(kgprotein)

SCC(somaticcellcount)FERT(fertility)andLONG(functional longevity) All

ItalianHolsteinFAbullswereincludeinthisanalysis

Sheet2Numberofdatasetanimalsineachtailforeachtrait

Supplementary)Table)S2)Exome)probes)statistics)number)and)bp)covered)

per)chromosomes)

Supplementary) Table) S3) Concordance) between) animal) genotypes) in)

sequencing)and)SNPchip)data)

Error1 HD data homozygotes ndash Exome data heterozygotes Error2 HD data

heterozygotesndashExomedatahomozygotes

Supplementary) Table) S4) List) of) suggestive) genes) deriving) from) the)

analysis)of)exomes)of)animals) in)the)tails)of)male)and)female)fertility)EBV)

distributions))

Header(code

bull Bfreq_totFrequencyofBalleleinentiredataset

bull Bfreq_MHFrequencyofBalleleinhighmalefertilitydataset

bull Bfreq_MLFrequencyofBalleleinhighfemalefertilitydataset

bull Bfreq_FHFrequencyofBalleleinlowmalefertilitydataset

bull Bfreq_FLFrequencyofBalleleinlowmalefertilitydataset

bull CallRate_totSNPcallratendashentiredataset

bull CallRate_MHSNPcallratendashhighmalefertilitydataset

bull CallRate_MLSNPcallratendashlowmalefertilitydataset

bull CallRate_FHSNPcallratendashhighfemalefertilitydataset

bull CallRate_FLSNPcallratendashhighmalefertilitydataset

bull HWeqHardyWeinbergequilibriumpvalue

bull Geno_HomoMinorNumberofanimalswithhomozygotegenotypeforminorallele

bull Geno_HetNumberofanimalswithheterozygotegenotype

161

bull Geno_HomoMajorNumberofanimalswithhomozygotegenotypeformajorallele

bull LOW_MinorAlleleminoralleleforlowfertility(maleandfemale)dataset

bull LOW_MAFminorallelefrequencyfromlowfertility(maleandfemale)dataset

bull LOW_CRSNPcallratefromlowfertility(maleandfemale)dataset

bull LOW_HomoMinorNumberofanimalsfromlowfertility(maleandfemale)datasetwith

homozygotegenotypeforminorallele

bull LOW_Het Number of animals from low fertility (male and female) dataset with

heterozygotegenotype

bull LOW_HomoMajorNumberofanimalsfromlowfertility(maleandfemale)datasetwith

homozygotegenotypeformajorallele

bull LOW_HWeqPvalue HardyWeinberg equilibrium pvalue from low fertility (male and

female)dataset

bull HIGH_MinorAlleleminorallelefromhighfertility(maleandfemale)dataset

bull HIGH_MAFminorallelefrequencyfromhighfertility(maleandfemale)dataset

bull HIGH_CRSNPcallratefromhighfertility(maleandfemale)dataset

bull HIGH_HomoMinorNumberofanimalsfromhighfertility(maleandfemale)datasetwith

homozygotegenotypeforminorallele

bull HIGH_Het Number of animals from high fertility (male and female) dataset with

heterozygotegenotype

bull HIGH_HomoMajorNumberofanimalsfromhighfertility(maleandfemale)datasetwith

homozygotegenotypeformajorallele

bull HIGH_HWeqPvalueHardyWeinberg equilibriumpvalue fromhigh fertility (male and

female)dataset

bull Male_TREND inheritance models implemented in PLINK trend test on male fertility

dataset

bull Male_GENOTYPES inheritancemodels implemented inPLINKgenotypic testonmale

fertilitydataset

bull Male_ RECESSIVE inheritance models implemented in PLINK recessive test on male

fertilitydataset

bull Female_TRENDinheritancemodelsimplementedinPLINKtrendtestonfemalefertility

dataset

bull Female_GENOTYPES inheritance models implemented in PLINK genotypic test on

femalefertilitydataset

bull Female_RECESSIVEinheritancemodelsimplementedinPLINKrecessivetestonfemale

fertilitydataset

bull MaleampFemale_TREND inheritance models implemented in PLINK trend test on male

andfemalefertilitydataset

bull MaleampFemale_GENOTYPES inheritancemodels implemented in PLINK genotypic test

onmaleandfemalefertilitydataset

162

bull MaleampFemale_RECESSIVEinheritancemodelsimplementedinPLINKrecessiveteston

maleandfemalefertilitydataset

bull EMP1_MaleEmpiricalpvalueonmalefertilitydataset

bull EMP1_FemaleEmpiricalpvalueonfemalefertilitydataset

bull EMP1_MaleampFemaleEmpiricalpvalueonmaleandfemalefertilitydataset

Supplementary)Table) S5) Regions) candidate) to) carry) deleterious) variants)

identified)by)OHW)approach))

Supplementary)Table) S6) Regions) candidate) to) carry) deleterious) variants)

identify)by)LOH)approach))

Ingreyregionsremovedinfurtheranalysis

Supplementary)Table)S7)List)of)candidate)deleterious)variants)mapping)in)

OHW)and)LOH)regions))Header(code

bull Bfreq_totFrequencyofBalleleinentiredataset

bull Bfreq_MHFrequencyofBalleleinhighmalefertilitydataset

bull Bfreq_MLFrequencyofBalleleinhighfemalefertilitydataset

bull Bfreq_FHFrequencyofBalleleinlowmalefertilitydataset

bull Bfreq_FLFrequencyofBalleleinlowmalefertilitydataset

bull CallRate_totSNPcallratendashentiredataset

bull CallRate_MHSNPcallratendashhighmalefertilitydataset

bull CallRate_MLSNPcallratendashlowmalefertilitydataset

bull CallRate_FHSNPcallratendashhighfemalefertilitydataset

bull CallRate_FLSNPcallratendashhighmalefertilitydataset

bull HWeqHardyWeinbergequilibriumpvalue

bull Geno_HomoMinorNumberofanimalswithhomozygotegenotypeforminorallele

bull Geno_HetNumberofanimalswithheterozygotegenotype

bull Geno_HomoMajorNumberofanimalswithhomozygotegenotypeformajorallele

bull HW_insideDeleteriousmutationsinsideldquoOutofHardyWeinbergrdquoregion

bull HW_outsideDeleteriousmutationsoutsideldquoOutofHardyWeinbergrdquoregion

bull DH_insideDeleteriousmutationsinsideldquoLackofHomozygosityrdquoregion

bull DH_outsideDeleteriousmutationsoutsideldquoLackofHomozygosityrdquoregion

163

bull GeneClass Classifications of gene 1 deleterious mutations inside OHW and LOHregion2deleteriousmutationsinsideOHWorLOHregion

bull 2deleteriousmutationsoutsideOHWandorLOHregionbull LOW_MinorAlleleminoralleleforlowfertility(maleandfemale)datasetbull LOW_MAFminorallelefrequencyfromlowfertility(maleandfemale)datasetbull LOW_CRSNPcallratefromlowfertility(maleandfemale)datasetbull LOW_HomoMinorNumberofanimalsfromlowfertility(maleandfemale)datasetwith

homozygotegenotypeforminorallelebull LOW_Het Number of animals from low fertility (male and female) dataset with

heterozygotegenotypebull LOW_HomoMajorNumberofanimalsfromlowfertility(maleandfemale)datasetwith

homozygotegenotypeformajorallelebull LOW_HWeqPvalue HardyWeinberg equilibrium pvalue from low fertility (male and

female)datasetbull HIGH_MinorAlleleminorallelefromhighfertility(maleandfemale)datasetbull HIGH_MAFminorallelefrequencyfromhighfertility(maleandfemale)datasetbull HIGH_CRSNPcallratefromhighfertility(maleandfemale)datasetbull HIGH_HomoMinorNumberofanimalsfromhighfertility(maleandfemale)datasetwith

homozygotegenotypeforminorallelebull HIGH_Het Number of animals from high fertility (male and female) dataset with

heterozygotegenotypebull HIGH_HomoMajorNumberofanimalsfromhighfertility(maleandfemale)datasetwith

homozygotegenotypeformajorallelebull HIGH_HWeqPvalueHardyWeinberg equilibriumpvalue fromhigh fertility (male and

female)datasetbull Male_TREND inheritance models implemented in PLINK trend test on male fertility

datasetbull Male_GENOTYPES inheritancemodels implemented inPLINKgenotypic testonmale

fertilitydatasetbull Male_ RECESSIVE inheritance models implemented in PLINK recessive test on male

fertilitydatasetbull Female_TRENDinheritancemodelsimplementedinPLINKtrendtestonfemalefertility

datasetbull Female_GENOTYPES inheritance models implemented in PLINK genotypic test on

femalefertilitydatasetbull Female_RECESSIVEinheritancemodelsimplementedinPLINKrecessivetestonfemale

fertilitydatasetbull MaleampFemale_TREND inheritance models implemented in PLINK trend test on male

andfemalefertilitydataset

164

bull MaleampFemale_GENOTYPES inheritancemodels implemented in PLINK genotypic test

onmaleandfemalefertilitydataset

bull MaleampFemale_RECESSIVEinheritancemodelsimplementedinPLINKrecessiveteston

maleandfemalefertilitydataset

bull EMP1_MaleEmpiricalpvalueonmalefertilitydataset

bull EMP1_FemaleEmpiricalpvalueonfemalefertilitydataset

bull EMP1_MaleampFemaleEmpiricalpvalueonmaleandfemalefertilitydataset

Supplementary)Table)S8)Results)of)haplotype)analysis)))Header(code

bull Chromchromosome

bull PosphysicalpositiononUMD31genome

bull Exome_homonumberofhomozygotefordeleteriousmutationsequencedanimals

bull Exome_hetnumberofheterozygotesequencedanimalsfordeleteriousmutation

bull Outcome NotFound It was impossible to associated a haplotype to the mutation

NotOpt Non completely invariant haplotype could be associated to themutationOK

Putativehaplotype associate tomutationwas found In few case alternativehaplotype

wasreported

bull LenlengthofhaplotypeinnumberofSNPs

bull Pop_numnumbersofselectedhaplotypedetectedinHolsteinpopulation

bull Pop_homo numbers of homozygote animals in Holstein population for selected

haplotype

bull Pop_het numbers of heterozygotes animals in Holstein population for selected

haplotype

bull ExpExpectednumberofhomozygoteanimal

bull Alternative if only one heterozygote sequenced animals for deleteriousmutationwas

foundalltwopossiblehaplotypeswheretested

Supplementary)Table)S9)Descriptive)results)for)the)Biological)Process)(BP)))

NumberofgenesbelongingtoBiologicalProcessGOterms)

Supplementary) Table) S10) Descriptive) results) for) the)Molecular) Function)

(MF))

NumberofgenesbelongingtoMolecularFunctionGOterms

165

CHAPTER7

Conclusion

Since the domestication event occurred in the Fertile Crescent around 10000yearsagothebovinespecieswasselectedbyhumankindtofulfilhisneedsThecreationofbreedsoccurredaround200yearsagointensifiedthedifferentiationbetween populations and promoted the development of intensive selectionschemesTraditionalAandnowadaysgenomicAselectionsareproducingpositivegenetictrendsinmanyproductivetraitsdespitethestillpoorknowledgeaboutphenotype biology A deeper understanding of genome organization and genebiology would increase the accuracy of genomic evaluation by incorporatingpriorknowledgeThisthesisexploresthebovinegenomebyhighAthroughputtechnologiesAsuchasNextGenerationSequencingandHighADensityGenotypingAwithestablishedand innovative procedures to investigate the biology of complex traits and toprovidenewdataandmoleculartoolstobovinebreedingThese goals have been achieved by complementary approaches and differentmethodsInChapter2selectionsignatureswereinvestigatedindependentlyintofive Italian breeds using Illumina BovineSNP50 BeadChip Then amultiAbreedapproach was applied clustering breeds into dairy (Italian Holstein ItalianBrownand Italian Simmental) andbeef (Marchigiana Piedmontese and ItalianSimmental) groups Regions under recent positive selection shared by breedswithasameproductiveaptitudewereinvestigatedinfurtherdetailbyretrievinggenes in the region and analysing their function by ontology and pathwayanalysis In the dairy group selection signals appear nearby genes related tomammarygland(metabolismandresistancetomastitis)andfeedingadaptationIn thebeefgroupgenes inselectedregionsare involved inanimalgrowthandmeatquality(textureandjuiciness)Hencecandidategenesidentifiedbelongtonetworks and pathways important in directing a breed towards either beef ordairyproductionIn Chapter 3 a GWAS approach was used in dairy cattle to correlate SNP tocomplex phenotypes having an economic value Three independent ldquoclassicalrdquosingleAmarkerregressionswereruninthreeItaliandairycattle(ItalianHolsteinItalianBrownandItalianSimmental)UsingthetraditionalsingleSNPapproach

168

only few SNPs were above the nominal genomeAwide significance thresholdwhile applying a geneAbased method identified significant candidate genesassociatedwithmilkphysiology andmammary glanddevelopment in all threepopulationsInterestinglygenesfoundindifferentbreedshavesimilarfunctionbut are not the same suggesting that breed history demography selectioncriteriaandstochasticeventsasgeneticdrifthaveanimpactonthedefinitionofthemainmoleculartargetofselectionschemesCurrent intensive selection strategies rapidly spread the genetic gain in thepopulationbutholdtheassociatedriskofdisseminatingdefectivegenescausinggenetic defects (eg mulefoot complex vertebral malformation and others) ordecreasing fertility In Chapter 6 different technologies and approaches werecombined to obtain filter and prioritize genes candidate to be deleteriousExome sequence datawere exploited to identify deleterious variants affectingItalianHolsteincattleinasetof20animalsextremeforfertilitytraitsMorethan5000 candidate deleterious SNP variants were identified To bypass the smallnumberlimitationandthelackofpowerofanystatisticanalysison20animalsastepwise approachwas used to first filter and then prioritize these candidatedeleterious variants Polymorphisms were retained if having asymmetricdistribution in bulls from opposite tails of fertility EBV distributions andormapping in genomic regions outlier for lack of homozygosity in a 1009 bullpopulation genotyped with the Illumina BovineHD BeadChip A total of 191variants in163genespassed theprocessesThesewereassessedby searchingadditional evidences in favour or against their deleterious effect throughdifferentapproachesi)checkingthevariantconservationacross100vertebratespecies by comparative genomics ii) finding haplotypes associated withcandidatemutationsandevaluatingtheirfrequencyinHolsteinsiii)integratinghumanandmouse functionalannotationsA scorewasproposed toweight theresult of these assessments and rank the variants 35 of them had positivescores These occurred in genes involved in basic biological mechanisms asfertilitydevelopmentand immunityandresultedenriched ingenesassociatedtogeneticdefectsreportedinhumanThesemutationsarestrongcandidatestobe recessive deleterious variants worth to be further investigated in a larger

169

populationtobetterunderstandtheirbiologyimpactandpossibleapplicationinbreedingTo increase the reliability of results based on genomic data and to reduce theimpact of subjective choices in data analysis new toolsapproaches weredeveloped(iethemultiAbreedapproachinSelectionSignaturendashChapter2thepostAGWAS geneAbased approach ndash Chapters 3 and 4) and the existent onescriticallyevaluated(imputationmethodsndashChapter5)In Chapter 4 the postAGWAS geneAbased method used in Chapter 3 wasimplemented in the MUGBAS (MUlti species GeneABased Association Suite)software In contrast to existing software MUGBAS is species and annotationfree AmultipleAtesting correctionwas included in the package in addition tocomputing parallelization for faster processing and plot representation forbetterunderstandingThesoftwareusessingleAmarkerGWASdataandagivenannotation to estimate geneAregionAwise association pAvalues As previouslymentioned in Chapter 3 the ldquoclassicalrdquo GWAS approach (single SNP) foundsignals only in ItalianHolsteinwhileMUGBAS geneAbased approach identifiedsignificantsignalsinallbreedsinvestigatedIn Chapter 5 the impact of different bovine reference sequences on genotypeimputation was investigated In this work Illumina BovineSNP50 BeadChipgenotypesof ItalianSimmentalbullswerereAmappedonthethreemostrecentreferencegenomeassembliesWecomparedfourdifferent imputationmethodsfromlow(IlluminaBovineLDv10Beadchip)tohighdensityThetestpopulationwasdividedintofoursubpopulationsonthebasisofrelationshipandgenotypedataavailabilityUpdatingSNPcoordinatesonthe three testedcattlereferencegenome assemblies determined only a slight variation on imputation resultswithinmethodsOntheotherhandlargedifferencesintermsofaccuracywereobservedamongthefourimputationmethodstestedGenetic structure of industrial breeds was evaluated assessing LinkageDisequilibrium (LD) Since LD decay is correlated to intensity of selection weexpected different behaviours in the breeds analysed Indeed as evident inChapter 2 ndash Figure 7 breeds subjected to lower selection intensity (ie

170

MarchigianaPiedmonteseandItalianSimmental)displayalowerpersistenceof

LDatgreaterdistancescomparedtothosesubjectedtohigherselectionpressure

(ieItalianHolsteinandItalianBrown)Chapter3investigatesLDintheDGAT1

region of dairy cattle breeds In Italian Simmental LD is lower compared to

Italian Holstein and Italian Brown In spite of this significant association

betweentheDGAT1regionandmilktraitswasfoundbothinItalianSimmental

andinItalianHolsteinInItalianBrownnoassociationwasfoundbecauseDGAT1

isfixed

Data and results presented in this thesis represent a clear example of the

progress of molecular technologies occurred in last few years spanning from

mediumdensitygenotypingtonextgenerationsequencing

Thebiologylandscapesofsomecomplextraits(iemilkproductionandquality

fertility)were improved finding candidate genes andpathwaysnotpreviously

associated to the phenotype in the bovine species Moreover insights on

deleterious mutations were inferred through a deep and integrated analysis

usingNGSdata

This new knowledge could be used to improve cattle breeding For example

deleteriousrecessivevariantscouldbeincludedinbullevaluationsbybreedersrsquo

associations to reduce the genetic load of deleterious genes in parallel to

improvingproductivity

While technological progress has been impressivewe are still only scratching

thesurfaceofanimalbiologyNewfrontiersarebeingopenedbywholegenome

sequencing of many animals (eg in the 1000 bull genome project) and by

stepping beyond sequencing (a number of epigenomic projects in animals are

following the footprints of the human ENCODE project) The trend is towards

unravelling complexity and data production ability is to be flanked by data

storingandprocessingandaboveallbynewmethodsandtoolsfordataanalysis

171

ACKNOWLEDGMENTS

ldquoNon egrave la destinazione ma il viaggio che contardquo Questi tre anni sono stati un gran bel viaggio di crescita personale e ricchi diesperienze Non egrave stato un viaggio solitarioma condiviso con vecchie e nuoveconoscenzechemisembradoverosomenzionareInnanzitutto vorrei ringraziare il prof Paolo AjmoneMarsan chemi ha dato lapossibilitagrave di fare questa esperienza Grazie per gli insegnamenti e per leopportunitagravechemihaoffertoSperodinonavertraditolesueaspettativeAgli attuali colleghi Lorenzo Stefano Licia Elisa Riccardo ed Elia e a quellipassati Nicola Fatima Raffaele Ezequiel e Barbara va il mio piugrave sincero edaffettuoso ringraziamento per aver condiviso anche solo in parte questomiopercorso Grazie per avermi sopportato (e non egrave poco) guidato (non solo nelcampolavorativo)efattovivereinunmodospecialelrsquoambientedilavoroUnsincerograzievaatuttiimieicolleghidelXXVIIcicloAgrisystemRicorderogravesempretuttiimomentichehopotutocondividereconvoi UngrazieancheachihafattopartedellamiaesperienzasvedeseGraziedicuoreaCarlFlavioChristianEricaatuttiiragazziitalianiatuttiicolleghildquosvedesirdquoea tutte le altre persone che ho conosciuto Mi avete fatto vivere appienolrsquoesperienzaesteraefattotornareacasaconqualcheconsapevolezzainpiugrave UnsentitoringraziamentoatuttiglialtricolleghiconcuihocollaboratoinquestianniRingrazio tutti gli amici e conoscenti che mi sono stati vicino per avermisupportatoesopportatoognunoamodoproprio Ultimomanonperimportanza ilringraziamentochevaatuttalamiafamigliaper avermi aiutato spronato sostenuto sempre e in particolare in questaesperienzaSemplicementegrazie PS Adirlatuttaancheladestinazionenonegravepoicosigravemale

173

  • INDEX
  • CHAPTER 1 - Introduction
  • CHAPTER 2 - Relative extended haplotype homozygosity (rEHH) signals across breeds reveal dairy and beef specific signatures of selection
  • CHAPTER 3 - Searching new signals for productive traits in three Italian cattle breeds
  • CHAPTER 4 - MUGBAS a species free gene based ased association suite
  • CHAPTER 5 - Imputation accuracy is robust to cattle reference genome updates
  • CHAPTER 6 - Quest for deleterious recessive variants in Italian Holstein bulls combining exome sequencing and high density SNP analysis
  • CHAPTER 7 - Conclusion
  • ACKNOWLEDGMENTS
Page 4: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School

INDEXamp

Chapteramp1ndashIntroduction 3

Chapteramp2Relativeextendedhaplotypehomozygosity(rEHH)signalsacrossbreedsrevealdairyandbeefspecificsignaturesofselection

11

Chapteramp3SearchingnewsignalsforproductivetraitsthroughgenebasedassociationanalysisinthreeItaliancattlebreeds

49

Chapteramp4ampMUGBASaspeciesfreegenebasedassociationsuite

81

Chapteramp5Imputationaccuracyisrobusttocattlereferencegenomeupdates

89

Chapteramp6QuestfordeleteriousrecessivevariantsinItalianHolsteinbullscombiningexomesequencingandhighdensitySNPanalysis

99

Chapteramp7ndashConclusion 167

amp

Acknowledgmentsampamp 173

CHAPTER1

Introduction

Background

The bovine species counts almost 15 billion of heads classified in some 800

breeds worldwide (httpfaostat3faoorg) The species is adapted to a wide

varietyofdifferentenvironmentandhusbandrysystemsandisoftenrearedfor

multiple purposes (Taberlet et al 2008) draft power milk meat leather

productionandotheruses(Elsiketal2009)

European cattle (Bos$ taurus) derive from a single domestication event of the

auroch (Bos$primigenius)occurred in theFertileCrescent around10000years

ago(Taberletetal2011)FollowingdomesticationcattlefollowedtheNeolithic

expansionofagricultureandcolonisedEurope(seeAjmoneMMarsanetal2010

for a review) Many thousand years of natural and artificial selection have

shaped cattle phenotype and genotype to meet environmental conditions and

human needs and together with demographic events and genetic drift have

started thedifferentiation of the bovine species in diverse and locally adapted

populations This differentiation process accelerated in the last two centuries

whenthebreedconceptwasinventedsothatnowthisspeciescountshundreds

of different breeds with high diversity in coat colour body size production

aptitudeadaptationtoclimate tolerancetodiseaseetc(Taberletetal2008

Elsiketal2009TheBovineHapMapConsortium2009)

Thisdiversityisnowbeingprogressivelylostwiththeextinctionofmanylocal

populationsmainlyforeconomicreasonsSelectioneffortshavethereforebeen

focussedonafewoutperformingindustrialbreedsthatarepresentlyselectedfor

specialiseddairyorbeefproductionandsometimesdoublepurpose

In these breeds breeding schemes are well developed and have been

traditionally based on the genetic evaluation of sires and dams using data

collectedfromperformanceandprogenytestingandunbiasedlinearprediction

modelsThesearefoundedontheFisherinfinitesimalmodellingofquantitative

trait variation and need no genomic information to estimate breeding values

(EBVs) The genetic progress recorded using this approach has been relevant

particularly for traits having medium to high heritability In Italian Holstein

genetic trends for themain production traits (milk protein and fat yield) are

positiveHoweveraccurateestimateofbreedingvalueshasahighcostinterms

4

of investment and time so that 5 to 6 years are needed to have a dairy bullprogenytested(wwwanafiit)Theadventofgenomicselection(Meuwissenetal2001)markedamilestoneindairycattlebreedingTheideawasdevelopedbeforethemoleculartoolswereinplaceforimplementitinpracticeandhasbecomearealityonlyrecentlyInthisapproach progeny tested bulls are genotyped at many thousand loci and thevalue of their EBVs used to estimate the value of each allele at each locusgenotyped These values are then used to predict the genetic value of younganimals on the basis of their genotype As a result animals from the newgeneration receive an EBV with the same accuracy of estimates obtained byprogenytestingbutmuchearlierThislowerthegenerationintervalandgreatlyincreasestherateofgeneticprogress indairypopulationsThe initial ideawasso good that nowadays genomic selection is widely applied in industrialcountriestothemajordairycattlebreedsWiththedecreaseofgenotypingcostit is likely that its use will be soon expanded to animals selected for otherpurposes (eg beef cattle) and other species (eg pigs) However genomicselection is an extension of traditional selection It is extremely useful inbreedingforcomplextraitsbutgivesnobiologicalinformationonthegenesthatcontrolthemIfthetraditionalselectionviewsasiregenomeasabigblackboxhaving a value but no clue why genomic selection views the same genomedisassembled inmany small boxes but still black and stillwithno clueon thereasonoftheirvalueIt is reasonable to think that a deeper understanding of the biologicalmechanisms controlling traits may further increase the accuracy of genomicevaluationsIndeedthenewtechnologiesoffernewopportunitiesinthissensebyloweringcostsandincreasingprecisionofdataproductionThis thesis is exploring the use of these technologies for retrieving biologicalinformationfromgenomicdataItusesmethodsnowconsideredldquotraditionalrdquoingenomics (GWAS and analysis of selection sweeps) and less traditional ones(gene centered GWAS and exome analysis) All data used here have beenproducedwithin twonationalprojectson farmanimalgenomics fundedby theItalianMinistryofAgriculture

5

M SELMOL ldquoRicerca e innovazione nelle attivitagrave di miglioramento genetico

animalemediante tecniche di geneticamolecolare per la competitivitagrave del

sistemazootecniconazionaleIrdquo(ResearchandInnovationinanimalbreeding

sheepgoatbuffalohorseanddonkeyI)

M INNOVAGENldquoRicercaeinnovazionenelleattivitagravedimiglioramentogenetico

animalemediante tecniche di geneticamolecolare per la competitivitagrave del

sistema zootecnico nazionale IIrdquo (Research and Innovation in animal

breedingsheepgoatbuffalohorseanddonkeyII)

In these projects the research group I work with M Istituto di Zootecnica at

Universitagrave Cattolica del Sacro Cuore di Piacenza M coordinated all the research

activitiesconcerningtheapplicationofgenomicstechniquesindairycattle

Contentsofthethesis

The thesis is structured into a short introduction (this first chapter) 5 core

chapters(Chapter2to6)andashortConclusion(Chapter7)Withtheexception

of Chapter 6 still in preparation the other core chapters containmanuscripts

that at time of writing are under review for their publication in international

peerMreviewedjournalsormanuscript

InChapter2selectionsignatureindairyandbeefcattleareinvestigatedwiththe

Illumina BovineSNP50 Genotyping BeadChip Five Italian cattle breeds (Italian

Holstein Italian Brown Italian Simmental Marchigiana and Piedmontese) are

analyzedwiththerEHHmethod(Sabetietal2002)tofindsignaturesofrecent

selectionCandidategenesinthevicinityofsignaturessharedbyeitherdairyor

beefbreeds are identified and theirpossible role indairy andbeefproduction

discussed

In Chapter 3 a GWAS analysis is run to investigate production traits in three

Italian dairy cattle breeds (Italian Holstein Italian Brown Italian Simmental)

Data are analysed with a well established method (Amin et al 2007)

6

implemented in theGenABELRpackage (Aulchenkoet al 2007) andwith the

newgeneMbasedmethodAssociationisinvestigatedbetween50KSNPsandmilk

production(milkyield)andqualitytraits(percentageofmilkproteinandfat)

InChapter4thegeneMbasedmethodusedinthepreviouschapter3inspiredby

theVEGASapproach(Liuetal2010)isdescribedThesoftwarewasdeveloped

andreMcodedinourlaboratoryinordertopermittheanalysisofanyspeciesin

additiontohumanMultiMtestingcorrectionandplotrepresentationcompletethe

toolsmadeavailable

InChapter5wetesttheimpactoftheuseofdifferentreferencesequencesinthe

imputation step Italian Simmental data are used to test variation in genomeM

wide inputation accuracy using different cattle reference sequences (BTAU42

UMD31 and BTAU46) To increase the analyses reliability four different

inputationsoftwaresaretested

InChapter6Holsteindataareanalysedcombiningexomesequencesand800K

SNPs data to seek deleterious recessive variantsWhile in natural populations

the destiny of these variants is to disappear in livestock breeds they can be

maintained and disseminated if carried by a highly productive sire The

intersection of deleterious mutations identified by sequencing and lack of

homozygousregionsidentifiedbytheanalysisofthepopulationwiththeHDSNP

chipidentifiedgenescandidatetocarryrecessivedeleteriousvariants

Objective

Themainobjectiveofthethesiswastoinvestigatethebovinegenomewiththe

aid new highMthroughput technology and to explore new strategies for linking

biology (genes and their function) to traits This linkage once validated may

increase the accuracy of genomic prediction and find application in selection

schemesegineradicatingthemostdeleteriousvariantsandorintheplanning

ofmatings

7

ReferencesAjmoneMMarsanPGarciaJFLenstraJA2010OntheoriginofcattleHow

aurochsbecamecattleandcolonizedtheworldEvolAnthropol19148ndash157doi101002evan20267

AminNvanDuijnCMAulchenkoYS2007AGenomicBackgroundBasedMethodforAssociationAnalysisinRelatedIndividualsPLoSONE2e1274doi101371journalpone0001274

AulchenkoYSKoningDMJdeHaleyC2007GenomewideRapidAssociationUsingMixedModelandRegressionAFastandSimpleMethodForGenomewidePedigreeMBasedQuantitativeTraitLociAssociationAnalysisGenetics177577ndash585doi101534genetics107075614

ElsikCGTellamRLWorleyKC2009TheGenomeSequenceofTaurineCattleAWindowtoRuminantBiologyandEvolutionScience324522ndash528doi101126science1169588

LiuJZMcraeAFNyholtDRMedlandSEWrayNRBrownKMHaywardNKMontgomeryGWVisscherPMMartinNGMacgregorS2010AVersatileGeneMBasedTestforGenomeMwideAssociationStudiesTheAmericanJournalofHumanGenetics87139ndash145doi101016jajhg201006009

MeuwissenTHHayesBJGoddardME2001PredictionoftotalgeneticvalueusinggenomeMwidedensemarkermapsGenetics1571819ndash1829

SabetiPCReichDEHigginsJMLevineHZPRichterDJSchaffnerSFGabrielSBPlatkoJVPattersonNJMcDonaldGJAckermanHCCampbellSJAltshulerDCooperRKwiatkowskiDWardRLanderES2002DetectingrecentpositiveselectioninthehumangenomefromhaplotypestructureNature419832ndash837doi101038nature01140

TaberletPCoissacEPansuJPompanonF2011ConservationgeneticsofcattlesheepandgoatsCRBiol334247ndash254doi101016jcrvi201012007

8

TaberletPValentiniARezaeiHRNaderiSPompanonFNegriniR

AjmoneMMarsanP2008Arecattlesheepandgoatsendangered

speciesMolEcol17275ndash284doi101111j1365M294X200703475x

TheBovineHapMapConsortium2009GenomeMWideSurveyofSNPVariation

UncoverstheGeneticStructureofCattleBreedsScience324528ndash532

doi101126science1167936

9

CHAPTER2

Relative(extendedamphaplotype(homozygosity)(rEHH)ampsignalsampacrossampbreedsamprevealampdairyampand$beef$specificampsignaturesampofampselection

LBomba1ELNicolazzi2MMilanesi1etal1IstitutodiZootecnicaUCSCviaEmiliaParmense8429122PiacenzaItaly2FondazioneParcoTecnologicoPadanoViaEinsteinLocCascinaCodazza26900LodiItalyTheseauthorscontributedequallytothiswork

____________________SubmittedtoGeneticSelectionEvolution

Abstract

Background

Nowadays many methods have been implemented to scan for signatures of

selection at genomescale level within and across breeds Extended haplotype

homozygosity is a proficient approach to detect genome regions under recent

selective pressure within breeds The objective of this study is to identify

common regions under positive recent selection shared by most represented

dairyandbeefItaliancattlebreeds

Results

A total of 3220 animals from Italian Holstein (2179) Italian Brown (775)

Simmental (493) Marchigiana (485) and Piedmontese (379) breeds were

genotyped with the Illumina BovineSNP50 BeadChip v1 in the frame of the

ItalianlivestockgenomicprojectsldquoSelMolrdquoandldquoProzoordquoAfterstandardcleaning

proceduresgenotypeswerephasedandcorehaplotypesidentifiedThedecayof

Linkage Disequilibrium (LD) for each core haplotype was assessedmeasuring

the extended haplotype homozygosity (EHH) To correct for the lack of good

estimatesof local recombinationrates therelativeEHH(rEHH)wascalculated

for each corehaplotypeGenomic regionsunder recentpositive selectionwere

identifiedasthosehighlyfrequentcorehaplotypeswithsignificantrEHHvalues

Significant independentcorehaplotypeswerealignedacrossbreeds to identify

signalsofrecentselectionsharedbydairyandsharedbybeefbreedsOverall82

and 87 common regions under selectionwere detected among dairy and beef

cattlebreedsrespectivelyBioinformaticsanalysisidentified244and232genes

mapping in these common genomic regions Pathway and network analysis of

significant genes revealed molecular functions biologically related to signal of

selectionsdetectedinmilkormeatproductiontypes

Conclusions

Resultssuggest that themultibreedapproach isamethodto identifyselection

signatures in cattle breeds under selection for the same production goal

12

allowing to better pinpoint the genomic regions of interest in dairy and beef

production

Background

The advent of genomic technologies and the consequent availability of single

nucleotidepolymorphism(SNP)datahaveenabledthestudyofselectionevents

on the cattle genome (Hayes et al 2008 Qanbari et al 2010) Such selection

signals arise from environmental or anthropogenic pressures and help to

understand the causes that led to breed formation These studies are usually

addressedinaldquotopdownrdquoapproach(ShahzadandLoor2012)fromgenotypeto

phenotypeinwhichgenomicdataisstatisticallyevaluatedtodetectdirectional

selectionThephenotypeneededtorunselectionsignaturesisintendedinavery

generalsenseabreedaproductionaptitudeorevenageographicareaoforigin

Thisapproachholdsthepotentialto investigatetraitsasadaptationtoextreme

climatesanddifferentfeedingandhusbandrysystemsresiliencetodiseasesetc

thatareveryexpensivedifficultandsometimesimpossibletostudywithclassic

GWAS (GenomeWide Association Study) approaches Therefore it is expected

thatgenomic informationretrievedinthesestudiesonlypartiallyoverlapsthat

identified by GWAS on specific traits Searching for these ldquosignatures of

selectionrdquo is of interest for understanding molecular mechanisms underlying

important biological processes and providing new insights to functional

biologicalinformation(egdiseaseandorimportanteconomictraits(Nielsenet

al 2007)) To date many methods have been implemented to scan these

selectionsignaturesatthegenomiclevelMethodsdifferwithrespecttoseveral

properties (Lenstra et al 2012) there are methods that perform analyses

withinbreedoracrossbreedsthatcomparealleleorhaplotypefrequenciesand

size that detect recent or ancient selection events either still segregating or

fixatedintopopulations(Lenstraetal2012OrsquoBrienetal2014Utsunomiyaet

al2013)Differentmethodshavedifferentsensitivityandrobustnesseg they

maybeinfluencedatadifferentextentbymarkerascertainmentbiasanduneven

distributionofrecombinationhotspotsalongthegenome

13

Extended haplotype homozygosity (EHH) is a method that has been used in

many animal species including cattle (Pan et al 2013 Qanbari et al 2010

Zhangetal2006)TheEHHmethodwasfirstdevelopedbySabetietal(2002)

for human population analysis The main objective of this methodology is to

identifylongrangehaplotypesinheritedwithoutrecombinationeventsMorein

detailunderaneutralevolutionmodelchangesinallelefrequenciesaredriven

onlybygeneticdriftInthisscenarioanewvariantwouldrequirealongtimeto

reach high frequency in the population and the Linkage Disequilibrium (LD)

surroundingitwoulddecayduetorecombinationevents(Kimura1984)Inthe

caseofpositiveselectionaquickriseinfrequencyofabeneficialmutationina

relatively short time will preserve the original haplotype structure as the

number of recombination events would be limited Therefore a signature of

positiveselectionisdefinedasaregionwithanalleleinlongrangeLDlocatedin

anuncommonlyhighfrequencyhaplotype

One of themost interesting features of thismethod is that it is able to detect

genomic regions that are under recent selection These recent selection

signatures can be therefore used to identify the genomic regions involved in

breedformationInadditionEHHmethoddoesnotrequireanydefinitionofan

ancestralallele(asneededintheintegratedhaplotypescoreiHS(Voightetal

2006) and is suited to be applied to SNP data because it is less sensitive to

ascertainmentbiasthanothermethods(Nielsenetal2007)HowevertheEHH

methodgeneratesa largenumberof falsepositiveandnegative resultsdue to

heterogeneous recombination rates along the genome (Qanbari et al 2010)

Moreover another drawback not specific of EHH but shared by all selection

signaturemethod is the lack of robust inferences able to discern true signals

fromthosegeneratedbychance(Kemperetal2014)

To partially account for these limitations Sabeti et al (2002) developed the

relative extended haplotype homozygosity (rEHH) including an empirical

methodologytoobtainsignificantsignalsTherEHHofaputativecorehaplotype

compares its original EHH valuewith that of other haplotypes at that specific

locus using all other haplotypes as control for local variation in the

recombination rate Therefore it identifies genomic regions carrying variants

14

under selection that are still segregating in the population analysed Although

thesemethodswere conceived for human populations theywere successfully

appliedto livestockspeciessuchaspig(Maetal2012)andcattle(Qanbariet

al2010)

After domestication about 10000 years ago in the fertile crescent taurine

cattle have colonized Europe and Africa and have been selected for different

human needs (Taberlet et al 2011) Particularly in the last centuries the

anthropic pressure led to the formation of hundreds of specialized breeds

adapted to different environmental conditions and linked to local tradition

constitutingagenepooldeservingattentionforconservation(AjmoneMarsanet

al2010)Someofthesebreedshavebeenselectedspecificallyfordairybeefor

bothproductiontypesfollowingstrongartificialselectionforthesetraits(Hayes

etal2009)Thepresentstudyaimsat identifyingsignalsof recentdirectional

selectionusingrEHHmethodindairyandbeefproductiontypesusingdatafrom

five Italian dairy beef and dual purpose cattle breeds We focused on the

significant core haplotypes scanned by rEHH that are shared among breeds of

thesameproductiontypeThenweusedabioinformaticsapproachto identify

positionalcandidategeneslyingwithinthegenomicregionsunderselectionand

investigatetheirbiologicalrole

Methods

SampleAnimalsandGenotyping

A total of 4311 bulls of five Italian dairy beef anddual purpose breedswere

genotyped with the Illumina BovineSNP50 BeadChip v1 (Illumina San Diego

CA)joiningthegenotypingeffortsoftwoItalianprojects(namelyldquoSelMolrdquoand

ldquoProzoordquo)Thedatasetincluded101replicatesand773siresonspairsusedfor

downstreamqualitycheckofdataproducedIndetailgenotypesof2954dairy

(2179 ItalianHolsteinand775 ItalianBrown)864beef (485Marchigianaand

379 Piedmontese) and 493 dual purpose (Italian Simmental) bulls were

availableDataqualitycontrol(QC)wasperformedintwostagesafirststageon

15

animals independently for each breed applying the same methods and

thresholdsandasecondstageappliedonmarkersacrossall theindividualsof

thedatasetThefirststageexcludedindividualswithunexpectedlyhigh(gt02)

Mendelian errors for fatherson couples andwith low call rates (lt95)The

secondstageexcludedi)SNPswithle25missingvaluesinthewholedataset

orcompletelymissing inonebreed ii)SNPswithminorallele frequencyle5

and iii) SNPs in sexual chromosomes and with unassigned chromosome or

physicalposition

EstimationofrEHH

HaplotypeswereobtainedusingfastPHASEsoftwareconsideringdefaultoptions

(ScheetandStephens2006)andwererunbreedwiseandchromosomewisein

eachbreedPedigreeinformationforallbullswereprovidedbytheirrespective

breederassociations andwereused to filteroutdirect relatives (in fatherson

pairsthesonwasmaintainedinthedatasetandthefatherdiscarded)andover

representedfamilies(amaximumof5randomlychosenindividualsperhalfsib

familywasallowed)Thefinaldatasetcontainingtheseldquolessrelatedrdquoanimalswill

behenceforthcalled ldquononredundantrdquodatasetThisnonredundantdatasetwas

used also to calculate within breed pairwise wholegenome Linkage

Disequilibrium(LD)Ther2statisticforallpairsofmarkerswasobtainedusing

PLINK software v107 (Purcell et al 2007)Thedecayof LDwas assessedby

averagingr2valuesupto1Mbdistance

To test if the population structure influenced the detection of rEHH we

replicated all analyses on the whole Italian Holstein dataset without any

population correction (ie ldquoredundantrdquo dataset) focusing on genes or gene

clustersthatarewellknowntobeunderrecentselectionincattle(ieldquocandidate

regionsrdquo) In particularwe focused on the casein polled gene cluster and coat

colourgenes(MC1RandKIT(Kemperetal2014Qanbarietal2010))

The EHH and rEHH calculations were performed using Sweep v11 software

(Sabetietal2002)Somedefaultprogramsettingshadtobemodifiedtoadapt

theseanalyses to thebovinegenomeas thesoftwarewasoriginallydeveloped

forhumangeneticanalysesSpecificallylocalrecombinationratesbetweenSNPs

16

wereapproximatedto1cMperMbEHHandrEHHcalculationswereperformed

breedwise and chromosomewise using an automatic core selection with

defaultoptions(ieconsideringlongestnonoverlappingcoresandlimitingcores

toatleast3andnomorethan20SNPs)asinQanbarietal(2010)AlthoughEHH

and rEHH values were obtained for all core haplotypes only those with

frequency ge 25 were retained for further analyses The (empirical) rEHH

significancethresholdwasobtainedbydividingtherEHHvaluesinto20binsof

5 frequency range each logtransforming withinbin values to achieve

normality Core haplotypes with pvalues lt 005 were considered significant

DetectionofcommonselectionsignatureswasobtainedcomparingrEHHvalues

acrossgenomicregionsacrossbreeds

Breedgroupingaccordingtoproductiontype

Toexamine thesignals common toeachof the twoproduction types (iedairy

andbeef) corehaplotypes thatsharedoneormoreSNP inat least twobreeds

from the same production type were selected The dual purpose Italian

Simmentalwas included inbothdairy (ItalianHolsteinand ItalianBrown)and

beef groups (Piedmontese and Marchigiana) since potentially it possesses

haplotypes selected for both production types All downstream analyses were

performedseparatelyforthedairyandthebeefproductiontypes

Detectionandannotationofcandidategenes

Genome annotation was performed using Biomart

(httpwwwbiomartorgindexhtml) a comprehensive source of gene

annotationformanyspeciesprovidedbyEnsembl(wwwensemblorg)Thelist

ofsharedhaplotypesregions(inbp)fordairyandbeefwasusedasinputfilein

theBiomartwebinterfaceThesetofgenesretrievedbyBiomartwasthenused

as input for theCanonicalpathwayandregulatorynetworkanalyses Ingenuity

PathwayAnalysis toolversion80(IPA IngenuityregSystems IncRedwoodCity

CA httpwwwingenuitycom) and a substantial examination of published

literaturewasusedtoexaminethefunctionalrelationshipsamongtheresulting

genes The IPA operates with a proprietary knowledge database providing

complementarypathwayanalysisforseveralspeciesincludingcattleIntheIPA

17

analyses the significance of each biological function identified was estimated

using Fisherrsquos exact test A Benjamini and Hochberg correction for multiple

testingwas calculated for function and canonical pathways For eachNetwork

IPAcomputedascore(pscore=log10(pvalue))fittingthesetstudiedgenesand

a list of biological functionspresent in the IPAknowledgedatabase The score

considered thenumberofgenes in thenetworkand thesizeof thenetwork to

estimatehowrelevantthisnetworkwasaccordingtothelistofgenesprovided

Thenthescorewasusedtorankthenetworks

Results

DatasetQC

Therepeatabilityobtainedfromthe101replicatespresentinthewholedataset

washigherthan998AfterthetwoQCstages105individualsand9730SNPs

were discarded Moreover after phasing 1292 individuals were removed to

reducethe largenumberofsibfamiliespresent intheredundantdatasetAfter

quality control procedures the final dataset had 44271 SNPs and 1132 514

393 410 and 364 individuals from Italian Holstein Italian Brown Italian

SimmentalMarchigianaandPiedmontesebullsrespectively(Table1)

18

Table1JNumberofanimalsgenotypedbeforeandafterQCTotalnumberofanimalgenotypedandrelativenumberofanimaldiscardedafterQCanalysis

Detectionofselectionsignatures

Sweepv11 softwaredetectedrEHHsignalswitha frequencyhigher than25overlapping the test candidate regions in both datasets corrected or not forpopulation structure In the redundant dataset we found only one significantrEHHsignal(caseincluster)whereasinthenonredundantdatasetweidentified5significantrEHHsignalsoverlappingallthetestcandidateregionsconsidered(caseinclusterpolledclusterMC1R)(Table2)

BreedTotal

Genotyped

ED15

misAN

ED1

REPL

ED1

MENDCleaned

HOL 2179 40 31 5 2093

BRW 775 6 16 4 749

SIM 493 6 6 2 479

MAR 485 37 38 410

PIE 379 5 10 364

19

Table2JComparisonofcandidateregionrEHHinnonJredundantandredundantdatasetrEHH overlapping candidate regions in both corrected (nonredundant) and not corrected

(redundant)datasetsforpopulation

Redundant NonRedundant

Candidate

geneBTA1

Closest

SNP(bp)CH2range CH2Frequency

rEHHJ

log(pval)CH2Frequency

rEHHJ

log(pval)

POLLED

Cluster1 1981154 18974181981154 H1079 093037 H1078 167174

MC1R 18 13657912 1331772014007505 H1069 120096 H1069 077167

KIT_BOVIN 6 72821175 7250492172821175 H1054 030030 H1054 033048

Casein

Cluster6 88427760 8835009888452835 H1047 094139 H1046 148133

Casein

Cluster6 88427760 8835009888452835 H2032 002003 H2033 002003

In total5526567847724388and4049corehaplotypeswitha frequency

higher than 25 were detected on Italian Holstein Italian Brown Italian

Simmental Marchigiana and Piedmontese breeds respectively A total of 838

866 740 692 and 613 core haplotypes were found significant outliers (see

Materials and Methods) in the aforementioned breeds Table 3 shows the

distribution of the total and significant core haplotypes per chromosome and

breed

20

Tableamp3amp(ampCoreamphaplotypesampdistributionampDistributionoftotalandsignificantcorehaplotypesperchromosomeandbreed

ampamp ItalianampampHolsteinamp

ItalianampampBrownamp

ItalianampSimmentalamp Marchigianaamp Piedmonteseamp

BTA1amp

Hampgtamp252amp

SignampH3amp

Hampgtamp252amp

SignampH3amp

Hampgtamp252amp

SignampH3amp

Hampgtamp252amp

SignampH3amp

Hampgtamp252amp

SignampH3amp

1amp 389 66 389 68 335 49 310 51 288 46

2amp 312 47 314 52 267 44 257 42 238 33

3amp 292 48 313 62 264 46 236 34 227 34

4amp 268 46 279 50 245 41 237 39 229 39

5amp 223 32 231 29 200 30 183 27 171 22

6amp 291 51 307 40 265 45 252 38 233 42

7amp 249 37 253 46 214 39 195 33 204 27

8amp 268 39 270 38 235 35 223 32 209 31

9amp 227 37 228 41 210 37 165 22 187 30

10amp 219 36 234 34 192 25 196 26 168 25

11amp 248 39 246 34 209 30 205 33 187 25

12amp 162 24 176 18 148 21 148 25 135 22

13amp 205 32 200 19 155 18 178 35 135 19

14amp 175 17 189 25 168 29 143 22 129 19

15amp 180 31 185 29 148 23 139 13 130 19

16amp 176 28 192 29 160 25 149 19 127 15

17amp 170 27 183 29 168 31 135 22 123 16

18amp 133 18 153 19 119 14 108 10 82 12

19amp 137 22 130 23 118 16 99 18 92 15

20amp 184 35 172 33 145 23 124 23 124 17

21amp 145 23 147 30 113 18 110 21 94 13

22amp 140 24 145 19 115 21 112 18 104 13

23amp 106 13 100 16 67 10 64 9 57 8

24amp 129 11 140 20 124 17 97 23 101 16

25amp 91 12 98 11 68 10 77 12 70 6

26amp 125 7 114 14 93 12 96 15 90 16

27amp 91 12 91 15 75 11 84 12 60 10

28amp 88 9 96 10 70 10 66 9 55 11

29amp 103 15 103 13 82 10 72 9 67 12

TOTamp 5526 838 5678 866 4772 740 4388 692 4049 613

1BostaurusAutosomes2Detectedcorehaplotypeswithafrequencyinthebreedhigherthan253Significantcorehaplotypesatplt005

Comparisonamptoamppreviouslyampreportedampdataamp

AnumberofstudieshavesearchedselectionsweepsinHolstein(Qanbarietal

2010) Brown (Qanbari et al 2011) and Simmental (Fan et al 2014) breeds

Since different methods are expected to identify signatures having different

characteristicsthecomparisonwithpreviousstudiesislimitedtothosestudies

using thesamemethodand thesamebreed(s)analysed inourstudyOnlyone

study analysed (German) HolsteinRFriesian cattle with rEHH (Qanbari et al

2010)Qanbarietal(2010)reportedcandidategenesandgeneclustersunder

Marco Milanesi
21

recent positive selection that we compared with our rEHH in dairy breeds

dataset(Table4)

Table4JCorehaplotypesincandidategenesfollowingQanbarietal2011

1BosTaurusChromosome2Corehaplotype

Thenumberofcorehaplotypesfoundinourstudyinthecandidateregionswas

lowerpreviouslyreportedWhenconsideringHolsteinsthetwomostsignificant

candidateregionsinbothstudiesmatch(caseinclusterandthesomatostatinSST

gene)althoughitisimpossibletodetermineifthehaplotypeunderselectionis

the same as this information was not provided in Qanbari et al (2010)

However other genes considered significant in Qanbari et al (with pvalues

ranging from 004 to 010) were not significant in our studyWhen using the

same loose significant threshold (pvalues le 010) ldquosignificantldquo signals were

foundalsoinItalianBrownfortheCaseinCluster(log10pvalue=109)andin

ItalianSimmentalfortheSSTgene(log10pvalue=126)

HOLSTEIN BROWN SIMMENTAL

Candidate

geneBTA1

Closest

SNP

(bp)

CH2rangeCH2

Frequency

rEHH

Hlog(p)CH2range

CH2

Frequency

rEHH

Hlog(p)CH2range

CH2

Frequency

rEHH

Hlog(p)

DGAT1 14 444963236653443936

H1057 011443936763332

H1042 0130007

Casein

Cluster6 88391612

8835009888452835

H1046 1481338832601288452835

H2068 0801098835009888452835

H1044 037022

8835009888452835

H2033 018030

8835009888452835

H3034 022045

GH 19 49652377 GHR 20 33908597

SST 1 813769568128358581376961

H1036 200189 8131845181376961

H1042 126053

8128358581376961

H2029 00630084 8131845181376961

H2042 006027

IGFH1 5 71169823

ABCG2 6 37374911 3731702038256889

H1044 031027

3731702038256889

H1040 029031

Leptin 4 95715500

LPR 3 855692038549710885594551

H1047 0910728549710885794693

H1068 0800638549710885794693

H1063 002005

8549710885594551

H2041 027025

PITH1 1 35756434

22

Signaturessharedbydairyandbybeefbreeds

Significantcorehaplotypeswerealignedacrossbreedsto identifythoseshared

among dairy or beef production types Since breeds can be considered

independentsetsofobservationssharedsignaturesaremorelikelytorepresent

real effects rather than false positives A total of 123 and 142 significant core

haplotypesweresharedbetweenatleasttwodairyorbeefbreedsrespectively

(File S1) Only 82 and 87 of those significant core haplotypes contained genes

considered positional candidates under positive selection In total 22 and

17 of the genomewas under selection for dairy and beef production types

respectivelyTherEHHaveragelengthswere216932and190994bpfordairy

andbeefrespectively

Genesetannotationandnetworkpathwayanalysis

Atotalof244and232annotatedgenesfellwithintheselectedregionsfordairy

andbeefproductiontypesrespectively(Table5)

23

Table5JStatisticsonsharedsignificantcorehaplotypesindairyandbeefbreedsStatisticsonsharedsignificantcorehaplotypesindairyandbeefbreeds

DairyCattle BeefCattle

BTA SelSig

n

Genes1 Sum

SelSignsize

(bp)

Mean

SelSign

size(bp)

SelSi

gn

Genes1 Sum

SelSignsize

(bp)

Mean

SelSign

size(bp)

1 7 11 1521608 138328 7 13 2263213 174093

2 5 9 2216685 246298 8 11 2658549 2416863 2 5 1619336 323867 7 21 3755847 1788504 7 21 5956755 283655 6 11 1761288 1601175 4 17 4506119 265066 5 11 1251127 1137396 2 4 1467738 366934 5 12 5149692 4291417 6 30 4842969 161432 3 28 11889191 4246148 0 0 0 0 4 12 3512215 2926859 4 7 1554863 222123 2 6 1106768 18446110 4 17 8839762 519986 2 3 573138 19104611 6 8 1841802 230225 1 1 100439 10043912 2 8 4244133 530517 1 1 536461 53646113 3 8 1933026 241628 2 8 985376 12317214 0 0 0 0 3 9 692397 7693315 5 9 1415405 157267 2 2 189094 9454716 3 6 1302506 217084 4 6 905719 15095317 3 4 471046 117762 2 9 1551010 17233418 0 0 0 0 3 23 2961718 12877019 3 23 5744143 249745 2 2 116938 5846920 1 2 148886 74443 2 4 627971 15699321 3 7 1930206 275744 1 5 1150760 23015222 3 5 620674 124135 3 10 2411751 24117523 0 0 0 0 1 1 68421 6842124 2 18 9166004 509222 2 4 803770 20094225 2 11 2016602 183327 1 3 329199 10973326 2 4 466134 116534 4 7 776080 11086927 1 1 631792 631792 2 3 1004717 33490628 0 0 0 0 1 1 93464 9346429 2 9 935209 103912 1 5 798300 159660TOT 82 244 65393403 216932 87 232 50024613 190994

1GenesidentifiedGenesintheregionssharedbyallthreedairybreeds(8genes)orallthreebeefbreeds(11genes)foramoredetailedanalysiswereselected(Fig1)

24

Figure1JGenomiclocationoftheselectionsignaturessharedamongstudiedbreedsa)GenesinensembletracksaredisplayedasaredboxcorehaplotypesandSNPsarecolouredinorange(MarchigianaMAR)inviolet(PiedmontesePIE)andpink(SimmentalSIM)b)Genesinensemble tracks are displayed as a red box core haplotypes and SNPs are coloured in blue(HolsteinHOL)ingreen(ItalianBrownBRW)andpink(SimmentalSIM)

All the genes identified were submitted to downstream pathwaynetwork

analyses(Table5Figure1FileS1)Interestinglygenesintheregionssharedby

allthreedairyorbeefbreedsweremanifestedinthemostlikelynetworksforthe

two production aptitudes The IPA analysis detected a total of 12 and 15

networksindairyandbeefbreedsrespectively

Indairybreeds network scores ranged from46 to2 (File S2) Eightnetworks

obtainedscoreshigherthan20andincludedmorethan14moleculeseachThey

wereassociatedtoi)geneexpressionRNAdamageandrepairproteinsynthesis

andposttranslationalmodificationii)cellassemblyorganizationfunctionand

maintenance iii) cell cycle proliferation and cancer iv) carbohydrate

metabolism v) gastrointestinal and neurological disease developmental and

endocrine system disorders A total of 23moleculeswere associatedwith the

highest ranked network (score = 46) In this the immunoglobulins mitogen

25

activatedproteinkinasesandNFkBcomplexesformacentralhublinkinggenes

involved in celltocell signaling and interaction hematological system

developmentandfunctionandimmunecelltraffickingfunctions(Figure2)

Figure2JThemostlikelyIPAnetworkindairycattlebreedsThefunctionsassociatedtothenetworkarecelltocellsignallingandinteractionhematologicalsystem development and function immune cell trafficking Nodes in red correspond to genesidentifiedincorehaplotypesoverlappinginallthethreebreedsofeachproductiontypewhereasthoseingreendepictoverlappingcorehaplotypesinatleasttwoofthosebreeds

Breast cancer antiNestrogen resistance gene 3 (BARC3) andpituitary glutaminyl

cyclase gene (QPCT) genes were common to all three dairy breeds and are

directlyconnectedwithmammaryglandmetabolisms(Ezuraetal2004GSun

et al 2012) The solute carrier family 2 member 5 (SLC2A5) gene facilitates

glucosefructose transport (Zhao et al 1998) and the zetaNchain (TCR)

26

associated protein kinase 70kDa (ZAP70) gene plays a critical role in Tcell

signaling (Bonnefont et al 2011) Calpain is another important complex that

togetherwithcalpainN3(CAPN3)mediatesepithelialcelldeathduringmammary

gland involution (Wilde et al 1997) Furthermore RAS guanyl nucleotide

releasingprotein(RASGRP1)activatestheErkMAPkinasecascaderegulatesT

cellsandBcellsdevelopmenthomeostasisanddifferentiationandisinvolvedin

the regulation of breast cancer cell (Madani et al 2004 Yasuda et al 2007

Wickramasingheetal2009)Genesinvolvedintheothernetworksarelistedin

FileS2

The scoreof thebeef cattlebreedsnetwork rangedbetween46and2 and the

numberofmoleculespresentineachnetworkrangedfrom23to1(FileS3)Asin

dairy breeds eight networks obtained a score higher than 20 and 13 ormore

molecules each These networkswere associated to i) carbohydrate lipid and

nucleic acidsmetabolism energyproduction and smallmoleculebiochemistry

ii) cardiovascular system development and function iii) cellular and tissue

developmentcellassemblyandorganizationfunctionandmaintenanceandcell

and organmorphology iv) cell cycle celltocell signaling and interaction v)

DNA replication recombination and repair and RNA posttranscriptional

modification vi) dermatological diseases and conditions infectious disease

organismal injury and abnormalities In the network with the highest score

(score=46)chondroitinsulfateproteoglycan4(CSPG4)andsnurportinN1(SNUPN)

genes were shared among all beef cattle breeds investigated CSPG4 gene is

relatedtomeattendernesswhileSNUPN isanimprintedgeneimportantinthe

embryodevelopmentandinvolvedinhumanmuscleatrophy(Narayananetal

2002) (Figure 3) All the genes included in this network were involved in

carbohydratemetabolismsmallmoleculesbiochemistryandcellcycleTheother

networkswere related to important signaling andmetabolic process essential

foranimalgrowth(FileS3)

27

Figure3JThemostlikelyIPAnetworkinbeefcattlebreedsThe functions associated to the network are carbohydrate metabolism small molecule

biochemistry and cell cycle Nodes in red correspond to genes identified in core haplotypes

overlapping in all the three breeds of each production type whereas those in green depict

overlappingcorehaplotypesinatleasttwoofthosebreeds

A total of 6 and 9 statistically significant Canonical Pathways (FDR le 005

log10(FDR) ge 13) were identified using IPA for dairy and beef breeds

respectively(Figure4)

28

Figure4JBarplotofstatisticallysignificantCanonicalPathwaysPvalues were corrected for multiple testing using BenjaminiHochberg method and arepresented in thegraphas log(pvalue)Thebar represents thepercentageof genes in a givenpathway that meet cutoff criteria within the total number of molecules that belong to thefunctiona)BarplotofstatisticallysignificantCanonicalPathways indairycattlebreedsb)BarplotofstatisticallysignificantCanonicalPathwaysinbeefcattlebreeds

Themost significant Canonical Pathway in dairywas for purinemetabolism (

log10(FDR) = 26) supporting the highly synthetic processes in the mammary

epithelium(Figure5)

29

Figure5JGenesdetectedunderrecentpositiveselectionindairycattleinvolvedinpurinemetabolismcanonicalpathwayNodes in red correspond to genes identified in core haplotypes overlapping in all the threebreedsofeachproductiontypewhereasthoseingreendepictoverlappingcorehaplotypesinatleasttwoofthosebreeds

Inbeefproductiontheephrinreceptorsignal(log10(FDR)=27)wasthemost

significant Canonical Pathway (Figure 6) Among other functions ephrin is

knowntopromotemuscleprogenitorcellmigrationbeforemitoticactivation(Li

andJohnson2013)AlltheotherCanonicalPathwaysarereportedinFileS4

30

Figure6JGenesdetectedunderrecentpositiveselectioninbeefcattleinvolvedinEphrinmetabolismcanonicalpathwayNodes in red correspond to genes identified in core haplotypes overlapping in all the threebreedsofeachproductiontypewhereasthoseingreendepictoverlappingcorehaplotypesinatleasttwoofthosebreeds

Discussion

In this work more than 4000 genotyped bulls of five dairy beef and dualpurposeItalianbreedswerescreenedsearchingforselectionsignaturesindairyand beef production types A strict data quality controlwas applied to reducepossiblesourcesofbiassuchastechnicalgenotypingproblemsandpopulationstructurethatcouldhavelargeimpactonrEHHresultsInfactsincetheimpactofpopulationstructureonrEHHvaluesisstillunexploredwereplicatedpartofouranalyseswithandwithoutcloserelativesinanattempttostudytheeffectofpopulationstructureonthisselectionsignaturemethodInprinciplepopulationstratificationshouldleadtoanoverrepresentationofcertainhaplotypessimplyduetothepresenceoftightpedigreelinks(egsirespassinghalfoftheirgeneticmaterialtotheirsons)ratherthanactualselectionprocessesForthisreasonfor

31

the full analyses all siresons couples were removed after haplotype phasing(retaining only the younger animal) and halfsib familieswere restricted to amaximum of five randomly chosen individuals in an attempt to reduce overrepresentation This assessment was restricted to four regions known to beunderselectioninItalianHolsteinasthecaseinclusterthepolledgeneclusterMC1R and KIT Italian Holstein was selected for two reasons i) it is a highlystructuredbreedaccordingtoourdata(abouthalfofindividualsremovedwereclose relatives) ii) allowed us to compare our results with a previous studyAlthoughtheanalysesconductedonbothredundantandnonredundantdatasetswereable to identify rEHHsignals in these regions thenonredundantdatasetproduced five significant rEHH signals rather than just one in the redundantdataset(Table2)Thisresulthighlightstheconfoundingeffectofthepresenceofcloserelativesinthedatasetandconsequentlytheimprovedabilitytodetectedsignificant selection signature while correcting for population structureInterestingly our results only partially overlap those found by Quanbari et al(2010) These authors detected significant signals at the casein cluster andsomatostatinSSTaswedid but also at ahighernumberof candidate regionsThese inconsistencies may be the effect of different sires included in theanalyses of the different dataset size and to the close relative trimmingprocedure we adopted to decrease the effect of population structure andconsequent bias (Table 4) However poor correlation across investigations isoftenobservedalsointhedeeplyinvestigatedhumanspeciesThisisduetotheuse of different within and between breed statistics that identify selectionsignatureshavingdifferent characteristics (ancientrecent segregatingfixatedunder directionalbalancing selection) to the likely high rate of falsepositivesnegativesresultsandalsotothedifferentconsiderationofpopulationstructureandbackgroundselection(Enardetal2014)The number of total and significant core haplotypes identified by the SweepsoftwarewasthehighestinItalianBrownandthelowestinPiedmontesebreedSinceEHHisheavilybasedonpopulationLDtheaverageLDleveloverdistancewascalculatedineachbreed(Figure7)

32

Figure7JMultiJbreedaveragelinkagedisequilibriumagainstphysicaldistance(inkb)Marchigiana Piedmontese and Italian Simmental breeds present a lower persistence of LD allargerdistancethanItalianHolsteinandItalianBrownbreedsThisanalysishighlightedageneralpositivecorrelationbetweenthe levelofLDover distance and the number of total and significant core haplotypes foundHoweverconsideringthatrEHHisarelativemeasure it ismorelikelythatthehighernumberofsignificantcorehaplotypesidentifiedindairybreedswasduemostlytoahigherselectivepressureindairycomparedtobeefbreedsSignificance tests used in detecting selection signatures should measure theprobability of a statistic to be anoutlier compared to its expecteddistributionunder a neutral model However no reliable neutral model has so far beendevelopedincattlebecauseofthecomplexityofthedemographichistoryofthisspecies (Kemper et al 2014) As a consequence empiric rather than modelbasedsignificancetestaregenerallyusedinthedetectionofselectionsignaturesAccordinglyweconsideredasoutliersvaluesas thosesimply falling in the5plusvarianttailoftherEHHdistributionWekeptthewithinbreedsignificancethresholdloosenoncorrectingitformultipletestingbutconsideredonlysignals

33

eitherexceedingthisthresholdbyatleasttwoordersofmagnitudeorsharedby

twoormoreindependentbreeds

Parallel comparison of results from independent analyses of different breeds

allowed the achievement of a double objective i) to identify putative regions

under(recent)selectioninbreedsbelongingtotwoproductiontypeswhichwas

the main objective of this study and ii) reducing false positives since multi

breed analyses served as internal control Since the rEHH method does not

considerphenotypicinformationasignificantsignalmightarisebecausei)the

core haplotype is actually under a selective pressure ii) the result is a false

positive ie causedby chance population structureor anyotherdriving force

However even considering an unrealistic scenario with no false positives a

proportionofthetotalamountofsignalswouldactuallybeselectionsignatures

due to a selectionpressurenot ondairy or beef traits This becausedairy and

beefbreedsinvestigatedareonlyafewandshareanumberoftraitsnotdirectly

related to dairybeef production Even considering this limitation to our

knowledge this is the first dairy and beef multibreed study applying this

strategytoreducefalsepositivesatthecostofpossiblelossofinformationdue

tohighfalsenegativeratesSignificantsignalsharedbydairyandbybeefbreeds

wereused fordownstreamnetworkanalysesonpositional candidategenes to

search forabiological justification towhatwasobtainedstudying thegenomic

information Only the top network for dairy and the top network for beef

productionarediscussedindetail

Dairybreeds

ThemostsignificantnetworkdetectedindairybreedsisassociatedwithCellTo

CellSignalingandInteractionHematologicalSystemDevelopmentandFunction

andImmuneCellTraffickingfunctions(Figure2)ImmunoglobulinandNFkBact

aspivotalgeneswithinthisnetworkInterestinglyBARC3andQPCTgeneswere

sharedamongallthethreedairybreedsBothgeneshavenotyetbeenstudiedin

cattlealthoughinhumanstudiesthesegenesarewellknowntobelinkedtothe

mammaryglandmetobolismandthecalciumregulationTheformerisinvolved

in integrinmediated cell adhesions and signaling which is required for

mammary gland development and functions (G Sun et al 2012) The latter is

34

associatedwithlowradialbodymineraldensity(BMD)inadultwomen(Ezuraet

al2004)AnotherimportantplayeronthisnetworkisSLC2A5genealsoknown

asGLUT5 SLC2A5 gene acts as fructose transporter in the intestine and has a

significantroleinenergybalanceofdairycows(Burantetal1992)Findingthis

gene in adairy cattle specificnetwork seems surprising since there shouldbe

little need for transporting glucose andor fructose in the ruminant intestinal

tractassimplecarbohydratesaredegradedintovolatilefattyacids(VFA)inthe

rumen (Iqbal et al 2009) However it is often the case that large amount of

starchbypasstherumenincowsfedwithdietsrichincerealgrains(Zhaoetal

1998)Thisbypassedstarchneedstobedigestedinthesmallintestinetoavoid

highlevelsofglucoseconcentrationinthelargeintestine

WedetectedcalpaincomplexandcalpainN3(CAPN3)genesunderpositiverecent

selection in the dairy group as reported also by Utsunomiya et al (2013)

AltoughCalpainisknowntobeinvolvedinpostmortemmeattendernizationitis

also related to dairy metabolism as the muscle breakdown promoted by the

calpain gene provides an energy source for milk production especially at the

beginningoflactation(Kuhlaetal2011)InadditionasreportedbyWildeetal

(1997) calpainNcalpastatin system is related to programmed cell death of

alveolar secretory epithelial cells during lactation Zap70 gene is a cytoplasmic

proteintyrosinekinaseItisrelatedtotheimmunesystemasacomponentofthe

Tcell receptor playing a central role in Tcell responses (Wang et al 2010)

Bonnefontet al (2011) found thatZap70 genewasup regulated (FoldChange

(FC)=82FDR=277E12)onmilksomaticcellsofsheepinfectedbySaureus

andSepidermidisshowinganassociationwithmastitisresistance

Purinemetabolismwasthemostsignificantcanonicalpathwayindairybreeds

In a study conducted in human mammary epthelium Maningat et al (2008)

conductedageneexpressionanalysisonbreastmilkfatglobulesidentifyingthe

PurinemetabolismasthemostsignificantpathwaySynthesisandbreakdownof

purine is essential in the metabolism of the tissues of many organisms

particularlyofthemammaryglandduringlactation

Another interesting canonical pathway was endothelin signaling (Figure 5)

endothelin functions as vasocontrictor and is secreted by endothelial cells

35

(Nussdorferetal1999)Acostaetal(1999)reportedthatincattleendothelinsareinvolvedinthefollicularproductionofprostaglandinsandtheregulationofsteroidogenesis in the mature follicle In a recent study Puglisi et al (2013)confirmed the implication of endothelin in particular EDNRA and endothelinconvertin enzyme 1 in reproductive disorder in bovine They classified theEDNRAaspotentialbiomarkerforfertilityincow

Beefbreeds

Inbeefcattlebreedsthefocaladhesionkinase(FAK)togetherwithintegrinandERK12genesplayacentralroleashubsinthemostsignificantnetwork(Figure3)Theyareinvolvedincellularadhesionandcellmotilityprocessesparticularlyduringskeletalmyogenesis(Nguyenetal2014)Moreovertheoverexpressionof FAK can determine a reduced apoptosis and an increase risk of metastaticcancers(MehlenandPuisieux2006)Thesehubsarenotdirectlyinvolvedinourstudybut sharemany interactors thatare located in region found tobeunderselction such as CSPG4 RB1CC1 MGAT3 CIRPB CSPG4 gene belongs tochondroitin sulfate proteoglycans (CSPGs) family CSPGs are proteoglycansconsistingofaproteincoreandachondroitinsulfatesidechainTheyareknownto be structural components of a variety of tissues includingmuscle andplaykey roles inneuraldevelopmentandglial scar formationTheyare involved incellular processes such as cell adhesion cell growth receptor binding cellmigrationandinteractionswithotherextracellularmatrixconstituents

Manystudiesreporttheroleofproteoglycansinmeattextureofseveralmusclesfromthebovinespecies(Nishimuraetal1996)Dubostetal(2013)highlightadirect involvement of proteoglycans with respect to cooked meat juicinessRB1CC1geneplaysacrucialroleinmusculardifferentiationanditsactivationisa essential for myogenic differentiation (Watanabe et al 2005 p 1)Monoacylglycerol acyltransferase (MGAT3) gene cause the synthesis ofdiacylglycerol (DAG)using2monoacylglyceroland fattyacyl coenzymeAThisenzymaticreactionis fundamental fortheabsorptionofdietaryfat inthesmallintestineSunetal(2012p3)reportedinastudyonfiveChinesecattlebreedsthatMGAT3gene isassociatedwithgrowthtraitsCIRPBgenemaybepartofacompensatorymechanism inmuscles undergoing atrophy It preservesmuscle

36

tissuemassduringcoldshockresponsesaginganddisuse(DupontVersteegden

et al 2008) SNUPN is an imprinted gene and is expressed monoallelically

dependingonitsparentaloriginSNUPNplayimportantrolesinembryosurvival

andpostnatal growth regulation (Smithet al 2012Wanget al 2012)Ephrin

receptorsignalingwasthetopcanonicalpathwayproducebyIPAandhasalsoan

interestingbiologicalinterpretationformeatproduction(Figure6)Indeedthis

pathaway is important for the muscle tissue growth and regeneration

participating in correct positioning and formation of the neural muscular

junction(LiandJohnson2013)

Conclusions

In this study we analysed the selection signatures at genomewide level in 5

Italian cattle breeds Then we used a multibreed approach to identify the

genomicregionssharedamongcattlebreedsselectedfordairyorbeefpurpose

Thisapproachincreasedthepowertopinpointregionsofthegenomethatplay

important roles in economically relevant traits in both dairy and beef cattle

Moreover network and pathways analyses were used to display a detailed

functional characterization of genes mapping in the regions detected under

recentpositiveselection

Particularly in dairy cattle genes under directional selection are related to

feeding adaptation (increasing level of starch in the diet) mammary gland

metabolismandresistencetomastitisThebiologicalfeaturesunderselectionin

beef cattle breeds are involved in animal growth meat texture and juiceness

These results have a logical biological explanation However the frequent

approach of interpreting selection signatures on the basis of the function of

geneslocatednearbythepeaksignalofthestatisticusedispresentlychallenged

Novelinformationinhumanssuggeststhatmostselectedvariationisnotlocated

within genes and coding regions but in regulatory sites identified by the

ENCODEproject(Enardetal2014)Thesemaycontroltheexpressionofentire

genomicregionsorofgeneslocatedatarelavantdistancefromtheselectedsite

makingbiologicalinterpretationmorecomplex

37

InanycasefurtherstudiesusingdenserSNPchipsorwholegenomesequencing

producinginformationnotsubjectedtoascertainmentbias(Qanbarietal2014)

may increase the resolution of our analysis and togetherwith new knowledge

arisingonthecontrolofgeneexpressionvalidateorcorrectourresults

Acknowledgements

ANAFI (Italian Association of Holstein Friesian Breeders) ANAPRI (Italian

Association Simmental Breeders) ANABORAPI (National Association of

PiedmonteseBreeders)ANABIC(AssociationofItalianBeefCattleBreeders)and

ANARB (Italian Brown Cattle Breeders Association) are acknowledged for

providing the biological samples and the necessary information for this work

The Authors are also grateful to SelMol project (MiPAAF Ministero delle

PoliticheAgricoleAlimentarieFortestali)andbytheProZooproject(Regione

LombardiandashDGAgricoltura Fondazione Cariplo Fondazione Banca Popolare di

Lodi)forfunding

Thefundershadnoroleinstudydesigndatacollectionandanalysisdecisionto

publishorpreparationofthemanuscript

References

Acosta TJ Berisha B Ozawa T Sato K Schams D Miyamoto A 1999

Evidence for a Local EndothelinAngiotensinAtrial Natriuretic Peptide

SysteminBovineMature Follicles In Vitro Effects on SteroidHormones

and Prostaglandin Secretion Biol Reprod 61 1419ndash1425

doi101095biolreprod6161419

AjmoneMarsan P Garcia JF Lenstra JA 2010 On the origin of cattle how

aurochs became cattle and colonized theworld Evol Anthropol Issues

NewsRev19148ndash157

38

BonnefontCToufeerMCaubetCFoulonETascaCAurelMRBergonier

D Boullier S RobertGranieacute C Foucras G 2011 Transcriptomic

analysisofmilksomaticcells inmastitisresistantandsusceptiblesheep

upon challenge with Staphylococcus epidermidis and Staphylococcus

aureusBMCGenomics12208

BurantCFTakedaJBrotLarocheEBellGIDavidsonNO1992Fructose

transporter inhumanspermatozoaandsmall intestine isGLUT5 JBiol

Chem26714523ndash14526

DubostAMicolDPicardBLethiasCAnduezaDBauchartDListratA

2013Structuralandbiochemicalcharacteristicsofbovineintramuscular

connective tissue and beef quality Meat Sci 95 555ndash561

doi101016jmeatsci201305040

DupontVersteegden EE Nagarajan R Beggs ML Bearden ED Simpson

PMPetersonCA2008IdentificationofcoldshockproteinRBM3asa

possible regulator of skeletal muscle size through expression profiling

AJP Regul Integr Comp Physiol 295 R1263ndashR1273

doi101152ajpregu904552008

Enard D Messer PW Petrov DA 2014 Genomewide signals of positive

selection in human evolution Genome Res gr164822113

doi101101gr164822113

EzuraYKajitaMIshidaRYoshidaSYoshidaHSuzukiTHosoiTInoue

SShirakiMOrimoHEmiM2004Associationofmultiplenucleotide

variationsinthepituitaryglutaminylcyclasegene(QPCT)withlowradial

BMDinadultwomenJBoneMinRes191296ndash301

Fan H Wu Y Qi X Zhang J Li J Gao X Zhang L Li J Gao H 2014

Genomewide detection of selective signatures in Simmental cattle J

ApplGenet1ndash9doi101007s1335301402006

HayesBJChamberlainAJMaceachernSSavinKMcPartlanHMacLeodI

SethuramanLGoddardME2009Agenomemapofdivergentartificial

selectionbetweenBostaurusdairycattleandBostaurusbeefcattleAnim

Genet40176ndash184doi101111j13652052200801815x

Hayes BJ Lien S Nilsen H Olsen HG Berg P Maceachern S Potter S

Meuwissen THE 2008 The origin of selection signatures on bovine

39

chromosome 6 Anim Genet 39 105ndash111 doi101111j1365

2052200701683x

IqbalSZebeliQMazzolariABertoniGDunnSMYangWZAmetajBN

2009 Feeding barley grain steeped in lactic acid modulates rumen

fermentation patterns and increases milk fat content in dairy cows J

DairySci926023ndash6032doi103168jds20092380

KemperKESaxtonSJBolormaaSHayesBJGoddardME2014Selection

for complex traits leaves littleornoclassic signaturesof selectionBMC

Genomics15246doi1011861471216415246

KimuraM1984Theneutraltheoryofmolecularevolution

KuhlaBNuumlrnbergGAlbrechtDGoumlrsSHammonHMMetgesCC2011InvolvementofSkeletalMuscleProteinGlycogenandFatMetabolismin

the Adaptation on Early Lactation of Dairy Cows J Proteome Res 10

4252ndash4262doi101021pr200425h

Lenstra JAGroeneveld LF EdingHKantanen JWilliams JL Taberlet P

NicolazziELSolknerJSimianerHCianiEGarciaJFBrufordMW

AjmoneMarsan P Weigend S 2012 Molecular tools and analytical

approaches for the characterization of farm animal genetic diversity

AnimGenet43483ndash502

Li J Johnson SE 2013 EphrinA5 promotes bovine muscle progenitor cell

migration before mitotic activation J Anim Sci 91 1086ndash1093

doi102527jas20125728

Ma YL Zhang Q Ding XD 2012 [Detecting selection signatures on X

chromosome in pig through high density SNPs] Yi Chuan Hered

ZhongguoYiChuanXueHuiBianJi341251ndash1260

Madani S Hichami A CharkaouiMalki M Khan NA 2004 Diacylglycerols

Containing Omega 3 and Omega 6 Fatty Acids Bind to RasGRP and

Modulate MAP Kinase Activation J Biol Chem 279 1176ndash1183

doi101074jbcM306252200

Maningat PD Sen P Rijnkels M Sunehag AL Hadsell DL Bray M

Haymond MW 2008 Gene expression in the human mammary

epitheliumduring lactation themilk fat globule transcriptome Physiol

Genomics3712ndash22doi101152physiolgenomics903412008

40

Mehlen P Puisieux A 2006Metastasis a question of life or deathNat Rev

Cancer6449ndash458doi101038nrc1886

NarayananUOspinaJKFreyMRHebertMDMateraAG2002SMNthe

Spinal Muscular Atrophy Protein Forms a PreImport Snrnp Complex

withSnurportin1andImportin HumMolGenet111785ndash1795

Nguyen N Yi JS Park H Lee JS Ko YG 2014 Mitsugumin 53 (MG53)

LigaseUbiquitinatesFocalAdhesionKinaseduringSkeletalMyogenesisJ

BiolChem2893209ndash3216doi101074jbcM113525154

NielsenRHellmannIHubiszMBustamanteCClarkAG2007Recentand

ongoing selection in the human genome Nat Rev Genet 8 857ndash868

doi101038nrg2187

NishimuraTHattoriATakahashiK1996Relationshipbetweendegradation

of proteoglycans andweakening of the intramuscular connective tissue

duringpostmortemageingofbeefMeatSci42251ndash260

Nussdorfer GG Rossi GPMalendowicz LKMazzocchi G 1999Autocrine

ParacrineEndothelinSysteminthePhysiologyandPathologyofSteroid

SecretingTissuesPharmacolRev51403ndash438

OrsquoBrienAMPUtsunomiyaYTMeacuteszaacuterosGBickhartDM LiuGE Tassell

CPV Sonstegard TS Silva MVD Garcia JF Soumllkner J 2014

Assessing signatures of selection through variation in linkage

disequilibrium between taurine and indicine cattle Genet Sel Evol 46

19doi101186129796864619

Pan D Zhang S Jiang J Jiang L Zhang Q Liu J 2013 GenomeWide

DetectionofSelectiveSignatureinChineseHolsteinPLoSONE8e60440

doi101371journalpone0060440

PuglisiRCambuliCCapoferriRGianninoLLukajADuchiRLazzariG

Galli C Feligini M Galli A Bongioni G 2013 Differential gene

expressionincumulusoocytecomplexescollectedbyovumpickupfrom

repeat breeder and normally fertile Holstein Friesian heifers Anim

ReprodSci14126ndash33doi101016janireprosci201307003

Purcell S Neale B ToddBrown K Thomas L Ferreira MAR Bender D

MallerJSklarPdeBakkerPIWDalyMJShamPC2007PLINKa

41

tool set for wholegenome association and populationbased linkageanalysesAmJHumGenet81559ndash575doi101086519795

Qanbari S Gianola D Hayes B Schenkel F Miller S Moore S Thaller GSimianer H 2011 Application of site and haplotypefrequency basedapproachesfordetectingselectionsignaturesincattleBMCGenomics12318doi1011861471216412318

Qanbari S Pausch H Jansen S Somel M Strom TM Fries R Nielsen RSimianer H 2014 Classic Selective Sweeps Revealed by MassiveSequencinginCattlePLoSGenet10doi101371journalpgen1004148

Qanbari S Pimentel ECG Tetens J Thaller G Lichtner P Sharifi ARSimianerH2010AgenomewidescanforsignaturesofrecentselectioninHolsteincattleAnimGenetdoi101111j13652052200902016x

Sabeti PC Reich DE Higgins JM Levine HZ Richter DJ Schaffner SFGabriel SB Platko JV Patterson NJ McDonald GJ Ackerman HCCampbellSJAltshulerDCooperRKwiatkowskiDWardRLanderES2002DetectingrecentpositiveselectioninthehumangenomefromhaplotypestructureNature419832ndash7

ScheetPStephensM2006Afastandflexiblestatisticalmodelforlargescalepopulation genotype data applications to inferring missing genotypesandhaplotypicphaseAmJHumGenet78629ndash44

Shahzad K Loor JJ 2012 Application of topdown and bottomup systemsapproaches in ruminantphysiologyandmetabolismCurrGenomics13379

SmithLSuzuki JGoffAFilionFTherrien JMurphyBKohanGhadrHLefebvreRBrisvilleABuczinskiSFecteauGPerecinFMeirellesF2012DevelopmentalandEpigeneticAnomaliesinClonedCattleReprodDomestAnim47107ndash114doi101111j14390531201202063x

Sun G Cheng SYS Chen M Lim CJ Pallen CJ 2012 Protein TyrosinePhosphatase Phosphotyrosyl789 Binds BCAR3 To Position Cas forActivationatIntegrinMediatedFocalAdhesionsMolCellBiol323776ndash3789doi101128MCB0021412

Sun J Zhang C Lan X Lei C ChenH 2012 Exploring polymorphisms andassociationsofthebovineMOGAT3genewithgrowthtraitsGenomeNatl

42

Res Counc Can Geacutenome Cons Natl Rech Can 55 56ndash62doi101139g11077

TaberletPCoissacEPansu J PompanonF 2011Conservationgeneticsofcattle sheep and goats C R Biol On the trail of domesticationsmigrationsandinvasionsinagricultureDominiqueJobGeorgesPelletierJeanClaudePernollet334247ndash254doi101016jcrvi201012007

Utsunomiya YT Perez OrsquoBrien AM Sonstegard TS Van Tassell CP doCarmo AS Meszaros G Solkner J Garcia JF 2013 Detecting lociunder recent positive selection in dairy and beef cattle by combiningdifferentgenomewidescanmethodsPLoSOne8e64280

Voight BF Kudaravalli S Wen X Pritchard JK 2006 A Map of RecentPositive Selection in the Human Genome PLoS Biol 4 e72doi101371journalpbio0040072

WangHKadlecekTAAuYeungBBGoodfellowHESHsuLYFreedmanTSWeissA2010ZAP70AnEssentialKinaseinTcellSignalingColdSpringHarbPerspectBiol2doi101101cshperspecta002279

WangMZhangXKangLJiangCJiangY2012MolecularcharacterizationofporcineNECD SNRPNandUBE3Agenesand imprinting status in theskeletal muscle of neonate pigs Mol Biol Rep 39 9415ndash9422doi101007s1103301218066

WatanabeRChanoTInoueHIsonoTKoiwaiOOkabeH2005Rb1cc1iscritical for myoblast differentiation through Rb1 regulation VirchowsArchIntJPathol447643ndash648doi101007s0042800411831

WickramasingheNSManavalanTTDoughertySMRiggsKALiYKlingeCM 2009 Estradiol downregulates miR21 expression and increasesmiR21targetgeneexpressioninMCF7breastcancercellsNucleicAcidsRes372584ndash2595doi101093nargkp117

WildeCJAddeyCVLiPFernigDG1997Programmedcelldeathinbovinemammary tissue during lactation and involution Exp Physiol 82 943ndash953

YasudaSStevensRLTeradaTTakedaMHashimotoTFukaeJHoritaTKataoka H Atsumi T Koike T 2007 Defective Expression of Ras

43

Guanyl NucleotideReleasing Protein 1 in a Subset of Patients with

SystemicLupusErythematosusJImmunol1794890ndash4900

ZhangCBaileyDKAwadTLiuGXingGCaoMValmeekamVRetiefJ

Matsuzaki H Taub M Seielstad M Kennedy GC 2006 A whole

genome longrange haplotype (WGLRH) test for detecting imprints of

positiveselectioninhumanpopulationsBioinformatics222122ndash8

ZhaoFQOkineEKCheesemanCIShiraziBeecheySPKennellyJJ1998

Glucose transporter gene expression in lactating bovine gastrointestinal

tractJAnimSci762921ndash2929

44

Supplementaryfiles

YoucanfindChapterʹsupplementaryfilesat

- DocTa(DoctoralThesisArchive)(httptesionlineunicattit)

- GoogleDrive(httptinyurlcomPhDMMChapter02)orscantheQRcode

SupplementaryfileS1ndashSignificantcorehaplotypesandgenessharedin

dairyandbeef

Significantcorehaplotypes(pvaluele005haplotypefrequencyge025)shared

indairyandbeefcattlebreedsandrelativegenesintersectingwiththemThefile

isdividedinto4tablesThefirst2sheets(sheet1and2)shownthesignificant

corehaplotypesfordairyandbeefThelast2sheets(sheet3and4)showngenes

intersectingwiththesignificantcorehaplotypesfordairyandbeef

SupplementaryfileS2ndashNetworksrankingindairybreeds

Networksrankingindairybreedswithrelativemoleculessymbolslistscore

valuenumberofmoleculesbelongingtothedataset(focusmolecules)andtop

functionforeachnetwork

SupplementaryfileS3ndashNetworksrankinginbeefbreeds

Networksrankinginbeefbreedswithrelativemoleculessymbolslistscore

valuenumberofmoleculesbelongingtothedataset(focusmolecules)andtop

functionforeachnetwork

45

SupplementaryfileS4ndashCanonicalpathwaysrankingindairyorbeefcattle

breeds

Canonicalpathwaysrankingindairy(sheet1)orbeef(sheet2)cattlebreedswithrelativemoleculessymbolslistratioandndashlog10ofthepvaluesforeachcanonicalpathway

46

CHAPTER3Searchingnewsignalsforproductivetraitsthroughgenebasedassociationanalysisin

threeItaliancattlebreedsSCapomaccio1MMilanesi1etal1IstitutodiZootecnicaUCSCviaEmiliaParmense8429122PiacenzaItalyTheseauthorscontributedequallytothiswork

____________________SubmittedtoAnimalGenetics

Abstract

Genome wide association studies (GWAS) have been widely applied to

disentanglethegeneticbasisofcomplextraits IncattlebreedsclassicalGWAS

approach with medium density markers panels are far to be conclusive

especiallyforcomplextraitsThisisduetotheintrinsiclimitationsofGWASand

to the choices that are made to step from the associationrsquos signals to the

functionalunits

Here we applied a geneHbased association strategy to prioritize genotypeH

phenotype associations found with classical approaches in three Italian dairy

cattlebreedswithdifferentsamplesizes(ItalianHolsteinn=2058ItalianBrown

n=745ItalianSimmentaln=477)formilkproductionandqualitytraits

WhileclassicalsinglemarkerregressionfailedtorevealgenomeHwidesignificant

genotypeHphenotype association except for Italian Holstein the gene based

approach found specific genes in each breed that are associated with milk

physiologyandmammaryglanddevelopment

Since no established method is yet implemented to step from variation to

functional units our strategy may contribute to reveal new genes that play

significant roles in complex traits like thosehere investigated condensing low

signalsofassociationinageneHcentricfashionapproach

Background

Modern cattle breeds have nomore than 200 years of history (Taberlet et al

2008) a time spanwhere selection reproductive isolationand the consequent

geneticdriftshapedallelefrequenciesandlinkagedisequilibriumthroughoutthe

genome This is particularly true for industrial dairy breeds where the joint

application of reproductive technologies and modern genetic evaluation

methodshave imposedahigh selectionpressureona limitednumberof traits

(Taberletetal2008)involvedinmilkquantityandquality

Indeed the milk production industry has been historically ready to acquire

technicalimprovementsfromdifferentfieldsstatisticsbiologyandengineering

50

ForexampleBLUPbasedstrategiesandhighHdensitygenotyping throughSNPHchipsarenowroutinelyincludedinbullevaluationshavingsignificantimpactonselectionefficiencyandpopulationstructureThese techniques have dramatically improved milk yield and compositionwithout knowing the biological basis of the traits under selection To datedespite many efforts and many QTL mapped only a few genes have beeneffectivelyassociatedwithmilkyieldandcomposition(Grisartetal2002Blottetal2003CohenHZinderetal2005)In thiscontextGenomeWideAssociationStudies(GWAS) is themostcommonmethodfordissectingthebiologythatunderliescomplextraits(McCarthyetal2008)TheunderlyingideaofGWASissimplefindmarkerHtraitassociationsexploitingthe linkage disequilibrium (LD) that exists between the causative mutation ndashwhichweignorendashandoneormoreanalyzedmarkerswhichwewellknowInthiswayonecanpinpointgenomicregionscarryingcausalvariantsforanytrait

Although a number of GWAS have been conducted in cattle productive andmorphological traits (some examples (Jiang et al 2010 Cole et al 2011Meredith et al 2012)) and found several genomic regions associated to thesetraits none focused their attention directly on genes involved inmilk biologyMoreoveroneof themajordrawbacksofGWASisrelatedtothescarceresultsportability across populations due to a variety of confounding factorspopulationstructuredifferentialLD levelsbreedspecific selection targetsandeven SNPs ascertainment bias (Clark et al 2005) For instance significantassociationsonfatpercentagewereidentifiedinmanyregionsoftheBostaurusgenome in different breeds Except for genes of major effect there is littleagreement on which regions and H more importantly H genes are actuallyinvolvedinthecontrolofthetraitinvestigatedThisissomehownotsurprisingone favorable allele may segregate in one breed and be fixated in a differentbreed the same allele segregates in both breeds but allelesmay differ or thesegregates in both breeds but the genetic background masks the effect ofsegregation

51

AnotherlimitationinGWASisthestringentsignificancethresholdoftenapplied

tocorrectformultipletestingAsaresulta largeproportionofgeneswithlow

effectaredisregardedwithconsequentoverestimationof thehigheffectgenes

variance(Xu2003)Thisbiasisparticularlyworryingwheninvestigatingtraits

controlledbya largenumberofgeneswithsmalleffectand inhighLDasmilk

yield(GIANOLAETAL2013)

To overcome this kind of limitation different strategies have been invented

usingprior knowledge Interesting ldquocandidatepathwayrdquo approacheshavebeen

conducted(Ravenetal2013)testingforassociationbetweenaphenotypeand

genes inpathwaysthatare likelytobe involvedinthecontrolofthetraitThis

approach reduces thenumberofmarkers tobe tested increases the statistical

poweroftheexperimentrestrictingtheanalysistoexistingknowledge

Other methods based on statistical manipulations or particular experimental

designs have been developed to prioritize or validate genotypeHphenotype

associations For example geneHbased association strategies while restricting

GWA study only to genes and neighboring genomic regionsmay identify new

candidategenesdeservingpostHGWASanalysis(Akulaetal2011Cantoretal

2010 Liu et al 2010) Moreover it is true that classical GWAS downstream

bioinformatics analyses concentrate their efforts in findingmeaningful results

usinggenesasfinaltarget

The aim of ourworkwas to find new candidate genes and novel information

associated to complex traits as milk production (yiled) and quality (fat and

protein percentage) using a geneHcentric genome wide association in three

differentdairybreedsItalianHolsteinItalianBrownandItalianSimmental

Methods

BiologicalSamples

Bulls from the three most represented dairy breeds in Italy (Italian Holstein

ItalianBrownandItalianSimmental)wereselectedfromArtificialInsemination

(AI) bulls progeny tested in Italy until 2010 and forwhich biological samples

52

wereavailableItalianHolsteinbullswerechosenaccordingtofollowingcriteria

i) having the national Selection Index namely production functionality type

(PFT)withreliabilitygt75andii)havingthelowestpossiblerelationshipwith

theothersiresinthesetThiswasachievedbymaximizingthevariabilitywithin

the group On the contrary considering the low number of available Italian

BrownandItalianSimmentalbullstheywereallincludedintheanalysesAfter

thisprocedureatotalof2139ItalianHolstein775ItalianBrownand486Italian

Simmental samples (including quality control replicates)were genotypedwith

theBovineSNP50BeadChipv1(IlluminaSanDiegoCA)

Traitsconsideredintheanalysis

Italian Holstein (ANAFI) Italian Brown (ANARB) and Italian Simmental

(ANAPRI)breedersassociationsprovidedderegressedproofs(DRP)ordaughter

yielddeviations (DYD)hereafter referredasphenotype forall traitsanalyzed

Traitsanalyzedweremilkyieldfatpercentageandproteinpercentage

Genotypingandqualitycontrol

Quality controls (QC) were run independently for each breed using the same

pipelineandthresholds IndetailQCexcluded i)thereplicatesamplewiththe

highernumberofmissinggenotypesii)sampleswithunexpectedlyhigh(ge100)

mendelianerrors(forfatherHsonpairs)iii)sampleswithmorethan5missing

genotypes iv) SNPs with ge 25 missing data v) SNPs with minor allele

frequency(MAF)le5vi)SNPswithmendelianerrorsge25andvii)SNPsout

ofHardyHWeinbergequilibriumtestedbyFisherexacttest(ple0001Bonferroni

corrected)

Analysisofpopulationstructure

MultiDimensionScaling(MDS)wasperformedusingPLINK107(Purcelletal

2007)andresultsplottedwithRscriptsThisanalysisshowninsupplementary

material (FigureS1) exploredpopulationsubstructureandverified thegenetic

homogeneityofthedatasetpriortofurtheranalyses

53

PairwiseLDwascalculatedusingPLINK107(Purcelletal2007)LDdecayand

LD structure on specific regions were calculated with in house R scripts and

snpplotterRpackage(LunaandNicodemus2007)

GWAS

GenomeHwide association analysis was made with the GenABEL package in R

(Team 2014) using GRAMMARHCG (Genome wide Association using Mixed

Model and Regression H Genomic Control) approach (Amin et al 2007

Aulchenkoetal2007)

Basingonpriorknowledge(Minozzietal2013) in ItalianHolsteintheDGAT1

regionwas also treated as fixed effect inGenABEL analysis of the three traits

ResultsusingthismodelarereferredtoasldquoHolsteinDGATrdquo

Manhattan andquantileHquantileplots for each traitwereproduced to allowa

thoroughevaluationoftheGWASresultsAllcalculatedpHvaluesfromtheGWAS

analysiswere retained for the downstream prioritizationwith the geneHbased

association

Workingongenomecoordinates

Coordinatesof themarkerswere liftedover fromBTA40(Elsiketal2009)to

UMD31assembly(Ziminetal2009)usingSnpChimpv1(Nicolazzietal2014)

andgenecoordinateswereretrievedusingBioMartEnsembl

QTLcoordinatesweredownloadedfromtheAnimalQTLdb(Huetal2013) in

bedformat

Operations on genomic intervalswereperformedusing thebedtools suite ver

2162(QuinlanandHall2010)

GeneBasedAssociationanalysis

Toreducetheantagonismbetweensignificancethresholdandfalsepositiverate

of a classical GWA study (Gianola et al 2013) and to facilitate selection

signature comparison across the three breeds investigated we used a geneH

54

centricstatisticthatcollapsesinasinglevalueallthepHvaluesfromSNPsrelated

toagenecorrectingforLDinformation

Our approachderivesdirectly from the softwareVEGAS (VersatileGeneHBased

TestforGenomeHwideAssociationStudies)(Liuetal2010)adaptedtopursue

meaningfulresultsinanyspeciesbeyondhumans

Briefly the geneHwise test statistic is the sum of the n upper tail chiHsquared

valuesofaSNPsubset withonedegreeof freedom(wheren is thenumberof

SNPmappinginthegivengene)WithperfectlinkageequilibriumthegeneHwise

pHvaluewouldbetheoneHsidepHvalueofachiHsquaredistributionwithndegrees

of freedom Otherwise more likely the distribution of the pHvalues has to be

evaluated(withsimulations)andweighted(withLDvalues)(Liuetal2010)

The VEGAS speciesHfree procedure was implemented in a UNIX environment

usingBashandRlanguagesusingEnsemblannotationasinputfilesPLINKdata

typeandpHvaluesforeachSNPfromthepreviouslycitedGRAMMARprocedure

WemappedSNPsonBostaurusgenes(UMD31)andtheirboundaries(100000

bpupHanddownHstream)toaccountforregulatoryregionsOnlygeneswithat

least 2 SNPs mapped were retained Significant entries from the geneHbased

associationweremappedinthecattleQTLdatabasewithbedtools

ResultsandDiscussion

In this study a gene based association strategy was applied to identify new

genomicregionsboostingGWASanalysisresultsformilkproductionandquality

fromthreeItaliandairybreeds

Complextraitslikethosehereevaluatedareaffectedbyafewmajorgeneswith

large effects andmany otherswithmoderate to low effects the latter are not

easily identifiedby genomewide scans inmodern cattle breedsduemainly to

samplesizelimitationwhilesignalfromtheformersarelostingenomescansH

withsomeexceptionsHduerapidfixations

InthisscenariowithmediumdensitySNPpanelsliketheoneusedinthisstudy

associationbasedonsingleSNPregressionorotherldquoclassicalrdquomethodscouldbe

inconclusiveintermsofresults

55

AftertheQCprocess7452058and477samplesand3575938535and38574SNPs were retained for Italian Brown Italian Holstein and Italian Simmentalbulls respectively (Table S1) HolsteinDGAT analysis retained 38534markersoneSNPlessthantheotherHolsteinpanelDescriptive statistics on genotypes for each breed (histogram and barplot forheterozygosis and call rate per individual Figure S7 and multi dimensionalscalingFigureS1)revealedabsenceofconfoundingeffectsWiththesedatasetsatthenominalgenomeHwidepHvaluesignificancethresholdof 005 (ple125 x 10H6 after Bonferroni correction (Burton et al 2007Dudbridge and Gusnanto 2008)) significant associations were found only inItalianHolsteinbetweenmilkyieldandfatpercentageand26SNPsmappingintheproximalregionofBTA14nearbyDGAT1(Table1)

56

Table1PHvaluesoftheSNPsovercomingthegenomewidesignificancethresholdintheHolsteinbreedformilkyieldandfatpercentage(notaccountingforDGAT1effect)SNPsaresortedbypositionsintheUMD31assembly

SNPID BTA Position(Bp) POvalue(Fat) POValue(Milk)

Hapmap30383OBTCO005848 14 1489496 112EH12 361EH12

ARSOBFGLONGSO57820 14 1651311 247EH46 161EH16

ARSOBFGLONGSO34135 14 1675278 857EH17

ARSOBFGLONGSO94706 14 1696470 665EH16

ARSOBFGLONGSO4939 14 1801116 534EH49 106EH20

Hapmap52798Oss46526455 14 1923292 915EH21

UAOIFASAO6878 14 2002873 382EH24

ARSOBFGLONGSO107379 14 2054457 118EH33 664EH15

ARSOBFGLONGSO18365 14 2117455 250EH16

Hapmap30922OBTCO002021 14 2138926 439EH13

Hapmap25384OBTCO001997 14 2217163 390EH10 697EH09

Hapmap24715OBTCO001973 14 2239085 108EH09 369EH09

BTAO35941OnoOrs 14 2276443 402EH13

Hapmap30374OBTCO002159 14 2468020 528EH12

Hapmap30086OBTCO002066 14 2524432 798EH11

ARSOBFGLONGSO74378 14 3640788 161EH08

ARSOBFGLOBACO1511 14 3765019 375EH09

UAOIFASAO9288 14 3956956 811EH09

ARSOBFGLONGSO100480 14 4364952 133EH09

UAOIFASAO5306 14 4468478 458EH08

ThisislikelyduetotheeffectoftheK232ADGAT1mutationthatissegregatingin

thisbreed(Fontanesietal2014)NootherSNPwassignificantlyassociatedto

anyothertraitinthe3breedsinvestigated

SomeSNPswereunderasuggestivesignificancethresholdofple5x10H5Details

on these markers are in supplementary material (Table S3) but they are not

discussedherePlotsdescribingatglancetheassociationresultsrevealingalow

signaltonoiseratioareavailableintheonlinesupplementarymaterial(Figure

S2S3S4andS5respectively for ItalianBrown ItalianHolsteinHolsteinDGAT

andItalianSimmental)

57

Asperthegenecentricanalysiswemappedatotalof192371989919894and

19844 genes on 24616 available on Ensembl (release version 73) in Italian

BrownItalianHolsteinHolsteinDGATandItalianSimmentalrespectivelyGenes

with a geneHwisepHvalue at least 10H5were considered significantly associated

withthestudiedtraitThesignificanceappearedtobeadequatetothehighrate

ofredundancyof theannotationfeaturesandto figureoutthetopassociations

with a reasonable level of confidence Results on these associations per breed

andpertraitaresummarizedinTable2Table3andinsupplementarymaterials

(TableS2andTableS4)ThemajorityofsignificantgenesfoundwiththegeneH

centricanalysismapped inpreviouslycharacterizedQTL thatareassociated to

milkproductionorqualityThesearesummarizedintheTableS5

Table2Numberofsignificantlyassociatedgenesforeachbreedineachtraitinthegenebasedassociationtest

Fat Milkyield Protein

Brown 3 1

Holstein 92 80 1

HolsteinDGAT 1 4 1

Simmental 1 13

58

Tableamp3FeaturessignificantlyassociatedperbreedtraitHolsteinanalysisnotaccountingforDGAT1isshow

ninTableS3

Breedamp

Traitamp

EnsemblampIDamp

GeneWiseampPValueamp

SNPsInGeneamp

BestSNPamp

BestSNP_PValueamp

BTAamp

Startamp(bp)amp

Endamp(bp)amp

GeneampNam

eamp

BROWNamp

FAT

Nosignificantassociationfound

MILK

ENSBTAG00000019237

500EG05

4Hapmap53770Gss46526325

204EG04

23

47333403

47348212

TXNDC5

ENSBTAG00000043918

500EG05

4Hapmap53770Gss46526325

204EG04

23

47337650

47337787

5S_rRNA

ENSBTAG00000046839

500EG05

4Hapmap53770Gss46526325

204EG04

23

47328131

47352621

UncharacterizedProtein

PROTEIN

ENSBTAG00000001260

700EG05

3ARSGBFGLGNGSG116222

441EG03

18

52120387

52124418

PINLYP

amp

HOLSTEINDGATamp

FAT

ENSBTAG00000000369

100EG05

3Hapmap53294Grs29016908

333EG05

594673755

94749093

EPS8

MILK

ENSBTAG00000026896

130EG05

5ARSGBFGLGNGSG104949

385EG04

23

51549721

51553563

FOXF2

ENSBTAG00000005260

400EG05

5Hapmap33288GBTCG033751

120EG03

638120578

38127577

SPP1

ENSBTAG00000006579

500EG05

4ARSGBFGLGBACG2207

632EG05

15

54418559

54459500

P4HA3

ENSBTAG00000007013

700EG05

6UAGIFASAG7665

396EG05

19

5221507

5283308

TOM1L1

PROTEIN

ENSBTAG00000006638

600EG05

3ARSGBFGLGNGSG104774

176EG04

18

56523439

56530499

BCL2L12

amp

SIMMENTHALamp

FAT

ENSBTAG00000037649

900EG05

6Hapmap30001GBTAG142716

529EG05

4120427915

120494453

VIPR2

MILK

ENSBTAG00000017538

100EG05

6ARSGBFGLGBACG13093

587EG06

12

85630428

85630912

Pseudogene

ENSBTAG00000000856

200EG05

2ARSGBFGLGNGSG4939

413EG06

14

1766767

1769754

FBXL6

ENSBTAG00000000857

200EG05

2ARSGBFGLGNGSG4939

413EG06

14

1763994

1766621

GPR172B

ENSBTAG00000026356

200EG05

2ARSGBFGLGNGSG4939

413EG06

14

1795351

1804562

DGAT1

ENSBTAG00000035158

200EG05

2ARSGBFGLGNGSG4939

413EG06

14

1770660

1772329

TMEM

249

ENSBTAG00000046208

200EG05

2ARSGBFGLGNGSG4939

413EG06

14

1782901

1788087

SCRT1

ENSBTAG00000009811

400EG05

2ARSGBFGLGNGSG4939

413EG06

14

1825982

1844587

UncharacterizedProtein

ENSBTAG00000009816

400EG05

2ARSGBFGLGNGSG4939

413EG06

14

1838135

1840081

UncharacterizedProtein

ENSBTAG00000014458

400EG05

2ARSGBFGLGNGSG4939

413EG06

14

1844664

1894424

MROH1

ENSBTAG00000020751

400EG05

2ARSGBFGLGNGSG4939

413EG06

14

1806081

1825793

HSF1

ENSBTAG00000044406

400EG05

2ARSGBFGLGNGSG4939

413EG06

14

1883849

1883971

SCARNA15

ENSBTAG00000046889

400EG05

5ARSGBFGLGBACG13093

587EG06

12

85610279

85610762

UncharacterizedProtein

ENSBTAG00000020117

800EG05

3BTAG78494GnoGrs

188EG04

721189879

21196263

ZBTB7A

PROTEIN

Nosignificantassociationfound

amp

59

1 ItalianHolstein

The$ large$ number$ of$ significant$ signals$ was$ observed$ in$ the$ Italian$ Holstein$

especially$ for$ milk$ yield$ and$ fat$ percentage$ As$ shown$ in$ Table$ 2$ the$ strong$

effect$ of$ DGAT1$ in$ Italian$ Holsteins$ observed$ with$ individual$ SNPs$ was$

confirmed$by$the$geneBcentric$approach$In$this$region$almost$the$totality$of$the$

geneBwise$pBvalues$under$the$chosen$threshold$appeared$to$be$influenced$by$this$

major$ gene$ and$ by$ the$ high$ level$ of$ linkage$ disequilibrium$ that$ characterizes$

cosmopolitan$bovine$breeds$(see$Figure$S6)$Within$this$pool$of$genes$it$is$worth$

to$mention$PLEC$ (plectin)$ a$ cell$ surface$ receptor$ gene$ in$ the$RNASE5$ pathway$

(Angiogenin$ encoded$ by$ angiogenin$ ribonuclease$ RNase$ A$ family$ 5)$ Proteins$

encoded$ by$ these$ genes$ that$ are$ present$ in$ milk$ are$ involved$ in$ protein$

synthesis$ and$ act$ as$ growth$ factors$ in$ vitro$Moreover$RNASE5$ seems$ to$ have$

other$ functions$ such$ as$ regulation$ of$ stress$ response$ and$ apoptosis$ and$ even$

angiogenesis$ through$ which$ it$ may$ also$ contribute$ to$ the$ regulate$ milk$

production$(Raven$et$al$2013)$

The$ exclusion$ of$DGAT1$ effect$ (HolsteinDGAT$ results)$ highlighted$ that$ all$ the$

significant$ effects$ on$ fat$ percentage$ found$ for$ genes$ located$ within$

approximately$25$Mb$up$or$downstream$from$the$excluded$SNP$(ARS5BFGL5NGS5

4939)$were$ redundant$ signals$ of$DGAT1$ (Table3$ and$Table$ S2$ and$ S3)$ This$ is$

also$observed$for$milk$yield$due$to$the$genetic$correlations$between$fat$and$milk$$

Also$ PLEC$ was$ not$ significant$ in$ our$ analyses$ when$ taking$ into$ account$ the$

DGAT1$effect$contradicting$previous$findings$in$Holstein$that$report$association$

of$PLEC$gene$with$production$traits$either$accounting$or$not$for$the$effect$of$the$

diglyceride$ acyltransferase$ (Raven$ et$ al$ 2013)$ A$ possible$ explanation$ for$ the$

different$results$between$these$studies$is$a$difference$between$genetic$structures$

of$the$populations$analyzed$To$reduce$false$positive$signals$we$only$evaluated$

the$genes$based$on$HolsteinDGAT$results$for$the$Italian$Holstein$$

11 Milkyield

We$found$genes$associated$with$this$trait$in$Holstein$in$particular$SPP1$P4HA3=

FOXF2$and$TOM1L1$genes$

Among$ these$ the$ osteopontin$ SPP1$ (secreted$ phosphoprotein$ 1)$ is$ directly$

related$to$milk$production$(de$Mello$et$al$2012)$ It$ is$ located$ in$BTA6$nearby$

60

$

ABCG2$ and$ there$ is$ evidence$ that$ polymorphisms$ in$ this$ region$ are$ strongly$

correlated$with$high$production$ levels$of$milk$ in$various$breeds$(CohenBZinder$

et$ al$ 2005)$Osteopontin$SPP1$ is$ a$ secreted$glycophosphoprotein$ that$plays$ a$

crucial$ role$ in$ mammary$ gland$ development$ having$ an$ essential$ role$ also$ in$

mammary$gland$differentiation$and$branching$of$the$mammary$epithelial$ductal$

system$ This$ gene$has$ also$ been$ implicated$ in$many$ types$ of$ cancer$ including$

breast$cancer$ for$ the$potentiality$of$proliferation$and$alveologenesis$ (Hubbard$

et$ al$ 2013)$ It$ is$worth$mentioning$ that$ the$most$ significant$ SNPs$within$ this$

gene$was$Hapmap332885BTC5033751=with=a$pBvalue$of$00012$ largely$over$ the$

genomeBwide$significance$ threshold$This$signal$would$be$ ignored$ in$ the$single$

SNP$approach$

Another$gene$revealed$by$the$geneBbased$association$is$P4HA3$encoding$for$the$

prolyl$4Bhydroxylase$α$polypeptide$III$involved$in$the$correct$folding$of$nascent$

procollagen$ chains$ It$ is$ known$ that$ stromalBepithelial$ interactions$ are$ critical$

for$mammary$gland$development$and$defective$deposition$or$reorganization$of$

extracellular$ matrix$ can$ lead$ to$ incorrect$ mammary$ epithelial$ morphogenesis$

(Gillette$ et$ al$ 2013)$ highlighting$ the$ role$ of$ this$ gene$ in$ the$ healthy$ breast$

tissue=

FOXF2=gene$ (Forkhead$Box$F2)$ is$ a$member$of$ the$ forkhead$box$ transcription$

factors$ known$ to$ be$ involved$ in$ tissue$ development$ as$ part$ of$ the$ ldquoHedgehog$

signaling$pathwayrdquo$This$key$cascade$is$essential$for$the$correct$segmentation$of$

tissues$during$embryonic$development$but$has$been$recently$demonstrated$that$

is$active$in$adult$stem$cells$proliferation$in$mammary$tissue$(Liu$et$al$2006)$In$

particular$ FOXF2$ plays$ a$ key$ role$ in$ extracellular$ matrix$ synthesis$ and$

epithelialBmesenchymal$ interactions$ and$ could$ therefore$ affect$ the$ proper$

development$of$the$mammary$stroma$In$addition$low$levels$of$transcription$of$

this$gene$are$correlated$ to$early$onset$metastasis$ in$breast$cancer$ (Kong$et$al$

2013)$ underlining$ a$ specific$ role$ in$ the$ mammary$ gland$

Lastly$ TOM1L1$ (Target$ Of$ Myb1$ ChickenBLike$ 1)$ is$ another$ gene$ potentially$

involved$ with$ the$ milk$ physiology$ as$ is$ known$ its$ implication$ with$ the$ early$

endosome$ sorting$ coupled$ with$ EGFR= (epidermal$ growth$ factor$ receptor)$

another$ important$player$ in$ lactation$(Fowler$et$al$1995)$TOM1L1$belongs$to$

61

$

the$ VHS$ domain$ containing$ proteins$ that$ are$ involved$ in$ the$ postBGolgi$

trafficking$and$signaling$(Wang$et$al$2010)$

12 Fatpercentage

In$ Italian$ Holstein$ breed$ we$ found$ a$ significant$ association$ with$ or$ without$

DGAT1$ effect$ correction$ to$ the$EPS8$ gene$ located$on$BTA5$at$around$95$Mbp$

This$gene$encodes$for$a$substrate$of$the$already$cited$EGFR$and$is$involved$in$the$

fatty$acid$metabolism$(Fazioli$et$al$1993)$A$recent$study$on$Holstein$Friesian$

(Wang$et$al$2012)$found$strong$association$with$a$variant$located$in$the$second$

intron$of$the$epidermal$growth$factor$receptor$pathway$substrate$enforcing$the$

hypothesis$of$a$true$involvement$of$this$gene$on$fat$percentage$

13 Proteinpercentage

The$BCL2L12$gene$(BclB2Blike$protein$12)$associated$with$protein$percentage$in$

Italian$Holstein$breed$represents$another$interesting$result$The$product$of$this$

gene= is$ an$ important$ oncoprotein$ in$ two$ fundamental$ cell$ cycle$ pathways$

namely$ p53$ and$ cytoplasmic$ caspase$ signaling$ BCL2L12= resides$ in$ both$

cytoplasm$and$nucleus$inhibiting$caspases$3$and$7$and$forming$a$complex$with$

p53$thereby$ inhibiting$p53Bdirected$transcriptomic$changes$upon$DNA$damage$

(Stegh$ and$DePinho$ 2011)$ The$members$ of$ BCL2$ family$ are$ considered$ antiB

apoptotic$genes$and$it$is$obvious$that$dysfunctional$programmed$cell$death$plays$

an$ important$ role$ in$ the$ pathogenesis$ and$ in$ the$ progression$ of$ cancer$Many$

members$of$this$family$were$found$to$be$modulated$in$various$malignancies$and$

BCL2L12= in$ particular$ was$ expressed$ in$ mammary$ gland$ and$ proposed$ as$

prognostic$ cancer$ biomarker$ (Thomadaki$ et$ al$ 2007)$ In$ dairy$ cows$ the$

members$of$BCL2$family$are$involved$in$the$key$events$ in$the$mammary$tissue$

development$ and$ remodeling$ during$ lactation$ and$ involution$ (Stefanon$ et$ al$

2002$p$200)$Members$of$BCL2$family$regulate$the$turnover$of$cell$population$

in$the$mammary$gland$during$lactation$and$its$overexpression$was$found$to$be$

linearly$related$to$milk$production$in$Holstein$cow$(Colitti$et$al$2004)$

$

62

$

2 ItalianSimmental

In$ the$ Italian$ Simmental$ breed$ we$ found$ one$ association$ on$ BTA4$ for$ fat$

percentage$(VIPR2)$and$two$associations$ for$milk$yield$one$ in$BTA7$(ZBTB7A)$

and$ the$ other$ in$ BTA14$ related$ to$ the$ DGAT1$ No$ association$ was$ found$ for$

protein$percentage$

21 Fatpercentage

Among$the$genes$found$significantly$associated$with$production$traits$in$Italian$

Simmental$VIPR2=a$ gene$ encoding$ a$ receptor$ for$ a$ small$ vasoactive$ intestinal$

neuropeptide$ was$ the$ single$ signal$ in$ BTA4$ for$ fat$ percentage$ Vasoactive$

intestinal$ peptide$ (VIP)$ is$ synthesized$ in$ the$pituitary$ gland$ and$ among$other$

functions$ stimulates$ prolactin$ secretion$ and$ is$ involved$ in$ proliferation$

survival$and$differentiation$in$human$breast$cancer$cells$(Valdehita$et$al$2010)$$

An$interesting$novel$finding$is$the$effect$of$immunosuppression$(through$direct$

effect$of$IL56)$on$prolactin$cells$and$its$ability$to$regulate$prolactin$by$acting$on$

pituitary$ VIP$ through$ VIP$ receptors$ (Blanco$ et$ al$ 2013)$ These$ evidences$

suggest$ that$VIPR2$ could$ be$ an$ important$ candidate$ for$ further$ investigations$

through$post$GWAS$approaches$such$as$reBsequencing$or$fine$mapping$

22 Milkyield

Surprisingly$the$DGAT1$gene$was$significantly$associated$with$milk$yield$but$not$

with$fat$percentage$in$Italian$Simmental$breed$Indeed$the$most$significant$SNP$

for$the$milk$yield$trait$in$this$breed$was$the$ARS5BFGL5NGS54939$with$a$pBvalue$

of$ 413EB06$ The$ importance$ of$ this$ gene$ in$ the$ lactation$ is$ largely$ known$

(Grisart$et$al$2002)$$

To$ investigate$DGAT1$ signal$ in$ Simmental$ we$ analyzed$ the$ fine$ LD$ structure$

genomeBwide$ and$ in$ BTA14$ proximal$ region$ in$ all$ the$ three$ breeds$ revealing$

different$patterns$

Observed$LD$decay$on$ the$ three$breeds$genomes$revealed$similar$behavior$ for$

Italian$Holstein$and$Italian$Brown$with$r2$gt$010$$until$ around$ 250$ Kbp$ while$

Italian$Simmental$showed$lower$LD$Distances$higher$that$400$Kbp$revealed$half$

level$of$LD$in$Simmental$(Figure$S6)$

63

$

In$DGAT1$region$LD$behavior$is$very$similar$between$Italian$Holstein$and$Italian$

Brown$ (Figure$ S8)$ while$ association$ values$ are$ really$ different$ because$ ARS5

BFGL5NGS54939=is$fixed$in$Italian$Brown$and$purged$by$QC$process$Interestingly$

association$ patterns$ become$ similar$ when$ DGAT1$ effect$ is$ removed$ from$ the$

Holstein$ panel$ (HolsteinDGAT$ Figure$ S8)$ In$ Simmental$ we$ found$ a$ different$

situation$ very$ low$ LD$ level$ and$ significant$ association$ levels$ with$ ARS5BFGL5

NGS54939$with$ the$milk$ trait$This$ is$probably$due$ to$ the$ low$ frequency$of$ the$

p232K$ allele$ in$ Italian$ Simmental$ (Scotti$ et$ al$ 2010)$ and$ to$ the$ different$

selection$strategies$applied$in$these$breeds$

The$ relatively$ high$ pBvalue$ of$ this$ SNP$ is$ responsible$ for$ the$ statistical$

significance$of$many$of$the$13$genes$found$significant$in$this$breed$for$this$trait$

as$ shown$ in$ Table$ 3$ therefore$ reducing$ the$ gene$ set$ to$ three$ locations$ two$

overlapping$features$indicating$a$pseudogene$in$BTA12$and$a$zinc$finger$protein$

ZBTB7A$in$BTA7$DGAT1$role$and$function$will$be$not$discussed$here$as$already$

done$elsewhere$(Grisart$et$al$2002)$

ZBTB7A=is$part$of$zinc$finger$and$BTB$proteins$(Broad$complex$Tramtrack$and$

Bricabrac)$ transcription$ factors$ with$ emerging$ importance$ for$ their$

involvement$ in$ oncogenesis$ and$ cell$ differentiation$by$ repressing$key$ genes$of$

cell$cycle$regulation$as$retinoblastoma$(Rb)$and=p53=(ldquoJeon$et$al$B$2008$B$ProtoB

oncogene$ FBIB1$ (PokemonZBTB7A)$ Represses$ Trpdfrdquo$ nd)= ZBTB7A$ over$

expression$ is$ shown$ in$primary$ and$ recurrent$ breast$ cancer$ tissues$ leading$ to$

disease$progression$and$poor$survival$(Zu$et$al$2011)$Link$between$this$gene$

polymorphism$and$milk$production$might$be$in$the$proper$cellular$development$

of$ the$ mammary$ gland$ This$ transcription$ factor$ is$ also$ known$ as$ a$ master$

regulator$ of$ B$ and$ TBcell$ lineage$ (Lee$ and$ Maeda$ 2012)$ enforcing$ the$ link$

between$ milk$ production$ and$ immune$ protection$ as$ already$ outlined$ by$ of$

Lemay$ and$ colleagues$ with$ proteomic$ studies$ where$ the$ ldquoactivation$ of$ the$

innate$ immune$systemrdquo$resulted$as$one$of$ the$most$represented$gene$ontology$

categories$(Lemay$et$al$2009)$

$

64

$

3 ItalianBrown

In$ the$ Italian$Brown$breed$we$ found$ one$ association$ on$BTA23$ (TXNDC5)$ for$

milk$yield$and$one$in$BTA18$for$protein$percentage$(PINLYP)$The$three$features$

listed$in$Table$3$for$milk$yield$in$Italian$Brown$were$collapsed$due$to$overlap$$

31 Milkyield

Thioredoxin$ domainBcontaining$ protein$ 5$ (TXNDC5)$ belongs$ to$ thioredoxin$

family$small$ubiquitous$proteins$containing$disulphide$domain$affecting$protein$

folding$with$chaperone$activity$preventing$protein$misBfolding$within$the$lumen$

of$ the$endoplasmic$reticulum$This$multifunctional$organelle$have$a$key$role$ in$

several$ processes$ including$ lipid$ catabolism$ and$ protein$ synthesis$ (RamiacuterezB

Torres$ et$ al$ 2012)$ Because$ of$ its$ disulphide$ bonds$ thioredoxin$ facilitate$ the$

reduction$of$other$proteins$by$cysteine$thiol$disulfide$exchange$This$mechanism$

is$essential$for$detossification$

During$ lactation$ indeed$ an$ enhanced$ detoxification$ level$ is$ functional$ to$

preserve$ milk$ quality$ against$ increased$ metabolic$ activities$ Higher$ metabolic$

levels$ lead$ to$ free$ radicals$ production$ making$ necessary$ the$ maintenance$ of$

redox$ homeostasis$ that$ include$ among$ others$ the$ reduction$ and$ oxidation$ of$

peroxiredoxin$and$thioredoxin$(Lu$and$Holmgren$2014)$

32 Proteinpercentage

For$ protein$ percentage$ in$ Italian$ Brown$ there$ is$ only$ one$ associated$ gene$

PINLYP=is$encoding$a$phospholipase$A2$inhibitor$but$this$protein$product$is$not$

yet$well$studied$ $Many$types$of$phospholipase$A2$were$characterized$and$ it$ is$

known$ that$ they$ are$ mediator$ of$ inflammation$ atherosclerosis$ and$ cancer$ in$

mammals$and$that$plays$an$important$role$in$the$innate$immune$response$(Qiu$

and$Lai$2013)$It$may$be$reasonable$to$imagine$a$role$for$PINLYP$in$the$lactation$

in$light$of$previously$discussed$results$that$link$milk$to$innate$immune$response$

(Lemay$et$al$2009)$

$

Although$ limited$ by$ the$ size$ of$ the$ study$ and$ the$ number$ of$ the$ included$

markers$these$results$represent$both$a$confirmation$and$a$step$forward$towards$

the$comprehension$of$the$biological$bases$of$milk$yield$and$quality$$

65

$

New$ genes$ potentially$ involved$ with$ production$ traits$ in$ dairy$ cows$ were$

tracked$when$GWAS$signals$resulted$too$weak$to$be$considered$in$a$classical$SNP$

based$mapping$with$the$ldquoclosest$gene$to$peakrdquo$or$the$windowBbased$strategy$

It$ is$ worth$ mentioning$ that$ none$ of$ the$ analyzed$ SNPs$ (not$ considering$ the$

Italian$Holstein$with$ARS5BFGL5NGS54939= included$ in$ the$ analysis)$were$ above$

the$ genomeBwide$ significance$ threshold$ in$ single$ SNP$ analysis$ meaning$ that$

important$information$is$often$lost$in$such$studies$$

In$addition$ to$ the$newly$ identified$ loci$we$observed$a$confirmation$of$existing$

associations$ with$ genes$ related$ to$ milk$ production$ not$ only$ dissecting$ gene$

specific$ functions$ but$ from$ mapping$ these$ genes$ in$ previously$ characterized$

regions$(Table$S5)$This$ is$not$trivial$when$using$new$data$analysis$techniques$

especially$ in$ high$ noise$ to$ signal$ ratio$ datasets$ like$ those$ here$ presented$We$

believe$ that$ this$ approach$will$ benefit$ of$data$ from$denser$marker$panels$ (eg$

BovineHD)$ to$ exploit$ local$ linkage$ disequilibrium$ and$ increase$ the$ number$ of$

mapped$genes$with$more$than$two$SNPs$

Results$ obtained$ with$ the$ gene$ centric$ approach$ although$ having$ intrinsic$

limitations$are$not$ influenced$by$a$number$of$subjective$choices$often$made$in$

GWAS$ For$ instance$when$ ldquowindow$mappingrdquo$ is$ used$ to$decrease$ the$noise$ to$

signal$ ratio$ eg$ due$ to$ nearby$ markers$ having$ different$ allele$ frequencies$

windows$size$ is$ to$be$decided$This$can$be$based$on$equal$number$of$markers$

equal$number$of$base$pairs$or$equal$map$distance$in$cM$per$window$The$three$

choices$ do$ not$ coincide$ because$ markers$ are$ not$ equally$ spread$ along$ the$

chromosomes$ and$ recombination$ differs$ in$ different$ genomic$ regions$ In$

addition$ windows$ can$ be$ chosen$ nonBoverlapping$ or$ with$ different$ degree$ of$

overlap$Finally$when$stepping$ from$significant$SNPs$or$windows$ to$candidate$

genes$the$size$of$the$flanking$region$from$which$picking$up$candidate$genes$is$to$

be$decided$as$well$Sometimes$the$gene$closer$to$the$signal$peak$is$selected$ in$

other$cases$all$genes$within$certain$boundaries$are$included$in$further$analyses$

All$ these$ choices$ may$ generate$ misleading$ results$ For$ example$ not$ truly$

associated$genes$may$be$retained$for$further$analyses$disregarding$the$correct$

ones$ In$addition$ a$ large$number$of$ false$positives$can$be$ included$ to$be$ later$

interpreted$with$ the$help$of$network$and$pathway$analysis$As$ a$ consequence$

these$decisions$have$a$direct$impact$in$terms$of$production$and$interpretation$of$

66

$

results$leading$to$conclusions$heavily$based$on$the$enrichment$analysis$and$preB

existing$knowledge$$

With$ the$ gene$ based$ association$ we$ also$ noticed$ some$ advantages$ in$ the$

downstream$data$mining$First$genes$are$functional$units$and$the$golden$targets$

of$ the$ ldquoclassical$ postBGWASrdquo$ bioinformatics$ pipelines$ Hence$ the$ gene$ based$

association$leads$to$genes$with$an$embedded$statistically$robust$framework$This$

is$ a$ ldquoplusrdquo$ in$ studying$ complex$ traits$ because$ one$ can$ concentrate$ on$ specific$

signals$ rather$ than$ cherry$ picking$meaningful$ features$ from$ a$ list$ constructed$

with$the$previously$cited$approaches$$

Moreover$it$is$easier$to$track$back$gene$clusters$to$major$gene$effects$B$like$for$

example$DGAT1$in$a$gene$rich$genome$chunk$many$genes$may$result$significant$

due$to$the$same$SNP$(Table$S2$and$Table$3)$With$the$VEGAS$method$finding$the$

ldquotruerdquo$association$is$facilitated$by$following$the$ldquobest$SNPrdquo$in$the$region$

In$addition$we$found$phenotype$coherent$signals$in$such$situation$where$only$a$

few$SNPs$B$ if$none$ndash$are$above$an$acceptable$statistical$genomeBwide$threshold$

using$a$single$SNP$regression$method$

The$gene$centric$approach$ is$a$ first$and$slight$ improvement$ in$ livestock$GWAS$

analyses$since$we$still$have$a$partial$genome$annotation$ In$addition$ it$ is$now$

known$that$ the$definition$of$gene$ is$ to$be$revised$many$genes$do$not$code$ for$

proteins$ have$ alternative$ starting$ sites$ have$ antisense$ transcripts$ host$

regulatory$ RNA$ are$ subjected$ to$ alternative$ splicing$ and$may$ be$ regulated$ by$

sequences$quite$ far$ away$ from$ their$ coding$ regions$ (Consortium$2012)$Other$

limitations$are$the$need$for$larger$number$of$animals$in$these$type$of$studies$as$

in$this$study$and$the$still$relatively$high$cost$of$sequencing$that$cannot$be$used$

routinely$in$large$experiments$at$least$not$yet$$

$

$

Conclusions

This$study$underlines$the$importance$of$the$implementation$of$new$methods$to$

analyze$complex$traits$with$genomic$data$No$golden$path(s)$is$discovered$yet$to$

walk$from$variation$to$functional$units$The$geneBcentric$analysis$may$contribute$

67

$

to$ reveal$ new$ signals$ throughout$ the$ bovine$ genome$ condensing$ weak$

association$signals$in$chromosome$bits$having$functional$significance$

$

$

Acknowledgements

The$Authors$would$ like$ to$ thank$Dr$ Jimmy$ Liu$ for$ his$ advices$ in$ building$ the$

software$AIA$ANAFI$ANARB$and$ANAPRI$breedersrsquo$associations$ for$ their$help$

and$ support$during$ the$data$analysis$ SelMol$ and$ProZoo$projects$ that$provide$

samples$ The$ Authors$ are$ also$ grateful$ to$MIPAAF$ and$ Cariplo$ Foundation$ for$

funding$

$

$

$

References

Akula$ N$ Baranova$ A$ Seto$ D$ Solka$ J$ Nalls$MA$ Singleton$ A$ Ferrucci$ L$

Tanaka$T$Bandinelli$ S$Cho$YS$Kim$YJ$Lee$ JBY$Han$BBG$Bipolar$

Disorder$ Genome$ Study$ (BiGS)$ Consortium$ The$ Wellcome$ Trust$ CaseB

Control$Consortium$McMahon$FJ$2011$A$NetworkBBased$Approach$to$

Prioritize$ Results$ from$ GenomeBWide$ Association$ Studies$ PLoS$ ONE$ 6$

e24220$doi101371journalpone0024220$

Amin$N$ van$Duijn$ CM$ Aulchenko$ YS$ 2007$ A$Genomic$Background$Based$

Method$ for$ Association$ Analysis$ in$ Related$ Individuals$ PLoS$ ONE$ 2$

e1274$doi101371journalpone0001274$

Aulchenko$YS$de$Koning$DBJ$Haley$C$2007$Genomewide$Rapid$Association$

Using$ Mixed$ Model$ and$ Regression$ A$ Fast$ and$ Simple$ Method$ For$

Genomewide$PedigreeBBased$Quantitative$Trait$Loci$Association$Analysis$

Genetics$177$577ndash585$doi101534genetics107075614$

Blanco$ EJ$ CarreteroBHernaacutendez$ M$ GarciacuteaBBarrado$ J$ IglesiasBOsma$ MC$

Carretero$M$Herrero$JJ$Rubio$M$Riesco$JM$Carretero$J$2013$The$

activity$and$proliferation$of$pituitary$prolactinBpositive$cells$and$pituitary$

68

$

VIPBpositive$ cells$ are$ regulated$by$ interleukin$6$Histol$Histopathol$ 28$

1595ndash1604$

Blott$S$Kim$JBJ$Moisio$S$SchmidtBKuntzel$A$Cornet$A$Berzi$P$Cambisano$

N$Ford$C$Grisart$B$Johnson$D$Karim$L$Simon$P$Snell$R$Spelman$

R$ Wong$ J$ Vilkki$ J$ Georges$ M$ Farnir$ F$ Coppieters$ W$ 2003$

Molecular$ dissection$ of$ a$ quantitative$ trait$ locus$ a$ phenylalanineBtoB

tyrosine$substitution$in$the$transmembrane$domain$of$the$bovine$growth$

hormone$ receptor$ is$ associated$ with$ a$ major$ effect$ on$ milk$ yield$ and$

composition$Genetics$163$253ndash266$

Burton$PR$Clayton$DG$Cardon$LR$Craddock$N$Deloukas$P$Duncanson$A$

Kwiatkowski$ DP$ McCarthy$ MI$ Ouwehand$ WH$ Samani$ NJ$ Todd$

JA$ Donnelly$ P$ Barrett$ JC$ Burton$ PR$ Davison$ D$ Donnelly$ P$

Easton$ D$ Evans$ D$ Leung$ HBT$ Marchini$ JL$ Morris$ AP$ Spencer$

CCA$Tobin$MD$Cardon$LR$Clayton$DG$Attwood$AP$Boorman$JP$

Cant$B$Everson$U$Hussey$JM$Jolley$JD$Knight$AS$Koch$K$Meech$

E$ Nutland$ S$ Prowse$ CV$ Stevens$ HE$ Taylor$ NC$ Walters$ GR$

Walker$NM$Watkins$NA$Winzer$T$Todd$JA$Ouwehand$WH$Jones$

RW$ McArdle$ WL$ Ring$ SM$ Strachan$ DP$ Pembrey$ M$ Breen$ G$

Clair$ DS$ Caesar$ S$ GordonBSmith$ K$ Jones$ L$ Fraser$ C$ Green$ EK$

Grozeva$D$Hamshere$ML$Holmans$PA$Jones$IR$Kirov$G$Moskvina$

V$ Nikolov$ I$ OrsquoDonovan$ MC$ Owen$ MJ$ Craddock$ N$ Collier$ DA$

Elkin$A$Farmer$A$Williamson$R$McGuffin$P$Young$AH$Ferrier$IN$

Ball$SG$Balmforth$AJ$Barrett$JH$Bishop$DT$Iles$MM$Maqbool$A$

Yuldasheva$N$Hall$AS$Braund$PS$Burton$PR$Dixon$RJ$Mangino$

M$ Stevens$ S$ Tobin$ MD$ Thompson$ JR$ Samani$ NJ$ Bredin$ F$

Tremelling$ M$ Parkes$ M$ Drummond$ H$ Lees$ CW$ Nimmo$ ER$

Satsangi$ J$Fisher$SA$Forbes$A$Lewis$CM$Onnie$CM$Prescott$NJ$

Sanderson$J$Mathew$CG$Barbour$J$Mohiuddin$MK$Todhunter$CE$

Mansfield$JC$Ahmad$T$Cummings$FR$Jewell$DP$Webster$J$Brown$

MJ$ Clayton$DG$ Lathrop$GM$ Connell$ J$Dominiczak$A$ Samani$NJ$

Marcano$CAB$Burke$B$Dobson$R$Gungadoo$J$Lee$KL$Munroe$PB$

Newhouse$SJ$Onipinla$A$Wallace$C$Xue$M$Caulfield$M$Farrall$M$

Barton$A$ (braggs)$TB$ in$RG$and$G$Bruce$ IN$Donovan$H$Eyre$S$

69

$

Gilbert$ PD$ Hider$ SL$ Hinks$ AM$ John$ SL$ Potter$ C$ Silman$ AJ$Symmons$ DPM$ Thomson$ W$ Worthington$ J$ Clayton$ DG$ Dunger$DB$ Nutland$ S$ Stevens$ HE$ Walker$ NM$ Widmer$ B$ Todd$ JA$Frayling$TM$Freathy$RM$Lango$H$Perry$JRB$Shields$BM$Weedon$MN$ Hattersley$ AT$ Hitman$ GA$ Walker$ M$ Elliott$ KS$ Groves$ CJ$Lindgren$ CM$ Rayner$ NW$ Timpson$ NJ$ Zeggini$ E$ McCarthy$ MI$Newport$M$Sirugo$G$Lyons$E$Vannberg$F$Hill$AVS$Bradbury$LA$Farrar$ C$ Pointon$ JJ$ Wordsworth$ P$ Brown$ MA$ Franklyn$ JA$Heward$JM$Simmonds$MJ$Gough$SCL$Seal$S$(uk)$BCSC$Stratton$MR$Rahman$N$Ban$M$Goris$A$Sawcer$SJ$Compston$A$Conway$D$Jallow$ M$ Newport$ M$ Sirugo$ G$ Rockett$ KA$ Kwiatkowski$ DP$Bumpstead$ SJ$ Chaney$A$Downes$K$ Ghori$MJR$ Gwilliam$R$Hunt$SE$Inouye$M$Keniry$A$King$E$McGinnis$R$Potter$S$Ravindrarajah$R$ Whittaker$ P$ Widden$ C$ Withers$ D$ Deloukas$ P$ Leung$ HBT$Nutland$S$Stevens$HE$Walker$NM$Todd$JA$Easton$D$Clayton$DG$Burton$PR$Tobin$MD$Barrett$JC$Evans$D$Morris$AP$Cardon$LR$Cardin$NJ$Davison$D$Ferreira$T$PereiraBGale$ J$Hallgrimsdoacutettir$ IB$Howie$ BN$Marchini$ JL$ Spencer$ CCA$ Su$ Z$ Teo$ YY$ Vukcevic$ D$Donnelly$P$Bentley$D$Brown$MA$Cardon$LR$Caulfield$M$Clayton$DG$ Compston$ A$ Craddock$ N$ Deloukas$ P$ Donnelly$ P$ Farrall$ M$Gough$ SCL$ Hall$ AS$ Hattersley$ AT$ Hill$ AVS$ Kwiatkowski$ DP$Mathew$ CG$McCarthy$MI$ Ouwehand$WH$ Parkes$M$ Pembrey$M$Rahman$N$Samani$NJ$Stratton$MR$Todd$ JA$Worthington$ J$2007$GenomeBwide$ association$ study$ of$ 14000$ cases$ of$ seven$ common$diseases$ and$ 3000$ shared$ controls$ Nature$ 447$ 661ndash678$doi101038nature05911$

Cantor$ RM$ Lange$ K$ Sinsheimer$ JS$ 2010$ Prioritizing$ GWAS$ Results$ A$Review$ of$ Statistical$ Methods$ and$ Recommendations$ for$ Their$Application$Am$J$Hum$Genet$86$6ndash22$doi101016jajhg200911017$

Clark$ AG$ Hubisz$ MJ$ Bustamante$ CD$ Williamson$ SH$ Nielsen$ R$ 2005$Ascertainment$ bias$ in$ studies$ of$ human$ genomeBwide$ polymorphism$Genome$Res$15$1496ndash1502$doi101101gr4107905$

70

$

CohenBZinder$M$ Seroussi$ E$ Larkin$ DM$ Loor$ JJ$Wind$ AE$ der$ Lee$ JBH$Drackley$ JK$Band$MR$Hernandez$AG$Shani$M$Lewin$HA$Weller$JI$ Ron$ M$ 2005$ Identification$ of$ a$ missense$ mutation$ in$ the$ bovine$ABCG2$gene$with$a$major$effect$on$ the$QTL$on$ chromosome$6$affecting$milk$yield$and$composition$in$Holstein$cattle$Genome$Res$15$936ndash944$doi101101gr3806705$

Cole$JB$Wiggans$GR$Ma$L$Sonstegard$TS$Lawlor$TJ$Crooker$BA$Tassell$CPV$ Yang$ J$ Wang$ S$ Matukumalli$ LK$ Da$ Y$ 2011$ GenomeBwide$association$ analysis$ of$ thirty$ one$ production$ health$ reproduction$ and$body$ conformation$ traits$ in$ contemporary$ US$ Holstein$ cows$ BMC$Genomics$12$408$doi1011861471B2164B12B408$

Colitti$M$Wilde$CJ$Stefanon$B$2004$Functional$expression$of$bclB2$protein$family$and$AIF$in$bovine$mammary$tissue$in$early$lactation$J$Dairy$Res$71$20ndash27$

Consortium$ TEP$ 2012$ An$ integrated$ encyclopedia$ of$ DNA$ elements$ in$ the$human$genome$Nature$489$57ndash74$doi101038nature11247$

De$Mello$F$Cobuci$JA$Martins$MF$Silva$MVGB$Neto$JB$2012$Association$of$the$polymorphism$g8514CgtT$in$the$osteopontin$gene$(SPP1)$with$milk$yield$ in$ the$ dairy$ cattle$ breed$ Girolando$ Anim$ Genet$ 43$ 647ndash648$doi101111j1365B2052201102312x$

Dudbridge$ F$ Gusnanto$ A$ 2008$ Estimation$ of$ significance$ thresholds$ for$genomewide$ association$ scans$ Genet$ Epidemiol$ 32$ 227ndash234$doi101002gepi20297$

Elsik$ CG$ Tellam$ RL$ Worley$ KC$ 2009$ The$ Genome$ Sequence$ of$ Taurine$Cattle$ A$window$ to$ ruminant$ biology$ and$ evolution$ Science$ 324$ 522ndash528$doi101126science1169588$

Fazioli$ F$ Minichiello$ L$ Matoska$ V$ Castagnino$ P$ Miki$ T$ Wong$ WT$ Di$Fiore$ PP$ 1993$ Eps8$ a$ substrate$ for$ the$ epidermal$ growth$ factor$receptor$kinase$enhances$EGFBdependent$mitogenic$signals$EMBO$J$12$3799ndash3808$

Fontanesi$ L$ Calograve$ DG$ Galimberti$ G$ Negrini$ R$ Marino$ R$ Nardone$ A$AjmoneBMarsan$ P$ Russo$ V$ 2014$ A$ candidate$ gene$ association$ study$

71

$

for$ nine$ economically$ important$ traits$ in$ Italian$ Holstein$ cattle$ Anim$

Genet$nandashna$doi101111age12164$

Fowler$KJ$Walker$F$Alexander$W$Hibbs$ML$Nice$EC$Bohmer$RM$Mann$

GB$ Thumwood$ C$ Maglitto$ R$ Danks$ JA$ 1995$ A$ mutation$ in$ the$

epidermal$growth$factor$receptor$in$wavedB2$mice$has$a$profound$effect$

on$ receptor$ biochemistry$ that$ results$ in$ impaired$ lactation$ Proc$ Natl$

Acad$Sci$92$1465ndash1469$doi101073pnas9251465$

Gianola$ D$ Hospital$ F$ Verrier$ E$ 2013$ Contribution$ of$ an$ additive$ locus$ to$

genetic$variance$when$inheritance$is$multiBfactorial$with$implications$on$

interpretation$ of$ GWAS$ Theor$ Appl$ Genet$ 126$ 1457ndash1472$

doi101007s00122B013B2064B2$

Gillette$ M$ Bray$ K$ Blumenthaler$ A$ VargoBGogola$ T$ 2013$ P190B$ RhoGAP$

Overexpression$ in$ the$ Developing$Mammary$ Epithelium$ Induces$ TGFβB

dependent$ Fibroblast$ Activation$ PLoS$ ONE$ 8$ e65105$

doi101371journalpone0065105$

Grisart$B$Coppieters$W$Farnir$F$Karim$L$Ford$C$Berzi$P$Cambisano$N$

Mni$ M$ Reid$ S$ Simon$ P$ Spelman$ R$ Georges$ M$ Snell$ R$ 2002$

Positional$Candidate$Cloning$of$a$QTL$ in$Dairy$Cattle$ Identification$of$a$

Missense$Mutation$ in$the$Bovine$DGAT1$Gene$with$Major$Effect$on$Milk$

Yield$ and$ Composition$ Genome$ Res$ 12$ 222ndash231$

doi101101gr224202$

Hubbard$NE$Chen$QJ$Sickafoose$LK$Wood$MB$Gregg$ JP$Abrahamsson$

NM$ Engelberg$ JA$ Walls$ JE$ Borowsky$ AD$ 2013$ Transgenic$

Mammary$Epithelial$Osteopontin$(Spp1)$Expression$Induces$Proliferation$

and$ Alveologenesis$ Genes$ Cancer$ 4$ 201ndash212$

doi1011771947601913496813$

Hu$ ZBL$ Park$ CA$ Wu$ XBL$ Reecy$ JM$ 2013$ Animal$ QTLdb$ an$ improved$

database$tool$for$livestock$animal$QTLassociation$data$dissemination$in$

the$ postBgenome$ era$ Nucleic$ Acids$ Res$ 41$ D871ndashD879$

doi101093nargks1150$

Jeon$et$al$ B$2008$ B$ProtoBoncogene$FBIB1$ (PokemonZBTB7A)$Represses$Trpdf$

nd$

72

$

Jiang$ L$ Liu$ J$ Sun$ D$ Ma$ P$ Ding$ X$ Yu$ Y$ Zhang$ Q$ 2010$ Genome$Wide$

Association$ Studies$ for$ Milk$ Production$ Traits$ in$ Chinese$ Holstein$

Population$PLoS$ONE$5$e13661$doi101371journalpone0013661$

Kong$ PBZ$ Yang$ F$ Li$ L$ Li$ XBQ$ Feng$ YBM$ 2013$Decreased$FOXF2$mRNA$

Expression$ Indicates$ EarlyBOnset$ Metastasis$ and$ Poor$ Prognosis$ for$

Breast$ Cancer$ Patients$ with$ Histological$ Grade$ II$ Tumor$ PLoS$ ONE$ 8$

e61591$doi101371journalpone0061591$

Lee$SBU$Maeda$T$2012$POKZBTB$proteins$an$emerging$ family$of$proteins$

that$ regulate$ lymphoid$ development$ and$ function$ Immunol$ Rev$ 247$

107ndash119$

Lemay$ DG$ Lynn$ DJ$ Martin$ WF$ Neville$ MC$ Casey$ TM$ Rincon$ G$

Kriventseva$ EV$ Barris$ WC$ Hinrichs$ AS$ Molenaar$ AJ$ 2009$ The$

bovine$lactation$genome$ insights$ into$the$evolution$of$mammalian$milk$

Genome$Biol$10$R43$

Liu$JZ$Mcrae$AF$Nyholt$DR$Medland$SE$Wray$NR$Brown$KM$2010$A$

versatile$ geneBbased$ test$ for$ genomeBwide$ association$ studies$ Am$ J$

Hum$Genet$87$139$

Liu$S$Dontu$G$Mantle$ID$Patel$S$Ahn$N$Jackson$KW$Suri$P$Wicha$MS$

2006$Hedgehog$Signaling$and$BmiB1$Regulate$SelfBrenewal$of$Normal$and$

Malignant$ Human$ Mammary$ Stem$ Cells$ Cancer$ Res$ 66$ 6063ndash6071$

doi1011580008B5472CANB06B0054$

Lu$ J$ Holmgren$ A$ 2014$ The$ thioredoxin$ superfamily$ in$ oxidative$ protein$

folding$ Antioxid$ Redox$ Signal$ 140202091634000$

doi101089ars20145849$

Luna$ A$ Nicodemus$ KK$ 2007$ snpplotter$ an$ RBbased$ SNPhaplotype$

association$and$linkage$disequilibrium$plotting$package$Bioinforma$Oxf$

Engl$23$774ndash776$doi101093bioinformaticsbtl657$

McCarthy$ MI$ Abecasis$ GR$ Cardon$ LR$ Goldstein$ DB$ Little$ J$ Ioannidis$

JPA$ Hirschhorn$ JN$ 2008$ GenomeBwide$ association$ studies$ for$

complex$traits$consensus$uncertainty$and$challenges$Nat$Rev$Genet$9$

356ndash369$doi101038nrg2344$

Meredith$ BK$ Kearney$ FJ$ Finlay$ EK$ Bradley$ DG$ Fahey$ AG$ Berry$ DP$

Lynn$ DJ$ 2012$ GenomeBwide$ associations$ for$ milk$ production$ and$

73

$

somatic$cell$score$in$HolsteinBFriesian$cattle$in$Ireland$BMC$Genet$13$21$doi1011861471B2156B13B21$

Minozzi$ G$Nicolazzi$ EL$ Stella$ A$ Biffani$ S$Negrini$ R$ Lazzari$ B$ AjmoneBMarsan$ P$Williams$ JL$ 2013$ Genome$Wide$ Analysis$ of$ Fertility$ and$Production$ Traits$ in$ Italian$ Holstein$ Cattle$ PLoS$ ONE$ 8$ e80219$doi101371journalpone0080219$

Nicolazzi$EL$Picciolini$M$Strozzi$F$Schnabel$RD$Lawley$C$Pirani$A$Brew$F$ Stella$ A$ 2014$ SNPchiMp$ a$ database$ to$ disentangle$ the$ SNPchip$jungle$ in$ bovine$ livestock$ BMC$ Genomics$ 15$ 123$ doi1011861471B2164B15B123$

Purcell$ S$ Neale$ B$ ToddBBrown$ K$ Thomas$ L$ Ferreira$ MAR$ Bender$ D$Maller$J$Sklar$P$de$Bakker$PIW$Daly$MJ$Sham$PC$2007$PLINK$a$tool$ set$ for$ wholeBgenome$ association$ and$ populationBbased$ linkage$analyses$Am$J$Hum$Genet$81$559ndash575$doi101086519795$

Qiu$ S$ Lai$ L$ 2013$ Antibacterial$ properties$ of$ recombinant$ human$ nonBpancreatic$secretory$phospholipase$A2$Biochem$Biophys$Res$Commun$441$453ndash456$doi101016jbbrc201310092$

Quinlan$AR$Hall$IM$2010$BEDTools$a$flexible$suite$of$utilities$for$comparing$genomic$ features$ Bioinformatics$ 26$ 841ndash842$doi101093bioinformaticsbtq033$

RamiacuterezBTorres$ A$ BarceloacuteBBatllori$ S$ MartiacutenezBBeamonte$ R$ Navarro$ MA$Surra$ JC$ Arnal$ C$ Guilleacuten$N$ Aciacuten$ S$ Osada$ J$ 2012$ Proteomics$ and$gene$ expression$ analyses$ of$ squaleneBsupplemented$ mice$ identify$microsomal$thioredoxin$domainBcontaining$protein$5$changes$associated$with$ hepatic$ steatosis$ J$ Proteomics$ 77$ 27ndash39$doi101016jjprot201207001$

Raven$ LBA$ Cocks$ BG$ Pryce$ JE$ Cottrell$ JJ$ Hayes$ BJ$ 2013$ Genes$ of$ the$RNASE5$pathway$ contain$ SNP$ associated$with$milk$ production$ traits$ in$dairy$cattle$Genet$Sel$Evol$45$25$doi1011861297B9686B45B25$

Scotti$E$Fontanesi$L$Schiavini$F$La$Mattina$V$Bagnato$A$Russo$V$2010$DGAT1$ pK232A$ polymorphism$ in$ dairy$ and$ dual$ purpose$ Italian$ cattle$breeds$Ital$J$Anim$Sci$9$doi104081ijas2010e16$

74

$

Stefanon$ B$ Colitti$ M$ Gabai$ G$ Knight$ CH$ Wilde$ CJ$ 2002$ Mammary$

apoptosis$and$lactation$persistency$in$dairy$animals$J$Dairy$Res$69$37ndash

52$

Stegh$ AH$ DePinho$ RA$ 2011$ Beyond$ effector$ caspase$ inhibition$ Bcl2L12$

neutralizes$ p53$ signaling$ in$ glioblastoma$ Cell$ Cycle$ 10$ 33ndash38$

doi104161cc10114365$

Taberlet$ P$ Valentini$ A$ Rezaei$ HR$ Naderi$ S$ Pompanon$ F$ Negrini$ R$

AjmoneBMarsan$ P$ 2008$ Are$ cattle$ sheep$ and$ goats$ endangered$

species$Mol$Ecol$17$275ndash284$doi101111j1365B294X200703475x$

Team$RC$ 2014$ R$ A$ Language$ and$Environment$ for$ Statistical$ Computing$ R$

Foundation$for$Statistical$Computing$Vienna$Austria$

Thomadaki$H$ Talieri$M$ Scorilas$ A$ 2007$ Prognostic$ value$ of$ the$ apoptosis$

related$genes$BCL2$and$BCL2L12$in$breast$cancer$Cancer$Lett$247$48ndash

55$doi101016jcanlet200603016$

Valdehita$ A$ Bajo$ AM$ FernaacutendezBMartiacutenez$ AB$ Arenas$ MI$ Vacas$ E$

Valenzuela$ P$ RuiacutezBVillaespesa$ A$ Prieto$ JC$ Carmena$ MJ$ 2010$

Nuclear$ localization$ of$ vasoactive$ intestinal$ peptide$ (VIP)$ receptors$ in$

human$ breast$ cancer$ Peptides$ 31$ 2035ndash2045$

doi101016jpeptides201007024$

Wang$T$Liu$NS$Seet$LBF$Hong$W$2010$The$Emerging$Role$of$VHS$DomainB

Containing$Tom1$Tom1L1$and$Tom1L2$in$Membrane$Trafficking$Traffic$

11$1119ndash1128$doi101111j1600B0854201001098x$

Wang$ X$Wurmser$ C$ Pausch$ H$ Jung$ S$ Reinhardt$ F$ Tetens$ J$ Thaller$ G$

Fries$R$2012$Identification$and$Dissection$of$Four$Major$QTL$Affecting$

Milk$Fat$Content$in$the$German$HolsteinBFriesian$Population$PLoS$ONE$7$

e40711$doi101371journalpone0040711$

Xu$S$2003$Theoretical$Basis$of$the$Beavis$Effect$Genetics$165$2259ndash2268$

Zimin$AV$Delcher$AL$Florea$L$Kelley$DR$Schatz$MC$Puiu$D$Hanrahan$

F$ Pertea$ G$ Tassell$ CPV$ Sonstegard$ TS$ Marccedilais$ G$ Roberts$ M$

Subramanian$ P$ Yorke$ JA$ Salzberg$ SL$ 2009$ A$ wholeBgenome$

assembly$ of$ the$ domestic$ cow$ Bos$ taurus$ Genome$ Biol$ 10$ R42$

doi101186gbB2009B10B4Br42$

75

$

Zu$X$Ma$J$Liu$H$Liu$F$Tan$C$Yu$L$Wang$J$Xie$Z$Cao$D$Jiang$Y$2011$ProBoncogene$ Pokemon$ promotes$ breast$ cancer$ progression$ by$upregulating$ survivin$ expression$ Breast$ Cancer$ Res$ 13$ R26$doi101186bcr2843$ $

76

$

Supplementaryfiles$

You$can$find$Chapter$͵$supplementary$files$at$

- DocTa$(Doctoral$Thesis$Archive)$(httptesionlineunicattit)$

- Google$Drive$(httptinyurlcomPhDMMChapter03)$or$scan$the$QR$code$

$

$

SupplementaryFigureS1Multidimensionalscaling

Spatial$distribution$of$genetic$structure$in$the$analyzed$breeds$$

$

Supplementary$FigureS2GWASplot

Manhattan$plots$and$QQBplots$respectively$for$Fat$percentage$Milk$yield$and$

Protein$percentage$in$the$Italian$Brown$

$

Supplementary$FigureS3GWASplot

Manhattan$plots$and$QQBplots$respectively$for$Fat$percentage$Milk$yield$and$

Protein$percentage$in$the$Italian$Holstein$

$

Supplementary$FigureS4GWASplot

Manhattan$plots$and$QQBplots$respectively$for$Fat$percentage$Milk$yield$and$

Protein$percentage$in$the$HolsteinDGAT$

$

Supplementary$FigureS5GWASplot

Manhattan$plots$and$QQBplots$respectively$for$Fat$percentage$Milk$yield$and$

Protein$percentage$in$the$Italian$Simmental$

77

$

$Supplementary$FigureS6LDdecayGenomeBwide$LD$decay$for$the$three$breeds$(Holstein$Simmental$Brown)$$Supplementary$FigureS7DescriptiveplotsDescriptive$statistics$for$the$Italian$Brown$Italian$Holstein$and$Italian$Simmental$$Supplementary$FigureS8DGAT1LDFocus$on$the$LD$of$the$DGAT1$region$in$all$the$datasets$(Brown$Holstein$HolsteinDGAT$Simmental)$$Supplementary$TableS1SNPfactNumber$of$subjects$and$SNPs$after$QC$$Supplementary$TableS2GeneBasedAssociationresultsFull$results$of$the$significant$genes$associated$with$the$traits$in$all$the$datasets$$ Sheet$1$Brown$ndash$fat$percentage$$ Sheet$2$Brown$ndash$milk$yield$$ Sheet$3$Brown$ndash$protein$percentage$$ Sheet$4$Holstein$ndash$fat$percentage$$ Sheet$5$Holstein$ndash$milk$yield$$ Sheet$6$Holstein$ndash$protein$percentage$$ Sheet$7$HolsteinDGAT$ndash$fat$percentage$$ Sheet$8$HolsteinDGAT$ndash$milk$yield$$ Sheet$9$HolsteinDGAT$ndash$protein$percentage$$ Sheet$10$Simmental$ndash$fat$percentage$$ Sheet$11$Simmental$ndash$milk$yield$$ Sheet$12$Simmental$ndash$protein$percentage$$Supplementary$TableS3SingleSNPAssociationresultsFull$results$of$the$SNP$that$overcome$the$suggestive$threshold$$$ Sheet$1$Brown$ndash$fat$percentage$

78

$

$ Sheet$2$Brown$ndash$milk$yield$$ Sheet$3$Brown$ndash$protein$percentage$$ Sheet$4$Holstein$ndash$fat$percentage$$ Sheet$5$Holstein$ndash$milk$yield$$ Sheet$6$Holstein$ndash$protein$percentage$$ Sheet$7$HolsteinDGAT$ndash$fat$percentage$$ Sheet$8$HolsteinDGAT$ndash$milk$yield$$ Sheet$9$HolsteinDGAT$ndash$protein$percentage$$ Sheet$10$Simmental$ndash$fat$percentage$$ Sheet$11$Simmental$ndash$milk$yield$$ Sheet$12$Simmental$ndash$protein$percentage$$Supplementary$TableS4GenesandtraitsGeneWise$pBvalues$of$the$discussed$genes$in$all$the$analysis$In$grey$are$evidenced$the$pBvalues$that$overcome$the$significance$threshold$Genes$are$alphabetically$listed$$Supplementary$TableS5GenesinQTLsGenes$from$the$geneBbased$association$mapping$into$known$QTLs$$$$$

79

CHAPTER(4(

MUGBAS(a(species(free(gene(based(association(suite(

S(Capomaccio1(M(Milanesi1(L(Bomba1(et(al(1IstitutodiZootecnicaUniversitagraveCattolicadelSacroCuoreviaEmiliaParmense84Piacenza29122ItalyTheseauthorscontributedequallytothiswork

____________________ApplicationnotesubmittedtoBioinformatics

Abstract(Theassociationstudiesbetweengenomewidemarkersandphenotypes(GWAS)

arenowwidelyusedinmodelandnonmodelspeciesHowevertoolsthatmerge

singlemarkerresultsandtheirprobabilityofbeingassociatedtofunctionalunits

(typicallygenes)areseldomdevelopedforspeciesotherthanhuman

Herewe presentMUGBAS a suite that calculates a generegionbased pvalue

fromagivenannotationsinglemarkerGWAresultandgenotypedatawiththe

objective to ease postGWAS analysis The software is species and annotation

independentfasthighlyparallelizedandreadyforhighdensitymarkerstudies

Availabilityandimplementationhttpsbitbucketorgcapemastermugbas

Introduction(With the availability of highdensity SNP panels Genome Wide Association

Studies (GWAS) have become the gold standard for dissecting the biology of

complextraits(McCarthyetal2008)Thisistrueforhumansandotherspecies

TheunderlyingideaofGWASissimplefindmarkertraitassociationsexploiting

the linkage disequilibrium (LD) that exists between the causative mutation ndash

whichweignorendashandoneormoremarkersofknownchromosomalposition

Whileanumberofwellestablishedapproachescanbeeasilyusedinastandard

GWAS (Aulchenko et al 2007 Kang et al 2010 Purcell et al 2007) to find

marker(s) associated to a particular phenotype several issues have to be

accountedforduringtheupanddownstreamanalyses

Firstly genome wide studies have to take into account multiple testing If

stringent statistical thresholds are applied a proportion ofmoderate or small

effectsactuallyassociatedwiththetraitwillbedisregardedThereforereducing

the number of tests while maintaining all the information provided by high

densitypanelswouldbeaclearadvantageInadditiononceasignificantsignalis

detected different strategiesmaybeused to tag the candidate causative gene

Sometimesthegeneclosertothepeaksignalisselectedinothercasesallgenes

withinaflankingregionofanarbitrarysizeofthesignificantSNP(s)areincluded

82

infurtheranalysesThesechoicesmayendupinbiasresultseitherretaininga

largenumberofnottrulyassociatedgenesordisregardingthecorrectones

Somemethodshaverecentlybeenproposedtoovercometheselimitationsusing

aprioriknowledgeastheldquocandidatepathwayrdquoanalysis(Ravenetal2013)and

the ldquogenebasedrdquo association strategies (Akula et al 2011 Cantor et al 2010

Liuetal2010)TheseapproachesrestrictGWAstudiesonlytoknowngenesor

genesubsetsresizingthemultipletestingproblemandincreasingthepowerof

theanalyses

Usually bioinformatics pipelines are developed for the analysis of human and

modelorganismsandarenotadaptedtononmodelorganismThisisthecaseof

the VEGAS software (Liu et al 2010 Yang et al 2014) developed only for

humansandonlyforgenebasedanalyses

Here we introduce MUGBAS [MUlti species GeneBased Association Suite] a

softwarebasedonVEGASthatusesGWASdataandagivenannotation(typically

genesbutanyregionmaybesuitable)toestimategenebasedandregionbased

associationpvaluesThesuite is speciesandannotation free ready toanalyze

highdensitydatafromanyexperimentaldesigns

Methods(MUGBASisasuiteofscriptsbuiltinBashRPythonandPerlThesuiterequires

several prerequisites that can be easily fulfilled in a LinuxUnixMac

environmentwith a few command lines Further information on how tomeet

theserequirementsaredetailedintheonlinerepository

MUGBAS suite can be invoked launching the Bash wrapper that checks

dependenciesandstoresuserchoices

Required input files are MAPPED files in PLINK format with genotype data

from(atleast200individualsarerecommendedforaproperLDcalculation)an

annotation file inBED format (chromosome startingposition endingposition

nameandusercustomfields)ofthedesiredgenomefeatureandtheGWASresult

file with mandatory headers ldquoSNPNAMErdquo and ldquoPVALUErdquo Multiple association

resultscanbetestedatthesametime ifprovidedinseparatefiles inthesame

directoryAtrialdatasetisdownloadablealongwiththesoftwarereleasePlease

83

notethatforsimplicityweprovideageneannotationbutanyBEDfilewithany

desiredfeaturecanbeused

The program starts the analysis assigning SNPs to any feature of the given

annotationusingbedtools(QuinlanandHall2010)Thisstepiscustomizableby

user defined boundaries (up and downstream) in order to catch regulatory

regionsthatcouldbeinhighLDwiththecurrentgeneregion

Inordertospeeduptheanalysistheprogramsubsamples200individualsfrom

theinputdatasetWethenimplementedtwolevelsofparallelizationoneforthe

LDandonefortheldquogenewiserdquoorldquoregionwiserdquopvaluecalculationThelatteris

particularlyusefulanalyzingmorethanoneGWAresultsfromthesamedataset

BrieflyMUGBASfirstsplitstheLDcalculationinncores(asdefinedbytheuser)

using the foreach doParallel packages (Analytics andWeston 2014a 2014b)

and then distributes traits across cores using GNU Parallel and foreach

doParallel

SinceLDcalculationisaslowprocessVEGASbenefitsofHapMapphase2datato

speed up the processwhile preserving the option for a user defined (human)

population using PLINK In order to expand these analysis to any species LD

calculationinMUGBASisimplementedwithther2fast()functionfromGenABEL

package(Aulchenkoetal2007)whilethegenewisepvalueiscalculatedasin

theVEGASapproachBrieflythegenewiseteststatisticisthesumofthenupper

tailchisquaredvaluesofaSNPsubsetwithonedegreeoffreedom(wheren is

thenumberofSNPmappinginthegivengene)Withperfectlinkageequilibrium

the genewise pvalue would be the onetailed pvalue of a chisquare

distributionwithndegreesof freedomOtherwisemorelikelythedistribution

of the pvalues has to be evaluated (with simulations) andweighted (with LD

values)(Liuetal2010)

Once every generegion has its ldquogenewiserdquo pvalue a FDR qvalue (False

Discovery Rate) statistics is calculated with the base R function padjust()

Results are stored and enrichedwith ldquomanhattanrdquo and ldquoundergroundrdquo plots of

theldquogenewiserdquopvalueandtheqvaluerespectively

84

Results(MUGBAS output is generegionoriented with useful information about the

location of the analyzed feature the ldquoBest SNPrdquo within that feature with its

genomiccoordinatesandtheoriginalpvaluethegeneorregionwisestatistics

(pvalue number of simulations and qvalue) the custom annotation of that

particularentryandthepositionoftheSNPwithrespecttothefeatureandthe

decidedboundariesAll this informationareuseful to track theeffectofhighly

significant markers that could map into multiple region of interest and

effectively pinpoint the most probable location to focus on with data mining

processes

A typical analysis performed with a highdensity panel (gt600K markers)

distributed in 10 midperformance cores (Intel Xeon X5675 307 GHz) on

threetraitssimultaneouslytakesaboutfourhours

Discussion(Generegionbased methods are now widely used and accepted in genomic

research However non modelspecies suffer for the lack of availability of

specifictoolstoenhancediscoveryfromgenomewidedataMUGBASprovidesa

robust and accepted statistics to researchers dealing with species other than

humantoeasilyjumpfromassociatedmarkerstothedesiredfunctionalunits

Acknowledgements(((WewouldliketothankDrJimmyLiuforhisadvicesinbuildingthesuite

Reference(Akula N Baranova A Seto D Solka J NallsMA Singleton A Ferrucci L

TanakaTBandinelli SChoYSKimYJLee JYHanBGBipolar

Disorder Genome Study (BiGS) Consortium The Wellcome Trust Case

85

ControlConsortiumMcMahonFJ2011ANetworkBasedApproachtoPrioritize Results from GenomeWide Association Studies PLoS ONE 6e24220doi101371journalpone0024220

Analytics R Weston S 2014a doParallel Foreach parallel adaptor for theparallelpackage

AnalyticsRWestonS2014bforeachForeachloopingconstructforRAulchenkoYSdeKoningDJHaleyC2007GenomewideRapidAssociation

Using Mixed Model and Regression A Fast and Simple Method ForGenomewidePedigreeBasedQuantitativeTraitLociAssociationAnalysisGenetics177577ndash585doi101534genetics107075614

Cantor RM Lange K Sinsheimer JS 2010 Prioritizing GWAS Results AReview of Statistical Methods and Recommendations for TheirApplicationAmJHumGenet866ndash22doi101016jajhg200911017

KangHMSulJHServiceSKZaitlenNAKongSFreimerNBSabattiCEskin E 2010 Variance component model to account for samplestructure in genomewide association studies Nat Genet 42 348ndash354doi101038ng548

LiuJZMcraeAFNyholtDRMedlandSEWrayNRBrownKM2010Aversatile genebased test for genomewide association studies Am JHumGenet87139

McCarthy MI Abecasis GR Cardon LR Goldstein DB Little J IoannidisJPA Hirschhorn JN 2008 Genomewide association studies forcomplextraitsconsensusuncertaintyandchallengesNatRevGenet9356ndash369doi101038nrg2344

Purcell S Neale B ToddBrown K Thomas L Ferreira MAR Bender DMallerJSklarPdeBakkerPIWDalyMJShamPC2007PLINKatool set for wholegenome association and populationbased linkageanalysesAmJHumGenet81559ndash575doi101086519795

QuinlanARHallIM2010BEDToolsaflexiblesuiteofutilitiesforcomparinggenomic features Bioinformatics 26 841ndash842doi101093bioinformaticsbtq033

86

Raven LA Cocks BG Pryce JE Cottrell JJ Hayes BJ 2013 Genes of the

RNASE5pathway contain SNP associatedwithmilk production traits in

dairycattleGenetSelEvol4525doi101186129796864525

87

CHAPTER5

Imputationaccuracyisrobusttocattlereferencegenomeupdates

MMilanesi1etal1IstitutodiZootecnicaUniversitagraveCattolicadelSacroCuoreviaEmiliaParmense84Piacenza29122Italy

____________________ShortCommunicationacceptedinAnimalGenetics

SummaryGenotype imputation is routinely applied in a large number of cattle breeds Imputation

hasbecomeaneedduetothelargenumberofSNParrayswithvariabledensity(currently

from2900to777962SNPs)Althoughmanyauthorshavestudiedtheeffectofdifferent

statisticalmethodsonimputationaccuracytheimpactofa(likely)changeinthereference

genomeassemblyonimputationfromlowertohigherdensityhasnotbeendeterminedso

far In this work 1021 Italian Simmental SNP genotypes were reJmapped on the three

mostrecentreferencegenomeassembliesFour imputationmethodswereusedtoassess

theimpactofanupdateinthereferencegenomeAsexpectedthefourmethodsbehaved

differentlywith largedifferences in termsof accuracyUpdatingSNPcoordinateson the

three tested cattle reference genome assemblies determined only a slight variation on

imputationresultswithinmethod

KeywordsImputationreferencegenomeassemblySNPbovine

MaintextFromthepublicationof the firstbovinesinglenucleotidepolymorphism(SNP)array the

wholegenomicscenarioofthisspeciesevolvedrapidlyNowadaysseveralSNPpanelshave

been designed with a range of densities mapping on different reference genome

assemblies and produced by different manufacturers When running analyses with SNP

arrays of different densities or from different commercial companies (eg Illumina and

Affymetrix) an imputation step is often required The impact of different statistical

methods on imputation accuracy has already been assessed (Pei et al 2008) (Marchini

andHowie2010)(Maetal2013)Surprisinglyuptonowtherearenotavailablestudies

investigating the impact of an update on the reference genome assembly on imputation

accuracy This assessment would be of fundamental importance considering that reJ

mappingSNPsdeterminesarearrangementofthehaplotypestructureofthegenotypes(in

somecasesSNPsarereJmappedtoadifferentchromosome)andthatthebovinereference

assembly is likely toberepeatedlyupdatedover timeThisworkwasaimedatassessing

the effect of different reference genome assemblies in cattle As test casewe compared

four different imputation methods from low (Illumina BovineLDv10 Beadchip) to

90

mediumhigh (Illumina BovineSNP50v2 Beadchip) density in the Italian Simmental

populationForeachmethodweassessedthe impacton imputationaccuracyofmapping

SNPs on the BTAU42 BTAU46 (wwwhgscbcmeduotherJmammalsbovineJgenomeJ

project)orUMD31(Ziminetal2009)cattlereferencegenomeassemblies

Atotalof900and121SimmentalbullsweregenotypedwiththeIlluminaBovineSNP50v1

and v2 Beadchips (Illumina Inc San Diego CA) respectively Illumina BovineSNP50v2

Beadchip (hereinafter named 54k) the official reference chip for the Italian Simmental

breedersassociation(ANAPRI)wasconsideredasreferenceSNPpanelTheSNPchiMpv1

tool(Nicolazzietal2014)wasusedtoobtainSNPcoordinatesonBTAU42UMD31and

BTAU46genomeassembliesForeachassemblySNPsnotcontainedinthe54kunmapped

orinthesexualchromosomeswerediscardedThefinaldatasetcontained5118952886

and51290SNPsinBTAU42UMD31andBTAU46respectively

FourimputationmethodswereconsideredPedImpute(Nicolazzietal2013)FindHapv2

(VanRadenetal2011)FImpute(Sargolzaeietal2014)andBeaglev332(Browningand

Browning2007)Thefirstthreeusepedigreeandpopulation(ie linkagedisequilibrium

LD) informationwhereas the fourth uses only LD information Performance of the four

methods testedacrossgenomicassemblieswas comparedusingwrongly imputedalleles

(ldquoerrorsrdquo)andallelicsquaredcorrelation(ldquoallelicJR2rdquo)statisticsAllelicJR2wasdefined

asthesquaredcorrelationbetweenimputedandtrue(eggenotyped)allelesmultipliedby

100 as in Browning amp Browning (2009) The 878 oldest sires were used as training

population whereas the 143 youngest bulls were used as test population SNPs not in

commonwith the lowdensity panelwere set tomissing in the test population The test

population was split in four scenarios individuals with sire and maternal grandsire

genotypedinthereferencepopulation(scenarioAJ84animals)individualswithonlythe

sire genotyped (scenario B J 34 animals) individuals with only the maternal grandsire

genotyped(scenarioCJ13animals)ornoneofcloserelativesgenotyped(scenarioDJ12

animals)Thepopulationusedwashighlystructuredwithnearlyhalfofthebullspresent

in the dataset (490) sired by 115 genotyped sires In addition 35 out of these 115

genotypedsiresweresiredby22bulls(eggrandsires)alsopresentinthedataset

DifferencesacrossassembliesarenothomogeneousFromBTAU42toBTAU4646SNPs

weremapped to different chromosomes and the average difference in physical position

withinchromosomeswasnearly025MbpHowevertherewasnosubstantialdifferencein

the order of markers In fact not considering unmapped SNPs on either of the

aforementionedassembliesadifferenceinthesequentialorderwasobservedforonly134

91

single (ie spread throughout the genome) SNPs On the contrary from UMD31 to

BTAU46 222 SNPsweremapped to different chromosomes and amuch larger average

differenceinSNPsphysicalpositionwasobserved(iemorethan2Mbp)Alsothenumber

ofSNPswithdifferentsequentialorderwasmuchlarger(ie1893SNPs)Howevermore

than50oftherearrangementsinvolvedblocksfrom3tonearly100markersAsaresult

slight differences in overall imputation accuracy from BTA46 to UMD31 rather than

between the two BTAU assemblies were expected across methods On the contrary if

rearrangements are due to issues in the assembly of scaffolds theymight have a direct

(negative) impact on local imputation performances In fact a preliminary analysis on

BTA1consideringlocalerroracrossassembliesandmethodssuggestatleasttwosuch

SNPsclustersat44and138MbponUMD31(FiguresS1S2andS3)

Considering allmethods and scenarios amaximum average variation of 02 inerrors

(Table1)andof09inallelicJR2(Table2)wasobservedacrossreferenceassembliesThe

standarddeviation(SD)oftheimputationaccuracywasstableacrossreferenceassemblies

withinmethodsOnthecontraryalargervariabilitywasobservedacrossmethods

PedImpute and FindHap showed a relatively large average allelicJR2 variability across

scenarios evaluated over the same reference assembly Only PedImpute showed a large

variation in SD across scenarios in errors and allelicJR2 with high variability on

scenarios C and D showing a low performance of thismethodwhen there are no close

relatives(egsiredam)genotypedinthereferencedatasetFImputeinsteadachievedthe

highestaccuracyacrossallreferencegenomesandscenariossurprisinglyobtainingitsbest

resultsJerrorsof14(06SD)andallelicJR2of955(26SD)JinscenarioCwhereonly

onerelativelycloserelativeisgenotyped(maternalgrandJsire)FImputeappeartobeable

todetectshorterhaplotypesfromdistantrelativesmoreaccuratelythananyothermethod

tested in this study thus it is lessdependenton theavailabilityofpedigree information

thantheothertwopedigreeJbasedmethods(Sargolzaeietal2010)

The large variability across methods within assembly was expected although the

differences obtained among PedImpute FindHap and Beagle were larger in this study

comparedtoasimilarstudyinItalianHolstein(Nicolazzietal2013)Thiscoulddependon

a number of factors First the sample size of the test population is small (especially for

groups C and D where less than 15 individuals were present) Second the interaction

between population structure and imputationmethodsmay have a role PedImpute and

FindHap rely much more heavily on pedigree information than FImpute and Beagle

assumesall individualsareunrelatedHowever the lattertwomethods indirectlybenefit

92

greatly of from highly structured populations with large number of relatives ndash close or

distantndashpresent inthereferencepopulation(Sargolzaeietal2014)Anotherinteresting

factor that influences the imputation accuracy may be the persistence of LD at long

distances this is much lower in the Italian Simmental than in the Italian Holstein

population(FigureS4)Informationonrelatives(eitherpedigreeinformationorgenotyped

relativesinthereferencepopulation)isexpectedtoplayabiggerroleifLDisconservedat

shorter distances This hypothesis is supported by the fact that although there is high

variability acrossmethods better performances of the pedigreeJbasedmethods are still

observedinscenariosAandB(andCforFImpute)

Theslightdifferencesinimputationaccuracyobservedacrossreferenceassembliesmaybe

explained by the aforementioned small variation in the SNP blocks across reference

assemblies irrespectively of the rearrangements of single SNPs Considering that SNP

blocksaremostlymaintainedweexpectthatbreedswithlargerLDpersistenceatlonger

distances (ie Holstein Brown) will obtain even better imputation accuracy across

assembliescomparedtowhatreportedinthisstudyinItalianSimmentalHoweverastill

relativelyunexploredaspectof imputationperformance is thedistributionof imputation

errors across the genomeNot considering genotyping errors (which are expected to be

randomly distributed in the genome) a cluster of consecutive markers with high

percentage of imputation error across methods are most probably identifying missJ

assembledregions in thegenome(FiguresS1S2andS3)Althoughthe impactonoverall

imputation accuracy is limited as results in this study show these errorsmight have a

large impact on any analyses relying on specific genomic coordinates (eg genomeJwide

associationstudies)Morecomprehensiveresultsincludingannotationstudiesandwhole

genomesequencingdataarerequiredtoconfirmandinvestigatetheseobservations

Toconcludeonlyasmalleffectofupdatingthereferencegenomeassemblyonimputation

accuracywas observed in Italian Simmental On the other hand as already reported by

other studies the variability of imputation accuracy estimates is much larger across

imputation methods The choice of the most suitable imputation method based on the

breedgeneticbackgrounditscharacteristicsandontheavailabledataisthereforecritical

AcknowledgementsThe research leading to these results has received funding from the European Unions

Seventh Framework Programme (FP72007J2013) under grant agreement ndeg 289592 ndash

93

Gene2FarmPartofthegenotypicdatausedinthisstudywasprovidedbyprojectldquoSelMolrdquo

fundedbytheItalianMinistryofAgriculture(MiPAAF)MMwassupportedbytheDoctoral

SchoolontheAgroJFoodSystem(Agrisystem)of theUniversitagraveCattolicadelSacroCuore

(Italy) FB was supported by the Marie Curie European Reintegration Grant

ldquoNEUTRADAPTrdquo(FP7JPEOPLEJ2009JRG)

ReferencesBrowningSRBrowningBL2007RapidandAccurateHaplotypePhasingandMissingJ

Data Inference for WholeJGenome Association Studies By Use of Localized

HaplotypeClusteringAmJHumGenet811084ndash1097

MaPBroslashndumRFZhangQLundMSSuG2013Comparisonofdifferentmethods

forimputinggenomeJwidemarkergenotypesinSwedishandFinnishRedCattleJ

DairySci964666ndash4677doi103168jds2012J6316

Marchini J Howie B 2010 Genotype imputation for genomeJwide association studies

NatRevGenet11499ndash511doi101038nrg2796

NicolazziELBiffaniSJansenG2013ShortcommunicationImputinggenotypesusing

PedImputefastalgorithmcombiningpedigreeandpopulationinformationJournal

ofDairyScience962649ndash2653doi103168jds2012J6062

NicolazziELPiccioliniMStrozziFSchnabelRDLawleyCPiraniABrewFStella

A 2014 SNPchiMp a database to disentangle the SNPchip jungle in bovine

livestockBMCGenomics15123doi1011861471J2164J15J123

PeiYLiJZhangLPapasianCJDengH2008Analysesandcomparisonofaccuracyof

differentgenotypeimputationmethodsPLoSONE3551

VanRadenPMOrsquoConnellJRWiggansGRWeigelKA2011Genomicevaluationswith

manymoregenotypesGenetSelEvol43

94

Tableamp1ampIm

putationampaccuracyampacrossampassem

bliesampforampPedImputeampFindH

apampFImputeampandampBeagleamppercentageampofampwronglyampim

putedamp

allelesamp

PedImputeamp

FindHapamp

FImputeamp

Beagleamp

BTAU

amp42amp

UMDamp31amp

BTAU

amp46amp

BTAU

amp42amp

UMDamp31amp

BTAU

amp46amp

BTAU

amp42amp

UMDamp31amp

BTAU

amp46amp

BTAU

amp42amp

UMDamp31amp

BTAU

amp46amp

Scenarioamp

Namp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

A1amp

84

20

07

21

07

21

07

32

09

32

09

32

09

15

09

15

09

15

09

24

11

25

11

25

11

B2amp

34

23

08

23

09

22

08

36

11

36

12

36

11

15

07

15

09

14

06

23

08

23

08

23

08

C3amp

13

98

41

98

41

98

41

51

10

52

10

51

10

14

06

14

06

14

06

23

06

23

06

24

06

D4 amp

12

88

49

89

49

88

49

54

11

56

12

54

11

16

11

17

12

16

11

26

15

26

15

26

14

1 Sireandm

aternalgrandsiregenotyped

2 Siregenotypedmate

rnalgrandsireno

tgenotyped

3 Sireno

tgenotypedmate

rnalgrandsiregenotyped

4 Sireandm

aternalgrandsireno

tgenotyped

Tableamp2ampIm

putationampaccuracyampacrossampassem

bliesampforampPedImputeampFindH

apampFImputeampandampBeagleampallelicJR2amp

PedImputeamp

FindHapamp

FImputeamp

Beagleamp

BTAU

amp42amp

UMDamp31amp

BTAU

amp46amp

BTAU

amp42amp

UMDamp31amp

BTAU

amp46amp

BTAU

amp42amp

UMDamp31amp

BTAU

amp46amp

BTAU

amp42amp

UMDamp31amp

BTAU

amp46amp

Scenarioamp

Namp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

A1amp

84

941

20

940

20

941

20

907

24

908

25

907

24

952

26

952

27

952

27

925

33

923

33

925

33

B2amp

34

935

24

934

25

935

24

896

30

896

34

895

32

954

20

954

21

954

20

929

24

928

25

928

24

C3amp

13

759

89

761

88

758

87

848

33

845

30

848

32

955

18

955

19

955

19

931

19

928

19

927

19

D4 amp

12

782

113

777

113

781

112

841

32

832

35

839

33

949

34

947

38

949

35

921

44

919

45

921

44

1 Sireandm

aternalgrandsiregenotyped

2 Siregenotypedmate

rnalgrandsireno

tgenotyped

3 Sireno

tgenotypedmate

rnalgrandsiregenotyped

4 Sireandm

aternalgrandsireno

tgenotyped

95

Supplementaryfiles

YoucanfindChapterͷsupplementaryfilesat

- DocTa(DoctoralThesisArchive)(httptesionlineunicattit)

- GoogleDrive(httptinyurlcomPhDMMChapter05)orscantheQRcode

SupplementaryFigureS1ErrorpercentageacrossmethodsforBTAU42onBTA1

SupplementaryFigureS2ErrorpercentageacrossmethodsforUMD31onBTA1

SupplementaryFigureS3ErrorpercentageacrossmethodsforBTAU46onBTA1

SupplementaryFigureS4Linkagedisequilibrium(LD)decayoverdistanceinItalian

SimmentalandHolstein

ThedottedlinewithblackdotsindicatesLDdecayoverdistanceinItalianHolstein

whereasthedashedlinewithwhitediamondsindicatesthesamevaluesforItalian

SimmentalItalianHolsteingenotypeswereasubsetofthedatausedinNicolazzietal

(2013)

96

CHAPTER6

QuestfordeleteriousrecessivevariantsinItalianHolsteinbullscombiningexomesequencingandhighdensitySNPanalysis

MMilanesi1etal

1IstitutodiZootecnicaUCSCviaEmiliaParmense8429122PiacenzaItaly

____________________Manuscript

Abstract

Outbreed species carry a genetic load of deleterious recessive variants Those

that are lethal survive only in the heterozygous state in the population while

otherssimplyreducethefitnessofindividualshomozygousforthevariantThe

long=termdestinyofthesemutationsistodecreaseinfrequencyduetonegative

selectionandeventuallydisappearbecauseofgeneticdriftHowever theymay

bemaintainedinlivestockpopulationsandevenincreaseinfrequencywhenin

linkage with favourable alleles for traits under selection or occurring in the

genomeofsireshavinghighgeneticvalueforproductivetraitsWesearchedfor

these variants combining exome sequence data from 20 Italian Holstein bulls

sampledfromoppositetailsofthemaleandfemalefertilityEBVdistributionthe

HighDensity(HD)SNPgenotypingof1009progenytestedItalianHolsteinbulls

and information existing in public databases Exome sequencing detected

202956 variants Among these 5167 were classified as ldquodeleteriousrdquo when

annotatedwithSIFTandAnnovarThesewerefilteredbypickingthosemapping

in regions out of equilibrium and lack of homozygotes in the progeny tested

Holstein bulls or having an asymmetric distribution in bulls plus=variant and

minus=variant for fertility EBV The resulting 193 variants in 163 genes were

furtherassessedbycomparativegenomicshaplotypeanalysisandfunctionFor

eachassessmentapositiveornegativescorewasgiventoevidenceinfavouror

against thevariant tobedeleteriousA totalof35variantshadapositive total

score These genes are involved in development cell motility and immune

responseSomeof themareknowntobeexpressed in the testiclesandappear

importantforspermfertilityinmodelspeciesothersareresponsibleforgenetic

defectsinhumanandmousecarryingmutationssimilartotheonesdetectedin

cattleThesegenevariantsarecandidatetobeamongthosethatconstitutethe

ldquogeneticloadrdquoofdeleteriousrecessivegenesintheHolsteinpopulation

100

Introduction

The presence of deleterious recessive variants is well tolerated by outbred

speciesWhenthesevariantsappeartheymayspreadandsurvivealongtimein

the population in the heterozygous state (Simmons and Crow 1977) It is

estimatedthatoutbredindividualshumanincludedcarryonaverageagenetic

loadofmorethan100loss=of=functionvariantsintheirgenome(MacArthuretal

2012)

Theeffectofdeleterious recessivevariantsappearswhendemographicevents

eggeneticbottlenecksreducethegeneticvariabilityofthepopulationorwhen

relatedindividualsareintentionallymatedforbreedingpurposesInthesecases

recessive deleterious reduce the fitness of individuals carrying them in the

homozygous state and in extreme case induce very severe genetic defects

premature death and abortion (McClure et al 2014 Sahana et al 2013

VanRadenetal2011)Theythereforecontributetoexplainthephenomenonof

inbreeding depression and its negative effect on fertility (Charlesworth and

Willis2009)

The search for deleterious recessive variants is particularly interesting in

Holstein cattle Thisbreed is experiencing anegative genetic trend for fertility

traits (Jamrozik et al 2005) it is worth noting that this is occurring while

inbreedingisunderstrictmonitoringandistheoreticallyincreasingataverylow

pace(about005peryear=Mrodeetal2009)Inadditionmatingisplannedto

minimise relationship between animals and the additive genetic variance for

mosttraitsndashincludingmilkproduction=isapparentlynotdecreasinginspiteof

the selection pressure applied by modern schemes (Biffani et al 2002)

indicatingthatoveralldiversityisonlyslightlydecreasing

SelectiontoincreasefertilityischallengingSelectioncriteriagenerallyadopted

to improve this trait are far from the biological mechanisms that govern

reproduction Inaddition thesemechanismsarecomplexandverysensitive to

environmental(management)conditions(Laneetal2013Wathesetal2014)

Asa result fertility traitshave loworvery lowheritability (Pryceetal1997)

andthereforelowresponsetoselectionandslowgeneticprogress

101

Forthesereasons thequest formolecularvariants influencing fertility traits is

on=goingsincetheavailabilityofmoleculartoolsforDNAanalysisAnumberof

investigations tested the effect of functional candidate genes running genome

wide association analyses to find QTLs for fertility traits using hundreds

microsatellite markers in structured families (Ashwell et al 2004) or tens of

thousand SNP panels in whole populations (Aston and Carrell 2009) The

availability of medium density SNP panels allowed the search for deleterious

recessivesevenwithouttheuseofphenotypictraitsbyscanningthegenomein

searchofregionslackingoneofthetwohomozygousclasses(Sahanaetal2013

VanRaden et al 2011) This approach has highlighted a number of genomic

regions likely carrying deleterious recessives This information has immediate

application inmolecular assisted breeding and a relevant scientific interest in

thesearchofcausativevariantsandassessmentoftheirfunctions

Thedevelopmentofnewsequencingtechnologieshasgreatlydecreasedthecost

of sequencing so that nowadays the sequencing of few target individuals is

becoming the gold standard to seek for causal mutations for rare monogenic

defectsinhumansandotherspecies(ChunandFay2009)Oftenonlytheexome

the protein coding part of the genome is sequenced searching for those

mutationsthatseverelyalterproteinstructureandorfunction(Bamshadetal

2011 Li et al 2013) A typical mammalian exome consists in a few tens of

millionnucleotides(around2ofthereferencegenome)comparedtothefew

billionofthewholegenomeThereforewiththesamenumberofreadsexome

sequencinghastheadvantageofamoreaccurategenotypingdueto thehigher

sequencing depth an easier data management lower computation load and

easier interpretation (Bieseckeretal2011Majewskietal2011) Indeed the

numbervariants found inexomesareone=twoordersofmagnitude lowerthan

thatdetectedinwholegenomesandmutationsaredetectedinannotatedgenes

orimmediatelynearbyOntheotherhandtheexomeapproachcanonlyidentify

variantson annotatedgenesVariants inunknownexonsor that occur innon=

coding regulatory features will be missed (Bamshad et al 2011) In humans

these are turning out to be of outmost importance for gene regulation and

downstreamphenotypic effects (ENCODEConsortium2012) but they are still

verypoorlyannotatedinlivestock

102

Herewesequencedtheexomesof20Holsteinbullsprogenytestedselectedfrom

the tails of male and female fertility EBVs (Estimated Breeding Value) Since

livestockfertilityisacomplextraitandcausativemutationscannotbesoughtby

the approach used in monogenic diseases we integrated the sequence

information with high density SNP chip data obtained on a much larger

populationofItalianHolsteinbullstodetectgenescarryingallelescandidateto

be deleterious recessives and confirm their likely deleterious effect by

comparativegenomics

Materialsandmethods

1SamplingandDNAextraction

Semen(straws)andbloodsampleswereprovidedbyItaliancertificatedartificial

insemination centres (UE Directive 88407CEE) thus ethics committee

approvalforthisstudyisnotrequired

A total of 20 ItalianHolstein progeny tested bullswere sampled for complete

exome sequencing Among them 19were chosen from the tails of female and

male fertilityEBV(EstimatedBreedingValue)distribution (Table1) tohave5

animals fromeach tail One animalwas to be in theminus=variant tail of both

traitsBullswereselectedasunrelatedaspossibleSamplingwaspossiblethanks

tothecollaborationwithANAFI(ItalianHolsteinBreederAssociation)

ANAFIdevelopedageneticevaluationfor fertilitycombiningdifferenttraitsas

explained by Biffani and colleagues (Biffani et al 2005) The traits are ldquodays

from calving to first inseminationrdquo ldquocalving intervalrdquo ldquofirst=service non return

rateto56daysrdquoldquoangularityrdquoandldquomatureequivalentmilkyieldat305daysrdquo

MalefertilityisexpressedasanEstimatedRelativeConceptionRate(ERCR)and

isameasureofconceptionrateofaservicesirerelativetoservicesiresofherd

mates(Soggiuetal2013)

103

Table1FemaleandmalefertilityEBVsofthe20animalssequenced

Animal EBVFemaleFertility EBVMaleFertility GroupfriA01 97 =111 NAfriA02 98 =166 MalLfriA03 115 0 FemHfriA04 114 0 FemHfriA05 114 0 FemHfriA06 86 0 FemLfriA07 98 195 MalHfriA08 105 213 MalHfriA09 95 =154 MalLfriA10 95 149 MalHfriA11 102 264 MalHfriA12 91 =475 FemLMalLfriA13 105 =179 MalLfriA14 102 207 MalHfriA15 99 =405 MalLfriA16 85 0 FemLfriA17 87 0 FemLfriA18 121 0 FemHfriA19 117 0 FemHfriA20 88 0 FemL

A total of 1009 Italian Holstein bulls were chosen from the entire population

balancing theirpresence in the tailsof theEBV forkgprotein (PROT) somatic

cell count (SCC) fertility (FERT) and functional longevity (LONG) for high

densitygenotyping(TableS1)

DNAwasisolatedfromwholebloodandsemenstrawssamplesinLGSlaboratory

(CremonaItalywwwlgscrit)usingtheGenomixkitaccordingtoLGSinternal

protocolDNAsforexomecaptureandsequencingwereextractedindependently

tosatisfy therequirement indicatedby IGA technologies (Udine Italy)at least

12mgofDNAataconcentration100ngulwithR260280higherthan18and

R260230near19

2Moleculardataproductionandqualitychecking

21Exomecaptureandsequencing

104

Exomes were captured using the ldquoSureSelected All exon Bovinerdquo (Agilent

Technologies) kit in early version Probes coordinates are based on UMD31

bovinereferencesequence(Ziminetal2009)anddesignedagainstNCBIexons

as well as predicted exons UTRs and miRNAs Number and distribution of

probesperchromosomearedetailedinTableS2

rdquoOvation Ultralow Library Systemsrdquo protocol was followed for library

preparation After quantification with Qubit (Life technologies) and quality

controlwith2100Bioanalyzer(AgilentTechnologies)DNAwassonicatedwith

Bioruptor (Diagenode) to obtain fragments between 150 to 200 bp in length

blunt end=repaired adenylated ligated with paired=end adaptors and purified

followingtheprotocolprovidedbythemanufacturer

LibrarieswerehybridizedwithldquoSureSelectedAllexonBovinerdquoprobesusingthe

Encore Target Capture Module (Nugen) SureSelect Target Enrichment and

Herculase II FusionEnzymewithdNTPCombo (AgilentTechnology) kit Then

capturedDNAwasamplifiedandtaggedwithlibraryspecificindexesQualityof

capturedsampleswastestedwithQubitand2100Bioanalyzer

Paired=end 2X100bp sequencing was performed on Illumina HiSeq2500

(IlluminaSanDiegoCA)atanexpectedcoverageof50X

22Sequencealignment

Adapters and reads shorter than 36bp were trimmed using Trimmomatic

(version032)(Bolgeretal2014)ForeachFastqfilethealignmentandpairing

of the reads to theUMD31 reference sequencewas performedwithBurrows=

WheelerAligner(BWA)(version077=r441)(LiandDurbin2009)applyingalso

trimmingoption (=q5) SAM files resultedwere converted intoBAM file using

SAMtools (version0118) (Lietal2009)Foreach individualBAM fileswere

merged in a single one with Picard (version 1111)

(httpsgithubcombroadinstitutepicard)Duplicatereadswereidentifiedand

markedwithPicardFinallyindividualBAMfileswereindexedwithSAMtools

105

Sequencedepthwasevaluatedsimultaneouslyonall20sequencedanimalsand

onlyonregionstargetedbyprobesusingDepthOfCoverageofGATK(McKennaet

al2010)

23ExomeVariantcallingandqualitycheck

Singlenucleotidepolymorphisms(SNPs)insertionsanddeletions(InDels)were

calledsimultaneouslyonall20sequencedanimalsandonlyonregionstargeted

byprobesusingUnifiedGenotyperofGATK(DePristoetal2011)

All variantswere filteredusingVariantFiltration ofGATK to removeoneswith

lowmappingandorsequencingqualityThechoiceoffiltersandthresholdswas

made in accordance to ldquoGATK best practicerdquo for exome sequencing and

distributionoffiltervalues

bull ldquoSNPclusterrdquoexcludedvariantifmorethan3variantswerefoundin10

bp

bull ldquoBaseQualityRankSumTestrdquoexcludedifitrsquoslowerthen=5ormorethan

5

bull ldquoMappingQualityRankSumTestrdquoexcludedif itrsquoslowerthen=7ormore

than6

bull ldquoRead Position Rank Sum Testrdquo excluded if itrsquos lower then =4 or more

than4

bull ldquoFisherStrandrdquoexcludedifitrsquosmorethan30

bull ldquoDepthrdquoexcluded if itrsquos lower then60X(ieaminimumvalueof3Xper

animals)ormorethan3500X

bull ldquoQualityrdquoexcludedifitrsquoslowerthen40

bull ldquoMappingqualityrdquoexcludedifitrsquoslowerthen30

Filters were applied to all variants but only bi=allelic SNPs were used in the

following analyses Genotypes with ldquoGenotype Depthrdquo lower than 5 and

ldquoGenotype Qualityrdquo lower than 20 were discarded The remaining ones were

converted intoldquoABrdquocode(A=referencealleleandB=alternativeallele)Then

data were transformed in PLINK (Purcell et al 2007) format using a home=

made python script To prepare the working dataset the final quality control

106

removedvariantswithmore than25missingdata (that iswithmore than5

animals out of 20 without genotypes) monomorphic (minor allele frequency

MAFlt1)ornotmappinginautosomesBasicpopulationgeneticsparameters

werecalculatedtoevaluatethepresenceofcrypticstructureinthedatathatmay

influencetheoutcomeofthedownstreamanalyses

In total 24390 SNPs identified in the exomes were also included in the HD

SNPchip panel HD genotype from nineteen out to twenty of the sequenced

animals sequenced was used to compare technologies Genotypes at common

locationswerecomparedacrosstechnologies toevaluatesequencequalityand

theeffectofsequencedepthongenotypecalling

24BovineHDGenotypingBeadChipdataandqualitycheck

AllanimalsweregenotypedusingtheBovineHDGenotypingBeadChip(Illumina

San Diego CA) Genotypes were quality controlled and filtered with standard

threshold Markers with more than 5 missing data and MAF lt 1 were

excluded Animals with more than 5missing data were also excluded Only

autosomal SNPs were retained Thereafter genotypes were phased and the

remainingmissingdata imputedusingBeaglev332(BrowningandBrowning

2007)

Ofthe20animalssequenced12hadalsoBovineHDGenotypingBeadChipand7

BovineSNP50 Genotyping BeadChip version 2 genotypes available For

concordance and haplotype analysis we imputed the 50K data to 800K as

previouslydescribedabove

3Strategyfordetectingcandidatedeleteriousvariants

Wesearchedforcandidatedeleteriousvariantsfollowinganintegratedapproach

that joined the analysis of i) exomes of a few Italian Holstein bulls having

plusvariant and minusvariant EBVs for a target trait ii) HD genotyping of a

population of 1009 progeny tested Italian Holstein bulls iii) the comparative

genomic analysis of 100 vertebrate species The strategy followed for the

107

analysisisshowninFigure1Inthefirststep(Search)welookedforcandidate

deleterious variants in the exomes In step2 (Filter)we filtered candidatesby

assessingwhichof thesevariantseitherhadanon=randomdistributionamong

animals having extreme EBVs or co=mapped with genomic regions carrying

outliersignals inthelargepopulationofprogenytestedbulls Instep3(Asses)

candidateswere further evaluated by i) assessing their conservation across a

panel of 100 vertebrate species ii) searching for associated haplotypes and

evaluating their frequencies in the population of progeny tested bulls iii)

evaluatingtheirassociationwithgeneticdefectsinhumanandmodelspeciesiv)

examining their expression profile and function when known Methods are

belowdescribedindetail

Figure1Strategyfollowedtoidentifydeleteriousvariants

108

31Step1Searchforcandidatedeleteriousvariants

Quest for candidate deleterious variants in exome sequence data by

variantannotation

WeannotatedallvariantsusingANNOVAR(Wangetal2010)andVEP(Variant

Effect Predictor) (McClure et al 2014) software They use the Ensembl gene

annotation database (UMD31 reference sequence release 74) to investigate

locationandpotentialeffectoncodingsequencesofallvariantsVEPintegrates

the SIFT algorithm (Kumar et al 2009) to predict the effect of amino acid

substitutiononproteinfunctionForeachsequencechangesequencehomology

and physic=chemical similarity between the reference amino acid and the

alternative is considered providing a score that evaluates the mutation

tolerabilityVariantswithscorelowerthan005wereclassifiedasldquodeleteriousrdquo

inaccordingtosoftwarerecommendations

SIFTalsocomparedallvariants found in theexomeswith thosepresent in the

dbSNPdatabase(version140)toidentifythosethatwerenovel

32Step2Filteringofcandidatedeleteriousvariants

2a Analysis of deleterious variant distribution among animals having

extremesEBVvaluesformaleandfemalefertilitytraits

Association between variants and the classification in the tails of the EBV

distributionwas estimated byFisherexact testwhich compares frequencies of

alleles (a vs A) in the two tails not assumingH=W equilibriumgenotypic test

that compares the frequency of genotypes (aa vs Aa vs AA) and the recessive

geneactiontestwhichtestsforthedistributionofonehomozygousclassversus

theothers(aavsaAAA)AlltestsareimplementedinPLINKfunctionsThe5

significance threshold was set empirically by permutation (N=100000) either

point=wise that is not corrected for multiple testing (EMP1 = Empirical

Probability1)andthereafterfamily=wisecorrectedformultipletesting(EMP2=

EmpiricalProbability2)

109

GenesexceedingEMP1were considered suggestive and thoseexceedingEMP2

significant candidates to be deleterious recessives and their characteristics

conservationacrossspeciesandfunctionfurtherinvestigated

2b Quest for regions candidate to carry deleterious variants in the

populationanalysedwiththeHDBeadChip

Basic population genetics parameters and Multi Dimensional Scaling were

calculated with GenABEL R package (Aulchenko et al 2007) to evaluate the

presenceofcrypticstructure inthedatathatmayinfluencetheoutcomeofthe

downstreamanalyses

ExpectedandobservedallelicandgenotypicfrequenciesandFisherexacttestof

Hardy=Weinberg proportions (Janis E Wigginton 2005) were calculated with

PLINKon theHDSNPdatasetValueswere averaged in slidingwindowsof10

markersThisvalue(10markers)waschosenaftertestingdifferentwindowsize

as represent a good compromise between noise reduction and resolution (not

shown)

Candidate regions containing deleterious mutations were identified following

twodifferentapproaches

In one approach hereafter refereed to as ldquoOut of Hardy=Weinbergrdquo approach

(OHW)markerssignificantlydeviatingfromofHardy=WeinbergequilibriumatP

le84710=8 (nominalvaluePle005Bonferronicorrected formultiple testing)

werefirstidentifiedWindowswithatleastthreesignificantmarkersbelowthis

thresholdandhavingahomozygotedeficitwereconsideredcandidatestocarry

deleterious mutations In this OHW approach we applied very stringent

constraints to reduce false positive discoveries due to the Wahlund effect

inducedbypopulationsubstructure(seeresultsandFigureS5)

In thesecondapproachhereafterreferred toas ldquoLackOfHomozygotesrdquo (LOH)

approach the differences between observed and expected frequencies of

homozygotesfortheminorallelewerecomputedandaveragedacrossmarkers

included in each sliding window Values were standardized and only regions

carrying three overlapping windows below =3Z score value were considered

110

candidates to carry deleterious mutations Thereafter significant windows

carryingatleastonemarkerincommonwerejoinedtodefineregionboundaries

ExomesequencingknowledgeandHDBeadChipdatawerecrossed inorder to

identifygenescarryingseverevariantslocatedwithinsignificantregionsortheir

proximity (50Kb upstream or downstream) These genes were considered

candidates to be deleterious recessives and their characteristics conservation

acrossspeciesandfunctionwasfurtherinvestigated

33Step3Assessmentofcandidatedeleteriousvariants

Genes carrying variations classified as deleterious by SIFT or ANNOVAR and

eitherassuggestivebyfertilityEBVtailsanalysisorassignificantbyOHWeLOH

analyses of the Holstein bull population were submitted to comparative

genomics population genetics and functional analyses In order to prioritize

genesvariants discoveredwe scored each evaluation using a subjective scale

ranging from =3 to 3 to measuring the evidence in favour (scores 1 to 3) or

against(scores=1to=3)thedeleteriouseffectoftheSNPsThescore0inallcases

indicated the absence of evidence or that a specific assessment failed The

completescorescalethereforerangedfrom=9to9

3aComparativegenomics

For comparative genomics purposes the position of eachmutationwas ldquolifted

overrdquo from UMD31 bovine coordinates to hg19 human coordinates using the

specifictoolatUCSC(Kentetal2002=httpgenomeucscedu)Orthologous

proteins were aligned using the UCSC genome browser and amino acid

conservation assessed in a set of 100 vertebrate species that included human

primates (11 species) euarchontoglires (egmouse14 species) laurasiatheria

(egsheep25species)afrotheira(egelephant6species)othermammals(eg

platypus5species)birds(egchicken14species)sarcopterygii(egturtle8

species) fish (eg zebrafish 16 species) Results of amino acid conservation

analysiswererecordedonlyifhomologousproteinfromatleast50speciescould

beretrievedandalignedThereafteraminoacidconservationwasscored3ifthe

111

aminoacidwasconservedinallvertebrates2ifconservedinallmammalsand1

ifconservedinthelargemajority(atleast90)ofmammalsAscoreof=1was

given when the amino acid was not conserved among species and a score =3

when both variants observed in cattle were carried by other species Finally

wheneithertheliftoverprocessortheproteinalignmentfailedthecomparison

wasconsiderednoninformativeandgivenascore0

3bSearchofhaplotypesassociatedtodeleteriousvariantsandanalysisof

HDpopulationdata

To acquire further evidence we searched for the haplotypes that carry the

deleterious variations and estimated their frequency in the

homozygousheterozygousstatuswithinthebullpopulationgenotypedwiththe

HDBeadChip

Weusedamultistepapproachtofindhaplotypescarryingdeleteriousvariantsi)

from the phased HD BeadChip data we extracted the haplotypes of the 20

sequencedanimalsintheregionofthevariant(100markersoneachside)ii)in

these 20 animals we searched the shorter haplotype able to distinguish

homozygous andor heterozygous animals for the deleterious variant from all

other haplotypes iii) we assessed the frequency of homozygotes and

heterozygotes for thehaplotype in thewholepopulationofbulls characterised

withtheHDBeadChip

Scoresinthiscasewere=3inthecaseofuniquehaplotypesshowinganumberof

homozygotesnonsignificantlydifferentfromthenumberexpectedfromHardy=

Weinbergproportions3inthecaseofuniquehaplotypesshowinganumberof

homozygotes significantly lower than expected 1 in the case of unique

haplotypesneverhomozygousandrare0whenevernouniquehaplotypecould

befound

3cFunctionalanalyses

With PantherDB (Mi et al 2010) resources we classified our gene list into

functionalcategoriesaccordingtotheGeneOntologyclassificationAllsuggestive

112

andcandidateEnsemblGeneIDsubmittedtothebulkanalysiswererecognizedand annotated Beyond the GO term characterization in order to understandpotentialdetrimentaleffectsofmutationsandalsopinpointpotentialcandidatesfor fertility traits we retrieved information for each gene according to thefollowing parameters i) the involvement in known Mendelian or complexdiseases fromOMIMdatabase ii) theavailabilityofknock=outmice iii) and theeventual expression in reproductive tissues from the Gene Expression Atlas(httpwwwebiacukgxa)(Table4)

Alsoforthesedataweusedthesamescaleattributingarangefrom=3pointstogenes having no relation with fertility viability and genetic defects and norelevant phenotypic effects in knock=out mouse models to 3 to gene closelyrelated to fertility and viability associated to genetic defects in human andaffectingfertilityorsurvivalwhenknocked=outInparticularascoreranging=1to+1wasattributedtogenesassociatedtohumansyndromes(=1=noassociationrecordedinOMIMdatabase0=noinformationinOMIM1=associationinOMIM)A score with same range to information collected from knock=out mousephenotypes(=1=knock=outmouseshowsnoimpairedphenotype0=noknockoutmouse has been produced 1= knock=out mouse shows impaired phenotype)Thesamescorewasattributedtomolecularfunctions(=1=functionunrelatedtofertilityandviability0=functionindirectlyrelatedasimmunityandmetabolism1=functiondirectlyrelatedasfertilityanddevelopment)

Results

Exomecaptureandsequencing

A totalof225414probeswereused for capturing thecattleexome (5355Mbabout 2 of UMD31 reference genomes) Mitochondrial (MT) and Ychromosome were not included in probe design The number of probes perchromosomespannedfrom2500onBTA27toalmost12000inBTA3(TableS2)

113

The average exome sequencing depth was 4058 slightly lower than the

expected50XAllanimalshadatleast24Xaveragegenomessequencedbutina

same animal the sequencing depth varied across chromosomes and regions

withinchromosomesThelowestchromosomeaveragevalueobservedwas10X

inBTA25oftheshallowersequencedanimalAveragecoverageperanimaland

chromosomeandoverallmeansareshowninFiguresS1

In total1613461402 raw100bppair=ends sequence readswereproducedA

total of 1425514112 100bp reads were mapped and properly paired to the

reference genome Paired reads per animal ranged between 41973510 and

138840392

Insynthesissequencingdepthwasvariableasexpectedandalmostallanimals

hadsomeregionsofscarceornosequencedepthbutoverallallexomeregions

were sufficiently covered to permit a comprehensive analysis of molecular

variantsoftheanimalsinvestigated

Step1Searchforcandidatedeleteriousvariants

Variantdetectionandannotation

Thesequencingdepthofallvariantscalledonthealignmentofallreadsacross

animalsfollowedaChi=squaredistributionAveragedepthwas79694spanning

from2Xto5000Xwithpeakfrequencyclassat150=250X(FigureS2)

A multistep approach was followed to clean the exome sequence dataset

Cleaning process steps and the number onmarkers retained in each step are

describedinTable2

114

Table2Typeandnumberofmarkersdetectedandretainedafterdatasetcleaning

TypeofvariationNumberofvariations(N)

Rawdata

Aftervariantcleaning

Aftergenotypecleaning(onlyautosomal)

Complex 1213 963 0Deletion 6938 5031 0Insertion 5510 3656 0MultiallelicComplex 5710 3995 0MultiallelicSNP 803 512 0SNP 242988 203079 164277Total 263162 217236 164277whencomparedtothereferencegenome

Atotalof263162variantswereidentifiedAmongthese217236(8255)wereretainedafterapplyingthevariantfilters(seematerialsandmethodssection)

According to GATK classification the vast majority (9348) of variationsidentified were biallelic SNPs followed by insertions and deletions (InDels40) more complex variations and multiallelic SNPs (252) Indels andcomplexvariationsalthoughlessfrequentthanSNPsaffectedarelevantportionoftheexome(49589bp)Thenumberofvariantsdetectedperchromosomewassignificantly correlated to the number of probes used (r=08761P=228x10=10)andofbasescaptured(r=08424P=53x10=19)perchromosomehoweverBTA23BTA27 BTAX had an outlier behaviour the former two containing highervariation(450and445variantsperKb)andthelatterlower(084variantsperKb)thantheaverage(322variantsperKb)

Bialleic SNPswere further filtered for genotype quality (sequence depth gt 5Xandsequencequalitygt20)andBTAXmarkersThefinalautosomalSNPdatasetcounted164277SNPsAmongthese17829biallelicSNPs(109)describedforthefirsttimehavebeensubmittedtodbSNP

Variantsrsquo annotation is shown in Table 3a Variantswere detected in differentgenedomains(exonsintrons3rsquountranslatedregions)aswellasupstreamanddownstreamgenesMorethan31ofthevariationsweredetectedinexonsandamong them more than 41 have the potential to have an effect on genefunctionbyaffectingsplicingoralterproteinstructure(Table3b)

115

Table 3 A) number of autosomal SNPs annotated in each gene region B) number and

classificationofautosomalSNPsalteringgenefunction

A)

Generegion NofSNPs(autosomes)Downstream 1170Exonic 51293Intergenic 49901Intronic 58116ncRNA_exonic 160ncRNA_intronic 3ncRNA_splicing 2Splicing 184Upstream 1445Upstreamdownstream 54UTR3 1345UTR5 604Total 164277

B)

EffectofexonicSNPs NofSNPs(autosomes)Nonsynonymoussinglenucleotidevariation 20839Stopgainsinglenucleotidevariation 213Stoplosssinglenucleotidevariation 10Synonymoussinglenucleotidevariation 30226Unknown 5Total 51293

whencomparedtothereferencegenome

Amongthe51293exonicSNPs4166in2978geneswereclassifiedldquodeleteriousrdquo

by SIFT Among these 3210 in 2356 genes were observed at least twice

ANNOVARidentified223stopgain(213)orloss(10)variantsin217genes

Theaverageconcordanceat24390SNPs in commonbetweenexomesand the

HD Beadchip was 9716 (Table S3) Discordance was due to either alleles

detected by sequencing andnot detected by genotyping (exome=heterozygote

Beadchip=homozygote) and vice=versa (exome=homozygote

Beadchip=heterozygote) Discordances were randomly distributed among

markers and inversely correlated to sequencing depth (Figure S3) indicating

that most errors are likely occurring during sequencing even if occasional

116

Beadchip failure indetectingsomeallelecannotbeexcludedTwoanimalshad

anoutlierbehaviouronehaving846andthesecond719concordanceBoth

animalswereretainedforvariantdiscoverywhiletheywereexcludedfromthe

analysesofEBVtailsWithoutthesetwooutlieranimalsthemeanconcordance

increasesto9938

Multidimensional scaling confirmed the absence of a substructure in animals

sequenced(FigureS4)

Step2Filteringofcandidatedeleteriousvariants

AnalysisofEBVtails

Fisherexact test isgenerallyused to test for theassociationbetweenavariant

and a Mendelian disease Samples are divided in cases and controls and the

association between alleles or genotypes and the disease is tested under

different inheritance models Here we used the same method to test for

association between allelesgenotypes and animals from opposite tails of the

EBVdistributionfori)malefertility(N=4high+4low)ii)femalefertility(N=5

high+4low)andiii)maleandfemalefertility(N=9high+8low)

Genotypes were tested under all inheritance models implemented in PLINK

Giventhelownumberofsamplesnovariantwassignificantaftercorrectionfor

multiple comparisons (EMP2) and only suggestive variants could be found

(significant at 5 at EMP1)We considered suggestive of carrying deleterious

allelesalistof101genescarrying112variants(TableS4)

Detectionofregionscarryingdeleteriousmutationfrom800KSNPdata

After applying quality control parameters 26653 and 146991 markers were

removed from the original dataset for excessive missing data and MAF

respectivelyInaddition17individualsremovedforlowgenotypingrateAswith

exomes only autosomal SNPswere retained for the analyses thus obtaining a

workingdatasetof590680SNPsand992animals

117

MDS(FigureS5)analysis indicates that thedatasethassomesubstructure that

mightaffectdownstreamanalysesWeaccountedfortheriskoffindingmarkers

outofHardy=WeinbergproportionsduetotheWahlundeffectbychoosingvery

stringentsignificancethresholdintheOHWanalyses

To detect regions candidate to carry deleterious recessivemutations we used

two approaches the ldquoOutside Hardy Weinbergrdquo (OHW) and the ldquoLack Of

Homozygotesrdquo (LOH) approaches (see materials and methods) The two

approaches complement each other in the detection of regions carrying less

homozygotes than expected OHW identifies regions in which a few markers

have extreme homozygote deficit while LOH detects regions in which many

markershaveamoderatedeficit

In the SNP dataset 1884 markers were significantly out of Hardy=Weinberg

equilibrium(Ple84710=8)Atotalof485windowswereidentifiedwithatleast

3 markers out of 10 exceeding this threshold Among these 84 contained

markerssignificantbecauseofhomozygotedeficitThese84windowsidentified

11OHWcandidateregionsin9chromosomes(TableS5)

TheLOHapproachidentified2335windowsbelow=3Zscoreofthestandardised

differencebetweenobservedandexpected frequencyofhomozygotes forMAF

Theseidentified326LOHcandidateregionsspreadonallautosomes(TableS6)

Onlyregionsformedbyatleast3singlewindowswereretainedreducingLOH

candidateregionsto240

Deleterious variants (SIFT deleterious and stop codon gainedlost) were

intersectedwithsignificantOHWandLOHregionSevenofthesemappedwithin

OHW sliding windows 36 within LOH sliding windows Among these 5 were

located in regions significant in both OHW and LOH Other 51 deleterious

variants mapped in the immediate vicinity (+=50Kb) of significant windows

These 83 variants affect 66 unique genes candidate to carry deleterious

recessivealleles(TableS7)

Five regionswere common toOHWandLOHoneonBTA4 (LOH=74959610=

75151417 OHW=74970653=75151417) two on BTA7 (LOH=9626435=

10099512 OHW=9628735=10099512 LOH=10575691=11171132

118

OHW=10486424=11087441) one on BTA15 (LOH=80299924=80928311

OHW=80299924=80921274) and one on BTA18 (LOH=28122373=

28185525OHW=28106022=28164693)

Step3Assessingcandidatedeleteriousvariants

After the two filtering step previously described a total of 191 SNPs were

retained out of the 3433 classified as deleterious by SIFT and Annovar and

analysed further These are carried by 163 genes Eighteen genes carried 2

candidateSNPsthreegenes3SNPsandonegene5SNPsInaddition4SNPsin

four geneswere identified as potential candidates in both the analysis of EBV

tailsandofbullpopulation

A totalof12SNPs introducedstopcodonswhile the remaining inducedamino

acid substitutionsor splicing variants In the filtereddataset theproportionof

stop codons on total variation (63) was only slightly higher than the

proportion found in the entire dataset (51 225 stop codon out of 4391

deleteriousSNPs)

Comparativegenomics

Proteinmultiplealignmentwassuccessfulin146outof191comparisonsIn26

casesthemostfrequentaminoacidincattlewasveryhighlyconservedeitherin

allvertebrates(N=9)orinallmammals(N=17)Inother23casestheaminoacid

was conserved in at least 90 of mammalian species In the last class were

generally included amino acids that changed in mammalian species

phylogenetically verydistant from cattle (eg platypus and related species) In

97 instances the amino acid could change quite freely and occasionally both

variantsobservedincattlewerecarriedbyotherspecies

HaplotypeanalysisofHDpopulationdata

The search for haplotypes associated to the191 variants for further testing in

the population identified 101 unique and invariant haplotypes common to all

119

carriersofatargetvariantanddifferentfromallthosefoundinnoncarriersIn

additional36casesanoncompletelyinvarianthaplotypecouldbeassociatedto

themutationConverselyin54instancesareliablehaplotypecouldnotbefound

(table S8) In total 38 haplotypes had no homozygous individuals in the

population Thiswas expected in the case of 33 rather rare haplotypes (fewer

than 100 heterozygotes observed in the population) Conversely 5 haplotypes

were rather frequent (from 170 to almost 300 heterozygotes observed in the

population)andtheexpectationwastofind78to222homozygousindividuals

in thepopulation insteadof thenone foundAnothervarianthadaremarkable

high frequency (35) and a clear outlier behaviour Only 11 homozygous

individualswereobservedoutofthe121expected

Functionalanalysisofcandidategenes

To assess the potential deleterious effects of the variants identified we also

retrieved functional information for each gene according to the following

parametersi)associationtoknownMendelianorcomplexdiseaseslistedinthe

OMIM database ii) availability and effect of gene knock=out in mice iii)

expression profileswith particular attention in reproductive tissues from the

GeneExpressionAtlas(Table4)

Twenty genes out of 163 have a recognized phenotype in humans These are

mostlyseveresyndromesUnfortunatelyonlyafewgeneshaveanull=miceline

producedwithphenotypicdataavailable

According to the expression data available from Ensembl these genes are

generally (with someexceptions)expressed inavarietyof tissueswitha clear

tendencytobehighlyexpressedintestis

120

Tableamp4Informationon163genesanalyzedconcerningeventualphenotypesinmanorknockoutsinmouseHeaderampcodesAbrainBcolonCheartDkidney

EliverFlungGskeletalmuscleHspleenItestisColoursrepresentthegenelevelofexpressionindifferenttissuefromlow(whitegrey)tohigh(blue)NCin

FunctioncolumnmeansldquoNotClassifiedrdquo

Geneamp

Geneampnam

eampHum

anampdefectamp

Knock5OutsampMouseamp

MouseampPhenotypeamp

AampBamp

CampDamp

EampFamp

GampHamp

IampFunctionamp

ENSBTAG00000000079

CCSAP

Cells

9

1

03

1

07

2

06

2

2

NC

ENSBTAG00000000149

USP40

Cells

6

7

2

7

5

7

2

8

9

fertility

ENSBTAG00000000414

FUT5

Nomatch

37

1

cellular

process

ENSBTAG00000000918

ERMARD

Periventricular

nodularheterotopia

6Xlinked

periventricular

heterotopia6q

terminaldeletion

syndrome

Periventricular

nodularheterotopia

Nomatch

5

4

4

9

6

7

4

9

17

cellular

process

ENSBTAG00000000998

SLC39A6

Cells

6

9

2

7

4

10

1

10

9

cellular

process

ENSBTAG00000001133

VWA8

Mice

2

3

6

7

10

3

3

4

4

blood

metabolism

ENSBTAG00000001140

IAH1

httpswwwmousephenotypeor

gdatagenesMGI1914982sect

ionassociations

BOTHSEXadiposetissue

behaviourneurological

growthsizebodyFEMALE

homeostasismetabolismMALE

limbsdigitstailskeleton

9

18

30

60

23

19

12

17

30

cellular

process

ENSBTAG00000001262

IRGQ

Cells

6

1

5

3

06

2

5

04

2

immunity

ENSBTAG00000001286

ELMO3

httpswwwmousephenotypeor

gdatagenesMGI2679007sect

ionassociations

Decreasedstartlereflex

02

27

14

7

7

02

1

fertility

ENSBTAG00000001291

OR8H3

Nomatch

fertility

ENSBTAG00000001500

FIGNL1

Cells

08

2

01

05

07

08

04

2

3

cellular

process

ENSBTAG00000001505

GRK4

Cells

10

7

09

4

6

3

2

6

218

signalling

ENSBTAG00000002220

SPTLC1

Hereditarysensory

andautonomic

neuropathytype1

No

17

27

3

28

12

28

5

19

6

metabolism

ENSBTAG00000002288

NT5DC4

Nomatch

02

cellular

process

ENSBTAG00000002289

NLRP14

No

04

cellular

process

ENSBTAG00000002660

CCDC11

Situsinversustotalis

Nomatch

3

1

06

2

07

3

03

2

54

fertility

121

Geneamp

Geneampnam

eampHum

anampdefectamp

Knock5OutsampMouseamp

MouseampPhenotypeamp

AampBamp

CampDamp

EampFamp

GampHamp

IampFunctionamp

ENSBTAG00000002809

CAPN10

AutosomalRecessive

MentalRetardation

DIABETESMELLITUS

NONINSULIN

DEPEND

ENT1

Cells

2

13

41

33

29

immunity

ENSBTAG00000002966

DNAJC13

Youngadultonset

Parkinsonism

No

5

52

94

62

84

fertility

ENSBTAG00000003231

LIM2

Pulverulentcataract

Cells

01

02

01

02

cellular

process

ENSBTAG00000003508

CFAP69

httpswwwmousephenotypeor

gdatagenesMGI2443778sect

ionassociations

Abnormalcirculatinginsulinlevel

MaleInfertilityDecreasedmean

plateletvolum

eIncreasedleanbody

mass

203

07

1

01

7NC

ENSBTAG00000003523

UGT2B15

Nomatch

34

metabolism

ENSBTAG00000004081

FAT3

No

2

03

01

01

05

signalling

ENSBTAG00000004555

LRP2

DONN

AIBARROW

SYND

ROME

Intellectualdisability

Cells

56

3

01

disease

ENSBTAG00000004772

THEM

4

Cells

8

16

295

27

52

63

metabolism

ENSBTAG00000004860

SLC27A6

Cells

6

02

10

602

207

01

immunity

ENSBTAG00000005021

SEMA5A

Monosom

y5p

Cells

9

04

89

13

08

08

6immunity

ENSBTAG00000005170

GPR114

Mice

101

02

01

09

6

01

fertility

ENSBTAG00000005248

OFCC1

No

01

01

03

01

01

01

03

NC

ENSBTAG00000005340

SERGEF

Cells

7

10

35

15

58

23

cellular

process

ENSBTAG00000005357

VPS11

Mice

14

13

619

717

815

15

cellular

process

ENSBTAG00000005466

C8H8orf58

Nomatch

05

01

08

02

103

02

04

NC

ENSBTAG00000005514

TRIO

Cells

17

66

10

210

88

7immunity

ENSBTAG00000005633

RGNEF

httpswwwmousephenotypeor

gdatagenesMGI1346016sect

ionassociations

Vertebralfusion

36

214

05

801

08

1cellular

process

ENSBTAG00000006021

CEP250

Mice

3

22

43

41

88

metabolism

ENSBTAG00000006027

USP34

Mice

11

63

74

88

912

fertility

ENSBTAG00000006532

CDK5RAP2

Autosomalrecessive

primary

microcephaly

httpswwwmousephenotypeor

gdatagenesMGI2384875sect

ionassociations

skeletonimmunesystem

growthsizebodyhem

atopoietic

system

5

34

92

74

784

developm

ent

ENSBTAG00000007001

SLC26A1

No

03

01

02

32

903

01

07

metabolism

ENSBTAG00000007340

LOC101906349

Nomatch

NC

ENSBTAG00000007568

MEPE

Cells

cellular

process

ENSBTAG00000007910

EXOC7

Cells

13

24

23

22

12

22

15

22

28

cellular

process

122

Geneamp

Geneampnam

eampHum

anampdefectamp

Knock5OutsampMouseamp

MouseampPhenotypeamp

AampBamp

CampDamp

EampFamp

GampHamp

IampFunctionamp

ENSBTAG00000008650

CAMK1D

Cells

24

2

01

06

1

2

03

7

3

signalling

ENSBTAG00000008797

Uncharacterized

Nomatch

09

04

07

04

NC

ENSBTAG00000008871

DDX4

Cells

02

01

01

132

cellular

process

ENSBTAG00000008996

SFI1

Mice

2

1

06

3

04

3

2

5

NC

ENSBTAG00000009014

UPK1B

Mice

2

2

2

5

4

4

2

7

3

fertility

ENSBTAG00000009030

NOX5

Nomatch

04

01

8

01

immunity

ENSBTAG00000009033

TEX37

httpswwwmousephenotypeor

gdatagenesMGI1921471sect

ionassociations

Abnormalstartlereflex

03

83

NC

ENSBTAG00000009064

CSF2RB

Congenitalpulmonary

alveolarproteinosis

Mice

01

2

04

06

2

23

03

17

05

signalling

ENSBTAG00000009144

BPIFA2A

Nomatch

6

06

05

metabolism

ENSBTAG00000009337

PNMAL1

Cells

8

05

01

3

02

03

03

04

3

Immunity

ENSBTAG00000009444

SLC15A5

Cells

01

cellular

process

ENSBTAG00000009568

KANK2

Cells

1

17

30

9

3

31

9

10

3

cellular

process

ENSBTAG00000009569

DOCK6

AdamsOliver

syndrome

Cells

2

5

10

12

5

15

5

6

21

metabolism

ENSBTAG00000009836

CHGA

Mice

140

555

01

2

02

01

07

1

metabolism

ENSBTAG00000010522

Uncharacterized

Nomatch

01

02

02

03

2

metabolism

ENSBTAG00000010601

DOLK

Familialisolated

dilated

cardiomyopathy

No

4

8

6

22

9

6

5

5

12

metabolism

ENSBTAG00000010847

FOXN4

Cells

01

01

metabolism

ENSBTAG00000010954

ART3

Cells

02

1

55

01

08

2

10

06

71

metabolism

ENSBTAG00000011011

SSH2

No

3

09

5

4

2

2

6

7

10

cellular

process

ENSBTAG00000011387

NAT9

Cells

6

7

3

8

6

8

4

6

6

metabolism

ENSBTAG00000011420

CA9

Cells

04

3

02

2

03

1

01

2

56

metabolism

ENSBTAG00000011433

RGP1

httpswwwmousephenotypeor

gdatagenesMGI1915956sect

ionassociations

Completepreweaninglethality

7

19

11

18

8

12

9

8

25

cellular

process

ENSBTAG00000011622

C18orf21

Mice

9

10

6

6

2

10

5

10

34

NC

ENSBTAG00000011683

NUMB

Cells

21

25

8

35

17

47

6

8

19

immunity

ENSBTAG00000011684

RAB11FIP1

Cells

6

01

8

09

11

02

1

4

NC

ENSBTAG00000012053

ACCSL

No

03

01

cellular

process

ENSBTAG00000012314

LDLR

Homozygousfamilial

hypercholesterolemia

Cells

8

33

11

21

17

24

2

15

2

cellular

process

123

Geneamp

Geneampnam

eampHum

anampdefectamp

Knock5OutsampMouseamp

MouseampPhenotypeamp

AampBamp

CampDamp

EampFamp

GampHamp

IampFunctionamp

ENSBTAG00000012326

LOC101906218

Nomatch

11

01

01

09

01

NC

ENSBTAG00000012458

PSAPL1

Cells

01

NC

ENSBTAG00000012565

PCNX

No

3

32

32

33

48

developm

ent

ENSBTAG00000012833

ATG2B

No

4

21

21

21

29

NC

ENSBTAG00000012921

KIF24

Mice

05

07

03

08

02

202

214

metabolism

ENSBTAG00000013568

UEVLD

Cells

5

41

53

42

214

metabolism

ENSBTAG00000013685

KRT31

Mice

cellular

process

ENSBTAG00000013753

DDRGK1

Mice

22

32

18

56

55

25

14

22

37

NC

ENSBTAG00000013789

OR6K6

No

NC

ENSBTAG00000013838

ALLC

Mice

02

01

78

metabolism

ENSBTAG00000013916

HERC2

Mentalretardation

autosomalrecessive

38SKINHAIREYE

PIGM

ENTATION

VARIATIONIN1

Developm

entaldelay

withautismspectrum

disorderandgait

instability

No

8

74

83

74

65

metabolism

ENSBTAG00000014001

MAP1A

Cells

136

23

22

202

43

352

cellular

process

ENSBTAG00000014058

LDB2

No

28

33

12

825

410

3cellular

process

ENSBTAG00000014284

ALPK2

No

01

10

01

304

02

cellular

process

ENSBTAG00000014750

EPB41L4A

httpswwwmousephenotypeor

gdatagenesMGI103007secti

onassociations

Decreasedcirculatingglucoselevel

Abnormalconeelectrophysiology

Immunesystem

s1

22

20

07

12

13

43

NC

ENSBTAG00000014752

AKAP11

httpswwwmousephenotypeor

gdatagenesMGI2684060sect

ionassociations

Nopheno

24

72

63

306

68

cellular

process

ENSBTAG00000014768

ZNF786

Cells

3

23

42

48

33

cellular

process

ENSBTAG00000014794

SCYL3

Cells

6

10

59

613

516

26

cellular

process

ENSBTAG00000015027

Clusterin

httpswwwmousephenotypeor

gdatagenesMGI88423sectio

nassociations

Aggressionvertebralfusion

05

03

03

02

05

3NC

124

Geneamp

Geneampnam

eampHum

anampdefectamp

Knock5OutsampMouseamp

MouseampPhenotypeamp

AampBamp

CampDamp

EampFamp

GampHamp

IampFunctionamp

ENSBTAG00000015210

LOXHD1

DEAFNESS

AUTOSOMAL

RECESSIVE77

Autosomalrecessive

nonsyndromic

sensorineuDominant

lateonsetFuchs

cornealdystrophy

deafnessautosomal

recessivetype77

No

3

03

NC

ENSBTAG00000015294

COX10

Leighsyndrome

Mitochondrial

complexIVdeficiency

(MTC4D)

Cells

4

6

24

5

2

3

14

3

16

cellular

process

ENSBTAG00000015335

BAI3

No

38

03

8

01

01

2

signalling

ENSBTAG00000015490

HS1BP3

Cells

1

10

5

9

4

8

2

1

37

cellular

process

ENSBTAG00000015602

C7H5orf45

Nomatch

10

cellular

process

ENSBTAG00000015839

MAP4

No

27

85

104

35

11

100

82

46

61

cellular

process

ENSBTAG00000015912

DMKN

No

01

04

03

NC

ENSBTAG00000015980

FASN

AutosomalRecessive

MentalRetardation

Cells

19

29

7

12

6

39

6

7

22

metabolism

ENSBTAG00000015985

GPR20

httpswwwmousephenotypeor

gdatagenesMGI2441803sect

ionassociations

Anatomicaldefectsvarioustissues

03

02

08

01

09

signalling

ENSBTAG00000016495

LINS

AutosomalRecessive

MentalRetardation

Cells

6

3

1

5

3

3

1

5

12

NC

ENSBTAG00000016691

LACC1

Mice

2

9

1

9

9

8

08

7

27

NC

ENSBTAG00000017096

ERICH1

Cells

6

4

2

4

2

5

2

5

13

metabolism

ENSBTAG00000017165

MATN2

No

2

7

2

20

08

3

2

2

7

cellular

process

ENSBTAG00000017418

RRP1B

Cells

6

4

4

5

3

7

3

12

159

cellular

process

ENSBTAG00000017436

ALS2CR11

Cells

01

37

disease

ENSBTAG00000017505

PAXIP1

Cells

2

3

2

4

2

4

2

3

12

signalling

ENSBTAG00000017576

GPR144

Nomatch

02

signalling

ENSBTAG00000018528

USP45

No

1

1

04

1

05

06

02

2

04

cellular

process

ENSBTAG00000018810

THBS2

INTERVERTEBRAL

DISCDISEASE

Cells

1

2

04

09

1

09

04

1

2

fertility

ENSBTAG00000019072

PSD4

Cells

7

8

1

1

7

02

6

4

cellular

process

125

Geneamp

Geneampnam

eampHum

anampdefectamp

Knock5OutsampMouseamp

MouseampPhenotypeamp

AampBamp

CampDamp

EampFamp

GampHamp

IampFunctionamp

ENSBTAG00000019265

AGK

Sengerssyndrom

e

Congenitalcataract

hypertrophic

cardiomyopathy

mitochondrial

myopathy

Cells

7

36

43

52

65

metabolism

ENSBTAG00000019752

BPIFA2B

Nomatch

01

1metabolism

ENSBTAG00000019799

FCAM

R

Cells

07

01

02

01

2

7immunity

ENSBTAG00000019961

ATP4B

Cells

12

03

1

05

cellular

process

ENSBTAG00000020414

DHX37

No

3

33

73

73

413

cellular

process

ENSBTAG00000021073

KIAA1549

PILOCYTIC

ASTROCYTOM

ASOMATIC

httpswwwmousephenotypeor

gdatagenesMGI2669829sect

ionassociations

behaviourneurological

homeostasism

etabolism

hematopoieticsystem

3

04

23

01

21

04

3NC

ENSBTAG00000021233

OR5B17

No

fertility

ENSBTAG00000021375

OR10Q1

No

fertility

ENSBTAG00000021643

EFCAB12

Cells

01

1

9cellular

process

ENSBTAG00000021645

MBD4

Cells

8

43

82

51

10

6cellular

process

ENSBTAG00000022509

DNAH

9

No

02

5

05

5metabolism

ENSBTAG00000022858

LOC783561

Nomatch

fertility

ENSBTAG00000026247

PKHD1L1

Cells

02

01

cellular

process

ENSBTAG00000026323

LYSB

Nomatch

68

metabolism

ENSBTAG00000026350

SPATC1

Cells

05

06

01

01

01

39

NC

ENSBTAG00000027390

RTFDC1

Mice

28

19

21

20

922

13

20

49

NC

ENSBTAG00000031115

Uncharacterized

Nomatch

01

cellular

process

ENSBTAG00000031718

OGFR

Mice

6

17

612

517

39

1fertility

ENSBTAG00000032264

Uncharacterized

Nomatch

52

4

22

NC

ENSBTAG00000032481

DAPL1

No

1

6

07

01

13

10

cellular

process

126

Geneamp

Geneampnam

eampHum

anampdefectamp

Knock5OutsampMouseamp

MouseampPhenotypeamp

AampBamp

CampDamp

EampFamp

GampHamp

IampFunctionamp

ENSBTAG00000032527

ERCC6

CEREBROOCULOFACI

OSKELETAL

SYND

ROME1

COCKAYNE

SYND

ROMEBDe

SanctisCacchione

syndromeMACULAR

DEGENERATION

AGERELATED5UV

SENSITIVE

SYND

ROME1COFS

syndromeCockayne

syndrometype1

Cockaynesyndrome

type2Cockayne

syndrometype3UV

sensitivesyndrome

Cockaynesyndrome

typeB(CSB)De

SanctisCacchione

syndrome(DSC)UV

sensitivesyndrome

(UVS)cerebrooculo

facioskeletal

syndrometype1

(COFS1)

Mice

3

205

22

207

42

cellular

process

ENSBTAG00000032821

SCEL

No

01

02

17

04

NC

ENSBTAG00000033954

PRPS1L1

No

35

metabolism

ENSBTAG00000034991

SLX4IP

Cells

6

21

32

207

33

NC

ENSBTAG00000035226

TOR1AIP1

Cells

9

12

513

816

716

83

NC

ENSBTAG00000035309

OR4F15

No

fertility

ENSBTAG00000035708

LOC527795

Nomatch

01

01

23

metabolism

ENSBTAG00000035985

LOC101909743

OR8K1

No

fertility

ENSBTAG00000036019

RIPPLY3

httpswwwmousephenotypeor

gdatagenesMGI2181192sect

ionassociations

hemopoieticsystem

01

04

01

9

01

malformation

ENSBTAG00000037399

Uncharacterized

Nomatch

2

92

4

24

05

5signalling

ENSBTAG00000038606

DBDR

No

02

01

signalling

ENSBTAG00000038831

PHYHD1

No

2

18

21

23

30

18

16

15

1metabolism

ENSBTAG00000039110

OR8U1

No

fertility

ENSBTAG00000039117

FAM179B

Cells

14

206

31

308

33

cellular

process

ENSBTAG00000039520

SIRPB1

Cells

4

08

32

516

05

12

01

signalling

127

Geneamp

Geneampnam

eampHum

anampdefectamp

Knock5OutsampMouseamp

MouseampPhenotypeamp

AampBamp

CampDamp

EampFamp

GampHamp

IampFunctionamp

ENSBTAG00000040568

Uncharacterized

Nomatch

4

307

54

33

25

cellular

process

ENSBTAG00000045558

LOC100299084

Nomatch

1

22

110

10

09

52

fertility

ENSBTAG00000045588

LOC100298356

Nomatch

metabolism

ENSBTAG00000045709

LOC514057

Nomatch

fertility

ENSBTAG00000046175

Uncharacterized

Nomatch

01

01

cellular

process

ENSBTAG00000046228

Olfactory

receptor

Nomatch

fertility

ENSBTAG00000046239

C10orf12

No

03

104

07

108

06

209

cellular

process

ENSBTAG00000046285

OR8I2

No

fertility

ENSBTAG00000046360

Uncharacterized

Nomatch

NC

ENSBTAG00000046423

LOC618007

Nomatch

NC

ENSBTAG00000046527

LOC100139830

Nomatch

fertility

ENSBTAG00000046590

FAM181A

Cells

09

44

NC

ENSBTAG00000046593

Uncharacterized

Nomatch

20

metabolism

ENSBTAG00000046727

Olfactory

receptor

Nomatch

04

fertility

ENSBTAG00000047150

RTEL

Cells

4

23

64

55

616

cellular

process

ENSBTAG00000047166

SH2D7

Mice

03

101

01

1

02

02

2immunity

ENSBTAG00000047174

Uncharacterized

Nomatch

03

7105

02

01

271

05

03

NC

ENSBTAG00000047317

CL43

Nomatch

01

120

01

immunity

ENSBTAG00000047587

Uncharacterized

Nomatch

22

09

09

35

03

4

06

2cellular

process

ENSBTAG00000047661

FABP12

httpswwwmousephenotypeor

gdatagenesMGI1922747sect

ionassociations

Growthsizebody

01

02

69

metabolism

ENSBTAG00000048021

PNMAL2

No

04

03

2

03

03

02

NC

ENSBTAG00000048153

Uncharacterized

Nomatch

2

05

01

02

01

cellular

process

128

Discussion(

Deleterious recessive variants arenaturally present in outbred animal species

Theydonotcreateproblemsaslongastheyremainatlowfrequencyinrandom

mating populations Under these conditions these variants remain in the

heterozygousstateanddonotproducephenotypiceffectsHoweverassoonas

their frequency increases andmatingoccursbetween related individuals their

deleteriouseffectsbecomeapparentandhighlyaffectthewelfareandviabilityof

animalscarryingtheminthehomozygousstate

In industrialdairy cattlebreedscrossbreeding isgenerallyavoidedandbreeds

areinpracticegeneticisolatesinwhichartificialinsemination(AI)iswidelyused

for rapidly spreading thegeneticprogress in thepopulationAsa consequence

elitesireshavemanythousandandsometimeshundredsofthousanddaughters

Under these conditions mating between relatives becomes practically

unavoidable in the longterm In fact inspiteof theclosecontrolof inbreeding

andofthematingsystemgeneticdefectsregularlyappeartothesurfaceSome

examples are the Complex Vertebral Malformation (CVM) and the Bovine

Leucocyte Adhesion Deficiency (BLAD) in Holstein (Schuumltz et al 2008) and

WeaverdiseaseinBrown(McClureetal2013)Inthesecasestheeffectsofthe

recessivedeleteriousallelesareclearandmoleculardiagnostictoolshavebeen

developedtoavoidtheirfurtherspreadingandtoinitiatetheeradicationprocess

However recessive deleterious alleles may produce their effects in a more

deceitful way for example by affecting fertility inducing early lethality or

influence disease susceptibility All these effects are difficult to trackwith the

current methods of phenotype measurements but have an impact on the

economyofthelivestocksectorandonanimalwelfare

InthispaperwesoughtfordeleteriousrecessiveallelesinItalianHolsteincattle

This breed is recently experiencing a decrease in the genetic trend of fertility

traits (Pryce et al 2014) therefore we started our search from the exome

sequences of 20 sires 19 of which having extreme EBV values for male and

femalefertility

130

The choiceof sequencingexomeswasa compromisebetween completenessof

informationaccuracyofvariantandgenotypedetectionandcostExomesclearly

give incomplete information since they miss all variation in regulatory sites

outside gene boundaries (ENCODE Consortium 2012) These may be very

relevantforthecontroloftraitvariationbuttheyarestillverypoorlyannotated

in livestock On the other hand sequencing exomes yields millions instead of

billionsbasepairsadeepersequencingoftargetregionsandaconsequentmore

reliablevariantandgenotypecalling(FigureS1andS2)Theanalysisofexome

data is also less demanding in terms of computer time (thousands instead of

millions variants are to be analysed) and easier in terms of biological

interpretation Exome sequencing revealed to be very effective in the

identificationofanumberofmutationsaffectinghumanhealth(Bamshadetal

2011 Yang et al 2013) All thesewere defects caused bymutations in single

genes producing clear phenotypic effects havingMendelian inheritance These

conditionsdonotholdinourinvestigationthereforewehadtodevisestrategies

to integrate exome data with other information collected from the Holstein

populationandotherspecies

Theoverallprocedureforidentifythecandidatedeleteriousrecessivewasbased

on the rationale that deleterious recessive alleles in coding sequences are

expectedtobei)causedbyseverealterationoftheproteinsencodedbygenes

ii)mostlyifnotexclusivelycarriedattheheterozygousstateinthepopulation

iii)unevenlydistributedinanimalsofhighandlowfertilityiv)likelyconserved

indifferentspeciesbecauseofselectionconstraintv)likelyassociatedtogenetic

defectsinanyspecieswhenseverelydamaged

A stepwise procedure was set up for reducing candidates to a number

manageable for functional analysis (Figure1) In the first stepmutationswere

discoveredinthesecondfilteredandinthirdassessed

Thisprocedureidentifiesonlysomeofthedeleteriousrecessivesexistinginthe

populationanalysedSincewesequencedonlyexomesandnotwholegenomes

ouranalysesislimitedtocodingsequencesandatthemomenttoSNPvariation

Wethereforemissregulatoryvariationsoutsidegenesand theeffectsof indels

andmorecomplexvariations

131

ThenumberofanimalssequencedislimitedDeleteriousrecessivesareexpected

toberareinthepopulationandwehavesampledonly20animalsfromasubset

ofsiresauthorizedtoAI inItalyAlthoughwehavetakenamongthemthose in

theminus tail of fertility trait thatmighthave ahigher genetic load itrsquos likely

thatwehavemissedothervariantsexistinginthepopulationOntheotherhand

variantscarriedbyAIauthorizedbullsaretheonesthatwillhavealargerimpact

onthepopulationifwidelyspreadbyelitebullsandthereforetheyarethemost

interestingfromabreedingpointofview

Most deleterious variants are expected to be very rare and therefore not

detected by the approach here adopted or detected and afterwards discarded

duringthefilteringprocesses

Sequencing is prone to errors Some real variants may have been discarded

duringQCofsequencedataduetopoorsequencequalityorpoorcoverageofthe

region

Someofthevariationsclassifiedasldquonondeleteriousrdquobythesoftwareusedmay

indeedhave an effect In fact synonymousmutation arenon`neutralwhenever

thedifferenttRNAsforasameaminoacidhavedifferentrelativeabundanceand

translationrates(Goymer2007)

1(Searching(candidate(deleterious(variants(in(exomes(

ExomesproducedafairamountofdataAlltogether5355Mbweresequenced

These represent 9998 of the exome sequences targeted by probes Among

theseonly35hadasequencingdepthacrossanimalslt60XSequencingdepth

is a key factor for variant detection and even more so when the genotype of

sequencedanimalsistobeinferredInvestigatingtheconcordancebetweenthe

genotypes detected at more than 20K SNP by both Beadchip and exome

sequencingwe observed a clear inverse relationship between sequence depth

andnumberofdiscordances(TableS3andFigureS3)Theaveragesequencing

depth expected (50X) and realised in practice (4058) in this investigation

yieldedratheraccurategenotypessincetheaverageconcordanceobservedwas

over97and increased toover99whentwooutlieranimalswereremoved

132

from the dataset The reason for the outlier behaviour of these samples waslikely a suboptimal quality of DNA submitted to sequencing Overall almost215000 variants passed quality controls and among them 192440 autosomalSNPs have been investigated in detail As found in other investigations mostvariants were rare (Peloso et al 2014) For example the MAF distributionobservedin46904SNPsclassifiedneutralhasameanof0197andamedianof0167values(FigureS6)

ThefirststepinourresearchwastoidentifySNPscausingseverealterationsinthe proteins coded by genes These were relevant changes in conformationalterationsofsplicingsitesortruncationsofproteinsToaccomplishthistaskwereliedontwosoftwareSIFTandANNOVARthatidentified4902suchvariationsin3423genes Interestingly the subsetofdeleteriousvariantshas significantlylowermean (0156)andmedian (010) compared toneutral variations (FigureS6) This subset is therefore at least enriched in variants under purifyingselectionhavingadeleteriouseffectatthepopulationlevel

Thenumberofdeleteriousallelesdetectedishighanditisunlikelytheyallhavea relevant phenotypic effect even if most variants are rare In fact variantsdeleterious for a protein are not automatically deleterious for an organism asthe protein change induced by the variant although relevant may not affectproteinfunctionvariationmayoccurinonememberofagenefamilysothatitsfunction may be compensated by other family members the protein functionmay be compensated by other proteins or the affected metabolic pathwaycompensated by alternative pathways The challenge was therefore to filter asubsetofvariantslikelyhavingaphenotypiceffecttobeproposedforvalidationatalargerscaleToaccomplishthistaskwesoughtforadditionalinformationinexomesequencesandinalargepopulationofHolsteinsiresgenotypedwiththebovineHDBeadchip

( (

133

2(Filtering(deleterious(variants(

The second step consisted in filtering the several thousand deleterious SNPsidentified in exomes to search for SNPs and genes that worth deeperinvestigationWeusedtwostrategiestoaccomplishthistask

21$ Distribution$ of$ deleterious$ variants$ in$ fertility$ EBV$ plus9variant$ and$

minus9variant$animals$

Fertility traits are controlled by a high number of genes and are highlygenetically heterogeneous Even using a selected subset of variants identifiedfrom exome sequencing classic association analyses would need sample sizesmuch larger that the one available in this investigation to have a reasonablestatistical power (Stitziel et al 2011) The first approachwe appliedwas theanalysisof SNPallelic andgenotypicdistributionbetween theminusandplus`variant tails of male and female fertility EBVs Given the very low number ofanimals sequencedwedecided to use a simplemodel to exclude from furtheranalyses all markers showing no evidence of association with the EBV tailsratherthantofindsignificantassociations

We divided samples in cases (minus`variant animals for fertility EBV) andcontrols (plus`variant animals) and tested all possible models of inheritanceimplemented in thePlinkpackage for tackling simple genetic defectsWe thenused an empirical point`wise significant threshold based on permutations tohighlight SNPs having alleles or genotypes showing an uneven distributionbetweencasesandcontrolsandconsideredtheseasworthfurtherinvestigationAtotalof112SNPsin101genespassedthisfilteringprocess

$

22$Detection$of$regions$carrying$deleterious$mutation$from$800K$SNP$data

The analysis of exomes alone was therefore insufficient to tag recessivedeleteriousgenesinItalianHolsteinandwesoughtforadditionalinformationin1009ItalianHolsteinbullsgenotypedathighdensitythatarepartofthegenomicselection program run by ANAFI The rationale of the approach is that

134

deleterious recessives are expected to be absent or rare in the homozygous

status in the population Hence they may be tagged searching for genome

regions having markers or haplotypes lacking or having significantly less

homozygotesthanexpectedunderaneutralmodel

We used two complementary approaches to identify these candidate regions

One(OHW)searchedforverystrongsignalsproducedbysinglemarkersoutof

Hardy`Weinberg equilibrium at Ple847x10`8 A minimum of three significant

markersinaslidingwindowsof10wassettoreachsignificancetocapturemore

reliablesignalsavoidexcessivesinglemarkernoiseandaccount forapossible

Wahlund effect due to presence of the genetic structure in the population

difficulttoavoidinthesetofprogenytestedbullsanalysedThesecondapproach

(LOH) captured more moderate signals but from multiple markers in larger

regions Both approaches were designed to detect signals in regions in which

deleteriousvariantsarestillinsegregationandarenotexcessivelyrare

Asexpectedthetwoapproachesidentifiedregionsthatonlypartiallyoverlapped

(TablesS5S6eS7)Inparticular5regionsoutofthe11identifiedbyOHWwere

in common to the 240 identified by LOH Two of these regions overlap with

previous investigations that used the BovineSNP50 Beadchip Sahana et al

(2013) found a region candidate to carrypotential lethal haplotypes inNordic

Holstein on BTA7 in the region 6708007`10907840 This interval overlaps

withbothBTA7regionsdetectedinthisstudyVanRadenetal(2011)detected

inUSHolsteinalargeregiononBTA15spanning76Mb`82M(onBTA30)that

includedLRP4 gene responsible for themulefootdefectOuranalyseswith the

HDBeadchipindicatesontheonesidethatthelargeregiondetectedbySahana

etal(2013)mayconsistintwonearbyregionsOntheothersidethesignalwe

detectedonBTA15doesnot includetheLRP4 locusthat iscompletely fixed in

theItalianbullspopulationanalysed

In total83mutations in66genes fell in eitherorbothOHWandLOHregions

These were joined to the SNPs output of the EBV tail analyses for further

assessment

(

135

3(Assessing(filtered(deleterious(variants(

Different characteristics of the filtered deleterious variants were assessed to

identify among them strong candidates to submit to a large`scale validation

studyTheassessmentphasepermittedtogainadditionalinformationinfavour

or against SNP and gene candidature to be deleterious We quantified this

evidenceusingasubjectivescore

31$Comparative$genomics$analysis$

In the case of comparative genomics the score quantified the level of

conservationof thealternativeaminoacidresidues inducedbySNPvariants in

ItalianHolsteincattleComparisonwasrunagainstaset100vertebratespecies

that included 62 mammals The level of conservation is a good proxy for the

expected severity of a mutation In the case of the 46 SNPs that induced the

change of an amino acid highly conserved in the species investigatedwemay

hypothesize a strong selection pressure in favour of the maintenance or the

invariant amino acid at that position Therefore the amino acid change or the

stopcodonweobserved incattlemayaffectnotonlytheproteinstructureand

function but also the phenotype of homozygous animals Interestingly in all

thesecasestheSNPallelecodingforthenonconservedvariantswastheminor

allele

Converselyinallcasesofhighvariabilityofthetargetaminoacidacrossspecies

(asmanyassevendifferentaminoacidshavesometimesbeenobserved)poorly

supports the presence of phenotypic effects due to its change Sometimes the

informationcollectedisagainstthepresenceofadeleteriousmutationasinthe

case of the presence in different species of both alleles identified by exome

sequencingThis suggests thatneither of these is greatly affecting the viability

andfitnessofanimalscarryingthem

Wealsoobservedanumberoffailures(N=45)intheliftoverofSNPcoordinates

fromcattletohumanorinproteinalignmentThishappenedforexampleinthe

caseofuncharacterized cattleproteins thathavenoorthologous in thehuman

genome and of someolfactory receptors that belong to a highly variable gene

136

family Even if the failure of alignment is an indication of poor proteinconservationacrossspeciestheconservationofthetargetaminoacidcouldnotbeassessesandinthesecaseswepreferredtoconsidertheinformationmissingandtoscoreitaccordingly

32$Haplotype$analysis$

Asecondassessmentofthefilteredcandidateswasontheirbehaviourinthesirepopulation The ideal test would be the direct genotyping of the SNPs in thepopulationand theevaluationof theirallelicandgenotypicproportions In theabsenceofthisinformationwefirstinferredtheHDhaplotypeassociatedtoSNPallelesandassessedthebehaviourofthehaplotypeinthepopulationassumingthisasasurrogateoftheSNPbehaviourSincedeleteriousallelesareexpectedtoberareandthis isalsowhatweobserved fromcomparativegenomicanalysesweinvestigatethefrequencyonlyofthehaplotypecarryingtheminorSNPalleleagainsttheexpectationto findthecorrespondinghaplotypeneverhomozygousor a number of times much lower than expected from its frequency in thepopulation The strategy yielded some interesting confirmation but was onlypartiallysuccessful

Firstofalltheidentificationofauniquehaplotypecarryingtherarevariantwasnot always possible or straightforward In some cases carriers of a same rarevariantdidnotshareexactlythesamehaplotypeWeacceptedsomevariationifthe haplotype remained clearly distinguishable from those of animals notcarryingtherarevariantbutatthesametimereducedthestringencyofthetestInquiteanumberofcasesasamehaplotypecarriedbothcandidatedeleteriousSNP variants This may occur for example in the case of recent mutations IntheselastcasesthetestcouldnotbecarriedoutFinallymostrarevariantswereassociatedtorarehaplotypesthatarenotlikelytohavehomozygousindividualsinthepopulationindependentlyoftheSNPeffect

Thisassessmentwas thereforeonly significant fora fewhaplotypes thathadalarge number of heterozygous individuals and no or only a very fewhomozygotesandforthosethathadalargenumberofhomozygotes

137

33$Functional$analysis$

The final assessment was the analysis of the function of genes carrying the

candidate deleterious SNP This has some limitation since knowledge of gene

function is only partial and mainly gained from investigations on species

different from cattle In any case we considered the association to genetic

defectsinhumanandtheinductionofseverephenotypesinmouseknockoutas

a positive indication of the likely phenotypic effect of the severe protein

alterationweobservedincattleInadditiongenefunctionandexpressionprofile

was evaluated in search for evidence of gene involvement in fertility

developmentorprocessesfundamentalforcellandorganismviability

Amonggenesidentified22areassociatedtohumansyndromes(Table4)Eleven

syndromesareneurologicaldiseasesaffectingbraindevelopmentand inducing

mental retardation (eg Donnay`Barrow Syndrome and Autosomal recessive

primary microcephaly) the others influence development (eg

Cerebrooculofacioskeletal Syndrome 1 and Adams`Oliver syndrome) or organ

function(egCongenitalpulmonaryalveolarproteinosis)

Most of the same defects translated into Holstein would induce physical and

behavioural anomalies likely causingyoungbulls tobe excluded fromprogeny

testing Itshould in factbeconsideredthattheanimalsweanalysedareonthe

onesidethosethatwillmostinfluencethegeneticmakeoffuturegenerationbut

are not a random sample of Holstein animals These bulls have been progeny

testedandauthorisedtobeusedinartificialinseminationinItalyCandidatesto

progenytestingderivefromthematingofthebest1siresandbest2damsof

the Italian Holstein population At the age of 4 months they are taken to the

Italian Holstein breeder association (ANAFI) test station where they are

subjected to performance test sanitary controls and behavioural tests

Thereafterthequalityoftheirsemenischeckedbeforeauthorizationisgranted

toenterinprogenytestInthissampleofanimalsitisthereforeexpectedtofind

never at the homozygous state mutations causing lethality physical

malformation abnormal behaviour and semen infertility Also those having a

138

majoreffectonsemenqualityandhighsusceptibilitytodiseasesareexpectedto

berarelycarriedatthehomozygousstatus

Association with human syndromes is a strong evidence of the potential

phenotypiceffectofmutationsalteringthegenestructureHoweveritshouldbe

considered that variants found in humans are different fromvariants found in

cattle and that the severity of the phenotype induced by a deleterious gene is

often variable even within species Therefore we evaluated this information

togetherwiththeotherindirectevidencesofvariantstobedeleterious

WealsoenquiredthewebsiteoftheInternationalMousePhenotypeConsortium

(httpswwwmousephenotypeorg) to search for the effects of null alleles in

mice The few genes forwhichnull`mice line is available are involved in basic

functionsfortheorganismthatspanfromvision(IAH1andEPB41L4A)toaging

(RGP1)andtissuesdevelopment(FABP12RIPPLY3RGNEF)orfertility(CFAP96)

Aswithgenesassociatedtohumansyndromesmostoftheabnormalphenotypes

observedinknockmousewouldhamperyoungbullstoenterprogenytesting

Gene Ontology (GO) was evaluated on the entire gene set No enrichment on

particularcategoriesGOtermswasobservedThemostrepresentedcategories

in Biological Processes were as expected ldquometabolic processrdquo and ldquocellular

processrdquo however lower level terms ldquoreproductionrdquo and ldquodevelopmental

processrdquo are represented with meaningful hits The analysis of Molecular

Function indicates thatmany of the hits are enzymes (ldquocatalytic activityrdquo) and

haveregulatoryroles(ldquobindingrdquoandldquoenzymeregulatoractivityrdquo)Thisresultcan

be interpreted indifferentwaysOn theone sidedeleterious allelesmay exert

their deleterious effect in a number of different cellular and developmental

processes the correct process is one while there is many options to make it

wrongAlsothenumberofgenesinvestigatedisrelativelylowandformanyof

them there is little or no information available (eg 11 genes code for a

completelyuncharacterizedprotein)Inadditionsomeofthemmayturnoutnot

tobereallydeleteriousandaddnoisetotheGOanalysis

As a final step all the previous information (GO expression patterns known

function)havebeenused to estimate a scoremeasuring the importanceof the

gene function forcattle survivalandreproductionThescoringof functionwas

139

particularlydifficultGOcategoriesweregeneralandallfunctionstaggedhavearelevant importance for animal survival andproductionTherefore in this casewedidnrsquotusethewholescorerangeplanned(from`1to1)andgavenonegativescoresApositivescoreof1wasattributedonly togenesplayingan importantroleinfertilityanddevelopment

$

34$Strong$candidate$deleterious$variants$

Thescoringprocessyielded21variantswithatotalscore=035variantswithpositive and 135 with negative values (Figure 2 and Table 5) Although theprocesswasnoterrorfreeasitreliedonincompleteinformationandhadsomedegreeofsubjectivityparticularlyinthechoiceofrelativeweightsitidentifiedaset of variants that may be considered strong candidates to have deleteriouseffects in the Italian Holstein breed Variants were evaluated individually andthose in a same genes had sometimes different scores For example the fivevariantscarriedbyALPK2hadscoresrangingfrom`7to`2

140

Figure2Graphicalrepresentationoftotalscoreforeachvariants

minus505

Variation

Score

1_645923001_645985561_1381698111_1464490851_150972144

2_7478962_270362092_270455392_375933832_555609862_904645543_75418123_110403993_110404003_188945763_1137877703_1203356984_53767714_265405974_265406204_749514994_1033779064_1057246734_1130233264_1178417155_445557055_445557425_757447955_940308955_940309556_207762566_382803486_863518016_927092906_1075769256_1078851146_1090916426_1166055226_1186534167_13339757_56553577_97149147_167941797_168358027_168621947_197404097_197409057_263293538_602606098_603324848_658345648_704101558_760297748_770028908_872798628_1117618168_1128416959_83888819_83889249_237928389_510133929_1047396559_10515702910_171251510_1584300310_2802235210_2802311010_8290312110_8517371111_4630576711_4675390311_4740957711_5979141011_7832201011_8794387511_9548610911_9945389411_9946100112_1190618112_1247843312_1248084512_1413689712_5304908112_9076120913_381566413_1178793413_1178793813_5247168413_5393493013_5393494713_5393985313_5454042513_5504774013_5998820513_6304575013_6316961013_6539670814_197749414_364069514_3556894514_4688575014_5718402214_6863536315_18267915_18331715_18340415_248692

15_3018920915_3504878815_4630194115_7503314015_7503764515_8024448915_8025571015_8028875915_8038974515_8048559315_8055593115_8094968915_8275455615_8292533216_456201716_3831963616_6257435517_5309539517_5311265517_6609153117_7237753218_2562355018_3495651318_4637204118_5215480618_5408127718_5413095518_5781976819_2144410319_3107215219_3111397219_3279084319_4225178519_5139519419_5619117519_5726625120_763980120_2341043120_5888423620_6417173921_623911921_2034587821_2034628921_2683773321_3100359921_5527419521_5582646321_5817338421_5912071921_6048248421_6048251021_6284314222_5241595922_5242173422_5242204622_5242610422_5242666822_5695165722_5697484722_5779767423_4588505224_2132460424_2145336424_4676490924_5032719524_5815817824_5815862024_5815878424_5820400324_5820447925_3743318026_1811648726_2571599626_2576457027_503107827_3283255728_28422828_169040328_3571880728_4408176529_198588129_753052329_753066329_26397931

141

Table5SingleanalysesscoreandtotalforeachcandidatevariantsChrchrom

osom

ePosphysicalposition

Mutation

Score

Chr

Pos

Gene

Genenam

eHaplotypes

Conservation

Hum

andefect

Mousemodel

Function

TotalScore

164592300

ENSBTAG00000009014

UPK1B

02

I1

01

2

164598556

ENSBTAG00000009014

UPK1B

0I1

I1

00

I2

1138169811

ENSBTAG00000002966

DNAJC13

03

10

15

1146449085

ENSBTAG00000017418

RRP1B

I3

I1

I1

00

I5

1150972144

ENSBTAG00000036019

RIPPLY3

I3

I1

I1

I1

1I5

2747896

ENSBTAG00000013916

HERC2

30

10

04

227036209

ENSBTAG00000004555

LRP2

I3

I1

10

0I3

227045539

ENSBTAG00000004555

LRP2

I3

31

00

1

237593383

ENSBTAG00000032481

DAPL1

00

I1

00

I1

255560986

ENSBTAG00000032264

Uncharacterized

3I3

I1

00

I1

290464554

ENSBTAG00000017436

ALS2CR11

0I1

I1

00

I2

37541812

ENSBTAG00000046175

Uncharacterized

00

00

00

311040399

ENSBTAG00000013789

OR6K6

I3

I3

I1

00

I7

311040400

ENSBTAG00000013789

OR6K6

I3

I3

I1

00

I7

318894576

ENSBTAG00000004772

THEM

40

I3

I1

00

I4

3113787770

ENSBTAG00000000149

USP40

11

I1

01

2

3120335698

ENSBTAG00000002809

CAPN10

I3

21

00

0

45376771

ENSBTAG00000001500

FIGNL1

01

I1

00

0

426540597

ENSBTAG00000033954

PRPS1L1

1I3

I1

00

I3

426540620

ENSBTAG00000033954

PRPS1L1

1I1

I1

00

I1

474951499

ENSBTAG00000003508

CFAP69

I3

I3

I1

10

I6

4103377906

ENSBTAG00000021073

KIAA1549

0I1

11

01

4105724673

ENSBTAG00000019265

AGK

01

10

02

142

Mutation

Score

Chr

Pos

Gene

Genenam

eHaplotypes

Conservation

Hum

andefect

Mousemodel

Function

TotalScore

4113023326

ENSBTAG00000014768

ZNF786

I3

I3

I1

00

I7

4117841715

ENSBTAG00000017505

PAXIP1

0I1

I1

00

I2

544555705

ENSBTAG00000026323

LYSB

I3

3I1

00

I1

544555742

ENSBTAG00000026323

LYSB

I3

0I1

00

I4

575744795

ENSBTAG00000009064

CSF2RB

I3

I3

10

0I5

594030895

ENSBTAG00000009444

SLC15A5

I3

I1

I1

00

I5

594030955

ENSBTAG00000009444

SLC15A5

I3

I3

I1

00

I7

620776256

ENSBTAG00000008797

Uncharacterized

I3

I3

00

0I6

638280348

ENSBTAG00000007568

MEPE

0I3

I1

00

I4

686351801

ENSBTAG00000003523

UGT2B15

1I1

I1

00

I1

692709290

ENSBTAG00000010954

ART3

I3

2I1

00

I2

6107576925

ENSBTAG00000038606

DBDR

0I3

I1

00

I4

6107885114

ENSBTAG00000001505

GRK4

I3

1I1

00

I3

6109091642

ENSBTAG00000007001

SLC26A1

1I3

I1

00

I3

6116605522

ENSBTAG00000014058

LDB2

01

I1

00

0

6118653416

ENSBTAG00000012458

PSAPL1

I3

I1

I1

00

I5

71333975

ENSBTAG00000015602

C7H5orf45

I3

I3

I1

00

I7

75655357

ENSBTAG00000045588

LOC100298356

1I3

I1

00

I3

79714914

ENSBTAG00000046423

LOC618007

00

I1

00

I1

716794179

ENSBTAG00000012314

LDLR

02

10

03

716835802

ENSBTAG00000009568

KANK2

01

I1

00

0

716862194

ENSBTAG00000009569

DOCK6

01

10

02

719740409

ENSBTAG00000000414

FUT5

I3

I3

I1

00

I7

719740905

ENSBTAG00000000414

FUT5

I3

I3

I1

00

I7

143

Mutation

Score

Chr

Pos

Gene

Genenam

eHaplotypes

Conservation

Hum

andefect

Mousemodel

Function

TotalScore

726329353

ENSBTAG00000004860

SLC27A6

0I1

I1

00

I2

860260609

ENSBTAG00000011420

CA9

I3

I1

I1

00

I5

860332484

ENSBTAG00000011433

RGP1

I3

0I1

10

I3

865834564

ENSBTAG00000035708

LOC527795

00

I1

00

I1

870410155

ENSBTAG00000005466

C8H8orf58

I3

I3

I1

00

I7

876029774

ENSBTAG00000015027

Clusterin

00

I1

10

0

877002890

ENSBTAG00000012921

KIF24

12

I1

00

2

887279862

ENSBTAG00000002220

SPTLC1

00

10

01

8111761816

ENSBTAG00000006532

CDK5RAP2

I3

I3

11

1I3

8112841695

ENSBTAG00000013838

ALLC

01

I1

00

0

98388881

ENSBTAG00000015335

BAI3

00

I1

00

I1

98388924

ENSBTAG00000015335

BAI3

00

I1

00

I1

923792838

ENSBTAG00000046593

Uncharacterized

03

00

03

951013392

ENSBTAG00000018528

USP45

02

I1

00

1

9104739655

ENSBTAG00000018810

THBS2

03

10

15

9105157029

ENSBTAG00000000918

ERMARD

I3

21

00

0

10

1712515

ENSBTAG00000014750

EPB41L4A

0I1

I1

I1

0I3

10

15843003

ENSBTAG00000009030

NOX5

I3

I3

I1

00

I7

10

28022352

ENSBTAG00000035309

OR4F15

I3

1I1

00

I3

10

28023110

ENSBTAG00000035309

OR4F15

I3

I1

I1

00

I5

10

82903121

ENSBTAG00000012565

PCNX

I3

2I1

01

I1

10

85173711

ENSBTAG00000011683

NUM

BI3

0I1

00

I4

11

46305767

ENSBTAG00000002288

NT5DC4

0I3

I1

00

I4

11

46753903

ENSBTAG00000019072

PSD4

0I1

I1

00

I2

144

Mutation

Score

Chr

Pos

Gene

Genenam

eHaplotypes

Conservation

Hum

andefect

Mousemodel

Function

TotalScore

11

47409577

ENSBTAG00000009033

TEX37

I3

I2

I1

10

I5

11

59791410

ENSBTAG00000006027

USP34

02

I1

01

211

78322010

ENSBTAG00000015490

HS1BP3

00

I1

00

I1

11

87943875

ENSBTAG00000001140

IAH1

I3

3I1

10

011

95486109

ENSBTAG00000017576

GPR144

I3

I3

I1

00

I7

11

99453894

ENSBTAG00000038831

PHYHD1

11

I1

00

111

99461001

ENSBTAG00000010601

DOLK

12

10

04

12

11906181

ENSBTAG00000001133

VWA8

I3

1I1

00

I3

12

12478433

ENSBTAG00000014752

AKAP11

1I3

I1

I1

0I4

12

12480845

ENSBTAG00000014752

AKAP11

1I1

I1

I1

0I2

12

14136897

ENSBTAG00000016691

LACC1

I3

I3

I1

00

I7

12

53049081

ENSBTAG00000032821

SCEL

I3

I3

I1

00

I7

12

90761209

ENSBTAG00000019961

ATP4B

I3

I3

I1

00

I7

13

3815664

ENSBTAG00000034991

SLX4IP

00

I1

00

I1

13

11787934

ENSBTAG00000008650

CAMK

1D

I3

0I1

00

I4

13

11787938

ENSBTAG00000008650

CAMK

1D

I3

0I1

00

I4

13

52471684

ENSBTAG00000013753

DDRGK1

02

I1

00

113

53934930

ENSBTAG00000039520

SIRPB1

0I3

I1

00

I4

13

53934947

ENSBTAG00000039520

SIRPB1

00

I1

00

I1

13

53939853

ENSBTAG00000039520

SIRPB1

0I3

I1

00

I4

13

54540425

ENSBTAG00000047150

RTEL

I3

0I1

00

I4

13

55047740

ENSBTAG00000031718

OGFR

01

I1

00

013

59988205

ENSBTAG00000027390

RTFDC1

01

I1

00

013

63045750

ENSBTAG00000009144

BPIFA2A

0I3

I1

00

I4

145

Mutation

Score

Chr

Pos

Gene

Genenam

eHaplotypes

Conservation

Hum

andefect

Mousemodel

Function

TotalScore

13

63169610

ENSBTAG00000019752

BPIFA2B

0I1

I1

00

I2

13

65396708

ENSBTAG00000006021

CEP250

0I3

I1

00

I4

14

1977494

ENSBTAG00000026350

SPATC1

I3

I3

I1

00

I7

14

3640695

ENSBTAG00000015985

GPR20

00

I1

00

I1

14

35568945

ENSBTAG00000037399

Uncharacterized

30

I1

00

2

14

46885750

ENSBTAG00000047661

FABP12

0I1

I1

00

I2

14

57184022

ENSBTAG00000026247

PKHD1L1

I3

3I1

00

I1

14

68635363

ENSBTAG00000017165

MATN2

01

I1

00

0

15

182679

ENSBTAG00000046228

Olfactoryreceptor

00

I1

0o

I1

15

183317

ENSBTAG00000046228

Olfactoryreceptor

00

I1

00

I1

15

183404

ENSBTAG00000046228

Olfactoryreceptor

00

I1

00

I1

15

248692

ENSBTAG00000045558

LOC100299084

00

I1

0o

I1

15

30189209

ENSBTAG00000005357

VPS11

02

I1

00

1

15

35048788

ENSBTAG00000005340

SERGEF

0I3

I1

00

I4

15

46301941

ENSBTAG00000002289

NLRP14

I3

I3

I1

00

I7

15

75033140

ENSBTAG00000012053

ACCSL

1I3

I1

00

I3

15

75037645

ENSBTAG00000012053

ACCSL

I3

I3

I1

00

I7

15

80244489

ENSBTAG00000046527

LOC100139830

I3

0I1

00

I4

15

80255710

ENSBTAG00000046285

OR8I2

10

I1

00

0

15

80288759

ENSBTAG00000001291

OR8H3

1I1

I1

00

I1

15

80389745

ENSBTAG00000045709

LOC514057

13

I1

00

3

15

80485593

ENSBTAG00000035985

LOC101909743OR8K1

1I1

I1

00

I1

15

80555931

ENSBTAG00000022858

LOC783561

I3

0I1

00

I4

15

80949689

ENSBTAG00000039110

OR8U1

1I3

I1

00

I3

146

Mutation

Score

Chr

Pos

Gene

Genenam

eHaplotypes

Conservation

Hum

andefect

Mousemodel

Function

TotalScore

15

82754556

ENSBTAG00000021375

OR10Q1

01

I1

00

015

82925332

ENSBTAG00000021233

OR5B17

I3

0I1

00

I4

16

4562017

ENSBTAG00000019799

FCAM

RI3

I3

I1

00

I7

16

38319636

ENSBTAG00000014794

SCYL3

I3

0I1

00

I4

16

62574355

ENSBTAG00000035226

TOR1AIP1

00

I1

00

I1

17

53095395

ENSBTAG00000020414

DHX37

1I1

I1

00

I1

17

53112655

ENSBTAG00000020414

DHX37

0I3

I1

00

I4

17

66091531

ENSBTAG00000010847

FOXN4

0I1

I1

00

I2

17

72377532

ENSBTAG00000008996

SFI1

0I3

I1

00

I4

18

25623550

ENSBTAG00000005170

GPR114

0I3

I1

01

I3

18

34956513

ENSBTAG00000001286

ELMO

33

3I1

11

718

46372041

ENSBTAG00000015912

DMKN

0

1I1

00

018

52154806

ENSBTAG00000001262

IRGQ

I3

1I1

00

I3

18

54081277

ENSBTAG00000009337

PNMA

L1

0I3

I1

00

I4

18

54130955

ENSBTAG00000048021

PNMA

L2

I3

1I1

00

I3

18

57819768

ENSBTAG00000003231

LIM2

1

I3

10

0I1

19

21444103

ENSBTAG00000011011

SSH2

I3

I3

I1

00

I7

19

31072152

ENSBTAG00000022509

DNAH

9I3

0I1

00

I4

19

31113972

ENSBTAG00000022509

DNAH

90

0I1

00

I1

19

32790843

ENSBTAG00000015294

COX10

12

10

04

19

42251785

ENSBTAG00000013685

KRT31

I3

0I1

00

I4

19

51395194

ENSBTAG00000015980

FASN

1I1

10

01

19

56191175

ENSBTAG00000007910

EXOC7

3I3

I1

00

I1

19

57266251

ENSBTAG00000011387

NAT9

12

I1

00

2

147

Mutation

Score

Chr

Pos

Gene

Genenam

eHaplotypes

Conservation

Hum

andefect

Mousemodel

Function

TotalScore

20

7639801

ENSBTAG00000005633

RGNEF

1I1

I1

10

0

20

23410431

ENSBTAG00000008871

DDX4

1I3

I1

00

I3

20

58884236

ENSBTAG00000005514

TRIO

I3

0I1

00

I4

20

64171739

ENSBTAG00000005021

SEMA5A

01

10

02

21

6239119

ENSBTAG00000016495

LINS

02

10

03

21

20345878

ENSBTAG00000012326

LOC101906218

00

I1

00

I1

21

20346289

ENSBTAG00000012326

LOC101906218

I3

0I1

00

I4

21

26837733

ENSBTAG00000047587

Uncharacterized

10

00

01

21

31003599

ENSBTAG00000047166

SH2D7

0I3

I1

00

I4

21

55274195

ENSBTAG00000039117

FAM179B

0I3

I1

00

I4

21

55826463

ENSBTAG00000014001

MAP1A

0I3

I1

00

I4

21

58173384

ENSBTAG00000009836

CHGA

I3

I1

I1

00

I5

21

59120719

ENSBTAG00000046590

FAM181A

01

I1

00

0

21

60482484

ENSBTAG00000017096

ERICH1

0I3

I1

00

I4

21

60482510

ENSBTAG00000017096

ERICH1

01

I1

00

0

21

62843142

ENSBTAG00000012833

ATG2B

0I3

I1

00

I4

22

52415959

ENSBTAG00000015839

MAP4

00

I1

00

I1

22

52421734

ENSBTAG00000015839

MAP4

1I1

I1

00

I1

22

52422046

ENSBTAG00000015839

MAP4

I3

I3

I1

00

I7

22

52426104

ENSBTAG00000047174

Uncharacterized

I3

00

00

I3

22

52426668

ENSBTAG00000047174

Uncharacterized

10

00

01

22

56951657

ENSBTAG00000021645

MBD4

1I1

I1

00

I1

22

56974847

ENSBTAG00000021643

EFCAB12

0I1

I1

00

I2

22

57797674

ENSBTAG00000031115

Uncharacterized

01

00

01

148

Mutation

Score

Chr

Pos

Gene

Genenam

eHaplotypes

Conservation

Hum

andefect

Mousemodel

Function

TotalScore

23

45885052

ENSBTAG00000005248

OFCC1

00

I1

00

I1

24

21324604

ENSBTAG00000000998

SLC39A6

02

I1

00

1

24

21453364

ENSBTAG00000011622

C18orf21

I3

I3

I1

00

I7

24

46764909

ENSBTAG00000015210

LOXHD1

0I3

10

0I2

24

50327195

ENSBTAG00000002660

CCDC11

I3

11

01

0

24

58158178

ENSBTAG00000014284

ALPK2

I3

I1

I1

00

I5

24

58158620

ENSBTAG00000014284

ALPK2

0I1

I1

00

I2

24

58158784

ENSBTAG00000014284

ALPK2

I3

I3

I1

00

I7

24

58204003

ENSBTAG00000014284

ALPK2

I3

I3

I1

00

I7

24

58204479

ENSBTAG00000014284

ALPK2

I3

2I1

00

I2

25

37433180

ENSBTAG00000040568

Uncharacterized

00

I1

00

I1

26

18116487

ENSBTAG00000046239

C10orf12

0I1

I1

00

I2

26

25715996

ENSBTAG00000010522

Uncharacterized

10

00

01

26

25764570

ENSBTAG00000046727

Olfactoryreceptor

00

I1

00

I1

27

5031078

ENSBTAG00000007340

LOC101906349

0I1

I1

00

I2

27

32832557

ENSBTAG00000011684

RAB11FIP1

10

I1

00

0

28

284228

ENSBTAG00000000079

CCSAP

0I1

I1

00

I2

28

1690403

ENSBTAG00000048153

Uncharacterized

10

00

01

28

35718807

ENSBTAG00000047317

CL43

1I1

I1

00

I1

28

44081765

ENSBTAG00000032527

ERCC6

I3

01

00

I2

29

1985881

ENSBTAG00000004081

FAT3

30

I1

00

2

29

7530523

ENSBTAG00000046360

Uncharacterized

01

00

01

29

7530663

ENSBTAG00000046360

Uncharacterized

1I1

00

00

29

26397931

ENSBTAG00000013568

UEVLD

I3

I3

I1

00

I7

149

Interestingly13ofthe22genesassociatedtohumansyndromescarryvariants

appearing in the short list of the 35 positives and only 6 in that of 135 with

negative values This in spite of the low weight (from 1 to 1) given to this

parameterEightofthe35positivevariantsareinproteinsstilluncharacterized

Sixvariantshadascoreof4orhigherThehighestscore(totalscore=7)wasfor

ELMO3 (Engulfment and cell motility gene 3) This gene is involved in cell

motility and migration ELMO3 is also a homolog of C( elegans Ced12 that is

required for many developmental processes included gonadal morphogenesis

Mutant for this gene in C( elegans suffer from abnormal development usually

with lethal consequences or sublethal developmental defects including small

embryosdelayeddevelopmentandreducedfertility(Gumiennyetal2001)In

human it is expressed in testis prostate and ovary

(httpwwwgenecardsorgcgibincarddispplgene=ELMO3)

ThegeneDNAJC13(DnaJ(Hsp40)homologsubfamilyCmember13)hadatotal

score of 5 This gene is involved in the onset of Parkinson disease DNAJC13

regulates the dynamics of clathrin coats on early endosomes Cellular analysis

shows that the mutation Arg855Ser identified by VilarintildeoGuumlell et al (2014)

confers a gainoffunction that impairs endosomal transportApossible roleof

DNAJC13geneinTourettesyndromechronicticdisorderisunderinvestigation

(Sundarametal2011)

A second genehad a scoreof 5THBS2 (Thrombospondin2) It belongs to the

thrombospondin family and is a powerful inhibitor of tumour growth and

angiogenesis It is highly expressed in testis in fetal Leydig cell Hirose et al

(2008) found a significant association between an intronic SNP in the THBS2

gene and lumbar disc herniation Thrombospondin2 is also a matricellular

proteinfoundinhumanserumDeletionofTSP2causesagedependentdilated

cardiomyopathy (Hanatani et al 2014) Mutant mice for this gene display a

variety of mutant phenotype Kyriakides et al 1998 suggest that THBS2

modulates the cell surface properties of mesenchymal cells affecting cell

functionssuchasadhesionandmigration

150

Three genes had a score of 4 all three are associated to human syndromes

Mutation in HERC2 (HECT and RLD domain containing E3 ubiquitin protein

ligase2)isinvolvedinautosomalrecessivementalretardation38(Puffenberger

etal2012)Jietal(2000)identifiedinmutantmouseneuromuscularsecretory

vesicledefectsandspermacrosomedefectsMoreovergeneticvariationsinthis

gene are associated with skinhaireye pigmentation variability Mutations in

DOLK gene (dolichol kinase) are associated with dolichol kinase deficiency

(Lefeberetal2011)COX10(cytochromecoxidaseassemblyhomolog10)gene

is associated to mitochondrial complex IV deficiency causing Leigh syndrome

(Antonickaetal2003)

Conclusions)

Exomesequencingthereforeprovidedvaluableinformationoncodingsequence

variants and on their potential effect on protein function As a group variants

classifiedasdeleteriouswererarerthanvariantsclassifiedasneutralSincethis

isanexpectedeffectofpurifyingselectionthissetofvariantwaslikelyenriched

forvariantshavingaphenotypiceffect

Theinclusionintheanalysesofplusandminusvariantanimalsforfertilitytraits

likelypermittedtosequencethemostimportantvariantsinfluencingthesetraits

butthelownumberofanimalssequencedcombinedwiththecomplexityofthe

traitpermittedtofindonlyasuggestiveassociationbetweenSNPsandtraitsItrsquos

likely that amuch larger number of individuals is to be sequenced if genome

wideassociationisthestrategyoneswanttopursue

We explored an alternative strategy to confirm suggestivedeleterious variants

based on indirect evidence of the variant effect by assessing i) evolutionary

constraintsassociatedtotheaminoacidresiduecodedbythevariantii)variant

behaviour in a large population In the absence of direct genotyping of the

candidatevariantsweusedhaplotypeinformationassurrogateiii)functionand

effectsinotherspecies

151

We attempted to quantify these criteria through a score subjective and only

indicativebutpermitted tohighlighted35genes so farneglected in cattlebut

deservingfurtherattentionandpotentiallyhavingdeleteriouseffectsinHolstein

andothercattlebreedsThesemaybepartoftherecessivedeleterioussetthat

constitutethegeneticloadofItalianHolstein

References)AntonickaHLearySCGuercinGHAgarJNHorvathRKennawayNG

HardingCOJakschMShoubridgeEA2003MutationsinCOX10

resultinadefectinmitochondrialhemeAbiosynthesisandaccountfor

multipleearlyonsetclinicalphenotypesassociatedwithisolatedCOX

deficiencyHumMolGenet122693ndash2702doi101093hmgddg284

AshwellMSHeyenDWSonstegardTSVanTassellCPDaYVanRaden

PMRonMWellerJILewinHA2004DetectionofQuantitativeTrait

LociAffectingMilkProductionHealthandReproductiveTraitsin

HolsteinCattleJournalofDairyScience87468ndash475

doi103168jdsS00220302(04)731860

AstonKICarrellDT2009GenomeWideStudyofSingleNucleotide

PolymorphismsAssociatedWithAzoospermiaandSevere

OligozoospermiaJournalofAndrology30711ndash725

doi102164jandrol109007971

AulchenkoYSRipkeSIsaacsADuijnCMvan2007GenABELanRlibrary

forgenomewideassociationanalysisBioinformatics231294ndash1296

doi101093bioinformaticsbtm108

BamshadMJNgSBBighamAWTaborHKEmondMJNickersonDA

ShendureJ2011ExomesequencingasatoolforMendeliandisease

genediscoveryNatRevGenet12745ndash755doi101038nrg3031

BieseckerLGShiannaKVMullikinJC2011Exomesequencingtheexpert

viewGenomeBiology12128doi101186gb2011129128

152

BiffaniSMarusiMBiscariniFCanavesiF2005Developingagenetic

evaluationforfertilityusingangularityandmilkyieldascorrelatedtraits

InterbullBulletin063

BiffaniSSamoreacuteABCanavesiF2002Inbreedingdepressionforproduction

reproductionandfunctionaltraitsinItalianHolsteincattleInstitut

NationaldelaRechercheAgronomique(INRA)pp0ndash4

BolgerAMLohseMUsadelB2014TrimmomaticAflexibletrimmerfor

IlluminaSequenceDataBioinformaticsbtu170

doi101093bioinformaticsbtu170

BrowningSRBrowningBL2007RapidandAccurateHaplotypePhasingand

MissingDataInferenceforWholeGenomeAssociationStudiesByUseof

LocalizedHaplotypeClusteringAmJHumGenet811084ndash1097

CharlesworthDWillisJH2009ThegeneticsofinbreedingdepressionNat

RevGenet10783ndash796doi101038nrg2664

ChunSFayJC2009Identificationofdeleteriousmutationswithinthree

humangenomesGenomeRes191553ndash1561

doi101101gr092619109

ConsortiumTEP2012AnintegratedencyclopediaofDNAelementsinthe

humangenomeNature48957ndash74doi101038nature11247

DePristoMABanksEPoplinRGarimellaKVMaguireJRHartlC

PhilippakisAAdelAngelGRivasMAHannaMMcKennaA

FennellTJKernytskyAMSivachenkoAYCibulskisKGabrielSB

AltshulerDDalyMJ2011Aframeworkforvariationdiscoveryand

genotypingusingnextgenerationDNAsequencingdataNatGenet43

491ndash498doi101038ng806

GoymerP2007SynonymousmutationsbreaktheirsilenceNatRevGenet8

92ndash92doi101038nrg2056

GumiennyTLBrugneraEToselloTrampontACKinchenJMHaneyLB

NishiwakiKWalkSFNemergutMEMacaraIGFrancisRSchedl

TQinYVanAelstLHengartnerMORavichandranKS2001CED

12ELMOaNovelMemberoftheCrkIIDock180RacPathwayIs

RequiredforPhagocytosisandCellMigrationCell10727ndash41

doi101016S00928674(01)005207

153

HanataniSIzumiyaYTakashioSKimuraYArakiSRokutandaTTsujita

KYamamotoETanakaTYamamuroMKojimaSTayamaS

KaikitaKHokimotoSOgawaH2014Circulatingthrombospondin2

reflectsdiseaseseverityandpredictsoutcomeofheartfailurewith

reducedejectionfractionCircJ78903ndash910

HiroseYChibaKKarasugiTNakajimaMKawaguchiYMikamiY

FuruichiTMioFMiyakeAMiyamotoTOzakiKTakahashiA

MizutaHKuboTKimuraTTanakaTToyamaYIkegawaS2008

AfunctionalpolymorphisminTHBS2thataffectsalternativesplicingand

MMPbindingisassociatedwithlumbardischerniationAmJHum

Genet821122ndash1129doi101016jajhg200803013

httpsgenomeucsceduindexhtml[WWWDocument]nd

JamrozikJFatehiJKistemakerGJSchaefferLR2005EstimatesofGenetic

ParametersforCanadianHolsteinFemaleReproductionTraitsJournalof

DairyScience882199ndash2208doi103168jdsS00220302(05)728952

JanisEWiggintonDJC2005AnoteonexacttestsofHardyWeinberg

equilibriumAmericanjournalofhumangenetics76887ndash93

doi101086429864

JiYRebertNAJoslinJMHigginsMJSchultzRANichollsRD2000

StructureoftheHighlyConservedHERC2GeneandofMultiplePartially

DuplicatedParalogsinHumanGenomeRes10319ndash329

KentWJSugnetCWFureyTSRoskinKMPringleTHZahlerAM

HausslerandD2002TheHumanGenomeBrowseratUCSCGenome

Res12996ndash1006doi101101gr229102

KumarPHenikoffSNgPC2009Predictingtheeffectsofcodingnon

synonymousvariantsonproteinfunctionusingtheSIFTalgorithmNat

Protoc41073ndash1081doi101038nprot200986

KyriakidesTRZhuYHSmithLTBainSDYangZLinMTDanielson

KGIozzoRVLaMarcaMMcKinneyCEGinnsEIBornsteinP

1998Micethatlackthrombospondin2displayconnectivetissue

abnormalitiesthatareassociatedwithdisorderedcollagenfibrillogenesis

anincreasedvasculardensityandableedingdiathesisJCellBiol140

419ndash430

154

LaneEACroweMABeltmanMEMoreSJ2013Theinfluenceofcowand

managementfactorsonreproductiveperformanceofIrishseasonal

calvingdairycowsAnimReprodSci14134ndash41

doi101016janireprosci201306019

LefeberDJdeBrouwerAPMMoravaERiemersmaMSchuurs

HoeijmakersJHMAbsmannerBVerrijpKvandenAkkerWMR

HuijbenKSteenbergenGvanReeuwijkJJozwiakAZuckerN

LorberALammensMKnopfCvanBokhovenHGruumlnewaldSLehle

LKapustaLMandelHWeversRA2011AutosomalRecessive

DilatedCardiomyopathyduetoDOLKMutationsResultsfromAbnormal

DystroglycanOMannosylationPLoSGenet7e1002427

doi101371journalpgen1002427

LiHDurbinR2009FastandaccurateshortreadalignmentwithBurrows

WheelertransformBioinformatics251754ndash1760

doi101093bioinformaticsbtp324

LiHHandsakerBWysokerAFennellTRuanJHomerNMarthG

AbecasisGDurbinR1000GenomeProjectDataProcessingSubgroup

2009TheSequenceAlignmentMapformatandSAMtoolsBioinformatics

252078ndash2079doi101093bioinformaticsbtp352

LiMXKwanJSHBaoSYYangWHoSLSongYQShamPC2013

PredictingMendelianDiseaseCausingNonSynonymousSingle

NucleotideVariantsinExomeSequencingStudiesPLoSGenet9

e1003143doi101371journalpgen1003143

MacArthurDGBalasubramanianSFrankishAHuangNMorrisJWalter

KJostinsLHabeggerLPickrellJKMontgomerySBAlbersCA

ZhangZConradDFLunterGZhengHAyubQDePristoMA

BanksEHuMHandsakerRERosenfeldJFromerMJinMMuXJ

KhuranaEYeKKayMSaundersGISunerMMHuntTBarnes

IHAAmidCCarvalhoSilvaDRBignellAHSnowCYngvadottirB

BumpsteadSCooperDNXueYRomeroIGWangJLiYGibbs

RAMcCarrollSADermitzakisETPritchardJKBarrettJCHarrow

JHurlesMEGersteinMBTylerSmithC2012Asystematicsurvey

155

oflossoffunctionvariantsinhumanproteincodinggenesScience335

823ndash828doi101126science1215040

MajewskiJSchwartzentruberJLalondeEMontpetitAJabadoN2011

WhatcanexomesequencingdoforyouJMedGenet48580ndash589

doi101136jmedgenet2011100223

McClureMCBickhartDNullDVanRadenPXuLWiggansGLiuG

SchroederSGlasscockJArmstrongJColeJBVanTassellCP

SonstegardTS2014BovineExomeSequenceAnalysisandTargeted

SNPGenotypingofRecessiveFertilityDefectsBH1HH2andHH3Reveal

aPutativeCausativeMutationinSMC2forHH3PLoSONE9e92769

doi101371journalpone0092769

McClureMKimEBickhartDNullDCooperTColeJWiggansGAjmone

MarsanPColliLSantusELiuGESchroederSMatukumalliLVan

TassellCSonstegardT2013FineMappingforWeaverSyndromein

BrownSwissCattleandtheIdentificationof41ConcordantMutations

acrossNRCAMPNPLA8andCTTNBP2PLoSONE8e59251

doi101371journalpone0059251

McKennaAHannaMBanksESivachenkoACibulskisKKernytskyA

GarimellaKAltshulerDGabrielSDalyMDePristoMA2010The

GenomeAnalysisToolkitAMapReduceframeworkforanalyzingnext

generationDNAsequencingdataGenomeRes201297ndash1303

doi101101gr107524110

MiHDongQMuruganujanAGaudetPLewisSThomasPD2010

PANTHERversion7improvedphylogenetictreesorthologsand

collaborationwiththeGeneOntologyConsortiumNuclAcidsRes38

D204ndashD210doi101093nargkp1019

MrodeRKearneyJFBiffaniSCoffeyMCanavesiF2009Short

communicationGeneticrelationshipsbetweentheHolsteincow

populationsofthreeEuropeandairycountriesJournalofDairyScience

925760ndash5764doi103168jds20081931

PelosoGMAuerPLBisJCVoormanAMorrisonACStitzielNOBrody

JAKhetarpalSACrosbyJRFornageMIsaacsAJakobsdottirJ

FeitosaMFDaviesGHuffmanJEManichaikulADavisBLohman

156

KJoonAYSmithAVGroveMLZanoniPRedonVDemissieS

LawsonKPetersUCarlsonCJacksonRDRyckmanKKMackey

RHRobinsonJGSiscovickDSSchreinerPJMychaleckyjJC

PankowJSHofmanAUitterlindenAGHarrisTBTaylorKD

StaffordJMReynoldsLMMarioniREDehghanAFrancoOHPatel

APLuYHindyGGottesmanOBottingerEPMelanderOOrho

MelanderMLoosRJFDugaSMerliniPAFarrallMGoelA

AsseltaRGirelliDMartinelliNShahSHKrausWELiMRader

DJReillyMPMcPhersonRWatkinsHArdissinoDZhangQWang

JTsaiMYTaylorHACorreaAGriswoldMELangeLAStarrJM

RudanIEiriksdottirGLaunerLJOrdovasJMLevyDChenYDI

ReinerAPHaywardCPolasekODearyIJBoreckiIBLiuY

GudnasonVWilsonJGvanDuijnCMKooperbergCRichSSPsaty

BMRotterJIOrsquoDonnellCJRiceKBoerwinkleEKathiresanS

CupplesLA2014AssociationofLowFrequencyandRareCoding

SequenceVariantswithBloodLipidsandCoronaryHeartDiseasein

56000WhitesandBlacksTheAmericanJournalofHumanGenetics94

223ndash232doi101016jajhg201401009

PryceJEVeerkampRFThompsonRHillWGSimmG1997Genetic

aspectsofcommonhealthdisordersandmeasuresoffertilityinHolstein

FriesiandairycattleAnimalScience65353ndash360

doi101017S1357729800008559

PryceJEWoolastonRBerryDPWallEWintersMButlerRShafferM

2014WorldTrendsinDairyCowFertilityin10ThWorldCongressof

GeneticsAppliedtoLivestockProduction

PuffenbergerEGJinksRNWangHXinBFiorentiniCShermanEA

DegrazioDShawCSougnezCCibulskisKGabrielSKelleyRI

MortonDHStraussKA2012Ahomozygousmissensemutationin

HERC2associatedwithglobaldevelopmentaldelayandautismspectrum

disorderHumMutat331639ndash1646doi101002humu22237

PurcellSNealeBToddBrownKThomasLFerreiraMARBenderD

MallerJSklarPdeBakkerPIWDalyMJShamPC2007PLINKA

157

ToolSetforWholeGenomeAssociationandPopulationBasedLinkage

AnalysesAmJHumGenet81559ndash575

SahanaGNielsenUSAamandGPLundMSGuldbrandtsenB2013Novel

HarmfulRecessiveHaplotypesIdentifiedforFertilityTraitsinNordic

HolsteinCattlePLoSONE8e82909doi101371journalpone0082909

SchuumltzEScharfensteinMBrenigB2008Implicationofcomplexvertebral

malformationandbovineleukocyteadhesiondeficiencyDNAbased

testingondiseasefrequencyintheHolsteinpopulationJDairySci91

4854ndash4859doi103168jds20081154

SimmonsMJCrowJF1977MutationsAffectingFitnessinDrosophila

PopulationsAnnualReviewofGenetics1149ndash78

doi101146annurevge11120177000405

SoggiuAPirasCHusseinHACanioMDGaviraghiAGalliAUrbaniA

BonizziLRoncadaP2013UnravellingthebullfertilityproteomeMol

BioSyst91188ndash1195doi101039C3MB25494A

StitzielNOKiezunASunyaevS2011Computationalandstatistical

approachestoanalyzingvariantsidentifiedbyexomesequencing

GenomeBiology12227doi101186gb2011129227

SundaramSKHuqAMSunZYuWBennettLWilsonBJBehenME

ChuganiHT2011ExomesequencingofapedigreewithTourette

syndromeorchronicticdisorderAnnNeurol69901ndash904

doi101002ana22398

VanRadenPMOlsonKMNullDJHutchisonJL2011Harmfulrecessive

effectsonfertilitydetectedbyabsenceofhomozygoushaplotypesJournal

ofDairyScience946153ndash6161doi103168jds20114624

VilarintildeoGuumlellCRajputAMilnerwoodAJShahBSzuTuCTrinhJYuI

EncarnacionMMunsieLNTapiaLGustavssonEKChouP

TatarnikovIEvansDMPishottaFTVoltaMBeccanoKellyD

ThompsonCLinMKShermanHEHanHJGuentherBL

WassermanWWBernardVRossCJAppelCresswellSStoesslAJ

RobinsonCADicksonDWRossOAWszolekZKAaslyJOWuR

MHentatiFGibsonRAMcPhersonPSGirardMRajputMRajput

158

AHFarrerMJ2014DNAJC13mutationsinParkinsondiseaseHumMolGenet231794ndash1801doi101093hmgddt570

WangKLiMHakonarsonH2010ANNOVARfunctionalannotationofgeneticvariantsfromhighthroughputsequencingdataNuclAcidsRes38e164ndashe164doi101093nargkq603

WathesDCPollottGEJohnsonKFRichardsonHCookeJS2014HeiferfertilityandcarryoverconsequencesforlifetimeproductionindairyandbeefcattleAnimal8Suppl191ndash104doi101017S1751731114000755

YangYMuznyDMReidJGBainbridgeMNWillisAWardPABraxtonABeutenJXiaFNiuZHardisonMPersonRBekheirniaMRLeducMSKirbyAPhamPScullJWangMDingYPlonSELupskiJRBeaudetALGibbsRAEngCM2013ClinicalWhole

ExomeSequencingfortheDiagnosisofMendelianDisordersNewEnglandJournalofMedicine3691502ndash1511doi101056NEJMoa1306555

ZiminAVDelcherALFloreaLKelleyDRSchatzMCPuiuDHanrahanFPerteaGTassellCPVSonstegardTSMarccedilaisGRobertsMSubramanianPYorkeJASalzbergSL2009Awholegenome

assemblyofthedomesticcowBostaurusGenomeBiology10R42doi101186gb2009104r42

159

Supplementary)files)

YoucanfindChapter6supplementaryfilesat

- DocTa(DoctoralThesisArchive)(httptesionlineunicattit)

- GoogleDrive(httptinyurlcomPhDMMChapter06)orscantheQRcode

)

Supplementary)Figure)S1)Coverage)analyses)

A)Averagecoverageperanimals

B)Boxplotofchromosomescoverageperanimal

C)BoxplotofanimalcoverageperchromosomeBluelinerepresenttheexpected

50Xcoverage

Supplementary)Figure)S2)Distribution)of)variant)sequencing)depth)

)

Supplementary) Figure) S3) Correlation) between) sequencing) depth) and)

number) of) discordant) genotypes) In) red) values) for) two) outlier) animals)

having)concordance)lt846))

)

Supplementary)Figure)S4)Multidimensional)scaling)of)Italian)Holstein)bulls)

sequenced))

)

Supplementary)Figure)S5)Multidimensional)scaling)of)Italian)Holstein)bulls)

analysed)with)the)800K)SNPchip))

160

Supplementary) Figure) S6) MAF) distribution) on) neutral) (blue)) and)

deleterious)(red))variants))

)

Supplementary) Table) S1) EBV) distribution) on) Italian) Holstein) population)

and)SNPchip)dataset)

Sheet1EBVthresholdvaluetodefineplusandminustailforPROT(kgprotein)

SCC(somaticcellcount)FERT(fertility)andLONG(functional longevity) All

ItalianHolsteinFAbullswereincludeinthisanalysis

Sheet2Numberofdatasetanimalsineachtailforeachtrait

Supplementary)Table)S2)Exome)probes)statistics)number)and)bp)covered)

per)chromosomes)

Supplementary) Table) S3) Concordance) between) animal) genotypes) in)

sequencing)and)SNPchip)data)

Error1 HD data homozygotes ndash Exome data heterozygotes Error2 HD data

heterozygotesndashExomedatahomozygotes

Supplementary) Table) S4) List) of) suggestive) genes) deriving) from) the)

analysis)of)exomes)of)animals) in)the)tails)of)male)and)female)fertility)EBV)

distributions))

Header(code

bull Bfreq_totFrequencyofBalleleinentiredataset

bull Bfreq_MHFrequencyofBalleleinhighmalefertilitydataset

bull Bfreq_MLFrequencyofBalleleinhighfemalefertilitydataset

bull Bfreq_FHFrequencyofBalleleinlowmalefertilitydataset

bull Bfreq_FLFrequencyofBalleleinlowmalefertilitydataset

bull CallRate_totSNPcallratendashentiredataset

bull CallRate_MHSNPcallratendashhighmalefertilitydataset

bull CallRate_MLSNPcallratendashlowmalefertilitydataset

bull CallRate_FHSNPcallratendashhighfemalefertilitydataset

bull CallRate_FLSNPcallratendashhighmalefertilitydataset

bull HWeqHardyWeinbergequilibriumpvalue

bull Geno_HomoMinorNumberofanimalswithhomozygotegenotypeforminorallele

bull Geno_HetNumberofanimalswithheterozygotegenotype

161

bull Geno_HomoMajorNumberofanimalswithhomozygotegenotypeformajorallele

bull LOW_MinorAlleleminoralleleforlowfertility(maleandfemale)dataset

bull LOW_MAFminorallelefrequencyfromlowfertility(maleandfemale)dataset

bull LOW_CRSNPcallratefromlowfertility(maleandfemale)dataset

bull LOW_HomoMinorNumberofanimalsfromlowfertility(maleandfemale)datasetwith

homozygotegenotypeforminorallele

bull LOW_Het Number of animals from low fertility (male and female) dataset with

heterozygotegenotype

bull LOW_HomoMajorNumberofanimalsfromlowfertility(maleandfemale)datasetwith

homozygotegenotypeformajorallele

bull LOW_HWeqPvalue HardyWeinberg equilibrium pvalue from low fertility (male and

female)dataset

bull HIGH_MinorAlleleminorallelefromhighfertility(maleandfemale)dataset

bull HIGH_MAFminorallelefrequencyfromhighfertility(maleandfemale)dataset

bull HIGH_CRSNPcallratefromhighfertility(maleandfemale)dataset

bull HIGH_HomoMinorNumberofanimalsfromhighfertility(maleandfemale)datasetwith

homozygotegenotypeforminorallele

bull HIGH_Het Number of animals from high fertility (male and female) dataset with

heterozygotegenotype

bull HIGH_HomoMajorNumberofanimalsfromhighfertility(maleandfemale)datasetwith

homozygotegenotypeformajorallele

bull HIGH_HWeqPvalueHardyWeinberg equilibriumpvalue fromhigh fertility (male and

female)dataset

bull Male_TREND inheritance models implemented in PLINK trend test on male fertility

dataset

bull Male_GENOTYPES inheritancemodels implemented inPLINKgenotypic testonmale

fertilitydataset

bull Male_ RECESSIVE inheritance models implemented in PLINK recessive test on male

fertilitydataset

bull Female_TRENDinheritancemodelsimplementedinPLINKtrendtestonfemalefertility

dataset

bull Female_GENOTYPES inheritance models implemented in PLINK genotypic test on

femalefertilitydataset

bull Female_RECESSIVEinheritancemodelsimplementedinPLINKrecessivetestonfemale

fertilitydataset

bull MaleampFemale_TREND inheritance models implemented in PLINK trend test on male

andfemalefertilitydataset

bull MaleampFemale_GENOTYPES inheritancemodels implemented in PLINK genotypic test

onmaleandfemalefertilitydataset

162

bull MaleampFemale_RECESSIVEinheritancemodelsimplementedinPLINKrecessiveteston

maleandfemalefertilitydataset

bull EMP1_MaleEmpiricalpvalueonmalefertilitydataset

bull EMP1_FemaleEmpiricalpvalueonfemalefertilitydataset

bull EMP1_MaleampFemaleEmpiricalpvalueonmaleandfemalefertilitydataset

Supplementary)Table) S5) Regions) candidate) to) carry) deleterious) variants)

identified)by)OHW)approach))

Supplementary)Table) S6) Regions) candidate) to) carry) deleterious) variants)

identify)by)LOH)approach))

Ingreyregionsremovedinfurtheranalysis

Supplementary)Table)S7)List)of)candidate)deleterious)variants)mapping)in)

OHW)and)LOH)regions))Header(code

bull Bfreq_totFrequencyofBalleleinentiredataset

bull Bfreq_MHFrequencyofBalleleinhighmalefertilitydataset

bull Bfreq_MLFrequencyofBalleleinhighfemalefertilitydataset

bull Bfreq_FHFrequencyofBalleleinlowmalefertilitydataset

bull Bfreq_FLFrequencyofBalleleinlowmalefertilitydataset

bull CallRate_totSNPcallratendashentiredataset

bull CallRate_MHSNPcallratendashhighmalefertilitydataset

bull CallRate_MLSNPcallratendashlowmalefertilitydataset

bull CallRate_FHSNPcallratendashhighfemalefertilitydataset

bull CallRate_FLSNPcallratendashhighmalefertilitydataset

bull HWeqHardyWeinbergequilibriumpvalue

bull Geno_HomoMinorNumberofanimalswithhomozygotegenotypeforminorallele

bull Geno_HetNumberofanimalswithheterozygotegenotype

bull Geno_HomoMajorNumberofanimalswithhomozygotegenotypeformajorallele

bull HW_insideDeleteriousmutationsinsideldquoOutofHardyWeinbergrdquoregion

bull HW_outsideDeleteriousmutationsoutsideldquoOutofHardyWeinbergrdquoregion

bull DH_insideDeleteriousmutationsinsideldquoLackofHomozygosityrdquoregion

bull DH_outsideDeleteriousmutationsoutsideldquoLackofHomozygosityrdquoregion

163

bull GeneClass Classifications of gene 1 deleterious mutations inside OHW and LOHregion2deleteriousmutationsinsideOHWorLOHregion

bull 2deleteriousmutationsoutsideOHWandorLOHregionbull LOW_MinorAlleleminoralleleforlowfertility(maleandfemale)datasetbull LOW_MAFminorallelefrequencyfromlowfertility(maleandfemale)datasetbull LOW_CRSNPcallratefromlowfertility(maleandfemale)datasetbull LOW_HomoMinorNumberofanimalsfromlowfertility(maleandfemale)datasetwith

homozygotegenotypeforminorallelebull LOW_Het Number of animals from low fertility (male and female) dataset with

heterozygotegenotypebull LOW_HomoMajorNumberofanimalsfromlowfertility(maleandfemale)datasetwith

homozygotegenotypeformajorallelebull LOW_HWeqPvalue HardyWeinberg equilibrium pvalue from low fertility (male and

female)datasetbull HIGH_MinorAlleleminorallelefromhighfertility(maleandfemale)datasetbull HIGH_MAFminorallelefrequencyfromhighfertility(maleandfemale)datasetbull HIGH_CRSNPcallratefromhighfertility(maleandfemale)datasetbull HIGH_HomoMinorNumberofanimalsfromhighfertility(maleandfemale)datasetwith

homozygotegenotypeforminorallelebull HIGH_Het Number of animals from high fertility (male and female) dataset with

heterozygotegenotypebull HIGH_HomoMajorNumberofanimalsfromhighfertility(maleandfemale)datasetwith

homozygotegenotypeformajorallelebull HIGH_HWeqPvalueHardyWeinberg equilibriumpvalue fromhigh fertility (male and

female)datasetbull Male_TREND inheritance models implemented in PLINK trend test on male fertility

datasetbull Male_GENOTYPES inheritancemodels implemented inPLINKgenotypic testonmale

fertilitydatasetbull Male_ RECESSIVE inheritance models implemented in PLINK recessive test on male

fertilitydatasetbull Female_TRENDinheritancemodelsimplementedinPLINKtrendtestonfemalefertility

datasetbull Female_GENOTYPES inheritance models implemented in PLINK genotypic test on

femalefertilitydatasetbull Female_RECESSIVEinheritancemodelsimplementedinPLINKrecessivetestonfemale

fertilitydatasetbull MaleampFemale_TREND inheritance models implemented in PLINK trend test on male

andfemalefertilitydataset

164

bull MaleampFemale_GENOTYPES inheritancemodels implemented in PLINK genotypic test

onmaleandfemalefertilitydataset

bull MaleampFemale_RECESSIVEinheritancemodelsimplementedinPLINKrecessiveteston

maleandfemalefertilitydataset

bull EMP1_MaleEmpiricalpvalueonmalefertilitydataset

bull EMP1_FemaleEmpiricalpvalueonfemalefertilitydataset

bull EMP1_MaleampFemaleEmpiricalpvalueonmaleandfemalefertilitydataset

Supplementary)Table)S8)Results)of)haplotype)analysis)))Header(code

bull Chromchromosome

bull PosphysicalpositiononUMD31genome

bull Exome_homonumberofhomozygotefordeleteriousmutationsequencedanimals

bull Exome_hetnumberofheterozygotesequencedanimalsfordeleteriousmutation

bull Outcome NotFound It was impossible to associated a haplotype to the mutation

NotOpt Non completely invariant haplotype could be associated to themutationOK

Putativehaplotype associate tomutationwas found In few case alternativehaplotype

wasreported

bull LenlengthofhaplotypeinnumberofSNPs

bull Pop_numnumbersofselectedhaplotypedetectedinHolsteinpopulation

bull Pop_homo numbers of homozygote animals in Holstein population for selected

haplotype

bull Pop_het numbers of heterozygotes animals in Holstein population for selected

haplotype

bull ExpExpectednumberofhomozygoteanimal

bull Alternative if only one heterozygote sequenced animals for deleteriousmutationwas

foundalltwopossiblehaplotypeswheretested

Supplementary)Table)S9)Descriptive)results)for)the)Biological)Process)(BP)))

NumberofgenesbelongingtoBiologicalProcessGOterms)

Supplementary) Table) S10) Descriptive) results) for) the)Molecular) Function)

(MF))

NumberofgenesbelongingtoMolecularFunctionGOterms

165

CHAPTER7

Conclusion

Since the domestication event occurred in the Fertile Crescent around 10000yearsagothebovinespecieswasselectedbyhumankindtofulfilhisneedsThecreationofbreedsoccurredaround200yearsagointensifiedthedifferentiationbetween populations and promoted the development of intensive selectionschemesTraditionalAandnowadaysgenomicAselectionsareproducingpositivegenetictrendsinmanyproductivetraitsdespitethestillpoorknowledgeaboutphenotype biology A deeper understanding of genome organization and genebiology would increase the accuracy of genomic evaluation by incorporatingpriorknowledgeThisthesisexploresthebovinegenomebyhighAthroughputtechnologiesAsuchasNextGenerationSequencingandHighADensityGenotypingAwithestablishedand innovative procedures to investigate the biology of complex traits and toprovidenewdataandmoleculartoolstobovinebreedingThese goals have been achieved by complementary approaches and differentmethodsInChapter2selectionsignatureswereinvestigatedindependentlyintofive Italian breeds using Illumina BovineSNP50 BeadChip Then amultiAbreedapproach was applied clustering breeds into dairy (Italian Holstein ItalianBrownand Italian Simmental) andbeef (Marchigiana Piedmontese and ItalianSimmental) groups Regions under recent positive selection shared by breedswithasameproductiveaptitudewereinvestigatedinfurtherdetailbyretrievinggenes in the region and analysing their function by ontology and pathwayanalysis In the dairy group selection signals appear nearby genes related tomammarygland(metabolismandresistancetomastitis)andfeedingadaptationIn thebeefgroupgenes inselectedregionsare involved inanimalgrowthandmeatquality(textureandjuiciness)Hencecandidategenesidentifiedbelongtonetworks and pathways important in directing a breed towards either beef ordairyproductionIn Chapter 3 a GWAS approach was used in dairy cattle to correlate SNP tocomplex phenotypes having an economic value Three independent ldquoclassicalrdquosingleAmarkerregressionswereruninthreeItaliandairycattle(ItalianHolsteinItalianBrownandItalianSimmental)UsingthetraditionalsingleSNPapproach

168

only few SNPs were above the nominal genomeAwide significance thresholdwhile applying a geneAbased method identified significant candidate genesassociatedwithmilkphysiology andmammary glanddevelopment in all threepopulationsInterestinglygenesfoundindifferentbreedshavesimilarfunctionbut are not the same suggesting that breed history demography selectioncriteriaandstochasticeventsasgeneticdrifthaveanimpactonthedefinitionofthemainmoleculartargetofselectionschemesCurrent intensive selection strategies rapidly spread the genetic gain in thepopulationbutholdtheassociatedriskofdisseminatingdefectivegenescausinggenetic defects (eg mulefoot complex vertebral malformation and others) ordecreasing fertility In Chapter 6 different technologies and approaches werecombined to obtain filter and prioritize genes candidate to be deleteriousExome sequence datawere exploited to identify deleterious variants affectingItalianHolsteincattleinasetof20animalsextremeforfertilitytraitsMorethan5000 candidate deleterious SNP variants were identified To bypass the smallnumberlimitationandthelackofpowerofanystatisticanalysison20animalsastepwise approachwas used to first filter and then prioritize these candidatedeleterious variants Polymorphisms were retained if having asymmetricdistribution in bulls from opposite tails of fertility EBV distributions andormapping in genomic regions outlier for lack of homozygosity in a 1009 bullpopulation genotyped with the Illumina BovineHD BeadChip A total of 191variants in163genespassed theprocessesThesewereassessedby searchingadditional evidences in favour or against their deleterious effect throughdifferentapproachesi)checkingthevariantconservationacross100vertebratespecies by comparative genomics ii) finding haplotypes associated withcandidatemutationsandevaluatingtheirfrequencyinHolsteinsiii)integratinghumanandmouse functionalannotationsA scorewasproposed toweight theresult of these assessments and rank the variants 35 of them had positivescores These occurred in genes involved in basic biological mechanisms asfertilitydevelopmentand immunityandresultedenriched ingenesassociatedtogeneticdefectsreportedinhumanThesemutationsarestrongcandidatestobe recessive deleterious variants worth to be further investigated in a larger

169

populationtobetterunderstandtheirbiologyimpactandpossibleapplicationinbreedingTo increase the reliability of results based on genomic data and to reduce theimpact of subjective choices in data analysis new toolsapproaches weredeveloped(iethemultiAbreedapproachinSelectionSignaturendashChapter2thepostAGWAS geneAbased approach ndash Chapters 3 and 4) and the existent onescriticallyevaluated(imputationmethodsndashChapter5)In Chapter 4 the postAGWAS geneAbased method used in Chapter 3 wasimplemented in the MUGBAS (MUlti species GeneABased Association Suite)software In contrast to existing software MUGBAS is species and annotationfree AmultipleAtesting correctionwas included in the package in addition tocomputing parallelization for faster processing and plot representation forbetterunderstandingThesoftwareusessingleAmarkerGWASdataandagivenannotation to estimate geneAregionAwise association pAvalues As previouslymentioned in Chapter 3 the ldquoclassicalrdquo GWAS approach (single SNP) foundsignals only in ItalianHolsteinwhileMUGBAS geneAbased approach identifiedsignificantsignalsinallbreedsinvestigatedIn Chapter 5 the impact of different bovine reference sequences on genotypeimputation was investigated In this work Illumina BovineSNP50 BeadChipgenotypesof ItalianSimmentalbullswerereAmappedonthethreemostrecentreferencegenomeassembliesWecomparedfourdifferent imputationmethodsfromlow(IlluminaBovineLDv10Beadchip)tohighdensityThetestpopulationwasdividedintofoursubpopulationsonthebasisofrelationshipandgenotypedataavailabilityUpdatingSNPcoordinatesonthe three testedcattlereferencegenome assemblies determined only a slight variation on imputation resultswithinmethodsOntheotherhandlargedifferencesintermsofaccuracywereobservedamongthefourimputationmethodstestedGenetic structure of industrial breeds was evaluated assessing LinkageDisequilibrium (LD) Since LD decay is correlated to intensity of selection weexpected different behaviours in the breeds analysed Indeed as evident inChapter 2 ndash Figure 7 breeds subjected to lower selection intensity (ie

170

MarchigianaPiedmonteseandItalianSimmental)displayalowerpersistenceof

LDatgreaterdistancescomparedtothosesubjectedtohigherselectionpressure

(ieItalianHolsteinandItalianBrown)Chapter3investigatesLDintheDGAT1

region of dairy cattle breeds In Italian Simmental LD is lower compared to

Italian Holstein and Italian Brown In spite of this significant association

betweentheDGAT1regionandmilktraitswasfoundbothinItalianSimmental

andinItalianHolsteinInItalianBrownnoassociationwasfoundbecauseDGAT1

isfixed

Data and results presented in this thesis represent a clear example of the

progress of molecular technologies occurred in last few years spanning from

mediumdensitygenotypingtonextgenerationsequencing

Thebiologylandscapesofsomecomplextraits(iemilkproductionandquality

fertility)were improved finding candidate genes andpathwaysnotpreviously

associated to the phenotype in the bovine species Moreover insights on

deleterious mutations were inferred through a deep and integrated analysis

usingNGSdata

This new knowledge could be used to improve cattle breeding For example

deleteriousrecessivevariantscouldbeincludedinbullevaluationsbybreedersrsquo

associations to reduce the genetic load of deleterious genes in parallel to

improvingproductivity

While technological progress has been impressivewe are still only scratching

thesurfaceofanimalbiologyNewfrontiersarebeingopenedbywholegenome

sequencing of many animals (eg in the 1000 bull genome project) and by

stepping beyond sequencing (a number of epigenomic projects in animals are

following the footprints of the human ENCODE project) The trend is towards

unravelling complexity and data production ability is to be flanked by data

storingandprocessingandaboveallbynewmethodsandtoolsfordataanalysis

171

ACKNOWLEDGMENTS

ldquoNon egrave la destinazione ma il viaggio che contardquo Questi tre anni sono stati un gran bel viaggio di crescita personale e ricchi diesperienze Non egrave stato un viaggio solitarioma condiviso con vecchie e nuoveconoscenzechemisembradoverosomenzionareInnanzitutto vorrei ringraziare il prof Paolo AjmoneMarsan chemi ha dato lapossibilitagrave di fare questa esperienza Grazie per gli insegnamenti e per leopportunitagravechemihaoffertoSperodinonavertraditolesueaspettativeAgli attuali colleghi Lorenzo Stefano Licia Elisa Riccardo ed Elia e a quellipassati Nicola Fatima Raffaele Ezequiel e Barbara va il mio piugrave sincero edaffettuoso ringraziamento per aver condiviso anche solo in parte questomiopercorso Grazie per avermi sopportato (e non egrave poco) guidato (non solo nelcampolavorativo)efattovivereinunmodospecialelrsquoambientedilavoroUnsincerograzievaatuttiimieicolleghidelXXVIIcicloAgrisystemRicorderogravesempretuttiimomentichehopotutocondividereconvoi UngrazieancheachihafattopartedellamiaesperienzasvedeseGraziedicuoreaCarlFlavioChristianEricaatuttiiragazziitalianiatuttiicolleghildquosvedesirdquoea tutte le altre persone che ho conosciuto Mi avete fatto vivere appienolrsquoesperienzaesteraefattotornareacasaconqualcheconsapevolezzainpiugrave UnsentitoringraziamentoatuttiglialtricolleghiconcuihocollaboratoinquestianniRingrazio tutti gli amici e conoscenti che mi sono stati vicino per avermisupportatoesopportatoognunoamodoproprio Ultimomanonperimportanza ilringraziamentochevaatuttalamiafamigliaper avermi aiutato spronato sostenuto sempre e in particolare in questaesperienzaSemplicementegrazie PS Adirlatuttaancheladestinazionenonegravepoicosigravemale

173

  • INDEX
  • CHAPTER 1 - Introduction
  • CHAPTER 2 - Relative extended haplotype homozygosity (rEHH) signals across breeds reveal dairy and beef specific signatures of selection
  • CHAPTER 3 - Searching new signals for productive traits in three Italian cattle breeds
  • CHAPTER 4 - MUGBAS a species free gene based ased association suite
  • CHAPTER 5 - Imputation accuracy is robust to cattle reference genome updates
  • CHAPTER 6 - Quest for deleterious recessive variants in Italian Holstein bulls combining exome sequencing and high density SNP analysis
  • CHAPTER 7 - Conclusion
  • ACKNOWLEDGMENTS
Page 5: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School

CHAPTER1

Introduction

Background

The bovine species counts almost 15 billion of heads classified in some 800

breeds worldwide (httpfaostat3faoorg) The species is adapted to a wide

varietyofdifferentenvironmentandhusbandrysystemsandisoftenrearedfor

multiple purposes (Taberlet et al 2008) draft power milk meat leather

productionandotheruses(Elsiketal2009)

European cattle (Bos$ taurus) derive from a single domestication event of the

auroch (Bos$primigenius)occurred in theFertileCrescent around10000years

ago(Taberletetal2011)FollowingdomesticationcattlefollowedtheNeolithic

expansionofagricultureandcolonisedEurope(seeAjmoneMMarsanetal2010

for a review) Many thousand years of natural and artificial selection have

shaped cattle phenotype and genotype to meet environmental conditions and

human needs and together with demographic events and genetic drift have

started thedifferentiation of the bovine species in diverse and locally adapted

populations This differentiation process accelerated in the last two centuries

whenthebreedconceptwasinventedsothatnowthisspeciescountshundreds

of different breeds with high diversity in coat colour body size production

aptitudeadaptationtoclimate tolerancetodiseaseetc(Taberletetal2008

Elsiketal2009TheBovineHapMapConsortium2009)

Thisdiversityisnowbeingprogressivelylostwiththeextinctionofmanylocal

populationsmainlyforeconomicreasonsSelectioneffortshavethereforebeen

focussedonafewoutperformingindustrialbreedsthatarepresentlyselectedfor

specialiseddairyorbeefproductionandsometimesdoublepurpose

In these breeds breeding schemes are well developed and have been

traditionally based on the genetic evaluation of sires and dams using data

collectedfromperformanceandprogenytestingandunbiasedlinearprediction

modelsThesearefoundedontheFisherinfinitesimalmodellingofquantitative

trait variation and need no genomic information to estimate breeding values

(EBVs) The genetic progress recorded using this approach has been relevant

particularly for traits having medium to high heritability In Italian Holstein

genetic trends for themain production traits (milk protein and fat yield) are

positiveHoweveraccurateestimateofbreedingvalueshasahighcostinterms

4

of investment and time so that 5 to 6 years are needed to have a dairy bullprogenytested(wwwanafiit)Theadventofgenomicselection(Meuwissenetal2001)markedamilestoneindairycattlebreedingTheideawasdevelopedbeforethemoleculartoolswereinplaceforimplementitinpracticeandhasbecomearealityonlyrecentlyInthisapproach progeny tested bulls are genotyped at many thousand loci and thevalue of their EBVs used to estimate the value of each allele at each locusgenotyped These values are then used to predict the genetic value of younganimals on the basis of their genotype As a result animals from the newgeneration receive an EBV with the same accuracy of estimates obtained byprogenytestingbutmuchearlierThislowerthegenerationintervalandgreatlyincreasestherateofgeneticprogress indairypopulationsThe initial ideawasso good that nowadays genomic selection is widely applied in industrialcountriestothemajordairycattlebreedsWiththedecreaseofgenotypingcostit is likely that its use will be soon expanded to animals selected for otherpurposes (eg beef cattle) and other species (eg pigs) However genomicselection is an extension of traditional selection It is extremely useful inbreedingforcomplextraitsbutgivesnobiologicalinformationonthegenesthatcontrolthemIfthetraditionalselectionviewsasiregenomeasabigblackboxhaving a value but no clue why genomic selection views the same genomedisassembled inmany small boxes but still black and stillwithno clueon thereasonoftheirvalueIt is reasonable to think that a deeper understanding of the biologicalmechanisms controlling traits may further increase the accuracy of genomicevaluationsIndeedthenewtechnologiesoffernewopportunitiesinthissensebyloweringcostsandincreasingprecisionofdataproductionThis thesis is exploring the use of these technologies for retrieving biologicalinformationfromgenomicdataItusesmethodsnowconsideredldquotraditionalrdquoingenomics (GWAS and analysis of selection sweeps) and less traditional ones(gene centered GWAS and exome analysis) All data used here have beenproducedwithin twonationalprojectson farmanimalgenomics fundedby theItalianMinistryofAgriculture

5

M SELMOL ldquoRicerca e innovazione nelle attivitagrave di miglioramento genetico

animalemediante tecniche di geneticamolecolare per la competitivitagrave del

sistemazootecniconazionaleIrdquo(ResearchandInnovationinanimalbreeding

sheepgoatbuffalohorseanddonkeyI)

M INNOVAGENldquoRicercaeinnovazionenelleattivitagravedimiglioramentogenetico

animalemediante tecniche di geneticamolecolare per la competitivitagrave del

sistema zootecnico nazionale IIrdquo (Research and Innovation in animal

breedingsheepgoatbuffalohorseanddonkeyII)

In these projects the research group I work with M Istituto di Zootecnica at

Universitagrave Cattolica del Sacro Cuore di Piacenza M coordinated all the research

activitiesconcerningtheapplicationofgenomicstechniquesindairycattle

Contentsofthethesis

The thesis is structured into a short introduction (this first chapter) 5 core

chapters(Chapter2to6)andashortConclusion(Chapter7)Withtheexception

of Chapter 6 still in preparation the other core chapters containmanuscripts

that at time of writing are under review for their publication in international

peerMreviewedjournalsormanuscript

InChapter2selectionsignatureindairyandbeefcattleareinvestigatedwiththe

Illumina BovineSNP50 Genotyping BeadChip Five Italian cattle breeds (Italian

Holstein Italian Brown Italian Simmental Marchigiana and Piedmontese) are

analyzedwiththerEHHmethod(Sabetietal2002)tofindsignaturesofrecent

selectionCandidategenesinthevicinityofsignaturessharedbyeitherdairyor

beefbreeds are identified and theirpossible role indairy andbeefproduction

discussed

In Chapter 3 a GWAS analysis is run to investigate production traits in three

Italian dairy cattle breeds (Italian Holstein Italian Brown Italian Simmental)

Data are analysed with a well established method (Amin et al 2007)

6

implemented in theGenABELRpackage (Aulchenkoet al 2007) andwith the

newgeneMbasedmethodAssociationisinvestigatedbetween50KSNPsandmilk

production(milkyield)andqualitytraits(percentageofmilkproteinandfat)

InChapter4thegeneMbasedmethodusedinthepreviouschapter3inspiredby

theVEGASapproach(Liuetal2010)isdescribedThesoftwarewasdeveloped

andreMcodedinourlaboratoryinordertopermittheanalysisofanyspeciesin

additiontohumanMultiMtestingcorrectionandplotrepresentationcompletethe

toolsmadeavailable

InChapter5wetesttheimpactoftheuseofdifferentreferencesequencesinthe

imputation step Italian Simmental data are used to test variation in genomeM

wide inputation accuracy using different cattle reference sequences (BTAU42

UMD31 and BTAU46) To increase the analyses reliability four different

inputationsoftwaresaretested

InChapter6Holsteindataareanalysedcombiningexomesequencesand800K

SNPs data to seek deleterious recessive variantsWhile in natural populations

the destiny of these variants is to disappear in livestock breeds they can be

maintained and disseminated if carried by a highly productive sire The

intersection of deleterious mutations identified by sequencing and lack of

homozygousregionsidentifiedbytheanalysisofthepopulationwiththeHDSNP

chipidentifiedgenescandidatetocarryrecessivedeleteriousvariants

Objective

Themainobjectiveofthethesiswastoinvestigatethebovinegenomewiththe

aid new highMthroughput technology and to explore new strategies for linking

biology (genes and their function) to traits This linkage once validated may

increase the accuracy of genomic prediction and find application in selection

schemesegineradicatingthemostdeleteriousvariantsandorintheplanning

ofmatings

7

ReferencesAjmoneMMarsanPGarciaJFLenstraJA2010OntheoriginofcattleHow

aurochsbecamecattleandcolonizedtheworldEvolAnthropol19148ndash157doi101002evan20267

AminNvanDuijnCMAulchenkoYS2007AGenomicBackgroundBasedMethodforAssociationAnalysisinRelatedIndividualsPLoSONE2e1274doi101371journalpone0001274

AulchenkoYSKoningDMJdeHaleyC2007GenomewideRapidAssociationUsingMixedModelandRegressionAFastandSimpleMethodForGenomewidePedigreeMBasedQuantitativeTraitLociAssociationAnalysisGenetics177577ndash585doi101534genetics107075614

ElsikCGTellamRLWorleyKC2009TheGenomeSequenceofTaurineCattleAWindowtoRuminantBiologyandEvolutionScience324522ndash528doi101126science1169588

LiuJZMcraeAFNyholtDRMedlandSEWrayNRBrownKMHaywardNKMontgomeryGWVisscherPMMartinNGMacgregorS2010AVersatileGeneMBasedTestforGenomeMwideAssociationStudiesTheAmericanJournalofHumanGenetics87139ndash145doi101016jajhg201006009

MeuwissenTHHayesBJGoddardME2001PredictionoftotalgeneticvalueusinggenomeMwidedensemarkermapsGenetics1571819ndash1829

SabetiPCReichDEHigginsJMLevineHZPRichterDJSchaffnerSFGabrielSBPlatkoJVPattersonNJMcDonaldGJAckermanHCCampbellSJAltshulerDCooperRKwiatkowskiDWardRLanderES2002DetectingrecentpositiveselectioninthehumangenomefromhaplotypestructureNature419832ndash837doi101038nature01140

TaberletPCoissacEPansuJPompanonF2011ConservationgeneticsofcattlesheepandgoatsCRBiol334247ndash254doi101016jcrvi201012007

8

TaberletPValentiniARezaeiHRNaderiSPompanonFNegriniR

AjmoneMMarsanP2008Arecattlesheepandgoatsendangered

speciesMolEcol17275ndash284doi101111j1365M294X200703475x

TheBovineHapMapConsortium2009GenomeMWideSurveyofSNPVariation

UncoverstheGeneticStructureofCattleBreedsScience324528ndash532

doi101126science1167936

9

CHAPTER2

Relative(extendedamphaplotype(homozygosity)(rEHH)ampsignalsampacrossampbreedsamprevealampdairyampand$beef$specificampsignaturesampofampselection

LBomba1ELNicolazzi2MMilanesi1etal1IstitutodiZootecnicaUCSCviaEmiliaParmense8429122PiacenzaItaly2FondazioneParcoTecnologicoPadanoViaEinsteinLocCascinaCodazza26900LodiItalyTheseauthorscontributedequallytothiswork

____________________SubmittedtoGeneticSelectionEvolution

Abstract

Background

Nowadays many methods have been implemented to scan for signatures of

selection at genomescale level within and across breeds Extended haplotype

homozygosity is a proficient approach to detect genome regions under recent

selective pressure within breeds The objective of this study is to identify

common regions under positive recent selection shared by most represented

dairyandbeefItaliancattlebreeds

Results

A total of 3220 animals from Italian Holstein (2179) Italian Brown (775)

Simmental (493) Marchigiana (485) and Piedmontese (379) breeds were

genotyped with the Illumina BovineSNP50 BeadChip v1 in the frame of the

ItalianlivestockgenomicprojectsldquoSelMolrdquoandldquoProzoordquoAfterstandardcleaning

proceduresgenotypeswerephasedandcorehaplotypesidentifiedThedecayof

Linkage Disequilibrium (LD) for each core haplotype was assessedmeasuring

the extended haplotype homozygosity (EHH) To correct for the lack of good

estimatesof local recombinationrates therelativeEHH(rEHH)wascalculated

for each corehaplotypeGenomic regionsunder recentpositive selectionwere

identifiedasthosehighlyfrequentcorehaplotypeswithsignificantrEHHvalues

Significant independentcorehaplotypeswerealignedacrossbreeds to identify

signalsofrecentselectionsharedbydairyandsharedbybeefbreedsOverall82

and 87 common regions under selectionwere detected among dairy and beef

cattlebreedsrespectivelyBioinformaticsanalysisidentified244and232genes

mapping in these common genomic regions Pathway and network analysis of

significant genes revealed molecular functions biologically related to signal of

selectionsdetectedinmilkormeatproductiontypes

Conclusions

Resultssuggest that themultibreedapproach isamethodto identifyselection

signatures in cattle breeds under selection for the same production goal

12

allowing to better pinpoint the genomic regions of interest in dairy and beef

production

Background

The advent of genomic technologies and the consequent availability of single

nucleotidepolymorphism(SNP)datahaveenabledthestudyofselectionevents

on the cattle genome (Hayes et al 2008 Qanbari et al 2010) Such selection

signals arise from environmental or anthropogenic pressures and help to

understand the causes that led to breed formation These studies are usually

addressedinaldquotopdownrdquoapproach(ShahzadandLoor2012)fromgenotypeto

phenotypeinwhichgenomicdataisstatisticallyevaluatedtodetectdirectional

selectionThephenotypeneededtorunselectionsignaturesisintendedinavery

generalsenseabreedaproductionaptitudeorevenageographicareaoforigin

Thisapproachholdsthepotentialto investigatetraitsasadaptationtoextreme

climatesanddifferentfeedingandhusbandrysystemsresiliencetodiseasesetc

thatareveryexpensivedifficultandsometimesimpossibletostudywithclassic

GWAS (GenomeWide Association Study) approaches Therefore it is expected

thatgenomic informationretrievedinthesestudiesonlypartiallyoverlapsthat

identified by GWAS on specific traits Searching for these ldquosignatures of

selectionrdquo is of interest for understanding molecular mechanisms underlying

important biological processes and providing new insights to functional

biologicalinformation(egdiseaseandorimportanteconomictraits(Nielsenet

al 2007)) To date many methods have been implemented to scan these

selectionsignaturesatthegenomiclevelMethodsdifferwithrespecttoseveral

properties (Lenstra et al 2012) there are methods that perform analyses

withinbreedoracrossbreedsthatcomparealleleorhaplotypefrequenciesand

size that detect recent or ancient selection events either still segregating or

fixatedintopopulations(Lenstraetal2012OrsquoBrienetal2014Utsunomiyaet

al2013)Differentmethodshavedifferentsensitivityandrobustnesseg they

maybeinfluencedatadifferentextentbymarkerascertainmentbiasanduneven

distributionofrecombinationhotspotsalongthegenome

13

Extended haplotype homozygosity (EHH) is a method that has been used in

many animal species including cattle (Pan et al 2013 Qanbari et al 2010

Zhangetal2006)TheEHHmethodwasfirstdevelopedbySabetietal(2002)

for human population analysis The main objective of this methodology is to

identifylongrangehaplotypesinheritedwithoutrecombinationeventsMorein

detailunderaneutralevolutionmodelchangesinallelefrequenciesaredriven

onlybygeneticdriftInthisscenarioanewvariantwouldrequirealongtimeto

reach high frequency in the population and the Linkage Disequilibrium (LD)

surroundingitwoulddecayduetorecombinationevents(Kimura1984)Inthe

caseofpositiveselectionaquickriseinfrequencyofabeneficialmutationina

relatively short time will preserve the original haplotype structure as the

number of recombination events would be limited Therefore a signature of

positiveselectionisdefinedasaregionwithanalleleinlongrangeLDlocatedin

anuncommonlyhighfrequencyhaplotype

One of themost interesting features of thismethod is that it is able to detect

genomic regions that are under recent selection These recent selection

signatures can be therefore used to identify the genomic regions involved in

breedformationInadditionEHHmethoddoesnotrequireanydefinitionofan

ancestralallele(asneededintheintegratedhaplotypescoreiHS(Voightetal

2006) and is suited to be applied to SNP data because it is less sensitive to

ascertainmentbiasthanothermethods(Nielsenetal2007)HowevertheEHH

methodgeneratesa largenumberof falsepositiveandnegative resultsdue to

heterogeneous recombination rates along the genome (Qanbari et al 2010)

Moreover another drawback not specific of EHH but shared by all selection

signaturemethod is the lack of robust inferences able to discern true signals

fromthosegeneratedbychance(Kemperetal2014)

To partially account for these limitations Sabeti et al (2002) developed the

relative extended haplotype homozygosity (rEHH) including an empirical

methodologytoobtainsignificantsignalsTherEHHofaputativecorehaplotype

compares its original EHH valuewith that of other haplotypes at that specific

locus using all other haplotypes as control for local variation in the

recombination rate Therefore it identifies genomic regions carrying variants

14

under selection that are still segregating in the population analysed Although

thesemethodswere conceived for human populations theywere successfully

appliedto livestockspeciessuchaspig(Maetal2012)andcattle(Qanbariet

al2010)

After domestication about 10000 years ago in the fertile crescent taurine

cattle have colonized Europe and Africa and have been selected for different

human needs (Taberlet et al 2011) Particularly in the last centuries the

anthropic pressure led to the formation of hundreds of specialized breeds

adapted to different environmental conditions and linked to local tradition

constitutingagenepooldeservingattentionforconservation(AjmoneMarsanet

al2010)Someofthesebreedshavebeenselectedspecificallyfordairybeefor

bothproductiontypesfollowingstrongartificialselectionforthesetraits(Hayes

etal2009)Thepresentstudyaimsat identifyingsignalsof recentdirectional

selectionusingrEHHmethodindairyandbeefproductiontypesusingdatafrom

five Italian dairy beef and dual purpose cattle breeds We focused on the

significant core haplotypes scanned by rEHH that are shared among breeds of

thesameproductiontypeThenweusedabioinformaticsapproachto identify

positionalcandidategeneslyingwithinthegenomicregionsunderselectionand

investigatetheirbiologicalrole

Methods

SampleAnimalsandGenotyping

A total of 4311 bulls of five Italian dairy beef anddual purpose breedswere

genotyped with the Illumina BovineSNP50 BeadChip v1 (Illumina San Diego

CA)joiningthegenotypingeffortsoftwoItalianprojects(namelyldquoSelMolrdquoand

ldquoProzoordquo)Thedatasetincluded101replicatesand773siresonspairsusedfor

downstreamqualitycheckofdataproducedIndetailgenotypesof2954dairy

(2179 ItalianHolsteinand775 ItalianBrown)864beef (485Marchigianaand

379 Piedmontese) and 493 dual purpose (Italian Simmental) bulls were

availableDataqualitycontrol(QC)wasperformedintwostagesafirststageon

15

animals independently for each breed applying the same methods and

thresholdsandasecondstageappliedonmarkersacrossall theindividualsof

thedatasetThefirststageexcludedindividualswithunexpectedlyhigh(gt02)

Mendelian errors for fatherson couples andwith low call rates (lt95)The

secondstageexcludedi)SNPswithle25missingvaluesinthewholedataset

orcompletelymissing inonebreed ii)SNPswithminorallele frequencyle5

and iii) SNPs in sexual chromosomes and with unassigned chromosome or

physicalposition

EstimationofrEHH

HaplotypeswereobtainedusingfastPHASEsoftwareconsideringdefaultoptions

(ScheetandStephens2006)andwererunbreedwiseandchromosomewisein

eachbreedPedigreeinformationforallbullswereprovidedbytheirrespective

breederassociations andwereused to filteroutdirect relatives (in fatherson

pairsthesonwasmaintainedinthedatasetandthefatherdiscarded)andover

representedfamilies(amaximumof5randomlychosenindividualsperhalfsib

familywasallowed)Thefinaldatasetcontainingtheseldquolessrelatedrdquoanimalswill

behenceforthcalled ldquononredundantrdquodatasetThisnonredundantdatasetwas

used also to calculate within breed pairwise wholegenome Linkage

Disequilibrium(LD)Ther2statisticforallpairsofmarkerswasobtainedusing

PLINK software v107 (Purcell et al 2007)Thedecayof LDwas assessedby

averagingr2valuesupto1Mbdistance

To test if the population structure influenced the detection of rEHH we

replicated all analyses on the whole Italian Holstein dataset without any

population correction (ie ldquoredundantrdquo dataset) focusing on genes or gene

clustersthatarewellknowntobeunderrecentselectionincattle(ieldquocandidate

regionsrdquo) In particularwe focused on the casein polled gene cluster and coat

colourgenes(MC1RandKIT(Kemperetal2014Qanbarietal2010))

The EHH and rEHH calculations were performed using Sweep v11 software

(Sabetietal2002)Somedefaultprogramsettingshadtobemodifiedtoadapt

theseanalyses to thebovinegenomeas thesoftwarewasoriginallydeveloped

forhumangeneticanalysesSpecificallylocalrecombinationratesbetweenSNPs

16

wereapproximatedto1cMperMbEHHandrEHHcalculationswereperformed

breedwise and chromosomewise using an automatic core selection with

defaultoptions(ieconsideringlongestnonoverlappingcoresandlimitingcores

toatleast3andnomorethan20SNPs)asinQanbarietal(2010)AlthoughEHH

and rEHH values were obtained for all core haplotypes only those with

frequency ge 25 were retained for further analyses The (empirical) rEHH

significancethresholdwasobtainedbydividingtherEHHvaluesinto20binsof

5 frequency range each logtransforming withinbin values to achieve

normality Core haplotypes with pvalues lt 005 were considered significant

DetectionofcommonselectionsignatureswasobtainedcomparingrEHHvalues

acrossgenomicregionsacrossbreeds

Breedgroupingaccordingtoproductiontype

Toexamine thesignals common toeachof the twoproduction types (iedairy

andbeef) corehaplotypes thatsharedoneormoreSNP inat least twobreeds

from the same production type were selected The dual purpose Italian

Simmentalwas included inbothdairy (ItalianHolsteinand ItalianBrown)and

beef groups (Piedmontese and Marchigiana) since potentially it possesses

haplotypes selected for both production types All downstream analyses were

performedseparatelyforthedairyandthebeefproductiontypes

Detectionandannotationofcandidategenes

Genome annotation was performed using Biomart

(httpwwwbiomartorgindexhtml) a comprehensive source of gene

annotationformanyspeciesprovidedbyEnsembl(wwwensemblorg)Thelist

ofsharedhaplotypesregions(inbp)fordairyandbeefwasusedasinputfilein

theBiomartwebinterfaceThesetofgenesretrievedbyBiomartwasthenused

as input for theCanonicalpathwayandregulatorynetworkanalyses Ingenuity

PathwayAnalysis toolversion80(IPA IngenuityregSystems IncRedwoodCity

CA httpwwwingenuitycom) and a substantial examination of published

literaturewasusedtoexaminethefunctionalrelationshipsamongtheresulting

genes The IPA operates with a proprietary knowledge database providing

complementarypathwayanalysisforseveralspeciesincludingcattleIntheIPA

17

analyses the significance of each biological function identified was estimated

using Fisherrsquos exact test A Benjamini and Hochberg correction for multiple

testingwas calculated for function and canonical pathways For eachNetwork

IPAcomputedascore(pscore=log10(pvalue))fittingthesetstudiedgenesand

a list of biological functionspresent in the IPAknowledgedatabase The score

considered thenumberofgenes in thenetworkand thesizeof thenetwork to

estimatehowrelevantthisnetworkwasaccordingtothelistofgenesprovided

Thenthescorewasusedtorankthenetworks

Results

DatasetQC

Therepeatabilityobtainedfromthe101replicatespresentinthewholedataset

washigherthan998AfterthetwoQCstages105individualsand9730SNPs

were discarded Moreover after phasing 1292 individuals were removed to

reducethe largenumberofsibfamiliespresent intheredundantdatasetAfter

quality control procedures the final dataset had 44271 SNPs and 1132 514

393 410 and 364 individuals from Italian Holstein Italian Brown Italian

SimmentalMarchigianaandPiedmontesebullsrespectively(Table1)

18

Table1JNumberofanimalsgenotypedbeforeandafterQCTotalnumberofanimalgenotypedandrelativenumberofanimaldiscardedafterQCanalysis

Detectionofselectionsignatures

Sweepv11 softwaredetectedrEHHsignalswitha frequencyhigher than25overlapping the test candidate regions in both datasets corrected or not forpopulation structure In the redundant dataset we found only one significantrEHHsignal(caseincluster)whereasinthenonredundantdatasetweidentified5significantrEHHsignalsoverlappingallthetestcandidateregionsconsidered(caseinclusterpolledclusterMC1R)(Table2)

BreedTotal

Genotyped

ED15

misAN

ED1

REPL

ED1

MENDCleaned

HOL 2179 40 31 5 2093

BRW 775 6 16 4 749

SIM 493 6 6 2 479

MAR 485 37 38 410

PIE 379 5 10 364

19

Table2JComparisonofcandidateregionrEHHinnonJredundantandredundantdatasetrEHH overlapping candidate regions in both corrected (nonredundant) and not corrected

(redundant)datasetsforpopulation

Redundant NonRedundant

Candidate

geneBTA1

Closest

SNP(bp)CH2range CH2Frequency

rEHHJ

log(pval)CH2Frequency

rEHHJ

log(pval)

POLLED

Cluster1 1981154 18974181981154 H1079 093037 H1078 167174

MC1R 18 13657912 1331772014007505 H1069 120096 H1069 077167

KIT_BOVIN 6 72821175 7250492172821175 H1054 030030 H1054 033048

Casein

Cluster6 88427760 8835009888452835 H1047 094139 H1046 148133

Casein

Cluster6 88427760 8835009888452835 H2032 002003 H2033 002003

In total5526567847724388and4049corehaplotypeswitha frequency

higher than 25 were detected on Italian Holstein Italian Brown Italian

Simmental Marchigiana and Piedmontese breeds respectively A total of 838

866 740 692 and 613 core haplotypes were found significant outliers (see

Materials and Methods) in the aforementioned breeds Table 3 shows the

distribution of the total and significant core haplotypes per chromosome and

breed

20

Tableamp3amp(ampCoreamphaplotypesampdistributionampDistributionoftotalandsignificantcorehaplotypesperchromosomeandbreed

ampamp ItalianampampHolsteinamp

ItalianampampBrownamp

ItalianampSimmentalamp Marchigianaamp Piedmonteseamp

BTA1amp

Hampgtamp252amp

SignampH3amp

Hampgtamp252amp

SignampH3amp

Hampgtamp252amp

SignampH3amp

Hampgtamp252amp

SignampH3amp

Hampgtamp252amp

SignampH3amp

1amp 389 66 389 68 335 49 310 51 288 46

2amp 312 47 314 52 267 44 257 42 238 33

3amp 292 48 313 62 264 46 236 34 227 34

4amp 268 46 279 50 245 41 237 39 229 39

5amp 223 32 231 29 200 30 183 27 171 22

6amp 291 51 307 40 265 45 252 38 233 42

7amp 249 37 253 46 214 39 195 33 204 27

8amp 268 39 270 38 235 35 223 32 209 31

9amp 227 37 228 41 210 37 165 22 187 30

10amp 219 36 234 34 192 25 196 26 168 25

11amp 248 39 246 34 209 30 205 33 187 25

12amp 162 24 176 18 148 21 148 25 135 22

13amp 205 32 200 19 155 18 178 35 135 19

14amp 175 17 189 25 168 29 143 22 129 19

15amp 180 31 185 29 148 23 139 13 130 19

16amp 176 28 192 29 160 25 149 19 127 15

17amp 170 27 183 29 168 31 135 22 123 16

18amp 133 18 153 19 119 14 108 10 82 12

19amp 137 22 130 23 118 16 99 18 92 15

20amp 184 35 172 33 145 23 124 23 124 17

21amp 145 23 147 30 113 18 110 21 94 13

22amp 140 24 145 19 115 21 112 18 104 13

23amp 106 13 100 16 67 10 64 9 57 8

24amp 129 11 140 20 124 17 97 23 101 16

25amp 91 12 98 11 68 10 77 12 70 6

26amp 125 7 114 14 93 12 96 15 90 16

27amp 91 12 91 15 75 11 84 12 60 10

28amp 88 9 96 10 70 10 66 9 55 11

29amp 103 15 103 13 82 10 72 9 67 12

TOTamp 5526 838 5678 866 4772 740 4388 692 4049 613

1BostaurusAutosomes2Detectedcorehaplotypeswithafrequencyinthebreedhigherthan253Significantcorehaplotypesatplt005

Comparisonamptoamppreviouslyampreportedampdataamp

AnumberofstudieshavesearchedselectionsweepsinHolstein(Qanbarietal

2010) Brown (Qanbari et al 2011) and Simmental (Fan et al 2014) breeds

Since different methods are expected to identify signatures having different

characteristicsthecomparisonwithpreviousstudiesislimitedtothosestudies

using thesamemethodand thesamebreed(s)analysed inourstudyOnlyone

study analysed (German) HolsteinRFriesian cattle with rEHH (Qanbari et al

2010)Qanbarietal(2010)reportedcandidategenesandgeneclustersunder

Marco Milanesi
21

recent positive selection that we compared with our rEHH in dairy breeds

dataset(Table4)

Table4JCorehaplotypesincandidategenesfollowingQanbarietal2011

1BosTaurusChromosome2Corehaplotype

Thenumberofcorehaplotypesfoundinourstudyinthecandidateregionswas

lowerpreviouslyreportedWhenconsideringHolsteinsthetwomostsignificant

candidateregionsinbothstudiesmatch(caseinclusterandthesomatostatinSST

gene)althoughitisimpossibletodetermineifthehaplotypeunderselectionis

the same as this information was not provided in Qanbari et al (2010)

However other genes considered significant in Qanbari et al (with pvalues

ranging from 004 to 010) were not significant in our studyWhen using the

same loose significant threshold (pvalues le 010) ldquosignificantldquo signals were

foundalsoinItalianBrownfortheCaseinCluster(log10pvalue=109)andin

ItalianSimmentalfortheSSTgene(log10pvalue=126)

HOLSTEIN BROWN SIMMENTAL

Candidate

geneBTA1

Closest

SNP

(bp)

CH2rangeCH2

Frequency

rEHH

Hlog(p)CH2range

CH2

Frequency

rEHH

Hlog(p)CH2range

CH2

Frequency

rEHH

Hlog(p)

DGAT1 14 444963236653443936

H1057 011443936763332

H1042 0130007

Casein

Cluster6 88391612

8835009888452835

H1046 1481338832601288452835

H2068 0801098835009888452835

H1044 037022

8835009888452835

H2033 018030

8835009888452835

H3034 022045

GH 19 49652377 GHR 20 33908597

SST 1 813769568128358581376961

H1036 200189 8131845181376961

H1042 126053

8128358581376961

H2029 00630084 8131845181376961

H2042 006027

IGFH1 5 71169823

ABCG2 6 37374911 3731702038256889

H1044 031027

3731702038256889

H1040 029031

Leptin 4 95715500

LPR 3 855692038549710885594551

H1047 0910728549710885794693

H1068 0800638549710885794693

H1063 002005

8549710885594551

H2041 027025

PITH1 1 35756434

22

Signaturessharedbydairyandbybeefbreeds

Significantcorehaplotypeswerealignedacrossbreedsto identifythoseshared

among dairy or beef production types Since breeds can be considered

independentsetsofobservationssharedsignaturesaremorelikelytorepresent

real effects rather than false positives A total of 123 and 142 significant core

haplotypesweresharedbetweenatleasttwodairyorbeefbreedsrespectively

(File S1) Only 82 and 87 of those significant core haplotypes contained genes

considered positional candidates under positive selection In total 22 and

17 of the genomewas under selection for dairy and beef production types

respectivelyTherEHHaveragelengthswere216932and190994bpfordairy

andbeefrespectively

Genesetannotationandnetworkpathwayanalysis

Atotalof244and232annotatedgenesfellwithintheselectedregionsfordairy

andbeefproductiontypesrespectively(Table5)

23

Table5JStatisticsonsharedsignificantcorehaplotypesindairyandbeefbreedsStatisticsonsharedsignificantcorehaplotypesindairyandbeefbreeds

DairyCattle BeefCattle

BTA SelSig

n

Genes1 Sum

SelSignsize

(bp)

Mean

SelSign

size(bp)

SelSi

gn

Genes1 Sum

SelSignsize

(bp)

Mean

SelSign

size(bp)

1 7 11 1521608 138328 7 13 2263213 174093

2 5 9 2216685 246298 8 11 2658549 2416863 2 5 1619336 323867 7 21 3755847 1788504 7 21 5956755 283655 6 11 1761288 1601175 4 17 4506119 265066 5 11 1251127 1137396 2 4 1467738 366934 5 12 5149692 4291417 6 30 4842969 161432 3 28 11889191 4246148 0 0 0 0 4 12 3512215 2926859 4 7 1554863 222123 2 6 1106768 18446110 4 17 8839762 519986 2 3 573138 19104611 6 8 1841802 230225 1 1 100439 10043912 2 8 4244133 530517 1 1 536461 53646113 3 8 1933026 241628 2 8 985376 12317214 0 0 0 0 3 9 692397 7693315 5 9 1415405 157267 2 2 189094 9454716 3 6 1302506 217084 4 6 905719 15095317 3 4 471046 117762 2 9 1551010 17233418 0 0 0 0 3 23 2961718 12877019 3 23 5744143 249745 2 2 116938 5846920 1 2 148886 74443 2 4 627971 15699321 3 7 1930206 275744 1 5 1150760 23015222 3 5 620674 124135 3 10 2411751 24117523 0 0 0 0 1 1 68421 6842124 2 18 9166004 509222 2 4 803770 20094225 2 11 2016602 183327 1 3 329199 10973326 2 4 466134 116534 4 7 776080 11086927 1 1 631792 631792 2 3 1004717 33490628 0 0 0 0 1 1 93464 9346429 2 9 935209 103912 1 5 798300 159660TOT 82 244 65393403 216932 87 232 50024613 190994

1GenesidentifiedGenesintheregionssharedbyallthreedairybreeds(8genes)orallthreebeefbreeds(11genes)foramoredetailedanalysiswereselected(Fig1)

24

Figure1JGenomiclocationoftheselectionsignaturessharedamongstudiedbreedsa)GenesinensembletracksaredisplayedasaredboxcorehaplotypesandSNPsarecolouredinorange(MarchigianaMAR)inviolet(PiedmontesePIE)andpink(SimmentalSIM)b)Genesinensemble tracks are displayed as a red box core haplotypes and SNPs are coloured in blue(HolsteinHOL)ingreen(ItalianBrownBRW)andpink(SimmentalSIM)

All the genes identified were submitted to downstream pathwaynetwork

analyses(Table5Figure1FileS1)Interestinglygenesintheregionssharedby

allthreedairyorbeefbreedsweremanifestedinthemostlikelynetworksforthe

two production aptitudes The IPA analysis detected a total of 12 and 15

networksindairyandbeefbreedsrespectively

Indairybreeds network scores ranged from46 to2 (File S2) Eightnetworks

obtainedscoreshigherthan20andincludedmorethan14moleculeseachThey

wereassociatedtoi)geneexpressionRNAdamageandrepairproteinsynthesis

andposttranslationalmodificationii)cellassemblyorganizationfunctionand

maintenance iii) cell cycle proliferation and cancer iv) carbohydrate

metabolism v) gastrointestinal and neurological disease developmental and

endocrine system disorders A total of 23moleculeswere associatedwith the

highest ranked network (score = 46) In this the immunoglobulins mitogen

25

activatedproteinkinasesandNFkBcomplexesformacentralhublinkinggenes

involved in celltocell signaling and interaction hematological system

developmentandfunctionandimmunecelltraffickingfunctions(Figure2)

Figure2JThemostlikelyIPAnetworkindairycattlebreedsThefunctionsassociatedtothenetworkarecelltocellsignallingandinteractionhematologicalsystem development and function immune cell trafficking Nodes in red correspond to genesidentifiedincorehaplotypesoverlappinginallthethreebreedsofeachproductiontypewhereasthoseingreendepictoverlappingcorehaplotypesinatleasttwoofthosebreeds

Breast cancer antiNestrogen resistance gene 3 (BARC3) andpituitary glutaminyl

cyclase gene (QPCT) genes were common to all three dairy breeds and are

directlyconnectedwithmammaryglandmetabolisms(Ezuraetal2004GSun

et al 2012) The solute carrier family 2 member 5 (SLC2A5) gene facilitates

glucosefructose transport (Zhao et al 1998) and the zetaNchain (TCR)

26

associated protein kinase 70kDa (ZAP70) gene plays a critical role in Tcell

signaling (Bonnefont et al 2011) Calpain is another important complex that

togetherwithcalpainN3(CAPN3)mediatesepithelialcelldeathduringmammary

gland involution (Wilde et al 1997) Furthermore RAS guanyl nucleotide

releasingprotein(RASGRP1)activatestheErkMAPkinasecascaderegulatesT

cellsandBcellsdevelopmenthomeostasisanddifferentiationandisinvolvedin

the regulation of breast cancer cell (Madani et al 2004 Yasuda et al 2007

Wickramasingheetal2009)Genesinvolvedintheothernetworksarelistedin

FileS2

The scoreof thebeef cattlebreedsnetwork rangedbetween46and2 and the

numberofmoleculespresentineachnetworkrangedfrom23to1(FileS3)Asin

dairy breeds eight networks obtained a score higher than 20 and 13 ormore

molecules each These networkswere associated to i) carbohydrate lipid and

nucleic acidsmetabolism energyproduction and smallmoleculebiochemistry

ii) cardiovascular system development and function iii) cellular and tissue

developmentcellassemblyandorganizationfunctionandmaintenanceandcell

and organmorphology iv) cell cycle celltocell signaling and interaction v)

DNA replication recombination and repair and RNA posttranscriptional

modification vi) dermatological diseases and conditions infectious disease

organismal injury and abnormalities In the network with the highest score

(score=46)chondroitinsulfateproteoglycan4(CSPG4)andsnurportinN1(SNUPN)

genes were shared among all beef cattle breeds investigated CSPG4 gene is

relatedtomeattendernesswhileSNUPN isanimprintedgeneimportantinthe

embryodevelopmentandinvolvedinhumanmuscleatrophy(Narayananetal

2002) (Figure 3) All the genes included in this network were involved in

carbohydratemetabolismsmallmoleculesbiochemistryandcellcycleTheother

networkswere related to important signaling andmetabolic process essential

foranimalgrowth(FileS3)

27

Figure3JThemostlikelyIPAnetworkinbeefcattlebreedsThe functions associated to the network are carbohydrate metabolism small molecule

biochemistry and cell cycle Nodes in red correspond to genes identified in core haplotypes

overlapping in all the three breeds of each production type whereas those in green depict

overlappingcorehaplotypesinatleasttwoofthosebreeds

A total of 6 and 9 statistically significant Canonical Pathways (FDR le 005

log10(FDR) ge 13) were identified using IPA for dairy and beef breeds

respectively(Figure4)

28

Figure4JBarplotofstatisticallysignificantCanonicalPathwaysPvalues were corrected for multiple testing using BenjaminiHochberg method and arepresented in thegraphas log(pvalue)Thebar represents thepercentageof genes in a givenpathway that meet cutoff criteria within the total number of molecules that belong to thefunctiona)BarplotofstatisticallysignificantCanonicalPathways indairycattlebreedsb)BarplotofstatisticallysignificantCanonicalPathwaysinbeefcattlebreeds

Themost significant Canonical Pathway in dairywas for purinemetabolism (

log10(FDR) = 26) supporting the highly synthetic processes in the mammary

epithelium(Figure5)

29

Figure5JGenesdetectedunderrecentpositiveselectionindairycattleinvolvedinpurinemetabolismcanonicalpathwayNodes in red correspond to genes identified in core haplotypes overlapping in all the threebreedsofeachproductiontypewhereasthoseingreendepictoverlappingcorehaplotypesinatleasttwoofthosebreeds

Inbeefproductiontheephrinreceptorsignal(log10(FDR)=27)wasthemost

significant Canonical Pathway (Figure 6) Among other functions ephrin is

knowntopromotemuscleprogenitorcellmigrationbeforemitoticactivation(Li

andJohnson2013)AlltheotherCanonicalPathwaysarereportedinFileS4

30

Figure6JGenesdetectedunderrecentpositiveselectioninbeefcattleinvolvedinEphrinmetabolismcanonicalpathwayNodes in red correspond to genes identified in core haplotypes overlapping in all the threebreedsofeachproductiontypewhereasthoseingreendepictoverlappingcorehaplotypesinatleasttwoofthosebreeds

Discussion

In this work more than 4000 genotyped bulls of five dairy beef and dualpurposeItalianbreedswerescreenedsearchingforselectionsignaturesindairyand beef production types A strict data quality controlwas applied to reducepossiblesourcesofbiassuchastechnicalgenotypingproblemsandpopulationstructurethatcouldhavelargeimpactonrEHHresultsInfactsincetheimpactofpopulationstructureonrEHHvaluesisstillunexploredwereplicatedpartofouranalyseswithandwithoutcloserelativesinanattempttostudytheeffectofpopulationstructureonthisselectionsignaturemethodInprinciplepopulationstratificationshouldleadtoanoverrepresentationofcertainhaplotypessimplyduetothepresenceoftightpedigreelinks(egsirespassinghalfoftheirgeneticmaterialtotheirsons)ratherthanactualselectionprocessesForthisreasonfor

31

the full analyses all siresons couples were removed after haplotype phasing(retaining only the younger animal) and halfsib familieswere restricted to amaximum of five randomly chosen individuals in an attempt to reduce overrepresentation This assessment was restricted to four regions known to beunderselectioninItalianHolsteinasthecaseinclusterthepolledgeneclusterMC1R and KIT Italian Holstein was selected for two reasons i) it is a highlystructuredbreedaccordingtoourdata(abouthalfofindividualsremovedwereclose relatives) ii) allowed us to compare our results with a previous studyAlthoughtheanalysesconductedonbothredundantandnonredundantdatasetswereable to identify rEHHsignals in these regions thenonredundantdatasetproduced five significant rEHH signals rather than just one in the redundantdataset(Table2)Thisresulthighlightstheconfoundingeffectofthepresenceofcloserelativesinthedatasetandconsequentlytheimprovedabilitytodetectedsignificant selection signature while correcting for population structureInterestingly our results only partially overlap those found by Quanbari et al(2010) These authors detected significant signals at the casein cluster andsomatostatinSSTaswedid but also at ahighernumberof candidate regionsThese inconsistencies may be the effect of different sires included in theanalyses of the different dataset size and to the close relative trimmingprocedure we adopted to decrease the effect of population structure andconsequent bias (Table 4) However poor correlation across investigations isoftenobservedalsointhedeeplyinvestigatedhumanspeciesThisisduetotheuse of different within and between breed statistics that identify selectionsignatureshavingdifferent characteristics (ancientrecent segregatingfixatedunder directionalbalancing selection) to the likely high rate of falsepositivesnegativesresultsandalsotothedifferentconsiderationofpopulationstructureandbackgroundselection(Enardetal2014)The number of total and significant core haplotypes identified by the SweepsoftwarewasthehighestinItalianBrownandthelowestinPiedmontesebreedSinceEHHisheavilybasedonpopulationLDtheaverageLDleveloverdistancewascalculatedineachbreed(Figure7)

32

Figure7JMultiJbreedaveragelinkagedisequilibriumagainstphysicaldistance(inkb)Marchigiana Piedmontese and Italian Simmental breeds present a lower persistence of LD allargerdistancethanItalianHolsteinandItalianBrownbreedsThisanalysishighlightedageneralpositivecorrelationbetweenthe levelofLDover distance and the number of total and significant core haplotypes foundHoweverconsideringthatrEHHisarelativemeasure it ismorelikelythatthehighernumberofsignificantcorehaplotypesidentifiedindairybreedswasduemostlytoahigherselectivepressureindairycomparedtobeefbreedsSignificance tests used in detecting selection signatures should measure theprobability of a statistic to be anoutlier compared to its expecteddistributionunder a neutral model However no reliable neutral model has so far beendevelopedincattlebecauseofthecomplexityofthedemographichistoryofthisspecies (Kemper et al 2014) As a consequence empiric rather than modelbasedsignificancetestaregenerallyusedinthedetectionofselectionsignaturesAccordinglyweconsideredasoutliersvaluesas thosesimply falling in the5plusvarianttailoftherEHHdistributionWekeptthewithinbreedsignificancethresholdloosenoncorrectingitformultipletestingbutconsideredonlysignals

33

eitherexceedingthisthresholdbyatleasttwoordersofmagnitudeorsharedby

twoormoreindependentbreeds

Parallel comparison of results from independent analyses of different breeds

allowed the achievement of a double objective i) to identify putative regions

under(recent)selectioninbreedsbelongingtotwoproductiontypeswhichwas

the main objective of this study and ii) reducing false positives since multi

breed analyses served as internal control Since the rEHH method does not

considerphenotypicinformationasignificantsignalmightarisebecausei)the

core haplotype is actually under a selective pressure ii) the result is a false

positive ie causedby chance population structureor anyotherdriving force

However even considering an unrealistic scenario with no false positives a

proportionofthetotalamountofsignalswouldactuallybeselectionsignatures

due to a selectionpressurenot ondairy or beef traits This becausedairy and

beefbreedsinvestigatedareonlyafewandshareanumberoftraitsnotdirectly

related to dairybeef production Even considering this limitation to our

knowledge this is the first dairy and beef multibreed study applying this

strategytoreducefalsepositivesatthecostofpossiblelossofinformationdue

tohighfalsenegativeratesSignificantsignalsharedbydairyandbybeefbreeds

wereused fordownstreamnetworkanalysesonpositional candidategenes to

search forabiological justification towhatwasobtainedstudying thegenomic

information Only the top network for dairy and the top network for beef

productionarediscussedindetail

Dairybreeds

ThemostsignificantnetworkdetectedindairybreedsisassociatedwithCellTo

CellSignalingandInteractionHematologicalSystemDevelopmentandFunction

andImmuneCellTraffickingfunctions(Figure2)ImmunoglobulinandNFkBact

aspivotalgeneswithinthisnetworkInterestinglyBARC3andQPCTgeneswere

sharedamongallthethreedairybreedsBothgeneshavenotyetbeenstudiedin

cattlealthoughinhumanstudiesthesegenesarewellknowntobelinkedtothe

mammaryglandmetobolismandthecalciumregulationTheformerisinvolved

in integrinmediated cell adhesions and signaling which is required for

mammary gland development and functions (G Sun et al 2012) The latter is

34

associatedwithlowradialbodymineraldensity(BMD)inadultwomen(Ezuraet

al2004)AnotherimportantplayeronthisnetworkisSLC2A5genealsoknown

asGLUT5 SLC2A5 gene acts as fructose transporter in the intestine and has a

significantroleinenergybalanceofdairycows(Burantetal1992)Findingthis

gene in adairy cattle specificnetwork seems surprising since there shouldbe

little need for transporting glucose andor fructose in the ruminant intestinal

tractassimplecarbohydratesaredegradedintovolatilefattyacids(VFA)inthe

rumen (Iqbal et al 2009) However it is often the case that large amount of

starchbypasstherumenincowsfedwithdietsrichincerealgrains(Zhaoetal

1998)Thisbypassedstarchneedstobedigestedinthesmallintestinetoavoid

highlevelsofglucoseconcentrationinthelargeintestine

WedetectedcalpaincomplexandcalpainN3(CAPN3)genesunderpositiverecent

selection in the dairy group as reported also by Utsunomiya et al (2013)

AltoughCalpainisknowntobeinvolvedinpostmortemmeattendernizationitis

also related to dairy metabolism as the muscle breakdown promoted by the

calpain gene provides an energy source for milk production especially at the

beginningoflactation(Kuhlaetal2011)InadditionasreportedbyWildeetal

(1997) calpainNcalpastatin system is related to programmed cell death of

alveolar secretory epithelial cells during lactation Zap70 gene is a cytoplasmic

proteintyrosinekinaseItisrelatedtotheimmunesystemasacomponentofthe

Tcell receptor playing a central role in Tcell responses (Wang et al 2010)

Bonnefontet al (2011) found thatZap70 genewasup regulated (FoldChange

(FC)=82FDR=277E12)onmilksomaticcellsofsheepinfectedbySaureus

andSepidermidisshowinganassociationwithmastitisresistance

Purinemetabolismwasthemostsignificantcanonicalpathwayindairybreeds

In a study conducted in human mammary epthelium Maningat et al (2008)

conductedageneexpressionanalysisonbreastmilkfatglobulesidentifyingthe

PurinemetabolismasthemostsignificantpathwaySynthesisandbreakdownof

purine is essential in the metabolism of the tissues of many organisms

particularlyofthemammaryglandduringlactation

Another interesting canonical pathway was endothelin signaling (Figure 5)

endothelin functions as vasocontrictor and is secreted by endothelial cells

35

(Nussdorferetal1999)Acostaetal(1999)reportedthatincattleendothelinsareinvolvedinthefollicularproductionofprostaglandinsandtheregulationofsteroidogenesis in the mature follicle In a recent study Puglisi et al (2013)confirmed the implication of endothelin in particular EDNRA and endothelinconvertin enzyme 1 in reproductive disorder in bovine They classified theEDNRAaspotentialbiomarkerforfertilityincow

Beefbreeds

Inbeefcattlebreedsthefocaladhesionkinase(FAK)togetherwithintegrinandERK12genesplayacentralroleashubsinthemostsignificantnetwork(Figure3)Theyareinvolvedincellularadhesionandcellmotilityprocessesparticularlyduringskeletalmyogenesis(Nguyenetal2014)Moreovertheoverexpressionof FAK can determine a reduced apoptosis and an increase risk of metastaticcancers(MehlenandPuisieux2006)Thesehubsarenotdirectlyinvolvedinourstudybut sharemany interactors thatare located in region found tobeunderselction such as CSPG4 RB1CC1 MGAT3 CIRPB CSPG4 gene belongs tochondroitin sulfate proteoglycans (CSPGs) family CSPGs are proteoglycansconsistingofaproteincoreandachondroitinsulfatesidechainTheyareknownto be structural components of a variety of tissues includingmuscle andplaykey roles inneuraldevelopmentandglial scar formationTheyare involved incellular processes such as cell adhesion cell growth receptor binding cellmigrationandinteractionswithotherextracellularmatrixconstituents

Manystudiesreporttheroleofproteoglycansinmeattextureofseveralmusclesfromthebovinespecies(Nishimuraetal1996)Dubostetal(2013)highlightadirect involvement of proteoglycans with respect to cooked meat juicinessRB1CC1geneplaysacrucialroleinmusculardifferentiationanditsactivationisa essential for myogenic differentiation (Watanabe et al 2005 p 1)Monoacylglycerol acyltransferase (MGAT3) gene cause the synthesis ofdiacylglycerol (DAG)using2monoacylglyceroland fattyacyl coenzymeAThisenzymaticreactionis fundamental fortheabsorptionofdietaryfat inthesmallintestineSunetal(2012p3)reportedinastudyonfiveChinesecattlebreedsthatMGAT3gene isassociatedwithgrowthtraitsCIRPBgenemaybepartofacompensatorymechanism inmuscles undergoing atrophy It preservesmuscle

36

tissuemassduringcoldshockresponsesaginganddisuse(DupontVersteegden

et al 2008) SNUPN is an imprinted gene and is expressed monoallelically

dependingonitsparentaloriginSNUPNplayimportantrolesinembryosurvival

andpostnatal growth regulation (Smithet al 2012Wanget al 2012)Ephrin

receptorsignalingwasthetopcanonicalpathwayproducebyIPAandhasalsoan

interestingbiologicalinterpretationformeatproduction(Figure6)Indeedthis

pathaway is important for the muscle tissue growth and regeneration

participating in correct positioning and formation of the neural muscular

junction(LiandJohnson2013)

Conclusions

In this study we analysed the selection signatures at genomewide level in 5

Italian cattle breeds Then we used a multibreed approach to identify the

genomicregionssharedamongcattlebreedsselectedfordairyorbeefpurpose

Thisapproachincreasedthepowertopinpointregionsofthegenomethatplay

important roles in economically relevant traits in both dairy and beef cattle

Moreover network and pathways analyses were used to display a detailed

functional characterization of genes mapping in the regions detected under

recentpositiveselection

Particularly in dairy cattle genes under directional selection are related to

feeding adaptation (increasing level of starch in the diet) mammary gland

metabolismandresistencetomastitisThebiologicalfeaturesunderselectionin

beef cattle breeds are involved in animal growth meat texture and juiceness

These results have a logical biological explanation However the frequent

approach of interpreting selection signatures on the basis of the function of

geneslocatednearbythepeaksignalofthestatisticusedispresentlychallenged

Novelinformationinhumanssuggeststhatmostselectedvariationisnotlocated

within genes and coding regions but in regulatory sites identified by the

ENCODEproject(Enardetal2014)Thesemaycontroltheexpressionofentire

genomicregionsorofgeneslocatedatarelavantdistancefromtheselectedsite

makingbiologicalinterpretationmorecomplex

37

InanycasefurtherstudiesusingdenserSNPchipsorwholegenomesequencing

producinginformationnotsubjectedtoascertainmentbias(Qanbarietal2014)

may increase the resolution of our analysis and togetherwith new knowledge

arisingonthecontrolofgeneexpressionvalidateorcorrectourresults

Acknowledgements

ANAFI (Italian Association of Holstein Friesian Breeders) ANAPRI (Italian

Association Simmental Breeders) ANABORAPI (National Association of

PiedmonteseBreeders)ANABIC(AssociationofItalianBeefCattleBreeders)and

ANARB (Italian Brown Cattle Breeders Association) are acknowledged for

providing the biological samples and the necessary information for this work

The Authors are also grateful to SelMol project (MiPAAF Ministero delle

PoliticheAgricoleAlimentarieFortestali)andbytheProZooproject(Regione

LombardiandashDGAgricoltura Fondazione Cariplo Fondazione Banca Popolare di

Lodi)forfunding

Thefundershadnoroleinstudydesigndatacollectionandanalysisdecisionto

publishorpreparationofthemanuscript

References

Acosta TJ Berisha B Ozawa T Sato K Schams D Miyamoto A 1999

Evidence for a Local EndothelinAngiotensinAtrial Natriuretic Peptide

SysteminBovineMature Follicles In Vitro Effects on SteroidHormones

and Prostaglandin Secretion Biol Reprod 61 1419ndash1425

doi101095biolreprod6161419

AjmoneMarsan P Garcia JF Lenstra JA 2010 On the origin of cattle how

aurochs became cattle and colonized theworld Evol Anthropol Issues

NewsRev19148ndash157

38

BonnefontCToufeerMCaubetCFoulonETascaCAurelMRBergonier

D Boullier S RobertGranieacute C Foucras G 2011 Transcriptomic

analysisofmilksomaticcells inmastitisresistantandsusceptiblesheep

upon challenge with Staphylococcus epidermidis and Staphylococcus

aureusBMCGenomics12208

BurantCFTakedaJBrotLarocheEBellGIDavidsonNO1992Fructose

transporter inhumanspermatozoaandsmall intestine isGLUT5 JBiol

Chem26714523ndash14526

DubostAMicolDPicardBLethiasCAnduezaDBauchartDListratA

2013Structuralandbiochemicalcharacteristicsofbovineintramuscular

connective tissue and beef quality Meat Sci 95 555ndash561

doi101016jmeatsci201305040

DupontVersteegden EE Nagarajan R Beggs ML Bearden ED Simpson

PMPetersonCA2008IdentificationofcoldshockproteinRBM3asa

possible regulator of skeletal muscle size through expression profiling

AJP Regul Integr Comp Physiol 295 R1263ndashR1273

doi101152ajpregu904552008

Enard D Messer PW Petrov DA 2014 Genomewide signals of positive

selection in human evolution Genome Res gr164822113

doi101101gr164822113

EzuraYKajitaMIshidaRYoshidaSYoshidaHSuzukiTHosoiTInoue

SShirakiMOrimoHEmiM2004Associationofmultiplenucleotide

variationsinthepituitaryglutaminylcyclasegene(QPCT)withlowradial

BMDinadultwomenJBoneMinRes191296ndash301

Fan H Wu Y Qi X Zhang J Li J Gao X Zhang L Li J Gao H 2014

Genomewide detection of selective signatures in Simmental cattle J

ApplGenet1ndash9doi101007s1335301402006

HayesBJChamberlainAJMaceachernSSavinKMcPartlanHMacLeodI

SethuramanLGoddardME2009Agenomemapofdivergentartificial

selectionbetweenBostaurusdairycattleandBostaurusbeefcattleAnim

Genet40176ndash184doi101111j13652052200801815x

Hayes BJ Lien S Nilsen H Olsen HG Berg P Maceachern S Potter S

Meuwissen THE 2008 The origin of selection signatures on bovine

39

chromosome 6 Anim Genet 39 105ndash111 doi101111j1365

2052200701683x

IqbalSZebeliQMazzolariABertoniGDunnSMYangWZAmetajBN

2009 Feeding barley grain steeped in lactic acid modulates rumen

fermentation patterns and increases milk fat content in dairy cows J

DairySci926023ndash6032doi103168jds20092380

KemperKESaxtonSJBolormaaSHayesBJGoddardME2014Selection

for complex traits leaves littleornoclassic signaturesof selectionBMC

Genomics15246doi1011861471216415246

KimuraM1984Theneutraltheoryofmolecularevolution

KuhlaBNuumlrnbergGAlbrechtDGoumlrsSHammonHMMetgesCC2011InvolvementofSkeletalMuscleProteinGlycogenandFatMetabolismin

the Adaptation on Early Lactation of Dairy Cows J Proteome Res 10

4252ndash4262doi101021pr200425h

Lenstra JAGroeneveld LF EdingHKantanen JWilliams JL Taberlet P

NicolazziELSolknerJSimianerHCianiEGarciaJFBrufordMW

AjmoneMarsan P Weigend S 2012 Molecular tools and analytical

approaches for the characterization of farm animal genetic diversity

AnimGenet43483ndash502

Li J Johnson SE 2013 EphrinA5 promotes bovine muscle progenitor cell

migration before mitotic activation J Anim Sci 91 1086ndash1093

doi102527jas20125728

Ma YL Zhang Q Ding XD 2012 [Detecting selection signatures on X

chromosome in pig through high density SNPs] Yi Chuan Hered

ZhongguoYiChuanXueHuiBianJi341251ndash1260

Madani S Hichami A CharkaouiMalki M Khan NA 2004 Diacylglycerols

Containing Omega 3 and Omega 6 Fatty Acids Bind to RasGRP and

Modulate MAP Kinase Activation J Biol Chem 279 1176ndash1183

doi101074jbcM306252200

Maningat PD Sen P Rijnkels M Sunehag AL Hadsell DL Bray M

Haymond MW 2008 Gene expression in the human mammary

epitheliumduring lactation themilk fat globule transcriptome Physiol

Genomics3712ndash22doi101152physiolgenomics903412008

40

Mehlen P Puisieux A 2006Metastasis a question of life or deathNat Rev

Cancer6449ndash458doi101038nrc1886

NarayananUOspinaJKFreyMRHebertMDMateraAG2002SMNthe

Spinal Muscular Atrophy Protein Forms a PreImport Snrnp Complex

withSnurportin1andImportin HumMolGenet111785ndash1795

Nguyen N Yi JS Park H Lee JS Ko YG 2014 Mitsugumin 53 (MG53)

LigaseUbiquitinatesFocalAdhesionKinaseduringSkeletalMyogenesisJ

BiolChem2893209ndash3216doi101074jbcM113525154

NielsenRHellmannIHubiszMBustamanteCClarkAG2007Recentand

ongoing selection in the human genome Nat Rev Genet 8 857ndash868

doi101038nrg2187

NishimuraTHattoriATakahashiK1996Relationshipbetweendegradation

of proteoglycans andweakening of the intramuscular connective tissue

duringpostmortemageingofbeefMeatSci42251ndash260

Nussdorfer GG Rossi GPMalendowicz LKMazzocchi G 1999Autocrine

ParacrineEndothelinSysteminthePhysiologyandPathologyofSteroid

SecretingTissuesPharmacolRev51403ndash438

OrsquoBrienAMPUtsunomiyaYTMeacuteszaacuterosGBickhartDM LiuGE Tassell

CPV Sonstegard TS Silva MVD Garcia JF Soumllkner J 2014

Assessing signatures of selection through variation in linkage

disequilibrium between taurine and indicine cattle Genet Sel Evol 46

19doi101186129796864619

Pan D Zhang S Jiang J Jiang L Zhang Q Liu J 2013 GenomeWide

DetectionofSelectiveSignatureinChineseHolsteinPLoSONE8e60440

doi101371journalpone0060440

PuglisiRCambuliCCapoferriRGianninoLLukajADuchiRLazzariG

Galli C Feligini M Galli A Bongioni G 2013 Differential gene

expressionincumulusoocytecomplexescollectedbyovumpickupfrom

repeat breeder and normally fertile Holstein Friesian heifers Anim

ReprodSci14126ndash33doi101016janireprosci201307003

Purcell S Neale B ToddBrown K Thomas L Ferreira MAR Bender D

MallerJSklarPdeBakkerPIWDalyMJShamPC2007PLINKa

41

tool set for wholegenome association and populationbased linkageanalysesAmJHumGenet81559ndash575doi101086519795

Qanbari S Gianola D Hayes B Schenkel F Miller S Moore S Thaller GSimianer H 2011 Application of site and haplotypefrequency basedapproachesfordetectingselectionsignaturesincattleBMCGenomics12318doi1011861471216412318

Qanbari S Pausch H Jansen S Somel M Strom TM Fries R Nielsen RSimianer H 2014 Classic Selective Sweeps Revealed by MassiveSequencinginCattlePLoSGenet10doi101371journalpgen1004148

Qanbari S Pimentel ECG Tetens J Thaller G Lichtner P Sharifi ARSimianerH2010AgenomewidescanforsignaturesofrecentselectioninHolsteincattleAnimGenetdoi101111j13652052200902016x

Sabeti PC Reich DE Higgins JM Levine HZ Richter DJ Schaffner SFGabriel SB Platko JV Patterson NJ McDonald GJ Ackerman HCCampbellSJAltshulerDCooperRKwiatkowskiDWardRLanderES2002DetectingrecentpositiveselectioninthehumangenomefromhaplotypestructureNature419832ndash7

ScheetPStephensM2006Afastandflexiblestatisticalmodelforlargescalepopulation genotype data applications to inferring missing genotypesandhaplotypicphaseAmJHumGenet78629ndash44

Shahzad K Loor JJ 2012 Application of topdown and bottomup systemsapproaches in ruminantphysiologyandmetabolismCurrGenomics13379

SmithLSuzuki JGoffAFilionFTherrien JMurphyBKohanGhadrHLefebvreRBrisvilleABuczinskiSFecteauGPerecinFMeirellesF2012DevelopmentalandEpigeneticAnomaliesinClonedCattleReprodDomestAnim47107ndash114doi101111j14390531201202063x

Sun G Cheng SYS Chen M Lim CJ Pallen CJ 2012 Protein TyrosinePhosphatase Phosphotyrosyl789 Binds BCAR3 To Position Cas forActivationatIntegrinMediatedFocalAdhesionsMolCellBiol323776ndash3789doi101128MCB0021412

Sun J Zhang C Lan X Lei C ChenH 2012 Exploring polymorphisms andassociationsofthebovineMOGAT3genewithgrowthtraitsGenomeNatl

42

Res Counc Can Geacutenome Cons Natl Rech Can 55 56ndash62doi101139g11077

TaberletPCoissacEPansu J PompanonF 2011Conservationgeneticsofcattle sheep and goats C R Biol On the trail of domesticationsmigrationsandinvasionsinagricultureDominiqueJobGeorgesPelletierJeanClaudePernollet334247ndash254doi101016jcrvi201012007

Utsunomiya YT Perez OrsquoBrien AM Sonstegard TS Van Tassell CP doCarmo AS Meszaros G Solkner J Garcia JF 2013 Detecting lociunder recent positive selection in dairy and beef cattle by combiningdifferentgenomewidescanmethodsPLoSOne8e64280

Voight BF Kudaravalli S Wen X Pritchard JK 2006 A Map of RecentPositive Selection in the Human Genome PLoS Biol 4 e72doi101371journalpbio0040072

WangHKadlecekTAAuYeungBBGoodfellowHESHsuLYFreedmanTSWeissA2010ZAP70AnEssentialKinaseinTcellSignalingColdSpringHarbPerspectBiol2doi101101cshperspecta002279

WangMZhangXKangLJiangCJiangY2012MolecularcharacterizationofporcineNECD SNRPNandUBE3Agenesand imprinting status in theskeletal muscle of neonate pigs Mol Biol Rep 39 9415ndash9422doi101007s1103301218066

WatanabeRChanoTInoueHIsonoTKoiwaiOOkabeH2005Rb1cc1iscritical for myoblast differentiation through Rb1 regulation VirchowsArchIntJPathol447643ndash648doi101007s0042800411831

WickramasingheNSManavalanTTDoughertySMRiggsKALiYKlingeCM 2009 Estradiol downregulates miR21 expression and increasesmiR21targetgeneexpressioninMCF7breastcancercellsNucleicAcidsRes372584ndash2595doi101093nargkp117

WildeCJAddeyCVLiPFernigDG1997Programmedcelldeathinbovinemammary tissue during lactation and involution Exp Physiol 82 943ndash953

YasudaSStevensRLTeradaTTakedaMHashimotoTFukaeJHoritaTKataoka H Atsumi T Koike T 2007 Defective Expression of Ras

43

Guanyl NucleotideReleasing Protein 1 in a Subset of Patients with

SystemicLupusErythematosusJImmunol1794890ndash4900

ZhangCBaileyDKAwadTLiuGXingGCaoMValmeekamVRetiefJ

Matsuzaki H Taub M Seielstad M Kennedy GC 2006 A whole

genome longrange haplotype (WGLRH) test for detecting imprints of

positiveselectioninhumanpopulationsBioinformatics222122ndash8

ZhaoFQOkineEKCheesemanCIShiraziBeecheySPKennellyJJ1998

Glucose transporter gene expression in lactating bovine gastrointestinal

tractJAnimSci762921ndash2929

44

Supplementaryfiles

YoucanfindChapterʹsupplementaryfilesat

- DocTa(DoctoralThesisArchive)(httptesionlineunicattit)

- GoogleDrive(httptinyurlcomPhDMMChapter02)orscantheQRcode

SupplementaryfileS1ndashSignificantcorehaplotypesandgenessharedin

dairyandbeef

Significantcorehaplotypes(pvaluele005haplotypefrequencyge025)shared

indairyandbeefcattlebreedsandrelativegenesintersectingwiththemThefile

isdividedinto4tablesThefirst2sheets(sheet1and2)shownthesignificant

corehaplotypesfordairyandbeefThelast2sheets(sheet3and4)showngenes

intersectingwiththesignificantcorehaplotypesfordairyandbeef

SupplementaryfileS2ndashNetworksrankingindairybreeds

Networksrankingindairybreedswithrelativemoleculessymbolslistscore

valuenumberofmoleculesbelongingtothedataset(focusmolecules)andtop

functionforeachnetwork

SupplementaryfileS3ndashNetworksrankinginbeefbreeds

Networksrankinginbeefbreedswithrelativemoleculessymbolslistscore

valuenumberofmoleculesbelongingtothedataset(focusmolecules)andtop

functionforeachnetwork

45

SupplementaryfileS4ndashCanonicalpathwaysrankingindairyorbeefcattle

breeds

Canonicalpathwaysrankingindairy(sheet1)orbeef(sheet2)cattlebreedswithrelativemoleculessymbolslistratioandndashlog10ofthepvaluesforeachcanonicalpathway

46

CHAPTER3Searchingnewsignalsforproductivetraitsthroughgenebasedassociationanalysisin

threeItaliancattlebreedsSCapomaccio1MMilanesi1etal1IstitutodiZootecnicaUCSCviaEmiliaParmense8429122PiacenzaItalyTheseauthorscontributedequallytothiswork

____________________SubmittedtoAnimalGenetics

Abstract

Genome wide association studies (GWAS) have been widely applied to

disentanglethegeneticbasisofcomplextraits IncattlebreedsclassicalGWAS

approach with medium density markers panels are far to be conclusive

especiallyforcomplextraitsThisisduetotheintrinsiclimitationsofGWASand

to the choices that are made to step from the associationrsquos signals to the

functionalunits

Here we applied a geneHbased association strategy to prioritize genotypeH

phenotype associations found with classical approaches in three Italian dairy

cattlebreedswithdifferentsamplesizes(ItalianHolsteinn=2058ItalianBrown

n=745ItalianSimmentaln=477)formilkproductionandqualitytraits

WhileclassicalsinglemarkerregressionfailedtorevealgenomeHwidesignificant

genotypeHphenotype association except for Italian Holstein the gene based

approach found specific genes in each breed that are associated with milk

physiologyandmammaryglanddevelopment

Since no established method is yet implemented to step from variation to

functional units our strategy may contribute to reveal new genes that play

significant roles in complex traits like thosehere investigated condensing low

signalsofassociationinageneHcentricfashionapproach

Background

Modern cattle breeds have nomore than 200 years of history (Taberlet et al

2008) a time spanwhere selection reproductive isolationand the consequent

geneticdriftshapedallelefrequenciesandlinkagedisequilibriumthroughoutthe

genome This is particularly true for industrial dairy breeds where the joint

application of reproductive technologies and modern genetic evaluation

methodshave imposedahigh selectionpressureona limitednumberof traits

(Taberletetal2008)involvedinmilkquantityandquality

Indeed the milk production industry has been historically ready to acquire

technicalimprovementsfromdifferentfieldsstatisticsbiologyandengineering

50

ForexampleBLUPbasedstrategiesandhighHdensitygenotyping throughSNPHchipsarenowroutinelyincludedinbullevaluationshavingsignificantimpactonselectionefficiencyandpopulationstructureThese techniques have dramatically improved milk yield and compositionwithout knowing the biological basis of the traits under selection To datedespite many efforts and many QTL mapped only a few genes have beeneffectivelyassociatedwithmilkyieldandcomposition(Grisartetal2002Blottetal2003CohenHZinderetal2005)In thiscontextGenomeWideAssociationStudies(GWAS) is themostcommonmethodfordissectingthebiologythatunderliescomplextraits(McCarthyetal2008)TheunderlyingideaofGWASissimplefindmarkerHtraitassociationsexploitingthe linkage disequilibrium (LD) that exists between the causative mutation ndashwhichweignorendashandoneormoreanalyzedmarkerswhichwewellknowInthiswayonecanpinpointgenomicregionscarryingcausalvariantsforanytrait

Although a number of GWAS have been conducted in cattle productive andmorphological traits (some examples (Jiang et al 2010 Cole et al 2011Meredith et al 2012)) and found several genomic regions associated to thesetraits none focused their attention directly on genes involved inmilk biologyMoreoveroneof themajordrawbacksofGWASisrelatedtothescarceresultsportability across populations due to a variety of confounding factorspopulationstructuredifferentialLD levelsbreedspecific selection targetsandeven SNPs ascertainment bias (Clark et al 2005) For instance significantassociationsonfatpercentagewereidentifiedinmanyregionsoftheBostaurusgenome in different breeds Except for genes of major effect there is littleagreement on which regions and H more importantly H genes are actuallyinvolvedinthecontrolofthetraitinvestigatedThisissomehownotsurprisingone favorable allele may segregate in one breed and be fixated in a differentbreed the same allele segregates in both breeds but allelesmay differ or thesegregates in both breeds but the genetic background masks the effect ofsegregation

51

AnotherlimitationinGWASisthestringentsignificancethresholdoftenapplied

tocorrectformultipletestingAsaresulta largeproportionofgeneswithlow

effectaredisregardedwithconsequentoverestimationof thehigheffectgenes

variance(Xu2003)Thisbiasisparticularlyworryingwheninvestigatingtraits

controlledbya largenumberofgeneswithsmalleffectand inhighLDasmilk

yield(GIANOLAETAL2013)

To overcome this kind of limitation different strategies have been invented

usingprior knowledge Interesting ldquocandidatepathwayrdquo approacheshavebeen

conducted(Ravenetal2013)testingforassociationbetweenaphenotypeand

genes inpathwaysthatare likelytobe involvedinthecontrolofthetraitThis

approach reduces thenumberofmarkers tobe tested increases the statistical

poweroftheexperimentrestrictingtheanalysistoexistingknowledge

Other methods based on statistical manipulations or particular experimental

designs have been developed to prioritize or validate genotypeHphenotype

associations For example geneHbased association strategies while restricting

GWA study only to genes and neighboring genomic regionsmay identify new

candidategenesdeservingpostHGWASanalysis(Akulaetal2011Cantoretal

2010 Liu et al 2010) Moreover it is true that classical GWAS downstream

bioinformatics analyses concentrate their efforts in findingmeaningful results

usinggenesasfinaltarget

The aim of ourworkwas to find new candidate genes and novel information

associated to complex traits as milk production (yiled) and quality (fat and

protein percentage) using a geneHcentric genome wide association in three

differentdairybreedsItalianHolsteinItalianBrownandItalianSimmental

Methods

BiologicalSamples

Bulls from the three most represented dairy breeds in Italy (Italian Holstein

ItalianBrownandItalianSimmental)wereselectedfromArtificialInsemination

(AI) bulls progeny tested in Italy until 2010 and forwhich biological samples

52

wereavailableItalianHolsteinbullswerechosenaccordingtofollowingcriteria

i) having the national Selection Index namely production functionality type

(PFT)withreliabilitygt75andii)havingthelowestpossiblerelationshipwith

theothersiresinthesetThiswasachievedbymaximizingthevariabilitywithin

the group On the contrary considering the low number of available Italian

BrownandItalianSimmentalbullstheywereallincludedintheanalysesAfter

thisprocedureatotalof2139ItalianHolstein775ItalianBrownand486Italian

Simmental samples (including quality control replicates)were genotypedwith

theBovineSNP50BeadChipv1(IlluminaSanDiegoCA)

Traitsconsideredintheanalysis

Italian Holstein (ANAFI) Italian Brown (ANARB) and Italian Simmental

(ANAPRI)breedersassociationsprovidedderegressedproofs(DRP)ordaughter

yielddeviations (DYD)hereafter referredasphenotype forall traitsanalyzed

Traitsanalyzedweremilkyieldfatpercentageandproteinpercentage

Genotypingandqualitycontrol

Quality controls (QC) were run independently for each breed using the same

pipelineandthresholds IndetailQCexcluded i)thereplicatesamplewiththe

highernumberofmissinggenotypesii)sampleswithunexpectedlyhigh(ge100)

mendelianerrors(forfatherHsonpairs)iii)sampleswithmorethan5missing

genotypes iv) SNPs with ge 25 missing data v) SNPs with minor allele

frequency(MAF)le5vi)SNPswithmendelianerrorsge25andvii)SNPsout

ofHardyHWeinbergequilibriumtestedbyFisherexacttest(ple0001Bonferroni

corrected)

Analysisofpopulationstructure

MultiDimensionScaling(MDS)wasperformedusingPLINK107(Purcelletal

2007)andresultsplottedwithRscriptsThisanalysisshowninsupplementary

material (FigureS1) exploredpopulationsubstructureandverified thegenetic

homogeneityofthedatasetpriortofurtheranalyses

53

PairwiseLDwascalculatedusingPLINK107(Purcelletal2007)LDdecayand

LD structure on specific regions were calculated with in house R scripts and

snpplotterRpackage(LunaandNicodemus2007)

GWAS

GenomeHwide association analysis was made with the GenABEL package in R

(Team 2014) using GRAMMARHCG (Genome wide Association using Mixed

Model and Regression H Genomic Control) approach (Amin et al 2007

Aulchenkoetal2007)

Basingonpriorknowledge(Minozzietal2013) in ItalianHolsteintheDGAT1

regionwas also treated as fixed effect inGenABEL analysis of the three traits

ResultsusingthismodelarereferredtoasldquoHolsteinDGATrdquo

Manhattan andquantileHquantileplots for each traitwereproduced to allowa

thoroughevaluationoftheGWASresultsAllcalculatedpHvaluesfromtheGWAS

analysiswere retained for the downstream prioritizationwith the geneHbased

association

Workingongenomecoordinates

Coordinatesof themarkerswere liftedover fromBTA40(Elsiketal2009)to

UMD31assembly(Ziminetal2009)usingSnpChimpv1(Nicolazzietal2014)

andgenecoordinateswereretrievedusingBioMartEnsembl

QTLcoordinatesweredownloadedfromtheAnimalQTLdb(Huetal2013) in

bedformat

Operations on genomic intervalswereperformedusing thebedtools suite ver

2162(QuinlanandHall2010)

GeneBasedAssociationanalysis

Toreducetheantagonismbetweensignificancethresholdandfalsepositiverate

of a classical GWA study (Gianola et al 2013) and to facilitate selection

signature comparison across the three breeds investigated we used a geneH

54

centricstatisticthatcollapsesinasinglevalueallthepHvaluesfromSNPsrelated

toagenecorrectingforLDinformation

Our approachderivesdirectly from the softwareVEGAS (VersatileGeneHBased

TestforGenomeHwideAssociationStudies)(Liuetal2010)adaptedtopursue

meaningfulresultsinanyspeciesbeyondhumans

Briefly the geneHwise test statistic is the sum of the n upper tail chiHsquared

valuesofaSNPsubset withonedegreeof freedom(wheren is thenumberof

SNPmappinginthegivengene)WithperfectlinkageequilibriumthegeneHwise

pHvaluewouldbetheoneHsidepHvalueofachiHsquaredistributionwithndegrees

of freedom Otherwise more likely the distribution of the pHvalues has to be

evaluated(withsimulations)andweighted(withLDvalues)(Liuetal2010)

The VEGAS speciesHfree procedure was implemented in a UNIX environment

usingBashandRlanguagesusingEnsemblannotationasinputfilesPLINKdata

typeandpHvaluesforeachSNPfromthepreviouslycitedGRAMMARprocedure

WemappedSNPsonBostaurusgenes(UMD31)andtheirboundaries(100000

bpupHanddownHstream)toaccountforregulatoryregionsOnlygeneswithat

least 2 SNPs mapped were retained Significant entries from the geneHbased

associationweremappedinthecattleQTLdatabasewithbedtools

ResultsandDiscussion

In this study a gene based association strategy was applied to identify new

genomicregionsboostingGWASanalysisresultsformilkproductionandquality

fromthreeItaliandairybreeds

Complextraitslikethosehereevaluatedareaffectedbyafewmajorgeneswith

large effects andmany otherswithmoderate to low effects the latter are not

easily identifiedby genomewide scans inmodern cattle breedsduemainly to

samplesizelimitationwhilesignalfromtheformersarelostingenomescansH

withsomeexceptionsHduerapidfixations

InthisscenariowithmediumdensitySNPpanelsliketheoneusedinthisstudy

associationbasedonsingleSNPregressionorotherldquoclassicalrdquomethodscouldbe

inconclusiveintermsofresults

55

AftertheQCprocess7452058and477samplesand3575938535and38574SNPs were retained for Italian Brown Italian Holstein and Italian Simmentalbulls respectively (Table S1) HolsteinDGAT analysis retained 38534markersoneSNPlessthantheotherHolsteinpanelDescriptive statistics on genotypes for each breed (histogram and barplot forheterozygosis and call rate per individual Figure S7 and multi dimensionalscalingFigureS1)revealedabsenceofconfoundingeffectsWiththesedatasetsatthenominalgenomeHwidepHvaluesignificancethresholdof 005 (ple125 x 10H6 after Bonferroni correction (Burton et al 2007Dudbridge and Gusnanto 2008)) significant associations were found only inItalianHolsteinbetweenmilkyieldandfatpercentageand26SNPsmappingintheproximalregionofBTA14nearbyDGAT1(Table1)

56

Table1PHvaluesoftheSNPsovercomingthegenomewidesignificancethresholdintheHolsteinbreedformilkyieldandfatpercentage(notaccountingforDGAT1effect)SNPsaresortedbypositionsintheUMD31assembly

SNPID BTA Position(Bp) POvalue(Fat) POValue(Milk)

Hapmap30383OBTCO005848 14 1489496 112EH12 361EH12

ARSOBFGLONGSO57820 14 1651311 247EH46 161EH16

ARSOBFGLONGSO34135 14 1675278 857EH17

ARSOBFGLONGSO94706 14 1696470 665EH16

ARSOBFGLONGSO4939 14 1801116 534EH49 106EH20

Hapmap52798Oss46526455 14 1923292 915EH21

UAOIFASAO6878 14 2002873 382EH24

ARSOBFGLONGSO107379 14 2054457 118EH33 664EH15

ARSOBFGLONGSO18365 14 2117455 250EH16

Hapmap30922OBTCO002021 14 2138926 439EH13

Hapmap25384OBTCO001997 14 2217163 390EH10 697EH09

Hapmap24715OBTCO001973 14 2239085 108EH09 369EH09

BTAO35941OnoOrs 14 2276443 402EH13

Hapmap30374OBTCO002159 14 2468020 528EH12

Hapmap30086OBTCO002066 14 2524432 798EH11

ARSOBFGLONGSO74378 14 3640788 161EH08

ARSOBFGLOBACO1511 14 3765019 375EH09

UAOIFASAO9288 14 3956956 811EH09

ARSOBFGLONGSO100480 14 4364952 133EH09

UAOIFASAO5306 14 4468478 458EH08

ThisislikelyduetotheeffectoftheK232ADGAT1mutationthatissegregatingin

thisbreed(Fontanesietal2014)NootherSNPwassignificantlyassociatedto

anyothertraitinthe3breedsinvestigated

SomeSNPswereunderasuggestivesignificancethresholdofple5x10H5Details

on these markers are in supplementary material (Table S3) but they are not

discussedherePlotsdescribingatglancetheassociationresultsrevealingalow

signaltonoiseratioareavailableintheonlinesupplementarymaterial(Figure

S2S3S4andS5respectively for ItalianBrown ItalianHolsteinHolsteinDGAT

andItalianSimmental)

57

Asperthegenecentricanalysiswemappedatotalof192371989919894and

19844 genes on 24616 available on Ensembl (release version 73) in Italian

BrownItalianHolsteinHolsteinDGATandItalianSimmentalrespectivelyGenes

with a geneHwisepHvalue at least 10H5were considered significantly associated

withthestudiedtraitThesignificanceappearedtobeadequatetothehighrate

ofredundancyof theannotationfeaturesandto figureoutthetopassociations

with a reasonable level of confidence Results on these associations per breed

andpertraitaresummarizedinTable2Table3andinsupplementarymaterials

(TableS2andTableS4)ThemajorityofsignificantgenesfoundwiththegeneH

centricanalysismapped inpreviouslycharacterizedQTL thatareassociated to

milkproductionorqualityThesearesummarizedintheTableS5

Table2Numberofsignificantlyassociatedgenesforeachbreedineachtraitinthegenebasedassociationtest

Fat Milkyield Protein

Brown 3 1

Holstein 92 80 1

HolsteinDGAT 1 4 1

Simmental 1 13

58

Tableamp3FeaturessignificantlyassociatedperbreedtraitHolsteinanalysisnotaccountingforDGAT1isshow

ninTableS3

Breedamp

Traitamp

EnsemblampIDamp

GeneWiseampPValueamp

SNPsInGeneamp

BestSNPamp

BestSNP_PValueamp

BTAamp

Startamp(bp)amp

Endamp(bp)amp

GeneampNam

eamp

BROWNamp

FAT

Nosignificantassociationfound

MILK

ENSBTAG00000019237

500EG05

4Hapmap53770Gss46526325

204EG04

23

47333403

47348212

TXNDC5

ENSBTAG00000043918

500EG05

4Hapmap53770Gss46526325

204EG04

23

47337650

47337787

5S_rRNA

ENSBTAG00000046839

500EG05

4Hapmap53770Gss46526325

204EG04

23

47328131

47352621

UncharacterizedProtein

PROTEIN

ENSBTAG00000001260

700EG05

3ARSGBFGLGNGSG116222

441EG03

18

52120387

52124418

PINLYP

amp

HOLSTEINDGATamp

FAT

ENSBTAG00000000369

100EG05

3Hapmap53294Grs29016908

333EG05

594673755

94749093

EPS8

MILK

ENSBTAG00000026896

130EG05

5ARSGBFGLGNGSG104949

385EG04

23

51549721

51553563

FOXF2

ENSBTAG00000005260

400EG05

5Hapmap33288GBTCG033751

120EG03

638120578

38127577

SPP1

ENSBTAG00000006579

500EG05

4ARSGBFGLGBACG2207

632EG05

15

54418559

54459500

P4HA3

ENSBTAG00000007013

700EG05

6UAGIFASAG7665

396EG05

19

5221507

5283308

TOM1L1

PROTEIN

ENSBTAG00000006638

600EG05

3ARSGBFGLGNGSG104774

176EG04

18

56523439

56530499

BCL2L12

amp

SIMMENTHALamp

FAT

ENSBTAG00000037649

900EG05

6Hapmap30001GBTAG142716

529EG05

4120427915

120494453

VIPR2

MILK

ENSBTAG00000017538

100EG05

6ARSGBFGLGBACG13093

587EG06

12

85630428

85630912

Pseudogene

ENSBTAG00000000856

200EG05

2ARSGBFGLGNGSG4939

413EG06

14

1766767

1769754

FBXL6

ENSBTAG00000000857

200EG05

2ARSGBFGLGNGSG4939

413EG06

14

1763994

1766621

GPR172B

ENSBTAG00000026356

200EG05

2ARSGBFGLGNGSG4939

413EG06

14

1795351

1804562

DGAT1

ENSBTAG00000035158

200EG05

2ARSGBFGLGNGSG4939

413EG06

14

1770660

1772329

TMEM

249

ENSBTAG00000046208

200EG05

2ARSGBFGLGNGSG4939

413EG06

14

1782901

1788087

SCRT1

ENSBTAG00000009811

400EG05

2ARSGBFGLGNGSG4939

413EG06

14

1825982

1844587

UncharacterizedProtein

ENSBTAG00000009816

400EG05

2ARSGBFGLGNGSG4939

413EG06

14

1838135

1840081

UncharacterizedProtein

ENSBTAG00000014458

400EG05

2ARSGBFGLGNGSG4939

413EG06

14

1844664

1894424

MROH1

ENSBTAG00000020751

400EG05

2ARSGBFGLGNGSG4939

413EG06

14

1806081

1825793

HSF1

ENSBTAG00000044406

400EG05

2ARSGBFGLGNGSG4939

413EG06

14

1883849

1883971

SCARNA15

ENSBTAG00000046889

400EG05

5ARSGBFGLGBACG13093

587EG06

12

85610279

85610762

UncharacterizedProtein

ENSBTAG00000020117

800EG05

3BTAG78494GnoGrs

188EG04

721189879

21196263

ZBTB7A

PROTEIN

Nosignificantassociationfound

amp

59

1 ItalianHolstein

The$ large$ number$ of$ significant$ signals$ was$ observed$ in$ the$ Italian$ Holstein$

especially$ for$ milk$ yield$ and$ fat$ percentage$ As$ shown$ in$ Table$ 2$ the$ strong$

effect$ of$ DGAT1$ in$ Italian$ Holsteins$ observed$ with$ individual$ SNPs$ was$

confirmed$by$the$geneBcentric$approach$In$this$region$almost$the$totality$of$the$

geneBwise$pBvalues$under$the$chosen$threshold$appeared$to$be$influenced$by$this$

major$ gene$ and$ by$ the$ high$ level$ of$ linkage$ disequilibrium$ that$ characterizes$

cosmopolitan$bovine$breeds$(see$Figure$S6)$Within$this$pool$of$genes$it$is$worth$

to$mention$PLEC$ (plectin)$ a$ cell$ surface$ receptor$ gene$ in$ the$RNASE5$ pathway$

(Angiogenin$ encoded$ by$ angiogenin$ ribonuclease$ RNase$ A$ family$ 5)$ Proteins$

encoded$ by$ these$ genes$ that$ are$ present$ in$ milk$ are$ involved$ in$ protein$

synthesis$ and$ act$ as$ growth$ factors$ in$ vitro$Moreover$RNASE5$ seems$ to$ have$

other$ functions$ such$ as$ regulation$ of$ stress$ response$ and$ apoptosis$ and$ even$

angiogenesis$ through$ which$ it$ may$ also$ contribute$ to$ the$ regulate$ milk$

production$(Raven$et$al$2013)$

The$ exclusion$ of$DGAT1$ effect$ (HolsteinDGAT$ results)$ highlighted$ that$ all$ the$

significant$ effects$ on$ fat$ percentage$ found$ for$ genes$ located$ within$

approximately$25$Mb$up$or$downstream$from$the$excluded$SNP$(ARS5BFGL5NGS5

4939)$were$ redundant$ signals$ of$DGAT1$ (Table3$ and$Table$ S2$ and$ S3)$ This$ is$

also$observed$for$milk$yield$due$to$the$genetic$correlations$between$fat$and$milk$$

Also$ PLEC$ was$ not$ significant$ in$ our$ analyses$ when$ taking$ into$ account$ the$

DGAT1$effect$contradicting$previous$findings$in$Holstein$that$report$association$

of$PLEC$gene$with$production$traits$either$accounting$or$not$for$the$effect$of$the$

diglyceride$ acyltransferase$ (Raven$ et$ al$ 2013)$ A$ possible$ explanation$ for$ the$

different$results$between$these$studies$is$a$difference$between$genetic$structures$

of$the$populations$analyzed$To$reduce$false$positive$signals$we$only$evaluated$

the$genes$based$on$HolsteinDGAT$results$for$the$Italian$Holstein$$

11 Milkyield

We$found$genes$associated$with$this$trait$in$Holstein$in$particular$SPP1$P4HA3=

FOXF2$and$TOM1L1$genes$

Among$ these$ the$ osteopontin$ SPP1$ (secreted$ phosphoprotein$ 1)$ is$ directly$

related$to$milk$production$(de$Mello$et$al$2012)$ It$ is$ located$ in$BTA6$nearby$

60

$

ABCG2$ and$ there$ is$ evidence$ that$ polymorphisms$ in$ this$ region$ are$ strongly$

correlated$with$high$production$ levels$of$milk$ in$various$breeds$(CohenBZinder$

et$ al$ 2005)$Osteopontin$SPP1$ is$ a$ secreted$glycophosphoprotein$ that$plays$ a$

crucial$ role$ in$ mammary$ gland$ development$ having$ an$ essential$ role$ also$ in$

mammary$gland$differentiation$and$branching$of$the$mammary$epithelial$ductal$

system$ This$ gene$has$ also$ been$ implicated$ in$many$ types$ of$ cancer$ including$

breast$cancer$ for$ the$potentiality$of$proliferation$and$alveologenesis$ (Hubbard$

et$ al$ 2013)$ It$ is$worth$mentioning$ that$ the$most$ significant$ SNPs$within$ this$

gene$was$Hapmap332885BTC5033751=with=a$pBvalue$of$00012$ largely$over$ the$

genomeBwide$significance$ threshold$This$signal$would$be$ ignored$ in$ the$single$

SNP$approach$

Another$gene$revealed$by$the$geneBbased$association$is$P4HA3$encoding$for$the$

prolyl$4Bhydroxylase$α$polypeptide$III$involved$in$the$correct$folding$of$nascent$

procollagen$ chains$ It$ is$ known$ that$ stromalBepithelial$ interactions$ are$ critical$

for$mammary$gland$development$and$defective$deposition$or$reorganization$of$

extracellular$ matrix$ can$ lead$ to$ incorrect$ mammary$ epithelial$ morphogenesis$

(Gillette$ et$ al$ 2013)$ highlighting$ the$ role$ of$ this$ gene$ in$ the$ healthy$ breast$

tissue=

FOXF2=gene$ (Forkhead$Box$F2)$ is$ a$member$of$ the$ forkhead$box$ transcription$

factors$ known$ to$ be$ involved$ in$ tissue$ development$ as$ part$ of$ the$ ldquoHedgehog$

signaling$pathwayrdquo$This$key$cascade$is$essential$for$the$correct$segmentation$of$

tissues$during$embryonic$development$but$has$been$recently$demonstrated$that$

is$active$in$adult$stem$cells$proliferation$in$mammary$tissue$(Liu$et$al$2006)$In$

particular$ FOXF2$ plays$ a$ key$ role$ in$ extracellular$ matrix$ synthesis$ and$

epithelialBmesenchymal$ interactions$ and$ could$ therefore$ affect$ the$ proper$

development$of$the$mammary$stroma$In$addition$low$levels$of$transcription$of$

this$gene$are$correlated$ to$early$onset$metastasis$ in$breast$cancer$ (Kong$et$al$

2013)$ underlining$ a$ specific$ role$ in$ the$ mammary$ gland$

Lastly$ TOM1L1$ (Target$ Of$ Myb1$ ChickenBLike$ 1)$ is$ another$ gene$ potentially$

involved$ with$ the$ milk$ physiology$ as$ is$ known$ its$ implication$ with$ the$ early$

endosome$ sorting$ coupled$ with$ EGFR= (epidermal$ growth$ factor$ receptor)$

another$ important$player$ in$ lactation$(Fowler$et$al$1995)$TOM1L1$belongs$to$

61

$

the$ VHS$ domain$ containing$ proteins$ that$ are$ involved$ in$ the$ postBGolgi$

trafficking$and$signaling$(Wang$et$al$2010)$

12 Fatpercentage

In$ Italian$ Holstein$ breed$ we$ found$ a$ significant$ association$ with$ or$ without$

DGAT1$ effect$ correction$ to$ the$EPS8$ gene$ located$on$BTA5$at$around$95$Mbp$

This$gene$encodes$for$a$substrate$of$the$already$cited$EGFR$and$is$involved$in$the$

fatty$acid$metabolism$(Fazioli$et$al$1993)$A$recent$study$on$Holstein$Friesian$

(Wang$et$al$2012)$found$strong$association$with$a$variant$located$in$the$second$

intron$of$the$epidermal$growth$factor$receptor$pathway$substrate$enforcing$the$

hypothesis$of$a$true$involvement$of$this$gene$on$fat$percentage$

13 Proteinpercentage

The$BCL2L12$gene$(BclB2Blike$protein$12)$associated$with$protein$percentage$in$

Italian$Holstein$breed$represents$another$interesting$result$The$product$of$this$

gene= is$ an$ important$ oncoprotein$ in$ two$ fundamental$ cell$ cycle$ pathways$

namely$ p53$ and$ cytoplasmic$ caspase$ signaling$ BCL2L12= resides$ in$ both$

cytoplasm$and$nucleus$inhibiting$caspases$3$and$7$and$forming$a$complex$with$

p53$thereby$ inhibiting$p53Bdirected$transcriptomic$changes$upon$DNA$damage$

(Stegh$ and$DePinho$ 2011)$ The$members$ of$ BCL2$ family$ are$ considered$ antiB

apoptotic$genes$and$it$is$obvious$that$dysfunctional$programmed$cell$death$plays$

an$ important$ role$ in$ the$ pathogenesis$ and$ in$ the$ progression$ of$ cancer$Many$

members$of$this$family$were$found$to$be$modulated$in$various$malignancies$and$

BCL2L12= in$ particular$ was$ expressed$ in$ mammary$ gland$ and$ proposed$ as$

prognostic$ cancer$ biomarker$ (Thomadaki$ et$ al$ 2007)$ In$ dairy$ cows$ the$

members$of$BCL2$family$are$involved$in$the$key$events$ in$the$mammary$tissue$

development$ and$ remodeling$ during$ lactation$ and$ involution$ (Stefanon$ et$ al$

2002$p$200)$Members$of$BCL2$family$regulate$the$turnover$of$cell$population$

in$the$mammary$gland$during$lactation$and$its$overexpression$was$found$to$be$

linearly$related$to$milk$production$in$Holstein$cow$(Colitti$et$al$2004)$

$

62

$

2 ItalianSimmental

In$ the$ Italian$ Simmental$ breed$ we$ found$ one$ association$ on$ BTA4$ for$ fat$

percentage$(VIPR2)$and$two$associations$ for$milk$yield$one$ in$BTA7$(ZBTB7A)$

and$ the$ other$ in$ BTA14$ related$ to$ the$ DGAT1$ No$ association$ was$ found$ for$

protein$percentage$

21 Fatpercentage

Among$the$genes$found$significantly$associated$with$production$traits$in$Italian$

Simmental$VIPR2=a$ gene$ encoding$ a$ receptor$ for$ a$ small$ vasoactive$ intestinal$

neuropeptide$ was$ the$ single$ signal$ in$ BTA4$ for$ fat$ percentage$ Vasoactive$

intestinal$ peptide$ (VIP)$ is$ synthesized$ in$ the$pituitary$ gland$ and$ among$other$

functions$ stimulates$ prolactin$ secretion$ and$ is$ involved$ in$ proliferation$

survival$and$differentiation$in$human$breast$cancer$cells$(Valdehita$et$al$2010)$$

An$interesting$novel$finding$is$the$effect$of$immunosuppression$(through$direct$

effect$of$IL56)$on$prolactin$cells$and$its$ability$to$regulate$prolactin$by$acting$on$

pituitary$ VIP$ through$ VIP$ receptors$ (Blanco$ et$ al$ 2013)$ These$ evidences$

suggest$ that$VIPR2$ could$ be$ an$ important$ candidate$ for$ further$ investigations$

through$post$GWAS$approaches$such$as$reBsequencing$or$fine$mapping$

22 Milkyield

Surprisingly$the$DGAT1$gene$was$significantly$associated$with$milk$yield$but$not$

with$fat$percentage$in$Italian$Simmental$breed$Indeed$the$most$significant$SNP$

for$the$milk$yield$trait$in$this$breed$was$the$ARS5BFGL5NGS54939$with$a$pBvalue$

of$ 413EB06$ The$ importance$ of$ this$ gene$ in$ the$ lactation$ is$ largely$ known$

(Grisart$et$al$2002)$$

To$ investigate$DGAT1$ signal$ in$ Simmental$ we$ analyzed$ the$ fine$ LD$ structure$

genomeBwide$ and$ in$ BTA14$ proximal$ region$ in$ all$ the$ three$ breeds$ revealing$

different$patterns$

Observed$LD$decay$on$ the$ three$breeds$genomes$revealed$similar$behavior$ for$

Italian$Holstein$and$Italian$Brown$with$r2$gt$010$$until$ around$ 250$ Kbp$ while$

Italian$Simmental$showed$lower$LD$Distances$higher$that$400$Kbp$revealed$half$

level$of$LD$in$Simmental$(Figure$S6)$

63

$

In$DGAT1$region$LD$behavior$is$very$similar$between$Italian$Holstein$and$Italian$

Brown$ (Figure$ S8)$ while$ association$ values$ are$ really$ different$ because$ ARS5

BFGL5NGS54939=is$fixed$in$Italian$Brown$and$purged$by$QC$process$Interestingly$

association$ patterns$ become$ similar$ when$ DGAT1$ effect$ is$ removed$ from$ the$

Holstein$ panel$ (HolsteinDGAT$ Figure$ S8)$ In$ Simmental$ we$ found$ a$ different$

situation$ very$ low$ LD$ level$ and$ significant$ association$ levels$ with$ ARS5BFGL5

NGS54939$with$ the$milk$ trait$This$ is$probably$due$ to$ the$ low$ frequency$of$ the$

p232K$ allele$ in$ Italian$ Simmental$ (Scotti$ et$ al$ 2010)$ and$ to$ the$ different$

selection$strategies$applied$in$these$breeds$

The$ relatively$ high$ pBvalue$ of$ this$ SNP$ is$ responsible$ for$ the$ statistical$

significance$of$many$of$the$13$genes$found$significant$in$this$breed$for$this$trait$

as$ shown$ in$ Table$ 3$ therefore$ reducing$ the$ gene$ set$ to$ three$ locations$ two$

overlapping$features$indicating$a$pseudogene$in$BTA12$and$a$zinc$finger$protein$

ZBTB7A$in$BTA7$DGAT1$role$and$function$will$be$not$discussed$here$as$already$

done$elsewhere$(Grisart$et$al$2002)$

ZBTB7A=is$part$of$zinc$finger$and$BTB$proteins$(Broad$complex$Tramtrack$and$

Bricabrac)$ transcription$ factors$ with$ emerging$ importance$ for$ their$

involvement$ in$ oncogenesis$ and$ cell$ differentiation$by$ repressing$key$ genes$of$

cell$cycle$regulation$as$retinoblastoma$(Rb)$and=p53=(ldquoJeon$et$al$B$2008$B$ProtoB

oncogene$ FBIB1$ (PokemonZBTB7A)$ Represses$ Trpdfrdquo$ nd)= ZBTB7A$ over$

expression$ is$ shown$ in$primary$ and$ recurrent$ breast$ cancer$ tissues$ leading$ to$

disease$progression$and$poor$survival$(Zu$et$al$2011)$Link$between$this$gene$

polymorphism$and$milk$production$might$be$in$the$proper$cellular$development$

of$ the$ mammary$ gland$ This$ transcription$ factor$ is$ also$ known$ as$ a$ master$

regulator$ of$ B$ and$ TBcell$ lineage$ (Lee$ and$ Maeda$ 2012)$ enforcing$ the$ link$

between$ milk$ production$ and$ immune$ protection$ as$ already$ outlined$ by$ of$

Lemay$ and$ colleagues$ with$ proteomic$ studies$ where$ the$ ldquoactivation$ of$ the$

innate$ immune$systemrdquo$resulted$as$one$of$ the$most$represented$gene$ontology$

categories$(Lemay$et$al$2009)$

$

64

$

3 ItalianBrown

In$ the$ Italian$Brown$breed$we$ found$ one$ association$ on$BTA23$ (TXNDC5)$ for$

milk$yield$and$one$in$BTA18$for$protein$percentage$(PINLYP)$The$three$features$

listed$in$Table$3$for$milk$yield$in$Italian$Brown$were$collapsed$due$to$overlap$$

31 Milkyield

Thioredoxin$ domainBcontaining$ protein$ 5$ (TXNDC5)$ belongs$ to$ thioredoxin$

family$small$ubiquitous$proteins$containing$disulphide$domain$affecting$protein$

folding$with$chaperone$activity$preventing$protein$misBfolding$within$the$lumen$

of$ the$endoplasmic$reticulum$This$multifunctional$organelle$have$a$key$role$ in$

several$ processes$ including$ lipid$ catabolism$ and$ protein$ synthesis$ (RamiacuterezB

Torres$ et$ al$ 2012)$ Because$ of$ its$ disulphide$ bonds$ thioredoxin$ facilitate$ the$

reduction$of$other$proteins$by$cysteine$thiol$disulfide$exchange$This$mechanism$

is$essential$for$detossification$

During$ lactation$ indeed$ an$ enhanced$ detoxification$ level$ is$ functional$ to$

preserve$ milk$ quality$ against$ increased$ metabolic$ activities$ Higher$ metabolic$

levels$ lead$ to$ free$ radicals$ production$ making$ necessary$ the$ maintenance$ of$

redox$ homeostasis$ that$ include$ among$ others$ the$ reduction$ and$ oxidation$ of$

peroxiredoxin$and$thioredoxin$(Lu$and$Holmgren$2014)$

32 Proteinpercentage

For$ protein$ percentage$ in$ Italian$ Brown$ there$ is$ only$ one$ associated$ gene$

PINLYP=is$encoding$a$phospholipase$A2$inhibitor$but$this$protein$product$is$not$

yet$well$studied$ $Many$types$of$phospholipase$A2$were$characterized$and$ it$ is$

known$ that$ they$ are$ mediator$ of$ inflammation$ atherosclerosis$ and$ cancer$ in$

mammals$and$that$plays$an$important$role$in$the$innate$immune$response$(Qiu$

and$Lai$2013)$It$may$be$reasonable$to$imagine$a$role$for$PINLYP$in$the$lactation$

in$light$of$previously$discussed$results$that$link$milk$to$innate$immune$response$

(Lemay$et$al$2009)$

$

Although$ limited$ by$ the$ size$ of$ the$ study$ and$ the$ number$ of$ the$ included$

markers$these$results$represent$both$a$confirmation$and$a$step$forward$towards$

the$comprehension$of$the$biological$bases$of$milk$yield$and$quality$$

65

$

New$ genes$ potentially$ involved$ with$ production$ traits$ in$ dairy$ cows$ were$

tracked$when$GWAS$signals$resulted$too$weak$to$be$considered$in$a$classical$SNP$

based$mapping$with$the$ldquoclosest$gene$to$peakrdquo$or$the$windowBbased$strategy$

It$ is$ worth$ mentioning$ that$ none$ of$ the$ analyzed$ SNPs$ (not$ considering$ the$

Italian$Holstein$with$ARS5BFGL5NGS54939= included$ in$ the$ analysis)$were$ above$

the$ genomeBwide$ significance$ threshold$ in$ single$ SNP$ analysis$ meaning$ that$

important$information$is$often$lost$in$such$studies$$

In$addition$ to$ the$newly$ identified$ loci$we$observed$a$confirmation$of$existing$

associations$ with$ genes$ related$ to$ milk$ production$ not$ only$ dissecting$ gene$

specific$ functions$ but$ from$ mapping$ these$ genes$ in$ previously$ characterized$

regions$(Table$S5)$This$ is$not$trivial$when$using$new$data$analysis$techniques$

especially$ in$ high$ noise$ to$ signal$ ratio$ datasets$ like$ those$ here$ presented$We$

believe$ that$ this$ approach$will$ benefit$ of$data$ from$denser$marker$panels$ (eg$

BovineHD)$ to$ exploit$ local$ linkage$ disequilibrium$ and$ increase$ the$ number$ of$

mapped$genes$with$more$than$two$SNPs$

Results$ obtained$ with$ the$ gene$ centric$ approach$ although$ having$ intrinsic$

limitations$are$not$ influenced$by$a$number$of$subjective$choices$often$made$in$

GWAS$ For$ instance$when$ ldquowindow$mappingrdquo$ is$ used$ to$decrease$ the$noise$ to$

signal$ ratio$ eg$ due$ to$ nearby$ markers$ having$ different$ allele$ frequencies$

windows$size$ is$ to$be$decided$This$can$be$based$on$equal$number$of$markers$

equal$number$of$base$pairs$or$equal$map$distance$in$cM$per$window$The$three$

choices$ do$ not$ coincide$ because$ markers$ are$ not$ equally$ spread$ along$ the$

chromosomes$ and$ recombination$ differs$ in$ different$ genomic$ regions$ In$

addition$ windows$ can$ be$ chosen$ nonBoverlapping$ or$ with$ different$ degree$ of$

overlap$Finally$when$stepping$ from$significant$SNPs$or$windows$ to$candidate$

genes$the$size$of$the$flanking$region$from$which$picking$up$candidate$genes$is$to$

be$decided$as$well$Sometimes$the$gene$closer$to$the$signal$peak$is$selected$ in$

other$cases$all$genes$within$certain$boundaries$are$included$in$further$analyses$

All$ these$ choices$ may$ generate$ misleading$ results$ For$ example$ not$ truly$

associated$genes$may$be$retained$for$further$analyses$disregarding$the$correct$

ones$ In$addition$ a$ large$number$of$ false$positives$can$be$ included$ to$be$ later$

interpreted$with$ the$help$of$network$and$pathway$analysis$As$ a$ consequence$

these$decisions$have$a$direct$impact$in$terms$of$production$and$interpretation$of$

66

$

results$leading$to$conclusions$heavily$based$on$the$enrichment$analysis$and$preB

existing$knowledge$$

With$ the$ gene$ based$ association$ we$ also$ noticed$ some$ advantages$ in$ the$

downstream$data$mining$First$genes$are$functional$units$and$the$golden$targets$

of$ the$ ldquoclassical$ postBGWASrdquo$ bioinformatics$ pipelines$ Hence$ the$ gene$ based$

association$leads$to$genes$with$an$embedded$statistically$robust$framework$This$

is$ a$ ldquoplusrdquo$ in$ studying$ complex$ traits$ because$ one$ can$ concentrate$ on$ specific$

signals$ rather$ than$ cherry$ picking$meaningful$ features$ from$ a$ list$ constructed$

with$the$previously$cited$approaches$$

Moreover$it$is$easier$to$track$back$gene$clusters$to$major$gene$effects$B$like$for$

example$DGAT1$in$a$gene$rich$genome$chunk$many$genes$may$result$significant$

due$to$the$same$SNP$(Table$S2$and$Table$3)$With$the$VEGAS$method$finding$the$

ldquotruerdquo$association$is$facilitated$by$following$the$ldquobest$SNPrdquo$in$the$region$

In$addition$we$found$phenotype$coherent$signals$in$such$situation$where$only$a$

few$SNPs$B$ if$none$ndash$are$above$an$acceptable$statistical$genomeBwide$threshold$

using$a$single$SNP$regression$method$

The$gene$centric$approach$ is$a$ first$and$slight$ improvement$ in$ livestock$GWAS$

analyses$since$we$still$have$a$partial$genome$annotation$ In$addition$ it$ is$now$

known$that$ the$definition$of$gene$ is$ to$be$revised$many$genes$do$not$code$ for$

proteins$ have$ alternative$ starting$ sites$ have$ antisense$ transcripts$ host$

regulatory$ RNA$ are$ subjected$ to$ alternative$ splicing$ and$may$ be$ regulated$ by$

sequences$quite$ far$ away$ from$ their$ coding$ regions$ (Consortium$2012)$Other$

limitations$are$the$need$for$larger$number$of$animals$in$these$type$of$studies$as$

in$this$study$and$the$still$relatively$high$cost$of$sequencing$that$cannot$be$used$

routinely$in$large$experiments$at$least$not$yet$$

$

$

Conclusions

This$study$underlines$the$importance$of$the$implementation$of$new$methods$to$

analyze$complex$traits$with$genomic$data$No$golden$path(s)$is$discovered$yet$to$

walk$from$variation$to$functional$units$The$geneBcentric$analysis$may$contribute$

67

$

to$ reveal$ new$ signals$ throughout$ the$ bovine$ genome$ condensing$ weak$

association$signals$in$chromosome$bits$having$functional$significance$

$

$

Acknowledgements

The$Authors$would$ like$ to$ thank$Dr$ Jimmy$ Liu$ for$ his$ advices$ in$ building$ the$

software$AIA$ANAFI$ANARB$and$ANAPRI$breedersrsquo$associations$ for$ their$help$

and$ support$during$ the$data$analysis$ SelMol$ and$ProZoo$projects$ that$provide$

samples$ The$ Authors$ are$ also$ grateful$ to$MIPAAF$ and$ Cariplo$ Foundation$ for$

funding$

$

$

$

References

Akula$ N$ Baranova$ A$ Seto$ D$ Solka$ J$ Nalls$MA$ Singleton$ A$ Ferrucci$ L$

Tanaka$T$Bandinelli$ S$Cho$YS$Kim$YJ$Lee$ JBY$Han$BBG$Bipolar$

Disorder$ Genome$ Study$ (BiGS)$ Consortium$ The$ Wellcome$ Trust$ CaseB

Control$Consortium$McMahon$FJ$2011$A$NetworkBBased$Approach$to$

Prioritize$ Results$ from$ GenomeBWide$ Association$ Studies$ PLoS$ ONE$ 6$

e24220$doi101371journalpone0024220$

Amin$N$ van$Duijn$ CM$ Aulchenko$ YS$ 2007$ A$Genomic$Background$Based$

Method$ for$ Association$ Analysis$ in$ Related$ Individuals$ PLoS$ ONE$ 2$

e1274$doi101371journalpone0001274$

Aulchenko$YS$de$Koning$DBJ$Haley$C$2007$Genomewide$Rapid$Association$

Using$ Mixed$ Model$ and$ Regression$ A$ Fast$ and$ Simple$ Method$ For$

Genomewide$PedigreeBBased$Quantitative$Trait$Loci$Association$Analysis$

Genetics$177$577ndash585$doi101534genetics107075614$

Blanco$ EJ$ CarreteroBHernaacutendez$ M$ GarciacuteaBBarrado$ J$ IglesiasBOsma$ MC$

Carretero$M$Herrero$JJ$Rubio$M$Riesco$JM$Carretero$J$2013$The$

activity$and$proliferation$of$pituitary$prolactinBpositive$cells$and$pituitary$

68

$

VIPBpositive$ cells$ are$ regulated$by$ interleukin$6$Histol$Histopathol$ 28$

1595ndash1604$

Blott$S$Kim$JBJ$Moisio$S$SchmidtBKuntzel$A$Cornet$A$Berzi$P$Cambisano$

N$Ford$C$Grisart$B$Johnson$D$Karim$L$Simon$P$Snell$R$Spelman$

R$ Wong$ J$ Vilkki$ J$ Georges$ M$ Farnir$ F$ Coppieters$ W$ 2003$

Molecular$ dissection$ of$ a$ quantitative$ trait$ locus$ a$ phenylalanineBtoB

tyrosine$substitution$in$the$transmembrane$domain$of$the$bovine$growth$

hormone$ receptor$ is$ associated$ with$ a$ major$ effect$ on$ milk$ yield$ and$

composition$Genetics$163$253ndash266$

Burton$PR$Clayton$DG$Cardon$LR$Craddock$N$Deloukas$P$Duncanson$A$

Kwiatkowski$ DP$ McCarthy$ MI$ Ouwehand$ WH$ Samani$ NJ$ Todd$

JA$ Donnelly$ P$ Barrett$ JC$ Burton$ PR$ Davison$ D$ Donnelly$ P$

Easton$ D$ Evans$ D$ Leung$ HBT$ Marchini$ JL$ Morris$ AP$ Spencer$

CCA$Tobin$MD$Cardon$LR$Clayton$DG$Attwood$AP$Boorman$JP$

Cant$B$Everson$U$Hussey$JM$Jolley$JD$Knight$AS$Koch$K$Meech$

E$ Nutland$ S$ Prowse$ CV$ Stevens$ HE$ Taylor$ NC$ Walters$ GR$

Walker$NM$Watkins$NA$Winzer$T$Todd$JA$Ouwehand$WH$Jones$

RW$ McArdle$ WL$ Ring$ SM$ Strachan$ DP$ Pembrey$ M$ Breen$ G$

Clair$ DS$ Caesar$ S$ GordonBSmith$ K$ Jones$ L$ Fraser$ C$ Green$ EK$

Grozeva$D$Hamshere$ML$Holmans$PA$Jones$IR$Kirov$G$Moskvina$

V$ Nikolov$ I$ OrsquoDonovan$ MC$ Owen$ MJ$ Craddock$ N$ Collier$ DA$

Elkin$A$Farmer$A$Williamson$R$McGuffin$P$Young$AH$Ferrier$IN$

Ball$SG$Balmforth$AJ$Barrett$JH$Bishop$DT$Iles$MM$Maqbool$A$

Yuldasheva$N$Hall$AS$Braund$PS$Burton$PR$Dixon$RJ$Mangino$

M$ Stevens$ S$ Tobin$ MD$ Thompson$ JR$ Samani$ NJ$ Bredin$ F$

Tremelling$ M$ Parkes$ M$ Drummond$ H$ Lees$ CW$ Nimmo$ ER$

Satsangi$ J$Fisher$SA$Forbes$A$Lewis$CM$Onnie$CM$Prescott$NJ$

Sanderson$J$Mathew$CG$Barbour$J$Mohiuddin$MK$Todhunter$CE$

Mansfield$JC$Ahmad$T$Cummings$FR$Jewell$DP$Webster$J$Brown$

MJ$ Clayton$DG$ Lathrop$GM$ Connell$ J$Dominiczak$A$ Samani$NJ$

Marcano$CAB$Burke$B$Dobson$R$Gungadoo$J$Lee$KL$Munroe$PB$

Newhouse$SJ$Onipinla$A$Wallace$C$Xue$M$Caulfield$M$Farrall$M$

Barton$A$ (braggs)$TB$ in$RG$and$G$Bruce$ IN$Donovan$H$Eyre$S$

69

$

Gilbert$ PD$ Hider$ SL$ Hinks$ AM$ John$ SL$ Potter$ C$ Silman$ AJ$Symmons$ DPM$ Thomson$ W$ Worthington$ J$ Clayton$ DG$ Dunger$DB$ Nutland$ S$ Stevens$ HE$ Walker$ NM$ Widmer$ B$ Todd$ JA$Frayling$TM$Freathy$RM$Lango$H$Perry$JRB$Shields$BM$Weedon$MN$ Hattersley$ AT$ Hitman$ GA$ Walker$ M$ Elliott$ KS$ Groves$ CJ$Lindgren$ CM$ Rayner$ NW$ Timpson$ NJ$ Zeggini$ E$ McCarthy$ MI$Newport$M$Sirugo$G$Lyons$E$Vannberg$F$Hill$AVS$Bradbury$LA$Farrar$ C$ Pointon$ JJ$ Wordsworth$ P$ Brown$ MA$ Franklyn$ JA$Heward$JM$Simmonds$MJ$Gough$SCL$Seal$S$(uk)$BCSC$Stratton$MR$Rahman$N$Ban$M$Goris$A$Sawcer$SJ$Compston$A$Conway$D$Jallow$ M$ Newport$ M$ Sirugo$ G$ Rockett$ KA$ Kwiatkowski$ DP$Bumpstead$ SJ$ Chaney$A$Downes$K$ Ghori$MJR$ Gwilliam$R$Hunt$SE$Inouye$M$Keniry$A$King$E$McGinnis$R$Potter$S$Ravindrarajah$R$ Whittaker$ P$ Widden$ C$ Withers$ D$ Deloukas$ P$ Leung$ HBT$Nutland$S$Stevens$HE$Walker$NM$Todd$JA$Easton$D$Clayton$DG$Burton$PR$Tobin$MD$Barrett$JC$Evans$D$Morris$AP$Cardon$LR$Cardin$NJ$Davison$D$Ferreira$T$PereiraBGale$ J$Hallgrimsdoacutettir$ IB$Howie$ BN$Marchini$ JL$ Spencer$ CCA$ Su$ Z$ Teo$ YY$ Vukcevic$ D$Donnelly$P$Bentley$D$Brown$MA$Cardon$LR$Caulfield$M$Clayton$DG$ Compston$ A$ Craddock$ N$ Deloukas$ P$ Donnelly$ P$ Farrall$ M$Gough$ SCL$ Hall$ AS$ Hattersley$ AT$ Hill$ AVS$ Kwiatkowski$ DP$Mathew$ CG$McCarthy$MI$ Ouwehand$WH$ Parkes$M$ Pembrey$M$Rahman$N$Samani$NJ$Stratton$MR$Todd$ JA$Worthington$ J$2007$GenomeBwide$ association$ study$ of$ 14000$ cases$ of$ seven$ common$diseases$ and$ 3000$ shared$ controls$ Nature$ 447$ 661ndash678$doi101038nature05911$

Cantor$ RM$ Lange$ K$ Sinsheimer$ JS$ 2010$ Prioritizing$ GWAS$ Results$ A$Review$ of$ Statistical$ Methods$ and$ Recommendations$ for$ Their$Application$Am$J$Hum$Genet$86$6ndash22$doi101016jajhg200911017$

Clark$ AG$ Hubisz$ MJ$ Bustamante$ CD$ Williamson$ SH$ Nielsen$ R$ 2005$Ascertainment$ bias$ in$ studies$ of$ human$ genomeBwide$ polymorphism$Genome$Res$15$1496ndash1502$doi101101gr4107905$

70

$

CohenBZinder$M$ Seroussi$ E$ Larkin$ DM$ Loor$ JJ$Wind$ AE$ der$ Lee$ JBH$Drackley$ JK$Band$MR$Hernandez$AG$Shani$M$Lewin$HA$Weller$JI$ Ron$ M$ 2005$ Identification$ of$ a$ missense$ mutation$ in$ the$ bovine$ABCG2$gene$with$a$major$effect$on$ the$QTL$on$ chromosome$6$affecting$milk$yield$and$composition$in$Holstein$cattle$Genome$Res$15$936ndash944$doi101101gr3806705$

Cole$JB$Wiggans$GR$Ma$L$Sonstegard$TS$Lawlor$TJ$Crooker$BA$Tassell$CPV$ Yang$ J$ Wang$ S$ Matukumalli$ LK$ Da$ Y$ 2011$ GenomeBwide$association$ analysis$ of$ thirty$ one$ production$ health$ reproduction$ and$body$ conformation$ traits$ in$ contemporary$ US$ Holstein$ cows$ BMC$Genomics$12$408$doi1011861471B2164B12B408$

Colitti$M$Wilde$CJ$Stefanon$B$2004$Functional$expression$of$bclB2$protein$family$and$AIF$in$bovine$mammary$tissue$in$early$lactation$J$Dairy$Res$71$20ndash27$

Consortium$ TEP$ 2012$ An$ integrated$ encyclopedia$ of$ DNA$ elements$ in$ the$human$genome$Nature$489$57ndash74$doi101038nature11247$

De$Mello$F$Cobuci$JA$Martins$MF$Silva$MVGB$Neto$JB$2012$Association$of$the$polymorphism$g8514CgtT$in$the$osteopontin$gene$(SPP1)$with$milk$yield$ in$ the$ dairy$ cattle$ breed$ Girolando$ Anim$ Genet$ 43$ 647ndash648$doi101111j1365B2052201102312x$

Dudbridge$ F$ Gusnanto$ A$ 2008$ Estimation$ of$ significance$ thresholds$ for$genomewide$ association$ scans$ Genet$ Epidemiol$ 32$ 227ndash234$doi101002gepi20297$

Elsik$ CG$ Tellam$ RL$ Worley$ KC$ 2009$ The$ Genome$ Sequence$ of$ Taurine$Cattle$ A$window$ to$ ruminant$ biology$ and$ evolution$ Science$ 324$ 522ndash528$doi101126science1169588$

Fazioli$ F$ Minichiello$ L$ Matoska$ V$ Castagnino$ P$ Miki$ T$ Wong$ WT$ Di$Fiore$ PP$ 1993$ Eps8$ a$ substrate$ for$ the$ epidermal$ growth$ factor$receptor$kinase$enhances$EGFBdependent$mitogenic$signals$EMBO$J$12$3799ndash3808$

Fontanesi$ L$ Calograve$ DG$ Galimberti$ G$ Negrini$ R$ Marino$ R$ Nardone$ A$AjmoneBMarsan$ P$ Russo$ V$ 2014$ A$ candidate$ gene$ association$ study$

71

$

for$ nine$ economically$ important$ traits$ in$ Italian$ Holstein$ cattle$ Anim$

Genet$nandashna$doi101111age12164$

Fowler$KJ$Walker$F$Alexander$W$Hibbs$ML$Nice$EC$Bohmer$RM$Mann$

GB$ Thumwood$ C$ Maglitto$ R$ Danks$ JA$ 1995$ A$ mutation$ in$ the$

epidermal$growth$factor$receptor$in$wavedB2$mice$has$a$profound$effect$

on$ receptor$ biochemistry$ that$ results$ in$ impaired$ lactation$ Proc$ Natl$

Acad$Sci$92$1465ndash1469$doi101073pnas9251465$

Gianola$ D$ Hospital$ F$ Verrier$ E$ 2013$ Contribution$ of$ an$ additive$ locus$ to$

genetic$variance$when$inheritance$is$multiBfactorial$with$implications$on$

interpretation$ of$ GWAS$ Theor$ Appl$ Genet$ 126$ 1457ndash1472$

doi101007s00122B013B2064B2$

Gillette$ M$ Bray$ K$ Blumenthaler$ A$ VargoBGogola$ T$ 2013$ P190B$ RhoGAP$

Overexpression$ in$ the$ Developing$Mammary$ Epithelium$ Induces$ TGFβB

dependent$ Fibroblast$ Activation$ PLoS$ ONE$ 8$ e65105$

doi101371journalpone0065105$

Grisart$B$Coppieters$W$Farnir$F$Karim$L$Ford$C$Berzi$P$Cambisano$N$

Mni$ M$ Reid$ S$ Simon$ P$ Spelman$ R$ Georges$ M$ Snell$ R$ 2002$

Positional$Candidate$Cloning$of$a$QTL$ in$Dairy$Cattle$ Identification$of$a$

Missense$Mutation$ in$the$Bovine$DGAT1$Gene$with$Major$Effect$on$Milk$

Yield$ and$ Composition$ Genome$ Res$ 12$ 222ndash231$

doi101101gr224202$

Hubbard$NE$Chen$QJ$Sickafoose$LK$Wood$MB$Gregg$ JP$Abrahamsson$

NM$ Engelberg$ JA$ Walls$ JE$ Borowsky$ AD$ 2013$ Transgenic$

Mammary$Epithelial$Osteopontin$(Spp1)$Expression$Induces$Proliferation$

and$ Alveologenesis$ Genes$ Cancer$ 4$ 201ndash212$

doi1011771947601913496813$

Hu$ ZBL$ Park$ CA$ Wu$ XBL$ Reecy$ JM$ 2013$ Animal$ QTLdb$ an$ improved$

database$tool$for$livestock$animal$QTLassociation$data$dissemination$in$

the$ postBgenome$ era$ Nucleic$ Acids$ Res$ 41$ D871ndashD879$

doi101093nargks1150$

Jeon$et$al$ B$2008$ B$ProtoBoncogene$FBIB1$ (PokemonZBTB7A)$Represses$Trpdf$

nd$

72

$

Jiang$ L$ Liu$ J$ Sun$ D$ Ma$ P$ Ding$ X$ Yu$ Y$ Zhang$ Q$ 2010$ Genome$Wide$

Association$ Studies$ for$ Milk$ Production$ Traits$ in$ Chinese$ Holstein$

Population$PLoS$ONE$5$e13661$doi101371journalpone0013661$

Kong$ PBZ$ Yang$ F$ Li$ L$ Li$ XBQ$ Feng$ YBM$ 2013$Decreased$FOXF2$mRNA$

Expression$ Indicates$ EarlyBOnset$ Metastasis$ and$ Poor$ Prognosis$ for$

Breast$ Cancer$ Patients$ with$ Histological$ Grade$ II$ Tumor$ PLoS$ ONE$ 8$

e61591$doi101371journalpone0061591$

Lee$SBU$Maeda$T$2012$POKZBTB$proteins$an$emerging$ family$of$proteins$

that$ regulate$ lymphoid$ development$ and$ function$ Immunol$ Rev$ 247$

107ndash119$

Lemay$ DG$ Lynn$ DJ$ Martin$ WF$ Neville$ MC$ Casey$ TM$ Rincon$ G$

Kriventseva$ EV$ Barris$ WC$ Hinrichs$ AS$ Molenaar$ AJ$ 2009$ The$

bovine$lactation$genome$ insights$ into$the$evolution$of$mammalian$milk$

Genome$Biol$10$R43$

Liu$JZ$Mcrae$AF$Nyholt$DR$Medland$SE$Wray$NR$Brown$KM$2010$A$

versatile$ geneBbased$ test$ for$ genomeBwide$ association$ studies$ Am$ J$

Hum$Genet$87$139$

Liu$S$Dontu$G$Mantle$ID$Patel$S$Ahn$N$Jackson$KW$Suri$P$Wicha$MS$

2006$Hedgehog$Signaling$and$BmiB1$Regulate$SelfBrenewal$of$Normal$and$

Malignant$ Human$ Mammary$ Stem$ Cells$ Cancer$ Res$ 66$ 6063ndash6071$

doi1011580008B5472CANB06B0054$

Lu$ J$ Holmgren$ A$ 2014$ The$ thioredoxin$ superfamily$ in$ oxidative$ protein$

folding$ Antioxid$ Redox$ Signal$ 140202091634000$

doi101089ars20145849$

Luna$ A$ Nicodemus$ KK$ 2007$ snpplotter$ an$ RBbased$ SNPhaplotype$

association$and$linkage$disequilibrium$plotting$package$Bioinforma$Oxf$

Engl$23$774ndash776$doi101093bioinformaticsbtl657$

McCarthy$ MI$ Abecasis$ GR$ Cardon$ LR$ Goldstein$ DB$ Little$ J$ Ioannidis$

JPA$ Hirschhorn$ JN$ 2008$ GenomeBwide$ association$ studies$ for$

complex$traits$consensus$uncertainty$and$challenges$Nat$Rev$Genet$9$

356ndash369$doi101038nrg2344$

Meredith$ BK$ Kearney$ FJ$ Finlay$ EK$ Bradley$ DG$ Fahey$ AG$ Berry$ DP$

Lynn$ DJ$ 2012$ GenomeBwide$ associations$ for$ milk$ production$ and$

73

$

somatic$cell$score$in$HolsteinBFriesian$cattle$in$Ireland$BMC$Genet$13$21$doi1011861471B2156B13B21$

Minozzi$ G$Nicolazzi$ EL$ Stella$ A$ Biffani$ S$Negrini$ R$ Lazzari$ B$ AjmoneBMarsan$ P$Williams$ JL$ 2013$ Genome$Wide$ Analysis$ of$ Fertility$ and$Production$ Traits$ in$ Italian$ Holstein$ Cattle$ PLoS$ ONE$ 8$ e80219$doi101371journalpone0080219$

Nicolazzi$EL$Picciolini$M$Strozzi$F$Schnabel$RD$Lawley$C$Pirani$A$Brew$F$ Stella$ A$ 2014$ SNPchiMp$ a$ database$ to$ disentangle$ the$ SNPchip$jungle$ in$ bovine$ livestock$ BMC$ Genomics$ 15$ 123$ doi1011861471B2164B15B123$

Purcell$ S$ Neale$ B$ ToddBBrown$ K$ Thomas$ L$ Ferreira$ MAR$ Bender$ D$Maller$J$Sklar$P$de$Bakker$PIW$Daly$MJ$Sham$PC$2007$PLINK$a$tool$ set$ for$ wholeBgenome$ association$ and$ populationBbased$ linkage$analyses$Am$J$Hum$Genet$81$559ndash575$doi101086519795$

Qiu$ S$ Lai$ L$ 2013$ Antibacterial$ properties$ of$ recombinant$ human$ nonBpancreatic$secretory$phospholipase$A2$Biochem$Biophys$Res$Commun$441$453ndash456$doi101016jbbrc201310092$

Quinlan$AR$Hall$IM$2010$BEDTools$a$flexible$suite$of$utilities$for$comparing$genomic$ features$ Bioinformatics$ 26$ 841ndash842$doi101093bioinformaticsbtq033$

RamiacuterezBTorres$ A$ BarceloacuteBBatllori$ S$ MartiacutenezBBeamonte$ R$ Navarro$ MA$Surra$ JC$ Arnal$ C$ Guilleacuten$N$ Aciacuten$ S$ Osada$ J$ 2012$ Proteomics$ and$gene$ expression$ analyses$ of$ squaleneBsupplemented$ mice$ identify$microsomal$thioredoxin$domainBcontaining$protein$5$changes$associated$with$ hepatic$ steatosis$ J$ Proteomics$ 77$ 27ndash39$doi101016jjprot201207001$

Raven$ LBA$ Cocks$ BG$ Pryce$ JE$ Cottrell$ JJ$ Hayes$ BJ$ 2013$ Genes$ of$ the$RNASE5$pathway$ contain$ SNP$ associated$with$milk$ production$ traits$ in$dairy$cattle$Genet$Sel$Evol$45$25$doi1011861297B9686B45B25$

Scotti$E$Fontanesi$L$Schiavini$F$La$Mattina$V$Bagnato$A$Russo$V$2010$DGAT1$ pK232A$ polymorphism$ in$ dairy$ and$ dual$ purpose$ Italian$ cattle$breeds$Ital$J$Anim$Sci$9$doi104081ijas2010e16$

74

$

Stefanon$ B$ Colitti$ M$ Gabai$ G$ Knight$ CH$ Wilde$ CJ$ 2002$ Mammary$

apoptosis$and$lactation$persistency$in$dairy$animals$J$Dairy$Res$69$37ndash

52$

Stegh$ AH$ DePinho$ RA$ 2011$ Beyond$ effector$ caspase$ inhibition$ Bcl2L12$

neutralizes$ p53$ signaling$ in$ glioblastoma$ Cell$ Cycle$ 10$ 33ndash38$

doi104161cc10114365$

Taberlet$ P$ Valentini$ A$ Rezaei$ HR$ Naderi$ S$ Pompanon$ F$ Negrini$ R$

AjmoneBMarsan$ P$ 2008$ Are$ cattle$ sheep$ and$ goats$ endangered$

species$Mol$Ecol$17$275ndash284$doi101111j1365B294X200703475x$

Team$RC$ 2014$ R$ A$ Language$ and$Environment$ for$ Statistical$ Computing$ R$

Foundation$for$Statistical$Computing$Vienna$Austria$

Thomadaki$H$ Talieri$M$ Scorilas$ A$ 2007$ Prognostic$ value$ of$ the$ apoptosis$

related$genes$BCL2$and$BCL2L12$in$breast$cancer$Cancer$Lett$247$48ndash

55$doi101016jcanlet200603016$

Valdehita$ A$ Bajo$ AM$ FernaacutendezBMartiacutenez$ AB$ Arenas$ MI$ Vacas$ E$

Valenzuela$ P$ RuiacutezBVillaespesa$ A$ Prieto$ JC$ Carmena$ MJ$ 2010$

Nuclear$ localization$ of$ vasoactive$ intestinal$ peptide$ (VIP)$ receptors$ in$

human$ breast$ cancer$ Peptides$ 31$ 2035ndash2045$

doi101016jpeptides201007024$

Wang$T$Liu$NS$Seet$LBF$Hong$W$2010$The$Emerging$Role$of$VHS$DomainB

Containing$Tom1$Tom1L1$and$Tom1L2$in$Membrane$Trafficking$Traffic$

11$1119ndash1128$doi101111j1600B0854201001098x$

Wang$ X$Wurmser$ C$ Pausch$ H$ Jung$ S$ Reinhardt$ F$ Tetens$ J$ Thaller$ G$

Fries$R$2012$Identification$and$Dissection$of$Four$Major$QTL$Affecting$

Milk$Fat$Content$in$the$German$HolsteinBFriesian$Population$PLoS$ONE$7$

e40711$doi101371journalpone0040711$

Xu$S$2003$Theoretical$Basis$of$the$Beavis$Effect$Genetics$165$2259ndash2268$

Zimin$AV$Delcher$AL$Florea$L$Kelley$DR$Schatz$MC$Puiu$D$Hanrahan$

F$ Pertea$ G$ Tassell$ CPV$ Sonstegard$ TS$ Marccedilais$ G$ Roberts$ M$

Subramanian$ P$ Yorke$ JA$ Salzberg$ SL$ 2009$ A$ wholeBgenome$

assembly$ of$ the$ domestic$ cow$ Bos$ taurus$ Genome$ Biol$ 10$ R42$

doi101186gbB2009B10B4Br42$

75

$

Zu$X$Ma$J$Liu$H$Liu$F$Tan$C$Yu$L$Wang$J$Xie$Z$Cao$D$Jiang$Y$2011$ProBoncogene$ Pokemon$ promotes$ breast$ cancer$ progression$ by$upregulating$ survivin$ expression$ Breast$ Cancer$ Res$ 13$ R26$doi101186bcr2843$ $

76

$

Supplementaryfiles$

You$can$find$Chapter$͵$supplementary$files$at$

- DocTa$(Doctoral$Thesis$Archive)$(httptesionlineunicattit)$

- Google$Drive$(httptinyurlcomPhDMMChapter03)$or$scan$the$QR$code$

$

$

SupplementaryFigureS1Multidimensionalscaling

Spatial$distribution$of$genetic$structure$in$the$analyzed$breeds$$

$

Supplementary$FigureS2GWASplot

Manhattan$plots$and$QQBplots$respectively$for$Fat$percentage$Milk$yield$and$

Protein$percentage$in$the$Italian$Brown$

$

Supplementary$FigureS3GWASplot

Manhattan$plots$and$QQBplots$respectively$for$Fat$percentage$Milk$yield$and$

Protein$percentage$in$the$Italian$Holstein$

$

Supplementary$FigureS4GWASplot

Manhattan$plots$and$QQBplots$respectively$for$Fat$percentage$Milk$yield$and$

Protein$percentage$in$the$HolsteinDGAT$

$

Supplementary$FigureS5GWASplot

Manhattan$plots$and$QQBplots$respectively$for$Fat$percentage$Milk$yield$and$

Protein$percentage$in$the$Italian$Simmental$

77

$

$Supplementary$FigureS6LDdecayGenomeBwide$LD$decay$for$the$three$breeds$(Holstein$Simmental$Brown)$$Supplementary$FigureS7DescriptiveplotsDescriptive$statistics$for$the$Italian$Brown$Italian$Holstein$and$Italian$Simmental$$Supplementary$FigureS8DGAT1LDFocus$on$the$LD$of$the$DGAT1$region$in$all$the$datasets$(Brown$Holstein$HolsteinDGAT$Simmental)$$Supplementary$TableS1SNPfactNumber$of$subjects$and$SNPs$after$QC$$Supplementary$TableS2GeneBasedAssociationresultsFull$results$of$the$significant$genes$associated$with$the$traits$in$all$the$datasets$$ Sheet$1$Brown$ndash$fat$percentage$$ Sheet$2$Brown$ndash$milk$yield$$ Sheet$3$Brown$ndash$protein$percentage$$ Sheet$4$Holstein$ndash$fat$percentage$$ Sheet$5$Holstein$ndash$milk$yield$$ Sheet$6$Holstein$ndash$protein$percentage$$ Sheet$7$HolsteinDGAT$ndash$fat$percentage$$ Sheet$8$HolsteinDGAT$ndash$milk$yield$$ Sheet$9$HolsteinDGAT$ndash$protein$percentage$$ Sheet$10$Simmental$ndash$fat$percentage$$ Sheet$11$Simmental$ndash$milk$yield$$ Sheet$12$Simmental$ndash$protein$percentage$$Supplementary$TableS3SingleSNPAssociationresultsFull$results$of$the$SNP$that$overcome$the$suggestive$threshold$$$ Sheet$1$Brown$ndash$fat$percentage$

78

$

$ Sheet$2$Brown$ndash$milk$yield$$ Sheet$3$Brown$ndash$protein$percentage$$ Sheet$4$Holstein$ndash$fat$percentage$$ Sheet$5$Holstein$ndash$milk$yield$$ Sheet$6$Holstein$ndash$protein$percentage$$ Sheet$7$HolsteinDGAT$ndash$fat$percentage$$ Sheet$8$HolsteinDGAT$ndash$milk$yield$$ Sheet$9$HolsteinDGAT$ndash$protein$percentage$$ Sheet$10$Simmental$ndash$fat$percentage$$ Sheet$11$Simmental$ndash$milk$yield$$ Sheet$12$Simmental$ndash$protein$percentage$$Supplementary$TableS4GenesandtraitsGeneWise$pBvalues$of$the$discussed$genes$in$all$the$analysis$In$grey$are$evidenced$the$pBvalues$that$overcome$the$significance$threshold$Genes$are$alphabetically$listed$$Supplementary$TableS5GenesinQTLsGenes$from$the$geneBbased$association$mapping$into$known$QTLs$$$$$

79

CHAPTER(4(

MUGBAS(a(species(free(gene(based(association(suite(

S(Capomaccio1(M(Milanesi1(L(Bomba1(et(al(1IstitutodiZootecnicaUniversitagraveCattolicadelSacroCuoreviaEmiliaParmense84Piacenza29122ItalyTheseauthorscontributedequallytothiswork

____________________ApplicationnotesubmittedtoBioinformatics

Abstract(Theassociationstudiesbetweengenomewidemarkersandphenotypes(GWAS)

arenowwidelyusedinmodelandnonmodelspeciesHowevertoolsthatmerge

singlemarkerresultsandtheirprobabilityofbeingassociatedtofunctionalunits

(typicallygenes)areseldomdevelopedforspeciesotherthanhuman

Herewe presentMUGBAS a suite that calculates a generegionbased pvalue

fromagivenannotationsinglemarkerGWAresultandgenotypedatawiththe

objective to ease postGWAS analysis The software is species and annotation

independentfasthighlyparallelizedandreadyforhighdensitymarkerstudies

Availabilityandimplementationhttpsbitbucketorgcapemastermugbas

Introduction(With the availability of highdensity SNP panels Genome Wide Association

Studies (GWAS) have become the gold standard for dissecting the biology of

complextraits(McCarthyetal2008)Thisistrueforhumansandotherspecies

TheunderlyingideaofGWASissimplefindmarkertraitassociationsexploiting

the linkage disequilibrium (LD) that exists between the causative mutation ndash

whichweignorendashandoneormoremarkersofknownchromosomalposition

Whileanumberofwellestablishedapproachescanbeeasilyusedinastandard

GWAS (Aulchenko et al 2007 Kang et al 2010 Purcell et al 2007) to find

marker(s) associated to a particular phenotype several issues have to be

accountedforduringtheupanddownstreamanalyses

Firstly genome wide studies have to take into account multiple testing If

stringent statistical thresholds are applied a proportion ofmoderate or small

effectsactuallyassociatedwiththetraitwillbedisregardedThereforereducing

the number of tests while maintaining all the information provided by high

densitypanelswouldbeaclearadvantageInadditiononceasignificantsignalis

detected different strategiesmaybeused to tag the candidate causative gene

Sometimesthegeneclosertothepeaksignalisselectedinothercasesallgenes

withinaflankingregionofanarbitrarysizeofthesignificantSNP(s)areincluded

82

infurtheranalysesThesechoicesmayendupinbiasresultseitherretaininga

largenumberofnottrulyassociatedgenesordisregardingthecorrectones

Somemethodshaverecentlybeenproposedtoovercometheselimitationsusing

aprioriknowledgeastheldquocandidatepathwayrdquoanalysis(Ravenetal2013)and

the ldquogenebasedrdquo association strategies (Akula et al 2011 Cantor et al 2010

Liuetal2010)TheseapproachesrestrictGWAstudiesonlytoknowngenesor

genesubsetsresizingthemultipletestingproblemandincreasingthepowerof

theanalyses

Usually bioinformatics pipelines are developed for the analysis of human and

modelorganismsandarenotadaptedtononmodelorganismThisisthecaseof

the VEGAS software (Liu et al 2010 Yang et al 2014) developed only for

humansandonlyforgenebasedanalyses

Here we introduce MUGBAS [MUlti species GeneBased Association Suite] a

softwarebasedonVEGASthatusesGWASdataandagivenannotation(typically

genesbutanyregionmaybesuitable)toestimategenebasedandregionbased

associationpvaluesThesuite is speciesandannotation free ready toanalyze

highdensitydatafromanyexperimentaldesigns

Methods(MUGBASisasuiteofscriptsbuiltinBashRPythonandPerlThesuiterequires

several prerequisites that can be easily fulfilled in a LinuxUnixMac

environmentwith a few command lines Further information on how tomeet

theserequirementsaredetailedintheonlinerepository

MUGBAS suite can be invoked launching the Bash wrapper that checks

dependenciesandstoresuserchoices

Required input files are MAPPED files in PLINK format with genotype data

from(atleast200individualsarerecommendedforaproperLDcalculation)an

annotation file inBED format (chromosome startingposition endingposition

nameandusercustomfields)ofthedesiredgenomefeatureandtheGWASresult

file with mandatory headers ldquoSNPNAMErdquo and ldquoPVALUErdquo Multiple association

resultscanbetestedatthesametime ifprovidedinseparatefiles inthesame

directoryAtrialdatasetisdownloadablealongwiththesoftwarereleasePlease

83

notethatforsimplicityweprovideageneannotationbutanyBEDfilewithany

desiredfeaturecanbeused

The program starts the analysis assigning SNPs to any feature of the given

annotationusingbedtools(QuinlanandHall2010)Thisstepiscustomizableby

user defined boundaries (up and downstream) in order to catch regulatory

regionsthatcouldbeinhighLDwiththecurrentgeneregion

Inordertospeeduptheanalysistheprogramsubsamples200individualsfrom

theinputdatasetWethenimplementedtwolevelsofparallelizationoneforthe

LDandonefortheldquogenewiserdquoorldquoregionwiserdquopvaluecalculationThelatteris

particularlyusefulanalyzingmorethanoneGWAresultsfromthesamedataset

BrieflyMUGBASfirstsplitstheLDcalculationinncores(asdefinedbytheuser)

using the foreach doParallel packages (Analytics andWeston 2014a 2014b)

and then distributes traits across cores using GNU Parallel and foreach

doParallel

SinceLDcalculationisaslowprocessVEGASbenefitsofHapMapphase2datato

speed up the processwhile preserving the option for a user defined (human)

population using PLINK In order to expand these analysis to any species LD

calculationinMUGBASisimplementedwithther2fast()functionfromGenABEL

package(Aulchenkoetal2007)whilethegenewisepvalueiscalculatedasin

theVEGASapproachBrieflythegenewiseteststatisticisthesumofthenupper

tailchisquaredvaluesofaSNPsubsetwithonedegreeoffreedom(wheren is

thenumberofSNPmappinginthegivengene)Withperfectlinkageequilibrium

the genewise pvalue would be the onetailed pvalue of a chisquare

distributionwithndegreesof freedomOtherwisemorelikelythedistribution

of the pvalues has to be evaluated (with simulations) andweighted (with LD

values)(Liuetal2010)

Once every generegion has its ldquogenewiserdquo pvalue a FDR qvalue (False

Discovery Rate) statistics is calculated with the base R function padjust()

Results are stored and enrichedwith ldquomanhattanrdquo and ldquoundergroundrdquo plots of

theldquogenewiserdquopvalueandtheqvaluerespectively

84

Results(MUGBAS output is generegionoriented with useful information about the

location of the analyzed feature the ldquoBest SNPrdquo within that feature with its

genomiccoordinatesandtheoriginalpvaluethegeneorregionwisestatistics

(pvalue number of simulations and qvalue) the custom annotation of that

particularentryandthepositionoftheSNPwithrespecttothefeatureandthe

decidedboundariesAll this informationareuseful to track theeffectofhighly

significant markers that could map into multiple region of interest and

effectively pinpoint the most probable location to focus on with data mining

processes

A typical analysis performed with a highdensity panel (gt600K markers)

distributed in 10 midperformance cores (Intel Xeon X5675 307 GHz) on

threetraitssimultaneouslytakesaboutfourhours

Discussion(Generegionbased methods are now widely used and accepted in genomic

research However non modelspecies suffer for the lack of availability of

specifictoolstoenhancediscoveryfromgenomewidedataMUGBASprovidesa

robust and accepted statistics to researchers dealing with species other than

humantoeasilyjumpfromassociatedmarkerstothedesiredfunctionalunits

Acknowledgements(((WewouldliketothankDrJimmyLiuforhisadvicesinbuildingthesuite

Reference(Akula N Baranova A Seto D Solka J NallsMA Singleton A Ferrucci L

TanakaTBandinelli SChoYSKimYJLee JYHanBGBipolar

Disorder Genome Study (BiGS) Consortium The Wellcome Trust Case

85

ControlConsortiumMcMahonFJ2011ANetworkBasedApproachtoPrioritize Results from GenomeWide Association Studies PLoS ONE 6e24220doi101371journalpone0024220

Analytics R Weston S 2014a doParallel Foreach parallel adaptor for theparallelpackage

AnalyticsRWestonS2014bforeachForeachloopingconstructforRAulchenkoYSdeKoningDJHaleyC2007GenomewideRapidAssociation

Using Mixed Model and Regression A Fast and Simple Method ForGenomewidePedigreeBasedQuantitativeTraitLociAssociationAnalysisGenetics177577ndash585doi101534genetics107075614

Cantor RM Lange K Sinsheimer JS 2010 Prioritizing GWAS Results AReview of Statistical Methods and Recommendations for TheirApplicationAmJHumGenet866ndash22doi101016jajhg200911017

KangHMSulJHServiceSKZaitlenNAKongSFreimerNBSabattiCEskin E 2010 Variance component model to account for samplestructure in genomewide association studies Nat Genet 42 348ndash354doi101038ng548

LiuJZMcraeAFNyholtDRMedlandSEWrayNRBrownKM2010Aversatile genebased test for genomewide association studies Am JHumGenet87139

McCarthy MI Abecasis GR Cardon LR Goldstein DB Little J IoannidisJPA Hirschhorn JN 2008 Genomewide association studies forcomplextraitsconsensusuncertaintyandchallengesNatRevGenet9356ndash369doi101038nrg2344

Purcell S Neale B ToddBrown K Thomas L Ferreira MAR Bender DMallerJSklarPdeBakkerPIWDalyMJShamPC2007PLINKatool set for wholegenome association and populationbased linkageanalysesAmJHumGenet81559ndash575doi101086519795

QuinlanARHallIM2010BEDToolsaflexiblesuiteofutilitiesforcomparinggenomic features Bioinformatics 26 841ndash842doi101093bioinformaticsbtq033

86

Raven LA Cocks BG Pryce JE Cottrell JJ Hayes BJ 2013 Genes of the

RNASE5pathway contain SNP associatedwithmilk production traits in

dairycattleGenetSelEvol4525doi101186129796864525

87

CHAPTER5

Imputationaccuracyisrobusttocattlereferencegenomeupdates

MMilanesi1etal1IstitutodiZootecnicaUniversitagraveCattolicadelSacroCuoreviaEmiliaParmense84Piacenza29122Italy

____________________ShortCommunicationacceptedinAnimalGenetics

SummaryGenotype imputation is routinely applied in a large number of cattle breeds Imputation

hasbecomeaneedduetothelargenumberofSNParrayswithvariabledensity(currently

from2900to777962SNPs)Althoughmanyauthorshavestudiedtheeffectofdifferent

statisticalmethodsonimputationaccuracytheimpactofa(likely)changeinthereference

genomeassemblyonimputationfromlowertohigherdensityhasnotbeendeterminedso

far In this work 1021 Italian Simmental SNP genotypes were reJmapped on the three

mostrecentreferencegenomeassembliesFour imputationmethodswereusedtoassess

theimpactofanupdateinthereferencegenomeAsexpectedthefourmethodsbehaved

differentlywith largedifferences in termsof accuracyUpdatingSNPcoordinateson the

three tested cattle reference genome assemblies determined only a slight variation on

imputationresultswithinmethod

KeywordsImputationreferencegenomeassemblySNPbovine

MaintextFromthepublicationof the firstbovinesinglenucleotidepolymorphism(SNP)array the

wholegenomicscenarioofthisspeciesevolvedrapidlyNowadaysseveralSNPpanelshave

been designed with a range of densities mapping on different reference genome

assemblies and produced by different manufacturers When running analyses with SNP

arrays of different densities or from different commercial companies (eg Illumina and

Affymetrix) an imputation step is often required The impact of different statistical

methods on imputation accuracy has already been assessed (Pei et al 2008) (Marchini

andHowie2010)(Maetal2013)Surprisinglyuptonowtherearenotavailablestudies

investigating the impact of an update on the reference genome assembly on imputation

accuracy This assessment would be of fundamental importance considering that reJ

mappingSNPsdeterminesarearrangementofthehaplotypestructureofthegenotypes(in

somecasesSNPsarereJmappedtoadifferentchromosome)andthatthebovinereference

assembly is likely toberepeatedlyupdatedover timeThisworkwasaimedatassessing

the effect of different reference genome assemblies in cattle As test casewe compared

four different imputation methods from low (Illumina BovineLDv10 Beadchip) to

90

mediumhigh (Illumina BovineSNP50v2 Beadchip) density in the Italian Simmental

populationForeachmethodweassessedthe impacton imputationaccuracyofmapping

SNPs on the BTAU42 BTAU46 (wwwhgscbcmeduotherJmammalsbovineJgenomeJ

project)orUMD31(Ziminetal2009)cattlereferencegenomeassemblies

Atotalof900and121SimmentalbullsweregenotypedwiththeIlluminaBovineSNP50v1

and v2 Beadchips (Illumina Inc San Diego CA) respectively Illumina BovineSNP50v2

Beadchip (hereinafter named 54k) the official reference chip for the Italian Simmental

breedersassociation(ANAPRI)wasconsideredasreferenceSNPpanelTheSNPchiMpv1

tool(Nicolazzietal2014)wasusedtoobtainSNPcoordinatesonBTAU42UMD31and

BTAU46genomeassembliesForeachassemblySNPsnotcontainedinthe54kunmapped

orinthesexualchromosomeswerediscardedThefinaldatasetcontained5118952886

and51290SNPsinBTAU42UMD31andBTAU46respectively

FourimputationmethodswereconsideredPedImpute(Nicolazzietal2013)FindHapv2

(VanRadenetal2011)FImpute(Sargolzaeietal2014)andBeaglev332(Browningand

Browning2007)Thefirstthreeusepedigreeandpopulation(ie linkagedisequilibrium

LD) informationwhereas the fourth uses only LD information Performance of the four

methods testedacrossgenomicassemblieswas comparedusingwrongly imputedalleles

(ldquoerrorsrdquo)andallelicsquaredcorrelation(ldquoallelicJR2rdquo)statisticsAllelicJR2wasdefined

asthesquaredcorrelationbetweenimputedandtrue(eggenotyped)allelesmultipliedby

100 as in Browning amp Browning (2009) The 878 oldest sires were used as training

population whereas the 143 youngest bulls were used as test population SNPs not in

commonwith the lowdensity panelwere set tomissing in the test population The test

population was split in four scenarios individuals with sire and maternal grandsire

genotypedinthereferencepopulation(scenarioAJ84animals)individualswithonlythe

sire genotyped (scenario B J 34 animals) individuals with only the maternal grandsire

genotyped(scenarioCJ13animals)ornoneofcloserelativesgenotyped(scenarioDJ12

animals)Thepopulationusedwashighlystructuredwithnearlyhalfofthebullspresent

in the dataset (490) sired by 115 genotyped sires In addition 35 out of these 115

genotypedsiresweresiredby22bulls(eggrandsires)alsopresentinthedataset

DifferencesacrossassembliesarenothomogeneousFromBTAU42toBTAU4646SNPs

weremapped to different chromosomes and the average difference in physical position

withinchromosomeswasnearly025MbpHowevertherewasnosubstantialdifferencein

the order of markers In fact not considering unmapped SNPs on either of the

aforementionedassembliesadifferenceinthesequentialorderwasobservedforonly134

91

single (ie spread throughout the genome) SNPs On the contrary from UMD31 to

BTAU46 222 SNPsweremapped to different chromosomes and amuch larger average

differenceinSNPsphysicalpositionwasobserved(iemorethan2Mbp)Alsothenumber

ofSNPswithdifferentsequentialorderwasmuchlarger(ie1893SNPs)Howevermore

than50oftherearrangementsinvolvedblocksfrom3tonearly100markersAsaresult

slight differences in overall imputation accuracy from BTA46 to UMD31 rather than

between the two BTAU assemblies were expected across methods On the contrary if

rearrangements are due to issues in the assembly of scaffolds theymight have a direct

(negative) impact on local imputation performances In fact a preliminary analysis on

BTA1consideringlocalerroracrossassembliesandmethodssuggestatleasttwosuch

SNPsclustersat44and138MbponUMD31(FiguresS1S2andS3)

Considering allmethods and scenarios amaximum average variation of 02 inerrors

(Table1)andof09inallelicJR2(Table2)wasobservedacrossreferenceassembliesThe

standarddeviation(SD)oftheimputationaccuracywasstableacrossreferenceassemblies

withinmethodsOnthecontraryalargervariabilitywasobservedacrossmethods

PedImpute and FindHap showed a relatively large average allelicJR2 variability across

scenarios evaluated over the same reference assembly Only PedImpute showed a large

variation in SD across scenarios in errors and allelicJR2 with high variability on

scenarios C and D showing a low performance of thismethodwhen there are no close

relatives(egsiredam)genotypedinthereferencedatasetFImputeinsteadachievedthe

highestaccuracyacrossallreferencegenomesandscenariossurprisinglyobtainingitsbest

resultsJerrorsof14(06SD)andallelicJR2of955(26SD)JinscenarioCwhereonly

onerelativelycloserelativeisgenotyped(maternalgrandJsire)FImputeappeartobeable

todetectshorterhaplotypesfromdistantrelativesmoreaccuratelythananyothermethod

tested in this study thus it is lessdependenton theavailabilityofpedigree information

thantheothertwopedigreeJbasedmethods(Sargolzaeietal2010)

The large variability across methods within assembly was expected although the

differences obtained among PedImpute FindHap and Beagle were larger in this study

comparedtoasimilarstudyinItalianHolstein(Nicolazzietal2013)Thiscoulddependon

a number of factors First the sample size of the test population is small (especially for

groups C and D where less than 15 individuals were present) Second the interaction

between population structure and imputationmethodsmay have a role PedImpute and

FindHap rely much more heavily on pedigree information than FImpute and Beagle

assumesall individualsareunrelatedHowever the lattertwomethods indirectlybenefit

92

greatly of from highly structured populations with large number of relatives ndash close or

distantndashpresent inthereferencepopulation(Sargolzaeietal2014)Anotherinteresting

factor that influences the imputation accuracy may be the persistence of LD at long

distances this is much lower in the Italian Simmental than in the Italian Holstein

population(FigureS4)Informationonrelatives(eitherpedigreeinformationorgenotyped

relativesinthereferencepopulation)isexpectedtoplayabiggerroleifLDisconservedat

shorter distances This hypothesis is supported by the fact that although there is high

variability acrossmethods better performances of the pedigreeJbasedmethods are still

observedinscenariosAandB(andCforFImpute)

Theslightdifferencesinimputationaccuracyobservedacrossreferenceassembliesmaybe

explained by the aforementioned small variation in the SNP blocks across reference

assemblies irrespectively of the rearrangements of single SNPs Considering that SNP

blocksaremostlymaintainedweexpectthatbreedswithlargerLDpersistenceatlonger

distances (ie Holstein Brown) will obtain even better imputation accuracy across

assembliescomparedtowhatreportedinthisstudyinItalianSimmentalHoweverastill

relativelyunexploredaspectof imputationperformance is thedistributionof imputation

errors across the genomeNot considering genotyping errors (which are expected to be

randomly distributed in the genome) a cluster of consecutive markers with high

percentage of imputation error across methods are most probably identifying missJ

assembledregions in thegenome(FiguresS1S2andS3)Althoughthe impactonoverall

imputation accuracy is limited as results in this study show these errorsmight have a

large impact on any analyses relying on specific genomic coordinates (eg genomeJwide

associationstudies)Morecomprehensiveresultsincludingannotationstudiesandwhole

genomesequencingdataarerequiredtoconfirmandinvestigatetheseobservations

Toconcludeonlyasmalleffectofupdatingthereferencegenomeassemblyonimputation

accuracywas observed in Italian Simmental On the other hand as already reported by

other studies the variability of imputation accuracy estimates is much larger across

imputation methods The choice of the most suitable imputation method based on the

breedgeneticbackgrounditscharacteristicsandontheavailabledataisthereforecritical

AcknowledgementsThe research leading to these results has received funding from the European Unions

Seventh Framework Programme (FP72007J2013) under grant agreement ndeg 289592 ndash

93

Gene2FarmPartofthegenotypicdatausedinthisstudywasprovidedbyprojectldquoSelMolrdquo

fundedbytheItalianMinistryofAgriculture(MiPAAF)MMwassupportedbytheDoctoral

SchoolontheAgroJFoodSystem(Agrisystem)of theUniversitagraveCattolicadelSacroCuore

(Italy) FB was supported by the Marie Curie European Reintegration Grant

ldquoNEUTRADAPTrdquo(FP7JPEOPLEJ2009JRG)

ReferencesBrowningSRBrowningBL2007RapidandAccurateHaplotypePhasingandMissingJ

Data Inference for WholeJGenome Association Studies By Use of Localized

HaplotypeClusteringAmJHumGenet811084ndash1097

MaPBroslashndumRFZhangQLundMSSuG2013Comparisonofdifferentmethods

forimputinggenomeJwidemarkergenotypesinSwedishandFinnishRedCattleJ

DairySci964666ndash4677doi103168jds2012J6316

Marchini J Howie B 2010 Genotype imputation for genomeJwide association studies

NatRevGenet11499ndash511doi101038nrg2796

NicolazziELBiffaniSJansenG2013ShortcommunicationImputinggenotypesusing

PedImputefastalgorithmcombiningpedigreeandpopulationinformationJournal

ofDairyScience962649ndash2653doi103168jds2012J6062

NicolazziELPiccioliniMStrozziFSchnabelRDLawleyCPiraniABrewFStella

A 2014 SNPchiMp a database to disentangle the SNPchip jungle in bovine

livestockBMCGenomics15123doi1011861471J2164J15J123

PeiYLiJZhangLPapasianCJDengH2008Analysesandcomparisonofaccuracyof

differentgenotypeimputationmethodsPLoSONE3551

VanRadenPMOrsquoConnellJRWiggansGRWeigelKA2011Genomicevaluationswith

manymoregenotypesGenetSelEvol43

94

Tableamp1ampIm

putationampaccuracyampacrossampassem

bliesampforampPedImputeampFindH

apampFImputeampandampBeagleamppercentageampofampwronglyampim

putedamp

allelesamp

PedImputeamp

FindHapamp

FImputeamp

Beagleamp

BTAU

amp42amp

UMDamp31amp

BTAU

amp46amp

BTAU

amp42amp

UMDamp31amp

BTAU

amp46amp

BTAU

amp42amp

UMDamp31amp

BTAU

amp46amp

BTAU

amp42amp

UMDamp31amp

BTAU

amp46amp

Scenarioamp

Namp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

A1amp

84

20

07

21

07

21

07

32

09

32

09

32

09

15

09

15

09

15

09

24

11

25

11

25

11

B2amp

34

23

08

23

09

22

08

36

11

36

12

36

11

15

07

15

09

14

06

23

08

23

08

23

08

C3amp

13

98

41

98

41

98

41

51

10

52

10

51

10

14

06

14

06

14

06

23

06

23

06

24

06

D4 amp

12

88

49

89

49

88

49

54

11

56

12

54

11

16

11

17

12

16

11

26

15

26

15

26

14

1 Sireandm

aternalgrandsiregenotyped

2 Siregenotypedmate

rnalgrandsireno

tgenotyped

3 Sireno

tgenotypedmate

rnalgrandsiregenotyped

4 Sireandm

aternalgrandsireno

tgenotyped

Tableamp2ampIm

putationampaccuracyampacrossampassem

bliesampforampPedImputeampFindH

apampFImputeampandampBeagleampallelicJR2amp

PedImputeamp

FindHapamp

FImputeamp

Beagleamp

BTAU

amp42amp

UMDamp31amp

BTAU

amp46amp

BTAU

amp42amp

UMDamp31amp

BTAU

amp46amp

BTAU

amp42amp

UMDamp31amp

BTAU

amp46amp

BTAU

amp42amp

UMDamp31amp

BTAU

amp46amp

Scenarioamp

Namp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

A1amp

84

941

20

940

20

941

20

907

24

908

25

907

24

952

26

952

27

952

27

925

33

923

33

925

33

B2amp

34

935

24

934

25

935

24

896

30

896

34

895

32

954

20

954

21

954

20

929

24

928

25

928

24

C3amp

13

759

89

761

88

758

87

848

33

845

30

848

32

955

18

955

19

955

19

931

19

928

19

927

19

D4 amp

12

782

113

777

113

781

112

841

32

832

35

839

33

949

34

947

38

949

35

921

44

919

45

921

44

1 Sireandm

aternalgrandsiregenotyped

2 Siregenotypedmate

rnalgrandsireno

tgenotyped

3 Sireno

tgenotypedmate

rnalgrandsiregenotyped

4 Sireandm

aternalgrandsireno

tgenotyped

95

Supplementaryfiles

YoucanfindChapterͷsupplementaryfilesat

- DocTa(DoctoralThesisArchive)(httptesionlineunicattit)

- GoogleDrive(httptinyurlcomPhDMMChapter05)orscantheQRcode

SupplementaryFigureS1ErrorpercentageacrossmethodsforBTAU42onBTA1

SupplementaryFigureS2ErrorpercentageacrossmethodsforUMD31onBTA1

SupplementaryFigureS3ErrorpercentageacrossmethodsforBTAU46onBTA1

SupplementaryFigureS4Linkagedisequilibrium(LD)decayoverdistanceinItalian

SimmentalandHolstein

ThedottedlinewithblackdotsindicatesLDdecayoverdistanceinItalianHolstein

whereasthedashedlinewithwhitediamondsindicatesthesamevaluesforItalian

SimmentalItalianHolsteingenotypeswereasubsetofthedatausedinNicolazzietal

(2013)

96

CHAPTER6

QuestfordeleteriousrecessivevariantsinItalianHolsteinbullscombiningexomesequencingandhighdensitySNPanalysis

MMilanesi1etal

1IstitutodiZootecnicaUCSCviaEmiliaParmense8429122PiacenzaItaly

____________________Manuscript

Abstract

Outbreed species carry a genetic load of deleterious recessive variants Those

that are lethal survive only in the heterozygous state in the population while

otherssimplyreducethefitnessofindividualshomozygousforthevariantThe

long=termdestinyofthesemutationsistodecreaseinfrequencyduetonegative

selectionandeventuallydisappearbecauseofgeneticdriftHowever theymay

bemaintainedinlivestockpopulationsandevenincreaseinfrequencywhenin

linkage with favourable alleles for traits under selection or occurring in the

genomeofsireshavinghighgeneticvalueforproductivetraitsWesearchedfor

these variants combining exome sequence data from 20 Italian Holstein bulls

sampledfromoppositetailsofthemaleandfemalefertilityEBVdistributionthe

HighDensity(HD)SNPgenotypingof1009progenytestedItalianHolsteinbulls

and information existing in public databases Exome sequencing detected

202956 variants Among these 5167 were classified as ldquodeleteriousrdquo when

annotatedwithSIFTandAnnovarThesewerefilteredbypickingthosemapping

in regions out of equilibrium and lack of homozygotes in the progeny tested

Holstein bulls or having an asymmetric distribution in bulls plus=variant and

minus=variant for fertility EBV The resulting 193 variants in 163 genes were

furtherassessedbycomparativegenomicshaplotypeanalysisandfunctionFor

eachassessmentapositiveornegativescorewasgiventoevidenceinfavouror

against thevariant tobedeleteriousA totalof35variantshadapositive total

score These genes are involved in development cell motility and immune

responseSomeof themareknowntobeexpressed in the testiclesandappear

importantforspermfertilityinmodelspeciesothersareresponsibleforgenetic

defectsinhumanandmousecarryingmutationssimilartotheonesdetectedin

cattleThesegenevariantsarecandidatetobeamongthosethatconstitutethe

ldquogeneticloadrdquoofdeleteriousrecessivegenesintheHolsteinpopulation

100

Introduction

The presence of deleterious recessive variants is well tolerated by outbred

speciesWhenthesevariantsappeartheymayspreadandsurvivealongtimein

the population in the heterozygous state (Simmons and Crow 1977) It is

estimatedthatoutbredindividualshumanincludedcarryonaverageagenetic

loadofmorethan100loss=of=functionvariantsintheirgenome(MacArthuretal

2012)

Theeffectofdeleterious recessivevariantsappearswhendemographicevents

eggeneticbottlenecksreducethegeneticvariabilityofthepopulationorwhen

relatedindividualsareintentionallymatedforbreedingpurposesInthesecases

recessive deleterious reduce the fitness of individuals carrying them in the

homozygous state and in extreme case induce very severe genetic defects

premature death and abortion (McClure et al 2014 Sahana et al 2013

VanRadenetal2011)Theythereforecontributetoexplainthephenomenonof

inbreeding depression and its negative effect on fertility (Charlesworth and

Willis2009)

The search for deleterious recessive variants is particularly interesting in

Holstein cattle Thisbreed is experiencing anegative genetic trend for fertility

traits (Jamrozik et al 2005) it is worth noting that this is occurring while

inbreedingisunderstrictmonitoringandistheoreticallyincreasingataverylow

pace(about005peryear=Mrodeetal2009)Inadditionmatingisplannedto

minimise relationship between animals and the additive genetic variance for

mosttraitsndashincludingmilkproduction=isapparentlynotdecreasinginspiteof

the selection pressure applied by modern schemes (Biffani et al 2002)

indicatingthatoveralldiversityisonlyslightlydecreasing

SelectiontoincreasefertilityischallengingSelectioncriteriagenerallyadopted

to improve this trait are far from the biological mechanisms that govern

reproduction Inaddition thesemechanismsarecomplexandverysensitive to

environmental(management)conditions(Laneetal2013Wathesetal2014)

Asa result fertility traitshave loworvery lowheritability (Pryceetal1997)

andthereforelowresponsetoselectionandslowgeneticprogress

101

Forthesereasons thequest formolecularvariants influencing fertility traits is

on=goingsincetheavailabilityofmoleculartoolsforDNAanalysisAnumberof

investigations tested the effect of functional candidate genes running genome

wide association analyses to find QTLs for fertility traits using hundreds

microsatellite markers in structured families (Ashwell et al 2004) or tens of

thousand SNP panels in whole populations (Aston and Carrell 2009) The

availability of medium density SNP panels allowed the search for deleterious

recessivesevenwithouttheuseofphenotypictraitsbyscanningthegenomein

searchofregionslackingoneofthetwohomozygousclasses(Sahanaetal2013

VanRaden et al 2011) This approach has highlighted a number of genomic

regions likely carrying deleterious recessives This information has immediate

application inmolecular assisted breeding and a relevant scientific interest in

thesearchofcausativevariantsandassessmentoftheirfunctions

Thedevelopmentofnewsequencingtechnologieshasgreatlydecreasedthecost

of sequencing so that nowadays the sequencing of few target individuals is

becoming the gold standard to seek for causal mutations for rare monogenic

defectsinhumansandotherspecies(ChunandFay2009)Oftenonlytheexome

the protein coding part of the genome is sequenced searching for those

mutationsthatseverelyalterproteinstructureandorfunction(Bamshadetal

2011 Li et al 2013) A typical mammalian exome consists in a few tens of

millionnucleotides(around2ofthereferencegenome)comparedtothefew

billionofthewholegenomeThereforewiththesamenumberofreadsexome

sequencinghastheadvantageofamoreaccurategenotypingdueto thehigher

sequencing depth an easier data management lower computation load and

easier interpretation (Bieseckeretal2011Majewskietal2011) Indeed the

numbervariants found inexomesareone=twoordersofmagnitude lowerthan

thatdetectedinwholegenomesandmutationsaredetectedinannotatedgenes

orimmediatelynearbyOntheotherhandtheexomeapproachcanonlyidentify

variantson annotatedgenesVariants inunknownexonsor that occur innon=

coding regulatory features will be missed (Bamshad et al 2011) In humans

these are turning out to be of outmost importance for gene regulation and

downstreamphenotypic effects (ENCODEConsortium2012) but they are still

verypoorlyannotatedinlivestock

102

Herewesequencedtheexomesof20Holsteinbullsprogenytestedselectedfrom

the tails of male and female fertility EBVs (Estimated Breeding Value) Since

livestockfertilityisacomplextraitandcausativemutationscannotbesoughtby

the approach used in monogenic diseases we integrated the sequence

information with high density SNP chip data obtained on a much larger

populationofItalianHolsteinbullstodetectgenescarryingallelescandidateto

be deleterious recessives and confirm their likely deleterious effect by

comparativegenomics

Materialsandmethods

1SamplingandDNAextraction

Semen(straws)andbloodsampleswereprovidedbyItaliancertificatedartificial

insemination centres (UE Directive 88407CEE) thus ethics committee

approvalforthisstudyisnotrequired

A total of 20 ItalianHolstein progeny tested bullswere sampled for complete

exome sequencing Among them 19were chosen from the tails of female and

male fertilityEBV(EstimatedBreedingValue)distribution (Table1) tohave5

animals fromeach tail One animalwas to be in theminus=variant tail of both

traitsBullswereselectedasunrelatedaspossibleSamplingwaspossiblethanks

tothecollaborationwithANAFI(ItalianHolsteinBreederAssociation)

ANAFIdevelopedageneticevaluationfor fertilitycombiningdifferenttraitsas

explained by Biffani and colleagues (Biffani et al 2005) The traits are ldquodays

from calving to first inseminationrdquo ldquocalving intervalrdquo ldquofirst=service non return

rateto56daysrdquoldquoangularityrdquoandldquomatureequivalentmilkyieldat305daysrdquo

MalefertilityisexpressedasanEstimatedRelativeConceptionRate(ERCR)and

isameasureofconceptionrateofaservicesirerelativetoservicesiresofherd

mates(Soggiuetal2013)

103

Table1FemaleandmalefertilityEBVsofthe20animalssequenced

Animal EBVFemaleFertility EBVMaleFertility GroupfriA01 97 =111 NAfriA02 98 =166 MalLfriA03 115 0 FemHfriA04 114 0 FemHfriA05 114 0 FemHfriA06 86 0 FemLfriA07 98 195 MalHfriA08 105 213 MalHfriA09 95 =154 MalLfriA10 95 149 MalHfriA11 102 264 MalHfriA12 91 =475 FemLMalLfriA13 105 =179 MalLfriA14 102 207 MalHfriA15 99 =405 MalLfriA16 85 0 FemLfriA17 87 0 FemLfriA18 121 0 FemHfriA19 117 0 FemHfriA20 88 0 FemL

A total of 1009 Italian Holstein bulls were chosen from the entire population

balancing theirpresence in the tailsof theEBV forkgprotein (PROT) somatic

cell count (SCC) fertility (FERT) and functional longevity (LONG) for high

densitygenotyping(TableS1)

DNAwasisolatedfromwholebloodandsemenstrawssamplesinLGSlaboratory

(CremonaItalywwwlgscrit)usingtheGenomixkitaccordingtoLGSinternal

protocolDNAsforexomecaptureandsequencingwereextractedindependently

tosatisfy therequirement indicatedby IGA technologies (Udine Italy)at least

12mgofDNAataconcentration100ngulwithR260280higherthan18and

R260230near19

2Moleculardataproductionandqualitychecking

21Exomecaptureandsequencing

104

Exomes were captured using the ldquoSureSelected All exon Bovinerdquo (Agilent

Technologies) kit in early version Probes coordinates are based on UMD31

bovinereferencesequence(Ziminetal2009)anddesignedagainstNCBIexons

as well as predicted exons UTRs and miRNAs Number and distribution of

probesperchromosomearedetailedinTableS2

rdquoOvation Ultralow Library Systemsrdquo protocol was followed for library

preparation After quantification with Qubit (Life technologies) and quality

controlwith2100Bioanalyzer(AgilentTechnologies)DNAwassonicatedwith

Bioruptor (Diagenode) to obtain fragments between 150 to 200 bp in length

blunt end=repaired adenylated ligated with paired=end adaptors and purified

followingtheprotocolprovidedbythemanufacturer

LibrarieswerehybridizedwithldquoSureSelectedAllexonBovinerdquoprobesusingthe

Encore Target Capture Module (Nugen) SureSelect Target Enrichment and

Herculase II FusionEnzymewithdNTPCombo (AgilentTechnology) kit Then

capturedDNAwasamplifiedandtaggedwithlibraryspecificindexesQualityof

capturedsampleswastestedwithQubitand2100Bioanalyzer

Paired=end 2X100bp sequencing was performed on Illumina HiSeq2500

(IlluminaSanDiegoCA)atanexpectedcoverageof50X

22Sequencealignment

Adapters and reads shorter than 36bp were trimmed using Trimmomatic

(version032)(Bolgeretal2014)ForeachFastqfilethealignmentandpairing

of the reads to theUMD31 reference sequencewas performedwithBurrows=

WheelerAligner(BWA)(version077=r441)(LiandDurbin2009)applyingalso

trimmingoption (=q5) SAM files resultedwere converted intoBAM file using

SAMtools (version0118) (Lietal2009)Foreach individualBAM fileswere

merged in a single one with Picard (version 1111)

(httpsgithubcombroadinstitutepicard)Duplicatereadswereidentifiedand

markedwithPicardFinallyindividualBAMfileswereindexedwithSAMtools

105

Sequencedepthwasevaluatedsimultaneouslyonall20sequencedanimalsand

onlyonregionstargetedbyprobesusingDepthOfCoverageofGATK(McKennaet

al2010)

23ExomeVariantcallingandqualitycheck

Singlenucleotidepolymorphisms(SNPs)insertionsanddeletions(InDels)were

calledsimultaneouslyonall20sequencedanimalsandonlyonregionstargeted

byprobesusingUnifiedGenotyperofGATK(DePristoetal2011)

All variantswere filteredusingVariantFiltration ofGATK to removeoneswith

lowmappingandorsequencingqualityThechoiceoffiltersandthresholdswas

made in accordance to ldquoGATK best practicerdquo for exome sequencing and

distributionoffiltervalues

bull ldquoSNPclusterrdquoexcludedvariantifmorethan3variantswerefoundin10

bp

bull ldquoBaseQualityRankSumTestrdquoexcludedifitrsquoslowerthen=5ormorethan

5

bull ldquoMappingQualityRankSumTestrdquoexcludedif itrsquoslowerthen=7ormore

than6

bull ldquoRead Position Rank Sum Testrdquo excluded if itrsquos lower then =4 or more

than4

bull ldquoFisherStrandrdquoexcludedifitrsquosmorethan30

bull ldquoDepthrdquoexcluded if itrsquos lower then60X(ieaminimumvalueof3Xper

animals)ormorethan3500X

bull ldquoQualityrdquoexcludedifitrsquoslowerthen40

bull ldquoMappingqualityrdquoexcludedifitrsquoslowerthen30

Filters were applied to all variants but only bi=allelic SNPs were used in the

following analyses Genotypes with ldquoGenotype Depthrdquo lower than 5 and

ldquoGenotype Qualityrdquo lower than 20 were discarded The remaining ones were

converted intoldquoABrdquocode(A=referencealleleandB=alternativeallele)Then

data were transformed in PLINK (Purcell et al 2007) format using a home=

made python script To prepare the working dataset the final quality control

106

removedvariantswithmore than25missingdata (that iswithmore than5

animals out of 20 without genotypes) monomorphic (minor allele frequency

MAFlt1)ornotmappinginautosomesBasicpopulationgeneticsparameters

werecalculatedtoevaluatethepresenceofcrypticstructureinthedatathatmay

influencetheoutcomeofthedownstreamanalyses

In total 24390 SNPs identified in the exomes were also included in the HD

SNPchip panel HD genotype from nineteen out to twenty of the sequenced

animals sequenced was used to compare technologies Genotypes at common

locationswerecomparedacrosstechnologies toevaluatesequencequalityand

theeffectofsequencedepthongenotypecalling

24BovineHDGenotypingBeadChipdataandqualitycheck

AllanimalsweregenotypedusingtheBovineHDGenotypingBeadChip(Illumina

San Diego CA) Genotypes were quality controlled and filtered with standard

threshold Markers with more than 5 missing data and MAF lt 1 were

excluded Animals with more than 5missing data were also excluded Only

autosomal SNPs were retained Thereafter genotypes were phased and the

remainingmissingdata imputedusingBeaglev332(BrowningandBrowning

2007)

Ofthe20animalssequenced12hadalsoBovineHDGenotypingBeadChipand7

BovineSNP50 Genotyping BeadChip version 2 genotypes available For

concordance and haplotype analysis we imputed the 50K data to 800K as

previouslydescribedabove

3Strategyfordetectingcandidatedeleteriousvariants

Wesearchedforcandidatedeleteriousvariantsfollowinganintegratedapproach

that joined the analysis of i) exomes of a few Italian Holstein bulls having

plusvariant and minusvariant EBVs for a target trait ii) HD genotyping of a

population of 1009 progeny tested Italian Holstein bulls iii) the comparative

genomic analysis of 100 vertebrate species The strategy followed for the

107

analysisisshowninFigure1Inthefirststep(Search)welookedforcandidate

deleterious variants in the exomes In step2 (Filter)we filtered candidatesby

assessingwhichof thesevariantseitherhadanon=randomdistributionamong

animals having extreme EBVs or co=mapped with genomic regions carrying

outliersignals inthelargepopulationofprogenytestedbulls Instep3(Asses)

candidateswere further evaluated by i) assessing their conservation across a

panel of 100 vertebrate species ii) searching for associated haplotypes and

evaluating their frequencies in the population of progeny tested bulls iii)

evaluatingtheirassociationwithgeneticdefectsinhumanandmodelspeciesiv)

examining their expression profile and function when known Methods are

belowdescribedindetail

Figure1Strategyfollowedtoidentifydeleteriousvariants

108

31Step1Searchforcandidatedeleteriousvariants

Quest for candidate deleterious variants in exome sequence data by

variantannotation

WeannotatedallvariantsusingANNOVAR(Wangetal2010)andVEP(Variant

Effect Predictor) (McClure et al 2014) software They use the Ensembl gene

annotation database (UMD31 reference sequence release 74) to investigate

locationandpotentialeffectoncodingsequencesofallvariantsVEPintegrates

the SIFT algorithm (Kumar et al 2009) to predict the effect of amino acid

substitutiononproteinfunctionForeachsequencechangesequencehomology

and physic=chemical similarity between the reference amino acid and the

alternative is considered providing a score that evaluates the mutation

tolerabilityVariantswithscorelowerthan005wereclassifiedasldquodeleteriousrdquo

inaccordingtosoftwarerecommendations

SIFTalsocomparedallvariants found in theexomeswith thosepresent in the

dbSNPdatabase(version140)toidentifythosethatwerenovel

32Step2Filteringofcandidatedeleteriousvariants

2a Analysis of deleterious variant distribution among animals having

extremesEBVvaluesformaleandfemalefertilitytraits

Association between variants and the classification in the tails of the EBV

distributionwas estimated byFisherexact testwhich compares frequencies of

alleles (a vs A) in the two tails not assumingH=W equilibriumgenotypic test

that compares the frequency of genotypes (aa vs Aa vs AA) and the recessive

geneactiontestwhichtestsforthedistributionofonehomozygousclassversus

theothers(aavsaAAA)AlltestsareimplementedinPLINKfunctionsThe5

significance threshold was set empirically by permutation (N=100000) either

point=wise that is not corrected for multiple testing (EMP1 = Empirical

Probability1)andthereafterfamily=wisecorrectedformultipletesting(EMP2=

EmpiricalProbability2)

109

GenesexceedingEMP1were considered suggestive and thoseexceedingEMP2

significant candidates to be deleterious recessives and their characteristics

conservationacrossspeciesandfunctionfurtherinvestigated

2b Quest for regions candidate to carry deleterious variants in the

populationanalysedwiththeHDBeadChip

Basic population genetics parameters and Multi Dimensional Scaling were

calculated with GenABEL R package (Aulchenko et al 2007) to evaluate the

presenceofcrypticstructure inthedatathatmayinfluencetheoutcomeofthe

downstreamanalyses

ExpectedandobservedallelicandgenotypicfrequenciesandFisherexacttestof

Hardy=Weinberg proportions (Janis E Wigginton 2005) were calculated with

PLINKon theHDSNPdatasetValueswere averaged in slidingwindowsof10

markersThisvalue(10markers)waschosenaftertestingdifferentwindowsize

as represent a good compromise between noise reduction and resolution (not

shown)

Candidate regions containing deleterious mutations were identified following

twodifferentapproaches

In one approach hereafter refereed to as ldquoOut of Hardy=Weinbergrdquo approach

(OHW)markerssignificantlydeviatingfromofHardy=WeinbergequilibriumatP

le84710=8 (nominalvaluePle005Bonferronicorrected formultiple testing)

werefirstidentifiedWindowswithatleastthreesignificantmarkersbelowthis

thresholdandhavingahomozygotedeficitwereconsideredcandidatestocarry

deleterious mutations In this OHW approach we applied very stringent

constraints to reduce false positive discoveries due to the Wahlund effect

inducedbypopulationsubstructure(seeresultsandFigureS5)

In thesecondapproachhereafterreferred toas ldquoLackOfHomozygotesrdquo (LOH)

approach the differences between observed and expected frequencies of

homozygotesfortheminorallelewerecomputedandaveragedacrossmarkers

included in each sliding window Values were standardized and only regions

carrying three overlapping windows below =3Z score value were considered

110

candidates to carry deleterious mutations Thereafter significant windows

carryingatleastonemarkerincommonwerejoinedtodefineregionboundaries

ExomesequencingknowledgeandHDBeadChipdatawerecrossed inorder to

identifygenescarryingseverevariantslocatedwithinsignificantregionsortheir

proximity (50Kb upstream or downstream) These genes were considered

candidates to be deleterious recessives and their characteristics conservation

acrossspeciesandfunctionwasfurtherinvestigated

33Step3Assessmentofcandidatedeleteriousvariants

Genes carrying variations classified as deleterious by SIFT or ANNOVAR and

eitherassuggestivebyfertilityEBVtailsanalysisorassignificantbyOHWeLOH

analyses of the Holstein bull population were submitted to comparative

genomics population genetics and functional analyses In order to prioritize

genesvariants discoveredwe scored each evaluation using a subjective scale

ranging from =3 to 3 to measuring the evidence in favour (scores 1 to 3) or

against(scores=1to=3)thedeleteriouseffectoftheSNPsThescore0inallcases

indicated the absence of evidence or that a specific assessment failed The

completescorescalethereforerangedfrom=9to9

3aComparativegenomics

For comparative genomics purposes the position of eachmutationwas ldquolifted

overrdquo from UMD31 bovine coordinates to hg19 human coordinates using the

specifictoolatUCSC(Kentetal2002=httpgenomeucscedu)Orthologous

proteins were aligned using the UCSC genome browser and amino acid

conservation assessed in a set of 100 vertebrate species that included human

primates (11 species) euarchontoglires (egmouse14 species) laurasiatheria

(egsheep25species)afrotheira(egelephant6species)othermammals(eg

platypus5species)birds(egchicken14species)sarcopterygii(egturtle8

species) fish (eg zebrafish 16 species) Results of amino acid conservation

analysiswererecordedonlyifhomologousproteinfromatleast50speciescould

beretrievedandalignedThereafteraminoacidconservationwasscored3ifthe

111

aminoacidwasconservedinallvertebrates2ifconservedinallmammalsand1

ifconservedinthelargemajority(atleast90)ofmammalsAscoreof=1was

given when the amino acid was not conserved among species and a score =3

when both variants observed in cattle were carried by other species Finally

wheneithertheliftoverprocessortheproteinalignmentfailedthecomparison

wasconsiderednoninformativeandgivenascore0

3bSearchofhaplotypesassociatedtodeleteriousvariantsandanalysisof

HDpopulationdata

To acquire further evidence we searched for the haplotypes that carry the

deleterious variations and estimated their frequency in the

homozygousheterozygousstatuswithinthebullpopulationgenotypedwiththe

HDBeadChip

Weusedamultistepapproachtofindhaplotypescarryingdeleteriousvariantsi)

from the phased HD BeadChip data we extracted the haplotypes of the 20

sequencedanimalsintheregionofthevariant(100markersoneachside)ii)in

these 20 animals we searched the shorter haplotype able to distinguish

homozygous andor heterozygous animals for the deleterious variant from all

other haplotypes iii) we assessed the frequency of homozygotes and

heterozygotes for thehaplotype in thewholepopulationofbulls characterised

withtheHDBeadChip

Scoresinthiscasewere=3inthecaseofuniquehaplotypesshowinganumberof

homozygotesnonsignificantlydifferentfromthenumberexpectedfromHardy=

Weinbergproportions3inthecaseofuniquehaplotypesshowinganumberof

homozygotes significantly lower than expected 1 in the case of unique

haplotypesneverhomozygousandrare0whenevernouniquehaplotypecould

befound

3cFunctionalanalyses

With PantherDB (Mi et al 2010) resources we classified our gene list into

functionalcategoriesaccordingtotheGeneOntologyclassificationAllsuggestive

112

andcandidateEnsemblGeneIDsubmittedtothebulkanalysiswererecognizedand annotated Beyond the GO term characterization in order to understandpotentialdetrimentaleffectsofmutationsandalsopinpointpotentialcandidatesfor fertility traits we retrieved information for each gene according to thefollowing parameters i) the involvement in known Mendelian or complexdiseases fromOMIMdatabase ii) theavailabilityofknock=outmice iii) and theeventual expression in reproductive tissues from the Gene Expression Atlas(httpwwwebiacukgxa)(Table4)

Alsoforthesedataweusedthesamescaleattributingarangefrom=3pointstogenes having no relation with fertility viability and genetic defects and norelevant phenotypic effects in knock=out mouse models to 3 to gene closelyrelated to fertility and viability associated to genetic defects in human andaffectingfertilityorsurvivalwhenknocked=outInparticularascoreranging=1to+1wasattributedtogenesassociatedtohumansyndromes(=1=noassociationrecordedinOMIMdatabase0=noinformationinOMIM1=associationinOMIM)A score with same range to information collected from knock=out mousephenotypes(=1=knock=outmouseshowsnoimpairedphenotype0=noknockoutmouse has been produced 1= knock=out mouse shows impaired phenotype)Thesamescorewasattributedtomolecularfunctions(=1=functionunrelatedtofertilityandviability0=functionindirectlyrelatedasimmunityandmetabolism1=functiondirectlyrelatedasfertilityanddevelopment)

Results

Exomecaptureandsequencing

A totalof225414probeswereused for capturing thecattleexome (5355Mbabout 2 of UMD31 reference genomes) Mitochondrial (MT) and Ychromosome were not included in probe design The number of probes perchromosomespannedfrom2500onBTA27toalmost12000inBTA3(TableS2)

113

The average exome sequencing depth was 4058 slightly lower than the

expected50XAllanimalshadatleast24Xaveragegenomessequencedbutina

same animal the sequencing depth varied across chromosomes and regions

withinchromosomesThelowestchromosomeaveragevalueobservedwas10X

inBTA25oftheshallowersequencedanimalAveragecoverageperanimaland

chromosomeandoverallmeansareshowninFiguresS1

In total1613461402 raw100bppair=ends sequence readswereproducedA

total of 1425514112 100bp reads were mapped and properly paired to the

reference genome Paired reads per animal ranged between 41973510 and

138840392

Insynthesissequencingdepthwasvariableasexpectedandalmostallanimals

hadsomeregionsofscarceornosequencedepthbutoverallallexomeregions

were sufficiently covered to permit a comprehensive analysis of molecular

variantsoftheanimalsinvestigated

Step1Searchforcandidatedeleteriousvariants

Variantdetectionandannotation

Thesequencingdepthofallvariantscalledonthealignmentofallreadsacross

animalsfollowedaChi=squaredistributionAveragedepthwas79694spanning

from2Xto5000Xwithpeakfrequencyclassat150=250X(FigureS2)

A multistep approach was followed to clean the exome sequence dataset

Cleaning process steps and the number onmarkers retained in each step are

describedinTable2

114

Table2Typeandnumberofmarkersdetectedandretainedafterdatasetcleaning

TypeofvariationNumberofvariations(N)

Rawdata

Aftervariantcleaning

Aftergenotypecleaning(onlyautosomal)

Complex 1213 963 0Deletion 6938 5031 0Insertion 5510 3656 0MultiallelicComplex 5710 3995 0MultiallelicSNP 803 512 0SNP 242988 203079 164277Total 263162 217236 164277whencomparedtothereferencegenome

Atotalof263162variantswereidentifiedAmongthese217236(8255)wereretainedafterapplyingthevariantfilters(seematerialsandmethodssection)

According to GATK classification the vast majority (9348) of variationsidentified were biallelic SNPs followed by insertions and deletions (InDels40) more complex variations and multiallelic SNPs (252) Indels andcomplexvariationsalthoughlessfrequentthanSNPsaffectedarelevantportionoftheexome(49589bp)Thenumberofvariantsdetectedperchromosomewassignificantly correlated to the number of probes used (r=08761P=228x10=10)andofbasescaptured(r=08424P=53x10=19)perchromosomehoweverBTA23BTA27 BTAX had an outlier behaviour the former two containing highervariation(450and445variantsperKb)andthelatterlower(084variantsperKb)thantheaverage(322variantsperKb)

Bialleic SNPswere further filtered for genotype quality (sequence depth gt 5Xandsequencequalitygt20)andBTAXmarkersThefinalautosomalSNPdatasetcounted164277SNPsAmongthese17829biallelicSNPs(109)describedforthefirsttimehavebeensubmittedtodbSNP

Variantsrsquo annotation is shown in Table 3a Variantswere detected in differentgenedomains(exonsintrons3rsquountranslatedregions)aswellasupstreamanddownstreamgenesMorethan31ofthevariationsweredetectedinexonsandamong them more than 41 have the potential to have an effect on genefunctionbyaffectingsplicingoralterproteinstructure(Table3b)

115

Table 3 A) number of autosomal SNPs annotated in each gene region B) number and

classificationofautosomalSNPsalteringgenefunction

A)

Generegion NofSNPs(autosomes)Downstream 1170Exonic 51293Intergenic 49901Intronic 58116ncRNA_exonic 160ncRNA_intronic 3ncRNA_splicing 2Splicing 184Upstream 1445Upstreamdownstream 54UTR3 1345UTR5 604Total 164277

B)

EffectofexonicSNPs NofSNPs(autosomes)Nonsynonymoussinglenucleotidevariation 20839Stopgainsinglenucleotidevariation 213Stoplosssinglenucleotidevariation 10Synonymoussinglenucleotidevariation 30226Unknown 5Total 51293

whencomparedtothereferencegenome

Amongthe51293exonicSNPs4166in2978geneswereclassifiedldquodeleteriousrdquo

by SIFT Among these 3210 in 2356 genes were observed at least twice

ANNOVARidentified223stopgain(213)orloss(10)variantsin217genes

Theaverageconcordanceat24390SNPs in commonbetweenexomesand the

HD Beadchip was 9716 (Table S3) Discordance was due to either alleles

detected by sequencing andnot detected by genotyping (exome=heterozygote

Beadchip=homozygote) and vice=versa (exome=homozygote

Beadchip=heterozygote) Discordances were randomly distributed among

markers and inversely correlated to sequencing depth (Figure S3) indicating

that most errors are likely occurring during sequencing even if occasional

116

Beadchip failure indetectingsomeallelecannotbeexcludedTwoanimalshad

anoutlierbehaviouronehaving846andthesecond719concordanceBoth

animalswereretainedforvariantdiscoverywhiletheywereexcludedfromthe

analysesofEBVtailsWithoutthesetwooutlieranimalsthemeanconcordance

increasesto9938

Multidimensional scaling confirmed the absence of a substructure in animals

sequenced(FigureS4)

Step2Filteringofcandidatedeleteriousvariants

AnalysisofEBVtails

Fisherexact test isgenerallyused to test for theassociationbetweenavariant

and a Mendelian disease Samples are divided in cases and controls and the

association between alleles or genotypes and the disease is tested under

different inheritance models Here we used the same method to test for

association between allelesgenotypes and animals from opposite tails of the

EBVdistributionfori)malefertility(N=4high+4low)ii)femalefertility(N=5

high+4low)andiii)maleandfemalefertility(N=9high+8low)

Genotypes were tested under all inheritance models implemented in PLINK

Giventhelownumberofsamplesnovariantwassignificantaftercorrectionfor

multiple comparisons (EMP2) and only suggestive variants could be found

(significant at 5 at EMP1)We considered suggestive of carrying deleterious

allelesalistof101genescarrying112variants(TableS4)

Detectionofregionscarryingdeleteriousmutationfrom800KSNPdata

After applying quality control parameters 26653 and 146991 markers were

removed from the original dataset for excessive missing data and MAF

respectivelyInaddition17individualsremovedforlowgenotypingrateAswith

exomes only autosomal SNPswere retained for the analyses thus obtaining a

workingdatasetof590680SNPsand992animals

117

MDS(FigureS5)analysis indicates that thedatasethassomesubstructure that

mightaffectdownstreamanalysesWeaccountedfortheriskoffindingmarkers

outofHardy=WeinbergproportionsduetotheWahlundeffectbychoosingvery

stringentsignificancethresholdintheOHWanalyses

To detect regions candidate to carry deleterious recessivemutations we used

two approaches the ldquoOutside Hardy Weinbergrdquo (OHW) and the ldquoLack Of

Homozygotesrdquo (LOH) approaches (see materials and methods) The two

approaches complement each other in the detection of regions carrying less

homozygotes than expected OHW identifies regions in which a few markers

have extreme homozygote deficit while LOH detects regions in which many

markershaveamoderatedeficit

In the SNP dataset 1884 markers were significantly out of Hardy=Weinberg

equilibrium(Ple84710=8)Atotalof485windowswereidentifiedwithatleast

3 markers out of 10 exceeding this threshold Among these 84 contained

markerssignificantbecauseofhomozygotedeficitThese84windowsidentified

11OHWcandidateregionsin9chromosomes(TableS5)

TheLOHapproachidentified2335windowsbelow=3Zscoreofthestandardised

differencebetweenobservedandexpected frequencyofhomozygotes forMAF

Theseidentified326LOHcandidateregionsspreadonallautosomes(TableS6)

Onlyregionsformedbyatleast3singlewindowswereretainedreducingLOH

candidateregionsto240

Deleterious variants (SIFT deleterious and stop codon gainedlost) were

intersectedwithsignificantOHWandLOHregionSevenofthesemappedwithin

OHW sliding windows 36 within LOH sliding windows Among these 5 were

located in regions significant in both OHW and LOH Other 51 deleterious

variants mapped in the immediate vicinity (+=50Kb) of significant windows

These 83 variants affect 66 unique genes candidate to carry deleterious

recessivealleles(TableS7)

Five regionswere common toOHWandLOHoneonBTA4 (LOH=74959610=

75151417 OHW=74970653=75151417) two on BTA7 (LOH=9626435=

10099512 OHW=9628735=10099512 LOH=10575691=11171132

118

OHW=10486424=11087441) one on BTA15 (LOH=80299924=80928311

OHW=80299924=80921274) and one on BTA18 (LOH=28122373=

28185525OHW=28106022=28164693)

Step3Assessingcandidatedeleteriousvariants

After the two filtering step previously described a total of 191 SNPs were

retained out of the 3433 classified as deleterious by SIFT and Annovar and

analysed further These are carried by 163 genes Eighteen genes carried 2

candidateSNPsthreegenes3SNPsandonegene5SNPsInaddition4SNPsin

four geneswere identified as potential candidates in both the analysis of EBV

tailsandofbullpopulation

A totalof12SNPs introducedstopcodonswhile the remaining inducedamino

acid substitutionsor splicing variants In the filtereddataset theproportionof

stop codons on total variation (63) was only slightly higher than the

proportion found in the entire dataset (51 225 stop codon out of 4391

deleteriousSNPs)

Comparativegenomics

Proteinmultiplealignmentwassuccessfulin146outof191comparisonsIn26

casesthemostfrequentaminoacidincattlewasveryhighlyconservedeitherin

allvertebrates(N=9)orinallmammals(N=17)Inother23casestheaminoacid

was conserved in at least 90 of mammalian species In the last class were

generally included amino acids that changed in mammalian species

phylogenetically verydistant from cattle (eg platypus and related species) In

97 instances the amino acid could change quite freely and occasionally both

variantsobservedincattlewerecarriedbyotherspecies

HaplotypeanalysisofHDpopulationdata

The search for haplotypes associated to the191 variants for further testing in

the population identified 101 unique and invariant haplotypes common to all

119

carriersofatargetvariantanddifferentfromallthosefoundinnoncarriersIn

additional36casesanoncompletelyinvarianthaplotypecouldbeassociatedto

themutationConverselyin54instancesareliablehaplotypecouldnotbefound

(table S8) In total 38 haplotypes had no homozygous individuals in the

population Thiswas expected in the case of 33 rather rare haplotypes (fewer

than 100 heterozygotes observed in the population) Conversely 5 haplotypes

were rather frequent (from 170 to almost 300 heterozygotes observed in the

population)andtheexpectationwastofind78to222homozygousindividuals

in thepopulation insteadof thenone foundAnothervarianthadaremarkable

high frequency (35) and a clear outlier behaviour Only 11 homozygous

individualswereobservedoutofthe121expected

Functionalanalysisofcandidategenes

To assess the potential deleterious effects of the variants identified we also

retrieved functional information for each gene according to the following

parametersi)associationtoknownMendelianorcomplexdiseaseslistedinthe

OMIM database ii) availability and effect of gene knock=out in mice iii)

expression profileswith particular attention in reproductive tissues from the

GeneExpressionAtlas(Table4)

Twenty genes out of 163 have a recognized phenotype in humans These are

mostlyseveresyndromesUnfortunatelyonlyafewgeneshaveanull=miceline

producedwithphenotypicdataavailable

According to the expression data available from Ensembl these genes are

generally (with someexceptions)expressed inavarietyof tissueswitha clear

tendencytobehighlyexpressedintestis

120

Tableamp4Informationon163genesanalyzedconcerningeventualphenotypesinmanorknockoutsinmouseHeaderampcodesAbrainBcolonCheartDkidney

EliverFlungGskeletalmuscleHspleenItestisColoursrepresentthegenelevelofexpressionindifferenttissuefromlow(whitegrey)tohigh(blue)NCin

FunctioncolumnmeansldquoNotClassifiedrdquo

Geneamp

Geneampnam

eampHum

anampdefectamp

Knock5OutsampMouseamp

MouseampPhenotypeamp

AampBamp

CampDamp

EampFamp

GampHamp

IampFunctionamp

ENSBTAG00000000079

CCSAP

Cells

9

1

03

1

07

2

06

2

2

NC

ENSBTAG00000000149

USP40

Cells

6

7

2

7

5

7

2

8

9

fertility

ENSBTAG00000000414

FUT5

Nomatch

37

1

cellular

process

ENSBTAG00000000918

ERMARD

Periventricular

nodularheterotopia

6Xlinked

periventricular

heterotopia6q

terminaldeletion

syndrome

Periventricular

nodularheterotopia

Nomatch

5

4

4

9

6

7

4

9

17

cellular

process

ENSBTAG00000000998

SLC39A6

Cells

6

9

2

7

4

10

1

10

9

cellular

process

ENSBTAG00000001133

VWA8

Mice

2

3

6

7

10

3

3

4

4

blood

metabolism

ENSBTAG00000001140

IAH1

httpswwwmousephenotypeor

gdatagenesMGI1914982sect

ionassociations

BOTHSEXadiposetissue

behaviourneurological

growthsizebodyFEMALE

homeostasismetabolismMALE

limbsdigitstailskeleton

9

18

30

60

23

19

12

17

30

cellular

process

ENSBTAG00000001262

IRGQ

Cells

6

1

5

3

06

2

5

04

2

immunity

ENSBTAG00000001286

ELMO3

httpswwwmousephenotypeor

gdatagenesMGI2679007sect

ionassociations

Decreasedstartlereflex

02

27

14

7

7

02

1

fertility

ENSBTAG00000001291

OR8H3

Nomatch

fertility

ENSBTAG00000001500

FIGNL1

Cells

08

2

01

05

07

08

04

2

3

cellular

process

ENSBTAG00000001505

GRK4

Cells

10

7

09

4

6

3

2

6

218

signalling

ENSBTAG00000002220

SPTLC1

Hereditarysensory

andautonomic

neuropathytype1

No

17

27

3

28

12

28

5

19

6

metabolism

ENSBTAG00000002288

NT5DC4

Nomatch

02

cellular

process

ENSBTAG00000002289

NLRP14

No

04

cellular

process

ENSBTAG00000002660

CCDC11

Situsinversustotalis

Nomatch

3

1

06

2

07

3

03

2

54

fertility

121

Geneamp

Geneampnam

eampHum

anampdefectamp

Knock5OutsampMouseamp

MouseampPhenotypeamp

AampBamp

CampDamp

EampFamp

GampHamp

IampFunctionamp

ENSBTAG00000002809

CAPN10

AutosomalRecessive

MentalRetardation

DIABETESMELLITUS

NONINSULIN

DEPEND

ENT1

Cells

2

13

41

33

29

immunity

ENSBTAG00000002966

DNAJC13

Youngadultonset

Parkinsonism

No

5

52

94

62

84

fertility

ENSBTAG00000003231

LIM2

Pulverulentcataract

Cells

01

02

01

02

cellular

process

ENSBTAG00000003508

CFAP69

httpswwwmousephenotypeor

gdatagenesMGI2443778sect

ionassociations

Abnormalcirculatinginsulinlevel

MaleInfertilityDecreasedmean

plateletvolum

eIncreasedleanbody

mass

203

07

1

01

7NC

ENSBTAG00000003523

UGT2B15

Nomatch

34

metabolism

ENSBTAG00000004081

FAT3

No

2

03

01

01

05

signalling

ENSBTAG00000004555

LRP2

DONN

AIBARROW

SYND

ROME

Intellectualdisability

Cells

56

3

01

disease

ENSBTAG00000004772

THEM

4

Cells

8

16

295

27

52

63

metabolism

ENSBTAG00000004860

SLC27A6

Cells

6

02

10

602

207

01

immunity

ENSBTAG00000005021

SEMA5A

Monosom

y5p

Cells

9

04

89

13

08

08

6immunity

ENSBTAG00000005170

GPR114

Mice

101

02

01

09

6

01

fertility

ENSBTAG00000005248

OFCC1

No

01

01

03

01

01

01

03

NC

ENSBTAG00000005340

SERGEF

Cells

7

10

35

15

58

23

cellular

process

ENSBTAG00000005357

VPS11

Mice

14

13

619

717

815

15

cellular

process

ENSBTAG00000005466

C8H8orf58

Nomatch

05

01

08

02

103

02

04

NC

ENSBTAG00000005514

TRIO

Cells

17

66

10

210

88

7immunity

ENSBTAG00000005633

RGNEF

httpswwwmousephenotypeor

gdatagenesMGI1346016sect

ionassociations

Vertebralfusion

36

214

05

801

08

1cellular

process

ENSBTAG00000006021

CEP250

Mice

3

22

43

41

88

metabolism

ENSBTAG00000006027

USP34

Mice

11

63

74

88

912

fertility

ENSBTAG00000006532

CDK5RAP2

Autosomalrecessive

primary

microcephaly

httpswwwmousephenotypeor

gdatagenesMGI2384875sect

ionassociations

skeletonimmunesystem

growthsizebodyhem

atopoietic

system

5

34

92

74

784

developm

ent

ENSBTAG00000007001

SLC26A1

No

03

01

02

32

903

01

07

metabolism

ENSBTAG00000007340

LOC101906349

Nomatch

NC

ENSBTAG00000007568

MEPE

Cells

cellular

process

ENSBTAG00000007910

EXOC7

Cells

13

24

23

22

12

22

15

22

28

cellular

process

122

Geneamp

Geneampnam

eampHum

anampdefectamp

Knock5OutsampMouseamp

MouseampPhenotypeamp

AampBamp

CampDamp

EampFamp

GampHamp

IampFunctionamp

ENSBTAG00000008650

CAMK1D

Cells

24

2

01

06

1

2

03

7

3

signalling

ENSBTAG00000008797

Uncharacterized

Nomatch

09

04

07

04

NC

ENSBTAG00000008871

DDX4

Cells

02

01

01

132

cellular

process

ENSBTAG00000008996

SFI1

Mice

2

1

06

3

04

3

2

5

NC

ENSBTAG00000009014

UPK1B

Mice

2

2

2

5

4

4

2

7

3

fertility

ENSBTAG00000009030

NOX5

Nomatch

04

01

8

01

immunity

ENSBTAG00000009033

TEX37

httpswwwmousephenotypeor

gdatagenesMGI1921471sect

ionassociations

Abnormalstartlereflex

03

83

NC

ENSBTAG00000009064

CSF2RB

Congenitalpulmonary

alveolarproteinosis

Mice

01

2

04

06

2

23

03

17

05

signalling

ENSBTAG00000009144

BPIFA2A

Nomatch

6

06

05

metabolism

ENSBTAG00000009337

PNMAL1

Cells

8

05

01

3

02

03

03

04

3

Immunity

ENSBTAG00000009444

SLC15A5

Cells

01

cellular

process

ENSBTAG00000009568

KANK2

Cells

1

17

30

9

3

31

9

10

3

cellular

process

ENSBTAG00000009569

DOCK6

AdamsOliver

syndrome

Cells

2

5

10

12

5

15

5

6

21

metabolism

ENSBTAG00000009836

CHGA

Mice

140

555

01

2

02

01

07

1

metabolism

ENSBTAG00000010522

Uncharacterized

Nomatch

01

02

02

03

2

metabolism

ENSBTAG00000010601

DOLK

Familialisolated

dilated

cardiomyopathy

No

4

8

6

22

9

6

5

5

12

metabolism

ENSBTAG00000010847

FOXN4

Cells

01

01

metabolism

ENSBTAG00000010954

ART3

Cells

02

1

55

01

08

2

10

06

71

metabolism

ENSBTAG00000011011

SSH2

No

3

09

5

4

2

2

6

7

10

cellular

process

ENSBTAG00000011387

NAT9

Cells

6

7

3

8

6

8

4

6

6

metabolism

ENSBTAG00000011420

CA9

Cells

04

3

02

2

03

1

01

2

56

metabolism

ENSBTAG00000011433

RGP1

httpswwwmousephenotypeor

gdatagenesMGI1915956sect

ionassociations

Completepreweaninglethality

7

19

11

18

8

12

9

8

25

cellular

process

ENSBTAG00000011622

C18orf21

Mice

9

10

6

6

2

10

5

10

34

NC

ENSBTAG00000011683

NUMB

Cells

21

25

8

35

17

47

6

8

19

immunity

ENSBTAG00000011684

RAB11FIP1

Cells

6

01

8

09

11

02

1

4

NC

ENSBTAG00000012053

ACCSL

No

03

01

cellular

process

ENSBTAG00000012314

LDLR

Homozygousfamilial

hypercholesterolemia

Cells

8

33

11

21

17

24

2

15

2

cellular

process

123

Geneamp

Geneampnam

eampHum

anampdefectamp

Knock5OutsampMouseamp

MouseampPhenotypeamp

AampBamp

CampDamp

EampFamp

GampHamp

IampFunctionamp

ENSBTAG00000012326

LOC101906218

Nomatch

11

01

01

09

01

NC

ENSBTAG00000012458

PSAPL1

Cells

01

NC

ENSBTAG00000012565

PCNX

No

3

32

32

33

48

developm

ent

ENSBTAG00000012833

ATG2B

No

4

21

21

21

29

NC

ENSBTAG00000012921

KIF24

Mice

05

07

03

08

02

202

214

metabolism

ENSBTAG00000013568

UEVLD

Cells

5

41

53

42

214

metabolism

ENSBTAG00000013685

KRT31

Mice

cellular

process

ENSBTAG00000013753

DDRGK1

Mice

22

32

18

56

55

25

14

22

37

NC

ENSBTAG00000013789

OR6K6

No

NC

ENSBTAG00000013838

ALLC

Mice

02

01

78

metabolism

ENSBTAG00000013916

HERC2

Mentalretardation

autosomalrecessive

38SKINHAIREYE

PIGM

ENTATION

VARIATIONIN1

Developm

entaldelay

withautismspectrum

disorderandgait

instability

No

8

74

83

74

65

metabolism

ENSBTAG00000014001

MAP1A

Cells

136

23

22

202

43

352

cellular

process

ENSBTAG00000014058

LDB2

No

28

33

12

825

410

3cellular

process

ENSBTAG00000014284

ALPK2

No

01

10

01

304

02

cellular

process

ENSBTAG00000014750

EPB41L4A

httpswwwmousephenotypeor

gdatagenesMGI103007secti

onassociations

Decreasedcirculatingglucoselevel

Abnormalconeelectrophysiology

Immunesystem

s1

22

20

07

12

13

43

NC

ENSBTAG00000014752

AKAP11

httpswwwmousephenotypeor

gdatagenesMGI2684060sect

ionassociations

Nopheno

24

72

63

306

68

cellular

process

ENSBTAG00000014768

ZNF786

Cells

3

23

42

48

33

cellular

process

ENSBTAG00000014794

SCYL3

Cells

6

10

59

613

516

26

cellular

process

ENSBTAG00000015027

Clusterin

httpswwwmousephenotypeor

gdatagenesMGI88423sectio

nassociations

Aggressionvertebralfusion

05

03

03

02

05

3NC

124

Geneamp

Geneampnam

eampHum

anampdefectamp

Knock5OutsampMouseamp

MouseampPhenotypeamp

AampBamp

CampDamp

EampFamp

GampHamp

IampFunctionamp

ENSBTAG00000015210

LOXHD1

DEAFNESS

AUTOSOMAL

RECESSIVE77

Autosomalrecessive

nonsyndromic

sensorineuDominant

lateonsetFuchs

cornealdystrophy

deafnessautosomal

recessivetype77

No

3

03

NC

ENSBTAG00000015294

COX10

Leighsyndrome

Mitochondrial

complexIVdeficiency

(MTC4D)

Cells

4

6

24

5

2

3

14

3

16

cellular

process

ENSBTAG00000015335

BAI3

No

38

03

8

01

01

2

signalling

ENSBTAG00000015490

HS1BP3

Cells

1

10

5

9

4

8

2

1

37

cellular

process

ENSBTAG00000015602

C7H5orf45

Nomatch

10

cellular

process

ENSBTAG00000015839

MAP4

No

27

85

104

35

11

100

82

46

61

cellular

process

ENSBTAG00000015912

DMKN

No

01

04

03

NC

ENSBTAG00000015980

FASN

AutosomalRecessive

MentalRetardation

Cells

19

29

7

12

6

39

6

7

22

metabolism

ENSBTAG00000015985

GPR20

httpswwwmousephenotypeor

gdatagenesMGI2441803sect

ionassociations

Anatomicaldefectsvarioustissues

03

02

08

01

09

signalling

ENSBTAG00000016495

LINS

AutosomalRecessive

MentalRetardation

Cells

6

3

1

5

3

3

1

5

12

NC

ENSBTAG00000016691

LACC1

Mice

2

9

1

9

9

8

08

7

27

NC

ENSBTAG00000017096

ERICH1

Cells

6

4

2

4

2

5

2

5

13

metabolism

ENSBTAG00000017165

MATN2

No

2

7

2

20

08

3

2

2

7

cellular

process

ENSBTAG00000017418

RRP1B

Cells

6

4

4

5

3

7

3

12

159

cellular

process

ENSBTAG00000017436

ALS2CR11

Cells

01

37

disease

ENSBTAG00000017505

PAXIP1

Cells

2

3

2

4

2

4

2

3

12

signalling

ENSBTAG00000017576

GPR144

Nomatch

02

signalling

ENSBTAG00000018528

USP45

No

1

1

04

1

05

06

02

2

04

cellular

process

ENSBTAG00000018810

THBS2

INTERVERTEBRAL

DISCDISEASE

Cells

1

2

04

09

1

09

04

1

2

fertility

ENSBTAG00000019072

PSD4

Cells

7

8

1

1

7

02

6

4

cellular

process

125

Geneamp

Geneampnam

eampHum

anampdefectamp

Knock5OutsampMouseamp

MouseampPhenotypeamp

AampBamp

CampDamp

EampFamp

GampHamp

IampFunctionamp

ENSBTAG00000019265

AGK

Sengerssyndrom

e

Congenitalcataract

hypertrophic

cardiomyopathy

mitochondrial

myopathy

Cells

7

36

43

52

65

metabolism

ENSBTAG00000019752

BPIFA2B

Nomatch

01

1metabolism

ENSBTAG00000019799

FCAM

R

Cells

07

01

02

01

2

7immunity

ENSBTAG00000019961

ATP4B

Cells

12

03

1

05

cellular

process

ENSBTAG00000020414

DHX37

No

3

33

73

73

413

cellular

process

ENSBTAG00000021073

KIAA1549

PILOCYTIC

ASTROCYTOM

ASOMATIC

httpswwwmousephenotypeor

gdatagenesMGI2669829sect

ionassociations

behaviourneurological

homeostasism

etabolism

hematopoieticsystem

3

04

23

01

21

04

3NC

ENSBTAG00000021233

OR5B17

No

fertility

ENSBTAG00000021375

OR10Q1

No

fertility

ENSBTAG00000021643

EFCAB12

Cells

01

1

9cellular

process

ENSBTAG00000021645

MBD4

Cells

8

43

82

51

10

6cellular

process

ENSBTAG00000022509

DNAH

9

No

02

5

05

5metabolism

ENSBTAG00000022858

LOC783561

Nomatch

fertility

ENSBTAG00000026247

PKHD1L1

Cells

02

01

cellular

process

ENSBTAG00000026323

LYSB

Nomatch

68

metabolism

ENSBTAG00000026350

SPATC1

Cells

05

06

01

01

01

39

NC

ENSBTAG00000027390

RTFDC1

Mice

28

19

21

20

922

13

20

49

NC

ENSBTAG00000031115

Uncharacterized

Nomatch

01

cellular

process

ENSBTAG00000031718

OGFR

Mice

6

17

612

517

39

1fertility

ENSBTAG00000032264

Uncharacterized

Nomatch

52

4

22

NC

ENSBTAG00000032481

DAPL1

No

1

6

07

01

13

10

cellular

process

126

Geneamp

Geneampnam

eampHum

anampdefectamp

Knock5OutsampMouseamp

MouseampPhenotypeamp

AampBamp

CampDamp

EampFamp

GampHamp

IampFunctionamp

ENSBTAG00000032527

ERCC6

CEREBROOCULOFACI

OSKELETAL

SYND

ROME1

COCKAYNE

SYND

ROMEBDe

SanctisCacchione

syndromeMACULAR

DEGENERATION

AGERELATED5UV

SENSITIVE

SYND

ROME1COFS

syndromeCockayne

syndrometype1

Cockaynesyndrome

type2Cockayne

syndrometype3UV

sensitivesyndrome

Cockaynesyndrome

typeB(CSB)De

SanctisCacchione

syndrome(DSC)UV

sensitivesyndrome

(UVS)cerebrooculo

facioskeletal

syndrometype1

(COFS1)

Mice

3

205

22

207

42

cellular

process

ENSBTAG00000032821

SCEL

No

01

02

17

04

NC

ENSBTAG00000033954

PRPS1L1

No

35

metabolism

ENSBTAG00000034991

SLX4IP

Cells

6

21

32

207

33

NC

ENSBTAG00000035226

TOR1AIP1

Cells

9

12

513

816

716

83

NC

ENSBTAG00000035309

OR4F15

No

fertility

ENSBTAG00000035708

LOC527795

Nomatch

01

01

23

metabolism

ENSBTAG00000035985

LOC101909743

OR8K1

No

fertility

ENSBTAG00000036019

RIPPLY3

httpswwwmousephenotypeor

gdatagenesMGI2181192sect

ionassociations

hemopoieticsystem

01

04

01

9

01

malformation

ENSBTAG00000037399

Uncharacterized

Nomatch

2

92

4

24

05

5signalling

ENSBTAG00000038606

DBDR

No

02

01

signalling

ENSBTAG00000038831

PHYHD1

No

2

18

21

23

30

18

16

15

1metabolism

ENSBTAG00000039110

OR8U1

No

fertility

ENSBTAG00000039117

FAM179B

Cells

14

206

31

308

33

cellular

process

ENSBTAG00000039520

SIRPB1

Cells

4

08

32

516

05

12

01

signalling

127

Geneamp

Geneampnam

eampHum

anampdefectamp

Knock5OutsampMouseamp

MouseampPhenotypeamp

AampBamp

CampDamp

EampFamp

GampHamp

IampFunctionamp

ENSBTAG00000040568

Uncharacterized

Nomatch

4

307

54

33

25

cellular

process

ENSBTAG00000045558

LOC100299084

Nomatch

1

22

110

10

09

52

fertility

ENSBTAG00000045588

LOC100298356

Nomatch

metabolism

ENSBTAG00000045709

LOC514057

Nomatch

fertility

ENSBTAG00000046175

Uncharacterized

Nomatch

01

01

cellular

process

ENSBTAG00000046228

Olfactory

receptor

Nomatch

fertility

ENSBTAG00000046239

C10orf12

No

03

104

07

108

06

209

cellular

process

ENSBTAG00000046285

OR8I2

No

fertility

ENSBTAG00000046360

Uncharacterized

Nomatch

NC

ENSBTAG00000046423

LOC618007

Nomatch

NC

ENSBTAG00000046527

LOC100139830

Nomatch

fertility

ENSBTAG00000046590

FAM181A

Cells

09

44

NC

ENSBTAG00000046593

Uncharacterized

Nomatch

20

metabolism

ENSBTAG00000046727

Olfactory

receptor

Nomatch

04

fertility

ENSBTAG00000047150

RTEL

Cells

4

23

64

55

616

cellular

process

ENSBTAG00000047166

SH2D7

Mice

03

101

01

1

02

02

2immunity

ENSBTAG00000047174

Uncharacterized

Nomatch

03

7105

02

01

271

05

03

NC

ENSBTAG00000047317

CL43

Nomatch

01

120

01

immunity

ENSBTAG00000047587

Uncharacterized

Nomatch

22

09

09

35

03

4

06

2cellular

process

ENSBTAG00000047661

FABP12

httpswwwmousephenotypeor

gdatagenesMGI1922747sect

ionassociations

Growthsizebody

01

02

69

metabolism

ENSBTAG00000048021

PNMAL2

No

04

03

2

03

03

02

NC

ENSBTAG00000048153

Uncharacterized

Nomatch

2

05

01

02

01

cellular

process

128

Discussion(

Deleterious recessive variants arenaturally present in outbred animal species

Theydonotcreateproblemsaslongastheyremainatlowfrequencyinrandom

mating populations Under these conditions these variants remain in the

heterozygousstateanddonotproducephenotypiceffectsHoweverassoonas

their frequency increases andmatingoccursbetween related individuals their

deleteriouseffectsbecomeapparentandhighlyaffectthewelfareandviabilityof

animalscarryingtheminthehomozygousstate

In industrialdairy cattlebreedscrossbreeding isgenerallyavoidedandbreeds

areinpracticegeneticisolatesinwhichartificialinsemination(AI)iswidelyused

for rapidly spreading thegeneticprogress in thepopulationAsa consequence

elitesireshavemanythousandandsometimeshundredsofthousanddaughters

Under these conditions mating between relatives becomes practically

unavoidable in the longterm In fact inspiteof theclosecontrolof inbreeding

andofthematingsystemgeneticdefectsregularlyappeartothesurfaceSome

examples are the Complex Vertebral Malformation (CVM) and the Bovine

Leucocyte Adhesion Deficiency (BLAD) in Holstein (Schuumltz et al 2008) and

WeaverdiseaseinBrown(McClureetal2013)Inthesecasestheeffectsofthe

recessivedeleteriousallelesareclearandmoleculardiagnostictoolshavebeen

developedtoavoidtheirfurtherspreadingandtoinitiatetheeradicationprocess

However recessive deleterious alleles may produce their effects in a more

deceitful way for example by affecting fertility inducing early lethality or

influence disease susceptibility All these effects are difficult to trackwith the

current methods of phenotype measurements but have an impact on the

economyofthelivestocksectorandonanimalwelfare

InthispaperwesoughtfordeleteriousrecessiveallelesinItalianHolsteincattle

This breed is recently experiencing a decrease in the genetic trend of fertility

traits (Pryce et al 2014) therefore we started our search from the exome

sequences of 20 sires 19 of which having extreme EBV values for male and

femalefertility

130

The choiceof sequencingexomeswasa compromisebetween completenessof

informationaccuracyofvariantandgenotypedetectionandcostExomesclearly

give incomplete information since they miss all variation in regulatory sites

outside gene boundaries (ENCODE Consortium 2012) These may be very

relevantforthecontroloftraitvariationbuttheyarestillverypoorlyannotated

in livestock On the other hand sequencing exomes yields millions instead of

billionsbasepairsadeepersequencingoftargetregionsandaconsequentmore

reliablevariantandgenotypecalling(FigureS1andS2)Theanalysisofexome

data is also less demanding in terms of computer time (thousands instead of

millions variants are to be analysed) and easier in terms of biological

interpretation Exome sequencing revealed to be very effective in the

identificationofanumberofmutationsaffectinghumanhealth(Bamshadetal

2011 Yang et al 2013) All thesewere defects caused bymutations in single

genes producing clear phenotypic effects havingMendelian inheritance These

conditionsdonotholdinourinvestigationthereforewehadtodevisestrategies

to integrate exome data with other information collected from the Holstein

populationandotherspecies

Theoverallprocedureforidentifythecandidatedeleteriousrecessivewasbased

on the rationale that deleterious recessive alleles in coding sequences are

expectedtobei)causedbyseverealterationoftheproteinsencodedbygenes

ii)mostlyifnotexclusivelycarriedattheheterozygousstateinthepopulation

iii)unevenlydistributedinanimalsofhighandlowfertilityiv)likelyconserved

indifferentspeciesbecauseofselectionconstraintv)likelyassociatedtogenetic

defectsinanyspecieswhenseverelydamaged

A stepwise procedure was set up for reducing candidates to a number

manageable for functional analysis (Figure1) In the first stepmutationswere

discoveredinthesecondfilteredandinthirdassessed

Thisprocedureidentifiesonlysomeofthedeleteriousrecessivesexistinginthe

populationanalysedSincewesequencedonlyexomesandnotwholegenomes

ouranalysesislimitedtocodingsequencesandatthemomenttoSNPvariation

Wethereforemissregulatoryvariationsoutsidegenesand theeffectsof indels

andmorecomplexvariations

131

ThenumberofanimalssequencedislimitedDeleteriousrecessivesareexpected

toberareinthepopulationandwehavesampledonly20animalsfromasubset

ofsiresauthorizedtoAI inItalyAlthoughwehavetakenamongthemthose in

theminus tail of fertility trait thatmighthave ahigher genetic load itrsquos likely

thatwehavemissedothervariantsexistinginthepopulationOntheotherhand

variantscarriedbyAIauthorizedbullsaretheonesthatwillhavealargerimpact

onthepopulationifwidelyspreadbyelitebullsandthereforetheyarethemost

interestingfromabreedingpointofview

Most deleterious variants are expected to be very rare and therefore not

detected by the approach here adopted or detected and afterwards discarded

duringthefilteringprocesses

Sequencing is prone to errors Some real variants may have been discarded

duringQCofsequencedataduetopoorsequencequalityorpoorcoverageofthe

region

Someofthevariationsclassifiedasldquonondeleteriousrdquobythesoftwareusedmay

indeedhave an effect In fact synonymousmutation arenon`neutralwhenever

thedifferenttRNAsforasameaminoacidhavedifferentrelativeabundanceand

translationrates(Goymer2007)

1(Searching(candidate(deleterious(variants(in(exomes(

ExomesproducedafairamountofdataAlltogether5355Mbweresequenced

These represent 9998 of the exome sequences targeted by probes Among

theseonly35hadasequencingdepthacrossanimalslt60XSequencingdepth

is a key factor for variant detection and even more so when the genotype of

sequencedanimalsistobeinferredInvestigatingtheconcordancebetweenthe

genotypes detected at more than 20K SNP by both Beadchip and exome

sequencingwe observed a clear inverse relationship between sequence depth

andnumberofdiscordances(TableS3andFigureS3)Theaveragesequencing

depth expected (50X) and realised in practice (4058) in this investigation

yieldedratheraccurategenotypessincetheaverageconcordanceobservedwas

over97and increased toover99whentwooutlieranimalswereremoved

132

from the dataset The reason for the outlier behaviour of these samples waslikely a suboptimal quality of DNA submitted to sequencing Overall almost215000 variants passed quality controls and among them 192440 autosomalSNPs have been investigated in detail As found in other investigations mostvariants were rare (Peloso et al 2014) For example the MAF distributionobservedin46904SNPsclassifiedneutralhasameanof0197andamedianof0167values(FigureS6)

ThefirststepinourresearchwastoidentifySNPscausingseverealterationsinthe proteins coded by genes These were relevant changes in conformationalterationsofsplicingsitesortruncationsofproteinsToaccomplishthistaskwereliedontwosoftwareSIFTandANNOVARthatidentified4902suchvariationsin3423genes Interestingly the subsetofdeleteriousvariantshas significantlylowermean (0156)andmedian (010) compared toneutral variations (FigureS6) This subset is therefore at least enriched in variants under purifyingselectionhavingadeleteriouseffectatthepopulationlevel

Thenumberofdeleteriousallelesdetectedishighanditisunlikelytheyallhavea relevant phenotypic effect even if most variants are rare In fact variantsdeleterious for a protein are not automatically deleterious for an organism asthe protein change induced by the variant although relevant may not affectproteinfunctionvariationmayoccurinonememberofagenefamilysothatitsfunction may be compensated by other family members the protein functionmay be compensated by other proteins or the affected metabolic pathwaycompensated by alternative pathways The challenge was therefore to filter asubsetofvariantslikelyhavingaphenotypiceffecttobeproposedforvalidationatalargerscaleToaccomplishthistaskwesoughtforadditionalinformationinexomesequencesandinalargepopulationofHolsteinsiresgenotypedwiththebovineHDBeadchip

( (

133

2(Filtering(deleterious(variants(

The second step consisted in filtering the several thousand deleterious SNPsidentified in exomes to search for SNPs and genes that worth deeperinvestigationWeusedtwostrategiestoaccomplishthistask

21$ Distribution$ of$ deleterious$ variants$ in$ fertility$ EBV$ plus9variant$ and$

minus9variant$animals$

Fertility traits are controlled by a high number of genes and are highlygenetically heterogeneous Even using a selected subset of variants identifiedfrom exome sequencing classic association analyses would need sample sizesmuch larger that the one available in this investigation to have a reasonablestatistical power (Stitziel et al 2011) The first approachwe appliedwas theanalysisof SNPallelic andgenotypicdistributionbetween theminusandplus`variant tails of male and female fertility EBVs Given the very low number ofanimals sequencedwedecided to use a simplemodel to exclude from furtheranalyses all markers showing no evidence of association with the EBV tailsratherthantofindsignificantassociations

We divided samples in cases (minus`variant animals for fertility EBV) andcontrols (plus`variant animals) and tested all possible models of inheritanceimplemented in thePlinkpackage for tackling simple genetic defectsWe thenused an empirical point`wise significant threshold based on permutations tohighlight SNPs having alleles or genotypes showing an uneven distributionbetweencasesandcontrolsandconsideredtheseasworthfurtherinvestigationAtotalof112SNPsin101genespassedthisfilteringprocess

$

22$Detection$of$regions$carrying$deleterious$mutation$from$800K$SNP$data

The analysis of exomes alone was therefore insufficient to tag recessivedeleteriousgenesinItalianHolsteinandwesoughtforadditionalinformationin1009ItalianHolsteinbullsgenotypedathighdensitythatarepartofthegenomicselection program run by ANAFI The rationale of the approach is that

134

deleterious recessives are expected to be absent or rare in the homozygous

status in the population Hence they may be tagged searching for genome

regions having markers or haplotypes lacking or having significantly less

homozygotesthanexpectedunderaneutralmodel

We used two complementary approaches to identify these candidate regions

One(OHW)searchedforverystrongsignalsproducedbysinglemarkersoutof

Hardy`Weinberg equilibrium at Ple847x10`8 A minimum of three significant

markersinaslidingwindowsof10wassettoreachsignificancetocapturemore

reliablesignalsavoidexcessivesinglemarkernoiseandaccount forapossible

Wahlund effect due to presence of the genetic structure in the population

difficulttoavoidinthesetofprogenytestedbullsanalysedThesecondapproach

(LOH) captured more moderate signals but from multiple markers in larger

regions Both approaches were designed to detect signals in regions in which

deleteriousvariantsarestillinsegregationandarenotexcessivelyrare

Asexpectedthetwoapproachesidentifiedregionsthatonlypartiallyoverlapped

(TablesS5S6eS7)Inparticular5regionsoutofthe11identifiedbyOHWwere

in common to the 240 identified by LOH Two of these regions overlap with

previous investigations that used the BovineSNP50 Beadchip Sahana et al

(2013) found a region candidate to carrypotential lethal haplotypes inNordic

Holstein on BTA7 in the region 6708007`10907840 This interval overlaps

withbothBTA7regionsdetectedinthisstudyVanRadenetal(2011)detected

inUSHolsteinalargeregiononBTA15spanning76Mb`82M(onBTA30)that

includedLRP4 gene responsible for themulefootdefectOuranalyseswith the

HDBeadchipindicatesontheonesidethatthelargeregiondetectedbySahana

etal(2013)mayconsistintwonearbyregionsOntheothersidethesignalwe

detectedonBTA15doesnot includetheLRP4 locusthat iscompletely fixed in

theItalianbullspopulationanalysed

In total83mutations in66genes fell in eitherorbothOHWandLOHregions

These were joined to the SNPs output of the EBV tail analyses for further

assessment

(

135

3(Assessing(filtered(deleterious(variants(

Different characteristics of the filtered deleterious variants were assessed to

identify among them strong candidates to submit to a large`scale validation

studyTheassessmentphasepermittedtogainadditionalinformationinfavour

or against SNP and gene candidature to be deleterious We quantified this

evidenceusingasubjectivescore

31$Comparative$genomics$analysis$

In the case of comparative genomics the score quantified the level of

conservationof thealternativeaminoacidresidues inducedbySNPvariants in

ItalianHolsteincattleComparisonwasrunagainstaset100vertebratespecies

that included 62 mammals The level of conservation is a good proxy for the

expected severity of a mutation In the case of the 46 SNPs that induced the

change of an amino acid highly conserved in the species investigatedwemay

hypothesize a strong selection pressure in favour of the maintenance or the

invariant amino acid at that position Therefore the amino acid change or the

stopcodonweobserved incattlemayaffectnotonlytheproteinstructureand

function but also the phenotype of homozygous animals Interestingly in all

thesecasestheSNPallelecodingforthenonconservedvariantswastheminor

allele

Converselyinallcasesofhighvariabilityofthetargetaminoacidacrossspecies

(asmanyassevendifferentaminoacidshavesometimesbeenobserved)poorly

supports the presence of phenotypic effects due to its change Sometimes the

informationcollectedisagainstthepresenceofadeleteriousmutationasinthe

case of the presence in different species of both alleles identified by exome

sequencingThis suggests thatneither of these is greatly affecting the viability

andfitnessofanimalscarryingthem

Wealsoobservedanumberoffailures(N=45)intheliftoverofSNPcoordinates

fromcattletohumanorinproteinalignmentThishappenedforexampleinthe

caseofuncharacterized cattleproteins thathavenoorthologous in thehuman

genome and of someolfactory receptors that belong to a highly variable gene

136

family Even if the failure of alignment is an indication of poor proteinconservationacrossspeciestheconservationofthetargetaminoacidcouldnotbeassessesandinthesecaseswepreferredtoconsidertheinformationmissingandtoscoreitaccordingly

32$Haplotype$analysis$

Asecondassessmentofthefilteredcandidateswasontheirbehaviourinthesirepopulation The ideal test would be the direct genotyping of the SNPs in thepopulationand theevaluationof theirallelicandgenotypicproportions In theabsenceofthisinformationwefirstinferredtheHDhaplotypeassociatedtoSNPallelesandassessedthebehaviourofthehaplotypeinthepopulationassumingthisasasurrogateoftheSNPbehaviourSincedeleteriousallelesareexpectedtoberareandthis isalsowhatweobserved fromcomparativegenomicanalysesweinvestigatethefrequencyonlyofthehaplotypecarryingtheminorSNPalleleagainsttheexpectationto findthecorrespondinghaplotypeneverhomozygousor a number of times much lower than expected from its frequency in thepopulation The strategy yielded some interesting confirmation but was onlypartiallysuccessful

Firstofalltheidentificationofauniquehaplotypecarryingtherarevariantwasnot always possible or straightforward In some cases carriers of a same rarevariantdidnotshareexactlythesamehaplotypeWeacceptedsomevariationifthe haplotype remained clearly distinguishable from those of animals notcarryingtherarevariantbutatthesametimereducedthestringencyofthetestInquiteanumberofcasesasamehaplotypecarriedbothcandidatedeleteriousSNP variants This may occur for example in the case of recent mutations IntheselastcasesthetestcouldnotbecarriedoutFinallymostrarevariantswereassociatedtorarehaplotypesthatarenotlikelytohavehomozygousindividualsinthepopulationindependentlyoftheSNPeffect

Thisassessmentwas thereforeonly significant fora fewhaplotypes thathadalarge number of heterozygous individuals and no or only a very fewhomozygotesandforthosethathadalargenumberofhomozygotes

137

33$Functional$analysis$

The final assessment was the analysis of the function of genes carrying the

candidate deleterious SNP This has some limitation since knowledge of gene

function is only partial and mainly gained from investigations on species

different from cattle In any case we considered the association to genetic

defectsinhumanandtheinductionofseverephenotypesinmouseknockoutas

a positive indication of the likely phenotypic effect of the severe protein

alterationweobservedincattleInadditiongenefunctionandexpressionprofile

was evaluated in search for evidence of gene involvement in fertility

developmentorprocessesfundamentalforcellandorganismviability

Amonggenesidentified22areassociatedtohumansyndromes(Table4)Eleven

syndromesareneurologicaldiseasesaffectingbraindevelopmentand inducing

mental retardation (eg Donnay`Barrow Syndrome and Autosomal recessive

primary microcephaly) the others influence development (eg

Cerebrooculofacioskeletal Syndrome 1 and Adams`Oliver syndrome) or organ

function(egCongenitalpulmonaryalveolarproteinosis)

Most of the same defects translated into Holstein would induce physical and

behavioural anomalies likely causingyoungbulls tobe excluded fromprogeny

testing Itshould in factbeconsideredthattheanimalsweanalysedareonthe

onesidethosethatwillmostinfluencethegeneticmakeoffuturegenerationbut

are not a random sample of Holstein animals These bulls have been progeny

testedandauthorisedtobeusedinartificialinseminationinItalyCandidatesto

progenytestingderivefromthematingofthebest1siresandbest2damsof

the Italian Holstein population At the age of 4 months they are taken to the

Italian Holstein breeder association (ANAFI) test station where they are

subjected to performance test sanitary controls and behavioural tests

Thereafterthequalityoftheirsemenischeckedbeforeauthorizationisgranted

toenterinprogenytestInthissampleofanimalsitisthereforeexpectedtofind

never at the homozygous state mutations causing lethality physical

malformation abnormal behaviour and semen infertility Also those having a

138

majoreffectonsemenqualityandhighsusceptibilitytodiseasesareexpectedto

berarelycarriedatthehomozygousstatus

Association with human syndromes is a strong evidence of the potential

phenotypiceffectofmutationsalteringthegenestructureHoweveritshouldbe

considered that variants found in humans are different fromvariants found in

cattle and that the severity of the phenotype induced by a deleterious gene is

often variable even within species Therefore we evaluated this information

togetherwiththeotherindirectevidencesofvariantstobedeleterious

WealsoenquiredthewebsiteoftheInternationalMousePhenotypeConsortium

(httpswwwmousephenotypeorg) to search for the effects of null alleles in

mice The few genes forwhichnull`mice line is available are involved in basic

functionsfortheorganismthatspanfromvision(IAH1andEPB41L4A)toaging

(RGP1)andtissuesdevelopment(FABP12RIPPLY3RGNEF)orfertility(CFAP96)

Aswithgenesassociatedtohumansyndromesmostoftheabnormalphenotypes

observedinknockmousewouldhamperyoungbullstoenterprogenytesting

Gene Ontology (GO) was evaluated on the entire gene set No enrichment on

particularcategoriesGOtermswasobservedThemostrepresentedcategories

in Biological Processes were as expected ldquometabolic processrdquo and ldquocellular

processrdquo however lower level terms ldquoreproductionrdquo and ldquodevelopmental

processrdquo are represented with meaningful hits The analysis of Molecular

Function indicates thatmany of the hits are enzymes (ldquocatalytic activityrdquo) and

haveregulatoryroles(ldquobindingrdquoandldquoenzymeregulatoractivityrdquo)Thisresultcan

be interpreted indifferentwaysOn theone sidedeleterious allelesmay exert

their deleterious effect in a number of different cellular and developmental

processes the correct process is one while there is many options to make it

wrongAlsothenumberofgenesinvestigatedisrelativelylowandformanyof

them there is little or no information available (eg 11 genes code for a

completelyuncharacterizedprotein)Inadditionsomeofthemmayturnoutnot

tobereallydeleteriousandaddnoisetotheGOanalysis

As a final step all the previous information (GO expression patterns known

function)havebeenused to estimate a scoremeasuring the importanceof the

gene function forcattle survivalandreproductionThescoringof functionwas

139

particularlydifficultGOcategoriesweregeneralandallfunctionstaggedhavearelevant importance for animal survival andproductionTherefore in this casewedidnrsquotusethewholescorerangeplanned(from`1to1)andgavenonegativescoresApositivescoreof1wasattributedonly togenesplayingan importantroleinfertilityanddevelopment

$

34$Strong$candidate$deleterious$variants$

Thescoringprocessyielded21variantswithatotalscore=035variantswithpositive and 135 with negative values (Figure 2 and Table 5) Although theprocesswasnoterrorfreeasitreliedonincompleteinformationandhadsomedegreeofsubjectivityparticularlyinthechoiceofrelativeweightsitidentifiedaset of variants that may be considered strong candidates to have deleteriouseffects in the Italian Holstein breed Variants were evaluated individually andthose in a same genes had sometimes different scores For example the fivevariantscarriedbyALPK2hadscoresrangingfrom`7to`2

140

Figure2Graphicalrepresentationoftotalscoreforeachvariants

minus505

Variation

Score

1_645923001_645985561_1381698111_1464490851_150972144

2_7478962_270362092_270455392_375933832_555609862_904645543_75418123_110403993_110404003_188945763_1137877703_1203356984_53767714_265405974_265406204_749514994_1033779064_1057246734_1130233264_1178417155_445557055_445557425_757447955_940308955_940309556_207762566_382803486_863518016_927092906_1075769256_1078851146_1090916426_1166055226_1186534167_13339757_56553577_97149147_167941797_168358027_168621947_197404097_197409057_263293538_602606098_603324848_658345648_704101558_760297748_770028908_872798628_1117618168_1128416959_83888819_83889249_237928389_510133929_1047396559_10515702910_171251510_1584300310_2802235210_2802311010_8290312110_8517371111_4630576711_4675390311_4740957711_5979141011_7832201011_8794387511_9548610911_9945389411_9946100112_1190618112_1247843312_1248084512_1413689712_5304908112_9076120913_381566413_1178793413_1178793813_5247168413_5393493013_5393494713_5393985313_5454042513_5504774013_5998820513_6304575013_6316961013_6539670814_197749414_364069514_3556894514_4688575014_5718402214_6863536315_18267915_18331715_18340415_248692

15_3018920915_3504878815_4630194115_7503314015_7503764515_8024448915_8025571015_8028875915_8038974515_8048559315_8055593115_8094968915_8275455615_8292533216_456201716_3831963616_6257435517_5309539517_5311265517_6609153117_7237753218_2562355018_3495651318_4637204118_5215480618_5408127718_5413095518_5781976819_2144410319_3107215219_3111397219_3279084319_4225178519_5139519419_5619117519_5726625120_763980120_2341043120_5888423620_6417173921_623911921_2034587821_2034628921_2683773321_3100359921_5527419521_5582646321_5817338421_5912071921_6048248421_6048251021_6284314222_5241595922_5242173422_5242204622_5242610422_5242666822_5695165722_5697484722_5779767423_4588505224_2132460424_2145336424_4676490924_5032719524_5815817824_5815862024_5815878424_5820400324_5820447925_3743318026_1811648726_2571599626_2576457027_503107827_3283255728_28422828_169040328_3571880728_4408176529_198588129_753052329_753066329_26397931

141

Table5SingleanalysesscoreandtotalforeachcandidatevariantsChrchrom

osom

ePosphysicalposition

Mutation

Score

Chr

Pos

Gene

Genenam

eHaplotypes

Conservation

Hum

andefect

Mousemodel

Function

TotalScore

164592300

ENSBTAG00000009014

UPK1B

02

I1

01

2

164598556

ENSBTAG00000009014

UPK1B

0I1

I1

00

I2

1138169811

ENSBTAG00000002966

DNAJC13

03

10

15

1146449085

ENSBTAG00000017418

RRP1B

I3

I1

I1

00

I5

1150972144

ENSBTAG00000036019

RIPPLY3

I3

I1

I1

I1

1I5

2747896

ENSBTAG00000013916

HERC2

30

10

04

227036209

ENSBTAG00000004555

LRP2

I3

I1

10

0I3

227045539

ENSBTAG00000004555

LRP2

I3

31

00

1

237593383

ENSBTAG00000032481

DAPL1

00

I1

00

I1

255560986

ENSBTAG00000032264

Uncharacterized

3I3

I1

00

I1

290464554

ENSBTAG00000017436

ALS2CR11

0I1

I1

00

I2

37541812

ENSBTAG00000046175

Uncharacterized

00

00

00

311040399

ENSBTAG00000013789

OR6K6

I3

I3

I1

00

I7

311040400

ENSBTAG00000013789

OR6K6

I3

I3

I1

00

I7

318894576

ENSBTAG00000004772

THEM

40

I3

I1

00

I4

3113787770

ENSBTAG00000000149

USP40

11

I1

01

2

3120335698

ENSBTAG00000002809

CAPN10

I3

21

00

0

45376771

ENSBTAG00000001500

FIGNL1

01

I1

00

0

426540597

ENSBTAG00000033954

PRPS1L1

1I3

I1

00

I3

426540620

ENSBTAG00000033954

PRPS1L1

1I1

I1

00

I1

474951499

ENSBTAG00000003508

CFAP69

I3

I3

I1

10

I6

4103377906

ENSBTAG00000021073

KIAA1549

0I1

11

01

4105724673

ENSBTAG00000019265

AGK

01

10

02

142

Mutation

Score

Chr

Pos

Gene

Genenam

eHaplotypes

Conservation

Hum

andefect

Mousemodel

Function

TotalScore

4113023326

ENSBTAG00000014768

ZNF786

I3

I3

I1

00

I7

4117841715

ENSBTAG00000017505

PAXIP1

0I1

I1

00

I2

544555705

ENSBTAG00000026323

LYSB

I3

3I1

00

I1

544555742

ENSBTAG00000026323

LYSB

I3

0I1

00

I4

575744795

ENSBTAG00000009064

CSF2RB

I3

I3

10

0I5

594030895

ENSBTAG00000009444

SLC15A5

I3

I1

I1

00

I5

594030955

ENSBTAG00000009444

SLC15A5

I3

I3

I1

00

I7

620776256

ENSBTAG00000008797

Uncharacterized

I3

I3

00

0I6

638280348

ENSBTAG00000007568

MEPE

0I3

I1

00

I4

686351801

ENSBTAG00000003523

UGT2B15

1I1

I1

00

I1

692709290

ENSBTAG00000010954

ART3

I3

2I1

00

I2

6107576925

ENSBTAG00000038606

DBDR

0I3

I1

00

I4

6107885114

ENSBTAG00000001505

GRK4

I3

1I1

00

I3

6109091642

ENSBTAG00000007001

SLC26A1

1I3

I1

00

I3

6116605522

ENSBTAG00000014058

LDB2

01

I1

00

0

6118653416

ENSBTAG00000012458

PSAPL1

I3

I1

I1

00

I5

71333975

ENSBTAG00000015602

C7H5orf45

I3

I3

I1

00

I7

75655357

ENSBTAG00000045588

LOC100298356

1I3

I1

00

I3

79714914

ENSBTAG00000046423

LOC618007

00

I1

00

I1

716794179

ENSBTAG00000012314

LDLR

02

10

03

716835802

ENSBTAG00000009568

KANK2

01

I1

00

0

716862194

ENSBTAG00000009569

DOCK6

01

10

02

719740409

ENSBTAG00000000414

FUT5

I3

I3

I1

00

I7

719740905

ENSBTAG00000000414

FUT5

I3

I3

I1

00

I7

143

Mutation

Score

Chr

Pos

Gene

Genenam

eHaplotypes

Conservation

Hum

andefect

Mousemodel

Function

TotalScore

726329353

ENSBTAG00000004860

SLC27A6

0I1

I1

00

I2

860260609

ENSBTAG00000011420

CA9

I3

I1

I1

00

I5

860332484

ENSBTAG00000011433

RGP1

I3

0I1

10

I3

865834564

ENSBTAG00000035708

LOC527795

00

I1

00

I1

870410155

ENSBTAG00000005466

C8H8orf58

I3

I3

I1

00

I7

876029774

ENSBTAG00000015027

Clusterin

00

I1

10

0

877002890

ENSBTAG00000012921

KIF24

12

I1

00

2

887279862

ENSBTAG00000002220

SPTLC1

00

10

01

8111761816

ENSBTAG00000006532

CDK5RAP2

I3

I3

11

1I3

8112841695

ENSBTAG00000013838

ALLC

01

I1

00

0

98388881

ENSBTAG00000015335

BAI3

00

I1

00

I1

98388924

ENSBTAG00000015335

BAI3

00

I1

00

I1

923792838

ENSBTAG00000046593

Uncharacterized

03

00

03

951013392

ENSBTAG00000018528

USP45

02

I1

00

1

9104739655

ENSBTAG00000018810

THBS2

03

10

15

9105157029

ENSBTAG00000000918

ERMARD

I3

21

00

0

10

1712515

ENSBTAG00000014750

EPB41L4A

0I1

I1

I1

0I3

10

15843003

ENSBTAG00000009030

NOX5

I3

I3

I1

00

I7

10

28022352

ENSBTAG00000035309

OR4F15

I3

1I1

00

I3

10

28023110

ENSBTAG00000035309

OR4F15

I3

I1

I1

00

I5

10

82903121

ENSBTAG00000012565

PCNX

I3

2I1

01

I1

10

85173711

ENSBTAG00000011683

NUM

BI3

0I1

00

I4

11

46305767

ENSBTAG00000002288

NT5DC4

0I3

I1

00

I4

11

46753903

ENSBTAG00000019072

PSD4

0I1

I1

00

I2

144

Mutation

Score

Chr

Pos

Gene

Genenam

eHaplotypes

Conservation

Hum

andefect

Mousemodel

Function

TotalScore

11

47409577

ENSBTAG00000009033

TEX37

I3

I2

I1

10

I5

11

59791410

ENSBTAG00000006027

USP34

02

I1

01

211

78322010

ENSBTAG00000015490

HS1BP3

00

I1

00

I1

11

87943875

ENSBTAG00000001140

IAH1

I3

3I1

10

011

95486109

ENSBTAG00000017576

GPR144

I3

I3

I1

00

I7

11

99453894

ENSBTAG00000038831

PHYHD1

11

I1

00

111

99461001

ENSBTAG00000010601

DOLK

12

10

04

12

11906181

ENSBTAG00000001133

VWA8

I3

1I1

00

I3

12

12478433

ENSBTAG00000014752

AKAP11

1I3

I1

I1

0I4

12

12480845

ENSBTAG00000014752

AKAP11

1I1

I1

I1

0I2

12

14136897

ENSBTAG00000016691

LACC1

I3

I3

I1

00

I7

12

53049081

ENSBTAG00000032821

SCEL

I3

I3

I1

00

I7

12

90761209

ENSBTAG00000019961

ATP4B

I3

I3

I1

00

I7

13

3815664

ENSBTAG00000034991

SLX4IP

00

I1

00

I1

13

11787934

ENSBTAG00000008650

CAMK

1D

I3

0I1

00

I4

13

11787938

ENSBTAG00000008650

CAMK

1D

I3

0I1

00

I4

13

52471684

ENSBTAG00000013753

DDRGK1

02

I1

00

113

53934930

ENSBTAG00000039520

SIRPB1

0I3

I1

00

I4

13

53934947

ENSBTAG00000039520

SIRPB1

00

I1

00

I1

13

53939853

ENSBTAG00000039520

SIRPB1

0I3

I1

00

I4

13

54540425

ENSBTAG00000047150

RTEL

I3

0I1

00

I4

13

55047740

ENSBTAG00000031718

OGFR

01

I1

00

013

59988205

ENSBTAG00000027390

RTFDC1

01

I1

00

013

63045750

ENSBTAG00000009144

BPIFA2A

0I3

I1

00

I4

145

Mutation

Score

Chr

Pos

Gene

Genenam

eHaplotypes

Conservation

Hum

andefect

Mousemodel

Function

TotalScore

13

63169610

ENSBTAG00000019752

BPIFA2B

0I1

I1

00

I2

13

65396708

ENSBTAG00000006021

CEP250

0I3

I1

00

I4

14

1977494

ENSBTAG00000026350

SPATC1

I3

I3

I1

00

I7

14

3640695

ENSBTAG00000015985

GPR20

00

I1

00

I1

14

35568945

ENSBTAG00000037399

Uncharacterized

30

I1

00

2

14

46885750

ENSBTAG00000047661

FABP12

0I1

I1

00

I2

14

57184022

ENSBTAG00000026247

PKHD1L1

I3

3I1

00

I1

14

68635363

ENSBTAG00000017165

MATN2

01

I1

00

0

15

182679

ENSBTAG00000046228

Olfactoryreceptor

00

I1

0o

I1

15

183317

ENSBTAG00000046228

Olfactoryreceptor

00

I1

00

I1

15

183404

ENSBTAG00000046228

Olfactoryreceptor

00

I1

00

I1

15

248692

ENSBTAG00000045558

LOC100299084

00

I1

0o

I1

15

30189209

ENSBTAG00000005357

VPS11

02

I1

00

1

15

35048788

ENSBTAG00000005340

SERGEF

0I3

I1

00

I4

15

46301941

ENSBTAG00000002289

NLRP14

I3

I3

I1

00

I7

15

75033140

ENSBTAG00000012053

ACCSL

1I3

I1

00

I3

15

75037645

ENSBTAG00000012053

ACCSL

I3

I3

I1

00

I7

15

80244489

ENSBTAG00000046527

LOC100139830

I3

0I1

00

I4

15

80255710

ENSBTAG00000046285

OR8I2

10

I1

00

0

15

80288759

ENSBTAG00000001291

OR8H3

1I1

I1

00

I1

15

80389745

ENSBTAG00000045709

LOC514057

13

I1

00

3

15

80485593

ENSBTAG00000035985

LOC101909743OR8K1

1I1

I1

00

I1

15

80555931

ENSBTAG00000022858

LOC783561

I3

0I1

00

I4

15

80949689

ENSBTAG00000039110

OR8U1

1I3

I1

00

I3

146

Mutation

Score

Chr

Pos

Gene

Genenam

eHaplotypes

Conservation

Hum

andefect

Mousemodel

Function

TotalScore

15

82754556

ENSBTAG00000021375

OR10Q1

01

I1

00

015

82925332

ENSBTAG00000021233

OR5B17

I3

0I1

00

I4

16

4562017

ENSBTAG00000019799

FCAM

RI3

I3

I1

00

I7

16

38319636

ENSBTAG00000014794

SCYL3

I3

0I1

00

I4

16

62574355

ENSBTAG00000035226

TOR1AIP1

00

I1

00

I1

17

53095395

ENSBTAG00000020414

DHX37

1I1

I1

00

I1

17

53112655

ENSBTAG00000020414

DHX37

0I3

I1

00

I4

17

66091531

ENSBTAG00000010847

FOXN4

0I1

I1

00

I2

17

72377532

ENSBTAG00000008996

SFI1

0I3

I1

00

I4

18

25623550

ENSBTAG00000005170

GPR114

0I3

I1

01

I3

18

34956513

ENSBTAG00000001286

ELMO

33

3I1

11

718

46372041

ENSBTAG00000015912

DMKN

0

1I1

00

018

52154806

ENSBTAG00000001262

IRGQ

I3

1I1

00

I3

18

54081277

ENSBTAG00000009337

PNMA

L1

0I3

I1

00

I4

18

54130955

ENSBTAG00000048021

PNMA

L2

I3

1I1

00

I3

18

57819768

ENSBTAG00000003231

LIM2

1

I3

10

0I1

19

21444103

ENSBTAG00000011011

SSH2

I3

I3

I1

00

I7

19

31072152

ENSBTAG00000022509

DNAH

9I3

0I1

00

I4

19

31113972

ENSBTAG00000022509

DNAH

90

0I1

00

I1

19

32790843

ENSBTAG00000015294

COX10

12

10

04

19

42251785

ENSBTAG00000013685

KRT31

I3

0I1

00

I4

19

51395194

ENSBTAG00000015980

FASN

1I1

10

01

19

56191175

ENSBTAG00000007910

EXOC7

3I3

I1

00

I1

19

57266251

ENSBTAG00000011387

NAT9

12

I1

00

2

147

Mutation

Score

Chr

Pos

Gene

Genenam

eHaplotypes

Conservation

Hum

andefect

Mousemodel

Function

TotalScore

20

7639801

ENSBTAG00000005633

RGNEF

1I1

I1

10

0

20

23410431

ENSBTAG00000008871

DDX4

1I3

I1

00

I3

20

58884236

ENSBTAG00000005514

TRIO

I3

0I1

00

I4

20

64171739

ENSBTAG00000005021

SEMA5A

01

10

02

21

6239119

ENSBTAG00000016495

LINS

02

10

03

21

20345878

ENSBTAG00000012326

LOC101906218

00

I1

00

I1

21

20346289

ENSBTAG00000012326

LOC101906218

I3

0I1

00

I4

21

26837733

ENSBTAG00000047587

Uncharacterized

10

00

01

21

31003599

ENSBTAG00000047166

SH2D7

0I3

I1

00

I4

21

55274195

ENSBTAG00000039117

FAM179B

0I3

I1

00

I4

21

55826463

ENSBTAG00000014001

MAP1A

0I3

I1

00

I4

21

58173384

ENSBTAG00000009836

CHGA

I3

I1

I1

00

I5

21

59120719

ENSBTAG00000046590

FAM181A

01

I1

00

0

21

60482484

ENSBTAG00000017096

ERICH1

0I3

I1

00

I4

21

60482510

ENSBTAG00000017096

ERICH1

01

I1

00

0

21

62843142

ENSBTAG00000012833

ATG2B

0I3

I1

00

I4

22

52415959

ENSBTAG00000015839

MAP4

00

I1

00

I1

22

52421734

ENSBTAG00000015839

MAP4

1I1

I1

00

I1

22

52422046

ENSBTAG00000015839

MAP4

I3

I3

I1

00

I7

22

52426104

ENSBTAG00000047174

Uncharacterized

I3

00

00

I3

22

52426668

ENSBTAG00000047174

Uncharacterized

10

00

01

22

56951657

ENSBTAG00000021645

MBD4

1I1

I1

00

I1

22

56974847

ENSBTAG00000021643

EFCAB12

0I1

I1

00

I2

22

57797674

ENSBTAG00000031115

Uncharacterized

01

00

01

148

Mutation

Score

Chr

Pos

Gene

Genenam

eHaplotypes

Conservation

Hum

andefect

Mousemodel

Function

TotalScore

23

45885052

ENSBTAG00000005248

OFCC1

00

I1

00

I1

24

21324604

ENSBTAG00000000998

SLC39A6

02

I1

00

1

24

21453364

ENSBTAG00000011622

C18orf21

I3

I3

I1

00

I7

24

46764909

ENSBTAG00000015210

LOXHD1

0I3

10

0I2

24

50327195

ENSBTAG00000002660

CCDC11

I3

11

01

0

24

58158178

ENSBTAG00000014284

ALPK2

I3

I1

I1

00

I5

24

58158620

ENSBTAG00000014284

ALPK2

0I1

I1

00

I2

24

58158784

ENSBTAG00000014284

ALPK2

I3

I3

I1

00

I7

24

58204003

ENSBTAG00000014284

ALPK2

I3

I3

I1

00

I7

24

58204479

ENSBTAG00000014284

ALPK2

I3

2I1

00

I2

25

37433180

ENSBTAG00000040568

Uncharacterized

00

I1

00

I1

26

18116487

ENSBTAG00000046239

C10orf12

0I1

I1

00

I2

26

25715996

ENSBTAG00000010522

Uncharacterized

10

00

01

26

25764570

ENSBTAG00000046727

Olfactoryreceptor

00

I1

00

I1

27

5031078

ENSBTAG00000007340

LOC101906349

0I1

I1

00

I2

27

32832557

ENSBTAG00000011684

RAB11FIP1

10

I1

00

0

28

284228

ENSBTAG00000000079

CCSAP

0I1

I1

00

I2

28

1690403

ENSBTAG00000048153

Uncharacterized

10

00

01

28

35718807

ENSBTAG00000047317

CL43

1I1

I1

00

I1

28

44081765

ENSBTAG00000032527

ERCC6

I3

01

00

I2

29

1985881

ENSBTAG00000004081

FAT3

30

I1

00

2

29

7530523

ENSBTAG00000046360

Uncharacterized

01

00

01

29

7530663

ENSBTAG00000046360

Uncharacterized

1I1

00

00

29

26397931

ENSBTAG00000013568

UEVLD

I3

I3

I1

00

I7

149

Interestingly13ofthe22genesassociatedtohumansyndromescarryvariants

appearing in the short list of the 35 positives and only 6 in that of 135 with

negative values This in spite of the low weight (from 1 to 1) given to this

parameterEightofthe35positivevariantsareinproteinsstilluncharacterized

Sixvariantshadascoreof4orhigherThehighestscore(totalscore=7)wasfor

ELMO3 (Engulfment and cell motility gene 3) This gene is involved in cell

motility and migration ELMO3 is also a homolog of C( elegans Ced12 that is

required for many developmental processes included gonadal morphogenesis

Mutant for this gene in C( elegans suffer from abnormal development usually

with lethal consequences or sublethal developmental defects including small

embryosdelayeddevelopmentandreducedfertility(Gumiennyetal2001)In

human it is expressed in testis prostate and ovary

(httpwwwgenecardsorgcgibincarddispplgene=ELMO3)

ThegeneDNAJC13(DnaJ(Hsp40)homologsubfamilyCmember13)hadatotal

score of 5 This gene is involved in the onset of Parkinson disease DNAJC13

regulates the dynamics of clathrin coats on early endosomes Cellular analysis

shows that the mutation Arg855Ser identified by VilarintildeoGuumlell et al (2014)

confers a gainoffunction that impairs endosomal transportApossible roleof

DNAJC13geneinTourettesyndromechronicticdisorderisunderinvestigation

(Sundarametal2011)

A second genehad a scoreof 5THBS2 (Thrombospondin2) It belongs to the

thrombospondin family and is a powerful inhibitor of tumour growth and

angiogenesis It is highly expressed in testis in fetal Leydig cell Hirose et al

(2008) found a significant association between an intronic SNP in the THBS2

gene and lumbar disc herniation Thrombospondin2 is also a matricellular

proteinfoundinhumanserumDeletionofTSP2causesagedependentdilated

cardiomyopathy (Hanatani et al 2014) Mutant mice for this gene display a

variety of mutant phenotype Kyriakides et al 1998 suggest that THBS2

modulates the cell surface properties of mesenchymal cells affecting cell

functionssuchasadhesionandmigration

150

Three genes had a score of 4 all three are associated to human syndromes

Mutation in HERC2 (HECT and RLD domain containing E3 ubiquitin protein

ligase2)isinvolvedinautosomalrecessivementalretardation38(Puffenberger

etal2012)Jietal(2000)identifiedinmutantmouseneuromuscularsecretory

vesicledefectsandspermacrosomedefectsMoreovergeneticvariationsinthis

gene are associated with skinhaireye pigmentation variability Mutations in

DOLK gene (dolichol kinase) are associated with dolichol kinase deficiency

(Lefeberetal2011)COX10(cytochromecoxidaseassemblyhomolog10)gene

is associated to mitochondrial complex IV deficiency causing Leigh syndrome

(Antonickaetal2003)

Conclusions)

Exomesequencingthereforeprovidedvaluableinformationoncodingsequence

variants and on their potential effect on protein function As a group variants

classifiedasdeleteriouswererarerthanvariantsclassifiedasneutralSincethis

isanexpectedeffectofpurifyingselectionthissetofvariantwaslikelyenriched

forvariantshavingaphenotypiceffect

Theinclusionintheanalysesofplusandminusvariantanimalsforfertilitytraits

likelypermittedtosequencethemostimportantvariantsinfluencingthesetraits

butthelownumberofanimalssequencedcombinedwiththecomplexityofthe

traitpermittedtofindonlyasuggestiveassociationbetweenSNPsandtraitsItrsquos

likely that amuch larger number of individuals is to be sequenced if genome

wideassociationisthestrategyoneswanttopursue

We explored an alternative strategy to confirm suggestivedeleterious variants

based on indirect evidence of the variant effect by assessing i) evolutionary

constraintsassociatedtotheaminoacidresiduecodedbythevariantii)variant

behaviour in a large population In the absence of direct genotyping of the

candidatevariantsweusedhaplotypeinformationassurrogateiii)functionand

effectsinotherspecies

151

We attempted to quantify these criteria through a score subjective and only

indicativebutpermitted tohighlighted35genes so farneglected in cattlebut

deservingfurtherattentionandpotentiallyhavingdeleteriouseffectsinHolstein

andothercattlebreedsThesemaybepartoftherecessivedeleterioussetthat

constitutethegeneticloadofItalianHolstein

References)AntonickaHLearySCGuercinGHAgarJNHorvathRKennawayNG

HardingCOJakschMShoubridgeEA2003MutationsinCOX10

resultinadefectinmitochondrialhemeAbiosynthesisandaccountfor

multipleearlyonsetclinicalphenotypesassociatedwithisolatedCOX

deficiencyHumMolGenet122693ndash2702doi101093hmgddg284

AshwellMSHeyenDWSonstegardTSVanTassellCPDaYVanRaden

PMRonMWellerJILewinHA2004DetectionofQuantitativeTrait

LociAffectingMilkProductionHealthandReproductiveTraitsin

HolsteinCattleJournalofDairyScience87468ndash475

doi103168jdsS00220302(04)731860

AstonKICarrellDT2009GenomeWideStudyofSingleNucleotide

PolymorphismsAssociatedWithAzoospermiaandSevere

OligozoospermiaJournalofAndrology30711ndash725

doi102164jandrol109007971

AulchenkoYSRipkeSIsaacsADuijnCMvan2007GenABELanRlibrary

forgenomewideassociationanalysisBioinformatics231294ndash1296

doi101093bioinformaticsbtm108

BamshadMJNgSBBighamAWTaborHKEmondMJNickersonDA

ShendureJ2011ExomesequencingasatoolforMendeliandisease

genediscoveryNatRevGenet12745ndash755doi101038nrg3031

BieseckerLGShiannaKVMullikinJC2011Exomesequencingtheexpert

viewGenomeBiology12128doi101186gb2011129128

152

BiffaniSMarusiMBiscariniFCanavesiF2005Developingagenetic

evaluationforfertilityusingangularityandmilkyieldascorrelatedtraits

InterbullBulletin063

BiffaniSSamoreacuteABCanavesiF2002Inbreedingdepressionforproduction

reproductionandfunctionaltraitsinItalianHolsteincattleInstitut

NationaldelaRechercheAgronomique(INRA)pp0ndash4

BolgerAMLohseMUsadelB2014TrimmomaticAflexibletrimmerfor

IlluminaSequenceDataBioinformaticsbtu170

doi101093bioinformaticsbtu170

BrowningSRBrowningBL2007RapidandAccurateHaplotypePhasingand

MissingDataInferenceforWholeGenomeAssociationStudiesByUseof

LocalizedHaplotypeClusteringAmJHumGenet811084ndash1097

CharlesworthDWillisJH2009ThegeneticsofinbreedingdepressionNat

RevGenet10783ndash796doi101038nrg2664

ChunSFayJC2009Identificationofdeleteriousmutationswithinthree

humangenomesGenomeRes191553ndash1561

doi101101gr092619109

ConsortiumTEP2012AnintegratedencyclopediaofDNAelementsinthe

humangenomeNature48957ndash74doi101038nature11247

DePristoMABanksEPoplinRGarimellaKVMaguireJRHartlC

PhilippakisAAdelAngelGRivasMAHannaMMcKennaA

FennellTJKernytskyAMSivachenkoAYCibulskisKGabrielSB

AltshulerDDalyMJ2011Aframeworkforvariationdiscoveryand

genotypingusingnextgenerationDNAsequencingdataNatGenet43

491ndash498doi101038ng806

GoymerP2007SynonymousmutationsbreaktheirsilenceNatRevGenet8

92ndash92doi101038nrg2056

GumiennyTLBrugneraEToselloTrampontACKinchenJMHaneyLB

NishiwakiKWalkSFNemergutMEMacaraIGFrancisRSchedl

TQinYVanAelstLHengartnerMORavichandranKS2001CED

12ELMOaNovelMemberoftheCrkIIDock180RacPathwayIs

RequiredforPhagocytosisandCellMigrationCell10727ndash41

doi101016S00928674(01)005207

153

HanataniSIzumiyaYTakashioSKimuraYArakiSRokutandaTTsujita

KYamamotoETanakaTYamamuroMKojimaSTayamaS

KaikitaKHokimotoSOgawaH2014Circulatingthrombospondin2

reflectsdiseaseseverityandpredictsoutcomeofheartfailurewith

reducedejectionfractionCircJ78903ndash910

HiroseYChibaKKarasugiTNakajimaMKawaguchiYMikamiY

FuruichiTMioFMiyakeAMiyamotoTOzakiKTakahashiA

MizutaHKuboTKimuraTTanakaTToyamaYIkegawaS2008

AfunctionalpolymorphisminTHBS2thataffectsalternativesplicingand

MMPbindingisassociatedwithlumbardischerniationAmJHum

Genet821122ndash1129doi101016jajhg200803013

httpsgenomeucsceduindexhtml[WWWDocument]nd

JamrozikJFatehiJKistemakerGJSchaefferLR2005EstimatesofGenetic

ParametersforCanadianHolsteinFemaleReproductionTraitsJournalof

DairyScience882199ndash2208doi103168jdsS00220302(05)728952

JanisEWiggintonDJC2005AnoteonexacttestsofHardyWeinberg

equilibriumAmericanjournalofhumangenetics76887ndash93

doi101086429864

JiYRebertNAJoslinJMHigginsMJSchultzRANichollsRD2000

StructureoftheHighlyConservedHERC2GeneandofMultiplePartially

DuplicatedParalogsinHumanGenomeRes10319ndash329

KentWJSugnetCWFureyTSRoskinKMPringleTHZahlerAM

HausslerandD2002TheHumanGenomeBrowseratUCSCGenome

Res12996ndash1006doi101101gr229102

KumarPHenikoffSNgPC2009Predictingtheeffectsofcodingnon

synonymousvariantsonproteinfunctionusingtheSIFTalgorithmNat

Protoc41073ndash1081doi101038nprot200986

KyriakidesTRZhuYHSmithLTBainSDYangZLinMTDanielson

KGIozzoRVLaMarcaMMcKinneyCEGinnsEIBornsteinP

1998Micethatlackthrombospondin2displayconnectivetissue

abnormalitiesthatareassociatedwithdisorderedcollagenfibrillogenesis

anincreasedvasculardensityandableedingdiathesisJCellBiol140

419ndash430

154

LaneEACroweMABeltmanMEMoreSJ2013Theinfluenceofcowand

managementfactorsonreproductiveperformanceofIrishseasonal

calvingdairycowsAnimReprodSci14134ndash41

doi101016janireprosci201306019

LefeberDJdeBrouwerAPMMoravaERiemersmaMSchuurs

HoeijmakersJHMAbsmannerBVerrijpKvandenAkkerWMR

HuijbenKSteenbergenGvanReeuwijkJJozwiakAZuckerN

LorberALammensMKnopfCvanBokhovenHGruumlnewaldSLehle

LKapustaLMandelHWeversRA2011AutosomalRecessive

DilatedCardiomyopathyduetoDOLKMutationsResultsfromAbnormal

DystroglycanOMannosylationPLoSGenet7e1002427

doi101371journalpgen1002427

LiHDurbinR2009FastandaccurateshortreadalignmentwithBurrows

WheelertransformBioinformatics251754ndash1760

doi101093bioinformaticsbtp324

LiHHandsakerBWysokerAFennellTRuanJHomerNMarthG

AbecasisGDurbinR1000GenomeProjectDataProcessingSubgroup

2009TheSequenceAlignmentMapformatandSAMtoolsBioinformatics

252078ndash2079doi101093bioinformaticsbtp352

LiMXKwanJSHBaoSYYangWHoSLSongYQShamPC2013

PredictingMendelianDiseaseCausingNonSynonymousSingle

NucleotideVariantsinExomeSequencingStudiesPLoSGenet9

e1003143doi101371journalpgen1003143

MacArthurDGBalasubramanianSFrankishAHuangNMorrisJWalter

KJostinsLHabeggerLPickrellJKMontgomerySBAlbersCA

ZhangZConradDFLunterGZhengHAyubQDePristoMA

BanksEHuMHandsakerRERosenfeldJFromerMJinMMuXJ

KhuranaEYeKKayMSaundersGISunerMMHuntTBarnes

IHAAmidCCarvalhoSilvaDRBignellAHSnowCYngvadottirB

BumpsteadSCooperDNXueYRomeroIGWangJLiYGibbs

RAMcCarrollSADermitzakisETPritchardJKBarrettJCHarrow

JHurlesMEGersteinMBTylerSmithC2012Asystematicsurvey

155

oflossoffunctionvariantsinhumanproteincodinggenesScience335

823ndash828doi101126science1215040

MajewskiJSchwartzentruberJLalondeEMontpetitAJabadoN2011

WhatcanexomesequencingdoforyouJMedGenet48580ndash589

doi101136jmedgenet2011100223

McClureMCBickhartDNullDVanRadenPXuLWiggansGLiuG

SchroederSGlasscockJArmstrongJColeJBVanTassellCP

SonstegardTS2014BovineExomeSequenceAnalysisandTargeted

SNPGenotypingofRecessiveFertilityDefectsBH1HH2andHH3Reveal

aPutativeCausativeMutationinSMC2forHH3PLoSONE9e92769

doi101371journalpone0092769

McClureMKimEBickhartDNullDCooperTColeJWiggansGAjmone

MarsanPColliLSantusELiuGESchroederSMatukumalliLVan

TassellCSonstegardT2013FineMappingforWeaverSyndromein

BrownSwissCattleandtheIdentificationof41ConcordantMutations

acrossNRCAMPNPLA8andCTTNBP2PLoSONE8e59251

doi101371journalpone0059251

McKennaAHannaMBanksESivachenkoACibulskisKKernytskyA

GarimellaKAltshulerDGabrielSDalyMDePristoMA2010The

GenomeAnalysisToolkitAMapReduceframeworkforanalyzingnext

generationDNAsequencingdataGenomeRes201297ndash1303

doi101101gr107524110

MiHDongQMuruganujanAGaudetPLewisSThomasPD2010

PANTHERversion7improvedphylogenetictreesorthologsand

collaborationwiththeGeneOntologyConsortiumNuclAcidsRes38

D204ndashD210doi101093nargkp1019

MrodeRKearneyJFBiffaniSCoffeyMCanavesiF2009Short

communicationGeneticrelationshipsbetweentheHolsteincow

populationsofthreeEuropeandairycountriesJournalofDairyScience

925760ndash5764doi103168jds20081931

PelosoGMAuerPLBisJCVoormanAMorrisonACStitzielNOBrody

JAKhetarpalSACrosbyJRFornageMIsaacsAJakobsdottirJ

FeitosaMFDaviesGHuffmanJEManichaikulADavisBLohman

156

KJoonAYSmithAVGroveMLZanoniPRedonVDemissieS

LawsonKPetersUCarlsonCJacksonRDRyckmanKKMackey

RHRobinsonJGSiscovickDSSchreinerPJMychaleckyjJC

PankowJSHofmanAUitterlindenAGHarrisTBTaylorKD

StaffordJMReynoldsLMMarioniREDehghanAFrancoOHPatel

APLuYHindyGGottesmanOBottingerEPMelanderOOrho

MelanderMLoosRJFDugaSMerliniPAFarrallMGoelA

AsseltaRGirelliDMartinelliNShahSHKrausWELiMRader

DJReillyMPMcPhersonRWatkinsHArdissinoDZhangQWang

JTsaiMYTaylorHACorreaAGriswoldMELangeLAStarrJM

RudanIEiriksdottirGLaunerLJOrdovasJMLevyDChenYDI

ReinerAPHaywardCPolasekODearyIJBoreckiIBLiuY

GudnasonVWilsonJGvanDuijnCMKooperbergCRichSSPsaty

BMRotterJIOrsquoDonnellCJRiceKBoerwinkleEKathiresanS

CupplesLA2014AssociationofLowFrequencyandRareCoding

SequenceVariantswithBloodLipidsandCoronaryHeartDiseasein

56000WhitesandBlacksTheAmericanJournalofHumanGenetics94

223ndash232doi101016jajhg201401009

PryceJEVeerkampRFThompsonRHillWGSimmG1997Genetic

aspectsofcommonhealthdisordersandmeasuresoffertilityinHolstein

FriesiandairycattleAnimalScience65353ndash360

doi101017S1357729800008559

PryceJEWoolastonRBerryDPWallEWintersMButlerRShafferM

2014WorldTrendsinDairyCowFertilityin10ThWorldCongressof

GeneticsAppliedtoLivestockProduction

PuffenbergerEGJinksRNWangHXinBFiorentiniCShermanEA

DegrazioDShawCSougnezCCibulskisKGabrielSKelleyRI

MortonDHStraussKA2012Ahomozygousmissensemutationin

HERC2associatedwithglobaldevelopmentaldelayandautismspectrum

disorderHumMutat331639ndash1646doi101002humu22237

PurcellSNealeBToddBrownKThomasLFerreiraMARBenderD

MallerJSklarPdeBakkerPIWDalyMJShamPC2007PLINKA

157

ToolSetforWholeGenomeAssociationandPopulationBasedLinkage

AnalysesAmJHumGenet81559ndash575

SahanaGNielsenUSAamandGPLundMSGuldbrandtsenB2013Novel

HarmfulRecessiveHaplotypesIdentifiedforFertilityTraitsinNordic

HolsteinCattlePLoSONE8e82909doi101371journalpone0082909

SchuumltzEScharfensteinMBrenigB2008Implicationofcomplexvertebral

malformationandbovineleukocyteadhesiondeficiencyDNAbased

testingondiseasefrequencyintheHolsteinpopulationJDairySci91

4854ndash4859doi103168jds20081154

SimmonsMJCrowJF1977MutationsAffectingFitnessinDrosophila

PopulationsAnnualReviewofGenetics1149ndash78

doi101146annurevge11120177000405

SoggiuAPirasCHusseinHACanioMDGaviraghiAGalliAUrbaniA

BonizziLRoncadaP2013UnravellingthebullfertilityproteomeMol

BioSyst91188ndash1195doi101039C3MB25494A

StitzielNOKiezunASunyaevS2011Computationalandstatistical

approachestoanalyzingvariantsidentifiedbyexomesequencing

GenomeBiology12227doi101186gb2011129227

SundaramSKHuqAMSunZYuWBennettLWilsonBJBehenME

ChuganiHT2011ExomesequencingofapedigreewithTourette

syndromeorchronicticdisorderAnnNeurol69901ndash904

doi101002ana22398

VanRadenPMOlsonKMNullDJHutchisonJL2011Harmfulrecessive

effectsonfertilitydetectedbyabsenceofhomozygoushaplotypesJournal

ofDairyScience946153ndash6161doi103168jds20114624

VilarintildeoGuumlellCRajputAMilnerwoodAJShahBSzuTuCTrinhJYuI

EncarnacionMMunsieLNTapiaLGustavssonEKChouP

TatarnikovIEvansDMPishottaFTVoltaMBeccanoKellyD

ThompsonCLinMKShermanHEHanHJGuentherBL

WassermanWWBernardVRossCJAppelCresswellSStoesslAJ

RobinsonCADicksonDWRossOAWszolekZKAaslyJOWuR

MHentatiFGibsonRAMcPhersonPSGirardMRajputMRajput

158

AHFarrerMJ2014DNAJC13mutationsinParkinsondiseaseHumMolGenet231794ndash1801doi101093hmgddt570

WangKLiMHakonarsonH2010ANNOVARfunctionalannotationofgeneticvariantsfromhighthroughputsequencingdataNuclAcidsRes38e164ndashe164doi101093nargkq603

WathesDCPollottGEJohnsonKFRichardsonHCookeJS2014HeiferfertilityandcarryoverconsequencesforlifetimeproductionindairyandbeefcattleAnimal8Suppl191ndash104doi101017S1751731114000755

YangYMuznyDMReidJGBainbridgeMNWillisAWardPABraxtonABeutenJXiaFNiuZHardisonMPersonRBekheirniaMRLeducMSKirbyAPhamPScullJWangMDingYPlonSELupskiJRBeaudetALGibbsRAEngCM2013ClinicalWhole

ExomeSequencingfortheDiagnosisofMendelianDisordersNewEnglandJournalofMedicine3691502ndash1511doi101056NEJMoa1306555

ZiminAVDelcherALFloreaLKelleyDRSchatzMCPuiuDHanrahanFPerteaGTassellCPVSonstegardTSMarccedilaisGRobertsMSubramanianPYorkeJASalzbergSL2009Awholegenome

assemblyofthedomesticcowBostaurusGenomeBiology10R42doi101186gb2009104r42

159

Supplementary)files)

YoucanfindChapter6supplementaryfilesat

- DocTa(DoctoralThesisArchive)(httptesionlineunicattit)

- GoogleDrive(httptinyurlcomPhDMMChapter06)orscantheQRcode

)

Supplementary)Figure)S1)Coverage)analyses)

A)Averagecoverageperanimals

B)Boxplotofchromosomescoverageperanimal

C)BoxplotofanimalcoverageperchromosomeBluelinerepresenttheexpected

50Xcoverage

Supplementary)Figure)S2)Distribution)of)variant)sequencing)depth)

)

Supplementary) Figure) S3) Correlation) between) sequencing) depth) and)

number) of) discordant) genotypes) In) red) values) for) two) outlier) animals)

having)concordance)lt846))

)

Supplementary)Figure)S4)Multidimensional)scaling)of)Italian)Holstein)bulls)

sequenced))

)

Supplementary)Figure)S5)Multidimensional)scaling)of)Italian)Holstein)bulls)

analysed)with)the)800K)SNPchip))

160

Supplementary) Figure) S6) MAF) distribution) on) neutral) (blue)) and)

deleterious)(red))variants))

)

Supplementary) Table) S1) EBV) distribution) on) Italian) Holstein) population)

and)SNPchip)dataset)

Sheet1EBVthresholdvaluetodefineplusandminustailforPROT(kgprotein)

SCC(somaticcellcount)FERT(fertility)andLONG(functional longevity) All

ItalianHolsteinFAbullswereincludeinthisanalysis

Sheet2Numberofdatasetanimalsineachtailforeachtrait

Supplementary)Table)S2)Exome)probes)statistics)number)and)bp)covered)

per)chromosomes)

Supplementary) Table) S3) Concordance) between) animal) genotypes) in)

sequencing)and)SNPchip)data)

Error1 HD data homozygotes ndash Exome data heterozygotes Error2 HD data

heterozygotesndashExomedatahomozygotes

Supplementary) Table) S4) List) of) suggestive) genes) deriving) from) the)

analysis)of)exomes)of)animals) in)the)tails)of)male)and)female)fertility)EBV)

distributions))

Header(code

bull Bfreq_totFrequencyofBalleleinentiredataset

bull Bfreq_MHFrequencyofBalleleinhighmalefertilitydataset

bull Bfreq_MLFrequencyofBalleleinhighfemalefertilitydataset

bull Bfreq_FHFrequencyofBalleleinlowmalefertilitydataset

bull Bfreq_FLFrequencyofBalleleinlowmalefertilitydataset

bull CallRate_totSNPcallratendashentiredataset

bull CallRate_MHSNPcallratendashhighmalefertilitydataset

bull CallRate_MLSNPcallratendashlowmalefertilitydataset

bull CallRate_FHSNPcallratendashhighfemalefertilitydataset

bull CallRate_FLSNPcallratendashhighmalefertilitydataset

bull HWeqHardyWeinbergequilibriumpvalue

bull Geno_HomoMinorNumberofanimalswithhomozygotegenotypeforminorallele

bull Geno_HetNumberofanimalswithheterozygotegenotype

161

bull Geno_HomoMajorNumberofanimalswithhomozygotegenotypeformajorallele

bull LOW_MinorAlleleminoralleleforlowfertility(maleandfemale)dataset

bull LOW_MAFminorallelefrequencyfromlowfertility(maleandfemale)dataset

bull LOW_CRSNPcallratefromlowfertility(maleandfemale)dataset

bull LOW_HomoMinorNumberofanimalsfromlowfertility(maleandfemale)datasetwith

homozygotegenotypeforminorallele

bull LOW_Het Number of animals from low fertility (male and female) dataset with

heterozygotegenotype

bull LOW_HomoMajorNumberofanimalsfromlowfertility(maleandfemale)datasetwith

homozygotegenotypeformajorallele

bull LOW_HWeqPvalue HardyWeinberg equilibrium pvalue from low fertility (male and

female)dataset

bull HIGH_MinorAlleleminorallelefromhighfertility(maleandfemale)dataset

bull HIGH_MAFminorallelefrequencyfromhighfertility(maleandfemale)dataset

bull HIGH_CRSNPcallratefromhighfertility(maleandfemale)dataset

bull HIGH_HomoMinorNumberofanimalsfromhighfertility(maleandfemale)datasetwith

homozygotegenotypeforminorallele

bull HIGH_Het Number of animals from high fertility (male and female) dataset with

heterozygotegenotype

bull HIGH_HomoMajorNumberofanimalsfromhighfertility(maleandfemale)datasetwith

homozygotegenotypeformajorallele

bull HIGH_HWeqPvalueHardyWeinberg equilibriumpvalue fromhigh fertility (male and

female)dataset

bull Male_TREND inheritance models implemented in PLINK trend test on male fertility

dataset

bull Male_GENOTYPES inheritancemodels implemented inPLINKgenotypic testonmale

fertilitydataset

bull Male_ RECESSIVE inheritance models implemented in PLINK recessive test on male

fertilitydataset

bull Female_TRENDinheritancemodelsimplementedinPLINKtrendtestonfemalefertility

dataset

bull Female_GENOTYPES inheritance models implemented in PLINK genotypic test on

femalefertilitydataset

bull Female_RECESSIVEinheritancemodelsimplementedinPLINKrecessivetestonfemale

fertilitydataset

bull MaleampFemale_TREND inheritance models implemented in PLINK trend test on male

andfemalefertilitydataset

bull MaleampFemale_GENOTYPES inheritancemodels implemented in PLINK genotypic test

onmaleandfemalefertilitydataset

162

bull MaleampFemale_RECESSIVEinheritancemodelsimplementedinPLINKrecessiveteston

maleandfemalefertilitydataset

bull EMP1_MaleEmpiricalpvalueonmalefertilitydataset

bull EMP1_FemaleEmpiricalpvalueonfemalefertilitydataset

bull EMP1_MaleampFemaleEmpiricalpvalueonmaleandfemalefertilitydataset

Supplementary)Table) S5) Regions) candidate) to) carry) deleterious) variants)

identified)by)OHW)approach))

Supplementary)Table) S6) Regions) candidate) to) carry) deleterious) variants)

identify)by)LOH)approach))

Ingreyregionsremovedinfurtheranalysis

Supplementary)Table)S7)List)of)candidate)deleterious)variants)mapping)in)

OHW)and)LOH)regions))Header(code

bull Bfreq_totFrequencyofBalleleinentiredataset

bull Bfreq_MHFrequencyofBalleleinhighmalefertilitydataset

bull Bfreq_MLFrequencyofBalleleinhighfemalefertilitydataset

bull Bfreq_FHFrequencyofBalleleinlowmalefertilitydataset

bull Bfreq_FLFrequencyofBalleleinlowmalefertilitydataset

bull CallRate_totSNPcallratendashentiredataset

bull CallRate_MHSNPcallratendashhighmalefertilitydataset

bull CallRate_MLSNPcallratendashlowmalefertilitydataset

bull CallRate_FHSNPcallratendashhighfemalefertilitydataset

bull CallRate_FLSNPcallratendashhighmalefertilitydataset

bull HWeqHardyWeinbergequilibriumpvalue

bull Geno_HomoMinorNumberofanimalswithhomozygotegenotypeforminorallele

bull Geno_HetNumberofanimalswithheterozygotegenotype

bull Geno_HomoMajorNumberofanimalswithhomozygotegenotypeformajorallele

bull HW_insideDeleteriousmutationsinsideldquoOutofHardyWeinbergrdquoregion

bull HW_outsideDeleteriousmutationsoutsideldquoOutofHardyWeinbergrdquoregion

bull DH_insideDeleteriousmutationsinsideldquoLackofHomozygosityrdquoregion

bull DH_outsideDeleteriousmutationsoutsideldquoLackofHomozygosityrdquoregion

163

bull GeneClass Classifications of gene 1 deleterious mutations inside OHW and LOHregion2deleteriousmutationsinsideOHWorLOHregion

bull 2deleteriousmutationsoutsideOHWandorLOHregionbull LOW_MinorAlleleminoralleleforlowfertility(maleandfemale)datasetbull LOW_MAFminorallelefrequencyfromlowfertility(maleandfemale)datasetbull LOW_CRSNPcallratefromlowfertility(maleandfemale)datasetbull LOW_HomoMinorNumberofanimalsfromlowfertility(maleandfemale)datasetwith

homozygotegenotypeforminorallelebull LOW_Het Number of animals from low fertility (male and female) dataset with

heterozygotegenotypebull LOW_HomoMajorNumberofanimalsfromlowfertility(maleandfemale)datasetwith

homozygotegenotypeformajorallelebull LOW_HWeqPvalue HardyWeinberg equilibrium pvalue from low fertility (male and

female)datasetbull HIGH_MinorAlleleminorallelefromhighfertility(maleandfemale)datasetbull HIGH_MAFminorallelefrequencyfromhighfertility(maleandfemale)datasetbull HIGH_CRSNPcallratefromhighfertility(maleandfemale)datasetbull HIGH_HomoMinorNumberofanimalsfromhighfertility(maleandfemale)datasetwith

homozygotegenotypeforminorallelebull HIGH_Het Number of animals from high fertility (male and female) dataset with

heterozygotegenotypebull HIGH_HomoMajorNumberofanimalsfromhighfertility(maleandfemale)datasetwith

homozygotegenotypeformajorallelebull HIGH_HWeqPvalueHardyWeinberg equilibriumpvalue fromhigh fertility (male and

female)datasetbull Male_TREND inheritance models implemented in PLINK trend test on male fertility

datasetbull Male_GENOTYPES inheritancemodels implemented inPLINKgenotypic testonmale

fertilitydatasetbull Male_ RECESSIVE inheritance models implemented in PLINK recessive test on male

fertilitydatasetbull Female_TRENDinheritancemodelsimplementedinPLINKtrendtestonfemalefertility

datasetbull Female_GENOTYPES inheritance models implemented in PLINK genotypic test on

femalefertilitydatasetbull Female_RECESSIVEinheritancemodelsimplementedinPLINKrecessivetestonfemale

fertilitydatasetbull MaleampFemale_TREND inheritance models implemented in PLINK trend test on male

andfemalefertilitydataset

164

bull MaleampFemale_GENOTYPES inheritancemodels implemented in PLINK genotypic test

onmaleandfemalefertilitydataset

bull MaleampFemale_RECESSIVEinheritancemodelsimplementedinPLINKrecessiveteston

maleandfemalefertilitydataset

bull EMP1_MaleEmpiricalpvalueonmalefertilitydataset

bull EMP1_FemaleEmpiricalpvalueonfemalefertilitydataset

bull EMP1_MaleampFemaleEmpiricalpvalueonmaleandfemalefertilitydataset

Supplementary)Table)S8)Results)of)haplotype)analysis)))Header(code

bull Chromchromosome

bull PosphysicalpositiononUMD31genome

bull Exome_homonumberofhomozygotefordeleteriousmutationsequencedanimals

bull Exome_hetnumberofheterozygotesequencedanimalsfordeleteriousmutation

bull Outcome NotFound It was impossible to associated a haplotype to the mutation

NotOpt Non completely invariant haplotype could be associated to themutationOK

Putativehaplotype associate tomutationwas found In few case alternativehaplotype

wasreported

bull LenlengthofhaplotypeinnumberofSNPs

bull Pop_numnumbersofselectedhaplotypedetectedinHolsteinpopulation

bull Pop_homo numbers of homozygote animals in Holstein population for selected

haplotype

bull Pop_het numbers of heterozygotes animals in Holstein population for selected

haplotype

bull ExpExpectednumberofhomozygoteanimal

bull Alternative if only one heterozygote sequenced animals for deleteriousmutationwas

foundalltwopossiblehaplotypeswheretested

Supplementary)Table)S9)Descriptive)results)for)the)Biological)Process)(BP)))

NumberofgenesbelongingtoBiologicalProcessGOterms)

Supplementary) Table) S10) Descriptive) results) for) the)Molecular) Function)

(MF))

NumberofgenesbelongingtoMolecularFunctionGOterms

165

CHAPTER7

Conclusion

Since the domestication event occurred in the Fertile Crescent around 10000yearsagothebovinespecieswasselectedbyhumankindtofulfilhisneedsThecreationofbreedsoccurredaround200yearsagointensifiedthedifferentiationbetween populations and promoted the development of intensive selectionschemesTraditionalAandnowadaysgenomicAselectionsareproducingpositivegenetictrendsinmanyproductivetraitsdespitethestillpoorknowledgeaboutphenotype biology A deeper understanding of genome organization and genebiology would increase the accuracy of genomic evaluation by incorporatingpriorknowledgeThisthesisexploresthebovinegenomebyhighAthroughputtechnologiesAsuchasNextGenerationSequencingandHighADensityGenotypingAwithestablishedand innovative procedures to investigate the biology of complex traits and toprovidenewdataandmoleculartoolstobovinebreedingThese goals have been achieved by complementary approaches and differentmethodsInChapter2selectionsignatureswereinvestigatedindependentlyintofive Italian breeds using Illumina BovineSNP50 BeadChip Then amultiAbreedapproach was applied clustering breeds into dairy (Italian Holstein ItalianBrownand Italian Simmental) andbeef (Marchigiana Piedmontese and ItalianSimmental) groups Regions under recent positive selection shared by breedswithasameproductiveaptitudewereinvestigatedinfurtherdetailbyretrievinggenes in the region and analysing their function by ontology and pathwayanalysis In the dairy group selection signals appear nearby genes related tomammarygland(metabolismandresistancetomastitis)andfeedingadaptationIn thebeefgroupgenes inselectedregionsare involved inanimalgrowthandmeatquality(textureandjuiciness)Hencecandidategenesidentifiedbelongtonetworks and pathways important in directing a breed towards either beef ordairyproductionIn Chapter 3 a GWAS approach was used in dairy cattle to correlate SNP tocomplex phenotypes having an economic value Three independent ldquoclassicalrdquosingleAmarkerregressionswereruninthreeItaliandairycattle(ItalianHolsteinItalianBrownandItalianSimmental)UsingthetraditionalsingleSNPapproach

168

only few SNPs were above the nominal genomeAwide significance thresholdwhile applying a geneAbased method identified significant candidate genesassociatedwithmilkphysiology andmammary glanddevelopment in all threepopulationsInterestinglygenesfoundindifferentbreedshavesimilarfunctionbut are not the same suggesting that breed history demography selectioncriteriaandstochasticeventsasgeneticdrifthaveanimpactonthedefinitionofthemainmoleculartargetofselectionschemesCurrent intensive selection strategies rapidly spread the genetic gain in thepopulationbutholdtheassociatedriskofdisseminatingdefectivegenescausinggenetic defects (eg mulefoot complex vertebral malformation and others) ordecreasing fertility In Chapter 6 different technologies and approaches werecombined to obtain filter and prioritize genes candidate to be deleteriousExome sequence datawere exploited to identify deleterious variants affectingItalianHolsteincattleinasetof20animalsextremeforfertilitytraitsMorethan5000 candidate deleterious SNP variants were identified To bypass the smallnumberlimitationandthelackofpowerofanystatisticanalysison20animalsastepwise approachwas used to first filter and then prioritize these candidatedeleterious variants Polymorphisms were retained if having asymmetricdistribution in bulls from opposite tails of fertility EBV distributions andormapping in genomic regions outlier for lack of homozygosity in a 1009 bullpopulation genotyped with the Illumina BovineHD BeadChip A total of 191variants in163genespassed theprocessesThesewereassessedby searchingadditional evidences in favour or against their deleterious effect throughdifferentapproachesi)checkingthevariantconservationacross100vertebratespecies by comparative genomics ii) finding haplotypes associated withcandidatemutationsandevaluatingtheirfrequencyinHolsteinsiii)integratinghumanandmouse functionalannotationsA scorewasproposed toweight theresult of these assessments and rank the variants 35 of them had positivescores These occurred in genes involved in basic biological mechanisms asfertilitydevelopmentand immunityandresultedenriched ingenesassociatedtogeneticdefectsreportedinhumanThesemutationsarestrongcandidatestobe recessive deleterious variants worth to be further investigated in a larger

169

populationtobetterunderstandtheirbiologyimpactandpossibleapplicationinbreedingTo increase the reliability of results based on genomic data and to reduce theimpact of subjective choices in data analysis new toolsapproaches weredeveloped(iethemultiAbreedapproachinSelectionSignaturendashChapter2thepostAGWAS geneAbased approach ndash Chapters 3 and 4) and the existent onescriticallyevaluated(imputationmethodsndashChapter5)In Chapter 4 the postAGWAS geneAbased method used in Chapter 3 wasimplemented in the MUGBAS (MUlti species GeneABased Association Suite)software In contrast to existing software MUGBAS is species and annotationfree AmultipleAtesting correctionwas included in the package in addition tocomputing parallelization for faster processing and plot representation forbetterunderstandingThesoftwareusessingleAmarkerGWASdataandagivenannotation to estimate geneAregionAwise association pAvalues As previouslymentioned in Chapter 3 the ldquoclassicalrdquo GWAS approach (single SNP) foundsignals only in ItalianHolsteinwhileMUGBAS geneAbased approach identifiedsignificantsignalsinallbreedsinvestigatedIn Chapter 5 the impact of different bovine reference sequences on genotypeimputation was investigated In this work Illumina BovineSNP50 BeadChipgenotypesof ItalianSimmentalbullswerereAmappedonthethreemostrecentreferencegenomeassembliesWecomparedfourdifferent imputationmethodsfromlow(IlluminaBovineLDv10Beadchip)tohighdensityThetestpopulationwasdividedintofoursubpopulationsonthebasisofrelationshipandgenotypedataavailabilityUpdatingSNPcoordinatesonthe three testedcattlereferencegenome assemblies determined only a slight variation on imputation resultswithinmethodsOntheotherhandlargedifferencesintermsofaccuracywereobservedamongthefourimputationmethodstestedGenetic structure of industrial breeds was evaluated assessing LinkageDisequilibrium (LD) Since LD decay is correlated to intensity of selection weexpected different behaviours in the breeds analysed Indeed as evident inChapter 2 ndash Figure 7 breeds subjected to lower selection intensity (ie

170

MarchigianaPiedmonteseandItalianSimmental)displayalowerpersistenceof

LDatgreaterdistancescomparedtothosesubjectedtohigherselectionpressure

(ieItalianHolsteinandItalianBrown)Chapter3investigatesLDintheDGAT1

region of dairy cattle breeds In Italian Simmental LD is lower compared to

Italian Holstein and Italian Brown In spite of this significant association

betweentheDGAT1regionandmilktraitswasfoundbothinItalianSimmental

andinItalianHolsteinInItalianBrownnoassociationwasfoundbecauseDGAT1

isfixed

Data and results presented in this thesis represent a clear example of the

progress of molecular technologies occurred in last few years spanning from

mediumdensitygenotypingtonextgenerationsequencing

Thebiologylandscapesofsomecomplextraits(iemilkproductionandquality

fertility)were improved finding candidate genes andpathwaysnotpreviously

associated to the phenotype in the bovine species Moreover insights on

deleterious mutations were inferred through a deep and integrated analysis

usingNGSdata

This new knowledge could be used to improve cattle breeding For example

deleteriousrecessivevariantscouldbeincludedinbullevaluationsbybreedersrsquo

associations to reduce the genetic load of deleterious genes in parallel to

improvingproductivity

While technological progress has been impressivewe are still only scratching

thesurfaceofanimalbiologyNewfrontiersarebeingopenedbywholegenome

sequencing of many animals (eg in the 1000 bull genome project) and by

stepping beyond sequencing (a number of epigenomic projects in animals are

following the footprints of the human ENCODE project) The trend is towards

unravelling complexity and data production ability is to be flanked by data

storingandprocessingandaboveallbynewmethodsandtoolsfordataanalysis

171

ACKNOWLEDGMENTS

ldquoNon egrave la destinazione ma il viaggio che contardquo Questi tre anni sono stati un gran bel viaggio di crescita personale e ricchi diesperienze Non egrave stato un viaggio solitarioma condiviso con vecchie e nuoveconoscenzechemisembradoverosomenzionareInnanzitutto vorrei ringraziare il prof Paolo AjmoneMarsan chemi ha dato lapossibilitagrave di fare questa esperienza Grazie per gli insegnamenti e per leopportunitagravechemihaoffertoSperodinonavertraditolesueaspettativeAgli attuali colleghi Lorenzo Stefano Licia Elisa Riccardo ed Elia e a quellipassati Nicola Fatima Raffaele Ezequiel e Barbara va il mio piugrave sincero edaffettuoso ringraziamento per aver condiviso anche solo in parte questomiopercorso Grazie per avermi sopportato (e non egrave poco) guidato (non solo nelcampolavorativo)efattovivereinunmodospecialelrsquoambientedilavoroUnsincerograzievaatuttiimieicolleghidelXXVIIcicloAgrisystemRicorderogravesempretuttiimomentichehopotutocondividereconvoi UngrazieancheachihafattopartedellamiaesperienzasvedeseGraziedicuoreaCarlFlavioChristianEricaatuttiiragazziitalianiatuttiicolleghildquosvedesirdquoea tutte le altre persone che ho conosciuto Mi avete fatto vivere appienolrsquoesperienzaesteraefattotornareacasaconqualcheconsapevolezzainpiugrave UnsentitoringraziamentoatuttiglialtricolleghiconcuihocollaboratoinquestianniRingrazio tutti gli amici e conoscenti che mi sono stati vicino per avermisupportatoesopportatoognunoamodoproprio Ultimomanonperimportanza ilringraziamentochevaatuttalamiafamigliaper avermi aiutato spronato sostenuto sempre e in particolare in questaesperienzaSemplicementegrazie PS Adirlatuttaancheladestinazionenonegravepoicosigravemale

173

  • INDEX
  • CHAPTER 1 - Introduction
  • CHAPTER 2 - Relative extended haplotype homozygosity (rEHH) signals across breeds reveal dairy and beef specific signatures of selection
  • CHAPTER 3 - Searching new signals for productive traits in three Italian cattle breeds
  • CHAPTER 4 - MUGBAS a species free gene based ased association suite
  • CHAPTER 5 - Imputation accuracy is robust to cattle reference genome updates
  • CHAPTER 6 - Quest for deleterious recessive variants in Italian Holstein bulls combining exome sequencing and high density SNP analysis
  • CHAPTER 7 - Conclusion
  • ACKNOWLEDGMENTS
Page 6: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School

Background

The bovine species counts almost 15 billion of heads classified in some 800

breeds worldwide (httpfaostat3faoorg) The species is adapted to a wide

varietyofdifferentenvironmentandhusbandrysystemsandisoftenrearedfor

multiple purposes (Taberlet et al 2008) draft power milk meat leather

productionandotheruses(Elsiketal2009)

European cattle (Bos$ taurus) derive from a single domestication event of the

auroch (Bos$primigenius)occurred in theFertileCrescent around10000years

ago(Taberletetal2011)FollowingdomesticationcattlefollowedtheNeolithic

expansionofagricultureandcolonisedEurope(seeAjmoneMMarsanetal2010

for a review) Many thousand years of natural and artificial selection have

shaped cattle phenotype and genotype to meet environmental conditions and

human needs and together with demographic events and genetic drift have

started thedifferentiation of the bovine species in diverse and locally adapted

populations This differentiation process accelerated in the last two centuries

whenthebreedconceptwasinventedsothatnowthisspeciescountshundreds

of different breeds with high diversity in coat colour body size production

aptitudeadaptationtoclimate tolerancetodiseaseetc(Taberletetal2008

Elsiketal2009TheBovineHapMapConsortium2009)

Thisdiversityisnowbeingprogressivelylostwiththeextinctionofmanylocal

populationsmainlyforeconomicreasonsSelectioneffortshavethereforebeen

focussedonafewoutperformingindustrialbreedsthatarepresentlyselectedfor

specialiseddairyorbeefproductionandsometimesdoublepurpose

In these breeds breeding schemes are well developed and have been

traditionally based on the genetic evaluation of sires and dams using data

collectedfromperformanceandprogenytestingandunbiasedlinearprediction

modelsThesearefoundedontheFisherinfinitesimalmodellingofquantitative

trait variation and need no genomic information to estimate breeding values

(EBVs) The genetic progress recorded using this approach has been relevant

particularly for traits having medium to high heritability In Italian Holstein

genetic trends for themain production traits (milk protein and fat yield) are

positiveHoweveraccurateestimateofbreedingvalueshasahighcostinterms

4

of investment and time so that 5 to 6 years are needed to have a dairy bullprogenytested(wwwanafiit)Theadventofgenomicselection(Meuwissenetal2001)markedamilestoneindairycattlebreedingTheideawasdevelopedbeforethemoleculartoolswereinplaceforimplementitinpracticeandhasbecomearealityonlyrecentlyInthisapproach progeny tested bulls are genotyped at many thousand loci and thevalue of their EBVs used to estimate the value of each allele at each locusgenotyped These values are then used to predict the genetic value of younganimals on the basis of their genotype As a result animals from the newgeneration receive an EBV with the same accuracy of estimates obtained byprogenytestingbutmuchearlierThislowerthegenerationintervalandgreatlyincreasestherateofgeneticprogress indairypopulationsThe initial ideawasso good that nowadays genomic selection is widely applied in industrialcountriestothemajordairycattlebreedsWiththedecreaseofgenotypingcostit is likely that its use will be soon expanded to animals selected for otherpurposes (eg beef cattle) and other species (eg pigs) However genomicselection is an extension of traditional selection It is extremely useful inbreedingforcomplextraitsbutgivesnobiologicalinformationonthegenesthatcontrolthemIfthetraditionalselectionviewsasiregenomeasabigblackboxhaving a value but no clue why genomic selection views the same genomedisassembled inmany small boxes but still black and stillwithno clueon thereasonoftheirvalueIt is reasonable to think that a deeper understanding of the biologicalmechanisms controlling traits may further increase the accuracy of genomicevaluationsIndeedthenewtechnologiesoffernewopportunitiesinthissensebyloweringcostsandincreasingprecisionofdataproductionThis thesis is exploring the use of these technologies for retrieving biologicalinformationfromgenomicdataItusesmethodsnowconsideredldquotraditionalrdquoingenomics (GWAS and analysis of selection sweeps) and less traditional ones(gene centered GWAS and exome analysis) All data used here have beenproducedwithin twonationalprojectson farmanimalgenomics fundedby theItalianMinistryofAgriculture

5

M SELMOL ldquoRicerca e innovazione nelle attivitagrave di miglioramento genetico

animalemediante tecniche di geneticamolecolare per la competitivitagrave del

sistemazootecniconazionaleIrdquo(ResearchandInnovationinanimalbreeding

sheepgoatbuffalohorseanddonkeyI)

M INNOVAGENldquoRicercaeinnovazionenelleattivitagravedimiglioramentogenetico

animalemediante tecniche di geneticamolecolare per la competitivitagrave del

sistema zootecnico nazionale IIrdquo (Research and Innovation in animal

breedingsheepgoatbuffalohorseanddonkeyII)

In these projects the research group I work with M Istituto di Zootecnica at

Universitagrave Cattolica del Sacro Cuore di Piacenza M coordinated all the research

activitiesconcerningtheapplicationofgenomicstechniquesindairycattle

Contentsofthethesis

The thesis is structured into a short introduction (this first chapter) 5 core

chapters(Chapter2to6)andashortConclusion(Chapter7)Withtheexception

of Chapter 6 still in preparation the other core chapters containmanuscripts

that at time of writing are under review for their publication in international

peerMreviewedjournalsormanuscript

InChapter2selectionsignatureindairyandbeefcattleareinvestigatedwiththe

Illumina BovineSNP50 Genotyping BeadChip Five Italian cattle breeds (Italian

Holstein Italian Brown Italian Simmental Marchigiana and Piedmontese) are

analyzedwiththerEHHmethod(Sabetietal2002)tofindsignaturesofrecent

selectionCandidategenesinthevicinityofsignaturessharedbyeitherdairyor

beefbreeds are identified and theirpossible role indairy andbeefproduction

discussed

In Chapter 3 a GWAS analysis is run to investigate production traits in three

Italian dairy cattle breeds (Italian Holstein Italian Brown Italian Simmental)

Data are analysed with a well established method (Amin et al 2007)

6

implemented in theGenABELRpackage (Aulchenkoet al 2007) andwith the

newgeneMbasedmethodAssociationisinvestigatedbetween50KSNPsandmilk

production(milkyield)andqualitytraits(percentageofmilkproteinandfat)

InChapter4thegeneMbasedmethodusedinthepreviouschapter3inspiredby

theVEGASapproach(Liuetal2010)isdescribedThesoftwarewasdeveloped

andreMcodedinourlaboratoryinordertopermittheanalysisofanyspeciesin

additiontohumanMultiMtestingcorrectionandplotrepresentationcompletethe

toolsmadeavailable

InChapter5wetesttheimpactoftheuseofdifferentreferencesequencesinthe

imputation step Italian Simmental data are used to test variation in genomeM

wide inputation accuracy using different cattle reference sequences (BTAU42

UMD31 and BTAU46) To increase the analyses reliability four different

inputationsoftwaresaretested

InChapter6Holsteindataareanalysedcombiningexomesequencesand800K

SNPs data to seek deleterious recessive variantsWhile in natural populations

the destiny of these variants is to disappear in livestock breeds they can be

maintained and disseminated if carried by a highly productive sire The

intersection of deleterious mutations identified by sequencing and lack of

homozygousregionsidentifiedbytheanalysisofthepopulationwiththeHDSNP

chipidentifiedgenescandidatetocarryrecessivedeleteriousvariants

Objective

Themainobjectiveofthethesiswastoinvestigatethebovinegenomewiththe

aid new highMthroughput technology and to explore new strategies for linking

biology (genes and their function) to traits This linkage once validated may

increase the accuracy of genomic prediction and find application in selection

schemesegineradicatingthemostdeleteriousvariantsandorintheplanning

ofmatings

7

ReferencesAjmoneMMarsanPGarciaJFLenstraJA2010OntheoriginofcattleHow

aurochsbecamecattleandcolonizedtheworldEvolAnthropol19148ndash157doi101002evan20267

AminNvanDuijnCMAulchenkoYS2007AGenomicBackgroundBasedMethodforAssociationAnalysisinRelatedIndividualsPLoSONE2e1274doi101371journalpone0001274

AulchenkoYSKoningDMJdeHaleyC2007GenomewideRapidAssociationUsingMixedModelandRegressionAFastandSimpleMethodForGenomewidePedigreeMBasedQuantitativeTraitLociAssociationAnalysisGenetics177577ndash585doi101534genetics107075614

ElsikCGTellamRLWorleyKC2009TheGenomeSequenceofTaurineCattleAWindowtoRuminantBiologyandEvolutionScience324522ndash528doi101126science1169588

LiuJZMcraeAFNyholtDRMedlandSEWrayNRBrownKMHaywardNKMontgomeryGWVisscherPMMartinNGMacgregorS2010AVersatileGeneMBasedTestforGenomeMwideAssociationStudiesTheAmericanJournalofHumanGenetics87139ndash145doi101016jajhg201006009

MeuwissenTHHayesBJGoddardME2001PredictionoftotalgeneticvalueusinggenomeMwidedensemarkermapsGenetics1571819ndash1829

SabetiPCReichDEHigginsJMLevineHZPRichterDJSchaffnerSFGabrielSBPlatkoJVPattersonNJMcDonaldGJAckermanHCCampbellSJAltshulerDCooperRKwiatkowskiDWardRLanderES2002DetectingrecentpositiveselectioninthehumangenomefromhaplotypestructureNature419832ndash837doi101038nature01140

TaberletPCoissacEPansuJPompanonF2011ConservationgeneticsofcattlesheepandgoatsCRBiol334247ndash254doi101016jcrvi201012007

8

TaberletPValentiniARezaeiHRNaderiSPompanonFNegriniR

AjmoneMMarsanP2008Arecattlesheepandgoatsendangered

speciesMolEcol17275ndash284doi101111j1365M294X200703475x

TheBovineHapMapConsortium2009GenomeMWideSurveyofSNPVariation

UncoverstheGeneticStructureofCattleBreedsScience324528ndash532

doi101126science1167936

9

CHAPTER2

Relative(extendedamphaplotype(homozygosity)(rEHH)ampsignalsampacrossampbreedsamprevealampdairyampand$beef$specificampsignaturesampofampselection

LBomba1ELNicolazzi2MMilanesi1etal1IstitutodiZootecnicaUCSCviaEmiliaParmense8429122PiacenzaItaly2FondazioneParcoTecnologicoPadanoViaEinsteinLocCascinaCodazza26900LodiItalyTheseauthorscontributedequallytothiswork

____________________SubmittedtoGeneticSelectionEvolution

Abstract

Background

Nowadays many methods have been implemented to scan for signatures of

selection at genomescale level within and across breeds Extended haplotype

homozygosity is a proficient approach to detect genome regions under recent

selective pressure within breeds The objective of this study is to identify

common regions under positive recent selection shared by most represented

dairyandbeefItaliancattlebreeds

Results

A total of 3220 animals from Italian Holstein (2179) Italian Brown (775)

Simmental (493) Marchigiana (485) and Piedmontese (379) breeds were

genotyped with the Illumina BovineSNP50 BeadChip v1 in the frame of the

ItalianlivestockgenomicprojectsldquoSelMolrdquoandldquoProzoordquoAfterstandardcleaning

proceduresgenotypeswerephasedandcorehaplotypesidentifiedThedecayof

Linkage Disequilibrium (LD) for each core haplotype was assessedmeasuring

the extended haplotype homozygosity (EHH) To correct for the lack of good

estimatesof local recombinationrates therelativeEHH(rEHH)wascalculated

for each corehaplotypeGenomic regionsunder recentpositive selectionwere

identifiedasthosehighlyfrequentcorehaplotypeswithsignificantrEHHvalues

Significant independentcorehaplotypeswerealignedacrossbreeds to identify

signalsofrecentselectionsharedbydairyandsharedbybeefbreedsOverall82

and 87 common regions under selectionwere detected among dairy and beef

cattlebreedsrespectivelyBioinformaticsanalysisidentified244and232genes

mapping in these common genomic regions Pathway and network analysis of

significant genes revealed molecular functions biologically related to signal of

selectionsdetectedinmilkormeatproductiontypes

Conclusions

Resultssuggest that themultibreedapproach isamethodto identifyselection

signatures in cattle breeds under selection for the same production goal

12

allowing to better pinpoint the genomic regions of interest in dairy and beef

production

Background

The advent of genomic technologies and the consequent availability of single

nucleotidepolymorphism(SNP)datahaveenabledthestudyofselectionevents

on the cattle genome (Hayes et al 2008 Qanbari et al 2010) Such selection

signals arise from environmental or anthropogenic pressures and help to

understand the causes that led to breed formation These studies are usually

addressedinaldquotopdownrdquoapproach(ShahzadandLoor2012)fromgenotypeto

phenotypeinwhichgenomicdataisstatisticallyevaluatedtodetectdirectional

selectionThephenotypeneededtorunselectionsignaturesisintendedinavery

generalsenseabreedaproductionaptitudeorevenageographicareaoforigin

Thisapproachholdsthepotentialto investigatetraitsasadaptationtoextreme

climatesanddifferentfeedingandhusbandrysystemsresiliencetodiseasesetc

thatareveryexpensivedifficultandsometimesimpossibletostudywithclassic

GWAS (GenomeWide Association Study) approaches Therefore it is expected

thatgenomic informationretrievedinthesestudiesonlypartiallyoverlapsthat

identified by GWAS on specific traits Searching for these ldquosignatures of

selectionrdquo is of interest for understanding molecular mechanisms underlying

important biological processes and providing new insights to functional

biologicalinformation(egdiseaseandorimportanteconomictraits(Nielsenet

al 2007)) To date many methods have been implemented to scan these

selectionsignaturesatthegenomiclevelMethodsdifferwithrespecttoseveral

properties (Lenstra et al 2012) there are methods that perform analyses

withinbreedoracrossbreedsthatcomparealleleorhaplotypefrequenciesand

size that detect recent or ancient selection events either still segregating or

fixatedintopopulations(Lenstraetal2012OrsquoBrienetal2014Utsunomiyaet

al2013)Differentmethodshavedifferentsensitivityandrobustnesseg they

maybeinfluencedatadifferentextentbymarkerascertainmentbiasanduneven

distributionofrecombinationhotspotsalongthegenome

13

Extended haplotype homozygosity (EHH) is a method that has been used in

many animal species including cattle (Pan et al 2013 Qanbari et al 2010

Zhangetal2006)TheEHHmethodwasfirstdevelopedbySabetietal(2002)

for human population analysis The main objective of this methodology is to

identifylongrangehaplotypesinheritedwithoutrecombinationeventsMorein

detailunderaneutralevolutionmodelchangesinallelefrequenciesaredriven

onlybygeneticdriftInthisscenarioanewvariantwouldrequirealongtimeto

reach high frequency in the population and the Linkage Disequilibrium (LD)

surroundingitwoulddecayduetorecombinationevents(Kimura1984)Inthe

caseofpositiveselectionaquickriseinfrequencyofabeneficialmutationina

relatively short time will preserve the original haplotype structure as the

number of recombination events would be limited Therefore a signature of

positiveselectionisdefinedasaregionwithanalleleinlongrangeLDlocatedin

anuncommonlyhighfrequencyhaplotype

One of themost interesting features of thismethod is that it is able to detect

genomic regions that are under recent selection These recent selection

signatures can be therefore used to identify the genomic regions involved in

breedformationInadditionEHHmethoddoesnotrequireanydefinitionofan

ancestralallele(asneededintheintegratedhaplotypescoreiHS(Voightetal

2006) and is suited to be applied to SNP data because it is less sensitive to

ascertainmentbiasthanothermethods(Nielsenetal2007)HowevertheEHH

methodgeneratesa largenumberof falsepositiveandnegative resultsdue to

heterogeneous recombination rates along the genome (Qanbari et al 2010)

Moreover another drawback not specific of EHH but shared by all selection

signaturemethod is the lack of robust inferences able to discern true signals

fromthosegeneratedbychance(Kemperetal2014)

To partially account for these limitations Sabeti et al (2002) developed the

relative extended haplotype homozygosity (rEHH) including an empirical

methodologytoobtainsignificantsignalsTherEHHofaputativecorehaplotype

compares its original EHH valuewith that of other haplotypes at that specific

locus using all other haplotypes as control for local variation in the

recombination rate Therefore it identifies genomic regions carrying variants

14

under selection that are still segregating in the population analysed Although

thesemethodswere conceived for human populations theywere successfully

appliedto livestockspeciessuchaspig(Maetal2012)andcattle(Qanbariet

al2010)

After domestication about 10000 years ago in the fertile crescent taurine

cattle have colonized Europe and Africa and have been selected for different

human needs (Taberlet et al 2011) Particularly in the last centuries the

anthropic pressure led to the formation of hundreds of specialized breeds

adapted to different environmental conditions and linked to local tradition

constitutingagenepooldeservingattentionforconservation(AjmoneMarsanet

al2010)Someofthesebreedshavebeenselectedspecificallyfordairybeefor

bothproductiontypesfollowingstrongartificialselectionforthesetraits(Hayes

etal2009)Thepresentstudyaimsat identifyingsignalsof recentdirectional

selectionusingrEHHmethodindairyandbeefproductiontypesusingdatafrom

five Italian dairy beef and dual purpose cattle breeds We focused on the

significant core haplotypes scanned by rEHH that are shared among breeds of

thesameproductiontypeThenweusedabioinformaticsapproachto identify

positionalcandidategeneslyingwithinthegenomicregionsunderselectionand

investigatetheirbiologicalrole

Methods

SampleAnimalsandGenotyping

A total of 4311 bulls of five Italian dairy beef anddual purpose breedswere

genotyped with the Illumina BovineSNP50 BeadChip v1 (Illumina San Diego

CA)joiningthegenotypingeffortsoftwoItalianprojects(namelyldquoSelMolrdquoand

ldquoProzoordquo)Thedatasetincluded101replicatesand773siresonspairsusedfor

downstreamqualitycheckofdataproducedIndetailgenotypesof2954dairy

(2179 ItalianHolsteinand775 ItalianBrown)864beef (485Marchigianaand

379 Piedmontese) and 493 dual purpose (Italian Simmental) bulls were

availableDataqualitycontrol(QC)wasperformedintwostagesafirststageon

15

animals independently for each breed applying the same methods and

thresholdsandasecondstageappliedonmarkersacrossall theindividualsof

thedatasetThefirststageexcludedindividualswithunexpectedlyhigh(gt02)

Mendelian errors for fatherson couples andwith low call rates (lt95)The

secondstageexcludedi)SNPswithle25missingvaluesinthewholedataset

orcompletelymissing inonebreed ii)SNPswithminorallele frequencyle5

and iii) SNPs in sexual chromosomes and with unassigned chromosome or

physicalposition

EstimationofrEHH

HaplotypeswereobtainedusingfastPHASEsoftwareconsideringdefaultoptions

(ScheetandStephens2006)andwererunbreedwiseandchromosomewisein

eachbreedPedigreeinformationforallbullswereprovidedbytheirrespective

breederassociations andwereused to filteroutdirect relatives (in fatherson

pairsthesonwasmaintainedinthedatasetandthefatherdiscarded)andover

representedfamilies(amaximumof5randomlychosenindividualsperhalfsib

familywasallowed)Thefinaldatasetcontainingtheseldquolessrelatedrdquoanimalswill

behenceforthcalled ldquononredundantrdquodatasetThisnonredundantdatasetwas

used also to calculate within breed pairwise wholegenome Linkage

Disequilibrium(LD)Ther2statisticforallpairsofmarkerswasobtainedusing

PLINK software v107 (Purcell et al 2007)Thedecayof LDwas assessedby

averagingr2valuesupto1Mbdistance

To test if the population structure influenced the detection of rEHH we

replicated all analyses on the whole Italian Holstein dataset without any

population correction (ie ldquoredundantrdquo dataset) focusing on genes or gene

clustersthatarewellknowntobeunderrecentselectionincattle(ieldquocandidate

regionsrdquo) In particularwe focused on the casein polled gene cluster and coat

colourgenes(MC1RandKIT(Kemperetal2014Qanbarietal2010))

The EHH and rEHH calculations were performed using Sweep v11 software

(Sabetietal2002)Somedefaultprogramsettingshadtobemodifiedtoadapt

theseanalyses to thebovinegenomeas thesoftwarewasoriginallydeveloped

forhumangeneticanalysesSpecificallylocalrecombinationratesbetweenSNPs

16

wereapproximatedto1cMperMbEHHandrEHHcalculationswereperformed

breedwise and chromosomewise using an automatic core selection with

defaultoptions(ieconsideringlongestnonoverlappingcoresandlimitingcores

toatleast3andnomorethan20SNPs)asinQanbarietal(2010)AlthoughEHH

and rEHH values were obtained for all core haplotypes only those with

frequency ge 25 were retained for further analyses The (empirical) rEHH

significancethresholdwasobtainedbydividingtherEHHvaluesinto20binsof

5 frequency range each logtransforming withinbin values to achieve

normality Core haplotypes with pvalues lt 005 were considered significant

DetectionofcommonselectionsignatureswasobtainedcomparingrEHHvalues

acrossgenomicregionsacrossbreeds

Breedgroupingaccordingtoproductiontype

Toexamine thesignals common toeachof the twoproduction types (iedairy

andbeef) corehaplotypes thatsharedoneormoreSNP inat least twobreeds

from the same production type were selected The dual purpose Italian

Simmentalwas included inbothdairy (ItalianHolsteinand ItalianBrown)and

beef groups (Piedmontese and Marchigiana) since potentially it possesses

haplotypes selected for both production types All downstream analyses were

performedseparatelyforthedairyandthebeefproductiontypes

Detectionandannotationofcandidategenes

Genome annotation was performed using Biomart

(httpwwwbiomartorgindexhtml) a comprehensive source of gene

annotationformanyspeciesprovidedbyEnsembl(wwwensemblorg)Thelist

ofsharedhaplotypesregions(inbp)fordairyandbeefwasusedasinputfilein

theBiomartwebinterfaceThesetofgenesretrievedbyBiomartwasthenused

as input for theCanonicalpathwayandregulatorynetworkanalyses Ingenuity

PathwayAnalysis toolversion80(IPA IngenuityregSystems IncRedwoodCity

CA httpwwwingenuitycom) and a substantial examination of published

literaturewasusedtoexaminethefunctionalrelationshipsamongtheresulting

genes The IPA operates with a proprietary knowledge database providing

complementarypathwayanalysisforseveralspeciesincludingcattleIntheIPA

17

analyses the significance of each biological function identified was estimated

using Fisherrsquos exact test A Benjamini and Hochberg correction for multiple

testingwas calculated for function and canonical pathways For eachNetwork

IPAcomputedascore(pscore=log10(pvalue))fittingthesetstudiedgenesand

a list of biological functionspresent in the IPAknowledgedatabase The score

considered thenumberofgenes in thenetworkand thesizeof thenetwork to

estimatehowrelevantthisnetworkwasaccordingtothelistofgenesprovided

Thenthescorewasusedtorankthenetworks

Results

DatasetQC

Therepeatabilityobtainedfromthe101replicatespresentinthewholedataset

washigherthan998AfterthetwoQCstages105individualsand9730SNPs

were discarded Moreover after phasing 1292 individuals were removed to

reducethe largenumberofsibfamiliespresent intheredundantdatasetAfter

quality control procedures the final dataset had 44271 SNPs and 1132 514

393 410 and 364 individuals from Italian Holstein Italian Brown Italian

SimmentalMarchigianaandPiedmontesebullsrespectively(Table1)

18

Table1JNumberofanimalsgenotypedbeforeandafterQCTotalnumberofanimalgenotypedandrelativenumberofanimaldiscardedafterQCanalysis

Detectionofselectionsignatures

Sweepv11 softwaredetectedrEHHsignalswitha frequencyhigher than25overlapping the test candidate regions in both datasets corrected or not forpopulation structure In the redundant dataset we found only one significantrEHHsignal(caseincluster)whereasinthenonredundantdatasetweidentified5significantrEHHsignalsoverlappingallthetestcandidateregionsconsidered(caseinclusterpolledclusterMC1R)(Table2)

BreedTotal

Genotyped

ED15

misAN

ED1

REPL

ED1

MENDCleaned

HOL 2179 40 31 5 2093

BRW 775 6 16 4 749

SIM 493 6 6 2 479

MAR 485 37 38 410

PIE 379 5 10 364

19

Table2JComparisonofcandidateregionrEHHinnonJredundantandredundantdatasetrEHH overlapping candidate regions in both corrected (nonredundant) and not corrected

(redundant)datasetsforpopulation

Redundant NonRedundant

Candidate

geneBTA1

Closest

SNP(bp)CH2range CH2Frequency

rEHHJ

log(pval)CH2Frequency

rEHHJ

log(pval)

POLLED

Cluster1 1981154 18974181981154 H1079 093037 H1078 167174

MC1R 18 13657912 1331772014007505 H1069 120096 H1069 077167

KIT_BOVIN 6 72821175 7250492172821175 H1054 030030 H1054 033048

Casein

Cluster6 88427760 8835009888452835 H1047 094139 H1046 148133

Casein

Cluster6 88427760 8835009888452835 H2032 002003 H2033 002003

In total5526567847724388and4049corehaplotypeswitha frequency

higher than 25 were detected on Italian Holstein Italian Brown Italian

Simmental Marchigiana and Piedmontese breeds respectively A total of 838

866 740 692 and 613 core haplotypes were found significant outliers (see

Materials and Methods) in the aforementioned breeds Table 3 shows the

distribution of the total and significant core haplotypes per chromosome and

breed

20

Tableamp3amp(ampCoreamphaplotypesampdistributionampDistributionoftotalandsignificantcorehaplotypesperchromosomeandbreed

ampamp ItalianampampHolsteinamp

ItalianampampBrownamp

ItalianampSimmentalamp Marchigianaamp Piedmonteseamp

BTA1amp

Hampgtamp252amp

SignampH3amp

Hampgtamp252amp

SignampH3amp

Hampgtamp252amp

SignampH3amp

Hampgtamp252amp

SignampH3amp

Hampgtamp252amp

SignampH3amp

1amp 389 66 389 68 335 49 310 51 288 46

2amp 312 47 314 52 267 44 257 42 238 33

3amp 292 48 313 62 264 46 236 34 227 34

4amp 268 46 279 50 245 41 237 39 229 39

5amp 223 32 231 29 200 30 183 27 171 22

6amp 291 51 307 40 265 45 252 38 233 42

7amp 249 37 253 46 214 39 195 33 204 27

8amp 268 39 270 38 235 35 223 32 209 31

9amp 227 37 228 41 210 37 165 22 187 30

10amp 219 36 234 34 192 25 196 26 168 25

11amp 248 39 246 34 209 30 205 33 187 25

12amp 162 24 176 18 148 21 148 25 135 22

13amp 205 32 200 19 155 18 178 35 135 19

14amp 175 17 189 25 168 29 143 22 129 19

15amp 180 31 185 29 148 23 139 13 130 19

16amp 176 28 192 29 160 25 149 19 127 15

17amp 170 27 183 29 168 31 135 22 123 16

18amp 133 18 153 19 119 14 108 10 82 12

19amp 137 22 130 23 118 16 99 18 92 15

20amp 184 35 172 33 145 23 124 23 124 17

21amp 145 23 147 30 113 18 110 21 94 13

22amp 140 24 145 19 115 21 112 18 104 13

23amp 106 13 100 16 67 10 64 9 57 8

24amp 129 11 140 20 124 17 97 23 101 16

25amp 91 12 98 11 68 10 77 12 70 6

26amp 125 7 114 14 93 12 96 15 90 16

27amp 91 12 91 15 75 11 84 12 60 10

28amp 88 9 96 10 70 10 66 9 55 11

29amp 103 15 103 13 82 10 72 9 67 12

TOTamp 5526 838 5678 866 4772 740 4388 692 4049 613

1BostaurusAutosomes2Detectedcorehaplotypeswithafrequencyinthebreedhigherthan253Significantcorehaplotypesatplt005

Comparisonamptoamppreviouslyampreportedampdataamp

AnumberofstudieshavesearchedselectionsweepsinHolstein(Qanbarietal

2010) Brown (Qanbari et al 2011) and Simmental (Fan et al 2014) breeds

Since different methods are expected to identify signatures having different

characteristicsthecomparisonwithpreviousstudiesislimitedtothosestudies

using thesamemethodand thesamebreed(s)analysed inourstudyOnlyone

study analysed (German) HolsteinRFriesian cattle with rEHH (Qanbari et al

2010)Qanbarietal(2010)reportedcandidategenesandgeneclustersunder

Marco Milanesi
21

recent positive selection that we compared with our rEHH in dairy breeds

dataset(Table4)

Table4JCorehaplotypesincandidategenesfollowingQanbarietal2011

1BosTaurusChromosome2Corehaplotype

Thenumberofcorehaplotypesfoundinourstudyinthecandidateregionswas

lowerpreviouslyreportedWhenconsideringHolsteinsthetwomostsignificant

candidateregionsinbothstudiesmatch(caseinclusterandthesomatostatinSST

gene)althoughitisimpossibletodetermineifthehaplotypeunderselectionis

the same as this information was not provided in Qanbari et al (2010)

However other genes considered significant in Qanbari et al (with pvalues

ranging from 004 to 010) were not significant in our studyWhen using the

same loose significant threshold (pvalues le 010) ldquosignificantldquo signals were

foundalsoinItalianBrownfortheCaseinCluster(log10pvalue=109)andin

ItalianSimmentalfortheSSTgene(log10pvalue=126)

HOLSTEIN BROWN SIMMENTAL

Candidate

geneBTA1

Closest

SNP

(bp)

CH2rangeCH2

Frequency

rEHH

Hlog(p)CH2range

CH2

Frequency

rEHH

Hlog(p)CH2range

CH2

Frequency

rEHH

Hlog(p)

DGAT1 14 444963236653443936

H1057 011443936763332

H1042 0130007

Casein

Cluster6 88391612

8835009888452835

H1046 1481338832601288452835

H2068 0801098835009888452835

H1044 037022

8835009888452835

H2033 018030

8835009888452835

H3034 022045

GH 19 49652377 GHR 20 33908597

SST 1 813769568128358581376961

H1036 200189 8131845181376961

H1042 126053

8128358581376961

H2029 00630084 8131845181376961

H2042 006027

IGFH1 5 71169823

ABCG2 6 37374911 3731702038256889

H1044 031027

3731702038256889

H1040 029031

Leptin 4 95715500

LPR 3 855692038549710885594551

H1047 0910728549710885794693

H1068 0800638549710885794693

H1063 002005

8549710885594551

H2041 027025

PITH1 1 35756434

22

Signaturessharedbydairyandbybeefbreeds

Significantcorehaplotypeswerealignedacrossbreedsto identifythoseshared

among dairy or beef production types Since breeds can be considered

independentsetsofobservationssharedsignaturesaremorelikelytorepresent

real effects rather than false positives A total of 123 and 142 significant core

haplotypesweresharedbetweenatleasttwodairyorbeefbreedsrespectively

(File S1) Only 82 and 87 of those significant core haplotypes contained genes

considered positional candidates under positive selection In total 22 and

17 of the genomewas under selection for dairy and beef production types

respectivelyTherEHHaveragelengthswere216932and190994bpfordairy

andbeefrespectively

Genesetannotationandnetworkpathwayanalysis

Atotalof244and232annotatedgenesfellwithintheselectedregionsfordairy

andbeefproductiontypesrespectively(Table5)

23

Table5JStatisticsonsharedsignificantcorehaplotypesindairyandbeefbreedsStatisticsonsharedsignificantcorehaplotypesindairyandbeefbreeds

DairyCattle BeefCattle

BTA SelSig

n

Genes1 Sum

SelSignsize

(bp)

Mean

SelSign

size(bp)

SelSi

gn

Genes1 Sum

SelSignsize

(bp)

Mean

SelSign

size(bp)

1 7 11 1521608 138328 7 13 2263213 174093

2 5 9 2216685 246298 8 11 2658549 2416863 2 5 1619336 323867 7 21 3755847 1788504 7 21 5956755 283655 6 11 1761288 1601175 4 17 4506119 265066 5 11 1251127 1137396 2 4 1467738 366934 5 12 5149692 4291417 6 30 4842969 161432 3 28 11889191 4246148 0 0 0 0 4 12 3512215 2926859 4 7 1554863 222123 2 6 1106768 18446110 4 17 8839762 519986 2 3 573138 19104611 6 8 1841802 230225 1 1 100439 10043912 2 8 4244133 530517 1 1 536461 53646113 3 8 1933026 241628 2 8 985376 12317214 0 0 0 0 3 9 692397 7693315 5 9 1415405 157267 2 2 189094 9454716 3 6 1302506 217084 4 6 905719 15095317 3 4 471046 117762 2 9 1551010 17233418 0 0 0 0 3 23 2961718 12877019 3 23 5744143 249745 2 2 116938 5846920 1 2 148886 74443 2 4 627971 15699321 3 7 1930206 275744 1 5 1150760 23015222 3 5 620674 124135 3 10 2411751 24117523 0 0 0 0 1 1 68421 6842124 2 18 9166004 509222 2 4 803770 20094225 2 11 2016602 183327 1 3 329199 10973326 2 4 466134 116534 4 7 776080 11086927 1 1 631792 631792 2 3 1004717 33490628 0 0 0 0 1 1 93464 9346429 2 9 935209 103912 1 5 798300 159660TOT 82 244 65393403 216932 87 232 50024613 190994

1GenesidentifiedGenesintheregionssharedbyallthreedairybreeds(8genes)orallthreebeefbreeds(11genes)foramoredetailedanalysiswereselected(Fig1)

24

Figure1JGenomiclocationoftheselectionsignaturessharedamongstudiedbreedsa)GenesinensembletracksaredisplayedasaredboxcorehaplotypesandSNPsarecolouredinorange(MarchigianaMAR)inviolet(PiedmontesePIE)andpink(SimmentalSIM)b)Genesinensemble tracks are displayed as a red box core haplotypes and SNPs are coloured in blue(HolsteinHOL)ingreen(ItalianBrownBRW)andpink(SimmentalSIM)

All the genes identified were submitted to downstream pathwaynetwork

analyses(Table5Figure1FileS1)Interestinglygenesintheregionssharedby

allthreedairyorbeefbreedsweremanifestedinthemostlikelynetworksforthe

two production aptitudes The IPA analysis detected a total of 12 and 15

networksindairyandbeefbreedsrespectively

Indairybreeds network scores ranged from46 to2 (File S2) Eightnetworks

obtainedscoreshigherthan20andincludedmorethan14moleculeseachThey

wereassociatedtoi)geneexpressionRNAdamageandrepairproteinsynthesis

andposttranslationalmodificationii)cellassemblyorganizationfunctionand

maintenance iii) cell cycle proliferation and cancer iv) carbohydrate

metabolism v) gastrointestinal and neurological disease developmental and

endocrine system disorders A total of 23moleculeswere associatedwith the

highest ranked network (score = 46) In this the immunoglobulins mitogen

25

activatedproteinkinasesandNFkBcomplexesformacentralhublinkinggenes

involved in celltocell signaling and interaction hematological system

developmentandfunctionandimmunecelltraffickingfunctions(Figure2)

Figure2JThemostlikelyIPAnetworkindairycattlebreedsThefunctionsassociatedtothenetworkarecelltocellsignallingandinteractionhematologicalsystem development and function immune cell trafficking Nodes in red correspond to genesidentifiedincorehaplotypesoverlappinginallthethreebreedsofeachproductiontypewhereasthoseingreendepictoverlappingcorehaplotypesinatleasttwoofthosebreeds

Breast cancer antiNestrogen resistance gene 3 (BARC3) andpituitary glutaminyl

cyclase gene (QPCT) genes were common to all three dairy breeds and are

directlyconnectedwithmammaryglandmetabolisms(Ezuraetal2004GSun

et al 2012) The solute carrier family 2 member 5 (SLC2A5) gene facilitates

glucosefructose transport (Zhao et al 1998) and the zetaNchain (TCR)

26

associated protein kinase 70kDa (ZAP70) gene plays a critical role in Tcell

signaling (Bonnefont et al 2011) Calpain is another important complex that

togetherwithcalpainN3(CAPN3)mediatesepithelialcelldeathduringmammary

gland involution (Wilde et al 1997) Furthermore RAS guanyl nucleotide

releasingprotein(RASGRP1)activatestheErkMAPkinasecascaderegulatesT

cellsandBcellsdevelopmenthomeostasisanddifferentiationandisinvolvedin

the regulation of breast cancer cell (Madani et al 2004 Yasuda et al 2007

Wickramasingheetal2009)Genesinvolvedintheothernetworksarelistedin

FileS2

The scoreof thebeef cattlebreedsnetwork rangedbetween46and2 and the

numberofmoleculespresentineachnetworkrangedfrom23to1(FileS3)Asin

dairy breeds eight networks obtained a score higher than 20 and 13 ormore

molecules each These networkswere associated to i) carbohydrate lipid and

nucleic acidsmetabolism energyproduction and smallmoleculebiochemistry

ii) cardiovascular system development and function iii) cellular and tissue

developmentcellassemblyandorganizationfunctionandmaintenanceandcell

and organmorphology iv) cell cycle celltocell signaling and interaction v)

DNA replication recombination and repair and RNA posttranscriptional

modification vi) dermatological diseases and conditions infectious disease

organismal injury and abnormalities In the network with the highest score

(score=46)chondroitinsulfateproteoglycan4(CSPG4)andsnurportinN1(SNUPN)

genes were shared among all beef cattle breeds investigated CSPG4 gene is

relatedtomeattendernesswhileSNUPN isanimprintedgeneimportantinthe

embryodevelopmentandinvolvedinhumanmuscleatrophy(Narayananetal

2002) (Figure 3) All the genes included in this network were involved in

carbohydratemetabolismsmallmoleculesbiochemistryandcellcycleTheother

networkswere related to important signaling andmetabolic process essential

foranimalgrowth(FileS3)

27

Figure3JThemostlikelyIPAnetworkinbeefcattlebreedsThe functions associated to the network are carbohydrate metabolism small molecule

biochemistry and cell cycle Nodes in red correspond to genes identified in core haplotypes

overlapping in all the three breeds of each production type whereas those in green depict

overlappingcorehaplotypesinatleasttwoofthosebreeds

A total of 6 and 9 statistically significant Canonical Pathways (FDR le 005

log10(FDR) ge 13) were identified using IPA for dairy and beef breeds

respectively(Figure4)

28

Figure4JBarplotofstatisticallysignificantCanonicalPathwaysPvalues were corrected for multiple testing using BenjaminiHochberg method and arepresented in thegraphas log(pvalue)Thebar represents thepercentageof genes in a givenpathway that meet cutoff criteria within the total number of molecules that belong to thefunctiona)BarplotofstatisticallysignificantCanonicalPathways indairycattlebreedsb)BarplotofstatisticallysignificantCanonicalPathwaysinbeefcattlebreeds

Themost significant Canonical Pathway in dairywas for purinemetabolism (

log10(FDR) = 26) supporting the highly synthetic processes in the mammary

epithelium(Figure5)

29

Figure5JGenesdetectedunderrecentpositiveselectionindairycattleinvolvedinpurinemetabolismcanonicalpathwayNodes in red correspond to genes identified in core haplotypes overlapping in all the threebreedsofeachproductiontypewhereasthoseingreendepictoverlappingcorehaplotypesinatleasttwoofthosebreeds

Inbeefproductiontheephrinreceptorsignal(log10(FDR)=27)wasthemost

significant Canonical Pathway (Figure 6) Among other functions ephrin is

knowntopromotemuscleprogenitorcellmigrationbeforemitoticactivation(Li

andJohnson2013)AlltheotherCanonicalPathwaysarereportedinFileS4

30

Figure6JGenesdetectedunderrecentpositiveselectioninbeefcattleinvolvedinEphrinmetabolismcanonicalpathwayNodes in red correspond to genes identified in core haplotypes overlapping in all the threebreedsofeachproductiontypewhereasthoseingreendepictoverlappingcorehaplotypesinatleasttwoofthosebreeds

Discussion

In this work more than 4000 genotyped bulls of five dairy beef and dualpurposeItalianbreedswerescreenedsearchingforselectionsignaturesindairyand beef production types A strict data quality controlwas applied to reducepossiblesourcesofbiassuchastechnicalgenotypingproblemsandpopulationstructurethatcouldhavelargeimpactonrEHHresultsInfactsincetheimpactofpopulationstructureonrEHHvaluesisstillunexploredwereplicatedpartofouranalyseswithandwithoutcloserelativesinanattempttostudytheeffectofpopulationstructureonthisselectionsignaturemethodInprinciplepopulationstratificationshouldleadtoanoverrepresentationofcertainhaplotypessimplyduetothepresenceoftightpedigreelinks(egsirespassinghalfoftheirgeneticmaterialtotheirsons)ratherthanactualselectionprocessesForthisreasonfor

31

the full analyses all siresons couples were removed after haplotype phasing(retaining only the younger animal) and halfsib familieswere restricted to amaximum of five randomly chosen individuals in an attempt to reduce overrepresentation This assessment was restricted to four regions known to beunderselectioninItalianHolsteinasthecaseinclusterthepolledgeneclusterMC1R and KIT Italian Holstein was selected for two reasons i) it is a highlystructuredbreedaccordingtoourdata(abouthalfofindividualsremovedwereclose relatives) ii) allowed us to compare our results with a previous studyAlthoughtheanalysesconductedonbothredundantandnonredundantdatasetswereable to identify rEHHsignals in these regions thenonredundantdatasetproduced five significant rEHH signals rather than just one in the redundantdataset(Table2)Thisresulthighlightstheconfoundingeffectofthepresenceofcloserelativesinthedatasetandconsequentlytheimprovedabilitytodetectedsignificant selection signature while correcting for population structureInterestingly our results only partially overlap those found by Quanbari et al(2010) These authors detected significant signals at the casein cluster andsomatostatinSSTaswedid but also at ahighernumberof candidate regionsThese inconsistencies may be the effect of different sires included in theanalyses of the different dataset size and to the close relative trimmingprocedure we adopted to decrease the effect of population structure andconsequent bias (Table 4) However poor correlation across investigations isoftenobservedalsointhedeeplyinvestigatedhumanspeciesThisisduetotheuse of different within and between breed statistics that identify selectionsignatureshavingdifferent characteristics (ancientrecent segregatingfixatedunder directionalbalancing selection) to the likely high rate of falsepositivesnegativesresultsandalsotothedifferentconsiderationofpopulationstructureandbackgroundselection(Enardetal2014)The number of total and significant core haplotypes identified by the SweepsoftwarewasthehighestinItalianBrownandthelowestinPiedmontesebreedSinceEHHisheavilybasedonpopulationLDtheaverageLDleveloverdistancewascalculatedineachbreed(Figure7)

32

Figure7JMultiJbreedaveragelinkagedisequilibriumagainstphysicaldistance(inkb)Marchigiana Piedmontese and Italian Simmental breeds present a lower persistence of LD allargerdistancethanItalianHolsteinandItalianBrownbreedsThisanalysishighlightedageneralpositivecorrelationbetweenthe levelofLDover distance and the number of total and significant core haplotypes foundHoweverconsideringthatrEHHisarelativemeasure it ismorelikelythatthehighernumberofsignificantcorehaplotypesidentifiedindairybreedswasduemostlytoahigherselectivepressureindairycomparedtobeefbreedsSignificance tests used in detecting selection signatures should measure theprobability of a statistic to be anoutlier compared to its expecteddistributionunder a neutral model However no reliable neutral model has so far beendevelopedincattlebecauseofthecomplexityofthedemographichistoryofthisspecies (Kemper et al 2014) As a consequence empiric rather than modelbasedsignificancetestaregenerallyusedinthedetectionofselectionsignaturesAccordinglyweconsideredasoutliersvaluesas thosesimply falling in the5plusvarianttailoftherEHHdistributionWekeptthewithinbreedsignificancethresholdloosenoncorrectingitformultipletestingbutconsideredonlysignals

33

eitherexceedingthisthresholdbyatleasttwoordersofmagnitudeorsharedby

twoormoreindependentbreeds

Parallel comparison of results from independent analyses of different breeds

allowed the achievement of a double objective i) to identify putative regions

under(recent)selectioninbreedsbelongingtotwoproductiontypeswhichwas

the main objective of this study and ii) reducing false positives since multi

breed analyses served as internal control Since the rEHH method does not

considerphenotypicinformationasignificantsignalmightarisebecausei)the

core haplotype is actually under a selective pressure ii) the result is a false

positive ie causedby chance population structureor anyotherdriving force

However even considering an unrealistic scenario with no false positives a

proportionofthetotalamountofsignalswouldactuallybeselectionsignatures

due to a selectionpressurenot ondairy or beef traits This becausedairy and

beefbreedsinvestigatedareonlyafewandshareanumberoftraitsnotdirectly

related to dairybeef production Even considering this limitation to our

knowledge this is the first dairy and beef multibreed study applying this

strategytoreducefalsepositivesatthecostofpossiblelossofinformationdue

tohighfalsenegativeratesSignificantsignalsharedbydairyandbybeefbreeds

wereused fordownstreamnetworkanalysesonpositional candidategenes to

search forabiological justification towhatwasobtainedstudying thegenomic

information Only the top network for dairy and the top network for beef

productionarediscussedindetail

Dairybreeds

ThemostsignificantnetworkdetectedindairybreedsisassociatedwithCellTo

CellSignalingandInteractionHematologicalSystemDevelopmentandFunction

andImmuneCellTraffickingfunctions(Figure2)ImmunoglobulinandNFkBact

aspivotalgeneswithinthisnetworkInterestinglyBARC3andQPCTgeneswere

sharedamongallthethreedairybreedsBothgeneshavenotyetbeenstudiedin

cattlealthoughinhumanstudiesthesegenesarewellknowntobelinkedtothe

mammaryglandmetobolismandthecalciumregulationTheformerisinvolved

in integrinmediated cell adhesions and signaling which is required for

mammary gland development and functions (G Sun et al 2012) The latter is

34

associatedwithlowradialbodymineraldensity(BMD)inadultwomen(Ezuraet

al2004)AnotherimportantplayeronthisnetworkisSLC2A5genealsoknown

asGLUT5 SLC2A5 gene acts as fructose transporter in the intestine and has a

significantroleinenergybalanceofdairycows(Burantetal1992)Findingthis

gene in adairy cattle specificnetwork seems surprising since there shouldbe

little need for transporting glucose andor fructose in the ruminant intestinal

tractassimplecarbohydratesaredegradedintovolatilefattyacids(VFA)inthe

rumen (Iqbal et al 2009) However it is often the case that large amount of

starchbypasstherumenincowsfedwithdietsrichincerealgrains(Zhaoetal

1998)Thisbypassedstarchneedstobedigestedinthesmallintestinetoavoid

highlevelsofglucoseconcentrationinthelargeintestine

WedetectedcalpaincomplexandcalpainN3(CAPN3)genesunderpositiverecent

selection in the dairy group as reported also by Utsunomiya et al (2013)

AltoughCalpainisknowntobeinvolvedinpostmortemmeattendernizationitis

also related to dairy metabolism as the muscle breakdown promoted by the

calpain gene provides an energy source for milk production especially at the

beginningoflactation(Kuhlaetal2011)InadditionasreportedbyWildeetal

(1997) calpainNcalpastatin system is related to programmed cell death of

alveolar secretory epithelial cells during lactation Zap70 gene is a cytoplasmic

proteintyrosinekinaseItisrelatedtotheimmunesystemasacomponentofthe

Tcell receptor playing a central role in Tcell responses (Wang et al 2010)

Bonnefontet al (2011) found thatZap70 genewasup regulated (FoldChange

(FC)=82FDR=277E12)onmilksomaticcellsofsheepinfectedbySaureus

andSepidermidisshowinganassociationwithmastitisresistance

Purinemetabolismwasthemostsignificantcanonicalpathwayindairybreeds

In a study conducted in human mammary epthelium Maningat et al (2008)

conductedageneexpressionanalysisonbreastmilkfatglobulesidentifyingthe

PurinemetabolismasthemostsignificantpathwaySynthesisandbreakdownof

purine is essential in the metabolism of the tissues of many organisms

particularlyofthemammaryglandduringlactation

Another interesting canonical pathway was endothelin signaling (Figure 5)

endothelin functions as vasocontrictor and is secreted by endothelial cells

35

(Nussdorferetal1999)Acostaetal(1999)reportedthatincattleendothelinsareinvolvedinthefollicularproductionofprostaglandinsandtheregulationofsteroidogenesis in the mature follicle In a recent study Puglisi et al (2013)confirmed the implication of endothelin in particular EDNRA and endothelinconvertin enzyme 1 in reproductive disorder in bovine They classified theEDNRAaspotentialbiomarkerforfertilityincow

Beefbreeds

Inbeefcattlebreedsthefocaladhesionkinase(FAK)togetherwithintegrinandERK12genesplayacentralroleashubsinthemostsignificantnetwork(Figure3)Theyareinvolvedincellularadhesionandcellmotilityprocessesparticularlyduringskeletalmyogenesis(Nguyenetal2014)Moreovertheoverexpressionof FAK can determine a reduced apoptosis and an increase risk of metastaticcancers(MehlenandPuisieux2006)Thesehubsarenotdirectlyinvolvedinourstudybut sharemany interactors thatare located in region found tobeunderselction such as CSPG4 RB1CC1 MGAT3 CIRPB CSPG4 gene belongs tochondroitin sulfate proteoglycans (CSPGs) family CSPGs are proteoglycansconsistingofaproteincoreandachondroitinsulfatesidechainTheyareknownto be structural components of a variety of tissues includingmuscle andplaykey roles inneuraldevelopmentandglial scar formationTheyare involved incellular processes such as cell adhesion cell growth receptor binding cellmigrationandinteractionswithotherextracellularmatrixconstituents

Manystudiesreporttheroleofproteoglycansinmeattextureofseveralmusclesfromthebovinespecies(Nishimuraetal1996)Dubostetal(2013)highlightadirect involvement of proteoglycans with respect to cooked meat juicinessRB1CC1geneplaysacrucialroleinmusculardifferentiationanditsactivationisa essential for myogenic differentiation (Watanabe et al 2005 p 1)Monoacylglycerol acyltransferase (MGAT3) gene cause the synthesis ofdiacylglycerol (DAG)using2monoacylglyceroland fattyacyl coenzymeAThisenzymaticreactionis fundamental fortheabsorptionofdietaryfat inthesmallintestineSunetal(2012p3)reportedinastudyonfiveChinesecattlebreedsthatMGAT3gene isassociatedwithgrowthtraitsCIRPBgenemaybepartofacompensatorymechanism inmuscles undergoing atrophy It preservesmuscle

36

tissuemassduringcoldshockresponsesaginganddisuse(DupontVersteegden

et al 2008) SNUPN is an imprinted gene and is expressed monoallelically

dependingonitsparentaloriginSNUPNplayimportantrolesinembryosurvival

andpostnatal growth regulation (Smithet al 2012Wanget al 2012)Ephrin

receptorsignalingwasthetopcanonicalpathwayproducebyIPAandhasalsoan

interestingbiologicalinterpretationformeatproduction(Figure6)Indeedthis

pathaway is important for the muscle tissue growth and regeneration

participating in correct positioning and formation of the neural muscular

junction(LiandJohnson2013)

Conclusions

In this study we analysed the selection signatures at genomewide level in 5

Italian cattle breeds Then we used a multibreed approach to identify the

genomicregionssharedamongcattlebreedsselectedfordairyorbeefpurpose

Thisapproachincreasedthepowertopinpointregionsofthegenomethatplay

important roles in economically relevant traits in both dairy and beef cattle

Moreover network and pathways analyses were used to display a detailed

functional characterization of genes mapping in the regions detected under

recentpositiveselection

Particularly in dairy cattle genes under directional selection are related to

feeding adaptation (increasing level of starch in the diet) mammary gland

metabolismandresistencetomastitisThebiologicalfeaturesunderselectionin

beef cattle breeds are involved in animal growth meat texture and juiceness

These results have a logical biological explanation However the frequent

approach of interpreting selection signatures on the basis of the function of

geneslocatednearbythepeaksignalofthestatisticusedispresentlychallenged

Novelinformationinhumanssuggeststhatmostselectedvariationisnotlocated

within genes and coding regions but in regulatory sites identified by the

ENCODEproject(Enardetal2014)Thesemaycontroltheexpressionofentire

genomicregionsorofgeneslocatedatarelavantdistancefromtheselectedsite

makingbiologicalinterpretationmorecomplex

37

InanycasefurtherstudiesusingdenserSNPchipsorwholegenomesequencing

producinginformationnotsubjectedtoascertainmentbias(Qanbarietal2014)

may increase the resolution of our analysis and togetherwith new knowledge

arisingonthecontrolofgeneexpressionvalidateorcorrectourresults

Acknowledgements

ANAFI (Italian Association of Holstein Friesian Breeders) ANAPRI (Italian

Association Simmental Breeders) ANABORAPI (National Association of

PiedmonteseBreeders)ANABIC(AssociationofItalianBeefCattleBreeders)and

ANARB (Italian Brown Cattle Breeders Association) are acknowledged for

providing the biological samples and the necessary information for this work

The Authors are also grateful to SelMol project (MiPAAF Ministero delle

PoliticheAgricoleAlimentarieFortestali)andbytheProZooproject(Regione

LombardiandashDGAgricoltura Fondazione Cariplo Fondazione Banca Popolare di

Lodi)forfunding

Thefundershadnoroleinstudydesigndatacollectionandanalysisdecisionto

publishorpreparationofthemanuscript

References

Acosta TJ Berisha B Ozawa T Sato K Schams D Miyamoto A 1999

Evidence for a Local EndothelinAngiotensinAtrial Natriuretic Peptide

SysteminBovineMature Follicles In Vitro Effects on SteroidHormones

and Prostaglandin Secretion Biol Reprod 61 1419ndash1425

doi101095biolreprod6161419

AjmoneMarsan P Garcia JF Lenstra JA 2010 On the origin of cattle how

aurochs became cattle and colonized theworld Evol Anthropol Issues

NewsRev19148ndash157

38

BonnefontCToufeerMCaubetCFoulonETascaCAurelMRBergonier

D Boullier S RobertGranieacute C Foucras G 2011 Transcriptomic

analysisofmilksomaticcells inmastitisresistantandsusceptiblesheep

upon challenge with Staphylococcus epidermidis and Staphylococcus

aureusBMCGenomics12208

BurantCFTakedaJBrotLarocheEBellGIDavidsonNO1992Fructose

transporter inhumanspermatozoaandsmall intestine isGLUT5 JBiol

Chem26714523ndash14526

DubostAMicolDPicardBLethiasCAnduezaDBauchartDListratA

2013Structuralandbiochemicalcharacteristicsofbovineintramuscular

connective tissue and beef quality Meat Sci 95 555ndash561

doi101016jmeatsci201305040

DupontVersteegden EE Nagarajan R Beggs ML Bearden ED Simpson

PMPetersonCA2008IdentificationofcoldshockproteinRBM3asa

possible regulator of skeletal muscle size through expression profiling

AJP Regul Integr Comp Physiol 295 R1263ndashR1273

doi101152ajpregu904552008

Enard D Messer PW Petrov DA 2014 Genomewide signals of positive

selection in human evolution Genome Res gr164822113

doi101101gr164822113

EzuraYKajitaMIshidaRYoshidaSYoshidaHSuzukiTHosoiTInoue

SShirakiMOrimoHEmiM2004Associationofmultiplenucleotide

variationsinthepituitaryglutaminylcyclasegene(QPCT)withlowradial

BMDinadultwomenJBoneMinRes191296ndash301

Fan H Wu Y Qi X Zhang J Li J Gao X Zhang L Li J Gao H 2014

Genomewide detection of selective signatures in Simmental cattle J

ApplGenet1ndash9doi101007s1335301402006

HayesBJChamberlainAJMaceachernSSavinKMcPartlanHMacLeodI

SethuramanLGoddardME2009Agenomemapofdivergentartificial

selectionbetweenBostaurusdairycattleandBostaurusbeefcattleAnim

Genet40176ndash184doi101111j13652052200801815x

Hayes BJ Lien S Nilsen H Olsen HG Berg P Maceachern S Potter S

Meuwissen THE 2008 The origin of selection signatures on bovine

39

chromosome 6 Anim Genet 39 105ndash111 doi101111j1365

2052200701683x

IqbalSZebeliQMazzolariABertoniGDunnSMYangWZAmetajBN

2009 Feeding barley grain steeped in lactic acid modulates rumen

fermentation patterns and increases milk fat content in dairy cows J

DairySci926023ndash6032doi103168jds20092380

KemperKESaxtonSJBolormaaSHayesBJGoddardME2014Selection

for complex traits leaves littleornoclassic signaturesof selectionBMC

Genomics15246doi1011861471216415246

KimuraM1984Theneutraltheoryofmolecularevolution

KuhlaBNuumlrnbergGAlbrechtDGoumlrsSHammonHMMetgesCC2011InvolvementofSkeletalMuscleProteinGlycogenandFatMetabolismin

the Adaptation on Early Lactation of Dairy Cows J Proteome Res 10

4252ndash4262doi101021pr200425h

Lenstra JAGroeneveld LF EdingHKantanen JWilliams JL Taberlet P

NicolazziELSolknerJSimianerHCianiEGarciaJFBrufordMW

AjmoneMarsan P Weigend S 2012 Molecular tools and analytical

approaches for the characterization of farm animal genetic diversity

AnimGenet43483ndash502

Li J Johnson SE 2013 EphrinA5 promotes bovine muscle progenitor cell

migration before mitotic activation J Anim Sci 91 1086ndash1093

doi102527jas20125728

Ma YL Zhang Q Ding XD 2012 [Detecting selection signatures on X

chromosome in pig through high density SNPs] Yi Chuan Hered

ZhongguoYiChuanXueHuiBianJi341251ndash1260

Madani S Hichami A CharkaouiMalki M Khan NA 2004 Diacylglycerols

Containing Omega 3 and Omega 6 Fatty Acids Bind to RasGRP and

Modulate MAP Kinase Activation J Biol Chem 279 1176ndash1183

doi101074jbcM306252200

Maningat PD Sen P Rijnkels M Sunehag AL Hadsell DL Bray M

Haymond MW 2008 Gene expression in the human mammary

epitheliumduring lactation themilk fat globule transcriptome Physiol

Genomics3712ndash22doi101152physiolgenomics903412008

40

Mehlen P Puisieux A 2006Metastasis a question of life or deathNat Rev

Cancer6449ndash458doi101038nrc1886

NarayananUOspinaJKFreyMRHebertMDMateraAG2002SMNthe

Spinal Muscular Atrophy Protein Forms a PreImport Snrnp Complex

withSnurportin1andImportin HumMolGenet111785ndash1795

Nguyen N Yi JS Park H Lee JS Ko YG 2014 Mitsugumin 53 (MG53)

LigaseUbiquitinatesFocalAdhesionKinaseduringSkeletalMyogenesisJ

BiolChem2893209ndash3216doi101074jbcM113525154

NielsenRHellmannIHubiszMBustamanteCClarkAG2007Recentand

ongoing selection in the human genome Nat Rev Genet 8 857ndash868

doi101038nrg2187

NishimuraTHattoriATakahashiK1996Relationshipbetweendegradation

of proteoglycans andweakening of the intramuscular connective tissue

duringpostmortemageingofbeefMeatSci42251ndash260

Nussdorfer GG Rossi GPMalendowicz LKMazzocchi G 1999Autocrine

ParacrineEndothelinSysteminthePhysiologyandPathologyofSteroid

SecretingTissuesPharmacolRev51403ndash438

OrsquoBrienAMPUtsunomiyaYTMeacuteszaacuterosGBickhartDM LiuGE Tassell

CPV Sonstegard TS Silva MVD Garcia JF Soumllkner J 2014

Assessing signatures of selection through variation in linkage

disequilibrium between taurine and indicine cattle Genet Sel Evol 46

19doi101186129796864619

Pan D Zhang S Jiang J Jiang L Zhang Q Liu J 2013 GenomeWide

DetectionofSelectiveSignatureinChineseHolsteinPLoSONE8e60440

doi101371journalpone0060440

PuglisiRCambuliCCapoferriRGianninoLLukajADuchiRLazzariG

Galli C Feligini M Galli A Bongioni G 2013 Differential gene

expressionincumulusoocytecomplexescollectedbyovumpickupfrom

repeat breeder and normally fertile Holstein Friesian heifers Anim

ReprodSci14126ndash33doi101016janireprosci201307003

Purcell S Neale B ToddBrown K Thomas L Ferreira MAR Bender D

MallerJSklarPdeBakkerPIWDalyMJShamPC2007PLINKa

41

tool set for wholegenome association and populationbased linkageanalysesAmJHumGenet81559ndash575doi101086519795

Qanbari S Gianola D Hayes B Schenkel F Miller S Moore S Thaller GSimianer H 2011 Application of site and haplotypefrequency basedapproachesfordetectingselectionsignaturesincattleBMCGenomics12318doi1011861471216412318

Qanbari S Pausch H Jansen S Somel M Strom TM Fries R Nielsen RSimianer H 2014 Classic Selective Sweeps Revealed by MassiveSequencinginCattlePLoSGenet10doi101371journalpgen1004148

Qanbari S Pimentel ECG Tetens J Thaller G Lichtner P Sharifi ARSimianerH2010AgenomewidescanforsignaturesofrecentselectioninHolsteincattleAnimGenetdoi101111j13652052200902016x

Sabeti PC Reich DE Higgins JM Levine HZ Richter DJ Schaffner SFGabriel SB Platko JV Patterson NJ McDonald GJ Ackerman HCCampbellSJAltshulerDCooperRKwiatkowskiDWardRLanderES2002DetectingrecentpositiveselectioninthehumangenomefromhaplotypestructureNature419832ndash7

ScheetPStephensM2006Afastandflexiblestatisticalmodelforlargescalepopulation genotype data applications to inferring missing genotypesandhaplotypicphaseAmJHumGenet78629ndash44

Shahzad K Loor JJ 2012 Application of topdown and bottomup systemsapproaches in ruminantphysiologyandmetabolismCurrGenomics13379

SmithLSuzuki JGoffAFilionFTherrien JMurphyBKohanGhadrHLefebvreRBrisvilleABuczinskiSFecteauGPerecinFMeirellesF2012DevelopmentalandEpigeneticAnomaliesinClonedCattleReprodDomestAnim47107ndash114doi101111j14390531201202063x

Sun G Cheng SYS Chen M Lim CJ Pallen CJ 2012 Protein TyrosinePhosphatase Phosphotyrosyl789 Binds BCAR3 To Position Cas forActivationatIntegrinMediatedFocalAdhesionsMolCellBiol323776ndash3789doi101128MCB0021412

Sun J Zhang C Lan X Lei C ChenH 2012 Exploring polymorphisms andassociationsofthebovineMOGAT3genewithgrowthtraitsGenomeNatl

42

Res Counc Can Geacutenome Cons Natl Rech Can 55 56ndash62doi101139g11077

TaberletPCoissacEPansu J PompanonF 2011Conservationgeneticsofcattle sheep and goats C R Biol On the trail of domesticationsmigrationsandinvasionsinagricultureDominiqueJobGeorgesPelletierJeanClaudePernollet334247ndash254doi101016jcrvi201012007

Utsunomiya YT Perez OrsquoBrien AM Sonstegard TS Van Tassell CP doCarmo AS Meszaros G Solkner J Garcia JF 2013 Detecting lociunder recent positive selection in dairy and beef cattle by combiningdifferentgenomewidescanmethodsPLoSOne8e64280

Voight BF Kudaravalli S Wen X Pritchard JK 2006 A Map of RecentPositive Selection in the Human Genome PLoS Biol 4 e72doi101371journalpbio0040072

WangHKadlecekTAAuYeungBBGoodfellowHESHsuLYFreedmanTSWeissA2010ZAP70AnEssentialKinaseinTcellSignalingColdSpringHarbPerspectBiol2doi101101cshperspecta002279

WangMZhangXKangLJiangCJiangY2012MolecularcharacterizationofporcineNECD SNRPNandUBE3Agenesand imprinting status in theskeletal muscle of neonate pigs Mol Biol Rep 39 9415ndash9422doi101007s1103301218066

WatanabeRChanoTInoueHIsonoTKoiwaiOOkabeH2005Rb1cc1iscritical for myoblast differentiation through Rb1 regulation VirchowsArchIntJPathol447643ndash648doi101007s0042800411831

WickramasingheNSManavalanTTDoughertySMRiggsKALiYKlingeCM 2009 Estradiol downregulates miR21 expression and increasesmiR21targetgeneexpressioninMCF7breastcancercellsNucleicAcidsRes372584ndash2595doi101093nargkp117

WildeCJAddeyCVLiPFernigDG1997Programmedcelldeathinbovinemammary tissue during lactation and involution Exp Physiol 82 943ndash953

YasudaSStevensRLTeradaTTakedaMHashimotoTFukaeJHoritaTKataoka H Atsumi T Koike T 2007 Defective Expression of Ras

43

Guanyl NucleotideReleasing Protein 1 in a Subset of Patients with

SystemicLupusErythematosusJImmunol1794890ndash4900

ZhangCBaileyDKAwadTLiuGXingGCaoMValmeekamVRetiefJ

Matsuzaki H Taub M Seielstad M Kennedy GC 2006 A whole

genome longrange haplotype (WGLRH) test for detecting imprints of

positiveselectioninhumanpopulationsBioinformatics222122ndash8

ZhaoFQOkineEKCheesemanCIShiraziBeecheySPKennellyJJ1998

Glucose transporter gene expression in lactating bovine gastrointestinal

tractJAnimSci762921ndash2929

44

Supplementaryfiles

YoucanfindChapterʹsupplementaryfilesat

- DocTa(DoctoralThesisArchive)(httptesionlineunicattit)

- GoogleDrive(httptinyurlcomPhDMMChapter02)orscantheQRcode

SupplementaryfileS1ndashSignificantcorehaplotypesandgenessharedin

dairyandbeef

Significantcorehaplotypes(pvaluele005haplotypefrequencyge025)shared

indairyandbeefcattlebreedsandrelativegenesintersectingwiththemThefile

isdividedinto4tablesThefirst2sheets(sheet1and2)shownthesignificant

corehaplotypesfordairyandbeefThelast2sheets(sheet3and4)showngenes

intersectingwiththesignificantcorehaplotypesfordairyandbeef

SupplementaryfileS2ndashNetworksrankingindairybreeds

Networksrankingindairybreedswithrelativemoleculessymbolslistscore

valuenumberofmoleculesbelongingtothedataset(focusmolecules)andtop

functionforeachnetwork

SupplementaryfileS3ndashNetworksrankinginbeefbreeds

Networksrankinginbeefbreedswithrelativemoleculessymbolslistscore

valuenumberofmoleculesbelongingtothedataset(focusmolecules)andtop

functionforeachnetwork

45

SupplementaryfileS4ndashCanonicalpathwaysrankingindairyorbeefcattle

breeds

Canonicalpathwaysrankingindairy(sheet1)orbeef(sheet2)cattlebreedswithrelativemoleculessymbolslistratioandndashlog10ofthepvaluesforeachcanonicalpathway

46

CHAPTER3Searchingnewsignalsforproductivetraitsthroughgenebasedassociationanalysisin

threeItaliancattlebreedsSCapomaccio1MMilanesi1etal1IstitutodiZootecnicaUCSCviaEmiliaParmense8429122PiacenzaItalyTheseauthorscontributedequallytothiswork

____________________SubmittedtoAnimalGenetics

Abstract

Genome wide association studies (GWAS) have been widely applied to

disentanglethegeneticbasisofcomplextraits IncattlebreedsclassicalGWAS

approach with medium density markers panels are far to be conclusive

especiallyforcomplextraitsThisisduetotheintrinsiclimitationsofGWASand

to the choices that are made to step from the associationrsquos signals to the

functionalunits

Here we applied a geneHbased association strategy to prioritize genotypeH

phenotype associations found with classical approaches in three Italian dairy

cattlebreedswithdifferentsamplesizes(ItalianHolsteinn=2058ItalianBrown

n=745ItalianSimmentaln=477)formilkproductionandqualitytraits

WhileclassicalsinglemarkerregressionfailedtorevealgenomeHwidesignificant

genotypeHphenotype association except for Italian Holstein the gene based

approach found specific genes in each breed that are associated with milk

physiologyandmammaryglanddevelopment

Since no established method is yet implemented to step from variation to

functional units our strategy may contribute to reveal new genes that play

significant roles in complex traits like thosehere investigated condensing low

signalsofassociationinageneHcentricfashionapproach

Background

Modern cattle breeds have nomore than 200 years of history (Taberlet et al

2008) a time spanwhere selection reproductive isolationand the consequent

geneticdriftshapedallelefrequenciesandlinkagedisequilibriumthroughoutthe

genome This is particularly true for industrial dairy breeds where the joint

application of reproductive technologies and modern genetic evaluation

methodshave imposedahigh selectionpressureona limitednumberof traits

(Taberletetal2008)involvedinmilkquantityandquality

Indeed the milk production industry has been historically ready to acquire

technicalimprovementsfromdifferentfieldsstatisticsbiologyandengineering

50

ForexampleBLUPbasedstrategiesandhighHdensitygenotyping throughSNPHchipsarenowroutinelyincludedinbullevaluationshavingsignificantimpactonselectionefficiencyandpopulationstructureThese techniques have dramatically improved milk yield and compositionwithout knowing the biological basis of the traits under selection To datedespite many efforts and many QTL mapped only a few genes have beeneffectivelyassociatedwithmilkyieldandcomposition(Grisartetal2002Blottetal2003CohenHZinderetal2005)In thiscontextGenomeWideAssociationStudies(GWAS) is themostcommonmethodfordissectingthebiologythatunderliescomplextraits(McCarthyetal2008)TheunderlyingideaofGWASissimplefindmarkerHtraitassociationsexploitingthe linkage disequilibrium (LD) that exists between the causative mutation ndashwhichweignorendashandoneormoreanalyzedmarkerswhichwewellknowInthiswayonecanpinpointgenomicregionscarryingcausalvariantsforanytrait

Although a number of GWAS have been conducted in cattle productive andmorphological traits (some examples (Jiang et al 2010 Cole et al 2011Meredith et al 2012)) and found several genomic regions associated to thesetraits none focused their attention directly on genes involved inmilk biologyMoreoveroneof themajordrawbacksofGWASisrelatedtothescarceresultsportability across populations due to a variety of confounding factorspopulationstructuredifferentialLD levelsbreedspecific selection targetsandeven SNPs ascertainment bias (Clark et al 2005) For instance significantassociationsonfatpercentagewereidentifiedinmanyregionsoftheBostaurusgenome in different breeds Except for genes of major effect there is littleagreement on which regions and H more importantly H genes are actuallyinvolvedinthecontrolofthetraitinvestigatedThisissomehownotsurprisingone favorable allele may segregate in one breed and be fixated in a differentbreed the same allele segregates in both breeds but allelesmay differ or thesegregates in both breeds but the genetic background masks the effect ofsegregation

51

AnotherlimitationinGWASisthestringentsignificancethresholdoftenapplied

tocorrectformultipletestingAsaresulta largeproportionofgeneswithlow

effectaredisregardedwithconsequentoverestimationof thehigheffectgenes

variance(Xu2003)Thisbiasisparticularlyworryingwheninvestigatingtraits

controlledbya largenumberofgeneswithsmalleffectand inhighLDasmilk

yield(GIANOLAETAL2013)

To overcome this kind of limitation different strategies have been invented

usingprior knowledge Interesting ldquocandidatepathwayrdquo approacheshavebeen

conducted(Ravenetal2013)testingforassociationbetweenaphenotypeand

genes inpathwaysthatare likelytobe involvedinthecontrolofthetraitThis

approach reduces thenumberofmarkers tobe tested increases the statistical

poweroftheexperimentrestrictingtheanalysistoexistingknowledge

Other methods based on statistical manipulations or particular experimental

designs have been developed to prioritize or validate genotypeHphenotype

associations For example geneHbased association strategies while restricting

GWA study only to genes and neighboring genomic regionsmay identify new

candidategenesdeservingpostHGWASanalysis(Akulaetal2011Cantoretal

2010 Liu et al 2010) Moreover it is true that classical GWAS downstream

bioinformatics analyses concentrate their efforts in findingmeaningful results

usinggenesasfinaltarget

The aim of ourworkwas to find new candidate genes and novel information

associated to complex traits as milk production (yiled) and quality (fat and

protein percentage) using a geneHcentric genome wide association in three

differentdairybreedsItalianHolsteinItalianBrownandItalianSimmental

Methods

BiologicalSamples

Bulls from the three most represented dairy breeds in Italy (Italian Holstein

ItalianBrownandItalianSimmental)wereselectedfromArtificialInsemination

(AI) bulls progeny tested in Italy until 2010 and forwhich biological samples

52

wereavailableItalianHolsteinbullswerechosenaccordingtofollowingcriteria

i) having the national Selection Index namely production functionality type

(PFT)withreliabilitygt75andii)havingthelowestpossiblerelationshipwith

theothersiresinthesetThiswasachievedbymaximizingthevariabilitywithin

the group On the contrary considering the low number of available Italian

BrownandItalianSimmentalbullstheywereallincludedintheanalysesAfter

thisprocedureatotalof2139ItalianHolstein775ItalianBrownand486Italian

Simmental samples (including quality control replicates)were genotypedwith

theBovineSNP50BeadChipv1(IlluminaSanDiegoCA)

Traitsconsideredintheanalysis

Italian Holstein (ANAFI) Italian Brown (ANARB) and Italian Simmental

(ANAPRI)breedersassociationsprovidedderegressedproofs(DRP)ordaughter

yielddeviations (DYD)hereafter referredasphenotype forall traitsanalyzed

Traitsanalyzedweremilkyieldfatpercentageandproteinpercentage

Genotypingandqualitycontrol

Quality controls (QC) were run independently for each breed using the same

pipelineandthresholds IndetailQCexcluded i)thereplicatesamplewiththe

highernumberofmissinggenotypesii)sampleswithunexpectedlyhigh(ge100)

mendelianerrors(forfatherHsonpairs)iii)sampleswithmorethan5missing

genotypes iv) SNPs with ge 25 missing data v) SNPs with minor allele

frequency(MAF)le5vi)SNPswithmendelianerrorsge25andvii)SNPsout

ofHardyHWeinbergequilibriumtestedbyFisherexacttest(ple0001Bonferroni

corrected)

Analysisofpopulationstructure

MultiDimensionScaling(MDS)wasperformedusingPLINK107(Purcelletal

2007)andresultsplottedwithRscriptsThisanalysisshowninsupplementary

material (FigureS1) exploredpopulationsubstructureandverified thegenetic

homogeneityofthedatasetpriortofurtheranalyses

53

PairwiseLDwascalculatedusingPLINK107(Purcelletal2007)LDdecayand

LD structure on specific regions were calculated with in house R scripts and

snpplotterRpackage(LunaandNicodemus2007)

GWAS

GenomeHwide association analysis was made with the GenABEL package in R

(Team 2014) using GRAMMARHCG (Genome wide Association using Mixed

Model and Regression H Genomic Control) approach (Amin et al 2007

Aulchenkoetal2007)

Basingonpriorknowledge(Minozzietal2013) in ItalianHolsteintheDGAT1

regionwas also treated as fixed effect inGenABEL analysis of the three traits

ResultsusingthismodelarereferredtoasldquoHolsteinDGATrdquo

Manhattan andquantileHquantileplots for each traitwereproduced to allowa

thoroughevaluationoftheGWASresultsAllcalculatedpHvaluesfromtheGWAS

analysiswere retained for the downstream prioritizationwith the geneHbased

association

Workingongenomecoordinates

Coordinatesof themarkerswere liftedover fromBTA40(Elsiketal2009)to

UMD31assembly(Ziminetal2009)usingSnpChimpv1(Nicolazzietal2014)

andgenecoordinateswereretrievedusingBioMartEnsembl

QTLcoordinatesweredownloadedfromtheAnimalQTLdb(Huetal2013) in

bedformat

Operations on genomic intervalswereperformedusing thebedtools suite ver

2162(QuinlanandHall2010)

GeneBasedAssociationanalysis

Toreducetheantagonismbetweensignificancethresholdandfalsepositiverate

of a classical GWA study (Gianola et al 2013) and to facilitate selection

signature comparison across the three breeds investigated we used a geneH

54

centricstatisticthatcollapsesinasinglevalueallthepHvaluesfromSNPsrelated

toagenecorrectingforLDinformation

Our approachderivesdirectly from the softwareVEGAS (VersatileGeneHBased

TestforGenomeHwideAssociationStudies)(Liuetal2010)adaptedtopursue

meaningfulresultsinanyspeciesbeyondhumans

Briefly the geneHwise test statistic is the sum of the n upper tail chiHsquared

valuesofaSNPsubset withonedegreeof freedom(wheren is thenumberof

SNPmappinginthegivengene)WithperfectlinkageequilibriumthegeneHwise

pHvaluewouldbetheoneHsidepHvalueofachiHsquaredistributionwithndegrees

of freedom Otherwise more likely the distribution of the pHvalues has to be

evaluated(withsimulations)andweighted(withLDvalues)(Liuetal2010)

The VEGAS speciesHfree procedure was implemented in a UNIX environment

usingBashandRlanguagesusingEnsemblannotationasinputfilesPLINKdata

typeandpHvaluesforeachSNPfromthepreviouslycitedGRAMMARprocedure

WemappedSNPsonBostaurusgenes(UMD31)andtheirboundaries(100000

bpupHanddownHstream)toaccountforregulatoryregionsOnlygeneswithat

least 2 SNPs mapped were retained Significant entries from the geneHbased

associationweremappedinthecattleQTLdatabasewithbedtools

ResultsandDiscussion

In this study a gene based association strategy was applied to identify new

genomicregionsboostingGWASanalysisresultsformilkproductionandquality

fromthreeItaliandairybreeds

Complextraitslikethosehereevaluatedareaffectedbyafewmajorgeneswith

large effects andmany otherswithmoderate to low effects the latter are not

easily identifiedby genomewide scans inmodern cattle breedsduemainly to

samplesizelimitationwhilesignalfromtheformersarelostingenomescansH

withsomeexceptionsHduerapidfixations

InthisscenariowithmediumdensitySNPpanelsliketheoneusedinthisstudy

associationbasedonsingleSNPregressionorotherldquoclassicalrdquomethodscouldbe

inconclusiveintermsofresults

55

AftertheQCprocess7452058and477samplesand3575938535and38574SNPs were retained for Italian Brown Italian Holstein and Italian Simmentalbulls respectively (Table S1) HolsteinDGAT analysis retained 38534markersoneSNPlessthantheotherHolsteinpanelDescriptive statistics on genotypes for each breed (histogram and barplot forheterozygosis and call rate per individual Figure S7 and multi dimensionalscalingFigureS1)revealedabsenceofconfoundingeffectsWiththesedatasetsatthenominalgenomeHwidepHvaluesignificancethresholdof 005 (ple125 x 10H6 after Bonferroni correction (Burton et al 2007Dudbridge and Gusnanto 2008)) significant associations were found only inItalianHolsteinbetweenmilkyieldandfatpercentageand26SNPsmappingintheproximalregionofBTA14nearbyDGAT1(Table1)

56

Table1PHvaluesoftheSNPsovercomingthegenomewidesignificancethresholdintheHolsteinbreedformilkyieldandfatpercentage(notaccountingforDGAT1effect)SNPsaresortedbypositionsintheUMD31assembly

SNPID BTA Position(Bp) POvalue(Fat) POValue(Milk)

Hapmap30383OBTCO005848 14 1489496 112EH12 361EH12

ARSOBFGLONGSO57820 14 1651311 247EH46 161EH16

ARSOBFGLONGSO34135 14 1675278 857EH17

ARSOBFGLONGSO94706 14 1696470 665EH16

ARSOBFGLONGSO4939 14 1801116 534EH49 106EH20

Hapmap52798Oss46526455 14 1923292 915EH21

UAOIFASAO6878 14 2002873 382EH24

ARSOBFGLONGSO107379 14 2054457 118EH33 664EH15

ARSOBFGLONGSO18365 14 2117455 250EH16

Hapmap30922OBTCO002021 14 2138926 439EH13

Hapmap25384OBTCO001997 14 2217163 390EH10 697EH09

Hapmap24715OBTCO001973 14 2239085 108EH09 369EH09

BTAO35941OnoOrs 14 2276443 402EH13

Hapmap30374OBTCO002159 14 2468020 528EH12

Hapmap30086OBTCO002066 14 2524432 798EH11

ARSOBFGLONGSO74378 14 3640788 161EH08

ARSOBFGLOBACO1511 14 3765019 375EH09

UAOIFASAO9288 14 3956956 811EH09

ARSOBFGLONGSO100480 14 4364952 133EH09

UAOIFASAO5306 14 4468478 458EH08

ThisislikelyduetotheeffectoftheK232ADGAT1mutationthatissegregatingin

thisbreed(Fontanesietal2014)NootherSNPwassignificantlyassociatedto

anyothertraitinthe3breedsinvestigated

SomeSNPswereunderasuggestivesignificancethresholdofple5x10H5Details

on these markers are in supplementary material (Table S3) but they are not

discussedherePlotsdescribingatglancetheassociationresultsrevealingalow

signaltonoiseratioareavailableintheonlinesupplementarymaterial(Figure

S2S3S4andS5respectively for ItalianBrown ItalianHolsteinHolsteinDGAT

andItalianSimmental)

57

Asperthegenecentricanalysiswemappedatotalof192371989919894and

19844 genes on 24616 available on Ensembl (release version 73) in Italian

BrownItalianHolsteinHolsteinDGATandItalianSimmentalrespectivelyGenes

with a geneHwisepHvalue at least 10H5were considered significantly associated

withthestudiedtraitThesignificanceappearedtobeadequatetothehighrate

ofredundancyof theannotationfeaturesandto figureoutthetopassociations

with a reasonable level of confidence Results on these associations per breed

andpertraitaresummarizedinTable2Table3andinsupplementarymaterials

(TableS2andTableS4)ThemajorityofsignificantgenesfoundwiththegeneH

centricanalysismapped inpreviouslycharacterizedQTL thatareassociated to

milkproductionorqualityThesearesummarizedintheTableS5

Table2Numberofsignificantlyassociatedgenesforeachbreedineachtraitinthegenebasedassociationtest

Fat Milkyield Protein

Brown 3 1

Holstein 92 80 1

HolsteinDGAT 1 4 1

Simmental 1 13

58

Tableamp3FeaturessignificantlyassociatedperbreedtraitHolsteinanalysisnotaccountingforDGAT1isshow

ninTableS3

Breedamp

Traitamp

EnsemblampIDamp

GeneWiseampPValueamp

SNPsInGeneamp

BestSNPamp

BestSNP_PValueamp

BTAamp

Startamp(bp)amp

Endamp(bp)amp

GeneampNam

eamp

BROWNamp

FAT

Nosignificantassociationfound

MILK

ENSBTAG00000019237

500EG05

4Hapmap53770Gss46526325

204EG04

23

47333403

47348212

TXNDC5

ENSBTAG00000043918

500EG05

4Hapmap53770Gss46526325

204EG04

23

47337650

47337787

5S_rRNA

ENSBTAG00000046839

500EG05

4Hapmap53770Gss46526325

204EG04

23

47328131

47352621

UncharacterizedProtein

PROTEIN

ENSBTAG00000001260

700EG05

3ARSGBFGLGNGSG116222

441EG03

18

52120387

52124418

PINLYP

amp

HOLSTEINDGATamp

FAT

ENSBTAG00000000369

100EG05

3Hapmap53294Grs29016908

333EG05

594673755

94749093

EPS8

MILK

ENSBTAG00000026896

130EG05

5ARSGBFGLGNGSG104949

385EG04

23

51549721

51553563

FOXF2

ENSBTAG00000005260

400EG05

5Hapmap33288GBTCG033751

120EG03

638120578

38127577

SPP1

ENSBTAG00000006579

500EG05

4ARSGBFGLGBACG2207

632EG05

15

54418559

54459500

P4HA3

ENSBTAG00000007013

700EG05

6UAGIFASAG7665

396EG05

19

5221507

5283308

TOM1L1

PROTEIN

ENSBTAG00000006638

600EG05

3ARSGBFGLGNGSG104774

176EG04

18

56523439

56530499

BCL2L12

amp

SIMMENTHALamp

FAT

ENSBTAG00000037649

900EG05

6Hapmap30001GBTAG142716

529EG05

4120427915

120494453

VIPR2

MILK

ENSBTAG00000017538

100EG05

6ARSGBFGLGBACG13093

587EG06

12

85630428

85630912

Pseudogene

ENSBTAG00000000856

200EG05

2ARSGBFGLGNGSG4939

413EG06

14

1766767

1769754

FBXL6

ENSBTAG00000000857

200EG05

2ARSGBFGLGNGSG4939

413EG06

14

1763994

1766621

GPR172B

ENSBTAG00000026356

200EG05

2ARSGBFGLGNGSG4939

413EG06

14

1795351

1804562

DGAT1

ENSBTAG00000035158

200EG05

2ARSGBFGLGNGSG4939

413EG06

14

1770660

1772329

TMEM

249

ENSBTAG00000046208

200EG05

2ARSGBFGLGNGSG4939

413EG06

14

1782901

1788087

SCRT1

ENSBTAG00000009811

400EG05

2ARSGBFGLGNGSG4939

413EG06

14

1825982

1844587

UncharacterizedProtein

ENSBTAG00000009816

400EG05

2ARSGBFGLGNGSG4939

413EG06

14

1838135

1840081

UncharacterizedProtein

ENSBTAG00000014458

400EG05

2ARSGBFGLGNGSG4939

413EG06

14

1844664

1894424

MROH1

ENSBTAG00000020751

400EG05

2ARSGBFGLGNGSG4939

413EG06

14

1806081

1825793

HSF1

ENSBTAG00000044406

400EG05

2ARSGBFGLGNGSG4939

413EG06

14

1883849

1883971

SCARNA15

ENSBTAG00000046889

400EG05

5ARSGBFGLGBACG13093

587EG06

12

85610279

85610762

UncharacterizedProtein

ENSBTAG00000020117

800EG05

3BTAG78494GnoGrs

188EG04

721189879

21196263

ZBTB7A

PROTEIN

Nosignificantassociationfound

amp

59

1 ItalianHolstein

The$ large$ number$ of$ significant$ signals$ was$ observed$ in$ the$ Italian$ Holstein$

especially$ for$ milk$ yield$ and$ fat$ percentage$ As$ shown$ in$ Table$ 2$ the$ strong$

effect$ of$ DGAT1$ in$ Italian$ Holsteins$ observed$ with$ individual$ SNPs$ was$

confirmed$by$the$geneBcentric$approach$In$this$region$almost$the$totality$of$the$

geneBwise$pBvalues$under$the$chosen$threshold$appeared$to$be$influenced$by$this$

major$ gene$ and$ by$ the$ high$ level$ of$ linkage$ disequilibrium$ that$ characterizes$

cosmopolitan$bovine$breeds$(see$Figure$S6)$Within$this$pool$of$genes$it$is$worth$

to$mention$PLEC$ (plectin)$ a$ cell$ surface$ receptor$ gene$ in$ the$RNASE5$ pathway$

(Angiogenin$ encoded$ by$ angiogenin$ ribonuclease$ RNase$ A$ family$ 5)$ Proteins$

encoded$ by$ these$ genes$ that$ are$ present$ in$ milk$ are$ involved$ in$ protein$

synthesis$ and$ act$ as$ growth$ factors$ in$ vitro$Moreover$RNASE5$ seems$ to$ have$

other$ functions$ such$ as$ regulation$ of$ stress$ response$ and$ apoptosis$ and$ even$

angiogenesis$ through$ which$ it$ may$ also$ contribute$ to$ the$ regulate$ milk$

production$(Raven$et$al$2013)$

The$ exclusion$ of$DGAT1$ effect$ (HolsteinDGAT$ results)$ highlighted$ that$ all$ the$

significant$ effects$ on$ fat$ percentage$ found$ for$ genes$ located$ within$

approximately$25$Mb$up$or$downstream$from$the$excluded$SNP$(ARS5BFGL5NGS5

4939)$were$ redundant$ signals$ of$DGAT1$ (Table3$ and$Table$ S2$ and$ S3)$ This$ is$

also$observed$for$milk$yield$due$to$the$genetic$correlations$between$fat$and$milk$$

Also$ PLEC$ was$ not$ significant$ in$ our$ analyses$ when$ taking$ into$ account$ the$

DGAT1$effect$contradicting$previous$findings$in$Holstein$that$report$association$

of$PLEC$gene$with$production$traits$either$accounting$or$not$for$the$effect$of$the$

diglyceride$ acyltransferase$ (Raven$ et$ al$ 2013)$ A$ possible$ explanation$ for$ the$

different$results$between$these$studies$is$a$difference$between$genetic$structures$

of$the$populations$analyzed$To$reduce$false$positive$signals$we$only$evaluated$

the$genes$based$on$HolsteinDGAT$results$for$the$Italian$Holstein$$

11 Milkyield

We$found$genes$associated$with$this$trait$in$Holstein$in$particular$SPP1$P4HA3=

FOXF2$and$TOM1L1$genes$

Among$ these$ the$ osteopontin$ SPP1$ (secreted$ phosphoprotein$ 1)$ is$ directly$

related$to$milk$production$(de$Mello$et$al$2012)$ It$ is$ located$ in$BTA6$nearby$

60

$

ABCG2$ and$ there$ is$ evidence$ that$ polymorphisms$ in$ this$ region$ are$ strongly$

correlated$with$high$production$ levels$of$milk$ in$various$breeds$(CohenBZinder$

et$ al$ 2005)$Osteopontin$SPP1$ is$ a$ secreted$glycophosphoprotein$ that$plays$ a$

crucial$ role$ in$ mammary$ gland$ development$ having$ an$ essential$ role$ also$ in$

mammary$gland$differentiation$and$branching$of$the$mammary$epithelial$ductal$

system$ This$ gene$has$ also$ been$ implicated$ in$many$ types$ of$ cancer$ including$

breast$cancer$ for$ the$potentiality$of$proliferation$and$alveologenesis$ (Hubbard$

et$ al$ 2013)$ It$ is$worth$mentioning$ that$ the$most$ significant$ SNPs$within$ this$

gene$was$Hapmap332885BTC5033751=with=a$pBvalue$of$00012$ largely$over$ the$

genomeBwide$significance$ threshold$This$signal$would$be$ ignored$ in$ the$single$

SNP$approach$

Another$gene$revealed$by$the$geneBbased$association$is$P4HA3$encoding$for$the$

prolyl$4Bhydroxylase$α$polypeptide$III$involved$in$the$correct$folding$of$nascent$

procollagen$ chains$ It$ is$ known$ that$ stromalBepithelial$ interactions$ are$ critical$

for$mammary$gland$development$and$defective$deposition$or$reorganization$of$

extracellular$ matrix$ can$ lead$ to$ incorrect$ mammary$ epithelial$ morphogenesis$

(Gillette$ et$ al$ 2013)$ highlighting$ the$ role$ of$ this$ gene$ in$ the$ healthy$ breast$

tissue=

FOXF2=gene$ (Forkhead$Box$F2)$ is$ a$member$of$ the$ forkhead$box$ transcription$

factors$ known$ to$ be$ involved$ in$ tissue$ development$ as$ part$ of$ the$ ldquoHedgehog$

signaling$pathwayrdquo$This$key$cascade$is$essential$for$the$correct$segmentation$of$

tissues$during$embryonic$development$but$has$been$recently$demonstrated$that$

is$active$in$adult$stem$cells$proliferation$in$mammary$tissue$(Liu$et$al$2006)$In$

particular$ FOXF2$ plays$ a$ key$ role$ in$ extracellular$ matrix$ synthesis$ and$

epithelialBmesenchymal$ interactions$ and$ could$ therefore$ affect$ the$ proper$

development$of$the$mammary$stroma$In$addition$low$levels$of$transcription$of$

this$gene$are$correlated$ to$early$onset$metastasis$ in$breast$cancer$ (Kong$et$al$

2013)$ underlining$ a$ specific$ role$ in$ the$ mammary$ gland$

Lastly$ TOM1L1$ (Target$ Of$ Myb1$ ChickenBLike$ 1)$ is$ another$ gene$ potentially$

involved$ with$ the$ milk$ physiology$ as$ is$ known$ its$ implication$ with$ the$ early$

endosome$ sorting$ coupled$ with$ EGFR= (epidermal$ growth$ factor$ receptor)$

another$ important$player$ in$ lactation$(Fowler$et$al$1995)$TOM1L1$belongs$to$

61

$

the$ VHS$ domain$ containing$ proteins$ that$ are$ involved$ in$ the$ postBGolgi$

trafficking$and$signaling$(Wang$et$al$2010)$

12 Fatpercentage

In$ Italian$ Holstein$ breed$ we$ found$ a$ significant$ association$ with$ or$ without$

DGAT1$ effect$ correction$ to$ the$EPS8$ gene$ located$on$BTA5$at$around$95$Mbp$

This$gene$encodes$for$a$substrate$of$the$already$cited$EGFR$and$is$involved$in$the$

fatty$acid$metabolism$(Fazioli$et$al$1993)$A$recent$study$on$Holstein$Friesian$

(Wang$et$al$2012)$found$strong$association$with$a$variant$located$in$the$second$

intron$of$the$epidermal$growth$factor$receptor$pathway$substrate$enforcing$the$

hypothesis$of$a$true$involvement$of$this$gene$on$fat$percentage$

13 Proteinpercentage

The$BCL2L12$gene$(BclB2Blike$protein$12)$associated$with$protein$percentage$in$

Italian$Holstein$breed$represents$another$interesting$result$The$product$of$this$

gene= is$ an$ important$ oncoprotein$ in$ two$ fundamental$ cell$ cycle$ pathways$

namely$ p53$ and$ cytoplasmic$ caspase$ signaling$ BCL2L12= resides$ in$ both$

cytoplasm$and$nucleus$inhibiting$caspases$3$and$7$and$forming$a$complex$with$

p53$thereby$ inhibiting$p53Bdirected$transcriptomic$changes$upon$DNA$damage$

(Stegh$ and$DePinho$ 2011)$ The$members$ of$ BCL2$ family$ are$ considered$ antiB

apoptotic$genes$and$it$is$obvious$that$dysfunctional$programmed$cell$death$plays$

an$ important$ role$ in$ the$ pathogenesis$ and$ in$ the$ progression$ of$ cancer$Many$

members$of$this$family$were$found$to$be$modulated$in$various$malignancies$and$

BCL2L12= in$ particular$ was$ expressed$ in$ mammary$ gland$ and$ proposed$ as$

prognostic$ cancer$ biomarker$ (Thomadaki$ et$ al$ 2007)$ In$ dairy$ cows$ the$

members$of$BCL2$family$are$involved$in$the$key$events$ in$the$mammary$tissue$

development$ and$ remodeling$ during$ lactation$ and$ involution$ (Stefanon$ et$ al$

2002$p$200)$Members$of$BCL2$family$regulate$the$turnover$of$cell$population$

in$the$mammary$gland$during$lactation$and$its$overexpression$was$found$to$be$

linearly$related$to$milk$production$in$Holstein$cow$(Colitti$et$al$2004)$

$

62

$

2 ItalianSimmental

In$ the$ Italian$ Simmental$ breed$ we$ found$ one$ association$ on$ BTA4$ for$ fat$

percentage$(VIPR2)$and$two$associations$ for$milk$yield$one$ in$BTA7$(ZBTB7A)$

and$ the$ other$ in$ BTA14$ related$ to$ the$ DGAT1$ No$ association$ was$ found$ for$

protein$percentage$

21 Fatpercentage

Among$the$genes$found$significantly$associated$with$production$traits$in$Italian$

Simmental$VIPR2=a$ gene$ encoding$ a$ receptor$ for$ a$ small$ vasoactive$ intestinal$

neuropeptide$ was$ the$ single$ signal$ in$ BTA4$ for$ fat$ percentage$ Vasoactive$

intestinal$ peptide$ (VIP)$ is$ synthesized$ in$ the$pituitary$ gland$ and$ among$other$

functions$ stimulates$ prolactin$ secretion$ and$ is$ involved$ in$ proliferation$

survival$and$differentiation$in$human$breast$cancer$cells$(Valdehita$et$al$2010)$$

An$interesting$novel$finding$is$the$effect$of$immunosuppression$(through$direct$

effect$of$IL56)$on$prolactin$cells$and$its$ability$to$regulate$prolactin$by$acting$on$

pituitary$ VIP$ through$ VIP$ receptors$ (Blanco$ et$ al$ 2013)$ These$ evidences$

suggest$ that$VIPR2$ could$ be$ an$ important$ candidate$ for$ further$ investigations$

through$post$GWAS$approaches$such$as$reBsequencing$or$fine$mapping$

22 Milkyield

Surprisingly$the$DGAT1$gene$was$significantly$associated$with$milk$yield$but$not$

with$fat$percentage$in$Italian$Simmental$breed$Indeed$the$most$significant$SNP$

for$the$milk$yield$trait$in$this$breed$was$the$ARS5BFGL5NGS54939$with$a$pBvalue$

of$ 413EB06$ The$ importance$ of$ this$ gene$ in$ the$ lactation$ is$ largely$ known$

(Grisart$et$al$2002)$$

To$ investigate$DGAT1$ signal$ in$ Simmental$ we$ analyzed$ the$ fine$ LD$ structure$

genomeBwide$ and$ in$ BTA14$ proximal$ region$ in$ all$ the$ three$ breeds$ revealing$

different$patterns$

Observed$LD$decay$on$ the$ three$breeds$genomes$revealed$similar$behavior$ for$

Italian$Holstein$and$Italian$Brown$with$r2$gt$010$$until$ around$ 250$ Kbp$ while$

Italian$Simmental$showed$lower$LD$Distances$higher$that$400$Kbp$revealed$half$

level$of$LD$in$Simmental$(Figure$S6)$

63

$

In$DGAT1$region$LD$behavior$is$very$similar$between$Italian$Holstein$and$Italian$

Brown$ (Figure$ S8)$ while$ association$ values$ are$ really$ different$ because$ ARS5

BFGL5NGS54939=is$fixed$in$Italian$Brown$and$purged$by$QC$process$Interestingly$

association$ patterns$ become$ similar$ when$ DGAT1$ effect$ is$ removed$ from$ the$

Holstein$ panel$ (HolsteinDGAT$ Figure$ S8)$ In$ Simmental$ we$ found$ a$ different$

situation$ very$ low$ LD$ level$ and$ significant$ association$ levels$ with$ ARS5BFGL5

NGS54939$with$ the$milk$ trait$This$ is$probably$due$ to$ the$ low$ frequency$of$ the$

p232K$ allele$ in$ Italian$ Simmental$ (Scotti$ et$ al$ 2010)$ and$ to$ the$ different$

selection$strategies$applied$in$these$breeds$

The$ relatively$ high$ pBvalue$ of$ this$ SNP$ is$ responsible$ for$ the$ statistical$

significance$of$many$of$the$13$genes$found$significant$in$this$breed$for$this$trait$

as$ shown$ in$ Table$ 3$ therefore$ reducing$ the$ gene$ set$ to$ three$ locations$ two$

overlapping$features$indicating$a$pseudogene$in$BTA12$and$a$zinc$finger$protein$

ZBTB7A$in$BTA7$DGAT1$role$and$function$will$be$not$discussed$here$as$already$

done$elsewhere$(Grisart$et$al$2002)$

ZBTB7A=is$part$of$zinc$finger$and$BTB$proteins$(Broad$complex$Tramtrack$and$

Bricabrac)$ transcription$ factors$ with$ emerging$ importance$ for$ their$

involvement$ in$ oncogenesis$ and$ cell$ differentiation$by$ repressing$key$ genes$of$

cell$cycle$regulation$as$retinoblastoma$(Rb)$and=p53=(ldquoJeon$et$al$B$2008$B$ProtoB

oncogene$ FBIB1$ (PokemonZBTB7A)$ Represses$ Trpdfrdquo$ nd)= ZBTB7A$ over$

expression$ is$ shown$ in$primary$ and$ recurrent$ breast$ cancer$ tissues$ leading$ to$

disease$progression$and$poor$survival$(Zu$et$al$2011)$Link$between$this$gene$

polymorphism$and$milk$production$might$be$in$the$proper$cellular$development$

of$ the$ mammary$ gland$ This$ transcription$ factor$ is$ also$ known$ as$ a$ master$

regulator$ of$ B$ and$ TBcell$ lineage$ (Lee$ and$ Maeda$ 2012)$ enforcing$ the$ link$

between$ milk$ production$ and$ immune$ protection$ as$ already$ outlined$ by$ of$

Lemay$ and$ colleagues$ with$ proteomic$ studies$ where$ the$ ldquoactivation$ of$ the$

innate$ immune$systemrdquo$resulted$as$one$of$ the$most$represented$gene$ontology$

categories$(Lemay$et$al$2009)$

$

64

$

3 ItalianBrown

In$ the$ Italian$Brown$breed$we$ found$ one$ association$ on$BTA23$ (TXNDC5)$ for$

milk$yield$and$one$in$BTA18$for$protein$percentage$(PINLYP)$The$three$features$

listed$in$Table$3$for$milk$yield$in$Italian$Brown$were$collapsed$due$to$overlap$$

31 Milkyield

Thioredoxin$ domainBcontaining$ protein$ 5$ (TXNDC5)$ belongs$ to$ thioredoxin$

family$small$ubiquitous$proteins$containing$disulphide$domain$affecting$protein$

folding$with$chaperone$activity$preventing$protein$misBfolding$within$the$lumen$

of$ the$endoplasmic$reticulum$This$multifunctional$organelle$have$a$key$role$ in$

several$ processes$ including$ lipid$ catabolism$ and$ protein$ synthesis$ (RamiacuterezB

Torres$ et$ al$ 2012)$ Because$ of$ its$ disulphide$ bonds$ thioredoxin$ facilitate$ the$

reduction$of$other$proteins$by$cysteine$thiol$disulfide$exchange$This$mechanism$

is$essential$for$detossification$

During$ lactation$ indeed$ an$ enhanced$ detoxification$ level$ is$ functional$ to$

preserve$ milk$ quality$ against$ increased$ metabolic$ activities$ Higher$ metabolic$

levels$ lead$ to$ free$ radicals$ production$ making$ necessary$ the$ maintenance$ of$

redox$ homeostasis$ that$ include$ among$ others$ the$ reduction$ and$ oxidation$ of$

peroxiredoxin$and$thioredoxin$(Lu$and$Holmgren$2014)$

32 Proteinpercentage

For$ protein$ percentage$ in$ Italian$ Brown$ there$ is$ only$ one$ associated$ gene$

PINLYP=is$encoding$a$phospholipase$A2$inhibitor$but$this$protein$product$is$not$

yet$well$studied$ $Many$types$of$phospholipase$A2$were$characterized$and$ it$ is$

known$ that$ they$ are$ mediator$ of$ inflammation$ atherosclerosis$ and$ cancer$ in$

mammals$and$that$plays$an$important$role$in$the$innate$immune$response$(Qiu$

and$Lai$2013)$It$may$be$reasonable$to$imagine$a$role$for$PINLYP$in$the$lactation$

in$light$of$previously$discussed$results$that$link$milk$to$innate$immune$response$

(Lemay$et$al$2009)$

$

Although$ limited$ by$ the$ size$ of$ the$ study$ and$ the$ number$ of$ the$ included$

markers$these$results$represent$both$a$confirmation$and$a$step$forward$towards$

the$comprehension$of$the$biological$bases$of$milk$yield$and$quality$$

65

$

New$ genes$ potentially$ involved$ with$ production$ traits$ in$ dairy$ cows$ were$

tracked$when$GWAS$signals$resulted$too$weak$to$be$considered$in$a$classical$SNP$

based$mapping$with$the$ldquoclosest$gene$to$peakrdquo$or$the$windowBbased$strategy$

It$ is$ worth$ mentioning$ that$ none$ of$ the$ analyzed$ SNPs$ (not$ considering$ the$

Italian$Holstein$with$ARS5BFGL5NGS54939= included$ in$ the$ analysis)$were$ above$

the$ genomeBwide$ significance$ threshold$ in$ single$ SNP$ analysis$ meaning$ that$

important$information$is$often$lost$in$such$studies$$

In$addition$ to$ the$newly$ identified$ loci$we$observed$a$confirmation$of$existing$

associations$ with$ genes$ related$ to$ milk$ production$ not$ only$ dissecting$ gene$

specific$ functions$ but$ from$ mapping$ these$ genes$ in$ previously$ characterized$

regions$(Table$S5)$This$ is$not$trivial$when$using$new$data$analysis$techniques$

especially$ in$ high$ noise$ to$ signal$ ratio$ datasets$ like$ those$ here$ presented$We$

believe$ that$ this$ approach$will$ benefit$ of$data$ from$denser$marker$panels$ (eg$

BovineHD)$ to$ exploit$ local$ linkage$ disequilibrium$ and$ increase$ the$ number$ of$

mapped$genes$with$more$than$two$SNPs$

Results$ obtained$ with$ the$ gene$ centric$ approach$ although$ having$ intrinsic$

limitations$are$not$ influenced$by$a$number$of$subjective$choices$often$made$in$

GWAS$ For$ instance$when$ ldquowindow$mappingrdquo$ is$ used$ to$decrease$ the$noise$ to$

signal$ ratio$ eg$ due$ to$ nearby$ markers$ having$ different$ allele$ frequencies$

windows$size$ is$ to$be$decided$This$can$be$based$on$equal$number$of$markers$

equal$number$of$base$pairs$or$equal$map$distance$in$cM$per$window$The$three$

choices$ do$ not$ coincide$ because$ markers$ are$ not$ equally$ spread$ along$ the$

chromosomes$ and$ recombination$ differs$ in$ different$ genomic$ regions$ In$

addition$ windows$ can$ be$ chosen$ nonBoverlapping$ or$ with$ different$ degree$ of$

overlap$Finally$when$stepping$ from$significant$SNPs$or$windows$ to$candidate$

genes$the$size$of$the$flanking$region$from$which$picking$up$candidate$genes$is$to$

be$decided$as$well$Sometimes$the$gene$closer$to$the$signal$peak$is$selected$ in$

other$cases$all$genes$within$certain$boundaries$are$included$in$further$analyses$

All$ these$ choices$ may$ generate$ misleading$ results$ For$ example$ not$ truly$

associated$genes$may$be$retained$for$further$analyses$disregarding$the$correct$

ones$ In$addition$ a$ large$number$of$ false$positives$can$be$ included$ to$be$ later$

interpreted$with$ the$help$of$network$and$pathway$analysis$As$ a$ consequence$

these$decisions$have$a$direct$impact$in$terms$of$production$and$interpretation$of$

66

$

results$leading$to$conclusions$heavily$based$on$the$enrichment$analysis$and$preB

existing$knowledge$$

With$ the$ gene$ based$ association$ we$ also$ noticed$ some$ advantages$ in$ the$

downstream$data$mining$First$genes$are$functional$units$and$the$golden$targets$

of$ the$ ldquoclassical$ postBGWASrdquo$ bioinformatics$ pipelines$ Hence$ the$ gene$ based$

association$leads$to$genes$with$an$embedded$statistically$robust$framework$This$

is$ a$ ldquoplusrdquo$ in$ studying$ complex$ traits$ because$ one$ can$ concentrate$ on$ specific$

signals$ rather$ than$ cherry$ picking$meaningful$ features$ from$ a$ list$ constructed$

with$the$previously$cited$approaches$$

Moreover$it$is$easier$to$track$back$gene$clusters$to$major$gene$effects$B$like$for$

example$DGAT1$in$a$gene$rich$genome$chunk$many$genes$may$result$significant$

due$to$the$same$SNP$(Table$S2$and$Table$3)$With$the$VEGAS$method$finding$the$

ldquotruerdquo$association$is$facilitated$by$following$the$ldquobest$SNPrdquo$in$the$region$

In$addition$we$found$phenotype$coherent$signals$in$such$situation$where$only$a$

few$SNPs$B$ if$none$ndash$are$above$an$acceptable$statistical$genomeBwide$threshold$

using$a$single$SNP$regression$method$

The$gene$centric$approach$ is$a$ first$and$slight$ improvement$ in$ livestock$GWAS$

analyses$since$we$still$have$a$partial$genome$annotation$ In$addition$ it$ is$now$

known$that$ the$definition$of$gene$ is$ to$be$revised$many$genes$do$not$code$ for$

proteins$ have$ alternative$ starting$ sites$ have$ antisense$ transcripts$ host$

regulatory$ RNA$ are$ subjected$ to$ alternative$ splicing$ and$may$ be$ regulated$ by$

sequences$quite$ far$ away$ from$ their$ coding$ regions$ (Consortium$2012)$Other$

limitations$are$the$need$for$larger$number$of$animals$in$these$type$of$studies$as$

in$this$study$and$the$still$relatively$high$cost$of$sequencing$that$cannot$be$used$

routinely$in$large$experiments$at$least$not$yet$$

$

$

Conclusions

This$study$underlines$the$importance$of$the$implementation$of$new$methods$to$

analyze$complex$traits$with$genomic$data$No$golden$path(s)$is$discovered$yet$to$

walk$from$variation$to$functional$units$The$geneBcentric$analysis$may$contribute$

67

$

to$ reveal$ new$ signals$ throughout$ the$ bovine$ genome$ condensing$ weak$

association$signals$in$chromosome$bits$having$functional$significance$

$

$

Acknowledgements

The$Authors$would$ like$ to$ thank$Dr$ Jimmy$ Liu$ for$ his$ advices$ in$ building$ the$

software$AIA$ANAFI$ANARB$and$ANAPRI$breedersrsquo$associations$ for$ their$help$

and$ support$during$ the$data$analysis$ SelMol$ and$ProZoo$projects$ that$provide$

samples$ The$ Authors$ are$ also$ grateful$ to$MIPAAF$ and$ Cariplo$ Foundation$ for$

funding$

$

$

$

References

Akula$ N$ Baranova$ A$ Seto$ D$ Solka$ J$ Nalls$MA$ Singleton$ A$ Ferrucci$ L$

Tanaka$T$Bandinelli$ S$Cho$YS$Kim$YJ$Lee$ JBY$Han$BBG$Bipolar$

Disorder$ Genome$ Study$ (BiGS)$ Consortium$ The$ Wellcome$ Trust$ CaseB

Control$Consortium$McMahon$FJ$2011$A$NetworkBBased$Approach$to$

Prioritize$ Results$ from$ GenomeBWide$ Association$ Studies$ PLoS$ ONE$ 6$

e24220$doi101371journalpone0024220$

Amin$N$ van$Duijn$ CM$ Aulchenko$ YS$ 2007$ A$Genomic$Background$Based$

Method$ for$ Association$ Analysis$ in$ Related$ Individuals$ PLoS$ ONE$ 2$

e1274$doi101371journalpone0001274$

Aulchenko$YS$de$Koning$DBJ$Haley$C$2007$Genomewide$Rapid$Association$

Using$ Mixed$ Model$ and$ Regression$ A$ Fast$ and$ Simple$ Method$ For$

Genomewide$PedigreeBBased$Quantitative$Trait$Loci$Association$Analysis$

Genetics$177$577ndash585$doi101534genetics107075614$

Blanco$ EJ$ CarreteroBHernaacutendez$ M$ GarciacuteaBBarrado$ J$ IglesiasBOsma$ MC$

Carretero$M$Herrero$JJ$Rubio$M$Riesco$JM$Carretero$J$2013$The$

activity$and$proliferation$of$pituitary$prolactinBpositive$cells$and$pituitary$

68

$

VIPBpositive$ cells$ are$ regulated$by$ interleukin$6$Histol$Histopathol$ 28$

1595ndash1604$

Blott$S$Kim$JBJ$Moisio$S$SchmidtBKuntzel$A$Cornet$A$Berzi$P$Cambisano$

N$Ford$C$Grisart$B$Johnson$D$Karim$L$Simon$P$Snell$R$Spelman$

R$ Wong$ J$ Vilkki$ J$ Georges$ M$ Farnir$ F$ Coppieters$ W$ 2003$

Molecular$ dissection$ of$ a$ quantitative$ trait$ locus$ a$ phenylalanineBtoB

tyrosine$substitution$in$the$transmembrane$domain$of$the$bovine$growth$

hormone$ receptor$ is$ associated$ with$ a$ major$ effect$ on$ milk$ yield$ and$

composition$Genetics$163$253ndash266$

Burton$PR$Clayton$DG$Cardon$LR$Craddock$N$Deloukas$P$Duncanson$A$

Kwiatkowski$ DP$ McCarthy$ MI$ Ouwehand$ WH$ Samani$ NJ$ Todd$

JA$ Donnelly$ P$ Barrett$ JC$ Burton$ PR$ Davison$ D$ Donnelly$ P$

Easton$ D$ Evans$ D$ Leung$ HBT$ Marchini$ JL$ Morris$ AP$ Spencer$

CCA$Tobin$MD$Cardon$LR$Clayton$DG$Attwood$AP$Boorman$JP$

Cant$B$Everson$U$Hussey$JM$Jolley$JD$Knight$AS$Koch$K$Meech$

E$ Nutland$ S$ Prowse$ CV$ Stevens$ HE$ Taylor$ NC$ Walters$ GR$

Walker$NM$Watkins$NA$Winzer$T$Todd$JA$Ouwehand$WH$Jones$

RW$ McArdle$ WL$ Ring$ SM$ Strachan$ DP$ Pembrey$ M$ Breen$ G$

Clair$ DS$ Caesar$ S$ GordonBSmith$ K$ Jones$ L$ Fraser$ C$ Green$ EK$

Grozeva$D$Hamshere$ML$Holmans$PA$Jones$IR$Kirov$G$Moskvina$

V$ Nikolov$ I$ OrsquoDonovan$ MC$ Owen$ MJ$ Craddock$ N$ Collier$ DA$

Elkin$A$Farmer$A$Williamson$R$McGuffin$P$Young$AH$Ferrier$IN$

Ball$SG$Balmforth$AJ$Barrett$JH$Bishop$DT$Iles$MM$Maqbool$A$

Yuldasheva$N$Hall$AS$Braund$PS$Burton$PR$Dixon$RJ$Mangino$

M$ Stevens$ S$ Tobin$ MD$ Thompson$ JR$ Samani$ NJ$ Bredin$ F$

Tremelling$ M$ Parkes$ M$ Drummond$ H$ Lees$ CW$ Nimmo$ ER$

Satsangi$ J$Fisher$SA$Forbes$A$Lewis$CM$Onnie$CM$Prescott$NJ$

Sanderson$J$Mathew$CG$Barbour$J$Mohiuddin$MK$Todhunter$CE$

Mansfield$JC$Ahmad$T$Cummings$FR$Jewell$DP$Webster$J$Brown$

MJ$ Clayton$DG$ Lathrop$GM$ Connell$ J$Dominiczak$A$ Samani$NJ$

Marcano$CAB$Burke$B$Dobson$R$Gungadoo$J$Lee$KL$Munroe$PB$

Newhouse$SJ$Onipinla$A$Wallace$C$Xue$M$Caulfield$M$Farrall$M$

Barton$A$ (braggs)$TB$ in$RG$and$G$Bruce$ IN$Donovan$H$Eyre$S$

69

$

Gilbert$ PD$ Hider$ SL$ Hinks$ AM$ John$ SL$ Potter$ C$ Silman$ AJ$Symmons$ DPM$ Thomson$ W$ Worthington$ J$ Clayton$ DG$ Dunger$DB$ Nutland$ S$ Stevens$ HE$ Walker$ NM$ Widmer$ B$ Todd$ JA$Frayling$TM$Freathy$RM$Lango$H$Perry$JRB$Shields$BM$Weedon$MN$ Hattersley$ AT$ Hitman$ GA$ Walker$ M$ Elliott$ KS$ Groves$ CJ$Lindgren$ CM$ Rayner$ NW$ Timpson$ NJ$ Zeggini$ E$ McCarthy$ MI$Newport$M$Sirugo$G$Lyons$E$Vannberg$F$Hill$AVS$Bradbury$LA$Farrar$ C$ Pointon$ JJ$ Wordsworth$ P$ Brown$ MA$ Franklyn$ JA$Heward$JM$Simmonds$MJ$Gough$SCL$Seal$S$(uk)$BCSC$Stratton$MR$Rahman$N$Ban$M$Goris$A$Sawcer$SJ$Compston$A$Conway$D$Jallow$ M$ Newport$ M$ Sirugo$ G$ Rockett$ KA$ Kwiatkowski$ DP$Bumpstead$ SJ$ Chaney$A$Downes$K$ Ghori$MJR$ Gwilliam$R$Hunt$SE$Inouye$M$Keniry$A$King$E$McGinnis$R$Potter$S$Ravindrarajah$R$ Whittaker$ P$ Widden$ C$ Withers$ D$ Deloukas$ P$ Leung$ HBT$Nutland$S$Stevens$HE$Walker$NM$Todd$JA$Easton$D$Clayton$DG$Burton$PR$Tobin$MD$Barrett$JC$Evans$D$Morris$AP$Cardon$LR$Cardin$NJ$Davison$D$Ferreira$T$PereiraBGale$ J$Hallgrimsdoacutettir$ IB$Howie$ BN$Marchini$ JL$ Spencer$ CCA$ Su$ Z$ Teo$ YY$ Vukcevic$ D$Donnelly$P$Bentley$D$Brown$MA$Cardon$LR$Caulfield$M$Clayton$DG$ Compston$ A$ Craddock$ N$ Deloukas$ P$ Donnelly$ P$ Farrall$ M$Gough$ SCL$ Hall$ AS$ Hattersley$ AT$ Hill$ AVS$ Kwiatkowski$ DP$Mathew$ CG$McCarthy$MI$ Ouwehand$WH$ Parkes$M$ Pembrey$M$Rahman$N$Samani$NJ$Stratton$MR$Todd$ JA$Worthington$ J$2007$GenomeBwide$ association$ study$ of$ 14000$ cases$ of$ seven$ common$diseases$ and$ 3000$ shared$ controls$ Nature$ 447$ 661ndash678$doi101038nature05911$

Cantor$ RM$ Lange$ K$ Sinsheimer$ JS$ 2010$ Prioritizing$ GWAS$ Results$ A$Review$ of$ Statistical$ Methods$ and$ Recommendations$ for$ Their$Application$Am$J$Hum$Genet$86$6ndash22$doi101016jajhg200911017$

Clark$ AG$ Hubisz$ MJ$ Bustamante$ CD$ Williamson$ SH$ Nielsen$ R$ 2005$Ascertainment$ bias$ in$ studies$ of$ human$ genomeBwide$ polymorphism$Genome$Res$15$1496ndash1502$doi101101gr4107905$

70

$

CohenBZinder$M$ Seroussi$ E$ Larkin$ DM$ Loor$ JJ$Wind$ AE$ der$ Lee$ JBH$Drackley$ JK$Band$MR$Hernandez$AG$Shani$M$Lewin$HA$Weller$JI$ Ron$ M$ 2005$ Identification$ of$ a$ missense$ mutation$ in$ the$ bovine$ABCG2$gene$with$a$major$effect$on$ the$QTL$on$ chromosome$6$affecting$milk$yield$and$composition$in$Holstein$cattle$Genome$Res$15$936ndash944$doi101101gr3806705$

Cole$JB$Wiggans$GR$Ma$L$Sonstegard$TS$Lawlor$TJ$Crooker$BA$Tassell$CPV$ Yang$ J$ Wang$ S$ Matukumalli$ LK$ Da$ Y$ 2011$ GenomeBwide$association$ analysis$ of$ thirty$ one$ production$ health$ reproduction$ and$body$ conformation$ traits$ in$ contemporary$ US$ Holstein$ cows$ BMC$Genomics$12$408$doi1011861471B2164B12B408$

Colitti$M$Wilde$CJ$Stefanon$B$2004$Functional$expression$of$bclB2$protein$family$and$AIF$in$bovine$mammary$tissue$in$early$lactation$J$Dairy$Res$71$20ndash27$

Consortium$ TEP$ 2012$ An$ integrated$ encyclopedia$ of$ DNA$ elements$ in$ the$human$genome$Nature$489$57ndash74$doi101038nature11247$

De$Mello$F$Cobuci$JA$Martins$MF$Silva$MVGB$Neto$JB$2012$Association$of$the$polymorphism$g8514CgtT$in$the$osteopontin$gene$(SPP1)$with$milk$yield$ in$ the$ dairy$ cattle$ breed$ Girolando$ Anim$ Genet$ 43$ 647ndash648$doi101111j1365B2052201102312x$

Dudbridge$ F$ Gusnanto$ A$ 2008$ Estimation$ of$ significance$ thresholds$ for$genomewide$ association$ scans$ Genet$ Epidemiol$ 32$ 227ndash234$doi101002gepi20297$

Elsik$ CG$ Tellam$ RL$ Worley$ KC$ 2009$ The$ Genome$ Sequence$ of$ Taurine$Cattle$ A$window$ to$ ruminant$ biology$ and$ evolution$ Science$ 324$ 522ndash528$doi101126science1169588$

Fazioli$ F$ Minichiello$ L$ Matoska$ V$ Castagnino$ P$ Miki$ T$ Wong$ WT$ Di$Fiore$ PP$ 1993$ Eps8$ a$ substrate$ for$ the$ epidermal$ growth$ factor$receptor$kinase$enhances$EGFBdependent$mitogenic$signals$EMBO$J$12$3799ndash3808$

Fontanesi$ L$ Calograve$ DG$ Galimberti$ G$ Negrini$ R$ Marino$ R$ Nardone$ A$AjmoneBMarsan$ P$ Russo$ V$ 2014$ A$ candidate$ gene$ association$ study$

71

$

for$ nine$ economically$ important$ traits$ in$ Italian$ Holstein$ cattle$ Anim$

Genet$nandashna$doi101111age12164$

Fowler$KJ$Walker$F$Alexander$W$Hibbs$ML$Nice$EC$Bohmer$RM$Mann$

GB$ Thumwood$ C$ Maglitto$ R$ Danks$ JA$ 1995$ A$ mutation$ in$ the$

epidermal$growth$factor$receptor$in$wavedB2$mice$has$a$profound$effect$

on$ receptor$ biochemistry$ that$ results$ in$ impaired$ lactation$ Proc$ Natl$

Acad$Sci$92$1465ndash1469$doi101073pnas9251465$

Gianola$ D$ Hospital$ F$ Verrier$ E$ 2013$ Contribution$ of$ an$ additive$ locus$ to$

genetic$variance$when$inheritance$is$multiBfactorial$with$implications$on$

interpretation$ of$ GWAS$ Theor$ Appl$ Genet$ 126$ 1457ndash1472$

doi101007s00122B013B2064B2$

Gillette$ M$ Bray$ K$ Blumenthaler$ A$ VargoBGogola$ T$ 2013$ P190B$ RhoGAP$

Overexpression$ in$ the$ Developing$Mammary$ Epithelium$ Induces$ TGFβB

dependent$ Fibroblast$ Activation$ PLoS$ ONE$ 8$ e65105$

doi101371journalpone0065105$

Grisart$B$Coppieters$W$Farnir$F$Karim$L$Ford$C$Berzi$P$Cambisano$N$

Mni$ M$ Reid$ S$ Simon$ P$ Spelman$ R$ Georges$ M$ Snell$ R$ 2002$

Positional$Candidate$Cloning$of$a$QTL$ in$Dairy$Cattle$ Identification$of$a$

Missense$Mutation$ in$the$Bovine$DGAT1$Gene$with$Major$Effect$on$Milk$

Yield$ and$ Composition$ Genome$ Res$ 12$ 222ndash231$

doi101101gr224202$

Hubbard$NE$Chen$QJ$Sickafoose$LK$Wood$MB$Gregg$ JP$Abrahamsson$

NM$ Engelberg$ JA$ Walls$ JE$ Borowsky$ AD$ 2013$ Transgenic$

Mammary$Epithelial$Osteopontin$(Spp1)$Expression$Induces$Proliferation$

and$ Alveologenesis$ Genes$ Cancer$ 4$ 201ndash212$

doi1011771947601913496813$

Hu$ ZBL$ Park$ CA$ Wu$ XBL$ Reecy$ JM$ 2013$ Animal$ QTLdb$ an$ improved$

database$tool$for$livestock$animal$QTLassociation$data$dissemination$in$

the$ postBgenome$ era$ Nucleic$ Acids$ Res$ 41$ D871ndashD879$

doi101093nargks1150$

Jeon$et$al$ B$2008$ B$ProtoBoncogene$FBIB1$ (PokemonZBTB7A)$Represses$Trpdf$

nd$

72

$

Jiang$ L$ Liu$ J$ Sun$ D$ Ma$ P$ Ding$ X$ Yu$ Y$ Zhang$ Q$ 2010$ Genome$Wide$

Association$ Studies$ for$ Milk$ Production$ Traits$ in$ Chinese$ Holstein$

Population$PLoS$ONE$5$e13661$doi101371journalpone0013661$

Kong$ PBZ$ Yang$ F$ Li$ L$ Li$ XBQ$ Feng$ YBM$ 2013$Decreased$FOXF2$mRNA$

Expression$ Indicates$ EarlyBOnset$ Metastasis$ and$ Poor$ Prognosis$ for$

Breast$ Cancer$ Patients$ with$ Histological$ Grade$ II$ Tumor$ PLoS$ ONE$ 8$

e61591$doi101371journalpone0061591$

Lee$SBU$Maeda$T$2012$POKZBTB$proteins$an$emerging$ family$of$proteins$

that$ regulate$ lymphoid$ development$ and$ function$ Immunol$ Rev$ 247$

107ndash119$

Lemay$ DG$ Lynn$ DJ$ Martin$ WF$ Neville$ MC$ Casey$ TM$ Rincon$ G$

Kriventseva$ EV$ Barris$ WC$ Hinrichs$ AS$ Molenaar$ AJ$ 2009$ The$

bovine$lactation$genome$ insights$ into$the$evolution$of$mammalian$milk$

Genome$Biol$10$R43$

Liu$JZ$Mcrae$AF$Nyholt$DR$Medland$SE$Wray$NR$Brown$KM$2010$A$

versatile$ geneBbased$ test$ for$ genomeBwide$ association$ studies$ Am$ J$

Hum$Genet$87$139$

Liu$S$Dontu$G$Mantle$ID$Patel$S$Ahn$N$Jackson$KW$Suri$P$Wicha$MS$

2006$Hedgehog$Signaling$and$BmiB1$Regulate$SelfBrenewal$of$Normal$and$

Malignant$ Human$ Mammary$ Stem$ Cells$ Cancer$ Res$ 66$ 6063ndash6071$

doi1011580008B5472CANB06B0054$

Lu$ J$ Holmgren$ A$ 2014$ The$ thioredoxin$ superfamily$ in$ oxidative$ protein$

folding$ Antioxid$ Redox$ Signal$ 140202091634000$

doi101089ars20145849$

Luna$ A$ Nicodemus$ KK$ 2007$ snpplotter$ an$ RBbased$ SNPhaplotype$

association$and$linkage$disequilibrium$plotting$package$Bioinforma$Oxf$

Engl$23$774ndash776$doi101093bioinformaticsbtl657$

McCarthy$ MI$ Abecasis$ GR$ Cardon$ LR$ Goldstein$ DB$ Little$ J$ Ioannidis$

JPA$ Hirschhorn$ JN$ 2008$ GenomeBwide$ association$ studies$ for$

complex$traits$consensus$uncertainty$and$challenges$Nat$Rev$Genet$9$

356ndash369$doi101038nrg2344$

Meredith$ BK$ Kearney$ FJ$ Finlay$ EK$ Bradley$ DG$ Fahey$ AG$ Berry$ DP$

Lynn$ DJ$ 2012$ GenomeBwide$ associations$ for$ milk$ production$ and$

73

$

somatic$cell$score$in$HolsteinBFriesian$cattle$in$Ireland$BMC$Genet$13$21$doi1011861471B2156B13B21$

Minozzi$ G$Nicolazzi$ EL$ Stella$ A$ Biffani$ S$Negrini$ R$ Lazzari$ B$ AjmoneBMarsan$ P$Williams$ JL$ 2013$ Genome$Wide$ Analysis$ of$ Fertility$ and$Production$ Traits$ in$ Italian$ Holstein$ Cattle$ PLoS$ ONE$ 8$ e80219$doi101371journalpone0080219$

Nicolazzi$EL$Picciolini$M$Strozzi$F$Schnabel$RD$Lawley$C$Pirani$A$Brew$F$ Stella$ A$ 2014$ SNPchiMp$ a$ database$ to$ disentangle$ the$ SNPchip$jungle$ in$ bovine$ livestock$ BMC$ Genomics$ 15$ 123$ doi1011861471B2164B15B123$

Purcell$ S$ Neale$ B$ ToddBBrown$ K$ Thomas$ L$ Ferreira$ MAR$ Bender$ D$Maller$J$Sklar$P$de$Bakker$PIW$Daly$MJ$Sham$PC$2007$PLINK$a$tool$ set$ for$ wholeBgenome$ association$ and$ populationBbased$ linkage$analyses$Am$J$Hum$Genet$81$559ndash575$doi101086519795$

Qiu$ S$ Lai$ L$ 2013$ Antibacterial$ properties$ of$ recombinant$ human$ nonBpancreatic$secretory$phospholipase$A2$Biochem$Biophys$Res$Commun$441$453ndash456$doi101016jbbrc201310092$

Quinlan$AR$Hall$IM$2010$BEDTools$a$flexible$suite$of$utilities$for$comparing$genomic$ features$ Bioinformatics$ 26$ 841ndash842$doi101093bioinformaticsbtq033$

RamiacuterezBTorres$ A$ BarceloacuteBBatllori$ S$ MartiacutenezBBeamonte$ R$ Navarro$ MA$Surra$ JC$ Arnal$ C$ Guilleacuten$N$ Aciacuten$ S$ Osada$ J$ 2012$ Proteomics$ and$gene$ expression$ analyses$ of$ squaleneBsupplemented$ mice$ identify$microsomal$thioredoxin$domainBcontaining$protein$5$changes$associated$with$ hepatic$ steatosis$ J$ Proteomics$ 77$ 27ndash39$doi101016jjprot201207001$

Raven$ LBA$ Cocks$ BG$ Pryce$ JE$ Cottrell$ JJ$ Hayes$ BJ$ 2013$ Genes$ of$ the$RNASE5$pathway$ contain$ SNP$ associated$with$milk$ production$ traits$ in$dairy$cattle$Genet$Sel$Evol$45$25$doi1011861297B9686B45B25$

Scotti$E$Fontanesi$L$Schiavini$F$La$Mattina$V$Bagnato$A$Russo$V$2010$DGAT1$ pK232A$ polymorphism$ in$ dairy$ and$ dual$ purpose$ Italian$ cattle$breeds$Ital$J$Anim$Sci$9$doi104081ijas2010e16$

74

$

Stefanon$ B$ Colitti$ M$ Gabai$ G$ Knight$ CH$ Wilde$ CJ$ 2002$ Mammary$

apoptosis$and$lactation$persistency$in$dairy$animals$J$Dairy$Res$69$37ndash

52$

Stegh$ AH$ DePinho$ RA$ 2011$ Beyond$ effector$ caspase$ inhibition$ Bcl2L12$

neutralizes$ p53$ signaling$ in$ glioblastoma$ Cell$ Cycle$ 10$ 33ndash38$

doi104161cc10114365$

Taberlet$ P$ Valentini$ A$ Rezaei$ HR$ Naderi$ S$ Pompanon$ F$ Negrini$ R$

AjmoneBMarsan$ P$ 2008$ Are$ cattle$ sheep$ and$ goats$ endangered$

species$Mol$Ecol$17$275ndash284$doi101111j1365B294X200703475x$

Team$RC$ 2014$ R$ A$ Language$ and$Environment$ for$ Statistical$ Computing$ R$

Foundation$for$Statistical$Computing$Vienna$Austria$

Thomadaki$H$ Talieri$M$ Scorilas$ A$ 2007$ Prognostic$ value$ of$ the$ apoptosis$

related$genes$BCL2$and$BCL2L12$in$breast$cancer$Cancer$Lett$247$48ndash

55$doi101016jcanlet200603016$

Valdehita$ A$ Bajo$ AM$ FernaacutendezBMartiacutenez$ AB$ Arenas$ MI$ Vacas$ E$

Valenzuela$ P$ RuiacutezBVillaespesa$ A$ Prieto$ JC$ Carmena$ MJ$ 2010$

Nuclear$ localization$ of$ vasoactive$ intestinal$ peptide$ (VIP)$ receptors$ in$

human$ breast$ cancer$ Peptides$ 31$ 2035ndash2045$

doi101016jpeptides201007024$

Wang$T$Liu$NS$Seet$LBF$Hong$W$2010$The$Emerging$Role$of$VHS$DomainB

Containing$Tom1$Tom1L1$and$Tom1L2$in$Membrane$Trafficking$Traffic$

11$1119ndash1128$doi101111j1600B0854201001098x$

Wang$ X$Wurmser$ C$ Pausch$ H$ Jung$ S$ Reinhardt$ F$ Tetens$ J$ Thaller$ G$

Fries$R$2012$Identification$and$Dissection$of$Four$Major$QTL$Affecting$

Milk$Fat$Content$in$the$German$HolsteinBFriesian$Population$PLoS$ONE$7$

e40711$doi101371journalpone0040711$

Xu$S$2003$Theoretical$Basis$of$the$Beavis$Effect$Genetics$165$2259ndash2268$

Zimin$AV$Delcher$AL$Florea$L$Kelley$DR$Schatz$MC$Puiu$D$Hanrahan$

F$ Pertea$ G$ Tassell$ CPV$ Sonstegard$ TS$ Marccedilais$ G$ Roberts$ M$

Subramanian$ P$ Yorke$ JA$ Salzberg$ SL$ 2009$ A$ wholeBgenome$

assembly$ of$ the$ domestic$ cow$ Bos$ taurus$ Genome$ Biol$ 10$ R42$

doi101186gbB2009B10B4Br42$

75

$

Zu$X$Ma$J$Liu$H$Liu$F$Tan$C$Yu$L$Wang$J$Xie$Z$Cao$D$Jiang$Y$2011$ProBoncogene$ Pokemon$ promotes$ breast$ cancer$ progression$ by$upregulating$ survivin$ expression$ Breast$ Cancer$ Res$ 13$ R26$doi101186bcr2843$ $

76

$

Supplementaryfiles$

You$can$find$Chapter$͵$supplementary$files$at$

- DocTa$(Doctoral$Thesis$Archive)$(httptesionlineunicattit)$

- Google$Drive$(httptinyurlcomPhDMMChapter03)$or$scan$the$QR$code$

$

$

SupplementaryFigureS1Multidimensionalscaling

Spatial$distribution$of$genetic$structure$in$the$analyzed$breeds$$

$

Supplementary$FigureS2GWASplot

Manhattan$plots$and$QQBplots$respectively$for$Fat$percentage$Milk$yield$and$

Protein$percentage$in$the$Italian$Brown$

$

Supplementary$FigureS3GWASplot

Manhattan$plots$and$QQBplots$respectively$for$Fat$percentage$Milk$yield$and$

Protein$percentage$in$the$Italian$Holstein$

$

Supplementary$FigureS4GWASplot

Manhattan$plots$and$QQBplots$respectively$for$Fat$percentage$Milk$yield$and$

Protein$percentage$in$the$HolsteinDGAT$

$

Supplementary$FigureS5GWASplot

Manhattan$plots$and$QQBplots$respectively$for$Fat$percentage$Milk$yield$and$

Protein$percentage$in$the$Italian$Simmental$

77

$

$Supplementary$FigureS6LDdecayGenomeBwide$LD$decay$for$the$three$breeds$(Holstein$Simmental$Brown)$$Supplementary$FigureS7DescriptiveplotsDescriptive$statistics$for$the$Italian$Brown$Italian$Holstein$and$Italian$Simmental$$Supplementary$FigureS8DGAT1LDFocus$on$the$LD$of$the$DGAT1$region$in$all$the$datasets$(Brown$Holstein$HolsteinDGAT$Simmental)$$Supplementary$TableS1SNPfactNumber$of$subjects$and$SNPs$after$QC$$Supplementary$TableS2GeneBasedAssociationresultsFull$results$of$the$significant$genes$associated$with$the$traits$in$all$the$datasets$$ Sheet$1$Brown$ndash$fat$percentage$$ Sheet$2$Brown$ndash$milk$yield$$ Sheet$3$Brown$ndash$protein$percentage$$ Sheet$4$Holstein$ndash$fat$percentage$$ Sheet$5$Holstein$ndash$milk$yield$$ Sheet$6$Holstein$ndash$protein$percentage$$ Sheet$7$HolsteinDGAT$ndash$fat$percentage$$ Sheet$8$HolsteinDGAT$ndash$milk$yield$$ Sheet$9$HolsteinDGAT$ndash$protein$percentage$$ Sheet$10$Simmental$ndash$fat$percentage$$ Sheet$11$Simmental$ndash$milk$yield$$ Sheet$12$Simmental$ndash$protein$percentage$$Supplementary$TableS3SingleSNPAssociationresultsFull$results$of$the$SNP$that$overcome$the$suggestive$threshold$$$ Sheet$1$Brown$ndash$fat$percentage$

78

$

$ Sheet$2$Brown$ndash$milk$yield$$ Sheet$3$Brown$ndash$protein$percentage$$ Sheet$4$Holstein$ndash$fat$percentage$$ Sheet$5$Holstein$ndash$milk$yield$$ Sheet$6$Holstein$ndash$protein$percentage$$ Sheet$7$HolsteinDGAT$ndash$fat$percentage$$ Sheet$8$HolsteinDGAT$ndash$milk$yield$$ Sheet$9$HolsteinDGAT$ndash$protein$percentage$$ Sheet$10$Simmental$ndash$fat$percentage$$ Sheet$11$Simmental$ndash$milk$yield$$ Sheet$12$Simmental$ndash$protein$percentage$$Supplementary$TableS4GenesandtraitsGeneWise$pBvalues$of$the$discussed$genes$in$all$the$analysis$In$grey$are$evidenced$the$pBvalues$that$overcome$the$significance$threshold$Genes$are$alphabetically$listed$$Supplementary$TableS5GenesinQTLsGenes$from$the$geneBbased$association$mapping$into$known$QTLs$$$$$

79

CHAPTER(4(

MUGBAS(a(species(free(gene(based(association(suite(

S(Capomaccio1(M(Milanesi1(L(Bomba1(et(al(1IstitutodiZootecnicaUniversitagraveCattolicadelSacroCuoreviaEmiliaParmense84Piacenza29122ItalyTheseauthorscontributedequallytothiswork

____________________ApplicationnotesubmittedtoBioinformatics

Abstract(Theassociationstudiesbetweengenomewidemarkersandphenotypes(GWAS)

arenowwidelyusedinmodelandnonmodelspeciesHowevertoolsthatmerge

singlemarkerresultsandtheirprobabilityofbeingassociatedtofunctionalunits

(typicallygenes)areseldomdevelopedforspeciesotherthanhuman

Herewe presentMUGBAS a suite that calculates a generegionbased pvalue

fromagivenannotationsinglemarkerGWAresultandgenotypedatawiththe

objective to ease postGWAS analysis The software is species and annotation

independentfasthighlyparallelizedandreadyforhighdensitymarkerstudies

Availabilityandimplementationhttpsbitbucketorgcapemastermugbas

Introduction(With the availability of highdensity SNP panels Genome Wide Association

Studies (GWAS) have become the gold standard for dissecting the biology of

complextraits(McCarthyetal2008)Thisistrueforhumansandotherspecies

TheunderlyingideaofGWASissimplefindmarkertraitassociationsexploiting

the linkage disequilibrium (LD) that exists between the causative mutation ndash

whichweignorendashandoneormoremarkersofknownchromosomalposition

Whileanumberofwellestablishedapproachescanbeeasilyusedinastandard

GWAS (Aulchenko et al 2007 Kang et al 2010 Purcell et al 2007) to find

marker(s) associated to a particular phenotype several issues have to be

accountedforduringtheupanddownstreamanalyses

Firstly genome wide studies have to take into account multiple testing If

stringent statistical thresholds are applied a proportion ofmoderate or small

effectsactuallyassociatedwiththetraitwillbedisregardedThereforereducing

the number of tests while maintaining all the information provided by high

densitypanelswouldbeaclearadvantageInadditiononceasignificantsignalis

detected different strategiesmaybeused to tag the candidate causative gene

Sometimesthegeneclosertothepeaksignalisselectedinothercasesallgenes

withinaflankingregionofanarbitrarysizeofthesignificantSNP(s)areincluded

82

infurtheranalysesThesechoicesmayendupinbiasresultseitherretaininga

largenumberofnottrulyassociatedgenesordisregardingthecorrectones

Somemethodshaverecentlybeenproposedtoovercometheselimitationsusing

aprioriknowledgeastheldquocandidatepathwayrdquoanalysis(Ravenetal2013)and

the ldquogenebasedrdquo association strategies (Akula et al 2011 Cantor et al 2010

Liuetal2010)TheseapproachesrestrictGWAstudiesonlytoknowngenesor

genesubsetsresizingthemultipletestingproblemandincreasingthepowerof

theanalyses

Usually bioinformatics pipelines are developed for the analysis of human and

modelorganismsandarenotadaptedtononmodelorganismThisisthecaseof

the VEGAS software (Liu et al 2010 Yang et al 2014) developed only for

humansandonlyforgenebasedanalyses

Here we introduce MUGBAS [MUlti species GeneBased Association Suite] a

softwarebasedonVEGASthatusesGWASdataandagivenannotation(typically

genesbutanyregionmaybesuitable)toestimategenebasedandregionbased

associationpvaluesThesuite is speciesandannotation free ready toanalyze

highdensitydatafromanyexperimentaldesigns

Methods(MUGBASisasuiteofscriptsbuiltinBashRPythonandPerlThesuiterequires

several prerequisites that can be easily fulfilled in a LinuxUnixMac

environmentwith a few command lines Further information on how tomeet

theserequirementsaredetailedintheonlinerepository

MUGBAS suite can be invoked launching the Bash wrapper that checks

dependenciesandstoresuserchoices

Required input files are MAPPED files in PLINK format with genotype data

from(atleast200individualsarerecommendedforaproperLDcalculation)an

annotation file inBED format (chromosome startingposition endingposition

nameandusercustomfields)ofthedesiredgenomefeatureandtheGWASresult

file with mandatory headers ldquoSNPNAMErdquo and ldquoPVALUErdquo Multiple association

resultscanbetestedatthesametime ifprovidedinseparatefiles inthesame

directoryAtrialdatasetisdownloadablealongwiththesoftwarereleasePlease

83

notethatforsimplicityweprovideageneannotationbutanyBEDfilewithany

desiredfeaturecanbeused

The program starts the analysis assigning SNPs to any feature of the given

annotationusingbedtools(QuinlanandHall2010)Thisstepiscustomizableby

user defined boundaries (up and downstream) in order to catch regulatory

regionsthatcouldbeinhighLDwiththecurrentgeneregion

Inordertospeeduptheanalysistheprogramsubsamples200individualsfrom

theinputdatasetWethenimplementedtwolevelsofparallelizationoneforthe

LDandonefortheldquogenewiserdquoorldquoregionwiserdquopvaluecalculationThelatteris

particularlyusefulanalyzingmorethanoneGWAresultsfromthesamedataset

BrieflyMUGBASfirstsplitstheLDcalculationinncores(asdefinedbytheuser)

using the foreach doParallel packages (Analytics andWeston 2014a 2014b)

and then distributes traits across cores using GNU Parallel and foreach

doParallel

SinceLDcalculationisaslowprocessVEGASbenefitsofHapMapphase2datato

speed up the processwhile preserving the option for a user defined (human)

population using PLINK In order to expand these analysis to any species LD

calculationinMUGBASisimplementedwithther2fast()functionfromGenABEL

package(Aulchenkoetal2007)whilethegenewisepvalueiscalculatedasin

theVEGASapproachBrieflythegenewiseteststatisticisthesumofthenupper

tailchisquaredvaluesofaSNPsubsetwithonedegreeoffreedom(wheren is

thenumberofSNPmappinginthegivengene)Withperfectlinkageequilibrium

the genewise pvalue would be the onetailed pvalue of a chisquare

distributionwithndegreesof freedomOtherwisemorelikelythedistribution

of the pvalues has to be evaluated (with simulations) andweighted (with LD

values)(Liuetal2010)

Once every generegion has its ldquogenewiserdquo pvalue a FDR qvalue (False

Discovery Rate) statistics is calculated with the base R function padjust()

Results are stored and enrichedwith ldquomanhattanrdquo and ldquoundergroundrdquo plots of

theldquogenewiserdquopvalueandtheqvaluerespectively

84

Results(MUGBAS output is generegionoriented with useful information about the

location of the analyzed feature the ldquoBest SNPrdquo within that feature with its

genomiccoordinatesandtheoriginalpvaluethegeneorregionwisestatistics

(pvalue number of simulations and qvalue) the custom annotation of that

particularentryandthepositionoftheSNPwithrespecttothefeatureandthe

decidedboundariesAll this informationareuseful to track theeffectofhighly

significant markers that could map into multiple region of interest and

effectively pinpoint the most probable location to focus on with data mining

processes

A typical analysis performed with a highdensity panel (gt600K markers)

distributed in 10 midperformance cores (Intel Xeon X5675 307 GHz) on

threetraitssimultaneouslytakesaboutfourhours

Discussion(Generegionbased methods are now widely used and accepted in genomic

research However non modelspecies suffer for the lack of availability of

specifictoolstoenhancediscoveryfromgenomewidedataMUGBASprovidesa

robust and accepted statistics to researchers dealing with species other than

humantoeasilyjumpfromassociatedmarkerstothedesiredfunctionalunits

Acknowledgements(((WewouldliketothankDrJimmyLiuforhisadvicesinbuildingthesuite

Reference(Akula N Baranova A Seto D Solka J NallsMA Singleton A Ferrucci L

TanakaTBandinelli SChoYSKimYJLee JYHanBGBipolar

Disorder Genome Study (BiGS) Consortium The Wellcome Trust Case

85

ControlConsortiumMcMahonFJ2011ANetworkBasedApproachtoPrioritize Results from GenomeWide Association Studies PLoS ONE 6e24220doi101371journalpone0024220

Analytics R Weston S 2014a doParallel Foreach parallel adaptor for theparallelpackage

AnalyticsRWestonS2014bforeachForeachloopingconstructforRAulchenkoYSdeKoningDJHaleyC2007GenomewideRapidAssociation

Using Mixed Model and Regression A Fast and Simple Method ForGenomewidePedigreeBasedQuantitativeTraitLociAssociationAnalysisGenetics177577ndash585doi101534genetics107075614

Cantor RM Lange K Sinsheimer JS 2010 Prioritizing GWAS Results AReview of Statistical Methods and Recommendations for TheirApplicationAmJHumGenet866ndash22doi101016jajhg200911017

KangHMSulJHServiceSKZaitlenNAKongSFreimerNBSabattiCEskin E 2010 Variance component model to account for samplestructure in genomewide association studies Nat Genet 42 348ndash354doi101038ng548

LiuJZMcraeAFNyholtDRMedlandSEWrayNRBrownKM2010Aversatile genebased test for genomewide association studies Am JHumGenet87139

McCarthy MI Abecasis GR Cardon LR Goldstein DB Little J IoannidisJPA Hirschhorn JN 2008 Genomewide association studies forcomplextraitsconsensusuncertaintyandchallengesNatRevGenet9356ndash369doi101038nrg2344

Purcell S Neale B ToddBrown K Thomas L Ferreira MAR Bender DMallerJSklarPdeBakkerPIWDalyMJShamPC2007PLINKatool set for wholegenome association and populationbased linkageanalysesAmJHumGenet81559ndash575doi101086519795

QuinlanARHallIM2010BEDToolsaflexiblesuiteofutilitiesforcomparinggenomic features Bioinformatics 26 841ndash842doi101093bioinformaticsbtq033

86

Raven LA Cocks BG Pryce JE Cottrell JJ Hayes BJ 2013 Genes of the

RNASE5pathway contain SNP associatedwithmilk production traits in

dairycattleGenetSelEvol4525doi101186129796864525

87

CHAPTER5

Imputationaccuracyisrobusttocattlereferencegenomeupdates

MMilanesi1etal1IstitutodiZootecnicaUniversitagraveCattolicadelSacroCuoreviaEmiliaParmense84Piacenza29122Italy

____________________ShortCommunicationacceptedinAnimalGenetics

SummaryGenotype imputation is routinely applied in a large number of cattle breeds Imputation

hasbecomeaneedduetothelargenumberofSNParrayswithvariabledensity(currently

from2900to777962SNPs)Althoughmanyauthorshavestudiedtheeffectofdifferent

statisticalmethodsonimputationaccuracytheimpactofa(likely)changeinthereference

genomeassemblyonimputationfromlowertohigherdensityhasnotbeendeterminedso

far In this work 1021 Italian Simmental SNP genotypes were reJmapped on the three

mostrecentreferencegenomeassembliesFour imputationmethodswereusedtoassess

theimpactofanupdateinthereferencegenomeAsexpectedthefourmethodsbehaved

differentlywith largedifferences in termsof accuracyUpdatingSNPcoordinateson the

three tested cattle reference genome assemblies determined only a slight variation on

imputationresultswithinmethod

KeywordsImputationreferencegenomeassemblySNPbovine

MaintextFromthepublicationof the firstbovinesinglenucleotidepolymorphism(SNP)array the

wholegenomicscenarioofthisspeciesevolvedrapidlyNowadaysseveralSNPpanelshave

been designed with a range of densities mapping on different reference genome

assemblies and produced by different manufacturers When running analyses with SNP

arrays of different densities or from different commercial companies (eg Illumina and

Affymetrix) an imputation step is often required The impact of different statistical

methods on imputation accuracy has already been assessed (Pei et al 2008) (Marchini

andHowie2010)(Maetal2013)Surprisinglyuptonowtherearenotavailablestudies

investigating the impact of an update on the reference genome assembly on imputation

accuracy This assessment would be of fundamental importance considering that reJ

mappingSNPsdeterminesarearrangementofthehaplotypestructureofthegenotypes(in

somecasesSNPsarereJmappedtoadifferentchromosome)andthatthebovinereference

assembly is likely toberepeatedlyupdatedover timeThisworkwasaimedatassessing

the effect of different reference genome assemblies in cattle As test casewe compared

four different imputation methods from low (Illumina BovineLDv10 Beadchip) to

90

mediumhigh (Illumina BovineSNP50v2 Beadchip) density in the Italian Simmental

populationForeachmethodweassessedthe impacton imputationaccuracyofmapping

SNPs on the BTAU42 BTAU46 (wwwhgscbcmeduotherJmammalsbovineJgenomeJ

project)orUMD31(Ziminetal2009)cattlereferencegenomeassemblies

Atotalof900and121SimmentalbullsweregenotypedwiththeIlluminaBovineSNP50v1

and v2 Beadchips (Illumina Inc San Diego CA) respectively Illumina BovineSNP50v2

Beadchip (hereinafter named 54k) the official reference chip for the Italian Simmental

breedersassociation(ANAPRI)wasconsideredasreferenceSNPpanelTheSNPchiMpv1

tool(Nicolazzietal2014)wasusedtoobtainSNPcoordinatesonBTAU42UMD31and

BTAU46genomeassembliesForeachassemblySNPsnotcontainedinthe54kunmapped

orinthesexualchromosomeswerediscardedThefinaldatasetcontained5118952886

and51290SNPsinBTAU42UMD31andBTAU46respectively

FourimputationmethodswereconsideredPedImpute(Nicolazzietal2013)FindHapv2

(VanRadenetal2011)FImpute(Sargolzaeietal2014)andBeaglev332(Browningand

Browning2007)Thefirstthreeusepedigreeandpopulation(ie linkagedisequilibrium

LD) informationwhereas the fourth uses only LD information Performance of the four

methods testedacrossgenomicassemblieswas comparedusingwrongly imputedalleles

(ldquoerrorsrdquo)andallelicsquaredcorrelation(ldquoallelicJR2rdquo)statisticsAllelicJR2wasdefined

asthesquaredcorrelationbetweenimputedandtrue(eggenotyped)allelesmultipliedby

100 as in Browning amp Browning (2009) The 878 oldest sires were used as training

population whereas the 143 youngest bulls were used as test population SNPs not in

commonwith the lowdensity panelwere set tomissing in the test population The test

population was split in four scenarios individuals with sire and maternal grandsire

genotypedinthereferencepopulation(scenarioAJ84animals)individualswithonlythe

sire genotyped (scenario B J 34 animals) individuals with only the maternal grandsire

genotyped(scenarioCJ13animals)ornoneofcloserelativesgenotyped(scenarioDJ12

animals)Thepopulationusedwashighlystructuredwithnearlyhalfofthebullspresent

in the dataset (490) sired by 115 genotyped sires In addition 35 out of these 115

genotypedsiresweresiredby22bulls(eggrandsires)alsopresentinthedataset

DifferencesacrossassembliesarenothomogeneousFromBTAU42toBTAU4646SNPs

weremapped to different chromosomes and the average difference in physical position

withinchromosomeswasnearly025MbpHowevertherewasnosubstantialdifferencein

the order of markers In fact not considering unmapped SNPs on either of the

aforementionedassembliesadifferenceinthesequentialorderwasobservedforonly134

91

single (ie spread throughout the genome) SNPs On the contrary from UMD31 to

BTAU46 222 SNPsweremapped to different chromosomes and amuch larger average

differenceinSNPsphysicalpositionwasobserved(iemorethan2Mbp)Alsothenumber

ofSNPswithdifferentsequentialorderwasmuchlarger(ie1893SNPs)Howevermore

than50oftherearrangementsinvolvedblocksfrom3tonearly100markersAsaresult

slight differences in overall imputation accuracy from BTA46 to UMD31 rather than

between the two BTAU assemblies were expected across methods On the contrary if

rearrangements are due to issues in the assembly of scaffolds theymight have a direct

(negative) impact on local imputation performances In fact a preliminary analysis on

BTA1consideringlocalerroracrossassembliesandmethodssuggestatleasttwosuch

SNPsclustersat44and138MbponUMD31(FiguresS1S2andS3)

Considering allmethods and scenarios amaximum average variation of 02 inerrors

(Table1)andof09inallelicJR2(Table2)wasobservedacrossreferenceassembliesThe

standarddeviation(SD)oftheimputationaccuracywasstableacrossreferenceassemblies

withinmethodsOnthecontraryalargervariabilitywasobservedacrossmethods

PedImpute and FindHap showed a relatively large average allelicJR2 variability across

scenarios evaluated over the same reference assembly Only PedImpute showed a large

variation in SD across scenarios in errors and allelicJR2 with high variability on

scenarios C and D showing a low performance of thismethodwhen there are no close

relatives(egsiredam)genotypedinthereferencedatasetFImputeinsteadachievedthe

highestaccuracyacrossallreferencegenomesandscenariossurprisinglyobtainingitsbest

resultsJerrorsof14(06SD)andallelicJR2of955(26SD)JinscenarioCwhereonly

onerelativelycloserelativeisgenotyped(maternalgrandJsire)FImputeappeartobeable

todetectshorterhaplotypesfromdistantrelativesmoreaccuratelythananyothermethod

tested in this study thus it is lessdependenton theavailabilityofpedigree information

thantheothertwopedigreeJbasedmethods(Sargolzaeietal2010)

The large variability across methods within assembly was expected although the

differences obtained among PedImpute FindHap and Beagle were larger in this study

comparedtoasimilarstudyinItalianHolstein(Nicolazzietal2013)Thiscoulddependon

a number of factors First the sample size of the test population is small (especially for

groups C and D where less than 15 individuals were present) Second the interaction

between population structure and imputationmethodsmay have a role PedImpute and

FindHap rely much more heavily on pedigree information than FImpute and Beagle

assumesall individualsareunrelatedHowever the lattertwomethods indirectlybenefit

92

greatly of from highly structured populations with large number of relatives ndash close or

distantndashpresent inthereferencepopulation(Sargolzaeietal2014)Anotherinteresting

factor that influences the imputation accuracy may be the persistence of LD at long

distances this is much lower in the Italian Simmental than in the Italian Holstein

population(FigureS4)Informationonrelatives(eitherpedigreeinformationorgenotyped

relativesinthereferencepopulation)isexpectedtoplayabiggerroleifLDisconservedat

shorter distances This hypothesis is supported by the fact that although there is high

variability acrossmethods better performances of the pedigreeJbasedmethods are still

observedinscenariosAandB(andCforFImpute)

Theslightdifferencesinimputationaccuracyobservedacrossreferenceassembliesmaybe

explained by the aforementioned small variation in the SNP blocks across reference

assemblies irrespectively of the rearrangements of single SNPs Considering that SNP

blocksaremostlymaintainedweexpectthatbreedswithlargerLDpersistenceatlonger

distances (ie Holstein Brown) will obtain even better imputation accuracy across

assembliescomparedtowhatreportedinthisstudyinItalianSimmentalHoweverastill

relativelyunexploredaspectof imputationperformance is thedistributionof imputation

errors across the genomeNot considering genotyping errors (which are expected to be

randomly distributed in the genome) a cluster of consecutive markers with high

percentage of imputation error across methods are most probably identifying missJ

assembledregions in thegenome(FiguresS1S2andS3)Althoughthe impactonoverall

imputation accuracy is limited as results in this study show these errorsmight have a

large impact on any analyses relying on specific genomic coordinates (eg genomeJwide

associationstudies)Morecomprehensiveresultsincludingannotationstudiesandwhole

genomesequencingdataarerequiredtoconfirmandinvestigatetheseobservations

Toconcludeonlyasmalleffectofupdatingthereferencegenomeassemblyonimputation

accuracywas observed in Italian Simmental On the other hand as already reported by

other studies the variability of imputation accuracy estimates is much larger across

imputation methods The choice of the most suitable imputation method based on the

breedgeneticbackgrounditscharacteristicsandontheavailabledataisthereforecritical

AcknowledgementsThe research leading to these results has received funding from the European Unions

Seventh Framework Programme (FP72007J2013) under grant agreement ndeg 289592 ndash

93

Gene2FarmPartofthegenotypicdatausedinthisstudywasprovidedbyprojectldquoSelMolrdquo

fundedbytheItalianMinistryofAgriculture(MiPAAF)MMwassupportedbytheDoctoral

SchoolontheAgroJFoodSystem(Agrisystem)of theUniversitagraveCattolicadelSacroCuore

(Italy) FB was supported by the Marie Curie European Reintegration Grant

ldquoNEUTRADAPTrdquo(FP7JPEOPLEJ2009JRG)

ReferencesBrowningSRBrowningBL2007RapidandAccurateHaplotypePhasingandMissingJ

Data Inference for WholeJGenome Association Studies By Use of Localized

HaplotypeClusteringAmJHumGenet811084ndash1097

MaPBroslashndumRFZhangQLundMSSuG2013Comparisonofdifferentmethods

forimputinggenomeJwidemarkergenotypesinSwedishandFinnishRedCattleJ

DairySci964666ndash4677doi103168jds2012J6316

Marchini J Howie B 2010 Genotype imputation for genomeJwide association studies

NatRevGenet11499ndash511doi101038nrg2796

NicolazziELBiffaniSJansenG2013ShortcommunicationImputinggenotypesusing

PedImputefastalgorithmcombiningpedigreeandpopulationinformationJournal

ofDairyScience962649ndash2653doi103168jds2012J6062

NicolazziELPiccioliniMStrozziFSchnabelRDLawleyCPiraniABrewFStella

A 2014 SNPchiMp a database to disentangle the SNPchip jungle in bovine

livestockBMCGenomics15123doi1011861471J2164J15J123

PeiYLiJZhangLPapasianCJDengH2008Analysesandcomparisonofaccuracyof

differentgenotypeimputationmethodsPLoSONE3551

VanRadenPMOrsquoConnellJRWiggansGRWeigelKA2011Genomicevaluationswith

manymoregenotypesGenetSelEvol43

94

Tableamp1ampIm

putationampaccuracyampacrossampassem

bliesampforampPedImputeampFindH

apampFImputeampandampBeagleamppercentageampofampwronglyampim

putedamp

allelesamp

PedImputeamp

FindHapamp

FImputeamp

Beagleamp

BTAU

amp42amp

UMDamp31amp

BTAU

amp46amp

BTAU

amp42amp

UMDamp31amp

BTAU

amp46amp

BTAU

amp42amp

UMDamp31amp

BTAU

amp46amp

BTAU

amp42amp

UMDamp31amp

BTAU

amp46amp

Scenarioamp

Namp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

A1amp

84

20

07

21

07

21

07

32

09

32

09

32

09

15

09

15

09

15

09

24

11

25

11

25

11

B2amp

34

23

08

23

09

22

08

36

11

36

12

36

11

15

07

15

09

14

06

23

08

23

08

23

08

C3amp

13

98

41

98

41

98

41

51

10

52

10

51

10

14

06

14

06

14

06

23

06

23

06

24

06

D4 amp

12

88

49

89

49

88

49

54

11

56

12

54

11

16

11

17

12

16

11

26

15

26

15

26

14

1 Sireandm

aternalgrandsiregenotyped

2 Siregenotypedmate

rnalgrandsireno

tgenotyped

3 Sireno

tgenotypedmate

rnalgrandsiregenotyped

4 Sireandm

aternalgrandsireno

tgenotyped

Tableamp2ampIm

putationampaccuracyampacrossampassem

bliesampforampPedImputeampFindH

apampFImputeampandampBeagleampallelicJR2amp

PedImputeamp

FindHapamp

FImputeamp

Beagleamp

BTAU

amp42amp

UMDamp31amp

BTAU

amp46amp

BTAU

amp42amp

UMDamp31amp

BTAU

amp46amp

BTAU

amp42amp

UMDamp31amp

BTAU

amp46amp

BTAU

amp42amp

UMDamp31amp

BTAU

amp46amp

Scenarioamp

Namp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

A1amp

84

941

20

940

20

941

20

907

24

908

25

907

24

952

26

952

27

952

27

925

33

923

33

925

33

B2amp

34

935

24

934

25

935

24

896

30

896

34

895

32

954

20

954

21

954

20

929

24

928

25

928

24

C3amp

13

759

89

761

88

758

87

848

33

845

30

848

32

955

18

955

19

955

19

931

19

928

19

927

19

D4 amp

12

782

113

777

113

781

112

841

32

832

35

839

33

949

34

947

38

949

35

921

44

919

45

921

44

1 Sireandm

aternalgrandsiregenotyped

2 Siregenotypedmate

rnalgrandsireno

tgenotyped

3 Sireno

tgenotypedmate

rnalgrandsiregenotyped

4 Sireandm

aternalgrandsireno

tgenotyped

95

Supplementaryfiles

YoucanfindChapterͷsupplementaryfilesat

- DocTa(DoctoralThesisArchive)(httptesionlineunicattit)

- GoogleDrive(httptinyurlcomPhDMMChapter05)orscantheQRcode

SupplementaryFigureS1ErrorpercentageacrossmethodsforBTAU42onBTA1

SupplementaryFigureS2ErrorpercentageacrossmethodsforUMD31onBTA1

SupplementaryFigureS3ErrorpercentageacrossmethodsforBTAU46onBTA1

SupplementaryFigureS4Linkagedisequilibrium(LD)decayoverdistanceinItalian

SimmentalandHolstein

ThedottedlinewithblackdotsindicatesLDdecayoverdistanceinItalianHolstein

whereasthedashedlinewithwhitediamondsindicatesthesamevaluesforItalian

SimmentalItalianHolsteingenotypeswereasubsetofthedatausedinNicolazzietal

(2013)

96

CHAPTER6

QuestfordeleteriousrecessivevariantsinItalianHolsteinbullscombiningexomesequencingandhighdensitySNPanalysis

MMilanesi1etal

1IstitutodiZootecnicaUCSCviaEmiliaParmense8429122PiacenzaItaly

____________________Manuscript

Abstract

Outbreed species carry a genetic load of deleterious recessive variants Those

that are lethal survive only in the heterozygous state in the population while

otherssimplyreducethefitnessofindividualshomozygousforthevariantThe

long=termdestinyofthesemutationsistodecreaseinfrequencyduetonegative

selectionandeventuallydisappearbecauseofgeneticdriftHowever theymay

bemaintainedinlivestockpopulationsandevenincreaseinfrequencywhenin

linkage with favourable alleles for traits under selection or occurring in the

genomeofsireshavinghighgeneticvalueforproductivetraitsWesearchedfor

these variants combining exome sequence data from 20 Italian Holstein bulls

sampledfromoppositetailsofthemaleandfemalefertilityEBVdistributionthe

HighDensity(HD)SNPgenotypingof1009progenytestedItalianHolsteinbulls

and information existing in public databases Exome sequencing detected

202956 variants Among these 5167 were classified as ldquodeleteriousrdquo when

annotatedwithSIFTandAnnovarThesewerefilteredbypickingthosemapping

in regions out of equilibrium and lack of homozygotes in the progeny tested

Holstein bulls or having an asymmetric distribution in bulls plus=variant and

minus=variant for fertility EBV The resulting 193 variants in 163 genes were

furtherassessedbycomparativegenomicshaplotypeanalysisandfunctionFor

eachassessmentapositiveornegativescorewasgiventoevidenceinfavouror

against thevariant tobedeleteriousA totalof35variantshadapositive total

score These genes are involved in development cell motility and immune

responseSomeof themareknowntobeexpressed in the testiclesandappear

importantforspermfertilityinmodelspeciesothersareresponsibleforgenetic

defectsinhumanandmousecarryingmutationssimilartotheonesdetectedin

cattleThesegenevariantsarecandidatetobeamongthosethatconstitutethe

ldquogeneticloadrdquoofdeleteriousrecessivegenesintheHolsteinpopulation

100

Introduction

The presence of deleterious recessive variants is well tolerated by outbred

speciesWhenthesevariantsappeartheymayspreadandsurvivealongtimein

the population in the heterozygous state (Simmons and Crow 1977) It is

estimatedthatoutbredindividualshumanincludedcarryonaverageagenetic

loadofmorethan100loss=of=functionvariantsintheirgenome(MacArthuretal

2012)

Theeffectofdeleterious recessivevariantsappearswhendemographicevents

eggeneticbottlenecksreducethegeneticvariabilityofthepopulationorwhen

relatedindividualsareintentionallymatedforbreedingpurposesInthesecases

recessive deleterious reduce the fitness of individuals carrying them in the

homozygous state and in extreme case induce very severe genetic defects

premature death and abortion (McClure et al 2014 Sahana et al 2013

VanRadenetal2011)Theythereforecontributetoexplainthephenomenonof

inbreeding depression and its negative effect on fertility (Charlesworth and

Willis2009)

The search for deleterious recessive variants is particularly interesting in

Holstein cattle Thisbreed is experiencing anegative genetic trend for fertility

traits (Jamrozik et al 2005) it is worth noting that this is occurring while

inbreedingisunderstrictmonitoringandistheoreticallyincreasingataverylow

pace(about005peryear=Mrodeetal2009)Inadditionmatingisplannedto

minimise relationship between animals and the additive genetic variance for

mosttraitsndashincludingmilkproduction=isapparentlynotdecreasinginspiteof

the selection pressure applied by modern schemes (Biffani et al 2002)

indicatingthatoveralldiversityisonlyslightlydecreasing

SelectiontoincreasefertilityischallengingSelectioncriteriagenerallyadopted

to improve this trait are far from the biological mechanisms that govern

reproduction Inaddition thesemechanismsarecomplexandverysensitive to

environmental(management)conditions(Laneetal2013Wathesetal2014)

Asa result fertility traitshave loworvery lowheritability (Pryceetal1997)

andthereforelowresponsetoselectionandslowgeneticprogress

101

Forthesereasons thequest formolecularvariants influencing fertility traits is

on=goingsincetheavailabilityofmoleculartoolsforDNAanalysisAnumberof

investigations tested the effect of functional candidate genes running genome

wide association analyses to find QTLs for fertility traits using hundreds

microsatellite markers in structured families (Ashwell et al 2004) or tens of

thousand SNP panels in whole populations (Aston and Carrell 2009) The

availability of medium density SNP panels allowed the search for deleterious

recessivesevenwithouttheuseofphenotypictraitsbyscanningthegenomein

searchofregionslackingoneofthetwohomozygousclasses(Sahanaetal2013

VanRaden et al 2011) This approach has highlighted a number of genomic

regions likely carrying deleterious recessives This information has immediate

application inmolecular assisted breeding and a relevant scientific interest in

thesearchofcausativevariantsandassessmentoftheirfunctions

Thedevelopmentofnewsequencingtechnologieshasgreatlydecreasedthecost

of sequencing so that nowadays the sequencing of few target individuals is

becoming the gold standard to seek for causal mutations for rare monogenic

defectsinhumansandotherspecies(ChunandFay2009)Oftenonlytheexome

the protein coding part of the genome is sequenced searching for those

mutationsthatseverelyalterproteinstructureandorfunction(Bamshadetal

2011 Li et al 2013) A typical mammalian exome consists in a few tens of

millionnucleotides(around2ofthereferencegenome)comparedtothefew

billionofthewholegenomeThereforewiththesamenumberofreadsexome

sequencinghastheadvantageofamoreaccurategenotypingdueto thehigher

sequencing depth an easier data management lower computation load and

easier interpretation (Bieseckeretal2011Majewskietal2011) Indeed the

numbervariants found inexomesareone=twoordersofmagnitude lowerthan

thatdetectedinwholegenomesandmutationsaredetectedinannotatedgenes

orimmediatelynearbyOntheotherhandtheexomeapproachcanonlyidentify

variantson annotatedgenesVariants inunknownexonsor that occur innon=

coding regulatory features will be missed (Bamshad et al 2011) In humans

these are turning out to be of outmost importance for gene regulation and

downstreamphenotypic effects (ENCODEConsortium2012) but they are still

verypoorlyannotatedinlivestock

102

Herewesequencedtheexomesof20Holsteinbullsprogenytestedselectedfrom

the tails of male and female fertility EBVs (Estimated Breeding Value) Since

livestockfertilityisacomplextraitandcausativemutationscannotbesoughtby

the approach used in monogenic diseases we integrated the sequence

information with high density SNP chip data obtained on a much larger

populationofItalianHolsteinbullstodetectgenescarryingallelescandidateto

be deleterious recessives and confirm their likely deleterious effect by

comparativegenomics

Materialsandmethods

1SamplingandDNAextraction

Semen(straws)andbloodsampleswereprovidedbyItaliancertificatedartificial

insemination centres (UE Directive 88407CEE) thus ethics committee

approvalforthisstudyisnotrequired

A total of 20 ItalianHolstein progeny tested bullswere sampled for complete

exome sequencing Among them 19were chosen from the tails of female and

male fertilityEBV(EstimatedBreedingValue)distribution (Table1) tohave5

animals fromeach tail One animalwas to be in theminus=variant tail of both

traitsBullswereselectedasunrelatedaspossibleSamplingwaspossiblethanks

tothecollaborationwithANAFI(ItalianHolsteinBreederAssociation)

ANAFIdevelopedageneticevaluationfor fertilitycombiningdifferenttraitsas

explained by Biffani and colleagues (Biffani et al 2005) The traits are ldquodays

from calving to first inseminationrdquo ldquocalving intervalrdquo ldquofirst=service non return

rateto56daysrdquoldquoangularityrdquoandldquomatureequivalentmilkyieldat305daysrdquo

MalefertilityisexpressedasanEstimatedRelativeConceptionRate(ERCR)and

isameasureofconceptionrateofaservicesirerelativetoservicesiresofherd

mates(Soggiuetal2013)

103

Table1FemaleandmalefertilityEBVsofthe20animalssequenced

Animal EBVFemaleFertility EBVMaleFertility GroupfriA01 97 =111 NAfriA02 98 =166 MalLfriA03 115 0 FemHfriA04 114 0 FemHfriA05 114 0 FemHfriA06 86 0 FemLfriA07 98 195 MalHfriA08 105 213 MalHfriA09 95 =154 MalLfriA10 95 149 MalHfriA11 102 264 MalHfriA12 91 =475 FemLMalLfriA13 105 =179 MalLfriA14 102 207 MalHfriA15 99 =405 MalLfriA16 85 0 FemLfriA17 87 0 FemLfriA18 121 0 FemHfriA19 117 0 FemHfriA20 88 0 FemL

A total of 1009 Italian Holstein bulls were chosen from the entire population

balancing theirpresence in the tailsof theEBV forkgprotein (PROT) somatic

cell count (SCC) fertility (FERT) and functional longevity (LONG) for high

densitygenotyping(TableS1)

DNAwasisolatedfromwholebloodandsemenstrawssamplesinLGSlaboratory

(CremonaItalywwwlgscrit)usingtheGenomixkitaccordingtoLGSinternal

protocolDNAsforexomecaptureandsequencingwereextractedindependently

tosatisfy therequirement indicatedby IGA technologies (Udine Italy)at least

12mgofDNAataconcentration100ngulwithR260280higherthan18and

R260230near19

2Moleculardataproductionandqualitychecking

21Exomecaptureandsequencing

104

Exomes were captured using the ldquoSureSelected All exon Bovinerdquo (Agilent

Technologies) kit in early version Probes coordinates are based on UMD31

bovinereferencesequence(Ziminetal2009)anddesignedagainstNCBIexons

as well as predicted exons UTRs and miRNAs Number and distribution of

probesperchromosomearedetailedinTableS2

rdquoOvation Ultralow Library Systemsrdquo protocol was followed for library

preparation After quantification with Qubit (Life technologies) and quality

controlwith2100Bioanalyzer(AgilentTechnologies)DNAwassonicatedwith

Bioruptor (Diagenode) to obtain fragments between 150 to 200 bp in length

blunt end=repaired adenylated ligated with paired=end adaptors and purified

followingtheprotocolprovidedbythemanufacturer

LibrarieswerehybridizedwithldquoSureSelectedAllexonBovinerdquoprobesusingthe

Encore Target Capture Module (Nugen) SureSelect Target Enrichment and

Herculase II FusionEnzymewithdNTPCombo (AgilentTechnology) kit Then

capturedDNAwasamplifiedandtaggedwithlibraryspecificindexesQualityof

capturedsampleswastestedwithQubitand2100Bioanalyzer

Paired=end 2X100bp sequencing was performed on Illumina HiSeq2500

(IlluminaSanDiegoCA)atanexpectedcoverageof50X

22Sequencealignment

Adapters and reads shorter than 36bp were trimmed using Trimmomatic

(version032)(Bolgeretal2014)ForeachFastqfilethealignmentandpairing

of the reads to theUMD31 reference sequencewas performedwithBurrows=

WheelerAligner(BWA)(version077=r441)(LiandDurbin2009)applyingalso

trimmingoption (=q5) SAM files resultedwere converted intoBAM file using

SAMtools (version0118) (Lietal2009)Foreach individualBAM fileswere

merged in a single one with Picard (version 1111)

(httpsgithubcombroadinstitutepicard)Duplicatereadswereidentifiedand

markedwithPicardFinallyindividualBAMfileswereindexedwithSAMtools

105

Sequencedepthwasevaluatedsimultaneouslyonall20sequencedanimalsand

onlyonregionstargetedbyprobesusingDepthOfCoverageofGATK(McKennaet

al2010)

23ExomeVariantcallingandqualitycheck

Singlenucleotidepolymorphisms(SNPs)insertionsanddeletions(InDels)were

calledsimultaneouslyonall20sequencedanimalsandonlyonregionstargeted

byprobesusingUnifiedGenotyperofGATK(DePristoetal2011)

All variantswere filteredusingVariantFiltration ofGATK to removeoneswith

lowmappingandorsequencingqualityThechoiceoffiltersandthresholdswas

made in accordance to ldquoGATK best practicerdquo for exome sequencing and

distributionoffiltervalues

bull ldquoSNPclusterrdquoexcludedvariantifmorethan3variantswerefoundin10

bp

bull ldquoBaseQualityRankSumTestrdquoexcludedifitrsquoslowerthen=5ormorethan

5

bull ldquoMappingQualityRankSumTestrdquoexcludedif itrsquoslowerthen=7ormore

than6

bull ldquoRead Position Rank Sum Testrdquo excluded if itrsquos lower then =4 or more

than4

bull ldquoFisherStrandrdquoexcludedifitrsquosmorethan30

bull ldquoDepthrdquoexcluded if itrsquos lower then60X(ieaminimumvalueof3Xper

animals)ormorethan3500X

bull ldquoQualityrdquoexcludedifitrsquoslowerthen40

bull ldquoMappingqualityrdquoexcludedifitrsquoslowerthen30

Filters were applied to all variants but only bi=allelic SNPs were used in the

following analyses Genotypes with ldquoGenotype Depthrdquo lower than 5 and

ldquoGenotype Qualityrdquo lower than 20 were discarded The remaining ones were

converted intoldquoABrdquocode(A=referencealleleandB=alternativeallele)Then

data were transformed in PLINK (Purcell et al 2007) format using a home=

made python script To prepare the working dataset the final quality control

106

removedvariantswithmore than25missingdata (that iswithmore than5

animals out of 20 without genotypes) monomorphic (minor allele frequency

MAFlt1)ornotmappinginautosomesBasicpopulationgeneticsparameters

werecalculatedtoevaluatethepresenceofcrypticstructureinthedatathatmay

influencetheoutcomeofthedownstreamanalyses

In total 24390 SNPs identified in the exomes were also included in the HD

SNPchip panel HD genotype from nineteen out to twenty of the sequenced

animals sequenced was used to compare technologies Genotypes at common

locationswerecomparedacrosstechnologies toevaluatesequencequalityand

theeffectofsequencedepthongenotypecalling

24BovineHDGenotypingBeadChipdataandqualitycheck

AllanimalsweregenotypedusingtheBovineHDGenotypingBeadChip(Illumina

San Diego CA) Genotypes were quality controlled and filtered with standard

threshold Markers with more than 5 missing data and MAF lt 1 were

excluded Animals with more than 5missing data were also excluded Only

autosomal SNPs were retained Thereafter genotypes were phased and the

remainingmissingdata imputedusingBeaglev332(BrowningandBrowning

2007)

Ofthe20animalssequenced12hadalsoBovineHDGenotypingBeadChipand7

BovineSNP50 Genotyping BeadChip version 2 genotypes available For

concordance and haplotype analysis we imputed the 50K data to 800K as

previouslydescribedabove

3Strategyfordetectingcandidatedeleteriousvariants

Wesearchedforcandidatedeleteriousvariantsfollowinganintegratedapproach

that joined the analysis of i) exomes of a few Italian Holstein bulls having

plusvariant and minusvariant EBVs for a target trait ii) HD genotyping of a

population of 1009 progeny tested Italian Holstein bulls iii) the comparative

genomic analysis of 100 vertebrate species The strategy followed for the

107

analysisisshowninFigure1Inthefirststep(Search)welookedforcandidate

deleterious variants in the exomes In step2 (Filter)we filtered candidatesby

assessingwhichof thesevariantseitherhadanon=randomdistributionamong

animals having extreme EBVs or co=mapped with genomic regions carrying

outliersignals inthelargepopulationofprogenytestedbulls Instep3(Asses)

candidateswere further evaluated by i) assessing their conservation across a

panel of 100 vertebrate species ii) searching for associated haplotypes and

evaluating their frequencies in the population of progeny tested bulls iii)

evaluatingtheirassociationwithgeneticdefectsinhumanandmodelspeciesiv)

examining their expression profile and function when known Methods are

belowdescribedindetail

Figure1Strategyfollowedtoidentifydeleteriousvariants

108

31Step1Searchforcandidatedeleteriousvariants

Quest for candidate deleterious variants in exome sequence data by

variantannotation

WeannotatedallvariantsusingANNOVAR(Wangetal2010)andVEP(Variant

Effect Predictor) (McClure et al 2014) software They use the Ensembl gene

annotation database (UMD31 reference sequence release 74) to investigate

locationandpotentialeffectoncodingsequencesofallvariantsVEPintegrates

the SIFT algorithm (Kumar et al 2009) to predict the effect of amino acid

substitutiononproteinfunctionForeachsequencechangesequencehomology

and physic=chemical similarity between the reference amino acid and the

alternative is considered providing a score that evaluates the mutation

tolerabilityVariantswithscorelowerthan005wereclassifiedasldquodeleteriousrdquo

inaccordingtosoftwarerecommendations

SIFTalsocomparedallvariants found in theexomeswith thosepresent in the

dbSNPdatabase(version140)toidentifythosethatwerenovel

32Step2Filteringofcandidatedeleteriousvariants

2a Analysis of deleterious variant distribution among animals having

extremesEBVvaluesformaleandfemalefertilitytraits

Association between variants and the classification in the tails of the EBV

distributionwas estimated byFisherexact testwhich compares frequencies of

alleles (a vs A) in the two tails not assumingH=W equilibriumgenotypic test

that compares the frequency of genotypes (aa vs Aa vs AA) and the recessive

geneactiontestwhichtestsforthedistributionofonehomozygousclassversus

theothers(aavsaAAA)AlltestsareimplementedinPLINKfunctionsThe5

significance threshold was set empirically by permutation (N=100000) either

point=wise that is not corrected for multiple testing (EMP1 = Empirical

Probability1)andthereafterfamily=wisecorrectedformultipletesting(EMP2=

EmpiricalProbability2)

109

GenesexceedingEMP1were considered suggestive and thoseexceedingEMP2

significant candidates to be deleterious recessives and their characteristics

conservationacrossspeciesandfunctionfurtherinvestigated

2b Quest for regions candidate to carry deleterious variants in the

populationanalysedwiththeHDBeadChip

Basic population genetics parameters and Multi Dimensional Scaling were

calculated with GenABEL R package (Aulchenko et al 2007) to evaluate the

presenceofcrypticstructure inthedatathatmayinfluencetheoutcomeofthe

downstreamanalyses

ExpectedandobservedallelicandgenotypicfrequenciesandFisherexacttestof

Hardy=Weinberg proportions (Janis E Wigginton 2005) were calculated with

PLINKon theHDSNPdatasetValueswere averaged in slidingwindowsof10

markersThisvalue(10markers)waschosenaftertestingdifferentwindowsize

as represent a good compromise between noise reduction and resolution (not

shown)

Candidate regions containing deleterious mutations were identified following

twodifferentapproaches

In one approach hereafter refereed to as ldquoOut of Hardy=Weinbergrdquo approach

(OHW)markerssignificantlydeviatingfromofHardy=WeinbergequilibriumatP

le84710=8 (nominalvaluePle005Bonferronicorrected formultiple testing)

werefirstidentifiedWindowswithatleastthreesignificantmarkersbelowthis

thresholdandhavingahomozygotedeficitwereconsideredcandidatestocarry

deleterious mutations In this OHW approach we applied very stringent

constraints to reduce false positive discoveries due to the Wahlund effect

inducedbypopulationsubstructure(seeresultsandFigureS5)

In thesecondapproachhereafterreferred toas ldquoLackOfHomozygotesrdquo (LOH)

approach the differences between observed and expected frequencies of

homozygotesfortheminorallelewerecomputedandaveragedacrossmarkers

included in each sliding window Values were standardized and only regions

carrying three overlapping windows below =3Z score value were considered

110

candidates to carry deleterious mutations Thereafter significant windows

carryingatleastonemarkerincommonwerejoinedtodefineregionboundaries

ExomesequencingknowledgeandHDBeadChipdatawerecrossed inorder to

identifygenescarryingseverevariantslocatedwithinsignificantregionsortheir

proximity (50Kb upstream or downstream) These genes were considered

candidates to be deleterious recessives and their characteristics conservation

acrossspeciesandfunctionwasfurtherinvestigated

33Step3Assessmentofcandidatedeleteriousvariants

Genes carrying variations classified as deleterious by SIFT or ANNOVAR and

eitherassuggestivebyfertilityEBVtailsanalysisorassignificantbyOHWeLOH

analyses of the Holstein bull population were submitted to comparative

genomics population genetics and functional analyses In order to prioritize

genesvariants discoveredwe scored each evaluation using a subjective scale

ranging from =3 to 3 to measuring the evidence in favour (scores 1 to 3) or

against(scores=1to=3)thedeleteriouseffectoftheSNPsThescore0inallcases

indicated the absence of evidence or that a specific assessment failed The

completescorescalethereforerangedfrom=9to9

3aComparativegenomics

For comparative genomics purposes the position of eachmutationwas ldquolifted

overrdquo from UMD31 bovine coordinates to hg19 human coordinates using the

specifictoolatUCSC(Kentetal2002=httpgenomeucscedu)Orthologous

proteins were aligned using the UCSC genome browser and amino acid

conservation assessed in a set of 100 vertebrate species that included human

primates (11 species) euarchontoglires (egmouse14 species) laurasiatheria

(egsheep25species)afrotheira(egelephant6species)othermammals(eg

platypus5species)birds(egchicken14species)sarcopterygii(egturtle8

species) fish (eg zebrafish 16 species) Results of amino acid conservation

analysiswererecordedonlyifhomologousproteinfromatleast50speciescould

beretrievedandalignedThereafteraminoacidconservationwasscored3ifthe

111

aminoacidwasconservedinallvertebrates2ifconservedinallmammalsand1

ifconservedinthelargemajority(atleast90)ofmammalsAscoreof=1was

given when the amino acid was not conserved among species and a score =3

when both variants observed in cattle were carried by other species Finally

wheneithertheliftoverprocessortheproteinalignmentfailedthecomparison

wasconsiderednoninformativeandgivenascore0

3bSearchofhaplotypesassociatedtodeleteriousvariantsandanalysisof

HDpopulationdata

To acquire further evidence we searched for the haplotypes that carry the

deleterious variations and estimated their frequency in the

homozygousheterozygousstatuswithinthebullpopulationgenotypedwiththe

HDBeadChip

Weusedamultistepapproachtofindhaplotypescarryingdeleteriousvariantsi)

from the phased HD BeadChip data we extracted the haplotypes of the 20

sequencedanimalsintheregionofthevariant(100markersoneachside)ii)in

these 20 animals we searched the shorter haplotype able to distinguish

homozygous andor heterozygous animals for the deleterious variant from all

other haplotypes iii) we assessed the frequency of homozygotes and

heterozygotes for thehaplotype in thewholepopulationofbulls characterised

withtheHDBeadChip

Scoresinthiscasewere=3inthecaseofuniquehaplotypesshowinganumberof

homozygotesnonsignificantlydifferentfromthenumberexpectedfromHardy=

Weinbergproportions3inthecaseofuniquehaplotypesshowinganumberof

homozygotes significantly lower than expected 1 in the case of unique

haplotypesneverhomozygousandrare0whenevernouniquehaplotypecould

befound

3cFunctionalanalyses

With PantherDB (Mi et al 2010) resources we classified our gene list into

functionalcategoriesaccordingtotheGeneOntologyclassificationAllsuggestive

112

andcandidateEnsemblGeneIDsubmittedtothebulkanalysiswererecognizedand annotated Beyond the GO term characterization in order to understandpotentialdetrimentaleffectsofmutationsandalsopinpointpotentialcandidatesfor fertility traits we retrieved information for each gene according to thefollowing parameters i) the involvement in known Mendelian or complexdiseases fromOMIMdatabase ii) theavailabilityofknock=outmice iii) and theeventual expression in reproductive tissues from the Gene Expression Atlas(httpwwwebiacukgxa)(Table4)

Alsoforthesedataweusedthesamescaleattributingarangefrom=3pointstogenes having no relation with fertility viability and genetic defects and norelevant phenotypic effects in knock=out mouse models to 3 to gene closelyrelated to fertility and viability associated to genetic defects in human andaffectingfertilityorsurvivalwhenknocked=outInparticularascoreranging=1to+1wasattributedtogenesassociatedtohumansyndromes(=1=noassociationrecordedinOMIMdatabase0=noinformationinOMIM1=associationinOMIM)A score with same range to information collected from knock=out mousephenotypes(=1=knock=outmouseshowsnoimpairedphenotype0=noknockoutmouse has been produced 1= knock=out mouse shows impaired phenotype)Thesamescorewasattributedtomolecularfunctions(=1=functionunrelatedtofertilityandviability0=functionindirectlyrelatedasimmunityandmetabolism1=functiondirectlyrelatedasfertilityanddevelopment)

Results

Exomecaptureandsequencing

A totalof225414probeswereused for capturing thecattleexome (5355Mbabout 2 of UMD31 reference genomes) Mitochondrial (MT) and Ychromosome were not included in probe design The number of probes perchromosomespannedfrom2500onBTA27toalmost12000inBTA3(TableS2)

113

The average exome sequencing depth was 4058 slightly lower than the

expected50XAllanimalshadatleast24Xaveragegenomessequencedbutina

same animal the sequencing depth varied across chromosomes and regions

withinchromosomesThelowestchromosomeaveragevalueobservedwas10X

inBTA25oftheshallowersequencedanimalAveragecoverageperanimaland

chromosomeandoverallmeansareshowninFiguresS1

In total1613461402 raw100bppair=ends sequence readswereproducedA

total of 1425514112 100bp reads were mapped and properly paired to the

reference genome Paired reads per animal ranged between 41973510 and

138840392

Insynthesissequencingdepthwasvariableasexpectedandalmostallanimals

hadsomeregionsofscarceornosequencedepthbutoverallallexomeregions

were sufficiently covered to permit a comprehensive analysis of molecular

variantsoftheanimalsinvestigated

Step1Searchforcandidatedeleteriousvariants

Variantdetectionandannotation

Thesequencingdepthofallvariantscalledonthealignmentofallreadsacross

animalsfollowedaChi=squaredistributionAveragedepthwas79694spanning

from2Xto5000Xwithpeakfrequencyclassat150=250X(FigureS2)

A multistep approach was followed to clean the exome sequence dataset

Cleaning process steps and the number onmarkers retained in each step are

describedinTable2

114

Table2Typeandnumberofmarkersdetectedandretainedafterdatasetcleaning

TypeofvariationNumberofvariations(N)

Rawdata

Aftervariantcleaning

Aftergenotypecleaning(onlyautosomal)

Complex 1213 963 0Deletion 6938 5031 0Insertion 5510 3656 0MultiallelicComplex 5710 3995 0MultiallelicSNP 803 512 0SNP 242988 203079 164277Total 263162 217236 164277whencomparedtothereferencegenome

Atotalof263162variantswereidentifiedAmongthese217236(8255)wereretainedafterapplyingthevariantfilters(seematerialsandmethodssection)

According to GATK classification the vast majority (9348) of variationsidentified were biallelic SNPs followed by insertions and deletions (InDels40) more complex variations and multiallelic SNPs (252) Indels andcomplexvariationsalthoughlessfrequentthanSNPsaffectedarelevantportionoftheexome(49589bp)Thenumberofvariantsdetectedperchromosomewassignificantly correlated to the number of probes used (r=08761P=228x10=10)andofbasescaptured(r=08424P=53x10=19)perchromosomehoweverBTA23BTA27 BTAX had an outlier behaviour the former two containing highervariation(450and445variantsperKb)andthelatterlower(084variantsperKb)thantheaverage(322variantsperKb)

Bialleic SNPswere further filtered for genotype quality (sequence depth gt 5Xandsequencequalitygt20)andBTAXmarkersThefinalautosomalSNPdatasetcounted164277SNPsAmongthese17829biallelicSNPs(109)describedforthefirsttimehavebeensubmittedtodbSNP

Variantsrsquo annotation is shown in Table 3a Variantswere detected in differentgenedomains(exonsintrons3rsquountranslatedregions)aswellasupstreamanddownstreamgenesMorethan31ofthevariationsweredetectedinexonsandamong them more than 41 have the potential to have an effect on genefunctionbyaffectingsplicingoralterproteinstructure(Table3b)

115

Table 3 A) number of autosomal SNPs annotated in each gene region B) number and

classificationofautosomalSNPsalteringgenefunction

A)

Generegion NofSNPs(autosomes)Downstream 1170Exonic 51293Intergenic 49901Intronic 58116ncRNA_exonic 160ncRNA_intronic 3ncRNA_splicing 2Splicing 184Upstream 1445Upstreamdownstream 54UTR3 1345UTR5 604Total 164277

B)

EffectofexonicSNPs NofSNPs(autosomes)Nonsynonymoussinglenucleotidevariation 20839Stopgainsinglenucleotidevariation 213Stoplosssinglenucleotidevariation 10Synonymoussinglenucleotidevariation 30226Unknown 5Total 51293

whencomparedtothereferencegenome

Amongthe51293exonicSNPs4166in2978geneswereclassifiedldquodeleteriousrdquo

by SIFT Among these 3210 in 2356 genes were observed at least twice

ANNOVARidentified223stopgain(213)orloss(10)variantsin217genes

Theaverageconcordanceat24390SNPs in commonbetweenexomesand the

HD Beadchip was 9716 (Table S3) Discordance was due to either alleles

detected by sequencing andnot detected by genotyping (exome=heterozygote

Beadchip=homozygote) and vice=versa (exome=homozygote

Beadchip=heterozygote) Discordances were randomly distributed among

markers and inversely correlated to sequencing depth (Figure S3) indicating

that most errors are likely occurring during sequencing even if occasional

116

Beadchip failure indetectingsomeallelecannotbeexcludedTwoanimalshad

anoutlierbehaviouronehaving846andthesecond719concordanceBoth

animalswereretainedforvariantdiscoverywhiletheywereexcludedfromthe

analysesofEBVtailsWithoutthesetwooutlieranimalsthemeanconcordance

increasesto9938

Multidimensional scaling confirmed the absence of a substructure in animals

sequenced(FigureS4)

Step2Filteringofcandidatedeleteriousvariants

AnalysisofEBVtails

Fisherexact test isgenerallyused to test for theassociationbetweenavariant

and a Mendelian disease Samples are divided in cases and controls and the

association between alleles or genotypes and the disease is tested under

different inheritance models Here we used the same method to test for

association between allelesgenotypes and animals from opposite tails of the

EBVdistributionfori)malefertility(N=4high+4low)ii)femalefertility(N=5

high+4low)andiii)maleandfemalefertility(N=9high+8low)

Genotypes were tested under all inheritance models implemented in PLINK

Giventhelownumberofsamplesnovariantwassignificantaftercorrectionfor

multiple comparisons (EMP2) and only suggestive variants could be found

(significant at 5 at EMP1)We considered suggestive of carrying deleterious

allelesalistof101genescarrying112variants(TableS4)

Detectionofregionscarryingdeleteriousmutationfrom800KSNPdata

After applying quality control parameters 26653 and 146991 markers were

removed from the original dataset for excessive missing data and MAF

respectivelyInaddition17individualsremovedforlowgenotypingrateAswith

exomes only autosomal SNPswere retained for the analyses thus obtaining a

workingdatasetof590680SNPsand992animals

117

MDS(FigureS5)analysis indicates that thedatasethassomesubstructure that

mightaffectdownstreamanalysesWeaccountedfortheriskoffindingmarkers

outofHardy=WeinbergproportionsduetotheWahlundeffectbychoosingvery

stringentsignificancethresholdintheOHWanalyses

To detect regions candidate to carry deleterious recessivemutations we used

two approaches the ldquoOutside Hardy Weinbergrdquo (OHW) and the ldquoLack Of

Homozygotesrdquo (LOH) approaches (see materials and methods) The two

approaches complement each other in the detection of regions carrying less

homozygotes than expected OHW identifies regions in which a few markers

have extreme homozygote deficit while LOH detects regions in which many

markershaveamoderatedeficit

In the SNP dataset 1884 markers were significantly out of Hardy=Weinberg

equilibrium(Ple84710=8)Atotalof485windowswereidentifiedwithatleast

3 markers out of 10 exceeding this threshold Among these 84 contained

markerssignificantbecauseofhomozygotedeficitThese84windowsidentified

11OHWcandidateregionsin9chromosomes(TableS5)

TheLOHapproachidentified2335windowsbelow=3Zscoreofthestandardised

differencebetweenobservedandexpected frequencyofhomozygotes forMAF

Theseidentified326LOHcandidateregionsspreadonallautosomes(TableS6)

Onlyregionsformedbyatleast3singlewindowswereretainedreducingLOH

candidateregionsto240

Deleterious variants (SIFT deleterious and stop codon gainedlost) were

intersectedwithsignificantOHWandLOHregionSevenofthesemappedwithin

OHW sliding windows 36 within LOH sliding windows Among these 5 were

located in regions significant in both OHW and LOH Other 51 deleterious

variants mapped in the immediate vicinity (+=50Kb) of significant windows

These 83 variants affect 66 unique genes candidate to carry deleterious

recessivealleles(TableS7)

Five regionswere common toOHWandLOHoneonBTA4 (LOH=74959610=

75151417 OHW=74970653=75151417) two on BTA7 (LOH=9626435=

10099512 OHW=9628735=10099512 LOH=10575691=11171132

118

OHW=10486424=11087441) one on BTA15 (LOH=80299924=80928311

OHW=80299924=80921274) and one on BTA18 (LOH=28122373=

28185525OHW=28106022=28164693)

Step3Assessingcandidatedeleteriousvariants

After the two filtering step previously described a total of 191 SNPs were

retained out of the 3433 classified as deleterious by SIFT and Annovar and

analysed further These are carried by 163 genes Eighteen genes carried 2

candidateSNPsthreegenes3SNPsandonegene5SNPsInaddition4SNPsin

four geneswere identified as potential candidates in both the analysis of EBV

tailsandofbullpopulation

A totalof12SNPs introducedstopcodonswhile the remaining inducedamino

acid substitutionsor splicing variants In the filtereddataset theproportionof

stop codons on total variation (63) was only slightly higher than the

proportion found in the entire dataset (51 225 stop codon out of 4391

deleteriousSNPs)

Comparativegenomics

Proteinmultiplealignmentwassuccessfulin146outof191comparisonsIn26

casesthemostfrequentaminoacidincattlewasveryhighlyconservedeitherin

allvertebrates(N=9)orinallmammals(N=17)Inother23casestheaminoacid

was conserved in at least 90 of mammalian species In the last class were

generally included amino acids that changed in mammalian species

phylogenetically verydistant from cattle (eg platypus and related species) In

97 instances the amino acid could change quite freely and occasionally both

variantsobservedincattlewerecarriedbyotherspecies

HaplotypeanalysisofHDpopulationdata

The search for haplotypes associated to the191 variants for further testing in

the population identified 101 unique and invariant haplotypes common to all

119

carriersofatargetvariantanddifferentfromallthosefoundinnoncarriersIn

additional36casesanoncompletelyinvarianthaplotypecouldbeassociatedto

themutationConverselyin54instancesareliablehaplotypecouldnotbefound

(table S8) In total 38 haplotypes had no homozygous individuals in the

population Thiswas expected in the case of 33 rather rare haplotypes (fewer

than 100 heterozygotes observed in the population) Conversely 5 haplotypes

were rather frequent (from 170 to almost 300 heterozygotes observed in the

population)andtheexpectationwastofind78to222homozygousindividuals

in thepopulation insteadof thenone foundAnothervarianthadaremarkable

high frequency (35) and a clear outlier behaviour Only 11 homozygous

individualswereobservedoutofthe121expected

Functionalanalysisofcandidategenes

To assess the potential deleterious effects of the variants identified we also

retrieved functional information for each gene according to the following

parametersi)associationtoknownMendelianorcomplexdiseaseslistedinthe

OMIM database ii) availability and effect of gene knock=out in mice iii)

expression profileswith particular attention in reproductive tissues from the

GeneExpressionAtlas(Table4)

Twenty genes out of 163 have a recognized phenotype in humans These are

mostlyseveresyndromesUnfortunatelyonlyafewgeneshaveanull=miceline

producedwithphenotypicdataavailable

According to the expression data available from Ensembl these genes are

generally (with someexceptions)expressed inavarietyof tissueswitha clear

tendencytobehighlyexpressedintestis

120

Tableamp4Informationon163genesanalyzedconcerningeventualphenotypesinmanorknockoutsinmouseHeaderampcodesAbrainBcolonCheartDkidney

EliverFlungGskeletalmuscleHspleenItestisColoursrepresentthegenelevelofexpressionindifferenttissuefromlow(whitegrey)tohigh(blue)NCin

FunctioncolumnmeansldquoNotClassifiedrdquo

Geneamp

Geneampnam

eampHum

anampdefectamp

Knock5OutsampMouseamp

MouseampPhenotypeamp

AampBamp

CampDamp

EampFamp

GampHamp

IampFunctionamp

ENSBTAG00000000079

CCSAP

Cells

9

1

03

1

07

2

06

2

2

NC

ENSBTAG00000000149

USP40

Cells

6

7

2

7

5

7

2

8

9

fertility

ENSBTAG00000000414

FUT5

Nomatch

37

1

cellular

process

ENSBTAG00000000918

ERMARD

Periventricular

nodularheterotopia

6Xlinked

periventricular

heterotopia6q

terminaldeletion

syndrome

Periventricular

nodularheterotopia

Nomatch

5

4

4

9

6

7

4

9

17

cellular

process

ENSBTAG00000000998

SLC39A6

Cells

6

9

2

7

4

10

1

10

9

cellular

process

ENSBTAG00000001133

VWA8

Mice

2

3

6

7

10

3

3

4

4

blood

metabolism

ENSBTAG00000001140

IAH1

httpswwwmousephenotypeor

gdatagenesMGI1914982sect

ionassociations

BOTHSEXadiposetissue

behaviourneurological

growthsizebodyFEMALE

homeostasismetabolismMALE

limbsdigitstailskeleton

9

18

30

60

23

19

12

17

30

cellular

process

ENSBTAG00000001262

IRGQ

Cells

6

1

5

3

06

2

5

04

2

immunity

ENSBTAG00000001286

ELMO3

httpswwwmousephenotypeor

gdatagenesMGI2679007sect

ionassociations

Decreasedstartlereflex

02

27

14

7

7

02

1

fertility

ENSBTAG00000001291

OR8H3

Nomatch

fertility

ENSBTAG00000001500

FIGNL1

Cells

08

2

01

05

07

08

04

2

3

cellular

process

ENSBTAG00000001505

GRK4

Cells

10

7

09

4

6

3

2

6

218

signalling

ENSBTAG00000002220

SPTLC1

Hereditarysensory

andautonomic

neuropathytype1

No

17

27

3

28

12

28

5

19

6

metabolism

ENSBTAG00000002288

NT5DC4

Nomatch

02

cellular

process

ENSBTAG00000002289

NLRP14

No

04

cellular

process

ENSBTAG00000002660

CCDC11

Situsinversustotalis

Nomatch

3

1

06

2

07

3

03

2

54

fertility

121

Geneamp

Geneampnam

eampHum

anampdefectamp

Knock5OutsampMouseamp

MouseampPhenotypeamp

AampBamp

CampDamp

EampFamp

GampHamp

IampFunctionamp

ENSBTAG00000002809

CAPN10

AutosomalRecessive

MentalRetardation

DIABETESMELLITUS

NONINSULIN

DEPEND

ENT1

Cells

2

13

41

33

29

immunity

ENSBTAG00000002966

DNAJC13

Youngadultonset

Parkinsonism

No

5

52

94

62

84

fertility

ENSBTAG00000003231

LIM2

Pulverulentcataract

Cells

01

02

01

02

cellular

process

ENSBTAG00000003508

CFAP69

httpswwwmousephenotypeor

gdatagenesMGI2443778sect

ionassociations

Abnormalcirculatinginsulinlevel

MaleInfertilityDecreasedmean

plateletvolum

eIncreasedleanbody

mass

203

07

1

01

7NC

ENSBTAG00000003523

UGT2B15

Nomatch

34

metabolism

ENSBTAG00000004081

FAT3

No

2

03

01

01

05

signalling

ENSBTAG00000004555

LRP2

DONN

AIBARROW

SYND

ROME

Intellectualdisability

Cells

56

3

01

disease

ENSBTAG00000004772

THEM

4

Cells

8

16

295

27

52

63

metabolism

ENSBTAG00000004860

SLC27A6

Cells

6

02

10

602

207

01

immunity

ENSBTAG00000005021

SEMA5A

Monosom

y5p

Cells

9

04

89

13

08

08

6immunity

ENSBTAG00000005170

GPR114

Mice

101

02

01

09

6

01

fertility

ENSBTAG00000005248

OFCC1

No

01

01

03

01

01

01

03

NC

ENSBTAG00000005340

SERGEF

Cells

7

10

35

15

58

23

cellular

process

ENSBTAG00000005357

VPS11

Mice

14

13

619

717

815

15

cellular

process

ENSBTAG00000005466

C8H8orf58

Nomatch

05

01

08

02

103

02

04

NC

ENSBTAG00000005514

TRIO

Cells

17

66

10

210

88

7immunity

ENSBTAG00000005633

RGNEF

httpswwwmousephenotypeor

gdatagenesMGI1346016sect

ionassociations

Vertebralfusion

36

214

05

801

08

1cellular

process

ENSBTAG00000006021

CEP250

Mice

3

22

43

41

88

metabolism

ENSBTAG00000006027

USP34

Mice

11

63

74

88

912

fertility

ENSBTAG00000006532

CDK5RAP2

Autosomalrecessive

primary

microcephaly

httpswwwmousephenotypeor

gdatagenesMGI2384875sect

ionassociations

skeletonimmunesystem

growthsizebodyhem

atopoietic

system

5

34

92

74

784

developm

ent

ENSBTAG00000007001

SLC26A1

No

03

01

02

32

903

01

07

metabolism

ENSBTAG00000007340

LOC101906349

Nomatch

NC

ENSBTAG00000007568

MEPE

Cells

cellular

process

ENSBTAG00000007910

EXOC7

Cells

13

24

23

22

12

22

15

22

28

cellular

process

122

Geneamp

Geneampnam

eampHum

anampdefectamp

Knock5OutsampMouseamp

MouseampPhenotypeamp

AampBamp

CampDamp

EampFamp

GampHamp

IampFunctionamp

ENSBTAG00000008650

CAMK1D

Cells

24

2

01

06

1

2

03

7

3

signalling

ENSBTAG00000008797

Uncharacterized

Nomatch

09

04

07

04

NC

ENSBTAG00000008871

DDX4

Cells

02

01

01

132

cellular

process

ENSBTAG00000008996

SFI1

Mice

2

1

06

3

04

3

2

5

NC

ENSBTAG00000009014

UPK1B

Mice

2

2

2

5

4

4

2

7

3

fertility

ENSBTAG00000009030

NOX5

Nomatch

04

01

8

01

immunity

ENSBTAG00000009033

TEX37

httpswwwmousephenotypeor

gdatagenesMGI1921471sect

ionassociations

Abnormalstartlereflex

03

83

NC

ENSBTAG00000009064

CSF2RB

Congenitalpulmonary

alveolarproteinosis

Mice

01

2

04

06

2

23

03

17

05

signalling

ENSBTAG00000009144

BPIFA2A

Nomatch

6

06

05

metabolism

ENSBTAG00000009337

PNMAL1

Cells

8

05

01

3

02

03

03

04

3

Immunity

ENSBTAG00000009444

SLC15A5

Cells

01

cellular

process

ENSBTAG00000009568

KANK2

Cells

1

17

30

9

3

31

9

10

3

cellular

process

ENSBTAG00000009569

DOCK6

AdamsOliver

syndrome

Cells

2

5

10

12

5

15

5

6

21

metabolism

ENSBTAG00000009836

CHGA

Mice

140

555

01

2

02

01

07

1

metabolism

ENSBTAG00000010522

Uncharacterized

Nomatch

01

02

02

03

2

metabolism

ENSBTAG00000010601

DOLK

Familialisolated

dilated

cardiomyopathy

No

4

8

6

22

9

6

5

5

12

metabolism

ENSBTAG00000010847

FOXN4

Cells

01

01

metabolism

ENSBTAG00000010954

ART3

Cells

02

1

55

01

08

2

10

06

71

metabolism

ENSBTAG00000011011

SSH2

No

3

09

5

4

2

2

6

7

10

cellular

process

ENSBTAG00000011387

NAT9

Cells

6

7

3

8

6

8

4

6

6

metabolism

ENSBTAG00000011420

CA9

Cells

04

3

02

2

03

1

01

2

56

metabolism

ENSBTAG00000011433

RGP1

httpswwwmousephenotypeor

gdatagenesMGI1915956sect

ionassociations

Completepreweaninglethality

7

19

11

18

8

12

9

8

25

cellular

process

ENSBTAG00000011622

C18orf21

Mice

9

10

6

6

2

10

5

10

34

NC

ENSBTAG00000011683

NUMB

Cells

21

25

8

35

17

47

6

8

19

immunity

ENSBTAG00000011684

RAB11FIP1

Cells

6

01

8

09

11

02

1

4

NC

ENSBTAG00000012053

ACCSL

No

03

01

cellular

process

ENSBTAG00000012314

LDLR

Homozygousfamilial

hypercholesterolemia

Cells

8

33

11

21

17

24

2

15

2

cellular

process

123

Geneamp

Geneampnam

eampHum

anampdefectamp

Knock5OutsampMouseamp

MouseampPhenotypeamp

AampBamp

CampDamp

EampFamp

GampHamp

IampFunctionamp

ENSBTAG00000012326

LOC101906218

Nomatch

11

01

01

09

01

NC

ENSBTAG00000012458

PSAPL1

Cells

01

NC

ENSBTAG00000012565

PCNX

No

3

32

32

33

48

developm

ent

ENSBTAG00000012833

ATG2B

No

4

21

21

21

29

NC

ENSBTAG00000012921

KIF24

Mice

05

07

03

08

02

202

214

metabolism

ENSBTAG00000013568

UEVLD

Cells

5

41

53

42

214

metabolism

ENSBTAG00000013685

KRT31

Mice

cellular

process

ENSBTAG00000013753

DDRGK1

Mice

22

32

18

56

55

25

14

22

37

NC

ENSBTAG00000013789

OR6K6

No

NC

ENSBTAG00000013838

ALLC

Mice

02

01

78

metabolism

ENSBTAG00000013916

HERC2

Mentalretardation

autosomalrecessive

38SKINHAIREYE

PIGM

ENTATION

VARIATIONIN1

Developm

entaldelay

withautismspectrum

disorderandgait

instability

No

8

74

83

74

65

metabolism

ENSBTAG00000014001

MAP1A

Cells

136

23

22

202

43

352

cellular

process

ENSBTAG00000014058

LDB2

No

28

33

12

825

410

3cellular

process

ENSBTAG00000014284

ALPK2

No

01

10

01

304

02

cellular

process

ENSBTAG00000014750

EPB41L4A

httpswwwmousephenotypeor

gdatagenesMGI103007secti

onassociations

Decreasedcirculatingglucoselevel

Abnormalconeelectrophysiology

Immunesystem

s1

22

20

07

12

13

43

NC

ENSBTAG00000014752

AKAP11

httpswwwmousephenotypeor

gdatagenesMGI2684060sect

ionassociations

Nopheno

24

72

63

306

68

cellular

process

ENSBTAG00000014768

ZNF786

Cells

3

23

42

48

33

cellular

process

ENSBTAG00000014794

SCYL3

Cells

6

10

59

613

516

26

cellular

process

ENSBTAG00000015027

Clusterin

httpswwwmousephenotypeor

gdatagenesMGI88423sectio

nassociations

Aggressionvertebralfusion

05

03

03

02

05

3NC

124

Geneamp

Geneampnam

eampHum

anampdefectamp

Knock5OutsampMouseamp

MouseampPhenotypeamp

AampBamp

CampDamp

EampFamp

GampHamp

IampFunctionamp

ENSBTAG00000015210

LOXHD1

DEAFNESS

AUTOSOMAL

RECESSIVE77

Autosomalrecessive

nonsyndromic

sensorineuDominant

lateonsetFuchs

cornealdystrophy

deafnessautosomal

recessivetype77

No

3

03

NC

ENSBTAG00000015294

COX10

Leighsyndrome

Mitochondrial

complexIVdeficiency

(MTC4D)

Cells

4

6

24

5

2

3

14

3

16

cellular

process

ENSBTAG00000015335

BAI3

No

38

03

8

01

01

2

signalling

ENSBTAG00000015490

HS1BP3

Cells

1

10

5

9

4

8

2

1

37

cellular

process

ENSBTAG00000015602

C7H5orf45

Nomatch

10

cellular

process

ENSBTAG00000015839

MAP4

No

27

85

104

35

11

100

82

46

61

cellular

process

ENSBTAG00000015912

DMKN

No

01

04

03

NC

ENSBTAG00000015980

FASN

AutosomalRecessive

MentalRetardation

Cells

19

29

7

12

6

39

6

7

22

metabolism

ENSBTAG00000015985

GPR20

httpswwwmousephenotypeor

gdatagenesMGI2441803sect

ionassociations

Anatomicaldefectsvarioustissues

03

02

08

01

09

signalling

ENSBTAG00000016495

LINS

AutosomalRecessive

MentalRetardation

Cells

6

3

1

5

3

3

1

5

12

NC

ENSBTAG00000016691

LACC1

Mice

2

9

1

9

9

8

08

7

27

NC

ENSBTAG00000017096

ERICH1

Cells

6

4

2

4

2

5

2

5

13

metabolism

ENSBTAG00000017165

MATN2

No

2

7

2

20

08

3

2

2

7

cellular

process

ENSBTAG00000017418

RRP1B

Cells

6

4

4

5

3

7

3

12

159

cellular

process

ENSBTAG00000017436

ALS2CR11

Cells

01

37

disease

ENSBTAG00000017505

PAXIP1

Cells

2

3

2

4

2

4

2

3

12

signalling

ENSBTAG00000017576

GPR144

Nomatch

02

signalling

ENSBTAG00000018528

USP45

No

1

1

04

1

05

06

02

2

04

cellular

process

ENSBTAG00000018810

THBS2

INTERVERTEBRAL

DISCDISEASE

Cells

1

2

04

09

1

09

04

1

2

fertility

ENSBTAG00000019072

PSD4

Cells

7

8

1

1

7

02

6

4

cellular

process

125

Geneamp

Geneampnam

eampHum

anampdefectamp

Knock5OutsampMouseamp

MouseampPhenotypeamp

AampBamp

CampDamp

EampFamp

GampHamp

IampFunctionamp

ENSBTAG00000019265

AGK

Sengerssyndrom

e

Congenitalcataract

hypertrophic

cardiomyopathy

mitochondrial

myopathy

Cells

7

36

43

52

65

metabolism

ENSBTAG00000019752

BPIFA2B

Nomatch

01

1metabolism

ENSBTAG00000019799

FCAM

R

Cells

07

01

02

01

2

7immunity

ENSBTAG00000019961

ATP4B

Cells

12

03

1

05

cellular

process

ENSBTAG00000020414

DHX37

No

3

33

73

73

413

cellular

process

ENSBTAG00000021073

KIAA1549

PILOCYTIC

ASTROCYTOM

ASOMATIC

httpswwwmousephenotypeor

gdatagenesMGI2669829sect

ionassociations

behaviourneurological

homeostasism

etabolism

hematopoieticsystem

3

04

23

01

21

04

3NC

ENSBTAG00000021233

OR5B17

No

fertility

ENSBTAG00000021375

OR10Q1

No

fertility

ENSBTAG00000021643

EFCAB12

Cells

01

1

9cellular

process

ENSBTAG00000021645

MBD4

Cells

8

43

82

51

10

6cellular

process

ENSBTAG00000022509

DNAH

9

No

02

5

05

5metabolism

ENSBTAG00000022858

LOC783561

Nomatch

fertility

ENSBTAG00000026247

PKHD1L1

Cells

02

01

cellular

process

ENSBTAG00000026323

LYSB

Nomatch

68

metabolism

ENSBTAG00000026350

SPATC1

Cells

05

06

01

01

01

39

NC

ENSBTAG00000027390

RTFDC1

Mice

28

19

21

20

922

13

20

49

NC

ENSBTAG00000031115

Uncharacterized

Nomatch

01

cellular

process

ENSBTAG00000031718

OGFR

Mice

6

17

612

517

39

1fertility

ENSBTAG00000032264

Uncharacterized

Nomatch

52

4

22

NC

ENSBTAG00000032481

DAPL1

No

1

6

07

01

13

10

cellular

process

126

Geneamp

Geneampnam

eampHum

anampdefectamp

Knock5OutsampMouseamp

MouseampPhenotypeamp

AampBamp

CampDamp

EampFamp

GampHamp

IampFunctionamp

ENSBTAG00000032527

ERCC6

CEREBROOCULOFACI

OSKELETAL

SYND

ROME1

COCKAYNE

SYND

ROMEBDe

SanctisCacchione

syndromeMACULAR

DEGENERATION

AGERELATED5UV

SENSITIVE

SYND

ROME1COFS

syndromeCockayne

syndrometype1

Cockaynesyndrome

type2Cockayne

syndrometype3UV

sensitivesyndrome

Cockaynesyndrome

typeB(CSB)De

SanctisCacchione

syndrome(DSC)UV

sensitivesyndrome

(UVS)cerebrooculo

facioskeletal

syndrometype1

(COFS1)

Mice

3

205

22

207

42

cellular

process

ENSBTAG00000032821

SCEL

No

01

02

17

04

NC

ENSBTAG00000033954

PRPS1L1

No

35

metabolism

ENSBTAG00000034991

SLX4IP

Cells

6

21

32

207

33

NC

ENSBTAG00000035226

TOR1AIP1

Cells

9

12

513

816

716

83

NC

ENSBTAG00000035309

OR4F15

No

fertility

ENSBTAG00000035708

LOC527795

Nomatch

01

01

23

metabolism

ENSBTAG00000035985

LOC101909743

OR8K1

No

fertility

ENSBTAG00000036019

RIPPLY3

httpswwwmousephenotypeor

gdatagenesMGI2181192sect

ionassociations

hemopoieticsystem

01

04

01

9

01

malformation

ENSBTAG00000037399

Uncharacterized

Nomatch

2

92

4

24

05

5signalling

ENSBTAG00000038606

DBDR

No

02

01

signalling

ENSBTAG00000038831

PHYHD1

No

2

18

21

23

30

18

16

15

1metabolism

ENSBTAG00000039110

OR8U1

No

fertility

ENSBTAG00000039117

FAM179B

Cells

14

206

31

308

33

cellular

process

ENSBTAG00000039520

SIRPB1

Cells

4

08

32

516

05

12

01

signalling

127

Geneamp

Geneampnam

eampHum

anampdefectamp

Knock5OutsampMouseamp

MouseampPhenotypeamp

AampBamp

CampDamp

EampFamp

GampHamp

IampFunctionamp

ENSBTAG00000040568

Uncharacterized

Nomatch

4

307

54

33

25

cellular

process

ENSBTAG00000045558

LOC100299084

Nomatch

1

22

110

10

09

52

fertility

ENSBTAG00000045588

LOC100298356

Nomatch

metabolism

ENSBTAG00000045709

LOC514057

Nomatch

fertility

ENSBTAG00000046175

Uncharacterized

Nomatch

01

01

cellular

process

ENSBTAG00000046228

Olfactory

receptor

Nomatch

fertility

ENSBTAG00000046239

C10orf12

No

03

104

07

108

06

209

cellular

process

ENSBTAG00000046285

OR8I2

No

fertility

ENSBTAG00000046360

Uncharacterized

Nomatch

NC

ENSBTAG00000046423

LOC618007

Nomatch

NC

ENSBTAG00000046527

LOC100139830

Nomatch

fertility

ENSBTAG00000046590

FAM181A

Cells

09

44

NC

ENSBTAG00000046593

Uncharacterized

Nomatch

20

metabolism

ENSBTAG00000046727

Olfactory

receptor

Nomatch

04

fertility

ENSBTAG00000047150

RTEL

Cells

4

23

64

55

616

cellular

process

ENSBTAG00000047166

SH2D7

Mice

03

101

01

1

02

02

2immunity

ENSBTAG00000047174

Uncharacterized

Nomatch

03

7105

02

01

271

05

03

NC

ENSBTAG00000047317

CL43

Nomatch

01

120

01

immunity

ENSBTAG00000047587

Uncharacterized

Nomatch

22

09

09

35

03

4

06

2cellular

process

ENSBTAG00000047661

FABP12

httpswwwmousephenotypeor

gdatagenesMGI1922747sect

ionassociations

Growthsizebody

01

02

69

metabolism

ENSBTAG00000048021

PNMAL2

No

04

03

2

03

03

02

NC

ENSBTAG00000048153

Uncharacterized

Nomatch

2

05

01

02

01

cellular

process

128

Discussion(

Deleterious recessive variants arenaturally present in outbred animal species

Theydonotcreateproblemsaslongastheyremainatlowfrequencyinrandom

mating populations Under these conditions these variants remain in the

heterozygousstateanddonotproducephenotypiceffectsHoweverassoonas

their frequency increases andmatingoccursbetween related individuals their

deleteriouseffectsbecomeapparentandhighlyaffectthewelfareandviabilityof

animalscarryingtheminthehomozygousstate

In industrialdairy cattlebreedscrossbreeding isgenerallyavoidedandbreeds

areinpracticegeneticisolatesinwhichartificialinsemination(AI)iswidelyused

for rapidly spreading thegeneticprogress in thepopulationAsa consequence

elitesireshavemanythousandandsometimeshundredsofthousanddaughters

Under these conditions mating between relatives becomes practically

unavoidable in the longterm In fact inspiteof theclosecontrolof inbreeding

andofthematingsystemgeneticdefectsregularlyappeartothesurfaceSome

examples are the Complex Vertebral Malformation (CVM) and the Bovine

Leucocyte Adhesion Deficiency (BLAD) in Holstein (Schuumltz et al 2008) and

WeaverdiseaseinBrown(McClureetal2013)Inthesecasestheeffectsofthe

recessivedeleteriousallelesareclearandmoleculardiagnostictoolshavebeen

developedtoavoidtheirfurtherspreadingandtoinitiatetheeradicationprocess

However recessive deleterious alleles may produce their effects in a more

deceitful way for example by affecting fertility inducing early lethality or

influence disease susceptibility All these effects are difficult to trackwith the

current methods of phenotype measurements but have an impact on the

economyofthelivestocksectorandonanimalwelfare

InthispaperwesoughtfordeleteriousrecessiveallelesinItalianHolsteincattle

This breed is recently experiencing a decrease in the genetic trend of fertility

traits (Pryce et al 2014) therefore we started our search from the exome

sequences of 20 sires 19 of which having extreme EBV values for male and

femalefertility

130

The choiceof sequencingexomeswasa compromisebetween completenessof

informationaccuracyofvariantandgenotypedetectionandcostExomesclearly

give incomplete information since they miss all variation in regulatory sites

outside gene boundaries (ENCODE Consortium 2012) These may be very

relevantforthecontroloftraitvariationbuttheyarestillverypoorlyannotated

in livestock On the other hand sequencing exomes yields millions instead of

billionsbasepairsadeepersequencingoftargetregionsandaconsequentmore

reliablevariantandgenotypecalling(FigureS1andS2)Theanalysisofexome

data is also less demanding in terms of computer time (thousands instead of

millions variants are to be analysed) and easier in terms of biological

interpretation Exome sequencing revealed to be very effective in the

identificationofanumberofmutationsaffectinghumanhealth(Bamshadetal

2011 Yang et al 2013) All thesewere defects caused bymutations in single

genes producing clear phenotypic effects havingMendelian inheritance These

conditionsdonotholdinourinvestigationthereforewehadtodevisestrategies

to integrate exome data with other information collected from the Holstein

populationandotherspecies

Theoverallprocedureforidentifythecandidatedeleteriousrecessivewasbased

on the rationale that deleterious recessive alleles in coding sequences are

expectedtobei)causedbyseverealterationoftheproteinsencodedbygenes

ii)mostlyifnotexclusivelycarriedattheheterozygousstateinthepopulation

iii)unevenlydistributedinanimalsofhighandlowfertilityiv)likelyconserved

indifferentspeciesbecauseofselectionconstraintv)likelyassociatedtogenetic

defectsinanyspecieswhenseverelydamaged

A stepwise procedure was set up for reducing candidates to a number

manageable for functional analysis (Figure1) In the first stepmutationswere

discoveredinthesecondfilteredandinthirdassessed

Thisprocedureidentifiesonlysomeofthedeleteriousrecessivesexistinginthe

populationanalysedSincewesequencedonlyexomesandnotwholegenomes

ouranalysesislimitedtocodingsequencesandatthemomenttoSNPvariation

Wethereforemissregulatoryvariationsoutsidegenesand theeffectsof indels

andmorecomplexvariations

131

ThenumberofanimalssequencedislimitedDeleteriousrecessivesareexpected

toberareinthepopulationandwehavesampledonly20animalsfromasubset

ofsiresauthorizedtoAI inItalyAlthoughwehavetakenamongthemthose in

theminus tail of fertility trait thatmighthave ahigher genetic load itrsquos likely

thatwehavemissedothervariantsexistinginthepopulationOntheotherhand

variantscarriedbyAIauthorizedbullsaretheonesthatwillhavealargerimpact

onthepopulationifwidelyspreadbyelitebullsandthereforetheyarethemost

interestingfromabreedingpointofview

Most deleterious variants are expected to be very rare and therefore not

detected by the approach here adopted or detected and afterwards discarded

duringthefilteringprocesses

Sequencing is prone to errors Some real variants may have been discarded

duringQCofsequencedataduetopoorsequencequalityorpoorcoverageofthe

region

Someofthevariationsclassifiedasldquonondeleteriousrdquobythesoftwareusedmay

indeedhave an effect In fact synonymousmutation arenon`neutralwhenever

thedifferenttRNAsforasameaminoacidhavedifferentrelativeabundanceand

translationrates(Goymer2007)

1(Searching(candidate(deleterious(variants(in(exomes(

ExomesproducedafairamountofdataAlltogether5355Mbweresequenced

These represent 9998 of the exome sequences targeted by probes Among

theseonly35hadasequencingdepthacrossanimalslt60XSequencingdepth

is a key factor for variant detection and even more so when the genotype of

sequencedanimalsistobeinferredInvestigatingtheconcordancebetweenthe

genotypes detected at more than 20K SNP by both Beadchip and exome

sequencingwe observed a clear inverse relationship between sequence depth

andnumberofdiscordances(TableS3andFigureS3)Theaveragesequencing

depth expected (50X) and realised in practice (4058) in this investigation

yieldedratheraccurategenotypessincetheaverageconcordanceobservedwas

over97and increased toover99whentwooutlieranimalswereremoved

132

from the dataset The reason for the outlier behaviour of these samples waslikely a suboptimal quality of DNA submitted to sequencing Overall almost215000 variants passed quality controls and among them 192440 autosomalSNPs have been investigated in detail As found in other investigations mostvariants were rare (Peloso et al 2014) For example the MAF distributionobservedin46904SNPsclassifiedneutralhasameanof0197andamedianof0167values(FigureS6)

ThefirststepinourresearchwastoidentifySNPscausingseverealterationsinthe proteins coded by genes These were relevant changes in conformationalterationsofsplicingsitesortruncationsofproteinsToaccomplishthistaskwereliedontwosoftwareSIFTandANNOVARthatidentified4902suchvariationsin3423genes Interestingly the subsetofdeleteriousvariantshas significantlylowermean (0156)andmedian (010) compared toneutral variations (FigureS6) This subset is therefore at least enriched in variants under purifyingselectionhavingadeleteriouseffectatthepopulationlevel

Thenumberofdeleteriousallelesdetectedishighanditisunlikelytheyallhavea relevant phenotypic effect even if most variants are rare In fact variantsdeleterious for a protein are not automatically deleterious for an organism asthe protein change induced by the variant although relevant may not affectproteinfunctionvariationmayoccurinonememberofagenefamilysothatitsfunction may be compensated by other family members the protein functionmay be compensated by other proteins or the affected metabolic pathwaycompensated by alternative pathways The challenge was therefore to filter asubsetofvariantslikelyhavingaphenotypiceffecttobeproposedforvalidationatalargerscaleToaccomplishthistaskwesoughtforadditionalinformationinexomesequencesandinalargepopulationofHolsteinsiresgenotypedwiththebovineHDBeadchip

( (

133

2(Filtering(deleterious(variants(

The second step consisted in filtering the several thousand deleterious SNPsidentified in exomes to search for SNPs and genes that worth deeperinvestigationWeusedtwostrategiestoaccomplishthistask

21$ Distribution$ of$ deleterious$ variants$ in$ fertility$ EBV$ plus9variant$ and$

minus9variant$animals$

Fertility traits are controlled by a high number of genes and are highlygenetically heterogeneous Even using a selected subset of variants identifiedfrom exome sequencing classic association analyses would need sample sizesmuch larger that the one available in this investigation to have a reasonablestatistical power (Stitziel et al 2011) The first approachwe appliedwas theanalysisof SNPallelic andgenotypicdistributionbetween theminusandplus`variant tails of male and female fertility EBVs Given the very low number ofanimals sequencedwedecided to use a simplemodel to exclude from furtheranalyses all markers showing no evidence of association with the EBV tailsratherthantofindsignificantassociations

We divided samples in cases (minus`variant animals for fertility EBV) andcontrols (plus`variant animals) and tested all possible models of inheritanceimplemented in thePlinkpackage for tackling simple genetic defectsWe thenused an empirical point`wise significant threshold based on permutations tohighlight SNPs having alleles or genotypes showing an uneven distributionbetweencasesandcontrolsandconsideredtheseasworthfurtherinvestigationAtotalof112SNPsin101genespassedthisfilteringprocess

$

22$Detection$of$regions$carrying$deleterious$mutation$from$800K$SNP$data

The analysis of exomes alone was therefore insufficient to tag recessivedeleteriousgenesinItalianHolsteinandwesoughtforadditionalinformationin1009ItalianHolsteinbullsgenotypedathighdensitythatarepartofthegenomicselection program run by ANAFI The rationale of the approach is that

134

deleterious recessives are expected to be absent or rare in the homozygous

status in the population Hence they may be tagged searching for genome

regions having markers or haplotypes lacking or having significantly less

homozygotesthanexpectedunderaneutralmodel

We used two complementary approaches to identify these candidate regions

One(OHW)searchedforverystrongsignalsproducedbysinglemarkersoutof

Hardy`Weinberg equilibrium at Ple847x10`8 A minimum of three significant

markersinaslidingwindowsof10wassettoreachsignificancetocapturemore

reliablesignalsavoidexcessivesinglemarkernoiseandaccount forapossible

Wahlund effect due to presence of the genetic structure in the population

difficulttoavoidinthesetofprogenytestedbullsanalysedThesecondapproach

(LOH) captured more moderate signals but from multiple markers in larger

regions Both approaches were designed to detect signals in regions in which

deleteriousvariantsarestillinsegregationandarenotexcessivelyrare

Asexpectedthetwoapproachesidentifiedregionsthatonlypartiallyoverlapped

(TablesS5S6eS7)Inparticular5regionsoutofthe11identifiedbyOHWwere

in common to the 240 identified by LOH Two of these regions overlap with

previous investigations that used the BovineSNP50 Beadchip Sahana et al

(2013) found a region candidate to carrypotential lethal haplotypes inNordic

Holstein on BTA7 in the region 6708007`10907840 This interval overlaps

withbothBTA7regionsdetectedinthisstudyVanRadenetal(2011)detected

inUSHolsteinalargeregiononBTA15spanning76Mb`82M(onBTA30)that

includedLRP4 gene responsible for themulefootdefectOuranalyseswith the

HDBeadchipindicatesontheonesidethatthelargeregiondetectedbySahana

etal(2013)mayconsistintwonearbyregionsOntheothersidethesignalwe

detectedonBTA15doesnot includetheLRP4 locusthat iscompletely fixed in

theItalianbullspopulationanalysed

In total83mutations in66genes fell in eitherorbothOHWandLOHregions

These were joined to the SNPs output of the EBV tail analyses for further

assessment

(

135

3(Assessing(filtered(deleterious(variants(

Different characteristics of the filtered deleterious variants were assessed to

identify among them strong candidates to submit to a large`scale validation

studyTheassessmentphasepermittedtogainadditionalinformationinfavour

or against SNP and gene candidature to be deleterious We quantified this

evidenceusingasubjectivescore

31$Comparative$genomics$analysis$

In the case of comparative genomics the score quantified the level of

conservationof thealternativeaminoacidresidues inducedbySNPvariants in

ItalianHolsteincattleComparisonwasrunagainstaset100vertebratespecies

that included 62 mammals The level of conservation is a good proxy for the

expected severity of a mutation In the case of the 46 SNPs that induced the

change of an amino acid highly conserved in the species investigatedwemay

hypothesize a strong selection pressure in favour of the maintenance or the

invariant amino acid at that position Therefore the amino acid change or the

stopcodonweobserved incattlemayaffectnotonlytheproteinstructureand

function but also the phenotype of homozygous animals Interestingly in all

thesecasestheSNPallelecodingforthenonconservedvariantswastheminor

allele

Converselyinallcasesofhighvariabilityofthetargetaminoacidacrossspecies

(asmanyassevendifferentaminoacidshavesometimesbeenobserved)poorly

supports the presence of phenotypic effects due to its change Sometimes the

informationcollectedisagainstthepresenceofadeleteriousmutationasinthe

case of the presence in different species of both alleles identified by exome

sequencingThis suggests thatneither of these is greatly affecting the viability

andfitnessofanimalscarryingthem

Wealsoobservedanumberoffailures(N=45)intheliftoverofSNPcoordinates

fromcattletohumanorinproteinalignmentThishappenedforexampleinthe

caseofuncharacterized cattleproteins thathavenoorthologous in thehuman

genome and of someolfactory receptors that belong to a highly variable gene

136

family Even if the failure of alignment is an indication of poor proteinconservationacrossspeciestheconservationofthetargetaminoacidcouldnotbeassessesandinthesecaseswepreferredtoconsidertheinformationmissingandtoscoreitaccordingly

32$Haplotype$analysis$

Asecondassessmentofthefilteredcandidateswasontheirbehaviourinthesirepopulation The ideal test would be the direct genotyping of the SNPs in thepopulationand theevaluationof theirallelicandgenotypicproportions In theabsenceofthisinformationwefirstinferredtheHDhaplotypeassociatedtoSNPallelesandassessedthebehaviourofthehaplotypeinthepopulationassumingthisasasurrogateoftheSNPbehaviourSincedeleteriousallelesareexpectedtoberareandthis isalsowhatweobserved fromcomparativegenomicanalysesweinvestigatethefrequencyonlyofthehaplotypecarryingtheminorSNPalleleagainsttheexpectationto findthecorrespondinghaplotypeneverhomozygousor a number of times much lower than expected from its frequency in thepopulation The strategy yielded some interesting confirmation but was onlypartiallysuccessful

Firstofalltheidentificationofauniquehaplotypecarryingtherarevariantwasnot always possible or straightforward In some cases carriers of a same rarevariantdidnotshareexactlythesamehaplotypeWeacceptedsomevariationifthe haplotype remained clearly distinguishable from those of animals notcarryingtherarevariantbutatthesametimereducedthestringencyofthetestInquiteanumberofcasesasamehaplotypecarriedbothcandidatedeleteriousSNP variants This may occur for example in the case of recent mutations IntheselastcasesthetestcouldnotbecarriedoutFinallymostrarevariantswereassociatedtorarehaplotypesthatarenotlikelytohavehomozygousindividualsinthepopulationindependentlyoftheSNPeffect

Thisassessmentwas thereforeonly significant fora fewhaplotypes thathadalarge number of heterozygous individuals and no or only a very fewhomozygotesandforthosethathadalargenumberofhomozygotes

137

33$Functional$analysis$

The final assessment was the analysis of the function of genes carrying the

candidate deleterious SNP This has some limitation since knowledge of gene

function is only partial and mainly gained from investigations on species

different from cattle In any case we considered the association to genetic

defectsinhumanandtheinductionofseverephenotypesinmouseknockoutas

a positive indication of the likely phenotypic effect of the severe protein

alterationweobservedincattleInadditiongenefunctionandexpressionprofile

was evaluated in search for evidence of gene involvement in fertility

developmentorprocessesfundamentalforcellandorganismviability

Amonggenesidentified22areassociatedtohumansyndromes(Table4)Eleven

syndromesareneurologicaldiseasesaffectingbraindevelopmentand inducing

mental retardation (eg Donnay`Barrow Syndrome and Autosomal recessive

primary microcephaly) the others influence development (eg

Cerebrooculofacioskeletal Syndrome 1 and Adams`Oliver syndrome) or organ

function(egCongenitalpulmonaryalveolarproteinosis)

Most of the same defects translated into Holstein would induce physical and

behavioural anomalies likely causingyoungbulls tobe excluded fromprogeny

testing Itshould in factbeconsideredthattheanimalsweanalysedareonthe

onesidethosethatwillmostinfluencethegeneticmakeoffuturegenerationbut

are not a random sample of Holstein animals These bulls have been progeny

testedandauthorisedtobeusedinartificialinseminationinItalyCandidatesto

progenytestingderivefromthematingofthebest1siresandbest2damsof

the Italian Holstein population At the age of 4 months they are taken to the

Italian Holstein breeder association (ANAFI) test station where they are

subjected to performance test sanitary controls and behavioural tests

Thereafterthequalityoftheirsemenischeckedbeforeauthorizationisgranted

toenterinprogenytestInthissampleofanimalsitisthereforeexpectedtofind

never at the homozygous state mutations causing lethality physical

malformation abnormal behaviour and semen infertility Also those having a

138

majoreffectonsemenqualityandhighsusceptibilitytodiseasesareexpectedto

berarelycarriedatthehomozygousstatus

Association with human syndromes is a strong evidence of the potential

phenotypiceffectofmutationsalteringthegenestructureHoweveritshouldbe

considered that variants found in humans are different fromvariants found in

cattle and that the severity of the phenotype induced by a deleterious gene is

often variable even within species Therefore we evaluated this information

togetherwiththeotherindirectevidencesofvariantstobedeleterious

WealsoenquiredthewebsiteoftheInternationalMousePhenotypeConsortium

(httpswwwmousephenotypeorg) to search for the effects of null alleles in

mice The few genes forwhichnull`mice line is available are involved in basic

functionsfortheorganismthatspanfromvision(IAH1andEPB41L4A)toaging

(RGP1)andtissuesdevelopment(FABP12RIPPLY3RGNEF)orfertility(CFAP96)

Aswithgenesassociatedtohumansyndromesmostoftheabnormalphenotypes

observedinknockmousewouldhamperyoungbullstoenterprogenytesting

Gene Ontology (GO) was evaluated on the entire gene set No enrichment on

particularcategoriesGOtermswasobservedThemostrepresentedcategories

in Biological Processes were as expected ldquometabolic processrdquo and ldquocellular

processrdquo however lower level terms ldquoreproductionrdquo and ldquodevelopmental

processrdquo are represented with meaningful hits The analysis of Molecular

Function indicates thatmany of the hits are enzymes (ldquocatalytic activityrdquo) and

haveregulatoryroles(ldquobindingrdquoandldquoenzymeregulatoractivityrdquo)Thisresultcan

be interpreted indifferentwaysOn theone sidedeleterious allelesmay exert

their deleterious effect in a number of different cellular and developmental

processes the correct process is one while there is many options to make it

wrongAlsothenumberofgenesinvestigatedisrelativelylowandformanyof

them there is little or no information available (eg 11 genes code for a

completelyuncharacterizedprotein)Inadditionsomeofthemmayturnoutnot

tobereallydeleteriousandaddnoisetotheGOanalysis

As a final step all the previous information (GO expression patterns known

function)havebeenused to estimate a scoremeasuring the importanceof the

gene function forcattle survivalandreproductionThescoringof functionwas

139

particularlydifficultGOcategoriesweregeneralandallfunctionstaggedhavearelevant importance for animal survival andproductionTherefore in this casewedidnrsquotusethewholescorerangeplanned(from`1to1)andgavenonegativescoresApositivescoreof1wasattributedonly togenesplayingan importantroleinfertilityanddevelopment

$

34$Strong$candidate$deleterious$variants$

Thescoringprocessyielded21variantswithatotalscore=035variantswithpositive and 135 with negative values (Figure 2 and Table 5) Although theprocesswasnoterrorfreeasitreliedonincompleteinformationandhadsomedegreeofsubjectivityparticularlyinthechoiceofrelativeweightsitidentifiedaset of variants that may be considered strong candidates to have deleteriouseffects in the Italian Holstein breed Variants were evaluated individually andthose in a same genes had sometimes different scores For example the fivevariantscarriedbyALPK2hadscoresrangingfrom`7to`2

140

Figure2Graphicalrepresentationoftotalscoreforeachvariants

minus505

Variation

Score

1_645923001_645985561_1381698111_1464490851_150972144

2_7478962_270362092_270455392_375933832_555609862_904645543_75418123_110403993_110404003_188945763_1137877703_1203356984_53767714_265405974_265406204_749514994_1033779064_1057246734_1130233264_1178417155_445557055_445557425_757447955_940308955_940309556_207762566_382803486_863518016_927092906_1075769256_1078851146_1090916426_1166055226_1186534167_13339757_56553577_97149147_167941797_168358027_168621947_197404097_197409057_263293538_602606098_603324848_658345648_704101558_760297748_770028908_872798628_1117618168_1128416959_83888819_83889249_237928389_510133929_1047396559_10515702910_171251510_1584300310_2802235210_2802311010_8290312110_8517371111_4630576711_4675390311_4740957711_5979141011_7832201011_8794387511_9548610911_9945389411_9946100112_1190618112_1247843312_1248084512_1413689712_5304908112_9076120913_381566413_1178793413_1178793813_5247168413_5393493013_5393494713_5393985313_5454042513_5504774013_5998820513_6304575013_6316961013_6539670814_197749414_364069514_3556894514_4688575014_5718402214_6863536315_18267915_18331715_18340415_248692

15_3018920915_3504878815_4630194115_7503314015_7503764515_8024448915_8025571015_8028875915_8038974515_8048559315_8055593115_8094968915_8275455615_8292533216_456201716_3831963616_6257435517_5309539517_5311265517_6609153117_7237753218_2562355018_3495651318_4637204118_5215480618_5408127718_5413095518_5781976819_2144410319_3107215219_3111397219_3279084319_4225178519_5139519419_5619117519_5726625120_763980120_2341043120_5888423620_6417173921_623911921_2034587821_2034628921_2683773321_3100359921_5527419521_5582646321_5817338421_5912071921_6048248421_6048251021_6284314222_5241595922_5242173422_5242204622_5242610422_5242666822_5695165722_5697484722_5779767423_4588505224_2132460424_2145336424_4676490924_5032719524_5815817824_5815862024_5815878424_5820400324_5820447925_3743318026_1811648726_2571599626_2576457027_503107827_3283255728_28422828_169040328_3571880728_4408176529_198588129_753052329_753066329_26397931

141

Table5SingleanalysesscoreandtotalforeachcandidatevariantsChrchrom

osom

ePosphysicalposition

Mutation

Score

Chr

Pos

Gene

Genenam

eHaplotypes

Conservation

Hum

andefect

Mousemodel

Function

TotalScore

164592300

ENSBTAG00000009014

UPK1B

02

I1

01

2

164598556

ENSBTAG00000009014

UPK1B

0I1

I1

00

I2

1138169811

ENSBTAG00000002966

DNAJC13

03

10

15

1146449085

ENSBTAG00000017418

RRP1B

I3

I1

I1

00

I5

1150972144

ENSBTAG00000036019

RIPPLY3

I3

I1

I1

I1

1I5

2747896

ENSBTAG00000013916

HERC2

30

10

04

227036209

ENSBTAG00000004555

LRP2

I3

I1

10

0I3

227045539

ENSBTAG00000004555

LRP2

I3

31

00

1

237593383

ENSBTAG00000032481

DAPL1

00

I1

00

I1

255560986

ENSBTAG00000032264

Uncharacterized

3I3

I1

00

I1

290464554

ENSBTAG00000017436

ALS2CR11

0I1

I1

00

I2

37541812

ENSBTAG00000046175

Uncharacterized

00

00

00

311040399

ENSBTAG00000013789

OR6K6

I3

I3

I1

00

I7

311040400

ENSBTAG00000013789

OR6K6

I3

I3

I1

00

I7

318894576

ENSBTAG00000004772

THEM

40

I3

I1

00

I4

3113787770

ENSBTAG00000000149

USP40

11

I1

01

2

3120335698

ENSBTAG00000002809

CAPN10

I3

21

00

0

45376771

ENSBTAG00000001500

FIGNL1

01

I1

00

0

426540597

ENSBTAG00000033954

PRPS1L1

1I3

I1

00

I3

426540620

ENSBTAG00000033954

PRPS1L1

1I1

I1

00

I1

474951499

ENSBTAG00000003508

CFAP69

I3

I3

I1

10

I6

4103377906

ENSBTAG00000021073

KIAA1549

0I1

11

01

4105724673

ENSBTAG00000019265

AGK

01

10

02

142

Mutation

Score

Chr

Pos

Gene

Genenam

eHaplotypes

Conservation

Hum

andefect

Mousemodel

Function

TotalScore

4113023326

ENSBTAG00000014768

ZNF786

I3

I3

I1

00

I7

4117841715

ENSBTAG00000017505

PAXIP1

0I1

I1

00

I2

544555705

ENSBTAG00000026323

LYSB

I3

3I1

00

I1

544555742

ENSBTAG00000026323

LYSB

I3

0I1

00

I4

575744795

ENSBTAG00000009064

CSF2RB

I3

I3

10

0I5

594030895

ENSBTAG00000009444

SLC15A5

I3

I1

I1

00

I5

594030955

ENSBTAG00000009444

SLC15A5

I3

I3

I1

00

I7

620776256

ENSBTAG00000008797

Uncharacterized

I3

I3

00

0I6

638280348

ENSBTAG00000007568

MEPE

0I3

I1

00

I4

686351801

ENSBTAG00000003523

UGT2B15

1I1

I1

00

I1

692709290

ENSBTAG00000010954

ART3

I3

2I1

00

I2

6107576925

ENSBTAG00000038606

DBDR

0I3

I1

00

I4

6107885114

ENSBTAG00000001505

GRK4

I3

1I1

00

I3

6109091642

ENSBTAG00000007001

SLC26A1

1I3

I1

00

I3

6116605522

ENSBTAG00000014058

LDB2

01

I1

00

0

6118653416

ENSBTAG00000012458

PSAPL1

I3

I1

I1

00

I5

71333975

ENSBTAG00000015602

C7H5orf45

I3

I3

I1

00

I7

75655357

ENSBTAG00000045588

LOC100298356

1I3

I1

00

I3

79714914

ENSBTAG00000046423

LOC618007

00

I1

00

I1

716794179

ENSBTAG00000012314

LDLR

02

10

03

716835802

ENSBTAG00000009568

KANK2

01

I1

00

0

716862194

ENSBTAG00000009569

DOCK6

01

10

02

719740409

ENSBTAG00000000414

FUT5

I3

I3

I1

00

I7

719740905

ENSBTAG00000000414

FUT5

I3

I3

I1

00

I7

143

Mutation

Score

Chr

Pos

Gene

Genenam

eHaplotypes

Conservation

Hum

andefect

Mousemodel

Function

TotalScore

726329353

ENSBTAG00000004860

SLC27A6

0I1

I1

00

I2

860260609

ENSBTAG00000011420

CA9

I3

I1

I1

00

I5

860332484

ENSBTAG00000011433

RGP1

I3

0I1

10

I3

865834564

ENSBTAG00000035708

LOC527795

00

I1

00

I1

870410155

ENSBTAG00000005466

C8H8orf58

I3

I3

I1

00

I7

876029774

ENSBTAG00000015027

Clusterin

00

I1

10

0

877002890

ENSBTAG00000012921

KIF24

12

I1

00

2

887279862

ENSBTAG00000002220

SPTLC1

00

10

01

8111761816

ENSBTAG00000006532

CDK5RAP2

I3

I3

11

1I3

8112841695

ENSBTAG00000013838

ALLC

01

I1

00

0

98388881

ENSBTAG00000015335

BAI3

00

I1

00

I1

98388924

ENSBTAG00000015335

BAI3

00

I1

00

I1

923792838

ENSBTAG00000046593

Uncharacterized

03

00

03

951013392

ENSBTAG00000018528

USP45

02

I1

00

1

9104739655

ENSBTAG00000018810

THBS2

03

10

15

9105157029

ENSBTAG00000000918

ERMARD

I3

21

00

0

10

1712515

ENSBTAG00000014750

EPB41L4A

0I1

I1

I1

0I3

10

15843003

ENSBTAG00000009030

NOX5

I3

I3

I1

00

I7

10

28022352

ENSBTAG00000035309

OR4F15

I3

1I1

00

I3

10

28023110

ENSBTAG00000035309

OR4F15

I3

I1

I1

00

I5

10

82903121

ENSBTAG00000012565

PCNX

I3

2I1

01

I1

10

85173711

ENSBTAG00000011683

NUM

BI3

0I1

00

I4

11

46305767

ENSBTAG00000002288

NT5DC4

0I3

I1

00

I4

11

46753903

ENSBTAG00000019072

PSD4

0I1

I1

00

I2

144

Mutation

Score

Chr

Pos

Gene

Genenam

eHaplotypes

Conservation

Hum

andefect

Mousemodel

Function

TotalScore

11

47409577

ENSBTAG00000009033

TEX37

I3

I2

I1

10

I5

11

59791410

ENSBTAG00000006027

USP34

02

I1

01

211

78322010

ENSBTAG00000015490

HS1BP3

00

I1

00

I1

11

87943875

ENSBTAG00000001140

IAH1

I3

3I1

10

011

95486109

ENSBTAG00000017576

GPR144

I3

I3

I1

00

I7

11

99453894

ENSBTAG00000038831

PHYHD1

11

I1

00

111

99461001

ENSBTAG00000010601

DOLK

12

10

04

12

11906181

ENSBTAG00000001133

VWA8

I3

1I1

00

I3

12

12478433

ENSBTAG00000014752

AKAP11

1I3

I1

I1

0I4

12

12480845

ENSBTAG00000014752

AKAP11

1I1

I1

I1

0I2

12

14136897

ENSBTAG00000016691

LACC1

I3

I3

I1

00

I7

12

53049081

ENSBTAG00000032821

SCEL

I3

I3

I1

00

I7

12

90761209

ENSBTAG00000019961

ATP4B

I3

I3

I1

00

I7

13

3815664

ENSBTAG00000034991

SLX4IP

00

I1

00

I1

13

11787934

ENSBTAG00000008650

CAMK

1D

I3

0I1

00

I4

13

11787938

ENSBTAG00000008650

CAMK

1D

I3

0I1

00

I4

13

52471684

ENSBTAG00000013753

DDRGK1

02

I1

00

113

53934930

ENSBTAG00000039520

SIRPB1

0I3

I1

00

I4

13

53934947

ENSBTAG00000039520

SIRPB1

00

I1

00

I1

13

53939853

ENSBTAG00000039520

SIRPB1

0I3

I1

00

I4

13

54540425

ENSBTAG00000047150

RTEL

I3

0I1

00

I4

13

55047740

ENSBTAG00000031718

OGFR

01

I1

00

013

59988205

ENSBTAG00000027390

RTFDC1

01

I1

00

013

63045750

ENSBTAG00000009144

BPIFA2A

0I3

I1

00

I4

145

Mutation

Score

Chr

Pos

Gene

Genenam

eHaplotypes

Conservation

Hum

andefect

Mousemodel

Function

TotalScore

13

63169610

ENSBTAG00000019752

BPIFA2B

0I1

I1

00

I2

13

65396708

ENSBTAG00000006021

CEP250

0I3

I1

00

I4

14

1977494

ENSBTAG00000026350

SPATC1

I3

I3

I1

00

I7

14

3640695

ENSBTAG00000015985

GPR20

00

I1

00

I1

14

35568945

ENSBTAG00000037399

Uncharacterized

30

I1

00

2

14

46885750

ENSBTAG00000047661

FABP12

0I1

I1

00

I2

14

57184022

ENSBTAG00000026247

PKHD1L1

I3

3I1

00

I1

14

68635363

ENSBTAG00000017165

MATN2

01

I1

00

0

15

182679

ENSBTAG00000046228

Olfactoryreceptor

00

I1

0o

I1

15

183317

ENSBTAG00000046228

Olfactoryreceptor

00

I1

00

I1

15

183404

ENSBTAG00000046228

Olfactoryreceptor

00

I1

00

I1

15

248692

ENSBTAG00000045558

LOC100299084

00

I1

0o

I1

15

30189209

ENSBTAG00000005357

VPS11

02

I1

00

1

15

35048788

ENSBTAG00000005340

SERGEF

0I3

I1

00

I4

15

46301941

ENSBTAG00000002289

NLRP14

I3

I3

I1

00

I7

15

75033140

ENSBTAG00000012053

ACCSL

1I3

I1

00

I3

15

75037645

ENSBTAG00000012053

ACCSL

I3

I3

I1

00

I7

15

80244489

ENSBTAG00000046527

LOC100139830

I3

0I1

00

I4

15

80255710

ENSBTAG00000046285

OR8I2

10

I1

00

0

15

80288759

ENSBTAG00000001291

OR8H3

1I1

I1

00

I1

15

80389745

ENSBTAG00000045709

LOC514057

13

I1

00

3

15

80485593

ENSBTAG00000035985

LOC101909743OR8K1

1I1

I1

00

I1

15

80555931

ENSBTAG00000022858

LOC783561

I3

0I1

00

I4

15

80949689

ENSBTAG00000039110

OR8U1

1I3

I1

00

I3

146

Mutation

Score

Chr

Pos

Gene

Genenam

eHaplotypes

Conservation

Hum

andefect

Mousemodel

Function

TotalScore

15

82754556

ENSBTAG00000021375

OR10Q1

01

I1

00

015

82925332

ENSBTAG00000021233

OR5B17

I3

0I1

00

I4

16

4562017

ENSBTAG00000019799

FCAM

RI3

I3

I1

00

I7

16

38319636

ENSBTAG00000014794

SCYL3

I3

0I1

00

I4

16

62574355

ENSBTAG00000035226

TOR1AIP1

00

I1

00

I1

17

53095395

ENSBTAG00000020414

DHX37

1I1

I1

00

I1

17

53112655

ENSBTAG00000020414

DHX37

0I3

I1

00

I4

17

66091531

ENSBTAG00000010847

FOXN4

0I1

I1

00

I2

17

72377532

ENSBTAG00000008996

SFI1

0I3

I1

00

I4

18

25623550

ENSBTAG00000005170

GPR114

0I3

I1

01

I3

18

34956513

ENSBTAG00000001286

ELMO

33

3I1

11

718

46372041

ENSBTAG00000015912

DMKN

0

1I1

00

018

52154806

ENSBTAG00000001262

IRGQ

I3

1I1

00

I3

18

54081277

ENSBTAG00000009337

PNMA

L1

0I3

I1

00

I4

18

54130955

ENSBTAG00000048021

PNMA

L2

I3

1I1

00

I3

18

57819768

ENSBTAG00000003231

LIM2

1

I3

10

0I1

19

21444103

ENSBTAG00000011011

SSH2

I3

I3

I1

00

I7

19

31072152

ENSBTAG00000022509

DNAH

9I3

0I1

00

I4

19

31113972

ENSBTAG00000022509

DNAH

90

0I1

00

I1

19

32790843

ENSBTAG00000015294

COX10

12

10

04

19

42251785

ENSBTAG00000013685

KRT31

I3

0I1

00

I4

19

51395194

ENSBTAG00000015980

FASN

1I1

10

01

19

56191175

ENSBTAG00000007910

EXOC7

3I3

I1

00

I1

19

57266251

ENSBTAG00000011387

NAT9

12

I1

00

2

147

Mutation

Score

Chr

Pos

Gene

Genenam

eHaplotypes

Conservation

Hum

andefect

Mousemodel

Function

TotalScore

20

7639801

ENSBTAG00000005633

RGNEF

1I1

I1

10

0

20

23410431

ENSBTAG00000008871

DDX4

1I3

I1

00

I3

20

58884236

ENSBTAG00000005514

TRIO

I3

0I1

00

I4

20

64171739

ENSBTAG00000005021

SEMA5A

01

10

02

21

6239119

ENSBTAG00000016495

LINS

02

10

03

21

20345878

ENSBTAG00000012326

LOC101906218

00

I1

00

I1

21

20346289

ENSBTAG00000012326

LOC101906218

I3

0I1

00

I4

21

26837733

ENSBTAG00000047587

Uncharacterized

10

00

01

21

31003599

ENSBTAG00000047166

SH2D7

0I3

I1

00

I4

21

55274195

ENSBTAG00000039117

FAM179B

0I3

I1

00

I4

21

55826463

ENSBTAG00000014001

MAP1A

0I3

I1

00

I4

21

58173384

ENSBTAG00000009836

CHGA

I3

I1

I1

00

I5

21

59120719

ENSBTAG00000046590

FAM181A

01

I1

00

0

21

60482484

ENSBTAG00000017096

ERICH1

0I3

I1

00

I4

21

60482510

ENSBTAG00000017096

ERICH1

01

I1

00

0

21

62843142

ENSBTAG00000012833

ATG2B

0I3

I1

00

I4

22

52415959

ENSBTAG00000015839

MAP4

00

I1

00

I1

22

52421734

ENSBTAG00000015839

MAP4

1I1

I1

00

I1

22

52422046

ENSBTAG00000015839

MAP4

I3

I3

I1

00

I7

22

52426104

ENSBTAG00000047174

Uncharacterized

I3

00

00

I3

22

52426668

ENSBTAG00000047174

Uncharacterized

10

00

01

22

56951657

ENSBTAG00000021645

MBD4

1I1

I1

00

I1

22

56974847

ENSBTAG00000021643

EFCAB12

0I1

I1

00

I2

22

57797674

ENSBTAG00000031115

Uncharacterized

01

00

01

148

Mutation

Score

Chr

Pos

Gene

Genenam

eHaplotypes

Conservation

Hum

andefect

Mousemodel

Function

TotalScore

23

45885052

ENSBTAG00000005248

OFCC1

00

I1

00

I1

24

21324604

ENSBTAG00000000998

SLC39A6

02

I1

00

1

24

21453364

ENSBTAG00000011622

C18orf21

I3

I3

I1

00

I7

24

46764909

ENSBTAG00000015210

LOXHD1

0I3

10

0I2

24

50327195

ENSBTAG00000002660

CCDC11

I3

11

01

0

24

58158178

ENSBTAG00000014284

ALPK2

I3

I1

I1

00

I5

24

58158620

ENSBTAG00000014284

ALPK2

0I1

I1

00

I2

24

58158784

ENSBTAG00000014284

ALPK2

I3

I3

I1

00

I7

24

58204003

ENSBTAG00000014284

ALPK2

I3

I3

I1

00

I7

24

58204479

ENSBTAG00000014284

ALPK2

I3

2I1

00

I2

25

37433180

ENSBTAG00000040568

Uncharacterized

00

I1

00

I1

26

18116487

ENSBTAG00000046239

C10orf12

0I1

I1

00

I2

26

25715996

ENSBTAG00000010522

Uncharacterized

10

00

01

26

25764570

ENSBTAG00000046727

Olfactoryreceptor

00

I1

00

I1

27

5031078

ENSBTAG00000007340

LOC101906349

0I1

I1

00

I2

27

32832557

ENSBTAG00000011684

RAB11FIP1

10

I1

00

0

28

284228

ENSBTAG00000000079

CCSAP

0I1

I1

00

I2

28

1690403

ENSBTAG00000048153

Uncharacterized

10

00

01

28

35718807

ENSBTAG00000047317

CL43

1I1

I1

00

I1

28

44081765

ENSBTAG00000032527

ERCC6

I3

01

00

I2

29

1985881

ENSBTAG00000004081

FAT3

30

I1

00

2

29

7530523

ENSBTAG00000046360

Uncharacterized

01

00

01

29

7530663

ENSBTAG00000046360

Uncharacterized

1I1

00

00

29

26397931

ENSBTAG00000013568

UEVLD

I3

I3

I1

00

I7

149

Interestingly13ofthe22genesassociatedtohumansyndromescarryvariants

appearing in the short list of the 35 positives and only 6 in that of 135 with

negative values This in spite of the low weight (from 1 to 1) given to this

parameterEightofthe35positivevariantsareinproteinsstilluncharacterized

Sixvariantshadascoreof4orhigherThehighestscore(totalscore=7)wasfor

ELMO3 (Engulfment and cell motility gene 3) This gene is involved in cell

motility and migration ELMO3 is also a homolog of C( elegans Ced12 that is

required for many developmental processes included gonadal morphogenesis

Mutant for this gene in C( elegans suffer from abnormal development usually

with lethal consequences or sublethal developmental defects including small

embryosdelayeddevelopmentandreducedfertility(Gumiennyetal2001)In

human it is expressed in testis prostate and ovary

(httpwwwgenecardsorgcgibincarddispplgene=ELMO3)

ThegeneDNAJC13(DnaJ(Hsp40)homologsubfamilyCmember13)hadatotal

score of 5 This gene is involved in the onset of Parkinson disease DNAJC13

regulates the dynamics of clathrin coats on early endosomes Cellular analysis

shows that the mutation Arg855Ser identified by VilarintildeoGuumlell et al (2014)

confers a gainoffunction that impairs endosomal transportApossible roleof

DNAJC13geneinTourettesyndromechronicticdisorderisunderinvestigation

(Sundarametal2011)

A second genehad a scoreof 5THBS2 (Thrombospondin2) It belongs to the

thrombospondin family and is a powerful inhibitor of tumour growth and

angiogenesis It is highly expressed in testis in fetal Leydig cell Hirose et al

(2008) found a significant association between an intronic SNP in the THBS2

gene and lumbar disc herniation Thrombospondin2 is also a matricellular

proteinfoundinhumanserumDeletionofTSP2causesagedependentdilated

cardiomyopathy (Hanatani et al 2014) Mutant mice for this gene display a

variety of mutant phenotype Kyriakides et al 1998 suggest that THBS2

modulates the cell surface properties of mesenchymal cells affecting cell

functionssuchasadhesionandmigration

150

Three genes had a score of 4 all three are associated to human syndromes

Mutation in HERC2 (HECT and RLD domain containing E3 ubiquitin protein

ligase2)isinvolvedinautosomalrecessivementalretardation38(Puffenberger

etal2012)Jietal(2000)identifiedinmutantmouseneuromuscularsecretory

vesicledefectsandspermacrosomedefectsMoreovergeneticvariationsinthis

gene are associated with skinhaireye pigmentation variability Mutations in

DOLK gene (dolichol kinase) are associated with dolichol kinase deficiency

(Lefeberetal2011)COX10(cytochromecoxidaseassemblyhomolog10)gene

is associated to mitochondrial complex IV deficiency causing Leigh syndrome

(Antonickaetal2003)

Conclusions)

Exomesequencingthereforeprovidedvaluableinformationoncodingsequence

variants and on their potential effect on protein function As a group variants

classifiedasdeleteriouswererarerthanvariantsclassifiedasneutralSincethis

isanexpectedeffectofpurifyingselectionthissetofvariantwaslikelyenriched

forvariantshavingaphenotypiceffect

Theinclusionintheanalysesofplusandminusvariantanimalsforfertilitytraits

likelypermittedtosequencethemostimportantvariantsinfluencingthesetraits

butthelownumberofanimalssequencedcombinedwiththecomplexityofthe

traitpermittedtofindonlyasuggestiveassociationbetweenSNPsandtraitsItrsquos

likely that amuch larger number of individuals is to be sequenced if genome

wideassociationisthestrategyoneswanttopursue

We explored an alternative strategy to confirm suggestivedeleterious variants

based on indirect evidence of the variant effect by assessing i) evolutionary

constraintsassociatedtotheaminoacidresiduecodedbythevariantii)variant

behaviour in a large population In the absence of direct genotyping of the

candidatevariantsweusedhaplotypeinformationassurrogateiii)functionand

effectsinotherspecies

151

We attempted to quantify these criteria through a score subjective and only

indicativebutpermitted tohighlighted35genes so farneglected in cattlebut

deservingfurtherattentionandpotentiallyhavingdeleteriouseffectsinHolstein

andothercattlebreedsThesemaybepartoftherecessivedeleterioussetthat

constitutethegeneticloadofItalianHolstein

References)AntonickaHLearySCGuercinGHAgarJNHorvathRKennawayNG

HardingCOJakschMShoubridgeEA2003MutationsinCOX10

resultinadefectinmitochondrialhemeAbiosynthesisandaccountfor

multipleearlyonsetclinicalphenotypesassociatedwithisolatedCOX

deficiencyHumMolGenet122693ndash2702doi101093hmgddg284

AshwellMSHeyenDWSonstegardTSVanTassellCPDaYVanRaden

PMRonMWellerJILewinHA2004DetectionofQuantitativeTrait

LociAffectingMilkProductionHealthandReproductiveTraitsin

HolsteinCattleJournalofDairyScience87468ndash475

doi103168jdsS00220302(04)731860

AstonKICarrellDT2009GenomeWideStudyofSingleNucleotide

PolymorphismsAssociatedWithAzoospermiaandSevere

OligozoospermiaJournalofAndrology30711ndash725

doi102164jandrol109007971

AulchenkoYSRipkeSIsaacsADuijnCMvan2007GenABELanRlibrary

forgenomewideassociationanalysisBioinformatics231294ndash1296

doi101093bioinformaticsbtm108

BamshadMJNgSBBighamAWTaborHKEmondMJNickersonDA

ShendureJ2011ExomesequencingasatoolforMendeliandisease

genediscoveryNatRevGenet12745ndash755doi101038nrg3031

BieseckerLGShiannaKVMullikinJC2011Exomesequencingtheexpert

viewGenomeBiology12128doi101186gb2011129128

152

BiffaniSMarusiMBiscariniFCanavesiF2005Developingagenetic

evaluationforfertilityusingangularityandmilkyieldascorrelatedtraits

InterbullBulletin063

BiffaniSSamoreacuteABCanavesiF2002Inbreedingdepressionforproduction

reproductionandfunctionaltraitsinItalianHolsteincattleInstitut

NationaldelaRechercheAgronomique(INRA)pp0ndash4

BolgerAMLohseMUsadelB2014TrimmomaticAflexibletrimmerfor

IlluminaSequenceDataBioinformaticsbtu170

doi101093bioinformaticsbtu170

BrowningSRBrowningBL2007RapidandAccurateHaplotypePhasingand

MissingDataInferenceforWholeGenomeAssociationStudiesByUseof

LocalizedHaplotypeClusteringAmJHumGenet811084ndash1097

CharlesworthDWillisJH2009ThegeneticsofinbreedingdepressionNat

RevGenet10783ndash796doi101038nrg2664

ChunSFayJC2009Identificationofdeleteriousmutationswithinthree

humangenomesGenomeRes191553ndash1561

doi101101gr092619109

ConsortiumTEP2012AnintegratedencyclopediaofDNAelementsinthe

humangenomeNature48957ndash74doi101038nature11247

DePristoMABanksEPoplinRGarimellaKVMaguireJRHartlC

PhilippakisAAdelAngelGRivasMAHannaMMcKennaA

FennellTJKernytskyAMSivachenkoAYCibulskisKGabrielSB

AltshulerDDalyMJ2011Aframeworkforvariationdiscoveryand

genotypingusingnextgenerationDNAsequencingdataNatGenet43

491ndash498doi101038ng806

GoymerP2007SynonymousmutationsbreaktheirsilenceNatRevGenet8

92ndash92doi101038nrg2056

GumiennyTLBrugneraEToselloTrampontACKinchenJMHaneyLB

NishiwakiKWalkSFNemergutMEMacaraIGFrancisRSchedl

TQinYVanAelstLHengartnerMORavichandranKS2001CED

12ELMOaNovelMemberoftheCrkIIDock180RacPathwayIs

RequiredforPhagocytosisandCellMigrationCell10727ndash41

doi101016S00928674(01)005207

153

HanataniSIzumiyaYTakashioSKimuraYArakiSRokutandaTTsujita

KYamamotoETanakaTYamamuroMKojimaSTayamaS

KaikitaKHokimotoSOgawaH2014Circulatingthrombospondin2

reflectsdiseaseseverityandpredictsoutcomeofheartfailurewith

reducedejectionfractionCircJ78903ndash910

HiroseYChibaKKarasugiTNakajimaMKawaguchiYMikamiY

FuruichiTMioFMiyakeAMiyamotoTOzakiKTakahashiA

MizutaHKuboTKimuraTTanakaTToyamaYIkegawaS2008

AfunctionalpolymorphisminTHBS2thataffectsalternativesplicingand

MMPbindingisassociatedwithlumbardischerniationAmJHum

Genet821122ndash1129doi101016jajhg200803013

httpsgenomeucsceduindexhtml[WWWDocument]nd

JamrozikJFatehiJKistemakerGJSchaefferLR2005EstimatesofGenetic

ParametersforCanadianHolsteinFemaleReproductionTraitsJournalof

DairyScience882199ndash2208doi103168jdsS00220302(05)728952

JanisEWiggintonDJC2005AnoteonexacttestsofHardyWeinberg

equilibriumAmericanjournalofhumangenetics76887ndash93

doi101086429864

JiYRebertNAJoslinJMHigginsMJSchultzRANichollsRD2000

StructureoftheHighlyConservedHERC2GeneandofMultiplePartially

DuplicatedParalogsinHumanGenomeRes10319ndash329

KentWJSugnetCWFureyTSRoskinKMPringleTHZahlerAM

HausslerandD2002TheHumanGenomeBrowseratUCSCGenome

Res12996ndash1006doi101101gr229102

KumarPHenikoffSNgPC2009Predictingtheeffectsofcodingnon

synonymousvariantsonproteinfunctionusingtheSIFTalgorithmNat

Protoc41073ndash1081doi101038nprot200986

KyriakidesTRZhuYHSmithLTBainSDYangZLinMTDanielson

KGIozzoRVLaMarcaMMcKinneyCEGinnsEIBornsteinP

1998Micethatlackthrombospondin2displayconnectivetissue

abnormalitiesthatareassociatedwithdisorderedcollagenfibrillogenesis

anincreasedvasculardensityandableedingdiathesisJCellBiol140

419ndash430

154

LaneEACroweMABeltmanMEMoreSJ2013Theinfluenceofcowand

managementfactorsonreproductiveperformanceofIrishseasonal

calvingdairycowsAnimReprodSci14134ndash41

doi101016janireprosci201306019

LefeberDJdeBrouwerAPMMoravaERiemersmaMSchuurs

HoeijmakersJHMAbsmannerBVerrijpKvandenAkkerWMR

HuijbenKSteenbergenGvanReeuwijkJJozwiakAZuckerN

LorberALammensMKnopfCvanBokhovenHGruumlnewaldSLehle

LKapustaLMandelHWeversRA2011AutosomalRecessive

DilatedCardiomyopathyduetoDOLKMutationsResultsfromAbnormal

DystroglycanOMannosylationPLoSGenet7e1002427

doi101371journalpgen1002427

LiHDurbinR2009FastandaccurateshortreadalignmentwithBurrows

WheelertransformBioinformatics251754ndash1760

doi101093bioinformaticsbtp324

LiHHandsakerBWysokerAFennellTRuanJHomerNMarthG

AbecasisGDurbinR1000GenomeProjectDataProcessingSubgroup

2009TheSequenceAlignmentMapformatandSAMtoolsBioinformatics

252078ndash2079doi101093bioinformaticsbtp352

LiMXKwanJSHBaoSYYangWHoSLSongYQShamPC2013

PredictingMendelianDiseaseCausingNonSynonymousSingle

NucleotideVariantsinExomeSequencingStudiesPLoSGenet9

e1003143doi101371journalpgen1003143

MacArthurDGBalasubramanianSFrankishAHuangNMorrisJWalter

KJostinsLHabeggerLPickrellJKMontgomerySBAlbersCA

ZhangZConradDFLunterGZhengHAyubQDePristoMA

BanksEHuMHandsakerRERosenfeldJFromerMJinMMuXJ

KhuranaEYeKKayMSaundersGISunerMMHuntTBarnes

IHAAmidCCarvalhoSilvaDRBignellAHSnowCYngvadottirB

BumpsteadSCooperDNXueYRomeroIGWangJLiYGibbs

RAMcCarrollSADermitzakisETPritchardJKBarrettJCHarrow

JHurlesMEGersteinMBTylerSmithC2012Asystematicsurvey

155

oflossoffunctionvariantsinhumanproteincodinggenesScience335

823ndash828doi101126science1215040

MajewskiJSchwartzentruberJLalondeEMontpetitAJabadoN2011

WhatcanexomesequencingdoforyouJMedGenet48580ndash589

doi101136jmedgenet2011100223

McClureMCBickhartDNullDVanRadenPXuLWiggansGLiuG

SchroederSGlasscockJArmstrongJColeJBVanTassellCP

SonstegardTS2014BovineExomeSequenceAnalysisandTargeted

SNPGenotypingofRecessiveFertilityDefectsBH1HH2andHH3Reveal

aPutativeCausativeMutationinSMC2forHH3PLoSONE9e92769

doi101371journalpone0092769

McClureMKimEBickhartDNullDCooperTColeJWiggansGAjmone

MarsanPColliLSantusELiuGESchroederSMatukumalliLVan

TassellCSonstegardT2013FineMappingforWeaverSyndromein

BrownSwissCattleandtheIdentificationof41ConcordantMutations

acrossNRCAMPNPLA8andCTTNBP2PLoSONE8e59251

doi101371journalpone0059251

McKennaAHannaMBanksESivachenkoACibulskisKKernytskyA

GarimellaKAltshulerDGabrielSDalyMDePristoMA2010The

GenomeAnalysisToolkitAMapReduceframeworkforanalyzingnext

generationDNAsequencingdataGenomeRes201297ndash1303

doi101101gr107524110

MiHDongQMuruganujanAGaudetPLewisSThomasPD2010

PANTHERversion7improvedphylogenetictreesorthologsand

collaborationwiththeGeneOntologyConsortiumNuclAcidsRes38

D204ndashD210doi101093nargkp1019

MrodeRKearneyJFBiffaniSCoffeyMCanavesiF2009Short

communicationGeneticrelationshipsbetweentheHolsteincow

populationsofthreeEuropeandairycountriesJournalofDairyScience

925760ndash5764doi103168jds20081931

PelosoGMAuerPLBisJCVoormanAMorrisonACStitzielNOBrody

JAKhetarpalSACrosbyJRFornageMIsaacsAJakobsdottirJ

FeitosaMFDaviesGHuffmanJEManichaikulADavisBLohman

156

KJoonAYSmithAVGroveMLZanoniPRedonVDemissieS

LawsonKPetersUCarlsonCJacksonRDRyckmanKKMackey

RHRobinsonJGSiscovickDSSchreinerPJMychaleckyjJC

PankowJSHofmanAUitterlindenAGHarrisTBTaylorKD

StaffordJMReynoldsLMMarioniREDehghanAFrancoOHPatel

APLuYHindyGGottesmanOBottingerEPMelanderOOrho

MelanderMLoosRJFDugaSMerliniPAFarrallMGoelA

AsseltaRGirelliDMartinelliNShahSHKrausWELiMRader

DJReillyMPMcPhersonRWatkinsHArdissinoDZhangQWang

JTsaiMYTaylorHACorreaAGriswoldMELangeLAStarrJM

RudanIEiriksdottirGLaunerLJOrdovasJMLevyDChenYDI

ReinerAPHaywardCPolasekODearyIJBoreckiIBLiuY

GudnasonVWilsonJGvanDuijnCMKooperbergCRichSSPsaty

BMRotterJIOrsquoDonnellCJRiceKBoerwinkleEKathiresanS

CupplesLA2014AssociationofLowFrequencyandRareCoding

SequenceVariantswithBloodLipidsandCoronaryHeartDiseasein

56000WhitesandBlacksTheAmericanJournalofHumanGenetics94

223ndash232doi101016jajhg201401009

PryceJEVeerkampRFThompsonRHillWGSimmG1997Genetic

aspectsofcommonhealthdisordersandmeasuresoffertilityinHolstein

FriesiandairycattleAnimalScience65353ndash360

doi101017S1357729800008559

PryceJEWoolastonRBerryDPWallEWintersMButlerRShafferM

2014WorldTrendsinDairyCowFertilityin10ThWorldCongressof

GeneticsAppliedtoLivestockProduction

PuffenbergerEGJinksRNWangHXinBFiorentiniCShermanEA

DegrazioDShawCSougnezCCibulskisKGabrielSKelleyRI

MortonDHStraussKA2012Ahomozygousmissensemutationin

HERC2associatedwithglobaldevelopmentaldelayandautismspectrum

disorderHumMutat331639ndash1646doi101002humu22237

PurcellSNealeBToddBrownKThomasLFerreiraMARBenderD

MallerJSklarPdeBakkerPIWDalyMJShamPC2007PLINKA

157

ToolSetforWholeGenomeAssociationandPopulationBasedLinkage

AnalysesAmJHumGenet81559ndash575

SahanaGNielsenUSAamandGPLundMSGuldbrandtsenB2013Novel

HarmfulRecessiveHaplotypesIdentifiedforFertilityTraitsinNordic

HolsteinCattlePLoSONE8e82909doi101371journalpone0082909

SchuumltzEScharfensteinMBrenigB2008Implicationofcomplexvertebral

malformationandbovineleukocyteadhesiondeficiencyDNAbased

testingondiseasefrequencyintheHolsteinpopulationJDairySci91

4854ndash4859doi103168jds20081154

SimmonsMJCrowJF1977MutationsAffectingFitnessinDrosophila

PopulationsAnnualReviewofGenetics1149ndash78

doi101146annurevge11120177000405

SoggiuAPirasCHusseinHACanioMDGaviraghiAGalliAUrbaniA

BonizziLRoncadaP2013UnravellingthebullfertilityproteomeMol

BioSyst91188ndash1195doi101039C3MB25494A

StitzielNOKiezunASunyaevS2011Computationalandstatistical

approachestoanalyzingvariantsidentifiedbyexomesequencing

GenomeBiology12227doi101186gb2011129227

SundaramSKHuqAMSunZYuWBennettLWilsonBJBehenME

ChuganiHT2011ExomesequencingofapedigreewithTourette

syndromeorchronicticdisorderAnnNeurol69901ndash904

doi101002ana22398

VanRadenPMOlsonKMNullDJHutchisonJL2011Harmfulrecessive

effectsonfertilitydetectedbyabsenceofhomozygoushaplotypesJournal

ofDairyScience946153ndash6161doi103168jds20114624

VilarintildeoGuumlellCRajputAMilnerwoodAJShahBSzuTuCTrinhJYuI

EncarnacionMMunsieLNTapiaLGustavssonEKChouP

TatarnikovIEvansDMPishottaFTVoltaMBeccanoKellyD

ThompsonCLinMKShermanHEHanHJGuentherBL

WassermanWWBernardVRossCJAppelCresswellSStoesslAJ

RobinsonCADicksonDWRossOAWszolekZKAaslyJOWuR

MHentatiFGibsonRAMcPhersonPSGirardMRajputMRajput

158

AHFarrerMJ2014DNAJC13mutationsinParkinsondiseaseHumMolGenet231794ndash1801doi101093hmgddt570

WangKLiMHakonarsonH2010ANNOVARfunctionalannotationofgeneticvariantsfromhighthroughputsequencingdataNuclAcidsRes38e164ndashe164doi101093nargkq603

WathesDCPollottGEJohnsonKFRichardsonHCookeJS2014HeiferfertilityandcarryoverconsequencesforlifetimeproductionindairyandbeefcattleAnimal8Suppl191ndash104doi101017S1751731114000755

YangYMuznyDMReidJGBainbridgeMNWillisAWardPABraxtonABeutenJXiaFNiuZHardisonMPersonRBekheirniaMRLeducMSKirbyAPhamPScullJWangMDingYPlonSELupskiJRBeaudetALGibbsRAEngCM2013ClinicalWhole

ExomeSequencingfortheDiagnosisofMendelianDisordersNewEnglandJournalofMedicine3691502ndash1511doi101056NEJMoa1306555

ZiminAVDelcherALFloreaLKelleyDRSchatzMCPuiuDHanrahanFPerteaGTassellCPVSonstegardTSMarccedilaisGRobertsMSubramanianPYorkeJASalzbergSL2009Awholegenome

assemblyofthedomesticcowBostaurusGenomeBiology10R42doi101186gb2009104r42

159

Supplementary)files)

YoucanfindChapter6supplementaryfilesat

- DocTa(DoctoralThesisArchive)(httptesionlineunicattit)

- GoogleDrive(httptinyurlcomPhDMMChapter06)orscantheQRcode

)

Supplementary)Figure)S1)Coverage)analyses)

A)Averagecoverageperanimals

B)Boxplotofchromosomescoverageperanimal

C)BoxplotofanimalcoverageperchromosomeBluelinerepresenttheexpected

50Xcoverage

Supplementary)Figure)S2)Distribution)of)variant)sequencing)depth)

)

Supplementary) Figure) S3) Correlation) between) sequencing) depth) and)

number) of) discordant) genotypes) In) red) values) for) two) outlier) animals)

having)concordance)lt846))

)

Supplementary)Figure)S4)Multidimensional)scaling)of)Italian)Holstein)bulls)

sequenced))

)

Supplementary)Figure)S5)Multidimensional)scaling)of)Italian)Holstein)bulls)

analysed)with)the)800K)SNPchip))

160

Supplementary) Figure) S6) MAF) distribution) on) neutral) (blue)) and)

deleterious)(red))variants))

)

Supplementary) Table) S1) EBV) distribution) on) Italian) Holstein) population)

and)SNPchip)dataset)

Sheet1EBVthresholdvaluetodefineplusandminustailforPROT(kgprotein)

SCC(somaticcellcount)FERT(fertility)andLONG(functional longevity) All

ItalianHolsteinFAbullswereincludeinthisanalysis

Sheet2Numberofdatasetanimalsineachtailforeachtrait

Supplementary)Table)S2)Exome)probes)statistics)number)and)bp)covered)

per)chromosomes)

Supplementary) Table) S3) Concordance) between) animal) genotypes) in)

sequencing)and)SNPchip)data)

Error1 HD data homozygotes ndash Exome data heterozygotes Error2 HD data

heterozygotesndashExomedatahomozygotes

Supplementary) Table) S4) List) of) suggestive) genes) deriving) from) the)

analysis)of)exomes)of)animals) in)the)tails)of)male)and)female)fertility)EBV)

distributions))

Header(code

bull Bfreq_totFrequencyofBalleleinentiredataset

bull Bfreq_MHFrequencyofBalleleinhighmalefertilitydataset

bull Bfreq_MLFrequencyofBalleleinhighfemalefertilitydataset

bull Bfreq_FHFrequencyofBalleleinlowmalefertilitydataset

bull Bfreq_FLFrequencyofBalleleinlowmalefertilitydataset

bull CallRate_totSNPcallratendashentiredataset

bull CallRate_MHSNPcallratendashhighmalefertilitydataset

bull CallRate_MLSNPcallratendashlowmalefertilitydataset

bull CallRate_FHSNPcallratendashhighfemalefertilitydataset

bull CallRate_FLSNPcallratendashhighmalefertilitydataset

bull HWeqHardyWeinbergequilibriumpvalue

bull Geno_HomoMinorNumberofanimalswithhomozygotegenotypeforminorallele

bull Geno_HetNumberofanimalswithheterozygotegenotype

161

bull Geno_HomoMajorNumberofanimalswithhomozygotegenotypeformajorallele

bull LOW_MinorAlleleminoralleleforlowfertility(maleandfemale)dataset

bull LOW_MAFminorallelefrequencyfromlowfertility(maleandfemale)dataset

bull LOW_CRSNPcallratefromlowfertility(maleandfemale)dataset

bull LOW_HomoMinorNumberofanimalsfromlowfertility(maleandfemale)datasetwith

homozygotegenotypeforminorallele

bull LOW_Het Number of animals from low fertility (male and female) dataset with

heterozygotegenotype

bull LOW_HomoMajorNumberofanimalsfromlowfertility(maleandfemale)datasetwith

homozygotegenotypeformajorallele

bull LOW_HWeqPvalue HardyWeinberg equilibrium pvalue from low fertility (male and

female)dataset

bull HIGH_MinorAlleleminorallelefromhighfertility(maleandfemale)dataset

bull HIGH_MAFminorallelefrequencyfromhighfertility(maleandfemale)dataset

bull HIGH_CRSNPcallratefromhighfertility(maleandfemale)dataset

bull HIGH_HomoMinorNumberofanimalsfromhighfertility(maleandfemale)datasetwith

homozygotegenotypeforminorallele

bull HIGH_Het Number of animals from high fertility (male and female) dataset with

heterozygotegenotype

bull HIGH_HomoMajorNumberofanimalsfromhighfertility(maleandfemale)datasetwith

homozygotegenotypeformajorallele

bull HIGH_HWeqPvalueHardyWeinberg equilibriumpvalue fromhigh fertility (male and

female)dataset

bull Male_TREND inheritance models implemented in PLINK trend test on male fertility

dataset

bull Male_GENOTYPES inheritancemodels implemented inPLINKgenotypic testonmale

fertilitydataset

bull Male_ RECESSIVE inheritance models implemented in PLINK recessive test on male

fertilitydataset

bull Female_TRENDinheritancemodelsimplementedinPLINKtrendtestonfemalefertility

dataset

bull Female_GENOTYPES inheritance models implemented in PLINK genotypic test on

femalefertilitydataset

bull Female_RECESSIVEinheritancemodelsimplementedinPLINKrecessivetestonfemale

fertilitydataset

bull MaleampFemale_TREND inheritance models implemented in PLINK trend test on male

andfemalefertilitydataset

bull MaleampFemale_GENOTYPES inheritancemodels implemented in PLINK genotypic test

onmaleandfemalefertilitydataset

162

bull MaleampFemale_RECESSIVEinheritancemodelsimplementedinPLINKrecessiveteston

maleandfemalefertilitydataset

bull EMP1_MaleEmpiricalpvalueonmalefertilitydataset

bull EMP1_FemaleEmpiricalpvalueonfemalefertilitydataset

bull EMP1_MaleampFemaleEmpiricalpvalueonmaleandfemalefertilitydataset

Supplementary)Table) S5) Regions) candidate) to) carry) deleterious) variants)

identified)by)OHW)approach))

Supplementary)Table) S6) Regions) candidate) to) carry) deleterious) variants)

identify)by)LOH)approach))

Ingreyregionsremovedinfurtheranalysis

Supplementary)Table)S7)List)of)candidate)deleterious)variants)mapping)in)

OHW)and)LOH)regions))Header(code

bull Bfreq_totFrequencyofBalleleinentiredataset

bull Bfreq_MHFrequencyofBalleleinhighmalefertilitydataset

bull Bfreq_MLFrequencyofBalleleinhighfemalefertilitydataset

bull Bfreq_FHFrequencyofBalleleinlowmalefertilitydataset

bull Bfreq_FLFrequencyofBalleleinlowmalefertilitydataset

bull CallRate_totSNPcallratendashentiredataset

bull CallRate_MHSNPcallratendashhighmalefertilitydataset

bull CallRate_MLSNPcallratendashlowmalefertilitydataset

bull CallRate_FHSNPcallratendashhighfemalefertilitydataset

bull CallRate_FLSNPcallratendashhighmalefertilitydataset

bull HWeqHardyWeinbergequilibriumpvalue

bull Geno_HomoMinorNumberofanimalswithhomozygotegenotypeforminorallele

bull Geno_HetNumberofanimalswithheterozygotegenotype

bull Geno_HomoMajorNumberofanimalswithhomozygotegenotypeformajorallele

bull HW_insideDeleteriousmutationsinsideldquoOutofHardyWeinbergrdquoregion

bull HW_outsideDeleteriousmutationsoutsideldquoOutofHardyWeinbergrdquoregion

bull DH_insideDeleteriousmutationsinsideldquoLackofHomozygosityrdquoregion

bull DH_outsideDeleteriousmutationsoutsideldquoLackofHomozygosityrdquoregion

163

bull GeneClass Classifications of gene 1 deleterious mutations inside OHW and LOHregion2deleteriousmutationsinsideOHWorLOHregion

bull 2deleteriousmutationsoutsideOHWandorLOHregionbull LOW_MinorAlleleminoralleleforlowfertility(maleandfemale)datasetbull LOW_MAFminorallelefrequencyfromlowfertility(maleandfemale)datasetbull LOW_CRSNPcallratefromlowfertility(maleandfemale)datasetbull LOW_HomoMinorNumberofanimalsfromlowfertility(maleandfemale)datasetwith

homozygotegenotypeforminorallelebull LOW_Het Number of animals from low fertility (male and female) dataset with

heterozygotegenotypebull LOW_HomoMajorNumberofanimalsfromlowfertility(maleandfemale)datasetwith

homozygotegenotypeformajorallelebull LOW_HWeqPvalue HardyWeinberg equilibrium pvalue from low fertility (male and

female)datasetbull HIGH_MinorAlleleminorallelefromhighfertility(maleandfemale)datasetbull HIGH_MAFminorallelefrequencyfromhighfertility(maleandfemale)datasetbull HIGH_CRSNPcallratefromhighfertility(maleandfemale)datasetbull HIGH_HomoMinorNumberofanimalsfromhighfertility(maleandfemale)datasetwith

homozygotegenotypeforminorallelebull HIGH_Het Number of animals from high fertility (male and female) dataset with

heterozygotegenotypebull HIGH_HomoMajorNumberofanimalsfromhighfertility(maleandfemale)datasetwith

homozygotegenotypeformajorallelebull HIGH_HWeqPvalueHardyWeinberg equilibriumpvalue fromhigh fertility (male and

female)datasetbull Male_TREND inheritance models implemented in PLINK trend test on male fertility

datasetbull Male_GENOTYPES inheritancemodels implemented inPLINKgenotypic testonmale

fertilitydatasetbull Male_ RECESSIVE inheritance models implemented in PLINK recessive test on male

fertilitydatasetbull Female_TRENDinheritancemodelsimplementedinPLINKtrendtestonfemalefertility

datasetbull Female_GENOTYPES inheritance models implemented in PLINK genotypic test on

femalefertilitydatasetbull Female_RECESSIVEinheritancemodelsimplementedinPLINKrecessivetestonfemale

fertilitydatasetbull MaleampFemale_TREND inheritance models implemented in PLINK trend test on male

andfemalefertilitydataset

164

bull MaleampFemale_GENOTYPES inheritancemodels implemented in PLINK genotypic test

onmaleandfemalefertilitydataset

bull MaleampFemale_RECESSIVEinheritancemodelsimplementedinPLINKrecessiveteston

maleandfemalefertilitydataset

bull EMP1_MaleEmpiricalpvalueonmalefertilitydataset

bull EMP1_FemaleEmpiricalpvalueonfemalefertilitydataset

bull EMP1_MaleampFemaleEmpiricalpvalueonmaleandfemalefertilitydataset

Supplementary)Table)S8)Results)of)haplotype)analysis)))Header(code

bull Chromchromosome

bull PosphysicalpositiononUMD31genome

bull Exome_homonumberofhomozygotefordeleteriousmutationsequencedanimals

bull Exome_hetnumberofheterozygotesequencedanimalsfordeleteriousmutation

bull Outcome NotFound It was impossible to associated a haplotype to the mutation

NotOpt Non completely invariant haplotype could be associated to themutationOK

Putativehaplotype associate tomutationwas found In few case alternativehaplotype

wasreported

bull LenlengthofhaplotypeinnumberofSNPs

bull Pop_numnumbersofselectedhaplotypedetectedinHolsteinpopulation

bull Pop_homo numbers of homozygote animals in Holstein population for selected

haplotype

bull Pop_het numbers of heterozygotes animals in Holstein population for selected

haplotype

bull ExpExpectednumberofhomozygoteanimal

bull Alternative if only one heterozygote sequenced animals for deleteriousmutationwas

foundalltwopossiblehaplotypeswheretested

Supplementary)Table)S9)Descriptive)results)for)the)Biological)Process)(BP)))

NumberofgenesbelongingtoBiologicalProcessGOterms)

Supplementary) Table) S10) Descriptive) results) for) the)Molecular) Function)

(MF))

NumberofgenesbelongingtoMolecularFunctionGOterms

165

CHAPTER7

Conclusion

Since the domestication event occurred in the Fertile Crescent around 10000yearsagothebovinespecieswasselectedbyhumankindtofulfilhisneedsThecreationofbreedsoccurredaround200yearsagointensifiedthedifferentiationbetween populations and promoted the development of intensive selectionschemesTraditionalAandnowadaysgenomicAselectionsareproducingpositivegenetictrendsinmanyproductivetraitsdespitethestillpoorknowledgeaboutphenotype biology A deeper understanding of genome organization and genebiology would increase the accuracy of genomic evaluation by incorporatingpriorknowledgeThisthesisexploresthebovinegenomebyhighAthroughputtechnologiesAsuchasNextGenerationSequencingandHighADensityGenotypingAwithestablishedand innovative procedures to investigate the biology of complex traits and toprovidenewdataandmoleculartoolstobovinebreedingThese goals have been achieved by complementary approaches and differentmethodsInChapter2selectionsignatureswereinvestigatedindependentlyintofive Italian breeds using Illumina BovineSNP50 BeadChip Then amultiAbreedapproach was applied clustering breeds into dairy (Italian Holstein ItalianBrownand Italian Simmental) andbeef (Marchigiana Piedmontese and ItalianSimmental) groups Regions under recent positive selection shared by breedswithasameproductiveaptitudewereinvestigatedinfurtherdetailbyretrievinggenes in the region and analysing their function by ontology and pathwayanalysis In the dairy group selection signals appear nearby genes related tomammarygland(metabolismandresistancetomastitis)andfeedingadaptationIn thebeefgroupgenes inselectedregionsare involved inanimalgrowthandmeatquality(textureandjuiciness)Hencecandidategenesidentifiedbelongtonetworks and pathways important in directing a breed towards either beef ordairyproductionIn Chapter 3 a GWAS approach was used in dairy cattle to correlate SNP tocomplex phenotypes having an economic value Three independent ldquoclassicalrdquosingleAmarkerregressionswereruninthreeItaliandairycattle(ItalianHolsteinItalianBrownandItalianSimmental)UsingthetraditionalsingleSNPapproach

168

only few SNPs were above the nominal genomeAwide significance thresholdwhile applying a geneAbased method identified significant candidate genesassociatedwithmilkphysiology andmammary glanddevelopment in all threepopulationsInterestinglygenesfoundindifferentbreedshavesimilarfunctionbut are not the same suggesting that breed history demography selectioncriteriaandstochasticeventsasgeneticdrifthaveanimpactonthedefinitionofthemainmoleculartargetofselectionschemesCurrent intensive selection strategies rapidly spread the genetic gain in thepopulationbutholdtheassociatedriskofdisseminatingdefectivegenescausinggenetic defects (eg mulefoot complex vertebral malformation and others) ordecreasing fertility In Chapter 6 different technologies and approaches werecombined to obtain filter and prioritize genes candidate to be deleteriousExome sequence datawere exploited to identify deleterious variants affectingItalianHolsteincattleinasetof20animalsextremeforfertilitytraitsMorethan5000 candidate deleterious SNP variants were identified To bypass the smallnumberlimitationandthelackofpowerofanystatisticanalysison20animalsastepwise approachwas used to first filter and then prioritize these candidatedeleterious variants Polymorphisms were retained if having asymmetricdistribution in bulls from opposite tails of fertility EBV distributions andormapping in genomic regions outlier for lack of homozygosity in a 1009 bullpopulation genotyped with the Illumina BovineHD BeadChip A total of 191variants in163genespassed theprocessesThesewereassessedby searchingadditional evidences in favour or against their deleterious effect throughdifferentapproachesi)checkingthevariantconservationacross100vertebratespecies by comparative genomics ii) finding haplotypes associated withcandidatemutationsandevaluatingtheirfrequencyinHolsteinsiii)integratinghumanandmouse functionalannotationsA scorewasproposed toweight theresult of these assessments and rank the variants 35 of them had positivescores These occurred in genes involved in basic biological mechanisms asfertilitydevelopmentand immunityandresultedenriched ingenesassociatedtogeneticdefectsreportedinhumanThesemutationsarestrongcandidatestobe recessive deleterious variants worth to be further investigated in a larger

169

populationtobetterunderstandtheirbiologyimpactandpossibleapplicationinbreedingTo increase the reliability of results based on genomic data and to reduce theimpact of subjective choices in data analysis new toolsapproaches weredeveloped(iethemultiAbreedapproachinSelectionSignaturendashChapter2thepostAGWAS geneAbased approach ndash Chapters 3 and 4) and the existent onescriticallyevaluated(imputationmethodsndashChapter5)In Chapter 4 the postAGWAS geneAbased method used in Chapter 3 wasimplemented in the MUGBAS (MUlti species GeneABased Association Suite)software In contrast to existing software MUGBAS is species and annotationfree AmultipleAtesting correctionwas included in the package in addition tocomputing parallelization for faster processing and plot representation forbetterunderstandingThesoftwareusessingleAmarkerGWASdataandagivenannotation to estimate geneAregionAwise association pAvalues As previouslymentioned in Chapter 3 the ldquoclassicalrdquo GWAS approach (single SNP) foundsignals only in ItalianHolsteinwhileMUGBAS geneAbased approach identifiedsignificantsignalsinallbreedsinvestigatedIn Chapter 5 the impact of different bovine reference sequences on genotypeimputation was investigated In this work Illumina BovineSNP50 BeadChipgenotypesof ItalianSimmentalbullswerereAmappedonthethreemostrecentreferencegenomeassembliesWecomparedfourdifferent imputationmethodsfromlow(IlluminaBovineLDv10Beadchip)tohighdensityThetestpopulationwasdividedintofoursubpopulationsonthebasisofrelationshipandgenotypedataavailabilityUpdatingSNPcoordinatesonthe three testedcattlereferencegenome assemblies determined only a slight variation on imputation resultswithinmethodsOntheotherhandlargedifferencesintermsofaccuracywereobservedamongthefourimputationmethodstestedGenetic structure of industrial breeds was evaluated assessing LinkageDisequilibrium (LD) Since LD decay is correlated to intensity of selection weexpected different behaviours in the breeds analysed Indeed as evident inChapter 2 ndash Figure 7 breeds subjected to lower selection intensity (ie

170

MarchigianaPiedmonteseandItalianSimmental)displayalowerpersistenceof

LDatgreaterdistancescomparedtothosesubjectedtohigherselectionpressure

(ieItalianHolsteinandItalianBrown)Chapter3investigatesLDintheDGAT1

region of dairy cattle breeds In Italian Simmental LD is lower compared to

Italian Holstein and Italian Brown In spite of this significant association

betweentheDGAT1regionandmilktraitswasfoundbothinItalianSimmental

andinItalianHolsteinInItalianBrownnoassociationwasfoundbecauseDGAT1

isfixed

Data and results presented in this thesis represent a clear example of the

progress of molecular technologies occurred in last few years spanning from

mediumdensitygenotypingtonextgenerationsequencing

Thebiologylandscapesofsomecomplextraits(iemilkproductionandquality

fertility)were improved finding candidate genes andpathwaysnotpreviously

associated to the phenotype in the bovine species Moreover insights on

deleterious mutations were inferred through a deep and integrated analysis

usingNGSdata

This new knowledge could be used to improve cattle breeding For example

deleteriousrecessivevariantscouldbeincludedinbullevaluationsbybreedersrsquo

associations to reduce the genetic load of deleterious genes in parallel to

improvingproductivity

While technological progress has been impressivewe are still only scratching

thesurfaceofanimalbiologyNewfrontiersarebeingopenedbywholegenome

sequencing of many animals (eg in the 1000 bull genome project) and by

stepping beyond sequencing (a number of epigenomic projects in animals are

following the footprints of the human ENCODE project) The trend is towards

unravelling complexity and data production ability is to be flanked by data

storingandprocessingandaboveallbynewmethodsandtoolsfordataanalysis

171

ACKNOWLEDGMENTS

ldquoNon egrave la destinazione ma il viaggio che contardquo Questi tre anni sono stati un gran bel viaggio di crescita personale e ricchi diesperienze Non egrave stato un viaggio solitarioma condiviso con vecchie e nuoveconoscenzechemisembradoverosomenzionareInnanzitutto vorrei ringraziare il prof Paolo AjmoneMarsan chemi ha dato lapossibilitagrave di fare questa esperienza Grazie per gli insegnamenti e per leopportunitagravechemihaoffertoSperodinonavertraditolesueaspettativeAgli attuali colleghi Lorenzo Stefano Licia Elisa Riccardo ed Elia e a quellipassati Nicola Fatima Raffaele Ezequiel e Barbara va il mio piugrave sincero edaffettuoso ringraziamento per aver condiviso anche solo in parte questomiopercorso Grazie per avermi sopportato (e non egrave poco) guidato (non solo nelcampolavorativo)efattovivereinunmodospecialelrsquoambientedilavoroUnsincerograzievaatuttiimieicolleghidelXXVIIcicloAgrisystemRicorderogravesempretuttiimomentichehopotutocondividereconvoi UngrazieancheachihafattopartedellamiaesperienzasvedeseGraziedicuoreaCarlFlavioChristianEricaatuttiiragazziitalianiatuttiicolleghildquosvedesirdquoea tutte le altre persone che ho conosciuto Mi avete fatto vivere appienolrsquoesperienzaesteraefattotornareacasaconqualcheconsapevolezzainpiugrave UnsentitoringraziamentoatuttiglialtricolleghiconcuihocollaboratoinquestianniRingrazio tutti gli amici e conoscenti che mi sono stati vicino per avermisupportatoesopportatoognunoamodoproprio Ultimomanonperimportanza ilringraziamentochevaatuttalamiafamigliaper avermi aiutato spronato sostenuto sempre e in particolare in questaesperienzaSemplicementegrazie PS Adirlatuttaancheladestinazionenonegravepoicosigravemale

173

  • INDEX
  • CHAPTER 1 - Introduction
  • CHAPTER 2 - Relative extended haplotype homozygosity (rEHH) signals across breeds reveal dairy and beef specific signatures of selection
  • CHAPTER 3 - Searching new signals for productive traits in three Italian cattle breeds
  • CHAPTER 4 - MUGBAS a species free gene based ased association suite
  • CHAPTER 5 - Imputation accuracy is robust to cattle reference genome updates
  • CHAPTER 6 - Quest for deleterious recessive variants in Italian Holstein bulls combining exome sequencing and high density SNP analysis
  • CHAPTER 7 - Conclusion
  • ACKNOWLEDGMENTS
Page 7: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School

of investment and time so that 5 to 6 years are needed to have a dairy bullprogenytested(wwwanafiit)Theadventofgenomicselection(Meuwissenetal2001)markedamilestoneindairycattlebreedingTheideawasdevelopedbeforethemoleculartoolswereinplaceforimplementitinpracticeandhasbecomearealityonlyrecentlyInthisapproach progeny tested bulls are genotyped at many thousand loci and thevalue of their EBVs used to estimate the value of each allele at each locusgenotyped These values are then used to predict the genetic value of younganimals on the basis of their genotype As a result animals from the newgeneration receive an EBV with the same accuracy of estimates obtained byprogenytestingbutmuchearlierThislowerthegenerationintervalandgreatlyincreasestherateofgeneticprogress indairypopulationsThe initial ideawasso good that nowadays genomic selection is widely applied in industrialcountriestothemajordairycattlebreedsWiththedecreaseofgenotypingcostit is likely that its use will be soon expanded to animals selected for otherpurposes (eg beef cattle) and other species (eg pigs) However genomicselection is an extension of traditional selection It is extremely useful inbreedingforcomplextraitsbutgivesnobiologicalinformationonthegenesthatcontrolthemIfthetraditionalselectionviewsasiregenomeasabigblackboxhaving a value but no clue why genomic selection views the same genomedisassembled inmany small boxes but still black and stillwithno clueon thereasonoftheirvalueIt is reasonable to think that a deeper understanding of the biologicalmechanisms controlling traits may further increase the accuracy of genomicevaluationsIndeedthenewtechnologiesoffernewopportunitiesinthissensebyloweringcostsandincreasingprecisionofdataproductionThis thesis is exploring the use of these technologies for retrieving biologicalinformationfromgenomicdataItusesmethodsnowconsideredldquotraditionalrdquoingenomics (GWAS and analysis of selection sweeps) and less traditional ones(gene centered GWAS and exome analysis) All data used here have beenproducedwithin twonationalprojectson farmanimalgenomics fundedby theItalianMinistryofAgriculture

5

M SELMOL ldquoRicerca e innovazione nelle attivitagrave di miglioramento genetico

animalemediante tecniche di geneticamolecolare per la competitivitagrave del

sistemazootecniconazionaleIrdquo(ResearchandInnovationinanimalbreeding

sheepgoatbuffalohorseanddonkeyI)

M INNOVAGENldquoRicercaeinnovazionenelleattivitagravedimiglioramentogenetico

animalemediante tecniche di geneticamolecolare per la competitivitagrave del

sistema zootecnico nazionale IIrdquo (Research and Innovation in animal

breedingsheepgoatbuffalohorseanddonkeyII)

In these projects the research group I work with M Istituto di Zootecnica at

Universitagrave Cattolica del Sacro Cuore di Piacenza M coordinated all the research

activitiesconcerningtheapplicationofgenomicstechniquesindairycattle

Contentsofthethesis

The thesis is structured into a short introduction (this first chapter) 5 core

chapters(Chapter2to6)andashortConclusion(Chapter7)Withtheexception

of Chapter 6 still in preparation the other core chapters containmanuscripts

that at time of writing are under review for their publication in international

peerMreviewedjournalsormanuscript

InChapter2selectionsignatureindairyandbeefcattleareinvestigatedwiththe

Illumina BovineSNP50 Genotyping BeadChip Five Italian cattle breeds (Italian

Holstein Italian Brown Italian Simmental Marchigiana and Piedmontese) are

analyzedwiththerEHHmethod(Sabetietal2002)tofindsignaturesofrecent

selectionCandidategenesinthevicinityofsignaturessharedbyeitherdairyor

beefbreeds are identified and theirpossible role indairy andbeefproduction

discussed

In Chapter 3 a GWAS analysis is run to investigate production traits in three

Italian dairy cattle breeds (Italian Holstein Italian Brown Italian Simmental)

Data are analysed with a well established method (Amin et al 2007)

6

implemented in theGenABELRpackage (Aulchenkoet al 2007) andwith the

newgeneMbasedmethodAssociationisinvestigatedbetween50KSNPsandmilk

production(milkyield)andqualitytraits(percentageofmilkproteinandfat)

InChapter4thegeneMbasedmethodusedinthepreviouschapter3inspiredby

theVEGASapproach(Liuetal2010)isdescribedThesoftwarewasdeveloped

andreMcodedinourlaboratoryinordertopermittheanalysisofanyspeciesin

additiontohumanMultiMtestingcorrectionandplotrepresentationcompletethe

toolsmadeavailable

InChapter5wetesttheimpactoftheuseofdifferentreferencesequencesinthe

imputation step Italian Simmental data are used to test variation in genomeM

wide inputation accuracy using different cattle reference sequences (BTAU42

UMD31 and BTAU46) To increase the analyses reliability four different

inputationsoftwaresaretested

InChapter6Holsteindataareanalysedcombiningexomesequencesand800K

SNPs data to seek deleterious recessive variantsWhile in natural populations

the destiny of these variants is to disappear in livestock breeds they can be

maintained and disseminated if carried by a highly productive sire The

intersection of deleterious mutations identified by sequencing and lack of

homozygousregionsidentifiedbytheanalysisofthepopulationwiththeHDSNP

chipidentifiedgenescandidatetocarryrecessivedeleteriousvariants

Objective

Themainobjectiveofthethesiswastoinvestigatethebovinegenomewiththe

aid new highMthroughput technology and to explore new strategies for linking

biology (genes and their function) to traits This linkage once validated may

increase the accuracy of genomic prediction and find application in selection

schemesegineradicatingthemostdeleteriousvariantsandorintheplanning

ofmatings

7

ReferencesAjmoneMMarsanPGarciaJFLenstraJA2010OntheoriginofcattleHow

aurochsbecamecattleandcolonizedtheworldEvolAnthropol19148ndash157doi101002evan20267

AminNvanDuijnCMAulchenkoYS2007AGenomicBackgroundBasedMethodforAssociationAnalysisinRelatedIndividualsPLoSONE2e1274doi101371journalpone0001274

AulchenkoYSKoningDMJdeHaleyC2007GenomewideRapidAssociationUsingMixedModelandRegressionAFastandSimpleMethodForGenomewidePedigreeMBasedQuantitativeTraitLociAssociationAnalysisGenetics177577ndash585doi101534genetics107075614

ElsikCGTellamRLWorleyKC2009TheGenomeSequenceofTaurineCattleAWindowtoRuminantBiologyandEvolutionScience324522ndash528doi101126science1169588

LiuJZMcraeAFNyholtDRMedlandSEWrayNRBrownKMHaywardNKMontgomeryGWVisscherPMMartinNGMacgregorS2010AVersatileGeneMBasedTestforGenomeMwideAssociationStudiesTheAmericanJournalofHumanGenetics87139ndash145doi101016jajhg201006009

MeuwissenTHHayesBJGoddardME2001PredictionoftotalgeneticvalueusinggenomeMwidedensemarkermapsGenetics1571819ndash1829

SabetiPCReichDEHigginsJMLevineHZPRichterDJSchaffnerSFGabrielSBPlatkoJVPattersonNJMcDonaldGJAckermanHCCampbellSJAltshulerDCooperRKwiatkowskiDWardRLanderES2002DetectingrecentpositiveselectioninthehumangenomefromhaplotypestructureNature419832ndash837doi101038nature01140

TaberletPCoissacEPansuJPompanonF2011ConservationgeneticsofcattlesheepandgoatsCRBiol334247ndash254doi101016jcrvi201012007

8

TaberletPValentiniARezaeiHRNaderiSPompanonFNegriniR

AjmoneMMarsanP2008Arecattlesheepandgoatsendangered

speciesMolEcol17275ndash284doi101111j1365M294X200703475x

TheBovineHapMapConsortium2009GenomeMWideSurveyofSNPVariation

UncoverstheGeneticStructureofCattleBreedsScience324528ndash532

doi101126science1167936

9

CHAPTER2

Relative(extendedamphaplotype(homozygosity)(rEHH)ampsignalsampacrossampbreedsamprevealampdairyampand$beef$specificampsignaturesampofampselection

LBomba1ELNicolazzi2MMilanesi1etal1IstitutodiZootecnicaUCSCviaEmiliaParmense8429122PiacenzaItaly2FondazioneParcoTecnologicoPadanoViaEinsteinLocCascinaCodazza26900LodiItalyTheseauthorscontributedequallytothiswork

____________________SubmittedtoGeneticSelectionEvolution

Abstract

Background

Nowadays many methods have been implemented to scan for signatures of

selection at genomescale level within and across breeds Extended haplotype

homozygosity is a proficient approach to detect genome regions under recent

selective pressure within breeds The objective of this study is to identify

common regions under positive recent selection shared by most represented

dairyandbeefItaliancattlebreeds

Results

A total of 3220 animals from Italian Holstein (2179) Italian Brown (775)

Simmental (493) Marchigiana (485) and Piedmontese (379) breeds were

genotyped with the Illumina BovineSNP50 BeadChip v1 in the frame of the

ItalianlivestockgenomicprojectsldquoSelMolrdquoandldquoProzoordquoAfterstandardcleaning

proceduresgenotypeswerephasedandcorehaplotypesidentifiedThedecayof

Linkage Disequilibrium (LD) for each core haplotype was assessedmeasuring

the extended haplotype homozygosity (EHH) To correct for the lack of good

estimatesof local recombinationrates therelativeEHH(rEHH)wascalculated

for each corehaplotypeGenomic regionsunder recentpositive selectionwere

identifiedasthosehighlyfrequentcorehaplotypeswithsignificantrEHHvalues

Significant independentcorehaplotypeswerealignedacrossbreeds to identify

signalsofrecentselectionsharedbydairyandsharedbybeefbreedsOverall82

and 87 common regions under selectionwere detected among dairy and beef

cattlebreedsrespectivelyBioinformaticsanalysisidentified244and232genes

mapping in these common genomic regions Pathway and network analysis of

significant genes revealed molecular functions biologically related to signal of

selectionsdetectedinmilkormeatproductiontypes

Conclusions

Resultssuggest that themultibreedapproach isamethodto identifyselection

signatures in cattle breeds under selection for the same production goal

12

allowing to better pinpoint the genomic regions of interest in dairy and beef

production

Background

The advent of genomic technologies and the consequent availability of single

nucleotidepolymorphism(SNP)datahaveenabledthestudyofselectionevents

on the cattle genome (Hayes et al 2008 Qanbari et al 2010) Such selection

signals arise from environmental or anthropogenic pressures and help to

understand the causes that led to breed formation These studies are usually

addressedinaldquotopdownrdquoapproach(ShahzadandLoor2012)fromgenotypeto

phenotypeinwhichgenomicdataisstatisticallyevaluatedtodetectdirectional

selectionThephenotypeneededtorunselectionsignaturesisintendedinavery

generalsenseabreedaproductionaptitudeorevenageographicareaoforigin

Thisapproachholdsthepotentialto investigatetraitsasadaptationtoextreme

climatesanddifferentfeedingandhusbandrysystemsresiliencetodiseasesetc

thatareveryexpensivedifficultandsometimesimpossibletostudywithclassic

GWAS (GenomeWide Association Study) approaches Therefore it is expected

thatgenomic informationretrievedinthesestudiesonlypartiallyoverlapsthat

identified by GWAS on specific traits Searching for these ldquosignatures of

selectionrdquo is of interest for understanding molecular mechanisms underlying

important biological processes and providing new insights to functional

biologicalinformation(egdiseaseandorimportanteconomictraits(Nielsenet

al 2007)) To date many methods have been implemented to scan these

selectionsignaturesatthegenomiclevelMethodsdifferwithrespecttoseveral

properties (Lenstra et al 2012) there are methods that perform analyses

withinbreedoracrossbreedsthatcomparealleleorhaplotypefrequenciesand

size that detect recent or ancient selection events either still segregating or

fixatedintopopulations(Lenstraetal2012OrsquoBrienetal2014Utsunomiyaet

al2013)Differentmethodshavedifferentsensitivityandrobustnesseg they

maybeinfluencedatadifferentextentbymarkerascertainmentbiasanduneven

distributionofrecombinationhotspotsalongthegenome

13

Extended haplotype homozygosity (EHH) is a method that has been used in

many animal species including cattle (Pan et al 2013 Qanbari et al 2010

Zhangetal2006)TheEHHmethodwasfirstdevelopedbySabetietal(2002)

for human population analysis The main objective of this methodology is to

identifylongrangehaplotypesinheritedwithoutrecombinationeventsMorein

detailunderaneutralevolutionmodelchangesinallelefrequenciesaredriven

onlybygeneticdriftInthisscenarioanewvariantwouldrequirealongtimeto

reach high frequency in the population and the Linkage Disequilibrium (LD)

surroundingitwoulddecayduetorecombinationevents(Kimura1984)Inthe

caseofpositiveselectionaquickriseinfrequencyofabeneficialmutationina

relatively short time will preserve the original haplotype structure as the

number of recombination events would be limited Therefore a signature of

positiveselectionisdefinedasaregionwithanalleleinlongrangeLDlocatedin

anuncommonlyhighfrequencyhaplotype

One of themost interesting features of thismethod is that it is able to detect

genomic regions that are under recent selection These recent selection

signatures can be therefore used to identify the genomic regions involved in

breedformationInadditionEHHmethoddoesnotrequireanydefinitionofan

ancestralallele(asneededintheintegratedhaplotypescoreiHS(Voightetal

2006) and is suited to be applied to SNP data because it is less sensitive to

ascertainmentbiasthanothermethods(Nielsenetal2007)HowevertheEHH

methodgeneratesa largenumberof falsepositiveandnegative resultsdue to

heterogeneous recombination rates along the genome (Qanbari et al 2010)

Moreover another drawback not specific of EHH but shared by all selection

signaturemethod is the lack of robust inferences able to discern true signals

fromthosegeneratedbychance(Kemperetal2014)

To partially account for these limitations Sabeti et al (2002) developed the

relative extended haplotype homozygosity (rEHH) including an empirical

methodologytoobtainsignificantsignalsTherEHHofaputativecorehaplotype

compares its original EHH valuewith that of other haplotypes at that specific

locus using all other haplotypes as control for local variation in the

recombination rate Therefore it identifies genomic regions carrying variants

14

under selection that are still segregating in the population analysed Although

thesemethodswere conceived for human populations theywere successfully

appliedto livestockspeciessuchaspig(Maetal2012)andcattle(Qanbariet

al2010)

After domestication about 10000 years ago in the fertile crescent taurine

cattle have colonized Europe and Africa and have been selected for different

human needs (Taberlet et al 2011) Particularly in the last centuries the

anthropic pressure led to the formation of hundreds of specialized breeds

adapted to different environmental conditions and linked to local tradition

constitutingagenepooldeservingattentionforconservation(AjmoneMarsanet

al2010)Someofthesebreedshavebeenselectedspecificallyfordairybeefor

bothproductiontypesfollowingstrongartificialselectionforthesetraits(Hayes

etal2009)Thepresentstudyaimsat identifyingsignalsof recentdirectional

selectionusingrEHHmethodindairyandbeefproductiontypesusingdatafrom

five Italian dairy beef and dual purpose cattle breeds We focused on the

significant core haplotypes scanned by rEHH that are shared among breeds of

thesameproductiontypeThenweusedabioinformaticsapproachto identify

positionalcandidategeneslyingwithinthegenomicregionsunderselectionand

investigatetheirbiologicalrole

Methods

SampleAnimalsandGenotyping

A total of 4311 bulls of five Italian dairy beef anddual purpose breedswere

genotyped with the Illumina BovineSNP50 BeadChip v1 (Illumina San Diego

CA)joiningthegenotypingeffortsoftwoItalianprojects(namelyldquoSelMolrdquoand

ldquoProzoordquo)Thedatasetincluded101replicatesand773siresonspairsusedfor

downstreamqualitycheckofdataproducedIndetailgenotypesof2954dairy

(2179 ItalianHolsteinand775 ItalianBrown)864beef (485Marchigianaand

379 Piedmontese) and 493 dual purpose (Italian Simmental) bulls were

availableDataqualitycontrol(QC)wasperformedintwostagesafirststageon

15

animals independently for each breed applying the same methods and

thresholdsandasecondstageappliedonmarkersacrossall theindividualsof

thedatasetThefirststageexcludedindividualswithunexpectedlyhigh(gt02)

Mendelian errors for fatherson couples andwith low call rates (lt95)The

secondstageexcludedi)SNPswithle25missingvaluesinthewholedataset

orcompletelymissing inonebreed ii)SNPswithminorallele frequencyle5

and iii) SNPs in sexual chromosomes and with unassigned chromosome or

physicalposition

EstimationofrEHH

HaplotypeswereobtainedusingfastPHASEsoftwareconsideringdefaultoptions

(ScheetandStephens2006)andwererunbreedwiseandchromosomewisein

eachbreedPedigreeinformationforallbullswereprovidedbytheirrespective

breederassociations andwereused to filteroutdirect relatives (in fatherson

pairsthesonwasmaintainedinthedatasetandthefatherdiscarded)andover

representedfamilies(amaximumof5randomlychosenindividualsperhalfsib

familywasallowed)Thefinaldatasetcontainingtheseldquolessrelatedrdquoanimalswill

behenceforthcalled ldquononredundantrdquodatasetThisnonredundantdatasetwas

used also to calculate within breed pairwise wholegenome Linkage

Disequilibrium(LD)Ther2statisticforallpairsofmarkerswasobtainedusing

PLINK software v107 (Purcell et al 2007)Thedecayof LDwas assessedby

averagingr2valuesupto1Mbdistance

To test if the population structure influenced the detection of rEHH we

replicated all analyses on the whole Italian Holstein dataset without any

population correction (ie ldquoredundantrdquo dataset) focusing on genes or gene

clustersthatarewellknowntobeunderrecentselectionincattle(ieldquocandidate

regionsrdquo) In particularwe focused on the casein polled gene cluster and coat

colourgenes(MC1RandKIT(Kemperetal2014Qanbarietal2010))

The EHH and rEHH calculations were performed using Sweep v11 software

(Sabetietal2002)Somedefaultprogramsettingshadtobemodifiedtoadapt

theseanalyses to thebovinegenomeas thesoftwarewasoriginallydeveloped

forhumangeneticanalysesSpecificallylocalrecombinationratesbetweenSNPs

16

wereapproximatedto1cMperMbEHHandrEHHcalculationswereperformed

breedwise and chromosomewise using an automatic core selection with

defaultoptions(ieconsideringlongestnonoverlappingcoresandlimitingcores

toatleast3andnomorethan20SNPs)asinQanbarietal(2010)AlthoughEHH

and rEHH values were obtained for all core haplotypes only those with

frequency ge 25 were retained for further analyses The (empirical) rEHH

significancethresholdwasobtainedbydividingtherEHHvaluesinto20binsof

5 frequency range each logtransforming withinbin values to achieve

normality Core haplotypes with pvalues lt 005 were considered significant

DetectionofcommonselectionsignatureswasobtainedcomparingrEHHvalues

acrossgenomicregionsacrossbreeds

Breedgroupingaccordingtoproductiontype

Toexamine thesignals common toeachof the twoproduction types (iedairy

andbeef) corehaplotypes thatsharedoneormoreSNP inat least twobreeds

from the same production type were selected The dual purpose Italian

Simmentalwas included inbothdairy (ItalianHolsteinand ItalianBrown)and

beef groups (Piedmontese and Marchigiana) since potentially it possesses

haplotypes selected for both production types All downstream analyses were

performedseparatelyforthedairyandthebeefproductiontypes

Detectionandannotationofcandidategenes

Genome annotation was performed using Biomart

(httpwwwbiomartorgindexhtml) a comprehensive source of gene

annotationformanyspeciesprovidedbyEnsembl(wwwensemblorg)Thelist

ofsharedhaplotypesregions(inbp)fordairyandbeefwasusedasinputfilein

theBiomartwebinterfaceThesetofgenesretrievedbyBiomartwasthenused

as input for theCanonicalpathwayandregulatorynetworkanalyses Ingenuity

PathwayAnalysis toolversion80(IPA IngenuityregSystems IncRedwoodCity

CA httpwwwingenuitycom) and a substantial examination of published

literaturewasusedtoexaminethefunctionalrelationshipsamongtheresulting

genes The IPA operates with a proprietary knowledge database providing

complementarypathwayanalysisforseveralspeciesincludingcattleIntheIPA

17

analyses the significance of each biological function identified was estimated

using Fisherrsquos exact test A Benjamini and Hochberg correction for multiple

testingwas calculated for function and canonical pathways For eachNetwork

IPAcomputedascore(pscore=log10(pvalue))fittingthesetstudiedgenesand

a list of biological functionspresent in the IPAknowledgedatabase The score

considered thenumberofgenes in thenetworkand thesizeof thenetwork to

estimatehowrelevantthisnetworkwasaccordingtothelistofgenesprovided

Thenthescorewasusedtorankthenetworks

Results

DatasetQC

Therepeatabilityobtainedfromthe101replicatespresentinthewholedataset

washigherthan998AfterthetwoQCstages105individualsand9730SNPs

were discarded Moreover after phasing 1292 individuals were removed to

reducethe largenumberofsibfamiliespresent intheredundantdatasetAfter

quality control procedures the final dataset had 44271 SNPs and 1132 514

393 410 and 364 individuals from Italian Holstein Italian Brown Italian

SimmentalMarchigianaandPiedmontesebullsrespectively(Table1)

18

Table1JNumberofanimalsgenotypedbeforeandafterQCTotalnumberofanimalgenotypedandrelativenumberofanimaldiscardedafterQCanalysis

Detectionofselectionsignatures

Sweepv11 softwaredetectedrEHHsignalswitha frequencyhigher than25overlapping the test candidate regions in both datasets corrected or not forpopulation structure In the redundant dataset we found only one significantrEHHsignal(caseincluster)whereasinthenonredundantdatasetweidentified5significantrEHHsignalsoverlappingallthetestcandidateregionsconsidered(caseinclusterpolledclusterMC1R)(Table2)

BreedTotal

Genotyped

ED15

misAN

ED1

REPL

ED1

MENDCleaned

HOL 2179 40 31 5 2093

BRW 775 6 16 4 749

SIM 493 6 6 2 479

MAR 485 37 38 410

PIE 379 5 10 364

19

Table2JComparisonofcandidateregionrEHHinnonJredundantandredundantdatasetrEHH overlapping candidate regions in both corrected (nonredundant) and not corrected

(redundant)datasetsforpopulation

Redundant NonRedundant

Candidate

geneBTA1

Closest

SNP(bp)CH2range CH2Frequency

rEHHJ

log(pval)CH2Frequency

rEHHJ

log(pval)

POLLED

Cluster1 1981154 18974181981154 H1079 093037 H1078 167174

MC1R 18 13657912 1331772014007505 H1069 120096 H1069 077167

KIT_BOVIN 6 72821175 7250492172821175 H1054 030030 H1054 033048

Casein

Cluster6 88427760 8835009888452835 H1047 094139 H1046 148133

Casein

Cluster6 88427760 8835009888452835 H2032 002003 H2033 002003

In total5526567847724388and4049corehaplotypeswitha frequency

higher than 25 were detected on Italian Holstein Italian Brown Italian

Simmental Marchigiana and Piedmontese breeds respectively A total of 838

866 740 692 and 613 core haplotypes were found significant outliers (see

Materials and Methods) in the aforementioned breeds Table 3 shows the

distribution of the total and significant core haplotypes per chromosome and

breed

20

Tableamp3amp(ampCoreamphaplotypesampdistributionampDistributionoftotalandsignificantcorehaplotypesperchromosomeandbreed

ampamp ItalianampampHolsteinamp

ItalianampampBrownamp

ItalianampSimmentalamp Marchigianaamp Piedmonteseamp

BTA1amp

Hampgtamp252amp

SignampH3amp

Hampgtamp252amp

SignampH3amp

Hampgtamp252amp

SignampH3amp

Hampgtamp252amp

SignampH3amp

Hampgtamp252amp

SignampH3amp

1amp 389 66 389 68 335 49 310 51 288 46

2amp 312 47 314 52 267 44 257 42 238 33

3amp 292 48 313 62 264 46 236 34 227 34

4amp 268 46 279 50 245 41 237 39 229 39

5amp 223 32 231 29 200 30 183 27 171 22

6amp 291 51 307 40 265 45 252 38 233 42

7amp 249 37 253 46 214 39 195 33 204 27

8amp 268 39 270 38 235 35 223 32 209 31

9amp 227 37 228 41 210 37 165 22 187 30

10amp 219 36 234 34 192 25 196 26 168 25

11amp 248 39 246 34 209 30 205 33 187 25

12amp 162 24 176 18 148 21 148 25 135 22

13amp 205 32 200 19 155 18 178 35 135 19

14amp 175 17 189 25 168 29 143 22 129 19

15amp 180 31 185 29 148 23 139 13 130 19

16amp 176 28 192 29 160 25 149 19 127 15

17amp 170 27 183 29 168 31 135 22 123 16

18amp 133 18 153 19 119 14 108 10 82 12

19amp 137 22 130 23 118 16 99 18 92 15

20amp 184 35 172 33 145 23 124 23 124 17

21amp 145 23 147 30 113 18 110 21 94 13

22amp 140 24 145 19 115 21 112 18 104 13

23amp 106 13 100 16 67 10 64 9 57 8

24amp 129 11 140 20 124 17 97 23 101 16

25amp 91 12 98 11 68 10 77 12 70 6

26amp 125 7 114 14 93 12 96 15 90 16

27amp 91 12 91 15 75 11 84 12 60 10

28amp 88 9 96 10 70 10 66 9 55 11

29amp 103 15 103 13 82 10 72 9 67 12

TOTamp 5526 838 5678 866 4772 740 4388 692 4049 613

1BostaurusAutosomes2Detectedcorehaplotypeswithafrequencyinthebreedhigherthan253Significantcorehaplotypesatplt005

Comparisonamptoamppreviouslyampreportedampdataamp

AnumberofstudieshavesearchedselectionsweepsinHolstein(Qanbarietal

2010) Brown (Qanbari et al 2011) and Simmental (Fan et al 2014) breeds

Since different methods are expected to identify signatures having different

characteristicsthecomparisonwithpreviousstudiesislimitedtothosestudies

using thesamemethodand thesamebreed(s)analysed inourstudyOnlyone

study analysed (German) HolsteinRFriesian cattle with rEHH (Qanbari et al

2010)Qanbarietal(2010)reportedcandidategenesandgeneclustersunder

Marco Milanesi
21

recent positive selection that we compared with our rEHH in dairy breeds

dataset(Table4)

Table4JCorehaplotypesincandidategenesfollowingQanbarietal2011

1BosTaurusChromosome2Corehaplotype

Thenumberofcorehaplotypesfoundinourstudyinthecandidateregionswas

lowerpreviouslyreportedWhenconsideringHolsteinsthetwomostsignificant

candidateregionsinbothstudiesmatch(caseinclusterandthesomatostatinSST

gene)althoughitisimpossibletodetermineifthehaplotypeunderselectionis

the same as this information was not provided in Qanbari et al (2010)

However other genes considered significant in Qanbari et al (with pvalues

ranging from 004 to 010) were not significant in our studyWhen using the

same loose significant threshold (pvalues le 010) ldquosignificantldquo signals were

foundalsoinItalianBrownfortheCaseinCluster(log10pvalue=109)andin

ItalianSimmentalfortheSSTgene(log10pvalue=126)

HOLSTEIN BROWN SIMMENTAL

Candidate

geneBTA1

Closest

SNP

(bp)

CH2rangeCH2

Frequency

rEHH

Hlog(p)CH2range

CH2

Frequency

rEHH

Hlog(p)CH2range

CH2

Frequency

rEHH

Hlog(p)

DGAT1 14 444963236653443936

H1057 011443936763332

H1042 0130007

Casein

Cluster6 88391612

8835009888452835

H1046 1481338832601288452835

H2068 0801098835009888452835

H1044 037022

8835009888452835

H2033 018030

8835009888452835

H3034 022045

GH 19 49652377 GHR 20 33908597

SST 1 813769568128358581376961

H1036 200189 8131845181376961

H1042 126053

8128358581376961

H2029 00630084 8131845181376961

H2042 006027

IGFH1 5 71169823

ABCG2 6 37374911 3731702038256889

H1044 031027

3731702038256889

H1040 029031

Leptin 4 95715500

LPR 3 855692038549710885594551

H1047 0910728549710885794693

H1068 0800638549710885794693

H1063 002005

8549710885594551

H2041 027025

PITH1 1 35756434

22

Signaturessharedbydairyandbybeefbreeds

Significantcorehaplotypeswerealignedacrossbreedsto identifythoseshared

among dairy or beef production types Since breeds can be considered

independentsetsofobservationssharedsignaturesaremorelikelytorepresent

real effects rather than false positives A total of 123 and 142 significant core

haplotypesweresharedbetweenatleasttwodairyorbeefbreedsrespectively

(File S1) Only 82 and 87 of those significant core haplotypes contained genes

considered positional candidates under positive selection In total 22 and

17 of the genomewas under selection for dairy and beef production types

respectivelyTherEHHaveragelengthswere216932and190994bpfordairy

andbeefrespectively

Genesetannotationandnetworkpathwayanalysis

Atotalof244and232annotatedgenesfellwithintheselectedregionsfordairy

andbeefproductiontypesrespectively(Table5)

23

Table5JStatisticsonsharedsignificantcorehaplotypesindairyandbeefbreedsStatisticsonsharedsignificantcorehaplotypesindairyandbeefbreeds

DairyCattle BeefCattle

BTA SelSig

n

Genes1 Sum

SelSignsize

(bp)

Mean

SelSign

size(bp)

SelSi

gn

Genes1 Sum

SelSignsize

(bp)

Mean

SelSign

size(bp)

1 7 11 1521608 138328 7 13 2263213 174093

2 5 9 2216685 246298 8 11 2658549 2416863 2 5 1619336 323867 7 21 3755847 1788504 7 21 5956755 283655 6 11 1761288 1601175 4 17 4506119 265066 5 11 1251127 1137396 2 4 1467738 366934 5 12 5149692 4291417 6 30 4842969 161432 3 28 11889191 4246148 0 0 0 0 4 12 3512215 2926859 4 7 1554863 222123 2 6 1106768 18446110 4 17 8839762 519986 2 3 573138 19104611 6 8 1841802 230225 1 1 100439 10043912 2 8 4244133 530517 1 1 536461 53646113 3 8 1933026 241628 2 8 985376 12317214 0 0 0 0 3 9 692397 7693315 5 9 1415405 157267 2 2 189094 9454716 3 6 1302506 217084 4 6 905719 15095317 3 4 471046 117762 2 9 1551010 17233418 0 0 0 0 3 23 2961718 12877019 3 23 5744143 249745 2 2 116938 5846920 1 2 148886 74443 2 4 627971 15699321 3 7 1930206 275744 1 5 1150760 23015222 3 5 620674 124135 3 10 2411751 24117523 0 0 0 0 1 1 68421 6842124 2 18 9166004 509222 2 4 803770 20094225 2 11 2016602 183327 1 3 329199 10973326 2 4 466134 116534 4 7 776080 11086927 1 1 631792 631792 2 3 1004717 33490628 0 0 0 0 1 1 93464 9346429 2 9 935209 103912 1 5 798300 159660TOT 82 244 65393403 216932 87 232 50024613 190994

1GenesidentifiedGenesintheregionssharedbyallthreedairybreeds(8genes)orallthreebeefbreeds(11genes)foramoredetailedanalysiswereselected(Fig1)

24

Figure1JGenomiclocationoftheselectionsignaturessharedamongstudiedbreedsa)GenesinensembletracksaredisplayedasaredboxcorehaplotypesandSNPsarecolouredinorange(MarchigianaMAR)inviolet(PiedmontesePIE)andpink(SimmentalSIM)b)Genesinensemble tracks are displayed as a red box core haplotypes and SNPs are coloured in blue(HolsteinHOL)ingreen(ItalianBrownBRW)andpink(SimmentalSIM)

All the genes identified were submitted to downstream pathwaynetwork

analyses(Table5Figure1FileS1)Interestinglygenesintheregionssharedby

allthreedairyorbeefbreedsweremanifestedinthemostlikelynetworksforthe

two production aptitudes The IPA analysis detected a total of 12 and 15

networksindairyandbeefbreedsrespectively

Indairybreeds network scores ranged from46 to2 (File S2) Eightnetworks

obtainedscoreshigherthan20andincludedmorethan14moleculeseachThey

wereassociatedtoi)geneexpressionRNAdamageandrepairproteinsynthesis

andposttranslationalmodificationii)cellassemblyorganizationfunctionand

maintenance iii) cell cycle proliferation and cancer iv) carbohydrate

metabolism v) gastrointestinal and neurological disease developmental and

endocrine system disorders A total of 23moleculeswere associatedwith the

highest ranked network (score = 46) In this the immunoglobulins mitogen

25

activatedproteinkinasesandNFkBcomplexesformacentralhublinkinggenes

involved in celltocell signaling and interaction hematological system

developmentandfunctionandimmunecelltraffickingfunctions(Figure2)

Figure2JThemostlikelyIPAnetworkindairycattlebreedsThefunctionsassociatedtothenetworkarecelltocellsignallingandinteractionhematologicalsystem development and function immune cell trafficking Nodes in red correspond to genesidentifiedincorehaplotypesoverlappinginallthethreebreedsofeachproductiontypewhereasthoseingreendepictoverlappingcorehaplotypesinatleasttwoofthosebreeds

Breast cancer antiNestrogen resistance gene 3 (BARC3) andpituitary glutaminyl

cyclase gene (QPCT) genes were common to all three dairy breeds and are

directlyconnectedwithmammaryglandmetabolisms(Ezuraetal2004GSun

et al 2012) The solute carrier family 2 member 5 (SLC2A5) gene facilitates

glucosefructose transport (Zhao et al 1998) and the zetaNchain (TCR)

26

associated protein kinase 70kDa (ZAP70) gene plays a critical role in Tcell

signaling (Bonnefont et al 2011) Calpain is another important complex that

togetherwithcalpainN3(CAPN3)mediatesepithelialcelldeathduringmammary

gland involution (Wilde et al 1997) Furthermore RAS guanyl nucleotide

releasingprotein(RASGRP1)activatestheErkMAPkinasecascaderegulatesT

cellsandBcellsdevelopmenthomeostasisanddifferentiationandisinvolvedin

the regulation of breast cancer cell (Madani et al 2004 Yasuda et al 2007

Wickramasingheetal2009)Genesinvolvedintheothernetworksarelistedin

FileS2

The scoreof thebeef cattlebreedsnetwork rangedbetween46and2 and the

numberofmoleculespresentineachnetworkrangedfrom23to1(FileS3)Asin

dairy breeds eight networks obtained a score higher than 20 and 13 ormore

molecules each These networkswere associated to i) carbohydrate lipid and

nucleic acidsmetabolism energyproduction and smallmoleculebiochemistry

ii) cardiovascular system development and function iii) cellular and tissue

developmentcellassemblyandorganizationfunctionandmaintenanceandcell

and organmorphology iv) cell cycle celltocell signaling and interaction v)

DNA replication recombination and repair and RNA posttranscriptional

modification vi) dermatological diseases and conditions infectious disease

organismal injury and abnormalities In the network with the highest score

(score=46)chondroitinsulfateproteoglycan4(CSPG4)andsnurportinN1(SNUPN)

genes were shared among all beef cattle breeds investigated CSPG4 gene is

relatedtomeattendernesswhileSNUPN isanimprintedgeneimportantinthe

embryodevelopmentandinvolvedinhumanmuscleatrophy(Narayananetal

2002) (Figure 3) All the genes included in this network were involved in

carbohydratemetabolismsmallmoleculesbiochemistryandcellcycleTheother

networkswere related to important signaling andmetabolic process essential

foranimalgrowth(FileS3)

27

Figure3JThemostlikelyIPAnetworkinbeefcattlebreedsThe functions associated to the network are carbohydrate metabolism small molecule

biochemistry and cell cycle Nodes in red correspond to genes identified in core haplotypes

overlapping in all the three breeds of each production type whereas those in green depict

overlappingcorehaplotypesinatleasttwoofthosebreeds

A total of 6 and 9 statistically significant Canonical Pathways (FDR le 005

log10(FDR) ge 13) were identified using IPA for dairy and beef breeds

respectively(Figure4)

28

Figure4JBarplotofstatisticallysignificantCanonicalPathwaysPvalues were corrected for multiple testing using BenjaminiHochberg method and arepresented in thegraphas log(pvalue)Thebar represents thepercentageof genes in a givenpathway that meet cutoff criteria within the total number of molecules that belong to thefunctiona)BarplotofstatisticallysignificantCanonicalPathways indairycattlebreedsb)BarplotofstatisticallysignificantCanonicalPathwaysinbeefcattlebreeds

Themost significant Canonical Pathway in dairywas for purinemetabolism (

log10(FDR) = 26) supporting the highly synthetic processes in the mammary

epithelium(Figure5)

29

Figure5JGenesdetectedunderrecentpositiveselectionindairycattleinvolvedinpurinemetabolismcanonicalpathwayNodes in red correspond to genes identified in core haplotypes overlapping in all the threebreedsofeachproductiontypewhereasthoseingreendepictoverlappingcorehaplotypesinatleasttwoofthosebreeds

Inbeefproductiontheephrinreceptorsignal(log10(FDR)=27)wasthemost

significant Canonical Pathway (Figure 6) Among other functions ephrin is

knowntopromotemuscleprogenitorcellmigrationbeforemitoticactivation(Li

andJohnson2013)AlltheotherCanonicalPathwaysarereportedinFileS4

30

Figure6JGenesdetectedunderrecentpositiveselectioninbeefcattleinvolvedinEphrinmetabolismcanonicalpathwayNodes in red correspond to genes identified in core haplotypes overlapping in all the threebreedsofeachproductiontypewhereasthoseingreendepictoverlappingcorehaplotypesinatleasttwoofthosebreeds

Discussion

In this work more than 4000 genotyped bulls of five dairy beef and dualpurposeItalianbreedswerescreenedsearchingforselectionsignaturesindairyand beef production types A strict data quality controlwas applied to reducepossiblesourcesofbiassuchastechnicalgenotypingproblemsandpopulationstructurethatcouldhavelargeimpactonrEHHresultsInfactsincetheimpactofpopulationstructureonrEHHvaluesisstillunexploredwereplicatedpartofouranalyseswithandwithoutcloserelativesinanattempttostudytheeffectofpopulationstructureonthisselectionsignaturemethodInprinciplepopulationstratificationshouldleadtoanoverrepresentationofcertainhaplotypessimplyduetothepresenceoftightpedigreelinks(egsirespassinghalfoftheirgeneticmaterialtotheirsons)ratherthanactualselectionprocessesForthisreasonfor

31

the full analyses all siresons couples were removed after haplotype phasing(retaining only the younger animal) and halfsib familieswere restricted to amaximum of five randomly chosen individuals in an attempt to reduce overrepresentation This assessment was restricted to four regions known to beunderselectioninItalianHolsteinasthecaseinclusterthepolledgeneclusterMC1R and KIT Italian Holstein was selected for two reasons i) it is a highlystructuredbreedaccordingtoourdata(abouthalfofindividualsremovedwereclose relatives) ii) allowed us to compare our results with a previous studyAlthoughtheanalysesconductedonbothredundantandnonredundantdatasetswereable to identify rEHHsignals in these regions thenonredundantdatasetproduced five significant rEHH signals rather than just one in the redundantdataset(Table2)Thisresulthighlightstheconfoundingeffectofthepresenceofcloserelativesinthedatasetandconsequentlytheimprovedabilitytodetectedsignificant selection signature while correcting for population structureInterestingly our results only partially overlap those found by Quanbari et al(2010) These authors detected significant signals at the casein cluster andsomatostatinSSTaswedid but also at ahighernumberof candidate regionsThese inconsistencies may be the effect of different sires included in theanalyses of the different dataset size and to the close relative trimmingprocedure we adopted to decrease the effect of population structure andconsequent bias (Table 4) However poor correlation across investigations isoftenobservedalsointhedeeplyinvestigatedhumanspeciesThisisduetotheuse of different within and between breed statistics that identify selectionsignatureshavingdifferent characteristics (ancientrecent segregatingfixatedunder directionalbalancing selection) to the likely high rate of falsepositivesnegativesresultsandalsotothedifferentconsiderationofpopulationstructureandbackgroundselection(Enardetal2014)The number of total and significant core haplotypes identified by the SweepsoftwarewasthehighestinItalianBrownandthelowestinPiedmontesebreedSinceEHHisheavilybasedonpopulationLDtheaverageLDleveloverdistancewascalculatedineachbreed(Figure7)

32

Figure7JMultiJbreedaveragelinkagedisequilibriumagainstphysicaldistance(inkb)Marchigiana Piedmontese and Italian Simmental breeds present a lower persistence of LD allargerdistancethanItalianHolsteinandItalianBrownbreedsThisanalysishighlightedageneralpositivecorrelationbetweenthe levelofLDover distance and the number of total and significant core haplotypes foundHoweverconsideringthatrEHHisarelativemeasure it ismorelikelythatthehighernumberofsignificantcorehaplotypesidentifiedindairybreedswasduemostlytoahigherselectivepressureindairycomparedtobeefbreedsSignificance tests used in detecting selection signatures should measure theprobability of a statistic to be anoutlier compared to its expecteddistributionunder a neutral model However no reliable neutral model has so far beendevelopedincattlebecauseofthecomplexityofthedemographichistoryofthisspecies (Kemper et al 2014) As a consequence empiric rather than modelbasedsignificancetestaregenerallyusedinthedetectionofselectionsignaturesAccordinglyweconsideredasoutliersvaluesas thosesimply falling in the5plusvarianttailoftherEHHdistributionWekeptthewithinbreedsignificancethresholdloosenoncorrectingitformultipletestingbutconsideredonlysignals

33

eitherexceedingthisthresholdbyatleasttwoordersofmagnitudeorsharedby

twoormoreindependentbreeds

Parallel comparison of results from independent analyses of different breeds

allowed the achievement of a double objective i) to identify putative regions

under(recent)selectioninbreedsbelongingtotwoproductiontypeswhichwas

the main objective of this study and ii) reducing false positives since multi

breed analyses served as internal control Since the rEHH method does not

considerphenotypicinformationasignificantsignalmightarisebecausei)the

core haplotype is actually under a selective pressure ii) the result is a false

positive ie causedby chance population structureor anyotherdriving force

However even considering an unrealistic scenario with no false positives a

proportionofthetotalamountofsignalswouldactuallybeselectionsignatures

due to a selectionpressurenot ondairy or beef traits This becausedairy and

beefbreedsinvestigatedareonlyafewandshareanumberoftraitsnotdirectly

related to dairybeef production Even considering this limitation to our

knowledge this is the first dairy and beef multibreed study applying this

strategytoreducefalsepositivesatthecostofpossiblelossofinformationdue

tohighfalsenegativeratesSignificantsignalsharedbydairyandbybeefbreeds

wereused fordownstreamnetworkanalysesonpositional candidategenes to

search forabiological justification towhatwasobtainedstudying thegenomic

information Only the top network for dairy and the top network for beef

productionarediscussedindetail

Dairybreeds

ThemostsignificantnetworkdetectedindairybreedsisassociatedwithCellTo

CellSignalingandInteractionHematologicalSystemDevelopmentandFunction

andImmuneCellTraffickingfunctions(Figure2)ImmunoglobulinandNFkBact

aspivotalgeneswithinthisnetworkInterestinglyBARC3andQPCTgeneswere

sharedamongallthethreedairybreedsBothgeneshavenotyetbeenstudiedin

cattlealthoughinhumanstudiesthesegenesarewellknowntobelinkedtothe

mammaryglandmetobolismandthecalciumregulationTheformerisinvolved

in integrinmediated cell adhesions and signaling which is required for

mammary gland development and functions (G Sun et al 2012) The latter is

34

associatedwithlowradialbodymineraldensity(BMD)inadultwomen(Ezuraet

al2004)AnotherimportantplayeronthisnetworkisSLC2A5genealsoknown

asGLUT5 SLC2A5 gene acts as fructose transporter in the intestine and has a

significantroleinenergybalanceofdairycows(Burantetal1992)Findingthis

gene in adairy cattle specificnetwork seems surprising since there shouldbe

little need for transporting glucose andor fructose in the ruminant intestinal

tractassimplecarbohydratesaredegradedintovolatilefattyacids(VFA)inthe

rumen (Iqbal et al 2009) However it is often the case that large amount of

starchbypasstherumenincowsfedwithdietsrichincerealgrains(Zhaoetal

1998)Thisbypassedstarchneedstobedigestedinthesmallintestinetoavoid

highlevelsofglucoseconcentrationinthelargeintestine

WedetectedcalpaincomplexandcalpainN3(CAPN3)genesunderpositiverecent

selection in the dairy group as reported also by Utsunomiya et al (2013)

AltoughCalpainisknowntobeinvolvedinpostmortemmeattendernizationitis

also related to dairy metabolism as the muscle breakdown promoted by the

calpain gene provides an energy source for milk production especially at the

beginningoflactation(Kuhlaetal2011)InadditionasreportedbyWildeetal

(1997) calpainNcalpastatin system is related to programmed cell death of

alveolar secretory epithelial cells during lactation Zap70 gene is a cytoplasmic

proteintyrosinekinaseItisrelatedtotheimmunesystemasacomponentofthe

Tcell receptor playing a central role in Tcell responses (Wang et al 2010)

Bonnefontet al (2011) found thatZap70 genewasup regulated (FoldChange

(FC)=82FDR=277E12)onmilksomaticcellsofsheepinfectedbySaureus

andSepidermidisshowinganassociationwithmastitisresistance

Purinemetabolismwasthemostsignificantcanonicalpathwayindairybreeds

In a study conducted in human mammary epthelium Maningat et al (2008)

conductedageneexpressionanalysisonbreastmilkfatglobulesidentifyingthe

PurinemetabolismasthemostsignificantpathwaySynthesisandbreakdownof

purine is essential in the metabolism of the tissues of many organisms

particularlyofthemammaryglandduringlactation

Another interesting canonical pathway was endothelin signaling (Figure 5)

endothelin functions as vasocontrictor and is secreted by endothelial cells

35

(Nussdorferetal1999)Acostaetal(1999)reportedthatincattleendothelinsareinvolvedinthefollicularproductionofprostaglandinsandtheregulationofsteroidogenesis in the mature follicle In a recent study Puglisi et al (2013)confirmed the implication of endothelin in particular EDNRA and endothelinconvertin enzyme 1 in reproductive disorder in bovine They classified theEDNRAaspotentialbiomarkerforfertilityincow

Beefbreeds

Inbeefcattlebreedsthefocaladhesionkinase(FAK)togetherwithintegrinandERK12genesplayacentralroleashubsinthemostsignificantnetwork(Figure3)Theyareinvolvedincellularadhesionandcellmotilityprocessesparticularlyduringskeletalmyogenesis(Nguyenetal2014)Moreovertheoverexpressionof FAK can determine a reduced apoptosis and an increase risk of metastaticcancers(MehlenandPuisieux2006)Thesehubsarenotdirectlyinvolvedinourstudybut sharemany interactors thatare located in region found tobeunderselction such as CSPG4 RB1CC1 MGAT3 CIRPB CSPG4 gene belongs tochondroitin sulfate proteoglycans (CSPGs) family CSPGs are proteoglycansconsistingofaproteincoreandachondroitinsulfatesidechainTheyareknownto be structural components of a variety of tissues includingmuscle andplaykey roles inneuraldevelopmentandglial scar formationTheyare involved incellular processes such as cell adhesion cell growth receptor binding cellmigrationandinteractionswithotherextracellularmatrixconstituents

Manystudiesreporttheroleofproteoglycansinmeattextureofseveralmusclesfromthebovinespecies(Nishimuraetal1996)Dubostetal(2013)highlightadirect involvement of proteoglycans with respect to cooked meat juicinessRB1CC1geneplaysacrucialroleinmusculardifferentiationanditsactivationisa essential for myogenic differentiation (Watanabe et al 2005 p 1)Monoacylglycerol acyltransferase (MGAT3) gene cause the synthesis ofdiacylglycerol (DAG)using2monoacylglyceroland fattyacyl coenzymeAThisenzymaticreactionis fundamental fortheabsorptionofdietaryfat inthesmallintestineSunetal(2012p3)reportedinastudyonfiveChinesecattlebreedsthatMGAT3gene isassociatedwithgrowthtraitsCIRPBgenemaybepartofacompensatorymechanism inmuscles undergoing atrophy It preservesmuscle

36

tissuemassduringcoldshockresponsesaginganddisuse(DupontVersteegden

et al 2008) SNUPN is an imprinted gene and is expressed monoallelically

dependingonitsparentaloriginSNUPNplayimportantrolesinembryosurvival

andpostnatal growth regulation (Smithet al 2012Wanget al 2012)Ephrin

receptorsignalingwasthetopcanonicalpathwayproducebyIPAandhasalsoan

interestingbiologicalinterpretationformeatproduction(Figure6)Indeedthis

pathaway is important for the muscle tissue growth and regeneration

participating in correct positioning and formation of the neural muscular

junction(LiandJohnson2013)

Conclusions

In this study we analysed the selection signatures at genomewide level in 5

Italian cattle breeds Then we used a multibreed approach to identify the

genomicregionssharedamongcattlebreedsselectedfordairyorbeefpurpose

Thisapproachincreasedthepowertopinpointregionsofthegenomethatplay

important roles in economically relevant traits in both dairy and beef cattle

Moreover network and pathways analyses were used to display a detailed

functional characterization of genes mapping in the regions detected under

recentpositiveselection

Particularly in dairy cattle genes under directional selection are related to

feeding adaptation (increasing level of starch in the diet) mammary gland

metabolismandresistencetomastitisThebiologicalfeaturesunderselectionin

beef cattle breeds are involved in animal growth meat texture and juiceness

These results have a logical biological explanation However the frequent

approach of interpreting selection signatures on the basis of the function of

geneslocatednearbythepeaksignalofthestatisticusedispresentlychallenged

Novelinformationinhumanssuggeststhatmostselectedvariationisnotlocated

within genes and coding regions but in regulatory sites identified by the

ENCODEproject(Enardetal2014)Thesemaycontroltheexpressionofentire

genomicregionsorofgeneslocatedatarelavantdistancefromtheselectedsite

makingbiologicalinterpretationmorecomplex

37

InanycasefurtherstudiesusingdenserSNPchipsorwholegenomesequencing

producinginformationnotsubjectedtoascertainmentbias(Qanbarietal2014)

may increase the resolution of our analysis and togetherwith new knowledge

arisingonthecontrolofgeneexpressionvalidateorcorrectourresults

Acknowledgements

ANAFI (Italian Association of Holstein Friesian Breeders) ANAPRI (Italian

Association Simmental Breeders) ANABORAPI (National Association of

PiedmonteseBreeders)ANABIC(AssociationofItalianBeefCattleBreeders)and

ANARB (Italian Brown Cattle Breeders Association) are acknowledged for

providing the biological samples and the necessary information for this work

The Authors are also grateful to SelMol project (MiPAAF Ministero delle

PoliticheAgricoleAlimentarieFortestali)andbytheProZooproject(Regione

LombardiandashDGAgricoltura Fondazione Cariplo Fondazione Banca Popolare di

Lodi)forfunding

Thefundershadnoroleinstudydesigndatacollectionandanalysisdecisionto

publishorpreparationofthemanuscript

References

Acosta TJ Berisha B Ozawa T Sato K Schams D Miyamoto A 1999

Evidence for a Local EndothelinAngiotensinAtrial Natriuretic Peptide

SysteminBovineMature Follicles In Vitro Effects on SteroidHormones

and Prostaglandin Secretion Biol Reprod 61 1419ndash1425

doi101095biolreprod6161419

AjmoneMarsan P Garcia JF Lenstra JA 2010 On the origin of cattle how

aurochs became cattle and colonized theworld Evol Anthropol Issues

NewsRev19148ndash157

38

BonnefontCToufeerMCaubetCFoulonETascaCAurelMRBergonier

D Boullier S RobertGranieacute C Foucras G 2011 Transcriptomic

analysisofmilksomaticcells inmastitisresistantandsusceptiblesheep

upon challenge with Staphylococcus epidermidis and Staphylococcus

aureusBMCGenomics12208

BurantCFTakedaJBrotLarocheEBellGIDavidsonNO1992Fructose

transporter inhumanspermatozoaandsmall intestine isGLUT5 JBiol

Chem26714523ndash14526

DubostAMicolDPicardBLethiasCAnduezaDBauchartDListratA

2013Structuralandbiochemicalcharacteristicsofbovineintramuscular

connective tissue and beef quality Meat Sci 95 555ndash561

doi101016jmeatsci201305040

DupontVersteegden EE Nagarajan R Beggs ML Bearden ED Simpson

PMPetersonCA2008IdentificationofcoldshockproteinRBM3asa

possible regulator of skeletal muscle size through expression profiling

AJP Regul Integr Comp Physiol 295 R1263ndashR1273

doi101152ajpregu904552008

Enard D Messer PW Petrov DA 2014 Genomewide signals of positive

selection in human evolution Genome Res gr164822113

doi101101gr164822113

EzuraYKajitaMIshidaRYoshidaSYoshidaHSuzukiTHosoiTInoue

SShirakiMOrimoHEmiM2004Associationofmultiplenucleotide

variationsinthepituitaryglutaminylcyclasegene(QPCT)withlowradial

BMDinadultwomenJBoneMinRes191296ndash301

Fan H Wu Y Qi X Zhang J Li J Gao X Zhang L Li J Gao H 2014

Genomewide detection of selective signatures in Simmental cattle J

ApplGenet1ndash9doi101007s1335301402006

HayesBJChamberlainAJMaceachernSSavinKMcPartlanHMacLeodI

SethuramanLGoddardME2009Agenomemapofdivergentartificial

selectionbetweenBostaurusdairycattleandBostaurusbeefcattleAnim

Genet40176ndash184doi101111j13652052200801815x

Hayes BJ Lien S Nilsen H Olsen HG Berg P Maceachern S Potter S

Meuwissen THE 2008 The origin of selection signatures on bovine

39

chromosome 6 Anim Genet 39 105ndash111 doi101111j1365

2052200701683x

IqbalSZebeliQMazzolariABertoniGDunnSMYangWZAmetajBN

2009 Feeding barley grain steeped in lactic acid modulates rumen

fermentation patterns and increases milk fat content in dairy cows J

DairySci926023ndash6032doi103168jds20092380

KemperKESaxtonSJBolormaaSHayesBJGoddardME2014Selection

for complex traits leaves littleornoclassic signaturesof selectionBMC

Genomics15246doi1011861471216415246

KimuraM1984Theneutraltheoryofmolecularevolution

KuhlaBNuumlrnbergGAlbrechtDGoumlrsSHammonHMMetgesCC2011InvolvementofSkeletalMuscleProteinGlycogenandFatMetabolismin

the Adaptation on Early Lactation of Dairy Cows J Proteome Res 10

4252ndash4262doi101021pr200425h

Lenstra JAGroeneveld LF EdingHKantanen JWilliams JL Taberlet P

NicolazziELSolknerJSimianerHCianiEGarciaJFBrufordMW

AjmoneMarsan P Weigend S 2012 Molecular tools and analytical

approaches for the characterization of farm animal genetic diversity

AnimGenet43483ndash502

Li J Johnson SE 2013 EphrinA5 promotes bovine muscle progenitor cell

migration before mitotic activation J Anim Sci 91 1086ndash1093

doi102527jas20125728

Ma YL Zhang Q Ding XD 2012 [Detecting selection signatures on X

chromosome in pig through high density SNPs] Yi Chuan Hered

ZhongguoYiChuanXueHuiBianJi341251ndash1260

Madani S Hichami A CharkaouiMalki M Khan NA 2004 Diacylglycerols

Containing Omega 3 and Omega 6 Fatty Acids Bind to RasGRP and

Modulate MAP Kinase Activation J Biol Chem 279 1176ndash1183

doi101074jbcM306252200

Maningat PD Sen P Rijnkels M Sunehag AL Hadsell DL Bray M

Haymond MW 2008 Gene expression in the human mammary

epitheliumduring lactation themilk fat globule transcriptome Physiol

Genomics3712ndash22doi101152physiolgenomics903412008

40

Mehlen P Puisieux A 2006Metastasis a question of life or deathNat Rev

Cancer6449ndash458doi101038nrc1886

NarayananUOspinaJKFreyMRHebertMDMateraAG2002SMNthe

Spinal Muscular Atrophy Protein Forms a PreImport Snrnp Complex

withSnurportin1andImportin HumMolGenet111785ndash1795

Nguyen N Yi JS Park H Lee JS Ko YG 2014 Mitsugumin 53 (MG53)

LigaseUbiquitinatesFocalAdhesionKinaseduringSkeletalMyogenesisJ

BiolChem2893209ndash3216doi101074jbcM113525154

NielsenRHellmannIHubiszMBustamanteCClarkAG2007Recentand

ongoing selection in the human genome Nat Rev Genet 8 857ndash868

doi101038nrg2187

NishimuraTHattoriATakahashiK1996Relationshipbetweendegradation

of proteoglycans andweakening of the intramuscular connective tissue

duringpostmortemageingofbeefMeatSci42251ndash260

Nussdorfer GG Rossi GPMalendowicz LKMazzocchi G 1999Autocrine

ParacrineEndothelinSysteminthePhysiologyandPathologyofSteroid

SecretingTissuesPharmacolRev51403ndash438

OrsquoBrienAMPUtsunomiyaYTMeacuteszaacuterosGBickhartDM LiuGE Tassell

CPV Sonstegard TS Silva MVD Garcia JF Soumllkner J 2014

Assessing signatures of selection through variation in linkage

disequilibrium between taurine and indicine cattle Genet Sel Evol 46

19doi101186129796864619

Pan D Zhang S Jiang J Jiang L Zhang Q Liu J 2013 GenomeWide

DetectionofSelectiveSignatureinChineseHolsteinPLoSONE8e60440

doi101371journalpone0060440

PuglisiRCambuliCCapoferriRGianninoLLukajADuchiRLazzariG

Galli C Feligini M Galli A Bongioni G 2013 Differential gene

expressionincumulusoocytecomplexescollectedbyovumpickupfrom

repeat breeder and normally fertile Holstein Friesian heifers Anim

ReprodSci14126ndash33doi101016janireprosci201307003

Purcell S Neale B ToddBrown K Thomas L Ferreira MAR Bender D

MallerJSklarPdeBakkerPIWDalyMJShamPC2007PLINKa

41

tool set for wholegenome association and populationbased linkageanalysesAmJHumGenet81559ndash575doi101086519795

Qanbari S Gianola D Hayes B Schenkel F Miller S Moore S Thaller GSimianer H 2011 Application of site and haplotypefrequency basedapproachesfordetectingselectionsignaturesincattleBMCGenomics12318doi1011861471216412318

Qanbari S Pausch H Jansen S Somel M Strom TM Fries R Nielsen RSimianer H 2014 Classic Selective Sweeps Revealed by MassiveSequencinginCattlePLoSGenet10doi101371journalpgen1004148

Qanbari S Pimentel ECG Tetens J Thaller G Lichtner P Sharifi ARSimianerH2010AgenomewidescanforsignaturesofrecentselectioninHolsteincattleAnimGenetdoi101111j13652052200902016x

Sabeti PC Reich DE Higgins JM Levine HZ Richter DJ Schaffner SFGabriel SB Platko JV Patterson NJ McDonald GJ Ackerman HCCampbellSJAltshulerDCooperRKwiatkowskiDWardRLanderES2002DetectingrecentpositiveselectioninthehumangenomefromhaplotypestructureNature419832ndash7

ScheetPStephensM2006Afastandflexiblestatisticalmodelforlargescalepopulation genotype data applications to inferring missing genotypesandhaplotypicphaseAmJHumGenet78629ndash44

Shahzad K Loor JJ 2012 Application of topdown and bottomup systemsapproaches in ruminantphysiologyandmetabolismCurrGenomics13379

SmithLSuzuki JGoffAFilionFTherrien JMurphyBKohanGhadrHLefebvreRBrisvilleABuczinskiSFecteauGPerecinFMeirellesF2012DevelopmentalandEpigeneticAnomaliesinClonedCattleReprodDomestAnim47107ndash114doi101111j14390531201202063x

Sun G Cheng SYS Chen M Lim CJ Pallen CJ 2012 Protein TyrosinePhosphatase Phosphotyrosyl789 Binds BCAR3 To Position Cas forActivationatIntegrinMediatedFocalAdhesionsMolCellBiol323776ndash3789doi101128MCB0021412

Sun J Zhang C Lan X Lei C ChenH 2012 Exploring polymorphisms andassociationsofthebovineMOGAT3genewithgrowthtraitsGenomeNatl

42

Res Counc Can Geacutenome Cons Natl Rech Can 55 56ndash62doi101139g11077

TaberletPCoissacEPansu J PompanonF 2011Conservationgeneticsofcattle sheep and goats C R Biol On the trail of domesticationsmigrationsandinvasionsinagricultureDominiqueJobGeorgesPelletierJeanClaudePernollet334247ndash254doi101016jcrvi201012007

Utsunomiya YT Perez OrsquoBrien AM Sonstegard TS Van Tassell CP doCarmo AS Meszaros G Solkner J Garcia JF 2013 Detecting lociunder recent positive selection in dairy and beef cattle by combiningdifferentgenomewidescanmethodsPLoSOne8e64280

Voight BF Kudaravalli S Wen X Pritchard JK 2006 A Map of RecentPositive Selection in the Human Genome PLoS Biol 4 e72doi101371journalpbio0040072

WangHKadlecekTAAuYeungBBGoodfellowHESHsuLYFreedmanTSWeissA2010ZAP70AnEssentialKinaseinTcellSignalingColdSpringHarbPerspectBiol2doi101101cshperspecta002279

WangMZhangXKangLJiangCJiangY2012MolecularcharacterizationofporcineNECD SNRPNandUBE3Agenesand imprinting status in theskeletal muscle of neonate pigs Mol Biol Rep 39 9415ndash9422doi101007s1103301218066

WatanabeRChanoTInoueHIsonoTKoiwaiOOkabeH2005Rb1cc1iscritical for myoblast differentiation through Rb1 regulation VirchowsArchIntJPathol447643ndash648doi101007s0042800411831

WickramasingheNSManavalanTTDoughertySMRiggsKALiYKlingeCM 2009 Estradiol downregulates miR21 expression and increasesmiR21targetgeneexpressioninMCF7breastcancercellsNucleicAcidsRes372584ndash2595doi101093nargkp117

WildeCJAddeyCVLiPFernigDG1997Programmedcelldeathinbovinemammary tissue during lactation and involution Exp Physiol 82 943ndash953

YasudaSStevensRLTeradaTTakedaMHashimotoTFukaeJHoritaTKataoka H Atsumi T Koike T 2007 Defective Expression of Ras

43

Guanyl NucleotideReleasing Protein 1 in a Subset of Patients with

SystemicLupusErythematosusJImmunol1794890ndash4900

ZhangCBaileyDKAwadTLiuGXingGCaoMValmeekamVRetiefJ

Matsuzaki H Taub M Seielstad M Kennedy GC 2006 A whole

genome longrange haplotype (WGLRH) test for detecting imprints of

positiveselectioninhumanpopulationsBioinformatics222122ndash8

ZhaoFQOkineEKCheesemanCIShiraziBeecheySPKennellyJJ1998

Glucose transporter gene expression in lactating bovine gastrointestinal

tractJAnimSci762921ndash2929

44

Supplementaryfiles

YoucanfindChapterʹsupplementaryfilesat

- DocTa(DoctoralThesisArchive)(httptesionlineunicattit)

- GoogleDrive(httptinyurlcomPhDMMChapter02)orscantheQRcode

SupplementaryfileS1ndashSignificantcorehaplotypesandgenessharedin

dairyandbeef

Significantcorehaplotypes(pvaluele005haplotypefrequencyge025)shared

indairyandbeefcattlebreedsandrelativegenesintersectingwiththemThefile

isdividedinto4tablesThefirst2sheets(sheet1and2)shownthesignificant

corehaplotypesfordairyandbeefThelast2sheets(sheet3and4)showngenes

intersectingwiththesignificantcorehaplotypesfordairyandbeef

SupplementaryfileS2ndashNetworksrankingindairybreeds

Networksrankingindairybreedswithrelativemoleculessymbolslistscore

valuenumberofmoleculesbelongingtothedataset(focusmolecules)andtop

functionforeachnetwork

SupplementaryfileS3ndashNetworksrankinginbeefbreeds

Networksrankinginbeefbreedswithrelativemoleculessymbolslistscore

valuenumberofmoleculesbelongingtothedataset(focusmolecules)andtop

functionforeachnetwork

45

SupplementaryfileS4ndashCanonicalpathwaysrankingindairyorbeefcattle

breeds

Canonicalpathwaysrankingindairy(sheet1)orbeef(sheet2)cattlebreedswithrelativemoleculessymbolslistratioandndashlog10ofthepvaluesforeachcanonicalpathway

46

CHAPTER3Searchingnewsignalsforproductivetraitsthroughgenebasedassociationanalysisin

threeItaliancattlebreedsSCapomaccio1MMilanesi1etal1IstitutodiZootecnicaUCSCviaEmiliaParmense8429122PiacenzaItalyTheseauthorscontributedequallytothiswork

____________________SubmittedtoAnimalGenetics

Abstract

Genome wide association studies (GWAS) have been widely applied to

disentanglethegeneticbasisofcomplextraits IncattlebreedsclassicalGWAS

approach with medium density markers panels are far to be conclusive

especiallyforcomplextraitsThisisduetotheintrinsiclimitationsofGWASand

to the choices that are made to step from the associationrsquos signals to the

functionalunits

Here we applied a geneHbased association strategy to prioritize genotypeH

phenotype associations found with classical approaches in three Italian dairy

cattlebreedswithdifferentsamplesizes(ItalianHolsteinn=2058ItalianBrown

n=745ItalianSimmentaln=477)formilkproductionandqualitytraits

WhileclassicalsinglemarkerregressionfailedtorevealgenomeHwidesignificant

genotypeHphenotype association except for Italian Holstein the gene based

approach found specific genes in each breed that are associated with milk

physiologyandmammaryglanddevelopment

Since no established method is yet implemented to step from variation to

functional units our strategy may contribute to reveal new genes that play

significant roles in complex traits like thosehere investigated condensing low

signalsofassociationinageneHcentricfashionapproach

Background

Modern cattle breeds have nomore than 200 years of history (Taberlet et al

2008) a time spanwhere selection reproductive isolationand the consequent

geneticdriftshapedallelefrequenciesandlinkagedisequilibriumthroughoutthe

genome This is particularly true for industrial dairy breeds where the joint

application of reproductive technologies and modern genetic evaluation

methodshave imposedahigh selectionpressureona limitednumberof traits

(Taberletetal2008)involvedinmilkquantityandquality

Indeed the milk production industry has been historically ready to acquire

technicalimprovementsfromdifferentfieldsstatisticsbiologyandengineering

50

ForexampleBLUPbasedstrategiesandhighHdensitygenotyping throughSNPHchipsarenowroutinelyincludedinbullevaluationshavingsignificantimpactonselectionefficiencyandpopulationstructureThese techniques have dramatically improved milk yield and compositionwithout knowing the biological basis of the traits under selection To datedespite many efforts and many QTL mapped only a few genes have beeneffectivelyassociatedwithmilkyieldandcomposition(Grisartetal2002Blottetal2003CohenHZinderetal2005)In thiscontextGenomeWideAssociationStudies(GWAS) is themostcommonmethodfordissectingthebiologythatunderliescomplextraits(McCarthyetal2008)TheunderlyingideaofGWASissimplefindmarkerHtraitassociationsexploitingthe linkage disequilibrium (LD) that exists between the causative mutation ndashwhichweignorendashandoneormoreanalyzedmarkerswhichwewellknowInthiswayonecanpinpointgenomicregionscarryingcausalvariantsforanytrait

Although a number of GWAS have been conducted in cattle productive andmorphological traits (some examples (Jiang et al 2010 Cole et al 2011Meredith et al 2012)) and found several genomic regions associated to thesetraits none focused their attention directly on genes involved inmilk biologyMoreoveroneof themajordrawbacksofGWASisrelatedtothescarceresultsportability across populations due to a variety of confounding factorspopulationstructuredifferentialLD levelsbreedspecific selection targetsandeven SNPs ascertainment bias (Clark et al 2005) For instance significantassociationsonfatpercentagewereidentifiedinmanyregionsoftheBostaurusgenome in different breeds Except for genes of major effect there is littleagreement on which regions and H more importantly H genes are actuallyinvolvedinthecontrolofthetraitinvestigatedThisissomehownotsurprisingone favorable allele may segregate in one breed and be fixated in a differentbreed the same allele segregates in both breeds but allelesmay differ or thesegregates in both breeds but the genetic background masks the effect ofsegregation

51

AnotherlimitationinGWASisthestringentsignificancethresholdoftenapplied

tocorrectformultipletestingAsaresulta largeproportionofgeneswithlow

effectaredisregardedwithconsequentoverestimationof thehigheffectgenes

variance(Xu2003)Thisbiasisparticularlyworryingwheninvestigatingtraits

controlledbya largenumberofgeneswithsmalleffectand inhighLDasmilk

yield(GIANOLAETAL2013)

To overcome this kind of limitation different strategies have been invented

usingprior knowledge Interesting ldquocandidatepathwayrdquo approacheshavebeen

conducted(Ravenetal2013)testingforassociationbetweenaphenotypeand

genes inpathwaysthatare likelytobe involvedinthecontrolofthetraitThis

approach reduces thenumberofmarkers tobe tested increases the statistical

poweroftheexperimentrestrictingtheanalysistoexistingknowledge

Other methods based on statistical manipulations or particular experimental

designs have been developed to prioritize or validate genotypeHphenotype

associations For example geneHbased association strategies while restricting

GWA study only to genes and neighboring genomic regionsmay identify new

candidategenesdeservingpostHGWASanalysis(Akulaetal2011Cantoretal

2010 Liu et al 2010) Moreover it is true that classical GWAS downstream

bioinformatics analyses concentrate their efforts in findingmeaningful results

usinggenesasfinaltarget

The aim of ourworkwas to find new candidate genes and novel information

associated to complex traits as milk production (yiled) and quality (fat and

protein percentage) using a geneHcentric genome wide association in three

differentdairybreedsItalianHolsteinItalianBrownandItalianSimmental

Methods

BiologicalSamples

Bulls from the three most represented dairy breeds in Italy (Italian Holstein

ItalianBrownandItalianSimmental)wereselectedfromArtificialInsemination

(AI) bulls progeny tested in Italy until 2010 and forwhich biological samples

52

wereavailableItalianHolsteinbullswerechosenaccordingtofollowingcriteria

i) having the national Selection Index namely production functionality type

(PFT)withreliabilitygt75andii)havingthelowestpossiblerelationshipwith

theothersiresinthesetThiswasachievedbymaximizingthevariabilitywithin

the group On the contrary considering the low number of available Italian

BrownandItalianSimmentalbullstheywereallincludedintheanalysesAfter

thisprocedureatotalof2139ItalianHolstein775ItalianBrownand486Italian

Simmental samples (including quality control replicates)were genotypedwith

theBovineSNP50BeadChipv1(IlluminaSanDiegoCA)

Traitsconsideredintheanalysis

Italian Holstein (ANAFI) Italian Brown (ANARB) and Italian Simmental

(ANAPRI)breedersassociationsprovidedderegressedproofs(DRP)ordaughter

yielddeviations (DYD)hereafter referredasphenotype forall traitsanalyzed

Traitsanalyzedweremilkyieldfatpercentageandproteinpercentage

Genotypingandqualitycontrol

Quality controls (QC) were run independently for each breed using the same

pipelineandthresholds IndetailQCexcluded i)thereplicatesamplewiththe

highernumberofmissinggenotypesii)sampleswithunexpectedlyhigh(ge100)

mendelianerrors(forfatherHsonpairs)iii)sampleswithmorethan5missing

genotypes iv) SNPs with ge 25 missing data v) SNPs with minor allele

frequency(MAF)le5vi)SNPswithmendelianerrorsge25andvii)SNPsout

ofHardyHWeinbergequilibriumtestedbyFisherexacttest(ple0001Bonferroni

corrected)

Analysisofpopulationstructure

MultiDimensionScaling(MDS)wasperformedusingPLINK107(Purcelletal

2007)andresultsplottedwithRscriptsThisanalysisshowninsupplementary

material (FigureS1) exploredpopulationsubstructureandverified thegenetic

homogeneityofthedatasetpriortofurtheranalyses

53

PairwiseLDwascalculatedusingPLINK107(Purcelletal2007)LDdecayand

LD structure on specific regions were calculated with in house R scripts and

snpplotterRpackage(LunaandNicodemus2007)

GWAS

GenomeHwide association analysis was made with the GenABEL package in R

(Team 2014) using GRAMMARHCG (Genome wide Association using Mixed

Model and Regression H Genomic Control) approach (Amin et al 2007

Aulchenkoetal2007)

Basingonpriorknowledge(Minozzietal2013) in ItalianHolsteintheDGAT1

regionwas also treated as fixed effect inGenABEL analysis of the three traits

ResultsusingthismodelarereferredtoasldquoHolsteinDGATrdquo

Manhattan andquantileHquantileplots for each traitwereproduced to allowa

thoroughevaluationoftheGWASresultsAllcalculatedpHvaluesfromtheGWAS

analysiswere retained for the downstream prioritizationwith the geneHbased

association

Workingongenomecoordinates

Coordinatesof themarkerswere liftedover fromBTA40(Elsiketal2009)to

UMD31assembly(Ziminetal2009)usingSnpChimpv1(Nicolazzietal2014)

andgenecoordinateswereretrievedusingBioMartEnsembl

QTLcoordinatesweredownloadedfromtheAnimalQTLdb(Huetal2013) in

bedformat

Operations on genomic intervalswereperformedusing thebedtools suite ver

2162(QuinlanandHall2010)

GeneBasedAssociationanalysis

Toreducetheantagonismbetweensignificancethresholdandfalsepositiverate

of a classical GWA study (Gianola et al 2013) and to facilitate selection

signature comparison across the three breeds investigated we used a geneH

54

centricstatisticthatcollapsesinasinglevalueallthepHvaluesfromSNPsrelated

toagenecorrectingforLDinformation

Our approachderivesdirectly from the softwareVEGAS (VersatileGeneHBased

TestforGenomeHwideAssociationStudies)(Liuetal2010)adaptedtopursue

meaningfulresultsinanyspeciesbeyondhumans

Briefly the geneHwise test statistic is the sum of the n upper tail chiHsquared

valuesofaSNPsubset withonedegreeof freedom(wheren is thenumberof

SNPmappinginthegivengene)WithperfectlinkageequilibriumthegeneHwise

pHvaluewouldbetheoneHsidepHvalueofachiHsquaredistributionwithndegrees

of freedom Otherwise more likely the distribution of the pHvalues has to be

evaluated(withsimulations)andweighted(withLDvalues)(Liuetal2010)

The VEGAS speciesHfree procedure was implemented in a UNIX environment

usingBashandRlanguagesusingEnsemblannotationasinputfilesPLINKdata

typeandpHvaluesforeachSNPfromthepreviouslycitedGRAMMARprocedure

WemappedSNPsonBostaurusgenes(UMD31)andtheirboundaries(100000

bpupHanddownHstream)toaccountforregulatoryregionsOnlygeneswithat

least 2 SNPs mapped were retained Significant entries from the geneHbased

associationweremappedinthecattleQTLdatabasewithbedtools

ResultsandDiscussion

In this study a gene based association strategy was applied to identify new

genomicregionsboostingGWASanalysisresultsformilkproductionandquality

fromthreeItaliandairybreeds

Complextraitslikethosehereevaluatedareaffectedbyafewmajorgeneswith

large effects andmany otherswithmoderate to low effects the latter are not

easily identifiedby genomewide scans inmodern cattle breedsduemainly to

samplesizelimitationwhilesignalfromtheformersarelostingenomescansH

withsomeexceptionsHduerapidfixations

InthisscenariowithmediumdensitySNPpanelsliketheoneusedinthisstudy

associationbasedonsingleSNPregressionorotherldquoclassicalrdquomethodscouldbe

inconclusiveintermsofresults

55

AftertheQCprocess7452058and477samplesand3575938535and38574SNPs were retained for Italian Brown Italian Holstein and Italian Simmentalbulls respectively (Table S1) HolsteinDGAT analysis retained 38534markersoneSNPlessthantheotherHolsteinpanelDescriptive statistics on genotypes for each breed (histogram and barplot forheterozygosis and call rate per individual Figure S7 and multi dimensionalscalingFigureS1)revealedabsenceofconfoundingeffectsWiththesedatasetsatthenominalgenomeHwidepHvaluesignificancethresholdof 005 (ple125 x 10H6 after Bonferroni correction (Burton et al 2007Dudbridge and Gusnanto 2008)) significant associations were found only inItalianHolsteinbetweenmilkyieldandfatpercentageand26SNPsmappingintheproximalregionofBTA14nearbyDGAT1(Table1)

56

Table1PHvaluesoftheSNPsovercomingthegenomewidesignificancethresholdintheHolsteinbreedformilkyieldandfatpercentage(notaccountingforDGAT1effect)SNPsaresortedbypositionsintheUMD31assembly

SNPID BTA Position(Bp) POvalue(Fat) POValue(Milk)

Hapmap30383OBTCO005848 14 1489496 112EH12 361EH12

ARSOBFGLONGSO57820 14 1651311 247EH46 161EH16

ARSOBFGLONGSO34135 14 1675278 857EH17

ARSOBFGLONGSO94706 14 1696470 665EH16

ARSOBFGLONGSO4939 14 1801116 534EH49 106EH20

Hapmap52798Oss46526455 14 1923292 915EH21

UAOIFASAO6878 14 2002873 382EH24

ARSOBFGLONGSO107379 14 2054457 118EH33 664EH15

ARSOBFGLONGSO18365 14 2117455 250EH16

Hapmap30922OBTCO002021 14 2138926 439EH13

Hapmap25384OBTCO001997 14 2217163 390EH10 697EH09

Hapmap24715OBTCO001973 14 2239085 108EH09 369EH09

BTAO35941OnoOrs 14 2276443 402EH13

Hapmap30374OBTCO002159 14 2468020 528EH12

Hapmap30086OBTCO002066 14 2524432 798EH11

ARSOBFGLONGSO74378 14 3640788 161EH08

ARSOBFGLOBACO1511 14 3765019 375EH09

UAOIFASAO9288 14 3956956 811EH09

ARSOBFGLONGSO100480 14 4364952 133EH09

UAOIFASAO5306 14 4468478 458EH08

ThisislikelyduetotheeffectoftheK232ADGAT1mutationthatissegregatingin

thisbreed(Fontanesietal2014)NootherSNPwassignificantlyassociatedto

anyothertraitinthe3breedsinvestigated

SomeSNPswereunderasuggestivesignificancethresholdofple5x10H5Details

on these markers are in supplementary material (Table S3) but they are not

discussedherePlotsdescribingatglancetheassociationresultsrevealingalow

signaltonoiseratioareavailableintheonlinesupplementarymaterial(Figure

S2S3S4andS5respectively for ItalianBrown ItalianHolsteinHolsteinDGAT

andItalianSimmental)

57

Asperthegenecentricanalysiswemappedatotalof192371989919894and

19844 genes on 24616 available on Ensembl (release version 73) in Italian

BrownItalianHolsteinHolsteinDGATandItalianSimmentalrespectivelyGenes

with a geneHwisepHvalue at least 10H5were considered significantly associated

withthestudiedtraitThesignificanceappearedtobeadequatetothehighrate

ofredundancyof theannotationfeaturesandto figureoutthetopassociations

with a reasonable level of confidence Results on these associations per breed

andpertraitaresummarizedinTable2Table3andinsupplementarymaterials

(TableS2andTableS4)ThemajorityofsignificantgenesfoundwiththegeneH

centricanalysismapped inpreviouslycharacterizedQTL thatareassociated to

milkproductionorqualityThesearesummarizedintheTableS5

Table2Numberofsignificantlyassociatedgenesforeachbreedineachtraitinthegenebasedassociationtest

Fat Milkyield Protein

Brown 3 1

Holstein 92 80 1

HolsteinDGAT 1 4 1

Simmental 1 13

58

Tableamp3FeaturessignificantlyassociatedperbreedtraitHolsteinanalysisnotaccountingforDGAT1isshow

ninTableS3

Breedamp

Traitamp

EnsemblampIDamp

GeneWiseampPValueamp

SNPsInGeneamp

BestSNPamp

BestSNP_PValueamp

BTAamp

Startamp(bp)amp

Endamp(bp)amp

GeneampNam

eamp

BROWNamp

FAT

Nosignificantassociationfound

MILK

ENSBTAG00000019237

500EG05

4Hapmap53770Gss46526325

204EG04

23

47333403

47348212

TXNDC5

ENSBTAG00000043918

500EG05

4Hapmap53770Gss46526325

204EG04

23

47337650

47337787

5S_rRNA

ENSBTAG00000046839

500EG05

4Hapmap53770Gss46526325

204EG04

23

47328131

47352621

UncharacterizedProtein

PROTEIN

ENSBTAG00000001260

700EG05

3ARSGBFGLGNGSG116222

441EG03

18

52120387

52124418

PINLYP

amp

HOLSTEINDGATamp

FAT

ENSBTAG00000000369

100EG05

3Hapmap53294Grs29016908

333EG05

594673755

94749093

EPS8

MILK

ENSBTAG00000026896

130EG05

5ARSGBFGLGNGSG104949

385EG04

23

51549721

51553563

FOXF2

ENSBTAG00000005260

400EG05

5Hapmap33288GBTCG033751

120EG03

638120578

38127577

SPP1

ENSBTAG00000006579

500EG05

4ARSGBFGLGBACG2207

632EG05

15

54418559

54459500

P4HA3

ENSBTAG00000007013

700EG05

6UAGIFASAG7665

396EG05

19

5221507

5283308

TOM1L1

PROTEIN

ENSBTAG00000006638

600EG05

3ARSGBFGLGNGSG104774

176EG04

18

56523439

56530499

BCL2L12

amp

SIMMENTHALamp

FAT

ENSBTAG00000037649

900EG05

6Hapmap30001GBTAG142716

529EG05

4120427915

120494453

VIPR2

MILK

ENSBTAG00000017538

100EG05

6ARSGBFGLGBACG13093

587EG06

12

85630428

85630912

Pseudogene

ENSBTAG00000000856

200EG05

2ARSGBFGLGNGSG4939

413EG06

14

1766767

1769754

FBXL6

ENSBTAG00000000857

200EG05

2ARSGBFGLGNGSG4939

413EG06

14

1763994

1766621

GPR172B

ENSBTAG00000026356

200EG05

2ARSGBFGLGNGSG4939

413EG06

14

1795351

1804562

DGAT1

ENSBTAG00000035158

200EG05

2ARSGBFGLGNGSG4939

413EG06

14

1770660

1772329

TMEM

249

ENSBTAG00000046208

200EG05

2ARSGBFGLGNGSG4939

413EG06

14

1782901

1788087

SCRT1

ENSBTAG00000009811

400EG05

2ARSGBFGLGNGSG4939

413EG06

14

1825982

1844587

UncharacterizedProtein

ENSBTAG00000009816

400EG05

2ARSGBFGLGNGSG4939

413EG06

14

1838135

1840081

UncharacterizedProtein

ENSBTAG00000014458

400EG05

2ARSGBFGLGNGSG4939

413EG06

14

1844664

1894424

MROH1

ENSBTAG00000020751

400EG05

2ARSGBFGLGNGSG4939

413EG06

14

1806081

1825793

HSF1

ENSBTAG00000044406

400EG05

2ARSGBFGLGNGSG4939

413EG06

14

1883849

1883971

SCARNA15

ENSBTAG00000046889

400EG05

5ARSGBFGLGBACG13093

587EG06

12

85610279

85610762

UncharacterizedProtein

ENSBTAG00000020117

800EG05

3BTAG78494GnoGrs

188EG04

721189879

21196263

ZBTB7A

PROTEIN

Nosignificantassociationfound

amp

59

1 ItalianHolstein

The$ large$ number$ of$ significant$ signals$ was$ observed$ in$ the$ Italian$ Holstein$

especially$ for$ milk$ yield$ and$ fat$ percentage$ As$ shown$ in$ Table$ 2$ the$ strong$

effect$ of$ DGAT1$ in$ Italian$ Holsteins$ observed$ with$ individual$ SNPs$ was$

confirmed$by$the$geneBcentric$approach$In$this$region$almost$the$totality$of$the$

geneBwise$pBvalues$under$the$chosen$threshold$appeared$to$be$influenced$by$this$

major$ gene$ and$ by$ the$ high$ level$ of$ linkage$ disequilibrium$ that$ characterizes$

cosmopolitan$bovine$breeds$(see$Figure$S6)$Within$this$pool$of$genes$it$is$worth$

to$mention$PLEC$ (plectin)$ a$ cell$ surface$ receptor$ gene$ in$ the$RNASE5$ pathway$

(Angiogenin$ encoded$ by$ angiogenin$ ribonuclease$ RNase$ A$ family$ 5)$ Proteins$

encoded$ by$ these$ genes$ that$ are$ present$ in$ milk$ are$ involved$ in$ protein$

synthesis$ and$ act$ as$ growth$ factors$ in$ vitro$Moreover$RNASE5$ seems$ to$ have$

other$ functions$ such$ as$ regulation$ of$ stress$ response$ and$ apoptosis$ and$ even$

angiogenesis$ through$ which$ it$ may$ also$ contribute$ to$ the$ regulate$ milk$

production$(Raven$et$al$2013)$

The$ exclusion$ of$DGAT1$ effect$ (HolsteinDGAT$ results)$ highlighted$ that$ all$ the$

significant$ effects$ on$ fat$ percentage$ found$ for$ genes$ located$ within$

approximately$25$Mb$up$or$downstream$from$the$excluded$SNP$(ARS5BFGL5NGS5

4939)$were$ redundant$ signals$ of$DGAT1$ (Table3$ and$Table$ S2$ and$ S3)$ This$ is$

also$observed$for$milk$yield$due$to$the$genetic$correlations$between$fat$and$milk$$

Also$ PLEC$ was$ not$ significant$ in$ our$ analyses$ when$ taking$ into$ account$ the$

DGAT1$effect$contradicting$previous$findings$in$Holstein$that$report$association$

of$PLEC$gene$with$production$traits$either$accounting$or$not$for$the$effect$of$the$

diglyceride$ acyltransferase$ (Raven$ et$ al$ 2013)$ A$ possible$ explanation$ for$ the$

different$results$between$these$studies$is$a$difference$between$genetic$structures$

of$the$populations$analyzed$To$reduce$false$positive$signals$we$only$evaluated$

the$genes$based$on$HolsteinDGAT$results$for$the$Italian$Holstein$$

11 Milkyield

We$found$genes$associated$with$this$trait$in$Holstein$in$particular$SPP1$P4HA3=

FOXF2$and$TOM1L1$genes$

Among$ these$ the$ osteopontin$ SPP1$ (secreted$ phosphoprotein$ 1)$ is$ directly$

related$to$milk$production$(de$Mello$et$al$2012)$ It$ is$ located$ in$BTA6$nearby$

60

$

ABCG2$ and$ there$ is$ evidence$ that$ polymorphisms$ in$ this$ region$ are$ strongly$

correlated$with$high$production$ levels$of$milk$ in$various$breeds$(CohenBZinder$

et$ al$ 2005)$Osteopontin$SPP1$ is$ a$ secreted$glycophosphoprotein$ that$plays$ a$

crucial$ role$ in$ mammary$ gland$ development$ having$ an$ essential$ role$ also$ in$

mammary$gland$differentiation$and$branching$of$the$mammary$epithelial$ductal$

system$ This$ gene$has$ also$ been$ implicated$ in$many$ types$ of$ cancer$ including$

breast$cancer$ for$ the$potentiality$of$proliferation$and$alveologenesis$ (Hubbard$

et$ al$ 2013)$ It$ is$worth$mentioning$ that$ the$most$ significant$ SNPs$within$ this$

gene$was$Hapmap332885BTC5033751=with=a$pBvalue$of$00012$ largely$over$ the$

genomeBwide$significance$ threshold$This$signal$would$be$ ignored$ in$ the$single$

SNP$approach$

Another$gene$revealed$by$the$geneBbased$association$is$P4HA3$encoding$for$the$

prolyl$4Bhydroxylase$α$polypeptide$III$involved$in$the$correct$folding$of$nascent$

procollagen$ chains$ It$ is$ known$ that$ stromalBepithelial$ interactions$ are$ critical$

for$mammary$gland$development$and$defective$deposition$or$reorganization$of$

extracellular$ matrix$ can$ lead$ to$ incorrect$ mammary$ epithelial$ morphogenesis$

(Gillette$ et$ al$ 2013)$ highlighting$ the$ role$ of$ this$ gene$ in$ the$ healthy$ breast$

tissue=

FOXF2=gene$ (Forkhead$Box$F2)$ is$ a$member$of$ the$ forkhead$box$ transcription$

factors$ known$ to$ be$ involved$ in$ tissue$ development$ as$ part$ of$ the$ ldquoHedgehog$

signaling$pathwayrdquo$This$key$cascade$is$essential$for$the$correct$segmentation$of$

tissues$during$embryonic$development$but$has$been$recently$demonstrated$that$

is$active$in$adult$stem$cells$proliferation$in$mammary$tissue$(Liu$et$al$2006)$In$

particular$ FOXF2$ plays$ a$ key$ role$ in$ extracellular$ matrix$ synthesis$ and$

epithelialBmesenchymal$ interactions$ and$ could$ therefore$ affect$ the$ proper$

development$of$the$mammary$stroma$In$addition$low$levels$of$transcription$of$

this$gene$are$correlated$ to$early$onset$metastasis$ in$breast$cancer$ (Kong$et$al$

2013)$ underlining$ a$ specific$ role$ in$ the$ mammary$ gland$

Lastly$ TOM1L1$ (Target$ Of$ Myb1$ ChickenBLike$ 1)$ is$ another$ gene$ potentially$

involved$ with$ the$ milk$ physiology$ as$ is$ known$ its$ implication$ with$ the$ early$

endosome$ sorting$ coupled$ with$ EGFR= (epidermal$ growth$ factor$ receptor)$

another$ important$player$ in$ lactation$(Fowler$et$al$1995)$TOM1L1$belongs$to$

61

$

the$ VHS$ domain$ containing$ proteins$ that$ are$ involved$ in$ the$ postBGolgi$

trafficking$and$signaling$(Wang$et$al$2010)$

12 Fatpercentage

In$ Italian$ Holstein$ breed$ we$ found$ a$ significant$ association$ with$ or$ without$

DGAT1$ effect$ correction$ to$ the$EPS8$ gene$ located$on$BTA5$at$around$95$Mbp$

This$gene$encodes$for$a$substrate$of$the$already$cited$EGFR$and$is$involved$in$the$

fatty$acid$metabolism$(Fazioli$et$al$1993)$A$recent$study$on$Holstein$Friesian$

(Wang$et$al$2012)$found$strong$association$with$a$variant$located$in$the$second$

intron$of$the$epidermal$growth$factor$receptor$pathway$substrate$enforcing$the$

hypothesis$of$a$true$involvement$of$this$gene$on$fat$percentage$

13 Proteinpercentage

The$BCL2L12$gene$(BclB2Blike$protein$12)$associated$with$protein$percentage$in$

Italian$Holstein$breed$represents$another$interesting$result$The$product$of$this$

gene= is$ an$ important$ oncoprotein$ in$ two$ fundamental$ cell$ cycle$ pathways$

namely$ p53$ and$ cytoplasmic$ caspase$ signaling$ BCL2L12= resides$ in$ both$

cytoplasm$and$nucleus$inhibiting$caspases$3$and$7$and$forming$a$complex$with$

p53$thereby$ inhibiting$p53Bdirected$transcriptomic$changes$upon$DNA$damage$

(Stegh$ and$DePinho$ 2011)$ The$members$ of$ BCL2$ family$ are$ considered$ antiB

apoptotic$genes$and$it$is$obvious$that$dysfunctional$programmed$cell$death$plays$

an$ important$ role$ in$ the$ pathogenesis$ and$ in$ the$ progression$ of$ cancer$Many$

members$of$this$family$were$found$to$be$modulated$in$various$malignancies$and$

BCL2L12= in$ particular$ was$ expressed$ in$ mammary$ gland$ and$ proposed$ as$

prognostic$ cancer$ biomarker$ (Thomadaki$ et$ al$ 2007)$ In$ dairy$ cows$ the$

members$of$BCL2$family$are$involved$in$the$key$events$ in$the$mammary$tissue$

development$ and$ remodeling$ during$ lactation$ and$ involution$ (Stefanon$ et$ al$

2002$p$200)$Members$of$BCL2$family$regulate$the$turnover$of$cell$population$

in$the$mammary$gland$during$lactation$and$its$overexpression$was$found$to$be$

linearly$related$to$milk$production$in$Holstein$cow$(Colitti$et$al$2004)$

$

62

$

2 ItalianSimmental

In$ the$ Italian$ Simmental$ breed$ we$ found$ one$ association$ on$ BTA4$ for$ fat$

percentage$(VIPR2)$and$two$associations$ for$milk$yield$one$ in$BTA7$(ZBTB7A)$

and$ the$ other$ in$ BTA14$ related$ to$ the$ DGAT1$ No$ association$ was$ found$ for$

protein$percentage$

21 Fatpercentage

Among$the$genes$found$significantly$associated$with$production$traits$in$Italian$

Simmental$VIPR2=a$ gene$ encoding$ a$ receptor$ for$ a$ small$ vasoactive$ intestinal$

neuropeptide$ was$ the$ single$ signal$ in$ BTA4$ for$ fat$ percentage$ Vasoactive$

intestinal$ peptide$ (VIP)$ is$ synthesized$ in$ the$pituitary$ gland$ and$ among$other$

functions$ stimulates$ prolactin$ secretion$ and$ is$ involved$ in$ proliferation$

survival$and$differentiation$in$human$breast$cancer$cells$(Valdehita$et$al$2010)$$

An$interesting$novel$finding$is$the$effect$of$immunosuppression$(through$direct$

effect$of$IL56)$on$prolactin$cells$and$its$ability$to$regulate$prolactin$by$acting$on$

pituitary$ VIP$ through$ VIP$ receptors$ (Blanco$ et$ al$ 2013)$ These$ evidences$

suggest$ that$VIPR2$ could$ be$ an$ important$ candidate$ for$ further$ investigations$

through$post$GWAS$approaches$such$as$reBsequencing$or$fine$mapping$

22 Milkyield

Surprisingly$the$DGAT1$gene$was$significantly$associated$with$milk$yield$but$not$

with$fat$percentage$in$Italian$Simmental$breed$Indeed$the$most$significant$SNP$

for$the$milk$yield$trait$in$this$breed$was$the$ARS5BFGL5NGS54939$with$a$pBvalue$

of$ 413EB06$ The$ importance$ of$ this$ gene$ in$ the$ lactation$ is$ largely$ known$

(Grisart$et$al$2002)$$

To$ investigate$DGAT1$ signal$ in$ Simmental$ we$ analyzed$ the$ fine$ LD$ structure$

genomeBwide$ and$ in$ BTA14$ proximal$ region$ in$ all$ the$ three$ breeds$ revealing$

different$patterns$

Observed$LD$decay$on$ the$ three$breeds$genomes$revealed$similar$behavior$ for$

Italian$Holstein$and$Italian$Brown$with$r2$gt$010$$until$ around$ 250$ Kbp$ while$

Italian$Simmental$showed$lower$LD$Distances$higher$that$400$Kbp$revealed$half$

level$of$LD$in$Simmental$(Figure$S6)$

63

$

In$DGAT1$region$LD$behavior$is$very$similar$between$Italian$Holstein$and$Italian$

Brown$ (Figure$ S8)$ while$ association$ values$ are$ really$ different$ because$ ARS5

BFGL5NGS54939=is$fixed$in$Italian$Brown$and$purged$by$QC$process$Interestingly$

association$ patterns$ become$ similar$ when$ DGAT1$ effect$ is$ removed$ from$ the$

Holstein$ panel$ (HolsteinDGAT$ Figure$ S8)$ In$ Simmental$ we$ found$ a$ different$

situation$ very$ low$ LD$ level$ and$ significant$ association$ levels$ with$ ARS5BFGL5

NGS54939$with$ the$milk$ trait$This$ is$probably$due$ to$ the$ low$ frequency$of$ the$

p232K$ allele$ in$ Italian$ Simmental$ (Scotti$ et$ al$ 2010)$ and$ to$ the$ different$

selection$strategies$applied$in$these$breeds$

The$ relatively$ high$ pBvalue$ of$ this$ SNP$ is$ responsible$ for$ the$ statistical$

significance$of$many$of$the$13$genes$found$significant$in$this$breed$for$this$trait$

as$ shown$ in$ Table$ 3$ therefore$ reducing$ the$ gene$ set$ to$ three$ locations$ two$

overlapping$features$indicating$a$pseudogene$in$BTA12$and$a$zinc$finger$protein$

ZBTB7A$in$BTA7$DGAT1$role$and$function$will$be$not$discussed$here$as$already$

done$elsewhere$(Grisart$et$al$2002)$

ZBTB7A=is$part$of$zinc$finger$and$BTB$proteins$(Broad$complex$Tramtrack$and$

Bricabrac)$ transcription$ factors$ with$ emerging$ importance$ for$ their$

involvement$ in$ oncogenesis$ and$ cell$ differentiation$by$ repressing$key$ genes$of$

cell$cycle$regulation$as$retinoblastoma$(Rb)$and=p53=(ldquoJeon$et$al$B$2008$B$ProtoB

oncogene$ FBIB1$ (PokemonZBTB7A)$ Represses$ Trpdfrdquo$ nd)= ZBTB7A$ over$

expression$ is$ shown$ in$primary$ and$ recurrent$ breast$ cancer$ tissues$ leading$ to$

disease$progression$and$poor$survival$(Zu$et$al$2011)$Link$between$this$gene$

polymorphism$and$milk$production$might$be$in$the$proper$cellular$development$

of$ the$ mammary$ gland$ This$ transcription$ factor$ is$ also$ known$ as$ a$ master$

regulator$ of$ B$ and$ TBcell$ lineage$ (Lee$ and$ Maeda$ 2012)$ enforcing$ the$ link$

between$ milk$ production$ and$ immune$ protection$ as$ already$ outlined$ by$ of$

Lemay$ and$ colleagues$ with$ proteomic$ studies$ where$ the$ ldquoactivation$ of$ the$

innate$ immune$systemrdquo$resulted$as$one$of$ the$most$represented$gene$ontology$

categories$(Lemay$et$al$2009)$

$

64

$

3 ItalianBrown

In$ the$ Italian$Brown$breed$we$ found$ one$ association$ on$BTA23$ (TXNDC5)$ for$

milk$yield$and$one$in$BTA18$for$protein$percentage$(PINLYP)$The$three$features$

listed$in$Table$3$for$milk$yield$in$Italian$Brown$were$collapsed$due$to$overlap$$

31 Milkyield

Thioredoxin$ domainBcontaining$ protein$ 5$ (TXNDC5)$ belongs$ to$ thioredoxin$

family$small$ubiquitous$proteins$containing$disulphide$domain$affecting$protein$

folding$with$chaperone$activity$preventing$protein$misBfolding$within$the$lumen$

of$ the$endoplasmic$reticulum$This$multifunctional$organelle$have$a$key$role$ in$

several$ processes$ including$ lipid$ catabolism$ and$ protein$ synthesis$ (RamiacuterezB

Torres$ et$ al$ 2012)$ Because$ of$ its$ disulphide$ bonds$ thioredoxin$ facilitate$ the$

reduction$of$other$proteins$by$cysteine$thiol$disulfide$exchange$This$mechanism$

is$essential$for$detossification$

During$ lactation$ indeed$ an$ enhanced$ detoxification$ level$ is$ functional$ to$

preserve$ milk$ quality$ against$ increased$ metabolic$ activities$ Higher$ metabolic$

levels$ lead$ to$ free$ radicals$ production$ making$ necessary$ the$ maintenance$ of$

redox$ homeostasis$ that$ include$ among$ others$ the$ reduction$ and$ oxidation$ of$

peroxiredoxin$and$thioredoxin$(Lu$and$Holmgren$2014)$

32 Proteinpercentage

For$ protein$ percentage$ in$ Italian$ Brown$ there$ is$ only$ one$ associated$ gene$

PINLYP=is$encoding$a$phospholipase$A2$inhibitor$but$this$protein$product$is$not$

yet$well$studied$ $Many$types$of$phospholipase$A2$were$characterized$and$ it$ is$

known$ that$ they$ are$ mediator$ of$ inflammation$ atherosclerosis$ and$ cancer$ in$

mammals$and$that$plays$an$important$role$in$the$innate$immune$response$(Qiu$

and$Lai$2013)$It$may$be$reasonable$to$imagine$a$role$for$PINLYP$in$the$lactation$

in$light$of$previously$discussed$results$that$link$milk$to$innate$immune$response$

(Lemay$et$al$2009)$

$

Although$ limited$ by$ the$ size$ of$ the$ study$ and$ the$ number$ of$ the$ included$

markers$these$results$represent$both$a$confirmation$and$a$step$forward$towards$

the$comprehension$of$the$biological$bases$of$milk$yield$and$quality$$

65

$

New$ genes$ potentially$ involved$ with$ production$ traits$ in$ dairy$ cows$ were$

tracked$when$GWAS$signals$resulted$too$weak$to$be$considered$in$a$classical$SNP$

based$mapping$with$the$ldquoclosest$gene$to$peakrdquo$or$the$windowBbased$strategy$

It$ is$ worth$ mentioning$ that$ none$ of$ the$ analyzed$ SNPs$ (not$ considering$ the$

Italian$Holstein$with$ARS5BFGL5NGS54939= included$ in$ the$ analysis)$were$ above$

the$ genomeBwide$ significance$ threshold$ in$ single$ SNP$ analysis$ meaning$ that$

important$information$is$often$lost$in$such$studies$$

In$addition$ to$ the$newly$ identified$ loci$we$observed$a$confirmation$of$existing$

associations$ with$ genes$ related$ to$ milk$ production$ not$ only$ dissecting$ gene$

specific$ functions$ but$ from$ mapping$ these$ genes$ in$ previously$ characterized$

regions$(Table$S5)$This$ is$not$trivial$when$using$new$data$analysis$techniques$

especially$ in$ high$ noise$ to$ signal$ ratio$ datasets$ like$ those$ here$ presented$We$

believe$ that$ this$ approach$will$ benefit$ of$data$ from$denser$marker$panels$ (eg$

BovineHD)$ to$ exploit$ local$ linkage$ disequilibrium$ and$ increase$ the$ number$ of$

mapped$genes$with$more$than$two$SNPs$

Results$ obtained$ with$ the$ gene$ centric$ approach$ although$ having$ intrinsic$

limitations$are$not$ influenced$by$a$number$of$subjective$choices$often$made$in$

GWAS$ For$ instance$when$ ldquowindow$mappingrdquo$ is$ used$ to$decrease$ the$noise$ to$

signal$ ratio$ eg$ due$ to$ nearby$ markers$ having$ different$ allele$ frequencies$

windows$size$ is$ to$be$decided$This$can$be$based$on$equal$number$of$markers$

equal$number$of$base$pairs$or$equal$map$distance$in$cM$per$window$The$three$

choices$ do$ not$ coincide$ because$ markers$ are$ not$ equally$ spread$ along$ the$

chromosomes$ and$ recombination$ differs$ in$ different$ genomic$ regions$ In$

addition$ windows$ can$ be$ chosen$ nonBoverlapping$ or$ with$ different$ degree$ of$

overlap$Finally$when$stepping$ from$significant$SNPs$or$windows$ to$candidate$

genes$the$size$of$the$flanking$region$from$which$picking$up$candidate$genes$is$to$

be$decided$as$well$Sometimes$the$gene$closer$to$the$signal$peak$is$selected$ in$

other$cases$all$genes$within$certain$boundaries$are$included$in$further$analyses$

All$ these$ choices$ may$ generate$ misleading$ results$ For$ example$ not$ truly$

associated$genes$may$be$retained$for$further$analyses$disregarding$the$correct$

ones$ In$addition$ a$ large$number$of$ false$positives$can$be$ included$ to$be$ later$

interpreted$with$ the$help$of$network$and$pathway$analysis$As$ a$ consequence$

these$decisions$have$a$direct$impact$in$terms$of$production$and$interpretation$of$

66

$

results$leading$to$conclusions$heavily$based$on$the$enrichment$analysis$and$preB

existing$knowledge$$

With$ the$ gene$ based$ association$ we$ also$ noticed$ some$ advantages$ in$ the$

downstream$data$mining$First$genes$are$functional$units$and$the$golden$targets$

of$ the$ ldquoclassical$ postBGWASrdquo$ bioinformatics$ pipelines$ Hence$ the$ gene$ based$

association$leads$to$genes$with$an$embedded$statistically$robust$framework$This$

is$ a$ ldquoplusrdquo$ in$ studying$ complex$ traits$ because$ one$ can$ concentrate$ on$ specific$

signals$ rather$ than$ cherry$ picking$meaningful$ features$ from$ a$ list$ constructed$

with$the$previously$cited$approaches$$

Moreover$it$is$easier$to$track$back$gene$clusters$to$major$gene$effects$B$like$for$

example$DGAT1$in$a$gene$rich$genome$chunk$many$genes$may$result$significant$

due$to$the$same$SNP$(Table$S2$and$Table$3)$With$the$VEGAS$method$finding$the$

ldquotruerdquo$association$is$facilitated$by$following$the$ldquobest$SNPrdquo$in$the$region$

In$addition$we$found$phenotype$coherent$signals$in$such$situation$where$only$a$

few$SNPs$B$ if$none$ndash$are$above$an$acceptable$statistical$genomeBwide$threshold$

using$a$single$SNP$regression$method$

The$gene$centric$approach$ is$a$ first$and$slight$ improvement$ in$ livestock$GWAS$

analyses$since$we$still$have$a$partial$genome$annotation$ In$addition$ it$ is$now$

known$that$ the$definition$of$gene$ is$ to$be$revised$many$genes$do$not$code$ for$

proteins$ have$ alternative$ starting$ sites$ have$ antisense$ transcripts$ host$

regulatory$ RNA$ are$ subjected$ to$ alternative$ splicing$ and$may$ be$ regulated$ by$

sequences$quite$ far$ away$ from$ their$ coding$ regions$ (Consortium$2012)$Other$

limitations$are$the$need$for$larger$number$of$animals$in$these$type$of$studies$as$

in$this$study$and$the$still$relatively$high$cost$of$sequencing$that$cannot$be$used$

routinely$in$large$experiments$at$least$not$yet$$

$

$

Conclusions

This$study$underlines$the$importance$of$the$implementation$of$new$methods$to$

analyze$complex$traits$with$genomic$data$No$golden$path(s)$is$discovered$yet$to$

walk$from$variation$to$functional$units$The$geneBcentric$analysis$may$contribute$

67

$

to$ reveal$ new$ signals$ throughout$ the$ bovine$ genome$ condensing$ weak$

association$signals$in$chromosome$bits$having$functional$significance$

$

$

Acknowledgements

The$Authors$would$ like$ to$ thank$Dr$ Jimmy$ Liu$ for$ his$ advices$ in$ building$ the$

software$AIA$ANAFI$ANARB$and$ANAPRI$breedersrsquo$associations$ for$ their$help$

and$ support$during$ the$data$analysis$ SelMol$ and$ProZoo$projects$ that$provide$

samples$ The$ Authors$ are$ also$ grateful$ to$MIPAAF$ and$ Cariplo$ Foundation$ for$

funding$

$

$

$

References

Akula$ N$ Baranova$ A$ Seto$ D$ Solka$ J$ Nalls$MA$ Singleton$ A$ Ferrucci$ L$

Tanaka$T$Bandinelli$ S$Cho$YS$Kim$YJ$Lee$ JBY$Han$BBG$Bipolar$

Disorder$ Genome$ Study$ (BiGS)$ Consortium$ The$ Wellcome$ Trust$ CaseB

Control$Consortium$McMahon$FJ$2011$A$NetworkBBased$Approach$to$

Prioritize$ Results$ from$ GenomeBWide$ Association$ Studies$ PLoS$ ONE$ 6$

e24220$doi101371journalpone0024220$

Amin$N$ van$Duijn$ CM$ Aulchenko$ YS$ 2007$ A$Genomic$Background$Based$

Method$ for$ Association$ Analysis$ in$ Related$ Individuals$ PLoS$ ONE$ 2$

e1274$doi101371journalpone0001274$

Aulchenko$YS$de$Koning$DBJ$Haley$C$2007$Genomewide$Rapid$Association$

Using$ Mixed$ Model$ and$ Regression$ A$ Fast$ and$ Simple$ Method$ For$

Genomewide$PedigreeBBased$Quantitative$Trait$Loci$Association$Analysis$

Genetics$177$577ndash585$doi101534genetics107075614$

Blanco$ EJ$ CarreteroBHernaacutendez$ M$ GarciacuteaBBarrado$ J$ IglesiasBOsma$ MC$

Carretero$M$Herrero$JJ$Rubio$M$Riesco$JM$Carretero$J$2013$The$

activity$and$proliferation$of$pituitary$prolactinBpositive$cells$and$pituitary$

68

$

VIPBpositive$ cells$ are$ regulated$by$ interleukin$6$Histol$Histopathol$ 28$

1595ndash1604$

Blott$S$Kim$JBJ$Moisio$S$SchmidtBKuntzel$A$Cornet$A$Berzi$P$Cambisano$

N$Ford$C$Grisart$B$Johnson$D$Karim$L$Simon$P$Snell$R$Spelman$

R$ Wong$ J$ Vilkki$ J$ Georges$ M$ Farnir$ F$ Coppieters$ W$ 2003$

Molecular$ dissection$ of$ a$ quantitative$ trait$ locus$ a$ phenylalanineBtoB

tyrosine$substitution$in$the$transmembrane$domain$of$the$bovine$growth$

hormone$ receptor$ is$ associated$ with$ a$ major$ effect$ on$ milk$ yield$ and$

composition$Genetics$163$253ndash266$

Burton$PR$Clayton$DG$Cardon$LR$Craddock$N$Deloukas$P$Duncanson$A$

Kwiatkowski$ DP$ McCarthy$ MI$ Ouwehand$ WH$ Samani$ NJ$ Todd$

JA$ Donnelly$ P$ Barrett$ JC$ Burton$ PR$ Davison$ D$ Donnelly$ P$

Easton$ D$ Evans$ D$ Leung$ HBT$ Marchini$ JL$ Morris$ AP$ Spencer$

CCA$Tobin$MD$Cardon$LR$Clayton$DG$Attwood$AP$Boorman$JP$

Cant$B$Everson$U$Hussey$JM$Jolley$JD$Knight$AS$Koch$K$Meech$

E$ Nutland$ S$ Prowse$ CV$ Stevens$ HE$ Taylor$ NC$ Walters$ GR$

Walker$NM$Watkins$NA$Winzer$T$Todd$JA$Ouwehand$WH$Jones$

RW$ McArdle$ WL$ Ring$ SM$ Strachan$ DP$ Pembrey$ M$ Breen$ G$

Clair$ DS$ Caesar$ S$ GordonBSmith$ K$ Jones$ L$ Fraser$ C$ Green$ EK$

Grozeva$D$Hamshere$ML$Holmans$PA$Jones$IR$Kirov$G$Moskvina$

V$ Nikolov$ I$ OrsquoDonovan$ MC$ Owen$ MJ$ Craddock$ N$ Collier$ DA$

Elkin$A$Farmer$A$Williamson$R$McGuffin$P$Young$AH$Ferrier$IN$

Ball$SG$Balmforth$AJ$Barrett$JH$Bishop$DT$Iles$MM$Maqbool$A$

Yuldasheva$N$Hall$AS$Braund$PS$Burton$PR$Dixon$RJ$Mangino$

M$ Stevens$ S$ Tobin$ MD$ Thompson$ JR$ Samani$ NJ$ Bredin$ F$

Tremelling$ M$ Parkes$ M$ Drummond$ H$ Lees$ CW$ Nimmo$ ER$

Satsangi$ J$Fisher$SA$Forbes$A$Lewis$CM$Onnie$CM$Prescott$NJ$

Sanderson$J$Mathew$CG$Barbour$J$Mohiuddin$MK$Todhunter$CE$

Mansfield$JC$Ahmad$T$Cummings$FR$Jewell$DP$Webster$J$Brown$

MJ$ Clayton$DG$ Lathrop$GM$ Connell$ J$Dominiczak$A$ Samani$NJ$

Marcano$CAB$Burke$B$Dobson$R$Gungadoo$J$Lee$KL$Munroe$PB$

Newhouse$SJ$Onipinla$A$Wallace$C$Xue$M$Caulfield$M$Farrall$M$

Barton$A$ (braggs)$TB$ in$RG$and$G$Bruce$ IN$Donovan$H$Eyre$S$

69

$

Gilbert$ PD$ Hider$ SL$ Hinks$ AM$ John$ SL$ Potter$ C$ Silman$ AJ$Symmons$ DPM$ Thomson$ W$ Worthington$ J$ Clayton$ DG$ Dunger$DB$ Nutland$ S$ Stevens$ HE$ Walker$ NM$ Widmer$ B$ Todd$ JA$Frayling$TM$Freathy$RM$Lango$H$Perry$JRB$Shields$BM$Weedon$MN$ Hattersley$ AT$ Hitman$ GA$ Walker$ M$ Elliott$ KS$ Groves$ CJ$Lindgren$ CM$ Rayner$ NW$ Timpson$ NJ$ Zeggini$ E$ McCarthy$ MI$Newport$M$Sirugo$G$Lyons$E$Vannberg$F$Hill$AVS$Bradbury$LA$Farrar$ C$ Pointon$ JJ$ Wordsworth$ P$ Brown$ MA$ Franklyn$ JA$Heward$JM$Simmonds$MJ$Gough$SCL$Seal$S$(uk)$BCSC$Stratton$MR$Rahman$N$Ban$M$Goris$A$Sawcer$SJ$Compston$A$Conway$D$Jallow$ M$ Newport$ M$ Sirugo$ G$ Rockett$ KA$ Kwiatkowski$ DP$Bumpstead$ SJ$ Chaney$A$Downes$K$ Ghori$MJR$ Gwilliam$R$Hunt$SE$Inouye$M$Keniry$A$King$E$McGinnis$R$Potter$S$Ravindrarajah$R$ Whittaker$ P$ Widden$ C$ Withers$ D$ Deloukas$ P$ Leung$ HBT$Nutland$S$Stevens$HE$Walker$NM$Todd$JA$Easton$D$Clayton$DG$Burton$PR$Tobin$MD$Barrett$JC$Evans$D$Morris$AP$Cardon$LR$Cardin$NJ$Davison$D$Ferreira$T$PereiraBGale$ J$Hallgrimsdoacutettir$ IB$Howie$ BN$Marchini$ JL$ Spencer$ CCA$ Su$ Z$ Teo$ YY$ Vukcevic$ D$Donnelly$P$Bentley$D$Brown$MA$Cardon$LR$Caulfield$M$Clayton$DG$ Compston$ A$ Craddock$ N$ Deloukas$ P$ Donnelly$ P$ Farrall$ M$Gough$ SCL$ Hall$ AS$ Hattersley$ AT$ Hill$ AVS$ Kwiatkowski$ DP$Mathew$ CG$McCarthy$MI$ Ouwehand$WH$ Parkes$M$ Pembrey$M$Rahman$N$Samani$NJ$Stratton$MR$Todd$ JA$Worthington$ J$2007$GenomeBwide$ association$ study$ of$ 14000$ cases$ of$ seven$ common$diseases$ and$ 3000$ shared$ controls$ Nature$ 447$ 661ndash678$doi101038nature05911$

Cantor$ RM$ Lange$ K$ Sinsheimer$ JS$ 2010$ Prioritizing$ GWAS$ Results$ A$Review$ of$ Statistical$ Methods$ and$ Recommendations$ for$ Their$Application$Am$J$Hum$Genet$86$6ndash22$doi101016jajhg200911017$

Clark$ AG$ Hubisz$ MJ$ Bustamante$ CD$ Williamson$ SH$ Nielsen$ R$ 2005$Ascertainment$ bias$ in$ studies$ of$ human$ genomeBwide$ polymorphism$Genome$Res$15$1496ndash1502$doi101101gr4107905$

70

$

CohenBZinder$M$ Seroussi$ E$ Larkin$ DM$ Loor$ JJ$Wind$ AE$ der$ Lee$ JBH$Drackley$ JK$Band$MR$Hernandez$AG$Shani$M$Lewin$HA$Weller$JI$ Ron$ M$ 2005$ Identification$ of$ a$ missense$ mutation$ in$ the$ bovine$ABCG2$gene$with$a$major$effect$on$ the$QTL$on$ chromosome$6$affecting$milk$yield$and$composition$in$Holstein$cattle$Genome$Res$15$936ndash944$doi101101gr3806705$

Cole$JB$Wiggans$GR$Ma$L$Sonstegard$TS$Lawlor$TJ$Crooker$BA$Tassell$CPV$ Yang$ J$ Wang$ S$ Matukumalli$ LK$ Da$ Y$ 2011$ GenomeBwide$association$ analysis$ of$ thirty$ one$ production$ health$ reproduction$ and$body$ conformation$ traits$ in$ contemporary$ US$ Holstein$ cows$ BMC$Genomics$12$408$doi1011861471B2164B12B408$

Colitti$M$Wilde$CJ$Stefanon$B$2004$Functional$expression$of$bclB2$protein$family$and$AIF$in$bovine$mammary$tissue$in$early$lactation$J$Dairy$Res$71$20ndash27$

Consortium$ TEP$ 2012$ An$ integrated$ encyclopedia$ of$ DNA$ elements$ in$ the$human$genome$Nature$489$57ndash74$doi101038nature11247$

De$Mello$F$Cobuci$JA$Martins$MF$Silva$MVGB$Neto$JB$2012$Association$of$the$polymorphism$g8514CgtT$in$the$osteopontin$gene$(SPP1)$with$milk$yield$ in$ the$ dairy$ cattle$ breed$ Girolando$ Anim$ Genet$ 43$ 647ndash648$doi101111j1365B2052201102312x$

Dudbridge$ F$ Gusnanto$ A$ 2008$ Estimation$ of$ significance$ thresholds$ for$genomewide$ association$ scans$ Genet$ Epidemiol$ 32$ 227ndash234$doi101002gepi20297$

Elsik$ CG$ Tellam$ RL$ Worley$ KC$ 2009$ The$ Genome$ Sequence$ of$ Taurine$Cattle$ A$window$ to$ ruminant$ biology$ and$ evolution$ Science$ 324$ 522ndash528$doi101126science1169588$

Fazioli$ F$ Minichiello$ L$ Matoska$ V$ Castagnino$ P$ Miki$ T$ Wong$ WT$ Di$Fiore$ PP$ 1993$ Eps8$ a$ substrate$ for$ the$ epidermal$ growth$ factor$receptor$kinase$enhances$EGFBdependent$mitogenic$signals$EMBO$J$12$3799ndash3808$

Fontanesi$ L$ Calograve$ DG$ Galimberti$ G$ Negrini$ R$ Marino$ R$ Nardone$ A$AjmoneBMarsan$ P$ Russo$ V$ 2014$ A$ candidate$ gene$ association$ study$

71

$

for$ nine$ economically$ important$ traits$ in$ Italian$ Holstein$ cattle$ Anim$

Genet$nandashna$doi101111age12164$

Fowler$KJ$Walker$F$Alexander$W$Hibbs$ML$Nice$EC$Bohmer$RM$Mann$

GB$ Thumwood$ C$ Maglitto$ R$ Danks$ JA$ 1995$ A$ mutation$ in$ the$

epidermal$growth$factor$receptor$in$wavedB2$mice$has$a$profound$effect$

on$ receptor$ biochemistry$ that$ results$ in$ impaired$ lactation$ Proc$ Natl$

Acad$Sci$92$1465ndash1469$doi101073pnas9251465$

Gianola$ D$ Hospital$ F$ Verrier$ E$ 2013$ Contribution$ of$ an$ additive$ locus$ to$

genetic$variance$when$inheritance$is$multiBfactorial$with$implications$on$

interpretation$ of$ GWAS$ Theor$ Appl$ Genet$ 126$ 1457ndash1472$

doi101007s00122B013B2064B2$

Gillette$ M$ Bray$ K$ Blumenthaler$ A$ VargoBGogola$ T$ 2013$ P190B$ RhoGAP$

Overexpression$ in$ the$ Developing$Mammary$ Epithelium$ Induces$ TGFβB

dependent$ Fibroblast$ Activation$ PLoS$ ONE$ 8$ e65105$

doi101371journalpone0065105$

Grisart$B$Coppieters$W$Farnir$F$Karim$L$Ford$C$Berzi$P$Cambisano$N$

Mni$ M$ Reid$ S$ Simon$ P$ Spelman$ R$ Georges$ M$ Snell$ R$ 2002$

Positional$Candidate$Cloning$of$a$QTL$ in$Dairy$Cattle$ Identification$of$a$

Missense$Mutation$ in$the$Bovine$DGAT1$Gene$with$Major$Effect$on$Milk$

Yield$ and$ Composition$ Genome$ Res$ 12$ 222ndash231$

doi101101gr224202$

Hubbard$NE$Chen$QJ$Sickafoose$LK$Wood$MB$Gregg$ JP$Abrahamsson$

NM$ Engelberg$ JA$ Walls$ JE$ Borowsky$ AD$ 2013$ Transgenic$

Mammary$Epithelial$Osteopontin$(Spp1)$Expression$Induces$Proliferation$

and$ Alveologenesis$ Genes$ Cancer$ 4$ 201ndash212$

doi1011771947601913496813$

Hu$ ZBL$ Park$ CA$ Wu$ XBL$ Reecy$ JM$ 2013$ Animal$ QTLdb$ an$ improved$

database$tool$for$livestock$animal$QTLassociation$data$dissemination$in$

the$ postBgenome$ era$ Nucleic$ Acids$ Res$ 41$ D871ndashD879$

doi101093nargks1150$

Jeon$et$al$ B$2008$ B$ProtoBoncogene$FBIB1$ (PokemonZBTB7A)$Represses$Trpdf$

nd$

72

$

Jiang$ L$ Liu$ J$ Sun$ D$ Ma$ P$ Ding$ X$ Yu$ Y$ Zhang$ Q$ 2010$ Genome$Wide$

Association$ Studies$ for$ Milk$ Production$ Traits$ in$ Chinese$ Holstein$

Population$PLoS$ONE$5$e13661$doi101371journalpone0013661$

Kong$ PBZ$ Yang$ F$ Li$ L$ Li$ XBQ$ Feng$ YBM$ 2013$Decreased$FOXF2$mRNA$

Expression$ Indicates$ EarlyBOnset$ Metastasis$ and$ Poor$ Prognosis$ for$

Breast$ Cancer$ Patients$ with$ Histological$ Grade$ II$ Tumor$ PLoS$ ONE$ 8$

e61591$doi101371journalpone0061591$

Lee$SBU$Maeda$T$2012$POKZBTB$proteins$an$emerging$ family$of$proteins$

that$ regulate$ lymphoid$ development$ and$ function$ Immunol$ Rev$ 247$

107ndash119$

Lemay$ DG$ Lynn$ DJ$ Martin$ WF$ Neville$ MC$ Casey$ TM$ Rincon$ G$

Kriventseva$ EV$ Barris$ WC$ Hinrichs$ AS$ Molenaar$ AJ$ 2009$ The$

bovine$lactation$genome$ insights$ into$the$evolution$of$mammalian$milk$

Genome$Biol$10$R43$

Liu$JZ$Mcrae$AF$Nyholt$DR$Medland$SE$Wray$NR$Brown$KM$2010$A$

versatile$ geneBbased$ test$ for$ genomeBwide$ association$ studies$ Am$ J$

Hum$Genet$87$139$

Liu$S$Dontu$G$Mantle$ID$Patel$S$Ahn$N$Jackson$KW$Suri$P$Wicha$MS$

2006$Hedgehog$Signaling$and$BmiB1$Regulate$SelfBrenewal$of$Normal$and$

Malignant$ Human$ Mammary$ Stem$ Cells$ Cancer$ Res$ 66$ 6063ndash6071$

doi1011580008B5472CANB06B0054$

Lu$ J$ Holmgren$ A$ 2014$ The$ thioredoxin$ superfamily$ in$ oxidative$ protein$

folding$ Antioxid$ Redox$ Signal$ 140202091634000$

doi101089ars20145849$

Luna$ A$ Nicodemus$ KK$ 2007$ snpplotter$ an$ RBbased$ SNPhaplotype$

association$and$linkage$disequilibrium$plotting$package$Bioinforma$Oxf$

Engl$23$774ndash776$doi101093bioinformaticsbtl657$

McCarthy$ MI$ Abecasis$ GR$ Cardon$ LR$ Goldstein$ DB$ Little$ J$ Ioannidis$

JPA$ Hirschhorn$ JN$ 2008$ GenomeBwide$ association$ studies$ for$

complex$traits$consensus$uncertainty$and$challenges$Nat$Rev$Genet$9$

356ndash369$doi101038nrg2344$

Meredith$ BK$ Kearney$ FJ$ Finlay$ EK$ Bradley$ DG$ Fahey$ AG$ Berry$ DP$

Lynn$ DJ$ 2012$ GenomeBwide$ associations$ for$ milk$ production$ and$

73

$

somatic$cell$score$in$HolsteinBFriesian$cattle$in$Ireland$BMC$Genet$13$21$doi1011861471B2156B13B21$

Minozzi$ G$Nicolazzi$ EL$ Stella$ A$ Biffani$ S$Negrini$ R$ Lazzari$ B$ AjmoneBMarsan$ P$Williams$ JL$ 2013$ Genome$Wide$ Analysis$ of$ Fertility$ and$Production$ Traits$ in$ Italian$ Holstein$ Cattle$ PLoS$ ONE$ 8$ e80219$doi101371journalpone0080219$

Nicolazzi$EL$Picciolini$M$Strozzi$F$Schnabel$RD$Lawley$C$Pirani$A$Brew$F$ Stella$ A$ 2014$ SNPchiMp$ a$ database$ to$ disentangle$ the$ SNPchip$jungle$ in$ bovine$ livestock$ BMC$ Genomics$ 15$ 123$ doi1011861471B2164B15B123$

Purcell$ S$ Neale$ B$ ToddBBrown$ K$ Thomas$ L$ Ferreira$ MAR$ Bender$ D$Maller$J$Sklar$P$de$Bakker$PIW$Daly$MJ$Sham$PC$2007$PLINK$a$tool$ set$ for$ wholeBgenome$ association$ and$ populationBbased$ linkage$analyses$Am$J$Hum$Genet$81$559ndash575$doi101086519795$

Qiu$ S$ Lai$ L$ 2013$ Antibacterial$ properties$ of$ recombinant$ human$ nonBpancreatic$secretory$phospholipase$A2$Biochem$Biophys$Res$Commun$441$453ndash456$doi101016jbbrc201310092$

Quinlan$AR$Hall$IM$2010$BEDTools$a$flexible$suite$of$utilities$for$comparing$genomic$ features$ Bioinformatics$ 26$ 841ndash842$doi101093bioinformaticsbtq033$

RamiacuterezBTorres$ A$ BarceloacuteBBatllori$ S$ MartiacutenezBBeamonte$ R$ Navarro$ MA$Surra$ JC$ Arnal$ C$ Guilleacuten$N$ Aciacuten$ S$ Osada$ J$ 2012$ Proteomics$ and$gene$ expression$ analyses$ of$ squaleneBsupplemented$ mice$ identify$microsomal$thioredoxin$domainBcontaining$protein$5$changes$associated$with$ hepatic$ steatosis$ J$ Proteomics$ 77$ 27ndash39$doi101016jjprot201207001$

Raven$ LBA$ Cocks$ BG$ Pryce$ JE$ Cottrell$ JJ$ Hayes$ BJ$ 2013$ Genes$ of$ the$RNASE5$pathway$ contain$ SNP$ associated$with$milk$ production$ traits$ in$dairy$cattle$Genet$Sel$Evol$45$25$doi1011861297B9686B45B25$

Scotti$E$Fontanesi$L$Schiavini$F$La$Mattina$V$Bagnato$A$Russo$V$2010$DGAT1$ pK232A$ polymorphism$ in$ dairy$ and$ dual$ purpose$ Italian$ cattle$breeds$Ital$J$Anim$Sci$9$doi104081ijas2010e16$

74

$

Stefanon$ B$ Colitti$ M$ Gabai$ G$ Knight$ CH$ Wilde$ CJ$ 2002$ Mammary$

apoptosis$and$lactation$persistency$in$dairy$animals$J$Dairy$Res$69$37ndash

52$

Stegh$ AH$ DePinho$ RA$ 2011$ Beyond$ effector$ caspase$ inhibition$ Bcl2L12$

neutralizes$ p53$ signaling$ in$ glioblastoma$ Cell$ Cycle$ 10$ 33ndash38$

doi104161cc10114365$

Taberlet$ P$ Valentini$ A$ Rezaei$ HR$ Naderi$ S$ Pompanon$ F$ Negrini$ R$

AjmoneBMarsan$ P$ 2008$ Are$ cattle$ sheep$ and$ goats$ endangered$

species$Mol$Ecol$17$275ndash284$doi101111j1365B294X200703475x$

Team$RC$ 2014$ R$ A$ Language$ and$Environment$ for$ Statistical$ Computing$ R$

Foundation$for$Statistical$Computing$Vienna$Austria$

Thomadaki$H$ Talieri$M$ Scorilas$ A$ 2007$ Prognostic$ value$ of$ the$ apoptosis$

related$genes$BCL2$and$BCL2L12$in$breast$cancer$Cancer$Lett$247$48ndash

55$doi101016jcanlet200603016$

Valdehita$ A$ Bajo$ AM$ FernaacutendezBMartiacutenez$ AB$ Arenas$ MI$ Vacas$ E$

Valenzuela$ P$ RuiacutezBVillaespesa$ A$ Prieto$ JC$ Carmena$ MJ$ 2010$

Nuclear$ localization$ of$ vasoactive$ intestinal$ peptide$ (VIP)$ receptors$ in$

human$ breast$ cancer$ Peptides$ 31$ 2035ndash2045$

doi101016jpeptides201007024$

Wang$T$Liu$NS$Seet$LBF$Hong$W$2010$The$Emerging$Role$of$VHS$DomainB

Containing$Tom1$Tom1L1$and$Tom1L2$in$Membrane$Trafficking$Traffic$

11$1119ndash1128$doi101111j1600B0854201001098x$

Wang$ X$Wurmser$ C$ Pausch$ H$ Jung$ S$ Reinhardt$ F$ Tetens$ J$ Thaller$ G$

Fries$R$2012$Identification$and$Dissection$of$Four$Major$QTL$Affecting$

Milk$Fat$Content$in$the$German$HolsteinBFriesian$Population$PLoS$ONE$7$

e40711$doi101371journalpone0040711$

Xu$S$2003$Theoretical$Basis$of$the$Beavis$Effect$Genetics$165$2259ndash2268$

Zimin$AV$Delcher$AL$Florea$L$Kelley$DR$Schatz$MC$Puiu$D$Hanrahan$

F$ Pertea$ G$ Tassell$ CPV$ Sonstegard$ TS$ Marccedilais$ G$ Roberts$ M$

Subramanian$ P$ Yorke$ JA$ Salzberg$ SL$ 2009$ A$ wholeBgenome$

assembly$ of$ the$ domestic$ cow$ Bos$ taurus$ Genome$ Biol$ 10$ R42$

doi101186gbB2009B10B4Br42$

75

$

Zu$X$Ma$J$Liu$H$Liu$F$Tan$C$Yu$L$Wang$J$Xie$Z$Cao$D$Jiang$Y$2011$ProBoncogene$ Pokemon$ promotes$ breast$ cancer$ progression$ by$upregulating$ survivin$ expression$ Breast$ Cancer$ Res$ 13$ R26$doi101186bcr2843$ $

76

$

Supplementaryfiles$

You$can$find$Chapter$͵$supplementary$files$at$

- DocTa$(Doctoral$Thesis$Archive)$(httptesionlineunicattit)$

- Google$Drive$(httptinyurlcomPhDMMChapter03)$or$scan$the$QR$code$

$

$

SupplementaryFigureS1Multidimensionalscaling

Spatial$distribution$of$genetic$structure$in$the$analyzed$breeds$$

$

Supplementary$FigureS2GWASplot

Manhattan$plots$and$QQBplots$respectively$for$Fat$percentage$Milk$yield$and$

Protein$percentage$in$the$Italian$Brown$

$

Supplementary$FigureS3GWASplot

Manhattan$plots$and$QQBplots$respectively$for$Fat$percentage$Milk$yield$and$

Protein$percentage$in$the$Italian$Holstein$

$

Supplementary$FigureS4GWASplot

Manhattan$plots$and$QQBplots$respectively$for$Fat$percentage$Milk$yield$and$

Protein$percentage$in$the$HolsteinDGAT$

$

Supplementary$FigureS5GWASplot

Manhattan$plots$and$QQBplots$respectively$for$Fat$percentage$Milk$yield$and$

Protein$percentage$in$the$Italian$Simmental$

77

$

$Supplementary$FigureS6LDdecayGenomeBwide$LD$decay$for$the$three$breeds$(Holstein$Simmental$Brown)$$Supplementary$FigureS7DescriptiveplotsDescriptive$statistics$for$the$Italian$Brown$Italian$Holstein$and$Italian$Simmental$$Supplementary$FigureS8DGAT1LDFocus$on$the$LD$of$the$DGAT1$region$in$all$the$datasets$(Brown$Holstein$HolsteinDGAT$Simmental)$$Supplementary$TableS1SNPfactNumber$of$subjects$and$SNPs$after$QC$$Supplementary$TableS2GeneBasedAssociationresultsFull$results$of$the$significant$genes$associated$with$the$traits$in$all$the$datasets$$ Sheet$1$Brown$ndash$fat$percentage$$ Sheet$2$Brown$ndash$milk$yield$$ Sheet$3$Brown$ndash$protein$percentage$$ Sheet$4$Holstein$ndash$fat$percentage$$ Sheet$5$Holstein$ndash$milk$yield$$ Sheet$6$Holstein$ndash$protein$percentage$$ Sheet$7$HolsteinDGAT$ndash$fat$percentage$$ Sheet$8$HolsteinDGAT$ndash$milk$yield$$ Sheet$9$HolsteinDGAT$ndash$protein$percentage$$ Sheet$10$Simmental$ndash$fat$percentage$$ Sheet$11$Simmental$ndash$milk$yield$$ Sheet$12$Simmental$ndash$protein$percentage$$Supplementary$TableS3SingleSNPAssociationresultsFull$results$of$the$SNP$that$overcome$the$suggestive$threshold$$$ Sheet$1$Brown$ndash$fat$percentage$

78

$

$ Sheet$2$Brown$ndash$milk$yield$$ Sheet$3$Brown$ndash$protein$percentage$$ Sheet$4$Holstein$ndash$fat$percentage$$ Sheet$5$Holstein$ndash$milk$yield$$ Sheet$6$Holstein$ndash$protein$percentage$$ Sheet$7$HolsteinDGAT$ndash$fat$percentage$$ Sheet$8$HolsteinDGAT$ndash$milk$yield$$ Sheet$9$HolsteinDGAT$ndash$protein$percentage$$ Sheet$10$Simmental$ndash$fat$percentage$$ Sheet$11$Simmental$ndash$milk$yield$$ Sheet$12$Simmental$ndash$protein$percentage$$Supplementary$TableS4GenesandtraitsGeneWise$pBvalues$of$the$discussed$genes$in$all$the$analysis$In$grey$are$evidenced$the$pBvalues$that$overcome$the$significance$threshold$Genes$are$alphabetically$listed$$Supplementary$TableS5GenesinQTLsGenes$from$the$geneBbased$association$mapping$into$known$QTLs$$$$$

79

CHAPTER(4(

MUGBAS(a(species(free(gene(based(association(suite(

S(Capomaccio1(M(Milanesi1(L(Bomba1(et(al(1IstitutodiZootecnicaUniversitagraveCattolicadelSacroCuoreviaEmiliaParmense84Piacenza29122ItalyTheseauthorscontributedequallytothiswork

____________________ApplicationnotesubmittedtoBioinformatics

Abstract(Theassociationstudiesbetweengenomewidemarkersandphenotypes(GWAS)

arenowwidelyusedinmodelandnonmodelspeciesHowevertoolsthatmerge

singlemarkerresultsandtheirprobabilityofbeingassociatedtofunctionalunits

(typicallygenes)areseldomdevelopedforspeciesotherthanhuman

Herewe presentMUGBAS a suite that calculates a generegionbased pvalue

fromagivenannotationsinglemarkerGWAresultandgenotypedatawiththe

objective to ease postGWAS analysis The software is species and annotation

independentfasthighlyparallelizedandreadyforhighdensitymarkerstudies

Availabilityandimplementationhttpsbitbucketorgcapemastermugbas

Introduction(With the availability of highdensity SNP panels Genome Wide Association

Studies (GWAS) have become the gold standard for dissecting the biology of

complextraits(McCarthyetal2008)Thisistrueforhumansandotherspecies

TheunderlyingideaofGWASissimplefindmarkertraitassociationsexploiting

the linkage disequilibrium (LD) that exists between the causative mutation ndash

whichweignorendashandoneormoremarkersofknownchromosomalposition

Whileanumberofwellestablishedapproachescanbeeasilyusedinastandard

GWAS (Aulchenko et al 2007 Kang et al 2010 Purcell et al 2007) to find

marker(s) associated to a particular phenotype several issues have to be

accountedforduringtheupanddownstreamanalyses

Firstly genome wide studies have to take into account multiple testing If

stringent statistical thresholds are applied a proportion ofmoderate or small

effectsactuallyassociatedwiththetraitwillbedisregardedThereforereducing

the number of tests while maintaining all the information provided by high

densitypanelswouldbeaclearadvantageInadditiononceasignificantsignalis

detected different strategiesmaybeused to tag the candidate causative gene

Sometimesthegeneclosertothepeaksignalisselectedinothercasesallgenes

withinaflankingregionofanarbitrarysizeofthesignificantSNP(s)areincluded

82

infurtheranalysesThesechoicesmayendupinbiasresultseitherretaininga

largenumberofnottrulyassociatedgenesordisregardingthecorrectones

Somemethodshaverecentlybeenproposedtoovercometheselimitationsusing

aprioriknowledgeastheldquocandidatepathwayrdquoanalysis(Ravenetal2013)and

the ldquogenebasedrdquo association strategies (Akula et al 2011 Cantor et al 2010

Liuetal2010)TheseapproachesrestrictGWAstudiesonlytoknowngenesor

genesubsetsresizingthemultipletestingproblemandincreasingthepowerof

theanalyses

Usually bioinformatics pipelines are developed for the analysis of human and

modelorganismsandarenotadaptedtononmodelorganismThisisthecaseof

the VEGAS software (Liu et al 2010 Yang et al 2014) developed only for

humansandonlyforgenebasedanalyses

Here we introduce MUGBAS [MUlti species GeneBased Association Suite] a

softwarebasedonVEGASthatusesGWASdataandagivenannotation(typically

genesbutanyregionmaybesuitable)toestimategenebasedandregionbased

associationpvaluesThesuite is speciesandannotation free ready toanalyze

highdensitydatafromanyexperimentaldesigns

Methods(MUGBASisasuiteofscriptsbuiltinBashRPythonandPerlThesuiterequires

several prerequisites that can be easily fulfilled in a LinuxUnixMac

environmentwith a few command lines Further information on how tomeet

theserequirementsaredetailedintheonlinerepository

MUGBAS suite can be invoked launching the Bash wrapper that checks

dependenciesandstoresuserchoices

Required input files are MAPPED files in PLINK format with genotype data

from(atleast200individualsarerecommendedforaproperLDcalculation)an

annotation file inBED format (chromosome startingposition endingposition

nameandusercustomfields)ofthedesiredgenomefeatureandtheGWASresult

file with mandatory headers ldquoSNPNAMErdquo and ldquoPVALUErdquo Multiple association

resultscanbetestedatthesametime ifprovidedinseparatefiles inthesame

directoryAtrialdatasetisdownloadablealongwiththesoftwarereleasePlease

83

notethatforsimplicityweprovideageneannotationbutanyBEDfilewithany

desiredfeaturecanbeused

The program starts the analysis assigning SNPs to any feature of the given

annotationusingbedtools(QuinlanandHall2010)Thisstepiscustomizableby

user defined boundaries (up and downstream) in order to catch regulatory

regionsthatcouldbeinhighLDwiththecurrentgeneregion

Inordertospeeduptheanalysistheprogramsubsamples200individualsfrom

theinputdatasetWethenimplementedtwolevelsofparallelizationoneforthe

LDandonefortheldquogenewiserdquoorldquoregionwiserdquopvaluecalculationThelatteris

particularlyusefulanalyzingmorethanoneGWAresultsfromthesamedataset

BrieflyMUGBASfirstsplitstheLDcalculationinncores(asdefinedbytheuser)

using the foreach doParallel packages (Analytics andWeston 2014a 2014b)

and then distributes traits across cores using GNU Parallel and foreach

doParallel

SinceLDcalculationisaslowprocessVEGASbenefitsofHapMapphase2datato

speed up the processwhile preserving the option for a user defined (human)

population using PLINK In order to expand these analysis to any species LD

calculationinMUGBASisimplementedwithther2fast()functionfromGenABEL

package(Aulchenkoetal2007)whilethegenewisepvalueiscalculatedasin

theVEGASapproachBrieflythegenewiseteststatisticisthesumofthenupper

tailchisquaredvaluesofaSNPsubsetwithonedegreeoffreedom(wheren is

thenumberofSNPmappinginthegivengene)Withperfectlinkageequilibrium

the genewise pvalue would be the onetailed pvalue of a chisquare

distributionwithndegreesof freedomOtherwisemorelikelythedistribution

of the pvalues has to be evaluated (with simulations) andweighted (with LD

values)(Liuetal2010)

Once every generegion has its ldquogenewiserdquo pvalue a FDR qvalue (False

Discovery Rate) statistics is calculated with the base R function padjust()

Results are stored and enrichedwith ldquomanhattanrdquo and ldquoundergroundrdquo plots of

theldquogenewiserdquopvalueandtheqvaluerespectively

84

Results(MUGBAS output is generegionoriented with useful information about the

location of the analyzed feature the ldquoBest SNPrdquo within that feature with its

genomiccoordinatesandtheoriginalpvaluethegeneorregionwisestatistics

(pvalue number of simulations and qvalue) the custom annotation of that

particularentryandthepositionoftheSNPwithrespecttothefeatureandthe

decidedboundariesAll this informationareuseful to track theeffectofhighly

significant markers that could map into multiple region of interest and

effectively pinpoint the most probable location to focus on with data mining

processes

A typical analysis performed with a highdensity panel (gt600K markers)

distributed in 10 midperformance cores (Intel Xeon X5675 307 GHz) on

threetraitssimultaneouslytakesaboutfourhours

Discussion(Generegionbased methods are now widely used and accepted in genomic

research However non modelspecies suffer for the lack of availability of

specifictoolstoenhancediscoveryfromgenomewidedataMUGBASprovidesa

robust and accepted statistics to researchers dealing with species other than

humantoeasilyjumpfromassociatedmarkerstothedesiredfunctionalunits

Acknowledgements(((WewouldliketothankDrJimmyLiuforhisadvicesinbuildingthesuite

Reference(Akula N Baranova A Seto D Solka J NallsMA Singleton A Ferrucci L

TanakaTBandinelli SChoYSKimYJLee JYHanBGBipolar

Disorder Genome Study (BiGS) Consortium The Wellcome Trust Case

85

ControlConsortiumMcMahonFJ2011ANetworkBasedApproachtoPrioritize Results from GenomeWide Association Studies PLoS ONE 6e24220doi101371journalpone0024220

Analytics R Weston S 2014a doParallel Foreach parallel adaptor for theparallelpackage

AnalyticsRWestonS2014bforeachForeachloopingconstructforRAulchenkoYSdeKoningDJHaleyC2007GenomewideRapidAssociation

Using Mixed Model and Regression A Fast and Simple Method ForGenomewidePedigreeBasedQuantitativeTraitLociAssociationAnalysisGenetics177577ndash585doi101534genetics107075614

Cantor RM Lange K Sinsheimer JS 2010 Prioritizing GWAS Results AReview of Statistical Methods and Recommendations for TheirApplicationAmJHumGenet866ndash22doi101016jajhg200911017

KangHMSulJHServiceSKZaitlenNAKongSFreimerNBSabattiCEskin E 2010 Variance component model to account for samplestructure in genomewide association studies Nat Genet 42 348ndash354doi101038ng548

LiuJZMcraeAFNyholtDRMedlandSEWrayNRBrownKM2010Aversatile genebased test for genomewide association studies Am JHumGenet87139

McCarthy MI Abecasis GR Cardon LR Goldstein DB Little J IoannidisJPA Hirschhorn JN 2008 Genomewide association studies forcomplextraitsconsensusuncertaintyandchallengesNatRevGenet9356ndash369doi101038nrg2344

Purcell S Neale B ToddBrown K Thomas L Ferreira MAR Bender DMallerJSklarPdeBakkerPIWDalyMJShamPC2007PLINKatool set for wholegenome association and populationbased linkageanalysesAmJHumGenet81559ndash575doi101086519795

QuinlanARHallIM2010BEDToolsaflexiblesuiteofutilitiesforcomparinggenomic features Bioinformatics 26 841ndash842doi101093bioinformaticsbtq033

86

Raven LA Cocks BG Pryce JE Cottrell JJ Hayes BJ 2013 Genes of the

RNASE5pathway contain SNP associatedwithmilk production traits in

dairycattleGenetSelEvol4525doi101186129796864525

87

CHAPTER5

Imputationaccuracyisrobusttocattlereferencegenomeupdates

MMilanesi1etal1IstitutodiZootecnicaUniversitagraveCattolicadelSacroCuoreviaEmiliaParmense84Piacenza29122Italy

____________________ShortCommunicationacceptedinAnimalGenetics

SummaryGenotype imputation is routinely applied in a large number of cattle breeds Imputation

hasbecomeaneedduetothelargenumberofSNParrayswithvariabledensity(currently

from2900to777962SNPs)Althoughmanyauthorshavestudiedtheeffectofdifferent

statisticalmethodsonimputationaccuracytheimpactofa(likely)changeinthereference

genomeassemblyonimputationfromlowertohigherdensityhasnotbeendeterminedso

far In this work 1021 Italian Simmental SNP genotypes were reJmapped on the three

mostrecentreferencegenomeassembliesFour imputationmethodswereusedtoassess

theimpactofanupdateinthereferencegenomeAsexpectedthefourmethodsbehaved

differentlywith largedifferences in termsof accuracyUpdatingSNPcoordinateson the

three tested cattle reference genome assemblies determined only a slight variation on

imputationresultswithinmethod

KeywordsImputationreferencegenomeassemblySNPbovine

MaintextFromthepublicationof the firstbovinesinglenucleotidepolymorphism(SNP)array the

wholegenomicscenarioofthisspeciesevolvedrapidlyNowadaysseveralSNPpanelshave

been designed with a range of densities mapping on different reference genome

assemblies and produced by different manufacturers When running analyses with SNP

arrays of different densities or from different commercial companies (eg Illumina and

Affymetrix) an imputation step is often required The impact of different statistical

methods on imputation accuracy has already been assessed (Pei et al 2008) (Marchini

andHowie2010)(Maetal2013)Surprisinglyuptonowtherearenotavailablestudies

investigating the impact of an update on the reference genome assembly on imputation

accuracy This assessment would be of fundamental importance considering that reJ

mappingSNPsdeterminesarearrangementofthehaplotypestructureofthegenotypes(in

somecasesSNPsarereJmappedtoadifferentchromosome)andthatthebovinereference

assembly is likely toberepeatedlyupdatedover timeThisworkwasaimedatassessing

the effect of different reference genome assemblies in cattle As test casewe compared

four different imputation methods from low (Illumina BovineLDv10 Beadchip) to

90

mediumhigh (Illumina BovineSNP50v2 Beadchip) density in the Italian Simmental

populationForeachmethodweassessedthe impacton imputationaccuracyofmapping

SNPs on the BTAU42 BTAU46 (wwwhgscbcmeduotherJmammalsbovineJgenomeJ

project)orUMD31(Ziminetal2009)cattlereferencegenomeassemblies

Atotalof900and121SimmentalbullsweregenotypedwiththeIlluminaBovineSNP50v1

and v2 Beadchips (Illumina Inc San Diego CA) respectively Illumina BovineSNP50v2

Beadchip (hereinafter named 54k) the official reference chip for the Italian Simmental

breedersassociation(ANAPRI)wasconsideredasreferenceSNPpanelTheSNPchiMpv1

tool(Nicolazzietal2014)wasusedtoobtainSNPcoordinatesonBTAU42UMD31and

BTAU46genomeassembliesForeachassemblySNPsnotcontainedinthe54kunmapped

orinthesexualchromosomeswerediscardedThefinaldatasetcontained5118952886

and51290SNPsinBTAU42UMD31andBTAU46respectively

FourimputationmethodswereconsideredPedImpute(Nicolazzietal2013)FindHapv2

(VanRadenetal2011)FImpute(Sargolzaeietal2014)andBeaglev332(Browningand

Browning2007)Thefirstthreeusepedigreeandpopulation(ie linkagedisequilibrium

LD) informationwhereas the fourth uses only LD information Performance of the four

methods testedacrossgenomicassemblieswas comparedusingwrongly imputedalleles

(ldquoerrorsrdquo)andallelicsquaredcorrelation(ldquoallelicJR2rdquo)statisticsAllelicJR2wasdefined

asthesquaredcorrelationbetweenimputedandtrue(eggenotyped)allelesmultipliedby

100 as in Browning amp Browning (2009) The 878 oldest sires were used as training

population whereas the 143 youngest bulls were used as test population SNPs not in

commonwith the lowdensity panelwere set tomissing in the test population The test

population was split in four scenarios individuals with sire and maternal grandsire

genotypedinthereferencepopulation(scenarioAJ84animals)individualswithonlythe

sire genotyped (scenario B J 34 animals) individuals with only the maternal grandsire

genotyped(scenarioCJ13animals)ornoneofcloserelativesgenotyped(scenarioDJ12

animals)Thepopulationusedwashighlystructuredwithnearlyhalfofthebullspresent

in the dataset (490) sired by 115 genotyped sires In addition 35 out of these 115

genotypedsiresweresiredby22bulls(eggrandsires)alsopresentinthedataset

DifferencesacrossassembliesarenothomogeneousFromBTAU42toBTAU4646SNPs

weremapped to different chromosomes and the average difference in physical position

withinchromosomeswasnearly025MbpHowevertherewasnosubstantialdifferencein

the order of markers In fact not considering unmapped SNPs on either of the

aforementionedassembliesadifferenceinthesequentialorderwasobservedforonly134

91

single (ie spread throughout the genome) SNPs On the contrary from UMD31 to

BTAU46 222 SNPsweremapped to different chromosomes and amuch larger average

differenceinSNPsphysicalpositionwasobserved(iemorethan2Mbp)Alsothenumber

ofSNPswithdifferentsequentialorderwasmuchlarger(ie1893SNPs)Howevermore

than50oftherearrangementsinvolvedblocksfrom3tonearly100markersAsaresult

slight differences in overall imputation accuracy from BTA46 to UMD31 rather than

between the two BTAU assemblies were expected across methods On the contrary if

rearrangements are due to issues in the assembly of scaffolds theymight have a direct

(negative) impact on local imputation performances In fact a preliminary analysis on

BTA1consideringlocalerroracrossassembliesandmethodssuggestatleasttwosuch

SNPsclustersat44and138MbponUMD31(FiguresS1S2andS3)

Considering allmethods and scenarios amaximum average variation of 02 inerrors

(Table1)andof09inallelicJR2(Table2)wasobservedacrossreferenceassembliesThe

standarddeviation(SD)oftheimputationaccuracywasstableacrossreferenceassemblies

withinmethodsOnthecontraryalargervariabilitywasobservedacrossmethods

PedImpute and FindHap showed a relatively large average allelicJR2 variability across

scenarios evaluated over the same reference assembly Only PedImpute showed a large

variation in SD across scenarios in errors and allelicJR2 with high variability on

scenarios C and D showing a low performance of thismethodwhen there are no close

relatives(egsiredam)genotypedinthereferencedatasetFImputeinsteadachievedthe

highestaccuracyacrossallreferencegenomesandscenariossurprisinglyobtainingitsbest

resultsJerrorsof14(06SD)andallelicJR2of955(26SD)JinscenarioCwhereonly

onerelativelycloserelativeisgenotyped(maternalgrandJsire)FImputeappeartobeable

todetectshorterhaplotypesfromdistantrelativesmoreaccuratelythananyothermethod

tested in this study thus it is lessdependenton theavailabilityofpedigree information

thantheothertwopedigreeJbasedmethods(Sargolzaeietal2010)

The large variability across methods within assembly was expected although the

differences obtained among PedImpute FindHap and Beagle were larger in this study

comparedtoasimilarstudyinItalianHolstein(Nicolazzietal2013)Thiscoulddependon

a number of factors First the sample size of the test population is small (especially for

groups C and D where less than 15 individuals were present) Second the interaction

between population structure and imputationmethodsmay have a role PedImpute and

FindHap rely much more heavily on pedigree information than FImpute and Beagle

assumesall individualsareunrelatedHowever the lattertwomethods indirectlybenefit

92

greatly of from highly structured populations with large number of relatives ndash close or

distantndashpresent inthereferencepopulation(Sargolzaeietal2014)Anotherinteresting

factor that influences the imputation accuracy may be the persistence of LD at long

distances this is much lower in the Italian Simmental than in the Italian Holstein

population(FigureS4)Informationonrelatives(eitherpedigreeinformationorgenotyped

relativesinthereferencepopulation)isexpectedtoplayabiggerroleifLDisconservedat

shorter distances This hypothesis is supported by the fact that although there is high

variability acrossmethods better performances of the pedigreeJbasedmethods are still

observedinscenariosAandB(andCforFImpute)

Theslightdifferencesinimputationaccuracyobservedacrossreferenceassembliesmaybe

explained by the aforementioned small variation in the SNP blocks across reference

assemblies irrespectively of the rearrangements of single SNPs Considering that SNP

blocksaremostlymaintainedweexpectthatbreedswithlargerLDpersistenceatlonger

distances (ie Holstein Brown) will obtain even better imputation accuracy across

assembliescomparedtowhatreportedinthisstudyinItalianSimmentalHoweverastill

relativelyunexploredaspectof imputationperformance is thedistributionof imputation

errors across the genomeNot considering genotyping errors (which are expected to be

randomly distributed in the genome) a cluster of consecutive markers with high

percentage of imputation error across methods are most probably identifying missJ

assembledregions in thegenome(FiguresS1S2andS3)Althoughthe impactonoverall

imputation accuracy is limited as results in this study show these errorsmight have a

large impact on any analyses relying on specific genomic coordinates (eg genomeJwide

associationstudies)Morecomprehensiveresultsincludingannotationstudiesandwhole

genomesequencingdataarerequiredtoconfirmandinvestigatetheseobservations

Toconcludeonlyasmalleffectofupdatingthereferencegenomeassemblyonimputation

accuracywas observed in Italian Simmental On the other hand as already reported by

other studies the variability of imputation accuracy estimates is much larger across

imputation methods The choice of the most suitable imputation method based on the

breedgeneticbackgrounditscharacteristicsandontheavailabledataisthereforecritical

AcknowledgementsThe research leading to these results has received funding from the European Unions

Seventh Framework Programme (FP72007J2013) under grant agreement ndeg 289592 ndash

93

Gene2FarmPartofthegenotypicdatausedinthisstudywasprovidedbyprojectldquoSelMolrdquo

fundedbytheItalianMinistryofAgriculture(MiPAAF)MMwassupportedbytheDoctoral

SchoolontheAgroJFoodSystem(Agrisystem)of theUniversitagraveCattolicadelSacroCuore

(Italy) FB was supported by the Marie Curie European Reintegration Grant

ldquoNEUTRADAPTrdquo(FP7JPEOPLEJ2009JRG)

ReferencesBrowningSRBrowningBL2007RapidandAccurateHaplotypePhasingandMissingJ

Data Inference for WholeJGenome Association Studies By Use of Localized

HaplotypeClusteringAmJHumGenet811084ndash1097

MaPBroslashndumRFZhangQLundMSSuG2013Comparisonofdifferentmethods

forimputinggenomeJwidemarkergenotypesinSwedishandFinnishRedCattleJ

DairySci964666ndash4677doi103168jds2012J6316

Marchini J Howie B 2010 Genotype imputation for genomeJwide association studies

NatRevGenet11499ndash511doi101038nrg2796

NicolazziELBiffaniSJansenG2013ShortcommunicationImputinggenotypesusing

PedImputefastalgorithmcombiningpedigreeandpopulationinformationJournal

ofDairyScience962649ndash2653doi103168jds2012J6062

NicolazziELPiccioliniMStrozziFSchnabelRDLawleyCPiraniABrewFStella

A 2014 SNPchiMp a database to disentangle the SNPchip jungle in bovine

livestockBMCGenomics15123doi1011861471J2164J15J123

PeiYLiJZhangLPapasianCJDengH2008Analysesandcomparisonofaccuracyof

differentgenotypeimputationmethodsPLoSONE3551

VanRadenPMOrsquoConnellJRWiggansGRWeigelKA2011Genomicevaluationswith

manymoregenotypesGenetSelEvol43

94

Tableamp1ampIm

putationampaccuracyampacrossampassem

bliesampforampPedImputeampFindH

apampFImputeampandampBeagleamppercentageampofampwronglyampim

putedamp

allelesamp

PedImputeamp

FindHapamp

FImputeamp

Beagleamp

BTAU

amp42amp

UMDamp31amp

BTAU

amp46amp

BTAU

amp42amp

UMDamp31amp

BTAU

amp46amp

BTAU

amp42amp

UMDamp31amp

BTAU

amp46amp

BTAU

amp42amp

UMDamp31amp

BTAU

amp46amp

Scenarioamp

Namp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

A1amp

84

20

07

21

07

21

07

32

09

32

09

32

09

15

09

15

09

15

09

24

11

25

11

25

11

B2amp

34

23

08

23

09

22

08

36

11

36

12

36

11

15

07

15

09

14

06

23

08

23

08

23

08

C3amp

13

98

41

98

41

98

41

51

10

52

10

51

10

14

06

14

06

14

06

23

06

23

06

24

06

D4 amp

12

88

49

89

49

88

49

54

11

56

12

54

11

16

11

17

12

16

11

26

15

26

15

26

14

1 Sireandm

aternalgrandsiregenotyped

2 Siregenotypedmate

rnalgrandsireno

tgenotyped

3 Sireno

tgenotypedmate

rnalgrandsiregenotyped

4 Sireandm

aternalgrandsireno

tgenotyped

Tableamp2ampIm

putationampaccuracyampacrossampassem

bliesampforampPedImputeampFindH

apampFImputeampandampBeagleampallelicJR2amp

PedImputeamp

FindHapamp

FImputeamp

Beagleamp

BTAU

amp42amp

UMDamp31amp

BTAU

amp46amp

BTAU

amp42amp

UMDamp31amp

BTAU

amp46amp

BTAU

amp42amp

UMDamp31amp

BTAU

amp46amp

BTAU

amp42amp

UMDamp31amp

BTAU

amp46amp

Scenarioamp

Namp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

Meanamp

SDamp

A1amp

84

941

20

940

20

941

20

907

24

908

25

907

24

952

26

952

27

952

27

925

33

923

33

925

33

B2amp

34

935

24

934

25

935

24

896

30

896

34

895

32

954

20

954

21

954

20

929

24

928

25

928

24

C3amp

13

759

89

761

88

758

87

848

33

845

30

848

32

955

18

955

19

955

19

931

19

928

19

927

19

D4 amp

12

782

113

777

113

781

112

841

32

832

35

839

33

949

34

947

38

949

35

921

44

919

45

921

44

1 Sireandm

aternalgrandsiregenotyped

2 Siregenotypedmate

rnalgrandsireno

tgenotyped

3 Sireno

tgenotypedmate

rnalgrandsiregenotyped

4 Sireandm

aternalgrandsireno

tgenotyped

95

Supplementaryfiles

YoucanfindChapterͷsupplementaryfilesat

- DocTa(DoctoralThesisArchive)(httptesionlineunicattit)

- GoogleDrive(httptinyurlcomPhDMMChapter05)orscantheQRcode

SupplementaryFigureS1ErrorpercentageacrossmethodsforBTAU42onBTA1

SupplementaryFigureS2ErrorpercentageacrossmethodsforUMD31onBTA1

SupplementaryFigureS3ErrorpercentageacrossmethodsforBTAU46onBTA1

SupplementaryFigureS4Linkagedisequilibrium(LD)decayoverdistanceinItalian

SimmentalandHolstein

ThedottedlinewithblackdotsindicatesLDdecayoverdistanceinItalianHolstein

whereasthedashedlinewithwhitediamondsindicatesthesamevaluesforItalian

SimmentalItalianHolsteingenotypeswereasubsetofthedatausedinNicolazzietal

(2013)

96

CHAPTER6

QuestfordeleteriousrecessivevariantsinItalianHolsteinbullscombiningexomesequencingandhighdensitySNPanalysis

MMilanesi1etal

1IstitutodiZootecnicaUCSCviaEmiliaParmense8429122PiacenzaItaly

____________________Manuscript

Abstract

Outbreed species carry a genetic load of deleterious recessive variants Those

that are lethal survive only in the heterozygous state in the population while

otherssimplyreducethefitnessofindividualshomozygousforthevariantThe

long=termdestinyofthesemutationsistodecreaseinfrequencyduetonegative

selectionandeventuallydisappearbecauseofgeneticdriftHowever theymay

bemaintainedinlivestockpopulationsandevenincreaseinfrequencywhenin

linkage with favourable alleles for traits under selection or occurring in the

genomeofsireshavinghighgeneticvalueforproductivetraitsWesearchedfor

these variants combining exome sequence data from 20 Italian Holstein bulls

sampledfromoppositetailsofthemaleandfemalefertilityEBVdistributionthe

HighDensity(HD)SNPgenotypingof1009progenytestedItalianHolsteinbulls

and information existing in public databases Exome sequencing detected

202956 variants Among these 5167 were classified as ldquodeleteriousrdquo when

annotatedwithSIFTandAnnovarThesewerefilteredbypickingthosemapping

in regions out of equilibrium and lack of homozygotes in the progeny tested

Holstein bulls or having an asymmetric distribution in bulls plus=variant and

minus=variant for fertility EBV The resulting 193 variants in 163 genes were

furtherassessedbycomparativegenomicshaplotypeanalysisandfunctionFor

eachassessmentapositiveornegativescorewasgiventoevidenceinfavouror

against thevariant tobedeleteriousA totalof35variantshadapositive total

score These genes are involved in development cell motility and immune

responseSomeof themareknowntobeexpressed in the testiclesandappear

importantforspermfertilityinmodelspeciesothersareresponsibleforgenetic

defectsinhumanandmousecarryingmutationssimilartotheonesdetectedin

cattleThesegenevariantsarecandidatetobeamongthosethatconstitutethe

ldquogeneticloadrdquoofdeleteriousrecessivegenesintheHolsteinpopulation

100

Introduction

The presence of deleterious recessive variants is well tolerated by outbred

speciesWhenthesevariantsappeartheymayspreadandsurvivealongtimein

the population in the heterozygous state (Simmons and Crow 1977) It is

estimatedthatoutbredindividualshumanincludedcarryonaverageagenetic

loadofmorethan100loss=of=functionvariantsintheirgenome(MacArthuretal

2012)

Theeffectofdeleterious recessivevariantsappearswhendemographicevents

eggeneticbottlenecksreducethegeneticvariabilityofthepopulationorwhen

relatedindividualsareintentionallymatedforbreedingpurposesInthesecases

recessive deleterious reduce the fitness of individuals carrying them in the

homozygous state and in extreme case induce very severe genetic defects

premature death and abortion (McClure et al 2014 Sahana et al 2013

VanRadenetal2011)Theythereforecontributetoexplainthephenomenonof

inbreeding depression and its negative effect on fertility (Charlesworth and

Willis2009)

The search for deleterious recessive variants is particularly interesting in

Holstein cattle Thisbreed is experiencing anegative genetic trend for fertility

traits (Jamrozik et al 2005) it is worth noting that this is occurring while

inbreedingisunderstrictmonitoringandistheoreticallyincreasingataverylow

pace(about005peryear=Mrodeetal2009)Inadditionmatingisplannedto

minimise relationship between animals and the additive genetic variance for

mosttraitsndashincludingmilkproduction=isapparentlynotdecreasinginspiteof

the selection pressure applied by modern schemes (Biffani et al 2002)

indicatingthatoveralldiversityisonlyslightlydecreasing

SelectiontoincreasefertilityischallengingSelectioncriteriagenerallyadopted

to improve this trait are far from the biological mechanisms that govern

reproduction Inaddition thesemechanismsarecomplexandverysensitive to

environmental(management)conditions(Laneetal2013Wathesetal2014)

Asa result fertility traitshave loworvery lowheritability (Pryceetal1997)

andthereforelowresponsetoselectionandslowgeneticprogress

101

Forthesereasons thequest formolecularvariants influencing fertility traits is

on=goingsincetheavailabilityofmoleculartoolsforDNAanalysisAnumberof

investigations tested the effect of functional candidate genes running genome

wide association analyses to find QTLs for fertility traits using hundreds

microsatellite markers in structured families (Ashwell et al 2004) or tens of

thousand SNP panels in whole populations (Aston and Carrell 2009) The

availability of medium density SNP panels allowed the search for deleterious

recessivesevenwithouttheuseofphenotypictraitsbyscanningthegenomein

searchofregionslackingoneofthetwohomozygousclasses(Sahanaetal2013

VanRaden et al 2011) This approach has highlighted a number of genomic

regions likely carrying deleterious recessives This information has immediate

application inmolecular assisted breeding and a relevant scientific interest in

thesearchofcausativevariantsandassessmentoftheirfunctions

Thedevelopmentofnewsequencingtechnologieshasgreatlydecreasedthecost

of sequencing so that nowadays the sequencing of few target individuals is

becoming the gold standard to seek for causal mutations for rare monogenic

defectsinhumansandotherspecies(ChunandFay2009)Oftenonlytheexome

the protein coding part of the genome is sequenced searching for those

mutationsthatseverelyalterproteinstructureandorfunction(Bamshadetal

2011 Li et al 2013) A typical mammalian exome consists in a few tens of

millionnucleotides(around2ofthereferencegenome)comparedtothefew

billionofthewholegenomeThereforewiththesamenumberofreadsexome

sequencinghastheadvantageofamoreaccurategenotypingdueto thehigher

sequencing depth an easier data management lower computation load and

easier interpretation (Bieseckeretal2011Majewskietal2011) Indeed the

numbervariants found inexomesareone=twoordersofmagnitude lowerthan

thatdetectedinwholegenomesandmutationsaredetectedinannotatedgenes

orimmediatelynearbyOntheotherhandtheexomeapproachcanonlyidentify

variantson annotatedgenesVariants inunknownexonsor that occur innon=

coding regulatory features will be missed (Bamshad et al 2011) In humans

these are turning out to be of outmost importance for gene regulation and

downstreamphenotypic effects (ENCODEConsortium2012) but they are still

verypoorlyannotatedinlivestock

102

Herewesequencedtheexomesof20Holsteinbullsprogenytestedselectedfrom

the tails of male and female fertility EBVs (Estimated Breeding Value) Since

livestockfertilityisacomplextraitandcausativemutationscannotbesoughtby

the approach used in monogenic diseases we integrated the sequence

information with high density SNP chip data obtained on a much larger

populationofItalianHolsteinbullstodetectgenescarryingallelescandidateto

be deleterious recessives and confirm their likely deleterious effect by

comparativegenomics

Materialsandmethods

1SamplingandDNAextraction

Semen(straws)andbloodsampleswereprovidedbyItaliancertificatedartificial

insemination centres (UE Directive 88407CEE) thus ethics committee

approvalforthisstudyisnotrequired

A total of 20 ItalianHolstein progeny tested bullswere sampled for complete

exome sequencing Among them 19were chosen from the tails of female and

male fertilityEBV(EstimatedBreedingValue)distribution (Table1) tohave5

animals fromeach tail One animalwas to be in theminus=variant tail of both

traitsBullswereselectedasunrelatedaspossibleSamplingwaspossiblethanks

tothecollaborationwithANAFI(ItalianHolsteinBreederAssociation)

ANAFIdevelopedageneticevaluationfor fertilitycombiningdifferenttraitsas

explained by Biffani and colleagues (Biffani et al 2005) The traits are ldquodays

from calving to first inseminationrdquo ldquocalving intervalrdquo ldquofirst=service non return

rateto56daysrdquoldquoangularityrdquoandldquomatureequivalentmilkyieldat305daysrdquo

MalefertilityisexpressedasanEstimatedRelativeConceptionRate(ERCR)and

isameasureofconceptionrateofaservicesirerelativetoservicesiresofherd

mates(Soggiuetal2013)

103

Table1FemaleandmalefertilityEBVsofthe20animalssequenced

Animal EBVFemaleFertility EBVMaleFertility GroupfriA01 97 =111 NAfriA02 98 =166 MalLfriA03 115 0 FemHfriA04 114 0 FemHfriA05 114 0 FemHfriA06 86 0 FemLfriA07 98 195 MalHfriA08 105 213 MalHfriA09 95 =154 MalLfriA10 95 149 MalHfriA11 102 264 MalHfriA12 91 =475 FemLMalLfriA13 105 =179 MalLfriA14 102 207 MalHfriA15 99 =405 MalLfriA16 85 0 FemLfriA17 87 0 FemLfriA18 121 0 FemHfriA19 117 0 FemHfriA20 88 0 FemL

A total of 1009 Italian Holstein bulls were chosen from the entire population

balancing theirpresence in the tailsof theEBV forkgprotein (PROT) somatic

cell count (SCC) fertility (FERT) and functional longevity (LONG) for high

densitygenotyping(TableS1)

DNAwasisolatedfromwholebloodandsemenstrawssamplesinLGSlaboratory

(CremonaItalywwwlgscrit)usingtheGenomixkitaccordingtoLGSinternal

protocolDNAsforexomecaptureandsequencingwereextractedindependently

tosatisfy therequirement indicatedby IGA technologies (Udine Italy)at least

12mgofDNAataconcentration100ngulwithR260280higherthan18and

R260230near19

2Moleculardataproductionandqualitychecking

21Exomecaptureandsequencing

104

Exomes were captured using the ldquoSureSelected All exon Bovinerdquo (Agilent

Technologies) kit in early version Probes coordinates are based on UMD31

bovinereferencesequence(Ziminetal2009)anddesignedagainstNCBIexons

as well as predicted exons UTRs and miRNAs Number and distribution of

probesperchromosomearedetailedinTableS2

rdquoOvation Ultralow Library Systemsrdquo protocol was followed for library

preparation After quantification with Qubit (Life technologies) and quality

controlwith2100Bioanalyzer(AgilentTechnologies)DNAwassonicatedwith

Bioruptor (Diagenode) to obtain fragments between 150 to 200 bp in length

blunt end=repaired adenylated ligated with paired=end adaptors and purified

followingtheprotocolprovidedbythemanufacturer

LibrarieswerehybridizedwithldquoSureSelectedAllexonBovinerdquoprobesusingthe

Encore Target Capture Module (Nugen) SureSelect Target Enrichment and

Herculase II FusionEnzymewithdNTPCombo (AgilentTechnology) kit Then

capturedDNAwasamplifiedandtaggedwithlibraryspecificindexesQualityof

capturedsampleswastestedwithQubitand2100Bioanalyzer

Paired=end 2X100bp sequencing was performed on Illumina HiSeq2500

(IlluminaSanDiegoCA)atanexpectedcoverageof50X

22Sequencealignment

Adapters and reads shorter than 36bp were trimmed using Trimmomatic

(version032)(Bolgeretal2014)ForeachFastqfilethealignmentandpairing

of the reads to theUMD31 reference sequencewas performedwithBurrows=

WheelerAligner(BWA)(version077=r441)(LiandDurbin2009)applyingalso

trimmingoption (=q5) SAM files resultedwere converted intoBAM file using

SAMtools (version0118) (Lietal2009)Foreach individualBAM fileswere

merged in a single one with Picard (version 1111)

(httpsgithubcombroadinstitutepicard)Duplicatereadswereidentifiedand

markedwithPicardFinallyindividualBAMfileswereindexedwithSAMtools

105

Sequencedepthwasevaluatedsimultaneouslyonall20sequencedanimalsand

onlyonregionstargetedbyprobesusingDepthOfCoverageofGATK(McKennaet

al2010)

23ExomeVariantcallingandqualitycheck

Singlenucleotidepolymorphisms(SNPs)insertionsanddeletions(InDels)were

calledsimultaneouslyonall20sequencedanimalsandonlyonregionstargeted

byprobesusingUnifiedGenotyperofGATK(DePristoetal2011)

All variantswere filteredusingVariantFiltration ofGATK to removeoneswith

lowmappingandorsequencingqualityThechoiceoffiltersandthresholdswas

made in accordance to ldquoGATK best practicerdquo for exome sequencing and

distributionoffiltervalues

bull ldquoSNPclusterrdquoexcludedvariantifmorethan3variantswerefoundin10

bp

bull ldquoBaseQualityRankSumTestrdquoexcludedifitrsquoslowerthen=5ormorethan

5

bull ldquoMappingQualityRankSumTestrdquoexcludedif itrsquoslowerthen=7ormore

than6

bull ldquoRead Position Rank Sum Testrdquo excluded if itrsquos lower then =4 or more

than4

bull ldquoFisherStrandrdquoexcludedifitrsquosmorethan30

bull ldquoDepthrdquoexcluded if itrsquos lower then60X(ieaminimumvalueof3Xper

animals)ormorethan3500X

bull ldquoQualityrdquoexcludedifitrsquoslowerthen40

bull ldquoMappingqualityrdquoexcludedifitrsquoslowerthen30

Filters were applied to all variants but only bi=allelic SNPs were used in the

following analyses Genotypes with ldquoGenotype Depthrdquo lower than 5 and

ldquoGenotype Qualityrdquo lower than 20 were discarded The remaining ones were

converted intoldquoABrdquocode(A=referencealleleandB=alternativeallele)Then

data were transformed in PLINK (Purcell et al 2007) format using a home=

made python script To prepare the working dataset the final quality control

106

removedvariantswithmore than25missingdata (that iswithmore than5

animals out of 20 without genotypes) monomorphic (minor allele frequency

MAFlt1)ornotmappinginautosomesBasicpopulationgeneticsparameters

werecalculatedtoevaluatethepresenceofcrypticstructureinthedatathatmay

influencetheoutcomeofthedownstreamanalyses

In total 24390 SNPs identified in the exomes were also included in the HD

SNPchip panel HD genotype from nineteen out to twenty of the sequenced

animals sequenced was used to compare technologies Genotypes at common

locationswerecomparedacrosstechnologies toevaluatesequencequalityand

theeffectofsequencedepthongenotypecalling

24BovineHDGenotypingBeadChipdataandqualitycheck

AllanimalsweregenotypedusingtheBovineHDGenotypingBeadChip(Illumina

San Diego CA) Genotypes were quality controlled and filtered with standard

threshold Markers with more than 5 missing data and MAF lt 1 were

excluded Animals with more than 5missing data were also excluded Only

autosomal SNPs were retained Thereafter genotypes were phased and the

remainingmissingdata imputedusingBeaglev332(BrowningandBrowning

2007)

Ofthe20animalssequenced12hadalsoBovineHDGenotypingBeadChipand7

BovineSNP50 Genotyping BeadChip version 2 genotypes available For

concordance and haplotype analysis we imputed the 50K data to 800K as

previouslydescribedabove

3Strategyfordetectingcandidatedeleteriousvariants

Wesearchedforcandidatedeleteriousvariantsfollowinganintegratedapproach

that joined the analysis of i) exomes of a few Italian Holstein bulls having

plusvariant and minusvariant EBVs for a target trait ii) HD genotyping of a

population of 1009 progeny tested Italian Holstein bulls iii) the comparative

genomic analysis of 100 vertebrate species The strategy followed for the

107

analysisisshowninFigure1Inthefirststep(Search)welookedforcandidate

deleterious variants in the exomes In step2 (Filter)we filtered candidatesby

assessingwhichof thesevariantseitherhadanon=randomdistributionamong

animals having extreme EBVs or co=mapped with genomic regions carrying

outliersignals inthelargepopulationofprogenytestedbulls Instep3(Asses)

candidateswere further evaluated by i) assessing their conservation across a

panel of 100 vertebrate species ii) searching for associated haplotypes and

evaluating their frequencies in the population of progeny tested bulls iii)

evaluatingtheirassociationwithgeneticdefectsinhumanandmodelspeciesiv)

examining their expression profile and function when known Methods are

belowdescribedindetail

Figure1Strategyfollowedtoidentifydeleteriousvariants

108

31Step1Searchforcandidatedeleteriousvariants

Quest for candidate deleterious variants in exome sequence data by

variantannotation

WeannotatedallvariantsusingANNOVAR(Wangetal2010)andVEP(Variant

Effect Predictor) (McClure et al 2014) software They use the Ensembl gene

annotation database (UMD31 reference sequence release 74) to investigate

locationandpotentialeffectoncodingsequencesofallvariantsVEPintegrates

the SIFT algorithm (Kumar et al 2009) to predict the effect of amino acid

substitutiononproteinfunctionForeachsequencechangesequencehomology

and physic=chemical similarity between the reference amino acid and the

alternative is considered providing a score that evaluates the mutation

tolerabilityVariantswithscorelowerthan005wereclassifiedasldquodeleteriousrdquo

inaccordingtosoftwarerecommendations

SIFTalsocomparedallvariants found in theexomeswith thosepresent in the

dbSNPdatabase(version140)toidentifythosethatwerenovel

32Step2Filteringofcandidatedeleteriousvariants

2a Analysis of deleterious variant distribution among animals having

extremesEBVvaluesformaleandfemalefertilitytraits

Association between variants and the classification in the tails of the EBV

distributionwas estimated byFisherexact testwhich compares frequencies of

alleles (a vs A) in the two tails not assumingH=W equilibriumgenotypic test

that compares the frequency of genotypes (aa vs Aa vs AA) and the recessive

geneactiontestwhichtestsforthedistributionofonehomozygousclassversus

theothers(aavsaAAA)AlltestsareimplementedinPLINKfunctionsThe5

significance threshold was set empirically by permutation (N=100000) either

point=wise that is not corrected for multiple testing (EMP1 = Empirical

Probability1)andthereafterfamily=wisecorrectedformultipletesting(EMP2=

EmpiricalProbability2)

109

GenesexceedingEMP1were considered suggestive and thoseexceedingEMP2

significant candidates to be deleterious recessives and their characteristics

conservationacrossspeciesandfunctionfurtherinvestigated

2b Quest for regions candidate to carry deleterious variants in the

populationanalysedwiththeHDBeadChip

Basic population genetics parameters and Multi Dimensional Scaling were

calculated with GenABEL R package (Aulchenko et al 2007) to evaluate the

presenceofcrypticstructure inthedatathatmayinfluencetheoutcomeofthe

downstreamanalyses

ExpectedandobservedallelicandgenotypicfrequenciesandFisherexacttestof

Hardy=Weinberg proportions (Janis E Wigginton 2005) were calculated with

PLINKon theHDSNPdatasetValueswere averaged in slidingwindowsof10

markersThisvalue(10markers)waschosenaftertestingdifferentwindowsize

as represent a good compromise between noise reduction and resolution (not

shown)

Candidate regions containing deleterious mutations were identified following

twodifferentapproaches

In one approach hereafter refereed to as ldquoOut of Hardy=Weinbergrdquo approach

(OHW)markerssignificantlydeviatingfromofHardy=WeinbergequilibriumatP

le84710=8 (nominalvaluePle005Bonferronicorrected formultiple testing)

werefirstidentifiedWindowswithatleastthreesignificantmarkersbelowthis

thresholdandhavingahomozygotedeficitwereconsideredcandidatestocarry

deleterious mutations In this OHW approach we applied very stringent

constraints to reduce false positive discoveries due to the Wahlund effect

inducedbypopulationsubstructure(seeresultsandFigureS5)

In thesecondapproachhereafterreferred toas ldquoLackOfHomozygotesrdquo (LOH)

approach the differences between observed and expected frequencies of

homozygotesfortheminorallelewerecomputedandaveragedacrossmarkers

included in each sliding window Values were standardized and only regions

carrying three overlapping windows below =3Z score value were considered

110

candidates to carry deleterious mutations Thereafter significant windows

carryingatleastonemarkerincommonwerejoinedtodefineregionboundaries

ExomesequencingknowledgeandHDBeadChipdatawerecrossed inorder to

identifygenescarryingseverevariantslocatedwithinsignificantregionsortheir

proximity (50Kb upstream or downstream) These genes were considered

candidates to be deleterious recessives and their characteristics conservation

acrossspeciesandfunctionwasfurtherinvestigated

33Step3Assessmentofcandidatedeleteriousvariants

Genes carrying variations classified as deleterious by SIFT or ANNOVAR and

eitherassuggestivebyfertilityEBVtailsanalysisorassignificantbyOHWeLOH

analyses of the Holstein bull population were submitted to comparative

genomics population genetics and functional analyses In order to prioritize

genesvariants discoveredwe scored each evaluation using a subjective scale

ranging from =3 to 3 to measuring the evidence in favour (scores 1 to 3) or

against(scores=1to=3)thedeleteriouseffectoftheSNPsThescore0inallcases

indicated the absence of evidence or that a specific assessment failed The

completescorescalethereforerangedfrom=9to9

3aComparativegenomics

For comparative genomics purposes the position of eachmutationwas ldquolifted

overrdquo from UMD31 bovine coordinates to hg19 human coordinates using the

specifictoolatUCSC(Kentetal2002=httpgenomeucscedu)Orthologous

proteins were aligned using the UCSC genome browser and amino acid

conservation assessed in a set of 100 vertebrate species that included human

primates (11 species) euarchontoglires (egmouse14 species) laurasiatheria

(egsheep25species)afrotheira(egelephant6species)othermammals(eg

platypus5species)birds(egchicken14species)sarcopterygii(egturtle8

species) fish (eg zebrafish 16 species) Results of amino acid conservation

analysiswererecordedonlyifhomologousproteinfromatleast50speciescould

beretrievedandalignedThereafteraminoacidconservationwasscored3ifthe

111

aminoacidwasconservedinallvertebrates2ifconservedinallmammalsand1

ifconservedinthelargemajority(atleast90)ofmammalsAscoreof=1was

given when the amino acid was not conserved among species and a score =3

when both variants observed in cattle were carried by other species Finally

wheneithertheliftoverprocessortheproteinalignmentfailedthecomparison

wasconsiderednoninformativeandgivenascore0

3bSearchofhaplotypesassociatedtodeleteriousvariantsandanalysisof

HDpopulationdata

To acquire further evidence we searched for the haplotypes that carry the

deleterious variations and estimated their frequency in the

homozygousheterozygousstatuswithinthebullpopulationgenotypedwiththe

HDBeadChip

Weusedamultistepapproachtofindhaplotypescarryingdeleteriousvariantsi)

from the phased HD BeadChip data we extracted the haplotypes of the 20

sequencedanimalsintheregionofthevariant(100markersoneachside)ii)in

these 20 animals we searched the shorter haplotype able to distinguish

homozygous andor heterozygous animals for the deleterious variant from all

other haplotypes iii) we assessed the frequency of homozygotes and

heterozygotes for thehaplotype in thewholepopulationofbulls characterised

withtheHDBeadChip

Scoresinthiscasewere=3inthecaseofuniquehaplotypesshowinganumberof

homozygotesnonsignificantlydifferentfromthenumberexpectedfromHardy=

Weinbergproportions3inthecaseofuniquehaplotypesshowinganumberof

homozygotes significantly lower than expected 1 in the case of unique

haplotypesneverhomozygousandrare0whenevernouniquehaplotypecould

befound

3cFunctionalanalyses

With PantherDB (Mi et al 2010) resources we classified our gene list into

functionalcategoriesaccordingtotheGeneOntologyclassificationAllsuggestive

112

andcandidateEnsemblGeneIDsubmittedtothebulkanalysiswererecognizedand annotated Beyond the GO term characterization in order to understandpotentialdetrimentaleffectsofmutationsandalsopinpointpotentialcandidatesfor fertility traits we retrieved information for each gene according to thefollowing parameters i) the involvement in known Mendelian or complexdiseases fromOMIMdatabase ii) theavailabilityofknock=outmice iii) and theeventual expression in reproductive tissues from the Gene Expression Atlas(httpwwwebiacukgxa)(Table4)

Alsoforthesedataweusedthesamescaleattributingarangefrom=3pointstogenes having no relation with fertility viability and genetic defects and norelevant phenotypic effects in knock=out mouse models to 3 to gene closelyrelated to fertility and viability associated to genetic defects in human andaffectingfertilityorsurvivalwhenknocked=outInparticularascoreranging=1to+1wasattributedtogenesassociatedtohumansyndromes(=1=noassociationrecordedinOMIMdatabase0=noinformationinOMIM1=associationinOMIM)A score with same range to information collected from knock=out mousephenotypes(=1=knock=outmouseshowsnoimpairedphenotype0=noknockoutmouse has been produced 1= knock=out mouse shows impaired phenotype)Thesamescorewasattributedtomolecularfunctions(=1=functionunrelatedtofertilityandviability0=functionindirectlyrelatedasimmunityandmetabolism1=functiondirectlyrelatedasfertilityanddevelopment)

Results

Exomecaptureandsequencing

A totalof225414probeswereused for capturing thecattleexome (5355Mbabout 2 of UMD31 reference genomes) Mitochondrial (MT) and Ychromosome were not included in probe design The number of probes perchromosomespannedfrom2500onBTA27toalmost12000inBTA3(TableS2)

113

The average exome sequencing depth was 4058 slightly lower than the

expected50XAllanimalshadatleast24Xaveragegenomessequencedbutina

same animal the sequencing depth varied across chromosomes and regions

withinchromosomesThelowestchromosomeaveragevalueobservedwas10X

inBTA25oftheshallowersequencedanimalAveragecoverageperanimaland

chromosomeandoverallmeansareshowninFiguresS1

In total1613461402 raw100bppair=ends sequence readswereproducedA

total of 1425514112 100bp reads were mapped and properly paired to the

reference genome Paired reads per animal ranged between 41973510 and

138840392

Insynthesissequencingdepthwasvariableasexpectedandalmostallanimals

hadsomeregionsofscarceornosequencedepthbutoverallallexomeregions

were sufficiently covered to permit a comprehensive analysis of molecular

variantsoftheanimalsinvestigated

Step1Searchforcandidatedeleteriousvariants

Variantdetectionandannotation

Thesequencingdepthofallvariantscalledonthealignmentofallreadsacross

animalsfollowedaChi=squaredistributionAveragedepthwas79694spanning

from2Xto5000Xwithpeakfrequencyclassat150=250X(FigureS2)

A multistep approach was followed to clean the exome sequence dataset

Cleaning process steps and the number onmarkers retained in each step are

describedinTable2

114

Table2Typeandnumberofmarkersdetectedandretainedafterdatasetcleaning

TypeofvariationNumberofvariations(N)

Rawdata

Aftervariantcleaning

Aftergenotypecleaning(onlyautosomal)

Complex 1213 963 0Deletion 6938 5031 0Insertion 5510 3656 0MultiallelicComplex 5710 3995 0MultiallelicSNP 803 512 0SNP 242988 203079 164277Total 263162 217236 164277whencomparedtothereferencegenome

Atotalof263162variantswereidentifiedAmongthese217236(8255)wereretainedafterapplyingthevariantfilters(seematerialsandmethodssection)

According to GATK classification the vast majority (9348) of variationsidentified were biallelic SNPs followed by insertions and deletions (InDels40) more complex variations and multiallelic SNPs (252) Indels andcomplexvariationsalthoughlessfrequentthanSNPsaffectedarelevantportionoftheexome(49589bp)Thenumberofvariantsdetectedperchromosomewassignificantly correlated to the number of probes used (r=08761P=228x10=10)andofbasescaptured(r=08424P=53x10=19)perchromosomehoweverBTA23BTA27 BTAX had an outlier behaviour the former two containing highervariation(450and445variantsperKb)andthelatterlower(084variantsperKb)thantheaverage(322variantsperKb)

Bialleic SNPswere further filtered for genotype quality (sequence depth gt 5Xandsequencequalitygt20)andBTAXmarkersThefinalautosomalSNPdatasetcounted164277SNPsAmongthese17829biallelicSNPs(109)describedforthefirsttimehavebeensubmittedtodbSNP

Variantsrsquo annotation is shown in Table 3a Variantswere detected in differentgenedomains(exonsintrons3rsquountranslatedregions)aswellasupstreamanddownstreamgenesMorethan31ofthevariationsweredetectedinexonsandamong them more than 41 have the potential to have an effect on genefunctionbyaffectingsplicingoralterproteinstructure(Table3b)

115

Table 3 A) number of autosomal SNPs annotated in each gene region B) number and

classificationofautosomalSNPsalteringgenefunction

A)

Generegion NofSNPs(autosomes)Downstream 1170Exonic 51293Intergenic 49901Intronic 58116ncRNA_exonic 160ncRNA_intronic 3ncRNA_splicing 2Splicing 184Upstream 1445Upstreamdownstream 54UTR3 1345UTR5 604Total 164277

B)

EffectofexonicSNPs NofSNPs(autosomes)Nonsynonymoussinglenucleotidevariation 20839Stopgainsinglenucleotidevariation 213Stoplosssinglenucleotidevariation 10Synonymoussinglenucleotidevariation 30226Unknown 5Total 51293

whencomparedtothereferencegenome

Amongthe51293exonicSNPs4166in2978geneswereclassifiedldquodeleteriousrdquo

by SIFT Among these 3210 in 2356 genes were observed at least twice

ANNOVARidentified223stopgain(213)orloss(10)variantsin217genes

Theaverageconcordanceat24390SNPs in commonbetweenexomesand the

HD Beadchip was 9716 (Table S3) Discordance was due to either alleles

detected by sequencing andnot detected by genotyping (exome=heterozygote

Beadchip=homozygote) and vice=versa (exome=homozygote

Beadchip=heterozygote) Discordances were randomly distributed among

markers and inversely correlated to sequencing depth (Figure S3) indicating

that most errors are likely occurring during sequencing even if occasional

116

Beadchip failure indetectingsomeallelecannotbeexcludedTwoanimalshad

anoutlierbehaviouronehaving846andthesecond719concordanceBoth

animalswereretainedforvariantdiscoverywhiletheywereexcludedfromthe

analysesofEBVtailsWithoutthesetwooutlieranimalsthemeanconcordance

increasesto9938

Multidimensional scaling confirmed the absence of a substructure in animals

sequenced(FigureS4)

Step2Filteringofcandidatedeleteriousvariants

AnalysisofEBVtails

Fisherexact test isgenerallyused to test for theassociationbetweenavariant

and a Mendelian disease Samples are divided in cases and controls and the

association between alleles or genotypes and the disease is tested under

different inheritance models Here we used the same method to test for

association between allelesgenotypes and animals from opposite tails of the

EBVdistributionfori)malefertility(N=4high+4low)ii)femalefertility(N=5

high+4low)andiii)maleandfemalefertility(N=9high+8low)

Genotypes were tested under all inheritance models implemented in PLINK

Giventhelownumberofsamplesnovariantwassignificantaftercorrectionfor

multiple comparisons (EMP2) and only suggestive variants could be found

(significant at 5 at EMP1)We considered suggestive of carrying deleterious

allelesalistof101genescarrying112variants(TableS4)

Detectionofregionscarryingdeleteriousmutationfrom800KSNPdata

After applying quality control parameters 26653 and 146991 markers were

removed from the original dataset for excessive missing data and MAF

respectivelyInaddition17individualsremovedforlowgenotypingrateAswith

exomes only autosomal SNPswere retained for the analyses thus obtaining a

workingdatasetof590680SNPsand992animals

117

MDS(FigureS5)analysis indicates that thedatasethassomesubstructure that

mightaffectdownstreamanalysesWeaccountedfortheriskoffindingmarkers

outofHardy=WeinbergproportionsduetotheWahlundeffectbychoosingvery

stringentsignificancethresholdintheOHWanalyses

To detect regions candidate to carry deleterious recessivemutations we used

two approaches the ldquoOutside Hardy Weinbergrdquo (OHW) and the ldquoLack Of

Homozygotesrdquo (LOH) approaches (see materials and methods) The two

approaches complement each other in the detection of regions carrying less

homozygotes than expected OHW identifies regions in which a few markers

have extreme homozygote deficit while LOH detects regions in which many

markershaveamoderatedeficit

In the SNP dataset 1884 markers were significantly out of Hardy=Weinberg

equilibrium(Ple84710=8)Atotalof485windowswereidentifiedwithatleast

3 markers out of 10 exceeding this threshold Among these 84 contained

markerssignificantbecauseofhomozygotedeficitThese84windowsidentified

11OHWcandidateregionsin9chromosomes(TableS5)

TheLOHapproachidentified2335windowsbelow=3Zscoreofthestandardised

differencebetweenobservedandexpected frequencyofhomozygotes forMAF

Theseidentified326LOHcandidateregionsspreadonallautosomes(TableS6)

Onlyregionsformedbyatleast3singlewindowswereretainedreducingLOH

candidateregionsto240

Deleterious variants (SIFT deleterious and stop codon gainedlost) were

intersectedwithsignificantOHWandLOHregionSevenofthesemappedwithin

OHW sliding windows 36 within LOH sliding windows Among these 5 were

located in regions significant in both OHW and LOH Other 51 deleterious

variants mapped in the immediate vicinity (+=50Kb) of significant windows

These 83 variants affect 66 unique genes candidate to carry deleterious

recessivealleles(TableS7)

Five regionswere common toOHWandLOHoneonBTA4 (LOH=74959610=

75151417 OHW=74970653=75151417) two on BTA7 (LOH=9626435=

10099512 OHW=9628735=10099512 LOH=10575691=11171132

118

OHW=10486424=11087441) one on BTA15 (LOH=80299924=80928311

OHW=80299924=80921274) and one on BTA18 (LOH=28122373=

28185525OHW=28106022=28164693)

Step3Assessingcandidatedeleteriousvariants

After the two filtering step previously described a total of 191 SNPs were

retained out of the 3433 classified as deleterious by SIFT and Annovar and

analysed further These are carried by 163 genes Eighteen genes carried 2

candidateSNPsthreegenes3SNPsandonegene5SNPsInaddition4SNPsin

four geneswere identified as potential candidates in both the analysis of EBV

tailsandofbullpopulation

A totalof12SNPs introducedstopcodonswhile the remaining inducedamino

acid substitutionsor splicing variants In the filtereddataset theproportionof

stop codons on total variation (63) was only slightly higher than the

proportion found in the entire dataset (51 225 stop codon out of 4391

deleteriousSNPs)

Comparativegenomics

Proteinmultiplealignmentwassuccessfulin146outof191comparisonsIn26

casesthemostfrequentaminoacidincattlewasveryhighlyconservedeitherin

allvertebrates(N=9)orinallmammals(N=17)Inother23casestheaminoacid

was conserved in at least 90 of mammalian species In the last class were

generally included amino acids that changed in mammalian species

phylogenetically verydistant from cattle (eg platypus and related species) In

97 instances the amino acid could change quite freely and occasionally both

variantsobservedincattlewerecarriedbyotherspecies

HaplotypeanalysisofHDpopulationdata

The search for haplotypes associated to the191 variants for further testing in

the population identified 101 unique and invariant haplotypes common to all

119

carriersofatargetvariantanddifferentfromallthosefoundinnoncarriersIn

additional36casesanoncompletelyinvarianthaplotypecouldbeassociatedto

themutationConverselyin54instancesareliablehaplotypecouldnotbefound

(table S8) In total 38 haplotypes had no homozygous individuals in the

population Thiswas expected in the case of 33 rather rare haplotypes (fewer

than 100 heterozygotes observed in the population) Conversely 5 haplotypes

were rather frequent (from 170 to almost 300 heterozygotes observed in the

population)andtheexpectationwastofind78to222homozygousindividuals

in thepopulation insteadof thenone foundAnothervarianthadaremarkable

high frequency (35) and a clear outlier behaviour Only 11 homozygous

individualswereobservedoutofthe121expected

Functionalanalysisofcandidategenes

To assess the potential deleterious effects of the variants identified we also

retrieved functional information for each gene according to the following

parametersi)associationtoknownMendelianorcomplexdiseaseslistedinthe

OMIM database ii) availability and effect of gene knock=out in mice iii)

expression profileswith particular attention in reproductive tissues from the

GeneExpressionAtlas(Table4)

Twenty genes out of 163 have a recognized phenotype in humans These are

mostlyseveresyndromesUnfortunatelyonlyafewgeneshaveanull=miceline

producedwithphenotypicdataavailable

According to the expression data available from Ensembl these genes are

generally (with someexceptions)expressed inavarietyof tissueswitha clear

tendencytobehighlyexpressedintestis

120

Tableamp4Informationon163genesanalyzedconcerningeventualphenotypesinmanorknockoutsinmouseHeaderampcodesAbrainBcolonCheartDkidney

EliverFlungGskeletalmuscleHspleenItestisColoursrepresentthegenelevelofexpressionindifferenttissuefromlow(whitegrey)tohigh(blue)NCin

FunctioncolumnmeansldquoNotClassifiedrdquo

Geneamp

Geneampnam

eampHum

anampdefectamp

Knock5OutsampMouseamp

MouseampPhenotypeamp

AampBamp

CampDamp

EampFamp

GampHamp

IampFunctionamp

ENSBTAG00000000079

CCSAP

Cells

9

1

03

1

07

2

06

2

2

NC

ENSBTAG00000000149

USP40

Cells

6

7

2

7

5

7

2

8

9

fertility

ENSBTAG00000000414

FUT5

Nomatch

37

1

cellular

process

ENSBTAG00000000918

ERMARD

Periventricular

nodularheterotopia

6Xlinked

periventricular

heterotopia6q

terminaldeletion

syndrome

Periventricular

nodularheterotopia

Nomatch

5

4

4

9

6

7

4

9

17

cellular

process

ENSBTAG00000000998

SLC39A6

Cells

6

9

2

7

4

10

1

10

9

cellular

process

ENSBTAG00000001133

VWA8

Mice

2

3

6

7

10

3

3

4

4

blood

metabolism

ENSBTAG00000001140

IAH1

httpswwwmousephenotypeor

gdatagenesMGI1914982sect

ionassociations

BOTHSEXadiposetissue

behaviourneurological

growthsizebodyFEMALE

homeostasismetabolismMALE

limbsdigitstailskeleton

9

18

30

60

23

19

12

17

30

cellular

process

ENSBTAG00000001262

IRGQ

Cells

6

1

5

3

06

2

5

04

2

immunity

ENSBTAG00000001286

ELMO3

httpswwwmousephenotypeor

gdatagenesMGI2679007sect

ionassociations

Decreasedstartlereflex

02

27

14

7

7

02

1

fertility

ENSBTAG00000001291

OR8H3

Nomatch

fertility

ENSBTAG00000001500

FIGNL1

Cells

08

2

01

05

07

08

04

2

3

cellular

process

ENSBTAG00000001505

GRK4

Cells

10

7

09

4

6

3

2

6

218

signalling

ENSBTAG00000002220

SPTLC1

Hereditarysensory

andautonomic

neuropathytype1

No

17

27

3

28

12

28

5

19

6

metabolism

ENSBTAG00000002288

NT5DC4

Nomatch

02

cellular

process

ENSBTAG00000002289

NLRP14

No

04

cellular

process

ENSBTAG00000002660

CCDC11

Situsinversustotalis

Nomatch

3

1

06

2

07

3

03

2

54

fertility

121

Geneamp

Geneampnam

eampHum

anampdefectamp

Knock5OutsampMouseamp

MouseampPhenotypeamp

AampBamp

CampDamp

EampFamp

GampHamp

IampFunctionamp

ENSBTAG00000002809

CAPN10

AutosomalRecessive

MentalRetardation

DIABETESMELLITUS

NONINSULIN

DEPEND

ENT1

Cells

2

13

41

33

29

immunity

ENSBTAG00000002966

DNAJC13

Youngadultonset

Parkinsonism

No

5

52

94

62

84

fertility

ENSBTAG00000003231

LIM2

Pulverulentcataract

Cells

01

02

01

02

cellular

process

ENSBTAG00000003508

CFAP69

httpswwwmousephenotypeor

gdatagenesMGI2443778sect

ionassociations

Abnormalcirculatinginsulinlevel

MaleInfertilityDecreasedmean

plateletvolum

eIncreasedleanbody

mass

203

07

1

01

7NC

ENSBTAG00000003523

UGT2B15

Nomatch

34

metabolism

ENSBTAG00000004081

FAT3

No

2

03

01

01

05

signalling

ENSBTAG00000004555

LRP2

DONN

AIBARROW

SYND

ROME

Intellectualdisability

Cells

56

3

01

disease

ENSBTAG00000004772

THEM

4

Cells

8

16

295

27

52

63

metabolism

ENSBTAG00000004860

SLC27A6

Cells

6

02

10

602

207

01

immunity

ENSBTAG00000005021

SEMA5A

Monosom

y5p

Cells

9

04

89

13

08

08

6immunity

ENSBTAG00000005170

GPR114

Mice

101

02

01

09

6

01

fertility

ENSBTAG00000005248

OFCC1

No

01

01

03

01

01

01

03

NC

ENSBTAG00000005340

SERGEF

Cells

7

10

35

15

58

23

cellular

process

ENSBTAG00000005357

VPS11

Mice

14

13

619

717

815

15

cellular

process

ENSBTAG00000005466

C8H8orf58

Nomatch

05

01

08

02

103

02

04

NC

ENSBTAG00000005514

TRIO

Cells

17

66

10

210

88

7immunity

ENSBTAG00000005633

RGNEF

httpswwwmousephenotypeor

gdatagenesMGI1346016sect

ionassociations

Vertebralfusion

36

214

05

801

08

1cellular

process

ENSBTAG00000006021

CEP250

Mice

3

22

43

41

88

metabolism

ENSBTAG00000006027

USP34

Mice

11

63

74

88

912

fertility

ENSBTAG00000006532

CDK5RAP2

Autosomalrecessive

primary

microcephaly

httpswwwmousephenotypeor

gdatagenesMGI2384875sect

ionassociations

skeletonimmunesystem

growthsizebodyhem

atopoietic

system

5

34

92

74

784

developm

ent

ENSBTAG00000007001

SLC26A1

No

03

01

02

32

903

01

07

metabolism

ENSBTAG00000007340

LOC101906349

Nomatch

NC

ENSBTAG00000007568

MEPE

Cells

cellular

process

ENSBTAG00000007910

EXOC7

Cells

13

24

23

22

12

22

15

22

28

cellular

process

122

Geneamp

Geneampnam

eampHum

anampdefectamp

Knock5OutsampMouseamp

MouseampPhenotypeamp

AampBamp

CampDamp

EampFamp

GampHamp

IampFunctionamp

ENSBTAG00000008650

CAMK1D

Cells

24

2

01

06

1

2

03

7

3

signalling

ENSBTAG00000008797

Uncharacterized

Nomatch

09

04

07

04

NC

ENSBTAG00000008871

DDX4

Cells

02

01

01

132

cellular

process

ENSBTAG00000008996

SFI1

Mice

2

1

06

3

04

3

2

5

NC

ENSBTAG00000009014

UPK1B

Mice

2

2

2

5

4

4

2

7

3

fertility

ENSBTAG00000009030

NOX5

Nomatch

04

01

8

01

immunity

ENSBTAG00000009033

TEX37

httpswwwmousephenotypeor

gdatagenesMGI1921471sect

ionassociations

Abnormalstartlereflex

03

83

NC

ENSBTAG00000009064

CSF2RB

Congenitalpulmonary

alveolarproteinosis

Mice

01

2

04

06

2

23

03

17

05

signalling

ENSBTAG00000009144

BPIFA2A

Nomatch

6

06

05

metabolism

ENSBTAG00000009337

PNMAL1

Cells

8

05

01

3

02

03

03

04

3

Immunity

ENSBTAG00000009444

SLC15A5

Cells

01

cellular

process

ENSBTAG00000009568

KANK2

Cells

1

17

30

9

3

31

9

10

3

cellular

process

ENSBTAG00000009569

DOCK6

AdamsOliver

syndrome

Cells

2

5

10

12

5

15

5

6

21

metabolism

ENSBTAG00000009836

CHGA

Mice

140

555

01

2

02

01

07

1

metabolism

ENSBTAG00000010522

Uncharacterized

Nomatch

01

02

02

03

2

metabolism

ENSBTAG00000010601

DOLK

Familialisolated

dilated

cardiomyopathy

No

4

8

6

22

9

6

5

5

12

metabolism

ENSBTAG00000010847

FOXN4

Cells

01

01

metabolism

ENSBTAG00000010954

ART3

Cells

02

1

55

01

08

2

10

06

71

metabolism

ENSBTAG00000011011

SSH2

No

3

09

5

4

2

2

6

7

10

cellular

process

ENSBTAG00000011387

NAT9

Cells

6

7

3

8

6

8

4

6

6

metabolism

ENSBTAG00000011420

CA9

Cells

04

3

02

2

03

1

01

2

56

metabolism

ENSBTAG00000011433

RGP1

httpswwwmousephenotypeor

gdatagenesMGI1915956sect

ionassociations

Completepreweaninglethality

7

19

11

18

8

12

9

8

25

cellular

process

ENSBTAG00000011622

C18orf21

Mice

9

10

6

6

2

10

5

10

34

NC

ENSBTAG00000011683

NUMB

Cells

21

25

8

35

17

47

6

8

19

immunity

ENSBTAG00000011684

RAB11FIP1

Cells

6

01

8

09

11

02

1

4

NC

ENSBTAG00000012053

ACCSL

No

03

01

cellular

process

ENSBTAG00000012314

LDLR

Homozygousfamilial

hypercholesterolemia

Cells

8

33

11

21

17

24

2

15

2

cellular

process

123

Geneamp

Geneampnam

eampHum

anampdefectamp

Knock5OutsampMouseamp

MouseampPhenotypeamp

AampBamp

CampDamp

EampFamp

GampHamp

IampFunctionamp

ENSBTAG00000012326

LOC101906218

Nomatch

11

01

01

09

01

NC

ENSBTAG00000012458

PSAPL1

Cells

01

NC

ENSBTAG00000012565

PCNX

No

3

32

32

33

48

developm

ent

ENSBTAG00000012833

ATG2B

No

4

21

21

21

29

NC

ENSBTAG00000012921

KIF24

Mice

05

07

03

08

02

202

214

metabolism

ENSBTAG00000013568

UEVLD

Cells

5

41

53

42

214

metabolism

ENSBTAG00000013685

KRT31

Mice

cellular

process

ENSBTAG00000013753

DDRGK1

Mice

22

32

18

56

55

25

14

22

37

NC

ENSBTAG00000013789

OR6K6

No

NC

ENSBTAG00000013838

ALLC

Mice

02

01

78

metabolism

ENSBTAG00000013916

HERC2

Mentalretardation

autosomalrecessive

38SKINHAIREYE

PIGM

ENTATION

VARIATIONIN1

Developm

entaldelay

withautismspectrum

disorderandgait

instability

No

8

74

83

74

65

metabolism

ENSBTAG00000014001

MAP1A

Cells

136

23

22

202

43

352

cellular

process

ENSBTAG00000014058

LDB2

No

28

33

12

825

410

3cellular

process

ENSBTAG00000014284

ALPK2

No

01

10

01

304

02

cellular

process

ENSBTAG00000014750

EPB41L4A

httpswwwmousephenotypeor

gdatagenesMGI103007secti

onassociations

Decreasedcirculatingglucoselevel

Abnormalconeelectrophysiology

Immunesystem

s1

22

20

07

12

13

43

NC

ENSBTAG00000014752

AKAP11

httpswwwmousephenotypeor

gdatagenesMGI2684060sect

ionassociations

Nopheno

24

72

63

306

68

cellular

process

ENSBTAG00000014768

ZNF786

Cells

3

23

42

48

33

cellular

process

ENSBTAG00000014794

SCYL3

Cells

6

10

59

613

516

26

cellular

process

ENSBTAG00000015027

Clusterin

httpswwwmousephenotypeor

gdatagenesMGI88423sectio

nassociations

Aggressionvertebralfusion

05

03

03

02

05

3NC

124

Geneamp

Geneampnam

eampHum

anampdefectamp

Knock5OutsampMouseamp

MouseampPhenotypeamp

AampBamp

CampDamp

EampFamp

GampHamp

IampFunctionamp

ENSBTAG00000015210

LOXHD1

DEAFNESS

AUTOSOMAL

RECESSIVE77

Autosomalrecessive

nonsyndromic

sensorineuDominant

lateonsetFuchs

cornealdystrophy

deafnessautosomal

recessivetype77

No

3

03

NC

ENSBTAG00000015294

COX10

Leighsyndrome

Mitochondrial

complexIVdeficiency

(MTC4D)

Cells

4

6

24

5

2

3

14

3

16

cellular

process

ENSBTAG00000015335

BAI3

No

38

03

8

01

01

2

signalling

ENSBTAG00000015490

HS1BP3

Cells

1

10

5

9

4

8

2

1

37

cellular

process

ENSBTAG00000015602

C7H5orf45

Nomatch

10

cellular

process

ENSBTAG00000015839

MAP4

No

27

85

104

35

11

100

82

46

61

cellular

process

ENSBTAG00000015912

DMKN

No

01

04

03

NC

ENSBTAG00000015980

FASN

AutosomalRecessive

MentalRetardation

Cells

19

29

7

12

6

39

6

7

22

metabolism

ENSBTAG00000015985

GPR20

httpswwwmousephenotypeor

gdatagenesMGI2441803sect

ionassociations

Anatomicaldefectsvarioustissues

03

02

08

01

09

signalling

ENSBTAG00000016495

LINS

AutosomalRecessive

MentalRetardation

Cells

6

3

1

5

3

3

1

5

12

NC

ENSBTAG00000016691

LACC1

Mice

2

9

1

9

9

8

08

7

27

NC

ENSBTAG00000017096

ERICH1

Cells

6

4

2

4

2

5

2

5

13

metabolism

ENSBTAG00000017165

MATN2

No

2

7

2

20

08

3

2

2

7

cellular

process

ENSBTAG00000017418

RRP1B

Cells

6

4

4

5

3

7

3

12

159

cellular

process

ENSBTAG00000017436

ALS2CR11

Cells

01

37

disease

ENSBTAG00000017505

PAXIP1

Cells

2

3

2

4

2

4

2

3

12

signalling

ENSBTAG00000017576

GPR144

Nomatch

02

signalling

ENSBTAG00000018528

USP45

No

1

1

04

1

05

06

02

2

04

cellular

process

ENSBTAG00000018810

THBS2

INTERVERTEBRAL

DISCDISEASE

Cells

1

2

04

09

1

09

04

1

2

fertility

ENSBTAG00000019072

PSD4

Cells

7

8

1

1

7

02

6

4

cellular

process

125

Geneamp

Geneampnam

eampHum

anampdefectamp

Knock5OutsampMouseamp

MouseampPhenotypeamp

AampBamp

CampDamp

EampFamp

GampHamp

IampFunctionamp

ENSBTAG00000019265

AGK

Sengerssyndrom

e

Congenitalcataract

hypertrophic

cardiomyopathy

mitochondrial

myopathy

Cells

7

36

43

52

65

metabolism

ENSBTAG00000019752

BPIFA2B

Nomatch

01

1metabolism

ENSBTAG00000019799

FCAM

R

Cells

07

01

02

01

2

7immunity

ENSBTAG00000019961

ATP4B

Cells

12

03

1

05

cellular

process

ENSBTAG00000020414

DHX37

No

3

33

73

73

413

cellular

process

ENSBTAG00000021073

KIAA1549

PILOCYTIC

ASTROCYTOM

ASOMATIC

httpswwwmousephenotypeor

gdatagenesMGI2669829sect

ionassociations

behaviourneurological

homeostasism

etabolism

hematopoieticsystem

3

04

23

01

21

04

3NC

ENSBTAG00000021233

OR5B17

No

fertility

ENSBTAG00000021375

OR10Q1

No

fertility

ENSBTAG00000021643

EFCAB12

Cells

01

1

9cellular

process

ENSBTAG00000021645

MBD4

Cells

8

43

82

51

10

6cellular

process

ENSBTAG00000022509

DNAH

9

No

02

5

05

5metabolism

ENSBTAG00000022858

LOC783561

Nomatch

fertility

ENSBTAG00000026247

PKHD1L1

Cells

02

01

cellular

process

ENSBTAG00000026323

LYSB

Nomatch

68

metabolism

ENSBTAG00000026350

SPATC1

Cells

05

06

01

01

01

39

NC

ENSBTAG00000027390

RTFDC1

Mice

28

19

21

20

922

13

20

49

NC

ENSBTAG00000031115

Uncharacterized

Nomatch

01

cellular

process

ENSBTAG00000031718

OGFR

Mice

6

17

612

517

39

1fertility

ENSBTAG00000032264

Uncharacterized

Nomatch

52

4

22

NC

ENSBTAG00000032481

DAPL1

No

1

6

07

01

13

10

cellular

process

126

Geneamp

Geneampnam

eampHum

anampdefectamp

Knock5OutsampMouseamp

MouseampPhenotypeamp

AampBamp

CampDamp

EampFamp

GampHamp

IampFunctionamp

ENSBTAG00000032527

ERCC6

CEREBROOCULOFACI

OSKELETAL

SYND

ROME1

COCKAYNE

SYND

ROMEBDe

SanctisCacchione

syndromeMACULAR

DEGENERATION

AGERELATED5UV

SENSITIVE

SYND

ROME1COFS

syndromeCockayne

syndrometype1

Cockaynesyndrome

type2Cockayne

syndrometype3UV

sensitivesyndrome

Cockaynesyndrome

typeB(CSB)De

SanctisCacchione

syndrome(DSC)UV

sensitivesyndrome

(UVS)cerebrooculo

facioskeletal

syndrometype1

(COFS1)

Mice

3

205

22

207

42

cellular

process

ENSBTAG00000032821

SCEL

No

01

02

17

04

NC

ENSBTAG00000033954

PRPS1L1

No

35

metabolism

ENSBTAG00000034991

SLX4IP

Cells

6

21

32

207

33

NC

ENSBTAG00000035226

TOR1AIP1

Cells

9

12

513

816

716

83

NC

ENSBTAG00000035309

OR4F15

No

fertility

ENSBTAG00000035708

LOC527795

Nomatch

01

01

23

metabolism

ENSBTAG00000035985

LOC101909743

OR8K1

No

fertility

ENSBTAG00000036019

RIPPLY3

httpswwwmousephenotypeor

gdatagenesMGI2181192sect

ionassociations

hemopoieticsystem

01

04

01

9

01

malformation

ENSBTAG00000037399

Uncharacterized

Nomatch

2

92

4

24

05

5signalling

ENSBTAG00000038606

DBDR

No

02

01

signalling

ENSBTAG00000038831

PHYHD1

No

2

18

21

23

30

18

16

15

1metabolism

ENSBTAG00000039110

OR8U1

No

fertility

ENSBTAG00000039117

FAM179B

Cells

14

206

31

308

33

cellular

process

ENSBTAG00000039520

SIRPB1

Cells

4

08

32

516

05

12

01

signalling

127

Geneamp

Geneampnam

eampHum

anampdefectamp

Knock5OutsampMouseamp

MouseampPhenotypeamp

AampBamp

CampDamp

EampFamp

GampHamp

IampFunctionamp

ENSBTAG00000040568

Uncharacterized

Nomatch

4

307

54

33

25

cellular

process

ENSBTAG00000045558

LOC100299084

Nomatch

1

22

110

10

09

52

fertility

ENSBTAG00000045588

LOC100298356

Nomatch

metabolism

ENSBTAG00000045709

LOC514057

Nomatch

fertility

ENSBTAG00000046175

Uncharacterized

Nomatch

01

01

cellular

process

ENSBTAG00000046228

Olfactory

receptor

Nomatch

fertility

ENSBTAG00000046239

C10orf12

No

03

104

07

108

06

209

cellular

process

ENSBTAG00000046285

OR8I2

No

fertility

ENSBTAG00000046360

Uncharacterized

Nomatch

NC

ENSBTAG00000046423

LOC618007

Nomatch

NC

ENSBTAG00000046527

LOC100139830

Nomatch

fertility

ENSBTAG00000046590

FAM181A

Cells

09

44

NC

ENSBTAG00000046593

Uncharacterized

Nomatch

20

metabolism

ENSBTAG00000046727

Olfactory

receptor

Nomatch

04

fertility

ENSBTAG00000047150

RTEL

Cells

4

23

64

55

616

cellular

process

ENSBTAG00000047166

SH2D7

Mice

03

101

01

1

02

02

2immunity

ENSBTAG00000047174

Uncharacterized

Nomatch

03

7105

02

01

271

05

03

NC

ENSBTAG00000047317

CL43

Nomatch

01

120

01

immunity

ENSBTAG00000047587

Uncharacterized

Nomatch

22

09

09

35

03

4

06

2cellular

process

ENSBTAG00000047661

FABP12

httpswwwmousephenotypeor

gdatagenesMGI1922747sect

ionassociations

Growthsizebody

01

02

69

metabolism

ENSBTAG00000048021

PNMAL2

No

04

03

2

03

03

02

NC

ENSBTAG00000048153

Uncharacterized

Nomatch

2

05

01

02

01

cellular

process

128

Discussion(

Deleterious recessive variants arenaturally present in outbred animal species

Theydonotcreateproblemsaslongastheyremainatlowfrequencyinrandom

mating populations Under these conditions these variants remain in the

heterozygousstateanddonotproducephenotypiceffectsHoweverassoonas

their frequency increases andmatingoccursbetween related individuals their

deleteriouseffectsbecomeapparentandhighlyaffectthewelfareandviabilityof

animalscarryingtheminthehomozygousstate

In industrialdairy cattlebreedscrossbreeding isgenerallyavoidedandbreeds

areinpracticegeneticisolatesinwhichartificialinsemination(AI)iswidelyused

for rapidly spreading thegeneticprogress in thepopulationAsa consequence

elitesireshavemanythousandandsometimeshundredsofthousanddaughters

Under these conditions mating between relatives becomes practically

unavoidable in the longterm In fact inspiteof theclosecontrolof inbreeding

andofthematingsystemgeneticdefectsregularlyappeartothesurfaceSome

examples are the Complex Vertebral Malformation (CVM) and the Bovine

Leucocyte Adhesion Deficiency (BLAD) in Holstein (Schuumltz et al 2008) and

WeaverdiseaseinBrown(McClureetal2013)Inthesecasestheeffectsofthe

recessivedeleteriousallelesareclearandmoleculardiagnostictoolshavebeen

developedtoavoidtheirfurtherspreadingandtoinitiatetheeradicationprocess

However recessive deleterious alleles may produce their effects in a more

deceitful way for example by affecting fertility inducing early lethality or

influence disease susceptibility All these effects are difficult to trackwith the

current methods of phenotype measurements but have an impact on the

economyofthelivestocksectorandonanimalwelfare

InthispaperwesoughtfordeleteriousrecessiveallelesinItalianHolsteincattle

This breed is recently experiencing a decrease in the genetic trend of fertility

traits (Pryce et al 2014) therefore we started our search from the exome

sequences of 20 sires 19 of which having extreme EBV values for male and

femalefertility

130

The choiceof sequencingexomeswasa compromisebetween completenessof

informationaccuracyofvariantandgenotypedetectionandcostExomesclearly

give incomplete information since they miss all variation in regulatory sites

outside gene boundaries (ENCODE Consortium 2012) These may be very

relevantforthecontroloftraitvariationbuttheyarestillverypoorlyannotated

in livestock On the other hand sequencing exomes yields millions instead of

billionsbasepairsadeepersequencingoftargetregionsandaconsequentmore

reliablevariantandgenotypecalling(FigureS1andS2)Theanalysisofexome

data is also less demanding in terms of computer time (thousands instead of

millions variants are to be analysed) and easier in terms of biological

interpretation Exome sequencing revealed to be very effective in the

identificationofanumberofmutationsaffectinghumanhealth(Bamshadetal

2011 Yang et al 2013) All thesewere defects caused bymutations in single

genes producing clear phenotypic effects havingMendelian inheritance These

conditionsdonotholdinourinvestigationthereforewehadtodevisestrategies

to integrate exome data with other information collected from the Holstein

populationandotherspecies

Theoverallprocedureforidentifythecandidatedeleteriousrecessivewasbased

on the rationale that deleterious recessive alleles in coding sequences are

expectedtobei)causedbyseverealterationoftheproteinsencodedbygenes

ii)mostlyifnotexclusivelycarriedattheheterozygousstateinthepopulation

iii)unevenlydistributedinanimalsofhighandlowfertilityiv)likelyconserved

indifferentspeciesbecauseofselectionconstraintv)likelyassociatedtogenetic

defectsinanyspecieswhenseverelydamaged

A stepwise procedure was set up for reducing candidates to a number

manageable for functional analysis (Figure1) In the first stepmutationswere

discoveredinthesecondfilteredandinthirdassessed

Thisprocedureidentifiesonlysomeofthedeleteriousrecessivesexistinginthe

populationanalysedSincewesequencedonlyexomesandnotwholegenomes

ouranalysesislimitedtocodingsequencesandatthemomenttoSNPvariation

Wethereforemissregulatoryvariationsoutsidegenesand theeffectsof indels

andmorecomplexvariations

131

ThenumberofanimalssequencedislimitedDeleteriousrecessivesareexpected

toberareinthepopulationandwehavesampledonly20animalsfromasubset

ofsiresauthorizedtoAI inItalyAlthoughwehavetakenamongthemthose in

theminus tail of fertility trait thatmighthave ahigher genetic load itrsquos likely

thatwehavemissedothervariantsexistinginthepopulationOntheotherhand

variantscarriedbyAIauthorizedbullsaretheonesthatwillhavealargerimpact

onthepopulationifwidelyspreadbyelitebullsandthereforetheyarethemost

interestingfromabreedingpointofview

Most deleterious variants are expected to be very rare and therefore not

detected by the approach here adopted or detected and afterwards discarded

duringthefilteringprocesses

Sequencing is prone to errors Some real variants may have been discarded

duringQCofsequencedataduetopoorsequencequalityorpoorcoverageofthe

region

Someofthevariationsclassifiedasldquonondeleteriousrdquobythesoftwareusedmay

indeedhave an effect In fact synonymousmutation arenon`neutralwhenever

thedifferenttRNAsforasameaminoacidhavedifferentrelativeabundanceand

translationrates(Goymer2007)

1(Searching(candidate(deleterious(variants(in(exomes(

ExomesproducedafairamountofdataAlltogether5355Mbweresequenced

These represent 9998 of the exome sequences targeted by probes Among

theseonly35hadasequencingdepthacrossanimalslt60XSequencingdepth

is a key factor for variant detection and even more so when the genotype of

sequencedanimalsistobeinferredInvestigatingtheconcordancebetweenthe

genotypes detected at more than 20K SNP by both Beadchip and exome

sequencingwe observed a clear inverse relationship between sequence depth

andnumberofdiscordances(TableS3andFigureS3)Theaveragesequencing

depth expected (50X) and realised in practice (4058) in this investigation

yieldedratheraccurategenotypessincetheaverageconcordanceobservedwas

over97and increased toover99whentwooutlieranimalswereremoved

132

from the dataset The reason for the outlier behaviour of these samples waslikely a suboptimal quality of DNA submitted to sequencing Overall almost215000 variants passed quality controls and among them 192440 autosomalSNPs have been investigated in detail As found in other investigations mostvariants were rare (Peloso et al 2014) For example the MAF distributionobservedin46904SNPsclassifiedneutralhasameanof0197andamedianof0167values(FigureS6)

ThefirststepinourresearchwastoidentifySNPscausingseverealterationsinthe proteins coded by genes These were relevant changes in conformationalterationsofsplicingsitesortruncationsofproteinsToaccomplishthistaskwereliedontwosoftwareSIFTandANNOVARthatidentified4902suchvariationsin3423genes Interestingly the subsetofdeleteriousvariantshas significantlylowermean (0156)andmedian (010) compared toneutral variations (FigureS6) This subset is therefore at least enriched in variants under purifyingselectionhavingadeleteriouseffectatthepopulationlevel

Thenumberofdeleteriousallelesdetectedishighanditisunlikelytheyallhavea relevant phenotypic effect even if most variants are rare In fact variantsdeleterious for a protein are not automatically deleterious for an organism asthe protein change induced by the variant although relevant may not affectproteinfunctionvariationmayoccurinonememberofagenefamilysothatitsfunction may be compensated by other family members the protein functionmay be compensated by other proteins or the affected metabolic pathwaycompensated by alternative pathways The challenge was therefore to filter asubsetofvariantslikelyhavingaphenotypiceffecttobeproposedforvalidationatalargerscaleToaccomplishthistaskwesoughtforadditionalinformationinexomesequencesandinalargepopulationofHolsteinsiresgenotypedwiththebovineHDBeadchip

( (

133

2(Filtering(deleterious(variants(

The second step consisted in filtering the several thousand deleterious SNPsidentified in exomes to search for SNPs and genes that worth deeperinvestigationWeusedtwostrategiestoaccomplishthistask

21$ Distribution$ of$ deleterious$ variants$ in$ fertility$ EBV$ plus9variant$ and$

minus9variant$animals$

Fertility traits are controlled by a high number of genes and are highlygenetically heterogeneous Even using a selected subset of variants identifiedfrom exome sequencing classic association analyses would need sample sizesmuch larger that the one available in this investigation to have a reasonablestatistical power (Stitziel et al 2011) The first approachwe appliedwas theanalysisof SNPallelic andgenotypicdistributionbetween theminusandplus`variant tails of male and female fertility EBVs Given the very low number ofanimals sequencedwedecided to use a simplemodel to exclude from furtheranalyses all markers showing no evidence of association with the EBV tailsratherthantofindsignificantassociations

We divided samples in cases (minus`variant animals for fertility EBV) andcontrols (plus`variant animals) and tested all possible models of inheritanceimplemented in thePlinkpackage for tackling simple genetic defectsWe thenused an empirical point`wise significant threshold based on permutations tohighlight SNPs having alleles or genotypes showing an uneven distributionbetweencasesandcontrolsandconsideredtheseasworthfurtherinvestigationAtotalof112SNPsin101genespassedthisfilteringprocess

$

22$Detection$of$regions$carrying$deleterious$mutation$from$800K$SNP$data

The analysis of exomes alone was therefore insufficient to tag recessivedeleteriousgenesinItalianHolsteinandwesoughtforadditionalinformationin1009ItalianHolsteinbullsgenotypedathighdensitythatarepartofthegenomicselection program run by ANAFI The rationale of the approach is that

134

deleterious recessives are expected to be absent or rare in the homozygous

status in the population Hence they may be tagged searching for genome

regions having markers or haplotypes lacking or having significantly less

homozygotesthanexpectedunderaneutralmodel

We used two complementary approaches to identify these candidate regions

One(OHW)searchedforverystrongsignalsproducedbysinglemarkersoutof

Hardy`Weinberg equilibrium at Ple847x10`8 A minimum of three significant

markersinaslidingwindowsof10wassettoreachsignificancetocapturemore

reliablesignalsavoidexcessivesinglemarkernoiseandaccount forapossible

Wahlund effect due to presence of the genetic structure in the population

difficulttoavoidinthesetofprogenytestedbullsanalysedThesecondapproach

(LOH) captured more moderate signals but from multiple markers in larger

regions Both approaches were designed to detect signals in regions in which

deleteriousvariantsarestillinsegregationandarenotexcessivelyrare

Asexpectedthetwoapproachesidentifiedregionsthatonlypartiallyoverlapped

(TablesS5S6eS7)Inparticular5regionsoutofthe11identifiedbyOHWwere

in common to the 240 identified by LOH Two of these regions overlap with

previous investigations that used the BovineSNP50 Beadchip Sahana et al

(2013) found a region candidate to carrypotential lethal haplotypes inNordic

Holstein on BTA7 in the region 6708007`10907840 This interval overlaps

withbothBTA7regionsdetectedinthisstudyVanRadenetal(2011)detected

inUSHolsteinalargeregiononBTA15spanning76Mb`82M(onBTA30)that

includedLRP4 gene responsible for themulefootdefectOuranalyseswith the

HDBeadchipindicatesontheonesidethatthelargeregiondetectedbySahana

etal(2013)mayconsistintwonearbyregionsOntheothersidethesignalwe

detectedonBTA15doesnot includetheLRP4 locusthat iscompletely fixed in

theItalianbullspopulationanalysed

In total83mutations in66genes fell in eitherorbothOHWandLOHregions

These were joined to the SNPs output of the EBV tail analyses for further

assessment

(

135

3(Assessing(filtered(deleterious(variants(

Different characteristics of the filtered deleterious variants were assessed to

identify among them strong candidates to submit to a large`scale validation

studyTheassessmentphasepermittedtogainadditionalinformationinfavour

or against SNP and gene candidature to be deleterious We quantified this

evidenceusingasubjectivescore

31$Comparative$genomics$analysis$

In the case of comparative genomics the score quantified the level of

conservationof thealternativeaminoacidresidues inducedbySNPvariants in

ItalianHolsteincattleComparisonwasrunagainstaset100vertebratespecies

that included 62 mammals The level of conservation is a good proxy for the

expected severity of a mutation In the case of the 46 SNPs that induced the

change of an amino acid highly conserved in the species investigatedwemay

hypothesize a strong selection pressure in favour of the maintenance or the

invariant amino acid at that position Therefore the amino acid change or the

stopcodonweobserved incattlemayaffectnotonlytheproteinstructureand

function but also the phenotype of homozygous animals Interestingly in all

thesecasestheSNPallelecodingforthenonconservedvariantswastheminor

allele

Converselyinallcasesofhighvariabilityofthetargetaminoacidacrossspecies

(asmanyassevendifferentaminoacidshavesometimesbeenobserved)poorly

supports the presence of phenotypic effects due to its change Sometimes the

informationcollectedisagainstthepresenceofadeleteriousmutationasinthe

case of the presence in different species of both alleles identified by exome

sequencingThis suggests thatneither of these is greatly affecting the viability

andfitnessofanimalscarryingthem

Wealsoobservedanumberoffailures(N=45)intheliftoverofSNPcoordinates

fromcattletohumanorinproteinalignmentThishappenedforexampleinthe

caseofuncharacterized cattleproteins thathavenoorthologous in thehuman

genome and of someolfactory receptors that belong to a highly variable gene

136

family Even if the failure of alignment is an indication of poor proteinconservationacrossspeciestheconservationofthetargetaminoacidcouldnotbeassessesandinthesecaseswepreferredtoconsidertheinformationmissingandtoscoreitaccordingly

32$Haplotype$analysis$

Asecondassessmentofthefilteredcandidateswasontheirbehaviourinthesirepopulation The ideal test would be the direct genotyping of the SNPs in thepopulationand theevaluationof theirallelicandgenotypicproportions In theabsenceofthisinformationwefirstinferredtheHDhaplotypeassociatedtoSNPallelesandassessedthebehaviourofthehaplotypeinthepopulationassumingthisasasurrogateoftheSNPbehaviourSincedeleteriousallelesareexpectedtoberareandthis isalsowhatweobserved fromcomparativegenomicanalysesweinvestigatethefrequencyonlyofthehaplotypecarryingtheminorSNPalleleagainsttheexpectationto findthecorrespondinghaplotypeneverhomozygousor a number of times much lower than expected from its frequency in thepopulation The strategy yielded some interesting confirmation but was onlypartiallysuccessful

Firstofalltheidentificationofauniquehaplotypecarryingtherarevariantwasnot always possible or straightforward In some cases carriers of a same rarevariantdidnotshareexactlythesamehaplotypeWeacceptedsomevariationifthe haplotype remained clearly distinguishable from those of animals notcarryingtherarevariantbutatthesametimereducedthestringencyofthetestInquiteanumberofcasesasamehaplotypecarriedbothcandidatedeleteriousSNP variants This may occur for example in the case of recent mutations IntheselastcasesthetestcouldnotbecarriedoutFinallymostrarevariantswereassociatedtorarehaplotypesthatarenotlikelytohavehomozygousindividualsinthepopulationindependentlyoftheSNPeffect

Thisassessmentwas thereforeonly significant fora fewhaplotypes thathadalarge number of heterozygous individuals and no or only a very fewhomozygotesandforthosethathadalargenumberofhomozygotes

137

33$Functional$analysis$

The final assessment was the analysis of the function of genes carrying the

candidate deleterious SNP This has some limitation since knowledge of gene

function is only partial and mainly gained from investigations on species

different from cattle In any case we considered the association to genetic

defectsinhumanandtheinductionofseverephenotypesinmouseknockoutas

a positive indication of the likely phenotypic effect of the severe protein

alterationweobservedincattleInadditiongenefunctionandexpressionprofile

was evaluated in search for evidence of gene involvement in fertility

developmentorprocessesfundamentalforcellandorganismviability

Amonggenesidentified22areassociatedtohumansyndromes(Table4)Eleven

syndromesareneurologicaldiseasesaffectingbraindevelopmentand inducing

mental retardation (eg Donnay`Barrow Syndrome and Autosomal recessive

primary microcephaly) the others influence development (eg

Cerebrooculofacioskeletal Syndrome 1 and Adams`Oliver syndrome) or organ

function(egCongenitalpulmonaryalveolarproteinosis)

Most of the same defects translated into Holstein would induce physical and

behavioural anomalies likely causingyoungbulls tobe excluded fromprogeny

testing Itshould in factbeconsideredthattheanimalsweanalysedareonthe

onesidethosethatwillmostinfluencethegeneticmakeoffuturegenerationbut

are not a random sample of Holstein animals These bulls have been progeny

testedandauthorisedtobeusedinartificialinseminationinItalyCandidatesto

progenytestingderivefromthematingofthebest1siresandbest2damsof

the Italian Holstein population At the age of 4 months they are taken to the

Italian Holstein breeder association (ANAFI) test station where they are

subjected to performance test sanitary controls and behavioural tests

Thereafterthequalityoftheirsemenischeckedbeforeauthorizationisgranted

toenterinprogenytestInthissampleofanimalsitisthereforeexpectedtofind

never at the homozygous state mutations causing lethality physical

malformation abnormal behaviour and semen infertility Also those having a

138

majoreffectonsemenqualityandhighsusceptibilitytodiseasesareexpectedto

berarelycarriedatthehomozygousstatus

Association with human syndromes is a strong evidence of the potential

phenotypiceffectofmutationsalteringthegenestructureHoweveritshouldbe

considered that variants found in humans are different fromvariants found in

cattle and that the severity of the phenotype induced by a deleterious gene is

often variable even within species Therefore we evaluated this information

togetherwiththeotherindirectevidencesofvariantstobedeleterious

WealsoenquiredthewebsiteoftheInternationalMousePhenotypeConsortium

(httpswwwmousephenotypeorg) to search for the effects of null alleles in

mice The few genes forwhichnull`mice line is available are involved in basic

functionsfortheorganismthatspanfromvision(IAH1andEPB41L4A)toaging

(RGP1)andtissuesdevelopment(FABP12RIPPLY3RGNEF)orfertility(CFAP96)

Aswithgenesassociatedtohumansyndromesmostoftheabnormalphenotypes

observedinknockmousewouldhamperyoungbullstoenterprogenytesting

Gene Ontology (GO) was evaluated on the entire gene set No enrichment on

particularcategoriesGOtermswasobservedThemostrepresentedcategories

in Biological Processes were as expected ldquometabolic processrdquo and ldquocellular

processrdquo however lower level terms ldquoreproductionrdquo and ldquodevelopmental

processrdquo are represented with meaningful hits The analysis of Molecular

Function indicates thatmany of the hits are enzymes (ldquocatalytic activityrdquo) and

haveregulatoryroles(ldquobindingrdquoandldquoenzymeregulatoractivityrdquo)Thisresultcan

be interpreted indifferentwaysOn theone sidedeleterious allelesmay exert

their deleterious effect in a number of different cellular and developmental

processes the correct process is one while there is many options to make it

wrongAlsothenumberofgenesinvestigatedisrelativelylowandformanyof

them there is little or no information available (eg 11 genes code for a

completelyuncharacterizedprotein)Inadditionsomeofthemmayturnoutnot

tobereallydeleteriousandaddnoisetotheGOanalysis

As a final step all the previous information (GO expression patterns known

function)havebeenused to estimate a scoremeasuring the importanceof the

gene function forcattle survivalandreproductionThescoringof functionwas

139

particularlydifficultGOcategoriesweregeneralandallfunctionstaggedhavearelevant importance for animal survival andproductionTherefore in this casewedidnrsquotusethewholescorerangeplanned(from`1to1)andgavenonegativescoresApositivescoreof1wasattributedonly togenesplayingan importantroleinfertilityanddevelopment

$

34$Strong$candidate$deleterious$variants$

Thescoringprocessyielded21variantswithatotalscore=035variantswithpositive and 135 with negative values (Figure 2 and Table 5) Although theprocesswasnoterrorfreeasitreliedonincompleteinformationandhadsomedegreeofsubjectivityparticularlyinthechoiceofrelativeweightsitidentifiedaset of variants that may be considered strong candidates to have deleteriouseffects in the Italian Holstein breed Variants were evaluated individually andthose in a same genes had sometimes different scores For example the fivevariantscarriedbyALPK2hadscoresrangingfrom`7to`2

140

Figure2Graphicalrepresentationoftotalscoreforeachvariants

minus505

Variation

Score

1_645923001_645985561_1381698111_1464490851_150972144

2_7478962_270362092_270455392_375933832_555609862_904645543_75418123_110403993_110404003_188945763_1137877703_1203356984_53767714_265405974_265406204_749514994_1033779064_1057246734_1130233264_1178417155_445557055_445557425_757447955_940308955_940309556_207762566_382803486_863518016_927092906_1075769256_1078851146_1090916426_1166055226_1186534167_13339757_56553577_97149147_167941797_168358027_168621947_197404097_197409057_263293538_602606098_603324848_658345648_704101558_760297748_770028908_872798628_1117618168_1128416959_83888819_83889249_237928389_510133929_1047396559_10515702910_171251510_1584300310_2802235210_2802311010_8290312110_8517371111_4630576711_4675390311_4740957711_5979141011_7832201011_8794387511_9548610911_9945389411_9946100112_1190618112_1247843312_1248084512_1413689712_5304908112_9076120913_381566413_1178793413_1178793813_5247168413_5393493013_5393494713_5393985313_5454042513_5504774013_5998820513_6304575013_6316961013_6539670814_197749414_364069514_3556894514_4688575014_5718402214_6863536315_18267915_18331715_18340415_248692

15_3018920915_3504878815_4630194115_7503314015_7503764515_8024448915_8025571015_8028875915_8038974515_8048559315_8055593115_8094968915_8275455615_8292533216_456201716_3831963616_6257435517_5309539517_5311265517_6609153117_7237753218_2562355018_3495651318_4637204118_5215480618_5408127718_5413095518_5781976819_2144410319_3107215219_3111397219_3279084319_4225178519_5139519419_5619117519_5726625120_763980120_2341043120_5888423620_6417173921_623911921_2034587821_2034628921_2683773321_3100359921_5527419521_5582646321_5817338421_5912071921_6048248421_6048251021_6284314222_5241595922_5242173422_5242204622_5242610422_5242666822_5695165722_5697484722_5779767423_4588505224_2132460424_2145336424_4676490924_5032719524_5815817824_5815862024_5815878424_5820400324_5820447925_3743318026_1811648726_2571599626_2576457027_503107827_3283255728_28422828_169040328_3571880728_4408176529_198588129_753052329_753066329_26397931

141

Table5SingleanalysesscoreandtotalforeachcandidatevariantsChrchrom

osom

ePosphysicalposition

Mutation

Score

Chr

Pos

Gene

Genenam

eHaplotypes

Conservation

Hum

andefect

Mousemodel

Function

TotalScore

164592300

ENSBTAG00000009014

UPK1B

02

I1

01

2

164598556

ENSBTAG00000009014

UPK1B

0I1

I1

00

I2

1138169811

ENSBTAG00000002966

DNAJC13

03

10

15

1146449085

ENSBTAG00000017418

RRP1B

I3

I1

I1

00

I5

1150972144

ENSBTAG00000036019

RIPPLY3

I3

I1

I1

I1

1I5

2747896

ENSBTAG00000013916

HERC2

30

10

04

227036209

ENSBTAG00000004555

LRP2

I3

I1

10

0I3

227045539

ENSBTAG00000004555

LRP2

I3

31

00

1

237593383

ENSBTAG00000032481

DAPL1

00

I1

00

I1

255560986

ENSBTAG00000032264

Uncharacterized

3I3

I1

00

I1

290464554

ENSBTAG00000017436

ALS2CR11

0I1

I1

00

I2

37541812

ENSBTAG00000046175

Uncharacterized

00

00

00

311040399

ENSBTAG00000013789

OR6K6

I3

I3

I1

00

I7

311040400

ENSBTAG00000013789

OR6K6

I3

I3

I1

00

I7

318894576

ENSBTAG00000004772

THEM

40

I3

I1

00

I4

3113787770

ENSBTAG00000000149

USP40

11

I1

01

2

3120335698

ENSBTAG00000002809

CAPN10

I3

21

00

0

45376771

ENSBTAG00000001500

FIGNL1

01

I1

00

0

426540597

ENSBTAG00000033954

PRPS1L1

1I3

I1

00

I3

426540620

ENSBTAG00000033954

PRPS1L1

1I1

I1

00

I1

474951499

ENSBTAG00000003508

CFAP69

I3

I3

I1

10

I6

4103377906

ENSBTAG00000021073

KIAA1549

0I1

11

01

4105724673

ENSBTAG00000019265

AGK

01

10

02

142

Mutation

Score

Chr

Pos

Gene

Genenam

eHaplotypes

Conservation

Hum

andefect

Mousemodel

Function

TotalScore

4113023326

ENSBTAG00000014768

ZNF786

I3

I3

I1

00

I7

4117841715

ENSBTAG00000017505

PAXIP1

0I1

I1

00

I2

544555705

ENSBTAG00000026323

LYSB

I3

3I1

00

I1

544555742

ENSBTAG00000026323

LYSB

I3

0I1

00

I4

575744795

ENSBTAG00000009064

CSF2RB

I3

I3

10

0I5

594030895

ENSBTAG00000009444

SLC15A5

I3

I1

I1

00

I5

594030955

ENSBTAG00000009444

SLC15A5

I3

I3

I1

00

I7

620776256

ENSBTAG00000008797

Uncharacterized

I3

I3

00

0I6

638280348

ENSBTAG00000007568

MEPE

0I3

I1

00

I4

686351801

ENSBTAG00000003523

UGT2B15

1I1

I1

00

I1

692709290

ENSBTAG00000010954

ART3

I3

2I1

00

I2

6107576925

ENSBTAG00000038606

DBDR

0I3

I1

00

I4

6107885114

ENSBTAG00000001505

GRK4

I3

1I1

00

I3

6109091642

ENSBTAG00000007001

SLC26A1

1I3

I1

00

I3

6116605522

ENSBTAG00000014058

LDB2

01

I1

00

0

6118653416

ENSBTAG00000012458

PSAPL1

I3

I1

I1

00

I5

71333975

ENSBTAG00000015602

C7H5orf45

I3

I3

I1

00

I7

75655357

ENSBTAG00000045588

LOC100298356

1I3

I1

00

I3

79714914

ENSBTAG00000046423

LOC618007

00

I1

00

I1

716794179

ENSBTAG00000012314

LDLR

02

10

03

716835802

ENSBTAG00000009568

KANK2

01

I1

00

0

716862194

ENSBTAG00000009569

DOCK6

01

10

02

719740409

ENSBTAG00000000414

FUT5

I3

I3

I1

00

I7

719740905

ENSBTAG00000000414

FUT5

I3

I3

I1

00

I7

143

Mutation

Score

Chr

Pos

Gene

Genenam

eHaplotypes

Conservation

Hum

andefect

Mousemodel

Function

TotalScore

726329353

ENSBTAG00000004860

SLC27A6

0I1

I1

00

I2

860260609

ENSBTAG00000011420

CA9

I3

I1

I1

00

I5

860332484

ENSBTAG00000011433

RGP1

I3

0I1

10

I3

865834564

ENSBTAG00000035708

LOC527795

00

I1

00

I1

870410155

ENSBTAG00000005466

C8H8orf58

I3

I3

I1

00

I7

876029774

ENSBTAG00000015027

Clusterin

00

I1

10

0

877002890

ENSBTAG00000012921

KIF24

12

I1

00

2

887279862

ENSBTAG00000002220

SPTLC1

00

10

01

8111761816

ENSBTAG00000006532

CDK5RAP2

I3

I3

11

1I3

8112841695

ENSBTAG00000013838

ALLC

01

I1

00

0

98388881

ENSBTAG00000015335

BAI3

00

I1

00

I1

98388924

ENSBTAG00000015335

BAI3

00

I1

00

I1

923792838

ENSBTAG00000046593

Uncharacterized

03

00

03

951013392

ENSBTAG00000018528

USP45

02

I1

00

1

9104739655

ENSBTAG00000018810

THBS2

03

10

15

9105157029

ENSBTAG00000000918

ERMARD

I3

21

00

0

10

1712515

ENSBTAG00000014750

EPB41L4A

0I1

I1

I1

0I3

10

15843003

ENSBTAG00000009030

NOX5

I3

I3

I1

00

I7

10

28022352

ENSBTAG00000035309

OR4F15

I3

1I1

00

I3

10

28023110

ENSBTAG00000035309

OR4F15

I3

I1

I1

00

I5

10

82903121

ENSBTAG00000012565

PCNX

I3

2I1

01

I1

10

85173711

ENSBTAG00000011683

NUM

BI3

0I1

00

I4

11

46305767

ENSBTAG00000002288

NT5DC4

0I3

I1

00

I4

11

46753903

ENSBTAG00000019072

PSD4

0I1

I1

00

I2

144

Mutation

Score

Chr

Pos

Gene

Genenam

eHaplotypes

Conservation

Hum

andefect

Mousemodel

Function

TotalScore

11

47409577

ENSBTAG00000009033

TEX37

I3

I2

I1

10

I5

11

59791410

ENSBTAG00000006027

USP34

02

I1

01

211

78322010

ENSBTAG00000015490

HS1BP3

00

I1

00

I1

11

87943875

ENSBTAG00000001140

IAH1

I3

3I1

10

011

95486109

ENSBTAG00000017576

GPR144

I3

I3

I1

00

I7

11

99453894

ENSBTAG00000038831

PHYHD1

11

I1

00

111

99461001

ENSBTAG00000010601

DOLK

12

10

04

12

11906181

ENSBTAG00000001133

VWA8

I3

1I1

00

I3

12

12478433

ENSBTAG00000014752

AKAP11

1I3

I1

I1

0I4

12

12480845

ENSBTAG00000014752

AKAP11

1I1

I1

I1

0I2

12

14136897

ENSBTAG00000016691

LACC1

I3

I3

I1

00

I7

12

53049081

ENSBTAG00000032821

SCEL

I3

I3

I1

00

I7

12

90761209

ENSBTAG00000019961

ATP4B

I3

I3

I1

00

I7

13

3815664

ENSBTAG00000034991

SLX4IP

00

I1

00

I1

13

11787934

ENSBTAG00000008650

CAMK

1D

I3

0I1

00

I4

13

11787938

ENSBTAG00000008650

CAMK

1D

I3

0I1

00

I4

13

52471684

ENSBTAG00000013753

DDRGK1

02

I1

00

113

53934930

ENSBTAG00000039520

SIRPB1

0I3

I1

00

I4

13

53934947

ENSBTAG00000039520

SIRPB1

00

I1

00

I1

13

53939853

ENSBTAG00000039520

SIRPB1

0I3

I1

00

I4

13

54540425

ENSBTAG00000047150

RTEL

I3

0I1

00

I4

13

55047740

ENSBTAG00000031718

OGFR

01

I1

00

013

59988205

ENSBTAG00000027390

RTFDC1

01

I1

00

013

63045750

ENSBTAG00000009144

BPIFA2A

0I3

I1

00

I4

145

Mutation

Score

Chr

Pos

Gene

Genenam

eHaplotypes

Conservation

Hum

andefect

Mousemodel

Function

TotalScore

13

63169610

ENSBTAG00000019752

BPIFA2B

0I1

I1

00

I2

13

65396708

ENSBTAG00000006021

CEP250

0I3

I1

00

I4

14

1977494

ENSBTAG00000026350

SPATC1

I3

I3

I1

00

I7

14

3640695

ENSBTAG00000015985

GPR20

00

I1

00

I1

14

35568945

ENSBTAG00000037399

Uncharacterized

30

I1

00

2

14

46885750

ENSBTAG00000047661

FABP12

0I1

I1

00

I2

14

57184022

ENSBTAG00000026247

PKHD1L1

I3

3I1

00

I1

14

68635363

ENSBTAG00000017165

MATN2

01

I1

00

0

15

182679

ENSBTAG00000046228

Olfactoryreceptor

00

I1

0o

I1

15

183317

ENSBTAG00000046228

Olfactoryreceptor

00

I1

00

I1

15

183404

ENSBTAG00000046228

Olfactoryreceptor

00

I1

00

I1

15

248692

ENSBTAG00000045558

LOC100299084

00

I1

0o

I1

15

30189209

ENSBTAG00000005357

VPS11

02

I1

00

1

15

35048788

ENSBTAG00000005340

SERGEF

0I3

I1

00

I4

15

46301941

ENSBTAG00000002289

NLRP14

I3

I3

I1

00

I7

15

75033140

ENSBTAG00000012053

ACCSL

1I3

I1

00

I3

15

75037645

ENSBTAG00000012053

ACCSL

I3

I3

I1

00

I7

15

80244489

ENSBTAG00000046527

LOC100139830

I3

0I1

00

I4

15

80255710

ENSBTAG00000046285

OR8I2

10

I1

00

0

15

80288759

ENSBTAG00000001291

OR8H3

1I1

I1

00

I1

15

80389745

ENSBTAG00000045709

LOC514057

13

I1

00

3

15

80485593

ENSBTAG00000035985

LOC101909743OR8K1

1I1

I1

00

I1

15

80555931

ENSBTAG00000022858

LOC783561

I3

0I1

00

I4

15

80949689

ENSBTAG00000039110

OR8U1

1I3

I1

00

I3

146

Mutation

Score

Chr

Pos

Gene

Genenam

eHaplotypes

Conservation

Hum

andefect

Mousemodel

Function

TotalScore

15

82754556

ENSBTAG00000021375

OR10Q1

01

I1

00

015

82925332

ENSBTAG00000021233

OR5B17

I3

0I1

00

I4

16

4562017

ENSBTAG00000019799

FCAM

RI3

I3

I1

00

I7

16

38319636

ENSBTAG00000014794

SCYL3

I3

0I1

00

I4

16

62574355

ENSBTAG00000035226

TOR1AIP1

00

I1

00

I1

17

53095395

ENSBTAG00000020414

DHX37

1I1

I1

00

I1

17

53112655

ENSBTAG00000020414

DHX37

0I3

I1

00

I4

17

66091531

ENSBTAG00000010847

FOXN4

0I1

I1

00

I2

17

72377532

ENSBTAG00000008996

SFI1

0I3

I1

00

I4

18

25623550

ENSBTAG00000005170

GPR114

0I3

I1

01

I3

18

34956513

ENSBTAG00000001286

ELMO

33

3I1

11

718

46372041

ENSBTAG00000015912

DMKN

0

1I1

00

018

52154806

ENSBTAG00000001262

IRGQ

I3

1I1

00

I3

18

54081277

ENSBTAG00000009337

PNMA

L1

0I3

I1

00

I4

18

54130955

ENSBTAG00000048021

PNMA

L2

I3

1I1

00

I3

18

57819768

ENSBTAG00000003231

LIM2

1

I3

10

0I1

19

21444103

ENSBTAG00000011011

SSH2

I3

I3

I1

00

I7

19

31072152

ENSBTAG00000022509

DNAH

9I3

0I1

00

I4

19

31113972

ENSBTAG00000022509

DNAH

90

0I1

00

I1

19

32790843

ENSBTAG00000015294

COX10

12

10

04

19

42251785

ENSBTAG00000013685

KRT31

I3

0I1

00

I4

19

51395194

ENSBTAG00000015980

FASN

1I1

10

01

19

56191175

ENSBTAG00000007910

EXOC7

3I3

I1

00

I1

19

57266251

ENSBTAG00000011387

NAT9

12

I1

00

2

147

Mutation

Score

Chr

Pos

Gene

Genenam

eHaplotypes

Conservation

Hum

andefect

Mousemodel

Function

TotalScore

20

7639801

ENSBTAG00000005633

RGNEF

1I1

I1

10

0

20

23410431

ENSBTAG00000008871

DDX4

1I3

I1

00

I3

20

58884236

ENSBTAG00000005514

TRIO

I3

0I1

00

I4

20

64171739

ENSBTAG00000005021

SEMA5A

01

10

02

21

6239119

ENSBTAG00000016495

LINS

02

10

03

21

20345878

ENSBTAG00000012326

LOC101906218

00

I1

00

I1

21

20346289

ENSBTAG00000012326

LOC101906218

I3

0I1

00

I4

21

26837733

ENSBTAG00000047587

Uncharacterized

10

00

01

21

31003599

ENSBTAG00000047166

SH2D7

0I3

I1

00

I4

21

55274195

ENSBTAG00000039117

FAM179B

0I3

I1

00

I4

21

55826463

ENSBTAG00000014001

MAP1A

0I3

I1

00

I4

21

58173384

ENSBTAG00000009836

CHGA

I3

I1

I1

00

I5

21

59120719

ENSBTAG00000046590

FAM181A

01

I1

00

0

21

60482484

ENSBTAG00000017096

ERICH1

0I3

I1

00

I4

21

60482510

ENSBTAG00000017096

ERICH1

01

I1

00

0

21

62843142

ENSBTAG00000012833

ATG2B

0I3

I1

00

I4

22

52415959

ENSBTAG00000015839

MAP4

00

I1

00

I1

22

52421734

ENSBTAG00000015839

MAP4

1I1

I1

00

I1

22

52422046

ENSBTAG00000015839

MAP4

I3

I3

I1

00

I7

22

52426104

ENSBTAG00000047174

Uncharacterized

I3

00

00

I3

22

52426668

ENSBTAG00000047174

Uncharacterized

10

00

01

22

56951657

ENSBTAG00000021645

MBD4

1I1

I1

00

I1

22

56974847

ENSBTAG00000021643

EFCAB12

0I1

I1

00

I2

22

57797674

ENSBTAG00000031115

Uncharacterized

01

00

01

148

Mutation

Score

Chr

Pos

Gene

Genenam

eHaplotypes

Conservation

Hum

andefect

Mousemodel

Function

TotalScore

23

45885052

ENSBTAG00000005248

OFCC1

00

I1

00

I1

24

21324604

ENSBTAG00000000998

SLC39A6

02

I1

00

1

24

21453364

ENSBTAG00000011622

C18orf21

I3

I3

I1

00

I7

24

46764909

ENSBTAG00000015210

LOXHD1

0I3

10

0I2

24

50327195

ENSBTAG00000002660

CCDC11

I3

11

01

0

24

58158178

ENSBTAG00000014284

ALPK2

I3

I1

I1

00

I5

24

58158620

ENSBTAG00000014284

ALPK2

0I1

I1

00

I2

24

58158784

ENSBTAG00000014284

ALPK2

I3

I3

I1

00

I7

24

58204003

ENSBTAG00000014284

ALPK2

I3

I3

I1

00

I7

24

58204479

ENSBTAG00000014284

ALPK2

I3

2I1

00

I2

25

37433180

ENSBTAG00000040568

Uncharacterized

00

I1

00

I1

26

18116487

ENSBTAG00000046239

C10orf12

0I1

I1

00

I2

26

25715996

ENSBTAG00000010522

Uncharacterized

10

00

01

26

25764570

ENSBTAG00000046727

Olfactoryreceptor

00

I1

00

I1

27

5031078

ENSBTAG00000007340

LOC101906349

0I1

I1

00

I2

27

32832557

ENSBTAG00000011684

RAB11FIP1

10

I1

00

0

28

284228

ENSBTAG00000000079

CCSAP

0I1

I1

00

I2

28

1690403

ENSBTAG00000048153

Uncharacterized

10

00

01

28

35718807

ENSBTAG00000047317

CL43

1I1

I1

00

I1

28

44081765

ENSBTAG00000032527

ERCC6

I3

01

00

I2

29

1985881

ENSBTAG00000004081

FAT3

30

I1

00

2

29

7530523

ENSBTAG00000046360

Uncharacterized

01

00

01

29

7530663

ENSBTAG00000046360

Uncharacterized

1I1

00

00

29

26397931

ENSBTAG00000013568

UEVLD

I3

I3

I1

00

I7

149

Interestingly13ofthe22genesassociatedtohumansyndromescarryvariants

appearing in the short list of the 35 positives and only 6 in that of 135 with

negative values This in spite of the low weight (from 1 to 1) given to this

parameterEightofthe35positivevariantsareinproteinsstilluncharacterized

Sixvariantshadascoreof4orhigherThehighestscore(totalscore=7)wasfor

ELMO3 (Engulfment and cell motility gene 3) This gene is involved in cell

motility and migration ELMO3 is also a homolog of C( elegans Ced12 that is

required for many developmental processes included gonadal morphogenesis

Mutant for this gene in C( elegans suffer from abnormal development usually

with lethal consequences or sublethal developmental defects including small

embryosdelayeddevelopmentandreducedfertility(Gumiennyetal2001)In

human it is expressed in testis prostate and ovary

(httpwwwgenecardsorgcgibincarddispplgene=ELMO3)

ThegeneDNAJC13(DnaJ(Hsp40)homologsubfamilyCmember13)hadatotal

score of 5 This gene is involved in the onset of Parkinson disease DNAJC13

regulates the dynamics of clathrin coats on early endosomes Cellular analysis

shows that the mutation Arg855Ser identified by VilarintildeoGuumlell et al (2014)

confers a gainoffunction that impairs endosomal transportApossible roleof

DNAJC13geneinTourettesyndromechronicticdisorderisunderinvestigation

(Sundarametal2011)

A second genehad a scoreof 5THBS2 (Thrombospondin2) It belongs to the

thrombospondin family and is a powerful inhibitor of tumour growth and

angiogenesis It is highly expressed in testis in fetal Leydig cell Hirose et al

(2008) found a significant association between an intronic SNP in the THBS2

gene and lumbar disc herniation Thrombospondin2 is also a matricellular

proteinfoundinhumanserumDeletionofTSP2causesagedependentdilated

cardiomyopathy (Hanatani et al 2014) Mutant mice for this gene display a

variety of mutant phenotype Kyriakides et al 1998 suggest that THBS2

modulates the cell surface properties of mesenchymal cells affecting cell

functionssuchasadhesionandmigration

150

Three genes had a score of 4 all three are associated to human syndromes

Mutation in HERC2 (HECT and RLD domain containing E3 ubiquitin protein

ligase2)isinvolvedinautosomalrecessivementalretardation38(Puffenberger

etal2012)Jietal(2000)identifiedinmutantmouseneuromuscularsecretory

vesicledefectsandspermacrosomedefectsMoreovergeneticvariationsinthis

gene are associated with skinhaireye pigmentation variability Mutations in

DOLK gene (dolichol kinase) are associated with dolichol kinase deficiency

(Lefeberetal2011)COX10(cytochromecoxidaseassemblyhomolog10)gene

is associated to mitochondrial complex IV deficiency causing Leigh syndrome

(Antonickaetal2003)

Conclusions)

Exomesequencingthereforeprovidedvaluableinformationoncodingsequence

variants and on their potential effect on protein function As a group variants

classifiedasdeleteriouswererarerthanvariantsclassifiedasneutralSincethis

isanexpectedeffectofpurifyingselectionthissetofvariantwaslikelyenriched

forvariantshavingaphenotypiceffect

Theinclusionintheanalysesofplusandminusvariantanimalsforfertilitytraits

likelypermittedtosequencethemostimportantvariantsinfluencingthesetraits

butthelownumberofanimalssequencedcombinedwiththecomplexityofthe

traitpermittedtofindonlyasuggestiveassociationbetweenSNPsandtraitsItrsquos

likely that amuch larger number of individuals is to be sequenced if genome

wideassociationisthestrategyoneswanttopursue

We explored an alternative strategy to confirm suggestivedeleterious variants

based on indirect evidence of the variant effect by assessing i) evolutionary

constraintsassociatedtotheaminoacidresiduecodedbythevariantii)variant

behaviour in a large population In the absence of direct genotyping of the

candidatevariantsweusedhaplotypeinformationassurrogateiii)functionand

effectsinotherspecies

151

We attempted to quantify these criteria through a score subjective and only

indicativebutpermitted tohighlighted35genes so farneglected in cattlebut

deservingfurtherattentionandpotentiallyhavingdeleteriouseffectsinHolstein

andothercattlebreedsThesemaybepartoftherecessivedeleterioussetthat

constitutethegeneticloadofItalianHolstein

References)AntonickaHLearySCGuercinGHAgarJNHorvathRKennawayNG

HardingCOJakschMShoubridgeEA2003MutationsinCOX10

resultinadefectinmitochondrialhemeAbiosynthesisandaccountfor

multipleearlyonsetclinicalphenotypesassociatedwithisolatedCOX

deficiencyHumMolGenet122693ndash2702doi101093hmgddg284

AshwellMSHeyenDWSonstegardTSVanTassellCPDaYVanRaden

PMRonMWellerJILewinHA2004DetectionofQuantitativeTrait

LociAffectingMilkProductionHealthandReproductiveTraitsin

HolsteinCattleJournalofDairyScience87468ndash475

doi103168jdsS00220302(04)731860

AstonKICarrellDT2009GenomeWideStudyofSingleNucleotide

PolymorphismsAssociatedWithAzoospermiaandSevere

OligozoospermiaJournalofAndrology30711ndash725

doi102164jandrol109007971

AulchenkoYSRipkeSIsaacsADuijnCMvan2007GenABELanRlibrary

forgenomewideassociationanalysisBioinformatics231294ndash1296

doi101093bioinformaticsbtm108

BamshadMJNgSBBighamAWTaborHKEmondMJNickersonDA

ShendureJ2011ExomesequencingasatoolforMendeliandisease

genediscoveryNatRevGenet12745ndash755doi101038nrg3031

BieseckerLGShiannaKVMullikinJC2011Exomesequencingtheexpert

viewGenomeBiology12128doi101186gb2011129128

152

BiffaniSMarusiMBiscariniFCanavesiF2005Developingagenetic

evaluationforfertilityusingangularityandmilkyieldascorrelatedtraits

InterbullBulletin063

BiffaniSSamoreacuteABCanavesiF2002Inbreedingdepressionforproduction

reproductionandfunctionaltraitsinItalianHolsteincattleInstitut

NationaldelaRechercheAgronomique(INRA)pp0ndash4

BolgerAMLohseMUsadelB2014TrimmomaticAflexibletrimmerfor

IlluminaSequenceDataBioinformaticsbtu170

doi101093bioinformaticsbtu170

BrowningSRBrowningBL2007RapidandAccurateHaplotypePhasingand

MissingDataInferenceforWholeGenomeAssociationStudiesByUseof

LocalizedHaplotypeClusteringAmJHumGenet811084ndash1097

CharlesworthDWillisJH2009ThegeneticsofinbreedingdepressionNat

RevGenet10783ndash796doi101038nrg2664

ChunSFayJC2009Identificationofdeleteriousmutationswithinthree

humangenomesGenomeRes191553ndash1561

doi101101gr092619109

ConsortiumTEP2012AnintegratedencyclopediaofDNAelementsinthe

humangenomeNature48957ndash74doi101038nature11247

DePristoMABanksEPoplinRGarimellaKVMaguireJRHartlC

PhilippakisAAdelAngelGRivasMAHannaMMcKennaA

FennellTJKernytskyAMSivachenkoAYCibulskisKGabrielSB

AltshulerDDalyMJ2011Aframeworkforvariationdiscoveryand

genotypingusingnextgenerationDNAsequencingdataNatGenet43

491ndash498doi101038ng806

GoymerP2007SynonymousmutationsbreaktheirsilenceNatRevGenet8

92ndash92doi101038nrg2056

GumiennyTLBrugneraEToselloTrampontACKinchenJMHaneyLB

NishiwakiKWalkSFNemergutMEMacaraIGFrancisRSchedl

TQinYVanAelstLHengartnerMORavichandranKS2001CED

12ELMOaNovelMemberoftheCrkIIDock180RacPathwayIs

RequiredforPhagocytosisandCellMigrationCell10727ndash41

doi101016S00928674(01)005207

153

HanataniSIzumiyaYTakashioSKimuraYArakiSRokutandaTTsujita

KYamamotoETanakaTYamamuroMKojimaSTayamaS

KaikitaKHokimotoSOgawaH2014Circulatingthrombospondin2

reflectsdiseaseseverityandpredictsoutcomeofheartfailurewith

reducedejectionfractionCircJ78903ndash910

HiroseYChibaKKarasugiTNakajimaMKawaguchiYMikamiY

FuruichiTMioFMiyakeAMiyamotoTOzakiKTakahashiA

MizutaHKuboTKimuraTTanakaTToyamaYIkegawaS2008

AfunctionalpolymorphisminTHBS2thataffectsalternativesplicingand

MMPbindingisassociatedwithlumbardischerniationAmJHum

Genet821122ndash1129doi101016jajhg200803013

httpsgenomeucsceduindexhtml[WWWDocument]nd

JamrozikJFatehiJKistemakerGJSchaefferLR2005EstimatesofGenetic

ParametersforCanadianHolsteinFemaleReproductionTraitsJournalof

DairyScience882199ndash2208doi103168jdsS00220302(05)728952

JanisEWiggintonDJC2005AnoteonexacttestsofHardyWeinberg

equilibriumAmericanjournalofhumangenetics76887ndash93

doi101086429864

JiYRebertNAJoslinJMHigginsMJSchultzRANichollsRD2000

StructureoftheHighlyConservedHERC2GeneandofMultiplePartially

DuplicatedParalogsinHumanGenomeRes10319ndash329

KentWJSugnetCWFureyTSRoskinKMPringleTHZahlerAM

HausslerandD2002TheHumanGenomeBrowseratUCSCGenome

Res12996ndash1006doi101101gr229102

KumarPHenikoffSNgPC2009Predictingtheeffectsofcodingnon

synonymousvariantsonproteinfunctionusingtheSIFTalgorithmNat

Protoc41073ndash1081doi101038nprot200986

KyriakidesTRZhuYHSmithLTBainSDYangZLinMTDanielson

KGIozzoRVLaMarcaMMcKinneyCEGinnsEIBornsteinP

1998Micethatlackthrombospondin2displayconnectivetissue

abnormalitiesthatareassociatedwithdisorderedcollagenfibrillogenesis

anincreasedvasculardensityandableedingdiathesisJCellBiol140

419ndash430

154

LaneEACroweMABeltmanMEMoreSJ2013Theinfluenceofcowand

managementfactorsonreproductiveperformanceofIrishseasonal

calvingdairycowsAnimReprodSci14134ndash41

doi101016janireprosci201306019

LefeberDJdeBrouwerAPMMoravaERiemersmaMSchuurs

HoeijmakersJHMAbsmannerBVerrijpKvandenAkkerWMR

HuijbenKSteenbergenGvanReeuwijkJJozwiakAZuckerN

LorberALammensMKnopfCvanBokhovenHGruumlnewaldSLehle

LKapustaLMandelHWeversRA2011AutosomalRecessive

DilatedCardiomyopathyduetoDOLKMutationsResultsfromAbnormal

DystroglycanOMannosylationPLoSGenet7e1002427

doi101371journalpgen1002427

LiHDurbinR2009FastandaccurateshortreadalignmentwithBurrows

WheelertransformBioinformatics251754ndash1760

doi101093bioinformaticsbtp324

LiHHandsakerBWysokerAFennellTRuanJHomerNMarthG

AbecasisGDurbinR1000GenomeProjectDataProcessingSubgroup

2009TheSequenceAlignmentMapformatandSAMtoolsBioinformatics

252078ndash2079doi101093bioinformaticsbtp352

LiMXKwanJSHBaoSYYangWHoSLSongYQShamPC2013

PredictingMendelianDiseaseCausingNonSynonymousSingle

NucleotideVariantsinExomeSequencingStudiesPLoSGenet9

e1003143doi101371journalpgen1003143

MacArthurDGBalasubramanianSFrankishAHuangNMorrisJWalter

KJostinsLHabeggerLPickrellJKMontgomerySBAlbersCA

ZhangZConradDFLunterGZhengHAyubQDePristoMA

BanksEHuMHandsakerRERosenfeldJFromerMJinMMuXJ

KhuranaEYeKKayMSaundersGISunerMMHuntTBarnes

IHAAmidCCarvalhoSilvaDRBignellAHSnowCYngvadottirB

BumpsteadSCooperDNXueYRomeroIGWangJLiYGibbs

RAMcCarrollSADermitzakisETPritchardJKBarrettJCHarrow

JHurlesMEGersteinMBTylerSmithC2012Asystematicsurvey

155

oflossoffunctionvariantsinhumanproteincodinggenesScience335

823ndash828doi101126science1215040

MajewskiJSchwartzentruberJLalondeEMontpetitAJabadoN2011

WhatcanexomesequencingdoforyouJMedGenet48580ndash589

doi101136jmedgenet2011100223

McClureMCBickhartDNullDVanRadenPXuLWiggansGLiuG

SchroederSGlasscockJArmstrongJColeJBVanTassellCP

SonstegardTS2014BovineExomeSequenceAnalysisandTargeted

SNPGenotypingofRecessiveFertilityDefectsBH1HH2andHH3Reveal

aPutativeCausativeMutationinSMC2forHH3PLoSONE9e92769

doi101371journalpone0092769

McClureMKimEBickhartDNullDCooperTColeJWiggansGAjmone

MarsanPColliLSantusELiuGESchroederSMatukumalliLVan

TassellCSonstegardT2013FineMappingforWeaverSyndromein

BrownSwissCattleandtheIdentificationof41ConcordantMutations

acrossNRCAMPNPLA8andCTTNBP2PLoSONE8e59251

doi101371journalpone0059251

McKennaAHannaMBanksESivachenkoACibulskisKKernytskyA

GarimellaKAltshulerDGabrielSDalyMDePristoMA2010The

GenomeAnalysisToolkitAMapReduceframeworkforanalyzingnext

generationDNAsequencingdataGenomeRes201297ndash1303

doi101101gr107524110

MiHDongQMuruganujanAGaudetPLewisSThomasPD2010

PANTHERversion7improvedphylogenetictreesorthologsand

collaborationwiththeGeneOntologyConsortiumNuclAcidsRes38

D204ndashD210doi101093nargkp1019

MrodeRKearneyJFBiffaniSCoffeyMCanavesiF2009Short

communicationGeneticrelationshipsbetweentheHolsteincow

populationsofthreeEuropeandairycountriesJournalofDairyScience

925760ndash5764doi103168jds20081931

PelosoGMAuerPLBisJCVoormanAMorrisonACStitzielNOBrody

JAKhetarpalSACrosbyJRFornageMIsaacsAJakobsdottirJ

FeitosaMFDaviesGHuffmanJEManichaikulADavisBLohman

156

KJoonAYSmithAVGroveMLZanoniPRedonVDemissieS

LawsonKPetersUCarlsonCJacksonRDRyckmanKKMackey

RHRobinsonJGSiscovickDSSchreinerPJMychaleckyjJC

PankowJSHofmanAUitterlindenAGHarrisTBTaylorKD

StaffordJMReynoldsLMMarioniREDehghanAFrancoOHPatel

APLuYHindyGGottesmanOBottingerEPMelanderOOrho

MelanderMLoosRJFDugaSMerliniPAFarrallMGoelA

AsseltaRGirelliDMartinelliNShahSHKrausWELiMRader

DJReillyMPMcPhersonRWatkinsHArdissinoDZhangQWang

JTsaiMYTaylorHACorreaAGriswoldMELangeLAStarrJM

RudanIEiriksdottirGLaunerLJOrdovasJMLevyDChenYDI

ReinerAPHaywardCPolasekODearyIJBoreckiIBLiuY

GudnasonVWilsonJGvanDuijnCMKooperbergCRichSSPsaty

BMRotterJIOrsquoDonnellCJRiceKBoerwinkleEKathiresanS

CupplesLA2014AssociationofLowFrequencyandRareCoding

SequenceVariantswithBloodLipidsandCoronaryHeartDiseasein

56000WhitesandBlacksTheAmericanJournalofHumanGenetics94

223ndash232doi101016jajhg201401009

PryceJEVeerkampRFThompsonRHillWGSimmG1997Genetic

aspectsofcommonhealthdisordersandmeasuresoffertilityinHolstein

FriesiandairycattleAnimalScience65353ndash360

doi101017S1357729800008559

PryceJEWoolastonRBerryDPWallEWintersMButlerRShafferM

2014WorldTrendsinDairyCowFertilityin10ThWorldCongressof

GeneticsAppliedtoLivestockProduction

PuffenbergerEGJinksRNWangHXinBFiorentiniCShermanEA

DegrazioDShawCSougnezCCibulskisKGabrielSKelleyRI

MortonDHStraussKA2012Ahomozygousmissensemutationin

HERC2associatedwithglobaldevelopmentaldelayandautismspectrum

disorderHumMutat331639ndash1646doi101002humu22237

PurcellSNealeBToddBrownKThomasLFerreiraMARBenderD

MallerJSklarPdeBakkerPIWDalyMJShamPC2007PLINKA

157

ToolSetforWholeGenomeAssociationandPopulationBasedLinkage

AnalysesAmJHumGenet81559ndash575

SahanaGNielsenUSAamandGPLundMSGuldbrandtsenB2013Novel

HarmfulRecessiveHaplotypesIdentifiedforFertilityTraitsinNordic

HolsteinCattlePLoSONE8e82909doi101371journalpone0082909

SchuumltzEScharfensteinMBrenigB2008Implicationofcomplexvertebral

malformationandbovineleukocyteadhesiondeficiencyDNAbased

testingondiseasefrequencyintheHolsteinpopulationJDairySci91

4854ndash4859doi103168jds20081154

SimmonsMJCrowJF1977MutationsAffectingFitnessinDrosophila

PopulationsAnnualReviewofGenetics1149ndash78

doi101146annurevge11120177000405

SoggiuAPirasCHusseinHACanioMDGaviraghiAGalliAUrbaniA

BonizziLRoncadaP2013UnravellingthebullfertilityproteomeMol

BioSyst91188ndash1195doi101039C3MB25494A

StitzielNOKiezunASunyaevS2011Computationalandstatistical

approachestoanalyzingvariantsidentifiedbyexomesequencing

GenomeBiology12227doi101186gb2011129227

SundaramSKHuqAMSunZYuWBennettLWilsonBJBehenME

ChuganiHT2011ExomesequencingofapedigreewithTourette

syndromeorchronicticdisorderAnnNeurol69901ndash904

doi101002ana22398

VanRadenPMOlsonKMNullDJHutchisonJL2011Harmfulrecessive

effectsonfertilitydetectedbyabsenceofhomozygoushaplotypesJournal

ofDairyScience946153ndash6161doi103168jds20114624

VilarintildeoGuumlellCRajputAMilnerwoodAJShahBSzuTuCTrinhJYuI

EncarnacionMMunsieLNTapiaLGustavssonEKChouP

TatarnikovIEvansDMPishottaFTVoltaMBeccanoKellyD

ThompsonCLinMKShermanHEHanHJGuentherBL

WassermanWWBernardVRossCJAppelCresswellSStoesslAJ

RobinsonCADicksonDWRossOAWszolekZKAaslyJOWuR

MHentatiFGibsonRAMcPhersonPSGirardMRajputMRajput

158

AHFarrerMJ2014DNAJC13mutationsinParkinsondiseaseHumMolGenet231794ndash1801doi101093hmgddt570

WangKLiMHakonarsonH2010ANNOVARfunctionalannotationofgeneticvariantsfromhighthroughputsequencingdataNuclAcidsRes38e164ndashe164doi101093nargkq603

WathesDCPollottGEJohnsonKFRichardsonHCookeJS2014HeiferfertilityandcarryoverconsequencesforlifetimeproductionindairyandbeefcattleAnimal8Suppl191ndash104doi101017S1751731114000755

YangYMuznyDMReidJGBainbridgeMNWillisAWardPABraxtonABeutenJXiaFNiuZHardisonMPersonRBekheirniaMRLeducMSKirbyAPhamPScullJWangMDingYPlonSELupskiJRBeaudetALGibbsRAEngCM2013ClinicalWhole

ExomeSequencingfortheDiagnosisofMendelianDisordersNewEnglandJournalofMedicine3691502ndash1511doi101056NEJMoa1306555

ZiminAVDelcherALFloreaLKelleyDRSchatzMCPuiuDHanrahanFPerteaGTassellCPVSonstegardTSMarccedilaisGRobertsMSubramanianPYorkeJASalzbergSL2009Awholegenome

assemblyofthedomesticcowBostaurusGenomeBiology10R42doi101186gb2009104r42

159

Supplementary)files)

YoucanfindChapter6supplementaryfilesat

- DocTa(DoctoralThesisArchive)(httptesionlineunicattit)

- GoogleDrive(httptinyurlcomPhDMMChapter06)orscantheQRcode

)

Supplementary)Figure)S1)Coverage)analyses)

A)Averagecoverageperanimals

B)Boxplotofchromosomescoverageperanimal

C)BoxplotofanimalcoverageperchromosomeBluelinerepresenttheexpected

50Xcoverage

Supplementary)Figure)S2)Distribution)of)variant)sequencing)depth)

)

Supplementary) Figure) S3) Correlation) between) sequencing) depth) and)

number) of) discordant) genotypes) In) red) values) for) two) outlier) animals)

having)concordance)lt846))

)

Supplementary)Figure)S4)Multidimensional)scaling)of)Italian)Holstein)bulls)

sequenced))

)

Supplementary)Figure)S5)Multidimensional)scaling)of)Italian)Holstein)bulls)

analysed)with)the)800K)SNPchip))

160

Supplementary) Figure) S6) MAF) distribution) on) neutral) (blue)) and)

deleterious)(red))variants))

)

Supplementary) Table) S1) EBV) distribution) on) Italian) Holstein) population)

and)SNPchip)dataset)

Sheet1EBVthresholdvaluetodefineplusandminustailforPROT(kgprotein)

SCC(somaticcellcount)FERT(fertility)andLONG(functional longevity) All

ItalianHolsteinFAbullswereincludeinthisanalysis

Sheet2Numberofdatasetanimalsineachtailforeachtrait

Supplementary)Table)S2)Exome)probes)statistics)number)and)bp)covered)

per)chromosomes)

Supplementary) Table) S3) Concordance) between) animal) genotypes) in)

sequencing)and)SNPchip)data)

Error1 HD data homozygotes ndash Exome data heterozygotes Error2 HD data

heterozygotesndashExomedatahomozygotes

Supplementary) Table) S4) List) of) suggestive) genes) deriving) from) the)

analysis)of)exomes)of)animals) in)the)tails)of)male)and)female)fertility)EBV)

distributions))

Header(code

bull Bfreq_totFrequencyofBalleleinentiredataset

bull Bfreq_MHFrequencyofBalleleinhighmalefertilitydataset

bull Bfreq_MLFrequencyofBalleleinhighfemalefertilitydataset

bull Bfreq_FHFrequencyofBalleleinlowmalefertilitydataset

bull Bfreq_FLFrequencyofBalleleinlowmalefertilitydataset

bull CallRate_totSNPcallratendashentiredataset

bull CallRate_MHSNPcallratendashhighmalefertilitydataset

bull CallRate_MLSNPcallratendashlowmalefertilitydataset

bull CallRate_FHSNPcallratendashhighfemalefertilitydataset

bull CallRate_FLSNPcallratendashhighmalefertilitydataset

bull HWeqHardyWeinbergequilibriumpvalue

bull Geno_HomoMinorNumberofanimalswithhomozygotegenotypeforminorallele

bull Geno_HetNumberofanimalswithheterozygotegenotype

161

bull Geno_HomoMajorNumberofanimalswithhomozygotegenotypeformajorallele

bull LOW_MinorAlleleminoralleleforlowfertility(maleandfemale)dataset

bull LOW_MAFminorallelefrequencyfromlowfertility(maleandfemale)dataset

bull LOW_CRSNPcallratefromlowfertility(maleandfemale)dataset

bull LOW_HomoMinorNumberofanimalsfromlowfertility(maleandfemale)datasetwith

homozygotegenotypeforminorallele

bull LOW_Het Number of animals from low fertility (male and female) dataset with

heterozygotegenotype

bull LOW_HomoMajorNumberofanimalsfromlowfertility(maleandfemale)datasetwith

homozygotegenotypeformajorallele

bull LOW_HWeqPvalue HardyWeinberg equilibrium pvalue from low fertility (male and

female)dataset

bull HIGH_MinorAlleleminorallelefromhighfertility(maleandfemale)dataset

bull HIGH_MAFminorallelefrequencyfromhighfertility(maleandfemale)dataset

bull HIGH_CRSNPcallratefromhighfertility(maleandfemale)dataset

bull HIGH_HomoMinorNumberofanimalsfromhighfertility(maleandfemale)datasetwith

homozygotegenotypeforminorallele

bull HIGH_Het Number of animals from high fertility (male and female) dataset with

heterozygotegenotype

bull HIGH_HomoMajorNumberofanimalsfromhighfertility(maleandfemale)datasetwith

homozygotegenotypeformajorallele

bull HIGH_HWeqPvalueHardyWeinberg equilibriumpvalue fromhigh fertility (male and

female)dataset

bull Male_TREND inheritance models implemented in PLINK trend test on male fertility

dataset

bull Male_GENOTYPES inheritancemodels implemented inPLINKgenotypic testonmale

fertilitydataset

bull Male_ RECESSIVE inheritance models implemented in PLINK recessive test on male

fertilitydataset

bull Female_TRENDinheritancemodelsimplementedinPLINKtrendtestonfemalefertility

dataset

bull Female_GENOTYPES inheritance models implemented in PLINK genotypic test on

femalefertilitydataset

bull Female_RECESSIVEinheritancemodelsimplementedinPLINKrecessivetestonfemale

fertilitydataset

bull MaleampFemale_TREND inheritance models implemented in PLINK trend test on male

andfemalefertilitydataset

bull MaleampFemale_GENOTYPES inheritancemodels implemented in PLINK genotypic test

onmaleandfemalefertilitydataset

162

bull MaleampFemale_RECESSIVEinheritancemodelsimplementedinPLINKrecessiveteston

maleandfemalefertilitydataset

bull EMP1_MaleEmpiricalpvalueonmalefertilitydataset

bull EMP1_FemaleEmpiricalpvalueonfemalefertilitydataset

bull EMP1_MaleampFemaleEmpiricalpvalueonmaleandfemalefertilitydataset

Supplementary)Table) S5) Regions) candidate) to) carry) deleterious) variants)

identified)by)OHW)approach))

Supplementary)Table) S6) Regions) candidate) to) carry) deleterious) variants)

identify)by)LOH)approach))

Ingreyregionsremovedinfurtheranalysis

Supplementary)Table)S7)List)of)candidate)deleterious)variants)mapping)in)

OHW)and)LOH)regions))Header(code

bull Bfreq_totFrequencyofBalleleinentiredataset

bull Bfreq_MHFrequencyofBalleleinhighmalefertilitydataset

bull Bfreq_MLFrequencyofBalleleinhighfemalefertilitydataset

bull Bfreq_FHFrequencyofBalleleinlowmalefertilitydataset

bull Bfreq_FLFrequencyofBalleleinlowmalefertilitydataset

bull CallRate_totSNPcallratendashentiredataset

bull CallRate_MHSNPcallratendashhighmalefertilitydataset

bull CallRate_MLSNPcallratendashlowmalefertilitydataset

bull CallRate_FHSNPcallratendashhighfemalefertilitydataset

bull CallRate_FLSNPcallratendashhighmalefertilitydataset

bull HWeqHardyWeinbergequilibriumpvalue

bull Geno_HomoMinorNumberofanimalswithhomozygotegenotypeforminorallele

bull Geno_HetNumberofanimalswithheterozygotegenotype

bull Geno_HomoMajorNumberofanimalswithhomozygotegenotypeformajorallele

bull HW_insideDeleteriousmutationsinsideldquoOutofHardyWeinbergrdquoregion

bull HW_outsideDeleteriousmutationsoutsideldquoOutofHardyWeinbergrdquoregion

bull DH_insideDeleteriousmutationsinsideldquoLackofHomozygosityrdquoregion

bull DH_outsideDeleteriousmutationsoutsideldquoLackofHomozygosityrdquoregion

163

bull GeneClass Classifications of gene 1 deleterious mutations inside OHW and LOHregion2deleteriousmutationsinsideOHWorLOHregion

bull 2deleteriousmutationsoutsideOHWandorLOHregionbull LOW_MinorAlleleminoralleleforlowfertility(maleandfemale)datasetbull LOW_MAFminorallelefrequencyfromlowfertility(maleandfemale)datasetbull LOW_CRSNPcallratefromlowfertility(maleandfemale)datasetbull LOW_HomoMinorNumberofanimalsfromlowfertility(maleandfemale)datasetwith

homozygotegenotypeforminorallelebull LOW_Het Number of animals from low fertility (male and female) dataset with

heterozygotegenotypebull LOW_HomoMajorNumberofanimalsfromlowfertility(maleandfemale)datasetwith

homozygotegenotypeformajorallelebull LOW_HWeqPvalue HardyWeinberg equilibrium pvalue from low fertility (male and

female)datasetbull HIGH_MinorAlleleminorallelefromhighfertility(maleandfemale)datasetbull HIGH_MAFminorallelefrequencyfromhighfertility(maleandfemale)datasetbull HIGH_CRSNPcallratefromhighfertility(maleandfemale)datasetbull HIGH_HomoMinorNumberofanimalsfromhighfertility(maleandfemale)datasetwith

homozygotegenotypeforminorallelebull HIGH_Het Number of animals from high fertility (male and female) dataset with

heterozygotegenotypebull HIGH_HomoMajorNumberofanimalsfromhighfertility(maleandfemale)datasetwith

homozygotegenotypeformajorallelebull HIGH_HWeqPvalueHardyWeinberg equilibriumpvalue fromhigh fertility (male and

female)datasetbull Male_TREND inheritance models implemented in PLINK trend test on male fertility

datasetbull Male_GENOTYPES inheritancemodels implemented inPLINKgenotypic testonmale

fertilitydatasetbull Male_ RECESSIVE inheritance models implemented in PLINK recessive test on male

fertilitydatasetbull Female_TRENDinheritancemodelsimplementedinPLINKtrendtestonfemalefertility

datasetbull Female_GENOTYPES inheritance models implemented in PLINK genotypic test on

femalefertilitydatasetbull Female_RECESSIVEinheritancemodelsimplementedinPLINKrecessivetestonfemale

fertilitydatasetbull MaleampFemale_TREND inheritance models implemented in PLINK trend test on male

andfemalefertilitydataset

164

bull MaleampFemale_GENOTYPES inheritancemodels implemented in PLINK genotypic test

onmaleandfemalefertilitydataset

bull MaleampFemale_RECESSIVEinheritancemodelsimplementedinPLINKrecessiveteston

maleandfemalefertilitydataset

bull EMP1_MaleEmpiricalpvalueonmalefertilitydataset

bull EMP1_FemaleEmpiricalpvalueonfemalefertilitydataset

bull EMP1_MaleampFemaleEmpiricalpvalueonmaleandfemalefertilitydataset

Supplementary)Table)S8)Results)of)haplotype)analysis)))Header(code

bull Chromchromosome

bull PosphysicalpositiononUMD31genome

bull Exome_homonumberofhomozygotefordeleteriousmutationsequencedanimals

bull Exome_hetnumberofheterozygotesequencedanimalsfordeleteriousmutation

bull Outcome NotFound It was impossible to associated a haplotype to the mutation

NotOpt Non completely invariant haplotype could be associated to themutationOK

Putativehaplotype associate tomutationwas found In few case alternativehaplotype

wasreported

bull LenlengthofhaplotypeinnumberofSNPs

bull Pop_numnumbersofselectedhaplotypedetectedinHolsteinpopulation

bull Pop_homo numbers of homozygote animals in Holstein population for selected

haplotype

bull Pop_het numbers of heterozygotes animals in Holstein population for selected

haplotype

bull ExpExpectednumberofhomozygoteanimal

bull Alternative if only one heterozygote sequenced animals for deleteriousmutationwas

foundalltwopossiblehaplotypeswheretested

Supplementary)Table)S9)Descriptive)results)for)the)Biological)Process)(BP)))

NumberofgenesbelongingtoBiologicalProcessGOterms)

Supplementary) Table) S10) Descriptive) results) for) the)Molecular) Function)

(MF))

NumberofgenesbelongingtoMolecularFunctionGOterms

165

CHAPTER7

Conclusion

Since the domestication event occurred in the Fertile Crescent around 10000yearsagothebovinespecieswasselectedbyhumankindtofulfilhisneedsThecreationofbreedsoccurredaround200yearsagointensifiedthedifferentiationbetween populations and promoted the development of intensive selectionschemesTraditionalAandnowadaysgenomicAselectionsareproducingpositivegenetictrendsinmanyproductivetraitsdespitethestillpoorknowledgeaboutphenotype biology A deeper understanding of genome organization and genebiology would increase the accuracy of genomic evaluation by incorporatingpriorknowledgeThisthesisexploresthebovinegenomebyhighAthroughputtechnologiesAsuchasNextGenerationSequencingandHighADensityGenotypingAwithestablishedand innovative procedures to investigate the biology of complex traits and toprovidenewdataandmoleculartoolstobovinebreedingThese goals have been achieved by complementary approaches and differentmethodsInChapter2selectionsignatureswereinvestigatedindependentlyintofive Italian breeds using Illumina BovineSNP50 BeadChip Then amultiAbreedapproach was applied clustering breeds into dairy (Italian Holstein ItalianBrownand Italian Simmental) andbeef (Marchigiana Piedmontese and ItalianSimmental) groups Regions under recent positive selection shared by breedswithasameproductiveaptitudewereinvestigatedinfurtherdetailbyretrievinggenes in the region and analysing their function by ontology and pathwayanalysis In the dairy group selection signals appear nearby genes related tomammarygland(metabolismandresistancetomastitis)andfeedingadaptationIn thebeefgroupgenes inselectedregionsare involved inanimalgrowthandmeatquality(textureandjuiciness)Hencecandidategenesidentifiedbelongtonetworks and pathways important in directing a breed towards either beef ordairyproductionIn Chapter 3 a GWAS approach was used in dairy cattle to correlate SNP tocomplex phenotypes having an economic value Three independent ldquoclassicalrdquosingleAmarkerregressionswereruninthreeItaliandairycattle(ItalianHolsteinItalianBrownandItalianSimmental)UsingthetraditionalsingleSNPapproach

168

only few SNPs were above the nominal genomeAwide significance thresholdwhile applying a geneAbased method identified significant candidate genesassociatedwithmilkphysiology andmammary glanddevelopment in all threepopulationsInterestinglygenesfoundindifferentbreedshavesimilarfunctionbut are not the same suggesting that breed history demography selectioncriteriaandstochasticeventsasgeneticdrifthaveanimpactonthedefinitionofthemainmoleculartargetofselectionschemesCurrent intensive selection strategies rapidly spread the genetic gain in thepopulationbutholdtheassociatedriskofdisseminatingdefectivegenescausinggenetic defects (eg mulefoot complex vertebral malformation and others) ordecreasing fertility In Chapter 6 different technologies and approaches werecombined to obtain filter and prioritize genes candidate to be deleteriousExome sequence datawere exploited to identify deleterious variants affectingItalianHolsteincattleinasetof20animalsextremeforfertilitytraitsMorethan5000 candidate deleterious SNP variants were identified To bypass the smallnumberlimitationandthelackofpowerofanystatisticanalysison20animalsastepwise approachwas used to first filter and then prioritize these candidatedeleterious variants Polymorphisms were retained if having asymmetricdistribution in bulls from opposite tails of fertility EBV distributions andormapping in genomic regions outlier for lack of homozygosity in a 1009 bullpopulation genotyped with the Illumina BovineHD BeadChip A total of 191variants in163genespassed theprocessesThesewereassessedby searchingadditional evidences in favour or against their deleterious effect throughdifferentapproachesi)checkingthevariantconservationacross100vertebratespecies by comparative genomics ii) finding haplotypes associated withcandidatemutationsandevaluatingtheirfrequencyinHolsteinsiii)integratinghumanandmouse functionalannotationsA scorewasproposed toweight theresult of these assessments and rank the variants 35 of them had positivescores These occurred in genes involved in basic biological mechanisms asfertilitydevelopmentand immunityandresultedenriched ingenesassociatedtogeneticdefectsreportedinhumanThesemutationsarestrongcandidatestobe recessive deleterious variants worth to be further investigated in a larger

169

populationtobetterunderstandtheirbiologyimpactandpossibleapplicationinbreedingTo increase the reliability of results based on genomic data and to reduce theimpact of subjective choices in data analysis new toolsapproaches weredeveloped(iethemultiAbreedapproachinSelectionSignaturendashChapter2thepostAGWAS geneAbased approach ndash Chapters 3 and 4) and the existent onescriticallyevaluated(imputationmethodsndashChapter5)In Chapter 4 the postAGWAS geneAbased method used in Chapter 3 wasimplemented in the MUGBAS (MUlti species GeneABased Association Suite)software In contrast to existing software MUGBAS is species and annotationfree AmultipleAtesting correctionwas included in the package in addition tocomputing parallelization for faster processing and plot representation forbetterunderstandingThesoftwareusessingleAmarkerGWASdataandagivenannotation to estimate geneAregionAwise association pAvalues As previouslymentioned in Chapter 3 the ldquoclassicalrdquo GWAS approach (single SNP) foundsignals only in ItalianHolsteinwhileMUGBAS geneAbased approach identifiedsignificantsignalsinallbreedsinvestigatedIn Chapter 5 the impact of different bovine reference sequences on genotypeimputation was investigated In this work Illumina BovineSNP50 BeadChipgenotypesof ItalianSimmentalbullswerereAmappedonthethreemostrecentreferencegenomeassembliesWecomparedfourdifferent imputationmethodsfromlow(IlluminaBovineLDv10Beadchip)tohighdensityThetestpopulationwasdividedintofoursubpopulationsonthebasisofrelationshipandgenotypedataavailabilityUpdatingSNPcoordinatesonthe three testedcattlereferencegenome assemblies determined only a slight variation on imputation resultswithinmethodsOntheotherhandlargedifferencesintermsofaccuracywereobservedamongthefourimputationmethodstestedGenetic structure of industrial breeds was evaluated assessing LinkageDisequilibrium (LD) Since LD decay is correlated to intensity of selection weexpected different behaviours in the breeds analysed Indeed as evident inChapter 2 ndash Figure 7 breeds subjected to lower selection intensity (ie

170

MarchigianaPiedmonteseandItalianSimmental)displayalowerpersistenceof

LDatgreaterdistancescomparedtothosesubjectedtohigherselectionpressure

(ieItalianHolsteinandItalianBrown)Chapter3investigatesLDintheDGAT1

region of dairy cattle breeds In Italian Simmental LD is lower compared to

Italian Holstein and Italian Brown In spite of this significant association

betweentheDGAT1regionandmilktraitswasfoundbothinItalianSimmental

andinItalianHolsteinInItalianBrownnoassociationwasfoundbecauseDGAT1

isfixed

Data and results presented in this thesis represent a clear example of the

progress of molecular technologies occurred in last few years spanning from

mediumdensitygenotypingtonextgenerationsequencing

Thebiologylandscapesofsomecomplextraits(iemilkproductionandquality

fertility)were improved finding candidate genes andpathwaysnotpreviously

associated to the phenotype in the bovine species Moreover insights on

deleterious mutations were inferred through a deep and integrated analysis

usingNGSdata

This new knowledge could be used to improve cattle breeding For example

deleteriousrecessivevariantscouldbeincludedinbullevaluationsbybreedersrsquo

associations to reduce the genetic load of deleterious genes in parallel to

improvingproductivity

While technological progress has been impressivewe are still only scratching

thesurfaceofanimalbiologyNewfrontiersarebeingopenedbywholegenome

sequencing of many animals (eg in the 1000 bull genome project) and by

stepping beyond sequencing (a number of epigenomic projects in animals are

following the footprints of the human ENCODE project) The trend is towards

unravelling complexity and data production ability is to be flanked by data

storingandprocessingandaboveallbynewmethodsandtoolsfordataanalysis

171

ACKNOWLEDGMENTS

ldquoNon egrave la destinazione ma il viaggio che contardquo Questi tre anni sono stati un gran bel viaggio di crescita personale e ricchi diesperienze Non egrave stato un viaggio solitarioma condiviso con vecchie e nuoveconoscenzechemisembradoverosomenzionareInnanzitutto vorrei ringraziare il prof Paolo AjmoneMarsan chemi ha dato lapossibilitagrave di fare questa esperienza Grazie per gli insegnamenti e per leopportunitagravechemihaoffertoSperodinonavertraditolesueaspettativeAgli attuali colleghi Lorenzo Stefano Licia Elisa Riccardo ed Elia e a quellipassati Nicola Fatima Raffaele Ezequiel e Barbara va il mio piugrave sincero edaffettuoso ringraziamento per aver condiviso anche solo in parte questomiopercorso Grazie per avermi sopportato (e non egrave poco) guidato (non solo nelcampolavorativo)efattovivereinunmodospecialelrsquoambientedilavoroUnsincerograzievaatuttiimieicolleghidelXXVIIcicloAgrisystemRicorderogravesempretuttiimomentichehopotutocondividereconvoi UngrazieancheachihafattopartedellamiaesperienzasvedeseGraziedicuoreaCarlFlavioChristianEricaatuttiiragazziitalianiatuttiicolleghildquosvedesirdquoea tutte le altre persone che ho conosciuto Mi avete fatto vivere appienolrsquoesperienzaesteraefattotornareacasaconqualcheconsapevolezzainpiugrave UnsentitoringraziamentoatuttiglialtricolleghiconcuihocollaboratoinquestianniRingrazio tutti gli amici e conoscenti che mi sono stati vicino per avermisupportatoesopportatoognunoamodoproprio Ultimomanonperimportanza ilringraziamentochevaatuttalamiafamigliaper avermi aiutato spronato sostenuto sempre e in particolare in questaesperienzaSemplicementegrazie PS Adirlatuttaancheladestinazionenonegravepoicosigravemale

173

  • INDEX
  • CHAPTER 1 - Introduction
  • CHAPTER 2 - Relative extended haplotype homozygosity (rEHH) signals across breeds reveal dairy and beef specific signatures of selection
  • CHAPTER 3 - Searching new signals for productive traits in three Italian cattle breeds
  • CHAPTER 4 - MUGBAS a species free gene based ased association suite
  • CHAPTER 5 - Imputation accuracy is robust to cattle reference genome updates
  • CHAPTER 6 - Quest for deleterious recessive variants in Italian Holstein bulls combining exome sequencing and high density SNP analysis
  • CHAPTER 7 - Conclusion
  • ACKNOWLEDGMENTS
Page 8: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 9: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 10: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 11: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 12: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 13: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 14: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 15: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 16: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 17: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 18: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 19: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 20: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 21: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 22: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 23: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 24: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 25: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 26: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 27: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 28: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 29: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 30: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 31: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 32: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 33: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 34: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 35: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 36: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 37: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 38: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 39: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 40: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 41: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 42: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 43: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 44: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 45: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 46: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 47: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 48: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 49: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 50: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 51: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 52: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 53: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 54: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 55: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 56: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 57: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 58: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 59: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 60: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 61: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 62: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 63: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 64: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 65: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 66: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 67: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 68: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 69: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 70: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 71: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 72: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 73: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 74: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 75: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 76: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 77: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 78: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 79: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 80: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 81: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 82: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 83: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 84: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 85: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 86: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 87: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 88: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 89: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 90: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 91: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 92: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 93: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 94: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 95: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 96: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 97: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 98: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 99: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 100: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 101: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 102: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 103: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 104: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 105: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 106: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 107: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 108: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 109: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 110: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 111: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 112: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 113: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 114: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 115: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 116: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 117: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 118: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 119: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 120: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 121: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 122: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 123: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 124: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 125: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 126: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 127: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 128: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 129: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 130: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 131: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 132: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 133: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 134: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 135: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 136: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 137: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 138: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 139: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 140: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 141: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 142: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 143: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 144: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 145: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 146: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 147: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 148: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 149: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 150: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 151: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 152: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 153: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 154: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 155: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 156: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 157: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 158: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 159: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 160: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 161: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 162: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 163: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 164: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 165: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School
Page 166: High-throughput sequencing technologies and genomics ...tesionline.unicatt.it/bitstream/10280/6074/3/00_tesiphd...Scuola di Dottorato per il Sistema Agro-alimentare Doctoral School