research article open access chiropteran types i and ii … · 2017. 4. 5. · research article...

12
RESEARCH ARTICLE Open Access Chiropteran types I and II interferon genes inferred from genome sequencing traces by a statistical gene-family assembler Thomas B Kepler 1* , Christopher Sample 2 , Kathryn Hudak 2 , Jeffrey Roach 1 , Albert Haines 1 , Allyson Walsh 3 , Elizabeth A Ramsburg 2 Abstract Background: The rate of emergence of human pathogens is steadily increasing; most of these novel agents originate in wildlife. Bats, remarkably, are the natural reservoirs of many of the most pathogenic viruses in humans. There are two bat genome projects currently underway, a circumstance that promises to speed the discovery host factors important in the coevolution of bats with their viruses. These genomes, however, are not yet assembled and one of them will provide only low coverage, making the inference of most genes of immunological interest error-prone. Many more wildlife genome projects are underway and intend to provide only shallow coverage. Results: We have developed a statistical method for the assembly of gene families from partial genomes. The method takes full advantage of the quality scores generated by base-calling software, incorporating them into a complete probabilistic error model, to overcome the limitation inherent in the inference of gene family members from partial sequence information. We validated the method by inferring the human IFNA genes from the genome trace archives, and used it to infer 61 type-I interferon genes, and single type-II interferon genes in the bats Pteropus vampyrus and Myotis lucifugus. We confirmed our inferences by direct cloning and sequencing of IFNA, IFNB, IFND, and IFNK in P. vampyrus, and by demonstrating transcription of some of the inferred genes by known interferon-inducing stimuli. Conclusion: The statistical trace assembler described here provides a reliable method for extracting information from the many available and forthcoming partial or shallow genome sequencing projects, thereby facilitating the study of a wider variety of organisms with ecological and biomedical significance to humans than would otherwise be possible. Background Novel human pathogens appear at a continually increas- ing rate. The majority of these agents are zoonotic, and have their origins in wildlife [1]. Among wildlife sources, bats, mammals of the order Chiroptera, are perhaps the most striking, serving as the primary natural reservoirs for many human pathogens [2-5], including several viruses of notorious lethality in humans: Nipah virus [6], Hendra virus [7], Ebola and Marburg Viruses [8,9], and Rabies virus. Rabies virus is regarded as the most lethal of all human pathogens, and has been found in bats for over a century [10]. Rabies and other Lyssaviruses are now found in many other animals, but phylogenetic stu- dies indicate that these viruses first evolved in bats and transferred to animals of the order Carnivora more recently [11]. SARS-like Coronaviruses have also been found in bats [12,13] and so far provide the closest link to the agent of human SARS. Infection by each of these viruses in bats appears to be asymptomatic or of greatly reduced pathology. Even Rabies virus, for which we have a great wealth of experi- mental data, and which is regarded as invariably lethal in other mammals, has been shown repeatedly to have decreased virulence in bats [14]. The association of so many human-lethal viruses with bats may be partially attributable to the fact that the * Correspondence: [email protected] 1 Center for Computational Immunology, Duke University Medical Center, Durham, NC, USA Kepler et al. BMC Genomics 2010, 11:444 http://www.biomedcentral.com/1471-2164/11/444 © 2010 Kepler et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. brought to you by CORE View metadata, citation and similar papers at core.ac.uk provided by Springer - Publisher Connector

Upload: others

Post on 05-Jun-2021

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: RESEARCH ARTICLE Open Access Chiropteran types I and II … · 2017. 4. 5. · RESEARCH ARTICLE Open Access Chiropteran types I and II interferon genes inferred from genome sequencing

RESEARCH ARTICLE Open Access

Chiropteran types I and II interferon genesinferred from genome sequencing traces by astatistical gene-family assemblerThomas B Kepler1 Christopher Sample2 Kathryn Hudak2 Jeffrey Roach1 Albert Haines1 Allyson Walsh3Elizabeth A Ramsburg2

Abstract

Background The rate of emergence of human pathogens is steadily increasing most of these novel agentsoriginate in wildlife Bats remarkably are the natural reservoirs of many of the most pathogenic viruses in humansThere are two bat genome projects currently underway a circumstance that promises to speed the discovery hostfactors important in the coevolution of bats with their viruses These genomes however are not yet assembledand one of them will provide only low coverage making the inference of most genes of immunological interesterror-prone Many more wildlife genome projects are underway and intend to provide only shallow coverage

Results We have developed a statistical method for the assembly of gene families from partial genomes Themethod takes full advantage of the quality scores generated by base-calling software incorporating them into acomplete probabilistic error model to overcome the limitation inherent in the inference of gene family membersfrom partial sequence information We validated the method by inferring the human IFNA genes from the genometrace archives and used it to infer 61 type-I interferon genes and single type-II interferon genes in the batsPteropus vampyrus and Myotis lucifugus We confirmed our inferences by direct cloning and sequencing of IFNAIFNB IFND and IFNK in P vampyrus and by demonstrating transcription of some of the inferred genes by knowninterferon-inducing stimuli

Conclusion The statistical trace assembler described here provides a reliable method for extracting informationfrom the many available and forthcoming partial or shallow genome sequencing projects thereby facilitating thestudy of a wider variety of organisms with ecological and biomedical significance to humans than would otherwisebe possible

BackgroundNovel human pathogens appear at a continually increas-ing rate The majority of these agents are zoonotic andhave their origins in wildlife [1] Among wildlife sourcesbats mammals of the order Chiroptera are perhaps themost striking serving as the primary natural reservoirsfor many human pathogens [2-5] including severalviruses of notorious lethality in humans Nipah virus [6]Hendra virus [7] Ebola and Marburg Viruses [89] andRabies virus Rabies virus is regarded as the most lethalof all human pathogens and has been found in bats for

over a century [10] Rabies and other Lyssaviruses arenow found in many other animals but phylogenetic stu-dies indicate that these viruses first evolved in bats andtransferred to animals of the order Carnivora morerecently [11] SARS-like Coronaviruses have also beenfound in bats [1213] and so far provide the closest linkto the agent of human SARSInfection by each of these viruses in bats appears to be

asymptomatic or of greatly reduced pathology EvenRabies virus for which we have a great wealth of experi-mental data and which is regarded as invariably lethalin other mammals has been shown repeatedly to havedecreased virulence in bats [14]The association of so many human-lethal viruses with

bats may be partially attributable to the fact that the

Correspondence keplerdukeedu1Center for Computational Immunology Duke University Medical CenterDurham NC USA

Kepler et al BMC Genomics 2010 11444httpwwwbiomedcentralcom1471-216411444

copy 2010 Kepler et al licensee BioMed Central Ltd This is an Open Access article distributed under the terms of the Creative CommonsAttribution License (httpcreativecommonsorglicensesby20) which permits unrestricted use distribution and reproduction inany medium provided the original work is properly cited

brought to you by COREView metadata citation and similar papers at coreacuk

provided by Springer - Publisher Connector

order Chiroptera to which the bats belong is the secondlargest mammalian order behind only Rodentia in totalspecies [15] On the other hand the unique lifestylesand ecological circumstances of bats makes plausible theidea that the coevolution of bats and their viruses pro-duced unusually virulent viruses and a host antiviralresponse tuned to suppress themIf the latter is the case it will be of great interest to

understand the ways in which the several componentsof the bat antiviral response differ from those in humansand other mammals The main hurdle to the pursuit ofthis goal is our paucity of knowledge of bat immunologyand almost complete lack of reagents to initiate suchstudies What we do have now that was not available afew years ago is the information obtained through large-scale sequencing projects This information resides forthe most part in partially-or unassembled genomesequences and is not practically available for such pur-poses without further technical developmentsOur aims in this paper are to describe the method we

have devised for the partial assembly of genome sequen-cing traces for the inference of gene families to validatethis method on the human interferon alpha family andto use it for the inference of the type-I interferonfamilies in the bats Myotis lucifugus (little brown bat)and Pteropus vampyrus (large flying fox)Most pathogenic viruses including all of the viruses

listed above possess one or more genes that directly antag-onize the type-I interferon response [16-19] indicating astrong coevolutionary pressure between these viruses andthe type-I interferon system In experimental infections ofmice with Rabies virus the investigators noted a markedcorrelation between survival and IFN production [20]Type I inteferons are the primary mediators of one of the

earliest stages of the antiviral response in mammals Theyare induced rapidly and secreted upon viral infection Cellsexposed to these cytokines enter an infection-refractorystate induced by signaling through the common type-Iinterferon receptor complex In this state transcriptionand translation of viral gene products is inhibited MHCclass I expression is upregulated RNA and protein degra-dation are accelerated (reviewed in depth in refs [21-23])There are 13 human functional IFNa genes as well as sin-gle functional genes for IFNb IFNε IFNω and IFN Thetype 1 interferons are highly pleiomorphic exhibiting bothdistinct and complex overlapping function [2124] All ofthe known mammalian type-I interferons genes are intron-less their transcripts are neither spliced nor edited beforetranslation and expression is controlled at the transcrip-tional level [232526]

Bat Genome ProjectsThere are two bat whole-genome sequencing projects inprogress but neither is complete at this point One of

them will provide coverage at a level that will not producea reliable assembly There are in fact a great many sequen-cing projects underway for which there is no intent to dofull coverage [27] The Myotis lucifugus genome sequen-cing project has completed sequencing to approximately7-fold coverage with 27486306 traces thus far The gen-ome is as of this writing being assembled httpwwwbroadinstituteorgscienceprojectsmammals-modelsbrown-batlittle-brown-bat The Pteropus vampyrus gen-ome sequencing project has produced 8051001 genometraces httpwwwhgscbcmtmceduproject-species-m-MegabathgscpageLocation=Megabat and is slated forcompletion with 2x coverage The Pteropus project isbased on samples from a single individual bat (personalcommunication)The consequence of incomplete coverage is the

exacerbation of one of the shortcomings of whole-gen-ome shotgun sequencing the difficulty of resolvingrepetitive DNA segments [28] If two sequencing tracescontain regions of similarity it may be difficult orimpossible to determine whether these traces arederived from the same underlying DNA or from twodistinct DNA segments that are themselves paralogousThis difficulty is not limited to the assembly of highlyrepetitious intergenic regions but to the inference ofgene families as wellThis latter problem is particularly unfortunate

because gene duplication is a major source of innovativepotential in evolution and the comparative study ofgene families among related species is therefore of greatinterest [29] Furthermore a large proportion of genesin eukaryotic genomes reside in families including thegenes that encode the type-I interferonsIn this paper we describe a method for the inference of

gene family members from unassembled sequencingtraces The method is conceptually straightforward andis based on an information-theoretic model that accountsfor both sequencing error and evolutionary divergenceproviding the means to encode the set of sequencingtraces We then seek those partial assemblies that makethe total description length of the combined set ofsequencing traces as small as possible This reconstruc-tion provides an estimate of the number of genes in thefamily and posterior probability mass functions on theDNA sequences of these genesWe first present the model and the algorithm we have

developed to minimize the description length and therebyinfer the structure of the gene family We validate themethods by reconstructing the human type-I interferonfamily genes from sequencing traces from the human gen-ome project We use this method to infer the type-I inter-feron families from the sequencing traces from the Myotislucifugus and Pteropus vampyrus genome sequencing pro-jects We examine these genes in comparison with the

Kepler et al BMC Genomics 2010 11444httpwwwbiomedcentralcom1471-216411444

Page 2 of 12

orthologous families in humans and other mammalsFinally we confirm our inferences by cloning and sequen-cing genes from four of the interferon families

ResultsInference of Gene Families From Trace ArchivesIn this section we present the mathematical basis of theassembly method we have developed and provide a con-cise algorithmic implementationWe assume a set S of sequencing traces containing

regions with similarity to some given gene or genes Inpractice this seed gene would be an ortholog of thegenes one is attempting to infer one collects S usingsimilarity searching on the complete trace archive forthe species of interestThere are g genes in the family where g is an unknown

integer Each of these genes is related to a commonancestor a (whose sequence is also unknown) and differsfrom a by the accumulation of mutations includinginsertions and deletions Each sequencing trace resultsfrom direct templating from one of these genes and dif-fers from it by sequencing error only Figure 1 illustratesthe statistical model used for the inferential procedureGiven any two sequencing traces in S they are related

either through sharing the same immediate template(traces 5 and 6 in figure 1 for example) or indirectlythrough the ancestor a with each having a distinctimmediate template These two alternatives correspondto two different likelihoods In the first the differencesbetween the two traces are attributable to sequencingerror only and must be consistent with the reportedposition-specific quality scores In the latter the diver-gence of the underlying genes from their commonancestor provides an additional source of differencesThe difference in description lengths under these twomodels provides a criterion for choosing between the

models In fact the criterion we will use for determiningthe final set of assemblies is the minimization of thetotal description length of S under the model in figure1 Briefly the description length is the informationrequired to encode the data and the values of the modelparameters (See [30] for a complete treatment ofdescription length techniques for inference) For anysub-model for pairs of traces that involves mutationsfrom the common ancestor the description length mustaccount for the information required to encode themutation rate If y is the overlap in nucleotides betweenthe two sequencing traces the cost of encoding themutation rate is log yThe objective of the assembly procedure is the mini-

mization of the description length over all topologies ofthe form shown in figure 1 but there is no method forfinding this minimum short of exhaustive enumerationover all topologies which is ruled out by practical con-siderations for all but the smallest S We thereforeadopt a greedy progressive method familiar from its usein multiple sequence alignment At each stage we findthe pair of sequences (traces or partial assemblies) thatprovide the maximum saving in description length andassemble them replacing the pair by the new assemblythey have formed The process continues until there areno remaining moves that lead to a description lengthreductionThe following section provides the details for the

probability modelQuality scores and conditional probabilitiesThe base-calling software reads the raw sequencingtrace and for each position reports a single base and aquality score If the reported base is X and the reportedquality score is q we take the output to represent a pos-terior probability mass function (pmf) on the latentnucleotide state at that position given by

f Dq X

q( | )

( )

( )

=minus =⎧

⎨⎩

1

4

for

else(1)

where D represents the data conditioning the estimateand ε(q) is defined by

( ) q q= minus10 10 (2)

It is not necessary to specify a likelihood function oreven to describe what the nature of D is so we simplywrite p j

i( ) for the posterior probability mass functionon ξ in the jth position in the ith sequencing trace Forsimplicity in this presentation we will assume that theprior pmf on ξ is uniform on A C G T -We will also need the posterior pmf on the ancestral

state a which is obtained by specifying a model for theaccumulation of mutations from a to ξ and summing

Figure 1 Schematic of the statistical model The observedsequencing traces Xi result from the observation of genes ξ Thesegenes are related by descent from the common ancestor a

Kepler et al BMC Genomics 2010 11444httpwwwbiomedcentralcom1471-216411444

Page 3 of 12

over the unobserved state ξ We take a simple reversiblemutation model

Prob( | )

=

minus =⎧⎨⎩

1

4

for

else(3)

and let

p pji

ji( | ) Prob( | ) ( )

= sum (4)

Consider the model partially described by having posi-tion i in trace 1 aligned with position j in trace 2 Theposterior pmf for the latent true nucleotide state is

pp p

ki j

ij

121 2

12

5( )

( ) ( )ξ

ξ ξ

ϕ= (5)

where k is the position in the assembly correspondingto i and j in traces 1 and 2 respectively ij

12 is given by

ij i j

A C G T

p p12 1 25equivisin minus

sum ( ) ( )

(6)

Since the criterion is the minimization of the descrip-tion length we compare any two models by the differencein their description lengths The alignment cost then isthe difference in description length between the model inwhich these positions are aligned (given the alignment)and the one in which they are independent and is give by

sij ij12 12= log (7)

Denote by A(k) the vector whose components are theindices for the aligned traces corresponding to the kthposition in the resulting assembly Then the cost of thealignment is

S sA k A k

k

A12 12

1 2= sum ( ) ( ) (8)

For the case where traces 1 and 2 are related indir-ectly through a common ancestor we similarly have

S sA k A k

k

A12 12

1 2( ) ( )( ) ( ) = sum (9)

with the appropriate and obvious definitions The quan-tity of primary importance for us is the cost differencebetween the direct and indirect models for a given pair oftraces

c S

S o

1212

12 1212

=

+

min

min min ( ) log ( )

AA

AA A

μμ μ

(10)

The logarithmic term is the cost of encoding themutation rate that minimizes the alignment score [30]the argument oA

12( ) is the number of nucleotides inthe overlap of traces 1 and 2 in the optimal alignmentie the number of degrees of freedom available to esti-mate μ When c is positive the indirect model is themore efficient when c is negative the direct model ismore efficientFor progressive alignment note that the posterior assem-

bly pmf and the cost are composed from the individualsequential alignments Suppose we first align traces 1 and2 to obtain assembly 4 and then align trace 3 to assembly4 to produce assembly 5 as shown in figure 2 We need torefine our notation for indexing by letting A ki

a( ) refer tothe position in trace i corresponding to position k inassembly a So for the example of figure 2 we have

A k A A ki i5 4

45( ) ( ( ))= (11)

It is then straightforward to show that the alignmentscores are additive

s

p p p

A k A k A k

A k A k A

15

25

35

15

15

1

123

1 2

2 5( ) ( ) ( )

( ) ( )

log

log ( ) ( )

equiv minus

minus 55

14

45

24

45

45

35

3

12 34

( )

( ( )) ( ( )) ( ) ( )

( )k

A A k A A k A k A ks s

sum

= +

(12)

Figure 2 Diagram indicating the labeling of traces and assembliesfor the discussion of progressive alignment in the text

Kepler et al BMC Genomics 2010 11444httpwwwbiomedcentralcom1471-216411444

Page 4 of 12

and that the assembly posterior pmfs are obtained bycomposition

p p p p

s

k A k A k A k

A k A k A

5 2 1 2 3515

25

35

15

25

3

( ) ( ) ( ) ( )( ) ( ) ( )

( ) ( )

equiv

times 55

45

35

35

45

1231

4 3 345

( )

( ) ( ) ( ) ( )( ) ( )

k

A k A k A k A kp p s

⎡⎣⎢

⎤⎦⎥

= ⎡⎣⎢

minus

⎤⎤⎦⎥

minus1

(13)

We have shown that the total cost for the compositealignment is given by the sum of the sum of alignmentcosts but as with all progressive alignment methodsthe minimum-cost composite alignment may not be theglobal minimum cost alignmentThe complete procedure then is

1 Compute all of the minimum pairwise alignmentcosts cij using dynamic programming2 Assemble the two sub-assemblies with least align-ment cost and replace them with the new assemblythat results Compute the pairwise costs between theremaining subassemblies and the new assembly3 Continue at step 2 until total assembly cost canno longer be reduced

ValidationWe validated the method by inferring the human inter-feron-alpha (IFNA) gene family directly from the humanwhole-genome sequencing trace archives and compar-ing the results to IFNA gene sequences known from thefinished human genome (as well as from decades ofmolecular biology on the human IFNA) We performedthis procedure in three different waysFirst we collected relevant sequencing traces by per-

forming a BLAST search using the 567 nt-long protein-coding portion of the human IFNA2 transcript(NM_000605) and retrieved 291 traces with expectedvalues less than 10-6 We trimmed each sequence andapplied the assembly method described aboveThe procedure resulted in 15 final assemblies We

compared the inferred genes to the human genomereference assembly Each of these 15 assemblies mapsunambiguously to one of the 13 known IFNA genes orthe single whole pseudogene at an average error rate of60 times 10-4 per base (14 mismatches in 23374 bases) Inone case a pair of assemblies mapped to a single geno-mic gene IFNA17 There are 2 single nucleotide differ-ences between the two assemblies mapping to IFNA17and one dinucleotide difference Each of the singlenucleotide differences is strongly supported with multi-ple high-quality reads supporting the inference Thedinucleotide difference is less well supported with just asingle trace covering it in one assembly though the

corresponding bases in this trace are good quality Theevidence strongly suggests that the two assemblies domap to two different alleles of the same gene Acceptingthis conclusion reduces the error rate to less than 4 times10-4 per base The 10 remaining mismatches include asingle 4-nucleotide deletion relative to the genomicsequenceFor the next two validation procedures our intent was

to more accurately represent the situation in which theassembler would be used For this purpose we used oneof the Pvampyrus putative IFNA genes rather thanhuman IFNA2 as the BLAST query and selected a sub-sample of the retrieved sequences to simulate lowcoverageThere were 74 trace reads in the subset produced by

the Whitehead Institute Assembling these reads usingour method produced 13 distinct assemblies with two ormore reads in each mapping (as described above) to 12different human genomic regions 7 IFNAs humanIFNW1 LOC100130866 (annotated by NCBI as similarto IFNW1) 2 IFNA pseudogenes and 1 IFNW pseudo-gene One pair of distinct assemblies was mapped erro-neously to a single IFNA The subset of trace readsfrom the Venter Institute consisted of 103 sequencesproducing 22 distinct assemblies with 2 or more readsmapping to 22 different regions of the human genomeall 13 IFNAs 1 IFNA pseudogene IFNW1 and 6regions annotated as either IFNW-like or IFNW pseu-dogenes The alignments of these assemblies to the gen-ome revealed 3 mismatches in 31531 total bases Onepair of assemblies was erroneously mapped to a singleIFNA pseudogene

Type I interferons in Myotis lucifugus and PteropusvampyrusWe used the protein-coding portions of the humangenes IFNA2 IFNK IFNB and IFNE the bovine geneIFNT and murine gene IFNZ in blast searches againstthe whole genome shotgun sequence trace archives forPteropus vampyrus and Myotis lucifugus The sequencesretrieved in each of the blast searches were thenassembled using the methods described aboveWe obtained unique final assemblies for IFNB in each

bat Each of these final assemblies contained an intactORF which when BLASTed against the NCBI nucleo-tide sequence database returned multiple hits on thecorresponding gene in several mammals The greatestsimilarity was to horse pig and cow sequences withabout 80 nucleotide identityMyotis produced two final assemblies for IFNE both

of which contain intact ORFs and appear on closeexamination to be genuinely distinct sequences Ptero-pus produced a single final assembly which had anintact ORF

Kepler et al BMC Genomics 2010 11444httpwwwbiomedcentralcom1471-216411444

Page 5 of 12

The sequencing traces returned in the search onIFNA2 produced multiple final assemblies that fall intothree distinct families with clear similarity to IFNAIFNW and IFND (Table 1) respectivelyNo relevant hits to murine IFNZ were found in either

chiropteran genome No hits to bovine IFNT beyondthose already obtained using other queries were foundChiropteran Type II interferonIn mammals the type I interferons have a single exon offewer than 700 bases IFNG the type II interferon hasmultiple exons We used our methods to infer the type-II interferon genes in the two bats by using the full-length human IFNG gene comprising introns and exons(4972 bases NC_000012 REGION complement(6854855068553521)) Our methods produced a singleassembly in each bat (Additional File 1) All exons wereclearly identifiable and all intronic splice signals wereconserved Table 2 shows the sequence similarity amongthe human and bat sequences

PhylogeneticsThe intact ORFs of the inferred sequences were alignedwith the type-I interferons from humans and from thedomestic pig Sus scrofa (which is substantially moreclosely related to bats than is the human) and as anoutgroup the IFNB and IFNA gene sequences from theduckbill platypus (Ornithorhynchus anatinus) The phy-logenetic tree relating these genes was inferred usingmaximum likelihood (Figure 3)The features of the inferred phylogenetic tree are con-

sistent with those found in previous investigations [31]In particular the interferon families fall into distinctclades with the exception of the platypus IFNA geneswhich appear ancestral to the IFNA IFNW and IFNDgenes of the other species Within each of the families

the genes fall into clades according to species with nointerdigitation of genes from different species in anyclade (Figure 3) Additionally the bat genes from onebat species are typically but not universally more clo-sely related to the orthologous genes from the other batspecies than to those of pigs or humans The bat genesare typically but not universally more closely related tothe orthologous pig genes than to the orthologoushuman genes

SequencingWe used the inferred sequences to design sequencingprimers and make several clones from each of IFNAIFNB IFND and IFNK The results are as followsWe obtained DNA sequences from 19 independent

IFNB clones from two individual bats Among these 2distinct but very closely related genes were clearly dis-cernible One sequence is identical to the consensus ofall 19 sequences and is found in 7 of 12 clones fromone bat and all the clones from the other Given thisinterpretation the overall frequency of pre-sequencingPCR error was 16 times 10-3 per base (17 errors in 10792bases) The inferred IFNB genes are 91 identical to anIFNB isolated from Rousettus aegyptiacus [4] and 80identical to IFNB from the domestic pig Sus scrofa

Table 1 Summary of Final Assemblies in Myotis lucifugus and Pteropus vampyrus

Species Family assemblies Intact ORFs Pseudogenes Partial length (AA)

M lucifugus IFNB 1 1 0 0 186

IFNE 2 2 0 0 193

IFNK 2 2 0 0 208

IFNA 2 0 2 0 -

IFNW 25 12 7 9 195

IFND 19 11 3 5 1711

P vampyrus IFNB 1 1 0 0 185

IFNE 1 1 0 0 193

IFNK 1 1 0 0 201

IFNA 7 7 0 0 189

IFNW 28 18 8 1 1852

IFND 14 5 7 2 1713

1 two genes encode polypeptides of length 190

2 one at 187 and two at 195

3 one at 170

Table 2 DNA sequence similarity among human (Has)IFNG and inferred bat (Mlu Pva) IFNG assemblies forexons (above the diagonal) and introns (below thediagonal)

species Hsa Mlu Pva

Has ID 0752 0784

Mlu 0655 ID 0832

Pva 0711 0696 ID

Kepler et al BMC Genomics 2010 11444httpwwwbiomedcentralcom1471-216411444

Page 6 of 12

Using IFNK primers we obtained sequences from 7clones from a single animal and found two distinctgenes differing from each other in two nucleotides Theerror rate was consistent with that found in the IFNBgenes at 17 times 10-3 per base (6 errors in 3430 bases)For IFND and IFNA respectively we collected

sequences from 32 and 52 clones from a single animalBecause PCR was used in an amplification step prior tocloning (unlike the process used for genomic sequen-cing) it is difficult to discern the expected genetic varia-bility from sequencing error For IFNA four sequenceswere recovered multiple times from independent proce-dures The mean number of mismatches in pairwisecomparisons is just under 7 The four sequences arewithin 1 2 5 and 6 bases of the most similar inferredIFNA gene For IFND there are three multiply-repre-sented sequences Two of these genes differ at 9 basesthe third which represents a pseudogene differs fromthe other two by more than 20 bases These genes differfrom their nearest inferred IFND sequences by 7 12and 20 bases respectively Additional file 2 contains an

alignment of the three multiply-represented IFNDclones compared to the IFND assembliesSince it was not possible to determine by inspection

what the underlying true gene sequences are or howmany there are We instead estimated the posteriorprobability on the number of genes in each family asdescribed in Methods (Figure 4) For IFNA the poster-ior probability is maximized at g = 9 genes the 90credible interval is (617) For IFND the greatest poster-ior probability occurs at g = 19 genes and the 90credible interval is (1528)

Gene expressionTo test whether the inferred genes are expressed underconditions known to induce typeI interferons we designed primers to Pteropus IFNB

and to a control housekeeping gene (PPIA) and a gene(OAS2) induced by paracrine IFNB signaling The lattertwo genes were also inferred by the methods describedFresh Pteropus PBMCs from two individual animals

were cultured under four different conditions with

Figure 3 Phylogenetic tree of the type-I interferons in Pteropus vampyrus (Pva) Myotis lucifigus (Mlu) Sus scrofa (Ssc) Homo sapiens(Hsa) and Ornithorhynchus anatinus (Oan) The tree was thinned for clarity by omitting one member of any pair of sequences differing byfewer than 3 amino acids The branch length is proportional to the evolutionary distance

Kepler et al BMC Genomics 2010 11444httpwwwbiomedcentralcom1471-216411444

Page 7 of 12

lipopolysaccharide (LPS) with the synthetic RNA poly(IC) with Vesicular Stomatitis Virus (VSV) and withculture medium aloneWith LPS or poly(IC) treatment IFNB mRNA expres-

sion was increased 20-50 fold by 2 hours and decreasedslowly to near the starting value by 24 hours OASexpression lagged behind that of IFNB reaching its peakvalue by 8 hours declining from the peak level only par-tially by 24 hours In contrast cells treated with VSVhad peak levels of IFNB appear by 8 hours and remainalmost unchanged to 24 hours OAS expression rosemore slowly and had not reached a maximum level by24 hours (Figure 5)

DiscussionRecent research has demonstrated very clearly that therate of emergence of human pathogens continues toincreasing steadily and that the source of the majorityof these agents is wildlife [1] One of the most intriguingfindings of late is that bats are the natural reservoirs ofmany of the most pathogenic viruses in humansWhile it is known that microbe-host coevolution

drives pathogenicity in the natural host the effect ofsuch coevolution on alternative hosts has not beendescribed The development of many genome sequen-cing projects extending beyond domesticated animalsprovides an opportunity to begin such inquiries Thelow coverage levels of these projects and the fact that somany genes with immunological functions appear inlarge families of very similar genes requires the develop-ment of more precise inferential tools for their studyToward that end we have developed a method for the

assembly of genome sequence fragments for use in theinference of gene family members when the genome

coverage is too low for reliable complete assembly Wevalidated the method on raw unassembled traces fromthe human genome sequencing project finding that wecould assemble the known interferon alpha sequencesaccurately and further identify at least one case inwhich two alleles are presentIt should be noted that our method requires more

computation than assembly methods currently in usedue to the numerical minimization over the mutationfrequency required in Eq(10) and is therefore not suita-ble for large-scale assemblyUsing this method we inferred a total of 61 type-I

interferon genes with intact ORFs from the whole-gen-ome shotgun trace archives for two chiropteran speciesPteropus vampyrus and Myotis lucifugus We find thatthe largest of the IFN-I gene families in both bats com-prises genes orthologous to the IFNW genes in othermammals In humans mice and pigs there is just oneIFNW gene but up to two dozen members in each batA recent analysis of bovine type-I interferons from theassembled Bos taurus genome [32] finds 26 intactIFNW genes providing precedent for our otherwisestriking results In contrast the IFNA family is the lar-gest IFN-I family in several mammals includinghumans mice and pigs but appears to be smaller inPteropus and absent but for pseudogenes in Myotis Thegene family assembly from trace archives indicates thatthere are 7 intact IFNA in Pteropus analysis of directsequencing from PBMCs gives maximum posteriorprobability to the presence of 9 IFNA genes (Figure 4)with a 90 credible interval containing 6 to 16 genesCattle have 13 IFNA genes in spite of having a greatlyenlarged IFNW family [32]Both bats have multiple members in the IFND family

with five intact members and seven pseudogenes inPteropus and twelve intact members in Myotis IFNDhas been found as a functional gene only in pigs wherethe gene product is expressed in the placenta and playsa role in embryonic development [33] but is not sus-pected of involvement in the response to viruses It isstriking to us that this family seems to have been sodramatically enlarged in bats The size of this familysuggests that it may still be involved in host defense inbats even if it has lost that function in pigs Walker andRoberts [32] report finding three IFND pseudogenesbut no intact IFND in the cow It is worth noting thatthe placental type-I interferon in cattle is IFNT ratherthan IFND [34] Searching the bat trace archives withbovine IFNT did not produce hits that had not beenreturned with the other searches We find no evidenceof IFNT or IFNZ in either batFor IFNB IFNK and IFNE we find one member of

each in P vampyrus and one or two in M lucifugus Inthe case where we do find two genes we are confident

Figure 4 The posterior probability mass function on thenumber of genes in the IFNA and IFND families as estimatedby cloning and sequencing from peripheral blood cells fromPteropus vampyrus

Kepler et al BMC Genomics 2010 11444httpwwwbiomedcentralcom1471-216411444

Page 8 of 12

that there the genes are distinct though they may repre-sent alleles rather than paralogsWe used the inferred sequences for P vampyrus

IFNA IFNB IFND and IFNK to design oligonucleotideprimers for cloning and sequencing and recovered atotal of 110 sequences The directly cloned sequencesvalidate much of the gene family inferences most of therepeated sequences are within a few bases of the nearestinferred gene A minority of the directly sequenced

genes are surprisingly far from the nearest inferredgene This circumstance may be an indication that thereare additional type-I interferon genes not covered by thePteropus sequencing traces and may also reflect signifi-cant population polymorphism in the wild batpopulationWe used these same primers to show using quantita-

tive RT-PRC that the P vampyrus candidate IFNB andOAS2 (a gene induced by IFNB in other mammals) are

Figure 5 Gene expression time course by qRT-PCR in fresh Pvampyrus PBMCs stimulated in culture by the indicated compoundExpression levels are relative to the housekeeping gene PPIA The error bars represent standard errors of the mean over biological duplicates

Kepler et al BMC Genomics 2010 11444httpwwwbiomedcentralcom1471-216411444

Page 9 of 12

expressed upon stimulation by type-I inducing agentsFurthermore the temporal trajectories of this expressionis consistent with the known mechanisms of such sig-naling The expression of IFNB under viral infection wasdelayed compared to that under stimulation by the TLRligands poly(IC) and LPS

ConclusionsThe bat has been implicated as a major reservoir forviruses of extreme pathogenicity in humans and sufferssubstantially less disease when infected by these virusesthan humans do In addition bat populations in NorthAmerica are declining rapidly as a result of white-nosesyndrome an emergent disease associated with a fungalpathogen [35] These facts and others suggest that thestudy of host defense and immunity in these uniquecreatures would benefit the pursuit of ecological andhuman health The major obstacle in this undertaking isthe lack of reagents that would make such investigationspossible One way to get this effort underway is to takeadvantage of the rich store of information contained inexisting partial genome sequences Although the infor-mation available in these genome databases requiresmore careful treatment for its extraction than isrequired for a complete assembled genome we havedeveloped methods that facilitate this task and havedemonstrated their utility

MethodsEthics StatementAll animals were handled in strict accordance with goodanimal practice as defined by the relevant national andor local animal welfare bodies and all animal work wascovered by an IACUC protocol from the Lubee BatConservancy (USDA Research Facility 58-R-0131)

Animals and sample collectionSubjects for this study included 10 male Large FlyingFoxes (Pteropus vampyrus) Some of the animals werewild caught and some were laboratory born all werereproductively mature ranging in age from 4 to 21 yrsAnimals were housed in captivity at the Lubee Bat Con-servancy in Gainesville Florida USA Animals werehoused together with conspecifics in an indooroutdoorcircular flight enclosure and were fed a mixture of fruitvegetables commercial primate chow and a vitaminsupplement Water was provided ad libitum The batswere captured manually anesthesia was mask-inducedwith 5 isoflurane in oxygen (3 Lmin) and the batswere maintained with 25 isoflurane Blood was col-lected from the left and right brachial veins using 3-mlsyringes and 25-gauge needles Blood samples weretransferred to heparanized tubes maintained and

shipped at room temperature (22degC) via overnightcourier

Cloning and SequencingGenomic DNA was prepared from whole unfractionatedperipheral blood from Pteropus vampyrus bats Using pri-mers derived from the inferred interferon gene sequencesand encoded for restriction enzyme cloning sites thegenes were PCR amplified from the genomic DNA Theforward and reverse primers encoded restriction sites forNheI and XhoI respectively which allowed cloning of thefull length gene into mammalian expression vectorpcDNA31(+) The PCR program consisted of an initiali-zation step at 94degC for 5 minutes then 30 cyclesof amplification consisting of denaturation at 94degC for1 minute annealing at 55degC for 30 seconds and elonga-tion at 72deg for 1 minute Amplification was followed by afinal elongation step at 72degC for 5 minutes and samplesheld at 4degC PCR was performed on a P times 2 ThermalCycler from ThermoElectron Corporation (WalthamMA)After purification the inserted genes were sequenced

using vector primers downstream and upstream of theinsert site Pteropus vampyrus IFNB IFNK IFND andIFNA sequences have been submitted to Genbank(GU126493 HM63650 HM636501 and HM636502)Quantitative RT-PCRWhole anticoagulated blood was collected from the bra-chial vein and layered over lymphocyte separation med-ium (Lympholyte M Cedarlane Labs) Cells werecollected from the interface washed and plated on 24well flat-bottom tissue culture plates (2 times 106 cellswell)in a total volume of 250 μl Cells were treated as indi-cated with either poly(IC)(Invivogen) LPS (Sigma) orVSV (Ramsburg lab stocks confirmed endotoxin freeusing LAL assay) Stimulants were diluted such that theaddition of 250 μl stimulant would give a final concentra-tion of 10 μgml poly IC 1 μgml LPS or an MOI of 5for VSV 500 μl of complete medium was added to theunstimulated control wells Cells were incubated at 37rsquoand 5 CO2 for the indicated times after which cellswere harvested into RNA lysis buffer (Qiagen RNEasykit) according to the manufacturerrsquos instructionsPurified RNA was prepared from these whole-cell

lysates as described in the protocols accompanying theQiagen RNeasy kit Qiashedder and Qiagen RNeasyMini kit and Qiagen DNase-Free DNase Set cDNAsynthesis was performed using the Invitrogen two-stepqRT-PCR kit according to the manufacturerrsquos protocolQuantitative real-time PCR was performed in an Eppen-dorf Mastercycler ep realplex thermal cycler usingSYBRGreenER qPCR Supermix and primers designedfrom the inferred gene sequences (additional file 3)

Kepler et al BMC Genomics 2010 11444httpwwwbiomedcentralcom1471-216411444

Page 10 of 12

Independent biological replicates were prepared foreach treatment and time point The log2 expressionlevel of IFNB and OAS2 relative to the housekeepinggene PPIA was estimated by subtracting the mean Ctvalue for each gene and treatmenttime point combina-tion from the corresponding mean Ct value for PPIA

Estimation of Gene Family SizeThe likelihood function for the observed set S ofsequences is a function of the true number g of genesand the sample size n by summing over the unobservednumber c of distinct genes sampled

P S g n P S c P c g nS c

c

( | ) ( | ) ( | )= sum (14)

The pmf on c is given by the recursion

P c g nc

gP c g n

g c

gP c g n

c( | ) ( | )

( | )

=

+ +

1

11 1

(15)

with Pc(1 | g1) = 1 and Pc(c | g1) = 0 for c ne 1PS(S | c) is approximated by minimizing the number

m of mutations over the assignments of observedsequences to c groups PS is then the posterior probabil-ity of getting m mutations given the observed numberof mutations in the IFNB and IFNK datasetsNote that this technique is independent of the assem-

bly methods described elsewhere in the paper

Phylogenetic InferenceThe set consisting of type-I interferon amino acidsequences from human (Homo sapiens) domestic pig(Sus scrofa) duckbill platypus (Ornithorhynchus anati-nus) and the amino acid sequences corresponding to theChiropteran genes inferred in this paper was submittedto multiple sequence alignment by CLUSTALW 2010[36] and ProbCons 112 [37] Maximum likelihood andminimum evolution phylogenies were inferred from eachresulting multiple sequence alignment Maximum likeli-hood phylogenies were inferred with proml found in thePHYLIP 368 package [38] Minimum evolution phyloge-nies were inferred using fastme [39] based on the dis-tance matrix calculated with protdist found in PHYLIPIn both cases 1000 boot-strap replications were gener-ated using seqboot and the consensus phylogeny wasassembled with consense both from the PHYLIP pack-age FigTree 121 httptreebioedacuksoftwarefig-tree was used for initial visualization and finalproduction of figures of the resulting phylogenetic treesAlignments of the bat DNA sequences and of all theamino acid sequences are available in additional files 4

Additional material

Additional file 1 All IFNGbio An alignment of IFNG gene sequencesfrom Homo sapiens Pterops vampyrus and Myotis lucifugus as follows firstrow genomic sequence of human IFNG rows 2 3 predicted genomicsequence of IFNG in M lucifugus and P vampyrus respectively rows 4-7exons for human IFNG rows 8-11 predicted exons for M lucifugus rows12-15 predicted exons for P vampyrus row 16 spliced coding region forhuman rows 1718 predicted spliced coding region for M lucifugus andP vampyrus row 19 amino acid sequence for human interferon gammarows 2021 predicted amino acid sequence for chiropteran interferongamma Examination of this bio file requires the BioEdit program whichis freely available at httpwwwmbioncsueduBioEditbioedithtml

Additional file 2 Pva IFND clone-assembly comparisonfasta Thissequence alignment shows a comparison between Pteropus vampyrusIFND sequences obtained by 1) assembly from the genome sequencingtrace archives and 2) by direct cloning and sequencing The clonedsequences include only those found in at least two independent clonesThe assemblies include all those with intact ORFs plus three that appearto be pseudogenes

Additional file 3 Accession numbers and primers The Genbankaccession numbers of the IFN genes used in the phylogenetic analysisand the PCR primers used in the P vampyrus gene expression studies

Additional file 4 Mlu + Pva + Hsa + SscSelectedIntactOrfsAA-alignedfasta The amino acid sequences used in the phylogeneticanalysis

AcknowledgementsWe thank Brice Barefoot for his technical assistance and Rachel Willcutts forher help in the early stages of this work We are grateful to Brian Pope andTasha King at the Lubee Bat Conservancy for their expert assistanceDr Ramsburgrsquos work was supported by NIH grants UC6 AI058607 andAI51445 Dr Keplerrsquos work was supported by discretionary funds provided byDuke University No additional external funding was used No fundingsource had any influence on the study design on the collection analysis orinterpretation of data on the writing of the manuscript or on the decisionto submit the manuscript for publication

Author details1Center for Computational Immunology Duke University Medical CenterDurham NC USA 2Duke Human Vaccine Institute Duke University MedicalCenter Durham NC USA 3Lubee Bat Conservancy and Department ofWildlife Ecology and Conservation University of Florida Gainesville FL USA

Authorsrsquo contributionsER and TBK conceived the project and designed the experiments TBKdeveloped implemented and applied the statistical methods CS KH andAH performed the experiments JR and TBK performed the phylogeneticanalysis AW contributed critical reagents and assisted in interpretation ofresults TBK CS and ER wrote the manuscript All authors read and approvedthe final manuscript

Received 21 November 2009 Accepted 21 July 2010Published 21 July 2010

References1 Jones KE Patel NG Levy MA Storeygard A Balk D et al Global trends in

emerging infectious diseases Nature 2008 451990-9932 Calisher CH Childs JE Field HE Holmes KV Schountz T Bats important

reservoir hosts of emerging viruses Clin Microbiol Rev 2006 19531-5453 Messenger SRC Smith J Bats Emerging Virus Infections and the Rabies

Paradigm Bat Ecology Chicago and London University of ChicagoPressKunz TF M B 2005 622

4 Omatsu T Watanabe S Akashi H Yoshikawa Y Biological characters ofbats in relation to natural reservoir of emerging viruses ComparativeImmunology Microbiology and Infectious Diseases 2007 30357-374

Kepler et al BMC Genomics 2010 11444httpwwwbiomedcentralcom1471-216411444

Page 11 of 12

5 Wong S Lau S Woo P Yuen K-Y Bats as a continuing source ofemerging infections in humans Reviews in Medical Virology 2007 1767-91

6 Chua KB Lek Koh C Hooi PS Wee KF Khong JH et al Isolation of Nipahvirus from Malaysian Island flying-foxes Microbes and Infection 20024145-151

7 Halpin K Young PL Field HE Mackenzie JS Isolation of Hendra virus frompteropid bats a natural reservoir of Hendra virus Journal of GeneralVirology 2000 811927-1932

8 Towner J Pourrut X Albarino C Nkogue C Bird B et al Marburg virusinfection detected in a common African bat PLoS ONE 2007 2e764

9 Pourrut X Souris M Towner J Rollin P Nichol S et al Large serologicalsurvey showing cocirculation of Ebola and Marburg viruses in Gabonesebat populations and a high seroprevalence of both viruses in Rousettusaegyptiacus BMC Infectious Diseases 2009 9159

10 Enright JB Bats and their Relation to Rabies Annual Review ofMicrobiology 1956 10369-392

11 Badrane H Tordo N Host Switching in Lyssavirus History from theChiroptera to the Carnivora Orders J Virol 2001 758096-8104

12 Li W Shi Z Yu M Ren W Smith C et al Bats Are Natural Reservoirs ofSARS-Like Coronaviruses Science 2005 310676-679

13 Lau SKP Woo PCY Li KSM Huang Y Tsoi H-W et al Severe acuterespiratory syndrome coronavirus-like virus in Chinese horseshoe batsProceedings of the National Academy of Sciences of the United States ofAmerica 2005 10214040-14045

14 Sulkin SE Allen R Virus Infections in Bats Monographs in Virology Basel SKarger 1974 8

15 Wilson DE Reeder DM Mammal Species of the World Baltimore TheJohns Hopkins University Press 2005

16 Devaraj SG Wang N Chen Z Chen Z Tseng M et al Regulation of IRF-3-dependent Innate Immunity by the Papain-like Protease Domain of theSevere Acute Respiratory Syndrome Coronavirus Journal of BiologicalChemistry 2007 28232208-32221

17 Basler CF Amarasinghe GK Evasion of Interferon Responses by Ebola andMarburg Viruses Journal of Interferon amp Cytokine Research 2009 29511-520

18 Park MS Shaw ML Munoz-Jordan J Cros JF Nakaya T et al Newcastledisease virus (NDV)-based assay demonstrates interferon-antagonistactivity for the NDV V protein and the Nipah virus V W and C proteinsJ Virol 2003 771501-1511

19 Brzozka K Finke S Conzelmann K-K Identification of the Rabies VirusAlphaBeta Interferon Antagonist Phosphoprotein P Interferes withPhosphorylation of Interferon Regulatory Factor 3 J Virol 2005797673-7681

20 Mendonccedila RZP Pereira CA Relationship of interferon synthesis and theresistance of mice infected with street rabies virus Brazilian Journal ofMedical and Biological Research 1994 27691-695

21 Sen GC Viruses and interferons Annu Rev Microbiol 2001 55255-28122 Stark GR Kerr IM Williams BR Silverman RH Schreiber RD How cells

respond to interferons Annu Rev Biochem 1998 67227-26423 Theofilopoulos AN Baccala R Beutler B Kono DH Type I interferons

(alphabeta) in immunity and autoimmunity Annu Rev Immunol 200523307-336

24 Pestka S Krause CD Walter MR Interferons interferon-like cytokines andtheir receptors Immunol Rev 2004 2028-32

25 Ohno S Taniguchi T Structure of a chromosomal gene for humaninterferon beta Proc Natl Acad Sci USA 1981 785305-5309

26 Cohen L Lacoste J Parniak M Daigneault L Skup D et al Stimulation ofinterferon beta gene transcription in vitro by purified NF-kappa B and anovel TH protein Cell Growth Differ 1991 2323-333

27 Pannisi E More Genomes But Shallower Coverage Science 20043041227

28 Bouck J Miller W Gorrell JH Muzny D Gibbs RA Analysis of the Qualityand Utility of Random Shotgun Sequencing at Low RedundanciesGenome Research 1998 81074-1084

29 Ohno S Evolution by Gene Duplication London George Allen and Unwin1970

30 Gruumlnwald PD Myung IJ Pitt MA Advances in Minimum DescriptionLength Theory and Applications Cambridge MIT Press 2005

31 Woelk CH Frost SDW Richman DD Higley PE Kosakovsky Pond SLEvolution of the interferon alpha gene family in eutherian mammalsGene 2007 39738-50

32 Walker A Roberts RM Characterization of the bovine type I IFN locusrearrangements expansions and novel subfamilies BMC Genomics 200910187

33 Lefegravevre F Boulay V A novel and atypical type one interferon geneexpressed by trophoblast during early pregnancy Journal of BiologicalChemistry 1993 26819760-19768

34 Roberts RM Chen Y Ezashi T Walker AM Interferons and the maternal-conceptus dialog in mammals Seminars in Cell amp Developmental Biology2008 19170-177

35 Blehert DS Hicks AC Behr M Meteyer CU Berlowski-Zier BM et al BatWhite-Nose Syndrome An Emerging Fungal Pathogen Science 2009323227

36 Larkin MA Blackshields G Brown NP Chenna R McGettigan PA et alClustal W and Clustal X version 20 Bioinformatics 2007 232947-2948

37 Do CB Mahabhashyam MSP Brudno M Batzoglou S ProbConsProbabilistic consistency-based multiple sequence alignment GenomeResearch 2005 15330-340

38 Felsenstein J PHYLIP - Phylogeny Inference Package (Version 32)Cladistics 1989 5164-166

39 Desper R Gascuel O Fast and Accurate Phylogeny ReconstructionAlgorithms Based on the Minimum-Evolution Principle Journal ofComputational Biology 2002 9687-705

doi1011861471-2164-11-444Cite this article as Kepler et al Chiropteran types I and II interferongenes inferred from genome sequencing traces by a statistical gene-family assembler BMC Genomics 2010 11444

Submit your next manuscript to BioMed Centraland take full advantage of

bull Convenient online submission

bull Thorough peer review

bull No space constraints or color figure charges

bull Immediate publication on acceptance

bull Inclusion in PubMed CAS Scopus and Google Scholar

bull Research which is freely available for redistribution

Submit your manuscript at wwwbiomedcentralcomsubmit

Kepler et al BMC Genomics 2010 11444httpwwwbiomedcentralcom1471-216411444

Page 12 of 12

  • Abstract
    • Background
    • Results
    • Conclusion
      • Background
        • Bat Genome Projects
          • Results
            • Inference of Gene Families From Trace Archives
              • Quality scores and conditional probabilities
                • Validation
                • Type I interferons in Myotis lucifugus and Pteropus vampyrus
                  • Chiropteran Type II interferon
                    • Phylogenetics
                    • Sequencing
                    • Gene expression
                      • Discussion
                      • Conclusions
                      • Methods
                        • Ethics Statement
                        • Animals and sample collection
                        • Cloning and Sequencing
                          • Quantitative RT-PCR
                            • Estimation of Gene Family Size
                            • Phylogenetic Inference
                              • Acknowledgements
                              • Author details
                              • Authors contributions
                              • References
Page 2: RESEARCH ARTICLE Open Access Chiropteran types I and II … · 2017. 4. 5. · RESEARCH ARTICLE Open Access Chiropteran types I and II interferon genes inferred from genome sequencing

order Chiroptera to which the bats belong is the secondlargest mammalian order behind only Rodentia in totalspecies [15] On the other hand the unique lifestylesand ecological circumstances of bats makes plausible theidea that the coevolution of bats and their viruses pro-duced unusually virulent viruses and a host antiviralresponse tuned to suppress themIf the latter is the case it will be of great interest to

understand the ways in which the several componentsof the bat antiviral response differ from those in humansand other mammals The main hurdle to the pursuit ofthis goal is our paucity of knowledge of bat immunologyand almost complete lack of reagents to initiate suchstudies What we do have now that was not available afew years ago is the information obtained through large-scale sequencing projects This information resides forthe most part in partially-or unassembled genomesequences and is not practically available for such pur-poses without further technical developmentsOur aims in this paper are to describe the method we

have devised for the partial assembly of genome sequen-cing traces for the inference of gene families to validatethis method on the human interferon alpha family andto use it for the inference of the type-I interferonfamilies in the bats Myotis lucifugus (little brown bat)and Pteropus vampyrus (large flying fox)Most pathogenic viruses including all of the viruses

listed above possess one or more genes that directly antag-onize the type-I interferon response [16-19] indicating astrong coevolutionary pressure between these viruses andthe type-I interferon system In experimental infections ofmice with Rabies virus the investigators noted a markedcorrelation between survival and IFN production [20]Type I inteferons are the primary mediators of one of the

earliest stages of the antiviral response in mammals Theyare induced rapidly and secreted upon viral infection Cellsexposed to these cytokines enter an infection-refractorystate induced by signaling through the common type-Iinterferon receptor complex In this state transcriptionand translation of viral gene products is inhibited MHCclass I expression is upregulated RNA and protein degra-dation are accelerated (reviewed in depth in refs [21-23])There are 13 human functional IFNa genes as well as sin-gle functional genes for IFNb IFNε IFNω and IFN Thetype 1 interferons are highly pleiomorphic exhibiting bothdistinct and complex overlapping function [2124] All ofthe known mammalian type-I interferons genes are intron-less their transcripts are neither spliced nor edited beforetranslation and expression is controlled at the transcrip-tional level [232526]

Bat Genome ProjectsThere are two bat whole-genome sequencing projects inprogress but neither is complete at this point One of

them will provide coverage at a level that will not producea reliable assembly There are in fact a great many sequen-cing projects underway for which there is no intent to dofull coverage [27] The Myotis lucifugus genome sequen-cing project has completed sequencing to approximately7-fold coverage with 27486306 traces thus far The gen-ome is as of this writing being assembled httpwwwbroadinstituteorgscienceprojectsmammals-modelsbrown-batlittle-brown-bat The Pteropus vampyrus gen-ome sequencing project has produced 8051001 genometraces httpwwwhgscbcmtmceduproject-species-m-MegabathgscpageLocation=Megabat and is slated forcompletion with 2x coverage The Pteropus project isbased on samples from a single individual bat (personalcommunication)The consequence of incomplete coverage is the

exacerbation of one of the shortcomings of whole-gen-ome shotgun sequencing the difficulty of resolvingrepetitive DNA segments [28] If two sequencing tracescontain regions of similarity it may be difficult orimpossible to determine whether these traces arederived from the same underlying DNA or from twodistinct DNA segments that are themselves paralogousThis difficulty is not limited to the assembly of highlyrepetitious intergenic regions but to the inference ofgene families as wellThis latter problem is particularly unfortunate

because gene duplication is a major source of innovativepotential in evolution and the comparative study ofgene families among related species is therefore of greatinterest [29] Furthermore a large proportion of genesin eukaryotic genomes reside in families including thegenes that encode the type-I interferonsIn this paper we describe a method for the inference of

gene family members from unassembled sequencingtraces The method is conceptually straightforward andis based on an information-theoretic model that accountsfor both sequencing error and evolutionary divergenceproviding the means to encode the set of sequencingtraces We then seek those partial assemblies that makethe total description length of the combined set ofsequencing traces as small as possible This reconstruc-tion provides an estimate of the number of genes in thefamily and posterior probability mass functions on theDNA sequences of these genesWe first present the model and the algorithm we have

developed to minimize the description length and therebyinfer the structure of the gene family We validate themethods by reconstructing the human type-I interferonfamily genes from sequencing traces from the human gen-ome project We use this method to infer the type-I inter-feron families from the sequencing traces from the Myotislucifugus and Pteropus vampyrus genome sequencing pro-jects We examine these genes in comparison with the

Kepler et al BMC Genomics 2010 11444httpwwwbiomedcentralcom1471-216411444

Page 2 of 12

orthologous families in humans and other mammalsFinally we confirm our inferences by cloning and sequen-cing genes from four of the interferon families

ResultsInference of Gene Families From Trace ArchivesIn this section we present the mathematical basis of theassembly method we have developed and provide a con-cise algorithmic implementationWe assume a set S of sequencing traces containing

regions with similarity to some given gene or genes Inpractice this seed gene would be an ortholog of thegenes one is attempting to infer one collects S usingsimilarity searching on the complete trace archive forthe species of interestThere are g genes in the family where g is an unknown

integer Each of these genes is related to a commonancestor a (whose sequence is also unknown) and differsfrom a by the accumulation of mutations includinginsertions and deletions Each sequencing trace resultsfrom direct templating from one of these genes and dif-fers from it by sequencing error only Figure 1 illustratesthe statistical model used for the inferential procedureGiven any two sequencing traces in S they are related

either through sharing the same immediate template(traces 5 and 6 in figure 1 for example) or indirectlythrough the ancestor a with each having a distinctimmediate template These two alternatives correspondto two different likelihoods In the first the differencesbetween the two traces are attributable to sequencingerror only and must be consistent with the reportedposition-specific quality scores In the latter the diver-gence of the underlying genes from their commonancestor provides an additional source of differencesThe difference in description lengths under these twomodels provides a criterion for choosing between the

models In fact the criterion we will use for determiningthe final set of assemblies is the minimization of thetotal description length of S under the model in figure1 Briefly the description length is the informationrequired to encode the data and the values of the modelparameters (See [30] for a complete treatment ofdescription length techniques for inference) For anysub-model for pairs of traces that involves mutationsfrom the common ancestor the description length mustaccount for the information required to encode themutation rate If y is the overlap in nucleotides betweenthe two sequencing traces the cost of encoding themutation rate is log yThe objective of the assembly procedure is the mini-

mization of the description length over all topologies ofthe form shown in figure 1 but there is no method forfinding this minimum short of exhaustive enumerationover all topologies which is ruled out by practical con-siderations for all but the smallest S We thereforeadopt a greedy progressive method familiar from its usein multiple sequence alignment At each stage we findthe pair of sequences (traces or partial assemblies) thatprovide the maximum saving in description length andassemble them replacing the pair by the new assemblythey have formed The process continues until there areno remaining moves that lead to a description lengthreductionThe following section provides the details for the

probability modelQuality scores and conditional probabilitiesThe base-calling software reads the raw sequencingtrace and for each position reports a single base and aquality score If the reported base is X and the reportedquality score is q we take the output to represent a pos-terior probability mass function (pmf) on the latentnucleotide state at that position given by

f Dq X

q( | )

( )

( )

=minus =⎧

⎨⎩

1

4

for

else(1)

where D represents the data conditioning the estimateand ε(q) is defined by

( ) q q= minus10 10 (2)

It is not necessary to specify a likelihood function oreven to describe what the nature of D is so we simplywrite p j

i( ) for the posterior probability mass functionon ξ in the jth position in the ith sequencing trace Forsimplicity in this presentation we will assume that theprior pmf on ξ is uniform on A C G T -We will also need the posterior pmf on the ancestral

state a which is obtained by specifying a model for theaccumulation of mutations from a to ξ and summing

Figure 1 Schematic of the statistical model The observedsequencing traces Xi result from the observation of genes ξ Thesegenes are related by descent from the common ancestor a

Kepler et al BMC Genomics 2010 11444httpwwwbiomedcentralcom1471-216411444

Page 3 of 12

over the unobserved state ξ We take a simple reversiblemutation model

Prob( | )

=

minus =⎧⎨⎩

1

4

for

else(3)

and let

p pji

ji( | ) Prob( | ) ( )

= sum (4)

Consider the model partially described by having posi-tion i in trace 1 aligned with position j in trace 2 Theposterior pmf for the latent true nucleotide state is

pp p

ki j

ij

121 2

12

5( )

( ) ( )ξ

ξ ξ

ϕ= (5)

where k is the position in the assembly correspondingto i and j in traces 1 and 2 respectively ij

12 is given by

ij i j

A C G T

p p12 1 25equivisin minus

sum ( ) ( )

(6)

Since the criterion is the minimization of the descrip-tion length we compare any two models by the differencein their description lengths The alignment cost then isthe difference in description length between the model inwhich these positions are aligned (given the alignment)and the one in which they are independent and is give by

sij ij12 12= log (7)

Denote by A(k) the vector whose components are theindices for the aligned traces corresponding to the kthposition in the resulting assembly Then the cost of thealignment is

S sA k A k

k

A12 12

1 2= sum ( ) ( ) (8)

For the case where traces 1 and 2 are related indir-ectly through a common ancestor we similarly have

S sA k A k

k

A12 12

1 2( ) ( )( ) ( ) = sum (9)

with the appropriate and obvious definitions The quan-tity of primary importance for us is the cost differencebetween the direct and indirect models for a given pair oftraces

c S

S o

1212

12 1212

=

+

min

min min ( ) log ( )

AA

AA A

μμ μ

(10)

The logarithmic term is the cost of encoding themutation rate that minimizes the alignment score [30]the argument oA

12( ) is the number of nucleotides inthe overlap of traces 1 and 2 in the optimal alignmentie the number of degrees of freedom available to esti-mate μ When c is positive the indirect model is themore efficient when c is negative the direct model ismore efficientFor progressive alignment note that the posterior assem-

bly pmf and the cost are composed from the individualsequential alignments Suppose we first align traces 1 and2 to obtain assembly 4 and then align trace 3 to assembly4 to produce assembly 5 as shown in figure 2 We need torefine our notation for indexing by letting A ki

a( ) refer tothe position in trace i corresponding to position k inassembly a So for the example of figure 2 we have

A k A A ki i5 4

45( ) ( ( ))= (11)

It is then straightforward to show that the alignmentscores are additive

s

p p p

A k A k A k

A k A k A

15

25

35

15

15

1

123

1 2

2 5( ) ( ) ( )

( ) ( )

log

log ( ) ( )

equiv minus

minus 55

14

45

24

45

45

35

3

12 34

( )

( ( )) ( ( )) ( ) ( )

( )k

A A k A A k A k A ks s

sum

= +

(12)

Figure 2 Diagram indicating the labeling of traces and assembliesfor the discussion of progressive alignment in the text

Kepler et al BMC Genomics 2010 11444httpwwwbiomedcentralcom1471-216411444

Page 4 of 12

and that the assembly posterior pmfs are obtained bycomposition

p p p p

s

k A k A k A k

A k A k A

5 2 1 2 3515

25

35

15

25

3

( ) ( ) ( ) ( )( ) ( ) ( )

( ) ( )

equiv

times 55

45

35

35

45

1231

4 3 345

( )

( ) ( ) ( ) ( )( ) ( )

k

A k A k A k A kp p s

⎡⎣⎢

⎤⎦⎥

= ⎡⎣⎢

minus

⎤⎤⎦⎥

minus1

(13)

We have shown that the total cost for the compositealignment is given by the sum of the sum of alignmentcosts but as with all progressive alignment methodsthe minimum-cost composite alignment may not be theglobal minimum cost alignmentThe complete procedure then is

1 Compute all of the minimum pairwise alignmentcosts cij using dynamic programming2 Assemble the two sub-assemblies with least align-ment cost and replace them with the new assemblythat results Compute the pairwise costs between theremaining subassemblies and the new assembly3 Continue at step 2 until total assembly cost canno longer be reduced

ValidationWe validated the method by inferring the human inter-feron-alpha (IFNA) gene family directly from the humanwhole-genome sequencing trace archives and compar-ing the results to IFNA gene sequences known from thefinished human genome (as well as from decades ofmolecular biology on the human IFNA) We performedthis procedure in three different waysFirst we collected relevant sequencing traces by per-

forming a BLAST search using the 567 nt-long protein-coding portion of the human IFNA2 transcript(NM_000605) and retrieved 291 traces with expectedvalues less than 10-6 We trimmed each sequence andapplied the assembly method described aboveThe procedure resulted in 15 final assemblies We

compared the inferred genes to the human genomereference assembly Each of these 15 assemblies mapsunambiguously to one of the 13 known IFNA genes orthe single whole pseudogene at an average error rate of60 times 10-4 per base (14 mismatches in 23374 bases) Inone case a pair of assemblies mapped to a single geno-mic gene IFNA17 There are 2 single nucleotide differ-ences between the two assemblies mapping to IFNA17and one dinucleotide difference Each of the singlenucleotide differences is strongly supported with multi-ple high-quality reads supporting the inference Thedinucleotide difference is less well supported with just asingle trace covering it in one assembly though the

corresponding bases in this trace are good quality Theevidence strongly suggests that the two assemblies domap to two different alleles of the same gene Acceptingthis conclusion reduces the error rate to less than 4 times10-4 per base The 10 remaining mismatches include asingle 4-nucleotide deletion relative to the genomicsequenceFor the next two validation procedures our intent was

to more accurately represent the situation in which theassembler would be used For this purpose we used oneof the Pvampyrus putative IFNA genes rather thanhuman IFNA2 as the BLAST query and selected a sub-sample of the retrieved sequences to simulate lowcoverageThere were 74 trace reads in the subset produced by

the Whitehead Institute Assembling these reads usingour method produced 13 distinct assemblies with two ormore reads in each mapping (as described above) to 12different human genomic regions 7 IFNAs humanIFNW1 LOC100130866 (annotated by NCBI as similarto IFNW1) 2 IFNA pseudogenes and 1 IFNW pseudo-gene One pair of distinct assemblies was mapped erro-neously to a single IFNA The subset of trace readsfrom the Venter Institute consisted of 103 sequencesproducing 22 distinct assemblies with 2 or more readsmapping to 22 different regions of the human genomeall 13 IFNAs 1 IFNA pseudogene IFNW1 and 6regions annotated as either IFNW-like or IFNW pseu-dogenes The alignments of these assemblies to the gen-ome revealed 3 mismatches in 31531 total bases Onepair of assemblies was erroneously mapped to a singleIFNA pseudogene

Type I interferons in Myotis lucifugus and PteropusvampyrusWe used the protein-coding portions of the humangenes IFNA2 IFNK IFNB and IFNE the bovine geneIFNT and murine gene IFNZ in blast searches againstthe whole genome shotgun sequence trace archives forPteropus vampyrus and Myotis lucifugus The sequencesretrieved in each of the blast searches were thenassembled using the methods described aboveWe obtained unique final assemblies for IFNB in each

bat Each of these final assemblies contained an intactORF which when BLASTed against the NCBI nucleo-tide sequence database returned multiple hits on thecorresponding gene in several mammals The greatestsimilarity was to horse pig and cow sequences withabout 80 nucleotide identityMyotis produced two final assemblies for IFNE both

of which contain intact ORFs and appear on closeexamination to be genuinely distinct sequences Ptero-pus produced a single final assembly which had anintact ORF

Kepler et al BMC Genomics 2010 11444httpwwwbiomedcentralcom1471-216411444

Page 5 of 12

The sequencing traces returned in the search onIFNA2 produced multiple final assemblies that fall intothree distinct families with clear similarity to IFNAIFNW and IFND (Table 1) respectivelyNo relevant hits to murine IFNZ were found in either

chiropteran genome No hits to bovine IFNT beyondthose already obtained using other queries were foundChiropteran Type II interferonIn mammals the type I interferons have a single exon offewer than 700 bases IFNG the type II interferon hasmultiple exons We used our methods to infer the type-II interferon genes in the two bats by using the full-length human IFNG gene comprising introns and exons(4972 bases NC_000012 REGION complement(6854855068553521)) Our methods produced a singleassembly in each bat (Additional File 1) All exons wereclearly identifiable and all intronic splice signals wereconserved Table 2 shows the sequence similarity amongthe human and bat sequences

PhylogeneticsThe intact ORFs of the inferred sequences were alignedwith the type-I interferons from humans and from thedomestic pig Sus scrofa (which is substantially moreclosely related to bats than is the human) and as anoutgroup the IFNB and IFNA gene sequences from theduckbill platypus (Ornithorhynchus anatinus) The phy-logenetic tree relating these genes was inferred usingmaximum likelihood (Figure 3)The features of the inferred phylogenetic tree are con-

sistent with those found in previous investigations [31]In particular the interferon families fall into distinctclades with the exception of the platypus IFNA geneswhich appear ancestral to the IFNA IFNW and IFNDgenes of the other species Within each of the families

the genes fall into clades according to species with nointerdigitation of genes from different species in anyclade (Figure 3) Additionally the bat genes from onebat species are typically but not universally more clo-sely related to the orthologous genes from the other batspecies than to those of pigs or humans The bat genesare typically but not universally more closely related tothe orthologous pig genes than to the orthologoushuman genes

SequencingWe used the inferred sequences to design sequencingprimers and make several clones from each of IFNAIFNB IFND and IFNK The results are as followsWe obtained DNA sequences from 19 independent

IFNB clones from two individual bats Among these 2distinct but very closely related genes were clearly dis-cernible One sequence is identical to the consensus ofall 19 sequences and is found in 7 of 12 clones fromone bat and all the clones from the other Given thisinterpretation the overall frequency of pre-sequencingPCR error was 16 times 10-3 per base (17 errors in 10792bases) The inferred IFNB genes are 91 identical to anIFNB isolated from Rousettus aegyptiacus [4] and 80identical to IFNB from the domestic pig Sus scrofa

Table 1 Summary of Final Assemblies in Myotis lucifugus and Pteropus vampyrus

Species Family assemblies Intact ORFs Pseudogenes Partial length (AA)

M lucifugus IFNB 1 1 0 0 186

IFNE 2 2 0 0 193

IFNK 2 2 0 0 208

IFNA 2 0 2 0 -

IFNW 25 12 7 9 195

IFND 19 11 3 5 1711

P vampyrus IFNB 1 1 0 0 185

IFNE 1 1 0 0 193

IFNK 1 1 0 0 201

IFNA 7 7 0 0 189

IFNW 28 18 8 1 1852

IFND 14 5 7 2 1713

1 two genes encode polypeptides of length 190

2 one at 187 and two at 195

3 one at 170

Table 2 DNA sequence similarity among human (Has)IFNG and inferred bat (Mlu Pva) IFNG assemblies forexons (above the diagonal) and introns (below thediagonal)

species Hsa Mlu Pva

Has ID 0752 0784

Mlu 0655 ID 0832

Pva 0711 0696 ID

Kepler et al BMC Genomics 2010 11444httpwwwbiomedcentralcom1471-216411444

Page 6 of 12

Using IFNK primers we obtained sequences from 7clones from a single animal and found two distinctgenes differing from each other in two nucleotides Theerror rate was consistent with that found in the IFNBgenes at 17 times 10-3 per base (6 errors in 3430 bases)For IFND and IFNA respectively we collected

sequences from 32 and 52 clones from a single animalBecause PCR was used in an amplification step prior tocloning (unlike the process used for genomic sequen-cing) it is difficult to discern the expected genetic varia-bility from sequencing error For IFNA four sequenceswere recovered multiple times from independent proce-dures The mean number of mismatches in pairwisecomparisons is just under 7 The four sequences arewithin 1 2 5 and 6 bases of the most similar inferredIFNA gene For IFND there are three multiply-repre-sented sequences Two of these genes differ at 9 basesthe third which represents a pseudogene differs fromthe other two by more than 20 bases These genes differfrom their nearest inferred IFND sequences by 7 12and 20 bases respectively Additional file 2 contains an

alignment of the three multiply-represented IFNDclones compared to the IFND assembliesSince it was not possible to determine by inspection

what the underlying true gene sequences are or howmany there are We instead estimated the posteriorprobability on the number of genes in each family asdescribed in Methods (Figure 4) For IFNA the poster-ior probability is maximized at g = 9 genes the 90credible interval is (617) For IFND the greatest poster-ior probability occurs at g = 19 genes and the 90credible interval is (1528)

Gene expressionTo test whether the inferred genes are expressed underconditions known to induce typeI interferons we designed primers to Pteropus IFNB

and to a control housekeeping gene (PPIA) and a gene(OAS2) induced by paracrine IFNB signaling The lattertwo genes were also inferred by the methods describedFresh Pteropus PBMCs from two individual animals

were cultured under four different conditions with

Figure 3 Phylogenetic tree of the type-I interferons in Pteropus vampyrus (Pva) Myotis lucifigus (Mlu) Sus scrofa (Ssc) Homo sapiens(Hsa) and Ornithorhynchus anatinus (Oan) The tree was thinned for clarity by omitting one member of any pair of sequences differing byfewer than 3 amino acids The branch length is proportional to the evolutionary distance

Kepler et al BMC Genomics 2010 11444httpwwwbiomedcentralcom1471-216411444

Page 7 of 12

lipopolysaccharide (LPS) with the synthetic RNA poly(IC) with Vesicular Stomatitis Virus (VSV) and withculture medium aloneWith LPS or poly(IC) treatment IFNB mRNA expres-

sion was increased 20-50 fold by 2 hours and decreasedslowly to near the starting value by 24 hours OASexpression lagged behind that of IFNB reaching its peakvalue by 8 hours declining from the peak level only par-tially by 24 hours In contrast cells treated with VSVhad peak levels of IFNB appear by 8 hours and remainalmost unchanged to 24 hours OAS expression rosemore slowly and had not reached a maximum level by24 hours (Figure 5)

DiscussionRecent research has demonstrated very clearly that therate of emergence of human pathogens continues toincreasing steadily and that the source of the majorityof these agents is wildlife [1] One of the most intriguingfindings of late is that bats are the natural reservoirs ofmany of the most pathogenic viruses in humansWhile it is known that microbe-host coevolution

drives pathogenicity in the natural host the effect ofsuch coevolution on alternative hosts has not beendescribed The development of many genome sequen-cing projects extending beyond domesticated animalsprovides an opportunity to begin such inquiries Thelow coverage levels of these projects and the fact that somany genes with immunological functions appear inlarge families of very similar genes requires the develop-ment of more precise inferential tools for their studyToward that end we have developed a method for the

assembly of genome sequence fragments for use in theinference of gene family members when the genome

coverage is too low for reliable complete assembly Wevalidated the method on raw unassembled traces fromthe human genome sequencing project finding that wecould assemble the known interferon alpha sequencesaccurately and further identify at least one case inwhich two alleles are presentIt should be noted that our method requires more

computation than assembly methods currently in usedue to the numerical minimization over the mutationfrequency required in Eq(10) and is therefore not suita-ble for large-scale assemblyUsing this method we inferred a total of 61 type-I

interferon genes with intact ORFs from the whole-gen-ome shotgun trace archives for two chiropteran speciesPteropus vampyrus and Myotis lucifugus We find thatthe largest of the IFN-I gene families in both bats com-prises genes orthologous to the IFNW genes in othermammals In humans mice and pigs there is just oneIFNW gene but up to two dozen members in each batA recent analysis of bovine type-I interferons from theassembled Bos taurus genome [32] finds 26 intactIFNW genes providing precedent for our otherwisestriking results In contrast the IFNA family is the lar-gest IFN-I family in several mammals includinghumans mice and pigs but appears to be smaller inPteropus and absent but for pseudogenes in Myotis Thegene family assembly from trace archives indicates thatthere are 7 intact IFNA in Pteropus analysis of directsequencing from PBMCs gives maximum posteriorprobability to the presence of 9 IFNA genes (Figure 4)with a 90 credible interval containing 6 to 16 genesCattle have 13 IFNA genes in spite of having a greatlyenlarged IFNW family [32]Both bats have multiple members in the IFND family

with five intact members and seven pseudogenes inPteropus and twelve intact members in Myotis IFNDhas been found as a functional gene only in pigs wherethe gene product is expressed in the placenta and playsa role in embryonic development [33] but is not sus-pected of involvement in the response to viruses It isstriking to us that this family seems to have been sodramatically enlarged in bats The size of this familysuggests that it may still be involved in host defense inbats even if it has lost that function in pigs Walker andRoberts [32] report finding three IFND pseudogenesbut no intact IFND in the cow It is worth noting thatthe placental type-I interferon in cattle is IFNT ratherthan IFND [34] Searching the bat trace archives withbovine IFNT did not produce hits that had not beenreturned with the other searches We find no evidenceof IFNT or IFNZ in either batFor IFNB IFNK and IFNE we find one member of

each in P vampyrus and one or two in M lucifugus Inthe case where we do find two genes we are confident

Figure 4 The posterior probability mass function on thenumber of genes in the IFNA and IFND families as estimatedby cloning and sequencing from peripheral blood cells fromPteropus vampyrus

Kepler et al BMC Genomics 2010 11444httpwwwbiomedcentralcom1471-216411444

Page 8 of 12

that there the genes are distinct though they may repre-sent alleles rather than paralogsWe used the inferred sequences for P vampyrus

IFNA IFNB IFND and IFNK to design oligonucleotideprimers for cloning and sequencing and recovered atotal of 110 sequences The directly cloned sequencesvalidate much of the gene family inferences most of therepeated sequences are within a few bases of the nearestinferred gene A minority of the directly sequenced

genes are surprisingly far from the nearest inferredgene This circumstance may be an indication that thereare additional type-I interferon genes not covered by thePteropus sequencing traces and may also reflect signifi-cant population polymorphism in the wild batpopulationWe used these same primers to show using quantita-

tive RT-PRC that the P vampyrus candidate IFNB andOAS2 (a gene induced by IFNB in other mammals) are

Figure 5 Gene expression time course by qRT-PCR in fresh Pvampyrus PBMCs stimulated in culture by the indicated compoundExpression levels are relative to the housekeeping gene PPIA The error bars represent standard errors of the mean over biological duplicates

Kepler et al BMC Genomics 2010 11444httpwwwbiomedcentralcom1471-216411444

Page 9 of 12

expressed upon stimulation by type-I inducing agentsFurthermore the temporal trajectories of this expressionis consistent with the known mechanisms of such sig-naling The expression of IFNB under viral infection wasdelayed compared to that under stimulation by the TLRligands poly(IC) and LPS

ConclusionsThe bat has been implicated as a major reservoir forviruses of extreme pathogenicity in humans and sufferssubstantially less disease when infected by these virusesthan humans do In addition bat populations in NorthAmerica are declining rapidly as a result of white-nosesyndrome an emergent disease associated with a fungalpathogen [35] These facts and others suggest that thestudy of host defense and immunity in these uniquecreatures would benefit the pursuit of ecological andhuman health The major obstacle in this undertaking isthe lack of reagents that would make such investigationspossible One way to get this effort underway is to takeadvantage of the rich store of information contained inexisting partial genome sequences Although the infor-mation available in these genome databases requiresmore careful treatment for its extraction than isrequired for a complete assembled genome we havedeveloped methods that facilitate this task and havedemonstrated their utility

MethodsEthics StatementAll animals were handled in strict accordance with goodanimal practice as defined by the relevant national andor local animal welfare bodies and all animal work wascovered by an IACUC protocol from the Lubee BatConservancy (USDA Research Facility 58-R-0131)

Animals and sample collectionSubjects for this study included 10 male Large FlyingFoxes (Pteropus vampyrus) Some of the animals werewild caught and some were laboratory born all werereproductively mature ranging in age from 4 to 21 yrsAnimals were housed in captivity at the Lubee Bat Con-servancy in Gainesville Florida USA Animals werehoused together with conspecifics in an indooroutdoorcircular flight enclosure and were fed a mixture of fruitvegetables commercial primate chow and a vitaminsupplement Water was provided ad libitum The batswere captured manually anesthesia was mask-inducedwith 5 isoflurane in oxygen (3 Lmin) and the batswere maintained with 25 isoflurane Blood was col-lected from the left and right brachial veins using 3-mlsyringes and 25-gauge needles Blood samples weretransferred to heparanized tubes maintained and

shipped at room temperature (22degC) via overnightcourier

Cloning and SequencingGenomic DNA was prepared from whole unfractionatedperipheral blood from Pteropus vampyrus bats Using pri-mers derived from the inferred interferon gene sequencesand encoded for restriction enzyme cloning sites thegenes were PCR amplified from the genomic DNA Theforward and reverse primers encoded restriction sites forNheI and XhoI respectively which allowed cloning of thefull length gene into mammalian expression vectorpcDNA31(+) The PCR program consisted of an initiali-zation step at 94degC for 5 minutes then 30 cyclesof amplification consisting of denaturation at 94degC for1 minute annealing at 55degC for 30 seconds and elonga-tion at 72deg for 1 minute Amplification was followed by afinal elongation step at 72degC for 5 minutes and samplesheld at 4degC PCR was performed on a P times 2 ThermalCycler from ThermoElectron Corporation (WalthamMA)After purification the inserted genes were sequenced

using vector primers downstream and upstream of theinsert site Pteropus vampyrus IFNB IFNK IFND andIFNA sequences have been submitted to Genbank(GU126493 HM63650 HM636501 and HM636502)Quantitative RT-PCRWhole anticoagulated blood was collected from the bra-chial vein and layered over lymphocyte separation med-ium (Lympholyte M Cedarlane Labs) Cells werecollected from the interface washed and plated on 24well flat-bottom tissue culture plates (2 times 106 cellswell)in a total volume of 250 μl Cells were treated as indi-cated with either poly(IC)(Invivogen) LPS (Sigma) orVSV (Ramsburg lab stocks confirmed endotoxin freeusing LAL assay) Stimulants were diluted such that theaddition of 250 μl stimulant would give a final concentra-tion of 10 μgml poly IC 1 μgml LPS or an MOI of 5for VSV 500 μl of complete medium was added to theunstimulated control wells Cells were incubated at 37rsquoand 5 CO2 for the indicated times after which cellswere harvested into RNA lysis buffer (Qiagen RNEasykit) according to the manufacturerrsquos instructionsPurified RNA was prepared from these whole-cell

lysates as described in the protocols accompanying theQiagen RNeasy kit Qiashedder and Qiagen RNeasyMini kit and Qiagen DNase-Free DNase Set cDNAsynthesis was performed using the Invitrogen two-stepqRT-PCR kit according to the manufacturerrsquos protocolQuantitative real-time PCR was performed in an Eppen-dorf Mastercycler ep realplex thermal cycler usingSYBRGreenER qPCR Supermix and primers designedfrom the inferred gene sequences (additional file 3)

Kepler et al BMC Genomics 2010 11444httpwwwbiomedcentralcom1471-216411444

Page 10 of 12

Independent biological replicates were prepared foreach treatment and time point The log2 expressionlevel of IFNB and OAS2 relative to the housekeepinggene PPIA was estimated by subtracting the mean Ctvalue for each gene and treatmenttime point combina-tion from the corresponding mean Ct value for PPIA

Estimation of Gene Family SizeThe likelihood function for the observed set S ofsequences is a function of the true number g of genesand the sample size n by summing over the unobservednumber c of distinct genes sampled

P S g n P S c P c g nS c

c

( | ) ( | ) ( | )= sum (14)

The pmf on c is given by the recursion

P c g nc

gP c g n

g c

gP c g n

c( | ) ( | )

( | )

=

+ +

1

11 1

(15)

with Pc(1 | g1) = 1 and Pc(c | g1) = 0 for c ne 1PS(S | c) is approximated by minimizing the number

m of mutations over the assignments of observedsequences to c groups PS is then the posterior probabil-ity of getting m mutations given the observed numberof mutations in the IFNB and IFNK datasetsNote that this technique is independent of the assem-

bly methods described elsewhere in the paper

Phylogenetic InferenceThe set consisting of type-I interferon amino acidsequences from human (Homo sapiens) domestic pig(Sus scrofa) duckbill platypus (Ornithorhynchus anati-nus) and the amino acid sequences corresponding to theChiropteran genes inferred in this paper was submittedto multiple sequence alignment by CLUSTALW 2010[36] and ProbCons 112 [37] Maximum likelihood andminimum evolution phylogenies were inferred from eachresulting multiple sequence alignment Maximum likeli-hood phylogenies were inferred with proml found in thePHYLIP 368 package [38] Minimum evolution phyloge-nies were inferred using fastme [39] based on the dis-tance matrix calculated with protdist found in PHYLIPIn both cases 1000 boot-strap replications were gener-ated using seqboot and the consensus phylogeny wasassembled with consense both from the PHYLIP pack-age FigTree 121 httptreebioedacuksoftwarefig-tree was used for initial visualization and finalproduction of figures of the resulting phylogenetic treesAlignments of the bat DNA sequences and of all theamino acid sequences are available in additional files 4

Additional material

Additional file 1 All IFNGbio An alignment of IFNG gene sequencesfrom Homo sapiens Pterops vampyrus and Myotis lucifugus as follows firstrow genomic sequence of human IFNG rows 2 3 predicted genomicsequence of IFNG in M lucifugus and P vampyrus respectively rows 4-7exons for human IFNG rows 8-11 predicted exons for M lucifugus rows12-15 predicted exons for P vampyrus row 16 spliced coding region forhuman rows 1718 predicted spliced coding region for M lucifugus andP vampyrus row 19 amino acid sequence for human interferon gammarows 2021 predicted amino acid sequence for chiropteran interferongamma Examination of this bio file requires the BioEdit program whichis freely available at httpwwwmbioncsueduBioEditbioedithtml

Additional file 2 Pva IFND clone-assembly comparisonfasta Thissequence alignment shows a comparison between Pteropus vampyrusIFND sequences obtained by 1) assembly from the genome sequencingtrace archives and 2) by direct cloning and sequencing The clonedsequences include only those found in at least two independent clonesThe assemblies include all those with intact ORFs plus three that appearto be pseudogenes

Additional file 3 Accession numbers and primers The Genbankaccession numbers of the IFN genes used in the phylogenetic analysisand the PCR primers used in the P vampyrus gene expression studies

Additional file 4 Mlu + Pva + Hsa + SscSelectedIntactOrfsAA-alignedfasta The amino acid sequences used in the phylogeneticanalysis

AcknowledgementsWe thank Brice Barefoot for his technical assistance and Rachel Willcutts forher help in the early stages of this work We are grateful to Brian Pope andTasha King at the Lubee Bat Conservancy for their expert assistanceDr Ramsburgrsquos work was supported by NIH grants UC6 AI058607 andAI51445 Dr Keplerrsquos work was supported by discretionary funds provided byDuke University No additional external funding was used No fundingsource had any influence on the study design on the collection analysis orinterpretation of data on the writing of the manuscript or on the decisionto submit the manuscript for publication

Author details1Center for Computational Immunology Duke University Medical CenterDurham NC USA 2Duke Human Vaccine Institute Duke University MedicalCenter Durham NC USA 3Lubee Bat Conservancy and Department ofWildlife Ecology and Conservation University of Florida Gainesville FL USA

Authorsrsquo contributionsER and TBK conceived the project and designed the experiments TBKdeveloped implemented and applied the statistical methods CS KH andAH performed the experiments JR and TBK performed the phylogeneticanalysis AW contributed critical reagents and assisted in interpretation ofresults TBK CS and ER wrote the manuscript All authors read and approvedthe final manuscript

Received 21 November 2009 Accepted 21 July 2010Published 21 July 2010

References1 Jones KE Patel NG Levy MA Storeygard A Balk D et al Global trends in

emerging infectious diseases Nature 2008 451990-9932 Calisher CH Childs JE Field HE Holmes KV Schountz T Bats important

reservoir hosts of emerging viruses Clin Microbiol Rev 2006 19531-5453 Messenger SRC Smith J Bats Emerging Virus Infections and the Rabies

Paradigm Bat Ecology Chicago and London University of ChicagoPressKunz TF M B 2005 622

4 Omatsu T Watanabe S Akashi H Yoshikawa Y Biological characters ofbats in relation to natural reservoir of emerging viruses ComparativeImmunology Microbiology and Infectious Diseases 2007 30357-374

Kepler et al BMC Genomics 2010 11444httpwwwbiomedcentralcom1471-216411444

Page 11 of 12

5 Wong S Lau S Woo P Yuen K-Y Bats as a continuing source ofemerging infections in humans Reviews in Medical Virology 2007 1767-91

6 Chua KB Lek Koh C Hooi PS Wee KF Khong JH et al Isolation of Nipahvirus from Malaysian Island flying-foxes Microbes and Infection 20024145-151

7 Halpin K Young PL Field HE Mackenzie JS Isolation of Hendra virus frompteropid bats a natural reservoir of Hendra virus Journal of GeneralVirology 2000 811927-1932

8 Towner J Pourrut X Albarino C Nkogue C Bird B et al Marburg virusinfection detected in a common African bat PLoS ONE 2007 2e764

9 Pourrut X Souris M Towner J Rollin P Nichol S et al Large serologicalsurvey showing cocirculation of Ebola and Marburg viruses in Gabonesebat populations and a high seroprevalence of both viruses in Rousettusaegyptiacus BMC Infectious Diseases 2009 9159

10 Enright JB Bats and their Relation to Rabies Annual Review ofMicrobiology 1956 10369-392

11 Badrane H Tordo N Host Switching in Lyssavirus History from theChiroptera to the Carnivora Orders J Virol 2001 758096-8104

12 Li W Shi Z Yu M Ren W Smith C et al Bats Are Natural Reservoirs ofSARS-Like Coronaviruses Science 2005 310676-679

13 Lau SKP Woo PCY Li KSM Huang Y Tsoi H-W et al Severe acuterespiratory syndrome coronavirus-like virus in Chinese horseshoe batsProceedings of the National Academy of Sciences of the United States ofAmerica 2005 10214040-14045

14 Sulkin SE Allen R Virus Infections in Bats Monographs in Virology Basel SKarger 1974 8

15 Wilson DE Reeder DM Mammal Species of the World Baltimore TheJohns Hopkins University Press 2005

16 Devaraj SG Wang N Chen Z Chen Z Tseng M et al Regulation of IRF-3-dependent Innate Immunity by the Papain-like Protease Domain of theSevere Acute Respiratory Syndrome Coronavirus Journal of BiologicalChemistry 2007 28232208-32221

17 Basler CF Amarasinghe GK Evasion of Interferon Responses by Ebola andMarburg Viruses Journal of Interferon amp Cytokine Research 2009 29511-520

18 Park MS Shaw ML Munoz-Jordan J Cros JF Nakaya T et al Newcastledisease virus (NDV)-based assay demonstrates interferon-antagonistactivity for the NDV V protein and the Nipah virus V W and C proteinsJ Virol 2003 771501-1511

19 Brzozka K Finke S Conzelmann K-K Identification of the Rabies VirusAlphaBeta Interferon Antagonist Phosphoprotein P Interferes withPhosphorylation of Interferon Regulatory Factor 3 J Virol 2005797673-7681

20 Mendonccedila RZP Pereira CA Relationship of interferon synthesis and theresistance of mice infected with street rabies virus Brazilian Journal ofMedical and Biological Research 1994 27691-695

21 Sen GC Viruses and interferons Annu Rev Microbiol 2001 55255-28122 Stark GR Kerr IM Williams BR Silverman RH Schreiber RD How cells

respond to interferons Annu Rev Biochem 1998 67227-26423 Theofilopoulos AN Baccala R Beutler B Kono DH Type I interferons

(alphabeta) in immunity and autoimmunity Annu Rev Immunol 200523307-336

24 Pestka S Krause CD Walter MR Interferons interferon-like cytokines andtheir receptors Immunol Rev 2004 2028-32

25 Ohno S Taniguchi T Structure of a chromosomal gene for humaninterferon beta Proc Natl Acad Sci USA 1981 785305-5309

26 Cohen L Lacoste J Parniak M Daigneault L Skup D et al Stimulation ofinterferon beta gene transcription in vitro by purified NF-kappa B and anovel TH protein Cell Growth Differ 1991 2323-333

27 Pannisi E More Genomes But Shallower Coverage Science 20043041227

28 Bouck J Miller W Gorrell JH Muzny D Gibbs RA Analysis of the Qualityand Utility of Random Shotgun Sequencing at Low RedundanciesGenome Research 1998 81074-1084

29 Ohno S Evolution by Gene Duplication London George Allen and Unwin1970

30 Gruumlnwald PD Myung IJ Pitt MA Advances in Minimum DescriptionLength Theory and Applications Cambridge MIT Press 2005

31 Woelk CH Frost SDW Richman DD Higley PE Kosakovsky Pond SLEvolution of the interferon alpha gene family in eutherian mammalsGene 2007 39738-50

32 Walker A Roberts RM Characterization of the bovine type I IFN locusrearrangements expansions and novel subfamilies BMC Genomics 200910187

33 Lefegravevre F Boulay V A novel and atypical type one interferon geneexpressed by trophoblast during early pregnancy Journal of BiologicalChemistry 1993 26819760-19768

34 Roberts RM Chen Y Ezashi T Walker AM Interferons and the maternal-conceptus dialog in mammals Seminars in Cell amp Developmental Biology2008 19170-177

35 Blehert DS Hicks AC Behr M Meteyer CU Berlowski-Zier BM et al BatWhite-Nose Syndrome An Emerging Fungal Pathogen Science 2009323227

36 Larkin MA Blackshields G Brown NP Chenna R McGettigan PA et alClustal W and Clustal X version 20 Bioinformatics 2007 232947-2948

37 Do CB Mahabhashyam MSP Brudno M Batzoglou S ProbConsProbabilistic consistency-based multiple sequence alignment GenomeResearch 2005 15330-340

38 Felsenstein J PHYLIP - Phylogeny Inference Package (Version 32)Cladistics 1989 5164-166

39 Desper R Gascuel O Fast and Accurate Phylogeny ReconstructionAlgorithms Based on the Minimum-Evolution Principle Journal ofComputational Biology 2002 9687-705

doi1011861471-2164-11-444Cite this article as Kepler et al Chiropteran types I and II interferongenes inferred from genome sequencing traces by a statistical gene-family assembler BMC Genomics 2010 11444

Submit your next manuscript to BioMed Centraland take full advantage of

bull Convenient online submission

bull Thorough peer review

bull No space constraints or color figure charges

bull Immediate publication on acceptance

bull Inclusion in PubMed CAS Scopus and Google Scholar

bull Research which is freely available for redistribution

Submit your manuscript at wwwbiomedcentralcomsubmit

Kepler et al BMC Genomics 2010 11444httpwwwbiomedcentralcom1471-216411444

Page 12 of 12

  • Abstract
    • Background
    • Results
    • Conclusion
      • Background
        • Bat Genome Projects
          • Results
            • Inference of Gene Families From Trace Archives
              • Quality scores and conditional probabilities
                • Validation
                • Type I interferons in Myotis lucifugus and Pteropus vampyrus
                  • Chiropteran Type II interferon
                    • Phylogenetics
                    • Sequencing
                    • Gene expression
                      • Discussion
                      • Conclusions
                      • Methods
                        • Ethics Statement
                        • Animals and sample collection
                        • Cloning and Sequencing
                          • Quantitative RT-PCR
                            • Estimation of Gene Family Size
                            • Phylogenetic Inference
                              • Acknowledgements
                              • Author details
                              • Authors contributions
                              • References
Page 3: RESEARCH ARTICLE Open Access Chiropteran types I and II … · 2017. 4. 5. · RESEARCH ARTICLE Open Access Chiropteran types I and II interferon genes inferred from genome sequencing

orthologous families in humans and other mammalsFinally we confirm our inferences by cloning and sequen-cing genes from four of the interferon families

ResultsInference of Gene Families From Trace ArchivesIn this section we present the mathematical basis of theassembly method we have developed and provide a con-cise algorithmic implementationWe assume a set S of sequencing traces containing

regions with similarity to some given gene or genes Inpractice this seed gene would be an ortholog of thegenes one is attempting to infer one collects S usingsimilarity searching on the complete trace archive forthe species of interestThere are g genes in the family where g is an unknown

integer Each of these genes is related to a commonancestor a (whose sequence is also unknown) and differsfrom a by the accumulation of mutations includinginsertions and deletions Each sequencing trace resultsfrom direct templating from one of these genes and dif-fers from it by sequencing error only Figure 1 illustratesthe statistical model used for the inferential procedureGiven any two sequencing traces in S they are related

either through sharing the same immediate template(traces 5 and 6 in figure 1 for example) or indirectlythrough the ancestor a with each having a distinctimmediate template These two alternatives correspondto two different likelihoods In the first the differencesbetween the two traces are attributable to sequencingerror only and must be consistent with the reportedposition-specific quality scores In the latter the diver-gence of the underlying genes from their commonancestor provides an additional source of differencesThe difference in description lengths under these twomodels provides a criterion for choosing between the

models In fact the criterion we will use for determiningthe final set of assemblies is the minimization of thetotal description length of S under the model in figure1 Briefly the description length is the informationrequired to encode the data and the values of the modelparameters (See [30] for a complete treatment ofdescription length techniques for inference) For anysub-model for pairs of traces that involves mutationsfrom the common ancestor the description length mustaccount for the information required to encode themutation rate If y is the overlap in nucleotides betweenthe two sequencing traces the cost of encoding themutation rate is log yThe objective of the assembly procedure is the mini-

mization of the description length over all topologies ofthe form shown in figure 1 but there is no method forfinding this minimum short of exhaustive enumerationover all topologies which is ruled out by practical con-siderations for all but the smallest S We thereforeadopt a greedy progressive method familiar from its usein multiple sequence alignment At each stage we findthe pair of sequences (traces or partial assemblies) thatprovide the maximum saving in description length andassemble them replacing the pair by the new assemblythey have formed The process continues until there areno remaining moves that lead to a description lengthreductionThe following section provides the details for the

probability modelQuality scores and conditional probabilitiesThe base-calling software reads the raw sequencingtrace and for each position reports a single base and aquality score If the reported base is X and the reportedquality score is q we take the output to represent a pos-terior probability mass function (pmf) on the latentnucleotide state at that position given by

f Dq X

q( | )

( )

( )

=minus =⎧

⎨⎩

1

4

for

else(1)

where D represents the data conditioning the estimateand ε(q) is defined by

( ) q q= minus10 10 (2)

It is not necessary to specify a likelihood function oreven to describe what the nature of D is so we simplywrite p j

i( ) for the posterior probability mass functionon ξ in the jth position in the ith sequencing trace Forsimplicity in this presentation we will assume that theprior pmf on ξ is uniform on A C G T -We will also need the posterior pmf on the ancestral

state a which is obtained by specifying a model for theaccumulation of mutations from a to ξ and summing

Figure 1 Schematic of the statistical model The observedsequencing traces Xi result from the observation of genes ξ Thesegenes are related by descent from the common ancestor a

Kepler et al BMC Genomics 2010 11444httpwwwbiomedcentralcom1471-216411444

Page 3 of 12

over the unobserved state ξ We take a simple reversiblemutation model

Prob( | )

=

minus =⎧⎨⎩

1

4

for

else(3)

and let

p pji

ji( | ) Prob( | ) ( )

= sum (4)

Consider the model partially described by having posi-tion i in trace 1 aligned with position j in trace 2 Theposterior pmf for the latent true nucleotide state is

pp p

ki j

ij

121 2

12

5( )

( ) ( )ξ

ξ ξ

ϕ= (5)

where k is the position in the assembly correspondingto i and j in traces 1 and 2 respectively ij

12 is given by

ij i j

A C G T

p p12 1 25equivisin minus

sum ( ) ( )

(6)

Since the criterion is the minimization of the descrip-tion length we compare any two models by the differencein their description lengths The alignment cost then isthe difference in description length between the model inwhich these positions are aligned (given the alignment)and the one in which they are independent and is give by

sij ij12 12= log (7)

Denote by A(k) the vector whose components are theindices for the aligned traces corresponding to the kthposition in the resulting assembly Then the cost of thealignment is

S sA k A k

k

A12 12

1 2= sum ( ) ( ) (8)

For the case where traces 1 and 2 are related indir-ectly through a common ancestor we similarly have

S sA k A k

k

A12 12

1 2( ) ( )( ) ( ) = sum (9)

with the appropriate and obvious definitions The quan-tity of primary importance for us is the cost differencebetween the direct and indirect models for a given pair oftraces

c S

S o

1212

12 1212

=

+

min

min min ( ) log ( )

AA

AA A

μμ μ

(10)

The logarithmic term is the cost of encoding themutation rate that minimizes the alignment score [30]the argument oA

12( ) is the number of nucleotides inthe overlap of traces 1 and 2 in the optimal alignmentie the number of degrees of freedom available to esti-mate μ When c is positive the indirect model is themore efficient when c is negative the direct model ismore efficientFor progressive alignment note that the posterior assem-

bly pmf and the cost are composed from the individualsequential alignments Suppose we first align traces 1 and2 to obtain assembly 4 and then align trace 3 to assembly4 to produce assembly 5 as shown in figure 2 We need torefine our notation for indexing by letting A ki

a( ) refer tothe position in trace i corresponding to position k inassembly a So for the example of figure 2 we have

A k A A ki i5 4

45( ) ( ( ))= (11)

It is then straightforward to show that the alignmentscores are additive

s

p p p

A k A k A k

A k A k A

15

25

35

15

15

1

123

1 2

2 5( ) ( ) ( )

( ) ( )

log

log ( ) ( )

equiv minus

minus 55

14

45

24

45

45

35

3

12 34

( )

( ( )) ( ( )) ( ) ( )

( )k

A A k A A k A k A ks s

sum

= +

(12)

Figure 2 Diagram indicating the labeling of traces and assembliesfor the discussion of progressive alignment in the text

Kepler et al BMC Genomics 2010 11444httpwwwbiomedcentralcom1471-216411444

Page 4 of 12

and that the assembly posterior pmfs are obtained bycomposition

p p p p

s

k A k A k A k

A k A k A

5 2 1 2 3515

25

35

15

25

3

( ) ( ) ( ) ( )( ) ( ) ( )

( ) ( )

equiv

times 55

45

35

35

45

1231

4 3 345

( )

( ) ( ) ( ) ( )( ) ( )

k

A k A k A k A kp p s

⎡⎣⎢

⎤⎦⎥

= ⎡⎣⎢

minus

⎤⎤⎦⎥

minus1

(13)

We have shown that the total cost for the compositealignment is given by the sum of the sum of alignmentcosts but as with all progressive alignment methodsthe minimum-cost composite alignment may not be theglobal minimum cost alignmentThe complete procedure then is

1 Compute all of the minimum pairwise alignmentcosts cij using dynamic programming2 Assemble the two sub-assemblies with least align-ment cost and replace them with the new assemblythat results Compute the pairwise costs between theremaining subassemblies and the new assembly3 Continue at step 2 until total assembly cost canno longer be reduced

ValidationWe validated the method by inferring the human inter-feron-alpha (IFNA) gene family directly from the humanwhole-genome sequencing trace archives and compar-ing the results to IFNA gene sequences known from thefinished human genome (as well as from decades ofmolecular biology on the human IFNA) We performedthis procedure in three different waysFirst we collected relevant sequencing traces by per-

forming a BLAST search using the 567 nt-long protein-coding portion of the human IFNA2 transcript(NM_000605) and retrieved 291 traces with expectedvalues less than 10-6 We trimmed each sequence andapplied the assembly method described aboveThe procedure resulted in 15 final assemblies We

compared the inferred genes to the human genomereference assembly Each of these 15 assemblies mapsunambiguously to one of the 13 known IFNA genes orthe single whole pseudogene at an average error rate of60 times 10-4 per base (14 mismatches in 23374 bases) Inone case a pair of assemblies mapped to a single geno-mic gene IFNA17 There are 2 single nucleotide differ-ences between the two assemblies mapping to IFNA17and one dinucleotide difference Each of the singlenucleotide differences is strongly supported with multi-ple high-quality reads supporting the inference Thedinucleotide difference is less well supported with just asingle trace covering it in one assembly though the

corresponding bases in this trace are good quality Theevidence strongly suggests that the two assemblies domap to two different alleles of the same gene Acceptingthis conclusion reduces the error rate to less than 4 times10-4 per base The 10 remaining mismatches include asingle 4-nucleotide deletion relative to the genomicsequenceFor the next two validation procedures our intent was

to more accurately represent the situation in which theassembler would be used For this purpose we used oneof the Pvampyrus putative IFNA genes rather thanhuman IFNA2 as the BLAST query and selected a sub-sample of the retrieved sequences to simulate lowcoverageThere were 74 trace reads in the subset produced by

the Whitehead Institute Assembling these reads usingour method produced 13 distinct assemblies with two ormore reads in each mapping (as described above) to 12different human genomic regions 7 IFNAs humanIFNW1 LOC100130866 (annotated by NCBI as similarto IFNW1) 2 IFNA pseudogenes and 1 IFNW pseudo-gene One pair of distinct assemblies was mapped erro-neously to a single IFNA The subset of trace readsfrom the Venter Institute consisted of 103 sequencesproducing 22 distinct assemblies with 2 or more readsmapping to 22 different regions of the human genomeall 13 IFNAs 1 IFNA pseudogene IFNW1 and 6regions annotated as either IFNW-like or IFNW pseu-dogenes The alignments of these assemblies to the gen-ome revealed 3 mismatches in 31531 total bases Onepair of assemblies was erroneously mapped to a singleIFNA pseudogene

Type I interferons in Myotis lucifugus and PteropusvampyrusWe used the protein-coding portions of the humangenes IFNA2 IFNK IFNB and IFNE the bovine geneIFNT and murine gene IFNZ in blast searches againstthe whole genome shotgun sequence trace archives forPteropus vampyrus and Myotis lucifugus The sequencesretrieved in each of the blast searches were thenassembled using the methods described aboveWe obtained unique final assemblies for IFNB in each

bat Each of these final assemblies contained an intactORF which when BLASTed against the NCBI nucleo-tide sequence database returned multiple hits on thecorresponding gene in several mammals The greatestsimilarity was to horse pig and cow sequences withabout 80 nucleotide identityMyotis produced two final assemblies for IFNE both

of which contain intact ORFs and appear on closeexamination to be genuinely distinct sequences Ptero-pus produced a single final assembly which had anintact ORF

Kepler et al BMC Genomics 2010 11444httpwwwbiomedcentralcom1471-216411444

Page 5 of 12

The sequencing traces returned in the search onIFNA2 produced multiple final assemblies that fall intothree distinct families with clear similarity to IFNAIFNW and IFND (Table 1) respectivelyNo relevant hits to murine IFNZ were found in either

chiropteran genome No hits to bovine IFNT beyondthose already obtained using other queries were foundChiropteran Type II interferonIn mammals the type I interferons have a single exon offewer than 700 bases IFNG the type II interferon hasmultiple exons We used our methods to infer the type-II interferon genes in the two bats by using the full-length human IFNG gene comprising introns and exons(4972 bases NC_000012 REGION complement(6854855068553521)) Our methods produced a singleassembly in each bat (Additional File 1) All exons wereclearly identifiable and all intronic splice signals wereconserved Table 2 shows the sequence similarity amongthe human and bat sequences

PhylogeneticsThe intact ORFs of the inferred sequences were alignedwith the type-I interferons from humans and from thedomestic pig Sus scrofa (which is substantially moreclosely related to bats than is the human) and as anoutgroup the IFNB and IFNA gene sequences from theduckbill platypus (Ornithorhynchus anatinus) The phy-logenetic tree relating these genes was inferred usingmaximum likelihood (Figure 3)The features of the inferred phylogenetic tree are con-

sistent with those found in previous investigations [31]In particular the interferon families fall into distinctclades with the exception of the platypus IFNA geneswhich appear ancestral to the IFNA IFNW and IFNDgenes of the other species Within each of the families

the genes fall into clades according to species with nointerdigitation of genes from different species in anyclade (Figure 3) Additionally the bat genes from onebat species are typically but not universally more clo-sely related to the orthologous genes from the other batspecies than to those of pigs or humans The bat genesare typically but not universally more closely related tothe orthologous pig genes than to the orthologoushuman genes

SequencingWe used the inferred sequences to design sequencingprimers and make several clones from each of IFNAIFNB IFND and IFNK The results are as followsWe obtained DNA sequences from 19 independent

IFNB clones from two individual bats Among these 2distinct but very closely related genes were clearly dis-cernible One sequence is identical to the consensus ofall 19 sequences and is found in 7 of 12 clones fromone bat and all the clones from the other Given thisinterpretation the overall frequency of pre-sequencingPCR error was 16 times 10-3 per base (17 errors in 10792bases) The inferred IFNB genes are 91 identical to anIFNB isolated from Rousettus aegyptiacus [4] and 80identical to IFNB from the domestic pig Sus scrofa

Table 1 Summary of Final Assemblies in Myotis lucifugus and Pteropus vampyrus

Species Family assemblies Intact ORFs Pseudogenes Partial length (AA)

M lucifugus IFNB 1 1 0 0 186

IFNE 2 2 0 0 193

IFNK 2 2 0 0 208

IFNA 2 0 2 0 -

IFNW 25 12 7 9 195

IFND 19 11 3 5 1711

P vampyrus IFNB 1 1 0 0 185

IFNE 1 1 0 0 193

IFNK 1 1 0 0 201

IFNA 7 7 0 0 189

IFNW 28 18 8 1 1852

IFND 14 5 7 2 1713

1 two genes encode polypeptides of length 190

2 one at 187 and two at 195

3 one at 170

Table 2 DNA sequence similarity among human (Has)IFNG and inferred bat (Mlu Pva) IFNG assemblies forexons (above the diagonal) and introns (below thediagonal)

species Hsa Mlu Pva

Has ID 0752 0784

Mlu 0655 ID 0832

Pva 0711 0696 ID

Kepler et al BMC Genomics 2010 11444httpwwwbiomedcentralcom1471-216411444

Page 6 of 12

Using IFNK primers we obtained sequences from 7clones from a single animal and found two distinctgenes differing from each other in two nucleotides Theerror rate was consistent with that found in the IFNBgenes at 17 times 10-3 per base (6 errors in 3430 bases)For IFND and IFNA respectively we collected

sequences from 32 and 52 clones from a single animalBecause PCR was used in an amplification step prior tocloning (unlike the process used for genomic sequen-cing) it is difficult to discern the expected genetic varia-bility from sequencing error For IFNA four sequenceswere recovered multiple times from independent proce-dures The mean number of mismatches in pairwisecomparisons is just under 7 The four sequences arewithin 1 2 5 and 6 bases of the most similar inferredIFNA gene For IFND there are three multiply-repre-sented sequences Two of these genes differ at 9 basesthe third which represents a pseudogene differs fromthe other two by more than 20 bases These genes differfrom their nearest inferred IFND sequences by 7 12and 20 bases respectively Additional file 2 contains an

alignment of the three multiply-represented IFNDclones compared to the IFND assembliesSince it was not possible to determine by inspection

what the underlying true gene sequences are or howmany there are We instead estimated the posteriorprobability on the number of genes in each family asdescribed in Methods (Figure 4) For IFNA the poster-ior probability is maximized at g = 9 genes the 90credible interval is (617) For IFND the greatest poster-ior probability occurs at g = 19 genes and the 90credible interval is (1528)

Gene expressionTo test whether the inferred genes are expressed underconditions known to induce typeI interferons we designed primers to Pteropus IFNB

and to a control housekeeping gene (PPIA) and a gene(OAS2) induced by paracrine IFNB signaling The lattertwo genes were also inferred by the methods describedFresh Pteropus PBMCs from two individual animals

were cultured under four different conditions with

Figure 3 Phylogenetic tree of the type-I interferons in Pteropus vampyrus (Pva) Myotis lucifigus (Mlu) Sus scrofa (Ssc) Homo sapiens(Hsa) and Ornithorhynchus anatinus (Oan) The tree was thinned for clarity by omitting one member of any pair of sequences differing byfewer than 3 amino acids The branch length is proportional to the evolutionary distance

Kepler et al BMC Genomics 2010 11444httpwwwbiomedcentralcom1471-216411444

Page 7 of 12

lipopolysaccharide (LPS) with the synthetic RNA poly(IC) with Vesicular Stomatitis Virus (VSV) and withculture medium aloneWith LPS or poly(IC) treatment IFNB mRNA expres-

sion was increased 20-50 fold by 2 hours and decreasedslowly to near the starting value by 24 hours OASexpression lagged behind that of IFNB reaching its peakvalue by 8 hours declining from the peak level only par-tially by 24 hours In contrast cells treated with VSVhad peak levels of IFNB appear by 8 hours and remainalmost unchanged to 24 hours OAS expression rosemore slowly and had not reached a maximum level by24 hours (Figure 5)

DiscussionRecent research has demonstrated very clearly that therate of emergence of human pathogens continues toincreasing steadily and that the source of the majorityof these agents is wildlife [1] One of the most intriguingfindings of late is that bats are the natural reservoirs ofmany of the most pathogenic viruses in humansWhile it is known that microbe-host coevolution

drives pathogenicity in the natural host the effect ofsuch coevolution on alternative hosts has not beendescribed The development of many genome sequen-cing projects extending beyond domesticated animalsprovides an opportunity to begin such inquiries Thelow coverage levels of these projects and the fact that somany genes with immunological functions appear inlarge families of very similar genes requires the develop-ment of more precise inferential tools for their studyToward that end we have developed a method for the

assembly of genome sequence fragments for use in theinference of gene family members when the genome

coverage is too low for reliable complete assembly Wevalidated the method on raw unassembled traces fromthe human genome sequencing project finding that wecould assemble the known interferon alpha sequencesaccurately and further identify at least one case inwhich two alleles are presentIt should be noted that our method requires more

computation than assembly methods currently in usedue to the numerical minimization over the mutationfrequency required in Eq(10) and is therefore not suita-ble for large-scale assemblyUsing this method we inferred a total of 61 type-I

interferon genes with intact ORFs from the whole-gen-ome shotgun trace archives for two chiropteran speciesPteropus vampyrus and Myotis lucifugus We find thatthe largest of the IFN-I gene families in both bats com-prises genes orthologous to the IFNW genes in othermammals In humans mice and pigs there is just oneIFNW gene but up to two dozen members in each batA recent analysis of bovine type-I interferons from theassembled Bos taurus genome [32] finds 26 intactIFNW genes providing precedent for our otherwisestriking results In contrast the IFNA family is the lar-gest IFN-I family in several mammals includinghumans mice and pigs but appears to be smaller inPteropus and absent but for pseudogenes in Myotis Thegene family assembly from trace archives indicates thatthere are 7 intact IFNA in Pteropus analysis of directsequencing from PBMCs gives maximum posteriorprobability to the presence of 9 IFNA genes (Figure 4)with a 90 credible interval containing 6 to 16 genesCattle have 13 IFNA genes in spite of having a greatlyenlarged IFNW family [32]Both bats have multiple members in the IFND family

with five intact members and seven pseudogenes inPteropus and twelve intact members in Myotis IFNDhas been found as a functional gene only in pigs wherethe gene product is expressed in the placenta and playsa role in embryonic development [33] but is not sus-pected of involvement in the response to viruses It isstriking to us that this family seems to have been sodramatically enlarged in bats The size of this familysuggests that it may still be involved in host defense inbats even if it has lost that function in pigs Walker andRoberts [32] report finding three IFND pseudogenesbut no intact IFND in the cow It is worth noting thatthe placental type-I interferon in cattle is IFNT ratherthan IFND [34] Searching the bat trace archives withbovine IFNT did not produce hits that had not beenreturned with the other searches We find no evidenceof IFNT or IFNZ in either batFor IFNB IFNK and IFNE we find one member of

each in P vampyrus and one or two in M lucifugus Inthe case where we do find two genes we are confident

Figure 4 The posterior probability mass function on thenumber of genes in the IFNA and IFND families as estimatedby cloning and sequencing from peripheral blood cells fromPteropus vampyrus

Kepler et al BMC Genomics 2010 11444httpwwwbiomedcentralcom1471-216411444

Page 8 of 12

that there the genes are distinct though they may repre-sent alleles rather than paralogsWe used the inferred sequences for P vampyrus

IFNA IFNB IFND and IFNK to design oligonucleotideprimers for cloning and sequencing and recovered atotal of 110 sequences The directly cloned sequencesvalidate much of the gene family inferences most of therepeated sequences are within a few bases of the nearestinferred gene A minority of the directly sequenced

genes are surprisingly far from the nearest inferredgene This circumstance may be an indication that thereare additional type-I interferon genes not covered by thePteropus sequencing traces and may also reflect signifi-cant population polymorphism in the wild batpopulationWe used these same primers to show using quantita-

tive RT-PRC that the P vampyrus candidate IFNB andOAS2 (a gene induced by IFNB in other mammals) are

Figure 5 Gene expression time course by qRT-PCR in fresh Pvampyrus PBMCs stimulated in culture by the indicated compoundExpression levels are relative to the housekeeping gene PPIA The error bars represent standard errors of the mean over biological duplicates

Kepler et al BMC Genomics 2010 11444httpwwwbiomedcentralcom1471-216411444

Page 9 of 12

expressed upon stimulation by type-I inducing agentsFurthermore the temporal trajectories of this expressionis consistent with the known mechanisms of such sig-naling The expression of IFNB under viral infection wasdelayed compared to that under stimulation by the TLRligands poly(IC) and LPS

ConclusionsThe bat has been implicated as a major reservoir forviruses of extreme pathogenicity in humans and sufferssubstantially less disease when infected by these virusesthan humans do In addition bat populations in NorthAmerica are declining rapidly as a result of white-nosesyndrome an emergent disease associated with a fungalpathogen [35] These facts and others suggest that thestudy of host defense and immunity in these uniquecreatures would benefit the pursuit of ecological andhuman health The major obstacle in this undertaking isthe lack of reagents that would make such investigationspossible One way to get this effort underway is to takeadvantage of the rich store of information contained inexisting partial genome sequences Although the infor-mation available in these genome databases requiresmore careful treatment for its extraction than isrequired for a complete assembled genome we havedeveloped methods that facilitate this task and havedemonstrated their utility

MethodsEthics StatementAll animals were handled in strict accordance with goodanimal practice as defined by the relevant national andor local animal welfare bodies and all animal work wascovered by an IACUC protocol from the Lubee BatConservancy (USDA Research Facility 58-R-0131)

Animals and sample collectionSubjects for this study included 10 male Large FlyingFoxes (Pteropus vampyrus) Some of the animals werewild caught and some were laboratory born all werereproductively mature ranging in age from 4 to 21 yrsAnimals were housed in captivity at the Lubee Bat Con-servancy in Gainesville Florida USA Animals werehoused together with conspecifics in an indooroutdoorcircular flight enclosure and were fed a mixture of fruitvegetables commercial primate chow and a vitaminsupplement Water was provided ad libitum The batswere captured manually anesthesia was mask-inducedwith 5 isoflurane in oxygen (3 Lmin) and the batswere maintained with 25 isoflurane Blood was col-lected from the left and right brachial veins using 3-mlsyringes and 25-gauge needles Blood samples weretransferred to heparanized tubes maintained and

shipped at room temperature (22degC) via overnightcourier

Cloning and SequencingGenomic DNA was prepared from whole unfractionatedperipheral blood from Pteropus vampyrus bats Using pri-mers derived from the inferred interferon gene sequencesand encoded for restriction enzyme cloning sites thegenes were PCR amplified from the genomic DNA Theforward and reverse primers encoded restriction sites forNheI and XhoI respectively which allowed cloning of thefull length gene into mammalian expression vectorpcDNA31(+) The PCR program consisted of an initiali-zation step at 94degC for 5 minutes then 30 cyclesof amplification consisting of denaturation at 94degC for1 minute annealing at 55degC for 30 seconds and elonga-tion at 72deg for 1 minute Amplification was followed by afinal elongation step at 72degC for 5 minutes and samplesheld at 4degC PCR was performed on a P times 2 ThermalCycler from ThermoElectron Corporation (WalthamMA)After purification the inserted genes were sequenced

using vector primers downstream and upstream of theinsert site Pteropus vampyrus IFNB IFNK IFND andIFNA sequences have been submitted to Genbank(GU126493 HM63650 HM636501 and HM636502)Quantitative RT-PCRWhole anticoagulated blood was collected from the bra-chial vein and layered over lymphocyte separation med-ium (Lympholyte M Cedarlane Labs) Cells werecollected from the interface washed and plated on 24well flat-bottom tissue culture plates (2 times 106 cellswell)in a total volume of 250 μl Cells were treated as indi-cated with either poly(IC)(Invivogen) LPS (Sigma) orVSV (Ramsburg lab stocks confirmed endotoxin freeusing LAL assay) Stimulants were diluted such that theaddition of 250 μl stimulant would give a final concentra-tion of 10 μgml poly IC 1 μgml LPS or an MOI of 5for VSV 500 μl of complete medium was added to theunstimulated control wells Cells were incubated at 37rsquoand 5 CO2 for the indicated times after which cellswere harvested into RNA lysis buffer (Qiagen RNEasykit) according to the manufacturerrsquos instructionsPurified RNA was prepared from these whole-cell

lysates as described in the protocols accompanying theQiagen RNeasy kit Qiashedder and Qiagen RNeasyMini kit and Qiagen DNase-Free DNase Set cDNAsynthesis was performed using the Invitrogen two-stepqRT-PCR kit according to the manufacturerrsquos protocolQuantitative real-time PCR was performed in an Eppen-dorf Mastercycler ep realplex thermal cycler usingSYBRGreenER qPCR Supermix and primers designedfrom the inferred gene sequences (additional file 3)

Kepler et al BMC Genomics 2010 11444httpwwwbiomedcentralcom1471-216411444

Page 10 of 12

Independent biological replicates were prepared foreach treatment and time point The log2 expressionlevel of IFNB and OAS2 relative to the housekeepinggene PPIA was estimated by subtracting the mean Ctvalue for each gene and treatmenttime point combina-tion from the corresponding mean Ct value for PPIA

Estimation of Gene Family SizeThe likelihood function for the observed set S ofsequences is a function of the true number g of genesand the sample size n by summing over the unobservednumber c of distinct genes sampled

P S g n P S c P c g nS c

c

( | ) ( | ) ( | )= sum (14)

The pmf on c is given by the recursion

P c g nc

gP c g n

g c

gP c g n

c( | ) ( | )

( | )

=

+ +

1

11 1

(15)

with Pc(1 | g1) = 1 and Pc(c | g1) = 0 for c ne 1PS(S | c) is approximated by minimizing the number

m of mutations over the assignments of observedsequences to c groups PS is then the posterior probabil-ity of getting m mutations given the observed numberof mutations in the IFNB and IFNK datasetsNote that this technique is independent of the assem-

bly methods described elsewhere in the paper

Phylogenetic InferenceThe set consisting of type-I interferon amino acidsequences from human (Homo sapiens) domestic pig(Sus scrofa) duckbill platypus (Ornithorhynchus anati-nus) and the amino acid sequences corresponding to theChiropteran genes inferred in this paper was submittedto multiple sequence alignment by CLUSTALW 2010[36] and ProbCons 112 [37] Maximum likelihood andminimum evolution phylogenies were inferred from eachresulting multiple sequence alignment Maximum likeli-hood phylogenies were inferred with proml found in thePHYLIP 368 package [38] Minimum evolution phyloge-nies were inferred using fastme [39] based on the dis-tance matrix calculated with protdist found in PHYLIPIn both cases 1000 boot-strap replications were gener-ated using seqboot and the consensus phylogeny wasassembled with consense both from the PHYLIP pack-age FigTree 121 httptreebioedacuksoftwarefig-tree was used for initial visualization and finalproduction of figures of the resulting phylogenetic treesAlignments of the bat DNA sequences and of all theamino acid sequences are available in additional files 4

Additional material

Additional file 1 All IFNGbio An alignment of IFNG gene sequencesfrom Homo sapiens Pterops vampyrus and Myotis lucifugus as follows firstrow genomic sequence of human IFNG rows 2 3 predicted genomicsequence of IFNG in M lucifugus and P vampyrus respectively rows 4-7exons for human IFNG rows 8-11 predicted exons for M lucifugus rows12-15 predicted exons for P vampyrus row 16 spliced coding region forhuman rows 1718 predicted spliced coding region for M lucifugus andP vampyrus row 19 amino acid sequence for human interferon gammarows 2021 predicted amino acid sequence for chiropteran interferongamma Examination of this bio file requires the BioEdit program whichis freely available at httpwwwmbioncsueduBioEditbioedithtml

Additional file 2 Pva IFND clone-assembly comparisonfasta Thissequence alignment shows a comparison between Pteropus vampyrusIFND sequences obtained by 1) assembly from the genome sequencingtrace archives and 2) by direct cloning and sequencing The clonedsequences include only those found in at least two independent clonesThe assemblies include all those with intact ORFs plus three that appearto be pseudogenes

Additional file 3 Accession numbers and primers The Genbankaccession numbers of the IFN genes used in the phylogenetic analysisand the PCR primers used in the P vampyrus gene expression studies

Additional file 4 Mlu + Pva + Hsa + SscSelectedIntactOrfsAA-alignedfasta The amino acid sequences used in the phylogeneticanalysis

AcknowledgementsWe thank Brice Barefoot for his technical assistance and Rachel Willcutts forher help in the early stages of this work We are grateful to Brian Pope andTasha King at the Lubee Bat Conservancy for their expert assistanceDr Ramsburgrsquos work was supported by NIH grants UC6 AI058607 andAI51445 Dr Keplerrsquos work was supported by discretionary funds provided byDuke University No additional external funding was used No fundingsource had any influence on the study design on the collection analysis orinterpretation of data on the writing of the manuscript or on the decisionto submit the manuscript for publication

Author details1Center for Computational Immunology Duke University Medical CenterDurham NC USA 2Duke Human Vaccine Institute Duke University MedicalCenter Durham NC USA 3Lubee Bat Conservancy and Department ofWildlife Ecology and Conservation University of Florida Gainesville FL USA

Authorsrsquo contributionsER and TBK conceived the project and designed the experiments TBKdeveloped implemented and applied the statistical methods CS KH andAH performed the experiments JR and TBK performed the phylogeneticanalysis AW contributed critical reagents and assisted in interpretation ofresults TBK CS and ER wrote the manuscript All authors read and approvedthe final manuscript

Received 21 November 2009 Accepted 21 July 2010Published 21 July 2010

References1 Jones KE Patel NG Levy MA Storeygard A Balk D et al Global trends in

emerging infectious diseases Nature 2008 451990-9932 Calisher CH Childs JE Field HE Holmes KV Schountz T Bats important

reservoir hosts of emerging viruses Clin Microbiol Rev 2006 19531-5453 Messenger SRC Smith J Bats Emerging Virus Infections and the Rabies

Paradigm Bat Ecology Chicago and London University of ChicagoPressKunz TF M B 2005 622

4 Omatsu T Watanabe S Akashi H Yoshikawa Y Biological characters ofbats in relation to natural reservoir of emerging viruses ComparativeImmunology Microbiology and Infectious Diseases 2007 30357-374

Kepler et al BMC Genomics 2010 11444httpwwwbiomedcentralcom1471-216411444

Page 11 of 12

5 Wong S Lau S Woo P Yuen K-Y Bats as a continuing source ofemerging infections in humans Reviews in Medical Virology 2007 1767-91

6 Chua KB Lek Koh C Hooi PS Wee KF Khong JH et al Isolation of Nipahvirus from Malaysian Island flying-foxes Microbes and Infection 20024145-151

7 Halpin K Young PL Field HE Mackenzie JS Isolation of Hendra virus frompteropid bats a natural reservoir of Hendra virus Journal of GeneralVirology 2000 811927-1932

8 Towner J Pourrut X Albarino C Nkogue C Bird B et al Marburg virusinfection detected in a common African bat PLoS ONE 2007 2e764

9 Pourrut X Souris M Towner J Rollin P Nichol S et al Large serologicalsurvey showing cocirculation of Ebola and Marburg viruses in Gabonesebat populations and a high seroprevalence of both viruses in Rousettusaegyptiacus BMC Infectious Diseases 2009 9159

10 Enright JB Bats and their Relation to Rabies Annual Review ofMicrobiology 1956 10369-392

11 Badrane H Tordo N Host Switching in Lyssavirus History from theChiroptera to the Carnivora Orders J Virol 2001 758096-8104

12 Li W Shi Z Yu M Ren W Smith C et al Bats Are Natural Reservoirs ofSARS-Like Coronaviruses Science 2005 310676-679

13 Lau SKP Woo PCY Li KSM Huang Y Tsoi H-W et al Severe acuterespiratory syndrome coronavirus-like virus in Chinese horseshoe batsProceedings of the National Academy of Sciences of the United States ofAmerica 2005 10214040-14045

14 Sulkin SE Allen R Virus Infections in Bats Monographs in Virology Basel SKarger 1974 8

15 Wilson DE Reeder DM Mammal Species of the World Baltimore TheJohns Hopkins University Press 2005

16 Devaraj SG Wang N Chen Z Chen Z Tseng M et al Regulation of IRF-3-dependent Innate Immunity by the Papain-like Protease Domain of theSevere Acute Respiratory Syndrome Coronavirus Journal of BiologicalChemistry 2007 28232208-32221

17 Basler CF Amarasinghe GK Evasion of Interferon Responses by Ebola andMarburg Viruses Journal of Interferon amp Cytokine Research 2009 29511-520

18 Park MS Shaw ML Munoz-Jordan J Cros JF Nakaya T et al Newcastledisease virus (NDV)-based assay demonstrates interferon-antagonistactivity for the NDV V protein and the Nipah virus V W and C proteinsJ Virol 2003 771501-1511

19 Brzozka K Finke S Conzelmann K-K Identification of the Rabies VirusAlphaBeta Interferon Antagonist Phosphoprotein P Interferes withPhosphorylation of Interferon Regulatory Factor 3 J Virol 2005797673-7681

20 Mendonccedila RZP Pereira CA Relationship of interferon synthesis and theresistance of mice infected with street rabies virus Brazilian Journal ofMedical and Biological Research 1994 27691-695

21 Sen GC Viruses and interferons Annu Rev Microbiol 2001 55255-28122 Stark GR Kerr IM Williams BR Silverman RH Schreiber RD How cells

respond to interferons Annu Rev Biochem 1998 67227-26423 Theofilopoulos AN Baccala R Beutler B Kono DH Type I interferons

(alphabeta) in immunity and autoimmunity Annu Rev Immunol 200523307-336

24 Pestka S Krause CD Walter MR Interferons interferon-like cytokines andtheir receptors Immunol Rev 2004 2028-32

25 Ohno S Taniguchi T Structure of a chromosomal gene for humaninterferon beta Proc Natl Acad Sci USA 1981 785305-5309

26 Cohen L Lacoste J Parniak M Daigneault L Skup D et al Stimulation ofinterferon beta gene transcription in vitro by purified NF-kappa B and anovel TH protein Cell Growth Differ 1991 2323-333

27 Pannisi E More Genomes But Shallower Coverage Science 20043041227

28 Bouck J Miller W Gorrell JH Muzny D Gibbs RA Analysis of the Qualityand Utility of Random Shotgun Sequencing at Low RedundanciesGenome Research 1998 81074-1084

29 Ohno S Evolution by Gene Duplication London George Allen and Unwin1970

30 Gruumlnwald PD Myung IJ Pitt MA Advances in Minimum DescriptionLength Theory and Applications Cambridge MIT Press 2005

31 Woelk CH Frost SDW Richman DD Higley PE Kosakovsky Pond SLEvolution of the interferon alpha gene family in eutherian mammalsGene 2007 39738-50

32 Walker A Roberts RM Characterization of the bovine type I IFN locusrearrangements expansions and novel subfamilies BMC Genomics 200910187

33 Lefegravevre F Boulay V A novel and atypical type one interferon geneexpressed by trophoblast during early pregnancy Journal of BiologicalChemistry 1993 26819760-19768

34 Roberts RM Chen Y Ezashi T Walker AM Interferons and the maternal-conceptus dialog in mammals Seminars in Cell amp Developmental Biology2008 19170-177

35 Blehert DS Hicks AC Behr M Meteyer CU Berlowski-Zier BM et al BatWhite-Nose Syndrome An Emerging Fungal Pathogen Science 2009323227

36 Larkin MA Blackshields G Brown NP Chenna R McGettigan PA et alClustal W and Clustal X version 20 Bioinformatics 2007 232947-2948

37 Do CB Mahabhashyam MSP Brudno M Batzoglou S ProbConsProbabilistic consistency-based multiple sequence alignment GenomeResearch 2005 15330-340

38 Felsenstein J PHYLIP - Phylogeny Inference Package (Version 32)Cladistics 1989 5164-166

39 Desper R Gascuel O Fast and Accurate Phylogeny ReconstructionAlgorithms Based on the Minimum-Evolution Principle Journal ofComputational Biology 2002 9687-705

doi1011861471-2164-11-444Cite this article as Kepler et al Chiropteran types I and II interferongenes inferred from genome sequencing traces by a statistical gene-family assembler BMC Genomics 2010 11444

Submit your next manuscript to BioMed Centraland take full advantage of

bull Convenient online submission

bull Thorough peer review

bull No space constraints or color figure charges

bull Immediate publication on acceptance

bull Inclusion in PubMed CAS Scopus and Google Scholar

bull Research which is freely available for redistribution

Submit your manuscript at wwwbiomedcentralcomsubmit

Kepler et al BMC Genomics 2010 11444httpwwwbiomedcentralcom1471-216411444

Page 12 of 12

  • Abstract
    • Background
    • Results
    • Conclusion
      • Background
        • Bat Genome Projects
          • Results
            • Inference of Gene Families From Trace Archives
              • Quality scores and conditional probabilities
                • Validation
                • Type I interferons in Myotis lucifugus and Pteropus vampyrus
                  • Chiropteran Type II interferon
                    • Phylogenetics
                    • Sequencing
                    • Gene expression
                      • Discussion
                      • Conclusions
                      • Methods
                        • Ethics Statement
                        • Animals and sample collection
                        • Cloning and Sequencing
                          • Quantitative RT-PCR
                            • Estimation of Gene Family Size
                            • Phylogenetic Inference
                              • Acknowledgements
                              • Author details
                              • Authors contributions
                              • References
Page 4: RESEARCH ARTICLE Open Access Chiropteran types I and II … · 2017. 4. 5. · RESEARCH ARTICLE Open Access Chiropteran types I and II interferon genes inferred from genome sequencing

over the unobserved state ξ We take a simple reversiblemutation model

Prob( | )

=

minus =⎧⎨⎩

1

4

for

else(3)

and let

p pji

ji( | ) Prob( | ) ( )

= sum (4)

Consider the model partially described by having posi-tion i in trace 1 aligned with position j in trace 2 Theposterior pmf for the latent true nucleotide state is

pp p

ki j

ij

121 2

12

5( )

( ) ( )ξ

ξ ξ

ϕ= (5)

where k is the position in the assembly correspondingto i and j in traces 1 and 2 respectively ij

12 is given by

ij i j

A C G T

p p12 1 25equivisin minus

sum ( ) ( )

(6)

Since the criterion is the minimization of the descrip-tion length we compare any two models by the differencein their description lengths The alignment cost then isthe difference in description length between the model inwhich these positions are aligned (given the alignment)and the one in which they are independent and is give by

sij ij12 12= log (7)

Denote by A(k) the vector whose components are theindices for the aligned traces corresponding to the kthposition in the resulting assembly Then the cost of thealignment is

S sA k A k

k

A12 12

1 2= sum ( ) ( ) (8)

For the case where traces 1 and 2 are related indir-ectly through a common ancestor we similarly have

S sA k A k

k

A12 12

1 2( ) ( )( ) ( ) = sum (9)

with the appropriate and obvious definitions The quan-tity of primary importance for us is the cost differencebetween the direct and indirect models for a given pair oftraces

c S

S o

1212

12 1212

=

+

min

min min ( ) log ( )

AA

AA A

μμ μ

(10)

The logarithmic term is the cost of encoding themutation rate that minimizes the alignment score [30]the argument oA

12( ) is the number of nucleotides inthe overlap of traces 1 and 2 in the optimal alignmentie the number of degrees of freedom available to esti-mate μ When c is positive the indirect model is themore efficient when c is negative the direct model ismore efficientFor progressive alignment note that the posterior assem-

bly pmf and the cost are composed from the individualsequential alignments Suppose we first align traces 1 and2 to obtain assembly 4 and then align trace 3 to assembly4 to produce assembly 5 as shown in figure 2 We need torefine our notation for indexing by letting A ki

a( ) refer tothe position in trace i corresponding to position k inassembly a So for the example of figure 2 we have

A k A A ki i5 4

45( ) ( ( ))= (11)

It is then straightforward to show that the alignmentscores are additive

s

p p p

A k A k A k

A k A k A

15

25

35

15

15

1

123

1 2

2 5( ) ( ) ( )

( ) ( )

log

log ( ) ( )

equiv minus

minus 55

14

45

24

45

45

35

3

12 34

( )

( ( )) ( ( )) ( ) ( )

( )k

A A k A A k A k A ks s

sum

= +

(12)

Figure 2 Diagram indicating the labeling of traces and assembliesfor the discussion of progressive alignment in the text

Kepler et al BMC Genomics 2010 11444httpwwwbiomedcentralcom1471-216411444

Page 4 of 12

and that the assembly posterior pmfs are obtained bycomposition

p p p p

s

k A k A k A k

A k A k A

5 2 1 2 3515

25

35

15

25

3

( ) ( ) ( ) ( )( ) ( ) ( )

( ) ( )

equiv

times 55

45

35

35

45

1231

4 3 345

( )

( ) ( ) ( ) ( )( ) ( )

k

A k A k A k A kp p s

⎡⎣⎢

⎤⎦⎥

= ⎡⎣⎢

minus

⎤⎤⎦⎥

minus1

(13)

We have shown that the total cost for the compositealignment is given by the sum of the sum of alignmentcosts but as with all progressive alignment methodsthe minimum-cost composite alignment may not be theglobal minimum cost alignmentThe complete procedure then is

1 Compute all of the minimum pairwise alignmentcosts cij using dynamic programming2 Assemble the two sub-assemblies with least align-ment cost and replace them with the new assemblythat results Compute the pairwise costs between theremaining subassemblies and the new assembly3 Continue at step 2 until total assembly cost canno longer be reduced

ValidationWe validated the method by inferring the human inter-feron-alpha (IFNA) gene family directly from the humanwhole-genome sequencing trace archives and compar-ing the results to IFNA gene sequences known from thefinished human genome (as well as from decades ofmolecular biology on the human IFNA) We performedthis procedure in three different waysFirst we collected relevant sequencing traces by per-

forming a BLAST search using the 567 nt-long protein-coding portion of the human IFNA2 transcript(NM_000605) and retrieved 291 traces with expectedvalues less than 10-6 We trimmed each sequence andapplied the assembly method described aboveThe procedure resulted in 15 final assemblies We

compared the inferred genes to the human genomereference assembly Each of these 15 assemblies mapsunambiguously to one of the 13 known IFNA genes orthe single whole pseudogene at an average error rate of60 times 10-4 per base (14 mismatches in 23374 bases) Inone case a pair of assemblies mapped to a single geno-mic gene IFNA17 There are 2 single nucleotide differ-ences between the two assemblies mapping to IFNA17and one dinucleotide difference Each of the singlenucleotide differences is strongly supported with multi-ple high-quality reads supporting the inference Thedinucleotide difference is less well supported with just asingle trace covering it in one assembly though the

corresponding bases in this trace are good quality Theevidence strongly suggests that the two assemblies domap to two different alleles of the same gene Acceptingthis conclusion reduces the error rate to less than 4 times10-4 per base The 10 remaining mismatches include asingle 4-nucleotide deletion relative to the genomicsequenceFor the next two validation procedures our intent was

to more accurately represent the situation in which theassembler would be used For this purpose we used oneof the Pvampyrus putative IFNA genes rather thanhuman IFNA2 as the BLAST query and selected a sub-sample of the retrieved sequences to simulate lowcoverageThere were 74 trace reads in the subset produced by

the Whitehead Institute Assembling these reads usingour method produced 13 distinct assemblies with two ormore reads in each mapping (as described above) to 12different human genomic regions 7 IFNAs humanIFNW1 LOC100130866 (annotated by NCBI as similarto IFNW1) 2 IFNA pseudogenes and 1 IFNW pseudo-gene One pair of distinct assemblies was mapped erro-neously to a single IFNA The subset of trace readsfrom the Venter Institute consisted of 103 sequencesproducing 22 distinct assemblies with 2 or more readsmapping to 22 different regions of the human genomeall 13 IFNAs 1 IFNA pseudogene IFNW1 and 6regions annotated as either IFNW-like or IFNW pseu-dogenes The alignments of these assemblies to the gen-ome revealed 3 mismatches in 31531 total bases Onepair of assemblies was erroneously mapped to a singleIFNA pseudogene

Type I interferons in Myotis lucifugus and PteropusvampyrusWe used the protein-coding portions of the humangenes IFNA2 IFNK IFNB and IFNE the bovine geneIFNT and murine gene IFNZ in blast searches againstthe whole genome shotgun sequence trace archives forPteropus vampyrus and Myotis lucifugus The sequencesretrieved in each of the blast searches were thenassembled using the methods described aboveWe obtained unique final assemblies for IFNB in each

bat Each of these final assemblies contained an intactORF which when BLASTed against the NCBI nucleo-tide sequence database returned multiple hits on thecorresponding gene in several mammals The greatestsimilarity was to horse pig and cow sequences withabout 80 nucleotide identityMyotis produced two final assemblies for IFNE both

of which contain intact ORFs and appear on closeexamination to be genuinely distinct sequences Ptero-pus produced a single final assembly which had anintact ORF

Kepler et al BMC Genomics 2010 11444httpwwwbiomedcentralcom1471-216411444

Page 5 of 12

The sequencing traces returned in the search onIFNA2 produced multiple final assemblies that fall intothree distinct families with clear similarity to IFNAIFNW and IFND (Table 1) respectivelyNo relevant hits to murine IFNZ were found in either

chiropteran genome No hits to bovine IFNT beyondthose already obtained using other queries were foundChiropteran Type II interferonIn mammals the type I interferons have a single exon offewer than 700 bases IFNG the type II interferon hasmultiple exons We used our methods to infer the type-II interferon genes in the two bats by using the full-length human IFNG gene comprising introns and exons(4972 bases NC_000012 REGION complement(6854855068553521)) Our methods produced a singleassembly in each bat (Additional File 1) All exons wereclearly identifiable and all intronic splice signals wereconserved Table 2 shows the sequence similarity amongthe human and bat sequences

PhylogeneticsThe intact ORFs of the inferred sequences were alignedwith the type-I interferons from humans and from thedomestic pig Sus scrofa (which is substantially moreclosely related to bats than is the human) and as anoutgroup the IFNB and IFNA gene sequences from theduckbill platypus (Ornithorhynchus anatinus) The phy-logenetic tree relating these genes was inferred usingmaximum likelihood (Figure 3)The features of the inferred phylogenetic tree are con-

sistent with those found in previous investigations [31]In particular the interferon families fall into distinctclades with the exception of the platypus IFNA geneswhich appear ancestral to the IFNA IFNW and IFNDgenes of the other species Within each of the families

the genes fall into clades according to species with nointerdigitation of genes from different species in anyclade (Figure 3) Additionally the bat genes from onebat species are typically but not universally more clo-sely related to the orthologous genes from the other batspecies than to those of pigs or humans The bat genesare typically but not universally more closely related tothe orthologous pig genes than to the orthologoushuman genes

SequencingWe used the inferred sequences to design sequencingprimers and make several clones from each of IFNAIFNB IFND and IFNK The results are as followsWe obtained DNA sequences from 19 independent

IFNB clones from two individual bats Among these 2distinct but very closely related genes were clearly dis-cernible One sequence is identical to the consensus ofall 19 sequences and is found in 7 of 12 clones fromone bat and all the clones from the other Given thisinterpretation the overall frequency of pre-sequencingPCR error was 16 times 10-3 per base (17 errors in 10792bases) The inferred IFNB genes are 91 identical to anIFNB isolated from Rousettus aegyptiacus [4] and 80identical to IFNB from the domestic pig Sus scrofa

Table 1 Summary of Final Assemblies in Myotis lucifugus and Pteropus vampyrus

Species Family assemblies Intact ORFs Pseudogenes Partial length (AA)

M lucifugus IFNB 1 1 0 0 186

IFNE 2 2 0 0 193

IFNK 2 2 0 0 208

IFNA 2 0 2 0 -

IFNW 25 12 7 9 195

IFND 19 11 3 5 1711

P vampyrus IFNB 1 1 0 0 185

IFNE 1 1 0 0 193

IFNK 1 1 0 0 201

IFNA 7 7 0 0 189

IFNW 28 18 8 1 1852

IFND 14 5 7 2 1713

1 two genes encode polypeptides of length 190

2 one at 187 and two at 195

3 one at 170

Table 2 DNA sequence similarity among human (Has)IFNG and inferred bat (Mlu Pva) IFNG assemblies forexons (above the diagonal) and introns (below thediagonal)

species Hsa Mlu Pva

Has ID 0752 0784

Mlu 0655 ID 0832

Pva 0711 0696 ID

Kepler et al BMC Genomics 2010 11444httpwwwbiomedcentralcom1471-216411444

Page 6 of 12

Using IFNK primers we obtained sequences from 7clones from a single animal and found two distinctgenes differing from each other in two nucleotides Theerror rate was consistent with that found in the IFNBgenes at 17 times 10-3 per base (6 errors in 3430 bases)For IFND and IFNA respectively we collected

sequences from 32 and 52 clones from a single animalBecause PCR was used in an amplification step prior tocloning (unlike the process used for genomic sequen-cing) it is difficult to discern the expected genetic varia-bility from sequencing error For IFNA four sequenceswere recovered multiple times from independent proce-dures The mean number of mismatches in pairwisecomparisons is just under 7 The four sequences arewithin 1 2 5 and 6 bases of the most similar inferredIFNA gene For IFND there are three multiply-repre-sented sequences Two of these genes differ at 9 basesthe third which represents a pseudogene differs fromthe other two by more than 20 bases These genes differfrom their nearest inferred IFND sequences by 7 12and 20 bases respectively Additional file 2 contains an

alignment of the three multiply-represented IFNDclones compared to the IFND assembliesSince it was not possible to determine by inspection

what the underlying true gene sequences are or howmany there are We instead estimated the posteriorprobability on the number of genes in each family asdescribed in Methods (Figure 4) For IFNA the poster-ior probability is maximized at g = 9 genes the 90credible interval is (617) For IFND the greatest poster-ior probability occurs at g = 19 genes and the 90credible interval is (1528)

Gene expressionTo test whether the inferred genes are expressed underconditions known to induce typeI interferons we designed primers to Pteropus IFNB

and to a control housekeeping gene (PPIA) and a gene(OAS2) induced by paracrine IFNB signaling The lattertwo genes were also inferred by the methods describedFresh Pteropus PBMCs from two individual animals

were cultured under four different conditions with

Figure 3 Phylogenetic tree of the type-I interferons in Pteropus vampyrus (Pva) Myotis lucifigus (Mlu) Sus scrofa (Ssc) Homo sapiens(Hsa) and Ornithorhynchus anatinus (Oan) The tree was thinned for clarity by omitting one member of any pair of sequences differing byfewer than 3 amino acids The branch length is proportional to the evolutionary distance

Kepler et al BMC Genomics 2010 11444httpwwwbiomedcentralcom1471-216411444

Page 7 of 12

lipopolysaccharide (LPS) with the synthetic RNA poly(IC) with Vesicular Stomatitis Virus (VSV) and withculture medium aloneWith LPS or poly(IC) treatment IFNB mRNA expres-

sion was increased 20-50 fold by 2 hours and decreasedslowly to near the starting value by 24 hours OASexpression lagged behind that of IFNB reaching its peakvalue by 8 hours declining from the peak level only par-tially by 24 hours In contrast cells treated with VSVhad peak levels of IFNB appear by 8 hours and remainalmost unchanged to 24 hours OAS expression rosemore slowly and had not reached a maximum level by24 hours (Figure 5)

DiscussionRecent research has demonstrated very clearly that therate of emergence of human pathogens continues toincreasing steadily and that the source of the majorityof these agents is wildlife [1] One of the most intriguingfindings of late is that bats are the natural reservoirs ofmany of the most pathogenic viruses in humansWhile it is known that microbe-host coevolution

drives pathogenicity in the natural host the effect ofsuch coevolution on alternative hosts has not beendescribed The development of many genome sequen-cing projects extending beyond domesticated animalsprovides an opportunity to begin such inquiries Thelow coverage levels of these projects and the fact that somany genes with immunological functions appear inlarge families of very similar genes requires the develop-ment of more precise inferential tools for their studyToward that end we have developed a method for the

assembly of genome sequence fragments for use in theinference of gene family members when the genome

coverage is too low for reliable complete assembly Wevalidated the method on raw unassembled traces fromthe human genome sequencing project finding that wecould assemble the known interferon alpha sequencesaccurately and further identify at least one case inwhich two alleles are presentIt should be noted that our method requires more

computation than assembly methods currently in usedue to the numerical minimization over the mutationfrequency required in Eq(10) and is therefore not suita-ble for large-scale assemblyUsing this method we inferred a total of 61 type-I

interferon genes with intact ORFs from the whole-gen-ome shotgun trace archives for two chiropteran speciesPteropus vampyrus and Myotis lucifugus We find thatthe largest of the IFN-I gene families in both bats com-prises genes orthologous to the IFNW genes in othermammals In humans mice and pigs there is just oneIFNW gene but up to two dozen members in each batA recent analysis of bovine type-I interferons from theassembled Bos taurus genome [32] finds 26 intactIFNW genes providing precedent for our otherwisestriking results In contrast the IFNA family is the lar-gest IFN-I family in several mammals includinghumans mice and pigs but appears to be smaller inPteropus and absent but for pseudogenes in Myotis Thegene family assembly from trace archives indicates thatthere are 7 intact IFNA in Pteropus analysis of directsequencing from PBMCs gives maximum posteriorprobability to the presence of 9 IFNA genes (Figure 4)with a 90 credible interval containing 6 to 16 genesCattle have 13 IFNA genes in spite of having a greatlyenlarged IFNW family [32]Both bats have multiple members in the IFND family

with five intact members and seven pseudogenes inPteropus and twelve intact members in Myotis IFNDhas been found as a functional gene only in pigs wherethe gene product is expressed in the placenta and playsa role in embryonic development [33] but is not sus-pected of involvement in the response to viruses It isstriking to us that this family seems to have been sodramatically enlarged in bats The size of this familysuggests that it may still be involved in host defense inbats even if it has lost that function in pigs Walker andRoberts [32] report finding three IFND pseudogenesbut no intact IFND in the cow It is worth noting thatthe placental type-I interferon in cattle is IFNT ratherthan IFND [34] Searching the bat trace archives withbovine IFNT did not produce hits that had not beenreturned with the other searches We find no evidenceof IFNT or IFNZ in either batFor IFNB IFNK and IFNE we find one member of

each in P vampyrus and one or two in M lucifugus Inthe case where we do find two genes we are confident

Figure 4 The posterior probability mass function on thenumber of genes in the IFNA and IFND families as estimatedby cloning and sequencing from peripheral blood cells fromPteropus vampyrus

Kepler et al BMC Genomics 2010 11444httpwwwbiomedcentralcom1471-216411444

Page 8 of 12

that there the genes are distinct though they may repre-sent alleles rather than paralogsWe used the inferred sequences for P vampyrus

IFNA IFNB IFND and IFNK to design oligonucleotideprimers for cloning and sequencing and recovered atotal of 110 sequences The directly cloned sequencesvalidate much of the gene family inferences most of therepeated sequences are within a few bases of the nearestinferred gene A minority of the directly sequenced

genes are surprisingly far from the nearest inferredgene This circumstance may be an indication that thereare additional type-I interferon genes not covered by thePteropus sequencing traces and may also reflect signifi-cant population polymorphism in the wild batpopulationWe used these same primers to show using quantita-

tive RT-PRC that the P vampyrus candidate IFNB andOAS2 (a gene induced by IFNB in other mammals) are

Figure 5 Gene expression time course by qRT-PCR in fresh Pvampyrus PBMCs stimulated in culture by the indicated compoundExpression levels are relative to the housekeeping gene PPIA The error bars represent standard errors of the mean over biological duplicates

Kepler et al BMC Genomics 2010 11444httpwwwbiomedcentralcom1471-216411444

Page 9 of 12

expressed upon stimulation by type-I inducing agentsFurthermore the temporal trajectories of this expressionis consistent with the known mechanisms of such sig-naling The expression of IFNB under viral infection wasdelayed compared to that under stimulation by the TLRligands poly(IC) and LPS

ConclusionsThe bat has been implicated as a major reservoir forviruses of extreme pathogenicity in humans and sufferssubstantially less disease when infected by these virusesthan humans do In addition bat populations in NorthAmerica are declining rapidly as a result of white-nosesyndrome an emergent disease associated with a fungalpathogen [35] These facts and others suggest that thestudy of host defense and immunity in these uniquecreatures would benefit the pursuit of ecological andhuman health The major obstacle in this undertaking isthe lack of reagents that would make such investigationspossible One way to get this effort underway is to takeadvantage of the rich store of information contained inexisting partial genome sequences Although the infor-mation available in these genome databases requiresmore careful treatment for its extraction than isrequired for a complete assembled genome we havedeveloped methods that facilitate this task and havedemonstrated their utility

MethodsEthics StatementAll animals were handled in strict accordance with goodanimal practice as defined by the relevant national andor local animal welfare bodies and all animal work wascovered by an IACUC protocol from the Lubee BatConservancy (USDA Research Facility 58-R-0131)

Animals and sample collectionSubjects for this study included 10 male Large FlyingFoxes (Pteropus vampyrus) Some of the animals werewild caught and some were laboratory born all werereproductively mature ranging in age from 4 to 21 yrsAnimals were housed in captivity at the Lubee Bat Con-servancy in Gainesville Florida USA Animals werehoused together with conspecifics in an indooroutdoorcircular flight enclosure and were fed a mixture of fruitvegetables commercial primate chow and a vitaminsupplement Water was provided ad libitum The batswere captured manually anesthesia was mask-inducedwith 5 isoflurane in oxygen (3 Lmin) and the batswere maintained with 25 isoflurane Blood was col-lected from the left and right brachial veins using 3-mlsyringes and 25-gauge needles Blood samples weretransferred to heparanized tubes maintained and

shipped at room temperature (22degC) via overnightcourier

Cloning and SequencingGenomic DNA was prepared from whole unfractionatedperipheral blood from Pteropus vampyrus bats Using pri-mers derived from the inferred interferon gene sequencesand encoded for restriction enzyme cloning sites thegenes were PCR amplified from the genomic DNA Theforward and reverse primers encoded restriction sites forNheI and XhoI respectively which allowed cloning of thefull length gene into mammalian expression vectorpcDNA31(+) The PCR program consisted of an initiali-zation step at 94degC for 5 minutes then 30 cyclesof amplification consisting of denaturation at 94degC for1 minute annealing at 55degC for 30 seconds and elonga-tion at 72deg for 1 minute Amplification was followed by afinal elongation step at 72degC for 5 minutes and samplesheld at 4degC PCR was performed on a P times 2 ThermalCycler from ThermoElectron Corporation (WalthamMA)After purification the inserted genes were sequenced

using vector primers downstream and upstream of theinsert site Pteropus vampyrus IFNB IFNK IFND andIFNA sequences have been submitted to Genbank(GU126493 HM63650 HM636501 and HM636502)Quantitative RT-PCRWhole anticoagulated blood was collected from the bra-chial vein and layered over lymphocyte separation med-ium (Lympholyte M Cedarlane Labs) Cells werecollected from the interface washed and plated on 24well flat-bottom tissue culture plates (2 times 106 cellswell)in a total volume of 250 μl Cells were treated as indi-cated with either poly(IC)(Invivogen) LPS (Sigma) orVSV (Ramsburg lab stocks confirmed endotoxin freeusing LAL assay) Stimulants were diluted such that theaddition of 250 μl stimulant would give a final concentra-tion of 10 μgml poly IC 1 μgml LPS or an MOI of 5for VSV 500 μl of complete medium was added to theunstimulated control wells Cells were incubated at 37rsquoand 5 CO2 for the indicated times after which cellswere harvested into RNA lysis buffer (Qiagen RNEasykit) according to the manufacturerrsquos instructionsPurified RNA was prepared from these whole-cell

lysates as described in the protocols accompanying theQiagen RNeasy kit Qiashedder and Qiagen RNeasyMini kit and Qiagen DNase-Free DNase Set cDNAsynthesis was performed using the Invitrogen two-stepqRT-PCR kit according to the manufacturerrsquos protocolQuantitative real-time PCR was performed in an Eppen-dorf Mastercycler ep realplex thermal cycler usingSYBRGreenER qPCR Supermix and primers designedfrom the inferred gene sequences (additional file 3)

Kepler et al BMC Genomics 2010 11444httpwwwbiomedcentralcom1471-216411444

Page 10 of 12

Independent biological replicates were prepared foreach treatment and time point The log2 expressionlevel of IFNB and OAS2 relative to the housekeepinggene PPIA was estimated by subtracting the mean Ctvalue for each gene and treatmenttime point combina-tion from the corresponding mean Ct value for PPIA

Estimation of Gene Family SizeThe likelihood function for the observed set S ofsequences is a function of the true number g of genesand the sample size n by summing over the unobservednumber c of distinct genes sampled

P S g n P S c P c g nS c

c

( | ) ( | ) ( | )= sum (14)

The pmf on c is given by the recursion

P c g nc

gP c g n

g c

gP c g n

c( | ) ( | )

( | )

=

+ +

1

11 1

(15)

with Pc(1 | g1) = 1 and Pc(c | g1) = 0 for c ne 1PS(S | c) is approximated by minimizing the number

m of mutations over the assignments of observedsequences to c groups PS is then the posterior probabil-ity of getting m mutations given the observed numberof mutations in the IFNB and IFNK datasetsNote that this technique is independent of the assem-

bly methods described elsewhere in the paper

Phylogenetic InferenceThe set consisting of type-I interferon amino acidsequences from human (Homo sapiens) domestic pig(Sus scrofa) duckbill platypus (Ornithorhynchus anati-nus) and the amino acid sequences corresponding to theChiropteran genes inferred in this paper was submittedto multiple sequence alignment by CLUSTALW 2010[36] and ProbCons 112 [37] Maximum likelihood andminimum evolution phylogenies were inferred from eachresulting multiple sequence alignment Maximum likeli-hood phylogenies were inferred with proml found in thePHYLIP 368 package [38] Minimum evolution phyloge-nies were inferred using fastme [39] based on the dis-tance matrix calculated with protdist found in PHYLIPIn both cases 1000 boot-strap replications were gener-ated using seqboot and the consensus phylogeny wasassembled with consense both from the PHYLIP pack-age FigTree 121 httptreebioedacuksoftwarefig-tree was used for initial visualization and finalproduction of figures of the resulting phylogenetic treesAlignments of the bat DNA sequences and of all theamino acid sequences are available in additional files 4

Additional material

Additional file 1 All IFNGbio An alignment of IFNG gene sequencesfrom Homo sapiens Pterops vampyrus and Myotis lucifugus as follows firstrow genomic sequence of human IFNG rows 2 3 predicted genomicsequence of IFNG in M lucifugus and P vampyrus respectively rows 4-7exons for human IFNG rows 8-11 predicted exons for M lucifugus rows12-15 predicted exons for P vampyrus row 16 spliced coding region forhuman rows 1718 predicted spliced coding region for M lucifugus andP vampyrus row 19 amino acid sequence for human interferon gammarows 2021 predicted amino acid sequence for chiropteran interferongamma Examination of this bio file requires the BioEdit program whichis freely available at httpwwwmbioncsueduBioEditbioedithtml

Additional file 2 Pva IFND clone-assembly comparisonfasta Thissequence alignment shows a comparison between Pteropus vampyrusIFND sequences obtained by 1) assembly from the genome sequencingtrace archives and 2) by direct cloning and sequencing The clonedsequences include only those found in at least two independent clonesThe assemblies include all those with intact ORFs plus three that appearto be pseudogenes

Additional file 3 Accession numbers and primers The Genbankaccession numbers of the IFN genes used in the phylogenetic analysisand the PCR primers used in the P vampyrus gene expression studies

Additional file 4 Mlu + Pva + Hsa + SscSelectedIntactOrfsAA-alignedfasta The amino acid sequences used in the phylogeneticanalysis

AcknowledgementsWe thank Brice Barefoot for his technical assistance and Rachel Willcutts forher help in the early stages of this work We are grateful to Brian Pope andTasha King at the Lubee Bat Conservancy for their expert assistanceDr Ramsburgrsquos work was supported by NIH grants UC6 AI058607 andAI51445 Dr Keplerrsquos work was supported by discretionary funds provided byDuke University No additional external funding was used No fundingsource had any influence on the study design on the collection analysis orinterpretation of data on the writing of the manuscript or on the decisionto submit the manuscript for publication

Author details1Center for Computational Immunology Duke University Medical CenterDurham NC USA 2Duke Human Vaccine Institute Duke University MedicalCenter Durham NC USA 3Lubee Bat Conservancy and Department ofWildlife Ecology and Conservation University of Florida Gainesville FL USA

Authorsrsquo contributionsER and TBK conceived the project and designed the experiments TBKdeveloped implemented and applied the statistical methods CS KH andAH performed the experiments JR and TBK performed the phylogeneticanalysis AW contributed critical reagents and assisted in interpretation ofresults TBK CS and ER wrote the manuscript All authors read and approvedthe final manuscript

Received 21 November 2009 Accepted 21 July 2010Published 21 July 2010

References1 Jones KE Patel NG Levy MA Storeygard A Balk D et al Global trends in

emerging infectious diseases Nature 2008 451990-9932 Calisher CH Childs JE Field HE Holmes KV Schountz T Bats important

reservoir hosts of emerging viruses Clin Microbiol Rev 2006 19531-5453 Messenger SRC Smith J Bats Emerging Virus Infections and the Rabies

Paradigm Bat Ecology Chicago and London University of ChicagoPressKunz TF M B 2005 622

4 Omatsu T Watanabe S Akashi H Yoshikawa Y Biological characters ofbats in relation to natural reservoir of emerging viruses ComparativeImmunology Microbiology and Infectious Diseases 2007 30357-374

Kepler et al BMC Genomics 2010 11444httpwwwbiomedcentralcom1471-216411444

Page 11 of 12

5 Wong S Lau S Woo P Yuen K-Y Bats as a continuing source ofemerging infections in humans Reviews in Medical Virology 2007 1767-91

6 Chua KB Lek Koh C Hooi PS Wee KF Khong JH et al Isolation of Nipahvirus from Malaysian Island flying-foxes Microbes and Infection 20024145-151

7 Halpin K Young PL Field HE Mackenzie JS Isolation of Hendra virus frompteropid bats a natural reservoir of Hendra virus Journal of GeneralVirology 2000 811927-1932

8 Towner J Pourrut X Albarino C Nkogue C Bird B et al Marburg virusinfection detected in a common African bat PLoS ONE 2007 2e764

9 Pourrut X Souris M Towner J Rollin P Nichol S et al Large serologicalsurvey showing cocirculation of Ebola and Marburg viruses in Gabonesebat populations and a high seroprevalence of both viruses in Rousettusaegyptiacus BMC Infectious Diseases 2009 9159

10 Enright JB Bats and their Relation to Rabies Annual Review ofMicrobiology 1956 10369-392

11 Badrane H Tordo N Host Switching in Lyssavirus History from theChiroptera to the Carnivora Orders J Virol 2001 758096-8104

12 Li W Shi Z Yu M Ren W Smith C et al Bats Are Natural Reservoirs ofSARS-Like Coronaviruses Science 2005 310676-679

13 Lau SKP Woo PCY Li KSM Huang Y Tsoi H-W et al Severe acuterespiratory syndrome coronavirus-like virus in Chinese horseshoe batsProceedings of the National Academy of Sciences of the United States ofAmerica 2005 10214040-14045

14 Sulkin SE Allen R Virus Infections in Bats Monographs in Virology Basel SKarger 1974 8

15 Wilson DE Reeder DM Mammal Species of the World Baltimore TheJohns Hopkins University Press 2005

16 Devaraj SG Wang N Chen Z Chen Z Tseng M et al Regulation of IRF-3-dependent Innate Immunity by the Papain-like Protease Domain of theSevere Acute Respiratory Syndrome Coronavirus Journal of BiologicalChemistry 2007 28232208-32221

17 Basler CF Amarasinghe GK Evasion of Interferon Responses by Ebola andMarburg Viruses Journal of Interferon amp Cytokine Research 2009 29511-520

18 Park MS Shaw ML Munoz-Jordan J Cros JF Nakaya T et al Newcastledisease virus (NDV)-based assay demonstrates interferon-antagonistactivity for the NDV V protein and the Nipah virus V W and C proteinsJ Virol 2003 771501-1511

19 Brzozka K Finke S Conzelmann K-K Identification of the Rabies VirusAlphaBeta Interferon Antagonist Phosphoprotein P Interferes withPhosphorylation of Interferon Regulatory Factor 3 J Virol 2005797673-7681

20 Mendonccedila RZP Pereira CA Relationship of interferon synthesis and theresistance of mice infected with street rabies virus Brazilian Journal ofMedical and Biological Research 1994 27691-695

21 Sen GC Viruses and interferons Annu Rev Microbiol 2001 55255-28122 Stark GR Kerr IM Williams BR Silverman RH Schreiber RD How cells

respond to interferons Annu Rev Biochem 1998 67227-26423 Theofilopoulos AN Baccala R Beutler B Kono DH Type I interferons

(alphabeta) in immunity and autoimmunity Annu Rev Immunol 200523307-336

24 Pestka S Krause CD Walter MR Interferons interferon-like cytokines andtheir receptors Immunol Rev 2004 2028-32

25 Ohno S Taniguchi T Structure of a chromosomal gene for humaninterferon beta Proc Natl Acad Sci USA 1981 785305-5309

26 Cohen L Lacoste J Parniak M Daigneault L Skup D et al Stimulation ofinterferon beta gene transcription in vitro by purified NF-kappa B and anovel TH protein Cell Growth Differ 1991 2323-333

27 Pannisi E More Genomes But Shallower Coverage Science 20043041227

28 Bouck J Miller W Gorrell JH Muzny D Gibbs RA Analysis of the Qualityand Utility of Random Shotgun Sequencing at Low RedundanciesGenome Research 1998 81074-1084

29 Ohno S Evolution by Gene Duplication London George Allen and Unwin1970

30 Gruumlnwald PD Myung IJ Pitt MA Advances in Minimum DescriptionLength Theory and Applications Cambridge MIT Press 2005

31 Woelk CH Frost SDW Richman DD Higley PE Kosakovsky Pond SLEvolution of the interferon alpha gene family in eutherian mammalsGene 2007 39738-50

32 Walker A Roberts RM Characterization of the bovine type I IFN locusrearrangements expansions and novel subfamilies BMC Genomics 200910187

33 Lefegravevre F Boulay V A novel and atypical type one interferon geneexpressed by trophoblast during early pregnancy Journal of BiologicalChemistry 1993 26819760-19768

34 Roberts RM Chen Y Ezashi T Walker AM Interferons and the maternal-conceptus dialog in mammals Seminars in Cell amp Developmental Biology2008 19170-177

35 Blehert DS Hicks AC Behr M Meteyer CU Berlowski-Zier BM et al BatWhite-Nose Syndrome An Emerging Fungal Pathogen Science 2009323227

36 Larkin MA Blackshields G Brown NP Chenna R McGettigan PA et alClustal W and Clustal X version 20 Bioinformatics 2007 232947-2948

37 Do CB Mahabhashyam MSP Brudno M Batzoglou S ProbConsProbabilistic consistency-based multiple sequence alignment GenomeResearch 2005 15330-340

38 Felsenstein J PHYLIP - Phylogeny Inference Package (Version 32)Cladistics 1989 5164-166

39 Desper R Gascuel O Fast and Accurate Phylogeny ReconstructionAlgorithms Based on the Minimum-Evolution Principle Journal ofComputational Biology 2002 9687-705

doi1011861471-2164-11-444Cite this article as Kepler et al Chiropteran types I and II interferongenes inferred from genome sequencing traces by a statistical gene-family assembler BMC Genomics 2010 11444

Submit your next manuscript to BioMed Centraland take full advantage of

bull Convenient online submission

bull Thorough peer review

bull No space constraints or color figure charges

bull Immediate publication on acceptance

bull Inclusion in PubMed CAS Scopus and Google Scholar

bull Research which is freely available for redistribution

Submit your manuscript at wwwbiomedcentralcomsubmit

Kepler et al BMC Genomics 2010 11444httpwwwbiomedcentralcom1471-216411444

Page 12 of 12

  • Abstract
    • Background
    • Results
    • Conclusion
      • Background
        • Bat Genome Projects
          • Results
            • Inference of Gene Families From Trace Archives
              • Quality scores and conditional probabilities
                • Validation
                • Type I interferons in Myotis lucifugus and Pteropus vampyrus
                  • Chiropteran Type II interferon
                    • Phylogenetics
                    • Sequencing
                    • Gene expression
                      • Discussion
                      • Conclusions
                      • Methods
                        • Ethics Statement
                        • Animals and sample collection
                        • Cloning and Sequencing
                          • Quantitative RT-PCR
                            • Estimation of Gene Family Size
                            • Phylogenetic Inference
                              • Acknowledgements
                              • Author details
                              • Authors contributions
                              • References
Page 5: RESEARCH ARTICLE Open Access Chiropteran types I and II … · 2017. 4. 5. · RESEARCH ARTICLE Open Access Chiropteran types I and II interferon genes inferred from genome sequencing

and that the assembly posterior pmfs are obtained bycomposition

p p p p

s

k A k A k A k

A k A k A

5 2 1 2 3515

25

35

15

25

3

( ) ( ) ( ) ( )( ) ( ) ( )

( ) ( )

equiv

times 55

45

35

35

45

1231

4 3 345

( )

( ) ( ) ( ) ( )( ) ( )

k

A k A k A k A kp p s

⎡⎣⎢

⎤⎦⎥

= ⎡⎣⎢

minus

⎤⎤⎦⎥

minus1

(13)

We have shown that the total cost for the compositealignment is given by the sum of the sum of alignmentcosts but as with all progressive alignment methodsthe minimum-cost composite alignment may not be theglobal minimum cost alignmentThe complete procedure then is

1 Compute all of the minimum pairwise alignmentcosts cij using dynamic programming2 Assemble the two sub-assemblies with least align-ment cost and replace them with the new assemblythat results Compute the pairwise costs between theremaining subassemblies and the new assembly3 Continue at step 2 until total assembly cost canno longer be reduced

ValidationWe validated the method by inferring the human inter-feron-alpha (IFNA) gene family directly from the humanwhole-genome sequencing trace archives and compar-ing the results to IFNA gene sequences known from thefinished human genome (as well as from decades ofmolecular biology on the human IFNA) We performedthis procedure in three different waysFirst we collected relevant sequencing traces by per-

forming a BLAST search using the 567 nt-long protein-coding portion of the human IFNA2 transcript(NM_000605) and retrieved 291 traces with expectedvalues less than 10-6 We trimmed each sequence andapplied the assembly method described aboveThe procedure resulted in 15 final assemblies We

compared the inferred genes to the human genomereference assembly Each of these 15 assemblies mapsunambiguously to one of the 13 known IFNA genes orthe single whole pseudogene at an average error rate of60 times 10-4 per base (14 mismatches in 23374 bases) Inone case a pair of assemblies mapped to a single geno-mic gene IFNA17 There are 2 single nucleotide differ-ences between the two assemblies mapping to IFNA17and one dinucleotide difference Each of the singlenucleotide differences is strongly supported with multi-ple high-quality reads supporting the inference Thedinucleotide difference is less well supported with just asingle trace covering it in one assembly though the

corresponding bases in this trace are good quality Theevidence strongly suggests that the two assemblies domap to two different alleles of the same gene Acceptingthis conclusion reduces the error rate to less than 4 times10-4 per base The 10 remaining mismatches include asingle 4-nucleotide deletion relative to the genomicsequenceFor the next two validation procedures our intent was

to more accurately represent the situation in which theassembler would be used For this purpose we used oneof the Pvampyrus putative IFNA genes rather thanhuman IFNA2 as the BLAST query and selected a sub-sample of the retrieved sequences to simulate lowcoverageThere were 74 trace reads in the subset produced by

the Whitehead Institute Assembling these reads usingour method produced 13 distinct assemblies with two ormore reads in each mapping (as described above) to 12different human genomic regions 7 IFNAs humanIFNW1 LOC100130866 (annotated by NCBI as similarto IFNW1) 2 IFNA pseudogenes and 1 IFNW pseudo-gene One pair of distinct assemblies was mapped erro-neously to a single IFNA The subset of trace readsfrom the Venter Institute consisted of 103 sequencesproducing 22 distinct assemblies with 2 or more readsmapping to 22 different regions of the human genomeall 13 IFNAs 1 IFNA pseudogene IFNW1 and 6regions annotated as either IFNW-like or IFNW pseu-dogenes The alignments of these assemblies to the gen-ome revealed 3 mismatches in 31531 total bases Onepair of assemblies was erroneously mapped to a singleIFNA pseudogene

Type I interferons in Myotis lucifugus and PteropusvampyrusWe used the protein-coding portions of the humangenes IFNA2 IFNK IFNB and IFNE the bovine geneIFNT and murine gene IFNZ in blast searches againstthe whole genome shotgun sequence trace archives forPteropus vampyrus and Myotis lucifugus The sequencesretrieved in each of the blast searches were thenassembled using the methods described aboveWe obtained unique final assemblies for IFNB in each

bat Each of these final assemblies contained an intactORF which when BLASTed against the NCBI nucleo-tide sequence database returned multiple hits on thecorresponding gene in several mammals The greatestsimilarity was to horse pig and cow sequences withabout 80 nucleotide identityMyotis produced two final assemblies for IFNE both

of which contain intact ORFs and appear on closeexamination to be genuinely distinct sequences Ptero-pus produced a single final assembly which had anintact ORF

Kepler et al BMC Genomics 2010 11444httpwwwbiomedcentralcom1471-216411444

Page 5 of 12

The sequencing traces returned in the search onIFNA2 produced multiple final assemblies that fall intothree distinct families with clear similarity to IFNAIFNW and IFND (Table 1) respectivelyNo relevant hits to murine IFNZ were found in either

chiropteran genome No hits to bovine IFNT beyondthose already obtained using other queries were foundChiropteran Type II interferonIn mammals the type I interferons have a single exon offewer than 700 bases IFNG the type II interferon hasmultiple exons We used our methods to infer the type-II interferon genes in the two bats by using the full-length human IFNG gene comprising introns and exons(4972 bases NC_000012 REGION complement(6854855068553521)) Our methods produced a singleassembly in each bat (Additional File 1) All exons wereclearly identifiable and all intronic splice signals wereconserved Table 2 shows the sequence similarity amongthe human and bat sequences

PhylogeneticsThe intact ORFs of the inferred sequences were alignedwith the type-I interferons from humans and from thedomestic pig Sus scrofa (which is substantially moreclosely related to bats than is the human) and as anoutgroup the IFNB and IFNA gene sequences from theduckbill platypus (Ornithorhynchus anatinus) The phy-logenetic tree relating these genes was inferred usingmaximum likelihood (Figure 3)The features of the inferred phylogenetic tree are con-

sistent with those found in previous investigations [31]In particular the interferon families fall into distinctclades with the exception of the platypus IFNA geneswhich appear ancestral to the IFNA IFNW and IFNDgenes of the other species Within each of the families

the genes fall into clades according to species with nointerdigitation of genes from different species in anyclade (Figure 3) Additionally the bat genes from onebat species are typically but not universally more clo-sely related to the orthologous genes from the other batspecies than to those of pigs or humans The bat genesare typically but not universally more closely related tothe orthologous pig genes than to the orthologoushuman genes

SequencingWe used the inferred sequences to design sequencingprimers and make several clones from each of IFNAIFNB IFND and IFNK The results are as followsWe obtained DNA sequences from 19 independent

IFNB clones from two individual bats Among these 2distinct but very closely related genes were clearly dis-cernible One sequence is identical to the consensus ofall 19 sequences and is found in 7 of 12 clones fromone bat and all the clones from the other Given thisinterpretation the overall frequency of pre-sequencingPCR error was 16 times 10-3 per base (17 errors in 10792bases) The inferred IFNB genes are 91 identical to anIFNB isolated from Rousettus aegyptiacus [4] and 80identical to IFNB from the domestic pig Sus scrofa

Table 1 Summary of Final Assemblies in Myotis lucifugus and Pteropus vampyrus

Species Family assemblies Intact ORFs Pseudogenes Partial length (AA)

M lucifugus IFNB 1 1 0 0 186

IFNE 2 2 0 0 193

IFNK 2 2 0 0 208

IFNA 2 0 2 0 -

IFNW 25 12 7 9 195

IFND 19 11 3 5 1711

P vampyrus IFNB 1 1 0 0 185

IFNE 1 1 0 0 193

IFNK 1 1 0 0 201

IFNA 7 7 0 0 189

IFNW 28 18 8 1 1852

IFND 14 5 7 2 1713

1 two genes encode polypeptides of length 190

2 one at 187 and two at 195

3 one at 170

Table 2 DNA sequence similarity among human (Has)IFNG and inferred bat (Mlu Pva) IFNG assemblies forexons (above the diagonal) and introns (below thediagonal)

species Hsa Mlu Pva

Has ID 0752 0784

Mlu 0655 ID 0832

Pva 0711 0696 ID

Kepler et al BMC Genomics 2010 11444httpwwwbiomedcentralcom1471-216411444

Page 6 of 12

Using IFNK primers we obtained sequences from 7clones from a single animal and found two distinctgenes differing from each other in two nucleotides Theerror rate was consistent with that found in the IFNBgenes at 17 times 10-3 per base (6 errors in 3430 bases)For IFND and IFNA respectively we collected

sequences from 32 and 52 clones from a single animalBecause PCR was used in an amplification step prior tocloning (unlike the process used for genomic sequen-cing) it is difficult to discern the expected genetic varia-bility from sequencing error For IFNA four sequenceswere recovered multiple times from independent proce-dures The mean number of mismatches in pairwisecomparisons is just under 7 The four sequences arewithin 1 2 5 and 6 bases of the most similar inferredIFNA gene For IFND there are three multiply-repre-sented sequences Two of these genes differ at 9 basesthe third which represents a pseudogene differs fromthe other two by more than 20 bases These genes differfrom their nearest inferred IFND sequences by 7 12and 20 bases respectively Additional file 2 contains an

alignment of the three multiply-represented IFNDclones compared to the IFND assembliesSince it was not possible to determine by inspection

what the underlying true gene sequences are or howmany there are We instead estimated the posteriorprobability on the number of genes in each family asdescribed in Methods (Figure 4) For IFNA the poster-ior probability is maximized at g = 9 genes the 90credible interval is (617) For IFND the greatest poster-ior probability occurs at g = 19 genes and the 90credible interval is (1528)

Gene expressionTo test whether the inferred genes are expressed underconditions known to induce typeI interferons we designed primers to Pteropus IFNB

and to a control housekeeping gene (PPIA) and a gene(OAS2) induced by paracrine IFNB signaling The lattertwo genes were also inferred by the methods describedFresh Pteropus PBMCs from two individual animals

were cultured under four different conditions with

Figure 3 Phylogenetic tree of the type-I interferons in Pteropus vampyrus (Pva) Myotis lucifigus (Mlu) Sus scrofa (Ssc) Homo sapiens(Hsa) and Ornithorhynchus anatinus (Oan) The tree was thinned for clarity by omitting one member of any pair of sequences differing byfewer than 3 amino acids The branch length is proportional to the evolutionary distance

Kepler et al BMC Genomics 2010 11444httpwwwbiomedcentralcom1471-216411444

Page 7 of 12

lipopolysaccharide (LPS) with the synthetic RNA poly(IC) with Vesicular Stomatitis Virus (VSV) and withculture medium aloneWith LPS or poly(IC) treatment IFNB mRNA expres-

sion was increased 20-50 fold by 2 hours and decreasedslowly to near the starting value by 24 hours OASexpression lagged behind that of IFNB reaching its peakvalue by 8 hours declining from the peak level only par-tially by 24 hours In contrast cells treated with VSVhad peak levels of IFNB appear by 8 hours and remainalmost unchanged to 24 hours OAS expression rosemore slowly and had not reached a maximum level by24 hours (Figure 5)

DiscussionRecent research has demonstrated very clearly that therate of emergence of human pathogens continues toincreasing steadily and that the source of the majorityof these agents is wildlife [1] One of the most intriguingfindings of late is that bats are the natural reservoirs ofmany of the most pathogenic viruses in humansWhile it is known that microbe-host coevolution

drives pathogenicity in the natural host the effect ofsuch coevolution on alternative hosts has not beendescribed The development of many genome sequen-cing projects extending beyond domesticated animalsprovides an opportunity to begin such inquiries Thelow coverage levels of these projects and the fact that somany genes with immunological functions appear inlarge families of very similar genes requires the develop-ment of more precise inferential tools for their studyToward that end we have developed a method for the

assembly of genome sequence fragments for use in theinference of gene family members when the genome

coverage is too low for reliable complete assembly Wevalidated the method on raw unassembled traces fromthe human genome sequencing project finding that wecould assemble the known interferon alpha sequencesaccurately and further identify at least one case inwhich two alleles are presentIt should be noted that our method requires more

computation than assembly methods currently in usedue to the numerical minimization over the mutationfrequency required in Eq(10) and is therefore not suita-ble for large-scale assemblyUsing this method we inferred a total of 61 type-I

interferon genes with intact ORFs from the whole-gen-ome shotgun trace archives for two chiropteran speciesPteropus vampyrus and Myotis lucifugus We find thatthe largest of the IFN-I gene families in both bats com-prises genes orthologous to the IFNW genes in othermammals In humans mice and pigs there is just oneIFNW gene but up to two dozen members in each batA recent analysis of bovine type-I interferons from theassembled Bos taurus genome [32] finds 26 intactIFNW genes providing precedent for our otherwisestriking results In contrast the IFNA family is the lar-gest IFN-I family in several mammals includinghumans mice and pigs but appears to be smaller inPteropus and absent but for pseudogenes in Myotis Thegene family assembly from trace archives indicates thatthere are 7 intact IFNA in Pteropus analysis of directsequencing from PBMCs gives maximum posteriorprobability to the presence of 9 IFNA genes (Figure 4)with a 90 credible interval containing 6 to 16 genesCattle have 13 IFNA genes in spite of having a greatlyenlarged IFNW family [32]Both bats have multiple members in the IFND family

with five intact members and seven pseudogenes inPteropus and twelve intact members in Myotis IFNDhas been found as a functional gene only in pigs wherethe gene product is expressed in the placenta and playsa role in embryonic development [33] but is not sus-pected of involvement in the response to viruses It isstriking to us that this family seems to have been sodramatically enlarged in bats The size of this familysuggests that it may still be involved in host defense inbats even if it has lost that function in pigs Walker andRoberts [32] report finding three IFND pseudogenesbut no intact IFND in the cow It is worth noting thatthe placental type-I interferon in cattle is IFNT ratherthan IFND [34] Searching the bat trace archives withbovine IFNT did not produce hits that had not beenreturned with the other searches We find no evidenceof IFNT or IFNZ in either batFor IFNB IFNK and IFNE we find one member of

each in P vampyrus and one or two in M lucifugus Inthe case where we do find two genes we are confident

Figure 4 The posterior probability mass function on thenumber of genes in the IFNA and IFND families as estimatedby cloning and sequencing from peripheral blood cells fromPteropus vampyrus

Kepler et al BMC Genomics 2010 11444httpwwwbiomedcentralcom1471-216411444

Page 8 of 12

that there the genes are distinct though they may repre-sent alleles rather than paralogsWe used the inferred sequences for P vampyrus

IFNA IFNB IFND and IFNK to design oligonucleotideprimers for cloning and sequencing and recovered atotal of 110 sequences The directly cloned sequencesvalidate much of the gene family inferences most of therepeated sequences are within a few bases of the nearestinferred gene A minority of the directly sequenced

genes are surprisingly far from the nearest inferredgene This circumstance may be an indication that thereare additional type-I interferon genes not covered by thePteropus sequencing traces and may also reflect signifi-cant population polymorphism in the wild batpopulationWe used these same primers to show using quantita-

tive RT-PRC that the P vampyrus candidate IFNB andOAS2 (a gene induced by IFNB in other mammals) are

Figure 5 Gene expression time course by qRT-PCR in fresh Pvampyrus PBMCs stimulated in culture by the indicated compoundExpression levels are relative to the housekeeping gene PPIA The error bars represent standard errors of the mean over biological duplicates

Kepler et al BMC Genomics 2010 11444httpwwwbiomedcentralcom1471-216411444

Page 9 of 12

expressed upon stimulation by type-I inducing agentsFurthermore the temporal trajectories of this expressionis consistent with the known mechanisms of such sig-naling The expression of IFNB under viral infection wasdelayed compared to that under stimulation by the TLRligands poly(IC) and LPS

ConclusionsThe bat has been implicated as a major reservoir forviruses of extreme pathogenicity in humans and sufferssubstantially less disease when infected by these virusesthan humans do In addition bat populations in NorthAmerica are declining rapidly as a result of white-nosesyndrome an emergent disease associated with a fungalpathogen [35] These facts and others suggest that thestudy of host defense and immunity in these uniquecreatures would benefit the pursuit of ecological andhuman health The major obstacle in this undertaking isthe lack of reagents that would make such investigationspossible One way to get this effort underway is to takeadvantage of the rich store of information contained inexisting partial genome sequences Although the infor-mation available in these genome databases requiresmore careful treatment for its extraction than isrequired for a complete assembled genome we havedeveloped methods that facilitate this task and havedemonstrated their utility

MethodsEthics StatementAll animals were handled in strict accordance with goodanimal practice as defined by the relevant national andor local animal welfare bodies and all animal work wascovered by an IACUC protocol from the Lubee BatConservancy (USDA Research Facility 58-R-0131)

Animals and sample collectionSubjects for this study included 10 male Large FlyingFoxes (Pteropus vampyrus) Some of the animals werewild caught and some were laboratory born all werereproductively mature ranging in age from 4 to 21 yrsAnimals were housed in captivity at the Lubee Bat Con-servancy in Gainesville Florida USA Animals werehoused together with conspecifics in an indooroutdoorcircular flight enclosure and were fed a mixture of fruitvegetables commercial primate chow and a vitaminsupplement Water was provided ad libitum The batswere captured manually anesthesia was mask-inducedwith 5 isoflurane in oxygen (3 Lmin) and the batswere maintained with 25 isoflurane Blood was col-lected from the left and right brachial veins using 3-mlsyringes and 25-gauge needles Blood samples weretransferred to heparanized tubes maintained and

shipped at room temperature (22degC) via overnightcourier

Cloning and SequencingGenomic DNA was prepared from whole unfractionatedperipheral blood from Pteropus vampyrus bats Using pri-mers derived from the inferred interferon gene sequencesand encoded for restriction enzyme cloning sites thegenes were PCR amplified from the genomic DNA Theforward and reverse primers encoded restriction sites forNheI and XhoI respectively which allowed cloning of thefull length gene into mammalian expression vectorpcDNA31(+) The PCR program consisted of an initiali-zation step at 94degC for 5 minutes then 30 cyclesof amplification consisting of denaturation at 94degC for1 minute annealing at 55degC for 30 seconds and elonga-tion at 72deg for 1 minute Amplification was followed by afinal elongation step at 72degC for 5 minutes and samplesheld at 4degC PCR was performed on a P times 2 ThermalCycler from ThermoElectron Corporation (WalthamMA)After purification the inserted genes were sequenced

using vector primers downstream and upstream of theinsert site Pteropus vampyrus IFNB IFNK IFND andIFNA sequences have been submitted to Genbank(GU126493 HM63650 HM636501 and HM636502)Quantitative RT-PCRWhole anticoagulated blood was collected from the bra-chial vein and layered over lymphocyte separation med-ium (Lympholyte M Cedarlane Labs) Cells werecollected from the interface washed and plated on 24well flat-bottom tissue culture plates (2 times 106 cellswell)in a total volume of 250 μl Cells were treated as indi-cated with either poly(IC)(Invivogen) LPS (Sigma) orVSV (Ramsburg lab stocks confirmed endotoxin freeusing LAL assay) Stimulants were diluted such that theaddition of 250 μl stimulant would give a final concentra-tion of 10 μgml poly IC 1 μgml LPS or an MOI of 5for VSV 500 μl of complete medium was added to theunstimulated control wells Cells were incubated at 37rsquoand 5 CO2 for the indicated times after which cellswere harvested into RNA lysis buffer (Qiagen RNEasykit) according to the manufacturerrsquos instructionsPurified RNA was prepared from these whole-cell

lysates as described in the protocols accompanying theQiagen RNeasy kit Qiashedder and Qiagen RNeasyMini kit and Qiagen DNase-Free DNase Set cDNAsynthesis was performed using the Invitrogen two-stepqRT-PCR kit according to the manufacturerrsquos protocolQuantitative real-time PCR was performed in an Eppen-dorf Mastercycler ep realplex thermal cycler usingSYBRGreenER qPCR Supermix and primers designedfrom the inferred gene sequences (additional file 3)

Kepler et al BMC Genomics 2010 11444httpwwwbiomedcentralcom1471-216411444

Page 10 of 12

Independent biological replicates were prepared foreach treatment and time point The log2 expressionlevel of IFNB and OAS2 relative to the housekeepinggene PPIA was estimated by subtracting the mean Ctvalue for each gene and treatmenttime point combina-tion from the corresponding mean Ct value for PPIA

Estimation of Gene Family SizeThe likelihood function for the observed set S ofsequences is a function of the true number g of genesand the sample size n by summing over the unobservednumber c of distinct genes sampled

P S g n P S c P c g nS c

c

( | ) ( | ) ( | )= sum (14)

The pmf on c is given by the recursion

P c g nc

gP c g n

g c

gP c g n

c( | ) ( | )

( | )

=

+ +

1

11 1

(15)

with Pc(1 | g1) = 1 and Pc(c | g1) = 0 for c ne 1PS(S | c) is approximated by minimizing the number

m of mutations over the assignments of observedsequences to c groups PS is then the posterior probabil-ity of getting m mutations given the observed numberof mutations in the IFNB and IFNK datasetsNote that this technique is independent of the assem-

bly methods described elsewhere in the paper

Phylogenetic InferenceThe set consisting of type-I interferon amino acidsequences from human (Homo sapiens) domestic pig(Sus scrofa) duckbill platypus (Ornithorhynchus anati-nus) and the amino acid sequences corresponding to theChiropteran genes inferred in this paper was submittedto multiple sequence alignment by CLUSTALW 2010[36] and ProbCons 112 [37] Maximum likelihood andminimum evolution phylogenies were inferred from eachresulting multiple sequence alignment Maximum likeli-hood phylogenies were inferred with proml found in thePHYLIP 368 package [38] Minimum evolution phyloge-nies were inferred using fastme [39] based on the dis-tance matrix calculated with protdist found in PHYLIPIn both cases 1000 boot-strap replications were gener-ated using seqboot and the consensus phylogeny wasassembled with consense both from the PHYLIP pack-age FigTree 121 httptreebioedacuksoftwarefig-tree was used for initial visualization and finalproduction of figures of the resulting phylogenetic treesAlignments of the bat DNA sequences and of all theamino acid sequences are available in additional files 4

Additional material

Additional file 1 All IFNGbio An alignment of IFNG gene sequencesfrom Homo sapiens Pterops vampyrus and Myotis lucifugus as follows firstrow genomic sequence of human IFNG rows 2 3 predicted genomicsequence of IFNG in M lucifugus and P vampyrus respectively rows 4-7exons for human IFNG rows 8-11 predicted exons for M lucifugus rows12-15 predicted exons for P vampyrus row 16 spliced coding region forhuman rows 1718 predicted spliced coding region for M lucifugus andP vampyrus row 19 amino acid sequence for human interferon gammarows 2021 predicted amino acid sequence for chiropteran interferongamma Examination of this bio file requires the BioEdit program whichis freely available at httpwwwmbioncsueduBioEditbioedithtml

Additional file 2 Pva IFND clone-assembly comparisonfasta Thissequence alignment shows a comparison between Pteropus vampyrusIFND sequences obtained by 1) assembly from the genome sequencingtrace archives and 2) by direct cloning and sequencing The clonedsequences include only those found in at least two independent clonesThe assemblies include all those with intact ORFs plus three that appearto be pseudogenes

Additional file 3 Accession numbers and primers The Genbankaccession numbers of the IFN genes used in the phylogenetic analysisand the PCR primers used in the P vampyrus gene expression studies

Additional file 4 Mlu + Pva + Hsa + SscSelectedIntactOrfsAA-alignedfasta The amino acid sequences used in the phylogeneticanalysis

AcknowledgementsWe thank Brice Barefoot for his technical assistance and Rachel Willcutts forher help in the early stages of this work We are grateful to Brian Pope andTasha King at the Lubee Bat Conservancy for their expert assistanceDr Ramsburgrsquos work was supported by NIH grants UC6 AI058607 andAI51445 Dr Keplerrsquos work was supported by discretionary funds provided byDuke University No additional external funding was used No fundingsource had any influence on the study design on the collection analysis orinterpretation of data on the writing of the manuscript or on the decisionto submit the manuscript for publication

Author details1Center for Computational Immunology Duke University Medical CenterDurham NC USA 2Duke Human Vaccine Institute Duke University MedicalCenter Durham NC USA 3Lubee Bat Conservancy and Department ofWildlife Ecology and Conservation University of Florida Gainesville FL USA

Authorsrsquo contributionsER and TBK conceived the project and designed the experiments TBKdeveloped implemented and applied the statistical methods CS KH andAH performed the experiments JR and TBK performed the phylogeneticanalysis AW contributed critical reagents and assisted in interpretation ofresults TBK CS and ER wrote the manuscript All authors read and approvedthe final manuscript

Received 21 November 2009 Accepted 21 July 2010Published 21 July 2010

References1 Jones KE Patel NG Levy MA Storeygard A Balk D et al Global trends in

emerging infectious diseases Nature 2008 451990-9932 Calisher CH Childs JE Field HE Holmes KV Schountz T Bats important

reservoir hosts of emerging viruses Clin Microbiol Rev 2006 19531-5453 Messenger SRC Smith J Bats Emerging Virus Infections and the Rabies

Paradigm Bat Ecology Chicago and London University of ChicagoPressKunz TF M B 2005 622

4 Omatsu T Watanabe S Akashi H Yoshikawa Y Biological characters ofbats in relation to natural reservoir of emerging viruses ComparativeImmunology Microbiology and Infectious Diseases 2007 30357-374

Kepler et al BMC Genomics 2010 11444httpwwwbiomedcentralcom1471-216411444

Page 11 of 12

5 Wong S Lau S Woo P Yuen K-Y Bats as a continuing source ofemerging infections in humans Reviews in Medical Virology 2007 1767-91

6 Chua KB Lek Koh C Hooi PS Wee KF Khong JH et al Isolation of Nipahvirus from Malaysian Island flying-foxes Microbes and Infection 20024145-151

7 Halpin K Young PL Field HE Mackenzie JS Isolation of Hendra virus frompteropid bats a natural reservoir of Hendra virus Journal of GeneralVirology 2000 811927-1932

8 Towner J Pourrut X Albarino C Nkogue C Bird B et al Marburg virusinfection detected in a common African bat PLoS ONE 2007 2e764

9 Pourrut X Souris M Towner J Rollin P Nichol S et al Large serologicalsurvey showing cocirculation of Ebola and Marburg viruses in Gabonesebat populations and a high seroprevalence of both viruses in Rousettusaegyptiacus BMC Infectious Diseases 2009 9159

10 Enright JB Bats and their Relation to Rabies Annual Review ofMicrobiology 1956 10369-392

11 Badrane H Tordo N Host Switching in Lyssavirus History from theChiroptera to the Carnivora Orders J Virol 2001 758096-8104

12 Li W Shi Z Yu M Ren W Smith C et al Bats Are Natural Reservoirs ofSARS-Like Coronaviruses Science 2005 310676-679

13 Lau SKP Woo PCY Li KSM Huang Y Tsoi H-W et al Severe acuterespiratory syndrome coronavirus-like virus in Chinese horseshoe batsProceedings of the National Academy of Sciences of the United States ofAmerica 2005 10214040-14045

14 Sulkin SE Allen R Virus Infections in Bats Monographs in Virology Basel SKarger 1974 8

15 Wilson DE Reeder DM Mammal Species of the World Baltimore TheJohns Hopkins University Press 2005

16 Devaraj SG Wang N Chen Z Chen Z Tseng M et al Regulation of IRF-3-dependent Innate Immunity by the Papain-like Protease Domain of theSevere Acute Respiratory Syndrome Coronavirus Journal of BiologicalChemistry 2007 28232208-32221

17 Basler CF Amarasinghe GK Evasion of Interferon Responses by Ebola andMarburg Viruses Journal of Interferon amp Cytokine Research 2009 29511-520

18 Park MS Shaw ML Munoz-Jordan J Cros JF Nakaya T et al Newcastledisease virus (NDV)-based assay demonstrates interferon-antagonistactivity for the NDV V protein and the Nipah virus V W and C proteinsJ Virol 2003 771501-1511

19 Brzozka K Finke S Conzelmann K-K Identification of the Rabies VirusAlphaBeta Interferon Antagonist Phosphoprotein P Interferes withPhosphorylation of Interferon Regulatory Factor 3 J Virol 2005797673-7681

20 Mendonccedila RZP Pereira CA Relationship of interferon synthesis and theresistance of mice infected with street rabies virus Brazilian Journal ofMedical and Biological Research 1994 27691-695

21 Sen GC Viruses and interferons Annu Rev Microbiol 2001 55255-28122 Stark GR Kerr IM Williams BR Silverman RH Schreiber RD How cells

respond to interferons Annu Rev Biochem 1998 67227-26423 Theofilopoulos AN Baccala R Beutler B Kono DH Type I interferons

(alphabeta) in immunity and autoimmunity Annu Rev Immunol 200523307-336

24 Pestka S Krause CD Walter MR Interferons interferon-like cytokines andtheir receptors Immunol Rev 2004 2028-32

25 Ohno S Taniguchi T Structure of a chromosomal gene for humaninterferon beta Proc Natl Acad Sci USA 1981 785305-5309

26 Cohen L Lacoste J Parniak M Daigneault L Skup D et al Stimulation ofinterferon beta gene transcription in vitro by purified NF-kappa B and anovel TH protein Cell Growth Differ 1991 2323-333

27 Pannisi E More Genomes But Shallower Coverage Science 20043041227

28 Bouck J Miller W Gorrell JH Muzny D Gibbs RA Analysis of the Qualityand Utility of Random Shotgun Sequencing at Low RedundanciesGenome Research 1998 81074-1084

29 Ohno S Evolution by Gene Duplication London George Allen and Unwin1970

30 Gruumlnwald PD Myung IJ Pitt MA Advances in Minimum DescriptionLength Theory and Applications Cambridge MIT Press 2005

31 Woelk CH Frost SDW Richman DD Higley PE Kosakovsky Pond SLEvolution of the interferon alpha gene family in eutherian mammalsGene 2007 39738-50

32 Walker A Roberts RM Characterization of the bovine type I IFN locusrearrangements expansions and novel subfamilies BMC Genomics 200910187

33 Lefegravevre F Boulay V A novel and atypical type one interferon geneexpressed by trophoblast during early pregnancy Journal of BiologicalChemistry 1993 26819760-19768

34 Roberts RM Chen Y Ezashi T Walker AM Interferons and the maternal-conceptus dialog in mammals Seminars in Cell amp Developmental Biology2008 19170-177

35 Blehert DS Hicks AC Behr M Meteyer CU Berlowski-Zier BM et al BatWhite-Nose Syndrome An Emerging Fungal Pathogen Science 2009323227

36 Larkin MA Blackshields G Brown NP Chenna R McGettigan PA et alClustal W and Clustal X version 20 Bioinformatics 2007 232947-2948

37 Do CB Mahabhashyam MSP Brudno M Batzoglou S ProbConsProbabilistic consistency-based multiple sequence alignment GenomeResearch 2005 15330-340

38 Felsenstein J PHYLIP - Phylogeny Inference Package (Version 32)Cladistics 1989 5164-166

39 Desper R Gascuel O Fast and Accurate Phylogeny ReconstructionAlgorithms Based on the Minimum-Evolution Principle Journal ofComputational Biology 2002 9687-705

doi1011861471-2164-11-444Cite this article as Kepler et al Chiropteran types I and II interferongenes inferred from genome sequencing traces by a statistical gene-family assembler BMC Genomics 2010 11444

Submit your next manuscript to BioMed Centraland take full advantage of

bull Convenient online submission

bull Thorough peer review

bull No space constraints or color figure charges

bull Immediate publication on acceptance

bull Inclusion in PubMed CAS Scopus and Google Scholar

bull Research which is freely available for redistribution

Submit your manuscript at wwwbiomedcentralcomsubmit

Kepler et al BMC Genomics 2010 11444httpwwwbiomedcentralcom1471-216411444

Page 12 of 12

  • Abstract
    • Background
    • Results
    • Conclusion
      • Background
        • Bat Genome Projects
          • Results
            • Inference of Gene Families From Trace Archives
              • Quality scores and conditional probabilities
                • Validation
                • Type I interferons in Myotis lucifugus and Pteropus vampyrus
                  • Chiropteran Type II interferon
                    • Phylogenetics
                    • Sequencing
                    • Gene expression
                      • Discussion
                      • Conclusions
                      • Methods
                        • Ethics Statement
                        • Animals and sample collection
                        • Cloning and Sequencing
                          • Quantitative RT-PCR
                            • Estimation of Gene Family Size
                            • Phylogenetic Inference
                              • Acknowledgements
                              • Author details
                              • Authors contributions
                              • References
Page 6: RESEARCH ARTICLE Open Access Chiropteran types I and II … · 2017. 4. 5. · RESEARCH ARTICLE Open Access Chiropteran types I and II interferon genes inferred from genome sequencing

The sequencing traces returned in the search onIFNA2 produced multiple final assemblies that fall intothree distinct families with clear similarity to IFNAIFNW and IFND (Table 1) respectivelyNo relevant hits to murine IFNZ were found in either

chiropteran genome No hits to bovine IFNT beyondthose already obtained using other queries were foundChiropteran Type II interferonIn mammals the type I interferons have a single exon offewer than 700 bases IFNG the type II interferon hasmultiple exons We used our methods to infer the type-II interferon genes in the two bats by using the full-length human IFNG gene comprising introns and exons(4972 bases NC_000012 REGION complement(6854855068553521)) Our methods produced a singleassembly in each bat (Additional File 1) All exons wereclearly identifiable and all intronic splice signals wereconserved Table 2 shows the sequence similarity amongthe human and bat sequences

PhylogeneticsThe intact ORFs of the inferred sequences were alignedwith the type-I interferons from humans and from thedomestic pig Sus scrofa (which is substantially moreclosely related to bats than is the human) and as anoutgroup the IFNB and IFNA gene sequences from theduckbill platypus (Ornithorhynchus anatinus) The phy-logenetic tree relating these genes was inferred usingmaximum likelihood (Figure 3)The features of the inferred phylogenetic tree are con-

sistent with those found in previous investigations [31]In particular the interferon families fall into distinctclades with the exception of the platypus IFNA geneswhich appear ancestral to the IFNA IFNW and IFNDgenes of the other species Within each of the families

the genes fall into clades according to species with nointerdigitation of genes from different species in anyclade (Figure 3) Additionally the bat genes from onebat species are typically but not universally more clo-sely related to the orthologous genes from the other batspecies than to those of pigs or humans The bat genesare typically but not universally more closely related tothe orthologous pig genes than to the orthologoushuman genes

SequencingWe used the inferred sequences to design sequencingprimers and make several clones from each of IFNAIFNB IFND and IFNK The results are as followsWe obtained DNA sequences from 19 independent

IFNB clones from two individual bats Among these 2distinct but very closely related genes were clearly dis-cernible One sequence is identical to the consensus ofall 19 sequences and is found in 7 of 12 clones fromone bat and all the clones from the other Given thisinterpretation the overall frequency of pre-sequencingPCR error was 16 times 10-3 per base (17 errors in 10792bases) The inferred IFNB genes are 91 identical to anIFNB isolated from Rousettus aegyptiacus [4] and 80identical to IFNB from the domestic pig Sus scrofa

Table 1 Summary of Final Assemblies in Myotis lucifugus and Pteropus vampyrus

Species Family assemblies Intact ORFs Pseudogenes Partial length (AA)

M lucifugus IFNB 1 1 0 0 186

IFNE 2 2 0 0 193

IFNK 2 2 0 0 208

IFNA 2 0 2 0 -

IFNW 25 12 7 9 195

IFND 19 11 3 5 1711

P vampyrus IFNB 1 1 0 0 185

IFNE 1 1 0 0 193

IFNK 1 1 0 0 201

IFNA 7 7 0 0 189

IFNW 28 18 8 1 1852

IFND 14 5 7 2 1713

1 two genes encode polypeptides of length 190

2 one at 187 and two at 195

3 one at 170

Table 2 DNA sequence similarity among human (Has)IFNG and inferred bat (Mlu Pva) IFNG assemblies forexons (above the diagonal) and introns (below thediagonal)

species Hsa Mlu Pva

Has ID 0752 0784

Mlu 0655 ID 0832

Pva 0711 0696 ID

Kepler et al BMC Genomics 2010 11444httpwwwbiomedcentralcom1471-216411444

Page 6 of 12

Using IFNK primers we obtained sequences from 7clones from a single animal and found two distinctgenes differing from each other in two nucleotides Theerror rate was consistent with that found in the IFNBgenes at 17 times 10-3 per base (6 errors in 3430 bases)For IFND and IFNA respectively we collected

sequences from 32 and 52 clones from a single animalBecause PCR was used in an amplification step prior tocloning (unlike the process used for genomic sequen-cing) it is difficult to discern the expected genetic varia-bility from sequencing error For IFNA four sequenceswere recovered multiple times from independent proce-dures The mean number of mismatches in pairwisecomparisons is just under 7 The four sequences arewithin 1 2 5 and 6 bases of the most similar inferredIFNA gene For IFND there are three multiply-repre-sented sequences Two of these genes differ at 9 basesthe third which represents a pseudogene differs fromthe other two by more than 20 bases These genes differfrom their nearest inferred IFND sequences by 7 12and 20 bases respectively Additional file 2 contains an

alignment of the three multiply-represented IFNDclones compared to the IFND assembliesSince it was not possible to determine by inspection

what the underlying true gene sequences are or howmany there are We instead estimated the posteriorprobability on the number of genes in each family asdescribed in Methods (Figure 4) For IFNA the poster-ior probability is maximized at g = 9 genes the 90credible interval is (617) For IFND the greatest poster-ior probability occurs at g = 19 genes and the 90credible interval is (1528)

Gene expressionTo test whether the inferred genes are expressed underconditions known to induce typeI interferons we designed primers to Pteropus IFNB

and to a control housekeeping gene (PPIA) and a gene(OAS2) induced by paracrine IFNB signaling The lattertwo genes were also inferred by the methods describedFresh Pteropus PBMCs from two individual animals

were cultured under four different conditions with

Figure 3 Phylogenetic tree of the type-I interferons in Pteropus vampyrus (Pva) Myotis lucifigus (Mlu) Sus scrofa (Ssc) Homo sapiens(Hsa) and Ornithorhynchus anatinus (Oan) The tree was thinned for clarity by omitting one member of any pair of sequences differing byfewer than 3 amino acids The branch length is proportional to the evolutionary distance

Kepler et al BMC Genomics 2010 11444httpwwwbiomedcentralcom1471-216411444

Page 7 of 12

lipopolysaccharide (LPS) with the synthetic RNA poly(IC) with Vesicular Stomatitis Virus (VSV) and withculture medium aloneWith LPS or poly(IC) treatment IFNB mRNA expres-

sion was increased 20-50 fold by 2 hours and decreasedslowly to near the starting value by 24 hours OASexpression lagged behind that of IFNB reaching its peakvalue by 8 hours declining from the peak level only par-tially by 24 hours In contrast cells treated with VSVhad peak levels of IFNB appear by 8 hours and remainalmost unchanged to 24 hours OAS expression rosemore slowly and had not reached a maximum level by24 hours (Figure 5)

DiscussionRecent research has demonstrated very clearly that therate of emergence of human pathogens continues toincreasing steadily and that the source of the majorityof these agents is wildlife [1] One of the most intriguingfindings of late is that bats are the natural reservoirs ofmany of the most pathogenic viruses in humansWhile it is known that microbe-host coevolution

drives pathogenicity in the natural host the effect ofsuch coevolution on alternative hosts has not beendescribed The development of many genome sequen-cing projects extending beyond domesticated animalsprovides an opportunity to begin such inquiries Thelow coverage levels of these projects and the fact that somany genes with immunological functions appear inlarge families of very similar genes requires the develop-ment of more precise inferential tools for their studyToward that end we have developed a method for the

assembly of genome sequence fragments for use in theinference of gene family members when the genome

coverage is too low for reliable complete assembly Wevalidated the method on raw unassembled traces fromthe human genome sequencing project finding that wecould assemble the known interferon alpha sequencesaccurately and further identify at least one case inwhich two alleles are presentIt should be noted that our method requires more

computation than assembly methods currently in usedue to the numerical minimization over the mutationfrequency required in Eq(10) and is therefore not suita-ble for large-scale assemblyUsing this method we inferred a total of 61 type-I

interferon genes with intact ORFs from the whole-gen-ome shotgun trace archives for two chiropteran speciesPteropus vampyrus and Myotis lucifugus We find thatthe largest of the IFN-I gene families in both bats com-prises genes orthologous to the IFNW genes in othermammals In humans mice and pigs there is just oneIFNW gene but up to two dozen members in each batA recent analysis of bovine type-I interferons from theassembled Bos taurus genome [32] finds 26 intactIFNW genes providing precedent for our otherwisestriking results In contrast the IFNA family is the lar-gest IFN-I family in several mammals includinghumans mice and pigs but appears to be smaller inPteropus and absent but for pseudogenes in Myotis Thegene family assembly from trace archives indicates thatthere are 7 intact IFNA in Pteropus analysis of directsequencing from PBMCs gives maximum posteriorprobability to the presence of 9 IFNA genes (Figure 4)with a 90 credible interval containing 6 to 16 genesCattle have 13 IFNA genes in spite of having a greatlyenlarged IFNW family [32]Both bats have multiple members in the IFND family

with five intact members and seven pseudogenes inPteropus and twelve intact members in Myotis IFNDhas been found as a functional gene only in pigs wherethe gene product is expressed in the placenta and playsa role in embryonic development [33] but is not sus-pected of involvement in the response to viruses It isstriking to us that this family seems to have been sodramatically enlarged in bats The size of this familysuggests that it may still be involved in host defense inbats even if it has lost that function in pigs Walker andRoberts [32] report finding three IFND pseudogenesbut no intact IFND in the cow It is worth noting thatthe placental type-I interferon in cattle is IFNT ratherthan IFND [34] Searching the bat trace archives withbovine IFNT did not produce hits that had not beenreturned with the other searches We find no evidenceof IFNT or IFNZ in either batFor IFNB IFNK and IFNE we find one member of

each in P vampyrus and one or two in M lucifugus Inthe case where we do find two genes we are confident

Figure 4 The posterior probability mass function on thenumber of genes in the IFNA and IFND families as estimatedby cloning and sequencing from peripheral blood cells fromPteropus vampyrus

Kepler et al BMC Genomics 2010 11444httpwwwbiomedcentralcom1471-216411444

Page 8 of 12

that there the genes are distinct though they may repre-sent alleles rather than paralogsWe used the inferred sequences for P vampyrus

IFNA IFNB IFND and IFNK to design oligonucleotideprimers for cloning and sequencing and recovered atotal of 110 sequences The directly cloned sequencesvalidate much of the gene family inferences most of therepeated sequences are within a few bases of the nearestinferred gene A minority of the directly sequenced

genes are surprisingly far from the nearest inferredgene This circumstance may be an indication that thereare additional type-I interferon genes not covered by thePteropus sequencing traces and may also reflect signifi-cant population polymorphism in the wild batpopulationWe used these same primers to show using quantita-

tive RT-PRC that the P vampyrus candidate IFNB andOAS2 (a gene induced by IFNB in other mammals) are

Figure 5 Gene expression time course by qRT-PCR in fresh Pvampyrus PBMCs stimulated in culture by the indicated compoundExpression levels are relative to the housekeeping gene PPIA The error bars represent standard errors of the mean over biological duplicates

Kepler et al BMC Genomics 2010 11444httpwwwbiomedcentralcom1471-216411444

Page 9 of 12

expressed upon stimulation by type-I inducing agentsFurthermore the temporal trajectories of this expressionis consistent with the known mechanisms of such sig-naling The expression of IFNB under viral infection wasdelayed compared to that under stimulation by the TLRligands poly(IC) and LPS

ConclusionsThe bat has been implicated as a major reservoir forviruses of extreme pathogenicity in humans and sufferssubstantially less disease when infected by these virusesthan humans do In addition bat populations in NorthAmerica are declining rapidly as a result of white-nosesyndrome an emergent disease associated with a fungalpathogen [35] These facts and others suggest that thestudy of host defense and immunity in these uniquecreatures would benefit the pursuit of ecological andhuman health The major obstacle in this undertaking isthe lack of reagents that would make such investigationspossible One way to get this effort underway is to takeadvantage of the rich store of information contained inexisting partial genome sequences Although the infor-mation available in these genome databases requiresmore careful treatment for its extraction than isrequired for a complete assembled genome we havedeveloped methods that facilitate this task and havedemonstrated their utility

MethodsEthics StatementAll animals were handled in strict accordance with goodanimal practice as defined by the relevant national andor local animal welfare bodies and all animal work wascovered by an IACUC protocol from the Lubee BatConservancy (USDA Research Facility 58-R-0131)

Animals and sample collectionSubjects for this study included 10 male Large FlyingFoxes (Pteropus vampyrus) Some of the animals werewild caught and some were laboratory born all werereproductively mature ranging in age from 4 to 21 yrsAnimals were housed in captivity at the Lubee Bat Con-servancy in Gainesville Florida USA Animals werehoused together with conspecifics in an indooroutdoorcircular flight enclosure and were fed a mixture of fruitvegetables commercial primate chow and a vitaminsupplement Water was provided ad libitum The batswere captured manually anesthesia was mask-inducedwith 5 isoflurane in oxygen (3 Lmin) and the batswere maintained with 25 isoflurane Blood was col-lected from the left and right brachial veins using 3-mlsyringes and 25-gauge needles Blood samples weretransferred to heparanized tubes maintained and

shipped at room temperature (22degC) via overnightcourier

Cloning and SequencingGenomic DNA was prepared from whole unfractionatedperipheral blood from Pteropus vampyrus bats Using pri-mers derived from the inferred interferon gene sequencesand encoded for restriction enzyme cloning sites thegenes were PCR amplified from the genomic DNA Theforward and reverse primers encoded restriction sites forNheI and XhoI respectively which allowed cloning of thefull length gene into mammalian expression vectorpcDNA31(+) The PCR program consisted of an initiali-zation step at 94degC for 5 minutes then 30 cyclesof amplification consisting of denaturation at 94degC for1 minute annealing at 55degC for 30 seconds and elonga-tion at 72deg for 1 minute Amplification was followed by afinal elongation step at 72degC for 5 minutes and samplesheld at 4degC PCR was performed on a P times 2 ThermalCycler from ThermoElectron Corporation (WalthamMA)After purification the inserted genes were sequenced

using vector primers downstream and upstream of theinsert site Pteropus vampyrus IFNB IFNK IFND andIFNA sequences have been submitted to Genbank(GU126493 HM63650 HM636501 and HM636502)Quantitative RT-PCRWhole anticoagulated blood was collected from the bra-chial vein and layered over lymphocyte separation med-ium (Lympholyte M Cedarlane Labs) Cells werecollected from the interface washed and plated on 24well flat-bottom tissue culture plates (2 times 106 cellswell)in a total volume of 250 μl Cells were treated as indi-cated with either poly(IC)(Invivogen) LPS (Sigma) orVSV (Ramsburg lab stocks confirmed endotoxin freeusing LAL assay) Stimulants were diluted such that theaddition of 250 μl stimulant would give a final concentra-tion of 10 μgml poly IC 1 μgml LPS or an MOI of 5for VSV 500 μl of complete medium was added to theunstimulated control wells Cells were incubated at 37rsquoand 5 CO2 for the indicated times after which cellswere harvested into RNA lysis buffer (Qiagen RNEasykit) according to the manufacturerrsquos instructionsPurified RNA was prepared from these whole-cell

lysates as described in the protocols accompanying theQiagen RNeasy kit Qiashedder and Qiagen RNeasyMini kit and Qiagen DNase-Free DNase Set cDNAsynthesis was performed using the Invitrogen two-stepqRT-PCR kit according to the manufacturerrsquos protocolQuantitative real-time PCR was performed in an Eppen-dorf Mastercycler ep realplex thermal cycler usingSYBRGreenER qPCR Supermix and primers designedfrom the inferred gene sequences (additional file 3)

Kepler et al BMC Genomics 2010 11444httpwwwbiomedcentralcom1471-216411444

Page 10 of 12

Independent biological replicates were prepared foreach treatment and time point The log2 expressionlevel of IFNB and OAS2 relative to the housekeepinggene PPIA was estimated by subtracting the mean Ctvalue for each gene and treatmenttime point combina-tion from the corresponding mean Ct value for PPIA

Estimation of Gene Family SizeThe likelihood function for the observed set S ofsequences is a function of the true number g of genesand the sample size n by summing over the unobservednumber c of distinct genes sampled

P S g n P S c P c g nS c

c

( | ) ( | ) ( | )= sum (14)

The pmf on c is given by the recursion

P c g nc

gP c g n

g c

gP c g n

c( | ) ( | )

( | )

=

+ +

1

11 1

(15)

with Pc(1 | g1) = 1 and Pc(c | g1) = 0 for c ne 1PS(S | c) is approximated by minimizing the number

m of mutations over the assignments of observedsequences to c groups PS is then the posterior probabil-ity of getting m mutations given the observed numberof mutations in the IFNB and IFNK datasetsNote that this technique is independent of the assem-

bly methods described elsewhere in the paper

Phylogenetic InferenceThe set consisting of type-I interferon amino acidsequences from human (Homo sapiens) domestic pig(Sus scrofa) duckbill platypus (Ornithorhynchus anati-nus) and the amino acid sequences corresponding to theChiropteran genes inferred in this paper was submittedto multiple sequence alignment by CLUSTALW 2010[36] and ProbCons 112 [37] Maximum likelihood andminimum evolution phylogenies were inferred from eachresulting multiple sequence alignment Maximum likeli-hood phylogenies were inferred with proml found in thePHYLIP 368 package [38] Minimum evolution phyloge-nies were inferred using fastme [39] based on the dis-tance matrix calculated with protdist found in PHYLIPIn both cases 1000 boot-strap replications were gener-ated using seqboot and the consensus phylogeny wasassembled with consense both from the PHYLIP pack-age FigTree 121 httptreebioedacuksoftwarefig-tree was used for initial visualization and finalproduction of figures of the resulting phylogenetic treesAlignments of the bat DNA sequences and of all theamino acid sequences are available in additional files 4

Additional material

Additional file 1 All IFNGbio An alignment of IFNG gene sequencesfrom Homo sapiens Pterops vampyrus and Myotis lucifugus as follows firstrow genomic sequence of human IFNG rows 2 3 predicted genomicsequence of IFNG in M lucifugus and P vampyrus respectively rows 4-7exons for human IFNG rows 8-11 predicted exons for M lucifugus rows12-15 predicted exons for P vampyrus row 16 spliced coding region forhuman rows 1718 predicted spliced coding region for M lucifugus andP vampyrus row 19 amino acid sequence for human interferon gammarows 2021 predicted amino acid sequence for chiropteran interferongamma Examination of this bio file requires the BioEdit program whichis freely available at httpwwwmbioncsueduBioEditbioedithtml

Additional file 2 Pva IFND clone-assembly comparisonfasta Thissequence alignment shows a comparison between Pteropus vampyrusIFND sequences obtained by 1) assembly from the genome sequencingtrace archives and 2) by direct cloning and sequencing The clonedsequences include only those found in at least two independent clonesThe assemblies include all those with intact ORFs plus three that appearto be pseudogenes

Additional file 3 Accession numbers and primers The Genbankaccession numbers of the IFN genes used in the phylogenetic analysisand the PCR primers used in the P vampyrus gene expression studies

Additional file 4 Mlu + Pva + Hsa + SscSelectedIntactOrfsAA-alignedfasta The amino acid sequences used in the phylogeneticanalysis

AcknowledgementsWe thank Brice Barefoot for his technical assistance and Rachel Willcutts forher help in the early stages of this work We are grateful to Brian Pope andTasha King at the Lubee Bat Conservancy for their expert assistanceDr Ramsburgrsquos work was supported by NIH grants UC6 AI058607 andAI51445 Dr Keplerrsquos work was supported by discretionary funds provided byDuke University No additional external funding was used No fundingsource had any influence on the study design on the collection analysis orinterpretation of data on the writing of the manuscript or on the decisionto submit the manuscript for publication

Author details1Center for Computational Immunology Duke University Medical CenterDurham NC USA 2Duke Human Vaccine Institute Duke University MedicalCenter Durham NC USA 3Lubee Bat Conservancy and Department ofWildlife Ecology and Conservation University of Florida Gainesville FL USA

Authorsrsquo contributionsER and TBK conceived the project and designed the experiments TBKdeveloped implemented and applied the statistical methods CS KH andAH performed the experiments JR and TBK performed the phylogeneticanalysis AW contributed critical reagents and assisted in interpretation ofresults TBK CS and ER wrote the manuscript All authors read and approvedthe final manuscript

Received 21 November 2009 Accepted 21 July 2010Published 21 July 2010

References1 Jones KE Patel NG Levy MA Storeygard A Balk D et al Global trends in

emerging infectious diseases Nature 2008 451990-9932 Calisher CH Childs JE Field HE Holmes KV Schountz T Bats important

reservoir hosts of emerging viruses Clin Microbiol Rev 2006 19531-5453 Messenger SRC Smith J Bats Emerging Virus Infections and the Rabies

Paradigm Bat Ecology Chicago and London University of ChicagoPressKunz TF M B 2005 622

4 Omatsu T Watanabe S Akashi H Yoshikawa Y Biological characters ofbats in relation to natural reservoir of emerging viruses ComparativeImmunology Microbiology and Infectious Diseases 2007 30357-374

Kepler et al BMC Genomics 2010 11444httpwwwbiomedcentralcom1471-216411444

Page 11 of 12

5 Wong S Lau S Woo P Yuen K-Y Bats as a continuing source ofemerging infections in humans Reviews in Medical Virology 2007 1767-91

6 Chua KB Lek Koh C Hooi PS Wee KF Khong JH et al Isolation of Nipahvirus from Malaysian Island flying-foxes Microbes and Infection 20024145-151

7 Halpin K Young PL Field HE Mackenzie JS Isolation of Hendra virus frompteropid bats a natural reservoir of Hendra virus Journal of GeneralVirology 2000 811927-1932

8 Towner J Pourrut X Albarino C Nkogue C Bird B et al Marburg virusinfection detected in a common African bat PLoS ONE 2007 2e764

9 Pourrut X Souris M Towner J Rollin P Nichol S et al Large serologicalsurvey showing cocirculation of Ebola and Marburg viruses in Gabonesebat populations and a high seroprevalence of both viruses in Rousettusaegyptiacus BMC Infectious Diseases 2009 9159

10 Enright JB Bats and their Relation to Rabies Annual Review ofMicrobiology 1956 10369-392

11 Badrane H Tordo N Host Switching in Lyssavirus History from theChiroptera to the Carnivora Orders J Virol 2001 758096-8104

12 Li W Shi Z Yu M Ren W Smith C et al Bats Are Natural Reservoirs ofSARS-Like Coronaviruses Science 2005 310676-679

13 Lau SKP Woo PCY Li KSM Huang Y Tsoi H-W et al Severe acuterespiratory syndrome coronavirus-like virus in Chinese horseshoe batsProceedings of the National Academy of Sciences of the United States ofAmerica 2005 10214040-14045

14 Sulkin SE Allen R Virus Infections in Bats Monographs in Virology Basel SKarger 1974 8

15 Wilson DE Reeder DM Mammal Species of the World Baltimore TheJohns Hopkins University Press 2005

16 Devaraj SG Wang N Chen Z Chen Z Tseng M et al Regulation of IRF-3-dependent Innate Immunity by the Papain-like Protease Domain of theSevere Acute Respiratory Syndrome Coronavirus Journal of BiologicalChemistry 2007 28232208-32221

17 Basler CF Amarasinghe GK Evasion of Interferon Responses by Ebola andMarburg Viruses Journal of Interferon amp Cytokine Research 2009 29511-520

18 Park MS Shaw ML Munoz-Jordan J Cros JF Nakaya T et al Newcastledisease virus (NDV)-based assay demonstrates interferon-antagonistactivity for the NDV V protein and the Nipah virus V W and C proteinsJ Virol 2003 771501-1511

19 Brzozka K Finke S Conzelmann K-K Identification of the Rabies VirusAlphaBeta Interferon Antagonist Phosphoprotein P Interferes withPhosphorylation of Interferon Regulatory Factor 3 J Virol 2005797673-7681

20 Mendonccedila RZP Pereira CA Relationship of interferon synthesis and theresistance of mice infected with street rabies virus Brazilian Journal ofMedical and Biological Research 1994 27691-695

21 Sen GC Viruses and interferons Annu Rev Microbiol 2001 55255-28122 Stark GR Kerr IM Williams BR Silverman RH Schreiber RD How cells

respond to interferons Annu Rev Biochem 1998 67227-26423 Theofilopoulos AN Baccala R Beutler B Kono DH Type I interferons

(alphabeta) in immunity and autoimmunity Annu Rev Immunol 200523307-336

24 Pestka S Krause CD Walter MR Interferons interferon-like cytokines andtheir receptors Immunol Rev 2004 2028-32

25 Ohno S Taniguchi T Structure of a chromosomal gene for humaninterferon beta Proc Natl Acad Sci USA 1981 785305-5309

26 Cohen L Lacoste J Parniak M Daigneault L Skup D et al Stimulation ofinterferon beta gene transcription in vitro by purified NF-kappa B and anovel TH protein Cell Growth Differ 1991 2323-333

27 Pannisi E More Genomes But Shallower Coverage Science 20043041227

28 Bouck J Miller W Gorrell JH Muzny D Gibbs RA Analysis of the Qualityand Utility of Random Shotgun Sequencing at Low RedundanciesGenome Research 1998 81074-1084

29 Ohno S Evolution by Gene Duplication London George Allen and Unwin1970

30 Gruumlnwald PD Myung IJ Pitt MA Advances in Minimum DescriptionLength Theory and Applications Cambridge MIT Press 2005

31 Woelk CH Frost SDW Richman DD Higley PE Kosakovsky Pond SLEvolution of the interferon alpha gene family in eutherian mammalsGene 2007 39738-50

32 Walker A Roberts RM Characterization of the bovine type I IFN locusrearrangements expansions and novel subfamilies BMC Genomics 200910187

33 Lefegravevre F Boulay V A novel and atypical type one interferon geneexpressed by trophoblast during early pregnancy Journal of BiologicalChemistry 1993 26819760-19768

34 Roberts RM Chen Y Ezashi T Walker AM Interferons and the maternal-conceptus dialog in mammals Seminars in Cell amp Developmental Biology2008 19170-177

35 Blehert DS Hicks AC Behr M Meteyer CU Berlowski-Zier BM et al BatWhite-Nose Syndrome An Emerging Fungal Pathogen Science 2009323227

36 Larkin MA Blackshields G Brown NP Chenna R McGettigan PA et alClustal W and Clustal X version 20 Bioinformatics 2007 232947-2948

37 Do CB Mahabhashyam MSP Brudno M Batzoglou S ProbConsProbabilistic consistency-based multiple sequence alignment GenomeResearch 2005 15330-340

38 Felsenstein J PHYLIP - Phylogeny Inference Package (Version 32)Cladistics 1989 5164-166

39 Desper R Gascuel O Fast and Accurate Phylogeny ReconstructionAlgorithms Based on the Minimum-Evolution Principle Journal ofComputational Biology 2002 9687-705

doi1011861471-2164-11-444Cite this article as Kepler et al Chiropteran types I and II interferongenes inferred from genome sequencing traces by a statistical gene-family assembler BMC Genomics 2010 11444

Submit your next manuscript to BioMed Centraland take full advantage of

bull Convenient online submission

bull Thorough peer review

bull No space constraints or color figure charges

bull Immediate publication on acceptance

bull Inclusion in PubMed CAS Scopus and Google Scholar

bull Research which is freely available for redistribution

Submit your manuscript at wwwbiomedcentralcomsubmit

Kepler et al BMC Genomics 2010 11444httpwwwbiomedcentralcom1471-216411444

Page 12 of 12

  • Abstract
    • Background
    • Results
    • Conclusion
      • Background
        • Bat Genome Projects
          • Results
            • Inference of Gene Families From Trace Archives
              • Quality scores and conditional probabilities
                • Validation
                • Type I interferons in Myotis lucifugus and Pteropus vampyrus
                  • Chiropteran Type II interferon
                    • Phylogenetics
                    • Sequencing
                    • Gene expression
                      • Discussion
                      • Conclusions
                      • Methods
                        • Ethics Statement
                        • Animals and sample collection
                        • Cloning and Sequencing
                          • Quantitative RT-PCR
                            • Estimation of Gene Family Size
                            • Phylogenetic Inference
                              • Acknowledgements
                              • Author details
                              • Authors contributions
                              • References
Page 7: RESEARCH ARTICLE Open Access Chiropteran types I and II … · 2017. 4. 5. · RESEARCH ARTICLE Open Access Chiropteran types I and II interferon genes inferred from genome sequencing

Using IFNK primers we obtained sequences from 7clones from a single animal and found two distinctgenes differing from each other in two nucleotides Theerror rate was consistent with that found in the IFNBgenes at 17 times 10-3 per base (6 errors in 3430 bases)For IFND and IFNA respectively we collected

sequences from 32 and 52 clones from a single animalBecause PCR was used in an amplification step prior tocloning (unlike the process used for genomic sequen-cing) it is difficult to discern the expected genetic varia-bility from sequencing error For IFNA four sequenceswere recovered multiple times from independent proce-dures The mean number of mismatches in pairwisecomparisons is just under 7 The four sequences arewithin 1 2 5 and 6 bases of the most similar inferredIFNA gene For IFND there are three multiply-repre-sented sequences Two of these genes differ at 9 basesthe third which represents a pseudogene differs fromthe other two by more than 20 bases These genes differfrom their nearest inferred IFND sequences by 7 12and 20 bases respectively Additional file 2 contains an

alignment of the three multiply-represented IFNDclones compared to the IFND assembliesSince it was not possible to determine by inspection

what the underlying true gene sequences are or howmany there are We instead estimated the posteriorprobability on the number of genes in each family asdescribed in Methods (Figure 4) For IFNA the poster-ior probability is maximized at g = 9 genes the 90credible interval is (617) For IFND the greatest poster-ior probability occurs at g = 19 genes and the 90credible interval is (1528)

Gene expressionTo test whether the inferred genes are expressed underconditions known to induce typeI interferons we designed primers to Pteropus IFNB

and to a control housekeeping gene (PPIA) and a gene(OAS2) induced by paracrine IFNB signaling The lattertwo genes were also inferred by the methods describedFresh Pteropus PBMCs from two individual animals

were cultured under four different conditions with

Figure 3 Phylogenetic tree of the type-I interferons in Pteropus vampyrus (Pva) Myotis lucifigus (Mlu) Sus scrofa (Ssc) Homo sapiens(Hsa) and Ornithorhynchus anatinus (Oan) The tree was thinned for clarity by omitting one member of any pair of sequences differing byfewer than 3 amino acids The branch length is proportional to the evolutionary distance

Kepler et al BMC Genomics 2010 11444httpwwwbiomedcentralcom1471-216411444

Page 7 of 12

lipopolysaccharide (LPS) with the synthetic RNA poly(IC) with Vesicular Stomatitis Virus (VSV) and withculture medium aloneWith LPS or poly(IC) treatment IFNB mRNA expres-

sion was increased 20-50 fold by 2 hours and decreasedslowly to near the starting value by 24 hours OASexpression lagged behind that of IFNB reaching its peakvalue by 8 hours declining from the peak level only par-tially by 24 hours In contrast cells treated with VSVhad peak levels of IFNB appear by 8 hours and remainalmost unchanged to 24 hours OAS expression rosemore slowly and had not reached a maximum level by24 hours (Figure 5)

DiscussionRecent research has demonstrated very clearly that therate of emergence of human pathogens continues toincreasing steadily and that the source of the majorityof these agents is wildlife [1] One of the most intriguingfindings of late is that bats are the natural reservoirs ofmany of the most pathogenic viruses in humansWhile it is known that microbe-host coevolution

drives pathogenicity in the natural host the effect ofsuch coevolution on alternative hosts has not beendescribed The development of many genome sequen-cing projects extending beyond domesticated animalsprovides an opportunity to begin such inquiries Thelow coverage levels of these projects and the fact that somany genes with immunological functions appear inlarge families of very similar genes requires the develop-ment of more precise inferential tools for their studyToward that end we have developed a method for the

assembly of genome sequence fragments for use in theinference of gene family members when the genome

coverage is too low for reliable complete assembly Wevalidated the method on raw unassembled traces fromthe human genome sequencing project finding that wecould assemble the known interferon alpha sequencesaccurately and further identify at least one case inwhich two alleles are presentIt should be noted that our method requires more

computation than assembly methods currently in usedue to the numerical minimization over the mutationfrequency required in Eq(10) and is therefore not suita-ble for large-scale assemblyUsing this method we inferred a total of 61 type-I

interferon genes with intact ORFs from the whole-gen-ome shotgun trace archives for two chiropteran speciesPteropus vampyrus and Myotis lucifugus We find thatthe largest of the IFN-I gene families in both bats com-prises genes orthologous to the IFNW genes in othermammals In humans mice and pigs there is just oneIFNW gene but up to two dozen members in each batA recent analysis of bovine type-I interferons from theassembled Bos taurus genome [32] finds 26 intactIFNW genes providing precedent for our otherwisestriking results In contrast the IFNA family is the lar-gest IFN-I family in several mammals includinghumans mice and pigs but appears to be smaller inPteropus and absent but for pseudogenes in Myotis Thegene family assembly from trace archives indicates thatthere are 7 intact IFNA in Pteropus analysis of directsequencing from PBMCs gives maximum posteriorprobability to the presence of 9 IFNA genes (Figure 4)with a 90 credible interval containing 6 to 16 genesCattle have 13 IFNA genes in spite of having a greatlyenlarged IFNW family [32]Both bats have multiple members in the IFND family

with five intact members and seven pseudogenes inPteropus and twelve intact members in Myotis IFNDhas been found as a functional gene only in pigs wherethe gene product is expressed in the placenta and playsa role in embryonic development [33] but is not sus-pected of involvement in the response to viruses It isstriking to us that this family seems to have been sodramatically enlarged in bats The size of this familysuggests that it may still be involved in host defense inbats even if it has lost that function in pigs Walker andRoberts [32] report finding three IFND pseudogenesbut no intact IFND in the cow It is worth noting thatthe placental type-I interferon in cattle is IFNT ratherthan IFND [34] Searching the bat trace archives withbovine IFNT did not produce hits that had not beenreturned with the other searches We find no evidenceof IFNT or IFNZ in either batFor IFNB IFNK and IFNE we find one member of

each in P vampyrus and one or two in M lucifugus Inthe case where we do find two genes we are confident

Figure 4 The posterior probability mass function on thenumber of genes in the IFNA and IFND families as estimatedby cloning and sequencing from peripheral blood cells fromPteropus vampyrus

Kepler et al BMC Genomics 2010 11444httpwwwbiomedcentralcom1471-216411444

Page 8 of 12

that there the genes are distinct though they may repre-sent alleles rather than paralogsWe used the inferred sequences for P vampyrus

IFNA IFNB IFND and IFNK to design oligonucleotideprimers for cloning and sequencing and recovered atotal of 110 sequences The directly cloned sequencesvalidate much of the gene family inferences most of therepeated sequences are within a few bases of the nearestinferred gene A minority of the directly sequenced

genes are surprisingly far from the nearest inferredgene This circumstance may be an indication that thereare additional type-I interferon genes not covered by thePteropus sequencing traces and may also reflect signifi-cant population polymorphism in the wild batpopulationWe used these same primers to show using quantita-

tive RT-PRC that the P vampyrus candidate IFNB andOAS2 (a gene induced by IFNB in other mammals) are

Figure 5 Gene expression time course by qRT-PCR in fresh Pvampyrus PBMCs stimulated in culture by the indicated compoundExpression levels are relative to the housekeeping gene PPIA The error bars represent standard errors of the mean over biological duplicates

Kepler et al BMC Genomics 2010 11444httpwwwbiomedcentralcom1471-216411444

Page 9 of 12

expressed upon stimulation by type-I inducing agentsFurthermore the temporal trajectories of this expressionis consistent with the known mechanisms of such sig-naling The expression of IFNB under viral infection wasdelayed compared to that under stimulation by the TLRligands poly(IC) and LPS

ConclusionsThe bat has been implicated as a major reservoir forviruses of extreme pathogenicity in humans and sufferssubstantially less disease when infected by these virusesthan humans do In addition bat populations in NorthAmerica are declining rapidly as a result of white-nosesyndrome an emergent disease associated with a fungalpathogen [35] These facts and others suggest that thestudy of host defense and immunity in these uniquecreatures would benefit the pursuit of ecological andhuman health The major obstacle in this undertaking isthe lack of reagents that would make such investigationspossible One way to get this effort underway is to takeadvantage of the rich store of information contained inexisting partial genome sequences Although the infor-mation available in these genome databases requiresmore careful treatment for its extraction than isrequired for a complete assembled genome we havedeveloped methods that facilitate this task and havedemonstrated their utility

MethodsEthics StatementAll animals were handled in strict accordance with goodanimal practice as defined by the relevant national andor local animal welfare bodies and all animal work wascovered by an IACUC protocol from the Lubee BatConservancy (USDA Research Facility 58-R-0131)

Animals and sample collectionSubjects for this study included 10 male Large FlyingFoxes (Pteropus vampyrus) Some of the animals werewild caught and some were laboratory born all werereproductively mature ranging in age from 4 to 21 yrsAnimals were housed in captivity at the Lubee Bat Con-servancy in Gainesville Florida USA Animals werehoused together with conspecifics in an indooroutdoorcircular flight enclosure and were fed a mixture of fruitvegetables commercial primate chow and a vitaminsupplement Water was provided ad libitum The batswere captured manually anesthesia was mask-inducedwith 5 isoflurane in oxygen (3 Lmin) and the batswere maintained with 25 isoflurane Blood was col-lected from the left and right brachial veins using 3-mlsyringes and 25-gauge needles Blood samples weretransferred to heparanized tubes maintained and

shipped at room temperature (22degC) via overnightcourier

Cloning and SequencingGenomic DNA was prepared from whole unfractionatedperipheral blood from Pteropus vampyrus bats Using pri-mers derived from the inferred interferon gene sequencesand encoded for restriction enzyme cloning sites thegenes were PCR amplified from the genomic DNA Theforward and reverse primers encoded restriction sites forNheI and XhoI respectively which allowed cloning of thefull length gene into mammalian expression vectorpcDNA31(+) The PCR program consisted of an initiali-zation step at 94degC for 5 minutes then 30 cyclesof amplification consisting of denaturation at 94degC for1 minute annealing at 55degC for 30 seconds and elonga-tion at 72deg for 1 minute Amplification was followed by afinal elongation step at 72degC for 5 minutes and samplesheld at 4degC PCR was performed on a P times 2 ThermalCycler from ThermoElectron Corporation (WalthamMA)After purification the inserted genes were sequenced

using vector primers downstream and upstream of theinsert site Pteropus vampyrus IFNB IFNK IFND andIFNA sequences have been submitted to Genbank(GU126493 HM63650 HM636501 and HM636502)Quantitative RT-PCRWhole anticoagulated blood was collected from the bra-chial vein and layered over lymphocyte separation med-ium (Lympholyte M Cedarlane Labs) Cells werecollected from the interface washed and plated on 24well flat-bottom tissue culture plates (2 times 106 cellswell)in a total volume of 250 μl Cells were treated as indi-cated with either poly(IC)(Invivogen) LPS (Sigma) orVSV (Ramsburg lab stocks confirmed endotoxin freeusing LAL assay) Stimulants were diluted such that theaddition of 250 μl stimulant would give a final concentra-tion of 10 μgml poly IC 1 μgml LPS or an MOI of 5for VSV 500 μl of complete medium was added to theunstimulated control wells Cells were incubated at 37rsquoand 5 CO2 for the indicated times after which cellswere harvested into RNA lysis buffer (Qiagen RNEasykit) according to the manufacturerrsquos instructionsPurified RNA was prepared from these whole-cell

lysates as described in the protocols accompanying theQiagen RNeasy kit Qiashedder and Qiagen RNeasyMini kit and Qiagen DNase-Free DNase Set cDNAsynthesis was performed using the Invitrogen two-stepqRT-PCR kit according to the manufacturerrsquos protocolQuantitative real-time PCR was performed in an Eppen-dorf Mastercycler ep realplex thermal cycler usingSYBRGreenER qPCR Supermix and primers designedfrom the inferred gene sequences (additional file 3)

Kepler et al BMC Genomics 2010 11444httpwwwbiomedcentralcom1471-216411444

Page 10 of 12

Independent biological replicates were prepared foreach treatment and time point The log2 expressionlevel of IFNB and OAS2 relative to the housekeepinggene PPIA was estimated by subtracting the mean Ctvalue for each gene and treatmenttime point combina-tion from the corresponding mean Ct value for PPIA

Estimation of Gene Family SizeThe likelihood function for the observed set S ofsequences is a function of the true number g of genesand the sample size n by summing over the unobservednumber c of distinct genes sampled

P S g n P S c P c g nS c

c

( | ) ( | ) ( | )= sum (14)

The pmf on c is given by the recursion

P c g nc

gP c g n

g c

gP c g n

c( | ) ( | )

( | )

=

+ +

1

11 1

(15)

with Pc(1 | g1) = 1 and Pc(c | g1) = 0 for c ne 1PS(S | c) is approximated by minimizing the number

m of mutations over the assignments of observedsequences to c groups PS is then the posterior probabil-ity of getting m mutations given the observed numberof mutations in the IFNB and IFNK datasetsNote that this technique is independent of the assem-

bly methods described elsewhere in the paper

Phylogenetic InferenceThe set consisting of type-I interferon amino acidsequences from human (Homo sapiens) domestic pig(Sus scrofa) duckbill platypus (Ornithorhynchus anati-nus) and the amino acid sequences corresponding to theChiropteran genes inferred in this paper was submittedto multiple sequence alignment by CLUSTALW 2010[36] and ProbCons 112 [37] Maximum likelihood andminimum evolution phylogenies were inferred from eachresulting multiple sequence alignment Maximum likeli-hood phylogenies were inferred with proml found in thePHYLIP 368 package [38] Minimum evolution phyloge-nies were inferred using fastme [39] based on the dis-tance matrix calculated with protdist found in PHYLIPIn both cases 1000 boot-strap replications were gener-ated using seqboot and the consensus phylogeny wasassembled with consense both from the PHYLIP pack-age FigTree 121 httptreebioedacuksoftwarefig-tree was used for initial visualization and finalproduction of figures of the resulting phylogenetic treesAlignments of the bat DNA sequences and of all theamino acid sequences are available in additional files 4

Additional material

Additional file 1 All IFNGbio An alignment of IFNG gene sequencesfrom Homo sapiens Pterops vampyrus and Myotis lucifugus as follows firstrow genomic sequence of human IFNG rows 2 3 predicted genomicsequence of IFNG in M lucifugus and P vampyrus respectively rows 4-7exons for human IFNG rows 8-11 predicted exons for M lucifugus rows12-15 predicted exons for P vampyrus row 16 spliced coding region forhuman rows 1718 predicted spliced coding region for M lucifugus andP vampyrus row 19 amino acid sequence for human interferon gammarows 2021 predicted amino acid sequence for chiropteran interferongamma Examination of this bio file requires the BioEdit program whichis freely available at httpwwwmbioncsueduBioEditbioedithtml

Additional file 2 Pva IFND clone-assembly comparisonfasta Thissequence alignment shows a comparison between Pteropus vampyrusIFND sequences obtained by 1) assembly from the genome sequencingtrace archives and 2) by direct cloning and sequencing The clonedsequences include only those found in at least two independent clonesThe assemblies include all those with intact ORFs plus three that appearto be pseudogenes

Additional file 3 Accession numbers and primers The Genbankaccession numbers of the IFN genes used in the phylogenetic analysisand the PCR primers used in the P vampyrus gene expression studies

Additional file 4 Mlu + Pva + Hsa + SscSelectedIntactOrfsAA-alignedfasta The amino acid sequences used in the phylogeneticanalysis

AcknowledgementsWe thank Brice Barefoot for his technical assistance and Rachel Willcutts forher help in the early stages of this work We are grateful to Brian Pope andTasha King at the Lubee Bat Conservancy for their expert assistanceDr Ramsburgrsquos work was supported by NIH grants UC6 AI058607 andAI51445 Dr Keplerrsquos work was supported by discretionary funds provided byDuke University No additional external funding was used No fundingsource had any influence on the study design on the collection analysis orinterpretation of data on the writing of the manuscript or on the decisionto submit the manuscript for publication

Author details1Center for Computational Immunology Duke University Medical CenterDurham NC USA 2Duke Human Vaccine Institute Duke University MedicalCenter Durham NC USA 3Lubee Bat Conservancy and Department ofWildlife Ecology and Conservation University of Florida Gainesville FL USA

Authorsrsquo contributionsER and TBK conceived the project and designed the experiments TBKdeveloped implemented and applied the statistical methods CS KH andAH performed the experiments JR and TBK performed the phylogeneticanalysis AW contributed critical reagents and assisted in interpretation ofresults TBK CS and ER wrote the manuscript All authors read and approvedthe final manuscript

Received 21 November 2009 Accepted 21 July 2010Published 21 July 2010

References1 Jones KE Patel NG Levy MA Storeygard A Balk D et al Global trends in

emerging infectious diseases Nature 2008 451990-9932 Calisher CH Childs JE Field HE Holmes KV Schountz T Bats important

reservoir hosts of emerging viruses Clin Microbiol Rev 2006 19531-5453 Messenger SRC Smith J Bats Emerging Virus Infections and the Rabies

Paradigm Bat Ecology Chicago and London University of ChicagoPressKunz TF M B 2005 622

4 Omatsu T Watanabe S Akashi H Yoshikawa Y Biological characters ofbats in relation to natural reservoir of emerging viruses ComparativeImmunology Microbiology and Infectious Diseases 2007 30357-374

Kepler et al BMC Genomics 2010 11444httpwwwbiomedcentralcom1471-216411444

Page 11 of 12

5 Wong S Lau S Woo P Yuen K-Y Bats as a continuing source ofemerging infections in humans Reviews in Medical Virology 2007 1767-91

6 Chua KB Lek Koh C Hooi PS Wee KF Khong JH et al Isolation of Nipahvirus from Malaysian Island flying-foxes Microbes and Infection 20024145-151

7 Halpin K Young PL Field HE Mackenzie JS Isolation of Hendra virus frompteropid bats a natural reservoir of Hendra virus Journal of GeneralVirology 2000 811927-1932

8 Towner J Pourrut X Albarino C Nkogue C Bird B et al Marburg virusinfection detected in a common African bat PLoS ONE 2007 2e764

9 Pourrut X Souris M Towner J Rollin P Nichol S et al Large serologicalsurvey showing cocirculation of Ebola and Marburg viruses in Gabonesebat populations and a high seroprevalence of both viruses in Rousettusaegyptiacus BMC Infectious Diseases 2009 9159

10 Enright JB Bats and their Relation to Rabies Annual Review ofMicrobiology 1956 10369-392

11 Badrane H Tordo N Host Switching in Lyssavirus History from theChiroptera to the Carnivora Orders J Virol 2001 758096-8104

12 Li W Shi Z Yu M Ren W Smith C et al Bats Are Natural Reservoirs ofSARS-Like Coronaviruses Science 2005 310676-679

13 Lau SKP Woo PCY Li KSM Huang Y Tsoi H-W et al Severe acuterespiratory syndrome coronavirus-like virus in Chinese horseshoe batsProceedings of the National Academy of Sciences of the United States ofAmerica 2005 10214040-14045

14 Sulkin SE Allen R Virus Infections in Bats Monographs in Virology Basel SKarger 1974 8

15 Wilson DE Reeder DM Mammal Species of the World Baltimore TheJohns Hopkins University Press 2005

16 Devaraj SG Wang N Chen Z Chen Z Tseng M et al Regulation of IRF-3-dependent Innate Immunity by the Papain-like Protease Domain of theSevere Acute Respiratory Syndrome Coronavirus Journal of BiologicalChemistry 2007 28232208-32221

17 Basler CF Amarasinghe GK Evasion of Interferon Responses by Ebola andMarburg Viruses Journal of Interferon amp Cytokine Research 2009 29511-520

18 Park MS Shaw ML Munoz-Jordan J Cros JF Nakaya T et al Newcastledisease virus (NDV)-based assay demonstrates interferon-antagonistactivity for the NDV V protein and the Nipah virus V W and C proteinsJ Virol 2003 771501-1511

19 Brzozka K Finke S Conzelmann K-K Identification of the Rabies VirusAlphaBeta Interferon Antagonist Phosphoprotein P Interferes withPhosphorylation of Interferon Regulatory Factor 3 J Virol 2005797673-7681

20 Mendonccedila RZP Pereira CA Relationship of interferon synthesis and theresistance of mice infected with street rabies virus Brazilian Journal ofMedical and Biological Research 1994 27691-695

21 Sen GC Viruses and interferons Annu Rev Microbiol 2001 55255-28122 Stark GR Kerr IM Williams BR Silverman RH Schreiber RD How cells

respond to interferons Annu Rev Biochem 1998 67227-26423 Theofilopoulos AN Baccala R Beutler B Kono DH Type I interferons

(alphabeta) in immunity and autoimmunity Annu Rev Immunol 200523307-336

24 Pestka S Krause CD Walter MR Interferons interferon-like cytokines andtheir receptors Immunol Rev 2004 2028-32

25 Ohno S Taniguchi T Structure of a chromosomal gene for humaninterferon beta Proc Natl Acad Sci USA 1981 785305-5309

26 Cohen L Lacoste J Parniak M Daigneault L Skup D et al Stimulation ofinterferon beta gene transcription in vitro by purified NF-kappa B and anovel TH protein Cell Growth Differ 1991 2323-333

27 Pannisi E More Genomes But Shallower Coverage Science 20043041227

28 Bouck J Miller W Gorrell JH Muzny D Gibbs RA Analysis of the Qualityand Utility of Random Shotgun Sequencing at Low RedundanciesGenome Research 1998 81074-1084

29 Ohno S Evolution by Gene Duplication London George Allen and Unwin1970

30 Gruumlnwald PD Myung IJ Pitt MA Advances in Minimum DescriptionLength Theory and Applications Cambridge MIT Press 2005

31 Woelk CH Frost SDW Richman DD Higley PE Kosakovsky Pond SLEvolution of the interferon alpha gene family in eutherian mammalsGene 2007 39738-50

32 Walker A Roberts RM Characterization of the bovine type I IFN locusrearrangements expansions and novel subfamilies BMC Genomics 200910187

33 Lefegravevre F Boulay V A novel and atypical type one interferon geneexpressed by trophoblast during early pregnancy Journal of BiologicalChemistry 1993 26819760-19768

34 Roberts RM Chen Y Ezashi T Walker AM Interferons and the maternal-conceptus dialog in mammals Seminars in Cell amp Developmental Biology2008 19170-177

35 Blehert DS Hicks AC Behr M Meteyer CU Berlowski-Zier BM et al BatWhite-Nose Syndrome An Emerging Fungal Pathogen Science 2009323227

36 Larkin MA Blackshields G Brown NP Chenna R McGettigan PA et alClustal W and Clustal X version 20 Bioinformatics 2007 232947-2948

37 Do CB Mahabhashyam MSP Brudno M Batzoglou S ProbConsProbabilistic consistency-based multiple sequence alignment GenomeResearch 2005 15330-340

38 Felsenstein J PHYLIP - Phylogeny Inference Package (Version 32)Cladistics 1989 5164-166

39 Desper R Gascuel O Fast and Accurate Phylogeny ReconstructionAlgorithms Based on the Minimum-Evolution Principle Journal ofComputational Biology 2002 9687-705

doi1011861471-2164-11-444Cite this article as Kepler et al Chiropteran types I and II interferongenes inferred from genome sequencing traces by a statistical gene-family assembler BMC Genomics 2010 11444

Submit your next manuscript to BioMed Centraland take full advantage of

bull Convenient online submission

bull Thorough peer review

bull No space constraints or color figure charges

bull Immediate publication on acceptance

bull Inclusion in PubMed CAS Scopus and Google Scholar

bull Research which is freely available for redistribution

Submit your manuscript at wwwbiomedcentralcomsubmit

Kepler et al BMC Genomics 2010 11444httpwwwbiomedcentralcom1471-216411444

Page 12 of 12

  • Abstract
    • Background
    • Results
    • Conclusion
      • Background
        • Bat Genome Projects
          • Results
            • Inference of Gene Families From Trace Archives
              • Quality scores and conditional probabilities
                • Validation
                • Type I interferons in Myotis lucifugus and Pteropus vampyrus
                  • Chiropteran Type II interferon
                    • Phylogenetics
                    • Sequencing
                    • Gene expression
                      • Discussion
                      • Conclusions
                      • Methods
                        • Ethics Statement
                        • Animals and sample collection
                        • Cloning and Sequencing
                          • Quantitative RT-PCR
                            • Estimation of Gene Family Size
                            • Phylogenetic Inference
                              • Acknowledgements
                              • Author details
                              • Authors contributions
                              • References
Page 8: RESEARCH ARTICLE Open Access Chiropteran types I and II … · 2017. 4. 5. · RESEARCH ARTICLE Open Access Chiropteran types I and II interferon genes inferred from genome sequencing

lipopolysaccharide (LPS) with the synthetic RNA poly(IC) with Vesicular Stomatitis Virus (VSV) and withculture medium aloneWith LPS or poly(IC) treatment IFNB mRNA expres-

sion was increased 20-50 fold by 2 hours and decreasedslowly to near the starting value by 24 hours OASexpression lagged behind that of IFNB reaching its peakvalue by 8 hours declining from the peak level only par-tially by 24 hours In contrast cells treated with VSVhad peak levels of IFNB appear by 8 hours and remainalmost unchanged to 24 hours OAS expression rosemore slowly and had not reached a maximum level by24 hours (Figure 5)

DiscussionRecent research has demonstrated very clearly that therate of emergence of human pathogens continues toincreasing steadily and that the source of the majorityof these agents is wildlife [1] One of the most intriguingfindings of late is that bats are the natural reservoirs ofmany of the most pathogenic viruses in humansWhile it is known that microbe-host coevolution

drives pathogenicity in the natural host the effect ofsuch coevolution on alternative hosts has not beendescribed The development of many genome sequen-cing projects extending beyond domesticated animalsprovides an opportunity to begin such inquiries Thelow coverage levels of these projects and the fact that somany genes with immunological functions appear inlarge families of very similar genes requires the develop-ment of more precise inferential tools for their studyToward that end we have developed a method for the

assembly of genome sequence fragments for use in theinference of gene family members when the genome

coverage is too low for reliable complete assembly Wevalidated the method on raw unassembled traces fromthe human genome sequencing project finding that wecould assemble the known interferon alpha sequencesaccurately and further identify at least one case inwhich two alleles are presentIt should be noted that our method requires more

computation than assembly methods currently in usedue to the numerical minimization over the mutationfrequency required in Eq(10) and is therefore not suita-ble for large-scale assemblyUsing this method we inferred a total of 61 type-I

interferon genes with intact ORFs from the whole-gen-ome shotgun trace archives for two chiropteran speciesPteropus vampyrus and Myotis lucifugus We find thatthe largest of the IFN-I gene families in both bats com-prises genes orthologous to the IFNW genes in othermammals In humans mice and pigs there is just oneIFNW gene but up to two dozen members in each batA recent analysis of bovine type-I interferons from theassembled Bos taurus genome [32] finds 26 intactIFNW genes providing precedent for our otherwisestriking results In contrast the IFNA family is the lar-gest IFN-I family in several mammals includinghumans mice and pigs but appears to be smaller inPteropus and absent but for pseudogenes in Myotis Thegene family assembly from trace archives indicates thatthere are 7 intact IFNA in Pteropus analysis of directsequencing from PBMCs gives maximum posteriorprobability to the presence of 9 IFNA genes (Figure 4)with a 90 credible interval containing 6 to 16 genesCattle have 13 IFNA genes in spite of having a greatlyenlarged IFNW family [32]Both bats have multiple members in the IFND family

with five intact members and seven pseudogenes inPteropus and twelve intact members in Myotis IFNDhas been found as a functional gene only in pigs wherethe gene product is expressed in the placenta and playsa role in embryonic development [33] but is not sus-pected of involvement in the response to viruses It isstriking to us that this family seems to have been sodramatically enlarged in bats The size of this familysuggests that it may still be involved in host defense inbats even if it has lost that function in pigs Walker andRoberts [32] report finding three IFND pseudogenesbut no intact IFND in the cow It is worth noting thatthe placental type-I interferon in cattle is IFNT ratherthan IFND [34] Searching the bat trace archives withbovine IFNT did not produce hits that had not beenreturned with the other searches We find no evidenceof IFNT or IFNZ in either batFor IFNB IFNK and IFNE we find one member of

each in P vampyrus and one or two in M lucifugus Inthe case where we do find two genes we are confident

Figure 4 The posterior probability mass function on thenumber of genes in the IFNA and IFND families as estimatedby cloning and sequencing from peripheral blood cells fromPteropus vampyrus

Kepler et al BMC Genomics 2010 11444httpwwwbiomedcentralcom1471-216411444

Page 8 of 12

that there the genes are distinct though they may repre-sent alleles rather than paralogsWe used the inferred sequences for P vampyrus

IFNA IFNB IFND and IFNK to design oligonucleotideprimers for cloning and sequencing and recovered atotal of 110 sequences The directly cloned sequencesvalidate much of the gene family inferences most of therepeated sequences are within a few bases of the nearestinferred gene A minority of the directly sequenced

genes are surprisingly far from the nearest inferredgene This circumstance may be an indication that thereare additional type-I interferon genes not covered by thePteropus sequencing traces and may also reflect signifi-cant population polymorphism in the wild batpopulationWe used these same primers to show using quantita-

tive RT-PRC that the P vampyrus candidate IFNB andOAS2 (a gene induced by IFNB in other mammals) are

Figure 5 Gene expression time course by qRT-PCR in fresh Pvampyrus PBMCs stimulated in culture by the indicated compoundExpression levels are relative to the housekeeping gene PPIA The error bars represent standard errors of the mean over biological duplicates

Kepler et al BMC Genomics 2010 11444httpwwwbiomedcentralcom1471-216411444

Page 9 of 12

expressed upon stimulation by type-I inducing agentsFurthermore the temporal trajectories of this expressionis consistent with the known mechanisms of such sig-naling The expression of IFNB under viral infection wasdelayed compared to that under stimulation by the TLRligands poly(IC) and LPS

ConclusionsThe bat has been implicated as a major reservoir forviruses of extreme pathogenicity in humans and sufferssubstantially less disease when infected by these virusesthan humans do In addition bat populations in NorthAmerica are declining rapidly as a result of white-nosesyndrome an emergent disease associated with a fungalpathogen [35] These facts and others suggest that thestudy of host defense and immunity in these uniquecreatures would benefit the pursuit of ecological andhuman health The major obstacle in this undertaking isthe lack of reagents that would make such investigationspossible One way to get this effort underway is to takeadvantage of the rich store of information contained inexisting partial genome sequences Although the infor-mation available in these genome databases requiresmore careful treatment for its extraction than isrequired for a complete assembled genome we havedeveloped methods that facilitate this task and havedemonstrated their utility

MethodsEthics StatementAll animals were handled in strict accordance with goodanimal practice as defined by the relevant national andor local animal welfare bodies and all animal work wascovered by an IACUC protocol from the Lubee BatConservancy (USDA Research Facility 58-R-0131)

Animals and sample collectionSubjects for this study included 10 male Large FlyingFoxes (Pteropus vampyrus) Some of the animals werewild caught and some were laboratory born all werereproductively mature ranging in age from 4 to 21 yrsAnimals were housed in captivity at the Lubee Bat Con-servancy in Gainesville Florida USA Animals werehoused together with conspecifics in an indooroutdoorcircular flight enclosure and were fed a mixture of fruitvegetables commercial primate chow and a vitaminsupplement Water was provided ad libitum The batswere captured manually anesthesia was mask-inducedwith 5 isoflurane in oxygen (3 Lmin) and the batswere maintained with 25 isoflurane Blood was col-lected from the left and right brachial veins using 3-mlsyringes and 25-gauge needles Blood samples weretransferred to heparanized tubes maintained and

shipped at room temperature (22degC) via overnightcourier

Cloning and SequencingGenomic DNA was prepared from whole unfractionatedperipheral blood from Pteropus vampyrus bats Using pri-mers derived from the inferred interferon gene sequencesand encoded for restriction enzyme cloning sites thegenes were PCR amplified from the genomic DNA Theforward and reverse primers encoded restriction sites forNheI and XhoI respectively which allowed cloning of thefull length gene into mammalian expression vectorpcDNA31(+) The PCR program consisted of an initiali-zation step at 94degC for 5 minutes then 30 cyclesof amplification consisting of denaturation at 94degC for1 minute annealing at 55degC for 30 seconds and elonga-tion at 72deg for 1 minute Amplification was followed by afinal elongation step at 72degC for 5 minutes and samplesheld at 4degC PCR was performed on a P times 2 ThermalCycler from ThermoElectron Corporation (WalthamMA)After purification the inserted genes were sequenced

using vector primers downstream and upstream of theinsert site Pteropus vampyrus IFNB IFNK IFND andIFNA sequences have been submitted to Genbank(GU126493 HM63650 HM636501 and HM636502)Quantitative RT-PCRWhole anticoagulated blood was collected from the bra-chial vein and layered over lymphocyte separation med-ium (Lympholyte M Cedarlane Labs) Cells werecollected from the interface washed and plated on 24well flat-bottom tissue culture plates (2 times 106 cellswell)in a total volume of 250 μl Cells were treated as indi-cated with either poly(IC)(Invivogen) LPS (Sigma) orVSV (Ramsburg lab stocks confirmed endotoxin freeusing LAL assay) Stimulants were diluted such that theaddition of 250 μl stimulant would give a final concentra-tion of 10 μgml poly IC 1 μgml LPS or an MOI of 5for VSV 500 μl of complete medium was added to theunstimulated control wells Cells were incubated at 37rsquoand 5 CO2 for the indicated times after which cellswere harvested into RNA lysis buffer (Qiagen RNEasykit) according to the manufacturerrsquos instructionsPurified RNA was prepared from these whole-cell

lysates as described in the protocols accompanying theQiagen RNeasy kit Qiashedder and Qiagen RNeasyMini kit and Qiagen DNase-Free DNase Set cDNAsynthesis was performed using the Invitrogen two-stepqRT-PCR kit according to the manufacturerrsquos protocolQuantitative real-time PCR was performed in an Eppen-dorf Mastercycler ep realplex thermal cycler usingSYBRGreenER qPCR Supermix and primers designedfrom the inferred gene sequences (additional file 3)

Kepler et al BMC Genomics 2010 11444httpwwwbiomedcentralcom1471-216411444

Page 10 of 12

Independent biological replicates were prepared foreach treatment and time point The log2 expressionlevel of IFNB and OAS2 relative to the housekeepinggene PPIA was estimated by subtracting the mean Ctvalue for each gene and treatmenttime point combina-tion from the corresponding mean Ct value for PPIA

Estimation of Gene Family SizeThe likelihood function for the observed set S ofsequences is a function of the true number g of genesand the sample size n by summing over the unobservednumber c of distinct genes sampled

P S g n P S c P c g nS c

c

( | ) ( | ) ( | )= sum (14)

The pmf on c is given by the recursion

P c g nc

gP c g n

g c

gP c g n

c( | ) ( | )

( | )

=

+ +

1

11 1

(15)

with Pc(1 | g1) = 1 and Pc(c | g1) = 0 for c ne 1PS(S | c) is approximated by minimizing the number

m of mutations over the assignments of observedsequences to c groups PS is then the posterior probabil-ity of getting m mutations given the observed numberof mutations in the IFNB and IFNK datasetsNote that this technique is independent of the assem-

bly methods described elsewhere in the paper

Phylogenetic InferenceThe set consisting of type-I interferon amino acidsequences from human (Homo sapiens) domestic pig(Sus scrofa) duckbill platypus (Ornithorhynchus anati-nus) and the amino acid sequences corresponding to theChiropteran genes inferred in this paper was submittedto multiple sequence alignment by CLUSTALW 2010[36] and ProbCons 112 [37] Maximum likelihood andminimum evolution phylogenies were inferred from eachresulting multiple sequence alignment Maximum likeli-hood phylogenies were inferred with proml found in thePHYLIP 368 package [38] Minimum evolution phyloge-nies were inferred using fastme [39] based on the dis-tance matrix calculated with protdist found in PHYLIPIn both cases 1000 boot-strap replications were gener-ated using seqboot and the consensus phylogeny wasassembled with consense both from the PHYLIP pack-age FigTree 121 httptreebioedacuksoftwarefig-tree was used for initial visualization and finalproduction of figures of the resulting phylogenetic treesAlignments of the bat DNA sequences and of all theamino acid sequences are available in additional files 4

Additional material

Additional file 1 All IFNGbio An alignment of IFNG gene sequencesfrom Homo sapiens Pterops vampyrus and Myotis lucifugus as follows firstrow genomic sequence of human IFNG rows 2 3 predicted genomicsequence of IFNG in M lucifugus and P vampyrus respectively rows 4-7exons for human IFNG rows 8-11 predicted exons for M lucifugus rows12-15 predicted exons for P vampyrus row 16 spliced coding region forhuman rows 1718 predicted spliced coding region for M lucifugus andP vampyrus row 19 amino acid sequence for human interferon gammarows 2021 predicted amino acid sequence for chiropteran interferongamma Examination of this bio file requires the BioEdit program whichis freely available at httpwwwmbioncsueduBioEditbioedithtml

Additional file 2 Pva IFND clone-assembly comparisonfasta Thissequence alignment shows a comparison between Pteropus vampyrusIFND sequences obtained by 1) assembly from the genome sequencingtrace archives and 2) by direct cloning and sequencing The clonedsequences include only those found in at least two independent clonesThe assemblies include all those with intact ORFs plus three that appearto be pseudogenes

Additional file 3 Accession numbers and primers The Genbankaccession numbers of the IFN genes used in the phylogenetic analysisand the PCR primers used in the P vampyrus gene expression studies

Additional file 4 Mlu + Pva + Hsa + SscSelectedIntactOrfsAA-alignedfasta The amino acid sequences used in the phylogeneticanalysis

AcknowledgementsWe thank Brice Barefoot for his technical assistance and Rachel Willcutts forher help in the early stages of this work We are grateful to Brian Pope andTasha King at the Lubee Bat Conservancy for their expert assistanceDr Ramsburgrsquos work was supported by NIH grants UC6 AI058607 andAI51445 Dr Keplerrsquos work was supported by discretionary funds provided byDuke University No additional external funding was used No fundingsource had any influence on the study design on the collection analysis orinterpretation of data on the writing of the manuscript or on the decisionto submit the manuscript for publication

Author details1Center for Computational Immunology Duke University Medical CenterDurham NC USA 2Duke Human Vaccine Institute Duke University MedicalCenter Durham NC USA 3Lubee Bat Conservancy and Department ofWildlife Ecology and Conservation University of Florida Gainesville FL USA

Authorsrsquo contributionsER and TBK conceived the project and designed the experiments TBKdeveloped implemented and applied the statistical methods CS KH andAH performed the experiments JR and TBK performed the phylogeneticanalysis AW contributed critical reagents and assisted in interpretation ofresults TBK CS and ER wrote the manuscript All authors read and approvedthe final manuscript

Received 21 November 2009 Accepted 21 July 2010Published 21 July 2010

References1 Jones KE Patel NG Levy MA Storeygard A Balk D et al Global trends in

emerging infectious diseases Nature 2008 451990-9932 Calisher CH Childs JE Field HE Holmes KV Schountz T Bats important

reservoir hosts of emerging viruses Clin Microbiol Rev 2006 19531-5453 Messenger SRC Smith J Bats Emerging Virus Infections and the Rabies

Paradigm Bat Ecology Chicago and London University of ChicagoPressKunz TF M B 2005 622

4 Omatsu T Watanabe S Akashi H Yoshikawa Y Biological characters ofbats in relation to natural reservoir of emerging viruses ComparativeImmunology Microbiology and Infectious Diseases 2007 30357-374

Kepler et al BMC Genomics 2010 11444httpwwwbiomedcentralcom1471-216411444

Page 11 of 12

5 Wong S Lau S Woo P Yuen K-Y Bats as a continuing source ofemerging infections in humans Reviews in Medical Virology 2007 1767-91

6 Chua KB Lek Koh C Hooi PS Wee KF Khong JH et al Isolation of Nipahvirus from Malaysian Island flying-foxes Microbes and Infection 20024145-151

7 Halpin K Young PL Field HE Mackenzie JS Isolation of Hendra virus frompteropid bats a natural reservoir of Hendra virus Journal of GeneralVirology 2000 811927-1932

8 Towner J Pourrut X Albarino C Nkogue C Bird B et al Marburg virusinfection detected in a common African bat PLoS ONE 2007 2e764

9 Pourrut X Souris M Towner J Rollin P Nichol S et al Large serologicalsurvey showing cocirculation of Ebola and Marburg viruses in Gabonesebat populations and a high seroprevalence of both viruses in Rousettusaegyptiacus BMC Infectious Diseases 2009 9159

10 Enright JB Bats and their Relation to Rabies Annual Review ofMicrobiology 1956 10369-392

11 Badrane H Tordo N Host Switching in Lyssavirus History from theChiroptera to the Carnivora Orders J Virol 2001 758096-8104

12 Li W Shi Z Yu M Ren W Smith C et al Bats Are Natural Reservoirs ofSARS-Like Coronaviruses Science 2005 310676-679

13 Lau SKP Woo PCY Li KSM Huang Y Tsoi H-W et al Severe acuterespiratory syndrome coronavirus-like virus in Chinese horseshoe batsProceedings of the National Academy of Sciences of the United States ofAmerica 2005 10214040-14045

14 Sulkin SE Allen R Virus Infections in Bats Monographs in Virology Basel SKarger 1974 8

15 Wilson DE Reeder DM Mammal Species of the World Baltimore TheJohns Hopkins University Press 2005

16 Devaraj SG Wang N Chen Z Chen Z Tseng M et al Regulation of IRF-3-dependent Innate Immunity by the Papain-like Protease Domain of theSevere Acute Respiratory Syndrome Coronavirus Journal of BiologicalChemistry 2007 28232208-32221

17 Basler CF Amarasinghe GK Evasion of Interferon Responses by Ebola andMarburg Viruses Journal of Interferon amp Cytokine Research 2009 29511-520

18 Park MS Shaw ML Munoz-Jordan J Cros JF Nakaya T et al Newcastledisease virus (NDV)-based assay demonstrates interferon-antagonistactivity for the NDV V protein and the Nipah virus V W and C proteinsJ Virol 2003 771501-1511

19 Brzozka K Finke S Conzelmann K-K Identification of the Rabies VirusAlphaBeta Interferon Antagonist Phosphoprotein P Interferes withPhosphorylation of Interferon Regulatory Factor 3 J Virol 2005797673-7681

20 Mendonccedila RZP Pereira CA Relationship of interferon synthesis and theresistance of mice infected with street rabies virus Brazilian Journal ofMedical and Biological Research 1994 27691-695

21 Sen GC Viruses and interferons Annu Rev Microbiol 2001 55255-28122 Stark GR Kerr IM Williams BR Silverman RH Schreiber RD How cells

respond to interferons Annu Rev Biochem 1998 67227-26423 Theofilopoulos AN Baccala R Beutler B Kono DH Type I interferons

(alphabeta) in immunity and autoimmunity Annu Rev Immunol 200523307-336

24 Pestka S Krause CD Walter MR Interferons interferon-like cytokines andtheir receptors Immunol Rev 2004 2028-32

25 Ohno S Taniguchi T Structure of a chromosomal gene for humaninterferon beta Proc Natl Acad Sci USA 1981 785305-5309

26 Cohen L Lacoste J Parniak M Daigneault L Skup D et al Stimulation ofinterferon beta gene transcription in vitro by purified NF-kappa B and anovel TH protein Cell Growth Differ 1991 2323-333

27 Pannisi E More Genomes But Shallower Coverage Science 20043041227

28 Bouck J Miller W Gorrell JH Muzny D Gibbs RA Analysis of the Qualityand Utility of Random Shotgun Sequencing at Low RedundanciesGenome Research 1998 81074-1084

29 Ohno S Evolution by Gene Duplication London George Allen and Unwin1970

30 Gruumlnwald PD Myung IJ Pitt MA Advances in Minimum DescriptionLength Theory and Applications Cambridge MIT Press 2005

31 Woelk CH Frost SDW Richman DD Higley PE Kosakovsky Pond SLEvolution of the interferon alpha gene family in eutherian mammalsGene 2007 39738-50

32 Walker A Roberts RM Characterization of the bovine type I IFN locusrearrangements expansions and novel subfamilies BMC Genomics 200910187

33 Lefegravevre F Boulay V A novel and atypical type one interferon geneexpressed by trophoblast during early pregnancy Journal of BiologicalChemistry 1993 26819760-19768

34 Roberts RM Chen Y Ezashi T Walker AM Interferons and the maternal-conceptus dialog in mammals Seminars in Cell amp Developmental Biology2008 19170-177

35 Blehert DS Hicks AC Behr M Meteyer CU Berlowski-Zier BM et al BatWhite-Nose Syndrome An Emerging Fungal Pathogen Science 2009323227

36 Larkin MA Blackshields G Brown NP Chenna R McGettigan PA et alClustal W and Clustal X version 20 Bioinformatics 2007 232947-2948

37 Do CB Mahabhashyam MSP Brudno M Batzoglou S ProbConsProbabilistic consistency-based multiple sequence alignment GenomeResearch 2005 15330-340

38 Felsenstein J PHYLIP - Phylogeny Inference Package (Version 32)Cladistics 1989 5164-166

39 Desper R Gascuel O Fast and Accurate Phylogeny ReconstructionAlgorithms Based on the Minimum-Evolution Principle Journal ofComputational Biology 2002 9687-705

doi1011861471-2164-11-444Cite this article as Kepler et al Chiropteran types I and II interferongenes inferred from genome sequencing traces by a statistical gene-family assembler BMC Genomics 2010 11444

Submit your next manuscript to BioMed Centraland take full advantage of

bull Convenient online submission

bull Thorough peer review

bull No space constraints or color figure charges

bull Immediate publication on acceptance

bull Inclusion in PubMed CAS Scopus and Google Scholar

bull Research which is freely available for redistribution

Submit your manuscript at wwwbiomedcentralcomsubmit

Kepler et al BMC Genomics 2010 11444httpwwwbiomedcentralcom1471-216411444

Page 12 of 12

  • Abstract
    • Background
    • Results
    • Conclusion
      • Background
        • Bat Genome Projects
          • Results
            • Inference of Gene Families From Trace Archives
              • Quality scores and conditional probabilities
                • Validation
                • Type I interferons in Myotis lucifugus and Pteropus vampyrus
                  • Chiropteran Type II interferon
                    • Phylogenetics
                    • Sequencing
                    • Gene expression
                      • Discussion
                      • Conclusions
                      • Methods
                        • Ethics Statement
                        • Animals and sample collection
                        • Cloning and Sequencing
                          • Quantitative RT-PCR
                            • Estimation of Gene Family Size
                            • Phylogenetic Inference
                              • Acknowledgements
                              • Author details
                              • Authors contributions
                              • References
Page 9: RESEARCH ARTICLE Open Access Chiropteran types I and II … · 2017. 4. 5. · RESEARCH ARTICLE Open Access Chiropteran types I and II interferon genes inferred from genome sequencing

that there the genes are distinct though they may repre-sent alleles rather than paralogsWe used the inferred sequences for P vampyrus

IFNA IFNB IFND and IFNK to design oligonucleotideprimers for cloning and sequencing and recovered atotal of 110 sequences The directly cloned sequencesvalidate much of the gene family inferences most of therepeated sequences are within a few bases of the nearestinferred gene A minority of the directly sequenced

genes are surprisingly far from the nearest inferredgene This circumstance may be an indication that thereare additional type-I interferon genes not covered by thePteropus sequencing traces and may also reflect signifi-cant population polymorphism in the wild batpopulationWe used these same primers to show using quantita-

tive RT-PRC that the P vampyrus candidate IFNB andOAS2 (a gene induced by IFNB in other mammals) are

Figure 5 Gene expression time course by qRT-PCR in fresh Pvampyrus PBMCs stimulated in culture by the indicated compoundExpression levels are relative to the housekeeping gene PPIA The error bars represent standard errors of the mean over biological duplicates

Kepler et al BMC Genomics 2010 11444httpwwwbiomedcentralcom1471-216411444

Page 9 of 12

expressed upon stimulation by type-I inducing agentsFurthermore the temporal trajectories of this expressionis consistent with the known mechanisms of such sig-naling The expression of IFNB under viral infection wasdelayed compared to that under stimulation by the TLRligands poly(IC) and LPS

ConclusionsThe bat has been implicated as a major reservoir forviruses of extreme pathogenicity in humans and sufferssubstantially less disease when infected by these virusesthan humans do In addition bat populations in NorthAmerica are declining rapidly as a result of white-nosesyndrome an emergent disease associated with a fungalpathogen [35] These facts and others suggest that thestudy of host defense and immunity in these uniquecreatures would benefit the pursuit of ecological andhuman health The major obstacle in this undertaking isthe lack of reagents that would make such investigationspossible One way to get this effort underway is to takeadvantage of the rich store of information contained inexisting partial genome sequences Although the infor-mation available in these genome databases requiresmore careful treatment for its extraction than isrequired for a complete assembled genome we havedeveloped methods that facilitate this task and havedemonstrated their utility

MethodsEthics StatementAll animals were handled in strict accordance with goodanimal practice as defined by the relevant national andor local animal welfare bodies and all animal work wascovered by an IACUC protocol from the Lubee BatConservancy (USDA Research Facility 58-R-0131)

Animals and sample collectionSubjects for this study included 10 male Large FlyingFoxes (Pteropus vampyrus) Some of the animals werewild caught and some were laboratory born all werereproductively mature ranging in age from 4 to 21 yrsAnimals were housed in captivity at the Lubee Bat Con-servancy in Gainesville Florida USA Animals werehoused together with conspecifics in an indooroutdoorcircular flight enclosure and were fed a mixture of fruitvegetables commercial primate chow and a vitaminsupplement Water was provided ad libitum The batswere captured manually anesthesia was mask-inducedwith 5 isoflurane in oxygen (3 Lmin) and the batswere maintained with 25 isoflurane Blood was col-lected from the left and right brachial veins using 3-mlsyringes and 25-gauge needles Blood samples weretransferred to heparanized tubes maintained and

shipped at room temperature (22degC) via overnightcourier

Cloning and SequencingGenomic DNA was prepared from whole unfractionatedperipheral blood from Pteropus vampyrus bats Using pri-mers derived from the inferred interferon gene sequencesand encoded for restriction enzyme cloning sites thegenes were PCR amplified from the genomic DNA Theforward and reverse primers encoded restriction sites forNheI and XhoI respectively which allowed cloning of thefull length gene into mammalian expression vectorpcDNA31(+) The PCR program consisted of an initiali-zation step at 94degC for 5 minutes then 30 cyclesof amplification consisting of denaturation at 94degC for1 minute annealing at 55degC for 30 seconds and elonga-tion at 72deg for 1 minute Amplification was followed by afinal elongation step at 72degC for 5 minutes and samplesheld at 4degC PCR was performed on a P times 2 ThermalCycler from ThermoElectron Corporation (WalthamMA)After purification the inserted genes were sequenced

using vector primers downstream and upstream of theinsert site Pteropus vampyrus IFNB IFNK IFND andIFNA sequences have been submitted to Genbank(GU126493 HM63650 HM636501 and HM636502)Quantitative RT-PCRWhole anticoagulated blood was collected from the bra-chial vein and layered over lymphocyte separation med-ium (Lympholyte M Cedarlane Labs) Cells werecollected from the interface washed and plated on 24well flat-bottom tissue culture plates (2 times 106 cellswell)in a total volume of 250 μl Cells were treated as indi-cated with either poly(IC)(Invivogen) LPS (Sigma) orVSV (Ramsburg lab stocks confirmed endotoxin freeusing LAL assay) Stimulants were diluted such that theaddition of 250 μl stimulant would give a final concentra-tion of 10 μgml poly IC 1 μgml LPS or an MOI of 5for VSV 500 μl of complete medium was added to theunstimulated control wells Cells were incubated at 37rsquoand 5 CO2 for the indicated times after which cellswere harvested into RNA lysis buffer (Qiagen RNEasykit) according to the manufacturerrsquos instructionsPurified RNA was prepared from these whole-cell

lysates as described in the protocols accompanying theQiagen RNeasy kit Qiashedder and Qiagen RNeasyMini kit and Qiagen DNase-Free DNase Set cDNAsynthesis was performed using the Invitrogen two-stepqRT-PCR kit according to the manufacturerrsquos protocolQuantitative real-time PCR was performed in an Eppen-dorf Mastercycler ep realplex thermal cycler usingSYBRGreenER qPCR Supermix and primers designedfrom the inferred gene sequences (additional file 3)

Kepler et al BMC Genomics 2010 11444httpwwwbiomedcentralcom1471-216411444

Page 10 of 12

Independent biological replicates were prepared foreach treatment and time point The log2 expressionlevel of IFNB and OAS2 relative to the housekeepinggene PPIA was estimated by subtracting the mean Ctvalue for each gene and treatmenttime point combina-tion from the corresponding mean Ct value for PPIA

Estimation of Gene Family SizeThe likelihood function for the observed set S ofsequences is a function of the true number g of genesand the sample size n by summing over the unobservednumber c of distinct genes sampled

P S g n P S c P c g nS c

c

( | ) ( | ) ( | )= sum (14)

The pmf on c is given by the recursion

P c g nc

gP c g n

g c

gP c g n

c( | ) ( | )

( | )

=

+ +

1

11 1

(15)

with Pc(1 | g1) = 1 and Pc(c | g1) = 0 for c ne 1PS(S | c) is approximated by minimizing the number

m of mutations over the assignments of observedsequences to c groups PS is then the posterior probabil-ity of getting m mutations given the observed numberof mutations in the IFNB and IFNK datasetsNote that this technique is independent of the assem-

bly methods described elsewhere in the paper

Phylogenetic InferenceThe set consisting of type-I interferon amino acidsequences from human (Homo sapiens) domestic pig(Sus scrofa) duckbill platypus (Ornithorhynchus anati-nus) and the amino acid sequences corresponding to theChiropteran genes inferred in this paper was submittedto multiple sequence alignment by CLUSTALW 2010[36] and ProbCons 112 [37] Maximum likelihood andminimum evolution phylogenies were inferred from eachresulting multiple sequence alignment Maximum likeli-hood phylogenies were inferred with proml found in thePHYLIP 368 package [38] Minimum evolution phyloge-nies were inferred using fastme [39] based on the dis-tance matrix calculated with protdist found in PHYLIPIn both cases 1000 boot-strap replications were gener-ated using seqboot and the consensus phylogeny wasassembled with consense both from the PHYLIP pack-age FigTree 121 httptreebioedacuksoftwarefig-tree was used for initial visualization and finalproduction of figures of the resulting phylogenetic treesAlignments of the bat DNA sequences and of all theamino acid sequences are available in additional files 4

Additional material

Additional file 1 All IFNGbio An alignment of IFNG gene sequencesfrom Homo sapiens Pterops vampyrus and Myotis lucifugus as follows firstrow genomic sequence of human IFNG rows 2 3 predicted genomicsequence of IFNG in M lucifugus and P vampyrus respectively rows 4-7exons for human IFNG rows 8-11 predicted exons for M lucifugus rows12-15 predicted exons for P vampyrus row 16 spliced coding region forhuman rows 1718 predicted spliced coding region for M lucifugus andP vampyrus row 19 amino acid sequence for human interferon gammarows 2021 predicted amino acid sequence for chiropteran interferongamma Examination of this bio file requires the BioEdit program whichis freely available at httpwwwmbioncsueduBioEditbioedithtml

Additional file 2 Pva IFND clone-assembly comparisonfasta Thissequence alignment shows a comparison between Pteropus vampyrusIFND sequences obtained by 1) assembly from the genome sequencingtrace archives and 2) by direct cloning and sequencing The clonedsequences include only those found in at least two independent clonesThe assemblies include all those with intact ORFs plus three that appearto be pseudogenes

Additional file 3 Accession numbers and primers The Genbankaccession numbers of the IFN genes used in the phylogenetic analysisand the PCR primers used in the P vampyrus gene expression studies

Additional file 4 Mlu + Pva + Hsa + SscSelectedIntactOrfsAA-alignedfasta The amino acid sequences used in the phylogeneticanalysis

AcknowledgementsWe thank Brice Barefoot for his technical assistance and Rachel Willcutts forher help in the early stages of this work We are grateful to Brian Pope andTasha King at the Lubee Bat Conservancy for their expert assistanceDr Ramsburgrsquos work was supported by NIH grants UC6 AI058607 andAI51445 Dr Keplerrsquos work was supported by discretionary funds provided byDuke University No additional external funding was used No fundingsource had any influence on the study design on the collection analysis orinterpretation of data on the writing of the manuscript or on the decisionto submit the manuscript for publication

Author details1Center for Computational Immunology Duke University Medical CenterDurham NC USA 2Duke Human Vaccine Institute Duke University MedicalCenter Durham NC USA 3Lubee Bat Conservancy and Department ofWildlife Ecology and Conservation University of Florida Gainesville FL USA

Authorsrsquo contributionsER and TBK conceived the project and designed the experiments TBKdeveloped implemented and applied the statistical methods CS KH andAH performed the experiments JR and TBK performed the phylogeneticanalysis AW contributed critical reagents and assisted in interpretation ofresults TBK CS and ER wrote the manuscript All authors read and approvedthe final manuscript

Received 21 November 2009 Accepted 21 July 2010Published 21 July 2010

References1 Jones KE Patel NG Levy MA Storeygard A Balk D et al Global trends in

emerging infectious diseases Nature 2008 451990-9932 Calisher CH Childs JE Field HE Holmes KV Schountz T Bats important

reservoir hosts of emerging viruses Clin Microbiol Rev 2006 19531-5453 Messenger SRC Smith J Bats Emerging Virus Infections and the Rabies

Paradigm Bat Ecology Chicago and London University of ChicagoPressKunz TF M B 2005 622

4 Omatsu T Watanabe S Akashi H Yoshikawa Y Biological characters ofbats in relation to natural reservoir of emerging viruses ComparativeImmunology Microbiology and Infectious Diseases 2007 30357-374

Kepler et al BMC Genomics 2010 11444httpwwwbiomedcentralcom1471-216411444

Page 11 of 12

5 Wong S Lau S Woo P Yuen K-Y Bats as a continuing source ofemerging infections in humans Reviews in Medical Virology 2007 1767-91

6 Chua KB Lek Koh C Hooi PS Wee KF Khong JH et al Isolation of Nipahvirus from Malaysian Island flying-foxes Microbes and Infection 20024145-151

7 Halpin K Young PL Field HE Mackenzie JS Isolation of Hendra virus frompteropid bats a natural reservoir of Hendra virus Journal of GeneralVirology 2000 811927-1932

8 Towner J Pourrut X Albarino C Nkogue C Bird B et al Marburg virusinfection detected in a common African bat PLoS ONE 2007 2e764

9 Pourrut X Souris M Towner J Rollin P Nichol S et al Large serologicalsurvey showing cocirculation of Ebola and Marburg viruses in Gabonesebat populations and a high seroprevalence of both viruses in Rousettusaegyptiacus BMC Infectious Diseases 2009 9159

10 Enright JB Bats and their Relation to Rabies Annual Review ofMicrobiology 1956 10369-392

11 Badrane H Tordo N Host Switching in Lyssavirus History from theChiroptera to the Carnivora Orders J Virol 2001 758096-8104

12 Li W Shi Z Yu M Ren W Smith C et al Bats Are Natural Reservoirs ofSARS-Like Coronaviruses Science 2005 310676-679

13 Lau SKP Woo PCY Li KSM Huang Y Tsoi H-W et al Severe acuterespiratory syndrome coronavirus-like virus in Chinese horseshoe batsProceedings of the National Academy of Sciences of the United States ofAmerica 2005 10214040-14045

14 Sulkin SE Allen R Virus Infections in Bats Monographs in Virology Basel SKarger 1974 8

15 Wilson DE Reeder DM Mammal Species of the World Baltimore TheJohns Hopkins University Press 2005

16 Devaraj SG Wang N Chen Z Chen Z Tseng M et al Regulation of IRF-3-dependent Innate Immunity by the Papain-like Protease Domain of theSevere Acute Respiratory Syndrome Coronavirus Journal of BiologicalChemistry 2007 28232208-32221

17 Basler CF Amarasinghe GK Evasion of Interferon Responses by Ebola andMarburg Viruses Journal of Interferon amp Cytokine Research 2009 29511-520

18 Park MS Shaw ML Munoz-Jordan J Cros JF Nakaya T et al Newcastledisease virus (NDV)-based assay demonstrates interferon-antagonistactivity for the NDV V protein and the Nipah virus V W and C proteinsJ Virol 2003 771501-1511

19 Brzozka K Finke S Conzelmann K-K Identification of the Rabies VirusAlphaBeta Interferon Antagonist Phosphoprotein P Interferes withPhosphorylation of Interferon Regulatory Factor 3 J Virol 2005797673-7681

20 Mendonccedila RZP Pereira CA Relationship of interferon synthesis and theresistance of mice infected with street rabies virus Brazilian Journal ofMedical and Biological Research 1994 27691-695

21 Sen GC Viruses and interferons Annu Rev Microbiol 2001 55255-28122 Stark GR Kerr IM Williams BR Silverman RH Schreiber RD How cells

respond to interferons Annu Rev Biochem 1998 67227-26423 Theofilopoulos AN Baccala R Beutler B Kono DH Type I interferons

(alphabeta) in immunity and autoimmunity Annu Rev Immunol 200523307-336

24 Pestka S Krause CD Walter MR Interferons interferon-like cytokines andtheir receptors Immunol Rev 2004 2028-32

25 Ohno S Taniguchi T Structure of a chromosomal gene for humaninterferon beta Proc Natl Acad Sci USA 1981 785305-5309

26 Cohen L Lacoste J Parniak M Daigneault L Skup D et al Stimulation ofinterferon beta gene transcription in vitro by purified NF-kappa B and anovel TH protein Cell Growth Differ 1991 2323-333

27 Pannisi E More Genomes But Shallower Coverage Science 20043041227

28 Bouck J Miller W Gorrell JH Muzny D Gibbs RA Analysis of the Qualityand Utility of Random Shotgun Sequencing at Low RedundanciesGenome Research 1998 81074-1084

29 Ohno S Evolution by Gene Duplication London George Allen and Unwin1970

30 Gruumlnwald PD Myung IJ Pitt MA Advances in Minimum DescriptionLength Theory and Applications Cambridge MIT Press 2005

31 Woelk CH Frost SDW Richman DD Higley PE Kosakovsky Pond SLEvolution of the interferon alpha gene family in eutherian mammalsGene 2007 39738-50

32 Walker A Roberts RM Characterization of the bovine type I IFN locusrearrangements expansions and novel subfamilies BMC Genomics 200910187

33 Lefegravevre F Boulay V A novel and atypical type one interferon geneexpressed by trophoblast during early pregnancy Journal of BiologicalChemistry 1993 26819760-19768

34 Roberts RM Chen Y Ezashi T Walker AM Interferons and the maternal-conceptus dialog in mammals Seminars in Cell amp Developmental Biology2008 19170-177

35 Blehert DS Hicks AC Behr M Meteyer CU Berlowski-Zier BM et al BatWhite-Nose Syndrome An Emerging Fungal Pathogen Science 2009323227

36 Larkin MA Blackshields G Brown NP Chenna R McGettigan PA et alClustal W and Clustal X version 20 Bioinformatics 2007 232947-2948

37 Do CB Mahabhashyam MSP Brudno M Batzoglou S ProbConsProbabilistic consistency-based multiple sequence alignment GenomeResearch 2005 15330-340

38 Felsenstein J PHYLIP - Phylogeny Inference Package (Version 32)Cladistics 1989 5164-166

39 Desper R Gascuel O Fast and Accurate Phylogeny ReconstructionAlgorithms Based on the Minimum-Evolution Principle Journal ofComputational Biology 2002 9687-705

doi1011861471-2164-11-444Cite this article as Kepler et al Chiropteran types I and II interferongenes inferred from genome sequencing traces by a statistical gene-family assembler BMC Genomics 2010 11444

Submit your next manuscript to BioMed Centraland take full advantage of

bull Convenient online submission

bull Thorough peer review

bull No space constraints or color figure charges

bull Immediate publication on acceptance

bull Inclusion in PubMed CAS Scopus and Google Scholar

bull Research which is freely available for redistribution

Submit your manuscript at wwwbiomedcentralcomsubmit

Kepler et al BMC Genomics 2010 11444httpwwwbiomedcentralcom1471-216411444

Page 12 of 12

  • Abstract
    • Background
    • Results
    • Conclusion
      • Background
        • Bat Genome Projects
          • Results
            • Inference of Gene Families From Trace Archives
              • Quality scores and conditional probabilities
                • Validation
                • Type I interferons in Myotis lucifugus and Pteropus vampyrus
                  • Chiropteran Type II interferon
                    • Phylogenetics
                    • Sequencing
                    • Gene expression
                      • Discussion
                      • Conclusions
                      • Methods
                        • Ethics Statement
                        • Animals and sample collection
                        • Cloning and Sequencing
                          • Quantitative RT-PCR
                            • Estimation of Gene Family Size
                            • Phylogenetic Inference
                              • Acknowledgements
                              • Author details
                              • Authors contributions
                              • References
Page 10: RESEARCH ARTICLE Open Access Chiropteran types I and II … · 2017. 4. 5. · RESEARCH ARTICLE Open Access Chiropteran types I and II interferon genes inferred from genome sequencing

expressed upon stimulation by type-I inducing agentsFurthermore the temporal trajectories of this expressionis consistent with the known mechanisms of such sig-naling The expression of IFNB under viral infection wasdelayed compared to that under stimulation by the TLRligands poly(IC) and LPS

ConclusionsThe bat has been implicated as a major reservoir forviruses of extreme pathogenicity in humans and sufferssubstantially less disease when infected by these virusesthan humans do In addition bat populations in NorthAmerica are declining rapidly as a result of white-nosesyndrome an emergent disease associated with a fungalpathogen [35] These facts and others suggest that thestudy of host defense and immunity in these uniquecreatures would benefit the pursuit of ecological andhuman health The major obstacle in this undertaking isthe lack of reagents that would make such investigationspossible One way to get this effort underway is to takeadvantage of the rich store of information contained inexisting partial genome sequences Although the infor-mation available in these genome databases requiresmore careful treatment for its extraction than isrequired for a complete assembled genome we havedeveloped methods that facilitate this task and havedemonstrated their utility

MethodsEthics StatementAll animals were handled in strict accordance with goodanimal practice as defined by the relevant national andor local animal welfare bodies and all animal work wascovered by an IACUC protocol from the Lubee BatConservancy (USDA Research Facility 58-R-0131)

Animals and sample collectionSubjects for this study included 10 male Large FlyingFoxes (Pteropus vampyrus) Some of the animals werewild caught and some were laboratory born all werereproductively mature ranging in age from 4 to 21 yrsAnimals were housed in captivity at the Lubee Bat Con-servancy in Gainesville Florida USA Animals werehoused together with conspecifics in an indooroutdoorcircular flight enclosure and were fed a mixture of fruitvegetables commercial primate chow and a vitaminsupplement Water was provided ad libitum The batswere captured manually anesthesia was mask-inducedwith 5 isoflurane in oxygen (3 Lmin) and the batswere maintained with 25 isoflurane Blood was col-lected from the left and right brachial veins using 3-mlsyringes and 25-gauge needles Blood samples weretransferred to heparanized tubes maintained and

shipped at room temperature (22degC) via overnightcourier

Cloning and SequencingGenomic DNA was prepared from whole unfractionatedperipheral blood from Pteropus vampyrus bats Using pri-mers derived from the inferred interferon gene sequencesand encoded for restriction enzyme cloning sites thegenes were PCR amplified from the genomic DNA Theforward and reverse primers encoded restriction sites forNheI and XhoI respectively which allowed cloning of thefull length gene into mammalian expression vectorpcDNA31(+) The PCR program consisted of an initiali-zation step at 94degC for 5 minutes then 30 cyclesof amplification consisting of denaturation at 94degC for1 minute annealing at 55degC for 30 seconds and elonga-tion at 72deg for 1 minute Amplification was followed by afinal elongation step at 72degC for 5 minutes and samplesheld at 4degC PCR was performed on a P times 2 ThermalCycler from ThermoElectron Corporation (WalthamMA)After purification the inserted genes were sequenced

using vector primers downstream and upstream of theinsert site Pteropus vampyrus IFNB IFNK IFND andIFNA sequences have been submitted to Genbank(GU126493 HM63650 HM636501 and HM636502)Quantitative RT-PCRWhole anticoagulated blood was collected from the bra-chial vein and layered over lymphocyte separation med-ium (Lympholyte M Cedarlane Labs) Cells werecollected from the interface washed and plated on 24well flat-bottom tissue culture plates (2 times 106 cellswell)in a total volume of 250 μl Cells were treated as indi-cated with either poly(IC)(Invivogen) LPS (Sigma) orVSV (Ramsburg lab stocks confirmed endotoxin freeusing LAL assay) Stimulants were diluted such that theaddition of 250 μl stimulant would give a final concentra-tion of 10 μgml poly IC 1 μgml LPS or an MOI of 5for VSV 500 μl of complete medium was added to theunstimulated control wells Cells were incubated at 37rsquoand 5 CO2 for the indicated times after which cellswere harvested into RNA lysis buffer (Qiagen RNEasykit) according to the manufacturerrsquos instructionsPurified RNA was prepared from these whole-cell

lysates as described in the protocols accompanying theQiagen RNeasy kit Qiashedder and Qiagen RNeasyMini kit and Qiagen DNase-Free DNase Set cDNAsynthesis was performed using the Invitrogen two-stepqRT-PCR kit according to the manufacturerrsquos protocolQuantitative real-time PCR was performed in an Eppen-dorf Mastercycler ep realplex thermal cycler usingSYBRGreenER qPCR Supermix and primers designedfrom the inferred gene sequences (additional file 3)

Kepler et al BMC Genomics 2010 11444httpwwwbiomedcentralcom1471-216411444

Page 10 of 12

Independent biological replicates were prepared foreach treatment and time point The log2 expressionlevel of IFNB and OAS2 relative to the housekeepinggene PPIA was estimated by subtracting the mean Ctvalue for each gene and treatmenttime point combina-tion from the corresponding mean Ct value for PPIA

Estimation of Gene Family SizeThe likelihood function for the observed set S ofsequences is a function of the true number g of genesand the sample size n by summing over the unobservednumber c of distinct genes sampled

P S g n P S c P c g nS c

c

( | ) ( | ) ( | )= sum (14)

The pmf on c is given by the recursion

P c g nc

gP c g n

g c

gP c g n

c( | ) ( | )

( | )

=

+ +

1

11 1

(15)

with Pc(1 | g1) = 1 and Pc(c | g1) = 0 for c ne 1PS(S | c) is approximated by minimizing the number

m of mutations over the assignments of observedsequences to c groups PS is then the posterior probabil-ity of getting m mutations given the observed numberof mutations in the IFNB and IFNK datasetsNote that this technique is independent of the assem-

bly methods described elsewhere in the paper

Phylogenetic InferenceThe set consisting of type-I interferon amino acidsequences from human (Homo sapiens) domestic pig(Sus scrofa) duckbill platypus (Ornithorhynchus anati-nus) and the amino acid sequences corresponding to theChiropteran genes inferred in this paper was submittedto multiple sequence alignment by CLUSTALW 2010[36] and ProbCons 112 [37] Maximum likelihood andminimum evolution phylogenies were inferred from eachresulting multiple sequence alignment Maximum likeli-hood phylogenies were inferred with proml found in thePHYLIP 368 package [38] Minimum evolution phyloge-nies were inferred using fastme [39] based on the dis-tance matrix calculated with protdist found in PHYLIPIn both cases 1000 boot-strap replications were gener-ated using seqboot and the consensus phylogeny wasassembled with consense both from the PHYLIP pack-age FigTree 121 httptreebioedacuksoftwarefig-tree was used for initial visualization and finalproduction of figures of the resulting phylogenetic treesAlignments of the bat DNA sequences and of all theamino acid sequences are available in additional files 4

Additional material

Additional file 1 All IFNGbio An alignment of IFNG gene sequencesfrom Homo sapiens Pterops vampyrus and Myotis lucifugus as follows firstrow genomic sequence of human IFNG rows 2 3 predicted genomicsequence of IFNG in M lucifugus and P vampyrus respectively rows 4-7exons for human IFNG rows 8-11 predicted exons for M lucifugus rows12-15 predicted exons for P vampyrus row 16 spliced coding region forhuman rows 1718 predicted spliced coding region for M lucifugus andP vampyrus row 19 amino acid sequence for human interferon gammarows 2021 predicted amino acid sequence for chiropteran interferongamma Examination of this bio file requires the BioEdit program whichis freely available at httpwwwmbioncsueduBioEditbioedithtml

Additional file 2 Pva IFND clone-assembly comparisonfasta Thissequence alignment shows a comparison between Pteropus vampyrusIFND sequences obtained by 1) assembly from the genome sequencingtrace archives and 2) by direct cloning and sequencing The clonedsequences include only those found in at least two independent clonesThe assemblies include all those with intact ORFs plus three that appearto be pseudogenes

Additional file 3 Accession numbers and primers The Genbankaccession numbers of the IFN genes used in the phylogenetic analysisand the PCR primers used in the P vampyrus gene expression studies

Additional file 4 Mlu + Pva + Hsa + SscSelectedIntactOrfsAA-alignedfasta The amino acid sequences used in the phylogeneticanalysis

AcknowledgementsWe thank Brice Barefoot for his technical assistance and Rachel Willcutts forher help in the early stages of this work We are grateful to Brian Pope andTasha King at the Lubee Bat Conservancy for their expert assistanceDr Ramsburgrsquos work was supported by NIH grants UC6 AI058607 andAI51445 Dr Keplerrsquos work was supported by discretionary funds provided byDuke University No additional external funding was used No fundingsource had any influence on the study design on the collection analysis orinterpretation of data on the writing of the manuscript or on the decisionto submit the manuscript for publication

Author details1Center for Computational Immunology Duke University Medical CenterDurham NC USA 2Duke Human Vaccine Institute Duke University MedicalCenter Durham NC USA 3Lubee Bat Conservancy and Department ofWildlife Ecology and Conservation University of Florida Gainesville FL USA

Authorsrsquo contributionsER and TBK conceived the project and designed the experiments TBKdeveloped implemented and applied the statistical methods CS KH andAH performed the experiments JR and TBK performed the phylogeneticanalysis AW contributed critical reagents and assisted in interpretation ofresults TBK CS and ER wrote the manuscript All authors read and approvedthe final manuscript

Received 21 November 2009 Accepted 21 July 2010Published 21 July 2010

References1 Jones KE Patel NG Levy MA Storeygard A Balk D et al Global trends in

emerging infectious diseases Nature 2008 451990-9932 Calisher CH Childs JE Field HE Holmes KV Schountz T Bats important

reservoir hosts of emerging viruses Clin Microbiol Rev 2006 19531-5453 Messenger SRC Smith J Bats Emerging Virus Infections and the Rabies

Paradigm Bat Ecology Chicago and London University of ChicagoPressKunz TF M B 2005 622

4 Omatsu T Watanabe S Akashi H Yoshikawa Y Biological characters ofbats in relation to natural reservoir of emerging viruses ComparativeImmunology Microbiology and Infectious Diseases 2007 30357-374

Kepler et al BMC Genomics 2010 11444httpwwwbiomedcentralcom1471-216411444

Page 11 of 12

5 Wong S Lau S Woo P Yuen K-Y Bats as a continuing source ofemerging infections in humans Reviews in Medical Virology 2007 1767-91

6 Chua KB Lek Koh C Hooi PS Wee KF Khong JH et al Isolation of Nipahvirus from Malaysian Island flying-foxes Microbes and Infection 20024145-151

7 Halpin K Young PL Field HE Mackenzie JS Isolation of Hendra virus frompteropid bats a natural reservoir of Hendra virus Journal of GeneralVirology 2000 811927-1932

8 Towner J Pourrut X Albarino C Nkogue C Bird B et al Marburg virusinfection detected in a common African bat PLoS ONE 2007 2e764

9 Pourrut X Souris M Towner J Rollin P Nichol S et al Large serologicalsurvey showing cocirculation of Ebola and Marburg viruses in Gabonesebat populations and a high seroprevalence of both viruses in Rousettusaegyptiacus BMC Infectious Diseases 2009 9159

10 Enright JB Bats and their Relation to Rabies Annual Review ofMicrobiology 1956 10369-392

11 Badrane H Tordo N Host Switching in Lyssavirus History from theChiroptera to the Carnivora Orders J Virol 2001 758096-8104

12 Li W Shi Z Yu M Ren W Smith C et al Bats Are Natural Reservoirs ofSARS-Like Coronaviruses Science 2005 310676-679

13 Lau SKP Woo PCY Li KSM Huang Y Tsoi H-W et al Severe acuterespiratory syndrome coronavirus-like virus in Chinese horseshoe batsProceedings of the National Academy of Sciences of the United States ofAmerica 2005 10214040-14045

14 Sulkin SE Allen R Virus Infections in Bats Monographs in Virology Basel SKarger 1974 8

15 Wilson DE Reeder DM Mammal Species of the World Baltimore TheJohns Hopkins University Press 2005

16 Devaraj SG Wang N Chen Z Chen Z Tseng M et al Regulation of IRF-3-dependent Innate Immunity by the Papain-like Protease Domain of theSevere Acute Respiratory Syndrome Coronavirus Journal of BiologicalChemistry 2007 28232208-32221

17 Basler CF Amarasinghe GK Evasion of Interferon Responses by Ebola andMarburg Viruses Journal of Interferon amp Cytokine Research 2009 29511-520

18 Park MS Shaw ML Munoz-Jordan J Cros JF Nakaya T et al Newcastledisease virus (NDV)-based assay demonstrates interferon-antagonistactivity for the NDV V protein and the Nipah virus V W and C proteinsJ Virol 2003 771501-1511

19 Brzozka K Finke S Conzelmann K-K Identification of the Rabies VirusAlphaBeta Interferon Antagonist Phosphoprotein P Interferes withPhosphorylation of Interferon Regulatory Factor 3 J Virol 2005797673-7681

20 Mendonccedila RZP Pereira CA Relationship of interferon synthesis and theresistance of mice infected with street rabies virus Brazilian Journal ofMedical and Biological Research 1994 27691-695

21 Sen GC Viruses and interferons Annu Rev Microbiol 2001 55255-28122 Stark GR Kerr IM Williams BR Silverman RH Schreiber RD How cells

respond to interferons Annu Rev Biochem 1998 67227-26423 Theofilopoulos AN Baccala R Beutler B Kono DH Type I interferons

(alphabeta) in immunity and autoimmunity Annu Rev Immunol 200523307-336

24 Pestka S Krause CD Walter MR Interferons interferon-like cytokines andtheir receptors Immunol Rev 2004 2028-32

25 Ohno S Taniguchi T Structure of a chromosomal gene for humaninterferon beta Proc Natl Acad Sci USA 1981 785305-5309

26 Cohen L Lacoste J Parniak M Daigneault L Skup D et al Stimulation ofinterferon beta gene transcription in vitro by purified NF-kappa B and anovel TH protein Cell Growth Differ 1991 2323-333

27 Pannisi E More Genomes But Shallower Coverage Science 20043041227

28 Bouck J Miller W Gorrell JH Muzny D Gibbs RA Analysis of the Qualityand Utility of Random Shotgun Sequencing at Low RedundanciesGenome Research 1998 81074-1084

29 Ohno S Evolution by Gene Duplication London George Allen and Unwin1970

30 Gruumlnwald PD Myung IJ Pitt MA Advances in Minimum DescriptionLength Theory and Applications Cambridge MIT Press 2005

31 Woelk CH Frost SDW Richman DD Higley PE Kosakovsky Pond SLEvolution of the interferon alpha gene family in eutherian mammalsGene 2007 39738-50

32 Walker A Roberts RM Characterization of the bovine type I IFN locusrearrangements expansions and novel subfamilies BMC Genomics 200910187

33 Lefegravevre F Boulay V A novel and atypical type one interferon geneexpressed by trophoblast during early pregnancy Journal of BiologicalChemistry 1993 26819760-19768

34 Roberts RM Chen Y Ezashi T Walker AM Interferons and the maternal-conceptus dialog in mammals Seminars in Cell amp Developmental Biology2008 19170-177

35 Blehert DS Hicks AC Behr M Meteyer CU Berlowski-Zier BM et al BatWhite-Nose Syndrome An Emerging Fungal Pathogen Science 2009323227

36 Larkin MA Blackshields G Brown NP Chenna R McGettigan PA et alClustal W and Clustal X version 20 Bioinformatics 2007 232947-2948

37 Do CB Mahabhashyam MSP Brudno M Batzoglou S ProbConsProbabilistic consistency-based multiple sequence alignment GenomeResearch 2005 15330-340

38 Felsenstein J PHYLIP - Phylogeny Inference Package (Version 32)Cladistics 1989 5164-166

39 Desper R Gascuel O Fast and Accurate Phylogeny ReconstructionAlgorithms Based on the Minimum-Evolution Principle Journal ofComputational Biology 2002 9687-705

doi1011861471-2164-11-444Cite this article as Kepler et al Chiropteran types I and II interferongenes inferred from genome sequencing traces by a statistical gene-family assembler BMC Genomics 2010 11444

Submit your next manuscript to BioMed Centraland take full advantage of

bull Convenient online submission

bull Thorough peer review

bull No space constraints or color figure charges

bull Immediate publication on acceptance

bull Inclusion in PubMed CAS Scopus and Google Scholar

bull Research which is freely available for redistribution

Submit your manuscript at wwwbiomedcentralcomsubmit

Kepler et al BMC Genomics 2010 11444httpwwwbiomedcentralcom1471-216411444

Page 12 of 12

  • Abstract
    • Background
    • Results
    • Conclusion
      • Background
        • Bat Genome Projects
          • Results
            • Inference of Gene Families From Trace Archives
              • Quality scores and conditional probabilities
                • Validation
                • Type I interferons in Myotis lucifugus and Pteropus vampyrus
                  • Chiropteran Type II interferon
                    • Phylogenetics
                    • Sequencing
                    • Gene expression
                      • Discussion
                      • Conclusions
                      • Methods
                        • Ethics Statement
                        • Animals and sample collection
                        • Cloning and Sequencing
                          • Quantitative RT-PCR
                            • Estimation of Gene Family Size
                            • Phylogenetic Inference
                              • Acknowledgements
                              • Author details
                              • Authors contributions
                              • References
Page 11: RESEARCH ARTICLE Open Access Chiropteran types I and II … · 2017. 4. 5. · RESEARCH ARTICLE Open Access Chiropteran types I and II interferon genes inferred from genome sequencing

Independent biological replicates were prepared foreach treatment and time point The log2 expressionlevel of IFNB and OAS2 relative to the housekeepinggene PPIA was estimated by subtracting the mean Ctvalue for each gene and treatmenttime point combina-tion from the corresponding mean Ct value for PPIA

Estimation of Gene Family SizeThe likelihood function for the observed set S ofsequences is a function of the true number g of genesand the sample size n by summing over the unobservednumber c of distinct genes sampled

P S g n P S c P c g nS c

c

( | ) ( | ) ( | )= sum (14)

The pmf on c is given by the recursion

P c g nc

gP c g n

g c

gP c g n

c( | ) ( | )

( | )

=

+ +

1

11 1

(15)

with Pc(1 | g1) = 1 and Pc(c | g1) = 0 for c ne 1PS(S | c) is approximated by minimizing the number

m of mutations over the assignments of observedsequences to c groups PS is then the posterior probabil-ity of getting m mutations given the observed numberof mutations in the IFNB and IFNK datasetsNote that this technique is independent of the assem-

bly methods described elsewhere in the paper

Phylogenetic InferenceThe set consisting of type-I interferon amino acidsequences from human (Homo sapiens) domestic pig(Sus scrofa) duckbill platypus (Ornithorhynchus anati-nus) and the amino acid sequences corresponding to theChiropteran genes inferred in this paper was submittedto multiple sequence alignment by CLUSTALW 2010[36] and ProbCons 112 [37] Maximum likelihood andminimum evolution phylogenies were inferred from eachresulting multiple sequence alignment Maximum likeli-hood phylogenies were inferred with proml found in thePHYLIP 368 package [38] Minimum evolution phyloge-nies were inferred using fastme [39] based on the dis-tance matrix calculated with protdist found in PHYLIPIn both cases 1000 boot-strap replications were gener-ated using seqboot and the consensus phylogeny wasassembled with consense both from the PHYLIP pack-age FigTree 121 httptreebioedacuksoftwarefig-tree was used for initial visualization and finalproduction of figures of the resulting phylogenetic treesAlignments of the bat DNA sequences and of all theamino acid sequences are available in additional files 4

Additional material

Additional file 1 All IFNGbio An alignment of IFNG gene sequencesfrom Homo sapiens Pterops vampyrus and Myotis lucifugus as follows firstrow genomic sequence of human IFNG rows 2 3 predicted genomicsequence of IFNG in M lucifugus and P vampyrus respectively rows 4-7exons for human IFNG rows 8-11 predicted exons for M lucifugus rows12-15 predicted exons for P vampyrus row 16 spliced coding region forhuman rows 1718 predicted spliced coding region for M lucifugus andP vampyrus row 19 amino acid sequence for human interferon gammarows 2021 predicted amino acid sequence for chiropteran interferongamma Examination of this bio file requires the BioEdit program whichis freely available at httpwwwmbioncsueduBioEditbioedithtml

Additional file 2 Pva IFND clone-assembly comparisonfasta Thissequence alignment shows a comparison between Pteropus vampyrusIFND sequences obtained by 1) assembly from the genome sequencingtrace archives and 2) by direct cloning and sequencing The clonedsequences include only those found in at least two independent clonesThe assemblies include all those with intact ORFs plus three that appearto be pseudogenes

Additional file 3 Accession numbers and primers The Genbankaccession numbers of the IFN genes used in the phylogenetic analysisand the PCR primers used in the P vampyrus gene expression studies

Additional file 4 Mlu + Pva + Hsa + SscSelectedIntactOrfsAA-alignedfasta The amino acid sequences used in the phylogeneticanalysis

AcknowledgementsWe thank Brice Barefoot for his technical assistance and Rachel Willcutts forher help in the early stages of this work We are grateful to Brian Pope andTasha King at the Lubee Bat Conservancy for their expert assistanceDr Ramsburgrsquos work was supported by NIH grants UC6 AI058607 andAI51445 Dr Keplerrsquos work was supported by discretionary funds provided byDuke University No additional external funding was used No fundingsource had any influence on the study design on the collection analysis orinterpretation of data on the writing of the manuscript or on the decisionto submit the manuscript for publication

Author details1Center for Computational Immunology Duke University Medical CenterDurham NC USA 2Duke Human Vaccine Institute Duke University MedicalCenter Durham NC USA 3Lubee Bat Conservancy and Department ofWildlife Ecology and Conservation University of Florida Gainesville FL USA

Authorsrsquo contributionsER and TBK conceived the project and designed the experiments TBKdeveloped implemented and applied the statistical methods CS KH andAH performed the experiments JR and TBK performed the phylogeneticanalysis AW contributed critical reagents and assisted in interpretation ofresults TBK CS and ER wrote the manuscript All authors read and approvedthe final manuscript

Received 21 November 2009 Accepted 21 July 2010Published 21 July 2010

References1 Jones KE Patel NG Levy MA Storeygard A Balk D et al Global trends in

emerging infectious diseases Nature 2008 451990-9932 Calisher CH Childs JE Field HE Holmes KV Schountz T Bats important

reservoir hosts of emerging viruses Clin Microbiol Rev 2006 19531-5453 Messenger SRC Smith J Bats Emerging Virus Infections and the Rabies

Paradigm Bat Ecology Chicago and London University of ChicagoPressKunz TF M B 2005 622

4 Omatsu T Watanabe S Akashi H Yoshikawa Y Biological characters ofbats in relation to natural reservoir of emerging viruses ComparativeImmunology Microbiology and Infectious Diseases 2007 30357-374

Kepler et al BMC Genomics 2010 11444httpwwwbiomedcentralcom1471-216411444

Page 11 of 12

5 Wong S Lau S Woo P Yuen K-Y Bats as a continuing source ofemerging infections in humans Reviews in Medical Virology 2007 1767-91

6 Chua KB Lek Koh C Hooi PS Wee KF Khong JH et al Isolation of Nipahvirus from Malaysian Island flying-foxes Microbes and Infection 20024145-151

7 Halpin K Young PL Field HE Mackenzie JS Isolation of Hendra virus frompteropid bats a natural reservoir of Hendra virus Journal of GeneralVirology 2000 811927-1932

8 Towner J Pourrut X Albarino C Nkogue C Bird B et al Marburg virusinfection detected in a common African bat PLoS ONE 2007 2e764

9 Pourrut X Souris M Towner J Rollin P Nichol S et al Large serologicalsurvey showing cocirculation of Ebola and Marburg viruses in Gabonesebat populations and a high seroprevalence of both viruses in Rousettusaegyptiacus BMC Infectious Diseases 2009 9159

10 Enright JB Bats and their Relation to Rabies Annual Review ofMicrobiology 1956 10369-392

11 Badrane H Tordo N Host Switching in Lyssavirus History from theChiroptera to the Carnivora Orders J Virol 2001 758096-8104

12 Li W Shi Z Yu M Ren W Smith C et al Bats Are Natural Reservoirs ofSARS-Like Coronaviruses Science 2005 310676-679

13 Lau SKP Woo PCY Li KSM Huang Y Tsoi H-W et al Severe acuterespiratory syndrome coronavirus-like virus in Chinese horseshoe batsProceedings of the National Academy of Sciences of the United States ofAmerica 2005 10214040-14045

14 Sulkin SE Allen R Virus Infections in Bats Monographs in Virology Basel SKarger 1974 8

15 Wilson DE Reeder DM Mammal Species of the World Baltimore TheJohns Hopkins University Press 2005

16 Devaraj SG Wang N Chen Z Chen Z Tseng M et al Regulation of IRF-3-dependent Innate Immunity by the Papain-like Protease Domain of theSevere Acute Respiratory Syndrome Coronavirus Journal of BiologicalChemistry 2007 28232208-32221

17 Basler CF Amarasinghe GK Evasion of Interferon Responses by Ebola andMarburg Viruses Journal of Interferon amp Cytokine Research 2009 29511-520

18 Park MS Shaw ML Munoz-Jordan J Cros JF Nakaya T et al Newcastledisease virus (NDV)-based assay demonstrates interferon-antagonistactivity for the NDV V protein and the Nipah virus V W and C proteinsJ Virol 2003 771501-1511

19 Brzozka K Finke S Conzelmann K-K Identification of the Rabies VirusAlphaBeta Interferon Antagonist Phosphoprotein P Interferes withPhosphorylation of Interferon Regulatory Factor 3 J Virol 2005797673-7681

20 Mendonccedila RZP Pereira CA Relationship of interferon synthesis and theresistance of mice infected with street rabies virus Brazilian Journal ofMedical and Biological Research 1994 27691-695

21 Sen GC Viruses and interferons Annu Rev Microbiol 2001 55255-28122 Stark GR Kerr IM Williams BR Silverman RH Schreiber RD How cells

respond to interferons Annu Rev Biochem 1998 67227-26423 Theofilopoulos AN Baccala R Beutler B Kono DH Type I interferons

(alphabeta) in immunity and autoimmunity Annu Rev Immunol 200523307-336

24 Pestka S Krause CD Walter MR Interferons interferon-like cytokines andtheir receptors Immunol Rev 2004 2028-32

25 Ohno S Taniguchi T Structure of a chromosomal gene for humaninterferon beta Proc Natl Acad Sci USA 1981 785305-5309

26 Cohen L Lacoste J Parniak M Daigneault L Skup D et al Stimulation ofinterferon beta gene transcription in vitro by purified NF-kappa B and anovel TH protein Cell Growth Differ 1991 2323-333

27 Pannisi E More Genomes But Shallower Coverage Science 20043041227

28 Bouck J Miller W Gorrell JH Muzny D Gibbs RA Analysis of the Qualityand Utility of Random Shotgun Sequencing at Low RedundanciesGenome Research 1998 81074-1084

29 Ohno S Evolution by Gene Duplication London George Allen and Unwin1970

30 Gruumlnwald PD Myung IJ Pitt MA Advances in Minimum DescriptionLength Theory and Applications Cambridge MIT Press 2005

31 Woelk CH Frost SDW Richman DD Higley PE Kosakovsky Pond SLEvolution of the interferon alpha gene family in eutherian mammalsGene 2007 39738-50

32 Walker A Roberts RM Characterization of the bovine type I IFN locusrearrangements expansions and novel subfamilies BMC Genomics 200910187

33 Lefegravevre F Boulay V A novel and atypical type one interferon geneexpressed by trophoblast during early pregnancy Journal of BiologicalChemistry 1993 26819760-19768

34 Roberts RM Chen Y Ezashi T Walker AM Interferons and the maternal-conceptus dialog in mammals Seminars in Cell amp Developmental Biology2008 19170-177

35 Blehert DS Hicks AC Behr M Meteyer CU Berlowski-Zier BM et al BatWhite-Nose Syndrome An Emerging Fungal Pathogen Science 2009323227

36 Larkin MA Blackshields G Brown NP Chenna R McGettigan PA et alClustal W and Clustal X version 20 Bioinformatics 2007 232947-2948

37 Do CB Mahabhashyam MSP Brudno M Batzoglou S ProbConsProbabilistic consistency-based multiple sequence alignment GenomeResearch 2005 15330-340

38 Felsenstein J PHYLIP - Phylogeny Inference Package (Version 32)Cladistics 1989 5164-166

39 Desper R Gascuel O Fast and Accurate Phylogeny ReconstructionAlgorithms Based on the Minimum-Evolution Principle Journal ofComputational Biology 2002 9687-705

doi1011861471-2164-11-444Cite this article as Kepler et al Chiropteran types I and II interferongenes inferred from genome sequencing traces by a statistical gene-family assembler BMC Genomics 2010 11444

Submit your next manuscript to BioMed Centraland take full advantage of

bull Convenient online submission

bull Thorough peer review

bull No space constraints or color figure charges

bull Immediate publication on acceptance

bull Inclusion in PubMed CAS Scopus and Google Scholar

bull Research which is freely available for redistribution

Submit your manuscript at wwwbiomedcentralcomsubmit

Kepler et al BMC Genomics 2010 11444httpwwwbiomedcentralcom1471-216411444

Page 12 of 12

  • Abstract
    • Background
    • Results
    • Conclusion
      • Background
        • Bat Genome Projects
          • Results
            • Inference of Gene Families From Trace Archives
              • Quality scores and conditional probabilities
                • Validation
                • Type I interferons in Myotis lucifugus and Pteropus vampyrus
                  • Chiropteran Type II interferon
                    • Phylogenetics
                    • Sequencing
                    • Gene expression
                      • Discussion
                      • Conclusions
                      • Methods
                        • Ethics Statement
                        • Animals and sample collection
                        • Cloning and Sequencing
                          • Quantitative RT-PCR
                            • Estimation of Gene Family Size
                            • Phylogenetic Inference
                              • Acknowledgements
                              • Author details
                              • Authors contributions
                              • References
Page 12: RESEARCH ARTICLE Open Access Chiropteran types I and II … · 2017. 4. 5. · RESEARCH ARTICLE Open Access Chiropteran types I and II interferon genes inferred from genome sequencing

5 Wong S Lau S Woo P Yuen K-Y Bats as a continuing source ofemerging infections in humans Reviews in Medical Virology 2007 1767-91

6 Chua KB Lek Koh C Hooi PS Wee KF Khong JH et al Isolation of Nipahvirus from Malaysian Island flying-foxes Microbes and Infection 20024145-151

7 Halpin K Young PL Field HE Mackenzie JS Isolation of Hendra virus frompteropid bats a natural reservoir of Hendra virus Journal of GeneralVirology 2000 811927-1932

8 Towner J Pourrut X Albarino C Nkogue C Bird B et al Marburg virusinfection detected in a common African bat PLoS ONE 2007 2e764

9 Pourrut X Souris M Towner J Rollin P Nichol S et al Large serologicalsurvey showing cocirculation of Ebola and Marburg viruses in Gabonesebat populations and a high seroprevalence of both viruses in Rousettusaegyptiacus BMC Infectious Diseases 2009 9159

10 Enright JB Bats and their Relation to Rabies Annual Review ofMicrobiology 1956 10369-392

11 Badrane H Tordo N Host Switching in Lyssavirus History from theChiroptera to the Carnivora Orders J Virol 2001 758096-8104

12 Li W Shi Z Yu M Ren W Smith C et al Bats Are Natural Reservoirs ofSARS-Like Coronaviruses Science 2005 310676-679

13 Lau SKP Woo PCY Li KSM Huang Y Tsoi H-W et al Severe acuterespiratory syndrome coronavirus-like virus in Chinese horseshoe batsProceedings of the National Academy of Sciences of the United States ofAmerica 2005 10214040-14045

14 Sulkin SE Allen R Virus Infections in Bats Monographs in Virology Basel SKarger 1974 8

15 Wilson DE Reeder DM Mammal Species of the World Baltimore TheJohns Hopkins University Press 2005

16 Devaraj SG Wang N Chen Z Chen Z Tseng M et al Regulation of IRF-3-dependent Innate Immunity by the Papain-like Protease Domain of theSevere Acute Respiratory Syndrome Coronavirus Journal of BiologicalChemistry 2007 28232208-32221

17 Basler CF Amarasinghe GK Evasion of Interferon Responses by Ebola andMarburg Viruses Journal of Interferon amp Cytokine Research 2009 29511-520

18 Park MS Shaw ML Munoz-Jordan J Cros JF Nakaya T et al Newcastledisease virus (NDV)-based assay demonstrates interferon-antagonistactivity for the NDV V protein and the Nipah virus V W and C proteinsJ Virol 2003 771501-1511

19 Brzozka K Finke S Conzelmann K-K Identification of the Rabies VirusAlphaBeta Interferon Antagonist Phosphoprotein P Interferes withPhosphorylation of Interferon Regulatory Factor 3 J Virol 2005797673-7681

20 Mendonccedila RZP Pereira CA Relationship of interferon synthesis and theresistance of mice infected with street rabies virus Brazilian Journal ofMedical and Biological Research 1994 27691-695

21 Sen GC Viruses and interferons Annu Rev Microbiol 2001 55255-28122 Stark GR Kerr IM Williams BR Silverman RH Schreiber RD How cells

respond to interferons Annu Rev Biochem 1998 67227-26423 Theofilopoulos AN Baccala R Beutler B Kono DH Type I interferons

(alphabeta) in immunity and autoimmunity Annu Rev Immunol 200523307-336

24 Pestka S Krause CD Walter MR Interferons interferon-like cytokines andtheir receptors Immunol Rev 2004 2028-32

25 Ohno S Taniguchi T Structure of a chromosomal gene for humaninterferon beta Proc Natl Acad Sci USA 1981 785305-5309

26 Cohen L Lacoste J Parniak M Daigneault L Skup D et al Stimulation ofinterferon beta gene transcription in vitro by purified NF-kappa B and anovel TH protein Cell Growth Differ 1991 2323-333

27 Pannisi E More Genomes But Shallower Coverage Science 20043041227

28 Bouck J Miller W Gorrell JH Muzny D Gibbs RA Analysis of the Qualityand Utility of Random Shotgun Sequencing at Low RedundanciesGenome Research 1998 81074-1084

29 Ohno S Evolution by Gene Duplication London George Allen and Unwin1970

30 Gruumlnwald PD Myung IJ Pitt MA Advances in Minimum DescriptionLength Theory and Applications Cambridge MIT Press 2005

31 Woelk CH Frost SDW Richman DD Higley PE Kosakovsky Pond SLEvolution of the interferon alpha gene family in eutherian mammalsGene 2007 39738-50

32 Walker A Roberts RM Characterization of the bovine type I IFN locusrearrangements expansions and novel subfamilies BMC Genomics 200910187

33 Lefegravevre F Boulay V A novel and atypical type one interferon geneexpressed by trophoblast during early pregnancy Journal of BiologicalChemistry 1993 26819760-19768

34 Roberts RM Chen Y Ezashi T Walker AM Interferons and the maternal-conceptus dialog in mammals Seminars in Cell amp Developmental Biology2008 19170-177

35 Blehert DS Hicks AC Behr M Meteyer CU Berlowski-Zier BM et al BatWhite-Nose Syndrome An Emerging Fungal Pathogen Science 2009323227

36 Larkin MA Blackshields G Brown NP Chenna R McGettigan PA et alClustal W and Clustal X version 20 Bioinformatics 2007 232947-2948

37 Do CB Mahabhashyam MSP Brudno M Batzoglou S ProbConsProbabilistic consistency-based multiple sequence alignment GenomeResearch 2005 15330-340

38 Felsenstein J PHYLIP - Phylogeny Inference Package (Version 32)Cladistics 1989 5164-166

39 Desper R Gascuel O Fast and Accurate Phylogeny ReconstructionAlgorithms Based on the Minimum-Evolution Principle Journal ofComputational Biology 2002 9687-705

doi1011861471-2164-11-444Cite this article as Kepler et al Chiropteran types I and II interferongenes inferred from genome sequencing traces by a statistical gene-family assembler BMC Genomics 2010 11444

Submit your next manuscript to BioMed Centraland take full advantage of

bull Convenient online submission

bull Thorough peer review

bull No space constraints or color figure charges

bull Immediate publication on acceptance

bull Inclusion in PubMed CAS Scopus and Google Scholar

bull Research which is freely available for redistribution

Submit your manuscript at wwwbiomedcentralcomsubmit

Kepler et al BMC Genomics 2010 11444httpwwwbiomedcentralcom1471-216411444

Page 12 of 12

  • Abstract
    • Background
    • Results
    • Conclusion
      • Background
        • Bat Genome Projects
          • Results
            • Inference of Gene Families From Trace Archives
              • Quality scores and conditional probabilities
                • Validation
                • Type I interferons in Myotis lucifugus and Pteropus vampyrus
                  • Chiropteran Type II interferon
                    • Phylogenetics
                    • Sequencing
                    • Gene expression
                      • Discussion
                      • Conclusions
                      • Methods
                        • Ethics Statement
                        • Animals and sample collection
                        • Cloning and Sequencing
                          • Quantitative RT-PCR
                            • Estimation of Gene Family Size
                            • Phylogenetic Inference
                              • Acknowledgements
                              • Author details
                              • Authors contributions
                              • References