
Complete mitochondrial genome sequence from an endangered Indian snake, Python molurus molurus (Serpentes, Pythonidae)

Bhawna Dubey • P. R. Meganathan •

Ikramul Haque

Received: 24 May 2011 / Accepted: 25 January 2012 / Published online: 14 February 2012

© Springer Science+Business Media B.V. 2012

Abstract This paper reports the complete mitochondrial

genome sequence of an endangered Indian snake, Python

molurus molurus (Indian Rock Python). A typical snake

mitochondrial (mt) genome of 17258 bp, comprising

37 genes including the 13 protein coding genes, 22 tRNA

genes, and 2 ribosomal RNA genes along with duplicate

control regions is described herein. The P. molurus molurus

mt. genome is relatively similar to other snake mt. genomes

with respect to gene arrangement, composition, tRNA

structures and skews of AT/GC bases. The nucleotide composition

of the genome shows that there are more A and C than

T and G on the positive strand, as revealed by positive AT and

CG skews. Comparison of individual protein coding genes

with other snake genomes suggests that ATP8 and NADH3

genes have high divergence rates. Codon usage analysis

reveals a preference of NNC codons over NNG codons in the

mt. genome of P. molurus. Also, the synonymous and non-

synonymous substitution rates (ka/ks) suggest that most of

the protein coding genes are under purifying selection pres-

sure. The phylogenetic analyses involving the concatenated

13 protein coding genes of P. molurus molurus conformed to

the previously established snake phylogeny.

Keywords Codon usage · Mitochondrial genome · Python molurus · Pythonidae · Phylogeny

Introduction

Complete mitochondrial genome studies lead to a better

insight into the biology and evolution of vertebrates. Mito-

chondrial DNA (mtDNA) has been frequently used for

phylogenetic studies because of conservative gene order,

absence of introns, lack of recombination, maternal inheri-

tance, and presence of various protein coding genes, which

provide evolutionary milieu of the genome. Also, the

applications of mtDNA for wildlife conservation approa-

ches, population structure studies and conservation genetics

involve the use of rapidly evolving mtDNA sequences which

may be used along with other markers like microsatellites

and SNPs [1, 2]. Mitochondrial DNA sequences can also be

used to identify cryptic species/hybrid individuals in a

population [3]. Furthermore, many studies, including ours,

have also reported the successful use of mtDNA in forensic

identification of endangered species, thereby aiding in their

conservation programs [4–11]. Phylogenetic tree construc-

tion is another approach commonly used in species identi-

fication, where whole mtDNA sequences/genes are used to

establish the evolutionary relationships of species in ques-

tion to the closest reference sequence [12, 13].

The Indian Rock Python (Python molurus molurus) and

its products are of significant commercial importance in the

Electronic supplementary material The online version of this article (doi:10.1007/s11033-012-1572-5) contains supplementary material, which is available to authorized users.

B. Dubey · P. R. Meganathan · I. Haque (✉)

National DNA Analysis Centre, Central Forensic Science

Laboratory, 30-Gorachand Road, Kolkata 700 014,

West Bengal, India

e-mail: [email protected]

Present Address: B. Dubey

Centre for Cellular and Molecular Biology, Uppal Road,

Hyderabad 500 007, India

Present Address: P. R. Meganathan

Department of Biochemistry and Molecular Biology, Institute

for Genomics, Biocomputing and Biotechnology,

Mississippi State University, Mississippi State, MS 39762, USA

Mol Biol Rep (2012) 39:7403–7412

DOI 10.1007/s11033-012-1572-5

national and international markets and as a result the

population density of the species has been alarmingly

depleted [14]. In view of this, Indian pythons are a species of great conservation concern (listed in the IUCN Red List 2009 as lower risk, near threatened [15]; the subspecies P. molurus molurus is listed in Appendix I of CITES, whereas all other subspecies of P. molurus and species of Pythonidae are listed in Appendix II).

Therefore, additional information garnered from the field

of genetics/molecular biology could be of immense value

for conservation efforts of this species. In this regard, there

is always a need to use large quantity of data for better

understanding of phylogenetic/evolutionary relationships

[16]. Further, the insights into the genome makeup and

variations in the endangered species can find conservation

applications in wildlife health management of populations

in wild or captivity [17]. Although the complete mtDNA sequences of approximately 35 snake species are presently available in public databases and have contributed to the study of evolutionary dynamics and phylogenetics of snakes, the complete mtDNA of only one species of family Pythonidae has been described [18, 19]. Hitherto, partial sequences of mtDNA for comparison at species level

have been used to estimate Python phylogenetics [20];

nevertheless, use of complete mtDNAs is essential for

robust comparison, as each of the mt. genes may provide

answers to different phylogenetic questions [21, 22].

In view of the above it is considered imperative to

generate complete mitochondrial DNA sequences for the

endangered Indian snake, P. molurus molurus, which can

be used in phylogenetic, gene variability and identification

studies to facilitate the understanding of genetic relation-

ships among snakes. We have also used this data to analyze

relationships within major snake lineages and phylogenetic

position of P. molurus molurus in family Pythonidae.

Materials and methods

DNA extraction

Blood and tissue samples of P. molurus molurus were

obtained from Snake Transit House, Jabalpur, Madhya

Pradesh and Chennai Snake Park Trust, Chennai. DNA

extraction from tissues was carried out using DNeasy

Tissue kit (Qiagen), and standard Phenol–Chloroform

method [23] was used for blood samples.

Primer Design

In order to make use of Long and Accurate PCR strategy

[24], we designed specific primer sets LAF1–LAR1 and

LAF2–LAR2 (Table 1) which will result in the amplification

of whole mitochondrial genome in two overlapping pieces of

approximately 9 and 10 kb each. These primer sets were

targeted in the 12S rRNA/COX II gene region and 16S

rRNA/ND4 gene sequences (our ‘‘unpublished data’’) of mt.

genome of P. molurus.

Also, a series of internal primer sets (Table 1) were

designed by making use of conserved sites in the alignment,

produced by aligning the complete mitochondrial gene

sequences of available snake species from public databases.

Polymerase chain reaction and DNA sequencing

In order to amplify relatively longer regions, the conditions of long and accurate PCR (LA-PCR) were utilized. LA PCRs were set in a total volume of 50 µl which contained 5 µl of 10× LA PCR™ buffer (with Mg2+), 8 µl of dNTP mixture (Takara Bio Inc., Japan), 1.0 µl each of 10 µM primers, 0.5 µl of Takara LA Taq™ polymerase (Takara Bio Inc., Japan) and 3.0 µl genomic DNA. PCR was performed on a GeneAmp® PCR System 9700 (Applied Biosystems) with the following conditions: initial denaturation at 94 °C for 5 min followed by 30 cycles of denaturation (94 °C for 30 s), annealing (57 °C for 30 s) and extension (68 °C for 8 min), followed by 10 min elongation at 68 °C. Amplified products were purified using low melting agarose. Each of the long amplified fragments was then used as template for the amplifications using internal primers.

PCRs for targeting smaller regions were set in a volume of 25 µl, containing 1–2 µl of DNA template, 5 mM MgCl2, 2.5 mM dNTPs, 0.2 µM of each primer, 2.5 µl 10× buffer and 1.5 U of Taq polymerase (Invitrogen Life Technologies, Brazil) on a GeneAmp® PCR System 9700 (Applied Biosystems). The PCR conditions were: initial denaturation at 94 °C for 5 min followed by 35 cycles of denaturation (94 °C for 30 s), respective annealing (Table 1) for 30 s, and extension (72 °C for 30 s), with a final extension at 72 °C for 5 min followed by a 4 °C hold. The amplified fragments were visualized on 2% agarose gel using ethidium bromide stain (0.5 µg/ml). The generated amplicons were purified using the ExoSAP treatment and then cycle sequenced using the BigDye® Terminator v3.1 cycle sequencing kit (Applied Biosystems, Foster City, CA). DNA sequencing was performed on a 3100 Avant Genetic Analyzer (Applied Biosystems).

Annotation/Data analyses

The genome was assembled using BioEdit [25]. Transfer

RNAs and their structures were identified using ARWEN

program [26]. The tRNAs were then used to estimate the

boundaries of protein coding genes, rRNAs and control

regions. Finally, the positions of start and stop codons were

used to assess the sizes of protein coding genes (PCG).


Table 1 List of primers used and their respective annealing temperatures

Primer     Primer sequence                         Annealing temperature (°C)
RTF1       AAA GCA CAG CAC TGA AAA TGC             52
RTR1       TTC TTG CTA AAC CAT GAT GC
L85512S    GCG YAC ACA CCG CCC GTC                 54
H38316S    AAR GKD GAA CTW AKA TTC HKT TT
L28716S    GTR GCA AAA GAG TGG RAA GAC             55
H106216S   GCT TCA CAG GGT CTT CTG GTC TTA
L94316S    TTR TAG ACC HGT ATG AAA GGC             52
H294ND1    TGG TAT KGG TAW KGG BGC TCA
L204ND1    ACC CAC ACT TTC CTC CCC AAT CCT ATT     60
H258CONT   CAT TGA ACG ACC AAG AAA TGA GGA GCT AA
L61CONT    TAC CCC CCC CCA CTT ACA TAG GAG GAA T   53
H713CONT   GCA TTA AGA GAT GTA AGC CTC AAG GGA
L687CONT   TCC CTT GAG GCT TAC ATC TCT TAA TGC     53
H413ND2    GGB GCD AKT TTY TGT CAK GT
L325ND2    GCH CCA YTY CAY TYY TGA                 50
H60ASN     GTY TGK RYD RCW ARY WGT WRA A
L18TRP     AA ACT AGD RRC CTT CAA AG               51
H57COI     GTA KAG KGT KCC RAT RTC TTT
L1324COI   TAC TCD GAY TWY CCW GAY GC              53
H200CO2    GTT CAD GCD GCY TCT ART TGT TC
L64CO2     TTY YTA CAY GAC CAY GTV YT              49
H26ATP6    AAT TGW TCR AAT ATR TTT AT
L19ATP6    GAA CAA TTC GCA AGC CCA GAA AT          56
H182CO3    AYR TCK CGT CAY CAT TG
L28CO3     TWG THG AYC CVA GCC CWT GRC C           54
H125ND3    GGG TCR AAK CCR CAT TCR TA
L28GLY     TGC CYT CCA AGC AYT WRG HCC C           60
H239ND4L   GCD ACD ACW AGG CTD AGK CC
L92ND4L    TAT GYV TDG AAR CAA TRA TAC             51
H655ND4    CBA CRT GRG CTT TKG GKA GYC A
L595ND4    GCH TTY HTR GCW AAAA ATA CC             50
H75LEU     CTA YYA CTT GGA KTY GCR CC
L28LEU     TGG TCT TAG GCA CCA AWA YHC TT          52
H752ND5    ACW ACT ATK GTR CTD GAR TG
L500ND5    TWC ARG CYA TYR TYT AYA AYC G           49
H1175ND5   ATD GTR TCT TTD GAR TAR AAV CC
L1000 ND5  GCA CTA CTA TTC CTA TGT TCA GGA TC      55
H1636ND5   TTA ATT CTG TTG AGG TTT GTT GGC TGA
L92ND4L    TAT GYV TDG AAR CAA TRA TAC             57
H655ND4    CBA CRT GRG CTT TKG GKA GYC A
L1550ND5   AAC CAA TTA GCT TTT TTC AAT CTC C       52
H138CYT    CTG RAY DGC TAG RAA RAA DCC
RC2F       GAA AAA CCA CCG TTG TTA ATC AAC TA      50
RC2R       TTA CAA GAA CAA TGC TTT
L982CYT    ACA TGA RCH GCH WCH AAA CC              51
H505CONT   TGC GAC CAA AGG TCT TGG AAA AAG C
LAF1       ACA GAA GAA GTA GAA CAA CTA GAA GC      68
LAR1       AGT TAC ACC TCG ACC TGT CGT GTT A
LAF2       ATA AGA CCA GAA GAC CCT GTG AAG CT      68
LAR2       AGR TCW GTT TGT TGB GRG CAD GTD AG
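Many of the primers in Table 1 contain degenerate positions (R, Y, W, K, B, D, H, V), which follow the standard IUPAC nucleotide code. As a minimal sketch, not part of the original methods, the following Python snippet counts how many distinct oligonucleotides a degenerate primer represents. The function name `degeneracy` is a hypothetical helper introduced here for illustration; the two sequences are copied from Table 1.

```python
from math import prod

# IUPAC nucleotide codes mapped to the bases each symbol can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Return the number of distinct sequences encoded by a degenerate primer.

    Hypothetical helper for illustration; spaces between triplets are ignored.
    """
    bases = primer.replace(" ", "").upper()
    return prod(len(IUPAC[base]) for base in bases)


# Two primers copied from Table 1.
rtf1 = "AAA GCA CAG CAC TGA AAA TGC"         # no degenerate positions
lar2 = "AGR TCW GTT TGT TGB GRG CAD GTD AG"  # degenerate at R, W, B, R, D, D

print("RTF1 degeneracy:", degeneracy(rtf1))
print("LAR2 degeneracy:", degeneracy(lar2))
```

A higher degeneracy means the primer pool is spread over more sequence variants, which is one reason the species-specific long PCR primers LAF1, LAR1 and LAF2 carry no degenerate bases.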


Further, the proteins were translated to amino acid

sequences and compared with corresponding genes from

available snake mt. genomes to check for the annotations.

The rRNA genes were identified by aligning with other

available snake mtDNA sequences. The nucleotide base

composition for different PCGs was calculated using

MEGA 3.1 [27]. Codon usage for all PCGs of P. molurus

molurus was estimated using CodonW (J. Peden;

http://molbiol.ox.ac.uk/Win95.codonW.zip) and also the

AT and GC content and base skews were calculated to

analyze the genome. The codon usage and other features of

the PCGs of P. molurus molurus were also compared with

two other snakes, P. regius and Boa constrictor. Synony-

mous and non-synonymous substitution rates were calcu-

lated using the ka/ks calculator [28].
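The annotation check described above, in which protein coding genes are translated to amino acid sequences, depends on the vertebrate mitochondrial genetic code rather than the standard code. The sketch below is an illustration under stated assumptions and not the authors' pipeline: it builds the codon table from the NCBI translation table 2 string, and the `translate` function and the toy sequence are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Amino acids for the 64 codons, in TCAG order, under the vertebrate
# mitochondrial genetic code (NCBI translation table 2):
# TGA codes for Trp, ATA for Met, and AGA/AGG are stop codons.
VERT_MT = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), VERT_MT)
}


def translate(cds: str) -> str:
    """Translate a coding sequence up to the first stop codon.

    Hypothetical helper: trailing bases that do not form a full codon
    (for example an incomplete T or TA stop) are ignored, and an initiator
    GTG is read as valine by this simple lookup.
    """
    cds = cds.upper()
    protein = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Toy sequence (not from the genome): ATA GCC TGA AGA
print(translate("ATAGCCTGAAGA"))
```

Under the standard code the same toy sequence would stop at TGA; under table 2 it is read through to the AGA stop, which is the stop codon listed for COX I in the two pythons.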

Phylogenetic analyses

Phylogenetic analysis was performed in two datasets using

complete nucleotide sequences of all mitochondrial protein

coding genes (dataset I) and partial cyt b gene sequences of

species belonging to Pythonidae (dataset II) (Supplemen-

tary Material 1). Complete gene sequences from snake

species representing major snake families and available in

public databases were downloaded and aligned against the

gene sequences of P. molurus molurus generated in this

study using Clustal X [29]. Ambiguously aligned positions

were eliminated to result in alignment of 11180 bp for 19

taxa, forming dataset I. Scolecophidians (blind snakes)

are the sister group to alethinophidians (true snakes); hence,

Ramphotyphlops braminus, a scolecophidian snake, was

used to root dataset I. Partial cyt b gene sequences were by

far the most prevalent sequence data in the Pythonidae, and

therefore these were utilized to result in the alignment of

307 bp for 16 taxa including P. molurus molurus to form

dataset II. Many molecular studies suggest that Pythons

appear to be more closely related to archaic macrostomatan

snakes (Loxocemus, [30, 31]) than the Boines, therefore,

phylogenetic analyses for dataset II were performed using

Loxocemus as the outgroup taxon.

Three different analyses were performed on both the

datasets, (1) A maximum likelihood (ML) tree

(1,000 bootstrap replicates) was computed using PHYML

[32] under the GTR + I + gamma model as selected by

Treefinder software [33], (2) Bayesian analysis was per-

formed using MrBayes [34] with the maximum likelihood

model employed six substitution types (‘‘nst = 6’’) and

rate variation across sites modeled using a gamma distri-

bution (rates = ‘‘gamma’’). The Markov chain Monte

Carlo search was run with 4 chains for 500,000 generations,

with trees being sampled every 100 generations (the

first 10,000 trees were discarded as ‘‘burnin’’), (3) Maxi-

mum parsimony analyses were carried out using Phylip

software [35] with nonparametric bootstrapping used to

assess support for the nodes in the MP analyses with 1,000

replicates.
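Nonparametric bootstrapping, used above to assess node support, resamples alignment columns with replacement to build pseudo-replicate alignments of the original length; a tree is then inferred from each replicate. The snippet below is a minimal sketch of the resampling step only, not the PHYML or Phylip implementation. The function `bootstrap_alignment`, the toy alignment and the random seed are assumptions made for illustration.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement (one bootstrap replicate).

    `alignment` maps taxon names to aligned sequences of equal length.
    Hypothetical helper for illustration only.
    """
    n_sites = len(next(iter(alignment.values())))
    # Draw n_sites column indices with replacement.
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {
        taxon: "".join(seq[c] for c in columns)
        for taxon, seq in alignment.items()
    }


# Toy alignment (not real data) and a fixed seed so the run is repeatable.
toy = {"taxon_A": "ACGTAC", "taxon_B": "ACGTTC", "taxon_C": "ATGTAC"}
rng = random.Random(1)

# 1,000 replicates, matching the number used for the ML and MP analyses.
replicates = [bootstrap_alignment(toy, rng) for _ in range(1000)]
print(len(replicates), "replicates of", len(replicates[0]["taxon_A"]), "sites")
```

The bootstrap support of a node is then the fraction of replicate trees in which that node is recovered.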

Results and discussion

Snake mt. genomes are of great interest in understanding

mitogenomic evolution because of gene duplications and

rearrangements and the fast evolutionary rate of their genes

compared to other vertebrates [36]. Also, the complete

mtDNA information of endangered species can be a useful

tool in the area of conservation genetics of snakes.

Therefore, in the present study we sequenced the complete

mt. genome of P. molurus molurus (an endangered snake)

and present the comparative analyses of some of the

important genomic features.

Genome organization

The gene arrangement of P. molurus molurus mitochon-

drial genome is shown in Fig. 1 and is similar to that

described for other snake species [18, 19, 37]. The

P. molurus mtDNA is a typical circular molecule, of length

17258 bp, with all the 37 genes including the 13 PCGs, 22

tRNAs and 2 ribosomal RNAs that are usually present in

bilaterian mt. genomes.

Fig. 1 Gene map of P. molurus molurus mitochondrial DNA (22

tRNAs are abbreviated using one letter code for the corresponding

amino acid)


Protein coding genes and nucleotide composition

The 12 PCGs, ND1, ND2, COI, COII, ATPase 8, ATPase

6, ND3, COIII, ND4L, ND4, ND5 and Cyt b, were located

on the H-strand whereas the gene, ND6 was located on the

L-strand of mtDNA. Mitochondrial genes may use several

substitutes to ATG as start codon [38] and similarly, all the

PCGs of P. molurus begin with one of the common start

codons ATG/ATA/ATT, except for COX I and COX II

which begin with GTG, which has been known to initiate

these and other genes in many metazoans [39] including

reptiles [40, 41]. Out of the 13 PCGs, 7 show incomplete

stop codon TA/T. This phenomenon has been described in

other species also, where polyadenylation after transcrip-

tion leads to completion of partial T/TA codon into a

functional (TAA) termination codon [42]. Metazoan

mtDNA is compact and contains some overlaps

between genes. In P. molurus molurus, overlaps were

found between the genes ATP 6/ATP 8 (9 bp) and ND

5/ND 6 (4 bp). Overlapping genes found in mtDNAs lead

to an assumption that the ‘‘polycistron model’’ may not

hold true universally, since it would not be possible to

generate full-length RNAs with overlapping message from

a single transcript [43]. In addition, ATP6 and ATP8 genes

commonly overlap in chordate mtDNA and are known to

be translated from the same bicistronic mRNA [44].
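The completion of incomplete stop codons mentioned above can be pictured as padding the partial codon with adenines supplied by the poly(A) tail. The following sketch is purely illustrative; `complete_stop` is a hypothetical helper and does not model the transcript processing itself.

```python
def complete_stop(partial: str) -> str:
    """Pad an incomplete T or TA stop codon with A residues to give TAA.

    Hypothetical helper: it only mimics the outcome of polyadenylation
    on the last one or two bases of the coding sequence.
    """
    if partial not in ("T", "TA"):
        raise ValueError("expected an incomplete stop codon 'T' or 'TA'")
    return partial + "A" * (3 - len(partial))


for partial in ("T", "TA"):
    print(partial, "->", complete_stop(partial))
```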

The Python molurus sequence showed ~91% average similarity to the P. regius mtDNA sequence. The nucleotide variability of each mitochondrial PCG was estimated by calculating gene-by-gene overall genetic distances among P. molurus, P. regius (another species of Pythonidae) and Boa constrictor (a member of the nearest related family, Boidae), for which complete mtDNA sequences are available. ATP 8 was the least conserved gene, both in terms of pairwise amino acid identity among the three snakes (57% on average, range 44–80%) and in terms of genetic distance. The distance values were highest for ATP 8 (0.6136) followed by NADH 3 (0.2852), whereas the most conserved gene was COX I, with the lowest genetic distance value (0.0289) (Table 2). Also, the number of codons for each gene was more or less similar in the three species compared, except for the COX I gene, which was longer in the pythons (534 codons) than in Boa constrictor (482 codons). Some basic features of the PCGs (number of codons, start/stop codons) compared among the three snakes are given in Table 2.

The proportion of nucleotides in various genes of P. molurus and two of the related species is given in Table 3. The general trend of nucleotide composition for the majority of the PCGs was observed to be A (29–39%) > C (24–32%) > T (19–27%) > G (10–17%), except for genes COX III and ND6, where the nucleotide composition followed the order C > A > T > G and G > A > T > C, respectively. The A + T content of the 13 PCGs was found to be 57.4% and the overall A + T content was 58%. The CG skew [calculated as (C% − G%)/(C% + G%)] and AT skew [calculated as (A% − T%)/(A% + T%)] are good indicators of strand specific nucleotide frequency bias [45, 46]. CG skews were found to be positive in all the positive strand encoded genes and negative in the negative strand encoded gene (NADH 6), and a similar trend was observed for the AT skew values (Table 3). The asymmetry in the nucleotide composition between the two strands is well known [46], where there are more A and C% than T% and G% on the

Table 2 Characteristics of protein-coding genes of P. molurus, and comparison with P. regius and B. constrictor

          No. of codons    Percent amino acid similarity   Predicted start/stop codon   Overall genetic
Protein   PM    PR    BC   PM/PR   PR/BC   PM/BC           PM        PR        BC        distances
ATP 8     56    56    56   80.3    46.4    44.6            ATG/TAA   ATG/TAA   ATG/TAA   0.6136
ATP 6     227   227   226  78.4    77.0    92.5            ATG/TAA   ATG/TA    ATG/TAG   0.2537
COX I     534   534   482  99.0    86.7    86.1            GTG/AGA   GTG/AGA   GTG/TA    0.0289
COX II    229   229   229  97.8    87.7    87.3            GTG/TA    GTG/TA    GTG/TAA   0.2124
COX III   261   261   261  90.0    88.8    98.0            ATG/T     ATG/T     ATG/T     0.2017
CYT B     370   370   371  89.9    84.0    78.8            ATG/T     ATG/T     ATG/T     0.2261
NADH1     321   321   322  96.5    86.6    85.7            ATA/T     ATA/T     ATA/T     0.1795
NADH2     344   344   343  96.5    72.0    72.0            ATT/TAA   ATT/TAA   ATA/TAA   0.2191
NADH3     114   114   114  89.4    75.4    74.5            ATA/T     ATA/T     ATA/T     0.2852
NADH4     452   452   452  72.1    61.7    59.7            ATG/A     ATG/A     ATG/G     0.2729
NADH4L    96    96    96   90.6    73.9    72.9            ATG/TA    ATG/TA    ATG/TA    0.2398
NADH5     598   598   597  89.2    75.5    74.7            ATG/TAA   ATG/TAA   ATG/TAA   0.2488
NADH6     171   171   169  99.0    59.6    59.0            ATG/TAG   ATG/TAG   ATA/TAG   0.2373

PM, Python molurus; PR, Python regius; BC, Boa constrictor


positive strand. The base compositions in P. molurus

mtDNAs are skewed similarly to other vertebrate mtDNAs

[47], with greater A ? C content in the gene-rich strand

than in the gene-poor strand.
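The skew definitions given in this section can be checked directly against the base frequencies in Table 3. The snippet below is a small worked sketch: the `skew` function is a hypothetical helper that applies the formula (X − Y)/(X + Y), and the two sets of frequencies are copied from the ATP 6 and NADH 6 rows of Table 3.

```python
def skew(x: float, y: float) -> float:
    """Return the skew (x - y) / (x + y) for two base frequencies."""
    return (x - y) / (x + y)


# Base frequencies copied from Table 3 (one H-strand gene, one L-strand gene).
table3 = {
    "ATP 6":  {"A": 0.333, "C": 0.301, "G": 0.100, "T": 0.266},
    "NADH 6": {"A": 0.146, "C": 0.080, "G": 0.314, "T": 0.460},
}

for gene, freq in table3.items():
    at_skew = skew(freq["A"], freq["T"])
    cg_skew = skew(freq["C"], freq["G"])
    print(f"{gene}: AT skew = {at_skew:.3f}, CG skew = {cg_skew:.3f}")
```

The computed values agree with the skews listed in Table 3 to within rounding: both are positive for ATP 6 and both are negative for NADH 6, the only protein coding gene on the L-strand.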

Codon usage

The vast majority of prokaryotic and eukaryotic species

have non-random codon usage and the patterns of nucle-

otide usage are of great importance in the definition and

functional investigation of coding regions

(http://www.nematode.net). The codon usage is influenced by a complex

association of mutational pressures, selection constraints

and genetic drift [48]. The codon usage bias varies within

and among genomes which can facilitate the understanding

of evolution and environmental adaptation of organisms. In

order to facilitate the examination of the codon preference

in three snake species, we analyzed the frequencies of

synonymous codons in the mt. genomes. A comparative analysis of codon usage in the three organisms, measured in terms of relative synonymous codon usage (RSCU; the observed frequency of a codon relative to the frequency expected if all synonymous codons for that amino acid were used equally) (Supplementary Material 2), shows that P. molurus follows a pattern similar to that of the other snakes, except for glycine, where GGC (RSCU = 2.11) was preferred at the expense of GGA, and tyrosine, where UAU (RSCU = 1.268) was preferred over UAC in P. molurus. In the absence of any codon usage bias, the RSCU value should be 1.00, whereas a codon that is used less frequently than expected will have a value of less than 1.00 and vice versa for a codon that is used more frequently than expected [49].

All the RSCU values deviating from 1.0 in Python mtDNA

suggest a bias in general for NNC codons over their NNG

counterparts. This appears reasonable as the nucleotide

composition of P. molurus shows the coding strand to be

rich in Cs over Gs. The total number of codons used by

P. molurus was 3775, followed by B. constrictor (3,721)

and P. regius (3,442 codons).
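The RSCU measure used in this section divides the observed count of a codon by the count expected if all synonymous codons for that amino acid were used equally. The sketch below illustrates the calculation for a single amino acid; it is not the CodonW implementation, the `rscu` function is a hypothetical helper, and the glycine counts are invented for illustration rather than taken from Supplementary Material 2.

```python
def rscu(counts):
    """Relative synonymous codon usage for the codons of ONE amino acid.

    `counts` maps each synonymous codon to its observed count.
    Hypothetical helper for illustration only.
    """
    expected = sum(counts.values()) / len(counts)  # equal-usage expectation
    if expected == 0:
        return {codon: 0.0 for codon in counts}
    return {codon: n / expected for codon, n in counts.items()}


# Invented counts for the four glycine codons (not data from the paper).
glycine_counts = {"GGA": 30, "GGC": 50, "GGG": 10, "GGU": 10}

for codon, value in rscu(glycine_counts).items():
    print(f"{codon}: RSCU = {value:.2f}")
```

With these invented counts the NNC codon GGC has an RSCU above 1 and the NNG codon GGG an RSCU below 1, the direction of bias reported for the P. molurus mt. genome.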

Transfer RNAs

The mt. genome of P. molurus bears all the 22 tRNAs

commonly found in metazoan mtDNA. All tRNAs possess

typical cloverleaf secondary structure (Supplementary

Material 3) except for tRNA Ser, where DHU loop was

absent. This is a common feature of vertebrate tRNA Ser

[50]. The TΨC stem is usually 4–5 nucleotides (nlt) in

most of the tRNAs but shortened to 3 nlt in case of tRNA

Gly and tRNA Met and composed of just 2 base pairs in

tRNA Phe. This shortening of the TΨC stem has been previously

reported in snake mt. genome [18]. Several mismatched

nucleotide pairs were found in the stems and most

of them were accompanied by a neighboring G–C pair,

probably to impart compensatory stability to the arms. In

vertebrates, at position 8 adjacent to the amino-acyl stem, a

conserved ‘T’ is usually present [51], which was found to

be replaced by ‘A’ in five of the tRNAs: tRNA-Ser (UCN),

tRNA Lys, tRNA Asn, tRNA Leu (CUN) and tRNA Arg.

Similar replacement has been reported in avian mtDNA

[50] in case of first four tRNAs mentioned above.

Non-coding region

Snake mitochondria are reported to contain duplicate

control regions [18], and the same was found in P. molurus.

Table 3 Nucleotide composition and skews of P. molurus molurus mitochondrial protein-coding and ribosomal RNA genes

Gene (+/−) strand   A       C       G       T       AT skew   CG skew
ATP 6 (+)           0.333   0.301   0.1     0.266   0.112     0.501
ATP 8 (+)           0.363   0.262   0.113   0.262   0.161     0.397
COX I (+)           0.298   0.28    0.157   0.265   0.058     0.281
COX II (+)          0.319   0.299   0.161   0.221   0.181     0.3
COX III (+)         0.29    0.311   0.158   0.241   0.092     0.326
CYT B (+)           0.305   0.32    0.122   0.253   0.093     0.447
NADH 1 (+)          0.342   0.32    0.101   0.238   0.179     0.52
NADH 2 (+)          0.364   0.329   0.087   0.219   0.248     0.581
NADH 3 (+)          0.315   0.292   0.12    0.274   0.069     0.417
NADH 4 (+)          0.319   0.326   0.114   0.242   0.137     0.392
NADH 4L (+)         0.348   0.29    0.1     0.262   0.14      0.487
NADH 5 (+)          0.356   0.301   0.105   0.238   0.198     0.482
NADH 6 (−)          0.146   0.08    0.314   0.46    −0.518    −0.593
12S rRNA            0.354   0.272   0.179   0.194   0.291     0.206
16S rRNA            0.396   0.247   0.154   0.202   0.324     0.231


One of the control regions was present in the typical position between tRNA Pro and tRNA Phe, whereas the other was located between tRNA Ile and the tRNA Leu-Gln-Met cluster. Conserved sequence blocks CSB 1 and CSB 3, found in vertebrate mtDNA, were identified in P. molurus. The two control regions were nearly identical in sequence (over 90% similarity), as they have been proposed to evolve in a highly concerted fashion [18]. Other than the control regions, there were 34 nlt, distributed through the genome, that were not assigned to any gene, and their composition appears unremarkable.

Synonymous/Non-synonymous substitutions

The comparison between the number of non-synonymous

mutations (dn or ka), and the number of synonymous

mutations (ds or ks), can suggest whether, at the molecular

level, natural selection is acting to promote advantageous

mutations (positive selection) or to remove deleterious

mutations (purifying selection). In general, when positive

selection dominates, the ka/ks ratio is greater than 1, i.e.

the diversity at the amino acid level is favored, likely due

to the fitness advantage provided by the mutations. Con-

versely, when negative selection dominates, the ka/ks ratio

is less than 1, i.e. most of the amino acid changes are

deleterious and, therefore, are selected against [52]. When

the positive and negative selection forces balance each

other, the ka/ks ratio is close to 1.
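The interpretation rules above can be written as a small decision function. This is a sketch under stated assumptions and not the ka/ks calculator cited in the methods: the function `selection_regime` is hypothetical, the tolerance used to decide what counts as "close to 1" is an arbitrary choice made for illustration, and the example rates are invented.

```python
def selection_regime(ka: float, ks: float, tol: float = 0.05):
    """Classify a gene by its ka/ks ratio.

    Hypothetical helper: `tol` is an assumed tolerance around 1 and is
    not a value taken from the paper.
    """
    if ks <= 0:
        raise ValueError("ks must be positive to form the ratio")
    ratio = ka / ks
    if ratio > 1 + tol:
        return ratio, "positive selection"
    if ratio < 1 - tol:
        return ratio, "purifying selection"
    return ratio, "approximately neutral"


# Invented substitution rates for three illustrative genes.
examples = {"gene_1": (0.02, 0.40), "gene_2": (0.50, 0.25), "gene_3": (0.30, 0.30)}
for gene, (ka, ks) in examples.items():
    ratio, regime = selection_regime(ka, ks)
    print(f"{gene}: ka/ks = {ratio:.3f} ({regime})")
```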

Analysis of amino acid substitution mutations (non-

synonymous, ka) versus neutral mutations (synonymous,

ks) for all 13 mtDNA protein coding genes of the three

snakes (P. molurus, P. regius, B. constrictor) revealed that,

the ka/ks ratio was less than 1 (Fig. 2). The ka/ks ratio for

the 13 PCGs ranged from 0.0054 to 0.225; in accordance

with the fact that most protein coding genes are considered

to be under the effect of purifying selection. The highest ka/ks ratio was obtained for the ATP 8 gene, and COX I showed the smallest ka/ks ratio of any mt. gene compared herein. Because all ratios remain below 1, ATP 8 appears to be under the most relaxed purifying selection, whereas COX I seems to be under the strongest purifying selection. The ka/ks values for ATP 8 and COX I also agree with the evolutionary rates of these genes as shown by the genetic distance values (Table 2); ATP 8 and COX I have been reported to show a similar trend in previous studies [53].

Phylogenetic analyses

In order to ensure the usefulness of newly sequenced mt.

genome we carried out phylogenetic analyses using the

available mt. genomes of major snake lineages to establish

an overall snake phylogeny. Also, a separate phylogenetic

reconstruction was carried out using partial cyt b gene

sequences of P. molurus in conjunction with other 15

members of family Pythonidae. The tree topologies

obtained from both the analyses are discussed below.

(i) Relationships among the major lineages of snakes

The phylogenetic analyses of dataset I (sequences of all

protein coding mitochondrial genes) show the position of

P. molurus molurus relative to 18 other snake species

belonging to major snake families (Fig. 3). Ramphotyphlops (a scolecophidian snake) was used to root the tree. Our results were in complete agreement with the previously established snake phylogeny [54, 55], which comprises three major lineages: Scolecophidia (blind snakes), Henophidia (primitive snakes), and Caenophidia (advanced snakes). Scolecophidians are considered the basal group [54], followed by the henophidians (Python, Boa, Xenopeltis); this arrangement was well supported in our study. Among

henophidian snakes, Pythons differ from the generally

similar boas in the mode of reproduction (viviparous boas;

oviparous pythons), this was also evident in our phyloge-

netic analyses where, Python does not cluster with Boa, but

instead shows a strong relationship with Xenopeltis. Hence,

the complete mitochondrial genome analyses also support

the fact that Pythons are not the immediate relatives of

Boid snakes [56].

The remaining taxa belong to Caenophidia (advanced snakes), with Achalinus meiguensis (family Xenodermatidae) occupying the basal position among caenophidians. This observation is also supported by Vidal et al. [57]. Colubroidea is the largest and most diverse lineage of caenophidian snakes and comprises three major groups: Colubridae, Elapidae and Viperidae. We found viperid snakes to be basal within Colubroidea, in concordance with Kelly et al. [56]. Colubridae and Elapidae cluster together before they join the Viperidae, thus supporting the hypothesis that Elapidae share a more recent common ancestor with Colubridae than with Viperidae.

Fig. 2 The synonymous and non-synonymous substitution rates (ka/ks ratio) calculated for three snake species. PM P. molurus, PR P. regius, BO B. constrictor


(ii) Relationships among Pythonidae

The phylogenetic analyses using dataset II (partial cyt b gene sequences) resolve the position of P. molurus molurus among the other members of the family Pythonidae. Pythonidae is an Old World group of ancestral constricting snakes, and the major genera of the family are divided into two groups on the basis of their distribution: (i) the Afro-Asian and (ii) the Australo-Papuan genera. Most of the genera (Liasis, Apodora, Morelia, Bothrochilus, Leiopython, Antaresia) are found in the Australo-Papuan region, with the exception of Python [20]. A close relative of pythons, Loxocemus, was used as the outgroup. Our phylogenetic analyses (Fig. 4) show that P. molurus molurus is the sister taxon to P. molurus bivittatus (the Burmese Python) and is well placed within the clade formed by the Afro-Asian species (P. sebae, P. regius, P. brongersmai and P. molurus bivittatus). However, we find that the genus Python was not monophyletic: P. reticulatus and P. timoriensis formed a separate clade, sister to the Australo-Papuan species, suggesting evolution once in P. reticulatus and once in the lineage leading to the Asian and African species (P. sebae, P. molurus, etc.). Our analyses are consistent with the paraphyly of Python already reported in earlier studies [20].
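The monophyly statement above can be made concrete with a small Python sketch. The nested-tuple tree below is a simplified, hypothetical encoding of the topology described in the text, not the published tree in Fig. 4: relationships that the text does not spell out are left as unresolved polytomies, and Australo-Papuan genera are represented by their genus names only. The helper names `tip_set`, `clades` and `is_monophyletic` are introduced for this illustration.

```python
# Rooted tree as nested tuples; strings are tips. Polytomies stand in for
# relationships not described in the text (an assumption of this sketch).
AFRO_ASIAN = ("P. sebae", "P. regius", "P. brongersmai",
              ("P. molurus bivittatus", "P. molurus molurus"))
AUSTRALO_PAPUAN = ("Liasis", "Apodora", "Morelia",
                   "Bothrochilus", "Leiopython", "Antaresia")
TREE = ("Loxocemus",  # outgroup
        (AFRO_ASIAN,
         (("P. reticulatus", "P. timoriensis"), AUSTRALO_PAPUAN)))


def tip_set(node):
    """Return the set of tip labels below a node."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(tip_set(child) for child in node))


def clades(node):
    """Yield the tip set of every node (tips included) in the tree."""
    yield tip_set(node)
    if not isinstance(node, str):
        for child in node:
            yield from clades(child)


def is_monophyletic(tree, taxa):
    """A group is monophyletic if some node has exactly these tips."""
    return frozenset(taxa) in set(clades(tree))


# All tips labelled as genus Python in this toy encoding.
python_tips = {t for t in tip_set(TREE) if t.startswith("P. ")}

print(is_monophyletic(TREE, python_tips))          # False
print(is_monophyletic(TREE, tip_set(AFRO_ASIAN)))  # True
```

In this encoding the smallest clade containing every Python tip also contains the Australo-Papuan genera, which is what non-monophyly of the genus means.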

Our phylogenetic analyses reflect the most widely accepted interpretations of both the overall snake phylogeny and Python phylogenetics, thus demonstrating the utility of the newly generated P. molurus molurus mt. genome in addressing phylogenetic questions. Moreover, the applications of mt. DNA to wildlife management [58, 59], to the study of hybridization between closely related species and to evolutionary genomics are well established. This study presents the complete mt. genome of the endangered Indian snake P. molurus molurus. The data on codon usage and start/termination codons, together with the increased availability of mt. sequence data, could be helpful in various population genetic or evolutionary studies of these animals.

Fig. 3 Phylogenetic

relationships among snake

lineages as inferred from 13

PCGs (dataset I). Numbers on

the branches indicate Bayesian

posterior probabilities,

maximum likelihood and

maximum parsimony analysis

bootstrap values, respectively


Acknowledgments We gratefully acknowledge Mr. Manish Kulshreshtha, Director, Snake Transit House, Jabalpur, and Chennai Snake Park Trust, Chennai, for providing the valuable research samples. The study was funded by the Directorate of Forensic Science, Ministry of Home Affairs, Government of India, New Delhi.

References

1. Morin PA, Luikart G, Wayne RK, the SNP workshop group

(2004) SNPs in ecology, evolution, and conservation. Trends

Ecol Evol 19:208–216

2. Pearse DE, Arndt AD, Valenzuela N, Miller BA, Cantarelli VH, Sites JW Jr (2006) Estimating population structure under non-equilibrium conditions in a conservation context: continent-wide population genetics of the giant Amazon river turtle Podocnemis expansa (Chelonia: Podocnemididae). Mol Ecol 15:985–1006

3. Parham JF, Feldman CR, Boore JL (2006) The complete mitochondrial genome of the enigmatic bigheaded turtle (Platysternon): description of unusual genomic features and reconciliation of phylogenetic hypotheses based on mitochondrial and nuclear DNA. BMC Evol Biol 6:11

4. Teletchea F, Maudet C, Hanni C (2005) Food and forensic

molecular identification: update and challenges. Trends Bio-

technol 23:359–366

5. Dubey B, Meganathan PR, Haque I (2009) Multiplex PCR assay

for rapid identification of three endangered snake species of

India. Conserv Genet 10:1861–1864

6. Dubey B, Meganathan PR, Haque I (2009) Molecular identifi-

cation of three Indian snake species using simple PCR-RFLP

method. J Forensic Sci 55(4):1065–1067

7. Dubey B, Meganathan PR, Haque I (2011) DNA mini-barcoding:

an approach for forensic identification of some endangered Indian

snake species. Forensic Sci Int Gen 5(3):181–184

8. Meganathan PR, Dubey B, Haque I (2009) Molecular identifi-

cation of crocodile species using novel primers for forensic

analysis. Conserv Genet 10:767–770

9. Meganathan PR, Dubey B, Haque I (2009) Molecular identifi-

cation of Indian crocodile species: PCR-RFLP method for

forensic authentication. J Forensic Sci 54(5):1042–1045

10. Meganathan PR, Dubey B, Jogayya KN, Whitaker N, Haque I

(2010) A novel multiplex PCR assay for the identification of

Indian crocodiles. Mol Ecol Resour 10(4):744–747

11. Meganathan PR, Dubey B, Batzer MA, Ray DA, Haque I (2011)

Complete mitochondrial genome sequences of three Crocodylus species and their comparison within the Order Crocodylia. Gene

478:35–41

12. Avise JC (1994) Molecular markers, natural history and evolu-

tion. Chapman & Hall, New York

13. Roman J, Bowen BW (2000) The mock turtle syndrome: genetic

identification of turtle meat purchased in the south-eastern United

States of America. Anim Conserv 3:61–65

14. Whitaker R (2006) Common Indian Snakes: a field guide. Mac-

millan India, New Delhi

15. World Conservation Monitoring Centre (1996) Python molurus.

In: IUCN (2009) IUCN Red List of threatened species. Version

2009.2. www.iucnredlist.org. Downloaded on 10 March 2010

16. Boore JL (1999) Animal mitochondrial genomes. Nuc Acids Res

27:1767–1780

17. Ryder OA (2005) Conservation genomics: applying whole gen-

ome studies to species conservation efforts. Cytogenet Genome

Res 108:6–15

18. Kumazawa Y, Ota H, Nishida M, Ozawa T (1996) Gene rear-

rangements in snake mitochondrial genomes: highly concerted

evolution of control-region like sequences duplicated and inserted

into a tRNA gene cluster. Mol Biol Evol 13:1242–1254

19. Kumazawa Y, Dong S (2005) Complete mitochondrial DNA

sequences of six snakes: phylogenetic relationships and molec-

ular evolution of genomic features. J Mol Evol 61:12–22

20. Rawlings LH, Rabosky DL, Donnellan SC, Hutchinson MN

(2008) Python phylogenetics: inference from morphology and

mitochondrial DNA. Biol J Linn Soc 93:603–619

21. Arnason U, Johnsson E (1992) The complete mitochondrial DNA

sequence of the harbor seal, Phoca vitulina. J Mol Evol 34:

493–505

22. Cao Y, Adachi J, Janke A, Paabo S, Hasegawa M (1994) Phy-

logenetic relationships among eutherian orders estimated from

inferred sequences of mitochondrial proteins: Instability of tree

based on a single gene. J Mol Evol 39:519–527

23. Sambrook JE, Fritsch F, Maniatis T (1989) Molecular cloning. A

laboratory manual, 2nd edn. Cold Spring Harbor Laboratory

Press, New York

24. Barnes WM (1994) PCR amplification of up to 35-kb DNA with

high fidelity and high yield from λ bacteriophage templates. Proc

Natl Acad Sci USA 91:2216–2220

25. Hall TA (1999) BioEdit: a user-friendly biological sequence

alignment editor and analysis. http://www.mbio.ncsu.edu/Bio

Edit/bioedit.html

Fig. 4 Phylogenetic relationships among members of Pythonidae as

inferred from partial cytochrome b gene sequences (dataset II).

Numbers on the branches indicate Bayesian posterior probabilities,

maximum likelihood and maximum parsimony analysis bootstrap values, respectively


26. Laslett D, Canback B (2008) ARWEN, a program to detect tRNA

genes in metazoan mitochondrial nucleotide sequences. Bioin-

formatics 24:172–175

27. Kumar S, Tamura K, Nei M (2004) MEGA 3: integrated software

for molecular evolutionary genetics analysis and sequence

alignment. Brief Bioinform 5:150–163

28. Zhang ZLJ, Zhao XQ, Wang J, Wong GK, Yu J (2006)

KaKs_Calculator: calculating Ka and Ks through model selection

and model averaging. Genomics Proteomics Bioinform 4:259–

263

29. Thompson JD, Gibson TJ, Plewniak F, Jeanmougin F, Higgins

DG (1997) The ClustalX windows interface: flexible strategies

for multiple sequence alignment aided by quality analysis tools.

Nucleic Acids Res 25:4876–4882

30. Wilcox TP, Zwickl DJ, Heath TA, Hillis DM (2002) Phylogenetic

relationships of the dwarf boas and a comparison of Bayesian and

bootstrap measures of phylogenetic support. Mol Phylogenet

Evol 25:361–371

31. Noonan BP, Chippindale PT (2006) Dispersal and vicariance: the

complex evolutionary history of boid snakes. Mol Phylogenet

Evol 40:347–358

32. Guindon S, Gascuel O (2003) A simple, fast, and accurate

algorithm to estimate large phylogenies by maximum likelihood.

Syst Biol 52:696–704

33. Jobb G, von Haeseler A, Strimmer K (2004) TREEFINDER: a powerful graphical analysis environment for molecular phylogenetics. BMC Evol Biol 4:18

34. Huelsenbeck JP, Ronquist FR (2001) MRBAYES: Bayesian

inference of phylogenetic trees. Bioinformatics 17:754–755

35. Felsenstein J (1993) Phylogenetic inference programs

(PHYLIP). University of Washington, Seattle, and University

Herbarium. University of California, Berkeley

36. Douglas DA, Gower DJ (2010) Snake mitochondrial genomes:

phylogenetic relationships and implications of extended taxon

sampling for interpretations of mitogenomic evolution. BMC

Genomics 11:14

37. Yan J, Li H, Zhou K (2008) Evolution of the

mitochondrial genome in snakes: gene rearrangements and phy-

logenetic relationships. BMC Genomics 9:569

38. Boore JL (2004) Complete mitochondrial genome sequence of

Urechis caupo, a representative of the phylum Echiura. BMC

Genomics 5:67

39. Wolstenholme DR, Macfarlane JL, Okimoto R, Clary DO, Wahleithner JA (1987) Bizarre tRNAs inferred from DNA sequences of mitochondrial genomes of nematode worms. Proc Natl Acad Sci USA 84:1324–1328

40. Janke A, Erpenbeck D, Nilsson M, Arnason U (2001) The

mitochondrial genomes of the iguana (Iguana iguana) and the

caiman (Caiman crocodylus): implications for amniote phylog-

eny. Proc Biol Sci 268(1467):623–631

41. Zhang M, Wang Y, Yan P, Wu X (2011) Crocodilian

phylogeny inferred from twelve mitochondrial protein-coding

genes, with new complete mitochondrial genomic sequences for

Crocodylus acutus and Crocodylus novaeguineae. Mol Phylo-

genet Evol 60:62–67

42. Ojala D, Montoya J, Attardi G (1981) tRNA punctuation model

of RNA processing in human mitochondria. Nature 290:470–474

43. Li H, Gao J, Liu H, Cai W (2009) Progress in the

researches on insect mitochondrial genome and analysis of gene

order. Sci Found China 17(2):39–45

44. Fearnley IM, Walker JE (1986) Two overlapping genes in bovine

mitochondrial DNA encode membrane components of ATP

synthase. EMBO J 5:2003–2008

45. Hassanin A, Leger N, Deutsch J (2005) Evidence for multiple

reversals of asymmetric mutational constraints during the evo-

lution of the mitochondrial genome of metazoa, and conse-

quences for phylogenetic inferences. Syst Biol 54:277–298

46. Perna NT, Kocher TD (1995) Patterns of nucleotide composition

at fourfold degenerate sites of animal mitochondrial genomes.

J Mol Evol 41:353–358

47. Asakawa S, Kumazawa Y, Araki T, Himeno H, Miura K, Watanabe K (1991) Strand-specific nucleotide composition bias in echinoderm and vertebrate mitochondrial genomes. J Mol Evol 32(6):511–520

48. Jia W, Higgs PG (2008) Codon usage in mitochondrial genomes:

distinguishing context-dependent mutation from translational

selection. Mol Biol Evol 25(2):339–351

49. Shardiwal RK, Sartaj SS (2009) A more elaborative way to check

codon quality: an open source program. EMBnet.news 15(3):

18–21

50. Harlid A, Janke A, Arnason U (1998) The complete mitochondrial genome of Rhea americana and early avian divergences. J Mol Evol 46:669–679

51. Harlid A, Janke A, Arnason U (1997) The mt DNA sequence of

Ostrich and the divergence between paleognathous and neog-

nathous birds. Mol Biol Evol 14:754–761

52. Hurst LD (2002) The Ka/Ks ratio: diagnosing the form of

sequence evolution. Trends Genet 18(9):486–487

53. Oliveira DCSG, Raychoudhury R, Lavrov DV, Werren JH (2008) Rapidly evolving mitochondrial genome and directional selection in mitochondrial genes in the parasitic wasp Nasonia (Hymenoptera: Pteromalidae). Mol Biol Evol 25(10):2167–2180

54. Heise PJ, Maxson LR, Dowling HG, Hedges SB (1995) Higher-level snake phylogeny inferred from mitochondrial DNA sequences of 12S rRNA and 16S rRNA genes. Mol Biol Evol 12:259–265

55. Dessauer HC, Cadle JE, Lawson R (1987) Patterns of snake

evolution suggested by their proteins. Fieldiana Zool 34:1–34

56. Kelly CMR, Barker NP, Villet MH (2003) Phylogenetics of

advanced snakes (Caenophidia) based on four mitochondrial

genes. Syst Biol 52:439–459

57. Vidal N, Delmas A-S, David P, Cruaud C, Couloux A,

Hedges SB (2007) The phylogeny and classification of caeno-

phidian snakes inferred from seven nuclear protein-coding genes.

CR Biologies 330:182–187

58. Ferris SD, Berg WJ (1987) The utility of mitochondrial DNA in

fish genetics and fishery management. In: Ryman N, Utter F (eds)

Population genetics and fishery management. University of Washington Press, Seattle, pp 277–300

59. Quinn TW, White BN (1987) Analysis of DNA sequence varia-

tion. In: Cooke F, Buckley PA (eds) Avian genetics. Academic

Press, London, pp 163–198
