Conservation of the homeodomain sequence of
the Mixl1 homeobox gene in multiple species
Nhi Hin
ABSTRACT
The Mix family of paired-like homeobox genes is highly conserved throughout evolution owing to its vital roles in the formation and specification of mesoderm and endoderm during vertebrate gastrulation. The homeodomain motif is essential for the DNA-binding activity of the transcription factors encoded by these genes and has been shown to be highly conserved across a diverse range of species, despite some variation in homeobox nucleotide sequence. In the present study, the Mixl1 homeobox sequences of the mouse and platypus, and the mxtx1 homeobox sequence of the zebrafish, are isolated, sequenced and compared to gain insight into their evolutionary history. The homeobox sequences determined are consistent with the reported literature. Despite some variation in nucleotide sequence, active-site amino acid residues in the N-terminal arm and recognition helix were found to be particularly conserved across all species, while nucleotides corresponding to non-active-site amino acids displayed greater variation.
INTRODUCTION
Hox genes, a subset of homeobox genes, are crucial for pattern formation during the embryogenesis of many animals, including vertebrates. The genes are clustered in the genome and encode transcription factors called homeoproteins that specify segmental identity and positional information along the anterior-posterior axis. The order of Hox genes on the chromosome corresponds to the order of their spatial and temporal expression along the anterior-posterior body axis, a phenomenon referred to as “collinearity” (Fig. 1A). Hox genes also contain a highly conserved DNA sequence known as the homeobox, which encodes a 60-amino-acid protein domain called the homeodomain (Fig. 1B). The homeodomain has a helix-turn-helix motif that allows homeoproteins to bind to specific DNA sequences; the N-terminal arm contacts the minor groove of the DNA, while the third (recognition) helix interacts with the major groove (Fig. 1C) (Burke et al. 1995; Gehring et al. 1994).
The Mix family of paired-like homeobox genes has been highly conserved throughout vertebrate evolution and is involved in the establishment and specification of the mesoderm and endoderm germ layers during gastrulation (Pereira et al. 2012). Mixl1, a member of the Mix (Mesoderm-Inducing-Factor Inducible Homeobox) family, is predominantly expressed at the sites of future endoderm and mesoderm development in the blastula of many species, including mice and humans (Pereira et al. 2012). Zebrafish appear to have multiple partially redundant Mix genes, including the paired-like homeobox gene mxtx1, which is expressed in a location analogous to the primitive endoderm of mammalian embryos (Hirata et al. 2000).
The conservation of Mix-family genes across many species makes them useful for investigating the evolutionary history of these species. In the present study, sequences corresponding to the Mixl1 gene in the mouse and platypus and the mxtx1 gene in the zebrafish are amplified and sequenced. The gene structures of mouse and platypus Mixl1 and zebrafish mxtx1 are shown in Figure 2. Note that in each species the homeobox is split into two parts separated by an intron of variable length. Primers were designed to amplify sequences containing the homeobox regions. The extent of conservation of the nucleotide and amino acid sequences is then determined, and the evolutionary history of these species briefly discussed.
Figure 1. Illustrations of Hox gene properties. (A) Collinear expression of Hox genes in Drosophila. The order of
Hox genes in the HOM-C cluster in the genome corresponds to spatial and temporal expression across the anterior-
posterior body axis. Image: (Mark et al. 1997). (B) General schematic representation of a Hox gene, homeoprotein and
homeodomain. The 180 bp homeobox on the Hox gene encodes the homeodomain on the homeoprotein. The
homeodomain has three helices. Image: (Lappin et al. 2006). (C) Binding of homeodomain motif to DNA. The N-terminal
arm contacts the minor groove while the third helix interacts with the major groove. Image: (Hueber 2009).
Figure 2. Schematic gene diagrams of (A) mouse Mixl1, (B) platypus Mixl1 and (C) zebrafish mxtx1 (Daish 2016b), showing primer and homeobox locations.
RESULTS & DISCUSSION
Extraction & Purification of Genomic DNA
for Sequencing Reactions.
Figure 3 shows that most genomic DNA samples (Lanes 1-5) were successfully extracted, with distinct, intense bands indicating sufficient amounts of gDNA of at least 23 kbp. This suggests that little degradation has occurred and that the integrity of the DNA in Lanes 1-5 is high. In contrast, the diffuse band in Lane 6 is smaller than 0.56 kbp, suggesting that only very small DNA fragments are present. Possible causes include contamination by nucleases that degrade DNA, excess salt in the sample, leaving the sample at high temperature for too long, or using an incorrect buffer solution (ThermoFisher 2016). Problems with the reagents themselves are unlikely, as the same reagents were used to prepare all samples.
Streaking of bands is prevalent, indicating DNA of various sizes in the samples, although the intense bands near 23 kbp indicate that most DNA is still intact as large genomic fragments. Intense staining at the wells of the samples in Lanes 2, 3 and 5 suggests particularly high concentrations of genomic DNA; such high concentrations can clog the pores of the gel matrix and inhibit DNA movement through the gel. Consequently, the DNA bleeds slowly into the gel, producing the streaking observed. Band distortion in Lanes 2, 3 and 5 may have been caused by air bubbles introduced when loading the samples or by uneven heating of the gel, which causes local changes in buffer conductivity (Qiagen 2015).
Sample DNA concentration was estimated by comparing the intensity of sample bands with the intensity of fragments in the 0.5μg of Lambda/HindIII molecular weight marker in Lane 7. Box 1 explains how the concentration of the zebrafish genomic DNA in Lane 4 was estimated to be 13.2ng/μL. Bands in other lanes (e.g. Lanes 3 and 5) in Figure 3 are more intense, indicating higher concentrations of genomic DNA. However, a high gDNA concentration is not critical. The recommended amount of gDNA template for PCR is 25-100ng per reaction (Figure 4), so a lower concentration simply means that a greater volume of template should be used in the PCR (see Appendix A).
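The Box 1 arithmetic can be reproduced in a few lines. This is a minimal sketch, assuming (as Box 1 does) that band intensity is proportional to DNA mass and that the Lane 4 band matches marker Fragment 2; the helper names are hypothetical, not part of any library.

```python
# Sketch of the band-intensity estimate in Box 1 (hypothetical helper names).
# Assumes staining intensity is proportional to DNA mass, as Box 1 does.

LAMBDA_HINDIII_KBP = [23, 9.6, 6.6, 4.4, 2.2, 2.0, 0.56]  # Table 1 fragment sizes (kbp)


def ng_per_kbp(marker_ng: float, fragment_sizes_kbp: list) -> float:
    """Mass of marker DNA carried per kbp of fragment length."""
    return marker_ng / sum(fragment_sizes_kbp)


def estimate_concentration(marker_ng: float, fragment_sizes_kbp: list,
                           matched_fragment_kbp: float,
                           volume_loaded_ul: float) -> float:
    """Concentration (ng/uL) of a sample whose band matches one marker fragment."""
    band_ng = ng_per_kbp(marker_ng, fragment_sizes_kbp) * matched_fragment_kbp
    return band_ng / volume_loaded_ul


# Lane 4: 7.5 uL loaded; intensity comparable to the 9.6 kbp band of 500 ng marker.
conc = estimate_concentration(500, LAMBDA_HINDIII_KBP, 9.6, 7.5)
template_ng = conc * 7.5  # template mass delivered by the 7.5 uL used in Table A
print(f"{conc:.1f} ng/uL; {template_ng:.0f} ng in 7.5 uL")  # 13.2 ng/uL; 99 ng in 7.5 uL
```

Because the intensity match is made by eye, the result is only as precise as that comparison; a spectrophotometer reading gives a more reliable figure.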
Isolation of DNA fragments containing the Mixl1
homeobox using PCR.
The PCR products are shown in the gel electrophoresis image in Figure 4. Most PCR product sizes are consistent with the expected amplicon sizes, indicating that the primers were specific to the target sequence and that amplification was successful. The expected amplicon size for the zebrafish ZF1 and ZR1 primers is 653 bp (Daish 2016b), consistent with the PCR product of approximately 653 bp in Lane 5 of Figure 4. Meanwhile, Lanes 2, 3 and 7 in Figure 4 show bands corresponding to the expected 685 bp amplicon for the platypus PF2 and PR2 primers (Daish 2016b). However, Lane 7 shows an additional band of approximately 360 bp. This second PCR product may have arisen from the primers binding to, and amplifying, another region of the template DNA.
Figure 3. Electrophoresis of extracted genomic DNA from various species on a 1.5% agarose gel.
Lanes 1-6 contain 7.5μL samples of genomic DNA: 1 = Mouse liver, 2 = Mouse liver, 3 = Zebrafish, 4 = Zebrafish, 5 = Platypus, 6 = Mouse liver; 7 = 0.5μg Lambda/HindIII molecular weight marker.
In this case, gel purifying the 685 bp PCR product would help ensure that its purity is sufficient for a successful sequencing reaction. Lanes 8 and 9 used the MF1/MR1 and MF2/MR2 primer sets respectively. Lane 8 has one band of 453 bp, while Lane 9 has one band of 391 bp. These sizes are consistent with the expected amplicon sizes, and the lack of other bands indicates that the PCR products are sufficiently pure. However, the faintness of these bands suggests a low concentration of PCR product. This could be due to non-optimal PCR conditions: for example, the denaturation temperature may have been too low, resulting in incomplete denaturation; the DNA template may have had insufficient integrity; or the denaturation time may have been too long, leading to degradation (Bio-Rad 2016).
The product in Lane 6 failed to amplify. Running the following controls on the same gel would help determine the cause of failure and confirm that the correct target sequence was amplified:
Negative control with water only, to confirm that the water is free of DNA contamination.
Negative control with all PCR reagents except the DNA template, to confirm that the reagents are not contaminated and that there is no non-specific amplification in the reaction. A positive signal in this control would indicate the presence of contaminating nucleic acids.
Positive control using a template and primers known to amplify correctly and produce distinct bands under the same PCR conditions. This control should contain the same PCR reagents as the samples and should be easily distinguished from the target DNA (e.g. by a different size).
Samples at several different template concentrations, to help determine whether smearing is due to too high a DNA concentration. This is particularly important if the concentration of the genomic DNA was only estimated.
Lane 4 has one smeared band, with the majority of the DNA remaining in the well. A smear instead of a single band indicates DNA fragments of varying sizes. The products of a successful PCR should all have the same sequence and size, assuming PCR conditions are optimal and the primers are specific. Hence, a smear indicates that the PCR was not specific enough in amplifying the target DNA. Possible causes include:
Non-specific primers, which bind to other parts of the template DNA that are then also amplified.
Non-optimal cycling conditions: for example, an excessive number of cycles, excessive extension time, excessive annealing time, or an insufficiently high annealing temperature all increase the opportunity for non-specific amplification (Bio-Rad 2016).
Too high a concentration of template DNA. This can inhibit the polymerase, owing to inhibitors carried over in the template or to inefficient denaturation (Qiagen 2015).
Genomic DNA of poor quality (e.g. sheared).

Box 1. Estimation of the concentration of zebrafish genomic DNA in Lane 4 (Figure 3), with a brief explanation of the reasoning.

Table 1. Known sizes of DNA fragments in the Lambda/HindIII molecular weight marker
Fragment    Size (kbp)
1           23
2           9.6
3           6.6
4           4.4
5           2.2
6           2.0
7           0.56
Total       48.36
Source: Genetics III Practical Manual (University of Adelaide, 2016).

The total size of the fragments in the DNA marker is 48.36 kbp. Since 0.5μg of molecular weight marker was used, the corresponding mass per kbp is:
$$\frac{0.5\ \mu\text{g}}{48.36\ \text{kbp}} = \frac{500\ \text{ng}}{48.36\ \text{kbp}} = \frac{10.339\ \text{ng}}{1\ \text{kbp}}$$
The sample in Lane 4 has an intensity comparable to Fragment 2 of the molecular weight marker, which has a size of 9.6 kbp (Table 1):
$$\frac{10.339\ \text{ng}}{1\ \text{kbp}} \times 9.6\ \text{kbp} = 99.26\ \text{ng}$$
i.e. there is 99.26ng of genomic DNA in Lane 4. Because 7.5μL of genomic DNA was loaded onto the gel, the concentration of genomic DNA in Lane 4 is estimated to be:
$$\frac{99.26\ \text{ng}}{7.5\ \mu\text{L}} = 13.2\ \text{ng}/\mu\text{L}$$
Using such a PCR product in a sequencing reaction would likely produce many nucleotides that cannot be accurately identified (appearing as “N” in the sequence). Because the non-specific PCR products contribute multiple DNA sequences, the sequence of the desired PCR product cannot be distinguished from the contaminating sequences. The PCR product in Lane 5 of Figure 4 is suitable for sequencing, as it forms a single discrete band of the expected size; the 685 bp product in Lane 7 would also be suitable once gel purified to remove the additional 360 bp band. The concentration of the zebrafish template DNA used in Lane 5 was measured as 20.28 μg/mL (20.28 ng/μL) using a spectrophotometer.
Figure 4. Gel electrophoresis image of PCR products of various species. Lanes 2-9 were loaded
with 25-100ng DNA template. 1 = SPP1 Molecular Markers; 2 = Platypus, PF2 and PR2 primers; 3 =
Platypus, PF2 and PR2 primers; 4 = Mouse, MF1 and MR1 primers; 5 = Zebrafish ZF1 and ZR1 primers; 6 =
Zebrafish ZF2 and ZR2 primers; 7 = Platypus PF2 and PR2 primers; 8 = Mouse MF1 and MR1 primers; 9 =
Mouse MF2 and MR2 primers. Approximate size of successful amplicons marked above PCR bands.
Sequencing of Zebrafish PCR Product & Identification of mxtx1 homeobox sequence.
Figure 6. Comparison of sequenced zebrafish and cavefish Mxtx1 homeobox sequences, and
corresponding amino acid sequences.
Figure 5 shows a sequence alignment of the zebrafish amplicon amplified with the ZF1 and ZR1 primers and sequenced using the ZF1 primer; the zebrafish amplicon amplified with the ZF2 and ZR2 primers and sequenced using the ZF2 primer (obtained from demonstrators); and the cavefish mxtx1 homeobox sequence. Both zebrafish sequences were required to determine the full 180 bp zebrafish homeobox: although the ZF1 amplicon contains most of the homeobox, the ZF2 amplicon contains the remaining small part. The zebrafish and cavefish sequences are very similar, which is expected because homeobox sequences tend to be highly conserved through evolution and the cavefish and zebrafish share a common ancestor. There are several nucleotide differences (36/180 = 20%), indicating that point mutations have occurred since the zebrafish and cavefish diverged from their common ancestor. However, some of these differences may also reflect errors in the zebrafish sequences used. Performing several independent sequencing reactions on different zebrafish samples would help distinguish sequencing errors and individual polymorphisms from the consensus zebrafish homeobox sequence.

zebrafish_ZF1           TGCTACTGCTAAAACATCTGGAAGTGGAGCTGTATCCAGAAGCGCAAGTC
cavefish_mxtx1_homeobox ----------------------------------------------AGCC
                                                                      ** *

zebrafish_ZF1           GCAGGAAAAGGACAAGTTTCTCCAAGGAACACGTTGAGCTTCTGCGAGCT
cavefish_mxtx1_homeobox GCAGAAAGAGGACCAGCTTCTCCAAAGAGCACGTAGAGCTGCTGAGGGCC
                        **** ** ***** ** ******** ** ***** ***** *** * **

zebrafish_ZF1           ACATTTGAAACAGACCCTTACCCTGGAATCAGTCTCAGAGAGAGTCTTTC
cavefish_mxtx1_homeobox ACATTTGAGACGGACCCGTACCCGGGCATCAGCCTGAGGGAGAGCCTGTC
                        ******** ** ***** ***** ** ***** ** ** ***** ** **

zebrafish_ZF1           CCAAACCACAGGACTGCCAGAGTCTCGCATACAGGT--------------
cavefish_mxtx1_homeobox TCAGACCACCGGCCTGCCTGAGTCACGAATACAGGTTTGGTTCCAGAACA
zebrafish_ZF2           --------------------------------AGGTCTGGTTCCAGAATA
                         ** ***** ** ***** ***** ** ******** *********** *

cavefish_mxtx1_homeobox GGAGGGCTCGTACGCTTAAGTGTAAG
zebrafish_ZF2           GGAGAGCTCGCACGTTGAAATGCAAG
                        **** ***** *** * ** ** ***

Figure 5. Multiple sequence alignment of the zebrafish ZF1/ZR1 PCR product (sequenced using the ZF1 primer), the zebrafish ZF2/ZR2 PCR product (sequenced using the ZF2 primer) and the cavefish mxtx1 homeobox sequence. Asterisks signify perfectly matched nucleotides.
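Mismatch tallies such as the 36/180 quoted above can be made reproducible with a short script. A minimal sketch, assuming the two sequences are already aligned to equal length with “-” for gaps (gapped columns are skipped); the helper names are hypothetical, and the example uses only the final 26 columns of Figure 5.

```python
# Count mismatches between two aligned sequences and rebuild the "*" consensus row.


def compare_aligned(seq_a: str, seq_b: str) -> tuple:
    """Return (columns compared, mismatches), skipping columns with a gap ('-')."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # gapped column: counted neither as match nor mismatch
        compared += 1
        if a != b:
            mismatches += 1
    return compared, mismatches


def consensus_row(seq_a: str, seq_b: str) -> str:
    """'*' where both sequences carry the same base, a space otherwise."""
    return "".join("*" if a == b and a != "-" else " " for a, b in zip(seq_a, seq_b))


# Final 26 columns of Figure 5 (cavefish vs zebrafish ZF2 read).
cavefish = "GGAGGGCTCGTACGCTTAAGTGTAAG"
zebrafish = "GGAGAGCTCGCACGTTGAAATGCAAG"
compared, mismatches = compare_aligned(cavefish, zebrafish)
print(compared, mismatches, f"{100 * mismatches / compared:.0f}%")  # 26 6 23%
print(consensus_row(cavefish, zebrafish))  # **** ***** *** * ** ** ***
```

Applying the same functions to the full 180-column alignment would give the whole-homeobox figure directly, removing the risk of miscounting by eye.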
Point mutations result in a different RNA transcript, which may be translated into a different amino acid sequence. A different amino acid sequence could affect the folding, and hence the function, of the encoded protein. Point mutations that alter active-site amino acids would likely be deleterious and hence selected against, which explains why most nucleotides are highly conserved between the zebrafish and cavefish sequences. However, substitutions that alter non-active-site amino acids (replacement polymorphisms) or that leave the amino acid sequence unchanged (silent polymorphisms) would have little or no effect on the function of the homeoprotein and hence may be passed on to the next generation. Figure 6 shows the amino acid sequences corresponding to the 180 bp zebrafish and cavefish mxtx1 homeobox sequences. Notably, despite the nucleotide substitutions, both amino acid sequences are identical. This is possible because of codon degeneracy: a particular amino acid may be specified by several different codons.
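Codon degeneracy can be checked directly by translating the first 18 codons of both homeoboxes from Figure 5. A minimal sketch using the standard genetic code; the reading frame (starting at the first aligned cavefish base) is an assumption, chosen because it yields an uninterrupted homeodomain-like peptide, and the helper names are hypothetical.

```python
from itertools import product

# Standard genetic code: codons enumerated in TCAG order (NCBI translation table 1).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(seq: str) -> str:
    """Translate an in-frame DNA sequence; a trailing partial codon is ignored."""
    seq = seq.upper()
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def synonymous_codons(aa: str) -> list:
    """All codons that specify the given one-letter amino acid."""
    return sorted(codon for codon, residue in CODON_TABLE.items() if residue == aa)


# First 54 nt (18 codons) of the homeobox, taken from Figure 5.
zebrafish = "AGTCGCAGGAAAAGGACAAGTTTCTCCAAGGAACACGTTGAGCTTCTGCGAGCT"
cavefish = "AGCCGCAGAAAGAGGACCAGCTTCTCCAAAGAGCACGTAGAGCTGCTGAGGGCC"

nt_differences = sum(a != b for a, b in zip(zebrafish, cavefish))
print(nt_differences)          # 12
print(translate(zebrafish))    # SRRKRTSFSKEHVELLRA
print(translate(cavefish))     # SRRKRTSFSKEHVELLRA
print(synonymous_codons("R"))  # ['AGA', 'AGG', 'CGA', 'CGC', 'CGG', 'CGT']
```

All 12 nucleotide differences in this stretch fall at positions that leave the amino acid unchanged, which is the pattern the paragraph above describes.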
Figure 8. Comparison of the sequenced platypus homeobox sequence and the opossum Mixl1 homeobox sequence, with corresponding amino acid sequences.

platypus_PF1           GTCGCAGCGGCGGAAGCGCACGTCGTTCAGCCCGGAGCAGCTGCAGCTGC
opossum_MixL1_homeobox ----CAGCGCAGGAAGAGAACGTCTTTCAGCCCCGAGCAGCTGCAGCTGC
                           *****  ***** * ***** ******** ****************

platypus_PF1           TGGAGCTCGTCTTCCGCCGCACCATGTACCCCGACATCAACCTGCGGGAC
opossum_MixL1_homeobox TGGAACTGGTGTTTCGCCGGACCATGTACCCGGACATCACCTTGCGGGAA
                       **** ** ** ** ***** *********** ******* * *******

platypus_PF1           CGCCTGGCCGCCCTCACGCAGCTCCCCGAGTCCAGGATCCAGGTC-----
opossum_MixL1_homeobox CGCCTGGCTACCCTCACTAGGCTCCCGGAGTCCAGGATCCAGGTCTGGTT
platypus_PF2           --------------------------------------CCAGGTCTGGTT
                       ********  *******   ****** ***********************

opossum_MixL1_homeobox CCAGAACAGACGCGCCAAATCCCGTCGGCAGAGA
platypus_PF2           CCAGAACAGACGTGCCAAATCCCGGCGCCAGAAA
                       ************ *********** ** **** *

Figure 7. Multiple sequence alignment of the platypus PF1/PR1 PCR product (sequenced using the PF1 primer), the platypus PF2/PR2 PCR product (sequenced using the PF2 primer) and the opossum Mixl1 homeobox sequence. Asterisks signify perfectly matched nucleotides.
Sequencing of platypus PCR Product &
Identification of mixl1 homeobox sequence.
Figure 7 shows the multiple sequence alignment of the sequenced platypus products (PF1, obtained from Marina Zupan; and PF2, obtained from demonstrators) and the opossum Mixl1 homeobox sequence. Sequence similarity is again high, consistent with the homeobox being highly conserved. However, there are several mismatches (24/180 = 13%), indicating that the platypus and opossum lineages have undergone some evolutionary change since diverging from their common ancestor. Figure 8 shows the amino acid sequences corresponding to the platypus and opossum homeobox sequences. Most amino acids are identical because many of the nucleotide substitutions are silent. However, there are several amino acid changes (E → D at nt 94-96; T → A at nt 106-108; R → Q at nt 115-117). These likely correspond to amino acids outside the active site of the homeoprotein, as replacements of active-site amino acids would likely be deleterious. Errors in the platypus sequence used, or miscalled bases during sequencing, may also account for some of the nucleotide, and hence amino acid, changes.
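Sorting codon differences into silent and replacement changes can be sketched as follows. This is a minimal illustration using the standard genetic code, assuming nucleotide numbering starts at the first opossum base in Figure 7 (the numbering under which the E → D change sits at nt 94); the 24-nt stretch used begins at nt 94, codon numbers are relative to that stretch, and the helper names are hypothetical.

```python
from itertools import product

# Standard genetic code: codons enumerated in TCAG order (NCBI translation table 1).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_substitutions(reference: str, query: str):
    """Split differing codons into silent and replacement changes.

    Returns two lists of (codon_number, ref_codon, query_codon, ref_aa, query_aa).
    """
    silent, replacement = [], []
    for i in range(0, min(len(reference), len(query)) - 2, 3):
        ref_codon, query_codon = reference[i:i + 3], query[i:i + 3]
        if ref_codon == query_codon:
            continue  # identical codon: no substitution to classify
        ref_aa, query_aa = CODON_TABLE[ref_codon], CODON_TABLE[query_codon]
        entry = (i // 3 + 1, ref_codon, query_codon, ref_aa, query_aa)
        (silent if ref_aa == query_aa else replacement).append(entry)
    return silent, replacement


# 24 nt of Figure 7 starting at nt 94 (opossum as reference, platypus as query).
opossum = "GAACGCCTGGCTACCCTCACTAGG"
platypus = "GACCGCCTGGCCGCCCTCACGCAG"
silent, replacement = classify_substitutions(opossum, platypus)
print([f"{r}->{q}" for *_, r, q in replacement])  # ['E->D', 'T->A', 'R->Q']
print(len(silent))  # 2
```

The two silent changes here (GCT/GCC and ACT/ACG) are third-position substitutions, illustrating why most platypus and opossum amino acids agree despite the nucleotide mismatches.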
Sequencing of mouse PCR product &
Identification of mixl1 homeobox sequence.
A sequence comparison of the mouse PCR products (MF1/MR1, sequenced with the MF1 primer; MF2/MR2, sequenced with the MR2 primer; both obtained from the 2014 class) and the human mixl1 homeobox sequence is shown in Figure 9. In contrast to the zebrafish and platypus sequences, the mouse sequence contains more “N”s (unidentified nucleotides) in the middle of the sequence. Reasons for these “N”s are discussed later. However, sequence similarity is still high (34/180 = 19% difference), consistent with the conservation of homeobox sequences through evolution. The corresponding amino acid sequences are shown in Figure 10. Although some amino acids cannot be determined because of the unidentified nucleotides, most amino acids appear to be conserved.
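Whether an unidentified nucleotide actually prevents an amino acid from being called depends on where it falls in the codon. A minimal sketch, assuming the standard genetic code: each “N” is expanded to all four bases, and the residue is reported only if every expansion gives the same amino acid (otherwise “X”). The input is a hypothetical fragment, not taken from Figure 9, and the helper name is hypothetical.

```python
from itertools import product

# Standard genetic code: codons enumerated in TCAG order (NCBI translation table 1).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_with_ambiguity(seq: str) -> str:
    """Translate in frame; codons containing N resolve only if all expansions agree."""
    protein = []
    seq = seq.upper()
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        # Expand each N to all four bases; other characters stay fixed.
        expansions = ("".join(p) for p in product(*(BASES if b == "N" else b for b in codon)))
        residues = {CODON_TABLE.get(c, "X") for c in expansions}
        protein.append(residues.pop() if len(residues) == 1 else "X")
    return "".join(protein)


# Hypothetical fragment: CAG NNN CGC AAG CGN GAN
print(translate_with_ambiguity("CAGNNNCGCAAGCGNGAN"))  # QXRKRX
```

Note that CGN still resolves to arginine, because all four CGx codons encode R, whereas GAN cannot be resolved (GAT/GAC give D, GAA/GAG give E).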
Note that a mouse PCR product amplified with the MF1/MR2 primers could be used in sequencing reactions with both the forward and reverse sequencing primers, because the 2,148 bp amplified region contains the entire homeobox (Figure 2A). Sequencing with the MF1 forward primer would yield a sequence containing both homeobox segments, while sequencing with the MF2 forward primer yields a sequence containing only the smaller segment. Conversely, sequencing with the MR1 reverse primer yields a sequence containing only the larger homeobox segment, while sequencing with the MR2 reverse primer yields a sequence containing both segments.
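Reads obtained with a reverse primer (such as the MR2 read in Figure 9) must be reverse complemented before they can be aligned with forward-strand sequences. A minimal sketch; the helper name is hypothetical, and the 24-nt input is borrowed from Figure 9 purely as illustrative data.

```python
# Reverse-complement a read obtained with a reverse sequencing primer.
# 'N' (unidentified base) maps to itself; lower case is preserved.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the order (5'->3' on the other strand)."""
    return seq.translate(COMPLEMENT)[::-1]


read = "CGGGCCAAGTCCAGGCGCCAGAGT"  # illustrative input
forward = reverse_complement(read)
print(forward)  # ACTCTGGCGCCTGGACTTGGCCCG
```

Applying the function twice returns the original read, which is a quick sanity check when switching between primer orientations.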
mouse_f1_2014   AGGGTCGGGCGCCCCGTCGGAGCCNNNNNNCGCAAGAGT-TGTCGTTCANCTCGGAGCAG
human_mixl1_cds ------------------------CAGCGCCGCAAGCGCACGTCTTTCAGCGCCGAACAG
                                              ****** *   *** **** * * ** ***

mouse_f1_2014   CTGCCGTTGCTGGATCTCGTCTTCCNACAGACCATGTACCCNGACATCCACTTGCGGGAG
human_mixl1_cds CTGCAGCTGCTGGAGCTCGTCTTCCGCCGGACCCGGTACCCCGACATCCACTTGCGCGAG
                **** * ******* **********  * ****  ****** ************** ***

mouse_f1_2014   CGCCTGGCTGCGCTCACGNTNCTACCCGAGTCCAGGATCC--------------------
human_mixl1_cds CGCCTGGCCGCGCTCACCCTGCTCCCCGAGTCCAGGATCCAGGTATGGTTCCAGAACAGG
mouse_f2_2014   --------------------------------------CCAGGTATGGTTCCAGAACCGA
                ******** ********  * ** ********************************* *

human_mixl1_cds CGTGCCAAGTCTCGGCGTCAGAGT
mouse_f2_2014   CGGGCCAAGTCCAGGCGCCAGAGT
                ** ********  **** ******

Figure 9. Multiple sequence alignment of the mouse MF1/MR1 PCR product (sequenced using the MF1 primer), the mouse MF2/MR2 PCR product (sequenced using the MR2 primer and reverse complemented) and the human mixl1 homeobox sequence. Asterisks signify perfectly matched nucleotides.
Figure 10. Comparison of sequenced mouse homeobox sequence and human Mixl1
homeobox sequence, with corresponding amino acid sequences.
Evolutionary history of species using mix
homeobox sequences.
A phylogenetic tree for the sequenced species (zebrafish, mouse, platypus), along with the comparison sequences (chicken, opossum, human, cavefish), is shown in Figure 11. The tree supports the choice of comparison sequences used earlier for the zebrafish, platypus and mouse: the zebrafish is more closely related to the cavefish than to the other organisms, and the same holds for the mouse and human, and for the platypus and opossum. In the sequence comparison of all species analysed in Figure 12, species that are more closely related (share a more recent common ancestor) appear to have more similar nucleotide sequences than other species. The tree shows that the chicken is less closely related to the other species, and this is reflected in its Mixl1 sequence in Figure 12, which shows comparatively more variation. In addition, although the zebrafish and cavefish share a recent common ancestor, their mxtx1 nucleotide sequences differ from each other more than the mammalian pairs (e.g. platypus and opossum) do. This is consistent with their positions on the tree.

Figure 11. Phylogenetic tree for various species, constructed from their 180 bp mixl1 homeobox sequences using the Jukes-Cantor genetic distance model and the Neighbour-Joining tree-building method.
Branch length reflects the amount of evolutionary change along each lineage. Here it indicates that the zebrafish lineage underwent more evolutionary change than the cavefish lineage, the human lineage more than the mouse lineage, and the opossum lineage more than the platypus lineage. Both fish have substantially longer branches than the other species, consistent with faster molecular evolution associated with their shorter generation times and greater numbers of offspring. However, monotremes (e.g. the platypus) diverged earlier than marsupials (e.g. the opossum) and placental mammals such as humans and mice. It might therefore be expected that the platypus branch would be longer than observed in the tree, reflecting more evolutionary change.
Additionally, while monotremes such as the platypus would be expected to diverge from the common ancestor before marsupials and placental mammals, the tree in Figure 11 suggests that the platypus and opossum diverged from their common ancestor at the same time, and that this event happened after the divergence of the placental mammals, which is inconsistent with current knowledge (see Appendix B for a more accurate phylogenetic tree). This indicates possible issues with the sequences used to construct the tree: perhaps the sequences contained errors, or it may be inaccurate to infer phylogenetic relationships from a single homeobox sequence.
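The Jukes-Cantor model named in the Figure 11 caption converts a raw proportion of differing sites, p, into a distance corrected for multiple substitutions at the same site, d = -3/4 ln(1 - 4p/3). A minimal sketch applying it to the p-distances quoted in this report; it does not reproduce the Neighbour-Joining tree itself, and the mouse-human count includes columns with “N”s, so that value is approximate. Helper names are hypothetical.

```python
import math


def p_distance(mismatches: int, sites: int) -> float:
    """Proportion of compared sites that differ."""
    return mismatches / sites


def jukes_cantor(p: float) -> float:
    """JC69 corrected distance d = -3/4 ln(1 - 4p/3); undefined for p >= 0.75."""
    if not 0 <= p < 0.75:
        raise ValueError("p-distance must be in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)


# Mismatch counts over the 180 bp homeobox, as reported in this study.
pairs = {"zebrafish-cavefish": 36, "mouse-human": 34, "platypus-opossum": 24}
for pair, mismatches in pairs.items():
    p = p_distance(mismatches, 180)
    print(f"{pair}: p = {p:.3f}, d_JC = {jukes_cantor(p):.3f}")
```

The correction grows with p, so the most divergent pair (zebrafish-cavefish) receives the largest upward adjustment, which is consistent with the long fish branches in the tree.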
Primer design for sequencing compared to
PCR.
Some primers work well in PCR but not in a sequencing reaction. Sequencing reactions differ from PCR in that they involve linear rather than exponential amplification of the template DNA. PCR uses two primers, so each new product contains both priming sites and can serve as a template in subsequent cycles. DNA sequencing uses only one primer: the resulting product is synthesised in the same direction as the primer and cannot serve as a template in later cycles, so all amplification comes directly from the original template DNA (DNACore 2015). Because sequencing reactions are therefore much more sensitive to inefficient primers, primers that perform adequately in the exponential PCR process may be inefficient in the linear amplification of sequencing reactions and fail to produce enough sequencing product to obtain a clear sequence.
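The gap between exponential and linear amplification is easy to see numerically. A minimal sketch, assuming idealised 100% efficiency in every cycle (real reactions fall short of this); the helper names are hypothetical.

```python
def pcr_products(templates: int, cycles: int) -> int:
    """Idealised PCR: every product is itself a template, so copies double each cycle."""
    return templates * 2 ** cycles


def cycle_sequencing_products(templates: int, cycles: int) -> int:
    """Idealised cycle sequencing: one primer, so only the original templates are copied."""
    return templates * cycles


for n in (10, 20, 30):
    print(n, pcr_products(1, n), cycle_sequencing_products(1, n))
```

After 30 cycles a single template gives about 10^9 PCR copies but only 30 sequencing products, so a sequencing reaction has far less product to spare when a primer anneals poorly.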
Kieleczawa (2006) describes additional factors that may inhibit the success of a sequencing reaction, including:
GC-rich templates: DNA templates with >60% GC content;
Templates with various repeats (e.g. di- and trinucleotide, direct and inverted repeats);
Templates or primers containing hairpins or other secondary structures that interfere with DNA polymerase movement;
Primers with high melting temperatures (>65 °C), which can lead to secondary priming artefacts and noisy sequences;
Primers that are able to self-hybridise, forming ‘primer-dimers’.
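Several of these risk factors can be screened for before ordering primers. A minimal sketch, assuming the thresholds listed above (>60% GC for templates, >65 °C for primers) and estimating melting temperature with the Wallace rule, Tm = 2(A+T) + 4(G+C) °C, which is only a rough approximation for 20-mers; the 3' self-complementarity test is a simplified primer-dimer check. All helper names and example sequences are hypothetical.

```python
# Rough pre-screen of templates and primers against the risk factors above.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def gc_percent(seq: str) -> float:
    seq = seq.upper()
    return 100 * sum(base in "GC" for base in seq) / len(seq)


def wallace_tm(primer: str) -> int:
    """Wallace rule: 2 degC per A/T plus 4 degC per G/C (rough estimate)."""
    primer = primer.upper()
    at = sum(base in "AT" for base in primer)
    gc = sum(base in "GC" for base in primer)
    return 2 * at + 4 * gc


def template_warnings(template: str) -> list:
    return ["GC-rich template (>60% GC)"] if gc_percent(template) > 60 else []


def primer_warnings(primer: str, k: int = 4) -> list:
    primer = primer.upper()
    warnings = []
    if wallace_tm(primer) > 65:
        warnings.append("high Tm (>65 degC)")
    # If the last k bases can pair with another copy of the primer, dimers may form.
    if reverse_complement(primer[-k:]) in primer:
        warnings.append("3' end self-complementary (primer-dimer risk)")
    return warnings


print(primer_warnings("TGCTACTGCTAAAACATCTG"))  # []
print(primer_warnings("GCGGCGCCGCGGCGGCGCGC"))  # both warnings
```

Dedicated primer-design tools model secondary structure and nearest-neighbour thermodynamics far more accurately; this sketch only mirrors the list above.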
Sequencing Reaction Issues
The presence of “N”s in a sequence obtained from a sequencing reaction indicates unidentifiable nucleotides. It is also common to observe variation in read length. Possible causes include:
Inadequately purified PCR product: PCR reaction components may be carried over into the sequencing reaction. Carried-over dNTPs would compete with the fluorescent ddNTPs, resulting in weaker fluorescence and hence low-resolution readings that may contribute to “N”s. If the forward and reverse PCR primers remain, both will anneal to their complementary strands, producing two superimposed sequences that are unreadable (containing many “N”s) (Micromon 2013). In addition, an unpurified PCR reaction may contain multiple products that bind the sequencing primer, leading to multiple reads.
Primer melting temperature too low: this results in less stringent annealing, increasing the probability of non-specific annealing to other sites on the template. The resulting multiple reads would likely produce indistinguishable nucleotides (“N”s), and if they overlap, sequencing products of different sizes could be formed.
Secondary structures in the DNA template that limit DNA polymerase movement, contributing to shorter sequences (Kieleczawa 2006).
Multiple priming sites on the template DNA: the resulting sequence reads will overlap.
Calculation error in primer or template DNA concentration: too little primer or template DNA results in low amounts of extension products, showing up as a low-resolution signal that may make nucleotides impossible to distinguish.
Contamination by inhibitors (e.g. salt, protein) or by nucleases that degrade the DNA into smaller fragments. If the primer binding site is still intact on the smaller fragments, this may produce shorter sequencing products.
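A simple way to compare read quality across samples is to count the “N”s each read contains and find the longest unbroken run of them. A minimal sketch, assuming alignment gaps (“-”) should be ignored; no acceptance threshold is given in the sources cited here, so none is imposed, and the helper name is hypothetical. The example uses the first 60 columns of the mouse MF1 read in Figure 9.

```python
import re


def ambiguity_report(read: str) -> dict:
    """Summarise unidentified bases ('N') in a read; alignment gaps ('-') are ignored."""
    bases = read.replace("-", "").upper()
    n_count = bases.count("N")
    runs = re.findall(r"N+", bases)
    return {
        "length": len(bases),
        "n_count": n_count,
        "percent_n": 100 * n_count / len(bases),
        "longest_n_run": max((len(run) for run in runs), default=0),
    }


# First 60 columns of the mouse MF1 read in Figure 9.
mouse_f1 = "AGGGTCGGGCGCCCCGTCGGAGCCNNNNNNCGCAAGAGT-TGTCGTTCANCTCGGAGCAG"
print(ambiguity_report(mouse_f1))
```

A long run of consecutive “N”s (six here) points to a localised problem such as superimposed reads, whereas scattered single “N”s are more typical of a weak overall signal.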
High conservation of DNA sequence between
species is restricted to the 180 bp
homeobox. DNA sequence does not need to
be conserved for function to be conserved.
Homeobox sequences tend to be highly conserved through evolution as they play an essential role in differentiation during embryogenesis (Burke et al., 1995; Gehring et al., 1994). Homeobox sequences encode the homeodomain of transcription factors, allowing them to bind to specific DNA sequences. These transcription factors regulate gene expression to determine the identity of the structures that will form on a given body segment. Since transcription factors are proteins, their amino acid sequence dictates their tertiary structure and hence their function. Replacement polymorphisms (nucleotide substitutions that change an amino acid) are usually deleterious to homeoprotein function and are selected against during evolution; hence the homeobox sequence tends to be highly conserved. However, the intronic DNA flanking the homeobox does not need to be highly conserved, as it is spliced out during RNA processing. Nucleotide substitutions there tend to have minimal effect on the phenotype of the organism, allowing greater flexibility for variation.
It is not essential for the DNA sequence to be conserved for function to be conserved. The function of the homeoprotein depends on its shape, which depends on its amino acid sequence. Nucleotide substitutions corresponding to silent polymorphisms (no amino acid change) would affect neither the shape nor the function of the homeoprotein. Nucleotide substitutions corresponding to replacement polymorphisms at non-active-site amino acids may affect protein folding but may not significantly affect protein function. The homeodomain has a helix-turn-helix structure composed of three helices. The N-terminal arm interacts with the minor groove of DNA, while the third helix interacts with the major groove. Gehring et al. (1994) derived a consensus homeodomain sequence from 346 homeodomains from different species and determined that seven positions in the consensus sequence are occupied by the same amino acid in more than 95% of cases. These amino acid positions are marked on the sequences in Figure 12, which shows that the homeodomain amino acid sequences of the species studied here are also highly conserved at these positions. The exception is the mouse sequence, which has an S instead of an R in the N-terminal arm; however, there are several unidentifiable nucleotides around this region, so this may be an artefact of the mouse sequence itself. Additionally, the zebrafish mxtx1 sequence exactly matches the one determined by Pereira et al. (2012), indicating that it was accurately sequenced. The mouse Mixl1 sequence has several differences, likely attributable to sequencing errors, although the majority of the sequence is similar.
Figure 12. Comparison of Mixl1 homeobox sequences and corresponding amino acids across various species.
Functional regions marked above the sequences. Amino acids found to be particularly conserved by Gehring et al.
(1994) are marked in boxes.
CONCLUSION
In the present study, the Mixl1 homeobox sequences of the mouse and platypus, and the mxtx1 homeobox sequence of the zebrafish, were isolated and successfully sequenced. Comparing these sequences with those of closely related species, including the cavefish, human and opossum, revealed the highly conserved nature of the Mixl1 homeobox sequence. Although some nucleotide differences were observed, the amino acid sequences were mostly similar, with particularly high conservation of active-site amino acids. The highest similarity was observed between species that share a more recent common ancestor, reflecting less evolutionary change between them. The sequences determined are consistent with the reported literature, although there are some mismatches, likely due to insufficient purification of PCR products or other experimental error.
MATERIALS & METHODS
As described in Daish (2016a; 2016c), except that 10X loading buffer was used for gel electrophoresis instead of 6X.
REFERENCES
Bio-Rad 2016, PCR Troubleshooting, Bio-Rad Australia, viewed 16 April 2016, <http://www.bio-rad.com/en-au/applications-technologies/pcr-troubleshooting>.
Burke, AC, Nelson, CE, Morgan, BA & Tabin, C 1995, ‘Hox genes and the evolution of vertebrate axial morphology’, Development, vol. 121, no. 2, pp. 333-346.
Daish, T 2016a, ‘Multispecies analysis of homeobox domain containing transcription factors’, practical notes for Genetics 3111, University of Adelaide, viewed 20 April 2016, <https://myuni.adelaide.edu.au/bbcswebdav/pid-6999322-dt-content-rid-9036585_1/courses/3610_GENETICS_COMBINED_0001/Prac%20Handbook%202016.pdf>.
Daish, T 2016b, ‘PCR Amplification and Sequence Analysis of Homeobox Containing Mixl1 Transcription Factors’, practical notes for Genetics 3111, University of Adelaide, viewed 20 April 2016, <https://myuni.adelaide.edu.au/bbcswebdav/pid-7272638-dt-content-rid-9349662_1/xid-9349662_1>.
Daish, T 2016c, ‘GIII Prac Session 3: Transcription factor homeobox sequencing file manipulation and multi-species alignments’, practical notes for Genetics 3111, University of Adelaide, viewed 20 April 2016, <https://myuni.adelaide.edu.au/bbcswebdav/pid-7287107-dt-content-rid-9402676_1/courses/3610_GENETICS_COMBINED_0001/DAISH%20GIII%202016%20Prac%203%20intro%282%29.pdf>.
DNACore 2015, Sanger DNA Sequencing: Template Preparation, Harvard University, viewed 16 April 2016, <https://dnacore.mgh.harvard.edu/new-cgi-bin/site/pages/sequencing_pages/seq_template_preparation.jsp>.
Gehring, WJ, Affolter, M & Burglin, T 1994, ‘Homeodomain proteins’, Annual Review of Biochemistry, vol. 63, no. 1, pp. 487-526.
Hirata, T, Yamanaka, Y, Ryu, SL, Shimizu, T, Yabe, T, Hibi, M & Hirano, T 2000, ‘Novel mix-family homeobox genes in zebrafish and their differential regulation’, Biochemical and Biophysical Research Communications, vol. 271, no. 3, pp. 603-609.
Hueber, SD 2009, ‘Identification and Functional Analysis of Hox Downstream Genes in Drosophila’, viewed 20 April 2016, <http://nbn-resolving.de/urn:nbn:de:bsz:21-opus-38027>.
Kieleczawa, J 2006, ‘Fundamentals of sequencing of difficult templates-an overview’, Journal of Biomolecular Techniques, vol. 17, no. 3, p. 207.
Lappin, TR, Grier, DG, Thompson, A & Halliday, HL 2006, ‘HOX genes: seductive science, mysterious mechanisms’, Ulster Medical Journal, vol. 75, no. 1, pp. 23-31.
Mark, M, Rijli, FM & Chambon, P 1997, ‘Homeobox genes in embryogenesis and pathogenesis’, Pediatric Research, vol. 42, no. 4, pp. 421-429.
Micromon 2013, Common Reasons for DNA Sequencing Failure, Monash University, viewed 16 April 2016, <https://platforms.monash.edu/micromon/images/stories/forms-and-user-guides/sequencing-failure.pdf>.
Pereira, LA, Wong, MS, Lim, SM, Stanley, EG & Elefanty, AG 2012, ‘The Mix family of homeobox genes—key regulators of mesendoderm formation during vertebrate development’, Developmental Biology, vol. 367, no. 2, pp. 163-177.
Qiagen 2015, Why do I get smeared PCR products?, Qiagen, viewed 16 April 2016, <https://qiagen.com/au/resources/faq?id=4eb03cc8-4623-4e9e-96b2-6a4c17c03c58>.
ThermoFisher 2016, Nucleic Acid Gel Electrophoresis and Blotting Support/Troubleshooting, ThermoFisher Scientific, viewed 16 April 2016, <https://thermofisher.com/au/en/home/technical-resources/technical-reference-library/nucleic-acid-purification-analysis-support-center/nucleic-acid-electrophoresis-blotting-support>.
Warren, WC, Hillier, LW, Graves, J, Birney, E, Ponting, CP, Grützner, F, Belov, K, Miller, W, Clarke, L, Chinwalla, AT & Yang, SP 2008, ‘Genome analysis of the platypus reveals unique signatures of evolution’, Nature, vol. 453, no. 7192, pp. 175-183.
Appendix A
Appendix B: Evolutionary tree showing accepted placement of monotremes and marsupials
relative to mice and humans (Warren et al. 2008)
Table A. PCR reaction reagents (in μL) used for the amplification of a segment of the Mixl1 homeobox sequence from zebrafish genomic DNA.
Reagent                                  Volume (μL)
Water                                    22.5
10x reaction buffer                      5
25mM MgCl2                               5
2.5mM dNTP                               4
Primers (ZF1, ZR1; 5μM)                  4
DNA template (zebrafish gDNA)            7.5
Taq polymerase                           2
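The final concentrations in the Table A reaction follow from C1 × V1 = C2 × V2. A minimal sketch, assuming that the 2.5mM dNTP stock concentration applies to each dNTP and that the 5μM primer concentration applies to each primer within the 4μL added; Table A does not state either explicitly, and the helper name is hypothetical.

```python
# Final concentrations in the Table A reaction via C1 x V1 = C2 x V2.
# Entries: (name, stock concentration, unit, volume in uL); None = no stock value given.
TABLE_A = [
    ("Water", None, None, 22.5),
    ("Reaction buffer", 10, "x", 5),
    ("MgCl2", 25, "mM", 5),
    ("dNTP", 2.5, "mM", 4),
    ("Primers ZF1/ZR1", 5, "uM", 4),
    ("DNA template (zebrafish gDNA)", None, None, 7.5),
    ("Taq polymerase", None, None, 2),
]


def final_concentrations(components):
    """Return total volume and {name: (final concentration, unit)} for dilutable stocks."""
    total_ul = sum(volume for *_, volume in components)
    finals = {name: (stock * volume / total_ul, unit)
              for name, stock, unit, volume in components if stock is not None}
    return total_ul, finals


total_ul, finals = final_concentrations(TABLE_A)
print(f"Total volume: {total_ul} uL")
for name, (conc, unit) in finals.items():
    print(f"{name}: {conc:g} {unit}")
```

Under these assumptions the 50μL reaction contains 1x buffer, 2.5mM MgCl2, 0.2mM dNTPs and 0.4μM of each primer, and the 7.5μL of template corresponds to roughly 99ng at the 13.2ng/μL estimated in Box 1.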