MOLECULAR EVOLUTION STUDY OF VERTEBRATE RPS27 HOMOLOGUES
Patrick Tiong Joon Kiat
Bachelor of Science with Honours (Resource Biotechnology)
2010
MOLECULAR EVOLUTION STUDY OF VERTEBRATE RPS27 HOMOLOGUES
Patrick Tiong Joon Kiat
(19685)
A Thesis submitted in partial fulfillment of the requirements for the degree of Bachelor of Science with Honours
(Resource Biotechnology)
Faculty of Resource Science and Technology UNIVERSITI MALAYSIA SARAWAK
2010
Declaration
I declare that this thesis entitled "MOLECULAR EVOLUTION STUDY OF
VERTEBRATE RPS27 HOMOLOGUES" is the result of my own research except as cited
in the references. The thesis has not been accepted for any degree and is not concurrently
submitted in candidature of any other degree.
Signature: ........................
Name: ........................
Date: ........................
Acknowledgements
First and foremost, I would like to thank my supervisor, Assoc. Prof. Dr. Edmund
Sim, who was always willing to contribute his valuable time, advice and guidance to this
project. He taught us to work independently and encouraged scientific thinking among us in
this research. Besides that, I would like to thank the committee members and participants
of the Basic Phylogenetic Workshop FSTS, which was held on 24 February 2010, and Dr Ramlah
and Dr Leaw, who introduced some basic bioinformatics tools that are widely used in
phylogenetic research and provided basic guidance on using the PAUP*4 beta software to
conduct phylogenetic analysis. I would also like to take this opportunity to thank UNIMAS,
which provided free access to some of the available journals and an interschool loan
service for the literature required in this project.
Other than my supervisor and lecturers, I would like to give my special thanks and
appreciation to my best friend, Yii Ming Leong, for his spiritual support when I felt
frustrated while completing this project. Without his encouragement, I would have ended up
stressed and might never have been able to finish this project.
Finally, the greatest honour goes to my family and friends for
their understanding and support while I completed this project.
Table of Contents
DECLARATION
ACKNOWLEDGEMENTS
TABLE OF CONTENTS
LIST OF ABBREVIATIONS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT
CHAPTER 1 INTRODUCTION
CHAPTER 2 LITERATURE REVIEW
    2.1 The structure & function of RPS27 ribosomal gene
    2.2 The homology of the RPS27 across taxa
    2.3 RPS27 gene isoform
    2.4 RPS27L gene homologues
    2.5 Early gene evolution in vertebrates
    2.6 Sequence Databank (Genbank)
    2.7 Bioinformatics tool
        2.7.1 Multiple Sequence Alignment
        2.7.2 Modelling of the evolutionary nucleotides sequence
        2.7.3 Phylogeny tree reconstruction
CHAPTER 3 MATERIALS AND METHODS
    3.1 Data Mining
        3.1.1 Entrez Search
        3.1.2 Blast Search
        3.1.3 Optimization of sequence search
        3.1.4 Data confinement and validation
    3.2 Multiple Sequence analysis
        3.2.1 Intra-species Multiple Sequence Alignment
        3.2.2 Inter-species Multiple Sequence Alignment
    3.3 Protein secondary structure prediction
    3.4 Model Selection
    3.5 Phylogeny reconstruction
CHAPTER 4 RESULTS AND DISCUSSION
    4.1 Homology relationship on RPS27 and its homologues sequences
    4.2 The substitution patterns among RPS27 homologues
    4.3 Evolution relationship with RPS27 protein functional properties
    4.4 Conservation of RPS27 sequence at different regions in respects to protein functionalities
    4.5 Evolutionary history of RPS27 homologues inferred from phylogenetic tree
CHAPTER 5 CONCLUSION AND RECOMMENDATION
REFERENCES
APPENDIX A-F
List of Abbreviations
AIC      Akaike information criteria
BIC      Bayesian information criteria
CI       Consistency index
dLRT     Dynamic likelihood ratio test
DNA      Deoxyribonucleic Acid
DOS      Disk Operating System
DT       Decision theory method
GUI      Graphic User Interface
HI       Homoplasy Index
hLRT     Hierarchical likelihood ratio test
MCMC     Markov chain Monte Carlo analyses
ML       Maximum Likelihood
MP       Maximum Parsimony
MPS-1    Metallopanstimulin-1
NCBI     National Center for Biotechnology Information
NIH      National Institutes of Health
NJ       Neighbour-Joining
PAUP*    Phylogenetic Analysis Using Parsimony
RC       Rescaled Consistency Index
RI       Retention Index
RPS27    Ribosomal Protein Small Subunit 27
List of Tables
Table 1  The finalised set of sequences and respective assigned annotation after confinement of raw data sequence obtained from Entrez and Blastn.
Table 2  The transition and transversion rate generated from MEGA4.
List of Figures
Figure 1  Ribbon illustration of paralogous sequence, RPS27 (left) and RPS27L (right) constructed through I-TASSER with C-score value of -0.82 and -0.92, respectively.
Figure 2  The conservation scores plotted against the nucleotides position shows 51 sites are highly conserved (conservation score = 100) and 204 sites are variable (conservation score < 100) from a total of 255 sites.
Figure 3  The conservation scores plotted against the amino acids position shows 53 sites are highly conserved (conservation score = 100) and 31 sites are variable (conservation score < 100) from a total of 84 sites.
Figure 4  RPS27 homologues nucleotides alignment with single translational amino acid shows highly conserved Cysteine amino acids residues (in black box) in Zinc Finger like structural domain.
Figure 5  RPS27 homologues nucleotides alignment with single translational amino acid shows well conserved basic clustered amino acid residues (blue box) and rather well conserved acidic (red box) and aliphatic (green box) amino acid residues.
Figure 6  Space filled and ribbon diagram representing the conserved clusters of aliphatic and acidic amino acid residue that flanking basic amino acids residue which potentially function as nuclear localisation signal.
Figure 6a Neighbour-Joining (NJ) tree analysis generated by using coding region of RPS27 mRNA homologue sequences with Branchiostoma belcheri tsingtauense as outgroup. The branch lengths indicate the numbers of change along each branch. The values indicate the bootstrapping with 1000 replicates.
Figure 6b Maximum Parsimony tree analysis (first tree) generated by using coding region of RPS27 mRNA homologue sequences with Branchiostoma belcheri tsingtauense as outgroup. The branch lengths indicate the numbers of change along each branch. The values indicate the bootstrapping with 1000 replicates.
Figure 6c Maximum Likelihood (ML) tree analysis generated from coding region of RPS27 mRNA homologue sequences with Branchiostoma belcheri tsingtauense as outgroup. The branch lengths indicate the number of changes along each branch. The branching values indicate the bootstrapping with 100 replicates.
Figure 6d Bayesian tree analysis generated by using coding region of RPS27 mRNA homologue sequences with Branchiostoma belcheri tsingtauense as outgroup. The branch lengths indicate the numbers of change along each branch. The branching values indicate the posterior probability.
MOLECULAR EVOLUTION STUDY OF VERTEBRATE RPS27 HOMOLOGUES
Patrick Tiong Joon Kiat
Resource Biotechnology Faculty of Resource Science and Technology
University Malaysia Sarawak
ABSTRACT
RPS27 is a protein-coding gene whose product is one of the components of the small ribosomal subunit, and its housekeeping function makes the RPS27 gene likely to be expressed in every cell type of all living organisms. The protein has a unique zinc finger-like structure that has been deduced to potentially bind and interact with nucleic acids. In this study, multiple sequence alignment of protein-coding sequences and phylogenetic analysis were carried out to delineate the substitution pattern and the evolutionary relationships of RPS27 homologue sequences among vertebrates. The high similarity in multiple sequence alignment and protein secondary structure between the RPS27 and RPS27L genes, which are located at different genomic locations, points to the possibility that RPS27 underwent gene duplication during the divergence of vertebrate species, and that the homologues share both orthologous and paralogous relationships. However, the exact divergence time of the RPS27 homologues was not investigated in this study. This research also revealed that RPS27 homologues are highly conserved in amino acid sequence, reflecting strict selective pressure acting on the homologue sequences to maintain the native functionality of the protein. The evolution of the protein's functionality was also investigated, suggesting that the acidic and aliphatic amino acid residues flanking the clustered basic amino acid residues that have evolved in vertebrate RPS27 proteins may facilitate the appropriate orientation of nucleic acids, from interaction with the nuclear localisation signal towards the zinc finger structure of the RPS27 protein.
Key words: RPS27, MPS-1, molecular evolution, gene duplication, vertebrates, bioinformatics
ABSTRAK
RPS27 is a protein-coding gene and one of the components that form the small ribosomal subunit. The essential function of the ribosome means that the RPS27 gene is likely to be expressed in all cell types of every living organism. The RPS27 protein has a unique zinc finger structure that potentially allows it to bind and interact with nucleic acids. In this study, alignment and phylogenetic analysis were performed on protein-coding nucleotide sequences to describe the substitution pattern and the evolutionary relationships of RPS27 homologues in vertebrates. The results show that RPS27 and RPS27L have high identity in terms of protein sequence alignment and 3D protein structure, although these genes are found at different genomic locations. This raises the possibility that the RPS27 and RPS27L genes originated from the same gene and are the product of a duplication dating from the divergence of vertebrate species. However, the divergence time of the RPS27 homologues in vertebrates was not examined in this analysis. In addition, this study revealed that the amino acid sequences of RPS27 homologues are highly conserved, which reflects strict selective pressure acting on RPS27 homologues to maintain protein function. The evolution of the function of the homologous proteins was also examined, showing that the acidic and aliphatic amino acid residues flanking the basic amino acid residues found in vertebrate RPS27 proteins facilitate the appropriate orientation of nucleic acids through interaction of the nuclear localisation signal with the zinc finger structure of the RPS27 protein.
Key words: RPS27, MPS-1, molecular evolution, gene duplication, vertebrates, bioinformatics
1.0 INTRODUCTION
Recently, the ribosomal protein small subunit 27 (RPS27) gene has been intensively
sequenced across species, and it has been proposed as a molecular
marker for phylogenetic analysis (Manchado, Infante, Asensio, Canavate & Douglas,
2007). Besides that, the molecular functions of RPS27 genes, such as their physiological
roles in association with the established isoforms (Revenkova, Masson, Koncz, Asfar,
Iakoveleva & Paszkowski, 1999) or their differential expression in cancer development (Sim,
Toh & Tiong, 2008), have been widely studied. However, the evolutionary relationships of
RPS27 homologues among vertebrate species are yet to be well established.
The RPS27 gene, also known as metallopanstimulin-1 (MPS-1), is a protein-coding
gene that encodes a component of the eukaryotic 40S ribosomal subunit,
which functions as machinery for mRNA-directed protein biosynthesis. This gene may
function as a mediator of cellular proliferation (Fernandez-Pol, Dennis, & Paul, 1993) and
may exhibit extra-ribosomal functions in certain species (Revenkova et al., 1999). The
important housekeeping role and the additional extra-ribosomal functions of RPS27 genes
indicate that this gene should be present and expressed continually in most cells in
vertebrates. Thus, it should exist among the eukaryotes (Ma et al., 2005).
The growing availability of cDNA libraries and protein sequences has
increased the importance of information on the molecular evolutionary relationships of
the RPS27 gene. In 2005, the deduced RPS27 protein from the amphioxus Branchiostoma
belcheri tsingtauense, which has been claimed to be the extant invertebrate most closely
related to the proximate ancestor of vertebrates, was compared with its homologues among
vertebrate species (Ma, Zhang, Liu, Li & Xia, 2005). The great similarity among the
RPS27 protein sequences of these homologues indicated that the protein is highly
conserved throughout evolution.
The investigation of the RPS27 gene was carried further by Manchado et al. (2007), who
compared RPS27 cDNA sequences from Solea senegalensis and Hippoglossus
hippoglossus, two commercially important flatfish species, with a number of RPS27
homologues from marine species and constructed a phylogenetic tree for
species identification. However, the molecular evolution of RPS27 homologues was not
explained in that research.
Thus, in order to discern the complete evolutionary relationships among vertebrate
RPS27 homologues, a proper comparative analysis that incorporates wider taxonomic
families is required. To date, no research has been conducted with the aim of assessing
the molecular evolution of the RPS27 gene, nor is a phylogenetic tree of its genetic
relationships available. Therefore, in this study, the genetic relationships of RPS27
homologues from broader taxonomic families of vertebrate species were assessed.
1.1 OBJECTIVES
The objectives of this study are:
1. To compare the genetic relationships, in terms of nucleotide similarities and
differences, among RPS27 gene homologues.
2. To estimate the rate and investigate the pattern of nucleotide substitution
contributing to the evolution of RPS27 genes among vertebrate RPS27
sequences.
3. To deduce the orthology and paralogy of RPS27 homologues from the
phylogenetic tree constructed.
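The substitution pattern named in the second objective is commonly summarised as counts of transitions (purine to purine, or pyrimidine to pyrimidine) and transversions (purine to pyrimidine or the reverse). The following is only a minimal sketch of that count for two aligned sequences. The function name and the toy sequences are hypothetical, and it is not the procedure used by MEGA4.

```python
# A transition stays within one base class; a transversion switches class.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_substitutions(seq_a, seq_b):
    """Count (transitions, transversions) between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue  # identical site, gap, or ambiguous base: not counted
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    return transitions, transversions


# Hypothetical toy alignment, not RPS27 data.
ts, tv = count_substitutions("ATGCCGTA", "ACGTCGTT")
print(ts, tv)
```

Sites with gaps or ambiguous bases are skipped here for simplicity; a real analysis would also correct for multiple substitutions at the same site.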
2.0 LITERATURE REVIEW
2.1 The structure & function of RPS27 ribosomal gene
The RPS27 gene is a ribosomal protein-coding gene which is also known as
metallopanstimulin-1 (MPS-1). The product of the RPS27 gene is a component of
the small ribosomal subunit (40S) of the eukaryotic ribosome. Interestingly, this ribosomal
protein has an unusual structural domain, a C4-type zinc finger-like motif, which
associates with a zinc ion and potentially coordinates the interaction of zinc ions with
nucleic acids such as DNA and RNA. The secondary and tertiary structure of the C4-type
zinc finger motif found in ribosomal protein S27 shows tandem
repeats of the α-helix that are able to bind the major groove of DNA. This zinc finger-like
motif shares similarities with those found in DNA-binding proteins such
as transcription factors and proteins involved in the response to DNA injury; however, the
specific role of this structure in the ribosome is currently unknown.
Some researchers have suggested that this structure might be closely associated
with functions such as nuclear localization signalling (Ma et al., 2005), recruitment of
DNA-binding proteins (Revenkova et al., 1999) and binding of mRNA during translation.
2.2 The homology of the RPS27 across taxa
The RPS27 gene is highly conserved throughout the evolutionary process. It has
been claimed that the zinc finger structure of the RPS27 protein is a vestige of
evolution (Revenkova et al., 1999). This is consistent with the study by Ma et al. (2005),
which revealed that the cDNA-encoded RPS27 protein from the cephalochordate amphioxus,
recognized as the extant invertebrate most closely related to the ancestors of
vertebrates, is highly similar to its counterparts in higher eukaryotes across taxa. The
comparative analysis of 12 known S27 protein sequences showed that the
S27 protein from the amphioxus (AmphiS27) shares 94-99% homology with homologues in
vertebrates such as humans, Xenopus and fish, 84-94% with
invertebrate homologues such as annelids, mollusks and crustaceans, and 69-72%
with other eukaryotic counterparts such as plants and yeasts (Ma et al.,
2005). This supports the view that amphioxus is the extant invertebrate most closely
related to the common ancestor of the vertebrates, a view shared by Holland and Shimeld
(2005). Therefore, the RPS27 gene sequence from the amphioxus is expected to be a more
reliable outgroup for phylogenetic analysis.
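Percentages such as those quoted above are typically computed from a pairwise alignment as the share of aligned positions carrying the same residue. The sketch below shows one common definition, which ignores columns containing a gap. The function name and the example peptides are hypothetical, and Ma et al. (2005) may have used a different definition.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical residues over columns without a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical ten-residue peptides differing at one position.
print(percent_identity("MPLAKDLLHP", "MPLARDLLHP"))
```

Other definitions divide by the full alignment length or by the shorter sequence length, so reported identities are only comparable when the same denominator is used.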
2.3 RPS27 gene isoform
During the evolutionary process, accumulated mutation, transposition, duplication
and gene conversion are the major contributors to genetic variation (Russell,
2006). In higher eukaryotes, a number of different copies of the RPS27 gene that are
actively transcribed and translated into functional proteins, known as isoforms
(Revenkova et al., 1999; Chan, Katsuyuki Suzuki, Olvera, & Wool, 1993), have been
detected at different locations in the genome. These isoforms might derive from a
single gene. Some studies have found that each
isoform of this gene carries its own specific function beyond protein synthesis. For
example, ARS27A is one of three RPS27 isoform genes located in the genome of
Arabidopsis thaliana and has been found to be involved in the degradation of mRNA
triggered by genotoxic stress (Revenkova et al., 1999).
2.4 RPS27L gene homologs
RPS27L, also known as ribosomal protein S27-like, is a novel protein from the RPS27
gene family that is yet to be well characterised (He & Sun, 2007). According to He and Sun
(2007), the RPS27L protein shows 96.3% identity with the RPS27
protein in human, although the RNA sequences are quite divergent. Meanwhile, Li and
colleagues (2007) showed that both RPS27 and RPS27L are expressed in both
cancerous and normal cells. Experimental evidence has further shown that the RPS27L
protein is a direct p53-inducible modulator which promotes apoptosis in response
to genotoxic stress (He & Sun, 2007; Li et al., 2007).
2.5 Early gene evolution in vertebrates
Vertebrates are organisms characterised by the presence of a backbone (Benton,
2007) and are classified into classes, namely Agnatha, Placodermi,
Chondrichthyes, Osteichthyes, Amphibia, Reptilia, Aves and Mammalia (Klappenbach,
). Palaeontological data allow the evolution of vertebrates to be traced back to the
oldest organisms of ancient periods as possible common ancestors. However, the
fossil evidence for many species is probably missing, which makes it
impossible to unveil the entire course of vertebrate evolution (Benton, 2007). Meanwhile,
continuous evolution in molecular data has also been used as a tool for phylogenetic
reconstruction. The two-round genome-wide duplication (2R) hypothesis is among the most
popular hypotheses proposed to explain early evolutionary events in
vertebrate species (Ohta, 1970).
2.6 Sequence Databank
GenBank is one of the most popular sequence databanks. It is the
National Institutes of Health (NIH) sequence database, where large amounts of DNA
sequence are deposited, annotated and readily retrievable from the National Center for
Biotechnology Information (NCBI) website (http://www.ncbi.nlm.nih.gov/Genbank/). One
of the easiest ways to access homologous gene sequences is through HomoloGene.
According to NCBI, HomoloGene is a system for the automated detection
of homologous genes among several completely sequenced eukaryotic genomes. Other
than that, the Entrez tool provides a broad search for the retrieval of DNA or protein
sequences, or Medline references related to the molecular biology sequence databases,
from a series of interconnected databases throughout GenBank (Mount, 2005). Therefore,
the Entrez tool is able to carry out a complete search across the databases.
2.7 Bioinformatics tool
A number of computational tools are available online, either run through
a web server or downloadable from a website, to aid
various sequence and phylogeny analyses.
2.7.1 Multiple Sequence Alignment
Multiple sequence alignment is routinely carried out as part of phylogenetic
analysis. Several packages of computational tools are widely used for
multiple sequence alignment, including Clustal W and Clustal X (Larkin et al., 2007),
T-Coffee (Notredame, Higgins & Heringa, 2000), MAFFT (Katoh, Kuma, Toh & Miyata,
2005), and MUSCLE (Edgar, 2004). Clustal W and Clustal X are the oldest and most widely
used programs in sequence analysis, as they are able to run on most types of PC platform
(Larkin et al., 2007). The Clustal W program aligns nucleotide or amino acid
sequences through a global alignment method and calculates the best match for each
of the sequences so that the identities, similarities and differences among the sequences
can be easily seen (http://www.ebi.ac.uk/2can/tutorials/nucleotide/clustalw2.html). Both
Clustal W and Clustal X use a pairwise progressive alignment method in which the
alignment begins with the most closely related group of sequences and is built up by
adding the rest of the sequences (Phillips, 2006).
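The first step of a progressive alignment, scoring every pair of sequences by global alignment and starting from the most similar pair, can be sketched as follows. This is only an illustration under stated assumptions: the match, mismatch and gap scores, the function names and the toy sequences are all hypothetical, and Clustal W itself uses more elaborate scoring and a guide tree rather than a single best pair.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch score of the best global alignment of a and b."""
    prev = [j * gap for j in range(len(b) + 1)]  # row for the empty prefix of a
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def closest_pair(seqs):
    """Names of the two sequences with the highest pairwise global score."""
    names = sorted(seqs)
    pairs = [(x, y) for i, x in enumerate(names) for y in names[i + 1:]]
    return max(pairs, key=lambda p: global_alignment_score(seqs[p[0]], seqs[p[1]]))


# Hypothetical toy sequences, not RPS27 data.
toy = {"s1": "ACGTACGT", "s2": "ACGTACGA", "s3": "TTTTGGGG"}
print(closest_pair(toy))
```

Only two rows of the dynamic programming matrix are kept because the sketch needs the score alone; recovering the alignment itself would require the full matrix and a traceback.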
2.7.2 Modelling of the evolutionary nucleotides sequence
Nucleotide substitution rates vary considerably among genes. According to Li
and Graur (1991), the variation in substitution rates among genes is due to the rate
of mutation and the probability of fixation of a mutation. Therefore, it is important to
estimate the rates of nucleotide substitution before constructing the phylogenetic
tree. jModelTest is a tool that enables researchers to carry out statistical
selection of the best-fit model of nucleotide substitution. It was released in 2008 and
superseded the previous program, Modeltest, implementing a larger set of
models (up to 88 models) and five model selection strategies: hierarchical
and dynamical likelihood ratio tests (hLRT and dLRT), Akaike and Bayesian information
criteria (AIC and BIC), and a decision theory method (DT) (Guindon & Gascuel, 2003).
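The two information criteria can be illustrated with their standard definitions, AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L, where k is the number of free parameters, L the maximised likelihood and n the number of sites. In the sketch below the log-likelihoods are invented for illustration, the parameter counts cover the substitution model only (branch lengths are left out), and the code is not part of jModelTest.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: lower is better."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_sites):
    """Bayesian information criterion: lower is better."""
    return n_params * math.log(n_sites) - 2 * log_likelihood


# Hypothetical (log-likelihood, free parameters) per candidate model.
candidates = {
    "JC": (-1520.0, 0),
    "HKY": (-1480.0, 4),
    "GTR": (-1478.5, 8),
}
n_sites = 255  # illustrative alignment length

best_by_aic = min(candidates, key=lambda m: aic(*candidates[m]))
best_by_bic = min(candidates, key=lambda m: bic(*candidates[m], n_sites))
print(best_by_aic, best_by_bic)
```

Both criteria reward a higher likelihood and penalise extra parameters; BIC penalises them more heavily as the number of sites grows, so the two can disagree on larger alignments.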
2.7.3 Phylogenetic tree reconstruction
Phylogenetic analysis is a field of study concerned with the reconstruction of
the evolutionary history of organisms or genes. In evolutionary studies, systematics
is ultimately based on the notion of homology. Homology is defined
differently from similarity: homologues are similar sequences in two
different organisms that have been derived from a common ancestor (Phillips, 2006).
Therefore, the gene homologues have to be determined prior to phylogenetic analysis,
since the criterion of similarity is used during the construction of a gene phylogeny. The
evolutionary relationships are presented in the form of bifurcating networks called
ogram (Phillips, 2006) or trees (David, 2005). The phylogenetic analysis orders
the sequences into sets of interest based on the pattern of similarity among the gene
sequence family. In other words, the sequences that have the highest similarity will be
clustered as neighbouring sequences under a common branch. During the
reconstruction of gene phylogenies, the gene tree will in some instances not follow the
species phylogeny. This may be attributed to several genetic mechanisms such as
horizontal gene transfer, introgression, and ancestral polymorphism (Phillips, 2006).
Phylogenetic Analysis Using Parsimony (PAUP) is among the most widely used programs for
the inference of evolutionary trees. The latest version of PAUP is PAUP* version 4.0,
which has been upgraded and is able to work on several common platforms such as
Macintosh, Windows, UNIX/VMS and even a DOS-based format.
3.0 MATERIALS AND METHODS
3.1 Data Mining
Data mining of RPS27 homologues was carried out using two major
sequence search engines provided by GenBank (http://www.ncbi.nlm.nih.gov/), namely the
Entrez and Blast alignment tools.
3.1.1 Entrez Search
Firstly, the nucleotide database was selected from the drop-down menu of
the Entrez search engine. The term "RPS27" was then typed into the Entrez query box and
the Go button was clicked. A further constraint on the Entrez database search was
introduced with the Boolean expression "AND "vertebrate"[porgn:_txid7742]", which was
entered into the Entrez query box after the term "RPS27". The results of the Entrez search
were saved as a collection in GenBank named "Entrez RPS27 nucleotides". The
procedure was repeated with the protein database in place of the
nucleotide database, and the search results were saved as a collection
named "Entrez RPS27 proteins".
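The same query could in principle be issued programmatically through the NCBI E-utilities ESearch endpoint instead of the web form. The sketch below only builds the request URLs and does not contact NCBI. The helper name, the choice of the `nuccore` database for nucleotides and the `retmax` value are assumptions, since the searches in this study were run through the Entrez web interface.

```python
from urllib.parse import urlencode

EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term, taxon_filter, database="nuccore", retmax=20):
    """Build an ESearch URL for `term` restricted by `taxon_filter`."""
    query = f"{term} AND {taxon_filter}"
    params = {"db": database, "term": query, "retmax": retmax}
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


# Query text as typed into the Entrez box, taken from the section above.
taxon = '"vertebrate"[porgn:_txid7742]'
nucleotide_url = build_esearch_url("RPS27", taxon)
protein_url = build_esearch_url("RPS27", taxon, database="protein")
print(nucleotide_url)
print(protein_url)
```

`urlencode` percent-encodes the quotes, brackets and colon in the query, so the filter survives being placed in a URL unchanged.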
3.1.2 Blast Search
In addition, a Blast search was executed using an RPS27 nucleotide sequence
(GenBank accession number: AY168455) from the amphioxus Branchiostoma belcheri
tsingtauense, which is the closest to the ancestor of vertebrates (Holland et al.,
2005; Ma et al., 2005), as the Blastn query sequence. The
Blastn default search preferences were used, except that the "Choose search set" and "Max
target sequences" sections were altered according to the parameters listed below.
Choose search set: Non-redundant nucleotide sequences in vertebrates (taxid:7742),
excluding Annelida (taxid:6340), Plants (taxid:3193), Fungi (taxid:4751), Synthetic
Construct (taxid:32630) and Crustaceans (taxid:6657).
Max target sequences: 20000
A rough sequence confinement was first carried out manually based on four
criteria: Max identity (>70%), Coverage (>80%), Expected value (<e-40) and Max score
(>150). Highly similar or related sequences listed in the Blast alignment output were then
saved into a collection in the NCBI database named "Blast nucleotides sequence".
A similar search method was also applied to the protein database using the Blastp algorithm.
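The four confinement criteria amount to a simple filter over the Blast hit table. The sketch below applies them to hand-written records. The `BlastHit` class, its field names and the accession labels are hypothetical, the E-value cutoff is read here as 1e-40, and in this study the filtering was done manually rather than by script.

```python
from dataclasses import dataclass


@dataclass
class BlastHit:
    accession: str
    max_identity: float  # percent identity
    coverage: float      # percent query coverage
    e_value: float
    max_score: float


def passes_confinement(hit, min_identity=70.0, min_coverage=80.0,
                       max_e_value=1e-40, min_score=150.0):
    """True when a hit satisfies all four confinement criteria."""
    return (hit.max_identity > min_identity
            and hit.coverage > min_coverage
            and hit.e_value < max_e_value
            and hit.max_score > min_score)


# Hypothetical hits, not real Blast output.
hits = [
    BlastHit("HYPO_0001", 88.0, 97.0, 3e-62, 240.0),
    BlastHit("HYPO_0002", 72.0, 60.0, 1e-45, 170.0),  # fails coverage
    BlastHit("HYPO_0003", 91.0, 95.0, 2e-20, 160.0),  # fails E-value
]
kept = [h.accession for h in hits if passes_confinement(h)]
print(kept)
```

Keeping the thresholds as keyword arguments makes it easy to see how many hits each criterion excludes when a cutoff is relaxed.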
3.1.3 Optimization of sequence search
From the Entrez and Blast searches, four sets of data collections were obtained. These
include two nucleotide and two protein collections, one of each generated by each search
method (Entrez and Blast). These sequence collections were then merged into two
major combined collections, one of nucleotides and one of proteins. The respective
nucleotide sequences hyperlinked from each protein sequence were collected and
subsequently combined with the nucleotide collection. Redundant
sequences obtained from both Blast and Entrez that had the same accession number
were eliminated automatically by the NCBI database.
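Merging the collections while dropping records that share an accession number can be pictured as a keyed union. The sketch below uses a dictionary keyed by accession; the function name, the record layout and the accession labels are hypothetical, and it only mimics the de-duplication that the NCBI collection feature performed automatically.

```python
def merge_collections(*collections):
    """Merge (accession, description) records, keeping one per accession."""
    merged = {}
    for collection in collections:
        for accession, description in collection:
            # setdefault keeps the first record seen for each accession
            merged.setdefault(accession, description)
    return merged


# Hypothetical records standing in for the Entrez and Blast collections.
entrez_nucleotides = [("HYPO_N1", "RPS27 mRNA, species A"),
                      ("HYPO_N2", "RPS27 mRNA, species B")]
blast_nucleotides = [("HYPO_N2", "RPS27 mRNA, species B"),
                     ("HYPO_N3", "RPS27L mRNA, species A")]

combined = merge_collections(entrez_nucleotides, blast_nucleotides)
print(list(combined))
```

This removes only records with the same accession number; sequences that are identical but deposited under different accessions remain and are handled by the intra-species alignment step.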
3.1.4 Data confinement and validation
In the final stage of the data mining process, all the collected sequences were confined
to eliminate unsuitable or less meaningful sequences such as genomic sequences and
pseudogene sequences. The redundant sequences with identical nucleotide sequences were
eliminated based on the intra-species multiple sequence alignment described in the next
section (Section 3.2.1). The selection of sequences used in the analysis was based on the
nucleotide sets of each species.
3.2 Multiple Sequence Alignment
Multiple sequence alignment was carried out in two distinct stages, namely the
intra-species and inter-species multiple sequence alignments.
3.2.1 Intra-species Multiple Sequence Alignment
Intra-species multiple sequence alignment was carried out to eliminate
redundant nucleotide sequences and to select representative sequences for each species.
The sequences obtained from data mining were clustered into groups by species.
Subsequently, multiple sequence alignment of RPS27 sequences (not including RPS27L)
was carried out using ClustalW 1.4 (Thompson, 1994)
implemented as an accessory application in BioEdit 7.0.5.3 (Hall, 1999). The nucleotide
sequences with identical coding regions, despite having different
GenBank accession numbers, were chosen as representative sequences for RPS27 homologues