selnikraj - basic bioinformatics
DESCRIPTION
This work covers basic bioinformatics exercises and may be useful to bioinformatics freshers.
TRANSCRIPT
SeLnIkRaJ http://www.selnikraj.110mb.com
Any queries mail me at [email protected]
INTRODUCTION TO BIOLOGICAL DATABASE
A biological database is a large, organized body of persistent data, usually associated with computerized
software designed to update, query, and retrieve components of the data stored within the system. A
simple database might be a single file containing many records, each holding the same kind of
information. For example, a record in a nucleotide sequence database typically contains a contact name, the
input sequence with a description of the type of molecule, the scientific name of the source organism from
which it was isolated, and literature citations associated with the sequence. For researchers to benefit from
the data stored in a database, two additional requirements must be met:
• Easy access to the information.
• A method for extracting only that information needed to answer a specific biological question.
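The two requirements above can be illustrated with a minimal Python sketch. It assumes a toy in-memory "database" of records; the field names, accession codes and sequences are made up for illustration and do not come from any real database.

```python
from dataclasses import dataclass


@dataclass
class SequenceRecord:
    """One record of a toy sequence database (hypothetical fields)."""
    accession: str
    organism: str
    molecule_type: str
    sequence: str


# Illustrative records only; accessions and sequences are invented.
RECORDS = [
    SequenceRecord("TOY0001", "Rana temporaria", "DNA", "ATGGCCAAGT"),
    SequenceRecord("TOY0002", "Homo sapiens", "mRNA", "AUGGCUUAA"),
    SequenceRecord("TOY0003", "Rana temporaria", "mRNA", "AUGCCGUAG"),
]


def query(records, **criteria):
    """Return only the records whose fields match every given criterion."""
    return [
        record for record in records
        if all(getattr(record, field) == value for field, value in criteria.items())
    ]


# Extract only the information needed for one specific question.
frog_dna = query(RECORDS, organism="Rana temporaria", molecule_type="DNA")
print([record.accession for record in frog_dna])
```

Real biological databases use indexed storage and richer query languages; the sketch only shows the idea of filtering records by field.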
Currently, a lot of bioinformatics work is concerned with database technology. These databases include
both public repositories of gene and protein data, such as GenBank or the Protein Data Bank (PDB), and
private databases, such as those used in gene mapping projects or those held by biotech companies.
Making such databases accessible via open standards like the web is very important, since consumers of
bioinformatics data use a range of computer platforms, from the more powerful and forbidding UNIX
boxes favored by developers and curators to the far friendlier Macs often found populating the labs
of biologists.
DNA and RNA store the hereditary information about an organism, and proteins carry out most of its
functions. These macromolecules have defined structures, which biologists can analyze with the help of
bioinformatics tools and databases.
A few popular databases are GenBank from NCBI (National Center for Biotechnology Information),
Swiss-Prot from the Swiss Institute of Bioinformatics and PIR (Protein Information Resource).
SELECTION OF NUCLEOTIDES
Aim:
To search for a nucleotide sequence in NCBI.
Procedure:
Open the NCBI home page (for example, by searching for NCBI in Google).
Choose Nucleotide
The drop-down menu on the NCBI home page lists several databases, such as PubMed, Protein, Structure,
Genome, Books, 3D Domains and Domains. Choose Nucleotide.
Fig 1 – Home page of NCBI
Search Window
Type the name of a gene, a gene ID, a protein product or the name of a disease. NCBI offers several search
options. For this exercise, type frog (or another term, according to the title chosen) and click Go.
The Nucleotide database has recently been divided into three databases:
• Core nucleotide: contains all nucleotide records that are not in EST or GSS.
• EST: Expressed Sequence Tags, which contains only EST records.
• GSS: Genome Survey Sequences, which contains only GSS records.
Under the name Homo sapiens there are 252 core nucleotide records and 312 EST (Expressed Sequence Tag)
records. Next, select FASTA in the Display drop-down menu and click Go.
Fig 2 – NCBI Nucleotide
Select a nucleotide sequence, 1YVP_H, in the FASTA format that appears. Select the sequence along with
the FASTA description line and save it in Notepad.
Fig 3 – Fasta format of Sequence
This particular sequence is taken for BLAST search and the result is obtained.
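A saved FASTA file consists of a description line starting with ">" followed by one or more lines of sequence. As a rough illustration, the following Python sketch reads such text back into (description, sequence) pairs. The example record is invented for this sketch and is not the 1YVP_H entry.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (description, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new description line closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Made-up example record, not a real database entry.
example = """>toy_sequence_1 hypothetical frog fragment
ATGGCTAGCTAG
GATTACA
"""
records = parse_fasta(example)
print(records)
```

Sequence lines are joined without separators, so line wrapping in the saved file does not change the parsed sequence.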
Result:
The nucleotide sequence for frog was searched and reported.
SEQUENCE SIMILARITY SEARCHING
Introduction
Sequence alignment provides a powerful way to compare novel sequences with previously characterized
genes and proteins. Both functional and evolutionary information can be inferred from well-designed
queries and alignments. BLAST (Basic Local Alignment Search Tool) provides a method for rapid searching of
nucleotide and protein databases. Because the BLAST algorithm detects local as well as global alignments,
regions of similarity embedded in otherwise unrelated proteins can be detected. Both types of similarity may
provide important clues to the function of uncharacterized proteins.
The BLAST programs are a set of sequence comparison algorithms, introduced in 1990, that are used to search
sequence databases for optimal local alignments to a query. The BLAST programs improved the overall
speed of searches while retaining good sensitivity by breaking the query and database sequences into
fragments and seeking matches between fragments.
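The idea of breaking sequences into fragments ("words") and looking for matching fragments can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the actual BLAST implementation: it finds only exact word matches and omits the scoring thresholds, neighborhood words and extension steps of the real programs. The function names and test sequences are made up for this sketch.

```python
from collections import defaultdict


def index_words(sequence, k):
    """Map every length-k word in the sequence to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(sequence) - k + 1):
        index[sequence[i:i + k]].append(i)
    return index


def find_seeds(query, subject, k=3):
    """Return (query_pos, subject_pos) pairs where a length-k word matches exactly."""
    subject_index = index_words(subject, k)
    seeds = []
    for q_pos in range(len(query) - k + 1):
        word = query[q_pos:q_pos + k]
        for s_pos in subject_index.get(word, []):
            seeds.append((q_pos, s_pos))
    return seeds


seeds = find_seeds("GATTACA", "TTGATTAC", k=4)
print(seeds)  # [(0, 2), (1, 3), (2, 4)]
```

Indexing the subject once makes each word lookup a dictionary access, which is the reason fragment matching is much faster than aligning every pair of positions.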
SELECTION OF PROTEINS
Aim: To select a protein of interest and note down the biological features of that protein.
Procedure:
Step 1: Go to the Google search engine.
Step 2: Enter the protein name.
Step 3: Note down the biological significance of the protein.
Step 4: Close the window.
catalase
Catalase was first noticed as a substance in 1811 when Louis Jacques Thénard, who discovered
H2O2 (hydrogen peroxide), suggested that the breakdown of H2O2 is caused by a substance.
Introduction
In 1900 Oscar Loew was the first to give it the name catalase, and found its presence in many
plants and animals[6]. In 1937 catalase from beef liver was crystallised by James B. Sumner [7] and the
molecular weight worked out in 1938[8]. In 1969 the amino acid sequence of bovine catalase was worked
out[9]. Then in 1981, the 3D structure of the protein was revealed[10].
Action of catalase
The reaction of catalase in the decomposition of hydrogen peroxide is:
2 H2O2 → 2 H2O + O2[11]
In microbiology, the catalase test is used to differentiate between bacterial species in the lab.[1] The test is
done by placing a drop of hydrogen peroxide on a microscope slide. Using an applicator stick, a scientist
touches the colony and then smears a sample into the hydrogen peroxide drop. If bubbles or froth form,
the organism is said to be catalase-positive; if not, the organism is catalase-negative.[2] This test is
particularly useful in distinguishing staphylococci and micrococci, which are catalase-positive, from
streptococci and enterococci, which are catalase-negative.[3] While the catalase test alone cannot identify
a particular organism, combined with other tests it can aid diagnosis. The presence of catalase in bacterial
cells depends on both the growth condition and the medium used to grow the cells.
Fig 4 – Catalase 3D structure
Molecular mechanism
While the complete mechanism of catalase is not currently known, the reaction is believed to occur in
two stages:
H2O2 + Fe(III)-E → H2O + O=Fe(IV)-E(.+)
H2O2 + O=Fe(IV)-E(.+) → H2O + Fe(III)-E + O2[12]
Here Fe()-E represents the iron centre of the heme group attached to the enzyme. Fe(IV)-E(.+) is a
mesomeric form of Fe(V)-E, meaning that the iron is not completely oxidized to +V but receives some
"supporting electron" density from the heme ligand. The heme therefore has to be drawn as a radical cation (.+).
As hydrogen peroxide enters the active site it interacts with the amino acids Asn147 (asparagine at
position 147) and His74, causing a proton (hydrogen ion) to transfer between the oxygen atoms. The free
oxygen atom coordinates, freeing the newly formed water molecule and Fe(IV)=O. Fe(IV)=O reacts with
a second hydrogen peroxide molecule to reform Fe(III)-E and produce water and oxygen.[12] The
reactivity of the iron center may be improved by the presence of the phenolate ligand of Tyr357 in the
fifth iron ligand, which can assist in the oxidation of the Fe(III) to Fe(IV). The efficiency of the reaction
may also be improved by the interactions of His74 and Asn147 with reaction intermediates.[12]
Generally, the rate of the reaction can be determined by the Michaelis-Menten equation.[4]
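For reference, the Michaelis-Menten equation gives the initial rate as v = Vmax [S] / (Km + [S]). A minimal Python sketch of this formula follows; the numerical values in the example are arbitrary illustrations, not measured constants for catalase.

```python
def michaelis_menten_rate(substrate_conc, v_max, k_m):
    """Initial reaction rate v = Vmax * [S] / (Km + [S])."""
    if substrate_conc < 0:
        raise ValueError("substrate concentration must be non-negative")
    if k_m <= 0:
        raise ValueError("Km must be positive")
    return v_max * substrate_conc / (k_m + substrate_conc)


# Arbitrary illustrative values: when [S] equals Km the rate is half of Vmax.
rate = michaelis_menten_rate(substrate_conc=25.0, v_max=100.0, k_m=25.0)
print(rate)  # 50.0
```

At low substrate concentration the rate grows almost linearly with [S]; at high concentration it approaches Vmax.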
Catalase can also oxidize different toxins, such as formaldehyde, formic acid, and alcohols. In doing so, it
uses hydrogen peroxide according to the following reaction:
H2O2 + H2R → 2H2O + R
Again, the exact mechanism of this reaction is not known.
Any heavy metal ion (such as the copper cations in copper(II) sulfate) will act as a noncompetitive inhibitor
of catalase. Also, the poison cyanide is a competitive inhibitor of catalase, binding strongly to the heme of
catalase and stopping the enzyme's action.
Three-dimensional protein structures of the peroxidated catalase intermediates are available at the Protein
Data Bank. This enzyme is commonly used in laboratories as a tool for learning the effect of enzymes
upon reaction rates.
Cellular role
Hydrogen peroxide is a harmful by-product of many normal metabolic processes. To prevent
damage, it must be quickly converted into other, less dangerous substances. To this end, catalase is
frequently used by cells to rapidly catalyze the decomposition of hydrogen peroxide into less reactive
gaseous oxygen and water molecules.[13]
The true biological significance of catalase is not always straightforward to assess: mice genetically
engineered to lack catalase are phenotypically normal, indicating that this enzyme is dispensable in
animals under some conditions.[14]
Catalase works at an optimum temperature of 37 °C, which is approximately the temperature of the human
body.
Catalase is usually located in a cellular organelle called the peroxisome.[15] Peroxisomes in plant cells are
involved in photorespiration (the use of oxygen and production of carbon dioxide) and symbiotic nitrogen
fixation (the breaking apart of diatomic nitrogen (N2) to reactive nitrogen atoms).
Hydrogen peroxide is used as a potent antimicrobial agent when cells are infected with a pathogen.
Pathogens that are catalase positive, such as Mycobacterium tuberculosis, Legionella pneumophila, and
Campylobacter jejuni, make catalase in order to deactivate the peroxide radicals, thus allowing them to
survive unharmed within the host.[5]
Human application
Catalase is used in the food industry for removing hydrogen peroxide from milk prior to cheese
production.[7] Another use is in food wrappers, where it prevents food from oxidizing.[8] Catalase is also
used in the textile industry, removing hydrogen peroxide from fabrics to make sure the material is
peroxide-free.[9] A minor use is in contact lens hygiene - a few lens-cleaning products disinfect the lens
using a hydrogen peroxide solution; a solution containing catalase is then used to decompose the hydrogen
peroxide before the lens is used again.[19] Recently, catalase has also begun to be used in the aesthetics
industry. Several mask treatments combine the enzyme with hydrogen peroxide on the face with the intent
of increasing cellular oxygenation in the upper layers of the epidermis.
Reference:
http://en.wikipedia.org/wiki/catalase
Result:
The selected protein is catalase
NATIONAL CENTER FOR BIOTECHNOLOGY INFORMATION (NCBI)
Aim:
To identify the features of NCBI server.
Description:
NCBI was established in 1988 as a national resource for molecular biology information. NCBI creates public
databases, conducts research in computational biology and disseminates biomedical information, all for the better
understanding of the molecular processes affecting human health and disease.
NCBI includes the following:
BLAST : Basic Local Alignment Search Tool.
PUBMED : Biomedical literature citations.
OMIM : Online Mendelian Inheritance in Man.
ENTREZ : Federated search engine.
TAXONOMY : Taxonomic divisions.
HUMAN GENOME RESOURCES : Genomic data of Homo sapiens.
Procedure:
Step1: Go to http://www.ncbi.nlm.nih.gov.
Step2: Go to the links of taxonomy, Pubmed, OMIM, BLAST and Entrez.
Step3: Note down the features of taxonomy, Pubmed, OMIM, BLAST and Entrez.
Step4: Close the window.
Blast
BLAST is an acronym for Basic Local Alignment Search Tool. This program compares
nucleotide or protein sequences to sequence databases and calculates the statistical significance of
matches. Sequence similarity searching is a powerful tool for identifying unknown sequences.
BLAST is fast, reliable and flexible.
Protein
Protein-protein BLAST (blastp) is used to compare a protein query against protein sequences.
Position-specific iterated and pattern-hit initiated BLAST (PSI-BLAST and PHI-BLAST).
Search the Conserved Domain Database.
Nucleotide
Quickly search for highly similar sequences (megablast).
Quickly search for divergent sequences (discontiguous megablast).
Nucleotide-nucleotide BLAST (blastn), which can also search for short, nearly exact matches.
Pubmed
PubMed is available through the NCBI Entrez retrieval system and was developed at NCBI. PubMed provides
access to citations from the biomedical literature. Publishers participating in PubMed submit their citations to
NCBI prior to or at the time of publication. PubMed also provides links to biological databases, resources and
search tools.
OMIM
OMIM is an acronym for Online Mendelian Inheritance in Man. OMIM is a catalogue of human genes and
genetic disorders, authored and edited by Dr. Victor A. McKusick and his colleagues. OMIM contains copious
links to MEDLINE and to sequence records in the Entrez system.
OMIM is open to the public but is intended primarily for physicians and other professionals concerned with
genetic disorders.
Entrez
Entrez is the integrated, text-based search and retrieval system used at NCBI for the major databases, including
PubMed, nucleotide and protein sequences, Taxonomy, Books, Genome etc.
Taxonomy
The NCBI Taxonomy database contains the names of all organisms that are represented in the GenBank
databases with at least one nucleotide or protein sequence.
Human Genome Resources
Sequencing of the complete human genome was finished in 2003. NCBI released its first assembled view of
the human genome, which is based not only on the finished and draft sequences deposited in GenBank by the
human genome sequencing centers but also on other sequences contributed to GenBank.
Output:
Fig 5 – NCBI home page
Result:
The features of NCBI such as BLAST, PUBMED, OMIM, ENTREZ, TAXONOMY and HUMAN
GENOME RESOURCES are noted.
EUROPEAN MOLECULAR BIOLOGY LABORATORY (EMBL)
Aim:
To retrieve sequence information for a nucleotide using EBI.
Description:
The European Bioinformatics Institute (EBI) is a non-profit academic organization that forms part of the
European Molecular Biology Laboratory (EMBL). The EBI is a center for research and services in
bioinformatics. The institute manages databases of biological data, including nucleic acid sequences, protein
sequences and macromolecular structures. The mission of the EBI is to ensure that the growing body of
information from molecular biology and genome research is placed in the public domain and is freely
accessible to all facets of the scientific community in ways that promote scientific progress.
Procedure:
Step 1: Go to http://www.ebi.ac.uk.
Step 2: Select databases from EBI home page.
Step 3: In the drop-down list choose Nucleotide; in the search box, type drosophila and click Go.
Step 4: Select any one of the hits from the entries.
Step 5: Note down the sequence information.
Step 6: Close the window.
Output:
Fig 6 – Home page of EMBL
Fig 7 – Search result for rabbit nucleotide in EMBL
Fig 8 – Search for rabbit in nucleotide sequences
Fig 9 – General information about the rabbit nucleotide entry
Fig 10 – Features of the rabbit nucleotide entry
Fig 11 – Sequence of the rabbit nucleotide entry
Result:
The nucleotide sequence for the rabbit species was retrieved using EMBL.
DNA DATABANK OF JAPAN (DDBJ)
Aim:
To perform editing, alignment and manipulation for protein and nucleic acid sequences.
Description:
DDBJ (DNA Data Bank of Japan) is a DNA databank located at the National Institute of Genetics (NIG) of
Japan. DDBJ functions as one of the international nucleotide sequence databases, in collaboration with
EBI/EMBL and NCBI/GenBank. DNA sequences record the evolution of organisms more directly than other
biological materials and are thus invaluable not only for research in the life sciences but also for human welfare in
general. The databases are, so to speak, a common treasure of human beings. The Center for Information
Biology at NIG was reorganized as the Center for Information Biology and DNA Data Bank of Japan
(CIB-DDBJ) in 2001. The new center is to play a major role in carrying out research in information
biology and in running DDBJ operations worldwide. It is generally accepted that research in biology today
relies on computers as much as on experimental equipment. In particular, computers are needed
to analyze DNA sequence data, which are accumulating at a remarkably rapid rate; this in fact triggered
the birth and development of information biology. DDBJ is the sole DNA data bank in Japan that is
officially certified to collect DNA sequences from researchers and to issue internationally recognized
accession numbers to data submitters. DDBJ collects data mainly from Japanese researchers but also
accepts data from, and issues accession numbers to, researchers in any other country. Since the
collected data are exchanged with EMBL/EBI and GenBank/NCBI on a daily basis, the three data banks share
virtually the same data at any given time.
Procedure:
Step 1: Go to DDBJ homepage at http://www.ddbj.nig.ac.jp
Step 2: Search the DDBJ site for protein and search for catalase.
Step 3: Go to the database and select the UniProt browser.
Step 4: Note down the statistics for catalase.
Step 5: Note down the features and references below it.
Step 6: Close the window.
Output:
Fig 12 – Home page of DDBJ
Fig 13 – Search result for the protein catalase in DDBJ
Fig 14 – General information about the catalase protein entry
Fig 15 – Features and sequence information of catalase
Result:
Editing, alignment and manipulation of protein and nucleic acid sequences were explored for the protein
catalase using DDBJ.
GENBANK
Aim:
To retrieve information about the given species.
Description:
The GenBank nucleotide database is maintained by the National Center for Biotechnology
Information (NCBI), which is part of the National Institutes of Health (NIH), a federal agency of the US
Government. It can be accessed and searched through the Entrez system at NCBI, or one can download the
entire database as flat files. It is part of an international collaboration with EMBL/EBI and DDBJ. GenBank
is a collection of all publicly available nucleotide sequences and is an open-access database. The current
version is available on the NCBI FTP site.
Each GenBank entry includes a concise description of the sequence, the scientific name and taxonomy of
the source organism, and a table of features that identifies coding regions and other sites of biological
significance, such as transcription units, sites of mutations or modifications, and repeats. Protein translations
for coding regions are included in the feature table. Bibliographic references are included, along with a link
to the MEDLINE unique identifier for all published sequences.
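As an illustration of how the header fields of a GenBank flat-file entry are laid out, the following Python sketch collects the keyword lines that precede the feature table. It assumes the usual flat-file layout (keywords at the start of a line, continuation lines indented by 12 spaces) and handles only this simple case. The sample record is invented for this sketch and is not a real GenBank entry.

```python
def parse_genbank_header(text):
    """Collect keyword fields (LOCUS, DEFINITION, ...) that precede FEATURES."""
    fields = {}
    current = None
    for line in text.splitlines():
        if line.startswith(("FEATURES", "ORIGIN", "//")):
            break  # the header ends where the feature table or sequence begins
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip())
        if indent >= 12 and current is not None:
            # Continuation line: append to the field currently being read.
            fields[current] += " " + line.strip()
        else:
            # Keyword (column 0) or sub-keyword (small indent, e.g. ORGANISM).
            parts = line.split(None, 1)
            current = parts[0]
            fields[current] = parts[1].strip() if len(parts) > 1 else ""
    return fields


# Hypothetical record written for this example only.
sample = "\n".join([
    "LOCUS       TOY000001                 21 bp    DNA     linear   MAM 01-JAN-2000",
    "DEFINITION  Hypothetical rabbit sequence, made up for this example,",
    "            partial cds.",
    "ACCESSION   TOY000001",
    "SOURCE      Oryctolagus cuniculus (rabbit)",
    "  ORGANISM  Oryctolagus cuniculus",
    "FEATURES             Location/Qualifiers",
    "ORIGIN",
    "        1 atggctagct aggattacag c",
    "//",
])
header = parse_genbank_header(sample)
print(header["ACCESSION"], "|", header["ORGANISM"])
```

In a real entry the ORGANISM field is followed by indented lineage lines, which this sketch would simply append to the ORGANISM value.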
Procedure:
Step 1: Go to http://www.ncbi.nlm.nih.gov/
Step 2: Select GenBank from the NCBI homepage.
Step 3: Type the name of the chosen organism and click Go to search for the nucleotide
sequence information.
Step 4: Note down the accession number, locus, number of base pairs, molecule type,
definition, source organism, organism classification, author, title, journal and university.
Step 5: Save the page.
Step 6: Close the window.
Output:
Fig 16 – Home page of NCBI
Fig 17 – Search result for rabbit nucleotides
Fig 18 – GenBank format of a rabbit nucleotide entry
Result:
The information about the rabbit species was retrieved from GenBank.
SWISSPROT
Aim:
To retrieve sequence information about a given protein from the Swiss-Prot database.
Description:
Swiss-Prot is a curated protein sequence database that strives to provide a high level of annotation (such
as the description of the function of a protein, its domain structure, post-translational modifications, variants etc.),
a minimal level of redundancy and a high level of integration with other databases.
Swiss-Prot is a protein knowledge base established in 1986 and maintained collaboratively, since 1987,
by the Department of Medical Biochemistry of the University of Geneva and the EMBL Data Library. In Swiss-Prot,
two classes of data can be distinguished: the core data and the annotation.
Procedure:
Step 1: Go to http://www.expasy.org/sprot
Step 2: Type the protein to search for and click Go on the Swiss-Prot/TrEMBL home page.
Example: Protein: Estrogen Receptor site
Step 3: Select any one of the hits from the entries.
Step 4: Note down the sequence information, such as the Swiss-Prot ID, protein name and gene
name, and under Comments note down the subcellular location, tissue specificity, function,
catalytic activity, cofactor and similarity.
Step 5: Note down the feature table and sequence length.
Step 6: Click on the Swiss-Prot ID and save the entry page.
Step 7: Close the window.
Output:
Fig 19 – Home page of Swiss-Prot
Fig 20 – Search result for catalase protein in Swiss-Prot
Fig 21 – Entry information of catalase protein
Cross-references of the catalase protein
Fig 22 – Sequence information of catalase protein
Result:
The features of Swiss-Prot, such as the protein name, gene name and primary accession numbers, were retrieved
and noted for the protein catalase.
PROTEIN INFORMATION RESOURCE (PIR)
Aim:
To compute the molecular weight of each amino acid present in the given protein sequence.
Description:
The Protein Information Resource (PIR), located at Georgetown University Medical Center (GUMC), is
an integrated public bioinformatics resource to support genomic and proteomic research and scientific
studies. PIR was established in 1984 by the National Biomedical Research Foundation (NBRF) as a
resource to assist researchers in the identification and interpretation of protein sequence information.
Prior to that, the NBRF compiled the first comprehensive collection of macromolecular sequences in the
Atlas of Protein Sequence and Structure, published from 1965 to 1978 under the editorship of Margaret
O. Dayhoff. PIR has provided many protein databases and analysis tools freely accessible to the scientific
community, including the Protein Sequence Database (PSD). In 2002, PIR, along with its international
partners EBI (European Bioinformatics Institute) and SIB (Swiss Institute of Bioinformatics), established the
UniProt consortium.
Today, PIR offers a wide variety of resources mainly oriented to assist the propagation and
standardization of protein annotation: PIRSF, iProClass and iProLINK.
Procedure:
Step1: Go to the PIR home page at http://www.pir.georgetown.edu
Step2: From the search/analysis option, select composition/molecular weight
calculations.
Step3: Enter the protein sequence in FASTA format.
Step4: Note down the composition and molecular weight contribution of each residue.
Step5: Close the window.
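The kind of calculation performed by a composition/molecular weight tool can be sketched in Python as follows. This is not the PIR implementation; it is a minimal illustration that assumes commonly tabulated average residue masses (in daltons) and adds one water molecule for the free chain ends, so the results should be treated as approximate.

```python
from collections import Counter

# Average residue masses in daltons (amino acid minus one water molecule).
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER_MASS = 18.01528


def composition(sequence):
    """Count how many times each residue occurs in the sequence."""
    return Counter(sequence.upper())


def molecular_weight(sequence):
    """Approximate average molecular weight of a linear peptide chain."""
    residues = sequence.upper()
    unknown = set(residues) - set(RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residue codes: {sorted(unknown)}")
    return sum(RESIDUE_MASS[aa] for aa in residues) + WATER_MASS


peptide = "MKKA"  # arbitrary example peptide, not the catalase sequence
print(dict(composition(peptide)))
print(round(molecular_weight(peptide), 2))
```

The per-residue contribution to the total is simply the count of that residue multiplied by its residue mass.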
Output:
Fig 23 – Home page of PIR
Fig 24 – Search result for catalase protein in PIR
Fig 25 – Summary report of catalase protein
Fig 26 – PIRSF family hierarchy
UniProtKB ID     UniProtKB Accession     Protein Name
A2F9K9_TRIVA     A2F9K9                  Cyclic nucleotide-binding domain containing protein
Fig 27 – UniProt entry
Fig 28 – General information
Fig 29 – Entry information
Fig 30 – Bibliography report of catalase protein
Fig 31 – Sequence information
Fig 32 – ID mapping report
Fig 33 – Taxonomic distribution
Fig 34 – Phylogenetic pattern
Fig 35 – Domain display
Fig 36 – Query sequence
Fig 37 – Alignment between two sequences
Result:
The molecular weight of each amino acid present in the given protein sequence of catalase was obtained
using PIR.
PROTEIN DATA BANK (PDB)
Aim:
To analyze the protein using protein data bank.
Description:
PDB is the structure databank. The PDB archive contains macromolecular structure data on proteins,
nucleic acids, protein-nucleic acid complexes and viruses. PDB data are freely available worldwide. A
variety of information is available for each entry, including sequence details and 3D structure neighbors
computed using various methods.
A PDB search can be refined by using the output from one search as the input to another. A search can return
a single structure or multiple structures.
The RCSB (Research Collaboratory for Structural Bioinformatics) is a non-profit consortium. The RCSB
PDB provides a variety of tools and resources for studying the structures of biological macromolecules and
their relationship to sequence, function and disease.
Procedure:
Step1: Go to PDB home page. The home page is www.rcsb.org/pdb.
Step2: The protein to be searched is given in the search box. The structures related to the query protein
that are available in the PDB are listed in the next page.
Step3: The protein structure is then downloaded from the list in the PDB and further work is carried out.
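Once a structure file has been downloaded, "further work" often starts by reading its atom records. The following Python sketch reads a few fields from one ATOM line, assuming the fixed-column layout of the legacy PDB text format. The helper make_atom_line and the coordinates are hypothetical and exist only to build a correctly aligned example line; they are not taken from any catalase entry.

```python
def parse_atom_line(line):
    """Extract a few fixed-column fields from one PDB-format ATOM record."""
    return {
        "serial": int(line[6:11]),
        "atom_name": line[12:16].strip(),
        "residue_name": line[17:20].strip(),
        "chain_id": line[21],
        "residue_number": int(line[22:26]),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
    }


def make_atom_line(serial, atom_name, residue_name, chain_id, residue_number, x, y, z):
    """Build a column-aligned ATOM line (helper for this example only).

    Atom-name alignment is simplified here; the official format has more
    detailed rules for placing the name within its four columns.
    """
    return (
        f"{'ATOM':<6}{serial:>5} {atom_name:<4} {residue_name:>3} {chain_id}"
        f"{residue_number:>4}    {x:8.3f}{y:8.3f}{z:8.3f}"
    )


# Made-up coordinates for illustration.
line = make_atom_line(1, "CA", "MET", "A", 1, 27.340, 24.430, 2.614)
atom = parse_atom_line(line)
print(atom)
```

Slicing by column position, rather than splitting on whitespace, matters because adjacent fields in this format may touch without a separating space.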
Output:
Fig 38 – Protein Data Bank home page
Fig 39 – Search result for the protein catalase
Fig 40 – Catalase derived information
Fig 41 – 3D view of catalase protein
Result:
The protein catalase was analysed using the Protein Data Bank.
CAMBRIDGE STRUCTURAL DATABASE (CSD)
Aim:
To retrieve structural information regarding a protein using CSD.
Description:
The Cambridge Structural Database (CSD), the world repository of small-molecule crystal structures,
is the principal product of the CCDC (Cambridge Crystallographic Data Centre). It is the central focus of the
CSD System, which also comprises software for database access, structure visualization and data analysis,
and structural knowledge bases derived from the CSD. The CSD records bibliographic, chemical and
crystallographic information for organic molecules and metal-organic compounds whose 3D structures have
been determined using X-ray diffraction or neutron diffraction. The CSD is distributed as part of the
CSD System, which includes software for:
• Search and information retrieval (ConQuest)
• Structure visualization (Mercury)
• Numerical analysis (Vista)
• Database creation (PreQuest)
Unlike the Protein Data Bank, the CSD does not store polypeptides and polysaccharides having more than 24
units, nor does it store oligonucleotides, metals or alloys.
Deposit Structure
The Cambridge Crystallographic Data Centre accepts depositions of crystal structure data from X-ray
and neutron diffraction studies. Data depositions with the CCDC are of two main types. Pre-publication:
structures are being submitted for publication in a journal. Private communication to the CSD: structures
are not intended for publication, but the depositor wishes them to be available to other scientists through the CSD.
Only depositions that include the list of authors and the full journal reference will be accepted. On
receipt, all depositions are stored in the secure and separate CCDC supplementary data archive. After
publication, deposited structures are processed into the main distributed CSD, and the original
deposited data are made freely available to all scientists on request, for research purposes.
Request Structure
The Cambridge Crystallographic Data Centre (CCDC) provides copies of the supplementary data of
individual published structures for research purposes. Data arriving electronically at the CCDC in CIF format
are held in the CCDC supplementary data archive. After publication, these data are converted into CSD
entries by the addition of bibliographic and chemical text, chemical structural data, and the results of
crystal structure validation.
Each database entry is identified by a CSD reference code comprising six alphabetic characters (e.g.
BENZEN) to identify the chemical compound and, possibly, two digits (as in BENZEN05) to identify a
specific experimental determination of the crystal structure of BENZEN. A typical CSD entry would thus
comprise information categories such as:
• Bibliography
• 2D chemical connectivity
• 3D molecular structure of a particular molecule.
The CSD entries cover a total of 366886 structures and 19000 publications.
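The reference code format described above (six letters, optionally followed by two digits) can be checked with a short Python sketch. It follows only the description given in this section; the function name is made up for the example.

```python
import re

# Six alphabetic characters, optionally followed by a two-digit number.
REFCODE_PATTERN = re.compile(r"([A-Z]{6})(\d{2})?")


def parse_refcode(refcode):
    """Split a reference code into (compound_code, determination_number or None)."""
    match = REFCODE_PATTERN.fullmatch(refcode.upper())
    if match is None:
        raise ValueError(f"not a valid reference code: {refcode!r}")
    compound, number = match.groups()
    return compound, (int(number) if number is not None else None)


print(parse_refcode("BENZEN"))    # ('BENZEN', None)
print(parse_refcode("benzen05"))  # ('BENZEN', 5)
```

Grouping entries by the six-letter part collects all experimental determinations of the same compound.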
Procedure:
Step1: Go to http://www.ccdc.cam.ac.uk/
Step2: Note down the features of deposit structure and request structure in the
Cambridge Structural Database.
Step3: Close the window.
Output:
Fig 42 – Home page of the Cambridge Structural Database
Result:
The features of CSD were noted down.
PUBMED
Aim:
To perform literature search using pubmed.
Description:
PubMed, available via the NCBI Entrez retrieval system, was developed by the
National Center for Biotechnology Information (NCBI) at the National Library of Medicine
(NLM), located at the U.S. National Institutes of Health (NIH). Entrez is the text-based search and retrieval
system used at NCBI for services including PubMed, nucleotide and protein sequences, protein structures,
complete genomes, Taxonomy, OMIM, and many others. PubMed provides access to citations from the
biomedical literature. LinkOut provides access to full-text articles at journal websites and other related web
resources. PubMed also provides a Batch Citation Matcher, which allows users to match their citations to
PubMed citations using bibliographic information such as journal, volume, issue, page number, and year.
Procedure:
Step 1: Go to http://www.ncbi.nlm.nih.gov/entrez
Step 2: Search PubMed for salmonella.
Step 3: Select any article from the result page and note down the article name, journal name, author name and
PubMed ID.
Step 4: Use the ‘limit’ option in PubMed and select any
criterion for searching, such as search by author, search by journal
or date.
Step 5: Note down one or two articles based on the given criteria.
Step 6: Close the window.
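The same search can also be expressed as a URL. The sketch below only builds the query string and does not contact the server. It assumes NCBI's E-utilities ESearch endpoint and the PubMed field tags [au] (author) and [ta] (journal); neither is mentioned in the procedure above, and the function name pubmed_search_url and the author value are made up for illustration.

```python
from urllib.parse import urlencode

# Assumption: the NCBI E-utilities ESearch endpoint (not part of the procedure above).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def pubmed_search_url(term, author=None, journal=None, retmax=20):
    """Build an ESearch URL; author and journal act like the 'limit' criteria."""
    parts = [term]
    if author:
        parts.append(f"{author}[au]")    # limit by author
    if journal:
        parts.append(f"{journal}[ta]")   # limit by journal title abbreviation
    query = " AND ".join(parts)
    return ESEARCH + "?" + urlencode({"db": "pubmed", "term": query, "retmax": retmax})

print(pubmed_search_url("salmonella"))
print(pubmed_search_url("salmonella", author="Smith J"))  # hypothetical author
```

Opening the printed URL in a browser returns an XML list of PubMed IDs, which corresponds to Step 3 of the manual procedure.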
Output:
Fig-43.. PubMed home page
Fig-44.. Search result in PubMed
Fig-45.. Summary of search result
Result:
The literature search for salmonella was carried out using PubMed.
BLAST
Aim:
To compare a protein sequence of interest with the entries in the non-redundant database using BLAST
(blastp).
Description:
Searching a database returns similar sequences, potentially related to our sequence, from among
the several thousands of sequences present in the database. The basic approach is to align the query
sequence with each subject sequence in the database. The computer programs that help with this are BLAST
and FASTA. BLAST, the Basic Local Alignment Search Tool
(http://www.ncbi.nlm.nih.gov/BLAST), searches the given database for sequences (nucleotide or protein)
that are similar to a given sequence. It was developed by Altschul and co-workers at the NCBI.
Procedure:
Step 1: Go to the NCBI site and choose the protein from the PDB in FASTA format.
Step 2: Select the “blastp” option from NCBI BLAST.
Step 3: Paste the sequence in FASTA format.
Step 4: Click BLAST, and press Format once the search is over.
Step 5: The results are displayed as a description list and a graphical representation.
Step 6: Sequences similar to the query are listed from the most to the least significant, i.e. in increasing order of E-value.
Step 7: Note down the best homology search results from BLAST.
Step 8: Close BLAST.
Output
CAA00094. Reports amylase [Aspergil...[gi:14646] BLink, Links
>gi|14646|emb|CAA00094.1| amylase [Aspergillus niger]
MTIFLFLAIFVATALAATPAEWRSQSIYFLLTDRFARTDNSTTASCDLSARVSH
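A FASTA record such as the one above can be split into its header and sequence with a few lines of code. This is a minimal sketch: the function name parse_fasta is an assumption, and the sequence is only the fragment shown above, not the full protein.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

record = """>gi|14646|emb|CAA00094.1| amylase [Aspergillus niger]
MTIFLFLAIFVATALAATPAEWRSQSIYFLLTDRFARTDNSTTASCDLSARVSH
"""
header, seq = next(parse_fasta(record))
accession = header.split("|")[3]   # fourth '|'-separated field of this header style
print(accession, seq[:10])
```

The accession is read from the fourth field only because this particular header uses the gi|number|database|accession| layout; other header styles would need different handling.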
Fig-46.. blast result query sequence
Fig-47.. sequence alignment
Result:
The protein sequence of amylase was compared with the entries in the non-redundant database using
BLAST (blastp).
FASTA
Aim:
To align a protein sequence using the FASTA tool.
Description:
FASTA is a DNA and protein sequence alignment software package first described (as FASTP) by David
J. Lipman and William R. Pearson in 1985. The original FASTP program was designed for protein
sequence similarity searching. FASTA, described in 1988 (Improved Tools for Biological Sequence
Comparison), added the ability to do DNA:DNA searches and translated protein:DNA searches, and also
provided a more sophisticated shuffling program for evaluating statistical significance. There are several
programs in this package that allow the alignment of protein sequences and DNA sequences. FASTA is
pronounced "FAST-Aye" and stands for "FAST-All", because it works with any alphabet; it is an extension of
the "FAST-P" (protein) and "FAST-N" (nucleotide) alignment programs. This reflects the fact that it can be
used for a fast protein comparison or a fast nucleotide comparison.
The FASTA program achieves a high level of sensitivity for similarity searching at high speed. This is
achieved by performing optimized searches for local alignments using a substitution matrix. The high
speed of the program comes from using the observed pattern of word hits to identify potential matches
before attempting the more time-consuming optimized search. The trade-off between speed and sensitivity
is controlled by the ktup parameter, which specifies the size of the word. Increasing ktup decreases the
number of background hits. Not every word hit is investigated; instead, the program initially looks for
segments containing several nearby hits.
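The word-hit idea can be illustrated with a toy example. The sketch below counts shared words of length ktup and groups them by diagonal (subject position minus query position), so that a run of nearby hits shows up as one diagonal with a high count. It is not the FASTA source code; the function name and the two short test sequences are made up for illustration.

```python
from collections import Counter, defaultdict

def word_hits_by_diagonal(query, subject, ktup=2):
    """Count shared words of length ktup, grouped by diagonal (subject_pos - query_pos)."""
    index = defaultdict(list)                  # word -> positions in the query
    for i in range(len(query) - ktup + 1):
        index[query[i:i + ktup]].append(i)
    diagonals = Counter()
    for j in range(len(subject) - ktup + 1):
        for i in index.get(subject[j:j + ktup], ()):
            diagonals[j - i] += 1
    return diagonals

hits2 = word_hits_by_diagonal("MTIFLFLAIF", "AAMTIFLFKK", ktup=2)
hits3 = word_hits_by_diagonal("MTIFLFLAIF", "AAMTIFLFKK", ktup=3)
print(hits2.most_common())   # diagonal 2 collects most of the hits
print(sum(hits2.values()), sum(hits3.values()))
```

With ktup=2 there are a few scattered hits away from the main diagonal; with ktup=3 those background hits disappear, which is the speed and sensitivity trade-off described above.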
The FASTA package provides SSEARCH, an implementation of the optimal Smith-Waterman algorithm.
The FASTA package is available from fasta.bioch.virginia.edu.
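For reference, the Smith-Waterman algorithm mentioned above can be sketched in a few lines. This version is a simplification: it uses fixed match, mismatch and gap scores (assumed values) instead of a substitution matrix, and it returns only the best local score, not the alignment. It is an illustration of the technique and not the SSEARCH implementation.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best

print(smith_waterman_score("XXACGTXX", "YYACGTYY"))
```

Only the shared ACGT stretch contributes to the score here, because the flanking residues do not match and a local alignment is free to ignore them.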
A web interface for submitting sequences to search the online databases of the European Bioinformatics
Institute (EBI) is also available, called fasta33. The FASTA file format used as input for this
software is now widely used by other sequence database search tools (such as BLAST) and sequence
alignment programs (Clustal, T-Coffee, etc.).
Procedure:
Step1: Go to http://www.ebi.ac.uk/fasta33
The page appears as below:
Fig 48 – Home page of Fasta
Step2: The parameters shown on the page are set.
Step3: The protein sequence is either pasted in the query box or it is uploaded through the ‘Browse…’
option.
Step4: Finally, ‘Run Fasta3’ is clicked, which displays the result window.
Output:
Fig-49.. FASTA summary table
Fig-50 FASTA results
Fig-51 fasta scores
Fig-52 general information of fasta result
Fig-53 fasta sequence
Result:
The protein sequence of catalase was aligned using FASTA.
CLUSTAL W
AIM:
To perform multiple sequence alignment of 10 BLAST hits using Clustal W.
DESCRIPTION:
Multiple alignments of protein and nucleotide sequences are important tools in studying the
sequences. Their most basic use is the identification of conserved regions. This is very useful in
designing experiments to test and modify the function of specific proteins, in predicting the function and
structure of proteins, and in identifying new members of protein families.
Clustal W is a general-purpose multiple sequence alignment program for DNA or proteins.
Julie D. Thompson and Toby Gibson of the European Molecular Biology Laboratory produced Clustal W. It
produces biologically meaningful multiple sequence alignments of divergent sequences. It calculates the best
match for the selected sequences and lines them up so that the identities, similarities and differences can
be seen. Evolutionary relationships can be seen by viewing cladograms or phylograms.
PROCEDURE:
Step 1: Upload the multiple sequences in FASTA format.
Step 2: Leave the options at their defaults.
Step 3: If desired, enter a title for the sequences.
Step 4: Press the Run button to start the Clustal W alignment.
Step 5: The plain-text version of the alignment is stored temporarily in an .aln
file. In the line beneath each block of the alignment:
‘*’ means that the residues in that column are identical in all sequences of the alignment.
‘:’ means that conserved substitutions have been observed, according to the colour table.
‘.’ means that semi-conserved substitutions are observed.
Step 6: Save the phylogenetic trees.
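The ‘*’ marker from Step 5 is easy to reproduce for a small alignment. The sketch below marks only fully identical columns; the ‘:’ and ‘.’ markers depend on Clustal W's residue groups and are deliberately left out. The function name and the three short aligned sequences are made up for illustration.

```python
def identity_line(aligned):
    """Return '*' under columns where every sequence has the same residue, else ' '."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    marks = []
    for column in zip(*aligned):
        same = len(set(column)) == 1 and column[0] != "-"
        marks.append("*" if same else " ")
    return "".join(marks)

alignment = ["MTIF-LAA", "MTIFALAA", "MSIF-LGA"]   # hypothetical aligned sequences
for row in alignment:
    print(row)
print(identity_line(alignment))
```

Columns that contain a gap or any differing residue are left blank, so the conserved positions stand out directly below the alignment.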
Output:
Fig-54.. multiple sequence alignment
Fig-55 multiple aligned sequences
Fig-56 phylogram tree
Fig-57 cladogram tree view
Fig-58 sequence name view
Result:
The multiple sequence alignment of the 10 selected proteins was performed using Clustal W.
CLUSTAL X
Aim:
To perform multiple sequence alignment of 10 Blast hits using Clustal X.
Description:
Clustal X is a windows interface for the Clustal W multiple sequence alignment program. It provides an
integrated environment for performing multiple sequence and profile alignments and analyzing the
results. Clustal X is available for a number of different platforms including SUN Solaris, IRIX 5.3, Digital
UNIX, MS Windows, Linux ELF and Macintosh PowerMac.
Procedure:
Step 1: Find homologous sequences using BLAST
Step 2: Open the CLUSTAL X program
Step 3: Submit the sequences
Step 4: Choose “Alignment” and align the input sequences
Step 5: Open the “NJ Plot” and input the sequences
Step 6: View the plot
Step 7: Save the result
Output:
Fig-59 before alignment
Fig-60 before alignment graph
Fig-61 after alignment
Fig-62 after alignment graph
Fig-63 tree structure
Result:
The multiple sequence alignment of 10 BLAST hits was prepared using Clustal X.
BIOEDIT
Aim:
To perform editing, alignment and manipulation for protein and nucleic acid sequences.
Description:
BioEdit is an intuitive, menu-driven graphical tool that also offers a graphical interface for running external
analysis programs. It runs on Windows 95/98/NT and provides basic functions for protein and nucleic acid
sequence editing, alignment, manipulation and analysis. Don Gilbert modeled BioEdit. BioEdit uses
Clustal W for multiple sequence alignment, and the output is displayed in a tree view. Dot plots and pairwise
alignments are possible. Sequences can easily be analyzed for composition, with graphical output. Basic
manipulations in BioEdit include locking and unlocking gaps, translation and reverse translation, toggling
translation, nucleotide composition, complementing a DNA or RNA sequence, creating a plasmid from a
sequence, restriction mapping, amino acid composition and hydrophobicity profiles. BioEdit offers an option
for very simple, optimal sequence alignments directly within an alignment document. BioEdit currently reads and
writes GenBank, FASTA, NBRF/PIR, Phylip 3.2 and Phylip 4 formats, and reads Clustal W and GCG
formats. BioEdit currently supports the simultaneous editing of up to 50 documents. A main control form
contains menus to open documents, create new documents and set global options such as color tables, codon
tables and analysis preferences, together with a window manager. Memory is allocated dynamically, with
support for up to 20,000 sequences per document. Sequences of up to 4.6 million bases have been tested successfully.
Procedure:
Load the FASTA sequences via the menu File→Open.
PAIRWISE ALIGNMENT
Step 1: Select two sequences
Step 2: Go to menu→Sequence→Pairwise alignment→Align two sequences (allow ends to slide).
Step 3: Note down the alignment score and save the result.
Step 4: Close the window.
DOT PLOT
Step 1: Select two sequences
Step 2: Go to menu→ sequence→Dot plot (Pairwise comparison)
Step 3: Check “do full shaded matrix”, select BLOSUM62 as the similarity matrix and click OK.
Step 4: Click OK in the plot matrix dialog box.
Step 5: From the plot, go to View→Data Examiner and check the color option.
Step 6: Save the result.
Step 7: Close the window.
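The idea behind a dot plot can be shown in text form. The sketch below marks positions where two sequences carry identical residues; BioEdit's plot uses a similarity matrix (BLOSUM62 in the steps above) with shading, which this identity-only version does not attempt. The function name and the optional window parameter are assumptions for illustration.

```python
def dot_plot(seq_a, seq_b, window=1):
    """Return text rows where '*' marks a window of identical residues."""
    rows = []
    for i in range(len(seq_a) - window + 1):
        row = []
        for j in range(len(seq_b) - window + 1):
            same = seq_a[i:i + window] == seq_b[j:j + window]
            row.append("*" if same else ".")
        rows.append("".join(row))
    return rows

for line in dot_plot("MTIFL", "MTIFL"):
    print(line)
```

Comparing a sequence with itself gives an unbroken diagonal; similar regions between two different sequences appear as shorter diagonal runs.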
AMINO ACID COMPOSITION
Step 1: Select one protein sequence.
Step 2: Go to Sequence→Protein→Amino acid composition.
Step 3: Note down the number of each amino acid present in the protein.
Step 4: Save and close the window.
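The count noted in Step 3 can be cross-checked with a short script. This is a minimal sketch, not BioEdit's own routine; the function name is an assumption and the test sequence is just the first ten residues of the amylase record used earlier.

```python
from collections import Counter

def amino_acid_composition(protein):
    """Return {residue: (count, percent)} for a protein sequence."""
    counts = Counter(protein.upper())
    total = sum(counts.values())
    return {aa: (n, 100.0 * n / total) for aa, n in sorted(counts.items())}

comp = amino_acid_composition("MTIFLFLAIF")
for aa, (n, pct) in comp.items():
    print(aa, n, f"{pct:.1f}%")
```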
HELICAL WHEEL DIAGRAM
Step 1: Select one protein sequence.
Step 2: Go to sequence→Protein→Helical wheel diagram.
Step 3: Save the diagram & close.
CONSENSUS SEQUENCE
Step 1: Select 2 protein sequences.
Step 2: Go to alignment→Create consensus sequence.
Step 3: Note down the consensus region.
Step 4: Save & Close the window.
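One simple way to derive a consensus from aligned sequences is a per-column majority vote. The sketch below assumes a strict-majority rule (threshold 0.5) and writes X where no residue wins; BioEdit's own rule may differ, and the function name and example sequences are made up for illustration.

```python
from collections import Counter

def consensus(aligned, threshold=0.5, unknown="X"):
    """Majority-vote consensus; columns without a clear winner become `unknown`."""
    result = []
    for column in zip(*aligned):
        residue, count = Counter(column).most_common(1)[0]
        if residue != "-" and count / len(column) > threshold:
            result.append(residue)
        else:
            result.append(unknown)
    return "".join(result)

print(consensus(["MTIF-LAA", "MTIFALAA", "MSIF-LGA"]))   # hypothetical alignment
print(consensus(["MTIF", "MSIF"]))
```

With only two sequences, as in the procedure above, a strict majority means the consensus keeps a residue only where both sequences agree.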
CONSERVATION PLOT
Step 1: Select 5 protein sequences.
Step 2: Go to view→Conservation plot.
Step 3: Save the conserved sequence.
Step 4: Close the window.
PROTEIN DISTANCE
Step 1: Select five sequences.
Step 2: Go to Accessory Application→ProtDist (protein distance matrix)→Run application.
Step 3: Note down the distances among the sequences.
Step 4: Close the window.
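To get a feel for what a distance matrix contains, the simplest possible measure is the p-distance: the fraction of aligned, ungapped positions at which two sequences differ. This is a deliberately swapped-in simplification; the accessory application used above applies substitution models that this sketch does not. The function names and sequences are made up for illustration.

```python
def p_distance(a, b):
    """Fraction of ungapped aligned positions where a and b differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped positions to compare")
    return sum(x != y for x, y in pairs) / len(pairs)

def distance_matrix(aligned):
    """Symmetric matrix of pairwise p-distances."""
    n = len(aligned)
    return [[p_distance(aligned[i], aligned[j]) for j in range(n)] for i in range(n)]

seqs = ["MTIF", "MSIF", "MSLF"]   # hypothetical aligned sequences
for row in distance_matrix(seqs):
    print(["%.2f" % d for d in row])
```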
Output:
Fig-64 protein sequence
Fig-65 amino acid sequence
Fig-66 helical wheel diagram
Fig-67 hydrophobic moment
Fig-68 Kyte & Doolittle scale mean hydrophobicity
Fig-69 Eisenberg scale mean hydrophobicity
Fig-70 Cornette scale mean hydrophobicity profile
Fig-71 Parker HPLC scale mean hydrophobicity profile
Fig-72 Boyko scale mean hydrophobicity profile
Fig-73 Hopp & Woods scale hydrophobicity profile
Fig-74 Eisenberg hydrophobic moment profile
Fig-75 mean Eisenberg hydrophobic moment profile
Fig-76 dot plot pairwise alignment, sequence comparison
Result:
For the aligned sequences, the pairwise alignment, dot plot, amino acid composition, consensus
sequence, conservation plot and protein distance were obtained.
GENSCAN
Aim:
To predict the structure and function of a particular gene using GENSCAN.
Description:
GENSCAN is a general-purpose gene identification program which analyzes genomic DNA sequences
from a variety of organisms including human, other vertebrates, invertebrates and plants.
For each sequence, the program determines the most likely "parse" (gene structure) under a probabilistic
model of the gene structural and compositional properties of the genomic DNA for the given organism.
This set of exons/genes is then printed to an output file (the text output) together with the corresponding
predicted peptide sequences. A graphical (PostScript) output may also be created which displays the
location and DNA strand of each predicted exon.
Unlike the majority of other currently available gene prediction programs, the model treats the most
general case, in which the sequence may contain no genes, one gene, or multiple genes on either or both
DNA strands, and partial genes as well as complete genes are considered. The probabilistic model used by
GENSCAN accounts for many of the essential gene structural properties of genomic sequences, e.g.
typical gene density, the typical number of exons per gene, and the distribution of exon sizes for different
types of exon.
The novel features of the program include the capacity to predict multiple genes in a single sequence and to
deal with partial as well as complete genes. GENSCAN is shown to have substantially higher accuracy than
the existing methods when tested on standardized sets of human and vertebrate genes, with 75-80% of
exons identified exactly.
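The phrase "exons identified exactly" refers to predicted exons whose boundaries match the annotated ones at both ends. The sketch below shows how such a figure could be computed; it is an illustration of the measure, not the evaluation code used for GENSCAN, and the exon coordinates are hypothetical.

```python
def exact_exon_accuracy(predicted, actual):
    """Fraction of annotated exons whose (strand, start, end) is predicted exactly."""
    actual_set = set(actual)
    if not actual_set:
        raise ValueError("no annotated exons to compare against")
    return len(actual_set & set(predicted)) / len(actual_set)

# Hypothetical coordinates: (strand, begin, end)
annotated = [("+", 100, 250), ("+", 400, 520), ("+", 700, 810), ("+", 950, 1100)]
predicted = [("+", 100, 250), ("+", 400, 520), ("+", 705, 810), ("+", 950, 1100)]
print(exact_exon_accuracy(predicted, annotated))
```

The third predicted exon overlaps the annotated one but starts five bases late, so it does not count as an exact match.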
Procedure:
Step1: Obtain the nucleotide sequence of the chosen organism from the NCBI site.
Step2: Go to http://genes.mit.edu/GENSCAN.html.
Step3: Select the organism, sub-optimal exon cut-off and print options, and paste the nucleotide
sequence in the space provided.
Step4: Click on Run GENSCAN.
Step5: Note down the gene number, exon number, type, DNA strand, beginning and end of exon or
signal, length of the exon or signal and exon score.
Step6: Click on “here” to view the PDF image of the predicted gene.
Step7: Save the PDF image page and the output page.
Step8: Close the window.
Output:
Fig 77 – Genscan output
Fig 78 – PDF image of predicted gene at 1.00
Fig-79 – PDF image of predicted gene at 0.50
Fig 80 – PDF image of predicted gene at 0.25
Result:
Thus the structure and function of the gene were predicted using GENSCAN.
LITERATURE SEARCH
Aim:
To distinguish between the various advanced and preferred search engines, such as Google and Google Scholar.
Procedure:
Google Search
Step1: Go to http://www.google.co.in.
Step2: Type the topic for which information is to be collected.
Step3: View a preferred result and save it as a soft copy.
Google Scholar Search
Step1: Go to http://scholar.google.com.
Step2: Type the topic for which literature is to be collected or a narrow search is to be made.
Step3: View a preferred result and save it as a soft copy.
Output:
Fig-81 home page of google
Fig-82 search result of cancer
Fig-83 cancer references
Fig-84 home page of google scholar
Fig-85 search result of articles in google scholar
Fig-86 scholarly article on cancer
Result:
Google and Google Scholar were searched for the topic of cancer, and the articles were noted together with
their references.
PREPARATION OF BIBLIOGRAPHY
Aim:
To prepare the bibliography of the article "Apoptosis. Its significance in cancer and cancer therapy".
Procedure:
Step 1: Go to http://scholar.google.com
Step 2: Type the topic of interest in the space provided and perform a Google
advanced search.
Step 3: Select an article from the result hits provided.
Step 4: Note down the name of the author, title of the article, the name of the
Journal in which the article was published, Volume, Year of publication
and page number.
Step 5: Save the page.
Step 6: Close the window.
Output:
Title of article: Apoptosis. Its significance in cancer and cancer therapy
Author name: Kerr JF, Winterford CM, Harmon BV
Affiliation: Department of Pathology, University of Queensland Medical School, Herston, Australia
Citation: Cancer. 1994 Apr 15;73(8):2013-26
Summary: Apoptosis is a distinct mode of cell death that is responsible for deletion of cells in normal
tissues; it also occurs in specific pathologic contexts. Morphologically, it involves rapid condensation and
budding of the cell, with the formation of membrane-enclosed apoptotic bodies containing well-preserved
organelles, which are phagocytosed and digested by nearby resident cells. There is no associated
inflammation.
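The fields noted in Step 4 can be assembled into a single bibliography line with a short script. The sketch below uses the article details recorded above; the class name Article and the output layout (authors, title, journal, year;volume(issue):pages) are assumptions chosen for illustration, not a prescribed citation style.

```python
from dataclasses import dataclass

@dataclass
class Article:
    authors: list
    title: str
    journal: str
    year: int
    volume: int
    issue: int
    pages: str

    def citation(self):
        """Format the fields as one bibliography entry."""
        title = self.title.rstrip(".")
        return (f"{', '.join(self.authors)}. {title}. {self.journal}. "
                f"{self.year};{self.volume}({self.issue}):{self.pages}.")

kerr = Article(
    authors=["Kerr JF", "Winterford CM", "Harmon BV"],
    title="Apoptosis. Its significance in cancer and cancer therapy",
    journal="Cancer", year=1994, volume=73, issue=8, pages="2013-26",
)
print(kerr.citation())
```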
Fig-88 result of google scholar of cancer
Fig-89 Articles in cancer - apoptosis
Result:
The bibliography of an article on apoptosis was prepared.
Article details
Article title : Autosomal Dominant Inheritance of Early-Onset Breast Cancer: Implications for Risk Prediction
Author : Claus, E. B.; Risch, N.; Thompson, W. D.
Journal title : OBSTETRICAL AND GYNECOLOGICAL SURVEY
Bibliographic details : 1994, VOL 49; NUMBER 6, pages 401
Publisher : WILLIAMS & WILKINS
Country of publication : USA
Language : GERMANY
Pricing : To buy the full text of this article you pay: £15.00 copyright fee + service charge (from £7.65) + VAT, if applicable
Article details
Article title : Apoptosis: Its Significance in Cancer and Cancer Therapy
Author : Kerr, J. F. R.; Winterford, C. M.; Harmon, B. V.
Journal title : CANCER -PHILADELPHIA-
Bibliographic details : 1994, VOL 73; NUMBER 8, pages 2013
Publisher : J B LIPPINCOTT CO
Country of publication : USA
Language : ENGLISH
Pricing : To buy the full text of this article you pay: £17.00 copyright fee + service charge (from £7.65) + VAT, if applicable.