selnikraj - basic bioinformatics

SeLnIkRaJ http://www.selnikraj.110mb.com

Any queries mail me at [email protected]

INTRODUCTION TO BIOLOGICAL DATABASE

A biological database is a large, organized body of persistent data, usually associated with computerized software designed to update, query, and retrieve components of the data stored within the system. A simple database might be a single file containing many records, each of which holds the same set of information. For example, a record in a nucleotide sequence database typically contains information such as a contact name, the input sequence with a description of the type of molecule, the scientific name of the source organism from which it was isolated, and literature citations associated with the sequence. For researchers to benefit from the data stored in a database, two additional requirements must be met.

• Easy access to the information.

• A method for extracting only that information needed to answer a specific biological question.

Currently, a lot of bioinformatics work is concerned with the technology of databases. These databases include both public repositories of gene and protein data, like GenBank or the Protein Data Bank (PDB), and private databases like those used in gene mapping projects or those held by biotech companies.

Making such databases accessible via open standards like the web is very important, since consumers of bioinformatics data use a range of computer platforms, from the more powerful and forbidding UNIX boxes favored by developers and curators to the far friendlier Macs often found populating the labs of computational biologists.

DNA and RNA store the hereditary information about an organism, and proteins carry out most of its functions. These macromolecules have defined structures, which can be analyzed by the biologist with the help of bioinformatics tools and databases.

A few popular databases are GenBank from NCBI (National Center for Biotechnology Information), Swiss-Prot from the Swiss Institute of Bioinformatics and PIR (Protein Information Resource).


SELECTION OF NUCLEOTIDES

Aim:

To search for nucleotide sequence in NCBI.

Procedure:

Go to Google and go to NCBI Home Page.

Choose Nucleotide

The drop-down menu on the NCBI home page lists several databases, such as PubMed, Protein, Structure, Genome, Books, 3D Domains and Domains. Choose Nucleotide.

Fig 1 – Home page of NCBI

Search Window

Type the name of the gene, gene ID, protein product or name of the disease. NCBI offers several search options. For now, type frog (according to the title chosen) and click Go.

The Nucleotide database has been recently divided into three databases:

• Core nucleotide: It contains all nucleotide records that are not in EST and GSS

• EST: Expressed Sequence Tags which contains only EST records.

• GSS: Genome Survey Sequence which contains only GSS records.


Under the name Homo sapiens there are 252 core nucleotide records and 312 EST (Expressed Sequence Tags) records. Next, select FASTA under the Display drop-down menu and click Go.

Fig 2 – NCBI Nucleotide


Select a nucleotide sequence, 1YVP_H, from the results that appear in FASTA format. Select the sequence along with the FASTA description line and save it in Notepad.

Fig 3 – Fasta format of Sequence
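Once the sequence has been saved in FASTA format, it can also be read by a program. The following is a minimal Python sketch of a FASTA reader, assuming the usual layout (a description line starting with ">" followed by one or more sequence lines). The function name parse_fasta and the two example records are hypothetical and are not real database entries.

```python
def parse_fasta(text):
    """Return a list of (description, sequence) tuples from FASTA-formatted text."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new description line closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical two-record example, not real database entries.
example = """>seq1 hypothetical example record
ACGTACGT
ACGT
>seq2 another hypothetical record
TTGGCC
"""

for description, sequence in parse_fasta(example):
    print(description, len(sequence))
```

The sequence lines of each record are joined into one string, so the line wrapping used in the saved file does not matter.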

This particular sequence is taken for BLAST search and the result is obtained.

Result:

The nucleotide sequence for frog is searched and reported




SEQUENCE SIMILARITY SEARCHING

Introduction

Sequence alignment provides a powerful way to compare novel sequences with previously characterized genes and proteins. Both functional and evolutionary information can be inferred from well-designed queries and alignments. BLAST (Basic Local Alignment Search Tool) provides a method for rapid searching of nucleotide and protein databases. Because the BLAST algorithm detects local as well as global alignments, regions of similarity embedded in otherwise unrelated proteins can be detected. Both types of similarity may provide important clues to the function of uncharacterized proteins.

The BLAST programs are a set of sequence comparison algorithms, introduced in 1990, that are used to search sequence databases for optimal local alignments to a query. The BLAST programs improved the overall speed of searches while retaining good sensitivity by breaking the query and database sequences into fragments (words) and seeking matches between fragments.
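The idea of breaking sequences into fragments and seeking matches between them can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: it looks for exact word matches, whereas the real BLAST programs also score similar words and then extend each hit into a longer alignment. The function names and the two short sequences are hypothetical.

```python
from collections import defaultdict


def index_words(sequence, word_size):
    """Map each word (substring of length word_size) to its start positions."""
    index = defaultdict(list)
    for i in range(len(sequence) - word_size + 1):
        index[sequence[i:i + word_size]].append(i)
    return index


def find_word_hits(query, subject, word_size=3):
    """Return sorted (query_pos, subject_pos) pairs where a word matches exactly."""
    subject_index = index_words(subject, word_size)
    hits = []
    for q_pos in range(len(query) - word_size + 1):
        word = query[q_pos:q_pos + word_size]
        for s_pos in subject_index.get(word, []):
            hits.append((q_pos, s_pos))
    return sorted(hits)


# Hypothetical sequences chosen only for illustration.
hits = find_word_hits("ACGTTGCA", "TTACGTTG", word_size=4)
print(hits)
```

Indexing the subject once and looking up each query word avoids comparing every query position with every subject position, which is where the speed of word-based searching comes from.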



SELECTION OF PROTEINS

Aim: To select a protein of interest and note down the biological features of that protein.

Procedure:

Step 1: Go to google search engine.

Step 2: Enter protein name.

Step 3: Note down the biological significance of the protein.

Step 4: Close the window.

catalase

Catalase was first noticed as a substance in 1811 when Louis Jacques Thénard, who discovered H2O2 (hydrogen peroxide), suggested that its breakdown is caused by a substance.

Introduction

In 1900 Oscar Loew was the first to give it the name catalase, and found its presence in many

plants and animals[6]. In 1937 catalase from beef liver was crystallised by James B. Sumner [7] and the

molecular weight worked out in 1938[8]. In 1969 the amino acid sequence of bovine catalase was worked

out[9]. Then in 1981, the 3D structure of the protein was revealed[10].

Action of catalase

The reaction of catalase in the decomposition of hydrogen peroxide is:

2 H2O2 → 2 H2O + O2[11]

In microbiology, the catalase test is used to differentiate between bacterial species in the lab.[1] The test is

done by placing a drop of hydrogen peroxide on a microscope slide. Using an applicator stick, a scientist

touches the colony and then smears a sample into the hydrogen peroxide drop. If bubbles or froth form,

the organism is said to be catalase-positive; if not, the organism is catalase-negative.[2] This test is

particularly useful in distinguishing staphylococci and micrococci, which are catalase-positive, from

streptococci and enterococci, which are catalase-negative.[3] While the catalase test alone cannot identify

a particular organism, combined with other tests it can aid diagnosis. The presence of catalase in bacterial

cells depends on both the growth condition and the medium used to grow the cells.


Fig-4.. catalase 3D structure

Molecular mechanism

While the complete mechanism of catalase is not currently known, the reaction is believed to occur in two stages:

H2O2 + Fe(III)-E → H2O + O=Fe(IV)-E(.+)

H2O2 + O=Fe(IV)-E(.+) → H2O + Fe(III)-E + O2[12]

Here Fe-E represents the iron centre of the heme group attached to the enzyme. Fe(IV)-E(.+) is a mesomeric form of Fe(V)-E, meaning that the iron is not completely oxidized to +V but receives some "supporting electron" from the heme ligand. The heme therefore has to be drawn as a radical cation (.+).

As hydrogen peroxide enters the active site it interacts with the amino acids Asn147 (asparagine at

position 147) and His74, causing a proton (hydrogen ion) to transfer between the oxygen atoms. The free

oxygen atom coordinates, freeing the newly formed water molecule and Fe(IV)=O. Fe(IV)=O reacts with

a second hydrogen peroxide molecule to reform Fe(III)-E and produce water and oxygen.[12] The

reactivity of the iron center may be improved by the presence of the phenolate ligand of Tyr357 in the

fifth iron ligand, which can assist in the oxidation of the Fe(III) to Fe(IV). The efficiency of the reaction

may also be improved by the interactions of His74 and Asn147 with reaction intermediates.[12]

Generally, the rate of the reaction can be determined by the Michaelis-Menten equation.[4]
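The Michaelis-Menten equation relates the reaction rate v to the substrate concentration [S] as v = Vmax [S] / (Km + [S]), where Vmax is the maximum rate and Km is the substrate concentration at which the rate is half of Vmax. A small Python sketch of this calculation follows. The function name and the parameter values are hypothetical and are not measured values for catalase.

```python
def michaelis_menten_rate(substrate_conc, v_max, k_m):
    """Return the reaction rate v = v_max * [S] / (k_m + [S])."""
    if substrate_conc < 0:
        raise ValueError("substrate concentration must be non-negative")
    if k_m <= 0:
        raise ValueError("k_m must be positive")
    return v_max * substrate_conc / (k_m + substrate_conc)


# Hypothetical parameters chosen only for illustration.
V_MAX = 100.0  # maximum rate, arbitrary units
K_M = 25.0     # substrate concentration giving half of V_MAX, same units as [S]

for s in (5.0, 25.0, 100.0):
    print(s, michaelis_menten_rate(s, V_MAX, K_M))
```

When [S] equals Km the computed rate is exactly half of Vmax, and at high [S] the rate approaches Vmax.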




Catalase can also oxidize different toxins, such as formaldehyde, formic acid, and alcohols. In doing so, it

uses hydrogen peroxide according to the following reaction:

H2O2 + H2R → 2H2O + R

Again, the exact mechanism of this reaction is not known.

Any heavy metal ion (such as copper cations in copper(II) sulfate) will act as a noncompetitive inhibitor of catalase. Also, the poison cyanide is a competitive inhibitor of catalase, strongly binding to the heme of catalase and stopping the enzyme's action.

Three-dimensional protein structures of the peroxidated catalase intermediates are available at the Protein

Data Bank. This enzyme is commonly used in laboratories as a tool for learning the effect of enzymes

upon reaction rates.

Cellular role

Hydrogen peroxide is a harmful by-product of many normal metabolic processes. To prevent

damage, it must be quickly converted into other, less dangerous substances. To this end, catalase is

frequently used by cells to rapidly catalyze the decomposition of hydrogen peroxide into less reactive

gaseous oxygen and water molecules.[13]

The true biological significance of catalase is not always straightforward to assess: mice genetically

engineered to lack catalase are phenotypically normal, indicating that this enzyme is dispensable in

animals under some conditions.[14]

Catalase works at an optimum temperature of 37 °C, which is approximately the temperature of the human

body.

Catalase is usually located in a cellular organelle called the peroxisome.[15] Peroxisomes in plant cells are

involved in photorespiration (the use of oxygen and production of carbon dioxide) and symbiotic nitrogen

fixation (the breaking apart of diatomic nitrogen (N2) to reactive nitrogen atoms).

Hydrogen peroxide is used as a potent antimicrobial agent when cells are infected with a pathogen.

Pathogens that are catalase positive, such as Mycobacterium tuberculosis, Legionella pneumophila, and

Campylobacter jejuni, make catalase in order to deactivate the peroxide radicals, thus allowing them to

survive unharmed within the host.[5]

Human application

Catalase is used in the food industry for removing hydrogen peroxide from milk prior to cheese

production.[7] Another use is in food wrappers, where it prevents food from oxidizing.[8] Catalase is also

used in the textile industry, removing hydrogen peroxide from fabrics to make sure the material is

peroxide-free.[9] A minor use is in contact lens hygiene - a few lens-cleaning products disinfect the lens



using a hydrogen peroxide solution; a solution containing catalase is then used to decompose the hydrogen

peroxide before the lens is used again.[19] Recently, catalase has also begun to be used in the aesthetics

industry. Several mask treatments combine the enzyme with hydrogen peroxide on the face with the intent

of increasing cellular oxygenation in the upper layers of the epidermis.

Reference:

http://en.wikipedia.org/wiki/catalase

Result:

The selected protein is catalase



NATIONAL CENTER FOR BIOTECHNOLOGY INFORMATION (NCBI)

Aim:

To identify the features of NCBI server.

Description:

NCBI was established in 1988 as a national resource for molecular biology information. NCBI creates public databases, conducts research in computational biology, and disseminates biomedical information, all for the better understanding of the molecular processes affecting human health and disease.

NCBI includes the following:

BLAST : Basic Local Alignment Search Tool.

PUBMED : Biomedical literature citations.

OMIM : Online Mendelian Inheritance in Man.

ENTREZ : Federated search engine.

TAXONOMY : Taxonomic divisions.

HUMAN GENOME RESOURCES : Genomic data of Homo sapiens.

Procedure:

Step1: Go to http://www.ncbi.nlm.nih.gov.

Step2: Go to the links of taxonomy, Pubmed, OMIM, BLAST and Entrez.

Step3: Note down the features of taxonomy, Pubmed, OMIM, BLAST and Entrez.

Step4: Close the window.

Blast

BLAST is an acronym for Basic Local Alignment Search Tool. This program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. Sequence similarity is a powerful tool for identifying the unknowns in the sequence world. BLAST is fast, reliable and flexible.

Protein

Protein-protein BLAST (blastp) is used to compare protein sequences.

Position-Specific Iterated and Pattern-Hit Initiated BLAST (PSI-BLAST and PHI-BLAST).

Search the Conserved Domain Database.

Nucleotide

Quickly search for highly similar sequences (megablast).

Quickly search for divergent sequences (discontiguous megablast).

Search for short, nearly exact matches (blastn).

Pubmed

PubMed is available through the NCBI Entrez retrieval system and was developed at NCBI. PubMed provides access to citations from the biomedical literature. Publishers participating in PubMed submit their citations to NCBI prior to or at the time of publication. PubMed also provides links to biological databases, resources and search tools.

OMIM

OMIM is an acronym for Online Mendelian Inheritance in Man. OMIM is a catalogue of human genes and genetic disorders, authored and edited by Dr. Victor A. McKusick and his colleagues. OMIM contains copious links to MEDLINE and to sequence records in the Entrez system.

OMIM is open to the public but is intended primarily for physicians and other professionals concerned with genetic disorders.

Entrez

It is the integrated, text-based search and retrieval system used at NCBI for the major databases, including PubMed, nucleotide and protein sequences, Taxonomy, Books, Genome, etc.
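Entrez searches can also be made from a program through the NCBI E-utilities web service. The Python sketch below only builds the address of an ESearch request and does not send it. The helper name build_esearch_url is hypothetical, and the search term is just an example. Anyone sending real requests should follow the NCBI usage guidelines.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(database, term, max_results=20):
    """Build an ESearch URL for the given Entrez database and search term."""
    params = {"db": database, "term": term, "retmax": max_results}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


# Example: search PubMed for the word salmonella, asking for at most 5 IDs.
url = build_esearch_url("pubmed", "salmonella", max_results=5)
print(url)
```

The same helper can be pointed at other Entrez databases by changing the database argument, while urlencode takes care of escaping spaces and special characters in the search term.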

Taxonomy

The NCBI Taxonomy database contains the names of the organisms that are represented in the GenBank databases with at least one nucleotide or protein sequence.

Human Genome Resources

Sequencing of the human genome was completed in 2003. NCBI released its first assembled view of the human genome, which is based not only on the finished and draft sequences deposited in GenBank by the human genome sequencing centers but also on other sequences contributed to GenBank.

Output:

Fig-5.. NCBI home page

Result:

The features of NCBI such as BLAST, PUBMED, OMIM, ENTREZ, TAXONOMY and HUMAN

GENOME RESOURCES are noted.



EUROPEAN MOLECULAR BIOLOGY LABORATORY (EMBL)

Aim:

To retrieve sequence information for a nucleotide using EBI.

Description:

The European Bioinformatics Institute (EBI) is a non-profit academic organization that forms part of the European Molecular Biology Laboratory (EMBL). The EBI is a center for research and services in bioinformatics. The institute manages databases of biological data including nucleic acid sequences, protein sequences and macromolecular structures. The mission of the EBI is to ensure that the growing body of information from molecular biology and genome research is placed in the public domain and is freely accessible to all facets of the scientific community in ways that promote scientific progress.

Procedure:

Step 1: Go to http://www.ebi.ac.uk.

Step 2: Select databases from EBI home page.

Step 3: In the drop-down list choose nucleotide, then type drosophila in the search box and click Go.

Step 4: Select any one of the hits from the entries.

Step 5: Note down the sequence information.

Step 6: Close the window.

Output:

Fig-6.. home page of EMBL

Fig-7.. search result of rabbit nucleotide in EMBL

Fig-8.. search for rabbit in nucleotide sequence



Fig-9- general information about the rabbit nucleotides

Fig-10.. features of the rabbit nucleotides



Fig-11.. sequence of rabbit nucleotides

Result:

The nucleotide sequence for the rabbit species was retrieved using EMBL.




DNA DATABANK OF JAPAN (DDBJ)

Aim:

To perform editing, alignment and manipulation for protein and nucleic acid sequences.

Description:

DDBJ (DNA Data Bank of Japan) is a DNA data bank located at the National Institute of Genetics (NIG) of Japan. DDBJ has been functioning as one of the international nucleotide sequence databases in collaboration with EBI/EMBL and NCBI/GenBank. DNA sequences record the evolution of organisms more directly than other biological materials and are thus invaluable not only for research in the life sciences but also for human welfare in general. The databases are, so to speak, a common treasure of humankind. The Center for Information Biology at NIG was reorganized as the Center for Information Biology and DNA Data Bank of Japan (CIB-DDBJ) in 2001. The new center is to play a major role in carrying out research in information biology and in running DDBJ operations. It is generally accepted that research in biology today requires computers as much as experimental equipment. In particular, we must rely on computers to analyze DNA sequence data, which are accumulating at a remarkably rapid rate. This need triggered the birth and development of information biology. DDBJ is the sole DNA data bank in Japan that is officially certified to collect DNA sequences from researchers and to issue internationally recognized accession numbers to data submitters. DDBJ collects data mainly from Japanese researchers, but also accepts data from and issues accession numbers to researchers in any other country. Since the collected data are exchanged with EMBL/EBI and GenBank/NCBI on a daily basis, the three data banks share virtually the same data at any given time.

Procedure:

Step 1: Go to DDBJ homepage at http://www.ddbj.nig.ac.jp

Step 2: Search the DDBJ site for protein and search for catalase.

Step 3: Go to the database and select the UniProt browser.

Step 4: Note down the statistics for catalase.

Step 5: Note down the features and references below it.


Output:



Fig-12..home page of DDBJ

Fig-13..search result for protein- catalase in DDBJ



Fig-14.. general information about the catalase protein entry

Fig-15.. features and sequence information of catalase

Result:

Editing, alignment and manipulation of protein and nucleic acid sequences were performed for the protein catalase using DDBJ.




GENBANK

Aim:

To retrieve information about the given species.

Description:

The GenBank nucleotide database is maintained by the National Center for Biotechnology Information (NCBI), which is part of the National Institutes of Health (NIH), a federal agency of the US government. It can be accessed and searched through the Entrez system at NCBI, or the entire database can be downloaded as flat files. GenBank is part of an international collaboration with EMBL/EBI and DDBJ. It is an open-access collection of all publicly available nucleotide sequences. The current release is available on the NCBI FTP site.

Each GenBank entry includes a concise description of the sequence, the scientific name and taxonomy of the source organism, and a table of features that identifies coding regions and other sites of biological significance, such as transcription units, sites of mutations or modifications, and repeats. Protein translations for coding regions are included in the feature table. Bibliographic references are included along with a link to the MEDLINE unique identifier for all published sequences.

Procedure:

Step 1: Go to http://www.ncbi.nlm.nih.gov/

Step 2: Select GenBank from the NCBI homepage.

Step 3: Type the name of the chosen organism and click on Go to search for the nucleotide sequence information.

Step 4: Note down the accession number, locus, number of base pairs, molecule type, definition, source organism, organism classification, author, title, journal and university.

Step 5: Save the page.
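The fields noted in Step 4 can also be picked out of a saved flat-file record by a program. The Python sketch below assumes the usual flat-file layout, in which the keyword occupies the first 12 columns of a line and continuation lines leave those columns blank. The function names and the sample record are hypothetical (the record is not a real GenBank entry). If a keyword is repeated, as with several references, this simple version keeps only the last one.

```python
def gb_line(keyword, value):
    """Format one flat-file line: keyword padded to 12 columns, then the value."""
    return keyword.ljust(12) + value


def parse_genbank_header(record_text):
    """Collect header fields (LOCUS, DEFINITION, ...) that precede FEATURES."""
    fields = {}
    current = None
    for line in record_text.splitlines():
        if line.startswith(("FEATURES", "ORIGIN", "//")):
            break  # the header ends where the feature table or sequence begins
        keyword = line[:12].strip()
        value = line[12:].strip()
        if keyword:
            current = keyword
            fields[current] = value
        elif current is not None and value:
            # Continuation line: append to the field started above.
            fields[current] = fields[current] + " " + value
    return fields


# Hypothetical record built only for illustration.
sample = "\n".join([
    gb_line("LOCUS", "EXAMPLE01 24 bp DNA linear SYN 01-JAN-2000"),
    gb_line("DEFINITION", "Hypothetical example sequence for illustration."),
    gb_line("ACCESSION", "EXAMPLE01"),
    gb_line("SOURCE", "synthetic construct"),
    gb_line("  ORGANISM", "synthetic construct"),
    gb_line("", "other sequences; artificial sequences."),
    "FEATURES             Location/Qualifiers",
    "ORIGIN",
    "        1 acgtacgtac gtacgtacgt acgt",
    "//",
])

for key, value in parse_genbank_header(sample).items():
    print(key, ":", value)
```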



Fig-16.. home page of NCBI

Fig-17.. search result for rabbit nucleotides

Fig-18.. GenBank format of rabbit nucleotides

Result:

The information about the rabbit species was retrieved from GenBank.




SWISSPROT

Aim:

To retrieve sequence information about a given protein from the Swiss-Prot database.

Description:

Swiss-Prot is a curated protein sequence database that strives to provide a high level of annotation (such as the description of the function of a protein, its domain structure, post-translational modifications, variants, etc.), a minimal level of redundancy and a high level of integration with other databases.

Swiss-Prot is a protein knowledge base established in 1986 and maintained collaboratively, since 1987, by the Department of Medical Biochemistry of the University of Geneva and the EMBL Data Library. In Swiss-Prot, two classes of data can be distinguished: the core data and the annotation.

Procedure:

Step 1: Go to http://www.expasy.org/sprot

Step 2: Type the protein to search for and click Go on the Swiss-Prot/TrEMBL home page.

Example: Protein: Estrogen Receptor site

Step 3: Select any one of the hits from the entries.

Step 4: Note down the sequence information, such as the Swiss-Prot ID, protein name and gene name, and under Comments note down the subcellular location, tissue specificity, functions, catalytic activity, cofactor and similarity.

Step 5: Note down the feature table and sequence length.

Step 6: Click on the Swiss-Prot ID and save the Entrez sequence page.


Output:


Fig-19..home page of swissprot

Fig-20..search result of catalase protein in swissprot



Fig-21..entry information of catalase protein

..cross reference of the catalase protein



Fig-22..sequence information of catalase protein

Result:

The features of Swiss-Prot, such as the protein name, gene name and primary accession numbers, were noted for the protein catalase and retrieved.




PROTEIN INFORMATION RESOURCE (PIR)

Aim:

To compute the molecular weight of each amino acid present in the given protein sequence.

Description:

The Protein Information Resource (PIR), located at Georgetown University Medical Center (GUMC), is an integrated public bioinformatics resource to support genomic and proteomic research and scientific studies. PIR was established in 1984 by the National Biomedical Research Foundation (NBRF) as a resource to assist researchers in the identification and interpretation of protein sequence information. Prior to that, the NBRF compiled the first comprehensive collection of macromolecular sequences in the Atlas of Protein Sequence and Structure, published from 1965 to 1978 under the editorship of Margaret O. Dayhoff. PIR has provided many protein databases and analysis tools freely accessible to the scientific community, including the Protein Sequence Database (PSD). In 2002, PIR, along with its international partners EBI (European Bioinformatics Institute) and SIB (Swiss Institute of Bioinformatics), was awarded a grant from the NIH to create UniProt, a single worldwide database of protein sequence and function. Today, PIR offers a wide variety of resources mainly oriented to assist the propagation and standardization of protein annotation: PIRSF, iProClass and iProLINK.

Procedure:

Step1: Go to the PIR home page at http://www.pir.georgetown.edu

Step2: From the search/analysis option, select composition/molecular weight calculations.

Step3: Enter the protein sequence in FASTA format.

Step4: Note down the molecular weight and the composition of each residue.
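The composition and molecular weight reported by such a tool can be approximated by hand. The Python sketch below counts each residue and sums standard average residue masses, adding one water molecule for the free ends of the chain. It is only an illustration and is not the method used by the PIR tool. The function names and the example peptide are hypothetical.

```python
from collections import Counter

# Standard average residue masses in daltons (amino acid minus one water).
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER_MASS = 18.01528  # added once for the free N- and C-terminal ends


def composition(sequence):
    """Count how many times each residue occurs in the sequence."""
    return Counter(sequence.upper())


def molecular_weight(sequence):
    """Approximate average molecular weight of a peptide in daltons."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence is empty")
    unknown = set(seq) - set(RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residue codes: {sorted(unknown)}")
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER_MASS


# Hypothetical short peptide used only for illustration.
peptide = "MKTAYIAK"
for residue, count in sorted(composition(peptide).items()):
    print(residue, count, round(count * RESIDUE_MASS[residue], 4))
print("total", round(molecular_weight(peptide), 2))
```

Only the 20 standard one-letter codes are handled; ambiguous codes such as X or B raise an error rather than being guessed.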


Output:



Fig-23..home page of PIR

Fig-24..search result for catalase protein in PIR



Fig-25. summary report of catalase protein

Fig-26.. PIRSF family hierarchy



UniProtKB ID    UniProtKB Accession    Protein Name

A2F9K9_TRIVA    A2F9K9                 Cyclic nucleotide-binding domain containing protein

Fig-27..uniprot entry

Fig-28..general information

Fig-29.. entry information


http://www.pir.uniprot.org/cgi-bin/upEntry?id=A2F9K9



Fig-30..bibliography report of catalase protein

Fig-31..sequence information



Fig-32.. id mapping report

Fig-33.. taxonomic distribution



Fig-34.. phylogenetic pattern

Fig-35.. domain display

Fig-36.. query sequence

Fig-37.. alignment between two sequences




Result:

The molecular weight of each amino acid present in the given protein sequence of catalase was computed using PIR.



PROTEIN DATA BANK (PDB)

Aim:

To analyze the protein using protein data bank.

Description:

PDB is a structure data bank. The PDB archive contains macromolecular structure data on proteins, nucleic acids, protein-nucleic acid complexes and viruses. PDB data are freely available worldwide. A variety of information is available for each structure, including sequence details and 3D structure neighbors computed using various methods.

PDB searches can be performed iteratively, using the output from one search as input to the next. A search can return a single structure or multiple structures.

The RCSB (Research Collaboratory for Structural Bioinformatics) is a non-profit consortium. The RCSB PDB provides a variety of tools and resources for studying the structures of biological macromolecules and their relationship to sequence, function and disease.

Procedure:

Step1: Go to PDB home page. The home page is www.rcsb.org/pdb.

Step2: The protein to be searched is given in the search box. The structures related to the query protein

that are available in the PDB are listed in the next page.

Step3: The protein structure is then downloaded from the list in the PDB and further work is carried out.
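As one example of further work on a downloaded structure, the Python sketch below counts the residues in each chain by reading the fixed-width ATOM lines of a PDB-format file (chain identifier in column 22, residue number in columns 23 to 26). It is a simplified illustration: insertion codes and HETATM records are ignored. The function names and the four atoms with their coordinates are hypothetical and do not come from a real entry.

```python
def format_atom_line(serial, atom_name, res_name, chain_id, res_seq, x, y, z):
    """Build a minimal fixed-width ATOM line for the demonstration below."""
    return (
        "ATOM".ljust(6)           # columns 1-6: record name
        + f"{serial:5d}"          # columns 7-11: atom serial number
        + " "
        + atom_name.ljust(4)      # columns 13-16: atom name (simplified placement)
        + " "                     # column 17: alternate location (blank)
        + res_name.rjust(3)       # columns 18-20: residue name
        + " "
        + chain_id                # column 22: chain identifier
        + f"{res_seq:4d}"         # columns 23-26: residue sequence number
        + " " * 4                 # columns 27-30: insertion code and padding
        + f"{x:8.3f}{y:8.3f}{z:8.3f}"  # columns 31-54: coordinates
    )


def residues_per_chain(pdb_text):
    """Return {chain_id: number of distinct residue numbers} from ATOM lines."""
    seen = {}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        chain_id = line[21]
        res_seq = int(line[22:26])
        seen.setdefault(chain_id, set()).add(res_seq)
    return {chain: len(numbers) for chain, numbers in seen.items()}


# Hypothetical atoms with made-up coordinates, used only for illustration.
sample = "\n".join([
    format_atom_line(1, "N", "MET", "A", 1, 10.0, 11.5, 12.25),
    format_atom_line(2, "CA", "MET", "A", 1, 11.0, 12.5, 13.25),
    format_atom_line(3, "N", "LYS", "A", 2, 12.0, 13.5, 14.25),
    format_atom_line(4, "N", "GLY", "B", 1, 20.0, 21.5, 22.25),
])

print(residues_per_chain(sample))
```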

Output:


Fig-38.. Protein data bank home page..

Fig-39.. Search result of protein-catalase



Fig-40.. catalase derived information

fig-41.. 3d view of catalase protein

Result:

The protein catalase is analysed by using the protein data bank





CAMBRIDGE STRUCTURAL DATABASE (CSD)

Aim:

To retrieve structural information regarding a protein using CSD.

Description:

The Cambridge Structural Database (CSD) is the world repository of small-molecule crystal structures and the principal product of the CCDC (Cambridge Crystallographic Data Centre). It is the central focus of the CSD System, which also comprises software for database access, structure visualization and data analysis, and structural knowledge bases derived from the CSD. The CSD records bibliographic, chemical and crystallographic information for organic molecules and metal-organic compounds whose 3D structures have been determined using X-ray diffraction and neutron diffraction. The CSD is distributed as part of the CSD System, which includes software for:

• Search and information retrieval (ConQuest)

• Structure visualization (Mercury)

• Numerical analysis (Vista)

• Database creation (PreQuest)

The CSD does not store polypeptides and polysaccharides having more than 24 units (these are held in the Protein Data Bank), oligonucleotides, or metals and alloys.

Deposit Structure

The Cambridge Crystallographic Data Centre accepts depositions of crystal structure data from X-ray and neutron diffraction studies. Data depositions with the CCDC are of two main types:

• Pre-publication: structures that are being submitted for publication in a journal.

• Private communication to the CSD: structures that are not intended for publication, but that the depositor wishes to make available to other scientists through the CSD.

Only depositions that include the list of authors and the full journal reference will be accepted. On receipt, all depositions are stored in the secure and separate CCDC supplementary data archive. After publication, deposited structures will be processed into the main distributed CSD, and the original deposited data will be made freely available to all scientists on request, for research purposes.

Request Structure

The Cambridge Crystallographic Data Centre (CCDC) provides copies of the supplementary data of individual published structures for research purposes. Data arriving electronically at the CCDC in CIF format are held in the CCDC supplementary data archive. After publication, these data are converted into CSD entries by the addition of bibliographic and chemical text, chemical structural data, and the results of crystal structure validation. Each database entry is identified by a CSD reference code, comprising six alphabetic characters (e.g. BENZEN) to identify the chemical compound and, where needed, two digits (as in BENZEN05) to identify a specific experimental determination of the crystal structure of BENZEN. A typical CSD entry would thus comprise information categories such as:

• Bibliography

• 2D chemical connectivity

• 3D molecular structure of a particular molecule.

The CSD holds a total of 366886 structures and 19000 publications.
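The reference code layout described above (six letters, optionally followed by two digits) can be checked with a short Python sketch. The function name parse_refcode is hypothetical, and the pattern encodes only the rule stated in this section.

```python
import re

# Six letters identify the compound; two optional digits identify the determination.
REFCODE_PATTERN = re.compile(r"^([A-Z]{6})(\d{2})?$")


def parse_refcode(refcode):
    """Split a reference code into (compound letters, determination number or None)."""
    match = REFCODE_PATTERN.match(refcode.strip().upper())
    if match is None:
        raise ValueError(f"not a valid reference code: {refcode!r}")
    letters, digits = match.groups()
    return letters, (int(digits) if digits is not None else None)


print(parse_refcode("BENZEN"))
print(parse_refcode("benzen05"))
```

Entries that share the same six letters describe the same compound, so grouping by the first element of the returned tuple collects all determinations of one structure.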

Procedure:

Step1: Go to http://www.ccdc.cam.ac.uk/

Step2: Note down the features of deposit structure and request structure in the Cambridge Structural Database.


Output:


Fig-42..home page of cambridge structural database

Result:

The features of CSD were noted down.




PUBMED

Aim:

To perform literature search using pubmed.

Description:

PubMed, available via the NCBI Entrez retrieval system, was developed by the National Center for Biotechnology Information (NCBI) at the National Library of Medicine (NLM), located at the U.S. National Institutes of Health (NIH). Entrez is the text-based search and retrieval system used at NCBI for services including PubMed, nucleotide and protein sequences, protein structures, complete genomes, Taxonomy, OMIM, and many others. PubMed provides access to citations from the biomedical literature. LinkOut provides access to full-text articles at journal websites and other related web resources. PubMed also provides a batch citation matcher, which allows users to match their citations to PubMed citations using bibliographic information such as journal, volume, issue, page number, and year.

Procedure:

Step 1: Go to http://www.ncbi.nlm.nih.gov/entrez

Step 2: Search PubMed for salmonella.

Step 3: Select any article from the result page and note down the article name, journal name, author name and PubMed ID.

Step 4: Use Limits in PubMed by selecting the Limits option and choose any criterion for searching, such as search by author, journal or date.

Step 5: Note down one or two articles based on the given criteria.


Output:


Fig-43.. Pubmed home page:

Fig-44.. Search result in pubmed



Fig-45.. Summary of search result

Result:

The literature search for salmonella was done using PubMed.




BLAST

Aim:

To compare the protein sequence of interest to the entries in the non-redundant database using BLAST (blastp).

Description:

Searching the database will provide similar sequences, potentially related to our sequence, from the several thousands of sequences present in the database. The basic operation in searching is to align the query sequence with each subject sequence in the database. The computer programs that help with this are BLAST and FASTA. BLAST, the Basic Local Alignment Search Tool (http://www.ncbi.nlm.nih.gov/BLAST), searches for sequences (nucleotide or protein) in the given database that are similar to a given sequence. It was developed by Altschul and co-workers at the NCBI.

Procedure:

Step 1: Go to the NCBI site and choose the protein from the PDB in FASTA format.

Step 2: Select the “blastp” option from NCBI BLAST.

Step 3: Paste the sequence in FASTA format.

Step 4: Click BLAST and press Format once the search is over.

Step 5: The results are displayed as descriptions and as a graphical representation.

Step 6: Sequences similar to the query are arranged in increasing order of E-value, so the most significant hits appear first.

Step 7: Note down the best homology search results of BLAST.

Step 8: Close BLAST.

Output

CAA00094. Reports amylase [Aspergil...[gi:14646] BLink, Links

>gi|14646|emb|CAA00094.1| amylase [Aspergillus niger]

MTIFLFLAIFVATALAATPAEWRSQSIYFLLTDRFARTDNSTTASCDLSARVSH


Fig-46.. blast result query sequence



Fig-47.. sequence alignment

Result:

The protein sequence of amylase was compared to the entries in the non-redundant database using BLAST (blastp).




FASTA

Aim:

To align the protein sequence using FASTA tool.

Description:

FASTA is a DNA and Protein sequence alignment software package first described (as FASTP) by David

J. Lipman and William R. Pearson in 1985. The original FASTP program was designed for protein

sequence similarity searching. FASTA, described in 1988 (Improved Tools for Biological Sequence

Comparison), added the ability to do DNA:DNA searches, translated protein:DNA searches, and also

provided a more sophisticated shuffling program for evaluating statistical significance. There are several

programs in this package that allow the alignment of protein sequences and DNA sequences. FASTA is

pronounced "FAST-Aye", and stands for "FAST-All", because it works with any alphabet, an extension of

"FAST-P" (protein) and "FAST-N" (nucleotide) alignment. This reflects the fact that it can be used for a

fast protein comparison or a fast nucleotide comparison.

The FASTA program achieves a high level of sensitivity for similarity searching at high speed. The sensitivity comes from performing optimized searches for local alignments using a substitution matrix. The speed comes from using the observed pattern of word hits to identify potential matches before attempting the more time-consuming optimized search. The trade-off between speed and sensitivity is controlled by the ktup parameter, which specifies the size of the word. Increasing ktup decreases the number of background hits. Not every word hit is investigated; instead, the program initially looks for segments containing several nearby hits.
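The effect of ktup on word hits can be illustrated with a small Python sketch. It is not the FASTA implementation: it only counts exact word matches and groups them by diagonal (subject position minus query position), which is the idea behind looking for segments with several nearby hits. The function name word_hits_by_diagonal and the subject sequence are hypothetical and chosen for illustration only.

```python
from collections import defaultdict


def word_hits_by_diagonal(query, subject, ktup=2):
    """Count exact ktup-word matches on each diagonal (subject pos - query pos)."""
    # Index every word of length ktup in the query by its start positions.
    index = defaultdict(list)
    for i in range(len(query) - ktup + 1):
        index[query[i:i + ktup]].append(i)

    # Scan the subject; each shared word is a hit on diagonal j - i.
    diagonals = defaultdict(int)
    for j in range(len(subject) - ktup + 1):
        for i in index.get(subject[j:j + ktup], []):
            diagonals[j - i] += 1
    return dict(diagonals)


query = "MTIFLFLAIFVATALAATPAEW"      # start of the amylase sequence used earlier
subject = "GGMTIFLFLAIFVSTALAATPAEW"  # hypothetical subject: shifted by 2, one substitution

for k in (1, 2):
    hits = word_hits_by_diagonal(query, subject, ktup=k)
    best = max(hits, key=hits.get)
    print(f"ktup={k}: {sum(hits.values())} word hits in total, best diagonal {best}")
```

With the larger word size, fewer scattered background hits are recorded, while the diagonal that carries the real similarity still stands out.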

The FASTA package provides SSEARCH, an implementation of the optimal Smith-Waterman algorithm.

The FASTA package is available from fasta.bioch.virginia.edu.
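The Smith-Waterman algorithm mentioned above finds the best-scoring local alignment by dynamic programming. The Python sketch below is a minimal illustration and not the SSEARCH code: it assumes a simple match/mismatch score and a linear gap penalty, whereas real searches use a substitution matrix and usually affine gap penalties. The function name and the score values are assumptions chosen for the example.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best score, aligned a, aligned b) for the best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the matrix; a cell never drops below 0, which makes the alignment local.
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero score is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if H[i][j] == diag:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


# Two hypothetical sequences that share the segment FLFLAIF.
score, top, bottom = smith_waterman("AAAFLFLAIFKKK", "GGFLFLAIFGG")
print(score)
print(top)
print(bottom)
```

Because the matrix is filled completely, the run time grows with the product of the two sequence lengths, which is why heuristic word-based searches are used first on large databases.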

A web interface for submitting sequences to search the online databases of the European Bioinformatics Institute (EBI), called fasta33, is also available. The FASTA file format used as input for this software is now widely used by other sequence database search tools (such as BLAST) and sequence alignment programs (Clustal, T-Coffee, etc.).

Procedure:

Step1: Go to http://www.ebi.ac.uk/fasta33

The page appears as below:


Fig 48 – Home page of Fasta

Step2: The parameters shown on the page above are set.

Step3: The protein sequence is either pasted in the query box or it is uploaded through the ‘Browse…’

option.

Step4: Finally, ‘Run Fasta3’ is clicked, which displays the result window.

Output:



fig-49.. fasta summary table

Fig-50 fasta results



Fig-51 fasta scores



Fig-52 general information of fasta result

Fig-53 fasta sequence

Result:

The protein sequence of catalase was aligned using FASTA.




CLUSTAL W

AIM:

To perform multiple sequence alignment of 10 BLAST hits using Clustal W.

DESCRIPTION:

Multiple alignments of protein and nucleotide sequences are an important tool for studying sequences. Their basic use is the identification of conserved regions. This is very useful in designing experiments to test and modify the function of specific proteins, in predicting the function and structure of proteins, and in identifying new members of protein families.

Clustal W is a general-purpose multiple sequence alignment program for DNA or proteins. It was produced by Julie D. Thompson and Toby Gibson of the European Molecular Biology Laboratory. It produces biologically meaningful multiple sequence alignments of divergent sequences. It calculates the best match for the selected sequences and lines them up so that the identities, similarities and differences can be seen. Evolutionary relationships can be seen by viewing cladograms or phylograms.

PROCEDURE:

Step 1: The multiple sequences were uploaded in the fasta format.

Step 2: The options are left default.

Step 3: If desired sequence title may be entered.

Step 4: Press the Run button to start the Clustal W alignment.

Step 5: The plain-text version of the alignment is stored temporarily in an alignment (.aln) file. In the line beneath the alignment, ‘*’ means that the residues in that column are identical in all sequences of the alignment, ‘:’ means that conserved substitutions have been observed according to the colour table, and ‘.’ means that semi-conserved substitutions have been observed.

Step 6: Save the phylogenetic trees.
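How the ‘*’ marker is derived can be shown with a short Python sketch. It is a simplified illustration, not Clustal W's own code: it marks only fully identical columns and leaves every other column blank, because the ‘:’ and ‘.’ markers depend on residue groupings that are not listed in this manual. The function name identity_line and the three aligned sequences are hypothetical.

```python
def identity_line(aligned):
    """Return a marker line with '*' under columns where every sequence agrees."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    marks = []
    for column in zip(*aligned):
        residues = set(column)
        # A column is fully conserved when it holds one residue and no gap.
        marks.append("*" if len(residues) == 1 and "-" not in residues else " ")
    return "".join(marks)


alignment = [            # hypothetical aligned sequences, for illustration only
    "MTIFLF-AIFV",
    "MTLFLFLAIFV",
    "MTIFLFLAVFV",
]
for seq in alignment:
    print(seq)
print(identity_line(alignment))
```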

Output:


Fig-54.. multiple sequence alignment



Fig-55 multiple aligned sequences



Fig-56 phylogram tree

Fig-57 cladogram tree view



Fig-58 sequence name view

Result:

Multiple sequence alignment of the 10 selected proteins was performed using Clustal W.



CLUSTAL X

Aim:

To perform multiple sequence alignment of 10 Blast hits using Clustal X.

Description:

Clustal X is a windows interface for the Clustal W multiple sequence alignment program. It provides an integrated environment for performing multiple sequence and profile alignments and analyzing the results. Clustal X is available for a number of different platforms, including Sun Solaris, IRIX 5.3, Digital UNIX, MS Windows, Linux ELF and Macintosh PowerMac.

Procedure:

Step 1: Find homologous sequences using BLAST

Step 2: Open the CLUSTAL X program

Step 3: Submit the sequences

Step 4: Choose “Alignment” and align the input sequences

Step 5: Open the “NJ Plot” and input the sequences

Step 6: View the plot

Step 7: Save the result

Output:

Fig-59 before alignment



Fig-60 before alignment graph

Fig-61 after alignment

Fig-62 after alignment graph



Fig-63 tree structure

Result:

Multiple sequence alignment of 10 BLAST hits was performed using Clustal X.




BIOEDIT

Aim:

To perform editing, alignment and manipulation for protein and nucleic acid sequences.

Description:

BioEdit is an intuitive, menu-driven graphical tool that also offers an interface for running external analysis programs. It runs on Windows 95/98/NT and provides basic functions for protein and nucleic acid sequence editing, alignment, manipulation and analysis. Don Gilbert modeled BioEdit. BioEdit uses Clustal W for multiple sequence alignment, and the output is displayed in a tree view. Dot plots and pairwise alignments are possible, and sequences can easily be analyzed for composition with graphical output. Basic manipulations in BioEdit include locking and unlocking gaps, translation and reverse translation, toggling translation, nucleotide composition, complement of a DNA or RNA sequence, creating a plasmid from a sequence, restriction mapping, amino acid composition and hydrophobicity profiles. BioEdit offers very simple optimal sequence alignment directly within an alignment document. It currently reads and writes GenBank, FASTA, NBRF/PIR, PHYLIP 3.2 and PHYLIP 4 formats, and reads Clustal W and GCG formats. It supports the simultaneous editing of up to 50 documents. A main control form contains menus to open documents, create new documents and set global options such as color tables, codon tables and analysis preferences, together with a window manager. Memory is allocated dynamically, with support for up to 20,000 sequences per document; sequences of up to 4.6 million bases have been tested successfully.

Procedure:

Load the FASTA sequences from the menu File→Open.

PAIRWISE ALIGNMENT

Step 1: Select two sequences

Step 2: Go to menu→ sequence→Pairwise alignment→Align two sequences (allow ends to slide).

Step 3: Note down the alignment score & save the result.


DOT PLOT

Step 1: Select two sequences

Step 2: Go to menu→ sequence→Dot plot (Pairwise comparison)

Step 3: Check “do full shaded matrix”, select BLOSUM62 as the similarity matrix & click OK.

Step 4: Click ok in plot matrix dialog box.

Step 5: From the plot, go to view→Data Examiner→Check color option.

Step 6: Save the result.
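What a dot plot records can be sketched in a few lines of Python. This is a simplified stand-in for the BioEdit plot: it scores a sliding window by counting identical residues, whereas the procedure above uses the BLOSUM62 similarity matrix. The function name dot_plot, the window size, the threshold and the second sequence are assumptions made for the example.

```python
def dot_plot(seq_a, seq_b, window=3, min_matches=3):
    """Return rows of a text dot plot; '*' marks windows with enough identical residues."""
    rows = []
    for i in range(len(seq_a) - window + 1):
        row = []
        for j in range(len(seq_b) - window + 1):
            matches = sum(seq_a[i + k] == seq_b[j + k] for k in range(window))
            row.append("*" if matches >= min_matches else ".")
        rows.append("".join(row))
    return rows


seq_a = "MTIFLFLAIF"   # start of the amylase sequence used earlier
seq_b = "TIFLFLAIFV"   # hypothetical second sequence: the same region shifted by one residue
for row in dot_plot(seq_a, seq_b):
    print(row)
```

A run of marks along a diagonal indicates a shared region; the offset of that diagonal shows how far the region is shifted between the two sequences.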




AMINO ACID COMPOSITION

Step 1: Select one protein sequence

Step 2: Go to sequence→Protein→Amino acid composition.

Step 3: Note down the number of each amino acid present in the protein.

Step 4: Save & Close the window.
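The same count can be reproduced outside BioEdit with a short Python sketch, shown here only as an illustration. The function name amino_acid_composition is hypothetical, and the sequence is the amylase fragment listed in the BLAST exercise.

```python
from collections import Counter


def amino_acid_composition(sequence):
    """Return {residue: (count, percentage)} for a protein sequence."""
    sequence = sequence.upper()
    counts = Counter(sequence)
    total = len(sequence)
    return {aa: (n, 100.0 * n / total) for aa, n in sorted(counts.items())}


# Amylase fragment as listed in the BLAST exercise.
protein = "MTIFLFLAIFVATALAATPAEWRSQSIYFLLTDRFARTDNSTTASCDLSARVSH"
for aa, (count, percent) in amino_acid_composition(protein).items():
    print(f"{aa}: {count:2d}  {percent:5.1f}%")
```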

HELICAL WHEEL DIAGRAM

Step 1: Select one protein sequence.

Step 2: Go to sequence→Protein→Helical wheel diagram.

Step 3: Save the diagram & close.

CONSENSUS SEQUENCE

Step 1: Select 2 protein sequences.

Step 2: Go to alignment→Create consensus sequence.

Step 3: Note down the consensus region.

Step 4: Save & Close the window.
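One common way to build a consensus is a majority rule applied column by column, sketched below in Python. This is an assumption about the general idea, not a description of how BioEdit computes its consensus; the function name, the threshold, the placeholder symbol and the aligned sequences are all hypothetical.

```python
from collections import Counter


def consensus(aligned, threshold=0.5, unknown="X"):
    """Majority-rule consensus of equal-length aligned sequences."""
    result = []
    for column in zip(*aligned):
        residue, count = Counter(column).most_common(1)[0]
        # Keep the residue only if it is frequent enough in this column.
        result.append(residue if count / len(column) > threshold else unknown)
    return "".join(result)


aligned = ["MTIFLF", "MTLFLF", "MSIFVF"]   # hypothetical aligned sequences
print(consensus(aligned))
```

With only two sequences, a column is kept in the consensus only where both agree, since a single residue cannot exceed the 50% threshold.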

CONSERVATION PLOT

Step 1: Select 5 protein sequences.

Step 2: Go to view→Conservation plot.

Step 3: Save the conserved sequence.


PROTEIN DISTANCE

Step 1: Select five sequences.

Step 2: Go to Accessory Application→ProtDist (protein distance matrix)→Run application.

Step 3: Note down the distance among the sequences.

Step 4: Close the window.
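A distance matrix of this kind can be illustrated with the simplest possible measure, the p-distance (the fraction of aligned positions at which two sequences differ). The Python sketch below uses that measure as a stand-in; the protein distance application applies evolutionary models rather than raw proportions, so the numbers will not match its output. The function names and the aligned sequences are hypothetical.

```python
def p_distance(seq_a, seq_b):
    """Fraction of differing residues over columns where neither sequence has a gap."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable positions")
    return sum(a != b for a, b in pairs) / len(pairs)


def distance_matrix(aligned):
    """Symmetric matrix of pairwise p-distances for aligned sequences."""
    return [[p_distance(a, b) for b in aligned] for a in aligned]


aligned = ["MTIFLFLA", "MTLFLFLA", "MSIFVF-A"]   # hypothetical aligned sequences
for row in distance_matrix(aligned):
    print("  ".join(f"{d:.3f}" for d in row))
```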

Output:


Fig-64 protein sequence



Fig-65 amino acid sequence

Fig-66 helical wheel diagram



Fig-67 hydrophobic moment

Fig-68 Kyte & Doolittle scale mean hydrophobicity



Fig-69 Eisenberg scale mean hydrophobicity

Fig-70 cornette scale mean hydrophobicity profile



Fig-71 parker hplc scale mean hydrophobicity profile

Fig-72 boyko scale mean hydrophobicity profile



Fig-73 hopp&woods scale hydrophobicity profile

Fig-74 eisenberg hydrophobic moment profile



Fig-75 mean Eisenberg hydrophobic moment profile

Fig-76 dot plot pairwise alignment: sequence comparison

Result:

For the aligned sequences, the pairwise alignment, dot plot, amino acid composition, consensus sequence, conservation plot and protein distance were obtained using BioEdit.




GENSCAN

Aim:

To predict the structure and function of a particular gene using GENSCAN.

Description:

GENSCAN is a general-purpose gene identification program which analyzes genomic DNA sequences

from a variety of organisms including human, other vertebrates, invertebrates and plants.

For each sequence, the program determines the most likely "parse" (gene structure) under a probabilistic

model of the gene structural and compositional properties of the genomic DNA for the given organism.

This set of exons/genes is then printed to an output file (the text output) together with the corresponding

predicted peptide sequences. A graphical (PostScript) output may also be created which displays the

location and DNA strand of each predicted exon.

Unlike the majority of other currently available gene prediction programs, the model treats the most

general case in which the sequence may contain no genes, one gene, or multiple genes on either or both

DNA strands, and partial genes as well as complete genes are considered. The probabilistic model used by

GENSCAN accounts for many of the essential gene structural properties of genomic sequences, e.g.,

typical gene density, the typical number of exons per gene, the distribution of exon sizes for different

types of exon.

The novel features of the program include the capacity to predict multiple genes in a single sequence and to deal with partial as well as complete genes. GENSCAN is shown to have substantially higher accuracy than existing methods when tested on standardized sets of human and vertebrate genes, with 75-80% of exons identified exactly.

Procedure:

Step1: The nucleotide sequence of (organism chosen) is obtained from NCBI site.

Step2: Go to http://genes.mit.edu/GENSCAN.html.

Step3: Select the organism, sub-optimal exon cut-off, print options and paste the nucleotide

sequence in the space provided.

Step4: Click on run GENSCAN.

Step5: Note down the gene number, exon number, type, DNA strand, beginning and end of exon or

signal, length of the exon or signal and exon score.

Step6: Click on “here” to view the PDF image of the predicted gene.

Step7: Save the PDF image page and the output page.



Fig 77 – Genscan output

Fig 78 – PDF image of predicted gene at 1.00



Fig-79 – PDF image of predicted gene at 0.50

Fig 80 – PDF image of predicted gene at 0.25

Result:

Thus the structure and function of the gene were predicted using GENSCAN.



LITERATURE SEARCH

Aim:

To distinguish between search engines such as Google and Google Scholar, including their advanced and preference-based search options.

Procedure:

Google Search

Step1: Go to http://www.google.co.in.

Step2: Type the topic name for which information is to be collected.

Step3: A preferred result is viewed and saved as soft copy.

Google Scholar Search

Step1: Go to http://www.scholar.google.com.

Step2: Type the topic name for which literature is to be collected, narrowing the search as needed.

Step3: A preferred result is viewed and saved as a soft copy.

Output:

Fig-81 home page of google



Fig-82 search result of cancer

Fig-83 cancer references



Fig-84 home page of google scholar

Fig-85 search result of articles in google scholar



Fig-86 scholarly article on cancer

Result:

Google and Google Scholar were searched for the topic of cancer, and the articles found were noted along with their references.




PREPARATION OF BIBLIOGRAPHY

Aim:

To prepare the bibliography entry for the article "Apoptosis. Its significance in cancer and cancer therapy".

Procedure:

Step 1: Go to http://www.scholar.google.com

Step 2: Type the topic of interest in the space provided and perform a Google advanced search.

Step 3: Select an article from the result hits provided.

Step 4: Note down the name of the author, title of the article, the name of the

Journal in which the article was published, Volume, Year of publication

and page number.

Step 5: Save the page.


Output:

Title of article: Apoptosis. Its significance in cancer and cancer therapy

Author name: Kerr JF, Winterford CM, Harmon BV

Department of Pathology, University of Queensland Medical School, Herston, Australia.

Cancer. 1994 Apr 15;73(8):2013-26

Summary: Apoptosis is a distinct mode of cell death that is responsible for deletion of cells in normal

tissues; it also occurs in specific pathologic contexts. Morphologically, it involves rapid condensation and

budding of the cell, with the formation of membrane-enclosed apoptotic bodies containing well-preserved

organelles, which are phagocytosed and digested by nearby resident cells. There is no associated

inflammation.
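The details noted above can be assembled into a single bibliography line. The Python sketch below shows one possible layout (Authors. Title. Journal. Year;Volume(Issue):Pages.); the layout and the function name format_citation are assumptions chosen for illustration, since the exercise does not prescribe a citation style. The field values are the ones recorded above.

```python
def format_citation(authors, title, journal, year, volume, pages, issue=None):
    """Build one bibliography line: Authors. Title. Journal. Year;Volume(Issue):Pages."""
    issue_part = f"({issue})" if issue else ""
    title = title.rstrip(".")
    return f"{', '.join(authors)}. {title}. {journal}. {year};{volume}{issue_part}:{pages}."


entry = format_citation(
    authors=["Kerr JF", "Winterford CM", "Harmon BV"],
    title="Apoptosis. Its significance in cancer and cancer therapy",
    journal="Cancer",
    year=1994,
    volume=73,
    issue=8,
    pages="2013-26",
)
print(entry)
```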


Fig-88 result of google scholar of cancer

Fig-89 Articles in cancer - apoptosis




Result:

The bibliography entry for an article on apoptosis was prepared.

Article details

Article title : Autosomal Dominant Inheritance of Early-Onset Breast Cancer: Implications for Risk Prediction

Author : Claus, E. B.; Risch, N.; Thompson, W. D.

Journal title : OBSTETRICAL AND GYNECOLOGICAL SURVEY

Bibliographic details : 1994, VOL 49; NUMBER 6, pages 401

Publisher : WILLIAMS & WILKINS

Country of publication : USA

Language : GERMANY

Pricing : To buy the full text of this article you pay: £15.00 copyright fee + service charge (from £7.65) + VAT, if applicable

Article details

Article title : Apoptosis: Its Significance in Cancer and Cancer Therapy

Author : Kerr, J. F. R.; Winterford, C. M.; Harmon, B. V.

Journal title : CANCER -PHILADELPHIA-

Bibliographic details : 1994, VOL 73; NUMBER 8, pages 2013

Publisher : J B LIPPINCOTT CO

Country of publication : USA

Language : ENGLISH

Pricing : To buy the full text of this article you pay: £17.00 copyright fee + service charge (from £7.65) + VAT, if applicable.
