Bioinformatics. Neha Barve, Lecturer, Bioinformatics, School of Biotechnology, DAVV, Indore. Topics covered: Introduction to Bioinformatics; Area of Bioinformatics; Fields; Biological databases; Sequence analysis.
![Page 1: Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062521/568155c6550346895dc39838/html5/thumbnails/1.jpg)
Bioinformatics
Neha Barve, Lecturer, Bioinformatics
School of Biotechnology, DAVV, Indore
![Page 2: Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062521/568155c6550346895dc39838/html5/thumbnails/2.jpg)
Topics covered
• Introduction to Bioinformatics
• Area of Bioinformatics
• Fields
• Biological databases
• Sequence analysis
![Page 3: Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062521/568155c6550346895dc39838/html5/thumbnails/3.jpg)
What Is Bioinformatics?
Bioinformatics is the unified discipline formed from the combination of biology, computer science, and information technology.
"The mathematical, statistical and computing methods that aim to solve biological problems using DNA and amino acid sequences and related information." – Fredj Tekaia
![Page 4: Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062521/568155c6550346895dc39838/html5/thumbnails/4.jpg)
Area
• Algorithms
• Databases and information systems
• Artificial intelligence and soft computing
• Information and computation theory
• Modeling and simulation
• Generating new knowledge of biology and medicine, and improving and discovering new models of computation (e.g. DNA computing, neural computing, evolutionary computing, immuno-computing, cellular computing)
![Page 5: Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062521/568155c6550346895dc39838/html5/thumbnails/5.jpg)
Related fields of Bioinformatics
![Page 6: Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062521/568155c6550346895dc39838/html5/thumbnails/6.jpg)
Related Fields: Computational Biology
The study and application of computing methods for classical biology
Primarily concerned with evolutionary, population and theoretical biology, rather than the cellular or molecular level
![Page 7: Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062521/568155c6550346895dc39838/html5/thumbnails/7.jpg)
Related Fields: Medical Informatics
The study and application of computing methods to improve communication, understanding, and management of medical data
Generally concerned with how the data is manipulated rather than the data itself
![Page 8: Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062521/568155c6550346895dc39838/html5/thumbnails/8.jpg)
Related Fields: Cheminformatics
The study and application of computing methods, along with chemical and biological technology, for drug design and development
![Page 9: Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062521/568155c6550346895dc39838/html5/thumbnails/9.jpg)
Related Fields: Genomics
Analysis and comparison of the entire genome of a single species or of multiple species
A genome is the complete set of genetic material of an organism, including all of its genes
Genomics existed before any genomes were completely sequenced, but in a very primitive state
![Page 10: Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062521/568155c6550346895dc39838/html5/thumbnails/10.jpg)
Related Fields: Proteomics
Study of how the genome is expressed in proteins, and of how these proteins function and interact
Concerned with the actual states of specific cells, rather than the potential states described by the genome
![Page 11: Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062521/568155c6550346895dc39838/html5/thumbnails/11.jpg)
Related Fields: Pharmacogenomics
The application of genomic methods to identify drug targets
For example, searching entire genomes for potential drug receptors, or studying gene expression patterns in tumors
![Page 12: Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062521/568155c6550346895dc39838/html5/thumbnails/12.jpg)
Related Fields: Pharmacogenetics
The use of genomic methods to determine what causes variations in individual response to drug treatments
The goal is to identify drugs that may only be effective for subsets of patients, or to tailor drugs for specific individuals or groups
![Page 13: Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062521/568155c6550346895dc39838/html5/thumbnails/13.jpg)
Genomics
• Classic genomics
• Post-genomic era
  · Comparative genomics
  · Functional genomics
  · Structural genomics
![Page 14: Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062521/568155c6550346895dc39838/html5/thumbnails/14.jpg)
What is Genomics?
Genome: the complete set of genetic instructions for making an organism
Genomics: any attempt to analyze or compare the entire genetic complement of a species
Early genomics was mostly recording genome sequences
![Page 15: Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062521/568155c6550346895dc39838/html5/thumbnails/15.jpg)
What next?
Post-genomic era:
• Comparative genomics
• Functional genomics
• Structural genomics
![Page 16: Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062521/568155c6550346895dc39838/html5/thumbnails/16.jpg)
What Is Proteomics?
Proteomics is the study of the proteome—the “PROTEin complement of the genOME”
More specifically, "the qualitative and quantitative comparison of proteomes under different conditions to further unravel biological processes"
![Page 17: Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062521/568155c6550346895dc39838/html5/thumbnails/17.jpg)
What Makes Proteomics Important?
All cells in an organism contain the same DNA.
This DNA encodes every possible cell type in that organism: muscle, bone, nerve, skin, etc.
If we want to know about the type and state of a particular cell, the DNA does not help us, in the same way that knowing what language a computer program was written in tells us nothing about what the program does.
![Page 18: Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062521/568155c6550346895dc39838/html5/thumbnails/18.jpg)
What Makes Proteomics Important?
A human cell carries roughly 20,000 protein-coding genes, but only a subset of them is expressed in any given cell and determines that cell’s structure.
Many of the interesting things about a given cell’s current state can be deduced from the type and structure of the proteins it expresses.
Changes in, for example, tissue types, carbon sources, temperature, and stage in life of the cell can be observed in its proteins.
![Page 19: Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062521/568155c6550346895dc39838/html5/thumbnails/19.jpg)
Proteomics In Disease Treatment
Nearly all major diseases, accounting for more than 98% of all hospital admissions, are caused by a particular pattern in a group of genes.
Isolating this group by comparing all of the genes in each of many genomes would be very impractical.
Looking at the proteomes of the cells associated with the disease is much more efficient.
![Page 20: Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062521/568155c6550346895dc39838/html5/thumbnails/20.jpg)
Proteomics In Disease Treatment
Many human diseases are caused by a normal protein being modified improperly. This also can only be detected in the proteome, not the genome.
The targets of almost all medical drugs are proteins. By identifying these proteins, proteomics aids the progress of pharmacogenetics.
![Page 21: Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062521/568155c6550346895dc39838/html5/thumbnails/21.jpg)
Examples
What do these have in common?
• Alzheimer's disease
• Cystic fibrosis
• Mad cow disease
• An inherited form of emphysema
• Even many cancers
![Page 22: Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062521/568155c6550346895dc39838/html5/thumbnails/22.jpg)
Biological databases
Data: any facts and statistics that carry useful information and are used for analysis.
Biological systems generate a tremendous amount of data, which is difficult to maintain on paper.
Hence the need for a system to store and manipulate it.
![Page 23: Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062521/568155c6550346895dc39838/html5/thumbnails/23.jpg)
Definition
A biological database is a collection of data that is organized so that its contents can easily be accessed, managed, and updated. The activity of preparing a database can be divided into:
· Collection of data in a form which can be easily accessed
· Making it available to a multi-user system (always available for the user)
The biological information of nucleic acids is available as sequences, while the data of proteins is available as sequences and structures. Sequences are represented in a single dimension, whereas structures contain the three-dimensional data of sequences.
![Page 24: Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062521/568155c6550346895dc39838/html5/thumbnails/24.jpg)
Biological databases
A biological database is a library of scientific information. Which types of data does it hold?
• DNA (sequence)
• RNA
• Protein (sequence, structure, functions, annotations, localization, etc.)
• Chemical (structure, reactions, assays)
• Literature
• Enzymes (functions)
• Signaling pathways (prokaryotic, eukaryotic)
• Evolutionary (phylogenetics)
• Clinical
![Page 25: Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062521/568155c6550346895dc39838/html5/thumbnails/25.jpg)
Where does this data come from?
• Experiments (genome sequencing, structure determination, chemical reactions)
• Literature (scientific research papers)
• Computational analysis (predictions)
Need:
• To make the data available to researchers all over the world
• Development, storage, and updating
• To make biological data available in computer-readable form
![Page 26: Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062521/568155c6550346895dc39838/html5/thumbnails/26.jpg)
Types
• Primary
• Secondary
Based on type of data:
• Sequence databases (protein / nucleic acid)
• Structure databases
• Enzyme databases
• Signaling pathway databases
• Interaction databases (PPI/PDI)
![Page 27: Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062521/568155c6550346895dc39838/html5/thumbnails/27.jpg)
Different classifications of databases
Primary or derived databases
• Primary databases: experimental results entered directly into the database
• Secondary databases: results of analysis of primary databases
Aggregate of many databases
• Links to other data items
• Combination of data
• Consolidation of data
![Page 28: Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062521/568155c6550346895dc39838/html5/thumbnails/28.jpg)
Data types:
• Relationships between sequence, 3D structure, and protein functions
• Properties and evolution of genes, genomes, proteins, and metabolic pathways in cells
• Use of this knowledge for prediction, modelling, and design
TDQAAFDTNIVTLTRFVMEQGRKARGTGEMTQLLNSLCTAVKAISTAVRKAGIAHLYGIAGSTNVTGDQVKKLDVLSNDLVINVLKSSFATCVLVTEEDKNAIIVEPEKRGKYVVCFDPLDGSSNIDCLVSIGTIFGIYRKNSTDEPSEKDALQPGRNLVAAGYALYGSATMLV
![Page 29: Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062521/568155c6550346895dc39838/html5/thumbnails/29.jpg)
Nucleic acid
• EMBL
• GenBank
• DDBJ (DNA Data Bank of Japan)
![Page 30: Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062521/568155c6550346895dc39838/html5/thumbnails/30.jpg)
Examples of databases
Primary:
• Nucleic acid: DDBJ, EMBL, GenBank
• Protein: PIR, MIPS, SWISS-PROT, TrEMBL
Secondary: PROSITE, Pfam, TrEMBL, FSSP, HSSP, etc.
![Page 31: Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062521/568155c6550346895dc39838/html5/thumbnails/31.jpg)
GenBank
GenBank is the NIH genetic sequence database of all publicly available DNA and derived protein sequences, with annotations describing the biological information these records contain.
![Page 32: Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062521/568155c6550346895dc39838/html5/thumbnails/32.jpg)
EMBL/GenBank/DDBJ
• These three databases contain mainly the same information (with a few differences in format and syntax)
• They serve as archives containing all sequences (single genes, ESTs, complete genomes, etc.) derived from:
  · Genome projects and sequencing centers
  · Individual scientists
  · Patent offices (e.g. USPTO, EPO)
• Non-confidential data are exchanged daily
• As of April 2011, there are approximately 126,551,501,141 bases in 135,440,924 sequence records in the traditional GenBank divisions and 191,401,393,188 bases in 62,715,288 sequence records in the WGS division
• Sequences from more than 50,000 different species
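The April 2011 figures quoted here imply quite different average record sizes in the two divisions. A small sketch of that arithmetic follows; the helper name `mean_record_length` is made up for illustration.

```python
def mean_record_length(total_bases: int, total_records: int) -> float:
    """Average number of bases per sequence record."""
    return total_bases / total_records


# Figures quoted in the text (GenBank, April 2011).
traditional = mean_record_length(126_551_501_141, 135_440_924)
wgs = mean_record_length(191_401_393_188, 62_715_288)

print(f"traditional divisions: {traditional:.0f} bases per record")
print(f"WGS division:          {wgs:.0f} bases per record")
```

On these numbers a traditional record averages a little under a thousand bases, while a WGS record averages roughly three thousand.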
![Page 33: Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062521/568155c6550346895dc39838/html5/thumbnails/33.jpg)
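[Figure: the three collaborating nucleotide databases and their host institutions. GenBank is hosted at NCBI (NIH) and searched through Entrez; EMBL is hosted at EBI and searched through SRS; DDBJ is hosted at CIB (NIG) and searched through getentry. Each database receives submissions and updates.]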
![Page 34: Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062521/568155c6550346895dc39838/html5/thumbnails/34.jpg)
Each database stores data in a specific file format, for example:
• GenBank file format
• EMBL file format
• DDBJ file format
![Page 35: Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062521/568155c6550346895dc39838/html5/thumbnails/35.jpg)
GenBank Flat File (GBFF)
The record has three parts: a header (title, taxonomy, citation), the features (including the amino acid sequence), and the DNA sequence.

    LOCUS       MUSNGH       1803 bp    mRNA            ROD       29-AUG-1997
    DEFINITION  Mouse neuroblastoma and rat glioma hybridoma cell line NG108-15
                cell TA20 mRNA, complete cds.
    ACCESSION   D25291
    NID         g1850791
    KEYWORDS    neurite extension activity; growth arrest; TA20.
    SOURCE      Murinae gen. sp. mouse neuroblastma-rat glioma hybridoma
                cell_line:NG108-15 cDNA to mRNA.
      ORGANISM  Murinae gen. sp.
                Eukaryotae; mitochondrial eukaryotes; Metazoa; Chordata;
                Vertebrata; Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae;
                Murinae.
    REFERENCE   1  (sites)
      AUTHORS   Tohda,C., Nagai,S., Tohda,M. and Nomura,Y.
      TITLE     A novel factor, TA20, involved in neuronal differentiation: cDNA
                cloning and expression
      JOURNAL   Neurosci. Res. 23 (1), 21-27 (1995)
      MEDLINE   96064354
    REFERENCE   3  (bases 1 to 1803)
      AUTHORS   Tohda,C.
      TITLE     Direct Submission
      JOURNAL   Submitted (18-NOV-1993) to the DDBJ/EMBL/GenBank databases.
                Chihiro Tohda, Toyama Medical and Pharmaceutical University,
                Research Institute for Wakan-yaku, Analytical Research Center
                for Ethnomedicines; 2630 Sugitani, Toyama, Toyama 930-01, Japan
                (E-mail:[email protected], Tel:+81-764-34-2281(ex.2841),
                Fax:+81-764-34-5057)
    COMMENT     On Feb 26, 1997 this sequence version replaced gi:793764.
    FEATURES             Location/Qualifiers
         source          1..1803
                         /organism="Murinae gen. sp."
                         /note="source origin of sequence, either mouse or rat,
                         has not been identified"
                         /db_xref="taxon:39108"
                         /cell_line="NG108-15"
                         /cell_type="mouse neuroblastma-rat glioma hybridoma"
         misc_signal     156..163
                         /note="AP-2 binding site"
         GC_signal       647..655
                         /note="Sp1 binding site"
         TATA_signal     694..701
         gene            748..1311
                         /gene="TA20"
         CDS             748..1311
                         /gene="TA20"
                         /function="neurite extensiion activity and growth
                         arrest effect"
                         /codon_start=1
                         /db_xref="PID:d1005516"
                         /db_xref="PID:g793765"
                         /translation="MMKLWVPSRSLPNSPNHYRSFLSHTLHIRYNNSLFISNTHLSRR
                         KLRVTNPIYTRKRSLNIFYLLIPSCRTRLILWIIYIYRNLKHWSTSTVRSHSHSIYRL
                         RPSMRTNIILRCHSYYKPPISHPIYWNNPSRMNLRGLLSRQSHLDPILRFPLHLTIYY
                         RGPSNRSPPLPPRNRIKQPNRIKLRCR"
         polyA_site      1803
    BASE COUNT      507 a    458 c    311 g    527 t
    ORIGIN
            1 tcagtttttt tttttttttt tttttttttt tttttttttt tttttttttg ttgattcatg
           61 tccgtttaca tttggtaagt tcacaggcct cagtcaacac aattggactg ctcaggaaat
          121 cctccttggt gaccgcagta tacttggcct atgaacccaa gccacctatg gctaggtagg
          181 agaagctcaa ctgtagggct gactttggaa gagaatgcac atggctgtat cgacatttca
          241 catggtggac ctctggccag agtcagcagg ccgagggttc tcttccgggc tgctccctca
          301 ctgcttgact ctgcgtcagt gcgtccatac tgtgggcgga cgttattgct atttgccttc
          361 cattctgtac ggcattgcct ccatttagct ggagagggac agagcctggt tctctagggc
          421 gtttccattg gggcctggtg acaatccaaa agatgagggc tccaaacacc agaatcagaa
          481 ggcccagcgt atttgtaaaa acaccttctg gtgggaatga atggtacagg ggcgtttcag
          541 gacaaagaac agcttttctg tcactcccat gagaaccgtc gcaatcactg ttccgaagag
          601 gaggagtcca gaatacacgt gtatgggcat gacgattgcc cggagagagg cggagcccat
          661 ggaagcagaa agacgaaaaa cacacccatt atttaaaatt attaaccact cattcattga
          721 cctacctgcc ccatccaaca tttcatcatg atgaaacttt gggtcccttc taggagtctg
          781 cctaatagtc caaatcatta caggtctttt cttagccata cactacacat cagatacaat
          841 aacagccttt tcatcagtaa cacacatttg tcgagacgta aattacgggt gactaatccg
          901 atatatacac gcaaacggag cctcaatatt ttttatttgc ttattccttc atgtcggacg
          961 aggcttatat tatggatcat atacatttat agaaacctga aacattggag tacttctact
         1021 gttcgcagtc atagccacag catttatagg ctacgtcctt ccatgaggac aaatatcatt
         1081 ctgaggtgcc acagttatta caaacctcct atcagccatc ccatatattg gaacaaccct
         1141 agtcgaatga atttgagggg gcttctcagt agacaaagcc accttgaccc gattcttcgc
         1201 tttccacttc atcttaccat ttattatcgc ggccctagca atcgttcacc tcctcttcct
         1261 ccacgaaaca ggatcaaaca acccaacagg attaaactca gatgcagata aaattccatt
         1321 tcacccctac tatacatcaa agatatccta ggtatcctaa tcatattctt aattctcata
         1381 accctagtat tatttttccc agacatacta ggagacccag acaactacat accagctaat
         1441 ccactaaaca ccccacccca tattaaaccc gaatgatatt tcctatttgc atacgccatt
         1501 ctacgctcaa tccccaataa actaggaggt gtcctagcct taatcttatc tatcctaatt
         1561 ttagccctaa tacctttcct tcatacctca aagcaacgaa gcctaatatt ccgcccaatc
         1621 acacaaattt tgtactgaat cctagtagcc aacctactta tcttaacctg aattgggggc
         1681 caaccagtag acacccattt attatcattg gccaactagc ctccatctca tacttctcaa
         1741 tcatcttaat tcttatacca atctcaggaa ttatcgaaga caaaatacta aaattatatc
         1801 cat
    //
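The CDS feature in the record above is given in 1-based, inclusive coordinates (748..1311), and its /translation qualifier is the protein obtained by reading that span three bases at a time. The sketch below checks the first 14 codons against two rows copied from the ORIGIN section. It assumes the standard genetic code, which is enough to reproduce the start of the listed translation; the function names are illustrative and not part of any GenBank tool.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Two rows copied from the ORIGIN block of the record (positions 721-840).
ORIGIN_ROWS = [
    "721 cctacctgcc ccatccaaca tttcatcatg atgaaacttt gggtcccttc taggagtctg",
    "781 cctaatagtc caaatcatta caggtctttt cttagccata cactacacat cagatacaat",
]


def parse_origin_rows(rows):
    """Return (position of first base, concatenated upper-case sequence)."""
    first = int(rows[0].split()[0])
    seq = "".join("".join(row.split()[1:]) for row in rows)
    return first, seq.upper()


def extract(first, seq, start, end):
    """Slice a 1-based, inclusive feature location out of the sequence."""
    return seq[start - first : end - first + 1]


def translate(dna):
    """Translate complete codons only; a trailing partial codon is ignored."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i : i + 3]] for i in range(0, usable, 3))


first, seq = parse_origin_rows(ORIGIN_ROWS)
peptide = translate(extract(first, seq, 748, 789))  # first 14 codons of the CDS
print(peptide)
```

The printed peptide can be compared by eye with the opening residues of the /translation qualifier in the record.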
![Page 36: Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062521/568155c6550346895dc39838/html5/thumbnails/36.jpg)
GenBank file format
![Page 37: Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062521/568155c6550346895dc39838/html5/thumbnails/37.jpg)
Accession.version
LOCUS, Accession, gi and PID

    LOCUS       HSU40282     1789 bp    mRNA            PRI       21-MAY-1998
    DEFINITION  Homo sapiens integrin-linked kinase (ILK) mRNA, complete cds.
    ACCESSION   U40282
    VERSION     U40282.1  GI:3150001

         CDS             157..1515
                         /gene="ILK"
                         /note="protein serine/threonine kinase"
                         /codon_start=1
                         /product="integrin-linked kinase"
                         /protein_id="AAC16892.1"
                         /db_xref="PID:g3150002"
                         /db_xref="GI:3150002"

Identifiers in this record:
• LOCUS: HSU40282
• ACCESSION: U40282
• VERSION: U40282.1
• GI: 3150001
• PID: g3150002
• Protein gi: 3150002
• protein_id: AAC16892.1
[Slide callouts pair the identifiers: protein_id with protein gi, ACCESSION with LOCUS, and PID with gi.]
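The VERSION line packs two things into one identifier: the accession and, after the dot, a version number. A minimal sketch of splitting them, using the identifiers from the record above; `split_accession_version` is a hypothetical helper name.

```python
def split_accession_version(identifier: str) -> tuple[str, int]:
    """Split an 'accession.version' identifier such as 'U40282.1'."""
    accession, dot, version = identifier.partition(".")
    if not accession or not dot or not version.isdigit():
        raise ValueError(f"not an accession.version identifier: {identifier!r}")
    return accession, int(version)


nucleotide = split_accession_version("U40282.1")    # from the VERSION line
protein = split_accession_version("AAC16892.1")     # from /protein_id
print(nucleotide, protein)
```

In GenBank practice the accession stays fixed while the version number increases when the sequence itself is revised, so the pair identifies one exact sequence.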
![Page 38: Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062521/568155c6550346895dc39838/html5/thumbnails/38.jpg)
Which Tool?
• BankIt: web-based tool that is simple and easy to use; great for simple submissions, but not ideal for complicated ones. Sakura (DDBJ) and WebIn (EMBL) are the counterparts at the other two databases.
• Sequin: client that you need to download to your computer; a little harder to learn, but has great documentation and is ideal for complicated, large, or multiple submissions.
• tbl2asn: command-line and scriptable; ideal for batch records; can work with Sequin.
![Page 39: Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062521/568155c6550346895dc39838/html5/thumbnails/39.jpg)
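Which tool?
[Flowchart: choosing a submission route by sequence type. mRNA: EST (dbEST) or other. Genomic: STS/GSS (dbSTS, dbGSS), HTGS, or other. Routes shown: BankIt over the web for simple submissions; Sequin or tbl2asn for better control of annotations, population/phylogenetic sets, and segmented sets; customized software or tbl2asn for HTGS; e-mail or FTP for the batch routes.]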
![Page 40: Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062521/568155c6550346895dc39838/html5/thumbnails/40.jpg)
Protein databases
• SWISS-PROT: annotated sequence database
• TrEMBL: database of translated EMBL nucleotide sequences
• InterPro: integrated resource for protein families, domains and functional sites
• CluSTr: offers an automatic classification of SWISS-PROT and TrEMBL
• etc.
![Page 41: Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062521/568155c6550346895dc39838/html5/thumbnails/41.jpg)
Swiss-Prot
It is a high-quality, annotated, non-redundant protein sequence database that brings together experimental results, computed features and scientific conclusions. Since 2002, it has been maintained by the UniProt consortium and is accessible via the UniProt website.
Statistics: Release 2011_12 of 14-Dec-11 of UniProtKB/Swiss-Prot contains 0.5 million sequence entries, comprising 189,261,966 amino acids abstracted from 205,244 references. 629 sequences have been added since release 2011_11, the sequence data of 146 existing entries have been updated, and the annotations of 182,184 entries have been revised.
![Page 42: Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062521/568155c6550346895dc39838/html5/thumbnails/42.jpg)
![Page 43: Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062521/568155c6550346895dc39838/html5/thumbnails/43.jpg)
Structure database: PDB
An Information Portal to Biological Macromolecular Structures
As of Tuesday Dec 20, 2011 there are 78020 Structures
Structure determination methods:
• NMR
• X-ray
• Electron microscopy
![Page 44: Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062521/568155c6550346895dc39838/html5/thumbnails/44.jpg)
PDB record types: HEADER, COMPND, SOURCE, AUTHOR, DATE, JRNL, REMARK, SEQRES, ATOM (coordinates)

    HEADER LEUCINE ZIPPER 15-JUL-93 1DGC 1DGC 2
    COMPND GCN4 LEUCINE ZIPPER COMPLEXED WITH SPECIFIC 1DGC 3
    COMPND 2 ATF/CREB SITE DNA 1DGC 4
    SOURCE GCN4: YEAST (SACCHAROMYCES CEREVISIAE); DNA: SYNTHETIC 1DGC 5
    AUTHOR T.J.RICHMOND 1DGC 6
    REVDAT 1 22-JUN-94 1DGC 0 1DGC 7
    JRNL AUTH P.KONIG,T.J.RICHMOND 1DGC 8
    JRNL TITL THE X-RAY STRUCTURE OF THE GCN4-BZIP BOUND TO 1DGC 9
    JRNL TITL 2 ATF/CREB SITE DNA SHOWS THE COMPLEX DEPENDS ON DNA 1DGC 10
    JRNL TITL 3 FLEXIBILITY 1DGC 11
    JRNL REF J.MOL.BIOL. V. 233 139 1993 1DGC 12
    JRNL REFN ASTM JMOBAK UK ISSN 0022-2836 0070 1DGC 13
    REMARK 1 1DGC 14
    REMARK 2 1DGC 15
    REMARK 2 RESOLUTION. 3.0 ANGSTROMS. 1DGC 16
    REMARK 3 1DGC 17
    REMARK 3 REFINEMENT. 1DGC 18
    REMARK 3 PROGRAM X-PLOR 1DGC 19
    REMARK 3 AUTHORS BRUNGER 1DGC 20
    REMARK 3 R VALUE 0.216 1DGC 21
    REMARK 3 RMSD BOND DISTANCES 0.020 ANGSTROMS 1DGC 22
    REMARK 3 RMSD BOND ANGLES 3.86 DEGREES 1DGC 23
    REMARK 3 1DGC 24
    REMARK 3 NUMBER OF REFLECTIONS 3296 1DGC 25
    REMARK 3 RESOLUTION RANGE 10.0 - 3.0 ANGSTROMS 1DGC 26
    REMARK 3 DATA CUTOFF 3.0 SIGMA(F) 1DGC 27
    REMARK 3 PERCENT COMPLETION 98.2 1DGC 28
    REMARK 3 1DGC 29
    REMARK 3 NUMBER OF PROTEIN ATOMS 456 1DGC 30
    REMARK 3 NUMBER OF NUCLEIC ACID ATOMS 386 1DGC 31
    REMARK 4 1DGC 32
    REMARK 4 GCN4: TRANSCRIPTIONAL ACTIVATOR OF GENES ENCODING FOR AMINO 1DGC 33
    REMARK 4 ACID BIOSYNTHETIC ENZYMES. 1DGC 34
    REMARK 5 1DGC 35
    REMARK 5 AMINO ACIDS NUMBERING (RESIDUE NUMBER) CORRESPONDS TO THE 1DGC 36
    REMARK 5 281 AMINO ACIDS OF INTACT GCN4. 1DGC 37
    REMARK 6 1DGC 38
    REMARK 6 BZIP SEQUENCE 220 - 281 USED FOR CRYSTALLIZATION. 1DGC 39
    REMARK 7 1DGC 40
    REMARK 7 MODEL FROM AMINO ACIDS 227 - 281 SINCE AMINO ACIDS 220 - 1DGC 41
    REMARK 7 226 ARE NOT WELL ORDERED. 1DGC 42
    REMARK 8 1DGC 43
    REMARK 8 RESIDUE NUMBERING OF NUCLEOTIDES: 1DGC 44
    REMARK 8 5' T G G A G A T G A C G T C A T C T C C 1DGC 45
    REMARK 8 -10 -9 -8 -7 -6 -5 -4 -3 -2 -1 1 2 3 4 5 6 7 8 9 1DGC 46
    REMARK 9 1DGC 47
    REMARK 9 THE ASYMMETRIC UNIT CONTAINS ONE HALF OF PROTEIN/DNA 1DGC 48
    REMARK 9 COMPLEX PER ASYMMETRIC UNIT. 1DGC 49
    REMARK 10 1DGC 50
    REMARK 10 MOLECULAR DYAD AXIS OF PROTEIN DIMER AND PALINDROMIC HALF 1DGC 51
    REMARK 10 SITES OF THE DNA COINCIDES WITH CRYSTALLOGRAPHIC TWO-FOLD 1DGC 52
    REMARK 10 AXIS. THE FULL PROTEIN/DNA COMPLEX CAN BE OBTAINED BY 1DGC 53
    REMARK 10 APPLYING THE FOLLOWING TRANSFORMATION MATRIX AND 1DGC 54
    REMARK 10 TRANSLATION VECTOR TO THE COORDINATES X Y Z: 1DGC 55
    REMARK 10 1DGC 56
    REMARK 10 0 -1 0 X 117.32 X SYMM 1DGC 57
    REMARK 10 -1 0 0 Y + 117.32 = Y SYMM 1DGC 58
    REMARK 10 0 0 -1 Z 43.33 Z SYMM 1DGC 59
    SEQRES 1 A 62 ILE VAL PRO GLU SER SER ASP PRO ALA ALA LEU LYS ARG 1DGC 60
    SEQRES 2 A 62 ALA ARG ASN THR GLU ALA ALA ARG ARG SER ARG ALA ARG 1DGC 61
    SEQRES 3 A 62 LYS LEU GLN ARG MET LYS GLN LEU GLU ASP LYS VAL GLU 1DGC 62
    SEQRES 4 A 62 GLU LEU LEU SER LYS ASN TYR HIS LEU GLU ASN GLU VAL 1DGC 63
    SEQRES 5 A 62 ALA ARG LEU LYS LYS LEU VAL GLY GLU ARG 1DGC 64
    SEQRES 1 B 19 T G G A G A T G A C G T C 1DGC 65
    SEQRES 2 B 19 A T C T C C 1DGC 66
    HELIX 1 A ALA A 228 LYS A 276 1 1DGC 67
    CRYST1 58.660 58.660 86.660 90.00 90.00 90.00 P 41 21 2 8 1DGC 68
    ORIGX1 1.000000 0.000000 0.000000 0.00000 1DGC 69
    ORIGX2 0.000000 1.000000 0.000000 0.00000 1DGC 70
    ORIGX3 0.000000 0.000000 1.000000 0.00000 1DGC 71
    SCALE1 0.017047 0.000000 0.000000 0.00000 1DGC 72
    SCALE2 0.000000 0.017047 0.000000 0.00000 1DGC 73
    SCALE3 0.000000 0.000000 0.011539 0.00000 1DGC 74
    ATOM 1 N PRO A 227 35.313 108.011 15.140 1.00 38.94 1DGC 75
    ATOM 2 CA PRO A 227 34.172 107.658 15.972 1.00 39.82 1DGC 76
    ...
    ATOM 842 C5 C B 9 57.692 100.286 22.744 1.00 29.82 1DGC 916
    ATOM 843 C6 C B 9 58.128 100.193 21.465 1.00 30.63 1DGC 917
    TER 844 C B 9 1DGC 918
    MASTER 46 0 0 1 0 0 0 6 842 2 0 7 1DGC 919
    END 1DGC 920
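ATOM records are fixed-width: each field lives in a set range of columns, so they are read by slicing rather than by splitting on whitespace. The sketch below assumes the standard PDB column layout (serial in columns 7-11, atom name 13-16, residue name 18-20, chain 22, residue number 23-26, x/y/z in 31-54, occupancy 55-60, B-factor 61-66). Because the column spacing is lost in the slide listing, it rebuilds the first ATOM line of 1DGC from its values. `Atom` and `parse_atom` are illustrative names.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float
    occupancy: float
    b_factor: float


def parse_atom(line: str) -> Atom:
    """Parse one fixed-width ATOM record (0-based slices of 1-based columns)."""
    if not line.startswith("ATOM"):
        raise ValueError("not an ATOM record")
    return Atom(
        serial=int(line[6:11]),
        name=line[12:16].strip(),
        res_name=line[17:20].strip(),
        chain=line[21],
        res_seq=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
        occupancy=float(line[54:60]),
        b_factor=float(line[60:66]),
    )


# First ATOM record of 1DGC, rebuilt column by column from the listed values.
SAMPLE = (
    "ATOM  "               # columns 1-6: record name
    + "1".rjust(5)         # 7-11: atom serial number
    + " "                  # 12: unused
    + " N  "               # 13-16: atom name
    + " "                  # 17: alternate location (blank)
    + "PRO"                # 18-20: residue name
    + " "                  # 21: unused
    + "A"                  # 22: chain identifier
    + "227".rjust(4)       # 23-26: residue sequence number
    + " " * 4              # 27-30: insertion code and padding
    + "35.313".rjust(8)    # 31-38: x
    + "108.011".rjust(8)   # 39-46: y
    + "15.140".rjust(8)    # 47-54: z
    + "1.00".rjust(6)      # 55-60: occupancy
    + "38.94".rjust(6)     # 61-66: temperature factor
)

atom = parse_atom(SAMPLE)
print(atom)
```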
![Page 45: Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062521/568155c6550346895dc39838/html5/thumbnails/45.jpg)
Other databases
• Pathway databases
• Enzyme databases
• Ligand databases
• Motif and domain databases
• etc.
![Page 46: Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062521/568155c6550346895dc39838/html5/thumbnails/46.jpg)
Sequence analysis
The term sequence analysis refers to the process of subjecting a DNA, RNA or peptide sequence to any of a wide range of analytical methods to understand its features, function, structure, or evolution.
![Page 47: Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062521/568155c6550346895dc39838/html5/thumbnails/47.jpg)
Need
• Comparing sequences to find similarity, often to infer whether they are related (homologous)
• Identifying intrinsic features of the sequence, such as active sites, post-translational modification sites, gene structures, reading frames, distributions of introns and exons, and regulatory elements
• Identifying sequence differences and variations, such as point mutations and single nucleotide polymorphisms (SNPs), to obtain genetic markers
• Revealing the evolution and genetic diversity of sequences and organisms
• Identifying molecular structure from sequence alone
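Identifying point mutations between two already-aligned sequences reduces to a position-by-position comparison. A minimal sketch, assuming both sequences have equal length and use "-" for alignment gaps; the sequences and the function name are made up for illustration.

```python
def point_differences(reference: str, sample: str):
    """List (position, reference base, sample base) for each substitution.

    Positions are 1-based. Columns where either sequence has a gap are
    skipped, since they are insertions or deletions rather than substitutions.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, ref_base, alt_base)
        for pos, (ref_base, alt_base) in enumerate(zip(reference, sample), start=1)
        if ref_base != alt_base and "-" not in (ref_base, alt_base)
    ]


diffs = point_differences("GATTACA", "GACTATA")
print(diffs)
```

Whether a difference found this way is a polymorphism or a sequencing error is a separate question that needs population or quality data.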
![Page 48: Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062521/568155c6550346895dc39838/html5/thumbnails/48.jpg)
![Page 49: Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062521/568155c6550346895dc39838/html5/thumbnails/49.jpg)
![Page 50: Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062521/568155c6550346895dc39838/html5/thumbnails/50.jpg)
![Page 51: Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062521/568155c6550346895dc39838/html5/thumbnails/51.jpg)
![Page 52: Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062521/568155c6550346895dc39838/html5/thumbnails/52.jpg)
![Page 53: Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062521/568155c6550346895dc39838/html5/thumbnails/53.jpg)
Types
Pairwise sequence alignment (two sequences, e.g. BLAST)
Multiple sequence alignment (more than two sequences, e.g. ClustalW, T-Coffee, MUSCLE)
![Page 54: Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062521/568155c6550346895dc39838/html5/thumbnails/54.jpg)
Algorithms for sequence alignment
• Needleman-Wunsch algorithm
• Smith-Waterman algorithm
• BLAST
• FASTA
![Page 55: Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062521/568155c6550346895dc39838/html5/thumbnails/55.jpg)
Needleman-Wunsch algorithm
• Used for global alignment
• Used to align protein and nucleotide sequences
• Based on dynamic programming (reducing a complex problem to simpler ones)
A divide-and-conquer style strategy, with subproblem results stored and reused:
• Break the problem into smaller subproblems.
• Solve the smaller problems optimally.
• Use the subproblem solutions to construct an optimal solution for the original problem.
The Needleman-Wunsch algorithm consists of three steps:
1. Initialization of the score matrix
2. Calculation of scores and filling the traceback matrix
3. Deducing the alignment from the traceback matrix
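The three steps can be sketched in a few lines of Python. The scoring values (match +1, mismatch -1, gap -1) are illustrative assumptions; the slides do not specify a scoring scheme, and real tools typically use substitution matrices and more elaborate gap penalties.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    # Step 1: initialize the score and traceback matrices.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    trace = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0], trace[i][0] = i * gap, "up"
    for j in range(1, m + 1):
        score[0][j], trace[0][j] = j * gap, "left"

    # Step 2: fill both matrices, reusing the already solved subproblems.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap in b
            left = score[i][j - 1] + gap    # gap in a
            best = max(diag, up, left)
            score[i][j] = best
            # Ties are broken in the order diag, up, left.
            trace[i][j] = "diag" if best == diag else "up" if best == up else "left"

    # Step 3: walk the traceback matrix from the bottom-right corner.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        move = trace[i][j]
        if move == "diag":
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif move == "up":
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


result = needleman_wunsch("ACGT", "AGT")
print(result)
```

When several alignments share the best score, the tie-breaking order in step 2 decides which one is reported; the score itself does not depend on it.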
![Page 56: Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062521/568155c6550346895dc39838/html5/thumbnails/56.jpg)
![Page 57: Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062521/568155c6550346895dc39838/html5/thumbnails/57.jpg)
APPLICATIONS
• Profile comparison (all sequences of a family)
• Gene prediction
• Annotation
• Homology modelling (protein structure prediction)
• Primer designing
![Page 58: Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062521/568155c6550346895dc39838/html5/thumbnails/58.jpg)
Applications of bioinformatics
• Molecular medicine
• Gene therapy
• Drug development
• Microbial genome applications
• Waste cleanup
• Climate change studies
• Alternative energy sources
• Biotechnology
• Veterinary science
• Antibiotic resistance
• Forensic analysis of microbes
• Bio-weapon creation
• Evolutionary studies
• Crop improvement
• Insect resistance
• Improvement of nutritional quality
• Development of drought-resistant varieties
![Page 59: Bioinformatics](https://reader035.vdocuments.us/reader035/viewer/2022062521/568155c6550346895dc39838/html5/thumbnails/59.jpg)
Classification
Protein sequence databases:
• UniProt: Universal Protein Resource (UniProt Consortium: EBI, Expasy, PIR)
• PIR: Protein Information Resource (Georgetown University Medical Center (GUMC))
• Swiss-Prot: Protein Knowledgebase (Swiss Institute of Bioinformatics)
• PEDANT: Protein Extraction, Description and ANalysis Tool (Forschungszentrum f. Umwelt & Gesundheit)
• PROSITE: Database of Protein Families and Domains
• DIP: Database of Interacting Proteins (Univ. of California)
• Pfam: Protein families database of alignments and HMMs (Sanger Institute)
• PRINTS: a compendium of protein fingerprints (Manchester University)
• ProDom: Comprehensive set of Protein Domain Families (INRA/CNRS)