structure databases dna/protein structure-function analysis and prediction lecture 6 bioinformatics...
![Page 1: Structure Databases DNA/Protein structure-function analysis and prediction Lecture 6 Bioinformatics Section, Vrije Universiteit, Amsterdam](https://reader030.vdocuments.us/reader030/viewer/2022032517/56649c7d5503460f94932dc4/html5/thumbnails/1.jpg)
Structure Databases
DNA/Protein structure-function analysis and prediction
Lecture 6
Bioinformatics Section, Vrije Universiteit, Amsterdam

The dictionary definition
Main Entry: da·ta·base
Pronunciation: 'dA-t&-"bAs, 'da- also 'dä-
Function: noun
Date: circa 1962
: a usually large collection of data organized especially for rapid search and retrieval (as by a computer)
- Webster dictionary

WHAT is a database?
A collection of data that needs to be:
- Structured
- Searchable
- Updated (periodically)
- Cross-referenced

Challenge: to change “meaningless” data into useful information that can be accessed and analysed in the best way possible.
For example: HOW would YOU organise all biological sequences so that the biological information is optimally accessible?
You need an appropriate database management system (DBMS).

DBMS
A unified set of programs that:
- Store
- Extract
- Modify

Internal organization: controls speed and flexibility
[Diagram: USER(S) reach the Database through the DBMS programs (Store, Extract, Modify)]
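
The three DBMS operations named on this slide can be sketched with Python's built-in `sqlite3` module. This is only an illustration: the table name `sequences`, its columns, and the sample record are made up for the example and do not come from the lecture.

```python
import sqlite3

# In-memory database; table, columns and the record are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sequences (accession TEXT PRIMARY KEY, residues TEXT)")

# Store: add a new record.
conn.execute("INSERT INTO sequences VALUES (?, ?)", ("SEQ1", "MKTAYIAK"))

# Extract: read the record back by its key.
query = "SELECT residues FROM sequences WHERE accession = ?"
stored = conn.execute(query, ("SEQ1",)).fetchone()[0]

# Modify: change the record in place, then read it again.
conn.execute("UPDATE sequences SET residues = ? WHERE accession = ?",
             ("MKTAYIAKQR", "SEQ1"))
modified = conn.execute(query, ("SEQ1",)).fetchone()[0]
```

The user never touches the storage layout directly; every operation goes through the DBMS.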

DBMS organisation types
- Flat file databases (flat DBMS): simple, restrictive, table
- Hierarchical databases (hierarchical DBMS): simple, restrictive, tables
- Relational databases (RDBMS): complex, versatile, tables
- Object-oriented databases (ODBMS): complex, versatile, objects

Relational databases
- Data is stored in multiple related tables
- Data relationships across tables can be either many-to-one or many-to-many
- A few rules allow the database to be viewed in many ways

Let's convert the “course details” to a relational database

Our flat file database
Course details: FLAT DATABASE 2

| Name | Depart. | Course | E1 | E2 | E3 | P1 | P2 | … |
|---|---|---|---|---|---|---|---|---|
| Student 1 | Chemistry | Biology | A | B | B | A | C | … |
| Student 2 | Ecology | Maths | A | D | A | A | A | … |
| Student 2 | Ecology | Biology | A | B | A | A | A | … |
| Student 1 | Chemistry | English | A | A | A | A | A | … |
| Student 1 | Chemistry | Maths | C | C | B | A | A | … |
| … | … | … | … | … | … | … | … | … |

Normalize (1NF) …
We remove repeating records (rows)

| sID | Name | dID |
|---|---|---|
| 1 | Student1 | 1 |
| 2 | Student2 | 2 |

| cID | Course |
|---|---|
| 1 | Biology |
| 2 | Maths |
| 3 | English |

| dID | Department |
|---|---|
| 1 | Chemistry |
| 2 | Ecology |

| sID | cID | E1 | E2 | E3 | P1 | P2 | … |
|---|---|---|---|---|---|---|---|
| 1 | 1 | A | B | B | A | C | … |
| 2 | 2 | A | D | A | A | A | … |
| 2 | 1 | A | B | A | A | A | … |
| 1 | 3 | A | A | A | A | A | … |
| 1 | 2 | C | C | B | A | A | … |
| … | … | … | … | … | … | … | … |

Primary keys: sID, cID and dID, each in its own table.
Foreign keys: dID in the student table; sID and cID in the results table.
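
A minimal sketch of how primary and foreign keys tie two of the tables above together, using Python's built-in `sqlite3` module. The table names `department` and `student` and the extra row `Student3` are assumptions made for the example; the slide only gives the column names and the first two students.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only checks foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE department (
    dID  INTEGER PRIMARY KEY,   -- primary key
    name TEXT NOT NULL
);
CREATE TABLE student (
    sID  INTEGER PRIMARY KEY,   -- primary key
    name TEXT NOT NULL,
    dID  INTEGER NOT NULL REFERENCES department(dID)  -- foreign key
);
INSERT INTO department VALUES (1, 'Chemistry'), (2, 'Ecology');
INSERT INTO student VALUES (1, 'Student1', 1), (2, 'Student2', 2);
""")

# A student pointing at a department that does not exist is refused.
try:
    conn.execute("INSERT INTO student VALUES (3, 'Student3', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

student_count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
```

The department name is stored once and referenced by number, so renaming a department means changing a single row.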

Normalize (2NF) …
We remove redundant fields (columns)

| sID | Name | dID |
|---|---|---|
| 1 | Student1 | 1 |
| 2 | Student2 | 2 |

| cID | Course |
|---|---|
| 1 | Biology |
| 2 | Maths |
| 3 | English |

| gID | Grade |
|---|---|
| 1 | A |
| 2 | B |
| 3 | C |

| dID | Department |
|---|---|
| 1 | Chemistry |
| 2 | Ecology |

| wID | Project |
|---|---|
| 1 | E1 |
| 2 | E2 |
| 3 | E3 |
| 4 | P1 |
| 5 | P2 |

| sID | cID | gID | wID |
|---|---|---|---|
| 1 | 1 | 1 | 1 |
| 1 | 1 | 2 | 2 |
| 1 | 1 | 2 | 3 |
| 1 | 1 | 1 | 4 |
| 1 | 1 | 3 | 5 |
| 2 | 1 | 1 | 1 |
| 2 | 1 | 1 | 2 |
| 2 | 1 | 2 | 3 |
| 2 | 1 | 1 | 4 |
| 2 | 1 | 1 | 5 |

Relational Databases
What have we achieved?
- No repeating information
- Less storage space
- Better reality representation
- Easy modification/management
- Easy usage of any combination of records

Remember: the DBMS has programs to access and edit this information, so ignore the human reading limitation of the primary keys
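
As a sketch of that last point, a single query can turn the numeric keys back into a readable view. The example below uses Python's built-in `sqlite3` module and loads only the Student 1 / Biology rows from the key tables of the normalization example; the table names (`student`, `course`, `grade`, `project`, `result`) are assumptions, since the slides give only the column names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (sID INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE course  (cID INTEGER PRIMARY KEY, course TEXT);
CREATE TABLE grade   (gID INTEGER PRIMARY KEY, grade TEXT);
CREATE TABLE project (wID INTEGER PRIMARY KEY, project TEXT);
CREATE TABLE result  (sID INTEGER, cID INTEGER, gID INTEGER, wID INTEGER);
INSERT INTO student VALUES (1, 'Student1'), (2, 'Student2');
INSERT INTO course  VALUES (1, 'Biology'), (2, 'Maths'), (3, 'English');
INSERT INTO grade   VALUES (1, 'A'), (2, 'B'), (3, 'C');
INSERT INTO project VALUES (1, 'E1'), (2, 'E2'), (3, 'E3'), (4, 'P1'), (5, 'P2');
INSERT INTO result  VALUES (1,1,1,1), (1,1,2,2), (1,1,2,3), (1,1,1,4), (1,1,3,5);
""")

# Join the key-only result table back to the lookup tables.
rows = conn.execute("""
    SELECT s.name, c.course, p.project, g.grade
    FROM result r
    JOIN student s ON s.sID = r.sID
    JOIN course  c ON c.cID = r.cID
    JOIN project p ON p.wID = r.wID
    JOIN grade   g ON g.gID = r.gID
    ORDER BY r.wID
""").fetchall()

grades_in_order = [grade for _, _, _, grade in rows]
```

Each row of `rows` reads like a line of the original flat file (student, course, project, grade), although nothing but integers is stored in `result`.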

Accessing database information
A request for data from a database is called a query
Queries can be of three forms:
- Choose from a list of parameters
- Query by example (QBE)
- Query language

Query Languages
- The standard: SQL (Structured Query Language), originally called SEQUEL (Structured English QUEry Language)
- Developed by IBM in 1974; introduced commercially in 1979 by Oracle Corp.
- Standard interactive and programming language for getting information from and updating a database
- RDBMS (SQL), ODBMS (Java, C++, OQL etc.)

Distributed databases
- From local to global attitude
- Data appears to be in one location but is most definitely not
- A definition: two or more data files in different locations, periodically synchronized by the DBMS to keep data in all locations consistent (A, B, C)
- An intricate network for combining and sharing information
- Administrators praise fast network technologies!!!
- Users praise the internet!!!

Data warehouse
- Periodically, one imports data from databases and stores it (locally) in the data warehouse.
- Now a local database can be created, containing for instance protein family data (sequence, structure, function and pathway/process data integrated with the gene expression and other experimental data).
- Disadvantage: expensive, intensive, needs to be updated.
- Advantage: easy control of integrated data-mining pipeline.
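
The periodic import step can be sketched as a small Python function. Everything here is hypothetical: the function name `refresh_warehouse`, the source names, the protein identifiers and the values are placeholders, and a real warehouse would sit on a DBMS rather than a dictionary.

```python
from datetime import date

def refresh_warehouse(warehouse, sources, today):
    """Copy every record from each source into the local warehouse,
    integrating the sources under one key (here: a protein identifier)."""
    for source_name, records in sources.items():
        for protein_id, value in records.items():
            entry = warehouse.setdefault(protein_id, {})
            entry[source_name] = value
            entry["last_import"] = today.isoformat()
    return warehouse

# Placeholder snapshots of two remote databases.
sources = {
    "sequence": {"PROT_A": "MKTAYIAK", "PROT_B": "MSLLTEVE"},
    "structure": {"PROT_A": "structure-entry-placeholder"},
}
warehouse = refresh_warehouse({}, sources, date(2006, 1, 17))
```

After the import, all data about `PROT_A` is available locally in one place, which is the advantage noted above. The `last_import` stamp hints at the disadvantage: the copy is only as fresh as the last refresh.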

So why do biologists care?

Three main reasons
- Database proliferation
  - Dozens to hundreds at the moment
- More and more scientific discoveries result from inter-database analysis and mining
- Rising complexity of required data combinations
  - E.g. translational medicine: “from bench to bedside” (genomic data vs. clinical data)

Biological databases
- Like any other database
  - Data organization for optimal analysis
- Data is of different types
  - Raw data (DNA, RNA, protein sequences)
  - Curated data (DNA, RNA and protein annotated sequences and structures, expression data)

Raw Biological data
Nucleic Acids (DNA)

Raw Biological data
Amino acid residues (proteins)

Curated Biological Data
DNA, nucleotide sequences
- Gene boundaries, topology
- Gene structure
- Introns, exons, ORFs, splicing
- Expression data
- Mass spectrometry

Curated Biological Data
Proteins, residue sequences
MCTUYTCUYFSTYRCCTYFSCD
- Extended sequence information
- Secondary structure
- Hydrophobicity, motif data
- Protein-protein interaction
- Post-translational protein modification (PTM)
- Mass spectrometry (metabolomics, proteomics)

Curated Biological data
3D Structures, folds

Biological Databases
The 2003 NAR Database Issue: http://nar.oupjournals.org/content/vol31/issue1/

Distributed information
Pearson’s Law: The usefulness of a column of data varies as the square of the number of columns it is compared to.
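
Read literally, the law is a simple proportionality. The symbols are introduced here for illustration only: $U$ stands for the usefulness of a column and $n$ for the number of columns it is compared to.

```latex
U(n) \propto n^{2}
\quad\Longrightarrow\quad
\frac{U(2n)}{U(n)} = \frac{(2n)^{2}}{n^{2}} = 4
```

Under this reading, doubling the number of columns a dataset can be compared with makes it four times as useful, which is the argument for linking databases rather than keeping them isolated.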

A few biological databases
- Nucleotide Databases: Alternative Splicing, EMBL-Bank, Ensembl, Genomes Server, Genome, MOT, EMBL-Align, Simple Queries, dbSTS Queries, Parasites, Mutations, IMGT
- Genome Databases: Human, Mouse, Yeast, C.elegans, FLYBASE, Parasites
- Protein Databases: Swiss-Prot, TrEMBL, InterPro, CluSTr, IPI, GOA, GO, Proteome Analysis, HPI, IntEnz, TrEMBLnew, SP_ML, NEWT, PANDIT
- Structure Databases: PDB, MSD, FSSP, DALI
- Microarray Database: ArrayExpress
- Literature Databases: MEDLINE, Software Biocatalog, Flybase Archives
- Alignment Databases: BAliBASE, Homstrad, FSSP

Structural Databases
- Protein Data Bank (PDB)
  - http://www.rcsb.org/pdb/
- Structural Classification of Proteins (SCOP)
  - http://scop.berkeley.edu
  - http://scop.mrc-lmb.cam.ac.uk/scop/

PDB
3D Macromolecular structural data
- Data originates from NMR or X-ray crystallography techniques
- Total no. of structures: 34,626 (17/01/2006)
- If the 3D structure of a protein is solved ... they have it

PDB content

PDB information
- The PDB files have a standard format
- Key features
- Informative descriptors
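
Because the format is fixed-column, a coordinate record can be read by slicing each line at known positions. The sketch below follows the column layout of the `ATOM` record in the published PDB file format (atom name in columns 13-16, residue name in 18-20, chain in 22, residue number in 23-26, x/y/z in 31-54, element in 77-78). The function name `parse_atom_line` is hypothetical, and the sample line is made up for illustration: it is not taken from entry 1AE5 or any other real entry.

```python
def parse_atom_line(line):
    """Parse one fixed-column ATOM record of a PDB file into a dict.

    Slices are 0-based, so PDB columns 31-38 become line[30:38].
    """
    if not line.startswith("ATOM"):
        raise ValueError("not an ATOM record")
    return {
        "serial": int(line[6:11]),          # columns 7-11
        "atom_name": line[12:16].strip(),   # columns 13-16
        "res_name": line[17:20].strip(),    # columns 18-20
        "chain_id": line[21],               # column 22
        "res_seq": int(line[22:26]),        # columns 23-26
        "x": float(line[30:38]),            # columns 31-38
        "y": float(line[38:46]),            # columns 39-46
        "z": float(line[46:54]),            # columns 47-54
        "element": line[76:78].strip(),     # columns 77-78
    }

# Made-up sample record (illustrative values only).
SAMPLE = "ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67           N"
atom = parse_atom_line(SAMPLE)
```

Splitting on whitespace looks simpler but is not safe for this format, because neighbouring fields may touch when values are wide. That is why the positions, not the spaces, define the fields.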

PDB-mirror on the WWW …
e.g. 1AE5

Example output: 1AE5

SCOP
Structural Classification Of Proteins
- 3D Macromolecular structural data grouped based on structural classification
- Data originates from the PDB
- Current version (v1.69)
- 25973 PDB Entries (July 2005)
- 70859 Domains

SCOP levels (bottom-up)

1. Family: Clear evolutionary relationship. Proteins clustered together into families are clearly evolutionarily related. Generally, this means that pairwise residue identities between the proteins are 30% and greater. However, in some cases similar functions and structures provide definitive evidence of common descent in the absence of high sequence identity; for example, many globins form a family though some members have sequence identities of only 15%.
2. Superfamily: Probable common evolutionary origin. Proteins that have low sequence identities, but whose structural and functional features suggest that a common evolutionary origin is probable, are placed together in superfamilies. For example, actin, the ATPase domain of the heat shock protein, and hexokinase together form a superfamily.
3. Fold: Major structural similarity. Proteins are defined as having a common fold if they have the same major secondary structures in the same arrangement and with the same topological connections. Different proteins with the same fold often have peripheral elements of secondary structure and turn regions that differ in size and conformation. In some cases, these differing peripheral regions may comprise half the structure. Proteins placed together in the same fold category may not have a common evolutionary origin: the structural similarities could arise just from the physics and chemistry of proteins favouring certain packing arrangements and chain topologies.
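
The hierarchy lends itself to a dotted identifier of the form class.fold.superfamily.family, which is how SCOP labels its entries (for example `a.1.1.2`). The sketch below finds the most specific level two such identifiers share. The Class level sits above Fold in SCOP but is not described on the slide, the function name `shared_scop_level` is hypothetical, and the identifiers in the usage lines are plain example strings rather than claims about particular proteins.

```python
# Levels from the top of the hierarchy down to the most specific one.
LEVELS = ("class", "fold", "superfamily", "family")

def shared_scop_level(sccs_a, sccs_b):
    """Return the most specific level at which two identifiers agree,
    or None if they already differ at the class level."""
    shared = None
    for level, part_a, part_b in zip(LEVELS, sccs_a.split("."), sccs_b.split(".")):
        if part_a != part_b:
            break
        shared = level
    return shared

same_family = shared_scop_level("a.1.1.2", "a.1.1.2")
same_superfamily = shared_scop_level("a.1.1.2", "a.1.1.1")
same_fold_only = shared_scop_level("a.1.1.2", "a.1.2.1")
unrelated = shared_scop_level("a.1.1.2", "b.1.1.1")
```

Following the definitions above, a shared family suggests a clear evolutionary relationship, a shared superfamily a probable one, and a shared fold alone only structural similarity.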

SCOP-mirror on the WWW …
Enter SCOP at the top of the hierarchy
Keyword search of SCOP entries

CATH
- Class, derived from secondary structure content, is assigned for more than 90% of protein structures automatically.
- Architecture, which describes the gross orientation of secondary structures, independent of connectivities, is currently assigned manually.
- Topology level clusters structures according to their topological connections and numbers of secondary structures.
- The Homologous superfamilies cluster proteins with highly similar structures and functions. The assignments of structures to topology families and homologous superfamilies are made by sequence and structure comparisons.

CATH-mirror on the WWW …

DSSP
Dictionary of secondary structure of proteins
- The DSSP database comprises the secondary structures of all PDB entries
- DSSP is actually software that translates the PDB structural co-ordinates into (standardized) secondary structure elements
- A similar example is STRIDE
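
DSSP reports one state letter per residue (H, G, I for helices, E and B for strands and bridges, T and S for turns and bends, blank or `-` for everything else). Downstream analyses often collapse these into three states. The sketch below uses one common grouping; it is an assumption, other groupings are also in use, and the input string is a made-up example rather than real DSSP output.

```python
# One common 8-to-3 state reduction (an assumption; alternatives exist):
# helices H/G/I -> H, strands E/B -> E, everything else -> C (coil).
DSSP_TO_3STATE = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}

def reduce_dssp(states):
    """Collapse a string of per-residue DSSP states into helix/strand/coil."""
    return "".join(DSSP_TO_3STATE.get(state, "C") for state in states)

three_state = reduce_dssp("HHHGGT-EEBS")
```

Here `three_state` is `"HHHHHCCEEEC"`: the two helix types merge into H, the bridge joins the strand, and turn, bend and unassigned residues become coil.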

WHY bother???
- Researchers create and use the data
  - Use of known information for analyzing new data
  - New data needs to be screened
- Structural/Functional information
  - Extends the knowledge and information on a higher level than DNA or protein sequences

In the end ….
“Computers can figure out all kinds of problems, except the things in the world that just don't add up.” James Magary
We should add: For that we employ the human brain, experts and experience.

Bio-databases: A short word on problems
Even today we face some key limitations
- There is no standard format
  - Every database or program has its own format
- There is no standard nomenclature
  - Every database has its own names
- Data is not fully optimized
  - Some datasets have missing information without indications of it
- Data errors
  - Data is sometimes of poor quality, erroneous, misspelled
  - Error propagation resulting from computer annotation

What to take home
- Databases are a collection of data
  - Need to access and maintain easily and flexibly
- Biological information is vast and sometimes very redundant
- Distributed databases bring it all together with quality controls, cross-referencing and standardization
- Computers can only create data, they do not give answers
- Review-suggestion: “Integrating biological databases”, Stein, Nature Reviews Genetics 2003