![Page 1: Biological Information and Biological Databases Meena K Sakharkar Bioinformatics Centre National University of Singapore](https://reader035.vdocuments.us/reader035/viewer/2022062806/5697c02d1a28abf838cd973b/html5/thumbnails/1.jpg)
Biological Information and Biological Databases
Meena K Sakharkar
Bioinformatics Centre
National University of Singapore
Biological Information
Nature of Life Science Information
• Descriptive
• Classification and Nomenclatural
• Observational and Phenomenological
• Experimental
• Deduced/Computed
• Simulated?
• Theoretical?
Descriptive
Classify and Give Names
• Classification and Nomenclature
• Linnaeus - binomial nomenclature
• Group into kingdoms, phyla, classes, orders, families, genera, species, subspecies, strains, etc.
• Associate descriptions with these classification schemes, and classify according to description, etc.
Observational/Phenomenological
• Like descriptive, yet more active
• Observe a lot of biological phenomena
• Charles Darwin
• Gregor Mendel to McClintock
• Start to do some experiments
Experimental
• From dissections to complex genetic engineering experiments
BioInformatics
• Deduced/Computed
• Simulated?
• Theoretical?
What is BioInformatics?
• Many related terms and buzzwords
• A multiplicity of names:
– bioinformatics
– biocomputing
– biological computing
– computational biology
– computational genomics
– biological data mining
Overview of the Challenges of Molecular Biology Computing
• The huge dataset problem
– automated DNA sequencers
– the Human Genome Project
– bulk sequencing of cDNAs (ESTs)
Human Genome Project
• What is the Human Genome Project?
– A 15-year effort formally begun in October 1990, coordinated by the U.S. Department of Energy and the National Institutes of Health.
– identify all the estimated 80,000 genes in human DNA,
– determine the sequences of the 3 billion chemical bases that make up human DNA,
– store this information in databases,
– develop tools for data analysis, and
– address the ethical, legal, and social issues (ELSI) that may arise from the project.
• Who is head of the U.S. Human Genome Project?
– The DOE Human Genome Program is directed by Ari Patrinos, and Francis Collins directs the NIH Human Genome Program.
– Ari Patrinos also heads the Department of Energy Office of Biological and Environmental Research.
• What are the comparative genome sizes of humans and other organisms being studied?
If compiled in books, the data would fill an estimated 200 volumes the size of a Manhattan telephone book (at 1000 pages each), and reading it would require 26 years working around the clock
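As a rough check on the telephone-book comparison, the sketch below works out what the quoted figures imply, assuming the 3 billion bases given for human DNA. The variable names and the 365.25-day year are assumptions made for illustration only.

```python
# Figures quoted on the slides.
GENOME_BASES = 3_000_000_000   # chemical bases in human DNA
VOLUMES = 200                  # telephone-book-sized volumes
PAGES_PER_VOLUME = 1000
READING_YEARS = 26             # reading around the clock

# Bases that would have to be printed on each page.
bases_per_page = GENOME_BASES / (VOLUMES * PAGES_PER_VOLUME)

# Reading speed implied by 26 years of non-stop reading
# (assumption: a year of 365.25 days).
seconds = READING_YEARS * 365.25 * 24 * 60 * 60
bases_per_second = GENOME_BASES / seconds

print(f"{bases_per_page:.0f} bases per page")
print(f"{bases_per_second:.2f} bases per second")
```

Under these assumptions each page carries 15,000 bases, and the 26-year figure corresponds to reading between three and four bases every second without a break.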
Informatics: Data Collection and Interpretation
HUMAN GENETIC DIVERSITY
• The Ultimate Human Genetic Database
• Any two individuals differ in about 3 x 10^6 bases (0.1%).
• The population is now about 5 x 10^9.
• A catalog of all sequence differences would require 15 x 10^15 entries.
• This catalog may be needed to find the rarest or most complex disease genes.
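The catalog size on this slide is the product of the per-individual difference count (3 x 10^6) and the population (5 x 10^9), so it counts one entry per difference per person. A minimal sketch of that arithmetic follows; the constant names are illustrative, and the genome size of 3 billion bases is taken from the Human Genome Project slide.

```python
# Figures quoted on the slides.
DIFFERENCES_PER_PAIR = 3 * 10**6   # bases differing between two individuals
GENOME_BASES = 3 * 10**9           # bases in human DNA
POPULATION = 5 * 10**9             # world population used on the slide

# Fraction of the genome that differs between two individuals.
fraction_different = DIFFERENCES_PER_PAIR / GENOME_BASES

# One catalog entry per difference per individual.
catalog_entries = DIFFERENCES_PER_PAIR * POPULATION

print(f"fraction differing: {fraction_different:.1%}")
print(f"catalog entries: {catalog_entries:.1e}")
```

The product is 1.5 x 10^16, which is the same number as the slide's 15 x 10^15.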
Databases
Basic Terminology
What is a nucleotide/protein sequence database and databank?
• A database is a collection of nucleotide/protein sequences and their associated annotations.
• Databanks are the groups which collect, compile, maintain and distribute the database.
Fundamental Dogma
Work from the Code of Life
Deduced and Computed Information in the Era of Computational Biology
Databases
• What are the different kinds of databases and their formats?
Nucleic Acid Sequence
– EMBL at EBI
– GenBank at NCBI
– DDBJ in Japan
Protein Sequence
– SWISS-PROT
– NBRF (PIR)
Database
• Protein structure databases
PDB
• Structural data for proteins and nucleic acids whose 3-D structures have been solved by X-ray crystallography or NMR
• PDB database
NRL 3D Database
• NRL_3D is a sequence-structure database.
• Can be used in conjunction with PIR.
• PDB with PIR.
GenBank Entry
EMBL Entry
SwissProt Entry
Other databases
• Genome Databases
– GDB: Genome Database
– OMIM
• Pattern Databases
– Prosite
– TFD
Usage of databases
• Annotation Searches - keywords, authors, features.
– What is the protein sequence for human insulin?
– What does the 3D structure of calmodulin look like?
– What is the genetic location of the cystic fibrosis gene?
– List all introns in rat.
• Homology Searches
– Is there any protein sequence that is similar to mine?
– Is this gene known in any other species?
– Has someone already cloned this sequence?
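An annotation search filters entries by their annotations rather than by the sequence itself. The sketch below shows the idea on a tiny in-memory collection; the `Entry` class, the `search` function, and every record (accessions, authors, sequences) are hypothetical placeholders, not the schema or content of any real databank.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Entry:
    """One database record: a sequence plus its annotations."""
    accession: str
    description: str
    organism: str
    sequence: str
    keywords: List[str] = field(default_factory=list)
    authors: List[str] = field(default_factory=list)

def search(entries, keyword=None, author=None, organism=None):
    """Return the entries that match every criterion that was supplied."""
    hits = []
    for e in entries:
        if keyword and keyword.lower() not in [k.lower() for k in e.keywords]:
            continue
        if author and author.lower() not in [a.lower() for a in e.authors]:
            continue
        if organism and organism.lower() != e.organism.lower():
            continue
        hits.append(e)
    return hits

# Placeholder records: accessions, authors and sequences are made up.
db = [
    Entry("X00001", "insulin precursor", "Homo sapiens", "MAAA",
          ["insulin", "hormone"], ["Author A"]),
    Entry("X00002", "calmodulin", "Rattus norvegicus", "MCCC",
          ["calcium-binding"], ["Author B"]),
]

for hit in search(db, keyword="insulin", organism="homo sapiens"):
    print(hit.accession, hit.description)
```

Homology searches differ in kind: they compare the query sequence against the stored sequences, so they need an alignment method and not just a filter over annotation fields.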
Usage of databases
• Pattern searches
– Does my sequence contain any known motif (that can give me a clue about the function)?
– Which known sequences contain this motif?
– Is any part of my sequence recognised by a transcription factor?
– List all known start, splice and stop signals in my genomic sequence.
• Prediction - use the database as a knowledge base
– What may the structure of my protein be?
  • Secondary structure prediction
  • Modeling by homology
– What is the gene structure of my genomic sequence?
– Which parts of my protein have a high antigenicity?
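The first pattern-search question, whether a sequence contains a known motif, can be sketched by translating a PROSITE-style pattern into a regular expression. The helper names below are hypothetical, and only a subset of the pattern syntax is handled (`x`, `[...]`, `{...}` and repeat counts). The example pattern `N-{P}-[ST]-{P}` is the well-known N-glycosylation site motif, and the test sequence is made up.

```python
import re

# One pattern element: a residue, x, [ABC] or {ABC}, with an optional (n) or (n,m).
ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")

def prosite_to_regex(pattern):
    """Translate a subset of PROSITE pattern syntax into a regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = m.groups()
        if core == "x":                      # any residue
            core = "."
        elif core.startswith("{"):           # any residue except those listed
            core = "[^" + core[1:-1] + "]"
        if low and high:
            core += "{" + low + "," + high + "}"
        elif low:
            core += "{" + low + "}"
        parts.append(core)
    return "".join(parts)

def find_motif(sequence, pattern):
    """Return (1-based start, matched text) for every match, overlaps included."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]

print(prosite_to_regex("N-{P}-[ST]-{P}"))
print(find_motif("AANLSAPNPSG", "N-{P}-[ST]-{P}"))
```

The lookahead wrapper is a design choice so that overlapping occurrences are all reported; a plain `finditer` over the pattern would skip matches that begin inside an earlier match.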
Usage of Databases
• Comparisons:
– Gene Families
– Phylogenetic Trees
GenBank Growth Chart
[Figure: growth of GenBank from Dec-82 to Apr-98. X-axis: Year. Y-axis: Bases, scaled from 0 to 1,600,000,000.]
Evolutionary basis of Alignment
• Enable the researcher to determine if two sequences display sufficient similarity to justify the inference of homology.
• Similarity is an observable quantity that may be expressed as say %identity or some other measure.
• Homology is a conclusion drawn from this data that the two genes share a common evolutionary history.
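Percent identity, the similarity measure mentioned above, can be computed directly from a pairwise alignment. The following is a minimal sketch, assuming the two sequences are already aligned to equal length with `-` marking gaps and that gapped columns are left out of the denominator. Other conventions for the denominator exist, and the example sequences are made up.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue            # skip columns containing a gap
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0

print(percent_identity("MKT-AYIAK", "MKTQAYLAK"))
```

Here seven of the eight compared columns are identical, giving 87.5%. The number is only the observation; whether it justifies inferring homology is a separate judgment that also depends on sequence length and on how likely such a score is by chance.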
Sequence Formats
Fasta Format
>SANJAY REFORMAT of: SANJAY.seq check: 8826 from: 1 to: 573 March 12, 1998
MASSSVPPMITEEEARFEAEVSAVESWWRTDRFRLTRRPYSARDVVSLRGTLHHSYASDQ
MAKKLWRTLKSHQSAGTASRTFGALDPVQVTMMAKHLDTIYVSGWQCSSTHTATNEPGPD
LADYPYNTVPNKVEHLFFAQLYHDRKQHEARVSMTREQRAKTPYVDYLRPIIADGDTGFG
GATATVKLCKLFVERGAAGVHIEDQSSVTKKCGHMAGKVLVAVSEHINRLVAARLQFDVM
GVETVLVARTDAVAATLIQSNVDLRDHQFILGATNPDFKRRSLAAVLSAAMAAGKTGAVL
QAIEDDWLSRAGLMTFSDAVINGINRQLPEYEKQRRLNEWAAATEYSKCVSNEQGREIAE
RLGAGEIFWDWDIARTREGFYRFRGSVEAAVVRGRAFAPHADLIWMETSSPDLVECGKFA
QGMKASHPEIMLAYNLSPSFNWDAAGMTDEEMRDFIPRIAKMGFCWQFITLGGFHADALV
TDTFAREFAKQGMLAYVERIQREERNNGVDTLAHQKWSGANYYDRYLKTVQGGISSTAAM
GKGVTEEQFKEESRTGTRGLDRGGITVNAKSRL
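As the entry above shows, a FASTA record is a header line starting with `>` followed by one or more lines of sequence. A minimal reader can be sketched as below; `parse_fasta` is a hypothetical helper name, and the two short records are illustrative stand-ins built from the opening residues of the sequences on these slides.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

example = """>seq1 hypothetical first record
MASSSVPPMI
TEEEARFEAE
>seq2 hypothetical second record
MSTKYSASAE
"""

records = list(parse_fasta(example))
for header, seq in records:
    print(header.split()[0], len(seq))
```

Sequence lines are joined without separators, so the line width used in the file has no effect on the parsed sequence.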
GCG Format
ckl.seq Length: 473 September 15, 1999 12:25 Type: P Check: 8103 ..
1 MSTKYSASAE SASSYRRTFG SGLGSSIFAG HGSSGSSGSS RLTSRVYEVT
51 KSSASPHFSS HRASGSFGGG SVVRSYAGLG EKLDFNLADA INQDFLNTRT
101 NEKAELQHLN DRFASYIEKV RFLEQQNSAL TVEIERLRGR EPTRIAELYE
151 EEMRELRGQV EALTNQRSRV EIERDNLVDD LQKLKLRLQE EIHQKEEAEN
201 NLSAFRADVD AATLARLDLE RRIEGLHEEI AFLRKIHEEE IRELQNQMQE
251 SQVQIQMDMS KPDLTAALRD IRLQYEAIAA KNISEAEDWY KSKVSDLNQA
301 VNKNNEALRE AKQETMQFRH QLQSYTCEID SLKGTNESLR RQMSEDGGAA
351 GREAGGYQDT IARLEAEIAK MKDEMARHLR EYQDLLNVKM ALDVEIATYR
401 KLLEGEESRI SLPVQSFSSL SFRESSPEQH HHQQQQPQRS SEVHSKKTVL
451 IKTIETRDGE VVSESTQHQQ DVM
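In the GCG entry above, a header carrying `Length:` is closed by `..`, and the sequence follows in numbered lines of ten-residue blocks. The sketch below recovers the plain sequence under that reading of the layout. `parse_gcg` is a hypothetical helper, the demo record is a shortened stand-in, and the `Check:` checksum is not verified.

```python
import re

def parse_gcg(text):
    """Return (header, sequence) from a GCG-style single-sequence record.

    Assumes everything before the first '..' is header text and everything
    after it is sequence interleaved with position numbers and whitespace.
    """
    header, divider, body = text.partition("..")
    if not divider:
        raise ValueError("no '..' divider found")
    # Drop position numbers, spaces and line breaks.
    sequence = re.sub(r"[\d\s]", "", body)
    declared = re.search(r"Length:\s*(\d+)", header)
    if declared and int(declared.group(1)) != len(sequence):
        raise ValueError("sequence length does not match header")
    return header.strip(), sequence

record = """demo.seq  Length: 23  Type: P  ..
   1 MSTKYSASAE SASSYRRTFG SGL
"""

header, seq = parse_gcg(record)
print(len(seq), seq)
```

Comparing the parsed length with the `Length:` field is a cheap sanity check that catches truncated or badly pasted records.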
Taxonomy Database
Blast Results
Examples of the New Biology
1. Full genome-genome comparisons
2. Rapid assessment of polymorphic genetic variations
3. Complete construction of orthologous or paralogous groups of genes
4. Structure determination of large macromolecular assemblies/complexes
5. Dynamic simulation of realistic oligomeric systems
6. Rapid structural/topological clustering of proteins
7. Prediction of unknown molecular structures; Protein folding
8. Computer simulation of membrane structure and dynamic function
9. Simulation of genetic networks and the sensitivity of these pathways to component stoichiometry and kinetics
10. Integration of observations across scales of vastly different dimensions and organization to yield realistic environmental models for basic biology and societal needs
Theoretical?
• The day will dawn when we will have sufficient information to understand how basic life functions are integrated into a living cell, and how such cells intercommunicate and interoperate to function as a living whole. Then, maybe, we can start talking about theoretical biology.
Categories of BioDbs - by domain of information
• DNA
• RNA
• Protein
• Genomic Mapping
• Pathways
• Structure
• Bibliographic
• Biochemical/Molecular/Miscellaneous
Other categories
• By category of species
• By families or superfamilies of molecules, etc.
• Demo
http://www.infobiogen.fr/services/dbcat/
Demonstration of BioDatabases
• The majority of life science databases are online, accessible on the Web via the Internet
• Catalogs of databases are available
• Need for a registry to keep track and offer quality control