introduction to biological database
1NTNU-SUN
Advanced Bioinformatics and Systems Biology 2008
Introduction to biological database
Lecturer: Dr. Chih-Wen Sun
Dept. of Life Sciences, NTNU
References:
- Molecular cell biology, 6th ed., Lodish et al. (2007)
- Various web resources
Bioinformatics
•Use or development of techniques (mathematics, informatics, statistics, computer science, chemistry, biochemistry) to solve biological problems
•Core principle: using computing tools and approaches to acquire, store, organize, archive, analyze, or visualize sequence/structure data
•Major research efforts
- Sequence alignment
- Gene finding
- Genome assembly
- Protein structure alignment and prediction
- Prediction of gene expression
- Prediction of protein-protein interaction
- Modeling of evolution
http://en.wikipedia.org/wiki/Bioinformatics
Systems biology
•Quantitative and systematic study of complex interactions in biological processes
•Biological systematics:
- Study of the diversity and relationships of life on planet Earth
http://en.wikipedia.org/wiki/systems_biology
Strategies to determine the function, location, and structure of gene products
Protein-protein interaction
Gene expression pattern
Molecular cell biology, 6th ed.
Genomics: genome-wide analysis of gene structure and expression
DNA sequencing by dideoxy method
Molecular cell biology, 6th ed.
T C C A T G G A C C
T C C A T G G A C
T C C A T G G A
T C C A T G G
T C C A T G
T C C A T
T C C A
T C C
T C
T
Electrophoresis gel
One of the many fragments of DNA migrating through the gel
CGCTTGACATCA
Detection of fluorescent signals
Molecular cell biology, 6th ed.
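The fragment ladder on this slide can be mimicked in a few lines. This is a minimal sketch, assuming an idealized reaction in which chain termination happens once at every position. The function names are hypothetical, and the example strand is the longest fragment shown above.

```python
def chain_terminated_fragments(strand):
    """Return every prefix of the newly synthesized strand,
    one for each position where a dideoxy nucleotide could stop synthesis."""
    return [strand[:i] for i in range(1, len(strand) + 1)]


def read_gel(fragments):
    """Order fragments shortest first (the shortest migrates fastest)
    and read the terminal, labeled base of each fragment."""
    return "".join(fragment[-1] for fragment in sorted(fragments, key=len))


fragments = chain_terminated_fragments("TCCATGGACC")
print(fragments[:3])        # the three shortest fragments
print(read_gel(fragments))  # sequence recovered from the ladder
```

Reading the terminal base of each fragment in order of increasing length recovers the sequence, which is what the fluorescent detector does as the bands pass it.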
Three primary data banks
•GenBank: National Center for Biotechnology Information (NCBI) server, National Institutes of Health (NIH), Bethesda, Maryland, USA
•EMBL: European Bioinformatics Institute (EBI) server, European Molecular Biology Laboratory, Heidelberg, Germany
•DDBJ: DNA Data Bank of Japan, Mishima, Japan
Sequence comparison
•BLAST program: Basic Local Alignment Search Tool
- http://www.ncbi.nlm.nih.gov/BLAST/
- The BLAST algorithm divides the query sequence into shorter segments and then searches the database for significant matches to any of the stored sequences
Paste or import the query sequence that you want to compare
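The "shorter segments" step can be illustrated with a toy word index. This is only a sketch of exact-match seeding, not the BLAST implementation: real BLAST also scores similar words, extends each hit into a local alignment, and reports statistical significance. The database sequences, the word length `k=4`, and all function names are assumptions made for illustration.

```python
from collections import defaultdict


def words(seq, k):
    """Split a sequence into all overlapping words of length k."""
    return [(i, seq[i:i + k]) for i in range(len(seq) - k + 1)]


def build_index(database, k):
    """Map every word to the (sequence name, position) pairs where it occurs."""
    index = defaultdict(list)
    for name, seq in database.items():
        for pos, word in words(seq, k):
            index[word].append((name, pos))
    return index


def seed_hits(query, index, k):
    """Return (query position, sequence name, database position) for exact word matches."""
    hits = []
    for qpos, word in words(query, k):
        for name, dpos in index.get(word, []):
            hits.append((qpos, name, dpos))
    return hits


# Hypothetical two-sequence database
database = {"seqA": "GATTACAGATTACA", "seqB": "CCCGGGTTT"}
index = build_index(database, k=4)
hits = seed_hits("TTACAG", index, k=4)
print(hits)
```

Indexing the database once and looking up each query word is what makes the search fast compared with aligning the query against every stored sequence in full.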
Motifs and domains
•Motif: short sequence segment of a protein that is functionally important
•Domain: region of a protein with a distinct tertiary structure and characteristic activity
•If a protein shows no significant similarity to other proteins in a BLAST search, a search for motif similarity might give clues to its function
Molecular cell biology, 6th ed.
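A motif search can be sketched as pattern matching on the protein sequence. The example below assumes a PROSITE-style pattern syntax and converts a small subset of it (single residues, `x`, `[..]`, `{..}`, and `(n)` or `(n,m)` repeats) to a regular expression. The pattern `N-{P}-[ST]-{P}` is the commonly cited N-glycosylation site pattern; the test sequence and function names are hypothetical.

```python
import re


def prosite_to_regex(pattern):
    """Convert a simple PROSITE-style pattern to a regular expression string."""
    pieces = []
    for element in pattern.split("-"):
        core, _, repeat = element.partition("(")
        if core == "x":
            piece = "."                         # any residue
        elif core.startswith("{"):
            piece = "[^" + core[1:-1] + "]"     # any residue except those listed
        else:
            piece = core                        # single residue or [ABC] class
        if repeat:
            piece += "{" + repeat.rstrip(")") + "}"
        pieces.append(piece)
    return "".join(pieces)


def find_motif(sequence, pattern):
    """Return (1-based position, matched segment) for every match, overlaps included."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


print(prosite_to_regex("N-{P}-[ST]-{P}"))
print(find_motif("MKNGSAANPTL", "N-{P}-[ST]-{P}"))
```

The second asparagine in the test sequence is followed by proline, so only the first site matches.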
Evolutionary relationships between genes
•Protein family: related protein sequences
•Gene family: corresponding genes of a protein family
•Gene homologs
- Orthologs
- Paralogs
Phylogenetic tree
Molecular cell biology, 6th ed.
Gene expression comparison
•To monitor the expression of a few genes in organisms during specific physiological responses or developmental processes
•To monitor the expression of thousands of genes simultaneously in organisms during specific physiological responses or developmental processes
DNA microarray
•Probe sources:
•Fix ssDNA to glass slides or membranes
DNA chip
•Probe sources:
•Fix ssDNA to glass slides
www.carleton.ca/catalyst/2006s/hms7.html
Microarray examples
Laser excitation
Cy5: ~650 nm
Cy3: ~550 nm
Image overlay
No changes
Flower genes
Leaf genes
Molecular cell biology, 6th ed.
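The overlay image is usually summarized per spot as a log ratio of the two dye intensities. The sketch below assumes background-corrected intensities and a two-fold cutoff (`threshold=1.0` on the log2 scale). The slide does not say which sample carried which dye, so the labels stay generic, and the gene names and intensity values are made up for illustration.

```python
import math


def classify_spot(cy5, cy3, threshold=1.0):
    """Classify one spot by its log2(Cy5/Cy3) ratio.

    threshold=1.0 means a two-fold difference; this cutoff is an assumption.
    """
    ratio = math.log2(cy5 / cy3)
    if ratio >= threshold:
        return "higher in Cy5 sample"
    if ratio <= -threshold:
        return "higher in Cy3 sample"
    return "no change"


# Hypothetical spot intensities: (Cy5, Cy3)
spots = {"geneA": (1200.0, 300.0), "geneB": (250.0, 1000.0), "geneC": (500.0, 520.0)}
for gene, (cy5, cy3) in spots.items():
    print(gene, classify_spot(cy5, cy3))
```

Working on the log scale makes a two-fold increase and a two-fold decrease symmetric around zero, which a raw ratio does not.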
Cluster analysis
•Cluster analysis groups sets of genes that exhibit similar expression changes or are co-regulated in a specific cellular process or pathway.
•This is very useful in analyzing microarray data
•Software:
Gene expression profile at time intervals over a 24 h period after starved fibroblasts were provided with serum: A) cholesterol biosynthesis, B) the cell cycle, C) the immediate-early response, D) signaling and angiogenesis, E) wound healing and tissue remodeling
Molecular cell biology, 6th ed.
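The grouping idea can be sketched with a small agglomerative clustering routine. This is a simplified stand-in, not the software used for the figure: it assumes Pearson correlation as the similarity measure, average linkage, and a stopping cutoff `min_corr=0.8`. The four expression profiles are invented.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two expression profiles of equal length."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_link(c1, c2, profiles):
    """Mean correlation over all gene pairs drawn from two clusters."""
    pairs = [(a, b) for a in c1 for b in c2]
    return sum(pearson(profiles[a], profiles[b]) for a, b in pairs) / len(pairs)


def cluster_genes(profiles, min_corr=0.8):
    """Merge the two most similar clusters until none reach min_corr."""
    clusters = [[gene] for gene in profiles]
    while len(clusters) > 1:
        score, i, j = max(
            (average_link(clusters[i], clusters[j], profiles), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if score < min_corr:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical expression levels at four time points
profiles = {
    "g1": [1, 2, 4, 8],
    "g2": [2, 4, 8, 16],
    "g3": [8, 4, 2, 1],
    "g4": [9, 5, 2, 1],
}
print(cluster_genes(profiles))
```

Genes g1 and g2 rise together and g3 and g4 fall together, so the routine ends with two clusters. Correlation ignores absolute expression level, which is why g1 and g2 group despite a two-fold difference in magnitude.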
Strategies to determine the function, location, and structure of gene products
Protein-protein interaction
Gene expression pattern
Molecular cell biology, 6th ed.
Proteomics: large-scale study of protein structures and functions
Molecular cell biology, 6th ed.
Protein localization
Determination of protein location
•Wet experiments
•Dry experiments
- ExPASy (Expert Protein Analysis System) server: Swiss Institute of Bioinformatics (SIB)
http://au.expasy.org/
ExPASy server
http://au.expasy.org/tools/
http://psort.ims.u-tokyo.ac.jp/
PSORT server
Determination of protein function
•Wet experiments
•Dry experiments
- ExPASy (or InterPro)
- BLAST
- Pfam (protein family)
http://au.expasy.org/
ExPASy server
http://au.expasy.org/sprot/
www.uniprot.org
Swiss-Prot server
http://au.expasy.org/prosite/
Prosite server
http://au.expasy.org/tools/
BLAST at ExPASy
BLAST at NCBI
Paste or import the query sequence that you want to compare
http://www.ncbi.nlm.nih.gov/BLAST/
Pfam server
•Pfam is a large collection of multiple sequence alignments and hidden Markov models covering many common protein domains and families. For each family in Pfam you can:
- Look at multiple alignments
- View protein domain architectures
- Examine species distribution
- Follow links to other databases
- View known protein structures
http://pfam.sanger.ac.uk/
Determination of protein structure
•Wet experiments
•Dry experiments
- ExPASy
- PDB (Protein Data Bank) server
http://au.expasy.org/tools/
http://www.rcsb.org/pdb/
PDB server
Protein-protein interaction
•Wet experiments
•Dry experiments
- APID (Agile Protein Interaction DataAnalyzer) and APID2NET (unified interactome graphic analyzer)
- cons-PPISP (consensus neural-network Protein-Protein Interaction Site Predictor)
- InterPreTS (Interaction Prediction through Tertiary Structure)
- InterProSurf (Prediction of functional sites in monomeric protein surface)
- PIP (Potential Interactions of Proteins)
- PRISM (Protein interaction by structure matching)
- SCOPPI (Structural Classification of Protein-Protein Interfaces)
http://en.wikipedia.org/wiki/Protein-protein_interaction_prediction
Genome projects of various eukaryotic organisms
Assembled genome database at NCBI
http://blast.ncbi.nlm.nih.gov/Blast.cgi
Examples of animal genome projects

| Organism | Type | Genome size | Genes # | Organization | Complete year |
|---|---|---|---|---|---|
| Drosophila melanogaster | Fruit fly | 165 Mb | 13600 | Celera, UC Berkeley, European DGP, Baylor College of Medicine | 2000 |
| Mus musculus | Mouse | 2.5 Gb | 24174 | International Collaboration for the Mouse Genome Sequencing | 2002 |
| Takifugu rubripes | Pufferfish | 390 Mb | 22000-29000 | International Fugu Genome Consortium | 2002 |
| Apis mellifera | Honeybee | 1.8 Gb | 10157 | Honeybee Genome Sequencing Consortium | 2006 |
| Caenorhabditis briggsae | Nematode | 104 Mb | 19500 | Washington Univ., Sanger Inst. and Cold Spring Harbor Lab. | 2003 |
| Homo sapiens | Human | 3.2 Gb | 25000 | Human Genome Project Consortium and Celera Genomics | 2006 (2001) |
http://en.wikipedia.org/wiki/List_of_sequenced_eukaryotic_genomes
Examples of plant genome projects

| Organism | Type | Genome size | Genes # | Organization | Complete year |
|---|---|---|---|---|---|
| Arabidopsis thaliana | Cress | 125 Mb | 27235 | Arabidopsis Genome Initiative | 2000 |
| Oryza sativa ssp japonica | Rice | 466 Mb | 46022-55615 | Syngenta and Myriad Genetics | 2002 |
| Cyanidioschyzon merolae | Red alga | 16.5 Mb | 5331 | Univ. of Tokyo, Rikkyo Univ., Saitama Univ., Kumamoto Univ. | 2004 |
| Populus trichocarpa | Poplar | 550 Mb | 45555 | The International Poplar Genome Consortium | 2006 |
| Vitis vinifera | Grapevine | 490 Mb | 30434 | The French-Italian Public Consortium for Grapevine Genome Characterization | 2007 |
| Physcomitrella patens | Bryophyte | 500 Mb | 39458 | US Dept. of Energy Office of Science Joint Genome Inst. | 2008 |
http://en.wikipedia.org/wiki/List_of_sequenced_eukaryotic_genomes
Organism-specific genome resources
http://www.ncbi.nlm.nih.gov/Genomes/
Organism-specific genome resources
http://www.ncbi.nlm.nih.gov/projects/genome/guide/cat/
http://www.ncbi.nlm.nih.gov/projects/genome/guide/dog/
http://www.ncbi.nlm.nih.gov/projects/genome/guide/pig/
Fly databases
http://www.ncbi.nlm.nih.gov/mapview/map_search.cgi?taxid=7227
http://flybase.bio.indiana.edu/
http://www.fruitfly.org/
Examples of UniGene identifiers
Animals
•Am for honey bee
•Bt for cow
•Dm for fruit fly
•Dr for zebrafish
•Hs for human
•Mm for mouse
•Rn for rat
•Xl for frog
Plants
•At for Arabidopsis
•Hv for barley
•Os for rice
•Ta for wheat
•Zm for maize
General terms in GenBank
•Accession number
- 1 letter + 5 digits (e.g., M12345)
- 2 letters + 6 digits (e.g., AC123456)
•GenInfo identifier (GI)
- 1 or more digits
•Protein ID
- 3 letters + 5 digits (e.g., AAA35650)
•Version
- M12345.1
- M12345.2
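The formats listed on this slide can be checked with regular expressions. This is a minimal sketch that covers only the formats named above; GenBank also uses other accession formats that it does not recognize. The function name and labels are hypothetical.

```python
import re

# Patterns taken from the formats listed on the slide
PATTERNS = [
    ("nucleotide accession", re.compile(r"[A-Z]\d{5}|[A-Z]{2}\d{6}")),
    ("protein ID", re.compile(r"[A-Z]{3}\d{5}")),
    ("GI number", re.compile(r"\d+")),
]


def classify_identifier(identifier):
    """Return (kind, version); version is None when no numeric '.N' suffix is present."""
    base, _, suffix = identifier.partition(".")
    version = int(suffix) if suffix.isdigit() else None
    for kind, pattern in PATTERNS:
        if pattern.fullmatch(base):
            return kind, version
    return "unknown", None


for example in ["M12345", "AC123456.2", "AAA35650", "1234567"]:
    print(example, classify_identifier(example))
```

The version suffix is split off first because the same accession keeps its base identifier across sequence updates and only the number after the dot changes.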
RefSeq accession numbers
•NT_123456: constructed genomic contigs
•NM_123456: mRNA
•NP_123456: proteins
•NC_123456: chromosomes
•XM_123456: predicted mRNA
•XP_123456: predicted protein
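Because the molecule type is encoded in the two-letter prefix, a lookup table is enough to interpret a RefSeq accession. The sketch below includes only the six prefixes listed on this slide; the dictionary and function names are hypothetical.

```python
# Prefix meanings as listed on the slide
REFSEQ_PREFIXES = {
    "NT": "constructed genomic contig",
    "NM": "mRNA",
    "NP": "protein",
    "NC": "chromosome",
    "XM": "predicted mRNA",
    "XP": "predicted protein",
}


def describe_refseq(accession):
    """Return the molecule type for a RefSeq accession, or None if the prefix is not listed."""
    prefix, separator, _ = accession.partition("_")
    if not separator:
        return None  # no underscore, so not in RefSeq format
    return REFSEQ_PREFIXES.get(prefix)


print(describe_refseq("NM_123456"))
print(describe_refseq("XP_123456"))
print(describe_refseq("M12345"))
```

The underscore is what distinguishes these accessions from the GenBank formats, so an identifier without one is rejected before the prefix lookup.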
Exercises on biological databases