Biological Databases Lecture 9
2/16/2018
Instructor: Kritika Karri kkarri@bu.edu
Class Objectives
● Why are databases the backbone of bioinformatics?
● The basic structure of a database
● Data storage versus annotation: the RefSeq database
● Types of DBs: GenBank, PubMed, and NCBI
● Query strategies
● Data quality issues
Biologists Collect Lots of Data
● Hundreds of thousands of species
● Millions of articles in the scientific literature
● Genetic information
  ○ Gene names (thousands)
  ○ Phenotypes of mutants
  ○ Locations of genes/mutations on chromosomes
  ○ Linkage (distances between genes)
What is a Database?
● A collection of data that needs to be:
  ○ Structured
  ○ Searchable
  ○ Updated (periodically)
  ○ Cross-referenced
● Challenge:
  ○ To turn “meaningless” data into useful information that can be accessed and analysed in the best way possible.
● For example:
  ○ How would you organise all biological sequences so that the biological information is optimally accessible?
A spreadsheet can be a Database
● Columns are fields
● Rows are records
● Can search for a term within just one field
● Or combine searches across several fields
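The field/record idea can be sketched in a few lines of Python. This is a minimal illustration only: the gene names and values are made-up placeholders, and `search` is a hypothetical helper, not part of any database tool.

```python
# Each record is one row; each dictionary key is one field (column).
# The entries below are illustrative placeholders, not curated data.
records = [
    {"gene": "geneA", "organism": "human", "chromosome": "10"},
    {"gene": "geneB", "organism": "mouse", "chromosome": "19"},
    {"gene": "geneC", "organism": "human", "chromosome": "17"},
]


def search(rows, **criteria):
    """Return the rows whose fields match every field=value pair given."""
    return [
        row
        for row in rows
        if all(row.get(field) == value for field, value in criteria.items())
    ]


# Search within just one field.
human_rows = search(records, organism="human")

# Combine searches across several fields.
human_chr10 = search(records, organism="human", chromosome="10")

print([row["gene"] for row in human_rows])
print([row["gene"] for row in human_chr10])
```

A real spreadsheet or database does the same thing with filters or queries; the point is only that a search is a condition on one or more fields, applied to every record.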
Database Organisation
● Internal organisation
  ○ Controls speed and flexibility
● A database management system (DBMS) is a set of programs that can:
  ○ Store
  ○ Extract
  ○ Modify
● Flat file databases (flat DBMS)
  ○ Simple, restrictive, table
● Hierarchical databases
  ○ Simple, restrictive, tables
● Relational databases (RDBMS)
  ○ Complex, versatile, tables
● Object-oriented databases (ODBMS)
● Data warehouses and distributed databases
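To make the relational case concrete, here is a small sketch using Python's built-in `sqlite3` module. The two-table schema and the gene and transcript names are assumptions invented for illustration; they do not reflect how any of the databases in this lecture are actually organised.

```python
import sqlite3

# An in-memory relational database with two linked tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE gene (
        gene_id INTEGER PRIMARY KEY,
        symbol  TEXT NOT NULL
    );
    CREATE TABLE transcript (
        transcript_id INTEGER PRIMARY KEY,
        gene_id       INTEGER NOT NULL REFERENCES gene(gene_id),
        name          TEXT NOT NULL
    );
    """
)

# Store: insert placeholder records.
conn.executemany("INSERT INTO gene VALUES (?, ?)", [(1, "geneA"), (2, "geneB")])
conn.executemany(
    "INSERT INTO transcript VALUES (?, ?, ?)",
    [(10, 1, "geneA-201"), (11, 1, "geneA-202"), (12, 2, "geneB-201")],
)

# Modify: update one record in place.
conn.execute("UPDATE gene SET symbol = ? WHERE gene_id = ?", ("geneB_renamed", 2))

# Extract: join the tables to count transcripts per gene.
rows = conn.execute(
    """
    SELECT g.symbol, COUNT(t.transcript_id)
    FROM gene AS g
    JOIN transcript AS t ON t.gene_id = g.gene_id
    GROUP BY g.gene_id
    ORDER BY g.gene_id
    """
).fetchall()
print(rows)
conn.close()
```

A flat file would have to repeat the gene information on every transcript line; the relational layout stores it once and links the tables through `gene_id`, which is where the extra versatility comes from.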
Where do the data come from?
Types of Data
● Sequence or structure
● Nucleic acid or protein
● Important biological information about genes and their metabolic pathways, mutations, diseases, drugs, images, etc.
Biological Database Architecture
Types of Database
● Primary databases:
  ○ Original submissions by experimentalists
  ○ Content controlled by the submitter
  ○ Examples: GenBank, Trace, SRA, SNP, GEO
● Secondary databases:
  ○ Results of analysis of primary databases
  ○ Aggregate of many databases
  ○ Content controlled by a third party (NCBI)
  ○ Examples: NCBI Protein, RefSeq, TPA, RefSNP, GEO DataSets, UniGene, HomoloGene, Structure, Conserved Domain
International Nucleotide Sequence Database Collaboration
● International Nucleotide Sequence Database Collaboration (INSDC): http://www.insdc.org/
● National Center for Biotechnology Information (NCBI): https://www.ncbi.nlm.nih.gov/
● European Nucleotide Archive (ENA): https://www.ebi.ac.uk/ena
● DNA Data Bank of Japan (DDBJ): http://www.ddbj.nig.ac.jp/
Data sharing collaboration
● Ensure data consistency
● Avoid duplication
● Open data sharing
Biological Databases I: Biomedical Literature
Biological Database I: Biomedical Literature Database
● MEDLINE: https://www.nlm.nih.gov/bsd/pmresources.html
  ○ NLM journal citation database.
  ○ Includes citations from 5,600 scholarly journals published around the world.
● PubMed: https://www.ncbi.nlm.nih.gov/pubmed/
  ○ ~28 million citations, mainly from:
    ■ MEDLINE-indexed journals
    ■ Journals/manuscripts deposited in PMC
    ■ NCBI Bookshelf
PubMed query builder using MeSH terms
● MeSH (Medical Subject Headings) is the U.S. National Library of Medicine's (NLM) controlled vocabulary thesaurus, used for indexing articles for PubMed.
  ○ Arranged in a hierarchical manner called the MeSH Tree Structures.
  ○ Updated annually.
PubMed search demo
Hands-on Exercise I
● Find all articles related to the PTEN gene on PubMed.
  ○ How many articles did you find?
● Modify your search to find PubMed entries for PTEN-related work authored by Hui Liang.
  ○ How many articles did you find?
● Restrict your search to PTEN-related articles by author Hui Liang in the journal Cell Metabolism.
  ○ What is the full title of the article?
  ○ In which year was it published?
● Reflection question: What are some advantages of using the MeSH term builder?
More tutorials on building PubMed queries for efficient searching: https://www.nlm.nih.gov/bsd/disted/pubmedtutorial/cover.html
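The three steps of the exercise amount to adding one field-tagged clause at a time. The sketch below builds the query strings and the matching NCBI E-utilities `esearch` URLs without contacting the server. The helper names are hypothetical, and the exact author spelling that PubMed expects ("Liang Hui" here) is an assumption you may need to adjust; a MeSH heading could be added the same way with the `MeSH Terms` tag.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities search endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_term(*clauses):
    """Join (text, field_tag) pairs with AND; a tag of None searches all fields."""
    parts = [f"{text}[{tag}]" if tag else text for text, tag in clauses]
    return " AND ".join(parts)


def esearch_url(term):
    """Build (but do not send) an esearch request for a PubMed query."""
    return ESEARCH + "?" + urlencode({"db": "pubmed", "term": term, "retmode": "json"})


# Step 1: everything mentioning PTEN.
step1 = pubmed_term(("PTEN", None))
# Step 2: narrow to one author (name format is an assumption).
step2 = pubmed_term(("PTEN", None), ("Liang Hui", "Author"))
# Step 3: narrow further to one journal.
step3 = pubmed_term(
    ("PTEN", None), ("Liang Hui", "Author"), ("Cell Metabolism", "Journal")
)

for term in (step1, step2, step3):
    print(term)
    print(esearch_url(term))
```

The same query strings can be pasted directly into the PubMed search box; each added clause can only shrink the result set, which is the narrowing the exercise asks you to observe.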
Biological Databases II: Genomics and Transcriptomics
Biological Database II: Genomics and Transcriptomics
● GenBank: https://www.ncbi.nlm.nih.gov/genbank/
  ○ Flat file
  ○ Nucleotide-only sequence database
  ○ Archival in nature: historical, redundant
  ○ Data: direct submissions (traditional records), batch submissions, FTP accounts (genome data)
  ○ Sample GenBank record (accession number U49845)
    ■ NCBI: https://www.ncbi.nlm.nih.gov/genbank/samplerecord/#OtherFeaturesB
    ■ ENA: https://www.ebi.ac.uk/ena/data/view/U49845
    ■ DDBJ: http://getentry.ddbj.nig.ac.jp/top-e.html
GenBank Flat File
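A GenBank flat file is plain text: top-level keywords start in the first column, the sequence follows the `ORIGIN` keyword, and `//` ends the record. The sketch below reads those parts from a deliberately tiny, made-up record. The accession `TOY00001` and its sequence are hypothetical, and the parser ignores sub-keywords, continuation lines and the feature table, so treat it as an illustration of the layout rather than a real parser.

```python
# A made-up record that only mimics the flat-file layout.
SAMPLE = """\
LOCUS       TOY00001                  24 bp    DNA     linear   UNA 01-JAN-2000
DEFINITION  Hypothetical toy record for illustration only.
ACCESSION   TOY00001
VERSION     TOY00001.1
ORIGIN
        1 atggccatta ccggtaatgc ctaa
//
"""


def parse_flat_file(text):
    """Collect top-level keyword fields and the sequence of one record."""
    fields = {}
    sequence = []
    in_origin = False
    for line in text.splitlines():
        if line.startswith("//"):  # end-of-record marker
            break
        if in_origin:
            # Sequence lines mix position numbers, spaces and bases.
            sequence.append("".join(ch for ch in line if ch.isalpha()))
        elif line.startswith("ORIGIN"):
            in_origin = True
        elif line and not line[0].isspace():
            # Top-level keyword, e.g. LOCUS, DEFINITION, ACCESSION.
            key, _, value = line.partition(" ")
            fields[key] = value.strip()
    fields["sequence"] = "".join(sequence)
    return fields


record = parse_flat_file(SAMPLE)
print(record["ACCESSION"], len(record["sequence"]))
```

For real records, an established parser is a better choice; the value of the sketch is only to show that the "flat" format is a sequence of labelled text fields.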
Ensembl
● Contains all the vertebrate genome DNA sequences currently available in the public domain.
● Automated annotation: features are identified in the DNA sequences using different software tools:
  ○ Genes (known or predicted)
  ○ Single nucleotide polymorphisms (SNPs)
  ○ Repeats
  ○ Homologies
● Ensembl tools include BLAST, BLAT, BioMart and the Variant Effect Predictor (VEP) for all supported species.
● www.ensembl.org
Nucleic Acid Structure Database
● NDB: nucleic acid-containing structures. http://ndbserver.rutgers.edu/
● NTDB: thermodynamic data for nucleic acids. http://ntdb.chem.cuhk.edu.hk/
● RNABase: RNA-containing structures from PDB and NDB. http://www.rnabase.org/
● SCOR: structural classification of RNA (RNA motifs by structure, function and tertiary interactions). http://scor.lbl.gov/
Biological Databases III: Proteomics
Biological Database III: Proteomics
● Protein sequence database: https://www.ncbi.nlm.nih.gov/protein/
GenPept
UniProt
● The Universal Protein Resource (UniProt) is a comprehensive resource for protein sequence and annotation data.
● UniProt is a collaboration between the European Bioinformatics Institute (EMBL-EBI), the SIB Swiss Institute of Bioinformatics and the Protein Information Resource (PIR).
● Each entry belongs either to the Swiss-Prot section of UniProtKB (reviewed) or to the computer-annotated TrEMBL section (unreviewed).
● http://www.uniprot.org/
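In UniProtKB FASTA downloads, the reviewed/unreviewed split shows up as the first token of the header line: `sp` for Swiss-Prot and `tr` for TrEMBL. The sketch below reads that token. The two header lines are hypothetical placeholders (the accessions and entry names are made up), and `uniprot_section` is an illustrative helper, not a UniProt API.

```python
def uniprot_section(fasta_header):
    """Classify a UniProtKB FASTA header as reviewed or unreviewed."""
    # Headers look like ">db|ACCESSION|ENTRY_NAME description".
    database = fasta_header.lstrip(">").split("|", 1)[0]
    if database == "sp":
        return "Swiss-Prot (reviewed)"
    if database == "tr":
        return "TrEMBL (unreviewed)"
    raise ValueError(f"not a UniProtKB-style header: {fasta_header!r}")


# Placeholder headers, invented for illustration only.
headers = [
    ">sp|ACC001|TOYA_HUMAN Toy protein A",
    ">tr|ACC002|ACC002_HUMAN Toy protein B",
]

for header in headers:
    print(header.split()[0], "->", uniprot_section(header))
```

Counting the `sp` headers in a downloaded FASTA file is one quick way to see how many of the hits for a gene have been manually reviewed.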
Protein Structure Database: PDB
● Protein Data Bank (PDB): http://www.rcsb.org/
● Archive of information about the 3D shapes of proteins, nucleic acids, and complex assemblies that helps students and researchers understand all aspects of biomedicine and agriculture, from protein synthesis to health and disease.
Protein Family Database
● http://pfam.xfam.org/family/piwi
● Pfam is a database of protein families that includes their annotations and multiple sequence alignments generated using hidden Markov models.
Protein-Protein Interaction Database
● STRING (Search Tool for the Retrieval of Interacting Genes/Proteins), https://string-db.org/, is a biological database and web resource of known and predicted protein–protein interactions.
● Information comes from numerous sources, including experimental data, computational prediction methods and public text collections.
  ○ Nodes: network nodes represent proteins
  ○ Edges: edges represent protein–protein associations
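The node/edge picture maps directly onto a simple graph structure. The sketch below stores a toy network as an adjacency dictionary and keeps only edges above a confidence cut-off. The protein names, scores and the 0.7 threshold are illustrative assumptions, not values taken from STRING.

```python
from collections import defaultdict

# (protein, protein, confidence) triples; all values are made up.
edges = [
    ("P1", "P2", 0.90),
    ("P1", "P3", 0.70),
    ("P2", "P3", 0.40),
    ("P3", "P4", 0.95),
]


def build_network(edge_list, min_score=0.0):
    """Return {protein: set of partners} for edges scoring at least min_score."""
    network = defaultdict(set)
    for protein_a, protein_b, score in edge_list:
        if score >= min_score:
            # Associations are undirected, so record both directions.
            network[protein_a].add(protein_b)
            network[protein_b].add(protein_a)
    return network


network = build_network(edges, min_score=0.7)
partners_of_p1 = sorted(network["P1"])
print("P1 interacts with:", partners_of_p1)
print("Number of partners:", len(partners_of_p1))
```

Counting the partners of one node at a chosen cut-off is the programmatic version of asking how many interactions a protein has in the network view.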
Hands-on Exercise II
● Search GenBank or Ensembl for the human PTEN gene.
  ○ What chromosome is this gene located on?
  ○ Is it a protein-coding gene?
  ○ How many transcripts does this gene have?
  ○ How many transcripts are functional?
  ○ Does this gene have alternative splicing events?
● What protein does the PTEN gene code for?
  ○ How many of those protein entries are reviewed?
● How many protein–protein interactions are recorded for the PTEN gene in humans?
● Are there any records of post-translational modifications (PTMs)?
Data vs Annotation Database
● RefSeq provides a scientist-curated, non-redundant set of biological sequences (a derivative database). https://www.ncbi.nlm.nih.gov/refseq/
  ○ Source: GenBank (INSDC)
  ○ Annotation: community collaboration, automated computation, NCBI staff curation
● Advantages of using RefSeq
  ○ Non-redundancy
  ○ Updates to reflect current sequence data and biology
  ○ Data validation
  ○ Format consistency
  ○ Distinct accession series
Selected RefSeq Accessions
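RefSeq's distinct accession series can be recognised from the two letters before the underscore. The sketch below covers a handful of commonly seen prefixes; the short descriptions are paraphrased, the table is not exhaustive, and `NM_000000.0` is a placeholder rather than a real accession.

```python
# Common RefSeq accession prefixes (a partial list, paraphrased).
REFSEQ_PREFIXES = {
    "NC_": "complete genomic molecule",
    "NG_": "genomic region",
    "NM_": "mRNA",
    "NR_": "non-coding RNA",
    "NP_": "protein",
    "XM_": "predicted mRNA model",
    "XR_": "predicted non-coding RNA model",
    "XP_": "predicted protein model",
}


def describe_accession(accession):
    """Return the molecule type implied by a RefSeq-style accession prefix."""
    return REFSEQ_PREFIXES.get(accession[:3], "no RefSeq prefix in this table")


# NM_000000.0 is a placeholder; U49845 is the GenBank sample record
# mentioned earlier, which has no RefSeq prefix.
for accession in ("NM_000000.0", "XP_000000.0", "U49845"):
    print(accession, "->", describe_accession(accession))
```

Because archival GenBank accessions have no underscore prefix, this check is also a quick way to tell a curated RefSeq record from a submitter-owned one.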
High-Throughput Sequencing Database
● Gene Expression Omnibus (GEO) archives and freely distributes high-throughput gene expression data submitted by the scientific community.
● NCBI Sequence Read Archive (SRA) archives raw sequencing data and alignment information from high-throughput sequencing platforms. An SRA experiment includes sequence data and metadata describing how a biological sample was sequenced. Example dataset: https://www.ebi.ac.uk/ena/data/view/SRR494099
● Database of Genotypes and Phenotypes (dbGaP): public repository for individual-level phenotype, exposure, genotype, and sequence data, and the associations between them. https://www.ncbi.nlm.nih.gov/gap
● European Genome-phenome Archive (EGA): repository for sequence and genotype experiments, including case-control, population, and family studies. https://www.ebi.ac.uk/ega/about
Other Specialised Databases
● UCSC Xena: https://xenabrowser.net/datapages/
● Genotype-Tissue Expression (GTEx): https://www.gtexportal.org/home/ Correlations between genotype and tissue-specific gene expression levels help identify regions of the genome that influence whether and how much a gene is expressed.
● miRBase: http://www.mirbase.org/
  ○ Database of published miRNA sequences and annotation.
  ○ Each entry represents a predicted hairpin portion of a miRNA transcript (termed mir in the database), with information on the location and sequence of the mature miRNA (termed miR).
● PubChem: https://pubchem.ncbi.nlm.nih.gov/ Chemical structures, information and links.
● DrugBank: https://www.drugbank.ca/ Combines detailed drug data with comprehensive drug target information.
And many more!
Database Retrieval
● Problems with the traditional link method
  ○ Rapidly growing databases with complex and changing relationships
  ○ Rapidly changing interfaces to match the above
  ○ Many people don’t know:
    ■ Where to begin
    ■ Where to click on a web page
    ■ Why it might be useful to click there
● Entrez GQuery is a retrieval system for searching several linked databases, such as PubMed and GenBank. https://www.ncbi.nlm.nih.gov/gquery/
BLAST types
● BLASTN
  ○ The query is a nucleotide sequence
  ○ The database is a nucleotide database
  ○ No conversion is done on the query or database
● DNA :: DNA homology
  ○ Mapping oligos to a genome
  ○ Annotating genomic DNA with transcriptome data from ESTs and RNA-Seq
  ○ Annotating untranslated regions
● BLASTX
  ○ The query is a nucleotide sequence
  ○ The database is an amino acid database
  ○ All six reading frames of the query are translated and used to search the database
● Coding nucleotide seq :: Protein homology
  ○ Gene finding in genomic DNA
  ○ Annotating ESTs and transcripts assembled from RNA-Seq data
● BLASTP
  ○ The query is an amino acid sequence
  ○ The database is an amino acid database
  ○ No conversion is done on the query or database
● Protein :: Protein homology
  ○ Protein function exploration
  ○ Novel genes: make parameters more sensitive
● TBLASTN
  ○ The query is an amino acid sequence
  ○ The database is a nucleotide database
  ○ All six frames of the database are translated and searched with the protein sequence
● Protein :: Coding nucleotide DB homology
  ○ Mapping a protein to a genome
  ○ Mining ESTs and RNA-Seq data for protein similarities
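The four programs above differ only in the type of the query and the type of the database, so the choice can be written as a lookup table. The sketch also shows where "six reading frames" come from: three offsets on the given strand and three on its reverse complement. Both helper functions are hypothetical, and the frames are returned as nucleotides; the translation to amino acids that BLASTX and TBLASTN perform is left out for brevity.

```python
# (query type, database type) -> BLAST program, as listed above.
BLAST_PROGRAMS = {
    ("nucleotide", "nucleotide"): "blastn",
    ("nucleotide", "protein"): "blastx",
    ("protein", "protein"): "blastp",
    ("protein", "nucleotide"): "tblastn",
}


def choose_blast(query_type, database_type):
    """Pick the BLAST program for a query/database type combination."""
    return BLAST_PROGRAMS[(query_type, database_type)]


COMPLEMENT = str.maketrans("ACGT", "TGCA")


def six_frames(dna):
    """Return the six reading frames of a DNA string (untranslated)."""
    forward = dna.upper()
    reverse = forward.translate(COMPLEMENT)[::-1]  # reverse complement
    return [strand[offset:] for strand in (forward, reverse) for offset in range(3)]


print(choose_blast("nucleotide", "protein"))
print(six_frames("ATGCCC"))
```

BLASTX applies the six-frame step to the query, while TBLASTN applies it to every database sequence, which is why TBLASTN searches tend to be the slower of the two.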
BLAST
● BLAST stands for Basic Local Alignment Search Tool
  ○ Good balance of sensitivity and speed
  ○ Reliable
  ○ Flexible
● Produces local alignments: short significant stretches of similarity, irrespective of where they are in the sequence
● BLAST applies a heuristic approach, so it does not necessarily find the best hit for your search.
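To see what "local alignment" means, it helps to look at the exact method that BLAST approximates. The sketch below is a plain Smith-Waterman score calculation, not BLAST's heuristic: it fills a dynamic-programming table in which any cell may restart at zero, so the best-scoring stretch can sit anywhere in either sequence. The scoring values (match 2, mismatch -1, gap -2) are arbitrary assumptions chosen for the example.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score between strings a and b."""
    best = 0
    previous = [0] * (len(b) + 1)  # row i-1 of the scoring table
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            current[j] = max(
                0,                      # start a new local alignment here
                previous[j - 1] + pair,  # extend with a match or mismatch
                previous[j] + gap,       # gap in b
                current[j - 1] + gap,    # gap in a
            )
            best = max(best, current[j])
        previous = current
    return best


# The shared stretch "ACG" is found even though the flanks differ.
print(local_alignment_score("TTTACGTTT", "GGGACGGGG"))
```

This exact calculation takes time proportional to the product of the two sequence lengths, which is too slow for whole databases; BLAST trades a little sensitivity for speed by only extending alignments around short seed matches.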
BLAST Output
● List of sequences with scores
  ○ Raw score
  ○ Higher is better
  ○ Depends on aligned length
● Expect value (E-value)
  ○ Smaller is better
  ○ Accounts for query length and database size
● The Expect value (E) is a parameter that describes the number of hits one can "expect" to see by chance when searching a database of a particular size. It decreases exponentially as the score (S) of the match increases.
● Where can I BLAST?
  ○ NCBI BLAST web service: https://blast.ncbi.nlm.nih.gov/Blast.cgi
  ○ EBI BLAST web service: https://www.ebi.ac.uk/Tools/sss/ncbiblast/
  ○ FlyBase BLAST (Drosophila and other insects): http://flybase.org/blast/
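The behaviour of the Expect value described above follows from the standard relation between a normalised bit score S' and the size of the search space: E = m × n × 2^(−S'), where m is the (effective) query length and n is the (effective) total database length. The sketch below evaluates that relation; the lengths and scores are arbitrary example numbers, and real BLAST reports apply length corrections that are ignored here.

```python
def expect_value(bit_score, query_length, database_length):
    """E = m * n * 2**(-S') for bit score S' and search space m * n."""
    return query_length * database_length * 2.0 ** (-bit_score)


# Arbitrary example: a 100-residue query against 1,000,000 residues.
e_at_40 = expect_value(40, 100, 1_000_000)
e_at_41 = expect_value(41, 100, 1_000_000)
e_bigger_db = expect_value(40, 100, 2_000_000)

print(f"E at 40 bits: {e_at_40:.3g}")
print(f"E at 41 bits: {e_at_41:.3g}")
print(f"E at 40 bits, database twice as large: {e_bigger_db:.3g}")
```

Each extra bit of score halves E, and doubling the database doubles it, so the same alignment becomes less significant as the database grows.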
Hands-on Exercise III
This fragment of genomic DNA belongs to part of a gene.
>query 1
CTAAACTACCAAGGCCATCTCTACTTAAAAACAGTTGTCTTTTGTTTGTGATTTCAGGGGCCCTGGGTATAAGCGAAGTCCCTGTTTAGAGACCTTGTGATGGGTTCAAAATATCAAGAAAGATAGCAAAATATCACAAGCCTCCTGACCCGAGAAGATTAGCGTTGAAAGGGTCTGTCGTGTTTGTTTGGGCCTGGGGCTAAATTCCCAGCCCAAGTGCTGAGGCTGATAATAATCGGGGCGGCGATCAGACAGCCCCGGTGTGGGAAATCGTCCGCCCGGTCTCCCTAAGTCCCCGAAGTCGCCTCCCACTTTTGGTGACTGCTTGTTTATTTACATGCAGTCAATGATAGTAAATGGATGCGCGCCAGTATAGGCCGACCCTGAGGGTGGCGGGGTGCTCTTCGCAGCTTCTCTGTGGAGACCGGTCAGCGGGGCGGCGTGGCCGCTCGCGGCGTCTCCCTGGTGGCATCCGCACAGCCCGCCGCGGTCCGGTCCCGCTCCGGGTCAGAATTGGCGGCTGCGGGGACAGCCTTGCGGCTAGGCAGGGGGCGGGCCGCCGCGTGGGTCCGGCAGTCCCTCCTCCCGCCAAGGCGCCGCCCAGACCCGCTCTCCAGCCGGCCCGGCTCGCCACCCTAGACCGCCCCAGCCACCCCTTCCTCCGCCGGCCCGGCCCCCGCTCCTCCCCCGCCGGCCCGGCCCGGCCCCCTCCTTCTCCCCGCCGGCGCTCGCTGCCTCCCCCTCTTCCCTCTTCCCACACCGCCCTCAGCCGCTCCCTCTCGTACGCCCGTCTGAAGAAGAATCGAGCGCGGAACGCATCGATAGCTCTGCCCTCTGCGGCCGCCCGGCCCCGAACTCATCGGTGTGCTCGGAGCTCGATTTTCCTAGGCGGCGGCCGCGGCGGCGGAGGCAGCAGCGGCGGCGGCAGTGGCGGCGGCGAAGGTGGCGGCGGCTCGGCCAGTACTCCCGGCCCCCGCCATTTCGGACTGGGAGCGAGCGCGGCGCAGGCACTGAAGGCGGCGGCGGGGCCAGAGGCTCAGCGGCTCCCAG
● Using a BLAST search, determine which gene or genes this query fragment is associated with.