BioSQL: A Generic relational model for Bioinformatics Chandan Kumar Deb 10272 Ph.D. (Computer Application) BI-691

Upload: chandan-deb

Post on 06-Aug-2015

BioSQL: A Generic relational model for Bioinformatics

Chandan Kumar Deb

10272

Ph.D. (Computer Application)

BI-691

Contents

• Introduction
• Generic Data Model
• Preface of BioSQL
• Overview of BioSQL Schema
• Dependency of BioSQL
• Installation of BioSQL
• Application of BioSQL
• Advantages of BioSQL
• Limitation of BioSQL
• Conclusion
• References

Introduction

For database management, the relational model is very important

It conceptualizes real-world things into a logical model

First formulated and proposed in 1969 by Edgar F. Codd

The logical model is made up of relations and the relationships among them

Relational Model

• Table
• Tuple
• Relation Instance
• Relation Schema
• Relation Key
• Attribute Domain
• Key Constraint
• Domain Constraint
• Referential Integrity Constraint
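The terms above can be made concrete with a minimal sketch using Python's built-in sqlite3 module. The organism and sequence_record tables are hypothetical illustrations and are not part of BioSQL; each table is a relation, each inserted row is a tuple, and the three constraint types are each violated once on purpose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

# Relation schemas: attribute names with their domains and constraints.
# Both tables are hypothetical examples, not BioSQL tables.
conn.executescript("""
CREATE TABLE organism (
    organism_id INTEGER PRIMARY KEY,                                 -- key constraint
    name        TEXT NOT NULL UNIQUE
);
CREATE TABLE sequence_record (
    record_id   INTEGER PRIMARY KEY,
    organism_id INTEGER NOT NULL REFERENCES organism(organism_id),   -- referential integrity
    alphabet    TEXT NOT NULL
                CHECK (alphabet IN ('dna', 'rna', 'protein'))        -- domain constraint
);
""")

conn.execute("INSERT INTO organism VALUES (1, 'Escherichia coli')")
conn.execute("INSERT INTO sequence_record VALUES (10, 1, 'dna')")  # one tuple


def rejected(sql):
    """Return True if the statement violates one of the constraints."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


key_violation = rejected("INSERT INTO organism VALUES (1, 'Duplicate key')")
domain_violation = rejected("INSERT INTO sequence_record VALUES (11, 1, 'lipid')")
fk_violation = rejected("INSERT INTO sequence_record VALUES (12, 99, 'dna')")
```

Each of the three flags reports whether the database refused the offending tuple, so the relation instance keeps only the valid row.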

The relational model represents data as tuples, grouped into relations

A database organized in terms of the relational model is a relational database

The relational data model is the primary data model

It is used widely around the world for data storage and processing

Generic Data Model

A generic data model is a generalization of conventional data models

It defines standardised relation types

Consensus among the relational modelers of a particular domain can produce a generic model for that domain

Preface of BioSQL

BioSQL is a generic data model

Ewan Birney started BioSQL in 2001

Major redesign and refactoring in 2002-2003

PhyloDB module added in 2006

V1.0 released in March 2008

BioSQL is not a query language; it is a schema (database model)

It covers sequences, features, sequence and feature annotation, a reference taxonomy, and ontologies

It requires a highly normalized relational model

It provides local storage of global biological data

Overview of BioSQL schema

The BioSQL schema does not follow a strongly typed paradigm

A derived entity is always derived in the object-oriented sense

It follows a weakly typed paradigm

It is generic, but can hold any number of specializations

Parts of the schema:

• Bioentry with taxon and namespaces
• Seqfeature with location and annotation
• Ontology term and relationship
• Annotation bundle

Schema overview: BioEntry & Taxon

• Biodatabase
• Bioentry
• Biosequence
• Bioentry Relationship
• Taxon
• Taxon Name

BioEntry

The core entity of BioSQL

It tracks any single entry or record in a biological database

The BIOENTRY contains information about the record's public name, public accession and version

BioDatabase

A BIODATABASE is simply a collection of bioentries

A BIOENTRY may belong to only one BIODATABASE

A BIODATABASE may contain many bioentries
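The one-to-many relationship between BIODATABASE and BIOENTRY can be sketched with two simplified tables. This is an illustration under stated assumptions: the column subset is reduced from what a real BioSQL installation uses, and the database name and accessions are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Simplified column subsets; the full BioSQL tables carry more attributes.
conn.executescript("""
CREATE TABLE biodatabase (
    biodatabase_id INTEGER PRIMARY KEY,
    name           TEXT NOT NULL UNIQUE
);
CREATE TABLE bioentry (
    bioentry_id    INTEGER PRIMARY KEY,
    -- NOT NULL foreign key: every entry belongs to exactly one biodatabase
    biodatabase_id INTEGER NOT NULL REFERENCES biodatabase(biodatabase_id),
    name           TEXT NOT NULL,     -- public name
    accession      TEXT NOT NULL,     -- public accession
    version        INTEGER NOT NULL,
    UNIQUE (accession, biodatabase_id, version)
);
""")

# 'local_genbank' and the accessions below are invented example values.
conn.execute("INSERT INTO biodatabase VALUES (1, 'local_genbank')")
conn.executemany(
    "INSERT INTO bioentry VALUES (?, 1, ?, ?, ?)",
    [(1, "EXAMPLE_A", "X00001", 1), (2, "EXAMPLE_B", "X00002", 1)],
)

# One biodatabase, many bioentries.
entries_per_db = conn.execute("""
    SELECT d.name, COUNT(e.bioentry_id)
    FROM biodatabase d
    LEFT JOIN bioentry e ON e.biodatabase_id = d.biodatabase_id
    GROUP BY d.biodatabase_id
""").fetchall()
```

Because biodatabase_id is a single non-null column on bioentry, an entry cannot be attached to two databases, while nothing limits how many entries point at the same database.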

BioSequence

In BioSQL, all relations are linked to bioentries

The BIOSEQUENCE table contains the raw sequence information associated with a BIOENTRY

It also holds alphabet information ('protein', 'dna', 'rna')

It has a one-to-one relationship with BIOENTRY
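A minimal sketch of the one-to-one relationship, assuming the sequence table uses the entry's identifier as its own primary key. The columns are a simplified subset and the accession and sequence are invented example values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE bioentry (
    bioentry_id INTEGER PRIMARY KEY,
    accession   TEXT NOT NULL
);
CREATE TABLE biosequence (
    -- Primary key and foreign key at once: at most one sequence per entry
    bioentry_id INTEGER PRIMARY KEY REFERENCES bioentry(bioentry_id),
    alphabet    TEXT,      -- 'protein', 'dna' or 'rna'
    length      INTEGER,
    seq         TEXT       -- raw sequence
);
""")

conn.execute("INSERT INTO bioentry VALUES (1, 'X00001')")
seq = "ATGGCCATTGTAATGGGCCGC"
conn.execute("INSERT INTO biosequence VALUES (1, 'dna', ?, ?)", (len(seq), seq))

# A second sequence for the same entry is refused by the primary key.
try:
    conn.execute("INSERT INTO biosequence VALUES (1, 'dna', 3, 'ATG')")
    second_sequence_allowed = True
except sqlite3.IntegrityError:
    second_sequence_allowed = False

stored = conn.execute("""
    SELECT e.accession, s.alphabet, s.length
    FROM bioentry e
    JOIN biosequence s ON s.bioentry_id = e.bioentry_id
""").fetchone()
```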

BioEntryRelationship

Bioentries may themselves be related to one another (e.g., a PDB record may be composed of multiple subrecords for separate chains)

Taxon, Taxon Name

Hold basic taxonomic information about the organism to which a given BIOENTRY refers

Reflect the structure of NCBI's taxonomy database

Each BIOENTRY can be associated with only one taxon

Many bioentries can be associated with the same taxon
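The taxon rules above can be sketched as follows. The tables are simplified (a real taxon table would also carry hierarchy information), the accessions are invented, and 9606 is the NCBI taxonomy identifier for Homo sapiens.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE taxon (
    taxon_id      INTEGER PRIMARY KEY,
    ncbi_taxon_id INTEGER UNIQUE
);
CREATE TABLE taxon_name (
    taxon_id   INTEGER NOT NULL REFERENCES taxon(taxon_id),
    name       TEXT NOT NULL,
    name_class TEXT NOT NULL   -- e.g. 'scientific name'
);
CREATE TABLE bioentry (
    bioentry_id INTEGER PRIMARY KEY,
    taxon_id    INTEGER REFERENCES taxon(taxon_id),  -- a single column: only one taxon per entry
    accession   TEXT NOT NULL
);
""")

conn.execute("INSERT INTO taxon VALUES (1, 9606)")
conn.executemany(
    "INSERT INTO taxon_name VALUES (1, ?, ?)",
    [("Homo sapiens", "scientific name"), ("human", "genbank common name")],
)
# Two entries (invented accessions) share the same taxon.
conn.executemany(
    "INSERT INTO bioentry VALUES (?, 1, ?)",
    [(1, "X00001"), (2, "X00002")],
)

rows = conn.execute("""
    SELECT e.accession, n.name
    FROM bioentry e
    JOIN taxon_name n ON n.taxon_id = e.taxon_id
    WHERE n.name_class = 'scientific name'
    ORDER BY e.accession
""").fetchall()
```

Keeping names in a separate table lets one taxon carry several names (scientific, common, synonyms) without duplicating the taxon row.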

Schema overview: Seqfeature, Location & Annotation

• Location
• SeqFeature
• SEQFEATURE_RELATIONSHIP
• Location Qualifier Value
• Seqfeature Qualifier Value
• Seqfeature Dbxref

Seqfeature and Location

A seqfeature carries the semantics of a region of the sequence

A location describes its start and stop coordinates and its strand
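How a location's start, stop and strand select part of a sequence can be sketched in a few lines. The Location class and extract function are hypothetical helpers written for this illustration; the sketch assumes 1-based inclusive coordinates as in GenBank flat files, and a strand of -1 for the reverse strand.

```python
from dataclasses import dataclass

# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


@dataclass(frozen=True)
class Location:
    """Hypothetical stand-in for one row of a location table."""
    start_pos: int  # assumed 1-based, inclusive
    end_pos: int    # assumed 1-based, inclusive
    strand: int     # 1 = forward, -1 = reverse


def extract(seq: str, loc: Location) -> str:
    """Return the subsequence a location points at, honouring the strand."""
    sub = seq[loc.start_pos - 1:loc.end_pos]
    if loc.strand == -1:
        # Reverse strand: complement each base, then reverse the order.
        sub = sub.translate(_COMPLEMENT)[::-1]
    return sub


sequence = "ATGCCGTA"
forward = extract(sequence, Location(2, 4, 1))    # bases 2..4 as written
reverse = extract(sequence, Location(2, 4, -1))   # same span, reverse complement
```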

Schema overview: Ontology Term and Relationship

• Term Relationship
• Term
• Term Synonym
• Term Dbxref
• Ontology

Term and Ontology

A term is used to "label" a seqfeature's name

An ontology is essentially a dictionary of terms in a somewhat-controlled vocabulary
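The labelling role of terms can be sketched with three simplified tables. The ontology name 'feature keys' and the column subset are assumptions made for this example; the point is only that a feature's label is a reference to a term rather than a dedicated table per feature kind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE ontology (
    ontology_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE
);
CREATE TABLE term (
    term_id     INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    ontology_id INTEGER NOT NULL REFERENCES ontology(ontology_id),
    UNIQUE (name, ontology_id)          -- a term is unique within its ontology
);
CREATE TABLE seqfeature (
    seqfeature_id INTEGER PRIMARY KEY,
    type_term_id  INTEGER NOT NULL REFERENCES term(term_id)  -- the label
);
""")

# 'feature keys' is a made-up ontology name for this sketch.
conn.execute("INSERT INTO ontology VALUES (1, 'feature keys')")
conn.executemany("INSERT INTO term VALUES (?, ?, 1)", [(1, "gene"), (2, "CDS")])
conn.executemany("INSERT INTO seqfeature VALUES (?, ?)", [(1, 1), (2, 2), (3, 2)])

# Count features per label by joining through the term table.
counts = dict(conn.execute("""
    SELECT t.name, COUNT(*)
    FROM seqfeature f
    JOIN term t ON t.term_id = f.type_term_id
    GROUP BY t.name
"""))
```

Adding a new kind of feature then means inserting a term, not altering the schema, which is what the weakly typed design is about.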

Schema overview: Annotation Bundle

• Reference
• Bioentry Reference
• Comment
• Dbxref
• Bioentry Dbxref
• Bioentry and Dbxref Qualifier Value

Dependency of BioSQL

Installation of BioSQL

http://www.biosql.org/wiki/Downloads

Local MySQL Database

Advantages of BioSQL

The BioSQL project provides a well-thought-out relational database schema for storing biological sequences and annotations

The schema is reusable

Compatible with several Bio* toolkits such as Biopython, BioPerl, BioJava and BioRuby

Flexible storage of data via a key/value pair model
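The key/value pair model can be sketched as a qualifier table whose keys are rows in a term table. This is a simplified illustration: the annotate helper and the example keys and values are hypothetical, and the tables carry fewer columns than a real installation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE term (
    term_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE            -- the key
);
CREATE TABLE bioentry_qualifier_value (
    bioentry_id INTEGER NOT NULL,
    term_id     INTEGER NOT NULL REFERENCES term(term_id),
    value       TEXT,                       -- the value
    rank        INTEGER NOT NULL DEFAULT 0, -- orders repeated keys
    UNIQUE (bioentry_id, term_id, rank)
);
""")


def annotate(bioentry_id, key, value):
    """Hypothetical helper: attach one key/value pair to an entry."""
    conn.execute("INSERT OR IGNORE INTO term (name) VALUES (?)", (key,))
    term_id = conn.execute(
        "SELECT term_id FROM term WHERE name = ?", (key,)
    ).fetchone()[0]
    # Next free rank for this entry/key combination.
    rank = conn.execute(
        "SELECT COUNT(*) FROM bioentry_qualifier_value "
        "WHERE bioentry_id = ? AND term_id = ?",
        (bioentry_id, term_id),
    ).fetchone()[0]
    conn.execute(
        "INSERT INTO bioentry_qualifier_value VALUES (?, ?, ?, ?)",
        (bioentry_id, term_id, value, rank),
    )


annotate(1, "keyword", "kinase")
annotate(1, "keyword", "membrane")
annotate(1, "lab_notebook_page", "42")  # a brand-new key needs no schema change

annotations = conn.execute("""
    SELECT t.name, q.value
    FROM bioentry_qualifier_value q
    JOIN term t ON t.term_id = q.term_id
    WHERE q.bioentry_id = 1
    ORDER BY t.name, q.rank
""").fetchall()
```

The trade-off of this design is that values are untyped text, so any validation of a particular key has to happen in the application.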

Extensible to fit the situation at hand

Overall data model based on GenBank flat files

It also allows great flexibility in choosing the data used by Snapshot, since sequence data from any source, including online databases and locally generated sequence data, can be added

Application of BioSQL

Limitation…

It is a single-user solution

It is the least flexible option, since the database cannot be shared

No consideration of protein secondary structure prediction

Demonstration

Conclusion…

Local ‘GenBank’ with random access

‘GenBank’ in relational format

Easy loading of NCBI taxonomy data into a local database

Integrated sequence and annotation databases

A handy tool for the bioinformatics community

References

• http://biojava.org/wiki/BioJava:Tutorial:Installing_and_using_BioSQL

• http://biopython.org/wiki/BioSQL

• http://biosqlweb.appspot.com/

• http://en.wikipedia.org/wiki/Generic_data_model

• http://userweb.eng.gla.ac.uk/umer.ijaz/bioinformatics/BIOSQL_tutorial.pdf

Thank you