BODHI 1
BODHI, A Bio-diversity Database Pla(n)tform
Jayant Haritsa
Database Systems Lab
Supercomputer Education and Research Centre
Indian Institute of Science
BODHI 2
Team
B. J. Srikanta (next talk)
Prof. Madhav Gadgil
Prof. V. Nanjundiah
(Centre for Ecological Sciences, IISc)
Several Masters Students
Funded by DBT
BODHI 3
Motivation
GATT – Patent Laws: to be in place by 2005
Loss: Neem, Basmati (estimated export value: Rs. 1,198 crore), Turmeric
Global and local efforts: GBIF (Global Biodiversity Information Facility); Karnataka Bio-diversity Board [Deccan Herald - Aug 26 2000]
BODHI 4
Bio-diversity Data
Taxonomy of species: phenetic (physical) characteristics; phylogenetic (evolutionary) characteristics
Habitat / spatial distribution: political layout, geographic layout, biospheres
Genetic information: bio-molecular sequences, structural information
BODHI 5
MULTI-DOMAIN QUERY
Retrieve all plant species that share a common habitat, have identical Inflorescence characteristics, and have a DNA sequence within BLAST score of 80, with respect to “Michelia-champa”.
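A minimal sketch of what this query asks for, evaluated over in-memory records. It is only an illustration, not BODHI's query language or implementation. The `Species` record, the `toy_similarity` function and all sample data are hypothetical. In particular, `toy_similarity` is a crude stand-in for a real BLAST score, and "within BLAST score of 80" is read here as "score of at least 80", which is an assumption.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Species:
    """Hypothetical record combining the three data domains."""
    name: str
    habitats: FrozenSet[str]   # spatial domain
    inflorescence: str         # taxonomic (phenetic) domain
    dna: str                   # genomic domain


def toy_similarity(a: str, b: str) -> float:
    """Percent of matching positions over the longer length.

    Placeholder only: this is NOT BLAST and does no alignment.
    """
    if not a or not b:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / max(len(a), len(b))


def multi_domain_query(reference: Species,
                       catalogue: List[Species],
                       min_score: float = 80.0) -> List[str]:
    """Names of species that satisfy all three predicates against the reference."""
    return [
        s.name
        for s in catalogue
        if s.name != reference.name
        and s.habitats & reference.habitats                      # shared habitat
        and s.inflorescence == reference.inflorescence           # identical trait
        and toy_similarity(s.dna, reference.dna) >= min_score    # sequence score
    ]


# Invented sample data; none of these values are real measurements.
reference = Species("Michelia-champa", frozenset({"region-1"}), "type-x", "ACGTACGTAC")
catalogue = [
    reference,
    Species("species-A", frozenset({"region-1", "region-2"}), "type-x", "ACGTACGTAA"),
    Species("species-B", frozenset({"region-3"}), "type-x", "ACGTACGTAC"),
    Species("species-C", frozenset({"region-1"}), "type-y", "ACGTACGTAC"),
    Species("species-D", frozenset({"region-1"}), "type-x", "TTTTACGTAC"),
]

print(multi_domain_query(reference, catalogue))
```

Each predicate touches a different data domain (spatial, taxonomic, genomic), so a single query needs all three kinds of data in one system.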
BODHI 6
Difficulties:
Complex range of data types: sets, hierarchies, aggregations, sequences, geometries, maps, audio, images …
Multidimensional data: from spatial (latitude, longitude, elevation) to proteins (hundreds of coordinates)
Computationally-intensive operators: species relationships, spatial distributions, sequence alignments, ...
BODHI 7
Current Solutions
Small-Scale: MS-Access / FoxPro / Excel / ... on Pentium PCs
Large-Scale: RDBMS (Oracle / DB2 / Informix / Sybase / …) on Unix servers (Sun / SGI / IBM / HP / ...)
BODHI 8
Limitations:
The RDBMS approach of “the world is a flat collection of tables with simple attributes” suits financial applications, NOT scientific (biological) applications.
In particular, taxonomic / spatial / sequence / multimedia data modeling and processing are very cumbersome and coarse.
BODHI 9
Limitations (contd)
Spatial and other applications are not within the database kernel but are connected externally. E.g., many GIS systems have ArcInfo and MS-Access hooked up in a “black-box” manner, or BLAST/FASTA is run on sequence files generated from Oracle.
Problem: Slow and ugly!
BODHI 10
Is there Hope?
Object-Oriented DBMS: “natural” for biological applications
High-performance data access methods: Path Dictionary Index, Multi-key Type Index, Pyramid Tree, ...
High-performance specialized operators: spatial join, data mining, sequence processing, …
XML = HTML + Semantics
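To illustrate why an object model is called “natural” here, the sketch below represents a taxonomic hierarchy directly as linked objects, so that a species' lineage is a simple pointer walk rather than a chain of table joins. The `Taxon` class is hypothetical and is not BODHI's schema. The genus, family and order names come from general botanical classification, not from the slides.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Taxon:
    """Hypothetical node in a taxonomic hierarchy (one object per rank)."""
    rank: str
    name: str
    parent: Optional["Taxon"] = None

    def lineage(self) -> List[str]:
        """Walk parent links and return the ranks from the top down."""
        node: Optional[Taxon] = self
        path: List[str] = []
        while node is not None:
            path.append(f"{node.rank}: {node.name}")
            node = node.parent
        return path[::-1]


# Ranks above species are supplied for illustration only.
order = Taxon("Order", "Magnoliales")
family = Taxon("Family", "Magnoliaceae", parent=order)
genus = Taxon("Genus", "Michelia", parent=family)
species = Taxon("Species", "Michelia-champa", parent=genus)

for line in species.lineage():
    print(line)
```

In a flat relational layout the same traversal would typically need one self-join per rank, or a recursive query.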
BODHI 11
Goals of BODHI
Seamless integration of taxonomic, spatial and genomic data using OO technology
Latest access methods and operators for all three types of data
Utilize XML for data exchange
Low-cost (ideally, free!)
BODHI 12
Architecture of BODHI
[Architecture diagram; components as labelled on the slide]
The Internet
Client Interface Framework
Query Processor
Models: Taxonomy Model, Spatial Model, Genome Model
Operations: Object Operations, Spatial Operations, Genome Operations
Indexes: Object Indexes, Spatial Indexes, Genome Indexes
Services: Object Services, Spatial Services, Sequence Services
OBJECT STORAGE MANAGER
BODHI 13
Implementation of BODHI
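[Implementation diagram; components as labelled on the slide]
The Internet
Client Interface Framework–DB
Taxonomy model: Species, Genera, Family, Order
Spatial model: Country, State, City, River, Road
Genome model: DNA, Protein
Object operations: Inheritance, Aggregation
Spatial operations: Overlaps, Contains, Closest, Within
Genome operations: Alignment (BLAST, FASTA)
Object indexes: Multi-Key Type, Path-Dictionary
Spatial indexes: R*-tree, Hilbert R-tree
Genome indexes: ??? (next talk)
Services: Spatial Services, Object Services, Sequence Services
Basic Types (Point, Line, Polygon, Sets, Sequences, ...)
SHORE MICRO-KERNEL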
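The spatial operators named on this slide (Overlaps, Contains, Within, Closest) can be sketched on axis-aligned bounding rectangles, which are the kind of approximation that R-tree style indexes store. This is a simplified illustration, not BODHI's code: the `Rect` type, the function names and the sample coordinates are all hypothetical, and `closest` uses centre-to-centre distance as an assumed simplification.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Rect:
    """Hypothetical axis-aligned bounding rectangle."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def centre(self) -> Tuple[float, float]:
        return ((self.xmin + self.xmax) / 2.0, (self.ymin + self.ymax) / 2.0)


def overlaps(a: Rect, b: Rect) -> bool:
    """True if the rectangles share at least one point."""
    return (a.xmin <= b.xmax and b.xmin <= a.xmax
            and a.ymin <= b.ymax and b.ymin <= a.ymax)


def contains(a: Rect, b: Rect) -> bool:
    """True if b lies entirely inside a."""
    return (a.xmin <= b.xmin and a.ymin <= b.ymin
            and a.xmax >= b.xmax and a.ymax >= b.ymax)


def within(a: Rect, b: Rect) -> bool:
    """True if a lies entirely inside b (the converse of contains)."""
    return contains(b, a)


def closest(target: Rect, candidates: Iterable[Rect]) -> Rect:
    """Candidate whose centre is nearest to the target's centre."""
    tx, ty = target.centre()
    return min(candidates,
               key=lambda r: math.hypot(r.centre()[0] - tx, r.centre()[1] - ty))


# Invented coordinates for illustration.
state = Rect(0, 0, 10, 10)
city = Rect(2, 2, 3, 3)
river = Rect(8, 5, 15, 6)
far_away = Rect(100, 100, 101, 101)

print(contains(state, city), within(city, state), overlaps(state, river))
print(closest(city, [river, far_away]) == river)
```

Real geometries (polygons, lines) would usually be tested against their bounding rectangles first, with an exact geometric check only for the candidates that pass.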
BODHI 16
Project Status
Prototype (minus Client Interface Framework) has been operational since last month!
Platform: PIII-700MHz running Red Hat Linux.
For Code, contact “[email protected]”
BODHI 17
Performance Evaluation
SEQUOIA 2000 spatial benchmark: Competitive with Paradise GIS from Wisconsin
Taxonomy + Spatial Queries: Reasonably fast
But Genomics slows things down a lot due to absence of indexes (next talk)
BODHI 18
More details
“Design and Implementation of a Biodiversity Information System”, Proc. of Intl. Conf. on Management of Data (COMAD), Pune, December 2000
“The Building of BODHI, A Bio-diversity Database System”, TechRep-2001-02, DSL/SERC, IISc
Available at http://dsl.serc.iisc.ernet.in