BODHI 1
BODHI, A Bio-diversity Database Pla(n)tform
Jayant Haritsa
Database Systems Lab
Supercomputer Education and Research Centre
Indian Institute of Science
BODHI 2
Team
B. J. Srikanta (next talk)
Prof. Madhav Gadgil
Prof. V. Nanjundiah
(Centre for Ecological Sciences, IISc)
Several Masters Students
Funded by DBT
BODHI 3
Motivation
GATT – Patent Laws: to be in place by 2005
Loss: Neem, Basmati (estimated export value: Rs. 1,198 crore), Turmeric
Global and local efforts: GBIF (Global Biodiversity Information Facility); Karnataka Bio-diversity Board [Deccan Herald - Aug 26 2000]
BODHI 4
Bio-diversity Data
Taxonomy of species: phenetic (physical) characteristics; phylogenetic (evolutionary) characteristics
Habitat / Spatial distribution: political layout; geographic layout; biospheres
Genetic information: bio-molecular sequences; structural information
BODHI 5
MULTI-DOMAIN QUERY
Retrieve all plant species that share a common habitat, have identical Inflorescence characteristics, and have a DNA sequence within BLAST score of 80, with respect to “Michelia-champa”.
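The query above touches all three domains at once: spatial (shared habitat), taxonomic (inflorescence) and genomic (sequence similarity). A minimal sketch of its logic over toy in-memory data follows. The `Species` record, the `similarity_score` function and all data values are hypothetical illustrations, not BODHI's schema or API; `similarity_score` is a crude stand-in and not the BLAST algorithm, and "within BLAST score of 80" is assumed here to mean a score of at least 80.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Species:
    """Hypothetical record combining the three data domains."""
    name: str
    habitats: frozenset   # spatial domain: regions where the species occurs
    inflorescence: str    # taxonomic domain: one phenetic characteristic
    dna: str              # genomic domain: a toy DNA sequence


def similarity_score(a: str, b: str) -> int:
    """Toy stand-in for a BLAST score (NOT BLAST): percentage of
    positions that match, relative to the longer sequence."""
    if not a or not b:
        return 0
    matches = sum(x == y for x, y in zip(a, b))
    return 100 * matches // max(len(a), len(b))


def multi_domain_query(db, target_name, min_score=80):
    """Names of species that share a habitat with the target, have the
    same inflorescence, and score at least min_score against it."""
    target = next(s for s in db if s.name == target_name)
    return [
        s.name
        for s in db
        if s.name != target.name
        and s.habitats & target.habitats            # spatial predicate
        and s.inflorescence == target.inflorescence  # taxonomic predicate
        and similarity_score(s.dna, target.dna) >= min_score  # genomic predicate
    ]


# Made-up toy data, for illustration only.
DB = [
    Species("Michelia-champa", frozenset({"region-1"}), "solitary", "ACGTACGTAC"),
    Species("species-A", frozenset({"region-1", "region-2"}), "solitary", "ACGTACGTAA"),
    Species("species-B", frozenset({"region-1"}), "raceme", "ACGTACGTAC"),
    Species("species-C", frozenset({"region-3"}), "solitary", "ACGTACGTAC"),
    Species("species-D", frozenset({"region-1"}), "solitary", "TTTTACGTAC"),
]

result = multi_domain_query(DB, "Michelia-champa")
```

In this toy data only `species-A` passes all three predicates. In a real system each predicate would be served by its own index and operator, which is why evaluating them inside a single engine matters.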
BODHI 6
Difficulties:
Complex range of data types: sets, hierarchies, aggregations, sequences, geometries, maps, audio, images …
Multidimensional data: spatial (latitude, longitude, elevation) to proteins (hundreds of coordinates)
Computationally-intensive operators: species relationships, spatial distributions, sequence alignments, ...
BODHI 7
Current Solutions
Small-Scale: MS-Access / FoxPro / Excel / ... on Pentium PCs
Large-Scale: RDBMS (Oracle / DB2 / Informix / Sybase / …) on Unix servers (Sun / SGI / IBM / HP / ...)
BODHI 8
Limitations:
RDBMS approach of “the world is a flat collection of tables with simple attributes” suits financial applications, NOT scientific (biological) applications
In particular, taxonomic / spatial / sequence / multimedia data modeling and processing are very cumbersome and coarse
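To make the "flat tables" point concrete, the sketch below answers one taxonomic question ("which species fall under this Order?") two ways. The flat version stores the hierarchy as parent ids and needs one self-join style pass per level; the object version follows references directly. The table layout, the `Taxon` class and all names are hypothetical and chosen only for illustration; they are not taken from BODHI or from any particular RDBMS.

```python
from dataclasses import dataclass, field

# Flat relational view: one row per taxon, hierarchy encoded as parent ids.
# Columns: (id, name, rank, parent_id). Toy data.
TAXA = [
    (1, "order-X", "Order", None),
    (2, "family-X", "Family", 1),
    (3, "genus-A", "Genus", 2),
    (4, "species-A1", "Species", 3),
    (5, "genus-B", "Genus", 2),
    (6, "species-B1", "Species", 5),
]


def species_under_flat(taxa, root_id):
    """Walk the flat table level by level; each loop iteration plays the
    role of one more self-join on parent_id."""
    found = []
    frontier = {root_id}
    while frontier:
        children = [row for row in taxa if row[3] in frontier]
        found += [name for (_, name, rank, _) in children if rank == "Species"]
        frontier = {row[0] for row in children}
    return sorted(found)


@dataclass
class Taxon:
    """Object view: a taxon aggregates its child taxa directly."""
    name: str
    rank: str
    children: list = field(default_factory=list)

    def species(self):
        if self.rank == "Species":
            return [self.name]
        return sorted(s for child in self.children for s in child.species())


order = Taxon("order-X", "Order", [
    Taxon("family-X", "Family", [
        Taxon("genus-A", "Genus", [Taxon("species-A1", "Species")]),
        Taxon("genus-B", "Genus", [Taxon("species-B1", "Species")]),
    ]),
])

flat_answer = species_under_flat(TAXA, 1)
object_answer = order.species()
```

Both versions return the same species; the difference is that the flat encoding pushes the hierarchy traversal into the application or into repeated joins.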
BODHI 9
Limitations (contd)
Spatial and other applications are not within the database kernel but are connected externally.
E.g., many GIS systems have ArcInfo and MS-Access hooked up in a “black-box” manner.
Or BLAST/FASTA are run on sequence files generated from Oracle.
Problem: Slow and ugly!
BODHI 10
Is there Hope?
Object-Oriented DBMS: “natural” for biological applications
High-performance data access methods: Path Dictionary Index, Multi-key Type Index, Pyramid Tree, ...
High-performance specialized operators: spatial join, data mining, sequence processing, …
XML = HTML + Semantics
BODHI 11
Goals of BODHI
Seamless integration of taxonomic, spatial and genomic data using OO technology
Latest access methods and operators for all three types of data
Utilize XML for data exchange
Low-cost (ideally, free!)
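As an illustration of the XML data-exchange goal, here is a minimal sketch that serializes a species record to XML and reads it back, using Python's standard `xml.etree.ElementTree`. The element and attribute names (`species`, `family`, `habitats`, `habitat`) and the sample values are assumptions made for this example; the slides do not specify BODHI's exchange schema.

```python
import xml.etree.ElementTree as ET


def species_to_xml(name, family, habitats):
    """Serialize one species record to an XML string.
    The tag names used here are hypothetical, not a BODHI schema."""
    root = ET.Element("species", attrib={"name": name})
    ET.SubElement(root, "family").text = family
    habitats_el = ET.SubElement(root, "habitats")
    for habitat in habitats:
        ET.SubElement(habitats_el, "habitat").text = habitat
    return ET.tostring(root, encoding="unicode")


def species_from_xml(text):
    """Parse the XML produced by species_to_xml back into a tuple."""
    root = ET.fromstring(text)
    return (
        root.get("name"),
        root.findtext("family"),
        [h.text for h in root.iter("habitat")],
    )


# Toy values, for illustration only.
xml_text = species_to_xml("species-A", "family-X", ["region-1", "region-2"])
round_trip = species_from_xml(xml_text)
```

Because the tags name what each value means, a receiving system can interpret the record without knowing how the sender stores it, which is the sense of "XML = HTML + Semantics".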
BODHI 12
Architecture of BODHI
[Architecture diagram; components shown:]
The Internet
Client Interface Framework
Query Processor
Models: Taxonomy Model, Spatial Model, Genome Model
Operations: Object Operations, Spatial Operations, Genome Operations
Indexes: Object Indexes, Spatial Indexes, Genome Indexes
Services: Object Services, Spatial Services, Sequence Services
OBJECT STORAGE MANAGER
BODHI 13
Implementation of BODHI
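[Implementation diagram; components shown:]
The Internet
Client Interface Framework–DB
Taxonomy model: Species, Genera, Family, Order
Spatial model: Country, State, City, River, Road
Genome model: DNA, Protein
Object operations: Inheritance, Aggregation
Spatial operations: Overlaps, Contains, Closest, Within
Genome operations: Alignment, BLAST, FASTA
Object indexes: Multi-Key Type, Path-Dictionary
Spatial indexes: R*-tree, Hilbert-Rtree
Genome indexes: ??? Indexes (next talk)
Basic Types (Point, Line, Polygon, Sets, Sequences, ...)
Spatial Services, Object Services, Sequence Services
SHORE MICRO-KERNEL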
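The spatial operators named in the diagram (Overlaps, Contains, Within, Closest) can be sketched on axis-aligned rectangles, the kind of bounding boxes that R-tree style indexes store. This is an illustrative sketch only: the `Rect` class is hypothetical, `within` is assumed to be the inverse of `contains`, and `closest` is assumed to compare rectangle centers. BODHI's actual operator semantics on points, lines and polygons are not given in the slides.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned bounding box (hypothetical helper type)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def overlaps(self, other):
        """True if the two rectangles share at least one point."""
        return (self.xmin <= other.xmax and other.xmin <= self.xmax
                and self.ymin <= other.ymax and other.ymin <= self.ymax)

    def contains(self, other):
        """True if other lies entirely inside self."""
        return (self.xmin <= other.xmin and self.ymin <= other.ymin
                and self.xmax >= other.xmax and self.ymax >= other.ymax)

    def within(self, other):
        """Assumed here to be the inverse of contains."""
        return other.contains(self)

    def center(self):
        return ((self.xmin + self.xmax) / 2, (self.ymin + self.ymax) / 2)


def closest(target, candidates):
    """Candidate whose center is nearest to the target's center."""
    return min(candidates, key=lambda r: math.dist(target.center(), r.center()))


# Toy geometry, for illustration only.
state = Rect(0, 0, 10, 10)
city = Rect(2, 2, 3, 3)
river = Rect(8, 5, 14, 6)     # crosses the state boundary
far_city = Rect(20, 20, 21, 21)

nearest_to_city = closest(city, [river, far_city])
```

An index such as the R*-tree avoids testing every pair: it prunes whole subtrees whose bounding boxes fail the `overlaps` test before any exact geometry is examined.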
BODHI 15
Query Flow
BODHI 16
Project Status
Prototype (minus the Client Interface Framework) has been operational since last month!
Platform: PIII-700MHz running Red Hat Linux.
For Code, contact “[email protected]”
BODHI 17
Performance Evaluation
SEQUOIA 2000 spatial benchmark: Competitive with Paradise GIS from Wisconsin
Taxonomy + Spatial Queries: Reasonably fast
But genomics queries slow things down a lot due to the absence of sequence indexes (next talk)
BODHI 18
More details
“Design and Implementation of a Biodiversity Information System”, Proc. of Intl. Conf. on Management of Data (COMAD), Pune, December 2000
“The Building of BODHI, A Bio-diversity Database System”, TechRep-2001-02, DSL/SERC, IISc
Available at http://dsl.serc.iisc.ernet.in
BODHI 19
End of Talk