Toward a distributed information system for marine biology and limnology (aka PAKT project) Presenting: Karen Stocks, Amarnath Gupta, Chris Condit Peter Arzberger (PI), Paul Brewin, Li Chen, Heasoo Hwang, Yannis Papakonstantinou, Xufei Qian, Simone Santini, Reza Wahadj, Ilya Zaslavsky + Rutgers University, University of Auckland, U. Wisconsin Funding from the Gordon and Betty Moore Foundation

Post on 16-Jan-2016

Page 1: Toward a distributed information system for marine biology and limnology ( aka PAKT project) Presenting: Karen Stocks, Amarnath Gupta, Chris Condit Peter

Toward a distributed information system for marine biology and limnology

(aka PAKT project)

Presenting: Karen Stocks, Amarnath Gupta, Chris Condit

Peter Arzberger (PI), Paul Brewin, Li Chen, Heasoo Hwang, Yannis Papakonstantinou, Xufei Qian, Simone Santini, Reza Wahadj, Ilya Zaslavsky

+ Rutgers University, University of Auckland, U. Wisconsin

Funding from the Gordon and Betty Moore Foundation

Page 2

The Big Challenge: Integrating distributed and heterogeneous data resources to advance marine ecology and limnology

Opening the “Data Closet”

Page 3

Lakes Testbed Marine Testbed

Information Technology Development

Seamounts

OBIS

CalCOFI

Page 4

Seamounts (undersea mountains)

Seamounts are

- biologically unique

- heavily fished habitats

Page 5

SeamountsOnline: Centralized relational database

Page 6

Seamount Science Example

Can seamount diversity be predicted from seamount depth, distance from continental margin, geological age, surface productivity, etc.? Does endemism follow the predictions of Island Biogeography Theory?

Page 7

Seamount Challenges

Combine multiple, distributed datatypes:

• relational species distributions data in SeamountsOnline (seamounts.sdsc.edu)

• bathymetry data and seamount morphology data in the Seamount Catalog (earthref.org)

• raster physical data from World Ocean Atlas, satellite imagery, etc.

Page 8

Users

Research: CenSeam

– Data Analysis Working Group

– Expedition Planning

Management

– United Nations: IUCN-sponsored workshop on deepwater corals on seamounts

– International Seabed Authority workshop

Seamount Research Coordination Network, NSF

Page 9

OBIS: Ocean Biogeographic Information System (www.iobis.org)

Page 10

OBIS

• The Ocean Biogeographic Information System is an international federation of 50+ distributed data providers (7 million data records) sharing species distribution data

• OBIS has a well established community (secretariat funding, 10 regional node centers, etc.) but limited resources to build infrastructure

• The current DiGIR client-server system allows ~70 fields of data to be transferred (an extended Darwin Core) (www.iobis.org)

Page 11
Page 12

OBIS Science Examples

• Evaluating biogeographic provinces with real data

• Predicting the spread of invasive species

• Identifying diversity hotspots/siting marine protected areas

• Evaluating our state of knowledge

Page 13

OBIS Challenges

• integrate OBIS biological data with emerging physical data resources

• hierarchical data

• allow habitat-specific data exploration

• extend query functionality (e.g. to complex spatial queries)

• capture more data when registering new data providers/serve specific communities better

Page 14

Integrate OBIS biological data with emerging physical data resources

Page 15

CalCOFI

- CalCOFI (the California Cooperative Ocean Fisheries Investigations) is a 50+ year monitoring study off Southern California

- Four times per year, a regular grid of stations is sampled for larval fish, zooplankton, and physical ocean parameters

Page 16

CalCOFI Science Examples

• Determining scales of variability in biological components in space and time

• Correlating fluctuations in larval fish abundance with physical parameters over time.

• Developing ecosystem models for habitat-based management

Page 17

Technical Challenges

• Multiple data types: relational, hierarchical, raster, point, voxel, etc.

• Geospatial data operations

• Ontologies

• Higher knowledge sources

Page 18

Integrating Physical and Biological Oceanographic Data

The Information Systems Viewpoint

Page 19

What are we integrating and why?

• The Science Goals
  – Explain biodiversity
    • Of a species
    • Of any taxonomic grouping of species
    • Around a habitat
    • By correlating distribution of a taxonomic group with the spatial (temporal) distribution of physical phenomena
    • By creating groupings of physical and biological parameters that correlate with the distribution and abundance of species
      – Perhaps for specific habitats
  – Create predictive models
    • Given physical parameters or habitat characteristics, predict species distribution and abundance
    • Given species distribution, predict physical parameters
    • …

Page 20

A Conceptual Framework for a Global Biodiversity Schema

[Diagram. Labeled elements include Studies, Contributions, Observations, Samples, Organisms (Individual Organism, Organism-Class Existence, Organism-Class Abundance, Organism-Class Rel. Abundance, Organism Properties, Organism Classes), Location (Point-in-space, Surface-in-space, Spatial-Volume, Referred Object, Generic Locational Reference, Loc. Classes), Environmental Parameters (Environ. Ontology-1 through Environ. Ontology-k, Generic Environ. Reference, Environmental Region Properties), Collection Method, Collection System, Collection Target, and Time/Frequency. Labeled relationships include taken-from, collected-for, collected-from, observed-at, occur-at, associated-with, partial-mapping, spatial relationships (solid, annular), enviro-location-relationships, and parameterized intra-class relationships.]

Page 21

A Conceptual Framework for a Global Physical Oceanography Schema

[Diagram. Labeled elements include Measurement (data/function) with parameters, value (scalar or vector, with probability), resolution, coverage, time/frequency, collection metadata, and spatial collection pattern (dense or sparse; point, surface, or volume); spatial references (Point-in-space, Surface-in-space, Spatial-Volume, Referred Object; solid, annular); and Phenomena (name, properties, view-definition).]

Page 22

What are we integrating and why?

• Data elements
  – The central elements
    • Distribution of biological and physical variables
      – Point distributions
      – Field distributions
      – Object-bound distributions
    • Grouping of biological and physical variables
      – Hierarchical groupings
      – Hypergraph groupings
  – Additional elements
    • Geographic boundaries
    • Details of observations
    • Details of habitats and objects therein
    • …

Page 23

Point, Field & Object-bound Distributions

• Distributions
  – Point distributions are sparse
    • Continuous distributions
  – Field distributions are dense
    • Often discrete
  – Object-bound distributions are sparse
    • Around objects
    • Associated with other object-related properties
• Modeling field distributions as arrays
  – Can be modeled using nested-relational calculus (algebra) + indices + counting (Libkin 95)
    • Special access functions can be useful (Marathe 98)
  – Non-uniform field (NUF) distributions: aligned-arrays with nulls
    • NRC + indices + counting + list operations
    • Dimension transformation + interpolation
  – Containment vs. overlap semantics

We have yet to show the relationship between Map Algebra and Array Algebra
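
Note: the "aligned-arrays with nulls" and "containment vs. overlap semantics" bullets can be illustrated with a small sketch. It is not the project's implementation. The grid layout (a lower-left `origin`, square cells of `cell_size`), the `None`-as-null convention, the function names, and the sample values are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

Grid = List[List[Optional[float]]]        # grid[row][col]; None marks a null cell
Box = Tuple[float, float, float, float]   # (min_lon, min_lat, max_lon, max_lat)


def cell_bounds(row: int, col: int, origin: Tuple[float, float], cell_size: float) -> Box:
    """Bounding box of one cell; origin is the (lon, lat) of the grid's lower-left corner."""
    min_lon = origin[0] + col * cell_size
    min_lat = origin[1] + row * cell_size
    return (min_lon, min_lat, min_lon + cell_size, min_lat + cell_size)


def select_cells(grid: Grid, origin: Tuple[float, float], cell_size: float,
                 box: Box, semantics: str = "overlap") -> List[Tuple[int, int, float]]:
    """Return (row, col, value) for every non-null cell that matches the query box."""
    q_min_lon, q_min_lat, q_max_lon, q_max_lat = box
    hits = []
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            if value is None:             # null cells never match
                continue
            min_lon, min_lat, max_lon, max_lat = cell_bounds(r, c, origin, cell_size)
            if semantics == "containment":
                # the whole cell must lie inside the query box
                ok = (min_lon >= q_min_lon and max_lon <= q_max_lon
                      and min_lat >= q_min_lat and max_lat <= q_max_lat)
            elif semantics == "overlap":
                # the cell and the query box must share some interior area
                ok = (min_lon < q_max_lon and max_lon > q_min_lon
                      and min_lat < q_max_lat and max_lat > q_min_lat)
            else:
                raise ValueError("semantics must be 'containment' or 'overlap'")
            if ok:
                hits.append((r, c, value))
    return hits


# Hypothetical 3x3 field with one null cell
temperature: Grid = [
    [4.0, None, 4.8],
    [5.0, 5.2, 6.0],
    [5.5, 6.5, 7.0],
]
origin = (0.0, 0.0)
cell_size = 1.0
box = (0.5, 0.5, 2.0, 2.0)

contained = select_cells(temperature, origin, cell_size, box, "containment")
overlapping = select_cells(temperature, origin, cell_size, box, "overlap")
```

With these sample values the same query box selects one cell under containment and three under overlap, which is why the choice of semantics has to be stated when a source is registered.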

Page 24

Integration of Point with NUF Distribution Data Sources

• Some issues
  – Value AT POINT queries
  – Neighborhood queries
• Two possible “join” semantics
  – “snap” points to array-cells
  – “regrid” arrays to point resolution with interpolation
• Planning the joins in a mediator
  – Scenario
    • A prior subquery selects a set of points P
    • Another prior subquery selects a set of array cells by condition C
    • Find value of function F for the points at the corresponding cells
  – Solutions
    • Get P and C-result at the mediator and compute F at the mediator
    • Collect the set P at the mediator, call function F on array with condition C for each element of P
    • Send an array indexing function to point source and return indexes, and perform an indexed selection from array source
      – Not implemented yet
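
Note: the two "join" semantics and the first planning solution can be sketched as follows. This is one possible reading, not the mediator's actual code. The grid convention (lower-left `origin`, square cells, values at cell centres), the choice of bilinear interpolation for "regrid", and every name and number here are assumptions for illustration.

```python
import math
from typing import Callable, List, Optional, Tuple

Grid = List[List[Optional[float]]]   # grid[row][col]; None marks a null cell
Point = Tuple[float, float]          # (lon, lat)


def snap(point: Point, origin: Point, cell_size: float) -> Tuple[int, int]:
    """'Snap' semantics: index (row, col) of the cell that encloses the point."""
    lon, lat = point
    return (math.floor((lat - origin[1]) / cell_size),
            math.floor((lon - origin[0]) / cell_size))


def value_snapped(grid: Grid, point: Point, origin: Point, cell_size: float) -> Optional[float]:
    """Value of the enclosing cell, or None if the point is off-grid or the cell is null."""
    row, col = snap(point, origin, cell_size)
    if 0 <= row < len(grid) and 0 <= col < len(grid[0]):
        return grid[row][col]
    return None


def value_regridded(grid: Grid, point: Point, origin: Point, cell_size: float) -> Optional[float]:
    """'Regrid' semantics: bilinear interpolation between the four nearest cell centres."""
    x = (point[0] - origin[0]) / cell_size - 0.5   # fractional column, measured from centres
    y = (point[1] - origin[1]) / cell_size - 0.5   # fractional row, measured from centres
    c0, r0 = math.floor(x), math.floor(y)
    if not (0 <= r0 < len(grid) - 1 and 0 <= c0 < len(grid[0]) - 1):
        return None                                # no full set of four neighbours
    corners = [grid[r0][c0], grid[r0][c0 + 1], grid[r0 + 1][c0], grid[r0 + 1][c0 + 1]]
    if any(v is None for v in corners):
        return None                                # do not interpolate across nulls
    fx, fy = x - c0, y - r0
    lower = corners[0] * (1 - fx) + corners[1] * fx
    upper = corners[2] * (1 - fx) + corners[3] * fx
    return lower * (1 - fy) + upper * fy


def join_at_mediator(points: List[Point], grid: Grid, origin: Point, cell_size: float,
                     condition: Callable[[float], bool]) -> List[Tuple[Point, float]]:
    """First solution: both inputs are at the mediator; snap each point in P,
    keep it if its cell satisfies condition C, and return the cell value as F."""
    results = []
    for p in points:
        v = value_snapped(grid, p, origin, cell_size)
        if v is not None and condition(v):
            results.append((p, v))
    return results


# Hypothetical 2x2 field and point set
field: Grid = [[10.0, 12.0], [14.0, 16.0]]
origin = (0.0, 0.0)
points = [(0.2, 0.3), (1.0, 1.0), (5.0, 5.0)]

snapped = value_snapped(field, (1.0, 1.0), origin, 1.0)
regridded = value_regridded(field, (1.0, 1.0), origin, 1.0)
joined = join_at_mediator(points, field, origin, 1.0, lambda v: v > 11.0)
```

For the point (1.0, 1.0), which sits on a shared cell corner, snapping returns the value of a single cell while regridding averages the four neighbours, so the two semantics can give different answers for the same query.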

Page 25

The General Integration Problem

• Sources need to export different data models
  – Different algebras
  – Semantics of structures
  – Semantics of values
  – Constraints among values and domains
• How do we register this information?
• What combined algebra does the mediator support?
• How do we control addition of newer sources?
• How does this work in the GAV or GLAV integration framework?
• How do we include type and structure transformations, and domain-specific value-association as part of the mediation process?

Page 26

The Current Integration Framework

• Some Decisions
  – All data are “relationalized”
  – Algebraic operations are implemented on top of relational sources as functions
  – Functions are modeled in the BIRN mediator as relations with binding patterns
  – Popular native formats like OpenDAP are semantically too heterogeneous and have poor query capabilities
    • Value based queries are disallowed
    • We need to augment the registration mechanism to (semi-automatically) ingest all metadata
    • We will ingest the data and store it relationally in a network-accessible relational system
  – Will consider the problems of adding vector-data and unaligned array data as a next step
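
Note: "relations with binding patterns" can be pictured as a relation whose attributes are marked bound (`b`, must be supplied as input) or free (`f`, returned as output). The sketch below is a generic illustration of that idea and does not reproduce the BIRN mediator's interface. The class name, the `"bbf"` pattern string, the `temperature` relation, and the sample values are all hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Sequence, Tuple


class FunctionRelation:
    """A function exposed as a relation. pattern has one letter per attribute:
    'b' = bound (caller must supply it), 'f' = free (the function produces it)."""

    def __init__(self, name: str, attributes: Sequence[str], pattern: str,
                 func: Callable[..., Iterable[Tuple]]):
        if len(attributes) != len(pattern) or set(pattern) - {"b", "f"}:
            raise ValueError("pattern needs one 'b' or 'f' per attribute")
        self.name = name
        self.attributes = tuple(attributes)
        self.pattern = pattern
        self.func = func

    def query(self, **bindings) -> List[Dict[str, object]]:
        bound = [a for a, p in zip(self.attributes, self.pattern) if p == "b"]
        free = [a for a, p in zip(self.attributes, self.pattern) if p == "f"]
        missing = [a for a in bound if a not in bindings]
        unexpected = [a for a in bindings if a not in bound]
        if missing or unexpected:
            # the binding pattern is violated, so the call cannot be planned
            raise ValueError(f"{self.name}: missing={missing}, unexpected={unexpected}")
        rows = []
        for outputs in self.func(*(bindings[a] for a in bound)):
            row = dict(bindings)              # inputs become columns of the result row
            row.update(zip(free, outputs))    # outputs fill the free attributes
            rows.append(row)
        return rows


# Hypothetical source: temperature lookup by exact coordinates
_SAMPLE_TEMPERATURES = {(32.5, -117.5): 15.2, (33.5, -118.5): 14.1}


def _temperature_at(lat: float, lon: float) -> Iterable[Tuple[float]]:
    value = _SAMPLE_TEMPERATURES.get((lat, lon))
    if value is not None:
        yield (value,)


temperature = FunctionRelation("temperature", ("lat", "lon", "value"), "bbf", _temperature_at)
rows = temperature.query(lat=32.5, lon=-117.5)
```

In this sketch a call that supplies `lat` and `lon` returns rows, while a call that tries to supply only `value` is rejected, because the underlying function cannot be run without its inputs.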

Page 27

The Demonstration

• The global schema

The marked tables are augmented with physical parameters from the World Ocean Atlas – over two different grids

Page 28

Technology Overview

• Microsoft ASP.NET

• Asynchronous Javascript and XML (AJAX)

• Google Maps

Page 29

Google Maps

• Pros
  – Intuitive U.I.
  – Bathymetry
  – Simple Javascript API
  – Speed
  – Cost
• Cons
  – Google dependent
  – Data volume limitation
• Alternatives Under Consideration
  – ESRI ArcGIS Server
  – 3D Client (ArcGlobe, GoogleEarth, WorldWind)
  – Some combination

Page 30

Data Sources

• SeamountsOnline
  – Biological Oceanography Information
• World Ocean Atlas
  – Physical Oceanography Information
• Biological and Physical Combination

Page 31

Next Steps

• Interface Refinement

• Apply learning to OBIS

• Questions?

Page 32

Contact Information

• Amarnath Gupta ([email protected])

• Karen Stocks ([email protected])

• Chris Condit ([email protected])