Cloud Computing for Ecological Modeling in the D4Science Infrastructure
A. Manzi (CERN), L. Candela, D. Castelli, G. Coro, P. Pagano, F. Sinibaldi (ISTI-CNR)


EGI Community Forum 2013, Manchester, 8-12 April 2013

Overview
- Species Distribution Modeling
- The AquaMaps Scenario
- D4Science Infrastructure
- gCube Framework
- gCube Statistical Manager
- D4Science Cloud Computing
- Conclusions

Species Distribution Modeling
Species distribution models, which estimate the presence of a species in a given area, are essential instruments for developing strategies and policies for the management and the sustainable and equitable use of living resources. Two main issues must be faced:
- the need for large computing capabilities and appropriate modeling tools;
- the need for both a sufficient amount of good-quality occurrence point datasets and suitable environmental datasets.

The AquaMaps Scenario
AquaMaps produces model-based, large-scale predictions of the known natural occurrence of marine species. Predictions are made by matching species tolerances against local environmental conditions (e.g. salinity, temperature). The computation is based on algorithms such as AquaMaps, developed by Kaschner et al. (2006) to predict the global distributions of marine mammals. The output is a color-coded species range map on a grid of half-degree latitude by half-degree longitude cells.

The scenario involves three tables:
- Species Environmental Envelope (HSPEN): the range of environmental tolerance and preference of a species;
- Cells Authority File (HCAF): metadata about half-degree cells (membership, physical attributes);
- Cells Species Assignments (HSPEC): the probability of occurrence of a species in a given cell.
The workflow has three steps: defining environmental envelopes, generating species occurrence probabilities, and plotting occurrence maps and GIS layers.

The volume of input and output data is very large:
- fewer than 7,000 species: HSPEC native range = 56,468,301; HSPEC suitable range = 114,989,360;
- estimate for 50,000 species: HSPEC native range = 350,000,000; HSPEC suitable range = 715,000,000 [Eli E. Agbayani, FishBase Project/INCOFISH WP1, WorldFish Center].

The number of computations is also very large:
- one multispecies map computed on 6,188 half-degree cells (out of more than 170k) and 2,540 species requires 125 million computations;
- one global map (extended to all species and all cells around the world) requires about 400 billion computations [N. Bailly, WorldFish Center];
- 11,549 species (from FishBase) require 2 days of sequential computation.

[Diagram: the AquaMaps VRE and related infrastructures and data sources: GBIF, DRIVER, EGI, GENESI-DEC, VENUS-C, GeoNetwork, OBISs]

D4Science Infrastructure
D4Science is a production-level infrastructure deployed and maintained during the D4Science (2007) and D4Science-II (2009) projects. It hosts biodiversity communities federated by the iMarine and EUBrazilOpenBio initiatives, and it will provide the ENVRI research infrastructures (RIs) with seed resources.

D4Science is well suited for typical biodiversity processes such as ecological modeling. It:
- provides access to computational and storage resources offered by commercial cloud providers, to new storage technologies generally identified as NoSQL databases, and to several algorithms for data analysis and mining;
- offers scalable platforms for data interoperability and efficient data management;
- offers a scalable infrastructure for efficient spatial data access, processing, and visualization.

D4Science: Example of Communities
[Diagram labels: Collaborators; 33 M hits/month; 50 K unique visitors/month from 26 countries; AquaMaps; Operational Data; Observation Data; 400 Experts; OpenModeller; Cloud]

gCube Framework
gCube is a Java service-oriented framework that manages:
- the creation and interconnection of e-Infrastructures in a controlled and highly configurable environment;
- the deployment of dynamic Virtual Research Environments (VREs).

Enabling Layer
The Enabling Layer allows native components to be deployed on Tomcat (hot deployment) and gCube components to be deployed on an Axis container (dynamic deployment). It also implements the optimal deployment and allocation of infrastructure components, either automatically or driven by an administrator.

gCube Framework: Main Components
Information System: a key service in a gCube-based infrastructure, offering functionality for publishing, monitoring, discovering and accessing the set of resources that form the infrastructure.

Storage Manager: file storage is managed through a network of distributed storage nodes running specialized document-oriented database software. In its current implementation, the Storage Manager Library offers file management over two document stores: MongoDB and Terrastore.

Message Queue: this service is based on the Apache ActiveMQ message broker and supports a queue-based mechanism for distributing messages to consumers.

Executor: a key component that endows a gCube-empowered infrastructure with cloud processing. It acts as a container for gCube tasks (plugins of the service), which can be dynamically deployed into the service and executed through its interface.

Generic Worker: a task of the Executor that is used in cloud computations. It can execute processes, either binary executables or scripts, together with their dependencies in a sandbox.

Geospatial Data Manager: a service for discovering and accessing distributed environmental data and maps. It relies on maps stored on several GeoServer instances. A set of PostGIS databases stores the concrete values and geometries, and GeoServer distributes them according to standard Open Geospatial Consortium (OGC) protocols such as Web Map Service (WMS), Web Coverage Service (WCS) and Web Feature Service (WFS).
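Because maps are served through standard OGC protocols, any WMS client can request a rendered species map. The following minimal sketch builds a WMS 1.1.1 GetMap request URL of the kind a client could send to one of the GeoServer instances. The host URL and layer name are hypothetical placeholders, not actual D4Science endpoints. The request parameters are those defined by the WMS 1.1.1 specification.

```python
from urllib.parse import urlencode


def wms_getmap_url(base_url, layer, bbox, width=720, height=360,
                   srs="EPSG:4326", image_format="image/png"):
    """Build an OGC WMS 1.1.1 GetMap request URL.

    bbox is (min_lon, min_lat, max_lon, max_lat); in WMS 1.1.1 with
    EPSG:4326 the axis order is longitude first.
    """
    min_x, min_y, max_x, max_y = bbox
    if min_x >= max_x or min_y >= max_y:
        raise ValueError("bbox must be (min_x, min_y, max_x, max_y) with min < max")
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty value selects the layer's default style
        "SRS": srs,
        "BBOX": f"{min_x},{min_y},{max_x},{max_y}",
        "WIDTH": str(width),
        "HEIGHT": str(height),
        "FORMAT": image_format,
        "TRANSPARENT": "TRUE",
    }
    # urlencode percent-encodes reserved characters such as ':', ',' and '/'.
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical endpoint and layer name, for illustration only.
    url = wms_getmap_url(
        "https://geoserver.example.org/geoserver/wms",
        "aquamaps:example_species_layer",
        (-180, -90, 180, 90),
    )
    print(url)
```

Requesting a 720 by 360 pixel image over the global bounding box yields one pixel per half-degree cell, which matches the resolution of the AquaMaps maps. Note that WMS 1.3.0 replaces SRS with CRS and uses latitude-first axis order for EPSG:4326, so the BBOX would need to be reordered for that version.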
A GeoNetwork instance equipped with an OGC CSW-based search engine allows meta-information to be retrieved.

gCube Statistical Manager
[Diagram labels: Statistical Manager; Java Objects; Users DB; D4Science DB Source; SDMX; Storage Manager; CSV; HTTP calls; D4Science Workspace; External Features Sources]

Scope
The Statistical Manager is able to:
- generate geographical probability models for species (e.g. AquaMaps);
- perform transformations on data (e.g. interpolations);
- perform data mining operations (e.g. modeling, clustering);
- evaluate models, distributions and experiments (e.g. ROC curve, AUC, accuracy);
- perform data quality analysis (e.g. Habitat Representativeness Score).

[Figure-only slides: Architecture; Advanced Graphical Interfaces; D4Science Cloud Processing]

Statistical Manager & AquaMaps
- The Statistical Manager is instantiated with the AquaMaps algorithm.
- Data generation is up to 50 times faster on the D4Science cloud.
- It adds the generation and publication of GIS layers representing the species distribution.
- It supports the generation of transects.
- It supports dataset management facilities.
- It solves scalability issues.

Conclusions
Ecological modeling in D4Science:
- performs modeling using cloud computing in a way that is transparent to users;
- takes care of parallelization issues;
- evaluates model performance.
Next step: transparent generation of geospatial features at different resolutions, by implementing geospatial data processing on cloud computing facilities exposed through a WPS (Web Processing Service) protocol interface.

Go Mobile with iMarine
AppliFish is an iMarine application for iOS and Android to discover over 500 marine species of the world and stay informed on iMarine news and activities. Try AppliFish!
[Diagram labels: Landscape; D4Science e-Infrastructure; gCube Framework; gCube Apps]

Discussion
Thanks for your attention. Questions?
https://portal.i-marine.d4science.org