using e-infrastructures for biodiversity conservation - module 3


Post on 15-Aug-2015


Page 1: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 3

Using e-Infrastructures for Biodiversity Conservation

Gianpaolo Coro ISTI-CNR, Pisa, Italy

Page 2: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 3

Module 3 - Outline

• Biodiversity and geospatial data
• Trends in biodiversity observations
• Combining species observations
• Combining biodiversity and geospatial data

Page 3: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 3

D4Science

D4Science is both a Data and a Computational e-Infrastructure

• Used by several Projects: i-Marine, EUBrazil OpenBio, ENVRI;

• Implements the notion of e-Infrastructure as-a-Service: it offers on demand access to data management services and computational facilities;

• Hosts several VREs for Fisheries Managers, Biologists, Statisticians…and Students.

Page 4: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 3

D4Science - Resources

Large Set of connected Biodiversity and Taxonomic Datasets

A Network to distribute and access Geospatial Data

Distributed Storage System to store datasets and documents

A Social Network to share opinions and useful news

Algorithms for Biology-related experiments

Page 5: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 3

Module 3 - Outline

• Biodiversity and geospatial data
• Trends in biodiversity observations
• Combining species observations
• Combining biodiversity and geospatial data

Page 6: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 3

Biodiversity and Geospatial Data

Page 7: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 3

Biodiversity Data Providers

i-Marine hosts biodiversity datasets coming from several data providers:
• Some are accessed remotely and are maintained by their respective owners;
• Others are resident in the e-Infrastructure.

Currently, the accessible datasets are:
• Catalogue of Life (CoL)
• Global Biodiversity Information Facility (GBIF)
• Integrated Taxonomic Information System (ITIS)
• Interim Register of Marine and Nonmarine Genera (IRMNG)
• Ocean Biogeographic Information System (OBIS)
• World Register of Marine Species (WoRMS)
• World Register of Deep-Sea Species (WoRDSS)

Some data providers aggregate data from other providers, but alignment between them is not guaranteed!

The datasets allow retrieving:
• Occurrence points (presence points or specimens)
• Taxa names

Page 8: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 3

Online Examples:

http://www.catalogueoflife.org/

http://www.gbif.org/

http://www.iobis.org/

Page 9: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 3

Geospatial Data Providers

[Figure: geospatial data providers and the formats they distribute. Recoverable labels: Bio-ORACLE, World Ocean Atlas; NetCDF, ASCII, ArcGIS, raw formats]

Page 10: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 3

Online Examples:

http://www.myocean.eu

https://www.nodc.noaa.gov/OC5/woa13/

http://www.oracle.ugent.be/

ToolsUI: ftp://ftp.unidata.ucar.edu/pub/netcdf-java/v4.5/toolsUI-4.5.jar

Page 11: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 3

• Biodiversity and geospatial data
• Trends in biodiversity observations
• Combining species observations
• Combining biodiversity and geospatial data

Page 12: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 3

Trendylyzer

Trendylyzer allows discovering species observation trends. It is based on the OBIS collector.

OBIS

This trend tells the story of the Coelacanth discovery
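As a rough illustration of what an observation trend is, the sketch below counts occurrence records per year. It is not the Trendylyzer implementation: the record layout (a `year` field) and the toy data are assumptions made only for this example.

```python
from collections import Counter


def observations_per_year(records):
    """Count occurrence records per observation year, sorted by year."""
    counts = Counter(r["year"] for r in records if r.get("year") is not None)
    return sorted(counts.items())


# Toy records, invented purely for illustration (not real OBIS data).
toy_records = [
    {"species": "Species X", "year": 1990},
    {"species": "Species X", "year": 1990},
    {"species": "Species X", "year": 1992},
    {"species": "Species X", "year": None},  # undated records are skipped
]
trend = observations_per_year(toy_records)
```

Plotting the resulting (year, count) pairs gives the kind of curve shown on the slide.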

Page 13: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 3

Online Example: the i-Marine Trendylyzer

https://i-marine.d4science.org/group/biodiversitylab/trends-production

Page 14: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 3

• Biodiversity and geospatial data
• Trends in biodiversity observations
• Combining species observations
• Combining biodiversity and geospatial data

Page 15: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 3

Cleaning

Page 16: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 3

Union – Difference - Intersection

Page 17: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 3

Occurrences Points Operations

Two occurrence records, A and B, each carry the same fields:
• x,y
• Event Date
• Modif Date
• Author
• Species Scientific Name

Evaluate: A = B when
• d(x,y) < Distance Thr
• LD(Author) * LD(SciName) > Lexical Thr

<Take the most recent>
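The comparison rule on this slide can be sketched as follows. Several points are assumptions, not statements from the slide: LD is taken to be a Levenshtein-based similarity normalised to [0, 1], d(x,y) is a plain Euclidean distance in degrees, and the thresholds are applied inclusively so that the "exact match" setting (DThr=0; LThr=1) can actually be satisfied. The `Occurrence` class and all function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from math import hypot


@dataclass(frozen=True)
class Occurrence:
    x: float            # longitude in decimal degrees (assumed)
    y: float            # latitude in decimal degrees (assumed)
    event_date: date
    modif_date: date
    author: str
    sci_name: str


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def lexical_similarity(a: str, b: str) -> float:
    """1.0 for identical strings, 0.0 when every character differs."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def similar(a: Occurrence, b: Occurrence, distance_thr: float, lexical_thr: float) -> bool:
    """Both the spatial test and the lexical test must pass (inclusive, assumed)."""
    close = hypot(a.x - b.x, a.y - b.y) <= distance_thr
    lexical = (lexical_similarity(a.author, b.author)
               * lexical_similarity(a.sci_name, b.sci_name))
    return close and lexical >= lexical_thr


def most_recent(a: Occurrence, b: Occurrence) -> Occurrence:
    """Of two matching records, keep the one modified last."""
    return a if a.modif_date >= b.modif_date else b
```

Multiplying the two similarities means that a mismatch in either the author or the scientific name lowers the overall score, which is why differing name formats matter for the threshold choice.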

Page 18: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 3

Experiment

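Species: Solea solea

[Figure: processing workflow over occurrence sets. Record counts shown, in slide order: 57 085 Records; 2 324 Records; 1 871 Records; 10 542 Records]

Duplicates Deletion with Exact Match (DThr=0; LThr=1)

Subtraction, settings shown in slide order:
• DThr=0.01; LThr=0
• DThr=0.01; LThr=1
• DThr=0.0001; LThr=0.8

Record counts shown after subtraction, in slide order: 183 Records; 0 Records; 0 Records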

Main remarks:

• The “recordedBy” fields contain differences in name formats

• The Scientific Name fields are different (names vs names and codes)

• D4Science helps in collecting a larger number of unique Solea solea occurrence records

• Even though GBIF collects data from OBIS, its coverage is not up to date

Page 19: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 3

Occurrences Points Operations

Occurrences Duplicates Deleter: An algorithm for deleting similar occurrences in a set of occurrence points coming from the Species Discovery Facility of D4Science.

A
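A minimal sketch of duplicate deletion within one set, under stated assumptions: records are plain dictionaries with hypothetical keys (`x`, `y`, `author`, `sci_name`, `modified`), `difflib.SequenceMatcher` stands in for the slides' LD measure, and of two similar records the most recently modified one is kept. This is not the D4Science implementation.

```python
from difflib import SequenceMatcher
from math import hypot


def similar(a, b, distance_thr, lexical_thr):
    """Spatial closeness plus lexical agreement (stand-in for the LD rule)."""
    close = hypot(a["x"] - b["x"], a["y"] - b["y"]) <= distance_thr
    lexical = (SequenceMatcher(None, a["author"], b["author"]).ratio()
               * SequenceMatcher(None, a["sci_name"], b["sci_name"]).ratio())
    return close and lexical >= lexical_thr


def delete_duplicates(records, distance_thr, lexical_thr):
    """Keep one record per group of similar records, preferring the newest."""
    kept = []
    for rec in records:
        for i, other in enumerate(kept):
            if similar(rec, other, distance_thr, lexical_thr):
                # Same observation: keep whichever copy was modified last.
                # ISO-formatted date strings compare correctly as text.
                if rec["modified"] > other["modified"]:
                    kept[i] = rec
                break
        else:
            kept.append(rec)
    return kept
```

The pairwise scan is quadratic in the number of records, which is acceptable for a sketch; a spatial index would be the usual next step for sets with tens of thousands of points.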

Page 20: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 3

Occurrences Points Operations

Occurrences Intersection: Between two Occurrence Sets A and B, keeps the elements of B that are similar to elements in A.

A B

Page 21: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 3

Occurrences Points Operations

Occurrences Subtraction: Between two Occurrence Sets A and B, keeps the elements of A that are not similar to any element in B.

A B
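Intersection and subtraction, as defined on these two slides, can be sketched side by side. The record keys and the use of `difflib.SequenceMatcher` as the lexical measure are assumptions for illustration only.

```python
from difflib import SequenceMatcher
from math import hypot


def similar(a, b, distance_thr, lexical_thr):
    """Spatial closeness plus lexical agreement (stand-in for the LD rule)."""
    close = hypot(a["x"] - b["x"], a["y"] - b["y"]) <= distance_thr
    lexical = (SequenceMatcher(None, a["author"], b["author"]).ratio()
               * SequenceMatcher(None, a["sci_name"], b["sci_name"]).ratio())
    return close and lexical >= lexical_thr


def intersection(a_set, b_set, distance_thr, lexical_thr):
    """Elements of B that are similar to at least one element of A."""
    return [b for b in b_set
            if any(similar(a, b, distance_thr, lexical_thr) for a in a_set)]


def subtraction(a_set, b_set, distance_thr, lexical_thr):
    """Elements of A that are not similar to any element of B."""
    return [a for a in a_set
            if not any(similar(a, b, distance_thr, lexical_thr) for b in b_set)]
```

Note that intersection returns records from B while subtraction returns records from A, so neither operation is symmetric in its arguments.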

Page 22: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 3

Occurrences Points Operations

Occurrences Merger: Between two Occurrence Sets A and B, enriches A with the elements of B that are not in A. Updates the elements of A with more recent elements in B. If one element in A corresponds to several more recent elements in B, these replace the element of A.

A

B
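The merge rule can be sketched as below. The record keys, the `modified` field holding ISO date strings, and the `difflib`-based lexical measure are assumptions; the handling of a B record that matches several A records (it is copied only once) is a design choice of this sketch, not something the slide specifies.

```python
from difflib import SequenceMatcher
from math import hypot


def similar(a, b, distance_thr, lexical_thr):
    """Spatial closeness plus lexical agreement (stand-in for the LD rule)."""
    close = hypot(a["x"] - b["x"], a["y"] - b["y"]) <= distance_thr
    lexical = (SequenceMatcher(None, a["author"], b["author"]).ratio()
               * SequenceMatcher(None, a["sci_name"], b["sci_name"]).ratio())
    return close and lexical >= lexical_thr


def merge(a_set, b_set, distance_thr, lexical_thr):
    """Update A with newer matching records from B, then add unmatched B records."""
    merged = []
    matched_b = set()   # indexes of B records similar to some A record
    emitted_b = set()   # indexes of B records already copied to the output
    for a in a_set:
        matches = [i for i, b in enumerate(b_set)
                   if similar(a, b, distance_thr, lexical_thr)]
        matched_b.update(matches)
        newer = [i for i in matches if b_set[i]["modified"] > a["modified"]]
        if not newer:
            merged.append(a)            # nothing more recent in B: keep A's record
            continue
        for i in newer:                 # all newer matches replace A's record
            if i not in emitted_b:
                merged.append(b_set[i])
                emitted_b.add(i)
    # Enrich with the B records that matched nothing in A.
    merged.extend(b for i, b in enumerate(b_set) if i not in matched_b)
    return merged
```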

Page 23: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 3

Online experiments: the i-Marine Occurrence Management system

https://i-marine.d4science.org/group/biodiversitylab/processing-tools

Page 24: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 3

Module 3 - Outline

• Biodiversity and geospatial data
• Trends in biodiversity observations
• Combining species observations
• Combining biodiversity and geospatial data

Page 25: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 3

Combining Biodiversity and Geospatial data

Environmental layers

Species occurrence dataset

Enriched dataset
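One way to picture the enrichment step is to look up, for every occurrence point, the value of each environmental layer at that location. The sketch below assumes each layer is a regular latitude/longitude grid stored as a dictionary with hypothetical keys (`x_min`, `y_max`, `resolution`, `values`); it does not describe the D4Science service itself.

```python
from math import floor


def sample_layer(layer, x, y):
    """Return the grid value at (x, y), or None when the point is off the grid.

    `values` is a list of rows ordered from north to south.
    """
    col = floor((x - layer["x_min"]) / layer["resolution"])
    row = floor((layer["y_max"] - y) / layer["resolution"])
    values = layer["values"]
    if 0 <= row < len(values) and 0 <= col < len(values[row]):
        return values[row][col]
    return None


def enrich(occurrences, layers):
    """Attach one extra column per environmental layer to each occurrence."""
    enriched = []
    for occ in occurrences:
        row = dict(occ)  # copy so the input records are left untouched
        for name, layer in layers.items():
            row[name] = sample_layer(layer, occ["x"], occ["y"])
        enriched.append(row)
    return enriched
```

For example, calling `enrich(points, {"depth": depth_layer})` would return the same points with an added `depth` field, which is the shape of the "enriched dataset" on the slide.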

Page 26: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 3

Online Experiments:

https://i-marine.d4science.org/group/biodiversitylab/processing-tools

Page 27: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 3

One practical application

Page 28: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 3

The giant squid - Architeuthis

16th century 2012

The giant squid (Architeuthis) has been reported worldwide even before the 16th century, and has recently been observed live in its habitat for the first time.

Page 29: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 3

Why rare species?

• Biological and evolutionary investigations
• Fisheries management policies and conservation
• Vulnerable Marine Ecosystems
• Key role in affecting biodiversity richness
• Indicators of degradation for aquatic ecosystems

Page 30: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 3

Detecting rare species

• How to build a reliable distribution from few observations?

• How to account for absence locations?

• Is there any approach for rare species?

Page 31: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 3

Data quality

For rare species, data quality is fundamental:

• Reliable presence data
• Reliable absence locations
• High quality environmental features
• Non-noisy environmental features

Page 32: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 3

Tools – i-marine.d4science.org

D4Science e-Infrastructure:

• Retrieve presence data
• Generate absence data
• Get environmental data
• Model, adjust data and produce maps
• Share results

Page 33: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 3

1. Presence data of A. dux from D4S

https://i-marine.d4science.org/group/biodiversitylab/species-data-discovery

Page 34: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 3

2. Simulating A. dux absence locations from AquaMaps

https://i-marine.d4science.org/group/biodiversitylab/processing-tools

0 < Prob. < 0.2

AquaMaps Native
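The selection rule on this slide (cells where the AquaMaps probability is above 0 and below 0.2) can be sketched as a filter followed by random sampling. The cell layout with a `probability` key, the function name and the fixed seed are assumptions made for this example.

```python
import random


def simulate_absences(cells, n, low=0.0, high=0.2, seed=0):
    """Pick up to n cells whose predicted probability lies strictly in (low, high)."""
    candidates = [c for c in cells if low < c["probability"] < high]
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    return rng.sample(candidates, min(n, len(candidates)))
```

Plain random sampling does not by itself guarantee that the selected locations are widely spread; a stratified draw over regions would be one way to enforce that.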

Page 35: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 3

3. Environmental Features

https://i-marine.d4science.org/group/biodiversitylab/geo-visualisation

https://i-marine.d4science.org/group/biodiversitylab/processing-tools

Most of these layers were available in D4Science

Depth and Distance from land were imported using the Statistical Manager

Page 36: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 3

4. MaxEnt model as filter

https://i-marine.d4science.org/group/biodiversitylab/processing-tools

[Figure: Presence data and Env. data are fed to MaxEnt, which returns the env. features most correlated to the giant squid]

Page 37: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 3

Filtered Environmental Features

Page 38: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 3

5. Presence/absence modelling: Artificial Neural Networks (ANN)

Model trained on positive and negative examples, in terms of env. features

Binary file

https://i-marine.d4science.org/group/biodiversitylab/processing-tools

Presence/absence data

Filtered env. features
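To make "trained on positive and negative examples" concrete, here is a minimal feed-forward network with one hidden layer, written in plain Python. It is a generic sketch, not the network used on D4Science: the topology, learning rate, number of epochs and the one-feature toy data are all assumptions.

```python
import math
import random


def forward(w1, w2, x):
    """Return hidden activations and the output probability for one sample."""
    hidden = [math.tanh(sum(w * v for w, v in zip(row, x)) + row[-1]) for row in w1]
    z = sum(w * h for w, h in zip(w2, hidden)) + w2[-1]
    return hidden, 1.0 / (1.0 + math.exp(-z))


def train_ann(samples, labels, hidden=3, epochs=1000, rate=0.5, seed=0):
    """Stochastic gradient descent on cross-entropy; last weight of each row is a bias."""
    rng = random.Random(seed)
    n_in = len(samples[0])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(hidden)]
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden + 1)]
    for _ in range(epochs):
        for x, target in zip(samples, labels):
            h, out = forward(w1, w2, x)
            delta_out = out - target  # gradient at the output pre-activation
            for j in range(hidden):
                # Back-propagate through tanh before touching w2[j].
                delta_h = delta_out * w2[j] * (1.0 - h[j] ** 2)
                for i in range(n_in):
                    w1[j][i] -= rate * delta_h * x[i]
                w1[j][n_in] -= rate * delta_h
                w2[j] -= rate * delta_out * h[j]
            w2[hidden] -= rate * delta_out
    return w1, w2


def predict(model, x):
    """Probability of presence for one vector of environmental features."""
    w1, w2 = model
    return forward(w1, w2, x)[1]


# Toy data: one made-up environmental feature, 1 = presence, 0 = absence.
features = [[0.0], [0.1], [0.2], [0.8], [0.9], [1.0]]
presence = [0, 0, 0, 1, 1, 1]
model = train_ann(features, presence)
```

Projecting the model then amounts to calling `predict` on the feature vector of every cell of the target grid.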

Page 39: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 3

6. Projection of the Neural Network

https://i-marine.d4science.org/group/biodiversitylab/processing-tools

Page 40: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 3

7. Comparison

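[Figure: distribution maps for the giant squid. Labels shown, in slide order: MaxEnt (presence-only); Expert map, Nesis, 2003; AquaMaps Suitable (expert system); Neural Network (presence/absence)]

Similarity values shown, in slide order: 22.01%; 21.68%; 42.83%

Similarity calculated using Maps Comparison, by Coro, Ellenbroek, Pagano. DOI: 10.1080/15481603.2014.959391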

https://i-marine.d4science.org/group/biodiversitylab/processing-tools

Page 41: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 3

Conclusions

• Enhancing data quality produces a high-performance distribution

• A presence/absence ANN combines these data

• Biological, observation and expert evidence confirm the prediction by the ANN

Page 42: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 3

Summary: modelling rare species distributions

1. Retrieve high quality presence locations by relying on the metadata of the records,

2. Use expert knowledge or an expert system to detect absence locations. Select absence locations as widespread as possible,

3. Select a number of environmental characteristics correlated to the species presence,

4. Use MaxEnt to filter the environmental characteristics that are really important with respect to the presence points,

5. Train an Artificial Neural Network on presence and absence locations and select the best learning topology,

6. Project the ANN at global scale, using a resolution equal to the maximum in the environmental features,

7. Train a MaxEnt model as comparison system.

Page 43: USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 3

Just another example

Coelacanth, Smith 1939

GARP

MaxEnt

AquaMaps

Neural Network

Coro, Gianpaolo, Pasquale Pagano, and Anton Ellenbroek. "Combining simulated expert knowledge with Neural Networks to produce Ecological Niche Models for Latimeria chalumnae." Ecological Modelling 268 (2013): 55-63.