rdc - benoit pierenne: data interoperability i

DISCOVER THE OCEAN. UNDERSTAND THE PLANET.

BEYOND INFRASTRUCTURE GAPS

CASRAI Canada ReConnect14
Benoît Pirenne, Director, User Engagement, Ocean Networks Canada
Ottawa, November 19, 2014



OR: HOW WILL WE SOLVE RESEARCH DATA MANAGEMENT ISSUES IN CANADA?


Why data management?

❖ Research Data Management has recently received a lot of attention

- Scientific research equipment and programmes are costly to set up and/or operate, so the data must be re-used and shared with many other users

- There is potential for new insight to emerge from re-use of the data

- Too many (smaller) research programmes don’t have a data management plan, and their data end up being lost

[Figure: DM Activities. Diagram labels: Sensors, Other Digital Data; Data Acquisition; Archive; Format translation, data products; Initial Users; Other Users (≠ disciplines, public)]

Challenges of DM

❖ People focus on the hardware issues:

- That’s chasing the wrong rabbit!

- [LHC’s 25PB/yr]: “Storing the data is not a problem: hard drives are cheap and getting cheaper. The challenge is preserving knowledge that is less commonly stored — the software, algorithms and reference plots specific to each experiment. These often degrade or disappear with time”, says Cristinel Diaconu (nature.com Nov. 26, 2013)!

- Funding agencies prefer the hardware focus, because the funding is then a one-off!

Challenges of DM

❖ Real challenge: data description (metadata)

- Requires gathering, indexing, describing and curating research data at all stages of data collection, preparation, archival and distribution

- Metadata is essential for, and part of, data quality assessment

- Includes source, full description, calibration, annotations, space-time info, …, ownership, access authorizations, …

- Includes the link between data and resulting publications
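
As a minimal sketch of what a metadata record covering the items on this slide might look like, the Python below groups them into one structure. The class name `DatasetMetadata`, every field name and all sample values are hypothetical illustrations; they are not an Ocean Networks Canada schema or any published metadata standard.

```python
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import List
import json


@dataclass
class DatasetMetadata:
    """Hypothetical metadata record; field names are illustrative only."""
    source: str            # instrument or sensor that produced the data
    description: str       # full description of the dataset
    calibration: str       # calibration applied to the raw values
    start_time: datetime   # time coverage, start
    end_time: datetime     # time coverage, end
    latitude: float        # position in decimal degrees
    longitude: float
    owner: str             # ownership
    access: str = "public"  # access authorization level
    annotations: List[str] = field(default_factory=list)
    publications: List[str] = field(default_factory=list)  # resulting papers

    def missing_fields(self) -> List[str]:
        """Names of fields left as empty strings (a crude quality check)."""
        return [name for name, value in asdict(self).items() if value == ""]

    def to_json(self) -> str:
        """Serialize to JSON, writing timestamps as ISO 8601 strings."""
        return json.dumps(asdict(self), default=lambda v: v.isoformat(),
                          sort_keys=True)


# Made-up example record; the calibration entry is deliberately left empty.
record = DatasetMetadata(
    source="hypothetical temperature sensor",
    description="Seawater temperature time series",
    calibration="",
    start_time=datetime(2014, 1, 1, tzinfo=timezone.utc),
    end_time=datetime(2014, 12, 31, tzinfo=timezone.utc),
    latitude=48.0,
    longitude=-126.0,
    owner="Example Lab",
)
print(record.missing_fields())
print(record.to_json())
```

Keeping the completeness check next to the record reflects the point that metadata is part of data quality assessment: an empty calibration entry is flagged before the dataset reaches the archive.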

[Figure: DM Activities, with metadata added. Diagram labels: Sensors, Other Digital Data; Data Acquisition; Archive; Format translation, data products; Initial Users; Other Users (≠ disciplines, public); Metadata]

Challenges of DM

❖ Real challenge: data description (metadata)

- Not popular with funding agencies, because metadata requires expert, dedicated staff to curate data

- Metadata requires software systems to be maintained to support the activity

- Metadata is a long-term commitment

Challenges of DM

❖ Data access

- Search through data (not always possible), search through metadata

- Metadata encoding and transport standards needed

- Data formats are discipline-specific

- Uniform, interoperable access is a huge challenge (e.g., the Virtual Observatory, VO)
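
One common way to approach uniform access is to hide each discipline-specific format behind a small adapter that emits a shared record shape, and then search over the shared metadata. The sketch below only illustrates that idea: the two toy formats, the `READERS` registry, and the `site` and `parameter` field names are all hypothetical stand-ins for real encoding and transport standards.

```python
import csv
import io
import json
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, str]  # shared, format-independent metadata record


def read_csv(text: str) -> Iterator[Record]:
    """Adapter for a toy CSV format with 'station' and 'variable' columns."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {"site": row["station"], "parameter": row["variable"]}


def read_json(text: str) -> Iterator[Record]:
    """Adapter for a toy JSON format with 'location' and 'measurement' keys."""
    for item in json.loads(text):
        yield {"site": item["location"], "parameter": item["measurement"]}


# Hypothetical registry mapping a format name to its adapter.
READERS: Dict[str, Callable[[str], Iterator[Record]]] = {
    "csv": read_csv,
    "json": read_json,
}


def load(fmt: str, text: str) -> List[Record]:
    """Translate one source document into the shared record shape."""
    return list(READERS[fmt](text))


def search(records: Iterable[Record], **criteria: str) -> List[Record]:
    """Return the records whose metadata match every given criterion."""
    return [r for r in records
            if all(r.get(key) == value for key, value in criteria.items())]


catalogue = (
    load("csv", "station,variable\nA1,temperature\nA2,salinity\n")
    + load("json", '[{"location": "B7", "measurement": "temperature"}]')
)
hits = search(catalogue, parameter="temperature")
print([hit["site"] for hit in hits])
```

The search never touches the original files. It works on metadata alone, which is why agreed metadata vocabularies matter more here than the underlying data formats.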

Challenges of DM

- Convince PIs and funding agencies that good Data Management is important

  - But this battle is by now almost won (NSF, TC3+, …)

- A new CFI Cyber-Infrastructure initiative is to be announced to support most data stewardship needs

How can we afford DM?

❖ Data Management is affordable

- Experience shows that, across disciplines, the average cost of setting up a DM system is ~10% of the costs of the projects it supports

- Experience shows that the burden of operating a DM system is about 10% of the projects’ overall operating costs

- DM costs fall further once projects are no longer operational
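
The two 10% rules of thumb above can be turned into a quick budget estimate. The sketch below is only an illustration of that arithmetic: the function name `dm_budget`, its parameters and the project figures are hypothetical, and the slide gives no number for the post-operational phase, so that phase is not modelled.

```python
from typing import Dict


def dm_budget(project_setup_cost: float,
              project_annual_operating_cost: float,
              years: int,
              dm_fraction: float = 0.10) -> Dict[str, float]:
    """Estimate DM costs as a fixed fraction of project costs.

    dm_fraction defaults to the ~10% rule of thumb quoted on the slide.
    """
    dm_setup = dm_fraction * project_setup_cost
    dm_operations = dm_fraction * project_annual_operating_cost * years
    return {
        "dm_setup": dm_setup,
        "dm_operations": dm_operations,
        "dm_total": dm_setup + dm_operations,
    }


# Made-up project: 50 M to build, 5 M per year to operate, over 10 years.
estimate = dm_budget(50e6, 5e6, years=10)
for item, cost in estimate.items():
    print(f"{item}: {cost / 1e6:.1f} M")
```

For this made-up project, DM would come to roughly 5 M for set-up plus roughly 5 M for ten years of operation.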

Towards Data Stewardship Facilities

❖ At the service of many projects in related disciplines

❖ Provide long-term data storage, access and stewardship, well beyond the lifetime of individual projects

❖ Need is particularly acute for small projects

❖ Avoid the creation of many ad-hoc systems that can’t be maintained long-term

❖ International quality standards exist (ICSU’s World Data System)

[Figure: DM Activities, with the Data Stewardship Facility added. Diagram labels: Sensors, Other Digital Data; Data Acquisition; Archive; Format translation, data products; Initial Users; Other Users (≠ disciplines, public); Data Stewardship Facility]

DSF for users

❖ Address the following:

- Too many data repositories for similar datasets

- Poorly described results

- Untraceable sources

- Unreadable digital media

- “Abandoned”, inaccessible records

- Incomplete dataset description

DSF for users

❖ Are a one-stop shop for data in a given discipline, and a portal to international resources

❖ Allow scientists to focus on science, not on data management

❖ Ensure stewardship of data beyond project funding

❖ Ensure data will remain citable

DSF for users

- Buy-in from users and PIs regarding:

  - Development of trust with external entities managing their data

  - The definition of a(n open) data policy, sharing of data

  - Being thorough with data/experimentation description (metadata)

  - Realizing that data management is not achieved with a bit of hardware and software

In progress: use of clouds increasing

In progress: more and more open data policies around

DSF for Funding agencies

❖ Ability to make economies of scale

❖ DSFs have expertise in data management and relevant science disciplines

❖ DSFs have the wherewithal to remain at the leading edge of technology

❖ Users are already used to entrusting their data to “the Cloud”, and to working with remote compute resources

❖ With similar international peers, have a voice at the interoperability and standards table

❖ Newest CFI Cyber Infrastructure program is a step in the right direction

Challenges for DSFs

❖ Have to deal with users for whom the data volumes are unheard of!

Canadian DSF examples

❖ Canadian Astronomy Data Centre (CADC) is a great example of a discipline-specific Data Stewardship Facility

❖ Canadian Polar Data Network (CPDN): includes multi-disciplinary data

❖ Canadian Research Data Centre Network (CRDCN) (social and population health statistics)

❖ …