
Data Intensive Research Initiative for South Africa

A. Vahed

Research Data Management Workshop

Pretoria, 11 August 2015

Outline

• Background

• Objectives

• RDM tasks

• Activities & Outputs

• Organisational Structure & Implementation

11 August 2015 © CSIR, 2015 2

NICIS

• National data integrative enabler supporting

– MTSF

– RDP

– SARIR,…

• Overarching coordination & national strategy

– National (Tier1)

– Regional (Tier2)


NICIS

Academia

Science councils

…

RIs

Core Services

Networked resources

Computing Services (CHPC +)

Networking Services (SANReN)

Data Services (DIRISA)

Materials & Manuf.

Energy

Earth & Environment

Phy Sci & Eng.

Humans & Society

Health, Bio & Food

Data intensive research environments (SA_Grid … Cloud)

Skills & expertise

Physical-Service

Support

Application

e-Research, e-Science

Integrated distributed cyber platform for

– Data Management

– Data Intensive Research


DIRISA

Academia

Science councils

…

RIs

Core Services

Networked resources

Computing Services (CHPC +)

Networking Services (SANReN)

Data Services (DIRISA)

Materials & Manuf.

Energy

Earth & Environment

Phy Sci & Eng.

Humans & Society

Health, Bio & Food

Data intensive research environments (SA_Grid … Cloud)

Skills & expertise

Physical-Service

Support

Application


Other views

Industrial Sector Awareness

Capability

D. Tildesley: Vision of integrated e-infrastructure ecosystem

Connections Computing & Data Skills

Sector Domain Knowledge

Computing Data

Networks

Security

Software

Hardware


Value

Rich world of discipline, cross- and multi disciplinary data analytics

(enrichment, annotation,…)

Harmonised world of data management (preservation, workflows,…)

Federated world of data generators and sensory observations

(models and measurements)

Services & Environments

Data stewardship

Data Acquisition

Standards & Policies

Literature Sharing

Skills and expertise

Advocacy & Outreach

• Extreme Data

– Global, massive, well-typed, homogeneous volumes

– LHC & SKA

• Research Big Data

– Large, mixed-typed volumes

– Imagery, text, audio, etc.

• Business Big Data

– Lots of usually closed transactional, serialised data

– Social data (Facebook, Twitter, Google, etc.)

• Long Tail Data

– Lots of (poorly managed) relatively small data sets


Data landscape

The underlying attributes are very different!


Data class characteristics

Class | Ownership | Big Data Vs | Technology | Skills | Research Env

Extreme | International | Vol, Vel, Open | Exascale Comp | Maths / Stats / Astro, Visual | Distributed teams

Big Data – Business | Businesses | Vol, Vel, Var, Closed | Clusters, SAS, Cloud, Hadoop | Data Engineers | Team

Big Data – Research | National, Institutional | Vol, Vel, Var, Ver, “Open” access | HPC, Clusters, Grid, Cloud, data transfer | Data Scientists, Domain Researchers, Comp Scientists, Maths, Model | VRE, multi-disc, RIs

Long Tail | Department, Individual | Var, Ver | Grid, cloud | Stats, Comp Science | Individuals, PhD, PD, RIs

1. Robust infrastructure and services

– Federated Tier 1 & Tier 2 repositories

– Virtual research environments

2. Ensure good data stewardship

– Policies, protocols & standards

– Internationally benchmarked

3. Develop capacity & expertise

– Data intensive research

– Programmes with HEIs

4. Advocacy & outreach

– Data stewardship and data sharing

– Stakeholder engagement – forums (SADA)

5. Coordination & strategy

– National data intensive research activities

– Inform on and guide aligned & consolidated strategic agenda


How

Data Stewardship

Infrastructure & Services

Policies & Standards

Capacity & Expertise

Advocacy & Outreach

Coordination & Strategy


DIRISA Plan

Year 1 (2014/15): Survey & assess

• Institutional arrangements

• Infrastructure, policies & capacity

• Proposals

• Early adopters

Year 2 (2015/16): Develop & build

• Infrastructure & services

• Big projects

• Policies & processes

• Capacity building

Year 3 (2016/17): Grand research

• Federated network

• Open data & publishing

• Business partnerships

• End-to-end (E2E) data management

Year 5+ (> 2017): Global competitive

• Beyond eResearch

• Long running & real-time science

• Fused & streamed data

Year 1

Action/Task

- Institutional arrangements

- Set up forums & events

- Engage & consult

- Survey, assess “As-Is” situation

- Prioritise areas & needs

- Coordinate new & ongoing projects

Outputs

- Tier 1 & core services

- Data stewardship policies & framework (RDA, etc.)

- University data science programmes

- Solicited proposals in data stewardship

- Data intensive research strategy coordinated with funders, strategies and key initiatives

Role: national capstone orchestration

• NOT a data custodian or data owner BUT supports data stewardship in federated context

• NOT a research funder BUT promotes & supports research (with caveats)

• Coordinate and guide, NOT prescribe, data intensive research and strategy

• Promote & support, NOT require, data contribution and adoption of Open Data and Open Science

• Coordinate, NOT prescribe, data research capacity development


What DIRISA is not

• Stakeholder engagement

– Workshops in CTN & PTA

– Survey of current data intensive research activities and needs

• Data stewardship & data intensive research strategic frameworks

• Architecture for federated data infrastructure

• DIRISA website

• NRF Open Access statement


So far...

Stakeholder engagement for coordinated data management and data intensive research

• Avoid or minimise duplication of efforts and resources

• Promote sound data stewardship practices

• Promote cross-disciplinary research collaboration

• Share or federate resources where feasible

• Consolidate training interventions

• Promote and advance projects that address priority issues

• Promote data intensive research activity


South African Data Alliance

• Deploy data services (Phase 1): Dropbox-like interface: upload/deposit, DOI registry, search & browse (Phase 2: VRE)

• RDM plan template

• RDM policy

• Research and capacity development with DST & NRF

• Call for Tier 2 data nodes
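
The Phase 1 data services listed above (upload/deposit, DOI registry, search & browse) can be pictured with a small sketch. This is only an illustration under stated assumptions, not DIRISA's implementation: the class names `Deposit` and `Repository`, the identifier pattern, and the placeholder prefix `10.5555` are all hypothetical, and everything is held in memory.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Deposit:
    """One deposited dataset with minimal descriptive metadata (hypothetical)."""
    doi: str
    title: str
    keywords: list[str]
    checksum: str  # SHA-256 hex digest of the uploaded bytes


@dataclass
class Repository:
    """Toy in-memory stand-in for a deposit service with a DOI registry."""
    prefix: str = "10.5555"  # placeholder prefix, not a real DOI allocation
    records: dict[str, Deposit] = field(default_factory=dict)

    def deposit(self, title: str, keywords: list[str], payload: bytes) -> Deposit:
        # Mint a sequential identifier under the placeholder prefix.
        doi = f"{self.prefix}/demo.{len(self.records) + 1:04d}"
        record = Deposit(doi, title, keywords, hashlib.sha256(payload).hexdigest())
        self.records[doi] = record
        return record

    def search(self, term: str) -> list[Deposit]:
        # Case-insensitive substring match against titles and keywords.
        term = term.lower()
        return [
            r for r in self.records.values()
            if term in r.title.lower() or any(term in k.lower() for k in r.keywords)
        ]

    def browse(self) -> list[str]:
        # All registered identifiers in sorted order.
        return sorted(self.records)


repo = Repository()
first = repo.deposit("Rainfall observations", ["climate", "rainfall"], b"station,mm\nA,12\n")
repo.deposit("Household survey", ["society"], b"id,size\n1,4\n")
print(first.doi, [r.title for r in repo.search("rain")], repo.browse())
```

A real service would register identifiers with a DOI agency and store files persistently; the checksum is kept here only to show where fixity information could sit alongside the metadata.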


Now:

• Research data lifecycle

– Observation / Generation

– …

– Preservation / Expunction

• Quality, (Re)usability

• Intellectual Property, Copyright, Licensing, Policy

• Identity & Stacking

• Ethics & Privacy

– Re-identification

– Discriminative profiling

– Who watches the watchers?

• Access, Trust & Security

– Laws have borders; data does not

• Infrastructure (institutional & technical)

• Persistence & Provenance

• Data sharing mind-set (What’s in it for me?)


Issues

Collection & formats

Organising & storing

Long term preservation

Ethics & IP

Sharing & re-use

Metadata


Mindset

• “You want me to give you my data?”

• “You want me to share my (hard-collected) data?”

• “But we’ve tried this before”

• “We’re ok, we don’t need help”

• “Why should we get involved?”

• “This won’t work”

• “Show us the money”

Thank you for your attention

avahed@csir.co.za


• CODATA Data Citation Task Group

• Board on Research Data and Information (BRDI)

• International Council for Scientific and Technical Information (ICSTI)

• DataCite

• The Dataverse Network

• National Information Standards Organization (NISO)

• Creative Commons and Science Commons

• CENDI – U.S. interagency group focused on scientific and technical information issues and coordination of activities

• Global Biodiversity Information Facility (GBIF)

• World Data System (WDS)

• STM Association (“Out of Cite, Out of Mind” publication)

• Digital Curation Centre, UK

• Research Data Alliance

• DataFirst (UCT)

• South African Data Archive (SADA)


Data Citation Initiatives


Capacity Development

Core & conversion modules

Data Science (Analytics, Visualisation, …)

Data Engineering: Technologies (Hadoop, MapReduce, …)

Data Stewardship: Management (Policy, Standards, …)

Specialisation


Innovation & Discovery

Support & Enablement

Infrastructure

Capacity Development

Research Communities

CHPC DIRISA

Services & Standards

SANReN

Advocacy & Outreach

Science strategies

Collaboration
