Data Intensive Research Initiative for South Africa A. Vahed Research Data Management Workshop Pretoria, 11 August 2015


Page 1: Data Intensive Research Initiative for South Africa

A. Vahed

Research Data Management Workshop

Pretoria, 11 August 2015

Page 2: Outline

• Background

• Objectives

• RDM tasks

• Activities & Outputs

• Organisational Structure & Implementation

11 August 2015 © CSIR, 2015 2

Page 3: NICIS

• National data integrative enabler supporting

– MTSF

– RDP

– SARIR,…

• Overarching coordination & national strategy

– National (Tier 1)

– Regional (Tier 2)

[Figure: NICIS structure. Users: Academia, Science councils, RIs. Core services and networked resources: Computing Services (CHPC +), Networking Services (SANReN), Data Services (DIRISA). Research domains: Materials & Manuf.; Energy; Earth & Environment; Phy Sci & Eng.; Humans & Society; Health, Bio & Food. Cross-cutting: Data intensive research environments (SA_Grid … Cloud); Skills & expertise. Layer labels: Physical, Service, Support, Application.]

Page 4: DIRISA

e-Research, e-Science

Integrated distributed cyber platform for

– Data Management

– Data Intensive Research

[Figure: the NICIS structure diagram from Page 3, repeated, including Data Services (DIRISA) among the core services.]

Page 5: Other views

D. Tildesley: Vision of integrated e-infrastructure ecosystem

[Figure labels: Industrial Sector Awareness; Capability; Connections; Computing & Data Skills; Sector Domain Knowledge; Computing; Data; Networks; Security; Software; Hardware.]

Page 6: Value

• Rich world of discipline, cross- and multi-disciplinary data analytics (enrichment, annotation,…)

• Harmonised world of data management (preservation, workflows,…)

• Federated world of data generators and sensory observations (models and measurements)

[Figure labels: Services & Environments; Data stewardship; Data Acquisition; Standards & Policies; Literature Sharing; Skills and expertise; Advocacy & Outreach.]

Page 7: Data landscape

• Extreme Data

– Global, massive, well-typed, homogeneous volumes

– LHC & SKA

• Research Big Data

– Large, mixed-typed volumes

– Imagery, text, audio, etc

• Business Big Data

– Lots of usually closed transactional, serialised data

– Social data (Facebook, Twitter, Google, etc)

• Long Tail Data

– Lots of (poorly managed) relatively small data sets

The underlying attributes are very different!

Page 8: Data class characteristics

Class | Ownership | Big Data Vs | Technology | Skills | Research Env

Extreme | International | Vol, Vel, Open | Exascale Comp | Maths / Stats / Astro, Visual | Distributed teams

Big Data – Business | Businesses | Vol, Vel, Var, Closed | Clusters, SAS, Cloud, Hadoop | Data Engineers | Team

Big Data – Research | National, Institutional | Vol, Vel, Var, Ver, “Open” access | HPC, Clusters, Grid, Cloud, data transfer | Data Scientists, Domain Researchers, Comp Scientists, Maths, Model | VRE, multi-disc, RIs

Long Tail | Department, Individual | Var, Ver | Grid, cloud | Stats, Comp Science | Individuals, PhD, PD, RIs
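The table above can be read as a small lookup structure. The following is a minimal sketch, not part of the original slides: the names `DataClass`, `DATA_CLASSES` and `classes_with` are hypothetical, only the Class, Ownership and Big Data Vs columns are transcribed, and the abbreviations Vol, Vel, Var and Ver are assumed to stand for volume, velocity, variety and veracity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataClass:
    """One row of the slide's table (illustrative field names)."""
    name: str
    ownership: tuple
    vs: tuple                  # the "Big Data Vs" column: Vol, Vel, Var, Ver
    openness: Optional[str]    # as stated on the slide; None where not given


# Rows transcribed from the slide; class names follow the "Data landscape" slide.
DATA_CLASSES = (
    DataClass("Extreme", ("International",), ("Vol", "Vel"), "Open"),
    DataClass("Business Big Data", ("Businesses",), ("Vol", "Vel", "Var"), "Closed"),
    DataClass("Research Big Data", ("National", "Institutional"),
              ("Vol", "Vel", "Var", "Ver"), '"Open" access'),
    DataClass("Long Tail", ("Department", "Individual"), ("Var", "Ver"), None),
)


def classes_with(v: str) -> list:
    """Return the names of the data classes whose Vs column lists `v`."""
    return [c.name for c in DATA_CLASSES if v in c.vs]


print(classes_with("Ver"))  # ['Research Big Data', 'Long Tail']
```

Querying by "Ver", for instance, picks out the two classes where the slide flags veracity as a defining attribute, which is one way to see the slide's point that the underlying attributes differ by class.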

Page 9: How

1. Robust infrastructure and services

– Federated Tier 1 & Tier 2 repositories

– Virtual research environments

2. Ensure good data stewardship

– Policies, protocols & standards

– Internationally benchmarked

3. Develop capacity & expertise

– Data intensive research

– Programmes with HEIs

4. Advocacy & outreach

– Data stewardship and data sharing

– Stakeholder engagement – forums (SADA)

5. Coordination & strategy

– National data intensive research activities

– Inform on and guide aligned & consolidated strategic agenda

[Figure labels: Data Stewardship; Infrastructure & Services; Policies & Standards; Capacity & Expertise; Advocacy & Outreach; Coordination & Strategy.]

Page 10: DIRISA Plan

Year 1 (2014/15): Survey & assess

• Institutional arrangements

• Infrastructure, policies & capacity

• Proposals

• Early adopters

Year 2 (2015/16): Develop & build

• Infrastructure & services

• Big projects

• Policies & processes

• Capacity building

Year 3 (2016/17): Grand research

• Federated network

• Open data & publishing

• Business partnerships

• E2E Data Mngt

Year 5+ (> 2017): Global competitive

• Beyond eResearch

• Long running & real-time science

• Fused & streamed data

Year 1

Action/Task

– Institutional arrangements

– Set up forums & events

– Engage & consult

– Survey, assess “As-Is” situation

– Prioritise areas & needs

– Coordinate new & ongoing projects

Outputs

– Tier 1 & core services

– Data stewardship policies & framework (RDA, etc)

– University data science programmes

– Solicited proposals in data stewardship

– Data intensive research strategy coordinated with funders, strategies and key initiatives

Page 11: What DIRISA is not

Role: national capstone orchestration

• NOT a data custodian or data owner BUT supports data stewardship in a federated context

• NOT a research funder BUT promotes & supports research (with caveats)

• Coordinate and guide, NOT prescribe, data intensive research and strategy

• Promote & support, NOT require, data contribution and adoption of Open Data and Open Science

• Coordinate, NOT prescribe, data research capacity development

Page 12: So far...

• Stakeholder engagement

– Workshops in CTN & PTA

– Survey of current data intensive research activities and needs

• Data stewardship & data intensive research strategic frameworks

• Architecture for federated data infrastructure

• DIRISA website

• NRF Open Access statement

Page 13: South African Data Alliance

Stakeholder engagement for coordinated data management and data intensive research

• Avoid or minimise duplication of efforts and resources

• Promote sound data stewardship practices

• Promote cross-disciplinary research collaboration

• Share or federate resources where feasible

• Consolidate training interventions

• Promote and advance projects that address priority issues

• Promote data intensive research activity

Page 14: Now

• Deploy data services (Phase 1): Dropbox-like interface with upload/deposit, DOI registry, search & browse (Phase 2: VRE)

• RDM plan template

• RDM policy

• Research and capacity development with DST & NRF

• Call for Tier 2 data nodes

Page 15: Issues

• Research data lifecycle

– Observation / Generation

– …

– Preservation / Expunction

• Quality, (Re)usability

• Intellectual Property, Copyright, Licensing, Policy

• Identity & Stacking

• Ethics & Privacy

– Re-identification

– Discriminative profiling

– Who watches the watchers?

• Access, Trust & Security

– Laws have borders; data does not

• Infrastructure (institutional & technical)

• Persistence & Provenance

• Data sharing mind-set (What’s in it for me?)

[Figure labels: Collection & formats; Organising & storing; Long term preservation; Ethics & IP; Sharing & re-use; Metadata.]

Page 16: Mindset

• “You want me to give you my data?”

• “You want me to share my (hard-collected) data?”

• “But we’ve tried this before”

• “We’re ok, we don’t need help”

• “Why should we get involved?”

• “This won’t work”

• “Show us the money”

Page 17: Thank you for your attention

Page 18: Data Citation Initiatives

• CODATA Data Citation Task Group

• Board on Research Data and Information (BRDI)

• International Council for Scientific and Technical Information (ICSTI)

• DataCite

• The Dataverse Network

• National Information Standards Organization (NISO)

• Creative Commons and Science Commons

• CENDI – U.S. interagency group focused on scientific and technical information issues and coordination of activities

• Global Biodiversity Information Facility (GBIF)

• World Data System (WDS)

• STM Association (“Out of Cite, Out of Mind” publication)

• Digital Curation Centre, UK

• Research Data Alliance

• DataFirst (UCT)

• South African Data Archive (SADA)

Page 19: Capacity Development

[Figure: Core & conversion modules, with three specialisation tracks: Data Science (Analytics, Visualisation, …); Data Engineering Technologies (Hadoop, MapReduce,…); Data Stewardship Management (Policy, Standards,…).]

Page 20

[Figure labels: Innovation & Discovery; Support & Enablement; Infrastructure; Capacity Development; Research Communities; CHPC; DIRISA; Services & Standards; SANReN; Advocacy & Outreach; Science strategies; Collaboration.]