esi supplemental webinar 2 - dataone presentation slides

DuraSpace/ARL/DLF E-Science Institute. DataONE: Tools and Approaches for Supporting the Data Life Cycle. Supplemental Webinar, Thursday, November 15, 2012, 1:00-2:30 pm EDT


DESCRIPTION

Presented by William Michener on 11-15-2012

TRANSCRIPT

Page 1: ESI Supplemental Webinar 2 - DataONE presentation slides

DuraSpace/ARL/DLF E-Science Institute

DataONE: Tools and Approaches for Supporting the Data Life Cycle

Supplemental Webinar

Thursday, November 15, 2012

1:00-2:30 pm EDT


Page 2: ESI Supplemental Webinar 2 - DataONE presentation slides

DataONE: Tools and Approaches for Supporting the Data Life Cycle

Presented by William Michener,

University of New Mexico

Professor and Director of e‐Science Initiatives for University Libraries


Page 3: ESI Supplemental Webinar 2 - DataONE presentation slides


Page 4: ESI Supplemental Webinar 2 - DataONE presentation slides

Three Key Challenges

Plan

Collect

Assure

Describe

Preserve

Discover

Integrate

Analyze

Innovation

Page 5: ESI Supplemental Webinar 2 - DataONE presentation slides

1. Data Preservation and Planning

✔ ?

Page 6: ESI Supplemental Webinar 2 - DataONE presentation slides


The Long Tail of Orphan Data

[Figure: volume plotted against rank frequency of datatype]

Specialized repositories (e.g. GenBank, PDB)

Orphan data

(B. Heidorn)

“Most of the bytes are at the high end, but most of the datasets are at the low end” – Jim Gray


Page 7: ESI Supplemental Webinar 2 - DataONE presentation slides

Planning ?

Metadata standard?

Data repository?

Page 8: ESI Supplemental Webinar 2 - DataONE presentation slides

Three major components for a flexible, scalable, sustainable network

Member Nodes
• diverse institutions
• serve local community
• provide resources for managing their data
• retain copies of data

DataONE and the DMPTool Support Data Preservation

Page 9: ESI Supplemental Webinar 2 - DataONE presentation slides

Three major components for a flexible, scalable, sustainable network

Member Nodes
• diverse institutions
• serve local community
• provide resources for managing their data
• retain copies of data

Coordinating Nodes
• retain complete metadata catalog
• indexing for search
• network‐wide services
• ensure content availability (preservation)
• replication services

DataONE and the DMPTool Support Data Preservation

Page 10: ESI Supplemental Webinar 2 - DataONE presentation slides

Three major components for a flexible, scalable, sustainable network

Member Nodes
• diverse institutions
• serve local community
• provide resources for managing their data
• retain copies of data

Coordinating Nodes
• retain complete metadata catalog
• indexing for search
• network‐wide services
• ensure content availability (preservation)
• replication services

Investigator Toolkit

DataONE and the DMPTool Support Data Preservation
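As a rough illustration of how the components listed above could fit together, the following Python sketch models Member Nodes that hold data and a Coordinating Node that keeps the metadata catalog, answers searches, and replicates content. It is a toy model under stated assumptions, not DataONE's implementation: the class names, method names, node names, and the identifier are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MemberNode:
    """Hypothetical Member Node: serves a local community and retains copies of data."""
    name: str
    objects: dict = field(default_factory=dict)  # identifier -> data bytes


@dataclass
class CoordinatingNode:
    """Hypothetical Coordinating Node: metadata catalog, search index, replication."""
    catalog: dict = field(default_factory=dict)  # identifier -> metadata dict
    holders: dict = field(default_factory=dict)  # identifier -> set of node names

    def register(self, node, identifier, data, metadata):
        """Store data on a Member Node and record its metadata in the catalog."""
        node.objects[identifier] = data
        self.catalog[identifier] = metadata
        self.holders.setdefault(identifier, set()).add(node.name)

    def search(self, keyword):
        """Return identifiers whose title contains the keyword (case-insensitive)."""
        needle = keyword.lower()
        return sorted(
            identifier
            for identifier, metadata in self.catalog.items()
            if needle in metadata.get("title", "").lower()
        )

    def replicate(self, identifier, source, targets, copies=2):
        """Copy an object to other Member Nodes until `copies` nodes hold it."""
        for target in targets:
            if len(self.holders[identifier]) >= copies:
                break
            if target.name not in self.holders[identifier]:
                target.objects[identifier] = source.objects[identifier]
                self.holders[identifier].add(target.name)
        return sorted(self.holders[identifier])


# Example usage with made-up node names and a made-up identifier.
node_a, node_b, node_c = MemberNode("mn-a"), MemberNode("mn-b"), MemberNode("mn-c")
cn = CoordinatingNode()
cn.register(node_a, "example:1", b"site,count\nA,3\n", {"title": "Bird counts"})
found = cn.search("bird")
holders = cn.replicate("example:1", node_a, [node_b, node_c], copies=2)
```

The sketch keeps data on the Member Nodes and only metadata on the Coordinating Node, which mirrors the division of responsibilities in the bullets; the replication policy (stop once two nodes hold a copy) is an arbitrary choice made for the example.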

Page 11: ESI Supplemental Webinar 2 - DataONE presentation slides

Dryad (>3,000 data products)

Coordinated submission of articles and underlying data

Handshaking with specialized repositories

Promotion of reuse and incentives for deposit


Page 12: ESI Supplemental Webinar 2 - DataONE presentation slides

Knowledge Network for Biocomplexity (20,000+ data packages)

Contributors
• Individual investigators
• Field stations and networks
• Government agencies
• Non‐profit partnerships
• Synthesis centers

Data Types
• Ecological
• Environmental
• Demographic
• Social/Legal/Economic

[Figure: Data Sizes (MB) in bins < 1, 1‐10, 10‐200, >200; y-axis in %, 0 to 60]

Page 13: ESI Supplemental Webinar 2 - DataONE presentation slides

✔ Check for best practices
✔ Create metadata
✔ Connect to ONEShare

Data & Metadata (EML)

Page 14: ESI Supplemental Webinar 2 - DataONE presentation slides

Page 15: ESI Supplemental Webinar 2 - DataONE presentation slides

Page 16: ESI Supplemental Webinar 2 - DataONE presentation slides

Page 17: ESI Supplemental Webinar 2 - DataONE presentation slides

Page 18: ESI Supplemental Webinar 2 - DataONE presentation slides

Page 19: ESI Supplemental Webinar 2 - DataONE presentation slides

Page 20: ESI Supplemental Webinar 2 - DataONE presentation slides

Page 21: ESI Supplemental Webinar 2 - DataONE presentation slides

Page 22: ESI Supplemental Webinar 2 - DataONE presentation slides

Page 23: ESI Supplemental Webinar 2 - DataONE presentation slides

Page 24: ESI Supplemental Webinar 2 - DataONE presentation slides

Page 25: ESI Supplemental Webinar 2 - DataONE presentation slides

Page 26: ESI Supplemental Webinar 2 - DataONE presentation slides

2. Data Discovery


Page 27: ESI Supplemental Webinar 2 - DataONE presentation slides

Data Silos


Page 28: ESI Supplemental Webinar 2 - DataONE presentation slides

The DataONE Federation


Page 29: ESI Supplemental Webinar 2 - DataONE presentation slides

Member Node Functional Tiers

• Tier 1: Read only, public content
  ping(), getLogRecords(), getCapabilities(), get(), getSystemMetadata(), getChecksum(), listObjects(), synchronizationFailed()

• Tier 2: Read only, with access control
  isAuthorized(), setAccessPolicy()

• Tier 3: Read/Write using client tools
  create(), update(), delete()

• Tier 4: Able to operate as a replication target
  replicate(), getReplica()

• http://mule1.dataone.org/ArchitectureDocs-current/apis/MN_APIs.html
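The tier list can be read as a lookup table from tier number to required methods. The Python sketch below encodes that table and reports the highest tier a node would satisfy given the methods it supports. The method names are copied from the slide; the function names are hypothetical, and the rule that each tier also requires every lower tier is an assumption made for this example. No DataONE library or network call is involved.

```python
# Method names per tier, as listed on the slide.
TIER_METHODS = {
    1: {"ping", "getLogRecords", "getCapabilities", "get",
        "getSystemMetadata", "getChecksum", "listObjects",
        "synchronizationFailed"},
    2: {"isAuthorized", "setAccessPolicy"},
    3: {"create", "update", "delete"},
    4: {"replicate", "getReplica"},
}


def highest_tier(supported):
    """Return the highest tier whose methods, and all lower tiers' methods, are supported.

    Assumes tiers are cumulative; returns 0 if Tier 1 is incomplete.
    """
    supported = set(supported)
    tier = 0
    for level in sorted(TIER_METHODS):
        if not TIER_METHODS[level] <= supported:
            break
        tier = level
    return tier


def missing_for_tier(supported, target):
    """List the methods still needed to reach `target`, in alphabetical order."""
    required = set().union(*(TIER_METHODS[t] for t in range(1, target + 1)))
    return sorted(required - set(supported))


# Example: a hypothetical node that implements only the Tier 1 methods.
read_only_node = set(TIER_METHODS[1])
current = highest_tier(read_only_node)
todo = missing_for_tier(read_only_node, 2)
```

A table-driven check like this makes it straightforward to see which methods a repository would still need before moving up a tier.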

Page 30: ESI Supplemental Webinar 2 - DataONE presentation slides

ORNL DAAC as a DataONE Member Node

NASA collectors DAAC Users (UWG)

DataONE Users

Investigator Toolkit

Page 31: ESI Supplemental Webinar 2 - DataONE presentation slides

Page 32: ESI Supplemental Webinar 2 - DataONE presentation slides

Page 33: ESI Supplemental Webinar 2 - DataONE presentation slides

Page 34: ESI Supplemental Webinar 2 - DataONE presentation slides

Page 35: ESI Supplemental Webinar 2 - DataONE presentation slides

Page 36: ESI Supplemental Webinar 2 - DataONE presentation slides

3. Innovation

The Fourth Paradigm:
1. Observational and experimental
2. Theoretical research
3. Computer simulations of natural phenomena
4. Data‐intensive research
• new tools, techniques, and ways of working

Page 37: ESI Supplemental Webinar 2 - DataONE presentation slides

“Data Intensive Science” and the “80:20 Rule”

Decreasing Spatial Coverage

Increasing Process Knowledge

Remote sensing

Intensive science sites and experiments

Extensive science sites

Volunteer & education networks

Adapted from CENR‐OSTP

Page 38: ESI Supplemental Webinar 2 - DataONE presentation slides

Kepler

DMP-Tool

Investigator Toolkit Support 

Plan

Collect

Assure

Describe

Preserve

Discover

Integrate

Analyze


Page 39: ESI Supplemental Webinar 2 - DataONE presentation slides

Spatio‐Temporal Exploratory Model identifies factors affecting patterns of migration

Diverse bird observations and environmental data from 300,000 locations in the US, integrated and analyzed using high-performance computing resources

Land Cover

Meteorology

MODIS – Remote sensing data

• Examine patterns of migration 

• Infer how climate change may affect bird migration

Model results

Occurrence of Indigo Bunting (2008): Jan, Apr, Jun, Sep, Dec

Exploration, Visualization, and Analysis

Page 40: ESI Supplemental Webinar 2 - DataONE presentation slides

Scientific workflows


Page 41: ESI Supplemental Webinar 2 - DataONE presentation slides

Workflows Evolution with VisTrails

Page 42: ESI Supplemental Webinar 2 - DataONE presentation slides

Collaboration environments


Page 43: ESI Supplemental Webinar 2 - DataONE presentation slides

Taverna, MyExperiment

Page 44: ESI Supplemental Webinar 2 - DataONE presentation slides

Community Engagement


Page 45: ESI Supplemental Webinar 2 - DataONE presentation slides

User Assessments

Year 1 Year 2 Year 3 Year 4 Year 5

Scientists: BL, FU
Librarians: BL, FU
Policy Makers: BL, FU
Educators: BL, FU
Library Policies: BL, FU

Page 46: ESI Supplemental Webinar 2 - DataONE presentation slides

Results

• “More than half of the respondents (56%) reported that they did not use any metadata standard and about 22% of respondents indicated they used their own lab metadata standard.”

• Less than 6% of scientists are making “All” of their data available via some mechanism.

Page 47: ESI Supplemental Webinar 2 - DataONE presentation slides

Community Engagement


Page 48: ESI Supplemental Webinar 2 - DataONE presentation slides

Best Practices and Software Tools


Page 49: ESI Supplemental Webinar 2 - DataONE presentation slides

Best Practices and Software Tools


Page 50: ESI Supplemental Webinar 2 - DataONE presentation slides

June 3-21, 2013

University of New Mexico

Page 51: ESI Supplemental Webinar 2 - DataONE presentation slides

DataONE: Supporting Scientific Data Preservation, Discovery, and Innovation 


Page 52: ESI Supplemental Webinar 2 - DataONE presentation slides

Recommendations

• 9 areas where you can help researchers

Page 53: ESI Supplemental Webinar 2 - DataONE presentation slides

1. Plan ‐ https://dmp.cdlib.org


Page 54: ESI Supplemental Webinar 2 - DataONE presentation slides

2. Collect and assure the data

http://www.dataone.org/best-practices

Page 55: ESI Supplemental Webinar 2 - DataONE presentation slides

3. Describe and document the data

http://metavist2.codeplex.com/

http://knb.ecoinformatics.org/morphoportal.jsp


Page 56: ESI Supplemental Webinar 2 - DataONE presentation slides

4. Select a repository for the data

http://databib.org/

http://www.dataone.org/best-practices

http://www.opendoar.org/

Page 57: ESI Supplemental Webinar 2 - DataONE presentation slides

5. Preserve the data

http://daac.ornl.gov/PI/BestPractices-2010.pdf

Page 58: ESI Supplemental Webinar 2 - DataONE presentation slides

6. Use the data

http://www.nutnet.umn.edu/

Page 59: ESI Supplemental Webinar 2 - DataONE presentation slides

7. Budget for it – 10 to 25% of total budget
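To make the budgeting guideline concrete, the short Python sketch below computes the data management share for a project. It assumes the slide's figures describe a range of 10% to 25% of the total; the function name and the $500,000 project total are hypothetical and chosen only for illustration.

```python
def data_management_budget(total_budget, low_pct=10, high_pct=25):
    """Return the (low, high) data management amounts as shares of the total budget."""
    if total_budget < 0:
        raise ValueError("total_budget must be non-negative")
    return total_budget * low_pct / 100, total_budget * high_pct / 100


# Hypothetical project total, in dollars.
low, high = data_management_budget(500_000)
```

For this made-up project, the guideline would set aside between $50,000 and $125,000 for data management.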


Page 60: ESI Supplemental Webinar 2 - DataONE presentation slides

8. Communicate (early and often)

Meetings, web portals, newsletters, phone and video conferences

Page 61: ESI Supplemental Webinar 2 - DataONE presentation slides

9. Train (in‐person and/or virtually)


Page 62: ESI Supplemental Webinar 2 - DataONE presentation slides

DataONE.org


Page 63: ESI Supplemental Webinar 2 - DataONE presentation slides

DataONE Team and Sponsors

• Bertram Ludaescher
• Deborah McGuinness
• Jeff Horsburgh
• Robert Sandusky
• Peter Honeyman
• Carole Goble
• Cliff Duke
• Donald Hobern
• Ewa Deelman
• Amber Budden, Roger Dahl, Rebecca Koskela, Bill Michener, Robert Nahf, Skye Roseboom, Mark Servilla
• Patricia Cruse, John Kunze
• Dave Vieglais
• Paul Allen, Rick Bonney, Steve Kelling
• Stephanie Hampton, Chris Jones, Matt Jones, Ben Leinfelder, Andrew Pippin
• Suzie Allard, Nick Dexter, Kimberly Douglass, Carol Tenopir, Robert Waltz, Bruce Wilson
• John Cobb, Bob Cook, Ranjeet Devarakonda, Giri Palanisamy, Line Pouchard
• Sky Bristol, Mike Frame, Richard Huffine, Viv Hutchison, Jeff Morisette, Jake Weltzin, Lisa Zolly
• David DeRoure
• Ryan Scherle, Todd Vision
• Randy Butler

LEON LEVY FOUNDATION

Page 64: ESI Supplemental Webinar 2 - DataONE presentation slides


Questions?