NISO Forum, Denver, Sept. 24, 2012: Scientific discovery and innovation in an era of data-intensive...

DESCRIPTION

Scientific discovery and innovation in an era of data-intensive science

William (Bill) Michener, Professor and Director of e-Science Initiatives for University Libraries, University of New Mexico; DataONE Principal Investigator

The scope and nature of biological, environmental, and earth sciences research are evolving rapidly in response to environmental challenges such as global climate change, invasive species, and emergent diseases. Scientific studies increasingly focus on long-term, broad-scale, and complex questions that require massive amounts of diverse data collected by remote sensing platforms and embedded environmental sensor networks; collaborative, interdisciplinary science teams; and new tools that promote scientific data preservation, discovery, and innovation. This talk describes the challenges facing scientists as they transition into this new era of data-intensive science, presents current solutions, and lays out a roadmap to a future in which new information technologies significantly increase the pace of scientific discovery and innovation.

TRANSCRIPT

Page 1: NISO Forum, Denver, Sept. 24, 2012: Scientific discovery and innovation in an era of data-intensive science

Data Observation Network for Earth (DataONE): Supporting Scientific Data Preservation, Discovery, and Innovation

Bill Michener

Professor and DataONE Project Director

University of New Mexico

24 September 2012

National Information Standards Organization

Page 3

Research and Data Life Cycle Integration

Plan

Collect

Assure

Describe

Preserve

Discover

Integrate

Analyze

Proposal writing

Research

Publication

Ideas

?

?

Page 4

Three Key Challenges

Plan

Collect

Assure

Describe

Preserve

Discover

Integrate

Analyze

[Figure: life cycle diagram with challenge markers 1, 2, and 3 and a bracket labeled "Innovation"]

Page 5

1. Data Preservation and Planning

✔ ?

Page 6

The Long Tail of Orphan Data

[Figure axes: Volume; Rank frequency of datatype]

Specialized repositories(e.g. GenBank, PDB)

Orphan data

(B. Heidorn)

“Most of the bytes are at the high end, but most of the datasets are at the low end” – Jim Gray

Page 7

Planning?

Metadata standard? Data repository?

Page 8

Three major components for a flexible, scalable, sustainable network

Member Nodes
• diverse institutions
• serve local community
• provide resources for managing their data
• retain copies of data

Coordinating Nodes
• retain complete metadata catalog
• indexing for search
• network-wide services
• ensure content availability (preservation)
• replication services

Investigator Toolkit

DataONE and the DMPTool Support Data Preservation
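The division of labor between Member Nodes and Coordinating Nodes on this slide can be illustrated with a small sketch. It is not DataONE code: the class names, method names, node names, and identifier are all hypothetical, and the only thing taken from the slide is the split of responsibilities (Member Nodes retain data copies; Coordinating Nodes retain the metadata catalog, support search, and handle replication).

```python
from dataclasses import dataclass, field


@dataclass
class MemberNode:
    """A repository that serves a local community and holds data copies."""
    name: str
    objects: dict = field(default_factory=dict)  # identifier -> data bytes


@dataclass
class CoordinatingNode:
    """Keeps the metadata catalog, a simple search, and replica locations."""
    catalog: dict = field(default_factory=dict)    # identifier -> metadata dict
    locations: dict = field(default_factory=dict)  # identifier -> set of node names

    def register(self, node, identifier, data, metadata):
        # The Member Node keeps the data; the Coordinating Node keeps metadata.
        node.objects[identifier] = data
        self.catalog[identifier] = metadata
        self.locations.setdefault(identifier, set()).add(node.name)

    def replicate(self, identifier, source, target):
        # Copy the object to a second Member Node and record the new location.
        target.objects[identifier] = source.objects[identifier]
        self.locations[identifier].add(target.name)

    def search(self, keyword):
        # Search runs against the catalog only; no data files are read.
        return sorted(
            identifier
            for identifier, metadata in self.catalog.items()
            if keyword in metadata.get("keywords", [])
        )


# Hypothetical nodes and content, for illustration only.
node_a = MemberNode("MN-A")
node_b = MemberNode("MN-B")
cn = CoordinatingNode()
cn.register(node_a, "example.1", b"site,biomass\nA,12.5\n", {"keywords": ["biomass"]})
cn.replicate("example.1", source=node_a, target=node_b)
print(cn.search("biomass"), sorted(cn.locations["example.1"]))
```

Keeping search on the catalog rather than on the data is the design point the sketch tries to show: a query can be answered without contacting every repository, and a replica on a second node preserves the content if the first node becomes unavailable.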

Page 9

Dryad (>3,000 data products)

Coordinated submission of articles and underlying data

Handshaking with specialized repositories

Promotion of reuse and incentives for deposit

Page 10

Knowledge Network for Biocomplexity (20,000+ data packages)

Contributors
• Individual investigators
• Field stations and networks
• Government agencies
• Non-profit partnerships
• Synthesis centers

Data Types
• Ecological
• Environmental
• Demographic
• Social/Legal/Economic

[Chart: Data Sizes (%), by size class: < 1, 1-10, 10-200, > 200; axis 0 to 60; label: 10MB]

Page 11

✔ Check for best practices
✔ Create metadata
✔ Connect to ONEShare

Data & Metadata (EML)

Page 12

Data Management Planning Tool

Page 15

2. Data Discovery

Page 16

Data Silos

Page 17

The DataONE Federation

Page 18

Member Node Functional Tiers

Tier 1: Read only, public content
ping(), getLogRecords(), getCapabilities(), get(), getSystemMetadata(), getChecksum(), listObjects(), synchronizationFailed()

Tier 2: Read only, with access control
isAuthorized(), setAccessPolicy()

Tier 3: Read/Write using client tools
create(), update(), delete()

Tier 4: Able to operate as a replication target
replicate(), getReplica()

http://mule1.dataone.org/ArchitectureDocs-current/apis/MN_APIs.html
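The tier listing on this slide can be read as a lookup table from tier number to required methods. The sketch below encodes it that way and reports the highest tier a node satisfies. It assumes the tiers are cumulative (a Tier 3 node also implements everything in Tiers 1 and 2), which the slide suggests but does not state; the function name `highest_tier` is hypothetical and is not part of any DataONE library.

```python
# Method names are copied from the slide; the cumulative reading is an assumption.
TIER_METHODS = {
    1: {"ping", "getLogRecords", "getCapabilities", "get",
        "getSystemMetadata", "getChecksum", "listObjects",
        "synchronizationFailed"},
    2: {"isAuthorized", "setAccessPolicy"},
    3: {"create", "update", "delete"},
    4: {"replicate", "getReplica"},
}


def highest_tier(supported_methods):
    """Return the highest tier whose methods, and all lower tiers', are supported."""
    supported = set(supported_methods)
    tier = 0  # 0 means the node does not yet meet Tier 1
    for level in sorted(TIER_METHODS):
        if TIER_METHODS[level] <= supported:
            tier = level
        else:
            break  # a gap at this level rules out every higher tier
    return tier


read_only_node = TIER_METHODS[1] | TIER_METHODS[2]
print(highest_tier(read_only_node))
```

A node that implements the Tier 3 write methods but skips access control would still be reported as Tier 1 under this reading, because the loop stops at the first incomplete tier.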

Page 19

NASA collectors DAAC Users (UWG)

DataONE Users

ORNL DAAC as a DataONE Member Node

Investigator Toolkit

Page 25

1. Ontology-based discovery search results

Concepts acquire context: biomass as Material or biomass as Energy

Additional search terms

Super-classes may have different properties

1. NCBO ontology repository instance
2. Populated with ontologies (e.g., the NASA-JPL Semantic Web for Earth and Environmental Terminology)
3. Queried ontologies and returned results using REST services
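The idea that a concept acquires context, and that the context yields additional search terms, can be sketched with a toy lookup. Everything in the table below is invented for illustration: the related terms are not taken from the ontologies named on the slide, and the function `expand_query` is hypothetical. The slide's own implementation queried an ontology repository over REST; this sketch replaces that call with an in-memory dictionary.

```python
# Toy ontology: the related terms are made up for this example.
ONTOLOGY = {
    "biomass": [
        {"context": "Material", "related": ["plant matter", "primary production"]},
        {"context": "Energy", "related": ["biofuel", "bioenergy"]},
    ],
}


def expand_query(term, context=None):
    """Return the term plus related terms, optionally limited to one context."""
    senses = ONTOLOGY.get(term.lower(), [])
    if context is not None:
        senses = [sense for sense in senses if sense["context"] == context]
    terms = [term]
    for sense in senses:
        for related in sense["related"]:
            if related not in terms:
                terms.append(related)
    return terms


print(expand_query("biomass", context="Energy"))
print(expand_query("biomass"))
```

Choosing a context narrows the suggestions to one sense of the word, which is the behavior the slide describes for "biomass as Material or biomass as Energy".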

Page 26

Actual Keywords
1. canopy characteristics
2. field investigation
3. vegetation index
4. leaf characteristics
5. Satellite
6. land cover
7. leaf area meter
8. Reflectance
9. steel measuring tape
10. vegetative cover
11. plant characteristics
12. albedo

Suggested Keywords
[1] field investigation
[2] analysis
[3] land cover
[4] computational model
[5] reflectance
[6] vegetative cover
[7] biomass
[8] primary production
[9] steel measuring tape
[10] weigh balance
[11] precipitation amount
[12] canopy characteristics
[13] leaf characteristics
[14] water vapor
[15] quadrat sample frame
[16] rain gauge
[17] surface air temperature
[18] air temperature
[19] meteorological station
[20] human observer
[21] vegetation index
[22] soil core device
[23] plant characteristics
[24] surface wind
[25] albedo
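The two keyword lists above can be compared directly to see what the suggestions keep, add, and drop. The sketch below uses the keywords exactly as listed and compares them case-insensitively; the variable names are arbitrary, and the comparison itself is an illustration rather than the method used to produce the suggestions.

```python
actual = [
    "canopy characteristics", "field investigation", "vegetation index",
    "leaf characteristics", "Satellite", "land cover", "leaf area meter",
    "Reflectance", "steel measuring tape", "vegetative cover",
    "plant characteristics", "albedo",
]
suggested = [
    "field investigation", "analysis", "land cover", "computational model",
    "reflectance", "vegetative cover", "biomass", "primary production",
    "steel measuring tape", "weigh balance", "precipitation amount",
    "canopy characteristics", "leaf characteristics", "water vapor",
    "quadrat sample frame", "rain gauge", "surface air temperature",
    "air temperature", "meteorological station", "human observer",
    "vegetation index", "soil core device", "plant characteristics",
    "surface wind", "albedo",
]

# Normalize case so "Reflectance" and "reflectance" count as the same keyword.
actual_set = {keyword.lower() for keyword in actual}
suggested_set = {keyword.lower() for keyword in suggested}

retained = actual_set & suggested_set   # in both lists
added = suggested_set - actual_set      # new in the suggestions
dropped = actual_set - suggested_set    # original keywords not suggested

print(len(retained), len(added), sorted(dropped))
```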

                            DAAC    DRYAD   KNB
Number of Documents         978     1,729   24,249
Total Number of Keywords    7,294   8,266   254,525
Average Keywords/Document   7.46    4.78    10.49

[Bar chart: DAAC, DRYAD, KNB; axis 0 to 12]

Approach 2: Enrich MN Metadata
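The "Average Keywords/Document" row on this slide is the total keyword count divided by the number of documents for each repository. A short check of that arithmetic, using only the document and keyword counts reported on the slide, is sketched below; the dictionary layout and function name are arbitrary.

```python
# (number of documents, total number of keywords), as reported on the slide.
CORPORA = {
    "DAAC": (978, 7294),
    "DRYAD": (1729, 8266),
    "KNB": (24249, 254525),
}


def average_keywords(documents, keywords):
    """Mean number of keywords per document."""
    return keywords / documents


averages = {
    name: round(average_keywords(documents, keywords), 2)
    for name, (documents, keywords) in CORPORA.items()
}
for name, value in averages.items():
    print(f"{name}: {value:.2f}")
```

Recomputing from the reported totals reproduces 7.46 for DAAC and 4.78 for DRYAD. For KNB the quotient is about 10.496, which rounds to 10.50 rather than the 10.49 shown on the slide; the slide value may have been truncated or computed from slightly different counts.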

Page 27

3. Innovation

The Fourth Paradigm:
1. Observational and experimental
2. Theoretical research
3. Computer simulations of natural phenomena
4. Data-intensive research
• new tools, techniques, and ways of working

Page 28

“Data Intensive Science” and the “80:20 Rule”

[Figure axes: Decreasing Spatial Coverage; Increasing Process Knowledge]

Remote sensing

Intensive science sites and experiments

Extensive science sites

Volunteer & education networks

Adapted from CENR-OSTP

Page 29

Public Participation in Scientific Research Conference: 4-5 August 2012 in Portland, Oregon USA prior to Ecological Society of America meeting (6-10 Aug.): http://www.birds.cornell.edu/citscitoolkit/conference/2012

Page 30

Kepler

DMP-Tool

Investigator Toolkit Support

Plan

Collect

Assure

Describe

Preserve

Discover

Integrate

Analyze

Page 31

Spatio-Temporal Exploratory Model identifies factors affecting patterns of migration

Diverse bird observations and environmental data from 300,00 locations in the US integrated and analyzed using High Performance Computing Resources

Land Cover

Meteorology

MODIS – Remote sensing data

• Examine patterns of migration

• Infer how climate change may affect bird migration

Model results

Occurrence of Indigo Bunting (2008)

Jan Apr Jun Sep Dec

Exploration, Visualization, and Analysis

Page 32

Taverna, MyExperiment

Page 33

Provenance Browser

Page 34

DataONE: Supporting Scientific Data Preservation, Discovery, and Innovation

Current Member Nodes:

Coming Soon: Current Tools:

Tools Coming Soon: Queensland University of Technology

Page 35

2009 2010 2011 2012 2013 2014

Deployment Targets – Y5

Y1 Y2 Y3 Y4 Y5

Metadata Objects 100k (130k) 400k 1M

Datasets 90k (120k) 180k 360k

Uptime 99.0 (100) 99.9 99.9

Metadata Schemas 8 (4) 8 8

Member Nodes 10 (8) 20 40

MN Countries 3 (2) 5 10

Coordinating Nodes 3 (3) 4 5

CN Countries 1 (1) 1 2

ITK Tools 8 (4) 10 12

Page 36

Community Engagement

Page 37

Year 1 Year 2 Year 3 Year 4 Year 5

Scientists: BL

User Assessments

Scientists: FU

Librarians: BL Librarians: FU

Policy Makers: BL Policy Makers: FU

Educators: BL Educators: FU

Library Policies: BL Library Policies: FU

Page 38

Community Engagement

Page 39

Best Practices and Software Tools

Page 40

June 3-21, 2013

University of New Mexico

Page 41

Internships

https://notebooks.dataone.org/summer2012/

2009 – 4 interns, 2010 – 4 interns, 2011 – 8 interns, 2012 – 6 interns

Page 42

DataONE: Supporting Scientific Data Preservation, Discovery, and Innovation

Page 43

DataONE.org

Page 44

DataONE Team and Sponsors

• Bertram Ludaescher

• Deborah McGuinness

• Jeff Horsburgh

• Robert Sandusky

• Peter Honeyman

• Carole Goble

• Cliff Duke

• Donald Hobern

• Ewa Deelman

• Amber Budden, Roger Dahl, Rebecca Koskela, Bill Michener, Robert Nahf, Skye Roseboom, Mark Servilla

• Patricia Cruse, John Kunze

• Dave Vieglais

• Paul Allen, Rick Bonney, Steve Kelling

• Stephanie Hampton, Chris Jones, Matt Jones, Ben Leinfelder, Andrew Pippin

• Suzie Allard, Nick Dexter, Kimberly Douglass, Carol Tenopir, Robert Waltz, Bruce Wilson

• John Cobb, Bob Cook, Ranjeet Devarakonda, Giri Palanisamy, Line Pouchard

• Sky Bristol, Mike Frame, Richard Huffine, Viv Hutchison, Jeff Morisette, Jake Weltzin, Lisa Zolly

• David DeRoure

• Ryan Scherle, Todd Vision

LEON LEVY FOUNDATION

• Randy Butler