Some Ideas on Making Research Data Discoverable and Usable: "It's the Metadata, Stupid!"

The Metadata [R]evolution: Transformative Opportunities, September 18, 2013. Some Ideas on Making Research Data Discoverable and Usable: “It’s the Metadata, Stupid!” Anita de Waard, VP Research Data Collaborations, Elsevier Research Data Services (VT)

Upload: anita-de-waard

Post on 10-May-2015

DESCRIPTION

Talk at OCLC Collective Insight symposium, Johns Hopkins, Baltimore, MD, September 18, 2013

TRANSCRIPT

Page 1: Some Ideas on Making Research Data: "It's the Metadata, stupid!"

The Metadata [R]evolution: Transformative Opportunities, September 18, 2013

Some Ideas on Making Research Data Discoverable and Usable:

“It’s the Metadata, Stupid!”

Anita de Waard, VP Research Data Collaborations, Elsevier Research Data Services (VT)

Page 2

Everybody’s talking about research data:

Roles and needs wrt Research Data:

Stakeholders: Gov; Funding bodies; University management; Researchers; Librarians; Data Repositories

Needs (grouped as on the slide):
• Share research outputs; Demonstrate impact to public; Data availability drives growth
• Demonstrate impact; Guarantee permanence, discoverability; Avoid fraud
• Generate, track outputs; Comply with mandates; Ensure availability
• Archive, track, curate; Support researcher/institution
• Archive; Add curation; Allow reuse
• Derive credit; Comply with mandates; Discover and use; Cite/acknowledge

Todd Vision, DataDryad, OAI8, 6/23/13: “We need to find a way to keep Dryad funded, and would love to hear your ideas about doing that.”

Phil Bourne, Associate Vice Chancellor, UCSD, 4/13: “We are thinking about the university as a digital enterprise.”

Mike Huerta, Assoc. Director NLM Office of Health Info at NIH, 6/13: “Today, the major public product of science are concepts, written down in papers. But tomorrow, data will be the main product of science…. We will require scientists to track and share their data at least as well, if not better, than they are sharing their ideas today.”

Mara Saule, Dean University Libraries/CIO, UVM, 5/13: “We need to do something about data.”

Nathan Urban, PI Urban Lab, CMU, 3/13: “If we can share our data, we can write a paper that will knock everybody’s socks off!”

Barbara Ransom, NSF Program Director Earth Sciences, 2/13: “We’re not going to spend any more money for you to go out and get more data! We want you first to show us how you’re going to use all the data we paid y’all to collect in the past!”

Page 3

Where research data goes now:

> 50 M papers; 2 M scientists; 2 M papers/year

• Majority of data (90%?) is stored on local hard drives
• Some data (8%?) stored in large, generic data repositories: Dryad: 7,631 files; Dataverse: 0.6 M; Institutional Repositories
• A small portion of data (1-2%?) stored in small, topic-focused data repositories: MiRB: 25 k; PetDB: 1.5 k; TAIR: 72.1 k; PDB: 88.3 k; SedDB: 0.6 k

How do we get researchers to curate, store and share their data?

How do we ensure long-term sustainability for high-end repositories?

What role do libraries/institutions play?

Page 7

Research data management in action:

Using antibodies and squishy bits, Grad Students experiment and enter details into their lab notebook. The PI then tries to make sense of their slides, and writes a paper. End of story.

Page 14

de Waard, A., Burton, S. et al., 2013

An attempt to get researchers to curate (but only partially share!) their data:

Page 16

What to do in the meantime:

[Figure labels: 49 publications; 193 publications; 76 publications; 214 publications; 210 publications]

• In 220 publications only 40% of antibodies, 40% of cell lines and 25% of constructs can be manually identified (Vasilevsky et al, submitted)
• Proposal (with NIH/NIF and Force11 Group):
– Adding minimal data standards
– Tool extracts likely reagents / resources
– User interface asks author to confirm or select
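
The extract-then-confirm idea in the proposal can be pictured with a small sketch. This is only an illustration under loud assumptions: the regular expression, the `Candidate` record, and the `extract_candidates` and `confirm` helpers are all hypothetical and are not the tool discussed with NIH/NIF and Force11. The pattern assumes a reagent is written as a keyword followed by a parenthesised vendor and catalog number, which real manuscripts often do not do.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern: a reagent keyword followed, within the same sentence,
# by "(vendor, catalog number)". Real methods sections are far messier.
CANDIDATE = re.compile(
    r"(?P<kind>antibod(?:y|ies)|cell line|construct)[^.()]*"
    r"\((?P<vendor>[^,()]+),\s*(?P<catalog>[^()]+)\)",
    re.IGNORECASE,
)


@dataclass
class Candidate:
    kind: str
    vendor: str
    catalog: str
    confirmed: bool = False


def extract_candidates(text: str) -> list[Candidate]:
    """Step 1: propose likely reagents / resources found in the manuscript text."""
    return [
        Candidate(m["kind"].lower(), m["vendor"].strip(), m["catalog"].strip())
        for m in CANDIDATE.finditer(text)
    ]


def confirm(candidates, accept):
    """Step 2: stand-in for the user interface; `accept` plays the author."""
    for c in candidates:
        c.confirmed = bool(accept(c))
    return [c for c in candidates if c.confirmed]


if __name__ == "__main__":
    # Made-up sentence with a made-up vendor and catalog number.
    methods = (
        "Sections were stained with an anti-GFAP antibody (ExampleVendor, EX-123). "
        "A GFP construct was also used."
    )
    for c in confirm(extract_candidates(methods), accept=lambda c: True):
        print(c)
```

In this sketch the second sentence yields no candidate because it names no vendor or catalog number, which is exactly the kind of under-specified resource the minimal data standards are meant to address.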

Page 17

How can research databases become sustainable in the long term?

1. With IEDA:
– Building a database for lunar geochemistry
– Write joint report on building repository, curation costs and challenges

2. With WDS/RDA WG:
– Planning survey of cost recovery models
– Input/inspiration: ICPSR Sloan-funded project ‘Sustaining Domain Repositories for Digital Data’
– Developing overarching funding model with Todd Vision/DataDryad

Page 18

Making lunar sample data usable:

Page 22

A research database funding model:

DRAFT - CC-BY-NC 2013, Todd Vision & Anita de Waard

[Diagram; recoverable labels:]
• Models: Private store; Collaboration Conclave; Data publication; Subscription content; Commercial overlay
• Access: Closed; Limited; Public; Academic Use/Limited
• Flow of funds: Data producer or sponsor; Service; Data user
• Examples: Commercial and institutional storage; many small operations, e.g. try-db.org, plhdb.org; Dryad, arXiv, PDB; ICPSR, CERN-LHC; KEGG, GeoFacets, Reaxys
• Connectors: &, or

Page 23

Comparing data repository types:

• Local data repository
– Advantages: Easy! No one steals your data.
– Disadvantages: No one sees it. Not compliant with requirements
• Generic data repository
– Advantages: Not very hard to do. Have complied!
– Disadvantages: Data can’t be easily reused. Credit?
• Institutional Repository
– Advantages: Can use existing IR? Tracking and compliance checks.
– Disadvantages: Data can’t easily be reused. Credit?
• Domain-specific data repository
– Advantages: Data can be reused. Credit!
– Disadvantages: Lot of work for curators. Long-term sustainable?

Arrow labels alongside the table:
• Effort, Reuse, Credit, Compliance
• Habit, Ease, Privacy, Control
• Higher quality metadata

Page 24

Metadata madness…

Funding Agency; University; Collaborators; Domain of study

Domain-Specific Data Repository; Local Data Repository; Institutional Data Repository; Generic Data Repository

AND THEY ALL WANT DIFFERENT METADATA!!!!
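
One way to picture the problem on this slide is a crosswalk: a single lab-side record is translated into the field names each destination asks for, and whatever a destination wants but the lab never recorded shows up as a gap. The sketch below is purely illustrative. Every field name, every destination key, and the `crosswalk` helper are hypothetical assumptions; none of them reflects the schema of a real funder, institution or repository.

```python
# Hypothetical crosswalks: lab-side field name -> field name a destination expects.
# None of these names are taken from a real schema.
CROSSWALKS = {
    "funding_agency": {
        "title": "project_title",
        "creator": "principal_investigator",
        "grant": "award_number",
    },
    "institutional": {
        "title": "dc_title",
        "creator": "dc_creator",
        "date": "dc_date",
    },
    "domain_specific": {
        "title": "dataset_name",
        "creator": "analyst",
        "sample": "sample_id",
        "method": "analytical_method",
    },
}


def crosswalk(record, target):
    """Translate one record for `target`; also report fields the record lacks."""
    mapping = CROSSWALKS[target]
    mapped = {theirs: record[ours] for ours, theirs in mapping.items() if ours in record}
    missing = sorted(ours for ours in mapping if ours not in record)
    return mapped, missing


if __name__ == "__main__":
    # Made-up record, as a researcher might keep it locally.
    record = {"title": "Example dataset", "creator": "A. Researcher", "grant": "G-0001"}
    for target in CROSSWALKS:
        mapped, missing = crosswalk(record, target)
        print(target, mapped, "missing:", missing)
```

With this made-up record the funder-style mapping is complete, while the domain-specific one reports `method` and `sample` as missing: the richer the destination's metadata, the more the researcher has to supply.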

Page 25

Where do IRs/libraries fit in?
• Planning series of interviews at key institutions:
– What role do libraries/institutions play wrt research data management?
– What tools/metadata standards are used?
– What aspects of data deposition is the Research Office/IR/Institution interested in?
– How does this compare with what scientists want and do in their labs?
• Goal: share knowledge; establish plan of action

Page 26

Principles of Elsevier RDS:
• Main goal: make research data optimally available, discoverable and reusable.
• Collaboration is tailored to partner’s unique needs:
– Working with a few domain-specific and institutional repositories and institutions
– Aspects where collaboration is needed are discussed
– Collaboration plan is drawn up using SLA: agree on time, conditions, etc.
• 2013: series of pilots, studies and reports to enable feasibility study:
– What are key needs?
– Can Elsevier play a role: skillsets, partnerships?
– Is there a (transparent) business model for this?

Page 27

In summary:
If researchers start to curate and share their data…
And research databases become long-term sustainable…
… we enable enrichment with high-quality metadata that makes research data truly discoverable and reusable.

Many questions remain:
? What role would the institution/library play?
? How do we ensure interoperable metadata?
? What are sustainable models, moving forward?
? Is there a place for publishers, in all this?

Page 28

Thank you!

Collaborations and discussions gratefully acknowledged:
• CMU: Nathan Urban, Shreejoy Tripathy, Shawn Burton, Ed Hovy
• UCSD: Phil Bourne, Brian Schottlaender, David Minor, Declan Fleming, Ilya Zaslavsky
• NIF: Maryann Martone, Anita Bandrowski
• MSU: Brian Bothner
• OHSU: Melissa Haendel, Nicole Vasilevsky
• California Digital Library: Carly Strasser, John Kunze, Stephen Abrams
• Columbia/IEDA: Kerstin Lehnert, Leslie Hsu
• CNI: Clifford Lynch
• Harvard: Michael Kurtz, Chris Erdmann
• MIT: Micah Altman
• UVM: Mara Saule

Page 29

Your questions?

Anita de Waard
VP Research Data Collaborations, Elsevier Research Data Services (VT)
[email protected]

http://researchdata.elsevier.com/