2016 12-14 GBIF and reuse of research data. GBIF seminar in Bergen

CC-BY Dag Endresen

Upload: dag-endresen

Post on 13-Apr-2017


Page 1: 2016 12-14 GBIF and reuse of research data. GBIF seminar in Bergen

CC-BY Dag Endresen

Page 2: 2016 12-14 GBIF and reuse of research data. GBIF seminar in Bergen

•  BIG DATA – a new research paradigm
•  Data curation plan (data-life-cycle)
•  Publish and archive your research data
•  Use shared universal data standards
•  Write metadata, good data documentation
•  "Data paper" and data citation
•  Academic credits for data publishing
•  Use digital, stable and universal identity-numbers (DOI)

Page 4: 2016 12-14 GBIF and reuse of research data. GBIF seminar in Bergen

DATA EXPLOSION

•  More and more and more data is produced.

•  The challenge ahead is not to produce more data, but to build the knowledge, understanding and capacity to navigate and use very large volumes of data.

•  90% of the data that currently exists was created in just the last two years.

•  Data curation is critical to ensure that data is appropriately structured, available and reusable.

Page 5: 2016 12-14 GBIF and reuse of research data. GBIF seminar in Bergen

EXPONENTIAL GROWTH FOR DIGITAL DATA

The digital universe will double every two years between now and 2020. The growth is mostly unstructured data (including sensor data from camera traps and weather stations, images, video, sound clips). A major factor behind the expansion is the growth of machine generated data (from 11% in 2005 to over 40% in 2020).

Image source: EMC/IDC Digital Universe Study, 2012
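
The doubling claim can be made concrete with a little arithmetic. The sketch below assumes that "now" means 2012, the year of the cited study; that starting year and the helper name `growth_factor` are assumptions for illustration, not figures from the slide.

```python
def growth_factor(start_year: int, end_year: int, doubling_years: float = 2.0) -> float:
    """Factor by which a quantity grows if it doubles every `doubling_years`."""
    return 2 ** ((end_year - start_year) / doubling_years)


# Assumption: "now" is taken to be 2012, the year of the cited study.
factor = growth_factor(2012, 2020)
print(f"Growth 2012-2020: {factor:.0f}x")  # four doublings: 2 ** 4
```

Under that assumption, eight years of doubling every two years means four doublings, so the volume in 2020 would be 16 times the 2012 volume.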

Page 6: 2016 12-14 GBIF and reuse of research data. GBIF seminar in Bergen

UNSTRUCTURED DATA

"Data! Data! Data!" he cried impatiently. "I can’t make bricks without clay." (Sherlock Holmes, in Sir Arthur Conan Doyle's “The Adventure of the Copper Beeches”.)

Unstructured data accounts for an estimated 80% of all data in organizations and a whopping 95% of all new data generated daily (Grimes 2008).

Page 7: 2016 12-14 GBIF and reuse of research data. GBIF seminar in Bergen

Why create a data management plan?

Graphics by Jørgen Stamp, CC-BY

Page 8: 2016 12-14 GBIF and reuse of research data. GBIF seminar in Bergen

DATA LOSS

Digital data are fragile and susceptible to loss for a wide variety of reasons:

•  Natural disaster
•  Facilities infrastructure failure
•  Storage failure
•  Server hardware/software failure
•  Application software failure
•  Format obsolescence
•  Legal encumbrance
•  Human error
•  Malicious attack
•  Loss of staffing competencies
•  Loss of institutional commitment
•  Loss of financial stability
•  Changes in user expectations

Source: OpenAIRE & EUDAT, CC-BY-4.0, 2013
Image: CC BY-NC-SA 2.0 by Dave Hill, https://www.flickr.com/photos/dmh650/4031607067

Page 9: 2016 12-14 GBIF and reuse of research data. GBIF seminar in Bergen

DATA MANAGEMENT PLAN

•  Making your data available to others ensures that your research is truly reproducible.

•  Managing your research data saves time because it ensures that you and others in your collaboration will be able to find, understand, and use the data.

•  Sharing your research data enables wider dissemination of your work.

•  Enabling others to use your data reinforces open scientific inquiry and can lead to new and unanticipated discoveries.

Graphics by Jørgen Stamp, CC-BY

Page 10: 2016 12-14 GBIF and reuse of research data. GBIF seminar in Bergen

"FAIR" DATA

Findable
–  Assign persistent IDs, provide rich metadata, register in a searchable resource... (such as GBIF)

Accessible
–  Retrievable by their ID using a standard protocol, metadata remain accessible even if data aren’t...

Interoperable
–  Use formal, broadly applicable languages, use standard vocabularies, qualified references... (e.g. Darwin Core, …)

Reusable
–  Rich, accurate metadata, clear licences, provenance, use of community standards... (e.g. Dublin Core, EML, …)

www.force11.org/group/fairgroup/fairprinciples

Slide source: OpenAIRE & EUDAT, CC-BY-4.0, 2013

Page 11: 2016 12-14 GBIF and reuse of research data. GBIF seminar in Bergen

DATA CITATION PRINCIPLES

1.  Data are legitimate, citable products of research.
2.  Data citations give scholarly credit and attribution.
3.  In scholarly literature, whenever claims are based on data, the data should always be cited.
4.  Use a persistent method for identifying data that is machine-actionable, globally unique and universal.
5.  Data citations facilitate access to the data, or at least to the metadata.
6.  Unique identifiers persist even beyond the lifespan of the data.
7.  Data citations identify and give access to the specific data that support verification of the claim (provenance, time-slice, version).
8.  Citation methods are flexible, but pay attention to interoperability of practices across communities.

Data Citation Synthesis Group: Joint Declaration of Data Citation Principles. Martone M. (ed.) San Diego CA: FORCE11; 2014

Page 12: 2016 12-14 GBIF and reuse of research data. GBIF seminar in Bergen

Long-term archiving for your research data

Graphics by Jørgen Stamp, CC-BY

Page 13: 2016 12-14 GBIF and reuse of research data. GBIF seminar in Bergen

BACKUP AND ARCHIVING – NOT THE SAME THING!

Backup
–  Periodic snapshots of data in case the current version is destroyed or lost.
–  Backups are copies of files stored for short-term or near-long-term.
–  Often performed on a somewhat frequent schedule.

Archiving
–  Preserve data for historical reference.
–  Usually the final version, stored for long-term, and generally not copied over.
–  Often performed at the end of a project or during major milestones.

Source: OpenAIRE & EUDAT, CC-BY-4.0, 2013

Page 14: 2016 12-14 GBIF and reuse of research data. GBIF seminar in Bergen

ONLINE DATA ARCHIVING CENTER

Rather than leaving your research data on a local server or in cloud storage, archive your data with a trusted digital repository. Many repositories create metadata and documentation to ensure that the data will be discoverable in the future.

Page 15: 2016 12-14 GBIF and reuse of research data. GBIF seminar in Bergen

DATA ONE

Source: GBIF News story, September 2014, DataONE: http://www.gbif.org/page/8199

Page 16: 2016 12-14 GBIF and reuse of research data. GBIF seminar in Bergen

NATIONAL DATA CENTER

Sigma2 AS

Photo: CC-BY Intel Free Press (WikiMedia Commons), A Peek Inside Facebook's Oregon Data Center

UNINETT Sigma2 AS and the Norwegian Center for Research Data (NSD) provide a national infrastructure service for archiving Norwegian research data. An infrastructure data repository provides many benefits compared to local institutional data archiving.

•  Standardized protocols.
•  Improved access for users of data from outside own institution.

Page 17: 2016 12-14 GBIF and reuse of research data. GBIF seminar in Bergen

Metadata

Page 18: 2016 12-14 GBIF and reuse of research data. GBIF seminar in Bergen

WHAT IS METADATA?

Photo: CC-BY ‘Metadata is a love note to the future’ by Cea+ www.flickr.com/photos/centralasian/8071729256

Page 19: 2016 12-14 GBIF and reuse of research data. GBIF seminar in Bergen

WHAT IS METADATA?

Commonly defined as ‘data about data’, metadata helps to make data findable and understandable. Metadata can be:

Descriptive: information about the content and context of the data.

Structural: information about the structure of the data.

Administrative: information about the file type, rights management and preservation processes.

Source: CC-BY EUDAT, 2015
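
To illustrate the three kinds of metadata, here is a minimal sketch of a record grouped into descriptive, structural and administrative parts. The dataset and every value in it are invented for illustration; only the column names are real Darwin Core terms.

```python
# Hypothetical metadata record; every value is invented for illustration.
metadata = {
    "descriptive": {
        "title": "Vascular plant observations, example site, 2010-2015",
        "abstract": "Field observations collected during annual surveys.",
        "keywords": ["vascular plants", "occurrence"],
    },
    "structural": {
        "format": "CSV",
        # Column names below are Darwin Core terms.
        "columns": ["scientificName", "eventDate",
                    "decimalLatitude", "decimalLongitude"],
    },
    "administrative": {
        "file_type": "text/csv",
        "license": "CC BY 4.0",
        "preservation": "archived at project end",
    },
}

# Show which fields fall under each kind of metadata.
for kind, fields in metadata.items():
    print(f"{kind}: {', '.join(fields)}")
```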

Page 20: 2016 12-14 GBIF and reuse of research data. GBIF seminar in Bergen

METADATA CATALOG

Image: CC-BY ‘University of Michigan Library Card Catalog’ by David Fulmer www.flickr.com/photos/annarbor/4350629792

Page 21: 2016 12-14 GBIF and reuse of research data. GBIF seminar in Bergen

WHY USE METADATA?

Comprehensive metadata will:

•  Facilitate data discovery

•  Help users determine the applicability of the data

•  Enable interpretation and reuse

•  Allow any limitations to be understood

•  Clarify ownership and restrictions on reuse

•  Offer permanence as it transcends people and time

•  Provide interoperability

Source: CC-BY EUDAT, 2015

Page 22: 2016 12-14 GBIF and reuse of research data. GBIF seminar in Bergen

INFORMATION ENTROPY

The Loss of Information about Data (Metadata) Over Time, Michener et al., 1997

Page 23: 2016 12-14 GBIF and reuse of research data. GBIF seminar in Bergen

TIME MATTERS!

Create metadata at the time of data creation.

Information will be forgotten and there won’t be time or effort left to capture it later.

Metadata benefits from quality control at an early stage too.

Photo CC-BY-SA ‘egg timer – hour glass running out’ by Open Democracy www.flickr.com/photos/opendemocracy/523438942

Source: CC-BY EUDAT, 2015

Page 24: 2016 12-14 GBIF and reuse of research data. GBIF seminar in Bergen

DATASET TITLE

Titles are critical in helping readers find your data.
–  When individuals search for the most appropriate datasets, they will most likely use the title as the first criterion to determine whether a dataset meets their needs.
–  Treat the title as the opportunity to sell your dataset.

A complete title includes: What, Where, When, Who, and Scale.

An informative title includes: topic, timeliness of the data, specific information about place and geography.

Source: CC-BY EUDAT, 2015

Page 25: 2016 12-14 GBIF and reuse of research data. GBIF seminar in Bergen

WHAT IS THE BETTER DATASET TITLE?

Rivers

or

Rivers in Rondane national park from 1:126,700 Forest Service visitor maps (1961-1983)

Rivers (what) in Rondane national park (where) from 1:126,700 (scale) Forest Service (who) visitor maps (1961-1983) (when)

Source: CC-BY EUDAT, 2015
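
The what/where/scale/who/when breakdown can be sketched as a small helper that assembles a title from its parts. The function name `build_title`, its parameters and the fixed word order are assumptions for illustration; the slide prescribes the components, not a template.

```python
def build_title(what: str, where: str, scale: str,
                who: str, source: str, when: str) -> str:
    """Assemble a dataset title from the components named on the slide."""
    parts = {"what": what, "where": where, "scale": scale,
             "who": who, "source": source, "when": when}
    # Refuse to build a title if any component is missing or blank.
    missing = [name for name, value in parts.items() if not value.strip()]
    if missing:
        raise ValueError(f"Title is incomplete, missing: {', '.join(missing)}")
    return f"{what} in {where} from {scale} {who} {source} ({when})"


title = build_title(
    what="Rivers",
    where="Rondane national park",
    scale="1:126,700",
    who="Forest Service",
    source="visitor maps",
    when="1961-1983",
)
print(title)
```

With the values from the slide, this reproduces the longer of the two example titles, while a call with an empty component fails instead of silently producing a title like "Rivers".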

Page 26: 2016 12-14 GBIF and reuse of research data. GBIF seminar in Bergen

WRITE FOR MACHINES, NOT JUST HUMANS

Remember: a computer will read your metadata.

Do not use symbols that could be misinterpreted: Examples: ! @ # % { } | / \ < > ~

Don’t use tabs, indents, or line feeds/carriage returns.

When copying and pasting from other sources, use a text editor (e.g., Notepad) to eliminate hidden characters.

Source: CC-BY EUDAT, 2015
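
The advice above can be partly automated. The following is a minimal sketch, not an official GBIF or EUDAT tool: the helper names `clean_metadata_text` and `risky_symbols` are hypothetical, and the symbol list is copied from the slide.

```python
import re
import unicodedata

# Symbols the slide warns could be misinterpreted by a machine.
RISKY_SYMBOLS = set("!@#%{}|/\\<>~")


def clean_metadata_text(text: str) -> str:
    """Collapse tabs and line breaks to spaces and drop hidden characters."""
    # Tabs, line feeds and carriage returns become ordinary spaces.
    text = re.sub(r"[\t\r\n]+", " ", text)
    # Drop remaining control/format characters (Unicode categories Cc, Cf),
    # such as zero-width spaces picked up when copying and pasting.
    text = "".join(ch for ch in text
                   if unicodedata.category(ch) not in ("Cc", "Cf"))
    # Collapse runs of spaces and trim both ends.
    return re.sub(r" {2,}", " ", text).strip()


def risky_symbols(text: str) -> list[str]:
    """Return the risky symbols present in the text, sorted."""
    return sorted(set(text) & RISKY_SYMBOLS)


print(clean_metadata_text("Rivers\tin\r\nRondane\u200b  park "))
print(risky_symbols("a<b> #1"))
```

Cleaning is applied automatically because it does not change the meaning of the text; risky symbols are only reported, since deciding how to rephrase them is left to the author.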

Page 27: 2016 12-14 GBIF and reuse of research data. GBIF seminar in Bergen

Peer review before data-publishing

"Data paper"

Page 28: 2016 12-14 GBIF and reuse of research data. GBIF seminar in Bergen

PEER REVIEW OPTION FOR BIODIVERSITY DATASETS

•  Authors get scientific credit for data publication.
•  Meeting concerns over data quality.
•  Meeting concerns over data citation mechanism.

http://www.gbif.org/publishingdata/datapapers

Page 29: 2016 12-14 GBIF and reuse of research data. GBIF seminar in Bergen

METADATA TOPICS / HEADLINES

Dataset description
Project description
People and Organizations (including roles)
Coverage
•  Taxonomic coverage
•  Geographic coverage
•  Temporal coverage
Methods
Intellectual property rights, licensing
Keywords
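
The topic list above can double as a completeness checklist. This is a minimal sketch under stated assumptions: the topic names follow the slide, but the dictionary keys, the helper `missing_topics` and the draft record are hypothetical and do not reproduce any particular metadata schema.

```python
# Topic names follow the slide; the dictionary keys themselves are hypothetical.
REQUIRED_TOPICS = [
    "dataset_description",
    "project_description",
    "people_and_organizations",
    "taxonomic_coverage",
    "geographic_coverage",
    "temporal_coverage",
    "methods",
    "license",
    "keywords",
]


def missing_topics(metadata: dict) -> list:
    """Return the required topics that are absent or empty in `metadata`."""
    return [topic for topic in REQUIRED_TOPICS if not metadata.get(topic)]


# Invented draft record with several topics still to be written.
draft = {
    "dataset_description": "Example occurrence dataset (invented).",
    "taxonomic_coverage": "Plantae",
    "license": "CC BY 4.0",
    "keywords": [],  # present but empty, so still reported as missing
}
print(missing_topics(draft))
```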

Page 30: 2016 12-14 GBIF and reuse of research data. GBIF seminar in Bergen

RATIONALE FOR DATA PAPER

•  A scholarly publication of a searchable metadata document describing a dataset, or a group of datasets.

•  Promote and publicize the existence of the data.

•  Provide scholarly credit to data publishers through citable journal publications.

•  Describe the data in a structured human- and machine-readable form.

Page 31: 2016 12-14 GBIF and reuse of research data. GBIF seminar in Bergen

Persistent and universal identity-number

Page 32: 2016 12-14 GBIF and reuse of research data. GBIF seminar in Bergen

The purpose of identifiers… is to name things, making it possible to refer to them.

“Each identifier refers to one and only one thing” (Coyle 2006).

“An association between a string and a thing” (Kunze 2003).

“A stated association between a symbol and a thing; that the symbol may be used to unambiguously refer to the thing within a given context” (Campbell 2007).

Page 33: 2016 12-14 GBIF and reuse of research data. GBIF seminar in Bergen

NAME AMBIGUITY: Many things (in GBIF) are named 123

•  Catalog number: 123, GBIF ID: 543392241, urn:catalog:CAS:BOT:123, Bigelowia juncea
•  Catalog number: 123, GBIF ID: 1030591721, UAMb:Herb:123, Sphagnum girgensohnii
•  Catalog number: 123, GBIF ID: 893477175, Parides erithalion
•  Catalog number: 123, GBIF ID: 1050327334, Cinchona ledgeriana
•  Catalog number: 123, GBIF ID: 931031820, Bromus kalmii
•  Catalog number: 123, GBIF ID: 283363, urn:occurrence:Arctos:MVZ:Egg:123:164, Mercurialis ovata
•  Catalog number: 123, GBIF ID: 231564351, Umbrina canariensis
•  Catalog number: 123, GBIF ID: 896547722, urn:occurrence:Arctos:MVZ:Egg:123:164, Contopus sordidulus veliei
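
A short sketch makes the ambiguity tangible. It uses the catalog numbers, GBIF IDs and names listed on this slide and indexes the same records two ways; the variable names are illustrative only.

```python
from collections import defaultdict

# (catalog number, GBIF ID, scientific name) as listed on the slide.
records = [
    ("123", 543392241, "Bigelowia juncea"),
    ("123", 1030591721, "Sphagnum girgensohnii"),
    ("123", 893477175, "Parides erithalion"),
    ("123", 1050327334, "Cinchona ledgeriana"),
    ("123", 931031820, "Bromus kalmii"),
    ("123", 283363, "Mercurialis ovata"),
    ("123", 231564351, "Umbrina canariensis"),
    ("123", 896547722, "Contopus sordidulus veliei"),
]

by_catalog_number = defaultdict(list)
by_gbif_id = {}
for catalog_number, gbif_id, name in records:
    by_catalog_number[catalog_number].append(name)
    by_gbif_id[gbif_id] = name

# The local catalog number points at many different specimens...
print(len(by_catalog_number["123"]))
# ...while each globally unique identifier resolves to exactly one record.
print(by_gbif_id[543392241])
```

A catalog number is only unique within one collection, so a lookup on "123" is ambiguous as soon as records from several collections are pooled.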

Page 35: 2016 12-14 GBIF and reuse of research data. GBIF seminar in Bergen

HTTP – PURL – UUID

http://purl.org/gbifnorway/id/41d9cbb4-4590-4265-8079-ca44d46d27c3

Page 36: 2016 12-14 GBIF and reuse of research data. GBIF seminar in Bergen

Including machine-readable formats

urn:uuid:41d9cbb4-4590-4265-8079-ca44d46d27c3

dc:identifier "urn:uuid:41d9cbb4-4590-4265-8079-ca44d46d27c3"
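
The relationship between the UUID, its URN form and the resolvable PURL can be sketched with Python's standard `uuid` module. The PURL prefix is copied from the example URL in this deck; treating it as a reusable prefix for other records is an assumption.

```python
import uuid

# Assumption: the prefix is taken from the example PURL shown in this deck.
PURL_PREFIX = "http://purl.org/gbifnorway/id/"

identifier = uuid.UUID("41d9cbb4-4590-4265-8079-ca44d46d27c3")

# Machine-readable URN form, suitable for a dc:identifier value.
print(identifier.urn)
# Resolvable HTTP form built from the same UUID.
print(PURL_PREFIX + str(identifier))

# A new record would get a fresh random (version 4) UUID.
new_identifier = uuid.uuid4()
print(new_identifier.urn)
```

The same UUID thus serves both as an opaque, globally unique name and, behind a PURL, as a web address that can be redirected if the data move.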

Page 38: 2016 12-14 GBIF and reuse of research data. GBIF seminar in Bergen

Data License (machine-readable license)

Page 39: 2016 12-14 GBIF and reuse of research data. GBIF seminar in Bergen

LICENSING FOR DATA PUBLISHED THROUGH GBIF

http://www.gbif.org/terms/licences

In 2014 the GBIF Governing Board established support in GBIF for three licenses.

GBIF Portal (status December 2016):
•  CC0: 57%
•  CC-BY 4.0: 31%
•  CC-BY-NC 4.0: 13%

Page 41: 2016 12-14 GBIF and reuse of research data. GBIF seminar in Bergen

DATA LICENSE REGULATES THE POSSIBILITY FOR REUSE OF DATA

•  CC0: data are made available for any use, without restrictions or particular requirements on the part of users.

•  CC BY: data are made available for any use, provided that appropriate attribution is given to the sources of the data used.

•  CC NC: data are made available for non-commercial use only. However, where is the boundary of what counts as "commercial use"?

•  CC SA: data are made available on condition that derived products are also shared alike under CC SA. Note that this could block desired commercial products.

•  CC ND: data are made available read-only for verification; no modifications or derived products are allowed (blocking reuse)!
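
Because a machine-readable license lets software decide whether a planned reuse is permitted, the terms above can be sketched as a small lookup table. The class, field and function names are hypothetical, and the table is a simplified reading of the Creative Commons license elements, not legal advice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LicenseTerms:
    attribution_required: bool
    commercial_use: bool
    derivatives: bool
    share_alike: bool


# Simplified summary of the license elements described on the slide.
LICENSES = {
    "CC0": LicenseTerms(False, True, True, False),
    "CC BY": LicenseTerms(True, True, True, False),
    "CC BY-NC": LicenseTerms(True, False, True, False),
    "CC BY-SA": LicenseTerms(True, True, True, True),
    "CC BY-ND": LicenseTerms(True, True, False, False),
}

# The three licenses supported for data published through GBIF.
GBIF_LICENSES = {"CC0", "CC BY", "CC BY-NC"}


def allows(license_name: str, *, commercial: bool = False,
           derived_product: bool = False) -> bool:
    """Check whether a planned reuse is compatible with the license."""
    terms = LICENSES[license_name]
    if commercial and not terms.commercial_use:
        return False
    if derived_product and not terms.derivatives:
        return False
    return True


print(allows("CC BY", commercial=True, derived_product=True))
print(allows("CC BY-NC", commercial=True))
print(allows("CC BY-ND", derived_product=True))
```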

Page 42: 2016 12-14 GBIF and reuse of research data. GBIF seminar in Bergen

NORWEGIAN LICENSE FOR PUBLIC DATA (NLOD)

•  NLOD Norwegian license for public data is compatible with CC BY 4.0.

•  http://data.norge.no/nlod/no/1.0

•  CC BY 4.0 is recommended for broader compatibility and understanding outside Norway (alternatively, declare both licenses).

Page 43: 2016 12-14 GBIF and reuse of research data. GBIF seminar in Bergen

H2020 – OPEN DATA BY DEFAULT FROM 2017

Source: OpenAIRE & EUDAT, CC-BY-4.0, 2013

Page 44: 2016 12-14 GBIF and reuse of research data. GBIF seminar in Bergen

Conclusion

Page 45: 2016 12-14 GBIF and reuse of research data. GBIF seminar in Bergen

WHY MANAGE AND PUBLISH YOUR OWN RESEARCH DATA?

•  Make your own research easier!
•  Stop yourself drowning in irrelevant data.
•  Save your own data for later use.
•  Avoid accusations of fraud or bad science (e.g. p-hacking).
•  Share your research data for re-use.
•  Get credit for your data.
•  Meet funder/institution requirements.

Because well-managed data opens up opportunities for re-use and sharing, and makes for better science!

Source: OpenAIRE & EUDAT, CC-BY-4.0, 2013

Page 46: 2016 12-14 GBIF and reuse of research data. GBIF seminar in Bergen

Node team at NHM, University of Oslo
Dag Endresen, Node manager
Christian Svindseth, Database manager
Fridtjof Mehlum, Research director
Einar Timdal, Associate professor
Geir Søli, Associate professor
Vidar Bakken, Consultant

Artsdatabanken, Trondheim
Wouter Koch
Nils Valland

NTNU University Museum
Anders Finstad, GBIF Science committee

Research Council of Norway
Per Backe-Hansen, Head of delegation

Contact us at: [email protected]