library and data lecture for inf21306
![Page 1: Library and data lecture for inf21306](https://reader036.vdocuments.us/reader036/viewer/2022081604/587c72ee1a28abd04e8b6061/html5/thumbnails/1.jpg)
Guest lecture: Library and data?
www.slideshare.net/hugobesemer (use on WURNET Chrome, Firefox)
20160920, Hugo Besemer
![Page 2: Library and data lecture for inf21306](https://reader036.vdocuments.us/reader036/viewer/2022081604/587c72ee1a28abd04e8b6061/html5/thumbnails/2.jpg)
Two different things
●An example of data modelling challenges for the library, or if you wish: data is dirty ...
●Data management planning at Wageningen University
![Page 3: Library and data lecture for inf21306](https://reader036.vdocuments.us/reader036/viewer/2022081604/587c72ee1a28abd04e8b6061/html5/thumbnails/3.jpg)
Data is dirty
![Page 4: Library and data lecture for inf21306](https://reader036.vdocuments.us/reader036/viewer/2022081604/587c72ee1a28abd04e8b6061/html5/thumbnails/4.jpg)
The problem
I am in the tenure track; the university wants me to publish in “Q1” journals
My research is funded by NWO/EU/... and they want me to publish in “Open access” journals
![Page 5: Library and data lecture for inf21306](https://reader036.vdocuments.us/reader036/viewer/2022081604/587c72ee1a28abd04e8b6061/html5/thumbnails/5.jpg)
Journals catalogue: issn, title, topics
Open_access: issn, open access status (boolean)
Quartiles: issn, quartile
SELECT j.title, j.issn
FROM Journals j
INNER JOIN Open_access o ON o.issn = j.issn
INNER JOIN Quartiles q ON q.issn = j.issn
WHERE j.topics = 'mine' AND o.status = 'yes' AND q.quartile = 'Q1'
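The query on this slide can be tried out with Python's built-in sqlite3 module. This is a minimal sketch: table and column names follow the diagram, while the ISSNs, titles, topic values and the 0/1 encoding of the boolean are placeholders made up for illustration.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE journals    (issn TEXT PRIMARY KEY, title TEXT, topics TEXT);
CREATE TABLE open_access (issn TEXT PRIMARY KEY, status INTEGER);  -- boolean as 0/1
CREATE TABLE quartiles   (issn TEXT PRIMARY KEY, quartile TEXT);
""")

# Hypothetical sample rows (placeholder ISSNs, not real journals).
conn.executemany("INSERT INTO journals VALUES (?, ?, ?)", [
    ("0000-0001", "Journal A", "mine"),
    ("0000-0002", "Journal B", "mine"),
    ("0000-0003", "Journal C", "other"),
])
conn.executemany("INSERT INTO open_access VALUES (?, ?)", [
    ("0000-0001", 1), ("0000-0002", 0), ("0000-0003", 1),
])
conn.executemany("INSERT INTO quartiles VALUES (?, ?)", [
    ("0000-0001", "Q1"), ("0000-0002", "Q1"), ("0000-0003", "Q1"),
])

def wanted_journals(connection, topic):
    """Open access Q1 journals for one topic, joined on issn."""
    rows = connection.execute(
        """
        SELECT j.title, j.issn
        FROM journals j
        INNER JOIN open_access o ON o.issn = j.issn
        INNER JOIN quartiles   q ON q.issn = j.issn
        WHERE j.topics = ? AND o.status = 1 AND q.quartile = 'Q1'
        ORDER BY j.title
        """,
        (topic,),
    )
    return rows.fetchall()

print(wanted_journals(conn, "mine"))
```

The join only works if all three sources agree on the issn value for a journal, which is exactly the assumption the following slides call into question.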
![Page 6: Library and data lecture for inf21306](https://reader036.vdocuments.us/reader036/viewer/2022081604/587c72ee1a28abd04e8b6061/html5/thumbnails/6.jpg)
Let’s look in Nottingham for online status
![Page 7: Library and data lecture for inf21306](https://reader036.vdocuments.us/reader036/viewer/2022081604/587c72ee1a28abd04e8b6061/html5/thumbnails/7.jpg)
But we can also go to Lund
![Page 8: Library and data lecture for inf21306](https://reader036.vdocuments.us/reader036/viewer/2022081604/587c72ee1a28abd04e8b6061/html5/thumbnails/8.jpg)
Confusion from Amsterdam
![Page 9: Library and data lecture for inf21306](https://reader036.vdocuments.us/reader036/viewer/2022081604/587c72ee1a28abd04e8b6061/html5/thumbnails/9.jpg)
Things change all the time
![Page 10: Library and data lecture for inf21306](https://reader036.vdocuments.us/reader036/viewer/2022081604/587c72ee1a28abd04e8b6061/html5/thumbnails/10.jpg)
So we have learned....
ISSN (primary key) is ambiguous
●so you need to harmonize data
Open access status is ambiguous
●Gold, Green or Hybrid
●Discussion: which one do we take?
There are several sources for online status
●Discussion: which one do we take?
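One small part of harmonizing ISSNs can be sketched in code: bringing differently written ISSNs into one canonical form and rejecting values with a wrong check digit. The check-digit rule (weights 8 down to 2, modulo 11, with 10 written as X) is the standard ISSN rule; the function names are hypothetical. This sketch does not address the case where one journal has separate print and electronic ISSNs, which needs a mapping table.

```python
import re
from typing import Optional

def issn_check_digit(first_seven: str) -> str:
    """Compute the ISSN check character for the first seven digits."""
    total = sum(int(d) * w for d, w in zip(first_seven, range(8, 1, -1)))
    value = (11 - total % 11) % 11
    return "X" if value == 10 else str(value)

def normalize_issn(raw: str) -> Optional[str]:
    """Return the ISSN as 'NNNN-NNNC', or None if it is not a valid ISSN."""
    # Keep only digits and X, so '0378 5955' and 'issn 0378-5955' both work.
    compact = re.sub(r"[^0-9Xx]", "", raw).upper()
    if not re.fullmatch(r"\d{7}[\dX]", compact):
        return None
    if issn_check_digit(compact[:7]) != compact[7]:
        return None
    return f"{compact[:4]}-{compact[4:]}"

for value in ["03785955", " 2434-561x ", "0378-5956"]:
    print(repr(value), "->", normalize_issn(value))
```

Normalizing before loading each source means the join key has the same spelling everywhere, and invalid values surface as None instead of silently failing to match.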
![Page 11: Library and data lecture for inf21306](https://reader036.vdocuments.us/reader036/viewer/2022081604/587c72ee1a28abd04e8b6061/html5/thumbnails/11.jpg)
Journals catalogue: issn, title, topics
Quartiles: issn, quartile
Romeo Sherpa (colours): issn, colour
DOAJ (Romeo gold): issn, APC
Hybrid publisher: issn, APC
![Page 12: Library and data lecture for inf21306](https://reader036.vdocuments.us/reader036/viewer/2022081604/587c72ee1a28abd04e8b6061/html5/thumbnails/12.jpg)
Now for the quartiles
![Page 13: Library and data lecture for inf21306](https://reader036.vdocuments.us/reader036/viewer/2022081604/587c72ee1a28abd04e8b6061/html5/thumbnails/13.jpg)
Q1
Q2
Q3
Q4
![Page 14: Library and data lecture for inf21306](https://reader036.vdocuments.us/reader036/viewer/2022081604/587c72ee1a28abd04e8b6061/html5/thumbnails/14.jpg)
How do we compare numbers
Scientist Z. Math has a publication from 2003 with 17 citations
Scientist M. Biology has a publication from 2009 with 24 citations
![Page 15: Library and data lecture for inf21306](https://reader036.vdocuments.us/reader036/viewer/2022081604/587c72ee1a28abd04e8b6061/html5/thumbnails/15.jpg)
Baselines for Mathematics
![Page 16: Library and data lecture for inf21306](https://reader036.vdocuments.us/reader036/viewer/2022081604/587c72ee1a28abd04e8b6061/html5/thumbnails/16.jpg)
Baselines for Molecular Biology
[Figure: cumulative no. of citations (0 to 400) against years after publication (0 to 12); series: Baseline, top 10%, top 1%]
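The comparison the baselines make possible can be sketched as a ratio: citations received divided by the baseline for the same field and publication year. The function name is hypothetical, and the two baseline values in the usage lines are invented purely for illustration; they are not read from the charts above.

```python
def relative_citation_score(citations: int, baseline: float) -> float:
    """Citations relative to the field-and-year baseline (1.0 = exactly at baseline)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return citations / baseline

# Hypothetical baselines, made up for this example only.
MATH_2003_BASELINE = 8.5
MOLBIO_2009_BASELINE = 48.0

z_math = relative_citation_score(17, MATH_2003_BASELINE)
m_biology = relative_citation_score(24, MOLBIO_2009_BASELINE)
print(f"Z. Math: {z_math:.2f}, M. Biology: {m_biology:.2f}")
```

With baselines like these, the paper with fewer raw citations would score higher, which is why raw counts from different fields and years should not be compared directly.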
![Page 17: Library and data lecture for inf21306](https://reader036.vdocuments.us/reader036/viewer/2022081604/587c72ee1a28abd04e8b6061/html5/thumbnails/17.jpg)
What does that mean for our E-R diagram?
Quartile distribution depends on topic
![Page 18: Library and data lecture for inf21306](https://reader036.vdocuments.us/reader036/viewer/2022081604/587c72ee1a28abd04e8b6061/html5/thumbnails/18.jpg)
Journals catalogue: issn, title, topics
Quartiles: issn, topics, quartile
Romeo Sherpa (colours): issn, colour
DOAJ (Romeo gold): issn, APC
Hybrid publisher: issn, APC
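Because the quartile depends on the topic, one way to model it is to key the quartiles table on the pair (issn, topic) rather than on issn alone. The sketch below assumes that reading of the diagram; table layout, topic names, ISSNs, titles and APC amounts are all placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE journals (issn TEXT PRIMARY KEY, title TEXT);
CREATE TABLE quartiles (
    issn     TEXT,
    topic    TEXT,
    quartile TEXT,
    PRIMARY KEY (issn, topic)   -- one quartile per journal per topic
);
CREATE TABLE doaj (issn TEXT PRIMARY KEY, apc REAL);
""")

# Hypothetical rows: Journal A is Q1 in one topic and Q2 in another.
conn.executemany("INSERT INTO journals VALUES (?, ?)", [
    ("0000-0001", "Journal A"), ("0000-0002", "Journal B"),
])
conn.executemany("INSERT INTO quartiles VALUES (?, ?, ?)", [
    ("0000-0001", "ecology", "Q1"),
    ("0000-0001", "statistics", "Q2"),
    ("0000-0002", "ecology", "Q1"),
])
conn.execute("INSERT INTO doaj VALUES (?, ?)", ("0000-0001", 1500.0))

def q1_gold(connection, topic):
    """Journals that are Q1 for the given topic and listed in the doaj table."""
    rows = connection.execute(
        """
        SELECT j.title, j.issn
        FROM journals j
        INNER JOIN quartiles q ON q.issn = j.issn
        INNER JOIN doaj d      ON d.issn = j.issn
        WHERE q.topic = ? AND q.quartile = 'Q1'
        ORDER BY j.title
        """,
        (topic,),
    )
    return rows.fetchall()

print(q1_gold(conn, "ecology"))
print(q1_gold(conn, "statistics"))
```

The same journal can now appear as Q1 for one topic and not for another, so the answer depends on which topic the researcher asks about.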
![Page 19: Library and data lecture for inf21306](https://reader036.vdocuments.us/reader036/viewer/2022081604/587c72ee1a28abd04e8b6061/html5/thumbnails/19.jpg)
Data management planning at Wageningen University
![Page 20: Library and data lecture for inf21306](https://reader036.vdocuments.us/reader036/viewer/2022081604/587c72ee1a28abd04e8b6061/html5/thumbnails/20.jpg)
Wageningen UR policy – What’s in place
●Data management plan for PhD projects
●Data management plans for research groups
●Data management planning course
●Options for data publishing
●Code Repository
●“Support hub”
![Page 21: Library and data lecture for inf21306](https://reader036.vdocuments.us/reader036/viewer/2022081604/587c72ee1a28abd04e8b6061/html5/thumbnails/21.jpg)
Wageningen UR data policy – What needs to be resolved
●Registration and accessibility of data for ongoing research
●Storage (security, “getting rid of external hard drives”)
●Research notes
●Legal issues?
![Page 22: Library and data lecture for inf21306](https://reader036.vdocuments.us/reader036/viewer/2022081604/587c72ee1a28abd04e8b6061/html5/thumbnails/22.jpg)
Day-to-day issues (from a workshop for PE&RC)
●We are human
●Synchronizing between different platforms
●Relationships between files
●What is a logical file / folder structure?
●Collaborating on files
![Page 23: Library and data lecture for inf21306](https://reader036.vdocuments.us/reader036/viewer/2022081604/587c72ee1a28abd04e8b6061/html5/thumbnails/23.jpg)
![Page 24: Library and data lecture for inf21306](https://reader036.vdocuments.us/reader036/viewer/2022081604/587c72ee1a28abd04e8b6061/html5/thumbnails/24.jpg)
![Page 25: Library and data lecture for inf21306](https://reader036.vdocuments.us/reader036/viewer/2022081604/587c72ee1a28abd04e8b6061/html5/thumbnails/25.jpg)
![Page 26: Library and data lecture for inf21306](https://reader036.vdocuments.us/reader036/viewer/2022081604/587c72ee1a28abd04e8b6061/html5/thumbnails/26.jpg)
![Page 27: Library and data lecture for inf21306](https://reader036.vdocuments.us/reader036/viewer/2022081604/587c72ee1a28abd04e8b6061/html5/thumbnails/27.jpg)
![Page 28: Library and data lecture for inf21306](https://reader036.vdocuments.us/reader036/viewer/2022081604/587c72ee1a28abd04e8b6061/html5/thumbnails/28.jpg)
![Page 29: Library and data lecture for inf21306](https://reader036.vdocuments.us/reader036/viewer/2022081604/587c72ee1a28abd04e8b6061/html5/thumbnails/29.jpg)
![Page 30: Library and data lecture for inf21306](https://reader036.vdocuments.us/reader036/viewer/2022081604/587c72ee1a28abd04e8b6061/html5/thumbnails/30.jpg)
![Page 31: Library and data lecture for inf21306](https://reader036.vdocuments.us/reader036/viewer/2022081604/587c72ee1a28abd04e8b6061/html5/thumbnails/31.jpg)
![Page 32: Library and data lecture for inf21306](https://reader036.vdocuments.us/reader036/viewer/2022081604/587c72ee1a28abd04e8b6061/html5/thumbnails/32.jpg)
![Page 33: Library and data lecture for inf21306](https://reader036.vdocuments.us/reader036/viewer/2022081604/587c72ee1a28abd04e8b6061/html5/thumbnails/33.jpg)
Some terminology: retention
Retention: obligation to produce, upon request, the data underlying publications for a certain time
●Verification purposes or as a basis for further work
●Often required by scientific organizations or publishers
●The “Netherlands Code of Conduct for Academic Practice” requires 10 years
●Rule is seldom enforced
![Page 34: Library and data lecture for inf21306](https://reader036.vdocuments.us/reader036/viewer/2022081604/587c72ee1a28abd04e8b6061/html5/thumbnails/34.jpg)
More terminology: ‘long term storage’
‘Long term storage’ is the term used in the DMP format
‘Long term’ meaning:
●With sufficient documentation on project, file and parameter / variable level
●In a format that is usable in the future (so preferably “flat files”)
![Page 35: Library and data lecture for inf21306](https://reader036.vdocuments.us/reader036/viewer/2022081604/587c72ee1a28abd04e8b6061/html5/thumbnails/35.jpg)
More terminology: ‘publishing data’
We prefer “Data Publishing” as it implies making the data persistently accessible
●That’s only possible in a service with a long-term mission
●It should come with a persistent identifier independent of its current or future location
![Page 36: Library and data lecture for inf21306](https://reader036.vdocuments.us/reader036/viewer/2022081604/587c72ee1a28abd04e8b6061/html5/thumbnails/36.jpg)
Persistent identifiers
http://hdl.handle.net/1902.1/UOVMCPSWOL
http://dx.doi.org/10.1594/PANGAEA.701380
Scheme / Resolver, then Prefix (identifying institution), then Suffix (identifying this dataset)
To get a persistent identifier for your dataset you need to store it with a service, and the resolver will redirect users there
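The three parts of the identifiers above can be pulled apart with Python's standard urllib.parse module. This is a minimal sketch; the names PersistentId and split_pid are hypothetical, and the split assumes the prefix is everything up to the first slash of the path, as in the two examples on the slide.

```python
from typing import NamedTuple
from urllib.parse import urlsplit

class PersistentId(NamedTuple):
    resolver: str  # scheme and resolver host
    prefix: str    # identifies the institution
    suffix: str    # identifies this dataset

def split_pid(url: str) -> PersistentId:
    """Split a resolver URL into resolver, prefix and suffix."""
    parts = urlsplit(url.strip())
    # The prefix ends at the first slash; the suffix may itself contain slashes.
    prefix, sep, suffix = parts.path.lstrip("/").partition("/")
    if not (parts.netloc and sep and prefix and suffix):
        raise ValueError(f"not a resolver/prefix/suffix URL: {url!r}")
    return PersistentId(f"{parts.scheme}://{parts.netloc}/", prefix, suffix)

for pid in ("http://hdl.handle.net/1902.1/UOVMCPSWOL",
            "http://dx.doi.org/10.1594/PANGAEA.701380"):
    print(split_pid(pid))
```

Only the prefix and suffix identify the dataset; the resolver part is what redirects a user to wherever the data currently lives.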
![Page 37: Library and data lecture for inf21306](https://reader036.vdocuments.us/reader036/viewer/2022081604/587c72ee1a28abd04e8b6061/html5/thumbnails/37.jpg)
An example
![Page 38: Library and data lecture for inf21306](https://reader036.vdocuments.us/reader036/viewer/2022081604/587c72ee1a28abd04e8b6061/html5/thumbnails/38.jpg)
An example (continued)
![Page 39: Library and data lecture for inf21306](https://reader036.vdocuments.us/reader036/viewer/2022081604/587c72ee1a28abd04e8b6061/html5/thumbnails/39.jpg)
An example (continued 2)
![Page 40: Library and data lecture for inf21306](https://reader036.vdocuments.us/reader036/viewer/2022081604/587c72ee1a28abd04e8b6061/html5/thumbnails/40.jpg)
Publish all data?
![Page 41: Library and data lecture for inf21306](https://reader036.vdocuments.us/reader036/viewer/2022081604/587c72ee1a28abd04e8b6061/html5/thumbnails/41.jpg)
Services (1)
Disciplinary services with a specific data model
●EBI, NCBI (bioinformatics), for example SRA
●Pangaea (spatial)
●GBIF (Biodiversity)
Generic (multidisciplinary) services
![Page 42: Library and data lecture for inf21306](https://reader036.vdocuments.us/reader036/viewer/2022081604/587c72ee1a28abd04e8b6061/html5/thumbnails/42.jpg)
Services (2)

| | DANS | 3TU Datacentrum | Dryad | Figshare | Zenodo |
|---|---|---|---|---|---|
| URL | http://www.dans.knaw.nl/en/ | http://datacentrum.3tu.nl/en/home/ | http://datadryad.org/ | https://figshare.com/ | http://www.zenodo.org/ |
| Single file size | unknown | - | 2GB | 5GB | 2GB |
| Total disk space | n.a. | n.a. | Extra charge for larger sets | 20 GB | “Please be aware that we cannot offer infinite space for free, so donations from heavy users towards sustainability are encouraged” |
| Paid | € 2.85 per GB (WUR covers first 500 GB) | € 3.50 per GB (WUR covers first 500 GB) | $120 (> 20 GB extra charge) | N | N |
| Private/public | Public (part of Royal Dutch Academy for Sciences – KNAW) | Public, owned by Dutch Technical Universities | Not-for-profit company governed by members | Private, Macmillan inc. | Public, CERN |
| Special relationships | Wageningen UR Library acts as front office | Wageningen UR Library acts as front office | Reduced fee or free for certain journals, see http://datadryad.org/pages/journalLookup | Embedded in PLOS article submission workflow | EU (output of the Openaire plus project and used for data in the EU data management pilot) |
![Page 43: Library and data lecture for inf21306](https://reader036.vdocuments.us/reader036/viewer/2022081604/587c72ee1a28abd04e8b6061/html5/thumbnails/43.jpg)
That’s all