Archiving Research Data, Dryad, and Publishers
Neil Beagrie, Charles Beagrie Ltd
Bloomsbury Conference June 2010
With contributions from Julia Chruszcz, Peter Williams, and Todd Vision
Overview
• The Challenge;
• The Dryad Consortium;
• Supplementary Data and Publishers;
• Research Data Preservation Costs (KRDS);
• The Future.
The Challenge
PRC Global Study
[Chart: twelve respondent groups, n = 841 to 3,759]
Source: PRC global study (forthcoming)
Requesting Data
• Wicherts et al. (2006, Am. Psychol. 61, 726) requested data from the authors of the 141 most recent articles in American Psychological Association (APA) journals.
“6 months later, after … 400 emails, [sending] detailed descriptions of our study aims, approvals of our ethical committee, signed assurances not to share data with others, and even our full resumes…”
Only 27% of authors shared their data
The Dryad Consortium of Scholarly Societies and Publishers (and Libraries)
Archiving at publication
• Avoids loss, corruption, and obsolescence of data files;
• It is the point in time when authors are best able to ensure the correctness of data and metadata;
• Authors have an incentive to deposit their data in order to complete the publication process;
• Journals are best able to monitor compliance with policy;
• In short, the “GenBank model” works.
Incentives to authors
• Access to colleagues’ data
• Visibility and citability
– Another way for work to have high impact
• Integration
– Combinability with other data adds value
• Long-term preservation
– Including data format migration
• Ad hoc data sharing can be burdensome
– Deposition to multiple specialized repositories
– Fulfilling individual requests for data takes effort
Joint Data Archiving Policy
• DEPOSIT AT PUBLICATION – As a condition for publication, all data used in the paper should be archived in an appropriate public archive.
• REPEATABILITY – Data should be given with sufficient detail that, together with the paper content, each result in the published paper may be re-created.
• EMBARGO – Authors may elect to have the data publicly available at the time of publication or, if the archive allows, opt to embargo access to the data.
• EXCEPTIONS – Exceptions may be granted at the discretion of the editor, especially for sensitive information such as the location of endangered species.
• COORDINATION – The aim is for the Dryad consortium of journals to adopt this policy simultaneously.
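To make the decision points of the policy concrete, here is a minimal Python sketch of how an editorial system might apply it at acceptance. The `Submission` fields and the `policy_decision` function are hypothetical illustrations, not part of Dryad or any journal's actual system. REPEATABILITY is deliberately left out, since judging whether results can be re-created calls for editorial review rather than a mechanical check.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    """Hypothetical record of what an author supplies at acceptance."""
    data_archived: bool              # DEPOSIT AT PUBLICATION: data in a public archive?
    archive_allows_embargo: bool     # a property of the chosen archive
    embargo_requested: bool          # EMBARGO: author opted to delay public access
    exception_granted: bool = False  # EXCEPTIONS: editor's discretion (e.g. sensitive locations)


def policy_decision(sub: Submission) -> str:
    """Return a hypothetical editorial outcome under the Joint Data Archiving Policy."""
    if sub.exception_granted:
        return "publish: exception granted by editor"
    if not sub.data_archived:
        return "hold: data must be archived before publication"
    if sub.embargo_requested:
        if sub.archive_allows_embargo:
            return "publish: data archived under embargo"
        # The policy only permits an embargo "if the archive allows".
        return "hold: archive does not support embargo"
    return "publish: data public at publication"
```

For example, `policy_decision(Submission(True, True, True))` returns `"publish: data archived under embargo"`, while an unarchived submission without an exception is held.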
That’s all well and good, but where’s this “appropriate public archive”?
A mosaic of specialized databases
• There is a growing number of databases to which deposition is encouraged or required (GenBank, TreeBASE)
– And others are emerging
• A world in which every datatype had its own required database, each with its own submission system:
– Would be a huge burden on authors
– Would inevitably leave some data orphaned
– Might never be financially possible
Overcoming the submission burden
• Integrating journal submission and data submission
– Prepopulating bibliographic metadata
– “Handshaking” with specialized repositories
• Enhancing low-quality author-provided metadata
– Human curation
– Machine-assisted metadata enhancement
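Prepopulating bibliographic metadata means the data record inherits what the journal already knows about the article, so the author does not retype it. The Python sketch below shows that hand-off under stated assumptions: the `Manuscript` and `DataPackage` fields, the `prepopulate` function, and the rule that fewer than three keywords flags a record for curation are all hypothetical, not Dryad's actual schema or workflow.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Manuscript:
    """Bibliographic metadata the journal already holds (hypothetical fields)."""
    title: str
    authors: List[str]
    journal: str
    keywords: List[str] = field(default_factory=list)


@dataclass
class DataPackage:
    """Hypothetical repository record for the data underlying an article."""
    article_title: str
    authors: List[str]
    journal: str
    keywords: List[str]
    files: List[str] = field(default_factory=list)
    needs_curation: bool = True


def prepopulate(ms: Manuscript) -> DataPackage:
    """Copy bibliographic metadata from the manuscript into a new data record."""
    return DataPackage(
        article_title=ms.title,
        authors=list(ms.authors),    # copy so later edits do not alias the manuscript
        journal=ms.journal,
        keywords=list(ms.keywords),
        # Hypothetical threshold: sparse author keywords are flagged for
        # human curation or machine-assisted enhancement.
        needs_curation=len(ms.keywords) < 3,
    )
```

The author then only adds the data files; everything the journal already captured travels with the record.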
The Dryad Digital Repository
The Repository
• Dryad is a repository (at Duke) for datasets underlying scientific research articles;
• Its initial focus has been evolution and ecology;
• Participating journals subscribe to the Joint Data Archiving Policy;
• Dryad datasets will have Digital Object Identifiers (DOIs) and Creative Commons ‘CC-Zero’ licenses;
• Project funded by the National Science Foundation 2008-2012;
• Sustainability plan a key deliverable.
Supplementary Data and Publishers
Overview
• Consultancy for Dryad sustainability: covered areas of the draft business plan and sustainability planning for Dryad;
• Presenting one of the contributions (publishers) to the section on Comparators and Costs;
• Outcomes from desk research and 12 interviews with publishers/data publishers, plus some additional input drawn from Keeping Research Data Safe;
• Very brief presentation – article in preparation for the October 2010 issue of Learned Publishing; KRDS2 available from JISC.
Interviewees
• Journal of Clinical Investigation
• Journal of the American Medical Association
• Molecular Phylogenetics and Evolution (Elsevier)
• Journal of Heredity (OUP)
• Ecological Society of America
• Wiley-Blackwell + Ecology Letters
• Royal Society
• Federation of American Societies for Experimental Biology
• OECD Publishing
• Internet Archaeology and Archaeology Data Service
• Pangaea: Publishing Network for Geoscientific & Environmental Data
• Dataverse Network (Social Sciences, Harvard)
Some Findings: growth
• Many interviewees stated that supplementary data and materials are showing rapid growth;
• 3 gave figures: from 32 articles in 2000 to 251 in 2009, an increase of about 684% (a 7.8-fold rise); from 6% in 2005 to 38% in 2009; and from 2% a decade ago to 87% in 2009.
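The first growth figure is easy to misstate: 251 articles is about 784% *of* the 32 articles in 2000, which corresponds to an *increase* of about 684%. The short Python sketch below makes the distinction explicit; the function names are illustrative only.

```python
def percent_increase(old: float, new: float) -> float:
    """Percentage change from old to new: (new - old) / old * 100."""
    return (new - old) / old * 100


def percent_of(old: float, new: float) -> float:
    """New value expressed as a percentage of the old value: new / old * 100."""
    return new / old * 100


# Figures reported by one interviewee: 32 articles (2000) -> 251 articles (2009)
print(percent_increase(32, 251))  # 684.375
print(percent_of(32, 251))        # 784.375
```

The other two figures (6% to 38%, 2% to 87%) are shares of articles, so their change is more naturally read in percentage points (32 and 85 points respectively) than as a percentage increase.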
Some Findings: workflow
• Supplementary data have grown organically at the various journals investigated (author driven);
• Both the work and the costs are being absorbed into the daily running of journals;
• In 4 cases there was minimal impact on work duties; in 5 others there was a significant but often unquantified impact (two of these might be considered data publications, with a focus on publishing data papers or datasets); and in 3 cases the information was not available or unknown;
• These differences can be explained in terms of the level of effort or importance applied: the greatest levels of effort are associated with copy editing, format migration, addition of metadata, etc., whilst the least effort is required where the material is simply hosted and/or the workflow is highly automated.
Some Findings: costs
• These were in most cases unknown or only partially known;
• Costs mentioned but usually not quantified include digital storage costs, salary costs of journal staff, and long-term preservation costs;
• Detailed cost information was really only available from Internet Archaeology via the Archaeology Data Service, which had participated in an activity-based costing study (KRDS2);
• Internet Archaeology archiving costs reflect those of a “dataset publisher”, so they are only a comparator for part of Dryad’s content: large datasets.
Some Findings: revenue
• Only author fees and journal subscription fees were mentioned as current revenue sources for supplementary materials in journals;
• 3 journals interviewed have author charges for supplementary materials (see next slide);
• The data archiving and sharing organisations interviewed relied primarily on (uncertain) research grants and temporary or recurrent core funding, but one had access to a small endowment and another has a charging policy for some depositors.
Some Findings: author charges
• Journal of Clinical Investigation – authors are charged $300 for supplemental data to appear online with accepted articles;
• Ecological Archives – submission of ‘appendices and supplements’ is free up to 10 MB. Above this, there is a fee of $250 for the first 1 GB and $50 for each subsequent GB. The fee for publication of a data paper is $250, covering publication of the abstract in the relevant journal plus publication of up to 10 MB in Ecological Archives. An additional $250 is charged for datasets between 10 MB and 1 GB, and for larger datasets there is an additional fee of $50 per GB;
• The Federation of American Societies for Experimental Biology (FASEB) charges $100 for each supplemental file.
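The Ecological Archives schedule for appendices and supplements is the only tiered one, so a small fee calculator helps show how the charges scale. This is a minimal Python sketch under stated assumptions, not the publishers' own billing logic: 1 GB is taken as 1,000 MB, a partial extra GB is assumed to be billed as a whole GB, the function names are hypothetical, and the separate data-paper fees are not modelled.

```python
import math

MB_PER_GB = 1000  # assumption: decimal units; the schedule does not say


def ecological_archives_supplement_fee(size_mb: float) -> int:
    """Fee in USD for appendices/supplements (hypothetical reading of the schedule).

    Free up to 10 MB; above that, $250 covers the first 1 GB and each
    subsequent GB (a partial GB assumed to count as a whole one) costs $50.
    """
    if size_mb <= 10:
        return 0
    extra_gb = max(0, math.ceil((size_mb - MB_PER_GB) / MB_PER_GB))
    return 250 + 50 * extra_gb


def jci_fee() -> int:
    """Journal of Clinical Investigation: flat $300 for online supplemental data."""
    return 300


def faseb_fee(n_files: int) -> int:
    """FASEB: $100 for each supplemental file."""
    return 100 * n_files
```

Under those assumptions a 500 MB supplement costs $250 and a 2.5 GB supplement costs $350 ($250 plus two further GB at $50), whereas at FASEB the charge depends on the number of files rather than their size.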
Keeping Research Data Safe (KRDS1 & KRDS2):
JISC-funded studies of Research Data Preservation Costs
(separate Dryad costing project by Lori Eakin-Richards based on KRDS approach)
KRDS: what did we learn?
Whole-of-service costing: seeing the “Big Picture”
Selection of 2009 allocation of UKDA activity costs:
• Acquisition: 5.8%
• Ingest: 21.5%
• Archival Storage + Preservation Planning: 3.1%
• Access: 16.9%
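The selected UKDA shares can be summed to see the contrast the next slide draws between “getting stuff in and out” and “keeping it”. The Python sketch below is illustrative only: grouping Acquisition, Ingest and Access as “in and out” is an assumption made here, the slide's “A. Storage + Pres. Planning” is read as Archival Storage and Preservation Planning, and because the slide shows only a selection of activities the shares do not add up to 100%.

```python
# Selection of 2009 UKDA activity-cost shares (percent of total), from the slide.
selected = {
    "Acquisition": 5.8,
    "Ingest": 21.5,
    "Archival Storage + Preservation Planning": 3.1,
    "Access": 16.9,
}

# Assumed grouping: acquisition, ingest and access count as "getting stuff in and out".
in_and_out = selected["Acquisition"] + selected["Ingest"] + selected["Access"]
keeping = selected["Archival Storage + Preservation Planning"]

print(f"getting stuff in and out: {in_and_out:.1f}%")                  # 44.2%
print(f"keeping it: {keeping:.1f}%")                                   # 3.1%
print(f"ratio: {in_and_out / keeping:.1f}x")                           # 14.3x
print(f"not in this selection: {100 - sum(selected.values()):.1f}%")   # 52.7%
```

Even on this partial view, the in-and-out activities account for roughly fourteen times the share of archival storage and preservation planning.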
KRDS: Implications
• Changing view of digital preservation costs:
– “Getting stuff in and out” costs much more than “keeping it” (bit preservation + migration);
– Staff costs c.70% of total costs;
– Importance of economies of scale and automation;
– Findings of KRDS and the Dryad Repository’s own activity costing projections fed into Dryad sustainability planning.
Future Plans
• Dryad sustainability plan being put to Dryad member societies and publishers;
• Dryad extending consortium to new members, achieving economies of scale;
• Bid to JISC to establish Dryad-UK;
• Extending KRDS research and implementations.
Further Information
Dryad: see www.datadryad.org
Keeping Research Data Safe 2 (KRDS2) webpage at www.beagrie.com/jisc.php
KRDS2 report available from the JISC website: http://www.jisc.ac.uk/publications/reports/2010/keepingresearchdatasafe2.aspx#downloads
Email: [email protected]