Tim Osborn: Research Integrity: Integrity of the published record
Tim Osborn, Reader, University of East Anglia
11/04/23 Wellcome Collection Conference Centre, 13 September 2011 slide 1
Research Integrity Conference: The importance of good data management
Climate research data and research integrity
Dr Tim Osborn
Climatic Research Unit
School of Environmental Sciences
University of East Anglia
JISC Research Integrity Conference: the Importance of Good Data Management
13 September 2011
Integrity of the published research record
Why is it important for climate research and why now?
  (Of course it’s always been important and not just for this discipline)
The global warming issue:
  Scientifically challenging
  Politically, socially and economically contentious
  High stakes (economic and non-economic)
  Under intense scrutiny
Climate change hacked emails controversy
The integrity of our research was severely questioned
  What role did research data issues (management, sharing, etc.) play in this?
Need to distinguish research integrity from perceptions of research integrity
These issues probably played a rather small role
  Our research data and the research record were preserved
  We “created” very little raw data and we have an excellent record in preserving and publishing for re-use our derived data
Instead, the perception of doubt arose very much more from the contents of the hacked emails and their interpretation
Climate change hacked emails controversy
Improved research data management and sharing would have made little difference to the attacks on our integrity
Not to our critics, perhaps a small role in the cross-over to the main-stream media
Nevertheless, there are areas where we can improve and we received some criticism in these areas
  The climate science community as a whole should improve
Data sharing for openness, for re-use
Improved data management for preserving workflows and linking articles to analysis to data (e.g. JISC ACRID)
Managing and sharing research data: why should we improve?
Supports reproducibility (necessary) and repeatability (desirable)
  Maintains (actual and perceived) integrity of research
  Essential because high-stakes decisions must be informed by sound scientific assessment
Supports further exploration of scientific findings
  Scientific findings that are not clear cut (e.g. in the vicinity of statistical significance) are more sensitive to variations in data, methodological choices, assumptions, etc.
Supports data re-use for other studies
  We are data poor (despite > 10,000 TB) relative to the complexity of the climate system
Sharing climate data: some challenges
Estimated numbers of climate change articles:
  Total > 100,000
  Just 2009 > 13,000, which is > 1 / hour
Grieneisen & Zhang (2011) doi: 10.1038/nclimate1093
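The "> 1 / hour" figure follows from simple division. A quick check in Python, assuming a 365-day year and articles spread evenly across it (both are assumptions made for illustration, not statements from the slides):

```python
# Rough check of the publication rate quoted on the slide.
# Assumption (not from the source): articles are spread evenly over a 365-day year.
ARTICLES_2009 = 13_000      # lower bound quoted for 2009
HOURS_PER_YEAR = 365 * 24   # 8,760 hours


def articles_per_hour(articles_per_year: int, hours_per_year: int = HOURS_PER_YEAR) -> float:
    """Average number of articles published per hour."""
    return articles_per_year / hours_per_year


rate = articles_per_hour(ARTICLES_2009)
print(f"{rate:.2f} articles per hour")  # about 1.48, i.e. more than one per hour
```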
Sharing climate data: some challenges
Data volume is already large (> 10,000 TB)
  Projected to grow tenfold by end of this decade
Overpeck et al. (2011) doi: 10.1126/science.1197869
Sharing climate data: some limitations
Data with non-disclosure agreements
  Formal or informal agreements
Holding back for future exploitation
  Controlling use, getting recognition
Time and resources
  Costs may be obvious, benefits may be unrealised
  Standards, meta-data and software increase the value in re-use, but can increase the time needed
Non-disclosure agreements: real or excuse?
Example 1: UK climate data
  Data sets must not be passed on to third parties under any circumstances... Once the project work using the data has been completed, copies of the datasets held by the end user should be deleted... The introduction of sanctions against individuals or Departments may be considered if breaches occur.
  http://badc.nerc.ac.uk/conditions/ukmo_agreement.html
Non-disclosure agreements: real or excuse?
Example 2: Global precipitation data
  One of the most widely used analyses of variations in precipitation across the global land surface is “based on the complete GPCC monthly rainfall station data-base (the largest monthly precipitation station database of the world with data from ca. 85,000 different stations)... Corresponding to international agreement, station data provided by Third Parties are protected.”
  http://gpcc.dwd.de
Non-disclosure agreements: real or excuse?
Informal agreements exist too
  Especially with newly collected data provided in advance of its formal publication
  These agreements with colleagues, and the consequences of breaching them, are genuine (regardless of what the ICO might decide if tested under FOI/EIR legislation!)
Holding back data for future exploitation
Traditionally, climate data themselves aren’t published
  Instead, a journal article is published reporting findings arising from some analysis of the data
  Provides a citable outcome for which the scientist gains credit
This could take many months to a few years
  Because publishable findings may only arise from extensive analysis of the data or from a collection of multiple records, and the article has to go through the peer-review system
In the meantime, the data may have been shared and used under non-disclosure restrictions
Ways forward…1
Providing data (and other materials) with a publication to allow it to be reproduced (or perhaps repeated)
  E.g. supplementary online materials
  Seen as a burden for all 13,000 climate change articles per year
Co-benefits must be evident to make this worthwhile
  Citation and data re-use
Potential proliferation of identical (or perhaps not!) copies of datasets
Better to provide a unique identifier to existing data that have been used, rather than a copy of the data
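Whether two circulating copies of a dataset really are identical can be checked by comparing checksums. A minimal sketch using Python's standard `hashlib`; the function names and the sample byte strings are hypothetical, and SHA-256 is one reasonable choice rather than anything prescribed in the talk:

```python
import hashlib


def dataset_checksum(data: bytes) -> str:
    """Return the SHA-256 hex digest of a dataset's raw bytes."""
    return hashlib.sha256(data).hexdigest()


def same_dataset(copy_a: bytes, copy_b: bytes) -> bool:
    """True only if the two copies are byte-for-byte identical."""
    return dataset_checksum(copy_a) == dataset_checksum(copy_b)


# Hypothetical placeholder content: two copies that differ in a single value.
original = b"station,value\nA,1.0\n"
altered = b"station,value\nA,1.1\n"

print(same_dataset(original, original))  # True
print(same_dataset(original, altered))   # False
```

A byte-level checksum flags any difference, including harmless ones such as changed line endings, so it answers "is this the same file?" rather than "is this the same science?".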
Ways forward…2
Data publication
  Newly collected (observed, simulated, derived) datasets published in their own right, not as part of a scientific paper
  Meta-data and other accompanying information
  But could shorten the lag from data collection to data publication, with much lighter-touch peer review
Citable (e.g. DOI) allows due credit
Identifiable (long-lasting URI) allows unique identification
  Should be unique – updates or modifications to the data should have a separate unique identifier (how to link between versions – considered in our JISC ACRID project)
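One way to picture the versioning point: each release of a dataset gets its own identifier, plus a pointer to the version it replaces. The sketch below is illustrative only; the `DatasetVersion` class, its fields and the `example:` identifiers are hypothetical, and this is not the scheme developed in the JISC ACRID project.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class DatasetVersion:
    identifier: str                   # unique per version (placeholder IDs below)
    supersedes: Optional[str] = None  # identifier of the previous version, if any


def version_history(registry: Dict[str, DatasetVersion], identifier: str) -> List[str]:
    """Follow 'supersedes' links from a version back to the first release."""
    history: List[str] = []
    current: Optional[str] = identifier
    while current is not None:
        history.append(current)
        current = registry[current].supersedes
    return history


# Hypothetical registry holding three versions of one dataset.
versions = [
    DatasetVersion("example:dataset/v1"),
    DatasetVersion("example:dataset/v2", supersedes="example:dataset/v1"),
    DatasetVersion("example:dataset/v3", supersedes="example:dataset/v2"),
]
registry = {v.identifier: v for v in versions}

print(version_history(registry, "example:dataset/v3"))
# ['example:dataset/v3', 'example:dataset/v2', 'example:dataset/v1']
```

An article citing `example:dataset/v2` would keep pointing at exactly the data it used, while the links still let a reader discover that a later version exists.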
Preferred data archives…1
Storing data with publisher, linked directly to article
  Useful (not essential) for a strong link between article and data
  Not ideal for long term preservation, large datasets, tools for exploring data, searches of databases etc.
  Not ideal for re-use
University archiving possible, but similar disadvantages
Discipline-specific, dedicated data centres are preferable
  E.g. World Data Center system (http://www.icsu-wds.org/)
  WDC-Climate, WDC-Paleoclimate, BADC, BODC, ITRDB, CMIP5
Preferred data archives…2
Sub-discipline specific archives superior to broader archives
  More generalised approaches provide a steeper barrier for submission (e.g. describing all environmental data sets via one standard meta-data model – very large model, much to learn etc.)
  Approaches tailored to sub-disciplines avoid irrelevant structures, formats, meta-data
  Sometimes expertise is needed rather than extra meta-data
Summary points
Improved data sharing and links to published findings are needed across the climate science community, to increase the pace of knowledge creation and to support the integrity of published work
New approaches to publishing newly constructed datasets should be encouraged and adopted where possible
  Bringing benefits of citations, credit and unique identification
Published articles should identify data used, preferably via citation/identification of already published data rather than providing a further copy of the data
Subject-specific data archives are preferred, offering better support for data re-use
Other issues (non-disclosure agreements, time and resources) need to be considered – benefits must be clear to encourage them to be overcome
Global warming issue: high stakes
Easy contexts for decision making:
  Cost of reducing GHGs low, adverse impact of not doing so is high
  Cost of reducing GHGs high, adverse impact of not doing so is low
Decision making in the actual context is much harder:
  Significantly reducing GHGs may prove difficult with moderate to high costs
  Net effects of not reducing GHGs are very uncertain and could range from fairly moderate to very severe adverse impact
Time and resources
Must not mistake reluctance to commit time and resources for desire to avoid disclosure
  There is a real cost involved
  Standards, meta-data and software increase the value in re-use, but can increase the time needed
The answer is not simply to obtain funding
  Even with specific funding, unless the benefits of sharing data and meta-data are clear, there will be pressure to do things with more obvious benefits