ELPUB 2008: A review of journal policies for sharing research data
DESCRIPTION
Abstract: Sharing data is a tenet of science, yet commonplace in only a few subdisciplines. Recognizing that a data sharing culture is unlikely to be achieved without policy guidance, some funders and journals have begun to request and require that investigators share their primary datasets with other researchers. The purpose of this study is to understand the current state of data sharing policies within journals, the features of journals that are associated with the strength of their data sharing policies, and whether the strength of data sharing policies impacts the observed prevalence of data sharing.
Methods: We investigated these relationships with respect to gene expression microarray data in the journals that most often publish studies about this type of data. We measured data sharing prevalence as the proportion of papers with submission links from NCBI’s Gene Expression Omnibus (GEO) database. We conducted univariate and linear multivariate regressions to understand the relationship between the strength of data sharing policy and journal impact factor, journal subdiscipline, journal publisher (academic societies vs. commercial), and publishing model (open vs. closed access).
Results: Of the 70 journal policies, 53 made some mention of sharing publication-related data within their Instruction to Author statements. Of the 40 policies with a data sharing policy applicable to gene expression microarrays, we classified 17 as weak and 23 as strong (strong policies required an accession number from database submission prior to publication). Existence of a data sharing policy was associated with the type of journal publisher: 46% of commercial journals had a data sharing policy, compared to 82% of journals published by an academic society. All five of the open-access journals had a data sharing policy. Policy strength was associated with impact factor: the journals with no data sharing policy, a weak policy, and a strong policy had respective median impact factors of 3.6, 4.9, and 6.2. Policy strength was positively associated with measured data sharing submission into the GEO database: the journals with no data sharing policy, a weak policy, and a strong policy had median data sharing prevalence of 8%, 20%, and 25%, respectively.
Conclusion: This review and analysis begins to quantify the relationship between journal policies and data sharing outcomes. We hope it contributes to assessing the incentives and initiatives designed to facilitate widespread, responsible, effective data sharing.
TRANSCRIPT
A review of journal policies for sharing research data
Heather Piwowar, Wendy Chapman
Department of Biomedical Informatics University of Pittsburgh
ELPUB 2008
http://www.flickr.com/photos/cogdog/123072/
“An inherent principle of publication is that others should be able to replicate and build upon the authors' published claims. Therefore, a condition of publication in a Nature journal is that authors are required to make materials, data and associated protocols available in a publicly accessible database …” http://www.nature.com/authors/editorial_policies/availability.html
http://www.nature.com/nature/journal/v453/n7197/index.html
Benefits for journal
– allows publications to be useful (and cited) in additional ways
– demonstrates commitment to quality research
– discourages fraud
Drawbacks for journal
– might decrease submissions
– administrative burden
Prior work in this area
• McCain: 16% of 850 science+engineering journals have a policy about sharing research-related information (RRI)
• NAS: 53% of 38 life sciences journals
But these reviews are dated, consider a variety of resources, and don’t correlate policy to behaviour
McCain. Science Communication, Vol. 16, No. 4. (1 June 1995), pp. 403-431 NAS. Sharing Publication-Related Data and Materials. (2003), p. 33
• In this study, we looked at the data-sharing policies within Instruction to Author statements of 70 journals for a specific data type
• We looked at themes within the statements
• We correlated the strength of the policy statements with the frequency with which authors actually shared their data
Data type: gene expression microarrays
http://en.wikipedia.org/wiki/Image:Heatmap.png
Three types of results
1. Themes within data sharing policies
2. Relative policy strength
3. Observed data sharing behaviour
Themes within data sharing policies
• statements of policy motivation
• datatype-specific policies
• requested vs. required
• data location
• data format
• data completeness
• timeliness of sharing
• consequences for not sharing
• exceptions
Relative policy strength
• No applicable policy (43%)
• Weak policy (24%)
– should, recommend, request
– must, but without database accession number
• Strong policy (33%)
– must, required, condition of publication
– requires database accession number
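The three-way grouping above can be sketched in a few lines of Python. This is only an illustration of the rule as stated on the slide, not the procedure actually used in the study: the function name `classify_policy` and its three boolean inputs are hypothetical, and treating a policy as strong only when it both uses mandatory wording and requires an accession number is an assumption. The counts (70 journals, 40 with an applicable policy, 17 weak, 23 strong) come from the abstract.

```python
from typing import Literal

PolicyStrength = Literal["none", "weak", "strong"]


def classify_policy(
    has_applicable_policy: bool,
    mandatory_wording: bool,
    requires_accession: bool,
) -> PolicyStrength:
    """Hypothetical helper mirroring the slide's three categories.

    Assumption: "strong" needs both mandatory wording ("must",
    "required", "condition of publication") and a required database
    accession number; every other applicable policy counts as "weak".
    """
    if not has_applicable_policy:
        return "none"
    if mandatory_wording and requires_accession:
        return "strong"
    return "weak"


# Counts reported in the abstract: 70 journals reviewed, 40 with a policy
# applicable to gene expression microarrays (17 weak, 23 strong).
counts = {"none": 70 - 40, "weak": 17, "strong": 23}
total = sum(counts.values())

# Percentage of all 70 journals in each category, rounded to whole percent.
shares = {name: round(100 * n / total) for name, n in counts.items()}
print(shares)  # {'none': 43, 'weak': 24, 'strong': 33}
```

The rounded shares reproduce the 43%, 24%, and 33% shown on the slide, which confirms that those percentages are taken over all 70 journals rather than over the 40 with an applicable policy.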
High-impact journals tend to have a strong data-sharing policy
What journal characteristics are associated with having a data-sharing policy?
Journal has a data sharing policy?
Impact Factor
Open Access?
Society Publisher?
Subdisciplines…
What journal characteristics are associated with having a data-sharing policy?
Journal has a data sharing policy?
Impact Factor
Open Access?
Society Publisher?
• Biochemistry & Molecular Biology
• Oncology
Observed Sharing Behaviour
For each of the 70 journals, we measured % of papers with links to database submission entries
% of submission links is our proxy for % of publications with shared data
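As a rough sketch of this proxy, the per-journal prevalence is simply the fraction of a journal's microarray papers that have a database submission link. The function name `sharing_prevalence` and the example counts below are made up for illustration; they are not data from the study, and how papers and links were actually identified is not shown here.

```python
def sharing_prevalence(papers_with_link: int, total_papers: int) -> float:
    """Fraction of a journal's papers that link to a database submission.

    Hypothetical helper: both counts are assumed to have been gathered
    already (e.g. papers about microarray data, and those among them
    with a GEO submission link).
    """
    if total_papers <= 0:
        raise ValueError("total_papers must be positive")
    if not 0 <= papers_with_link <= total_papers:
        raise ValueError("papers_with_link must be between 0 and total_papers")
    return papers_with_link / total_papers


# Made-up example: a journal with 20 relevant papers, 5 of them linked.
example = sharing_prevalence(5, 20)
print(f"{example:.0%}")  # 25%
```

Because the measure counts links rather than datasets, it misses data shared in other repositories or without a recorded link, which is why the slide calls it a proxy.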
Articles published in journals with a strong data-sharing policy are more likely to have publicly available datasets
What journal characteristics are associated with data sharing behaviour?
% of articles with shared data
Impact Factor
Open Access?
Society Publisher?
Subdisciplines…
Having a data-sharing policy?
What journal characteristics are associated with data sharing behaviour?
% of articles with shared data
Impact Factor
Open Access?
Society Publisher?
• Genetics & Heredity
• Multidisciplinary Sciences
Having a data-sharing policy?
Limitation
• Association does not imply causation
Take-home message
• Many, but not all, journals require sharing of microarray data. Policies are very diverse.
• Stronger data-sharing policies are found in:
– high-impact journals
– open-access journals
– journals published by academic societies
• Policy strength correlates with behaviour
• Policies would benefit from improved clarity, scope, and accountability
Future work
• Who shares data?
• Who reuses data?
Hopefully the answers will inform our decisions about where to focus our energy to improve policies, tools, and incentives
Thank you
Advisor: Dr. Wendy Chapman
Funding: NLM for training grant, and Pitt DBMI department for travel grant
My shared data: www.dbmi.pitt.edu/piwowar
Share your research data too!
“Does anyone want your data? That’s hard to predict […] After all, no one ever knocked on your door asking to buy those figurines collecting dust in your cabinet before you listed them on eBay. Your data, too, may simply be awaiting an effective matchmaker.”
Got data? Nature Neuroscience 10, 931 (2007)