laurie goodman: overcoming hurdles to data publication
TRANSCRIPT
Overcoming Hurdles to Data Publication
Laurie Goodman, PhD
Editor-in-Chief, GigaScience
ORCID ID: 0000-0001-9724-5976
@GigaScience
(Personal Twitter Acct @Grimhawk1- but this is mostly me whining about Donald Trump, Pitbull Discrimination, and why I hate TSA and Homeland Security)
Why should we “publish” data?
1. Ioannidis et al. (2009). Repeatability of published microarray gene expression analyses. Nature Genetics 41: 142.
Ioannidis JPA (2005). Why Most Published Research Findings Are False. PLoS Med 2(8).
Out of 18 microarray papers, results from 10 could not be reproduced
Deconstructing a paper into accessible, useable, trackable, interlinked units
Need to provide credit to reward sharing and proper organization of:
• Narrative
• Data/Metadata availability/curation
• Source Code, Software availability
• Interoperability
• Availability of workflows
• Transparent analyses
Data/MetaData
Source Code, Software
Methods
Narrative
How we view publishing at GigaScience
• Paper in GigaScience: open-access journal
• Data sets in GigaDB: data publishing platform (under CC0 waiver)
• Analyses in GigaGalaxy: data analysis platform
Paper, data sets, and analyses are linked to one another.
DOIs from
GigaScience Publishes (or links to) All Research Objects
Article (Narrative) + Data + Software + Source Code + Methods + Workflows + Containers/Docker + VMs
GigaScience paper + Data sets in GigaDB (Data DOI) + Analyses in GigaGalaxy (Workflow DOI), all linked to one another
What is Data Publication?
1. Publishing a standard article that describes the data.
2. Making the data itself citable.
Make it easy to cite
See where it got cited!
Describe the data
Current list of Darwin Finch Data Citations on Google Scholar
…And more
Data Publication Hurdles
If only it were easy…
• Data isn’t “scholarly” enough to be a citable entity (a ‘real’ paper)
• If I publish my data, I may not be able to publish the analysis paper later because journals will consider it Prior Publication
• If I publish my data, #DataParasites will use it!!*
*http://www.nejm.org/doi/full/10.1056/NEJMe1516564
Response from Functional Genomics Data Society: http://fged.org/projects/data-sharing-and-research-parasites/
F1000 Research: Checked with Publishers and Journals about Data Publication being considered “Prior Publication”
http://blogs.biomedcentral.com/gigablog/2014/05/14/the-latest-weapon-in-publishing-data-the-polar-bear/
The polar bear DATA was published, as a citable entity, in 2011, before publication of a data analysis paper.
BUT #dataparasites!
Polar bear data were used before the data producer’s analysis paper was published. But the data publication garnered 5 citations.
Hailer, F et al., Nuclear genomic sequences reveal that polar bears are an old and distinct bear lineage. Science. 2012 Apr 20;336(6079):344-7. doi:10.1126/science.1216424.
Cahill, JA et al., Genomic evidence for island population conversion resolves conflicting theories of polar bear evolution. PLoS Genet. 2013;9(3):e1003345. doi:10.1371/journal.pgen.1003345.
Morgan, CC et al., Heterogeneous models place the root of the placental mammal phylogeny. Mol Biol Evol. 2013 Sep;30(9):2145-56. doi:10.1093/molbev/mst117.
Cronin, MA et al., Molecular Phylogeny and SNP Variation of Polar Bears (Ursus maritimus), Brown Bears (U. arctos), and Black Bears (U. americanus) Derived from Genome Sequences. J Hered. 2014; 105(3):312-23. doi:10.1093/jhered/est133.
Bidon, T et al., Brown and Polar Bear Y Chromosomes Reveal Extensive Male-Biased Gene Flow within Brother Lineages. Mol Biol Evol. 2014 Apr 4. doi:10.1093/molbev/msu109
Even though the data had been released 2 years earlier and had been cited in other papers, the main analysis paper was published in Cell. (And made the cover.)
However, this paper didn’t include the data citation…
The Data Publication has since garnered 6 more citations.
Data Publication is being tracked by this and other tracking resources
AND THAT MEANS You can get a Data IF!!
How are Data Citations Doing Overall?
Proportions of Citation Types Per Year
https://blog.datacite.org/location-of-the-citation/
Looked at 1,125 Journal Articles with associated data in Dryad from 2011-2014
The Location of the Citation: Are Data Citation Recommendations Having an Effect? Elizabeth Hull, DataCite Blog
Highlights:
• Dryad DOI in the works cited, as recommended = only 6% of total articles
• Dryad DOI in the body only (including data availability sections) = 75%
• No citation (Dryad DOI not found anywhere in the article) = 20%
Good News:
• Works cited in references increased from 5% to 8% from 2011-2014
• Articles with no data citation declined from 31% to 15%
Bad News: With the current growth rate, expect to see 90% in works cited section in 2031
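A rough way to sanity-check a projection like this one is to extrapolate from the two rounded figures on the slide (5% in 2011, 8% in 2014). The sketch below is only an illustration: the function names are hypothetical, and the two growth models (constant compound growth and constant linear growth) are assumptions, not necessarily the method used in the DataCite blog post. With these rounded inputs the compound model lands within a couple of years of the 2031 estimate, while a linear model would put the 90% mark far later.

```python
import math


def year_reached_compound(start_year, start_share, end_year, end_share, target_share):
    """Year the target share is reached, assuming constant compound annual growth."""
    years = end_year - start_year
    # Annual growth factor implied by the two observed shares.
    annual_factor = (end_share / start_share) ** (1 / years)
    years_needed = math.log(target_share / end_share) / math.log(annual_factor)
    return end_year + years_needed


def year_reached_linear(start_year, start_share, end_year, end_share, target_share):
    """Year the target share is reached, assuming a constant gain in percentage points per year."""
    points_per_year = (end_share - start_share) / (end_year - start_year)
    return end_year + (target_share - end_share) / points_per_year


# Rounded percentages taken from the slide; the 90% target is also from the slide.
compound = year_reached_compound(2011, 5.0, 2014, 8.0, 90.0)
linear = year_reached_linear(2011, 5.0, 2014, 8.0, 90.0)

print(f"Compound growth model: 90% reached around {compound:.1f}")
print(f"Linear growth model:   90% reached around {linear:.1f}")
```

Either way the point of the slide holds: at the observed pace, recommended-style data citation stays a minority practice for many years.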
More Education Needed
“Easiest” Way Forward is to Engage the Journal Community
• Organizations providing citation guidelines should engage “Editor Evangelists”
• Editor Evangelists will do the following:
o Get Data Citation Guidelines in the Guide To Authors
o Get Data Citation Guidelines in the Copy Editor Handbook
o Tell all their Editor Friends and get a cult following
Example: The Standardization of Gene Nomenclature in articles
• The Human Genome Organization (HUGO) worked with journal editors in the late 1990s to drive use of appropriate gene nomenclature, getting it into the guide to authors.
• Within about 3 years, standard nomenclature was used by all
Oh, and don’t forget to have the Editors tell the Production Department that DOIs shouldn’t be stripped out and replaced with URLs.
Thanks to:
Scott Edmunds, Executive Editor
Nicole Nogoy, Commissioning Editor
Peter Li, Lead Data Manager
Chris Hunter, Lead BioCurator
Xiao (Jesse) Si Zhe, Database Developer
Sam Rose, Journal Development Manager
Rob Davidson, Open Data Lead, Office for National Statistics
Contact us:
[email protected]@gigasciencejournal.com
Follow us:
@GigaScience
facebook.com/GigaScience
blogs.openaccesscentral.com/blogs/gigablog
http://gigascience.biomedcentral.com
www.gigadb.org