Presentation to Sport Data Valley meeting, May 2016
Alastair Dunning
3TU.Datacentrum, hosted at TU Delft Library
@alastairdunning, [email protected]
Winning the Tour de France, Research Data and Data Stewardship
In the 2015 Tour de France, Chris Froome won Stage 10 on Bastille Day, with a 1,610 m hors catégorie climb, by 59 seconds.
Critics immediately questioned Froome’s dominance over other riders, accusing him of doping.
Such criticism has followed Froome since he shot to fame in 2012 and then won the Tour de France in 2013.
In response, Froome’s TeamSky published the ‘power data’ behind his performance.
Later in the year, Froome underwent more testing and the lab data was released
Results showed that much of Froome’s improvement was down to weight loss (>5 kilos).
Since then, criticism of Froome has diminished.
What happened to TeamSky and Chris Froome is happening across scientific study.
How does any scientist look after their data? Not just to prove arguments to others but to themselves at a later time.
In a digital age, with data readily available, how does science verify and reproduce the claims it makes?
This has led to the fields of research data management and data stewardship
I would urge anybody creating or using data as evidence to start thinking about these issues
Shared values behind Data Stewardship
• The safe storage and protection of intellectual capital developed by scientists
• Best practice in ensuring scientific arguments are replicable in the long term
• Better exposure of work of scientists and improved citation rates
• Improved practices for meeting the demands of funders, publishers and others in respect to research data
Around 1 in 6 researchers at Erasmus University had no idea whether their data was backed up.
56 professors in the USA agreed to have their data practices analysed: “a majority of them had experienced the loss of at least one work-related digital object that they considered to be important in the course of their professional career.”
Safe storage and protection of intellectual capital
Study in Current Biology (Cell Press): The Availability of Research Data Declines Rapidly with Article Age
“We examined the availability of data from 516 studies between 2 and 22 years old”
“The odds of a data set being reported as extant fell by 17% per year”
“Policies mandating data archiving at publication are clearly needed”
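To make the 17% figure concrete, here is a minimal Python sketch of how such a yearly fall in odds compounds. It assumes the decline is constant from year to year. The starting odds in the example are purely hypothetical and are not a figure from the study, and the function names are illustrative only.

```python
def odds_multiplier(years, yearly_decline=0.17):
    """Factor by which the odds of data being extant shrink after `years`,
    assuming a constant yearly decline (17% is the figure quoted above)."""
    return (1.0 - yearly_decline) ** years


def odds_to_probability(odds):
    """Convert odds (p / (1 - p)) back to a probability p."""
    return odds / (1.0 + odds)


# Hypothetical starting point, NOT taken from the study: odds of 9 to 1
# (a 90% chance) that the data can still be obtained at year 0.
start_odds = 9.0

for years in (0, 5, 10, 20):
    odds = start_odds * odds_multiplier(years)
    print(f"after {years:2d} years: odds {odds:5.2f}, "
          f"probability {odds_to_probability(odds):.0%}")
```

Under these assumptions the odds halve between the third and fourth year, which is why the quoted decline adds up so quickly over the 2 to 22 year range the study covers.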
Disproving Einstein’s Theory of Locality - Professor Ronald Hanson and his team, including featured Ph.D. student Bas Hensen. Published in Nature.
Best practice in ensuring scientific arguments are replicable in the long term
Hanson and Hensen knew they were working on a high-impact paper, so they expected requests for the raw data so that the experiment could be validated and the data checked for consistency. Scientists had been using this experimental method since the 1960s and the results had always been contested, so there was already a tradition of sharing data related to this experiment. They therefore knew from the start that they would open up the data.
A couple of months after its publication, the dataset is already gaining interest. In the first six months after its deposit, the first dataset was viewed 650 times. The second dataset was viewed 56 times in its first three weeks. This is in line with Hensen’s expectations. Hensen reckons that this shows that nearly all of the world’s other research groups involved in experimental quantum mechanics have accessed the dataset.
“The Citation Advantage presently (at the least since 2009) amounts to papers with links to data receiving on the average 50% more citations per paper per year, than the papers without links to data.”
(Astrophysics, 2012)
“Publicly available data was significantly (p = 0.006) associated with a 69% increase in citations, independently of journal impact factor, date of publication.”
(Cancer microarray trials, 2007)
“Findings suggest that all three data sets are highly cited, with estimated citation counts in most cases higher than 99% of all the journal articles published in Oceanography during the same years”
(Oceanography, 2014)
Better exposure of academic work of scientists
Improved practices for meeting the demands of funders, publishers and others in respect to research data
The 3TU.Datacentrum (soon to become 4TU) exists to help with these issues
Services of 3TU.Datacentrum data repository
http://data.3tu.nl/repository/
• ‘Frozen’ dataset (version) for future use & long term storage
• ‘Published’ data: visible
• Open (max. 2 years embargo): shareable
• Persistent digital object identifier (DOI): findable and citable
• Sustainable formats: readable
• Data Seal of Approval: safe and secure
Every researcher can upload up to 10 GB of data a year to 3TU.Datacentrum free of charge. Depositing additional data carries a one-off cost of €4.50 per GB.
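As a rough illustration of this pricing, the Python sketch below estimates the one-off cost of a year's deposits. It assumes that only the volume above the free 10 GB is charged and that fractions of a GB are charged proportionally. Neither detail is stated here, and the function name is hypothetical.

```python
FREE_GB_PER_YEAR = 10      # free yearly allowance per researcher (from the text)
EUR_PER_EXTRA_GB = 4.50    # one-off cost per additional GB (from the text)


def deposit_cost_eur(total_gb_this_year):
    """One-off cost in euros for depositing `total_gb_this_year` GB in a year.

    Assumption: only the volume above the free allowance is charged, and
    fractions of a GB are charged proportionally.
    """
    if total_gb_this_year < 0:
        raise ValueError("deposit size cannot be negative")
    extra_gb = max(0.0, total_gb_this_year - FREE_GB_PER_YEAR)
    return round(extra_gb * EUR_PER_EXTRA_GB, 2)


for size_gb in (8, 10, 25, 100):
    print(f"{size_gb:>4} GB -> EUR {deposit_cost_eur(size_gb):.2f}")
```

A research group planning a larger deposit could use an estimate like this as a starting point before discussing actual terms with the data centre.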
3TU.Datacentrum would be happy to discuss options with Sport Data Valley partners for hosting their data
Alastair Dunning, Research Data, TU Delft & 3TU.Datacentrum
@alastairdunning, [email protected]
Citations
Slide 2 - https://en.wikipedia.org/wiki/2015_Tour_de_France,_Stage_1_to_Stage_11#Stage_10
Slide 3 - http://www.independent.co.uk/sport/cycling/tour-de-france-2015-doping-claims-dampen-the-mood-as-chris-froome-triumphs-10417336.html
Slide 5 - http://www.teamsky.com/teamsky/home/article/59618#vYKyzhBzAIYy7BKH.97
Slide 6 - http://chrisfroome.esquire.co.uk/
Slide 14 - https://www.fosteropenscience.eu/sites/default/files/pdf/919.pdf (Erasmus); http://www.ijdc.net/index.php/ijdc/article/view/10.2.96 (Intellectual Capital at Risk, US Study); https://www.flickr.com/groups/2121762@N23/
Slide 15 - http://www.cell.com/current-biology/abstract/S0960-9822(13)01400-0; https://www.flickr.com/groups/2121762@N23/
Slide 16 - various. Type ‘Fire Lab University’ into Google!
Slide 17 - http://datacentrum.3tu.nl/en/researchers-about-3tudatacentrum/ (forthcoming); http://www.nature.com/nature/journal/v526/n7575/full/nature15759.html
Slide 18 - Belter CW (2014) Measuring the Value of Research Data: A Citation Analysis of Oceanographic Data Sets. PLoS ONE 9(3): e92590. doi:10.1371/journal.pone.0092590; Piwowar HA, Day RS, Fridsma DB (2007) Sharing Detailed Research Data Is Associated with Increased Citation Rate. PLoS ONE 2(3): e308. doi:10.1371/journal.pone.0000308; Bertil Dorch. On the Citation Advantage of linking to data: Astrophysics. 2012. <hprints-00714715v2>
Slide 19 - http://ec.europa.eu/newsroom/dae/document.cfm?doc_id=15266 (EU); http://www.nwo.nl/en/policies/open+science/data+management (NWO)
Slide 21 - http://data.3tu.nl/repository/