Create, curate, re-use: the expanding life course of digital research data
Presentation to Educause Australasia 2007
a centre of expertise in data curation and preservation
This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 2.5 UK: Scotland License, excluding content property of others. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/2.5/scotland/ or send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.
Create, curate, re-use: the expanding life course of digital research data
Chris Rusbridge
EDUCAUSE Australasia May 2007
Contents
• Science and digital curation
• Why are data important?
• What kinds of data?
• What to do with your data: frontiers of practice
• Repository frontiers
• Changing practice
Digital Curation Centre Mission
“The over-riding purpose of the DCC is to support and promote continuing improvement in the quality of data curation, and of associated digital preservation”
Science and curation
• Creating and managing data suitable for re-use
• Good curation supports good science (managing your data properly)
  • Poor curation allows sloppy science?
• Data curation should save money
  • Murray-Rust/Frey on interesting but fruitless experiments!
• Some science impossible without curation…
  • QCD strong coupling constant prediction (Bethke)
  • Viscosity of earth mantle from Shang Dynasty eclipse records (Pang et al)
  • Science depending on past baselines (eg environmental, social sciences)
Records of science
• Data increasingly important as evidence
• Key part of the scholarly record (public good)
• Unrepeatable observations & experiments
• Experimental verifiability (the basis of science)
  • Would Chang retractions have been reduced if his first data were available?
• Allows additional interpretations
• Legal and compliance
• See APSR/AERES report for good examples
What kinds of data?
• Observations
  • eg UARS (Upper Atmosphere) Level 0: telemetry
  • UARS Level 1: measured physical parameters (post calibration?)
• Derived data
  • UARS Level 2: calculated geophysical? profiles
  • UARS Level 3: gridded, interpolated?
• Combined data
• Crafted data
  • Eg annotated gene/protein databases
• Descriptive (meta)data
Retaining research data means…
• Data secure against loss (within group)
• Communal repository (secure bit dump)
• Re-usable, sharable information
• As above, plus active curation (eg bio-informatics)
• Long term preservation of information
• Be clear what you are trying to do!
… or the data trajectory is…
• Hard drive → lost (crash)
• Hard drive → DVD → Cardboard box → Loft → Skip/dumpster → lost
• Sometimes this is a very bad thing
• Sometimes these are the right options!
Long term bit storage…
• A solved problem? Just requires well-understood good data management practices?
• Wrong! For very large datasets over very long times, there are significant problems…
BAKER, M., SHAH, M., ROSENTHAL, D. S. H., ROUSSOPOULOS, M., MANIATIS, P., GIULI, T. J. & BUNGALE, P. (2006) A Fresh Look at the Reliability of Long-term Digital Storage. EuroSys '06. Leuven, Belgium, ACM.
How Well Must We Preserve?
Keep a petabyte for a century
– With 50% chance of remaining completely undamaged
Consider each bit decaying independently
– Analogy with radioactive decay
That's a bit half life of 10**18 years
– One hundred million times the age of the universe
That's a very demanding requirement
– Hard to measure
– Even very unlikely faults will matter a lot
•Slide from David Rosenthal, LOCKSS
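The arithmetic on this slide can be checked with a short sketch. It assumes a petabyte of 10**15 bytes (8 × 10**15 bits), independent exponential decay of every bit, and an age of the universe of roughly 1.38 × 10**10 years; the function name and constants are illustrative and do not come from the talk.

```python
import math

BITS_PER_PETABYTE = 8 * 10**15   # assumption: 1 PB = 10**15 bytes
AGE_OF_UNIVERSE_YEARS = 1.38e10  # assumption: approximate figure


def required_bit_half_life(n_bits, years, survival_probability):
    """Half-life each bit needs so that ALL n_bits survive `years`
    with the given probability, assuming independent exponential decay.

    One bit survives with probability 0.5 ** (years / half_life), so all
    bits survive with probability 0.5 ** (n_bits * years / half_life).
    Solving that for half_life gives the expression below.
    """
    return n_bits * years * math.log(0.5) / math.log(survival_probability)


half_life = required_bit_half_life(BITS_PER_PETABYTE, 100, 0.5)
print(f"required bit half-life: {half_life:.1e} years")
print(f"in ages of the universe: {half_life / AGE_OF_UNIVERSE_YEARS:.1e}")
```

Under these assumptions the half-life comes out at about 8 × 10**17 years, which the slide rounds to 10**18, and at several tens of millions of times the age of the universe, the same order of magnitude as the slide's "one hundred million".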
What to do about curation
• Build curation/reusability into your workflow
• Curation begins before creation
• What’s easy at first becomes (impossibly) hard later
• Describe your data (metadata schemas, “representation info”, etc)
• Keep experimental parameters (technical, who, what, when, where)
• Keep ability to process
• Keep data!
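One way to act on "keep experimental parameters (technical, who, what, when, where)" at the moment of creation is to write a small description file beside each data file. The sketch below is only an illustration: the `write_sidecar` function, the `.meta.json` suffix and the field names are hypothetical choices, not a schema recommended in the talk or by the DCC.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def write_sidecar(data_path, who, what, where, parameters):
    """Write a JSON description next to a data file and return its path."""
    data_path = Path(data_path)
    record = {
        "file": data_path.name,
        "who": who,                                      # creator of the data
        "what": what,                                    # short description
        "when": datetime.now(timezone.utc).isoformat(),  # time of recording
        "where": where,                                  # lab, site or instrument
        "parameters": parameters,                        # technical settings
    }
    # Hypothetical naming convention: run1.csv -> run1.csv.meta.json
    sidecar = data_path.with_name(data_path.name + ".meta.json")
    sidecar.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return sidecar


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data_file = Path(tmp) / "run1.csv"
        data_file.write_text("time_s,signal\n0,1.5\n", encoding="utf-8")
        sidecar = write_sidecar(
            data_file,
            who="A. Researcher",
            what="Calibration run",
            where="Lab 1",
            parameters={"temperature_K": 293},
        )
        print(sidecar.read_text(encoding="utf-8"))
```

A discipline repository would normally expect an agreed metadata schema rather than ad hoc fields; the point of the sketch is only that capturing these details costs little when it is part of the workflow.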
What to do about curation - 2
• Use standard/agreed formats for data
• Make ownership & restrictions clear, & explain how to cite your data
• Offer for deposit in institutional or discipline repository
• Appraisal and selection essential
• Possible time-limited embargos
• “Publish” data in support of articles
Internet Archaeology: publication with data
Database as book…
• Buneman (early pilot) work on IUPHAR database
• MySQL to XML database
• Historic to logical schema
• XML via XSLT to LaTeX
The StORe vision
• Seamless transport from research data to research publications and vice versa
• Bi-directional links proven in social science e-research but capable of export to other disciplines
[Diagram labels: Source, Middleware, Output]
• Slide from Graham Pryor
• http://jiscstore.jot.com/WikiHome/
What are the reusability issues?
• Data not neutral to hypothesis
• Hard to know the risks & pitfalls of a particular dataset
• Data not self-describing: hard to find appropriate data (but see Murray-Rust on Googling InChI etc)
• Hard to “understand” data once found
• Really need information, not data!
• Hard to use data once understood
Context
• Data meaningless without context
• Metadata of many kinds
• Representation information… from data to information
• Linkage and connection between datasets
• Use your workflow!
• Provenance
• Authenticity/integrity
• Computational lineage
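The integrity part of this list can be illustrated with a fixity check: record a checksum for every file at deposit time and compare it again later. This is a minimal sketch covering bit-level integrity only, not provenance or authenticity in the wider sense; the function names and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(directory):
    """Map each file's path (relative to directory) to its checksum."""
    directory = Path(directory)
    return {
        p.relative_to(directory).as_posix(): sha256_of(p)
        for p in sorted(directory.rglob("*"))
        if p.is_file()
    }


def changed_files(directory, manifest):
    """Return the manifest entries that are now missing or altered."""
    current = build_manifest(directory)
    return sorted(
        name for name, digest in manifest.items() if current.get(name) != digest
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "observations.dat").write_bytes(b"level 0 telemetry")
        manifest = build_manifest(root)
        print(changed_files(root, manifest))  # nothing has changed yet
        (root / "observations.dat").write_bytes(b"level 0 telemetrx")
        print(changed_files(root, manifest))  # the altered file is reported
```

A manifest like this only detects damage. Repairing it still depends on keeping further copies elsewhere.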
[Diagram labels: Csat 8-day composite and subscene; Csat; E0; SST 8-day composite and subscene; Pbopt calc; Ctot calc; Zeu calc; PPeu calc; PAR subscene; HRPT. Organisations shown: NASA; University research group 1; University research group 2; research group 3; local decision-making body]
Slide from Rajendra Bose
Access and re-use
• Ethics and rights control access
  • Weak in expressing this long-term
• Collaboration tools
  • Annotation, discussion, review (see DART…)
  • Re-use leading to change and development
• “Publication”
  • Not just in “print”
  • Underlying data should be “published”, too
Who does curation?
• Individuals
• Departments or groups
• Institutions, maybe through libraries
• Communities
• Disciplines
• Publishers
• National services
• Other 3rd parties…
Curation: Individual
• “Small science 2-3 times more data than Big science”, but much more at risk
• PhD student? RA? PI? Administrator? IT support?
• Data potentially on local hard drives, or at best shared network drives
  • May be inadequately protected
  • Liable for policy-led deletion on resignation
• Individual “knows” too much (tacit knowledge)
  • Documentation/metadata unlikely to be adequate
• Future: gone!
Curation: Individual
•© Marita Bushell
Department: eCrystals
• Partnership with Institutional Repository
• Specialist department archive (& national service)
• Workflow recording of lab parameters (R4L)
• Public & private elements
• Trying to build eCrystals federation (eBank 3)
• Future: likely to continue
Data in institutional repositories
Institution: Cambridge Chemistry
• 175,000 small molecule structures in CML
• Alongside Archaeology, Manuscripts, Learning Materials, etc
• No library curation skills; dependent on research group enthusiast
• Collection isolated from other Chemistry
• (Only 5 UK institutional repositories claim to hold data)
• Future: assured…
Community: LOCKSS?
• Self-selected group of collectors: closest to genuine open activity (despite Alliance)?
• Traditionally libraries collecting eJournals
• Model respects IPR
• No domain expertise; rely on origins
• Data limitations…
• Future: potentially very persistent (low cost, high reliability, attack resistance, distributed)
Discipline: Atmospheric Science
• Strong believer in need for domain scientists as curators
• Significant participant in “community proxy” agenda-setting activities
• Internationally fragmented resources
• Future: mostly dependent on grant funding (but strong commitment)
Bio-informatics: Nature article 23 June 05
• Databases in Peril
  • 51 out of 89 biological databases contacted reported they were struggling financially
  • 7 have closed
  • Several being updated in owner’s spare time
  • (Notes that not all deserve long term support)
• [Nucleic Acids Research reports 968 databases in 2007!]
• Major issue: money
Publisher: Crystallography
• Publisher and Scientific Union
• Created key domain crystallographic standard (CIF)
• Strong motivator for deposit of structure data
• Consistent quality checks
• DOIs used for structure data
• Future: publishing business model
•Slide from IUCr
National bodies: British Library
• Serious and robust approach
• Legal deposit powers & responsibilities as driver
• Oriented primarily towards “cultural heritage” (broadly interpreted)
• Little data, no science domain experience
• Future: strong future commitment
National bodies: TNA/NDAD
• Specialist archive for government datasets
• Understand government regulations, dynamics & requirements
• Subject generalists; disconnected from associated science
• Technology specialists (understand databases)
• Future: likely to pass eventually to The National Archives
3rd parties: Portico
• Specific area: eJournals
• Depends on publisher agreements
• No data or domain science expertise
• Future: commitment from Mellon + publishers + subscriptions, good funding mix
3rd Parties: Iron Mountain?
• Records management IS a curation problem
• Organisations like this very likely to branch out
• No domain science expertise
• Future: business case, viability, stock market…
3rd parties: Web 2.0 style, Swivel.com??
Institutions & the network
• Institutions have fundamental sustainability
• Disciplines have domain knowledge advantage but sustainability is an issue
• Can we get the best of both?
• Needs serious work to examine!
[Matrix: rows Discipline 1, Discipline 2, Discipline 3, etc; columns Inst’n 1, Inst’n 2, Inst’n 3; each discipline row is marked X against two institutions]
Who are the curation players?
Cultural change
• If we build it, will they come? NO!!
• Outreach important: communication with scientists and researchers is hard graft
• Cultural change to new approach requires more:
  • Incentives, rewards and mandates
  • Successful exemplars (well publicised)
  • Discipline-oriented approach (one size does not fit all)
Australian context?
• In the emerging context of the Research Quality Framework, and the expected National Collaborative Research Infrastructure Strategy, curation can only increase in importance!
Thank you
•(Citations in paper in proceedings)