Managing and exploiting the digital deluge: issues, challenges and opportunities
Presentation, University of Bath, 5 June 2008
UKOLN R&D TL presentation, Bath, 5 June 2008
Michael Day
The digital deluge - outline
  Understanding the scope of the problem
  Some challenges
  Opportunities for researchers
The digital deluge - what is it? (1)
  A phrase applicable in more than one context:
    The network infrastructure (the 'Exaflood'):
      The rapid expansion of Internet traffic, e.g. from the streaming of movies or TV (BBC iPlayer)
    Managing a rapidly growing and diverse range of digital content, e.g.
      Personal content, e.g. from digital cameras, e-mail
      Digitised content, e.g. sound and video reformatting, e-texts generated by mass-digitisation programmes
    The "Data Deluge" - curating the vast amounts of research data being generated by experiments, observational instruments and computer simulation
The digital deluge - what is it? (2)
  International Data Corporation (IDC) White Paper:
    Estimates the digital universe in 2007 as 281 exabytes (281 billion gigabytes) and still growing fast
    But these estimates include outputs from surveillance cameras, financial transaction journals and Web search logs, as well as more directly user-generated forms of content
    Notes a growing environmental impact:
      Increased power consumption, electronic waste
    Key areas of recent growth identified include:
      Healthcare data, e.g. medical imaging
      User-generated content, e.g. YouTube videos
      Scientific experiments, e.g. LHC (300 exabytes a year)
The digital deluge - challenges (1)
  Problems of scale:
    Can our infrastructures begin to cope with petabytes or exabytes of data?
    Technology has been quite good at keeping pace with data growth in the past (although Moore's Law will not rescue us for ever)
    Dealing with organisational change is more problematic
    The need for better co-ordination of effort is compromised by:
      Professional and disciplinary differences
      Fragmented funding structures
The digital deluge - challenges (2)
  Problems of complexity:
    Many different types of digital content:
      Structured, semi-structured, completely unstructured
      Mediated, non-mediated
      Interactivity and contextual links
      Sometimes key supporting information (documentation, metadata, representation information, etc.) is missing
    Content is stored in many different places:
      Active environments
      'Repositories' of various kinds (new forms of silos?)
    Ownership and privacy issues
The digital deluge - challenges (3)
  Organisational problems:
    Lack of co-ordination between sectors, institutions, funding bodies, etc.
    Still little consensus on:
      Deciding what needs to be kept (selection and appraisal)
      Deciding who should ultimately be responsible for looking after content, i.e. who pays?
    Infrastructures for preservation:
      These are still emerging from R&D projects and the commercial sector (rapid progress in the last five years)
      In higher education (HE), there are still questions about exactly where institutional repositories fit within the digital preservation landscape
The digital deluge - opportunities
  Some opportunities:
    While many curation challenges remain to be solved, the growing availability of digital content means that researchers:
      Will find new and innovative ways of combining data to develop and test new research hypotheses
      Will develop methodologies for mining and analysing vast amounts of data
    It could also foster new and innovative ways of doing research, e.g. 'Science 2.0'
Thank you for your attention