Scientific Data for open knowledge circulation.
An ever changing perspective from wherever you come from
John Wood
Secretary-General, Association of Commonwealth Universities
Chair of European Research Area Board
Previous chair of High Level Group on Scientific Data
“Data is the new oil” Expert says
Peninsular Newspaper 5th December Qatar
Which has the greatest impact – nature or nurture? PSID: longitudinal data on 8000 families over 40 years
Where are the brown dwarfs? NVO: Data from 50+ astronomical sky surveys and large-scale telescopes.
Are current stresses on this bridge dangerous? Terabridge data set: Structure sensor data for real-time data mining, event detection, decision support and alert dissemination
How does disease spread? PDB: Worldwide reference collection of protein structure information
What is the impact of a large-scale earthquake on the Southern San Andreas Fault? Digital data from Southern California Earthquake Center simulations used for disaster planning and building requirements
Research today
Slide from Dr. Francine Berman's presentation “Got Data? New Roles for Libraries in Shaping 21st Century Research”.
Dr. Berman is Vice President for Research, Rensselaer Polytechnic Institute; Co-Chair of US Blue Ribbon Task Force for Sustainable Digital Preservation and Access
A world in transformation...
Technology – a major factor of change
Internet (instantaneous communication)
Miniaturisation (pervasiveness)
Virtualisation (information/data)
Science in transformation
More intense/global collaborations between scientists, between machines, across disciplines
ICT infrastructures enabling e-Science
Drivers of social transformations
Impact of cyber-democracy – who can we trust?
Information becomes an infrastructure
• Science 2.0 – main trends (figure 1)
“[…] The data availability landscape transforms because of interrelated major trends. The cost for accessing data has dramatically lowered: much of the useful statistics and more general data from (often publicly funded) research are now published and freely accessible in raw format on the web. […] much more data collected and archived today than ever before, and the volume is growing at an exponential rate […]”
reference: Science 2.0 (change will happen….) J.C Burgelman, D. Osimo & M. Bogdanowicz
• Scientific Data
– Information cycle/continuums
– Costs (associated with quality)
– Roles and tensions between “today’s” stakeholders
– Scenarios for the future (High-Level Group on Scientific Data)
Science in transformation
Meeting 21st Century Challenges
It is strategic to embrace the e-Science paradigm shift and the role of e-Infrastructures as a crucial asset underpinning European research and innovation policies
e-Science benefiting from pervasive technologies for high-speed communication and information processing
Science is global, e-Science even more so: 35% of articles in leading journals result from international collaboration (up from 25% 15 years ago)
Data infrastructures are key enablers of e-Science
The Centrality of Research Infrastructures for Innovation
Information in the on-line ERA
is the basis for e-Science
produced in large volumes
more complex expressions of knowledge
not only human readable, also for machine-to-machine communication
volatile but need for permanence
traditional organisations adapting to manage data
[Figure: Evolution of ESA's EO Data Archives between 1986-2007 and future estimates (up to 2020). Vertical axis: Total Archive in Terabytes (TB), 0 to 22000. Horizontal axis: Year, 1986 to 2020. Data series:]
Future Data Estimates
LANDSAT 2-4 MSS (75-Dec 93)
AQUA Modis (April 03-today)
ENVISAT LR (March 02-today)
ENVISAT HR (March 02-today)
TERRA Modis (June 01-today)
QUICK SCATT (01-today) /PROBA (May 02-today)
LANDSAT 7 ETM (April 99-Dec 03)
SEA STAR SeaWifs (Apr 98-today)
ERS 2 HR (May 95-today)
ERS 2 LBR (May 95-today)
JERS SAR/OPS VNIR (92-Sep 98)
ERS 1 HR (Jul 91-Mar 00)
ERS 1 LBR (Jul 91-Mar 00)
SPOT 1-4 HRV (87-today)
MOS 1, 1b MESSR (87-Oct 93)
NOAA 9-17 AVHRR (86-today)
LANDSAT 5 TM (April 84-today)
NIMBUS 7 (Nov 78-May 86), SEASAT (Jun-Oct 78)
Social Science and Humanities RIs currently in progress
CESSDA: Council of European Social Science Data Archives
CLARIN: Common Language Resources and Technology Infrastructure
DARIAH: Digital Research Infrastructure for the Arts and Humanities
ESS: European Social Survey
SHARE: Survey on Health, Ageing and Retirement in Europe
Copyright © 2009 Norwegian Social Sciences Data Services Grenoble, September 10, 2009
RAMIRI Hamburg Sept 2009 - Steven Krauwer
CLARIN
Common Language Resources and Technology Infrastructure
Basic idea: European federation of digital archives with language data and tools (text, speech, multimodal, gesture …)
target audience: humanities and social sciences scholars
with uniform single sign-on access to the archives
with access to language and speech technology tools to retrieve, manipulate, enhance, explore and exploit data
all languages are equally important
to cover all EU and associated countries
The X-ray free-electron lasers will provide coherent radiation of the proper wavelength and the proper time structure, so that materials and the changes of their properties can be portrayed at atomic resolution in four dimensions, in space and time.
Diffraction pattern of 10 x 10 x 10 Au cluster
Fascination - FELs for hard X-rays
Technology Forecast – Storage at DESY
Year    Rate Capability [Gbyte/sec]    Storage Space [Petabyte]
2009    1                              3
2012    5                              26
2016    40                             200
• not a technology problem
• money and manpower issues
• to be determined:
  • user behaviour
  • compression and accept/reject algorithms
• potentially critical: access to data!
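The 2009, 2012 and 2016 forecast figures above imply steep compound growth. The following is a minimal Python sketch of that arithmetic, not part of the original slides. The function names `annual_growth` and `days_to_fill` are hypothetical, and the conversion assumes decimal units (1 Petabyte = 1,000,000 Gbyte), which the slide does not state.

```python
# Forecast figures copied from the DESY storage slide.
forecast = {
    2009: {"rate_gbyte_per_s": 1, "storage_pb": 3},
    2012: {"rate_gbyte_per_s": 5, "storage_pb": 26},
    2016: {"rate_gbyte_per_s": 40, "storage_pb": 200},
}


def annual_growth(start_value, end_value, years):
    """Compound annual growth factor between two forecast points."""
    return (end_value / start_value) ** (1.0 / years)


def days_to_fill(storage_pb, rate_gbyte_per_s):
    """Days of continuous writing at the full rate needed to fill the storage.

    Assumption: 1 Petabyte = 1,000,000 Gbyte (decimal units).
    """
    seconds = storage_pb * 1_000_000 / rate_gbyte_per_s
    return seconds / 86_400


if __name__ == "__main__":
    first, last = 2009, 2016
    span = last - first
    rate_growth = annual_growth(
        forecast[first]["rate_gbyte_per_s"],
        forecast[last]["rate_gbyte_per_s"],
        span,
    )
    storage_growth = annual_growth(
        forecast[first]["storage_pb"],
        forecast[last]["storage_pb"],
        span,
    )
    print(f"Rate grows by a factor of about {rate_growth:.2f} per year")
    print(f"Storage grows by a factor of about {storage_growth:.2f} per year")
    for year, row in forecast.items():
        days = days_to_fill(row["storage_pb"], row["rate_gbyte_per_s"])
        print(f"{year}: storage would fill in about {days:.0f} days at full rate")
```

Under these assumptions, the forecast storage holds only a month or two of writing at the full forecast rate in each of the three years. That fits the slide's point that user behaviour and accept/reject algorithms still have to be determined.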
Science driver: Integration of Data (and publications)
Neutron diffraction, X-ray diffraction and NMR combined: high-quality structure refinement
Data ingest
Managing petabytes+
Common schema(s)
How to organize?
How to re-organize?
How to coexist & cooperate with other scientists and researchers?
Data query and visualization tools
Support/training
Performance
  Execute queries in a minute
  Batch (big) query scheduling
[Diagram: four sources (Experiments & Instruments, Simulations, Literature, Other Archives) each feed facts into a central box marked "?"; questions go in and answers come out.]
Data Services
Community Support Services
Astronomy
Climatology
Chemistry
History
Biology
• Computing Infrastructure
• Persistent Storage Capacity
• Integrity
• Authentication & Security
• API
• Data Discovery & Navigation
• Workflows Generation
Demography
Scientific Data (Discipline Specific)
Other Data
Researcher 1
Non Scientific World
Scientific World
Researcher 2
Aggregated Data Sets (Temporary or Permanent)
Workflows
Aggregation Path
Source: High-level Group on Scientific Data
ERA 2030: ERAB’s STRATEGIC VIEW
October 2009
An ERA driven by societal needs to address the ‘Grand Challenges’
Rising tide of data
“A fundamental characteristic of our age is the rising tide of data – global, diverse, valuable and complex. In the realm of science, this is both an opportunity and a challenge”
Report of the High-Level Group on Scientific Data, Oct 2010
“Riding the wave: how can Europe gain from the rising tide of scientific data”
http://bit.ly/riding_the_wave
Data as Infrastructure
“Our Vision is a scientific e-infrastructure that supports seamless access, use, re-use, and trust of data. In a sense [...] the data themselves become the infrastructure – a valuable asset, on which science, technology, the economy and society can advance”.
Report of the High-Level Group on Scientific Data, Oct 2010
“Riding the wave: how can Europe gain from the rising tide of scientific data”
http://bit.ly/riding_the_wave
Vision 2030
(8) Global governance promotes international trust and interoperability.
Member states should publish their strategy, and resources, for implementation, by 2015.
Create a European framework for certification for those coming up to an appropriate level of interoperability.
Create a “scientific Davos” meeting to bring commercial and scientific domains together.
IMPACT IF ACHIEVED We avoid fragmentation of data and resources.