Taming the Big Data Beast - Together
Kennisalliantie New Year's Reception, 31 January 2013. Prof. dr. Jacob de Vlieg, CEO and Scientific Director of the Netherlands eScience Center (NLeSC): “Taming the Big Data Beast Together”
Netherlands eScience Center ICT Synergy Hub, Amsterdam
Taming the Big Data Beast - Together
New Year's Meeting of the Kennisalliantie, Delft, 31 January 2013
Prof. dr. Jacob de Vlieg ¹ ²
1. CEO & Scientific Director of Netherlands eScience Center, NWO-SURF
2. Head Computational Design & Discovery, CMBI, Radboud University Medical Center, Nijmegen, Netherlands
Agenda
• Big Data in Science: Challenges & Opportunities – Top Sector ICT Roadmap theme: “Data, Data, Data”
• Netherlands eScience Center (NLeSC)
– Expert centre for Big Data Research
• Joint NWO-NLeSC “Big Data” project call
– Public-private partnerships
Data are the lifeblood of modern science and the digital economy
Managing, analyzing, linking & re-using data to create business value and/or scientific breakthroughs, e.g.
– Social media data to influence consumer choices
– Sensor network data, e.g. sensor-enabled smart dikes
– Imaging & biobanking data in health care, e.g. diagnostics, medicine
– And many more opportunities
Big Data: a complex concept – 4Vs: Volume, Variety, Velocity, Verification
Big Data inextricably connected to eScience/HPC
ICT top sector roadmap: eScience is about intelligent infrastructure to model and/or to access big data
Key eScience challenges Big Data research
– Cross-type data integration
– Data-driven & multi-model simulations
– Visualization & analytics
– High performance computing: connected computers & fast networks
– Stimulate a culture of knowledge sharing: no silos; data stewardship
– Rationalization of ICT landscapes; interoperability & industry data standards
– Training & education
Science itself is changing …We need to change with it…
Neelie Kroes in “Giving Europe’s Scientists the Tools to Deliver”
Two key words: multidisciplinary research & data-driven discovery
eScience and the mystery of the empty labs
• Much more data per experiment (miniaturization and/or automation)
• External data sources & outsourcing
• Experimental design, data management & analytics (eScience)
Use apps and wearable sensors to monitor daily life, e.g. hours of sleep, food consumed, exercise taken, etc.
Quantified Self = Big Data + Mobile + Sensors + Visualization + Gamification
Quantified Self Movement -> Big Data
eScience Hero
• Big Data
• Pattern recognition
• Machine learning
• Social Media
Andy Grove (ex-CEO Intel)
Fights for medical innovation; Parkinson's disease
Voice algorithms spot Parkinson's disease: data-driven diagnostics
• Machine learning algorithms that analyse voice recordings to detect Parkinson's symptoms early on (Little et al. @ Media Lab, MIT)
• Social Media:
Looking for volunteers to contribute to the database to improve pattern recognition
• 23andMe
• PatientsLikeMe.com
• And so on
Social networking health sites: patient-driven data collection
Big Data V = Verification: privacy, compliance, etc.
'Data Scientist' is now the hottest job title in Silicon Valley…
Tim O'Reilly
Founder of O'Reilly Media
Supporter of the free software and open source movements
McKinsey projected that the US needs 140,000 to 190,000 more workers with “deep analytical expertise”
Netherlands eScience Center
Netherlands organization for scientific research:
Principal Dutch body for ICT innovation for research
NLeSC SURF Science Park, Amsterdam; SARA, EGI
Networked innovation model
Bridge:
• Science & advanced ICT
• Industry & Academic Research
• Training & Education
New ways to do research made possible because of Big Data/eScience
NLeSC portfolio divided into themes
• Sustainability & Environment
– Climate
– Water management
– Energy
– Ecology
• Chemistry & Materials
– Chemistry
• Humanities & Social Sciences
– Humanities
– Social Sciences
• Life Sciences
– Green Genetics
– Translational Research IT
– Foods
– Cognition/Neuroscience
• eScience Methodology & ‘Big Data’
– eScience Methodology
– Astronomy
Can scientists from digital humanities help food researchers?
Digital Humanities: BiographyNED
Project Leader: Guus Schreiber
Will improve current version of the Biography Portal by incorporating analytical tools to show interconnections, trends, geographical maps and time lines.
Food Research: Food Specific Ontologies for Food Focused Text Mining
Project Leader: Wynand Alkema
Addressing absence of domain specific structured vocabularies which limits the use of data mining & knowledge management methods in food research.
eScience & Big Data: providing leads for new food applications
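To make the role of a structured vocabulary concrete, here is a minimal sketch of dictionary-based concept tagging in Python. It is not the project's actual method: the concept IDs, labels and synonyms in `VOCABULARY` are invented for illustration, and the function names `build_index` and `tag_concepts` are hypothetical. It only shows why text mining needs a vocabulary that maps different surface terms to the same concept.

```python
import re

# Hypothetical miniature vocabulary (invented IDs and terms, for illustration only):
# concept ID -> preferred label plus synonyms.
VOCABULARY = {
    "FOOD:0001": {"label": "tomato", "synonyms": ["tomatoes", "solanum lycopersicum"]},
    "FOOD:0002": {"label": "lycopene", "synonyms": []},
}


def build_index(vocabulary):
    """Flatten the vocabulary into a lookup from lowercase term to concept ID."""
    index = {}
    for concept_id, entry in vocabulary.items():
        for term in [entry["label"], *entry["synonyms"]]:
            index[term.lower()] = concept_id
    return index


def tag_concepts(text, index):
    """Return the sorted concept IDs whose terms occur as whole words in the text."""
    lowered = text.lower()
    found = set()
    for term, concept_id in index.items():
        # \b keeps "tomato" from matching inside an unrelated longer word.
        if re.search(r"\b" + re.escape(term) + r"\b", lowered):
            found.add(concept_id)
    return sorted(found)


index = build_index(VOCABULARY)
tags = tag_concepts("Tomatoes are rich in lycopene.", index)
```

Without the synonym entries, "Tomatoes" and "Solanum lycopersicum" would be counted as unrelated strings, which is the limitation the slide describes.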
NLeSC eScience Engineers: scientists bridging research and advanced ICT
Deliver sustainable solutions for data-driven research
Work both at the center and on site:
• Exchange of eScience expertise
• Re-use of proven eScience (technology hopping)
• Career development & training
Collaborative Innovation Network Taming the Data Beast Together
SMEs, etc.
Grand scientific challenges lead to innovative eScience & Big Data research
• eScience to allow an unprecedented level of detail (large-scale distributed computing)
• State-of-the-art visualization techniques to analyze hundreds of terabytes of output
• Re-use of proven eScience concepts in new areas (e.g. the water sector)
Prof. Henk Dijkstra, Univ. of Utrecht NLeSC Integrator Climate
eSalsa NLeSC project: data-driven simulations & advanced visualization to understand Climate Change
Dr. Jason Maassen eScience Engineer NLeSC
The number of data-driven start-ups is growing, particularly when it comes to social media.
Taming the Big Data Beast
Development of a high performance Twitter analysis platform
Hadoop – MapReduce architecture @ a large SARA computer cluster
Smart search & analysis software
Goal is to ask “Big Data” research questions e.g.
• Ability to analyze microblogging data produced over years
• Time-dependent
• Real-time sentiment analysis
• And so on…
Prof. Antal van den Bosch NLeSC Integrator Humanities Radboud University Nijmegen
Dr. Erik Tjong Kim Sang eScience Engineer NLeSC
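The MapReduce pattern behind such a platform can be sketched in a few lines of plain Python. This is a single-machine illustration under stated assumptions, not the NLeSC platform or the Hadoop API: the tweet fields (`created_at`, `text`), the tiny word lists and all function names are hypothetical, and a real system would use a proper sentiment model and run the map and reduce steps in parallel on a cluster.

```python
from collections import defaultdict

# Hypothetical toy lexicon; a real platform would use a trained sentiment model.
POSITIVE = {"good", "great", "love"}
NEGATIVE = {"bad", "awful", "hate"}


def map_tweet(tweet):
    """Map step: emit ((day, sentiment label), 1) for one tweet."""
    # Assumes ISO 8601 timestamps such as "2013-01-31T09:00:00".
    day = tweet["created_at"][:10]
    words = set(tweet["text"].lower().split())
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    if score > 0:
        label = "positive"
    elif score < 0:
        label = "negative"
    else:
        label = "neutral"
    yield (day, label), 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key, values):
    """Reduce step: sum the counts for one (day, label) key."""
    return key, sum(values)


def run_job(tweets):
    """Run map, shuffle and reduce in sequence on one machine."""
    pairs = (pair for tweet in tweets for pair in map_tweet(tweet))
    return dict(reduce_counts(key, values) for key, values in shuffle(pairs).items())


tweets = [
    {"created_at": "2013-01-31T09:00:00", "text": "I love big data"},
    {"created_at": "2013-01-31T10:00:00", "text": "great talk"},
    {"created_at": "2013-02-01T08:00:00", "text": "awful weather"},
]
daily_sentiment = run_job(tweets)
```

Because each tweet is mapped independently and each key is reduced independently, the same job can be spread over many machines, which is what makes years of microblogging data tractable.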
Cyber-common: a facility for 21st century data-driven research and multidisciplinary team work
SURF-SARA-NLeSC
To link minds and eScience
The key to scientific questions yet unasked!
Joint NWO-NLeSC “data sciences” call
• Focus on stimulating public-private partnerships
• Three instruments:
– Industrial Partnership Programme (IPP)
– Technology Areas (TA)
– Knowledge Innovation Mapping SMEs (KIEM MKB)
Rosemarie van der Veen-Oei (NLeSC) [email protected] T 070 3440 851
Mark Kas (NWO) [email protected] T 070 3440 811, M 06 205 93 207
www.nlesc.nl Netherlands eScience Center