![Page 2: A roadmap for big-data research and education › cms_fs › 1.145312! › file › 2014-05-09_LTU_BigDa… · A roadmap for big-data research and education at LTU olov.schelen@ltu.se](https://reader034.vdocuments.us/reader034/viewer/2022042408/5f23ad770f7e3f174123252a/html5/thumbnails/2.jpg)
Outline
• What is big data? What’s behind the hype?
• Industry and academia outlooks
• Basic tools & frameworks
• National/international research and innovation agendas
• Roadmap opportunities:
  • Mobility & cloud
  • Internet of Things, cyber-physical systems
  • Data analytics
  • Datacenter automation and management, etc.
• Strengths, weaknesses, opportunities, threats
What is “big data”?
• Data with properties meeting the 3-4 Vs
  • volume: from machines, networks, social media, etc.
  • variety: often unstructured
  • velocity: continuous flow, often real-time
  • veracity: full of bias, noise, abnormality, irrelevance
How do we process it?
• Similar objectives as with any data
  • creation
  • retrieval
  • storage
  • analysis
  • presentation
  • visualization, etc.
• However, new scalable methods are needed to process the data effectively and efficiently
Origin 1: business analytics and corporate decision making in enterprises
A survey by BARC shows where data comes from
More on enterprise and business analytics
A survey by Jaspersoft shows how data is stored
Origin 2: The big four in cloud
Amazon, Google, Facebook, Yahoo (but now there are hundreds of followers)
• It is worth studying how their systems are built under the hood
• Based on fundamentals in distributed systems research
• New solutions that are adapted to specific requirements, which allow for trade-offs in order to increase speed
• Addressing all 4 Vs
Research fields
• Distributed and pervasive systems, grid systems
• Computer architecture, virtualization
• Networking
• Data mining and big data analytics
• Automation
• Control theory
• In combination with research in application areas (or deep understanding of user needs)!
Toolsets
• The traditional tools used in the mentioned fields
• Some relatively new ones specifically for big data processing
  • Showing two example stacks on the next pages
• The potential set is huge and new inventions are added quickly
• Having some common ground knowledge and a lab that supports those tools is a success factor!
BDAS (Berkeley Data Analytics Stack)
Stratosphere
Notes
• BDAS and Stratosphere will be presented by their originators at the Cloudberry workshop in June!
• Whichever toolsets we prefer, they should as far as possible be used in lab assignments at undergraduate and master's level
Arenas and agendas
• Process IT Innovations
• Cloudberry Datacenters
• Centek and county municipality efforts in the region
• The information-driven society (Vinnova SIO)
• EU arenas, Horizon 2020
• Partnerships and cooperations
Potential roadmap items follow
• Initial set, more can be added
• Mostly focused on systems with experimental research and evaluation
• Theoretical evaluations where applicable
Mobility and cloud computing
• Personalized (group) clouds
  • credentials, security
• Light-weight distributed cloud architectures
• Monitoring and profiling
• Make mobility and cloud even smoother
  • locality, caching
Distributed algorithms and data structures
• Based on application-class-specific requirements and trade-offs
• Many fundamentals were researched decades ago, but with new deltas in requirements, there are opportunities
• Looking into dynamic scenarios and mobility
• Not only fast lookups, but also fast rebuilds of data structures, locality challenges and opportunities, etc.
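One classic illustration of the "fast rebuild" point is consistent hashing, where adding a node relocates only a fraction of the keys instead of reshuffling everything. The following is a minimal sketch, not something taken from the roadmap itself: the `HashRing` class, the node names `n1`..`n5`, the `sensor-…` keys and the choice of 64 virtual nodes per server are all assumptions made for illustration.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a point on the ring (stable across runs)."""
    return int(hashlib.sha256(key.encode()).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring; names and vnode count are illustrative."""

    def __init__(self, nodes=(), vnodes=64):
        self.vnodes = vnodes
        self._points = []   # sorted ring positions
        self._owner = {}    # ring position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        # Each node is placed at several pseudo-random ring positions.
        for i in range(self.vnodes):
            point = _hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def lookup(self, key: str) -> str:
        # Owner is the first ring position clockwise from the key's hash.
        i = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[i]]


def moved_fraction(keys, before: HashRing, after: HashRing) -> float:
    """Share of keys whose owner differs between two ring configurations."""
    moved = sum(before.lookup(k) != after.lookup(k) for k in keys)
    return moved / len(keys)


keys = [f"sensor-{i}" for i in range(10_000)]
old = HashRing(["n1", "n2", "n3", "n4"])
new = HashRing(["n1", "n2", "n3", "n4", "n5"])
fraction = moved_fraction(keys, old, new)
```

Every key that changes owner ends up on the newly added node, so the rebuild cost is roughly proportional to that node's share of the ring. A plain `hash(key) % n` scheme would instead relocate most keys when `n` changes.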
Machine learning
Covered in depth in Fredrik Sandin's report
Content distribution and named data networking
• A major challenge of growth in data intensive applications (e.g., video)
• Interesting in combination with sensor data and similar models where content is produced by billions of devices
  • Addressing models
  • Data aggregation
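As a hedged illustration of data aggregation for device-generated content, the sketch below uses a mergeable summary: each gateway reduces its local readings to a small record, and records are merged on the way up, so raw samples never have to cross the network. The `Summary` type, the gateway names and the readings are hypothetical and only serve the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Summary:
    """Fixed-size, mergeable summary of a set of readings (illustrative)."""
    count: int = 0
    total: float = 0.0
    lo: float = float("inf")
    hi: float = float("-inf")

    @staticmethod
    def of(readings):
        """Summarize raw readings at the edge (e.g. on a gateway)."""
        summary = Summary()
        for r in readings:
            summary = summary.merge(Summary(1, r, r, r))
        return summary

    def merge(self, other):
        """Combine two summaries without access to the raw readings."""
        return Summary(
            self.count + other.count,
            self.total + other.total,
            min(self.lo, other.lo),
            max(self.hi, other.hi),
        )

    @property
    def mean(self):
        return self.total / self.count if self.count else float("nan")


# Two hypothetical gateways summarize locally; only summaries travel upstream.
gateway_a = Summary.of([20.5, 21.0, 19.5])
gateway_b = Summary.of([25.0, 24.0])
region = gateway_a.merge(gateway_b)
```

Because `merge` is associative and commutative, the aggregation tree can have any shape and the result at the root matches a summary computed over all raw readings. Traffic per link stays constant regardless of how many devices sit below it.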
Internet of Things (IoT)
• By definition connected to the Internet
• Large number of devices
• Crowd sensing
• Aggregation and indexing architectures
• Open data, or restricted data
• Resource efficiency (power, bandwidth, storage, space, etc.)
Cyber-physical systems (CPS)
• Can encompass IoT technologies
• But also embedded/closed systems
• Process industry
• Real-time systems
• Availability, fail-over, redundancy
Data analytics
• Novel analytics methods related to the data presented on previous slides
• Application-specific data to analyse
• Where are the gaps?
SWOT
• Strengths: good systems knowledge, experimental research, strong industry cooperation.
• Opportunities: the growth in datacenter industry, strong arenas, great industry interest, cross-functional projects (applications, software, infrastructure, IoT/M2M).
• Weaknesses: late starter in big data, few researchers directly engaged in the topic, too few graduate students in the topic.
• Threats: speed, ramp-up of research, lack of international cooperation, insufficient contribution/hype ratio.
So, let's kick off!
Discussions :)