Apache Spark for scalable data analytics Raaz Sainudiin May 03 2016, School of Maths & Stats, UC, Ilam


Page 1: May 03 2016, School of Maths & Stats, UC, Ilam · Raaz Sainudiin · lamastex.org/talks/20160503_ApacheSpark_canterbury_tech.pdf · 2016-05-03 · Apache Spark for scalable data analytics

Apache Spark for scalable data analytics

Raaz Sainudiin

May 03 2016, School of Maths & Stats, UC, Ilam


Where does Big Data come from?

• It’s all happening online – could record every:

» Click
» Ad impression
» Billing event
» Fast forward, pause, ...
» Server request
» GPS signal
» Transaction
» Network message
» Fault
» …
» Social media feed on Twitter, Yelp, Facebook, Instagram, YouTube, ...

http://www.nlinews.com/2013/big-data-analytics/


We can measure much faster than we can compute

The cost of DNA sequencing is falling faster than the cost of computing, sensors are everywhere, ...

http://www.nlinews.com/2016/seamlessly-connecting-the-iot/, http://www.economist.com/node/16349358, http://www.symmetrymagazine.org/article/august-2012/particle-physics-tames-big-data


What can you do with Big Data?

Anthony Joseph’s UC Berkeley edX Intro to Big Data course (AJ2015)


What can you do with Big Data?

Example: an app with real-time traffic maps, flows, ...

Anthony Joseph’s UC Berkeley edX Intro to Big Data course (AJ2015)


Traditional analysis tools (sh, pandas, R) run on a single machine!

The Big Data Problem

● A single machine can no longer process or even store all the data!

● The only solution is to distribute the data over large clusters

AJ2015
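As a rough illustration of the idea of distributing data, the sketch below splits a dataset into chunks, lets each "worker" summarise only its own chunk, and then merges the small summaries. It is plain single-machine Python, not Spark; the function names (`partition`, `local_summary`, `combine`) and the sample data are hypothetical and chosen only for this example.

```python
def partition(records, num_workers):
    """Split records round-robin into num_workers chunks, one per worker."""
    return [records[i::num_workers] for i in range(num_workers)]


def local_summary(chunk):
    """Work done on one worker: it only ever sees its own chunk."""
    return (sum(chunk), len(chunk))


def combine(summaries):
    """Merge the small per-worker (sum, count) summaries into a global mean."""
    total = sum(s for s, _ in summaries)
    n = sum(c for _, c in summaries)
    return total / n if n else float("nan")


# Hypothetical data: the integers 1..100 stand in for a large dataset.
readings = list(range(1, 101))
chunks = partition(readings, 4)                 # 4 pretend workers
mean = combine([local_summary(c) for c in chunks])
print(mean)
```

Only the per-chunk (sum, count) pairs need to travel between machines, which is why this kind of aggregation can scale to data that no single machine could hold.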


AJ2015

Word Count Example via Map & Reduce
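The slide's figure is not preserved in this transcript, so here is a minimal sketch of word count expressed as map, shuffle and reduce steps. It is plain single-machine Python meant to show the shape of the computation, not the Spark or Hadoop API; the helper names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample lines are assumptions made for illustration.

```python
from collections import defaultdict
from functools import reduce


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group the emitted counts by word (the key)."""
    groups = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return groups


def reduce_phase(groups):
    """Reduce: add up the counts collected for each word."""
    return {word: reduce(lambda a, b: a + b, counts)
            for word, counts in groups.items()}


def word_count(lines):
    """Run map over every line, then shuffle and reduce the emitted pairs."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return reduce_phase(shuffle(pairs))


# Hypothetical input: each string stands in for one line of a large file.
lines = ["to be or not to be", "to see or not to see"]
counts = word_count(lines)
print(counts["to"], counts["be"], counts["see"])
```

In a cluster setting the map step runs independently on each machine's share of the lines, and only the grouped counts are moved between machines. In PySpark the same three steps are commonly written with the RDD methods `flatMap`, `map` and `reduceByKey`; that version is not shown here because it needs a running Spark installation.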


AJ2015

One Spark to rule them all!


Join Christchurch-Apache-Spark-Meetup to learn Spark!

https://itbrief.co.nz/article/university-canterbury-embraces-cloud-computing-and-big-data/, http://www.meetup.com/Christchurch-Apache-Spark-Meetup/, http://goo.gl/OIKRRL
