New trends in big data: in-memory analytics, streaming computing and distributed machine learning

Trends in Big Data. Natalino Busa, Data Platform Architect at ING

Upload: natalino-busa

Post on 12-Jan-2017

Category: Data & Analytics


TRANSCRIPT

Page 1: New trends in big data:  in-memory analytics, streaming computing and distributed machine learning

Trends in Big Data.

Natalino Busa
Data Platform Architect at ING

Page 3

Play with your phones

Page 4

Re-think Big Data
Hadoop has turned 10

Page 5

Memory is eating Big Data
Amazon is delivering instances with 2 TB RAM

Facebook, Microsoft: 90% workload below the 100 GB

Machine Learning algorithms fit on a single node
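
The single-node claim can be checked with back-of-the-envelope arithmetic. The sketch below is an illustration only: the data set dimensions, the 50% headroom rule, and treating 2 TB as 2048 GiB are assumptions, not figures from the talk.

```python
def dense_matrix_gib(rows: int, cols: int, bytes_per_value: int = 8) -> float:
    """Size in GiB of a dense numeric matrix (float64 by default)."""
    return rows * cols * bytes_per_value / 2**30


def fits_in_ram(rows: int, cols: int, ram_gib: float, headroom: float = 0.5) -> bool:
    """True if the matrix fits while keeping `headroom` of the RAM free
    for the algorithm's own working memory (assumed rule of thumb)."""
    return dense_matrix_gib(rows, cols) <= ram_gib * (1 - headroom)


# Hypothetical training set: 1 billion rows x 10 float64 features.
size_gib = dense_matrix_gib(1_000_000_000, 10)

# Assumed machines: a 2 TB instance (taken as 2048 GiB) and a 64 GiB laptop-class node.
on_big_instance = fits_in_ram(1_000_000_000, 10, ram_gib=2048)
on_small_node = fits_in_ram(1_000_000_000, 10, ram_gib=64)

print(f"{size_gib:.1f} GiB, big instance: {on_big_instance}, small node: {on_small_node}")
```

Under these assumptions the data set is well below 100 GB and fits comfortably on the large instance, but not on the small node.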

Page 6

250 MB hard disk drive from 1979

I like Big Data and I cannot lie.

Page 7

Disk -> RAM
Hadoop -> Spark

Map-Reduce -> Data Flow Graphs

HDFS -> Storage, MPPs, NoSQL
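
The move from Map-Reduce to data flow graphs can be sketched in a few lines. This is a toy illustration and not Spark's API: the `Flow` class and its methods are hypothetical names. Transformations are only recorded until `collect()` is called, and intermediate results stay in memory instead of being written to disk between stages.

```python
from collections import Counter
from typing import Any, Callable, Iterable, List


class Flow:
    """Toy lazy data-flow node (hypothetical, for illustration only).

    Each transformation returns a new Flow wrapping a generator factory,
    so nothing is computed until collect() is called.
    """

    def __init__(self, source: Callable[[], Iterable[Any]]):
        self._source = source

    def map(self, fn: Callable[[Any], Any]) -> "Flow":
        return Flow(lambda: (fn(x) for x in self._source()))

    def flat_map(self, fn: Callable[[Any], Iterable[Any]]) -> "Flow":
        return Flow(lambda: (y for x in self._source() for y in fn(x)))

    def filter(self, pred: Callable[[Any], bool]) -> "Flow":
        return Flow(lambda: (x for x in self._source() if pred(x)))

    def collect(self) -> List[Any]:
        # The "action": the whole chain runs here, element by element.
        return list(self._source())


lines = ["disk to ram", "hadoop to spark", "ram is fast"]

# Build the graph: split lines into words, then drop the word "to".
words = Flow(lambda: lines).flat_map(str.split).filter(lambda w: w != "to")

# Trigger the computation and count the words in memory.
counts = Counter(words.collect())
print(counts.most_common(1))
```

In a classic Map-Reduce job each of these steps would typically be a separate stage with its output materialized on disk; here the chain is evaluated in one pass.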

Page 8

Wheel mill.

Stream like a boss

Page 9

Streaming and Real-Time Analytics

Page 10

Batch -> Event-Driven
ETL -> Streaming

Hive -> Flink, Akka, Spark
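
The batch versus event-driven contrast can be shown with a running total per account. This is a minimal sketch in plain Python, not Flink, Akka, or Spark code: the event shape `(account, amount)` and the names `batch_totals` and `StreamingTotals` are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Event = Tuple[str, float]  # assumed shape: (account, amount)


def batch_totals(events: Iterable[Event]) -> Dict[str, float]:
    """Batch style: wait for the full data set, then aggregate in one pass."""
    totals: Dict[str, float] = defaultdict(float)
    for account, amount in events:
        totals[account] += amount
    return dict(totals)


class StreamingTotals:
    """Event-driven style: state is updated per event, so an up-to-date
    result is available after every single event."""

    def __init__(self) -> None:
        self.totals: Dict[str, float] = defaultdict(float)

    def on_event(self, account: str, amount: float) -> float:
        self.totals[account] += amount
        return self.totals[account]


events = [("a", 10.0), ("b", 5.0), ("a", 2.5)]

stream = StreamingTotals()
latest = [stream.on_event(account, amount) for account, amount in events]

print(latest)                # running total seen after each event
print(batch_totals(events))  # same final totals, available only at the end
```

Both approaches end in the same totals; the difference is when the answer becomes available.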

Page 11

Stream Centric Architectures

Page 12

Spark - RDDs

Streaming, SQL, MLlib, GraphX

Analytics, Statistics, Data Science, Model Training

HDFS, NoSQL, SQL

Data Sources

Map-Reduce

HDFS, Kafka

Spark: Unified Distributed Computing: SQL + Machine Learning + Graph Analytics

Hive

Page 13

Virtual resources

Big Data Applications,

Assemble!

Page 14

Clusters -> Resources
Orchestrated -> Isolated

Static -> Disposable

YARN, Mesos, CoreOS, Kubernetes

Application-oriented Infrastructure

Page 15

Elastic: Docker, Mesos, YARN, Kubernetes

Data Processing: Flink, Spark, Akka

Indexing: Elasticsearch, Deep Learning

APIs and microservices: Akka, Python, Java

Data storage: SQL, NoSQL, HDFS, Streaming

Diagram labels: Mesos, YARN; Spark (Streaming, SQL, MLlib, GraphX); DBs; ES; C*

Application Oriented Architectures

Page 16

That’s all folks!

Natalino Busa
Data Platform Architect at ING

@natbusa