Powering Predictive Mapping at Scale with Spark, Kafka, and Elastic Search: Spark Summit East talk...

© 2016 Mesosphere, Inc. All Rights Reserved. 1

@joerg_schad @dcos #smack

Powering Predictive Mapping at Scale with Spark, Kafka, and Elastic Search

Spark Summit East, February 08, 2017

Jörg Schad, Distributed Systems Engineer

@joerg_schad

HYPERSCALE MEANS VOLUME AND VELOCITY

Batch | Micro-Batch | Event Processing

Latency scale: Days, Hours, Minutes, Seconds, Microseconds

Batch end: reports what has happened using descriptive analytics

Real-time end: solves problems using predictive and prescriptive analytics

Example use cases: Billing, Chargeback; Product Recommendations; Real-time Advertising; Real-time Pricing and Routing; Predictive User Interface
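Why micro-batching sits in the seconds range of the latency scale while event processing reaches lower can be seen from a simplified bound. The symbols are introduced here for illustration only: a batch interval $B$, a per-batch processing time $P$ (assumed $P \le B$, so batches do not queue up), and a per-event processing time $p$ for an event-at-a-time engine; network and ingestion delays are ignored.

```latex
% Micro-batch: an event arriving just after a batch boundary waits up to B
% for its batch to close, then P for the batch to be processed.
P \;\le\; L_{\text{micro-batch}} \;\le\; B + P \qquad (P \le B)

% Event-at-a-time: each event is handled on arrival.
L_{\text{event}} \;\approx\; p
```

With a 1 s interval, for example, an event may wait roughly a second before processing even starts. If $P > B$, batches back up and latency keeps growing.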

SMACK stack

EVENTS: ubiquitous data streams from connected devices (sensors, devices, clients)

INGEST: Apache Kafka, ingests millions of events per second

STORE: Apache Cassandra, a distributed and highly scalable database

ANALYZE: Apache Spark, processes data in real time and in batch

ACT: Akka, visualizes data and builds data-driven applications

Platform: DC/OS
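The four SMACK stages can be illustrated with a toy, single-process sketch. Every class and function here (`IngestLog`, `LatestStore`, `analyze`, `act`, `run_pipeline`) is a hypothetical in-memory stand-in, not the Kafka, Cassandra, Spark, or Akka API; the `(device_id, value)` event shape and the alert threshold are assumptions made for illustration.

```python
from collections import defaultdict


class IngestLog:
    """Kafka-like stand-in (hypothetical): an append-only log read by offset."""

    def __init__(self):
        self.records = []

    def append(self, event):
        self.records.append(event)
        return len(self.records) - 1  # offset assigned to the new record

    def read_from(self, offset):
        return self.records[offset:]


class LatestStore:
    """Cassandra-like stand-in (hypothetical): keeps the latest value per device."""

    def __init__(self):
        self.latest = {}

    def upsert(self, device_id, value):
        self.latest[device_id] = value


def analyze(events):
    """Spark-like stand-in (hypothetical): average value per device."""
    sums, counts = defaultdict(float), defaultdict(int)
    for device_id, value in events:
        sums[device_id] += value
        counts[device_id] += 1
    return {device_id: sums[device_id] / counts[device_id] for device_id in sums}


def act(averages, threshold):
    """Akka-like stand-in (hypothetical): devices whose average exceeds threshold."""
    return sorted(d for d, avg in averages.items() if avg > threshold)


def run_pipeline(events, threshold):
    log, store = IngestLog(), LatestStore()
    for event in events:               # INGEST
        log.append(event)
    batch = log.read_from(0)           # a consumer reading from offset 0
    for device_id, value in batch:     # STORE
        store.upsert(device_id, value)
    averages = analyze(batch)          # ANALYZE
    alerts = act(averages, threshold)  # ACT
    return store.latest, averages, alerts


if __name__ == "__main__":
    readings = [("sensor-1", 10.0), ("sensor-2", 30.0), ("sensor-1", 20.0)]
    latest, averages, alerts = run_pipeline(readings, threshold=20.0)
    print(latest, averages, alerts)
```

In a real deployment each stage would be a separate, independently scaled service rather than a function call in one process.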

NAIVE APPROACH

Typical datacenter: siloed, over-provisioned servers, low utilization

Industry average: 12-15% utilization

MySQL

microservice

Cassandra

Spark/Hadoop

Kafka

Mesos & DC/OS

MULTIPLEXING OF DATA, SERVICES, USERS, ENVIRONMENTS

Typical datacenter: siloed, over-provisioned servers, low utilization

Mesos/DC/OS: automated schedulers, workload multiplexing onto the same machines

MySQL

microservice

Cassandra

Spark/Hadoop

Kafka
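The utilization gap between siloed servers and multiplexed machines can be sketched with a toy packing model. Assumptions: each service is a list of task CPU demands, all machines have the same CPU capacity, and placement uses first-fit decreasing, a simple bin-packing heuristic standing in for a real scheduler (Mesos and DC/OS schedulers do not necessarily place tasks this way). The service names follow the slide; the CPU numbers are made up.

```python
def first_fit_decreasing(tasks, capacity):
    """Return how many machines first-fit decreasing needs to place `tasks`."""
    free = []  # remaining capacity of each machine opened so far
    for size in sorted(tasks, reverse=True):
        for i, remaining in enumerate(free):
            if size <= remaining:
                free[i] = remaining - size
                break
        else:  # no open machine has room, so open a new one
            free.append(capacity - size)
    return len(free)


def siloed(services, capacity):
    """Naive approach: each service gets its own dedicated machines."""
    return sum(first_fit_decreasing(tasks, capacity) for tasks in services.values())


def multiplexed(services, capacity):
    """Shared cluster: tasks from all services are packed onto the same machines."""
    all_tasks = [task for tasks in services.values() for task in tasks]
    return first_fit_decreasing(all_tasks, capacity)


def utilization(services, machines, capacity):
    """Fraction of provisioned capacity that the tasks actually use."""
    used = sum(sum(tasks) for tasks in services.values())
    return used / (machines * capacity)


if __name__ == "__main__":
    capacity = 8  # hypothetical CPUs per machine
    services = {  # hypothetical CPU demand per task
        "MySQL": [2],
        "microservice": [1, 1],
        "Cassandra": [3, 3],
        "Spark/Hadoop": [4],
        "Kafka": [2, 1],
    }
    for name, place in (("siloed", siloed), ("multiplexed", multiplexed)):
        machines = place(services, capacity)
        print(f"{name}: {machines} machines, "
              f"{utilization(services, machines, capacity):.0%} utilized")
```

With these made-up demands the siloed layout needs 5 machines and the multiplexed one needs 3 for the same 17 CPUs of work.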

DC/OS ENABLES MODERN DISTRIBUTED APPS

Modern App Components: Big Data + Analytics Engines; Microservices (in containers)

Streaming, Batch, Machine Learning, Analytics, Functions & Logic, Search, Time Series, SQL / NoSQL, Databases

Datacenter Operating System (DC/OS): User Interface (GUI & CLI); core system services (e.g., distributed init, cron, service discovery, package management & installer, storage)

Distributed Systems Kernel (Mesos)

Any Infrastructure (Physical, Virtual, Cloud)

Key points: a distributed systems kernel to abstract resources; an ecosystem of frameworks & apps; a consistent architecture to run on top of the kernel
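The idea of a kernel that abstracts resources rests on Mesos's two-level scheduling: the master offers free resources from agents to frameworks, and each framework's own scheduler decides which tasks to launch on them. The toy sketch here models only that division of labor. `Offer`, `Framework`, and `allocate` are hypothetical; offers go to frameworks in list order rather than through Mesos's real allocator (which uses dominant resource fairness by default), and unused resources simply flow on to the next framework.

```python
from dataclasses import dataclass, field


@dataclass
class Offer:
    """Free resources on one agent, as offered by the master (hypothetical model)."""
    agent: str
    cpus: float
    mem_mb: int


@dataclass
class Framework:
    """A framework scheduler with tasks waiting for resources (hypothetical)."""
    name: str
    pending: list  # (task_name, cpus, mem_mb) tuples
    launched: list = field(default_factory=list)  # (task_name, agent) tuples

    def resource_offers(self, offers):
        """Second level: the framework decides what to launch on each offer.

        Returns the leftover resources so they can be offered to others.
        """
        leftovers = []
        for offer in offers:
            cpus, mem = offer.cpus, offer.mem_mb
            still_pending = []
            for task, task_cpus, task_mem in self.pending:
                if task_cpus <= cpus and task_mem <= mem:
                    cpus -= task_cpus
                    mem -= task_mem
                    self.launched.append((task, offer.agent))
                else:
                    still_pending.append((task, task_cpus, task_mem))
            self.pending = still_pending
            leftovers.append(Offer(offer.agent, cpus, mem))
        return leftovers


def allocate(agents, frameworks):
    """First level: the master turns each agent's free resources into offers."""
    offers = [Offer(agent, cpus, mem) for agent, (cpus, mem) in agents.items()]
    for framework in frameworks:
        offers = framework.resource_offers(offers)
    return offers  # resources no framework used


if __name__ == "__main__":
    agents = {"agent-1": (4.0, 8192), "agent-2": (4.0, 8192)}
    spark = Framework("spark", [("executor-1", 2.0, 4096), ("executor-2", 2.0, 4096)])
    kafka = Framework("kafka", [("broker-1", 2.0, 2048)])
    unused = allocate(agents, [spark, kafka])
    print(spark.launched, kafka.launched, unused)
```

The point of the split is that the master never needs to understand Spark or Kafka placement rules; each framework applies its own.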

EXAMPLE: REAL-TIME TRACKING

GEO-ENABLED IoT

DATA FLOW

DEMO

THANK YOU!

ANY QUESTIONS?

@dcos

[email protected]

/groups/8295652

/dcos/dcos/examples/dcos/demos

chat.dcos.io

Keep it running!

SERVICE OPERATIONS

● Configuration Updates (ex: Scaling, re-configuration)
● Binary Upgrades
● Cluster Maintenance (ex: Backup, Restore, Restart)
● Monitor progress of operations
● Debug any runtime blockages
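Operations such as binary upgrades are commonly rolled out one instance at a time, so progress can be monitored and a blockage stops the rollout early instead of taking down every instance. This is a hypothetical, in-memory sketch of that pattern; `rolling_upgrade`, the `upgrade` and `is_healthy` callbacks, and the status strings are assumptions, not the DC/OS service-operations API.

```python
def rolling_upgrade(instances, new_version, upgrade, is_healthy):
    """Upgrade a list of instances one at a time, stopping at the first unhealthy one.

    Returns a progress report of (instance, status) pairs; instances after a
    blocked one are reported as PENDING so an operator can see where it stopped.
    """
    progress = []
    for index, name in enumerate(instances):
        upgrade(name, new_version)
        if not is_healthy(name):
            progress.append((name, "BLOCKED"))
            progress.extend((rest, "PENDING") for rest in instances[index + 1:])
            return progress
        progress.append((name, "COMPLETE"))
    return progress


if __name__ == "__main__":
    versions = {"kafka-0": "1.0", "kafka-1": "1.0", "kafka-2": "1.0"}

    def upgrade(name, version):
        versions[name] = version

    def is_healthy(name):
        # Hypothetical failure: kafka-1 does not come back after the upgrade.
        return name != "kafka-1"

    print(rolling_upgrade(list(versions), "1.1", upgrade, is_healthy))
    print(versions)  # kafka-2 is left on the old version
```

Halting at the blocked step keeps the remaining instances on the known-good version while the failure is debugged.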

APACHE SPARK (STREAMING)

Typical Use: distributed, large-scale data processing; micro-batching

Why Spark Streaming?
● Micro-batching delivers low latency (typically seconds), far faster than classic batch processing
● Its well-defined role means it fits in well with the other pieces of the pipeline
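Micro-batching can be sketched in plain Python: events are grouped by timestamp into fixed-length intervals, and the same small batch job runs on each interval. This is a hypothetical illustration, not Spark's API; the helper names `micro_batches` and `count_per_batch`, the `(timestamp, key)` event shape, and the per-key count job are all assumptions.

```python
from collections import Counter, defaultdict


def micro_batches(events, interval):
    """Group (timestamp_seconds, key) events into consecutive fixed-length batches.

    Batch i covers [i * interval, (i + 1) * interval). Intervals with no events
    yield an empty batch, so the sequence of batches stays contiguous.
    """
    if not events:
        return []
    grouped = defaultdict(list)
    for timestamp, key in events:
        grouped[int(timestamp // interval)].append(key)
    first, last = min(grouped), max(grouped)
    return [grouped.get(i, []) for i in range(first, last + 1)]


def count_per_batch(events, interval):
    """Run the same small job (a count per key) on every micro-batch."""
    return [Counter(batch) for batch in micro_batches(events, interval)]


if __name__ == "__main__":
    events = [(0.2, "truck-1"), (0.7, "truck-2"), (1.1, "truck-1"),
              (1.9, "truck-1"), (3.4, "truck-2")]
    for i, counts in enumerate(count_per_batch(events, interval=1.0)):
        print(i, dict(counts))
```

A shorter interval lowers latency but means more, smaller jobs; the interval is the main knob trading throughput against latency.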