Building a Real-Time Data Pipeline with Spark, Kafka, and Python


Post on 15-Apr-2017


TRANSCRIPT

  • Douglas Butler, Product Manager

  • A massively parallel, lock-free, FAST distributed SQL database

    in-memory and on-disk, ACID

    JSON and geospatial, transactions and analytics

  • 2 Minute Install

  • A Simple Pipeline

  • from pystreamliner.api import Extractor

    class CustomExtractor(Extractor):
        def initialize(self, streaming_context, sql_context, config, interval, logger):
            logger.info("Initialized Extractor")

        def next(self, streaming_context, time, sql_context, config, interval, logger):
            rdd = streaming_context._sc.parallelize([[x] for x in range(10)])
            return sql_context.createDataFrame(rdd, ["number"])
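To make the lifecycle of that extractor concrete without requiring a Spark cluster, here is a minimal sketch in plain Python that stubs out the Spark pieces: a stand-in `Extractor` base class (assumed to mirror the `pystreamliner.api.Extractor` interface on the slide), a logger replaced by a list, and `next()` returning plain row dicts instead of a DataFrame. Streamliner is assumed to call `initialize()` once and then `next()` on every batch interval.

```python
class Extractor:
    """Hypothetical stand-in for pystreamliner.api.Extractor."""

    def initialize(self, streaming_context, sql_context, config, interval, logger):
        pass  # called once, before the first batch

    def next(self, streaming_context, time, sql_context, config, interval, logger):
        raise NotImplementedError  # called once per batch interval


class NumberExtractor(Extractor):
    def initialize(self, streaming_context, sql_context, config, interval, logger):
        logger.append("Initialized Extractor")  # stands in for logger.info(...)

    def next(self, streaming_context, time, sql_context, config, interval, logger):
        # Emulates sql_context.createDataFrame(rdd, ["number"]):
        # one batch with a single "number" column holding 0..9.
        return [{"number": x} for x in range(10)]


log = []
ext = NumberExtractor()
ext.initialize(None, None, {}, 1, log)
batch = ext.next(None, 0, None, {}, 1, log)
```

The shape is the same as the slide's code: all work happens in `next()`, which must hand back one batch of rows each time it is invoked.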

  • > memsql-ops pip install [package]

    distributed cluster-wide

    any Python package

    bring your own

  • Real-time pipeline
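The real-time pipeline on the slide pairs a message bus (Kafka) with a micro-batch consumer (Spark Streaming). A toy sketch of that shape, with a plain `queue.Queue` standing in for the Kafka topic and fixed-size batches standing in for the streaming batch interval (names and sizes here are illustrative, not from the talk):

```python
import queue


def produce(topic, events):
    """Producer side: push events onto the topic (Kafka stand-in)."""
    for event in events:
        topic.put(event)


def consume_batches(topic, batch_size):
    """Consumer side: drain the topic in fixed-size micro-batches."""
    batches, batch = [], []
    while not topic.empty():
        batch.append(topic.get())
        if len(batch) == batch_size:
            batches.append(batch)
            batch = []
    if batch:  # flush the final partial batch
        batches.append(batch)
    return batches


topic = queue.Queue()
produce(topic, range(10))
batches = consume_batches(topic, batch_size=4)
totals = [sum(b) for b in batches]  # one aggregate per micro-batch
```

Each batch is processed as a unit, which is the same micro-batch model Spark Streaming applies to records pulled from a Kafka topic.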

  • Q & A time