peter bakas - zero to insights - real time analytics with kafka, c*, and spark - nosql matters...


Post on 05-Aug-2015


Zero to Insights: Real time analytics with Kafka, C*, and Spark

Peter Bakas

Peter Bakas | @peter_bakas

@ Netflix : Cloud Platform Engineering - Event and Data Pipelines

@ Ooyala : Analytics, Discovery, Platform Engineering & Infrastructure

@ Yahoo : Display Advertising, Behavioral Targeting, Payments

@ PayPal : Site Engineering and Architecture

@ Play : Advisor to various Startups (Data, Security, Containers)

Who is this guy?

Let’s get down to business

Netflix is a logging company

that occasionally streams video

● 450 billion events per day

● 8 million events & 17 GB per second during peak

● Hundreds of event types

By the Numbers
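To put the figures above in perspective, here is a small back-of-the-envelope calculation. It is only a sketch: it assumes the daily total is spread evenly over 24 hours and that "GB" means 10^9 bytes, neither of which the slide states, and the variable names are purely illustrative.

```python
# Rough arithmetic on the slide's figures (assumptions noted inline).
SECONDS_PER_DAY = 24 * 60 * 60

events_per_day = 450e9        # "450 billion events per day"
peak_events_per_sec = 8e6     # "8 million events ... per second during peak"
peak_bytes_per_sec = 17e9     # "17 GB per second", assuming decimal gigabytes

# Average rate if the daily volume were spread evenly (an assumption).
avg_events_per_sec = events_per_day / SECONDS_PER_DAY

# How much higher the peak is than that flat average.
peak_to_avg = peak_events_per_sec / avg_events_per_sec

# Implied average event size at peak.
avg_event_bytes_at_peak = peak_bytes_per_sec / peak_events_per_sec

print(f"average events/sec: {avg_events_per_sec:,.0f}")
print(f"peak / average:     {peak_to_avg:.2f}")
print(f"bytes per event:    {avg_event_bytes_at_peak:,.0f}")
```

Under these assumptions the average works out to roughly 5.2 million events per second, the peak is about 1.5 times that, and each event averages a little over 2 KB.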

Publish, Collect, Process, Aggregate & Move Data

@ Cloud Scale

What does it take to run @ Cloud Scale?

How did we get here?

[Diagram: EventProducer, Chukwa, EMR]

What are we supposed to do?

[Diagram: EventProducer, Suro, Kafka Router, Stream Consumers, Druid, EMR]

Where are we going?

Keystone

[Diagram: EventProducer, Fronting Kafka, Router, Consumer Kafka, Stream Consumers, Druid, EMR; one version of the slide also shows a Routing Service]

Real time processing

[Diagram: Fronting Kafka, Consumer Kafka, Custom Apps]

Ooyala’s experience

About Ooyala

Powering personalized video experiences across all screens.

● 5 billion events per day

● 1 billion videos per month

● 200 million unique users per month

● 130 countries

● 25% of U.S. online viewers watch video powered by Ooyala

By the Numbers

Where did it all start?

Precomputed Aggregates

What if we need more dynamic queries?

Why not just use C*?

What were the options?

100% Precomputation 100% Dynamic

Where we wanted to be

100% Precomputation 100% Dynamic

Partly dynamic
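The slides do not spell out what "partly dynamic" means in practice. One possible reading, sketched below, is to precompute fine-grained aggregate cells at ingest time and do the final filtering and roll-up at query time. Everything in the sketch is an illustrative assumption (the event fields, the `precompute` and `query` helpers, the sample data); it is not Ooyala's actual design.

```python
from collections import Counter

# Hypothetical raw events: one (hour, country, device) tuple per video play.
events = [
    (0, "US", "mobile"),
    (0, "US", "desktop"),
    (0, "DE", "mobile"),
    (1, "US", "mobile"),
    (1, "US", "mobile"),
]

FIELDS = ("hour", "country", "device")


def precompute(events):
    """Ingest-time step: count plays per (hour, country, device) cell."""
    return Counter(events)


def query(cells, group_by, **filters):
    """Query-time step: filter the precomputed cells, then roll them up."""
    result = Counter()
    for key, count in cells.items():
        row = dict(zip(FIELDS, key))
        if all(row[name] == value for name, value in filters.items()):
            result[tuple(row[name] for name in group_by)] += count
    return dict(result)


cells = precompute(events)

# Two different questions answered from the same precomputed cells.
plays_by_country = query(cells, group_by=("country",))
us_mobile_by_hour = query(cells, group_by=("hour",), country="US", device="mobile")
```

The trade-off in this sketch is that queries never touch raw events, yet the grouping and filtering are not fixed in advance, which places it between the two extremes on the slide.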

Our solution

Delphi - Real time Analytics

[Diagram: loggers, Kafka, ingest, job server, API]

Challenges

● Hiring

● Rapidly evolving ecosystem

● Enterprise Service for Enterprise Software

Q&A time!

Obligatory...

Everyone is hiring
