Kafka & Hadoop - for NYC Kafka Meetup

Kafka & Hadoop Gwen Shapira / Software Engineer

Upload: chen-gwen-shapira

Post on 02-Dec-2014


DESCRIPTION

How to integrate Kafka and Hadoop today, and how the integration will get better soon.

TRANSCRIPT

Page 1: Kafka & Hadoop - for NYC Kafka Meetup

Kafka & Hadoop

Gwen Shapira / Software Engineer

Page 2: Kafka & Hadoop - for NYC Kafka Meetup

©2014 Cloudera, Inc. All rights reserved.

About Me

• 15 years of moving data around
• Formerly consultant
• Now Cloudera Engineer:
– Flume
– Sqoop
– Kafka

Page 3: Kafka & Hadoop - for NYC Kafka Meetup

There’s a book on that!

Page 4: Kafka & Hadoop - for NYC Kafka Meetup

We are also blogging

Page 5: Kafka & Hadoop - for NYC Kafka Meetup

Getting Data from Kafka to Hadoop

There are only bad options.

It's about finding the best one.

Page 6: Kafka & Hadoop - for NYC Kafka Meetup

Camus

Page 7: Kafka & Hadoop - for NYC Kafka Meetup

Camus

[Diagram: Camus job flow, steps A through I. Labels: Kafka, ZooKeeper, Setup, Topic Offsets, Task (x3), In-process Avro Files, Audit Counts, Clean Up, Processes, HDFS, Other Systems.]
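The labels in the Camus diagram (topic offsets, in-process files, audit counts, clean up) suggest the usual offset-tracking batch pattern. The following is a minimal sketch of that general pattern, not Camus's actual implementation: the function `run_ingest` and the in-memory `log`, `committed_offsets`, and `output` structures are hypothetical stand-ins for Kafka, the offset store, and HDFS.

```python
def run_ingest(log, committed_offsets, output):
    """Run one batch: read each partition from its last committed offset.

    log:               {partition: [messages]}  (stand-in for a Kafka topic)
    committed_offsets: {partition: next offset to read}
    output:            {partition: [messages]}  (stand-in for HDFS files)
    Returns the number of messages ingested (an audit count).
    """
    staged = {}
    new_offsets = dict(committed_offsets)
    for partition, messages in log.items():
        start = committed_offsets.get(partition, 0)
        batch = messages[start:]
        if batch:
            staged[partition] = batch  # plays the role of in-process files
            new_offsets[partition] = start + len(batch)

    # Publish the staged data first, then record the new offsets. A crash
    # between the two steps re-reads data on the next run instead of
    # skipping it.
    for partition, batch in staged.items():
        output.setdefault(partition, []).extend(batch)
    committed_offsets.update(new_offsets)
    return sum(len(batch) for batch in staged.values())
```

Running it twice against a growing log only picks up the new messages on the second run, which is the property the stored offsets are there to provide.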

Page 8: Kafka & Hadoop - for NYC Kafka Meetup

Missing in Action

• Kafka has no MR layer
– InputFormat, OutputFormat, Utils…
• Sqoop is a generic batch ingest framework
– Why no Kafka?

Page 9: Kafka & Hadoop - for NYC Kafka Meetup

Flume + Kafka = Flafka

Page 10: Kafka & Hadoop - for NYC Kafka Meetup

How does it work?

Flume Agent

• Sources: Twitter, logs, webserver, Kafka…
• Interceptors: mask, re-format, validate…
• Selectors: DR, critical
• Channels: memory, file
• Sinks: HDFS, HBase, Solr, Kafka
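The stages of a Flume agent listed on this slide (sources, interceptors, selectors, channels, sinks) can be pictured as a small pipeline. The sketch below is plain Python, not the Flume API; the event keys `user` and `priority`, the channel names, and the function names are all assumptions made for illustration.

```python
from collections import deque


def mask_interceptor(event):
    """Interceptor: mask a sensitive field (hypothetical 'user' key)."""
    masked = dict(event)
    if "user" in masked:
        masked["user"] = "***"
    return masked


def select_channel(event):
    """Selector: route by a hypothetical 'priority' header."""
    return "critical" if event.get("priority") == "critical" else "default"


def run_agent(source_events, channels, sinks):
    """Move events from a source through channels into sinks."""
    # Source side: intercept each event, then let the selector pick a channel.
    for event in source_events:
        event = mask_interceptor(event)
        channels[select_channel(event)].append(event)
    # Sink side: each sink drains the channel it is attached to.
    for name, channel in channels.items():
        while channel:
            sinks[name].append(channel.popleft())


channels = {"default": deque(), "critical": deque()}
sinks = {"default": [], "critical": []}
run_agent(
    [{"user": "bob", "msg": "hi"}, {"msg": "down", "priority": "critical"}],
    channels,
    sinks,
)
```

The channel sits between the two halves as a buffer, which is why its durability (memory versus file on the slide) determines what survives an agent restart.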

Page 11: Kafka & Hadoop - for NYC Kafka Meetup

But I just want to get data from Kafka to HBase / HDFS

Page 12: Kafka & Hadoop - for NYC Kafka Meetup

Flume Agent

• Channels: Kafka Channel (Kafka!)
• Sinks: HDFS, HBase, Solr

Page 13: Kafka & Hadoop - for NYC Kafka Meetup

Spark Streaming

[Diagram: a Source feeds a RawInputDStream, which produces RDDs batch by batch. Panels: Pre-first Batch, First Batch, Second Batch. Labels: Single Pass; Filter, Count, Print.]
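The batch-by-batch picture on this slide can be imitated without Spark at all. This is a plain-Python sketch of the micro-batch idea, not the Spark Streaming API: `process_stream` is a hypothetical name, and each inner list plays the role of one RDD produced by the input stream.

```python
def process_stream(batches, keyword):
    """Apply Filter -> Count to every micro-batch, one pass per batch."""
    counts = []
    for batch in batches:  # each batch stands in for one RDD
        matching = [line for line in batch if keyword in line]  # Filter
        counts.append(len(matching))  # Count
    return counts


if __name__ == "__main__":
    batches = [["error: disk", "ok"], ["ok"], ["error: net", "error: cpu"]]
    for index, count in enumerate(process_stream(batches, "error")):
        print(f"batch {index}: {count}")  # Print
```

The same filter, count, and print steps run on every batch as it arrives, so the pipeline is defined once and applied repeatedly.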

Page 14: Kafka & Hadoop - for NYC Kafka Meetup

Storm

[Diagram: Storm topology. Layers: Spout Layer, Fan out Layer 1, Shuffle, Layer 2. Labels: Source, Spout (x2), Split words bolts (x4), Count (x3).]
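The spout, split, and count layers in the Storm diagram can be mimicked in a few lines. This is a plain-Python sketch, not the Storm API. One assumption to flag: the slide labels the hop between the two bolt layers "Shuffle", while this sketch routes each word by a stable hash so that every occurrence of a word reaches the same counter; `run_topology` and `split_bolt` are hypothetical names.

```python
import zlib
from collections import Counter


def split_bolt(sentence):
    """Layer 1 bolt: split a sentence into words."""
    return sentence.split()


def run_topology(sentences, num_counters=3):
    """Feed sentences through split bolts into a layer of count bolts."""
    counters = [Counter() for _ in range(num_counters)]
    for sentence in sentences:  # the spout emits sentences
        for word in split_bolt(sentence):  # fan out to the split layer
            # Stable hash of the word picks the count bolt (an assumption,
            # see the note above about the "Shuffle" label).
            index = zlib.crc32(word.encode("utf-8")) % num_counters
            counters[index][word] += 1
    return counters
```

Summing the per-bolt counters gives the overall word count, and with hash routing no word is split across two counters.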

Page 15: Kafka & Hadoop - for NYC Kafka Meetup

Retro Thoughts

Page 16: Kafka & Hadoop - for NYC Kafka Meetup

Schema

• Data often has schema
• At least it should
• Kafka is unaware, which is good
• Need capability to figure out schema for events
• Without including it in every event
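One common way to meet the last two points is to keep schemas in a shared registry and put only a small schema ID in each event. The sketch below illustrates that idea under stated assumptions: `SchemaRegistry`, `encode`, and `decode` are hypothetical names, the registry lives in memory, and JSON stands in for a real serialization format such as Avro.

```python
import json
import struct


class SchemaRegistry:
    """Hypothetical in-memory registry mapping integer IDs to schemas."""

    def __init__(self):
        self._schemas = []

    def register(self, schema):
        """Return the ID for a schema, adding it if it is new."""
        if schema not in self._schemas:
            self._schemas.append(schema)
        return self._schemas.index(schema)

    def lookup(self, schema_id):
        return self._schemas[schema_id]


def encode(schema_id, record, schema):
    """Serialize a record as a 4-byte schema ID followed by field values."""
    # Only values go into the payload; field names stay in the registry.
    values = [record[name] for name in schema["fields"]]
    return struct.pack(">I", schema_id) + json.dumps(values).encode("utf-8")


def decode(message, registry):
    """Read the schema ID, fetch the schema, and rebuild the record."""
    (schema_id,) = struct.unpack(">I", message[:4])
    schema = registry.lookup(schema_id)
    values = json.loads(message[4:].decode("utf-8"))
    return dict(zip(schema["fields"], values))


registry = SchemaRegistry()
click_schema = {"name": "click", "fields": ["user", "url"]}
click_id = registry.register(click_schema)
message = encode(click_id, {"user": "u1", "url": "/home"}, click_schema)
```

Kafka itself only sees the opaque bytes in `message`, which matches the slide's point that the broker stays unaware of the schema.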

Page 17: Kafka & Hadoop - for NYC Kafka Meetup

Kafka in Cloudera Manager

Page 18: Kafka & Hadoop - for NYC Kafka Meetup

Visit us at

Booth #305

BOOK SIGNINGS THEATER SESSIONS

TECHNICAL DEMOS GIVEAWAYS