Upload: alex-lefur

Post on 14-Jul-2015


Kafka & Hadoop

Gwen Shapira / Software Engineer

2©2014 Cloudera, Inc. All rights reserved.

About Me

• 15 years of moving data around

• Formerly consultant

• Now Cloudera Engineer:

– Flume

– Sqoop

– Kafka


There’s a book on that!


We are also blogging


Getting Data from Kafka to Hadoop

There are only bad options.

It's about finding the best one.


Camus


Camus

[Diagram: Camus job flow, steps A through I. Labels: Kafka, ZooKeeper, Topic Offsets, Setup, Task (×3), In-process Avro Files (×2), Audit Counts, Clean Up, HDFS, Other Systems, Processes]


Missing in Action

• Kafka has no MR layer

– InputFormat, OutputFormat, Utils…

• Sqoop is a generic batch ingest framework

– Why no Kafka?


Flume + Kafka = Flafka


How does it work?

Flume Agent: Sources → Interceptors → Selectors → Channels → Sinks

• Sources: Twitter, logs, webserver, Kafka…

• Interceptors: mask, re-format, validate…

• Selectors: DR, critical

• Channels: memory, file

• Sinks: HDFS, HBase, Solr, Kafka
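The stages of a Flume agent listed on this slide can be pictured with a small, self-contained Python sketch. It is not Flume's API: every name here (`mask_interceptor`, `select_channels`, `run_agent`, the `priority` header, the `main` and `dr` channels) is hypothetical, and the routing rule (critical events are copied to a second, disaster-recovery channel) is only an assumed reading of the "DR, critical" label.

```python
from collections import deque


def mask_interceptor(event):
    """Interceptor: mask a sensitive field before the event is buffered."""
    body = dict(event["body"])
    if "user" in body:
        body["user"] = "***"
    return {"headers": event["headers"], "body": body}


def select_channels(event):
    """Selector: critical events go to both channels, the rest only to 'main'."""
    if event["headers"].get("priority") == "critical":
        return ["main", "dr"]
    return ["main"]


def run_agent(source_events):
    # Channels buffer events between source and sink (stand-ins for memory/file channels).
    channels = {"main": deque(), "dr": deque()}
    for event in source_events:                 # source
        event = mask_interceptor(event)         # interceptor
        for name in select_channels(event):     # selector
            channels[name].append(event)        # channel
    # Sinks drain their channel; each sink here is just a list.
    sinks = {name: [] for name in channels}
    for name, queue in channels.items():
        while queue:
            sinks[name].append(queue.popleft())
    return sinks


events = [
    {"headers": {"priority": "critical"}, "body": {"user": "u1", "msg": "disk full"}},
    {"headers": {}, "body": {"user": "u2", "msg": "login"}},
]
delivered = run_agent(events)
```

In a real agent these stages are wired together in the agent's configuration file rather than in code; the sketch only shows the order in which an event passes through them.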


But I just want to get data from Kafka to HBase / HDFS


Kafka Channel

Flume Agent: Channels → Sinks

• Channels: Kafka!

• Sinks: HDFS, HBase, Solr


Spark Streaming

[Diagram: Source → raw input DStream → RDDs, shown at three stages (pre-first batch, first batch, second batch). Each batch's RDD runs Filter → Count → Print in a single pass.]
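The micro-batch model on the Spark Streaming slide (the input is cut into batches, and each batch runs Filter, Count, Print) can be imitated in plain Python. This is a sketch of the idea only, not Spark's API: `micro_batches` and `process_batch` are hypothetical helpers, and batches are cut by record count here, whereas Spark Streaming cuts them by time interval.

```python
def micro_batches(records, batch_size):
    """Cut the input into consecutive batches (a stand-in for one RDD per batch)."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


def process_batch(batch):
    """Filter and count in a single pass over the batch."""
    return sum(1 for line in batch if "ERROR" in line)


lines = ["INFO start", "ERROR disk", "ERROR net", "ERROR cpu", "INFO ok"]
counts = [process_batch(batch) for batch in micro_batches(lines, 2)]

# Print step: one result per batch.
for index, count in enumerate(counts):
    print(f"batch {index}: {count} error line(s)")
```

Each batch is processed independently with the same pipeline, which is the point the slide makes by repeating Filter, Count, Print for every batch.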


Storm

[Diagram: word-count topology. Spout layer (two spouts reading from the source) → fan out → layer 1 (four "split words" bolts) → shuffle → layer 2 (three "count" bolts).]
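The word-count topology on the Storm slide (spouts feeding "split words" bolts, which feed "count" bolts) can be simulated in a few lines of Python. This is a sketch, not Storm's API: `spout`, `split_bolt`, `route` and `run_topology` are hypothetical names. The slide labels the hop into the count layer "Shuffle"; the sketch assumes words are routed by a hash of the word, so that every occurrence of a word reaches the same count bolt and its total stays in one place.

```python
import zlib
from collections import Counter


def spout(sentences):
    """Spout: emit one sentence at a time."""
    yield from sentences


def split_bolt(sentence):
    """Layer 1 bolt: split a sentence into lowercase words."""
    return sentence.lower().split()


def route(word, num_bolts):
    """Pick a count bolt deterministically, so the same word always lands on the same bolt."""
    return zlib.crc32(word.encode("utf-8")) % num_bolts


def run_topology(sentences, num_count_bolts=3):
    # Layer 2: each count bolt keeps its own running totals.
    count_bolts = [Counter() for _ in range(num_count_bolts)]
    for sentence in spout(sentences):
        for word in split_bolt(sentence):
            count_bolts[route(word, num_count_bolts)][word] += 1
    return count_bolts


count_bolts = run_topology(["Kafka to Hadoop", "Kafka to HDFS"])

# Merge the per-bolt totals to see the overall word counts.
merged = Counter()
for bolt in count_bolts:
    merged.update(bolt)
```

With purely random routing, occurrences of one word could be split across several count bolts, and a further merge step would be needed to get a single total per word.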


Retro Thoughts


Schema

• Data often has a schema

• At least it should

• Kafka is unaware of schemas – which is good

• Need a way to figure out the schema for each event

• Without including the schema in every event
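One common way to meet the requirement on the Schema slide (find the schema for an event without shipping the schema in every event) is to store schemas once in a shared registry and prefix each message with a small schema ID. The sketch below illustrates that idea under stated assumptions: `SchemaRegistry`, `encode` and `decode` are hypothetical names, the registry is an in-memory dictionary, and JSON stands in for a real serialization format such as Avro.

```python
import json
import struct


class SchemaRegistry:
    """In-memory stand-in for a shared schema store (hypothetical)."""

    def __init__(self):
        self._ids = {}     # canonical schema text -> id
        self._by_id = {}   # id -> schema

    def register(self, schema):
        key = json.dumps(schema, sort_keys=True)
        if key not in self._ids:
            schema_id = len(self._ids) + 1
            self._ids[key] = schema_id
            self._by_id[schema_id] = schema
        return self._ids[key]

    def lookup(self, schema_id):
        return self._by_id[schema_id]


def encode(schema_id, record):
    """Message = 4-byte big-endian schema id + serialized record."""
    return struct.pack(">I", schema_id) + json.dumps(record).encode("utf-8")


def decode(registry, message):
    """Read the id prefix, fetch the schema from the registry, then parse the payload."""
    (schema_id,) = struct.unpack(">I", message[:4])
    record = json.loads(message[4:].decode("utf-8"))
    return registry.lookup(schema_id), record


registry = SchemaRegistry()
schema = {"name": "PageView", "fields": ["user", "url"]}
schema_id = registry.register(schema)
message = encode(schema_id, {"user": "u1", "url": "/home"})
decoded_schema, decoded_record = decode(registry, message)
```

Each message carries only four extra bytes, and the broker never has to understand the payload, which matches the slide's point that Kafka itself stays unaware of schemas.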


Kafka in Cloudera Manager


Visit us at Booth #305

Book signings, theater sessions, technical demos, giveaways