Kafka & Hadoop - for NYC Kafka Meetup

Posted on 02-Dec-2014

DESCRIPTION

How to integrate Kafka and Hadoop today, and how that integration will improve soon.

TRANSCRIPT

  • 1. Kafka & Hadoop Gwen Shapira / Software Engineer
  • 2. About Me: 15 years of moving data around. Formerly consultant. Now Cloudera Engineer: Flume, Sqoop, Kafka. © 2014 Cloudera, Inc. All rights reserved.
  • 3. There's a book on that!
  • 4. We are also blogging
  • 5. Getting Data from Kafka to Hadoop: There are only bad options. It's about finding the best one.
  • 6. Camus
  • 7. Camus [Diagram labels: Kafka; ZooKeeper; Setup; Topic Offsets; Processes; three Tasks, each writing in-process Avro files; Audit Counts; Clean Up; HDFS; Other Systems]
  • 8. Missing in Action: Kafka has no MR (MapReduce) layer: InputFormat, OutputFormat, Utils. Sqoop is a generic batch ingest framework. Why no Kafka?
  • 9. Flume + Kafka = Flafka
  • 10. How does Flume work? Flume Agent: Sources (Twitter, logs, webserver, Kafka); Interceptors (mask, re-format, validate); Selectors (DR, critical); Channels (memory, file); Sinks (HDFS, HBase, Solr, Kafka)
  • 11. But I just want to get data from Kafka to HBase / HDFS
  • 12. Kafka Channel. Flume Agent: Channels (Kafka!); Sinks (HDFS, HBase, Solr)
  • 13. Spark Streaming: Single Pass [Diagram labels: Source; Raw Input DStream; one RDD added per batch (pre-first batch, first batch, second batch); Filter, Count, Print in a single pass]
  • 14. Storm [Diagram labels: Source; Spout layer (spouts); Fan-out layer 1 (split-words bolts); Shuffle; Layer 2 (count)]
  • 15. Retro Thoughts
  • 16. Schema: Data often has a schema. At least it should. Kafka is unaware of it, which is good. We need a way to figure out the schema for events without including it in every event.
  • 17. Kafka in Cloudera Manager
  • 18. Visit us at Booth #305: book signings, theater sessions, technical demos, giveaways
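
Slide 16 asks for a way to work out an event's schema without shipping the schema inside every event. One possible approach is to store each schema once and have events carry only a short schema ID. The Python sketch below illustrates that idea and is not something shown in the talk. `SchemaRegistry`, `encode`, and `decode` are hypothetical names, the registry is a plain in-memory dict, and the 4-byte ID prefix with a JSON payload is an illustrative wire format, not the format of any real Kafka tool.

```python
import json
import struct


class SchemaRegistry:
    """Hypothetical in-memory registry mapping integer IDs to schemas."""

    def __init__(self):
        self._schemas = {}  # schema ID -> schema
        self._ids = {}      # canonical schema text -> schema ID

    def register(self, schema):
        # Registering the same schema twice returns the same ID.
        key = json.dumps(schema, sort_keys=True)
        if key not in self._ids:
            schema_id = len(self._schemas) + 1
            self._ids[key] = schema_id
            self._schemas[schema_id] = schema
        return self._ids[key]

    def lookup(self, schema_id):
        return self._schemas[schema_id]


def encode(schema_id, schema, record):
    # The message carries a 4-byte big-endian schema ID plus the field
    # values only; field names live in the registry, not in the event.
    values = [record[name] for name in schema["fields"]]
    return struct.pack(">I", schema_id) + json.dumps(values).encode("utf-8")


def decode(registry, message):
    # Read the ID prefix, fetch the schema, then re-attach field names.
    (schema_id,) = struct.unpack(">I", message[:4])
    schema = registry.lookup(schema_id)
    values = json.loads(message[4:].decode("utf-8"))
    return dict(zip(schema["fields"], values))


registry = SchemaRegistry()
click_schema = {"name": "click", "fields": ["user", "url"]}
click_id = registry.register(click_schema)

message = encode(click_id, click_schema, {"user": "gwen", "url": "/kafka"})
event = decode(registry, message)
```

In this sketch the broker-side bytes stay opaque, which matches the slide's point that Kafka itself does not need to know about schemas. Only producers and consumers talk to the registry.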