Deep Dive of Kafka to HDFS/Hadoop Ingestion App Template

© 2016 DataTorrent. Chaitanya Chebolu (Committer, Apache Apex; Engineer, DataTorrent). Dec 5, 2016. Data Ingestion - Kafka to HDFS

Upload: datatorrent

Post on 08-Jan-2017


TRANSCRIPT

Page 1: Deep Dive of Kafka to HDFS/Hadoop Ingestion App Template

© 2016 DataTorrent

Chaitanya Chebolu
Committer, Apache Apex
Engineer, DataTorrent
Dec 5, 2016

Data Ingestion - Kafka to HDFS

Page 2: Deep Dive of Kafka to HDFS/Hadoop Ingestion App Template


Agenda


• Introduction to Apache Apex (Architecture, Application, Native Hadoop Integration)
• What is Data Ingestion
• Brief about Kafka
• Kafka to HDFS App
• App Templates
• Kafka to HDFS Demo

Page 3: Deep Dive of Kafka to HDFS/Hadoop Ingestion App Template


Apache Apex

• Platform and runtime engine that enables development of scalable and fault-tolerant distributed applications
• Hadoop native (Hadoop >= 2.2)
    ○ No separate service to manage stream processing
    ○ Streaming Engine built into Application Master and Containers
• Process streaming or batch big data
• High throughput and low latency
• Library of commonly needed business logic
• Write any custom business logic in your application

Page 4: Deep Dive of Kafka to HDFS/Hadoop Ingestion App Template


Apex Architecture

Page 5: Deep Dive of Kafka to HDFS/Hadoop Ingestion App Template


An Apex Application is a DAG (Directed Acyclic Graph)

A DAG is composed of vertices (Operators) and edges (Streams).
A Stream is a sequence of data tuples which connects operators at end-points called Ports.
An Operator takes one or more input streams, performs computations, and emits one or more output streams.

● Each operator is the user's business logic, or a built-in operator from our open source library
● An operator may have multiple instances that run in parallel
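The operator-and-stream idea can be illustrated with a small, single-process toy model. This is only a sketch and does not use the Apex API: the `Operator` and `DAG` classes, their method names, and the operator names `reader`, `splitter`, and `upper` are all hypothetical, and ports, parallel instances, and fault tolerance are left out.

```python
from collections import defaultdict, deque


class Operator:
    """A vertex in the DAG: receives one tuple, emits zero or more tuples."""

    def __init__(self, name, fn):
        self.name = name
        self.fn = fn  # fn(tuple) -> iterable of output tuples

    def process(self, tup):
        return list(self.fn(tup))


class DAG:
    """Operators joined by streams (edges). Toy model, not the Apex API."""

    def __init__(self):
        self.operators = {}
        self.streams = defaultdict(list)  # upstream name -> downstream names

    def add_operator(self, name, fn):
        self.operators[name] = Operator(name, fn)
        return name

    def add_stream(self, source, sink):
        self.streams[source].append(sink)

    def run(self, source, tuples):
        """Push tuples through the graph; collect what the leaf operators emit."""
        results = defaultdict(list)
        queue = deque((source, t) for t in tuples)
        while queue:
            name, tup = queue.popleft()
            for out in self.operators[name].process(tup):
                sinks = self.streams.get(name, [])
                if not sinks:
                    # No downstream stream: this operator is a leaf.
                    results[name].append(out)
                for sink in sinks:
                    queue.append((sink, out))
        return dict(results)


dag = DAG()
dag.add_operator("reader", lambda line: [line])
dag.add_operator("splitter", lambda line: line.split())
dag.add_operator("upper", lambda word: [word.upper()])
dag.add_stream("reader", "splitter")
dag.add_stream("splitter", "upper")

output = dag.run("reader", ["kafka to hdfs", "apache apex"])
print(output)
```

Each lambda plays the role of an operator's business logic, and each `add_stream` call is one edge of the graph. In a real Apex application the platform, not a local queue, moves tuples between operators.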

Page 6: Deep Dive of Kafka to HDFS/Hadoop Ingestion App Template


Typical application example

Page 7: Deep Dive of Kafka to HDFS/Hadoop Ingestion App Template


Apex - Native Hadoop Integration

• YARN is the resource manager

• HDFS is used for storing any persistent state

Page 8: Deep Dive of Kafka to HDFS/Hadoop Ingestion App Template


What is Data Ingestion?


• Data Ingestion
    ○ A process of obtaining, importing, and analyzing data for later use or storage in a database
• Big Data Ingestion
    ○ Discovering the data sources
    ○ Importing the data
    ○ Processing data to produce intermediate data
    ○ Sending data out to durable data stores

Page 9: Deep Dive of Kafka to HDFS/Hadoop Ingestion App Template


Brief about Kafka


● Distributed Messaging System.

● Data Partitioning Capability.

● Fast Reads and Writes.

● Basic Terminology
    ○ Topic
    ○ Producer
    ○ Consumer
    ○ Broker
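To make the four terms concrete, here is a small in-memory sketch of how they relate. It is not a Kafka client: the `Broker`, `Producer`, and `Consumer` classes and the `clicks` topic are hypothetical, and CRC32 is used only as an illustrative stand-in for a real partitioner.

```python
import zlib


class Broker:
    """Toy stand-in for a Kafka broker: holds topics, each a list of partition logs."""

    def __init__(self):
        self.topics = {}

    def create_topic(self, name, partitions):
        self.topics[name] = [[] for _ in range(partitions)]


class Producer:
    """Appends keyed records to a topic; the key decides the partition."""

    def __init__(self, broker):
        self.broker = broker

    def send(self, topic, key, value):
        logs = self.broker.topics[topic]
        # Illustrative hash only; real Kafka clients use their own partitioner.
        partition = zlib.crc32(key.encode("utf-8")) % len(logs)
        logs[partition].append((key, value))
        return partition


class Consumer:
    """Reads one partition sequentially and remembers its offset."""

    def __init__(self, broker, topic, partition):
        self.log = broker.topics[topic][partition]
        self.offset = 0

    def poll(self):
        records = self.log[self.offset:]
        self.offset = len(self.log)
        return records


broker = Broker()
broker.create_topic("clicks", partitions=3)
producer = Producer(broker)

# Records with the same key land in the same partition, preserving their order.
p1 = producer.send("clicks", "user-1", "page A")
p2 = producer.send("clicks", "user-1", "page B")

consumer = Consumer(broker, "clicks", p1)
records = consumer.poll()
print(p1 == p2, records)
```

Because the consumer tracks its own offset, a second `poll()` returns only records appended since the previous call.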

Page 10: Deep Dive of Kafka to HDFS/Hadoop Ingestion App Template


Kafka to HDFS App


Kafka → HDFS

• Consuming data from Kafka

• Writing the processed data to HDFS
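The consume-then-write flow can be sketched without either system. In the example below a plain list stands in for a Kafka topic and a temporary local directory stands in for an HDFS path. The `ingest` function, the `part-NNNNN.txt` file naming, and the size-based rolling policy are assumptions made for illustration; the slides do not describe how the app template names or rolls its output files.

```python
import os
import tempfile


def ingest(records, out_dir, max_bytes_per_file=64):
    """Write each record as one line, rolling to a new part file
    when the current file would exceed max_bytes_per_file."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    current = None
    written = 0
    for record in records:
        line = (record + "\n").encode("utf-8")
        if current is None or written + len(line) > max_bytes_per_file:
            if current is not None:
                current.close()
            path = os.path.join(out_dir, "part-%05d.txt" % len(paths))
            current = open(path, "wb")
            paths.append(path)
            written = 0
        current.write(line)
        written += len(line)
    if current is not None:
        current.close()
    return paths


messages = ["event-%d" % i for i in range(10)]  # stand-in for a Kafka topic
out_dir = tempfile.mkdtemp()                    # stand-in for an HDFS directory
files = ingest(messages, out_dir, max_bytes_per_file=32)
print(len(files), "files written under", out_dir)
```

Any per-record processing (filtering, transformation) would sit between reading a record and encoding it as a line.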

Page 11: Deep Dive of Kafka to HDFS/Hadoop Ingestion App Template


App Templates


● Ready-to-use, customizable applications for big data ingestion use cases.

● Look at: https://www.datatorrent.com/apphub/
● Source: https://github.com/DataTorrent/app-templates (Apache 2.0)

Page 12: Deep Dive of Kafka to HDFS/Hadoop Ingestion App Template


Kafka to HDFS Demo


Demo

Page 13: Deep Dive of Kafka to HDFS/Hadoop Ingestion App Template


Kafka to HDFS App Template

• Import and Launch: https://www.youtube.com/watch?v=d0RSeazfjN8

• Add Custom Logic: https://www.youtube.com/watch?v=UKIgcYPNepI

Page 14: Deep Dive of Kafka to HDFS/Hadoop Ingestion App Template


Resources


• http://apex.apache.org/
• Learn more: http://apex.apache.org/docs.html
• Subscribe - http://apex.apache.org/community.html
• Download - http://apex.apache.org/downloads.html
• Follow @ApacheApex - https://twitter.com/apacheapex
• Meetups - http://www.meetup.com/pro/apacheapex/
• More examples: https://github.com/DataTorrent/examples
• Slideshare: http://www.slideshare.net/ApacheApex/presentations
• https://www.youtube.com/results?search_query=apache+apex
• Free Enterprise License for Startups - https://www.datatorrent.com/product/startup-accelerator/

Page 15: Deep Dive of Kafka to HDFS/Hadoop Ingestion App Template


Page 16: Deep Dive of Kafka to HDFS/Hadoop Ingestion App Template


Upcoming events...

• Wednesday, December 7, 2016 at 7:30pm IST – ETL using RTS