Deep dive of Kafka to HDFS/Hadoop ingestion app template
© 2016 DataTorrent
Chaitanya Chebolu
Committer, Apache Apex
Engineer, DataTorrent
Dec 5, 2016
Data Ingestion - Kafka to HDFS
Agenda
• Introduction to Apache Apex (Architecture, Application, Native Hadoop Integration)
• What is Data Ingestion
• Brief about Kafka
• Kafka to HDFS App
• App Templates
• Kafka to HDFS Demo
Apache Apex
• Platform and runtime engine that enables development of scalable and fault-tolerant distributed applications
• Hadoop native (Hadoop >= 2.2)
  ○ No separate service to manage stream processing
  ○ Streaming Engine built into Application Master and Containers
• Process streaming or batch big data
• High throughput and low latency
• Library of commonly needed business logic
• Write any custom business logic in your application
Apex Architecture
An Apex Application is a DAG (Directed Acyclic Graph)
A DAG is composed of vertices (Operators) and edges (Streams).
A Stream is a sequence of data tuples which connects operators at end-points called Ports.
An Operator takes one or more input streams, performs computations, and emits one or more output streams.
● Each operator is the user's business logic or a built-in operator from our open source library
● An operator may have multiple instances that run in parallel
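To make the DAG vocabulary concrete, here is a minimal conceptual sketch in plain Python. It is not the Apex API: the `Dag` class, its `add_operator` / `add_stream` / `run` methods, and the operator names `parse`, `filter`, and `format` are all hypothetical, chosen only to show operators as vertices and streams as edges carrying tuples.

```python
from collections import defaultdict, deque


class Dag:
    """Toy DAG: vertices are operators (callables), edges are streams."""

    def __init__(self):
        self.operators = {}               # name -> fn(tuple) -> list of output tuples
        self.streams = defaultdict(list)  # upstream name -> downstream names

    def add_operator(self, name, fn):
        self.operators[name] = fn

    def add_stream(self, source, sink):
        self.streams[source].append(sink)

    def run(self, entry, tuples):
        """Feed tuples into `entry`; return what each leaf operator emitted."""
        results = defaultdict(list)
        queue = deque((entry, t) for t in tuples)
        while queue:
            name, item = queue.popleft()
            for out in self.operators[name](item):
                downstream = self.streams.get(name, [])
                if not downstream:
                    # No outgoing stream: this operator is a leaf, keep its output.
                    results[name].append(out)
                for sink in downstream:
                    queue.append((sink, out))
        return dict(results)


# Hypothetical three-operator pipeline: parse -> filter -> format.
dag = Dag()
dag.add_operator("parse", lambda line: [line.strip().split(",")])
dag.add_operator("filter", lambda rec: [rec] if rec[1] != "" else [])
dag.add_operator("format", lambda rec: ["|".join(rec)])
dag.add_stream("parse", "filter")
dag.add_stream("filter", "format")

output = dag.run("parse", ["a,1\n", "b,\n", "c,3\n"])
print(output)
```

The sketch runs everything in one process and one thread; in Apex the operators are deployed into containers and may be partitioned into parallel instances, which this toy model does not attempt to show.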
Typical application example
Apex - Native Hadoop Integration
• YARN is the resource manager
• HDFS used for storing any persistent state
What is Data Ingestion?
• Data Ingestion
  ○ A process of obtaining, importing, and analyzing data for later use or storage in a database
• Big Data Ingestion
  ○ Discovering the data sources
  ○ Importing the data
  ○ Processing data to produce intermediate data
  ○ Sending data out to durable data stores
Brief about Kafka
● Distributed Messaging System.
● Data Partitioning Capability.
● Fast Reads and Writes.
● Basic Terminology
  ○ Topic
  ○ Producer
  ○ Consumer
  ○ Broker
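The terminology above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: the `Broker` and `Consumer` classes and their methods are hypothetical and are not the Kafka client API, and `zlib.crc32` is used as a stand-in for Kafka's own key hashing. It shows a topic split into partitions, a producer-side append that routes by key, and a consumer that tracks one offset per partition.

```python
import zlib


class Broker:
    """Toy broker: each topic is a list of append-only partition logs."""

    def __init__(self):
        self.topics = {}

    def create_topic(self, name, partitions):
        self.topics[name] = [[] for _ in range(partitions)]

    def append(self, topic, key, value):
        """Producer side: route by key hash so equal keys share a partition."""
        logs = self.topics[topic]
        partition = zlib.crc32(key.encode()) % len(logs)  # stand-in hash
        logs[partition].append((key, value))
        return partition, len(logs[partition]) - 1  # (partition, offset)

    def fetch(self, topic, partition, offset, max_records=10):
        return self.topics[topic][partition][offset:offset + max_records]


class Consumer:
    """Toy consumer: remembers the next offset to read in every partition."""

    def __init__(self, broker, topic):
        self.broker = broker
        self.topic = topic
        self.offsets = [0] * len(broker.topics[topic])

    def poll(self):
        records = []
        for partition, offset in enumerate(self.offsets):
            batch = self.broker.fetch(self.topic, partition, offset)
            self.offsets[partition] += len(batch)
            records.extend(batch)
        return records


broker = Broker()
broker.create_topic("events", partitions=3)
first = broker.append("events", "user-1", "login")
second = broker.append("events", "user-1", "logout")
broker.append("events", "user-2", "login")

consumer = Consumer(broker, "events")
polled = consumer.poll()
polled_again = consumer.poll()
```

Because routing depends only on the key, both `user-1` records land in the same partition and keep their relative order, while the second `poll()` returns nothing since every offset has already advanced past the stored records.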
Kafka to HDFS App
Kafka → HDFS
• Consuming data from Kafka
• Writing the processed data to HDFS
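The two steps above amount to a read, process, write loop. The following is a minimal sketch of that flow, not the app template's code: the `ingest` function and its `transform` hook are hypothetical, a plain list of byte strings stands in for messages consumed from Kafka, and a local temporary file stands in for an HDFS path.

```python
import os
import tempfile


def ingest(messages, out_path, transform=lambda m: m):
    """Apply `transform` to each message and append the result as one line.

    Returning None from `transform` drops the message.
    """
    count = 0
    with open(out_path, "ab") as out:
        for message in messages:
            record = transform(message)
            if record is None:
                continue
            out.write(record + b"\n")
            count += 1
    return count


# Assumed inputs: three messages, one of them empty and therefore dropped.
out_dir = tempfile.mkdtemp()
out_file = os.path.join(out_dir, "ingested.txt")
written = ingest(
    [b"alpha", b"", b"beta"],
    out_file,
    transform=lambda m: m.upper() if m else None,
)

with open(out_file, "rb") as f:
    contents = f.read()
```

The `transform` hook marks the place where custom processing would sit between the input and the output side of such a pipeline.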
App Templates
● Ready to use, customizable applications for big data ingestion use-cases.
● Look at: https://www.datatorrent.com/apphub/
● Source: https://github.com/DataTorrent/app-templates (Apache 2.0)
Kafka to HDFS Demo
Demo
Kafka to HDFS App Template
• Import and Launch: https://www.youtube.com/watch?v=d0RSeazfjN8
• Add Custom Logic: https://www.youtube.com/watch?v=UKIgcYPNepI
Resources
• http://apex.apache.org/
• Learn more: http://apex.apache.org/docs.html
• Subscribe - http://apex.apache.org/community.html
• Download - http://apex.apache.org/downloads.html
• Follow @ApacheApex - https://twitter.com/apacheapex
• Meetups – http://www.meetup.com/pro/apacheapex/
• More examples: https://github.com/DataTorrent/examples
• Slideshare: http://www.slideshare.net/ApacheApex/presentations
• https://www.youtube.com/results?search_query=apache+apex
• Free Enterprise License for Startups - https://www.datatorrent.com/product/startup-accelerator/
Upcoming events...
• Wednesday, December 7, 2016 at 7:30pm IST – ETL using RTS