Deep dive of HDFS to HDFS Sync - Disaster Recovery App Template

Yogi Devendra [email protected]

Upload: datatorrent

Post on 08-Jan-2017




Agenda

● About Apache Apex

● Ingestion

● App templates

● HDFS to HDFS sync app template

● Live demo

● Scalability, fault tolerance

What is Apache Apex

● Platform and runtime engine that enables development of scalable and fault-tolerant distributed applications

● Hadoop native

● Unified engine to process streaming or batch big data

● High throughput and low latency

● Library of commonly needed business logic

● Write any custom business logic in your application


Apex rationale

● Migrate a lot more use cases to Hadoop

● Productization of big data projects on Hadoop

● Enable users to extract value from big data

● Significant reduction of time to market for big data applications migrating to Hadoop

Reference: https://wiki.apache.org/incubator/ApexProposal

Guiding Principles

● Operability

● Highly scalable and performant

● Fault tolerant

● Hadoop native

● Easy to integrate

● Easy to develop

Use cases

● Advertising

● IoT

● Finance

● Telecoms and Networks

● Ingestion

● ...


Architecture overview

Ingestion

Data ingestion

● Obtaining, importing, and processing data for later use or storage in a database

Big Data Ingestion

● Discovering the data sources

● Importing the data

● Processing data to produce intermediate data

● Sending data out to durable data stores
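The four ingestion stages listed on this slide can be sketched as a chain of small functions. This is only an illustration on the local file system, not the app template's implementation: the function names (`discover`, `import_records`, `process`, `store`, `ingest`) and the choice of "lowercase and drop blank lines" as the processing step are assumptions made for the example.

```python
import os


def discover(source_dir):
    """Discover the data sources: regular files directly under source_dir."""
    for name in sorted(os.listdir(source_dir)):
        path = os.path.join(source_dir, name)
        if os.path.isfile(path):
            yield path


def import_records(paths):
    """Import the data: read each discovered file line by line."""
    for path in paths:
        with open(path, encoding="utf-8") as f:
            for line in f:
                yield line.rstrip("\n")


def process(records):
    """Produce intermediate data (assumed step: trim, lowercase, drop blanks)."""
    for record in records:
        cleaned = record.strip().lower()
        if cleaned:
            yield cleaned


def store(records, out_path):
    """Send the data out to a durable store (here, a single local file)."""
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for record in records:
            out.write(record + "\n")
            count += 1
    return count


def ingest(source_dir, out_path):
    """Run all four stages and return the number of records stored."""
    return store(process(import_records(discover(source_dir))), out_path)
```

Because each stage is a generator, records flow through one at a time instead of being loaded into memory all at once.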

Challenges in Ingestion @ scale

● Failure Recovery

● Copying large number of files

● Copying big files in parallel

● Bandwidth limit
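One common way to copy a big file in parallel is to split it into fixed-size byte ranges and copy each range independently. The sketch below shows that idea on local files with a thread pool. It is not how the app template is implemented: the helper names (`block_ranges`, `copy_block`, `parallel_copy`) and the tiny block size are assumptions chosen to keep the example readable.

```python
import os
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 4  # bytes; deliberately tiny for the demo, real blocks are far larger


def block_ranges(file_size, block_size):
    """Return (offset, length) pairs that cover the file exactly once."""
    return [(offset, min(block_size, file_size - offset))
            for offset in range(0, file_size, block_size)]


def copy_block(src, dst, offset, length):
    """Copy one byte range; each call opens its own handles, so calls are independent."""
    with open(src, "rb") as fin, open(dst, "r+b") as fout:
        fin.seek(offset)
        data = fin.read(length)
        fout.seek(offset)
        fout.write(data)
    return length


def parallel_copy(src, dst, block_size=BLOCK_SIZE, workers=4):
    """Copy src to dst block by block and return the number of bytes copied."""
    size = os.path.getsize(src)
    # Pre-size the destination so every block can be written at its own offset.
    with open(dst, "wb") as fout:
        fout.truncate(size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(copy_block, src, dst, offset, length)
                   for offset, length in block_ranges(size, block_size)]
        return sum(future.result() for future in futures)
```

Since every block is copied by a separate call, a failed block could be retried on its own without restarting the whole file, which is one way the failure-recovery and parallel-copy challenges relate.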

App templates

● Look at: https://www.datatorrent.com/apphub/

● Ready to use, customizable applications for big data ingestion use-cases.

● Source: https://github.com/DataTorrent/app-templates (Apache 2.0)

HDFS to HDFS sync

● Use cases

○ Disaster recovery

○ Archival

● Features

○ Dynamic scaling

○ Fault tolerance

○ Easy to customize

Application DAG

● A DAG is composed of vertices (Operators) and edges (Streams).

● A Stream is a sequence of data tuples which connects operators at end-points called Ports.

● An Operator takes one or more input streams, performs computations and emits one or more output streams.

○ Each operator is either the user's business logic or a built-in operator from our open source library.

○ An operator may have multiple instances that run in parallel.
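The operator and stream vocabulary above can be made concrete with a toy model. The sketch below is plain Python, not the Apache Apex API: the `Dag` class, its methods, and the `splitter` and `writer` operator names are all hypothetical, and it runs in a single process with one instance per operator.

```python
from collections import defaultdict, deque


class Dag:
    """Toy DAG: vertices are operators, edges are streams between them."""

    def __init__(self):
        self.operators = {}               # name -> function(tuple) -> list of output tuples
        self.streams = defaultdict(list)  # upstream operator name -> downstream names

    def add_operator(self, name, fn):
        self.operators[name] = fn

    def add_stream(self, source, sink):
        self.streams[source].append(sink)

    def run(self, entry, tuples):
        """Feed tuples to the entry operator; return what each leaf operator emitted."""
        results = defaultdict(list)
        queue = deque((entry, item) for item in tuples)
        while queue:
            name, item = queue.popleft()
            for out in self.operators[name](item):
                sinks = self.streams.get(name, [])
                if not sinks:
                    results[name].append(out)  # leaf operator: collect its output
                for sink in sinks:
                    queue.append((sink, out))  # forward the tuple along each stream
        return dict(results)


# Hypothetical two-operator pipeline: split a file name into blocks, then "write" each block.
dag = Dag()
dag.add_operator("splitter", lambda path: [(path, i) for i in range(3)])
dag.add_operator("writer", lambda block: ["%s#%d" % block])
dag.add_stream("splitter", "writer")
print(dag.run("splitter", ["a.txt"]))
```

In a real engine the operators would run as separate, possibly replicated, instances across a cluster. The toy keeps only the structure: operators as vertices, streams as edges.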


Questions


Resources

● Apache Apex - http://apex.apache.org/

● Subscribe to forums

○ Apex - http://apex.apache.org/community.html

○ DataTorrent - https://groups.google.com/forum/#!forum/dt-users

● Download - https://datatorrent.com/download/

● Twitter

○ @ApacheApex; Follow - https://twitter.com/apacheapex

○ @DataTorrent; Follow - https://twitter.com/datatorrent

● Meetups - http://meetup.com/topics/apache-apex

● Webinars - https://datatorrent.com/webinars/

● Videos - https://youtube.com/user/DataTorrent

● Slides - http://slideshare.net/DataTorrent/presentations

● Startup Accelerator - Free full featured enterprise product

○ https://datatorrent.com/product/startup-accelerator/

● Big Data Application Templates Hub - https://datatorrent.com/apphub

We Are Hiring

• [email protected]

• Developers/Architects

• QA Automation Developers

• Information Developers

• Build and Release

• Community Leaders
