Business Implication - HDFS Sync App Template

Business Cases - HDFS to HDFS (Sync) Mohit Jotwani [email protected]

Upload: datatorrent

Post on 08-Jan-2017

Page 1: Business Implication - HDFS Sync App Template

Business Cases - HDFS to HDFS (Sync)

Mohit Jotwani [email protected]

Page 2: Business Implication - HDFS Sync App Template

Agenda

● DataTorrent RTS

● App Hub + App Template: HDFS Sync

● Use Cases

● Why DataTorrent RTS?

● Questions?


Page 3: Business Implication - HDFS Sync App Template

DataTorrent RTS - Overview

Page 4: Business Implication - HDFS Sync App Template

Operator Library

RDBMS
• Vertica
• MySQL
• Oracle
• JDBC

NoSQL
• Cassandra, HBase
• Aerospike, Accumulo
• Couchbase/CouchDB
• Redis, MongoDB
• Geode

Messaging
• Kafka
• Solace
• Flume, ActiveMQ
• Kinesis, NiFi

File Systems
• HDFS/Hive
• NFS
• S3

Parsers
• XML
• JSON
• CSV
• Avro
• Parquet

Transformations
• Filters
• Rules
• Expression
• Dedup
• Enrich

Analytics
• Dimensional Aggregations (with state management for historical data + query)

Protocols
• HTTP
• FTP
• WebSocket
• MQTT
• SMTP

Other
• Elasticsearch
• Script (JavaScript, Python, R)
• Solr
• Twitter

Page 5: Business Implication - HDFS Sync App Template

AppHub

Click on AppHub in DataTorrent RTS 3.6

HDFS Sync – Java App Package

Page 6: Business Implication - HDFS Sync App Template

App Template: HDFS Sync

● Reads new and updated files from the source Hadoop cluster

● Polls the source at regular intervals

● Writes to the destination Hadoop cluster

● Highly performant (throughput is bounded mainly by network bandwidth)

● Powered by Apex – built-in fault tolerance and scalability

Diagram: HDFS -> Sync Files -> HDFS
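The read, poll, and write steps listed on this slide can be sketched as a small loop. This is only an illustration of the idea, not the template's actual implementation: it uses local directories as stand-ins for the source and destination Hadoop clusters, detects changes by comparing modification time and size (an assumption), and every name in it (`scan`, `sync_once`, `poll`) is hypothetical.

```python
import shutil
import time
from pathlib import Path


def scan(src: Path) -> dict:
    """Map each file path (relative to src) to its (mtime_ns, size) signature."""
    signatures = {}
    for path in src.rglob("*"):
        if path.is_file():
            info = path.stat()
            signatures[path.relative_to(src)] = (info.st_mtime_ns, info.st_size)
    return signatures


def sync_once(src: Path, dst: Path, seen: dict) -> list:
    """Copy files that are new or changed since the previous scan.

    `seen` holds the signatures from earlier rounds and is updated in place.
    Returns the relative paths that were copied in this round.
    """
    copied = []
    for rel, signature in sorted(scan(src).items()):
        if seen.get(rel) != signature:
            target = dst / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src / rel, target)  # local stand-in for an HDFS write
            seen[rel] = signature
            copied.append(rel)
    return copied


def poll(src: Path, dst: Path, interval_seconds: float, rounds: int) -> None:
    """Run sync_once a fixed number of times, sleeping between rounds."""
    seen = {}
    for _ in range(rounds):
        sync_once(src, dst, seen)
        time.sleep(interval_seconds)
```

Calling `poll(Path("/data/in"), Path("/data/out"), 60, 10)` would scan the (hypothetical) source directory once a minute for ten rounds and copy only what changed since the previous round. Fault tolerance and scalability, which the slide attributes to Apex, are outside the scope of this sketch.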

Page 7: Business Implication - HDFS Sync App Template

App Template: HDFS Sync (Cont.)

Page 8: Business Implication - HDFS Sync App Template

Utilization of Template

Source Hadoop Cluster 1 (Multiple Directories) -> [HDFS => HDFS Sync 1] -> Sync Files -> Destination Hadoop Cluster

Source Hadoop Cluster 2 (Multiple Directories) -> [HDFS => HDFS Sync 2] -> Sync Files -> Destination Hadoop Cluster

Source Hadoop Cluster n (Multiple Directories) -> [HDFS => HDFS Sync n] -> Sync Files -> Destination Hadoop Cluster
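One way to picture this layout is as a list of job descriptions, one per source cluster, all pointing at the same destination cluster. The sketch below is an illustration under stated assumptions, not the template's configuration format: the `SyncJob` fields, the `plan_jobs` helper, the cluster URIs, and the choice to give each instance its own destination subdirectory are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class SyncJob:
    """Hypothetical description of one HDFS => HDFS Sync instance."""
    name: str
    source_dirs: Tuple[str, ...]
    destination_dir: str


def plan_jobs(sources: Dict[str, List[str]], destination_root: str) -> List[SyncJob]:
    """Create one job per source cluster, each covering that cluster's directories.

    Assumption: each job writes under its own subdirectory of the destination
    so that files from different source clusters cannot overwrite each other.
    """
    root = destination_root.rstrip("/")
    jobs = []
    for index, (cluster_uri, directories) in enumerate(sources.items(), start=1):
        jobs.append(
            SyncJob(
                name=f"hdfs-sync-{index}",
                source_dirs=tuple(f"{cluster_uri}{d}" for d in directories),
                destination_dir=f"{root}/sync-{index}",
            )
        )
    return jobs


# Hypothetical cluster addresses, for illustration only.
example_jobs = plan_jobs(
    {
        "hdfs://source-1:8020": ["/logs", "/events"],
        "hdfs://source-2:8020": ["/exports"],
    },
    "hdfs://destination:8020/archive/",
)
for job in example_jobs:
    print(job.name, job.source_dirs, "->", job.destination_dir)
```

Keeping one job per source cluster mirrors the slide's picture of instances 1 through n running side by side against a single destination.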

Page 9: Business Implication - HDFS Sync App Template

Use Cases

● Utilize for Data Archival

● Utilize for Data Replication

● Utilize in Data Retention Applications

● Utilize in Disaster Recovery Applications

Page 10: Business Implication - HDFS Sync App Template

Why DataTorrent RTS?

● Powered by Apache Apex

● In-memory Processing

● Reusable Malhar components

● Built-in fault tolerance and scalability

● Ease of development

● Reduced Time to Production + Support for DevOps

Page 11: Business Implication - HDFS Sync App Template

Resources

• Apache Apex - http://apex.apache.org/
• Subscribe to forums
  ᵒ Apex - http://apex.apache.org/community.html
  ᵒ DataTorrent - https://groups.google.com/forum/#!forum/dt-users
• Download - https://datatorrent.com/download/
• Twitter
  ᵒ @ApacheApex; Follow - https://twitter.com/apacheapex
  ᵒ @DataTorrent; Follow – https://twitter.com/datatorrent
• Meetups - http://meetup.com/topics/apache-apex
• Webinars - https://datatorrent.com/webinars/
• Videos - https://youtube.com/user/DataTorrent
• Slides - http://slideshare.net/DataTorrent/presentations
• Startup Accelerator – Free full-featured enterprise product
  ᵒ https://datatorrent.com/product/startup-accelerator/
• Big Data Application Templates Hub – https://datatorrent.com/apphub

Page 12: Business Implication - HDFS Sync App Template

We Are Hiring

• [email protected]
• Developers/Architects
• QA Automation Developers
• Information Developers
• Build and Release
• Community Leaders

Page 13: Business Implication - HDFS Sync App Template

Upcoming events...

Apache Apex Meetup

• Wednesday, November 23, 2016 @ 7:30pm IST – Deep Dive of HDFS to HDFS Sync App Template

Page 14: Business Implication - HDFS Sync App Template
