Business Implication: HDFS Sync App Template
Agenda
● DataTorrent RTS
● App Hub + App Template: HDFS Sync
● Use Cases
● Why DataTorrent RTS?
● Questions?
DataTorrent RTS - Overview
Operator Library
● RDBMS: Vertica, MySQL, Oracle, JDBC
● NoSQL: Cassandra, HBase, Aerospike, Accumulo, Couchbase/CouchDB, Redis, MongoDB, Geode
● Messaging: Kafka, Solace, Flume, ActiveMQ, Kinesis, NiFi
● File Systems: HDFS/Hive, NFS, S3
● Parsers: XML, JSON, CSV, Avro, Parquet
● Transformations: Filters, Rules, Expression, Dedup, Enrich
● Analytics: Dimensional Aggregations (with state management for historical data + query)
● Protocols: HTTP, FTP, WebSocket, MQTT, SMTP
● Other: Elasticsearch, Script (JavaScript, Python, R), Solr, Twitter
AppHub
Click on AppHub in DataTorrent RTS 3.6
HDFS Sync – Java App Package
App Template: HDFS Sync
● Reads new and updated files from the source Hadoop cluster
● Polls the source at regular intervals
● Writes the files to the destination Hadoop cluster
● High performance (throughput is bounded mainly by network bandwidth)
● Powered by Apex: built-in fault tolerance and scalability
[Diagram: HDFS => HDFS Sync => HDFS (sync files)]
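The polling behaviour listed above (pick up new and updated files on each interval, then copy them) can be illustrated with a small sketch. It assumes that a file's size plus modification time is enough to tell whether it changed since it was last copied, and it models the directory listing and the copy step as plain callables (`list_source` and `copy_file`, both hypothetical). It does not reproduce the template's actual Apex/Malhar operators or call any Hadoop API.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class FileStatus:
    """Minimal stand-in for the metadata a directory listing returns (assumption)."""
    size: int
    mtime_ms: int  # last modification time, milliseconds since epoch


def files_to_copy(listing: Dict[str, FileStatus],
                  copied: Dict[str, FileStatus]) -> List[str]:
    """Paths that are new, or whose size/mtime differ from the version last copied."""
    return sorted(path for path, status in listing.items()
                  if copied.get(path) != status)


def poll_once(list_source: Callable[[], Dict[str, FileStatus]],
              copy_file: Callable[[str], None],
              copied: Dict[str, FileStatus]) -> List[str]:
    """One polling round: list the source, copy pending files, record what was copied."""
    listing = list_source()
    pending = files_to_copy(listing, copied)
    for path in pending:
        copy_file(path)               # hypothetical copy to the destination cluster
        copied[path] = listing[path]  # remember which version was copied
    return pending


def sync_loop(list_source: Callable[[], Dict[str, FileStatus]],
              copy_file: Callable[[str], None],
              interval_s: float, rounds: int) -> Dict[str, FileStatus]:
    """Poll at a fixed interval for a bounded number of rounds (a real app would not stop)."""
    copied: Dict[str, FileStatus] = {}
    for i in range(rounds):
        poll_once(list_source, copy_file, copied)
        if i < rounds - 1:
            time.sleep(interval_s)
    return copied


if __name__ == "__main__":
    source = {"/data/a.log": FileStatus(10, 1000), "/data/b.log": FileStatus(20, 1000)}
    state: Dict[str, FileStatus] = {}
    print(poll_once(lambda: dict(source), lambda p: None, state))  # ['/data/a.log', '/data/b.log']
    source["/data/a.log"] = FileStatus(15, 2000)  # updated file
    source["/data/c.log"] = FileStatus(5, 2000)   # new file
    print(poll_once(lambda: dict(source), lambda p: None, state))  # ['/data/a.log', '/data/c.log']
    print(poll_once(lambda: dict(source), lambda p: None, state))  # []
```

Comparing only size and modification time keeps each poll cheap, because only the listing is read; a stricter variant could compare content checksums at the cost of reading more data.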
App Template: HDFS Sync (cont.)
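Utilization of Template
[Diagram: several sync instances feeding one destination cluster]
● Source Hadoop Cluster 1 (multiple directories): files synced by "HDFS => HDFS Sync 1" to the Destination Hadoop Cluster
● Source Hadoop Cluster 2 (multiple directories): files synced by "HDFS => HDFS Sync 2" to the Destination Hadoop Cluster
● Source Hadoop Cluster n (multiple directories): files synced by "HDFS => HDFS Sync n" to the Destination Hadoop Cluster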
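The layout above, with one sync instance per source cluster, each watching several directories and all writing to a single destination cluster, can be sketched as plain configuration. The `SyncJob` shape, the `sync1 ... syncN` naming, and the `<dest_root>/<source host>/<original path>` destination layout are hypothetical choices made for illustration; the slides do not show the template's real configuration properties.

```python
import posixpath
from dataclasses import dataclass
from typing import List, Tuple
from urllib.parse import urlparse


@dataclass(frozen=True)
class SyncJob:
    """Hypothetical config for one HDFS => HDFS Sync instance (one per source cluster)."""
    name: str
    source_cluster: str            # e.g. "hdfs://cluster1:8020"
    source_dirs: Tuple[str, ...]   # several directories on that cluster
    dest_root: str                 # shared root on the destination cluster

    def source_uris(self) -> List[str]:
        """Fully qualified source directories this job polls."""
        return [self.source_cluster.rstrip("/") + d for d in self.source_dirs]

    def destination_for(self, source_file_uri: str) -> str:
        """Map a source file to <dest_root>/<source host>/<original path>,
        so identical paths on different source clusters cannot collide."""
        parsed = urlparse(source_file_uri)
        if not parsed.hostname:
            raise ValueError(f"expected a fully qualified URI, got {source_file_uri!r}")
        return posixpath.join(self.dest_root, parsed.hostname, parsed.path.lstrip("/"))


def plan_jobs(sources: List[Tuple[str, Tuple[str, ...]]], dest_root: str) -> List[SyncJob]:
    """One sync job per source cluster, all writing to the same destination root."""
    return [SyncJob(f"sync{i}", cluster, dirs, dest_root)
            for i, (cluster, dirs) in enumerate(sources, start=1)]


if __name__ == "__main__":
    jobs = plan_jobs([
        ("hdfs://cluster1:8020", ("/data/logs", "/data/events")),
        ("hdfs://cluster2:8020", ("/data/logs",)),
    ], dest_root="hdfs://dest-nn:8020/backup")
    print(jobs[0].source_uris())
    # ['hdfs://cluster1:8020/data/logs', 'hdfs://cluster1:8020/data/events']
    print(jobs[1].destination_for("hdfs://cluster2:8020/data/logs/a.log"))
    # hdfs://dest-nn:8020/backup/cluster2/data/logs/a.log
```

Prefixing destination paths with the source host is one simple way to let several independent sync instances share a destination without overwriting each other's files.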
Use Cases
● Utilize for Data Archival
● Utilize for Data Replication
● Utilize in Data Retention Applications
● Utilize in Disaster Recovery Applications
Why DataTorrent RTS?
● Powered by Apache Apex
● In-memory Processing
● Reusable Malhar components
● Built-in fault tolerance and scalability
● Ease of development
● Reduced Time to Production + Support for DevOps
Resources
• Apache Apex - http://apex.apache.org/
• Subscribe to forums
  ᵒ Apex - http://apex.apache.org/community.html
  ᵒ DataTorrent - https://groups.google.com/forum/#!forum/dt-users
• Download - https://datatorrent.com/download/
• Twitter
  ᵒ @ApacheApex; Follow - https://twitter.com/apacheapex
  ᵒ @DataTorrent; Follow - https://twitter.com/datatorrent
• Meetups - http://meetup.com/topics/apache-apex
• Webinars - https://datatorrent.com/webinars/
• Videos - https://youtube.com/user/DataTorrent
• Slides - http://slideshare.net/DataTorrent/presentations
• Startup Accelerator - Free full-featured enterprise product
  ᵒ https://datatorrent.com/product/startup-accelerator/
• Big Data Application Templates Hub - https://datatorrent.com/apphub
We Are Hiring
• [email protected]
• Developers/Architects
• QA Automation Developers
• Information Developers
• Build and Release
• Community Leaders
Upcoming events...
Apache Apex Meetup
• Wednesday, November 23, 2016 @ 7:30pm IST: Deep Dive of HDFS to HDFS Sync App Template