Intro to Big Data AppHub: Kafka to Database & Database to HDFS App Templates
Post on 19-Feb-2017
Big Data AppHub and Application Templates
Ashwin Chandra Putta, Sanjay Pujare, Dr. Munagala V. Ramanath
© 2017 DataTorrent Confidential – Do Not Distribute
www.ApexBigData.com
DataTorrent Vision - Productize Big Data
• Big Data is neither Productized nor Operationalized
• Total Cost of Ownership (TCO) = Time to Develop + Time to Launch + Cost of ongoing Operations
• Provide a Product to ...
  • Build Applications Rapidly with Simple Interfaces, Pre-Built Apps, Code Reuse & Debuggability
  • Support Dev, Test, Prod cycle to Launch Apps quickly
  • Manage and Visualize Applications for Operability
Next Gen Big Data Applications
[Diagram labels: Browser, Web Server, Logs, Kafka, Kafka Input (logs), Decompress/Parse/Filter, Dimensions Aggregate, Kafka]
• Variety of sources: IoT, Kafka, files, social media etc.
• Variety of sinks: Kafka, files, databases etc.
• Supports low latency real time visualizations as well
• Unbounded and continuous data streams
• Batch support, batch processed as stream
• In-memory processing with temporal window boundaries
• Stateful operations: Aggregation, Rules etc. → Analytics
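As a rough illustration of the windowing and stateful-aggregation points, the following Python sketch (not Apex code) assigns each event to a fixed-length time window and keeps a running count per window and key in memory. The function names `window_start` and `aggregate_by_window`, the 60-second window length, and the sample events are all hypothetical and chosen only for this example.

```python
from collections import defaultdict


def window_start(timestamp, window_seconds):
    """Return the start time of the fixed-length window containing timestamp."""
    return timestamp - (timestamp % window_seconds)


def aggregate_by_window(events, window_seconds=60):
    """Count events per (window start, key), holding the state in memory.

    events is an iterable of (timestamp_seconds, key) tuples; it is consumed
    one element at a time, so it may be an unbounded generator in principle.
    """
    counts = defaultdict(int)
    for timestamp, key in events:
        counts[(window_start(timestamp, window_seconds), key)] += 1
    return dict(counts)


# Hypothetical log events: (seconds since start, HTTP method).
events = [(0, "GET"), (30, "GET"), (59, "POST"), (60, "GET"), (125, "POST")]
result = aggregate_by_window(events)
```

The events at 0, 30 and 59 seconds share the first window, while the events at 60 and 125 seconds fall into later windows, so the aggregate is kept separately for each window boundary.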
Big Data Ecosystem: Where DataTorrent fits
[Diagram labels: Data Sources (Sensor Data, Social Media, Web Servers, App Servers, Click Streams); operators Oper1, Oper2, Oper3; Hadoop (YARN + HDFS); Real-time Analytics & Visualizations; Real-time Data Visualization]
DataTorrent Architecture
[Diagram: layered architecture stack. Labels recovered from the figure:]
• Solutions for Business Problems
• Ingestion & Data Prep, ETL Pipelines
• Ease of Use Tools: Real-Time Data Visualization, Management & Monitoring, GUI Application Assembly
• Application Templates
• Apex-Malhar Operator Library
• High-level API, Transformation, ML & Score, SQL, Analytics, FileSync
• Core: Dev Framework, Batch Support, Apache Apex Core
• Big Data Infrastructure: Hadoop 2.x – YARN + HDFS – On Prem & Cloud
• Source and sink labels: Kafka, HDFS, JDBC
App Templates – Recurring patterns
• Building Apps such as Ingestion & Transform Apps for common patterns in customer use cases

Use Case Pattern | Sources | Processors | Sinks
Data Synchronization, Staging Data for Analytics | HDFS, Kafka, JDBC, S3 | → | HDFS, S3
Enriching Data before Staging | HDFS, JDBC, Kafka | Parser → Deduper → Enricher → Formatter | HDFS, Cassandra
Merge & Transform Data Streams | Kafka, JDBC, FileStream | Merge → Transform → Filter → Enricher | HDFS
Machine Scoring | Kafka | H2O or Custom | HDFS
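The "Enriching Data before Staging" pattern (Parser → Deduper → Enricher → Formatter) can be pictured as a chain of small stages, each consuming the previous stage's output. The Python sketch below is only an illustration of that shape and is not taken from the templates themselves: the two-field `id,name` record layout, the `region` lookup table, and all function names are assumptions made for the example.

```python
import json


def parse(lines):
    """Parser: turn 'id,name' text lines into dicts, skipping malformed lines."""
    for line in lines:
        parts = line.strip().split(",")
        if len(parts) == 2:
            yield {"id": parts[0], "name": parts[1]}


def dedupe(records):
    """Deduper: drop any record whose id has already been seen."""
    seen = set()
    for record in records:
        if record["id"] not in seen:
            seen.add(record["id"])
            yield record


def enrich(records, lookup):
    """Enricher: attach a region from a lookup table (hypothetical reference data)."""
    for record in records:
        yield {**record, "region": lookup.get(record["id"], "unknown")}


def format_json(records):
    """Formatter: serialize each record as one JSON line for the sink."""
    for record in records:
        yield json.dumps(record, sort_keys=True)


lines = ["1,alice", "2,bob", "1,alice", "bad line"]
regions = {"1": "EU"}
output = list(format_json(enrich(dedupe(parse(lines)), regions)))
```

Because each stage is a generator, records flow through one at a time, and a custom stage could be slotted in between any two existing ones without changing the others.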
AppHub – App Template Repository
• Central repository for big data application templates
• Tested and published by DataTorrent
• Accessible via dtManage on DataTorrent RTS and direct app download from website
• Provides version updates via dtManage
App Templates advantages
Ease of use
✓ Quickly import and launch app template applications
✓ Easily add business logic by adding custom operators
Time to market and TCO
✓ Reduces time to production drastically
✓ Reduces cost of operations in production
Real-time Visualizations
✓ Real-time visualizations of operational metrics such as throughput, latency etc.
✓ Real-time visualizations of application data such as number of files processed, amount of data transferred etc.
App Template Demo
• Kafka to Database Sync
• Database to HDFS Sync
Check out more apps at: https://www.datatorrent.com/apphub/
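The slides do not describe how the Kafka to Database Sync template works internally, so the following Python sketch only illustrates one general idea behind such a sync: writing keyed messages so that replaying them does not create duplicate rows. An in-memory SQLite database stands in for the target database and a plain list of `(key, value)` tuples stands in for Kafka messages; the table name `events`, its columns, and the function `sync_to_database` are hypothetical.

```python
import sqlite3


def sync_to_database(messages, conn):
    """Write (key, value) messages to a table keyed by message key.

    INSERT OR REPLACE makes the write idempotent: replaying the same
    messages leaves the table unchanged instead of adding duplicates.
    Returns the number of rows in the table after the write.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events (id TEXT PRIMARY KEY, payload TEXT)"
    )
    with conn:  # commit the whole batch as one transaction
        conn.executemany(
            "INSERT OR REPLACE INTO events (id, payload) VALUES (?, ?)", messages
        )
    return conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]


conn = sqlite3.connect(":memory:")
messages = [("k1", "first"), ("k2", "second"), ("k1", "first-updated")]
row_count = sync_to_database(messages, conn)
row_count_after_replay = sync_to_database(messages, conn)
latest_k1 = conn.execute("SELECT payload FROM events WHERE id = 'k1'").fetchone()[0]
```

Keying the table on the message key is a design choice for this sketch: a later message with the same key overwrites the earlier one, and a full replay of the input leaves the row count unchanged.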
Roadmap
• Visualizations – widgets on app data
  • Metrics such as size of data moved, lines per file, number of errors etc.
  • Custom user defined metrics using Apex auto-metrics
• Schema enablement
• Cloud Integrations
  • Amazon EMR, Microsoft Azure
• Upcoming app templates
  • FTP → HDFS
  • SFTP → HDFS
  • Kinesis → S3
  • Kinesis → Redshift
  • Kafka → JSON parse → filter → transform → HDFS
  • Kafka → CSV parse → filter → transform → HDFS
Resources
• Apache Apex - http://apex.apache.org/
• Subscribe to forums
  ᵒ Apex - http://apex.apache.org/community.html
  ᵒ DataTorrent - https://groups.google.com/forum/#!forum/dt-users
• Download - https://datatorrent.com/download/
• Twitter
  ᵒ @ApacheApex; Follow - https://twitter.com/apacheapex
  ᵒ @DataTorrent; Follow – https://twitter.com/datatorrent
• Meetups - http://meetup.com/topics/apache-apex
• Webinars - https://datatorrent.com/webinars/
• Videos - https://youtube.com/user/DataTorrent
• Slides - http://slideshare.net/DataTorrent/presentations
• Startup Accelerator – Free full featured enterprise product
  ᵒ https://datatorrent.com/product/startup-accelerator/
• Big Data Application Templates Hub – https://datatorrent.com/apphub
We are hiring!
jobs@datatorrent.com
• Developers/Architects
• QA Automation Developers
• Information Developers
• Build and Release
• Community Leaders