Intro to Big Data AppHub: Kafka to Database & Database to HDFS App Templates

Big Data AppHub and Application Templates Ashwin Chandra Putta, Sanjay Pujare, Dr. Munagala V. Ramanath

Upload: datatorrent

Post on 19-Feb-2017


Page 1: Intro to Big Data AppHub: Kafka to Database & Database to HDFS App Templates

Big Data AppHub and Application Templates

Ashwin Chandra Putta, Sanjay Pujare, Dr. Munagala V. Ramanath

Page 2: Intro to Big Data AppHub: Kafka to Database & Database to HDFS App Templates

© 2017 DataTorrent Confidential – Do Not Distribute

www.ApexBigData.com

Page 3: DataTorrent Vision - Productize Big Data

• Big Data is neither Productized nor Operationalized
• Total Cost of Ownership (TCO) =
  • Time to Develop + Time to Launch + Cost of ongoing Operations
• Provide a Product to ...
  • Build Applications Rapidly with Simple Interfaces, Pre-Built Apps, Code Reuse & Debuggability
  • Support Dev, Test, Prod cycle to Launch Apps quickly
  • Manage and Visualize Applications for Operability

Page 4: Next Gen Big Data Applications

[Diagram labels: Browser; Web Server; Logs; Kafka; Kafka Input (logs); Decompress, Parse, Filter; Dimensions Aggregate; Kafka]

Variety of sources - IoT, Kafka, files, social media etc.

Variety of sinks – Kafka, files, databases etc.

* Supports low latency real time visualizations as well

Unbounded and continuous data streams

Batch support, batch processed as stream

In-memory processing with temporal window boundaries

Stateful operations: Aggregation, Rules etc --> Analytics
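The points on this slide about temporal window boundaries and stateful aggregation can be illustrated with a small sketch. This is plain Python, not the Apache Apex API: the function name `aggregate_by_window`, the event tuple layout, and the use of event timestamps to assign windows are all assumptions made for illustration.

```python
from collections import defaultdict

def aggregate_by_window(events, window_ms):
    """Sum values per (window, key) over fixed-size, non-overlapping windows.

    `events` is an iterable of (timestamp_ms, key, value) tuples.
    The running totals held in `totals` are the operator's in-memory state.
    """
    totals = defaultdict(int)
    for ts, key, value in events:
        # Align the timestamp to the start of the window it falls in.
        window_start = (ts // window_ms) * window_ms
        totals[(window_start, key)] += value
    return dict(totals)

# Hypothetical log events: (timestamp in ms, HTTP method, count)
events = [(0, "GET", 1), (400, "POST", 1), (999, "GET", 1), (1000, "GET", 1)]
result = aggregate_by_window(events, 1000)
```

With a 1000 ms window, the first three events land in the window starting at 0 and the last one starts a new window at 1000. A real streaming engine would also emit and discard state as each window closes, which this sketch leaves out.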

Page 5: Big Data Ecosystem: Where DataTorrent fits

[Diagram labels: Data Sources (Sensor Data, Social Media, Web Servers, App Servers, Click Streams); Oper1, Oper2, Oper3; Hadoop (YARN + HDFS); Real-time analytics & Visualizations; Real-time Data Visualization]

Page 6: DataTorrent Architecture

[Architecture diagram labels: Solutions for Business Problems; Ingestion & Data Prep; ETL Pipelines; Ease of Use Tools; Real-Time Data Visualization; Management & Monitoring; GUI Application Assembly; Application Templates; Apex-Malhar Operator Library; High-level API; Transformation; ML & Score; SQL; Analytics; FileSync; Core; Dev Framework; Batch Support; Apache Apex Core; Big Data Infrastructure: Hadoop 2.x – YARN + HDFS – On Prem & Cloud; connector labels: Kafka, HDFS, JDBC]

Page 7: App Templates – Recurring patterns

• Building Apps such as Ingestion & Transform Apps for common patterns in customer use cases

Use Case Pattern | Sources | Processors | Sinks
Data Synchronization, Staging Data for Analytics | HDFS, Kafka, JDBC, S3 | → | HDFS, S3
Enriching Data before Staging | HDFS, JDBC, Kafka | Parser → Deduper → Enricher → Formatter | HDFS, Cassandra
Merge & Transform Data Streams | Kafka, JDBC, File | Stream Merge → Transform → Filter → Enricher | HDFS
Machine Scoring | Kafka | H2O or Custom | HDFS
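As a rough illustration of the "Parser → Deduper → Enricher → Formatter" processor chain named on this slide, here is a minimal sketch in plain Python. It does not use Apex or Malhar operators; the function names, the two-field `id,name` record layout, and the `regions` lookup table are all hypothetical.

```python
import json

def parse(lines):
    """Parser: turn 'id,name' lines into dicts, skipping malformed lines."""
    for line in lines:
        parts = line.strip().split(",")
        if len(parts) == 2:
            yield {"id": parts[0], "name": parts[1]}

def dedupe(records):
    """Deduper: drop records whose id has already been seen."""
    seen = set()
    for record in records:
        if record["id"] not in seen:
            seen.add(record["id"])
            yield record

def enrich(records, lookup):
    """Enricher: attach a region from a lookup table, defaulting to 'unknown'."""
    for record in records:
        yield {**record, "region": lookup.get(record["id"], "unknown")}

def format_json(records):
    """Formatter: serialize each record as a JSON string for the sink."""
    for record in records:
        yield json.dumps(record, sort_keys=True)

# Hypothetical input with one duplicate and one malformed line.
lines = ["1,alice", "2,bob", "1,alice", "bad line"]
regions = {"1": "EU"}
output = list(format_json(enrich(dedupe(parse(lines)), regions)))
```

Each stage is a generator, so records flow through one at a time, loosely mirroring how tuples move between operators in a streaming pipeline. The `seen` set grows without bound here; on an unbounded stream a deduper would need to expire old keys.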

Page 8: AppHub – App Template Repository

• Central repository for big data application templates

• Tested and published by DataTorrent

• Accessible via dtManage on DataTorrent RTS and direct app download from website

• Provides version updates via dtManage

Page 9: App Templates advantages

Ease of use

✓ Quickly import and launch app template applications

✓ Easily add business logic by adding custom operators

Time to market and TCO

✓ Reduces time to production drastically

✓ Reduces cost of operations in production

Real-time Visualizations

✓ Real-time visualizations of operational metrics such as throughput, latency etc.

✓ Real-time visualizations of application data such as number of files processed, amount of data transferred etc.

Page 10: App Template Demo

• Kafka to Database Sync

• Database to HDFS Sync

Check out more apps at: https://www.datatorrent.com/apphub/

Page 11: Roadmap

• Visualizations – widgets on app data
  • Metrics such as size of data moved, lines per file, number of errors etc
  • Custom user defined metrics using apex auto-metrics
• Schema enablement
• Cloud Integrations
  • Amazon EMR, Microsoft Azure
• Upcoming app templates
  • FTP → HDFS
  • SFTP → HDFS
  • Kinesis → S3
  • Kinesis → Redshift
  • Kafka → JSON parse → filter → transform → HDFS
  • Kafka → CSV parse → filter → transform → HDFS

Page 12: Resources

• Apache Apex - http://apex.apache.org/
• Subscribe to forums
  ᵒ Apex - http://apex.apache.org/community.html
  ᵒ DataTorrent - https://groups.google.com/forum/#!forum/dt-users
• Download - https://datatorrent.com/download/
• Twitter
  ᵒ @ApacheApex; Follow - https://twitter.com/apacheapex
  ᵒ @DataTorrent; Follow – https://twitter.com/datatorrent
• Meetups - http://meetup.com/topics/apache-apex
• Webinars - https://datatorrent.com/webinars/
• Videos - https://youtube.com/user/DataTorrent
• Slides - http://slideshare.net/DataTorrent/presentations
• Startup Accelerator – Free full featured enterprise product
  ᵒ https://datatorrent.com/product/startup-accelerator/
• Big Data Application Templates Hub – https://datatorrent.com/apphub

Page 13: We are hiring!

[email protected]

• Developers/Architects
• QA Automation Developers
• Information Developers
• Build and Release
• Community Leaders