Business Implications of ETL using RTS

Business Cases – ETL (Extract, Transform, Load)

Mohit Jotwani [email protected]

Upload: datatorrent

Post on 08-Jan-2017

Agenda

● DataTorrent RTS

● App Hub + App Template: Kafka ETL

● What is ETL?

● Use Cases

● Why DataTorrent RTS?

● Questions?

DataTorrent RTS - Overview

AppHub

Kafka ETL – Java App Package

Click on AppHub in DataTorrent RTS 3.6

App Template: Kafka ETL

● Reading data records (csv format) through Kafka

● Parsing the CSV tuples

● Filtering tuples

● Transformation of tuples

● Writing to HDFS

● Powered by Apex – Built in Fault Tolerance, Scalability
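
The steps listed above (read CSV records, parse, filter, transform, write) can be sketched in plain Python. This is only an illustration of the data flow, not the template's actual Apex code: the field names, the filter rule, and the transform rule are assumptions, a list stands in for the Kafka topic, and an in-memory buffer stands in for the HDFS output file.

```python
import csv
import io

# Hypothetical schema; the slides do not show the template's real schema.
FIELDS = ["account_id", "name", "amount"]


def parse(line):
    """Parse one CSV record into a dict, or return None if it is malformed."""
    row = next(csv.reader([line]), None)
    if row is None or len(row) != len(FIELDS):
        return None
    record = dict(zip(FIELDS, row))
    try:
        record["amount"] = float(record["amount"])
    except ValueError:
        return None
    return record


def keep(record):
    """Filter step: an assumed rule that drops records with small amounts."""
    return record["amount"] >= 100.0


def transform(record):
    """Transform step: an assumed rule that upper-cases the name field."""
    return {**record, "name": record["name"].upper()}


def run_pipeline(lines, sink):
    """Push each incoming line through parse, filter, transform, then write."""
    writer = csv.writer(sink, lineterminator="\n")
    written = 0
    for line in lines:
        record = parse(line)
        if record is None or not keep(record):
            continue
        out = transform(record)
        writer.writerow([out[f] for f in FIELDS])
        written += 1
    return written


# In-memory stand-ins: a list for the Kafka topic, a StringIO for the HDFS file.
incoming = ["1,alice,250.0", "2,bob,40.5", "not,a,valid,row", "3,carol,100"]
sink = io.StringIO()
count = run_pipeline(incoming, sink)
```

Here two of the four input lines reach the sink: one is dropped by the filter and one is rejected by the parser because it has the wrong number of fields.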

App Template: Kafka ETL (Cont.)

What is ETL?

Stream vs Batch

Stream Processing Data Pipeline: Ingest, Transform, Analyze, Action, Visualize/Persist

Batch Processing Data Pipeline: Extract, Transform, Load, Analyze, Action

Other diagram labels: Ingest, Archive, Transform, Normalize

Operator Library

RDBMS: Vertica, MySQL, Oracle, JDBC

NoSQL: Cassandra, HBase, Aerospike, Accumulo, Couchbase/CouchDB, Redis, MongoDB, Geode

Messaging: Kafka, Solace, Flume, ActiveMQ, Kinesis, NiFi

File Systems: HDFS/Hive, NFS, S3

Parsers: XML, JSON, CSV, Avro, Parquet

Transformations: Filters, Rules, Expression, Dedup, Enrich

Analytics: Dimensional Aggregations (with state management for historical data + query)

Protocols: HTTP, FTP, WebSocket, MQTT, SMTP

Other: Elasticsearch, Script (JavaScript, Python, R), Solr, Twitter

Use Case 1: Real Time Ad Performance

Diagram: Ad Servers (AWS – Region 1, Region 2, Region n), Kafka Producers, Kafka Brokers, In-Memory Computation, Persistent, Ad Placement, Real Time Dashboarding

Ad server log events consumed from Kafka. Real Time Dimension Computation. Data Ingestion through Kafka, Dimension Computation for Real Time Dashboarding
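
A rough sketch of what dimension computation over ad server events might look like: metrics are summed per time bucket for each chosen combination of dimensions, so a dashboard can query any of them. The event fields, the bucket size, and the dimension combinations are all assumptions made for illustration; the slides do not specify the log schema.

```python
from collections import defaultdict

# Hypothetical ad server events; field names are illustrative only.
events = [
    {"ts": 0, "advertiser": "a1", "region": "us-east", "impressions": 1, "clicks": 0},
    {"ts": 20, "advertiser": "a1", "region": "us-east", "impressions": 1, "clicks": 1},
    {"ts": 45, "advertiser": "a2", "region": "eu-west", "impressions": 1, "clicks": 0},
    {"ts": 70, "advertiser": "a1", "region": "us-east", "impressions": 1, "clicks": 0},
]

BUCKET_SECONDS = 60  # assumed time bucket width
DIMENSION_COMBOS = [("advertiser",), ("advertiser", "region")]  # assumed
METRICS = ("impressions", "clicks")


def aggregate(events):
    """Sum each metric per (time bucket, dimension combination, dimension values)."""
    totals = defaultdict(lambda: dict.fromkeys(METRICS, 0))
    for e in events:
        bucket = e["ts"] // BUCKET_SECONDS * BUCKET_SECONDS
        for combo in DIMENSION_COMBOS:
            key = (bucket, combo, tuple(e[d] for d in combo))
            for m in METRICS:
                totals[key][m] += e[m]
    return dict(totals)


totals = aggregate(events)
```

A dashboard query such as "clicks for advertiser a1 in the first minute" then becomes a single lookup: `totals[(0, ("advertiser",), ("a1",))]["clicks"]`.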

Use Case 2: Sentiment Analysis

Diagram: Scraped Data, Twitter API, Web Logs, Data Parsing, Data Enrichment, Sentiment Analysis, Persist, Application 1, Application n

High performance, multi-customer secure, data ingestion. Real Time ETL for Sentiment Analysis

Use Case 3: Log Analysis

Diagram: CEF, Syslog, APIs, Log Data Parsing, Data Transformation, Data Enrichment, JDBC, Alerts, Persist, Application 1, Application n

High performance, multi-customer secure, data ingestion. Real Time Log Analysis

Use Case 4: Financial Data Fabrication

Diagram: Financial Data, SMTP Logs, Historical, Data Parse, Data Transform, Encrypt, Compliance, Alert on Error, Archive, Persistent, Application 1, Application n

Secure, fault tolerant, data ingestion, formatting & archiving. Data access layer for application processing

Use Case 5: Real Time Fraud Detection

Diagram: Loan Applications, Credit Card, New Accounts, Kafka Producers, Kafka Brokers, Data Parsing, Data Enrichment, Rules for Fraud Detection, Alert, Application 1, Application n

High performance, multi-customer secure, data ingestion. Real Time complex data processing against fraud rules
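
The parse, enrich, and rule-check flow in this use case could be sketched as follows. The rules, thresholds, account lookup table, and transaction fields are all hypothetical; real fraud rules would be far richer, and the slides do not describe them.

```python
# Hypothetical reference data used for the enrichment step.
ACCOUNT_COUNTRY = {"acc-1": "US", "acc-2": "DE"}

# Each rule is a (name, predicate) pair; thresholds are illustrative only.
RULES = [
    ("large_amount", lambda t: t["amount"] > 10_000),
    (
        "foreign_country",
        lambda t: t["home_country"] is not None and t["country"] != t["home_country"],
    ),
]


def enrich(txn):
    """Attach the account's home country (None when the account is unknown)."""
    return {**txn, "home_country": ACCOUNT_COUNTRY.get(txn["account"])}


def evaluate(txn):
    """Return the names of all rules that the enriched transaction triggers."""
    enriched = enrich(txn)
    return [name for name, predicate in RULES if predicate(enriched)]


def alerts(stream):
    """Yield one alert per transaction that triggers at least one rule."""
    for txn in stream:
        hits = evaluate(txn)
        if hits:
            yield {"account": txn["account"], "rules": hits}


transactions = [
    {"account": "acc-1", "amount": 50, "country": "US"},
    {"account": "acc-1", "amount": 12_000, "country": "FR"},
    {"account": "acc-2", "amount": 300, "country": "US"},
    {"account": "acc-9", "amount": 20, "country": "BR"},
]
raised = list(alerts(transactions))
```

Keeping the rules in a plain list means a rule can be added or removed without touching the processing loop, which is one plausible reason to separate "Rules for Fraud Detection" from parsing and enrichment.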

Use Case 6: Real Time Sensor Data

Diagram: Sensor 1, Sensor 2, Sensor N, Kafka Producers, Kafka Brokers, Data Governance, Complex Event Processing, Predictive Maintenance, Persistent, Application 1, Application n

High performance, multi-customer secure, data ingestion. Complex event processing with historical data for predictive maintenance
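
One simple way to combine live readings with historical data is to keep a short rolling window per sensor and flag readings that deviate sharply from the recent mean. The sketch below assumes a window of three readings and a fixed deviation threshold; both values, and the sensor names, are made up for illustration and are not taken from the slides.

```python
from collections import defaultdict, deque

WINDOW = 3        # assumed: number of past readings kept per sensor
THRESHOLD = 10.0  # assumed: deviation from the rolling mean that triggers a flag


def detect(readings):
    """Flag readings that differ from the sensor's rolling mean by more than THRESHOLD."""
    history = defaultdict(lambda: deque(maxlen=WINDOW))
    flagged = []
    for sensor, value in readings:
        past = history[sensor]
        # Only compare once a full window of history is available.
        if len(past) == WINDOW:
            mean = sum(past) / WINDOW
            if abs(value - mean) > THRESHOLD:
                flagged.append((sensor, value))
        past.append(value)
    return flagged


readings = [
    ("s1", 70.0), ("s2", 20.0), ("s1", 71.0), ("s1", 69.0),
    ("s1", 95.0), ("s2", 21.0), ("s1", 72.0),
]
flagged = detect(readings)
```

With this input only the 95.0 reading from `s1` is flagged; `s2` never accumulates a full window, so it is never compared.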

Why DataTorrent RTS?

● Powered by Apache Apex

● In-memory Processing

● Reusable Malhar components

● Built in Fault-tolerance, Scalability

● Ease of development

● Reduced Time to Production + Support for DevOps

Resources

• Apache Apex - http://apex.apache.org/
• Subscribe to forums
  ᵒ Apex - http://apex.apache.org/community.html
  ᵒ DataTorrent - https://groups.google.com/forum/#!forum/dt-users
• Download - https://datatorrent.com/download/
• Twitter
  ᵒ @ApacheApex; Follow - https://twitter.com/apacheapex
  ᵒ @DataTorrent; Follow - https://twitter.com/datatorrent
• Meetups - http://meetup.com/topics/apache-apex
• Webinars - https://datatorrent.com/webinars/
• Videos - https://youtube.com/user/DataTorrent
• Slides - http://slideshare.net/DataTorrent/presentations
• Startup Accelerator – Free full featured enterprise product
  ᵒ https://datatorrent.com/product/startup-accelerator/
• Big Data Application Templates Hub – https://datatorrent.com/apphub

We Are Hiring

• [email protected]
• Developers/Architects
• QA Automation Developers
• Information Developers
• Build and Release
• Community Leaders
