Business Implications of ETL Using RTS
Agenda
● DataTorrent RTS
● App Hub + App Template: Kafka ETL
● What is ETL?
● Use Cases
● Why DataTorrent RTS?
● Questions?
DataTorrent RTS - Overview
AppHub
Kafka ETL – Java App Package
Click on AppHub in DataTorrent RTS 3.6
App Template: Kafka ETL
● Reading data records (csv format) through Kafka
● Parsing the CSV tuples
● Filtering tuples
● Transformation of tuples
● Writing to HDFS
● Powered by Apex – built-in fault tolerance and scalability
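The steps listed above can be sketched in a few lines of plain Python. This is only an illustration of the parse, filter, transform flow, not the template's actual Java code: the field names, the filter rule, and the transformation are all assumptions, and in-memory lists stand in for the Kafka topic and the HDFS output.

```python
import csv
import io

# Hypothetical schema; the template's real field list is not given in the slides.
FIELDS = ["account", "name", "amount"]

def parse(line):
    """Parse one CSV record into a dict, or return None if it is malformed."""
    rows = list(csv.reader(io.StringIO(line)))
    if len(rows) != 1 or len(rows[0]) != len(FIELDS):
        return None
    record = dict(zip(FIELDS, rows[0]))
    try:
        record["amount"] = float(record["amount"])
    except ValueError:
        return None
    return record

def keep(record):
    """Filter step: an assumed rule that drops non-positive amounts."""
    return record["amount"] > 0

def transform(record):
    """Transform step: an assumed rule that upper-cases the name."""
    return {**record, "name": record["name"].upper()}

def run_etl(lines):
    """Apply parse -> filter -> transform to each incoming line, one at a time."""
    for line in lines:
        record = parse(line)
        if record is not None and keep(record):
            yield transform(record)

# Stand-in for messages read from a Kafka topic.
messages = ["1,alice,10.5", "2,bob,-3", "not,a,number", "3,carol,7"]
# Stand-in for the HDFS writer: collect output lines in memory.
output = ["{account}|{name}|{amount}".format(**r) for r in run_etl(messages)]
```

Malformed and filtered records are silently dropped here; a production pipeline would more likely route them to an error output.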
App Template: Kafka ETL (Cont.)
What is ETL?
Stream vs Batch
[Diagram: Stream Processing Data Pipeline compared with Batch Processing Data Pipeline. Stage labels: Ingest, Archive, Transform, Normalize, Analyze, Action, Visualize/Persist; Extract, Transform, Load, Analyze, Action.]
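One way to picture the difference between the two pipelines is when the result becomes available. The sketch below is an assumption-laden illustration (a running average over made-up readings), not something taken from the slides: the batch function needs the whole data set before it answers, while the stream function emits an updated answer after every record.

```python
def batch_average(values):
    """Batch style: wait for the full data set, then compute once."""
    return sum(values) / len(values)

def stream_average(values):
    """Stream style: update the result as each value arrives."""
    count, total = 0, 0.0
    for value in values:
        count += 1
        total += value
        yield total / count

readings = [4.0, 8.0, 6.0]  # made-up sample data
batch_result = batch_average(readings)
stream_results = list(stream_average(readings))
```

Both approaches end at the same final value; the streaming version simply makes intermediate results available for action along the way.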
Operator Library
RDBMS: Vertica, MySQL, Oracle, JDBC
NoSQL: Cassandra, HBase, Aerospike, Accumulo, Couchbase/CouchDB, Redis, MongoDB, Geode
Messaging: Kafka, Solace, Flume, ActiveMQ, Kinesis, NiFi
File Systems: HDFS/Hive, NFS, S3
Parsers: XML, JSON, CSV, Avro, Parquet
Transformations: Filters, Rules, Expression, Dedup, Enrich
Analytics: Dimensional Aggregations (with state management for historical data + query)
Protocols: HTTP, FTP, WebSocket, MQTT, SMTP
Other: Elasticsearch, Script (JavaScript, Python, R), Solr, Twitter
Use Case 1: Real Time Ad Performance
[Diagram: Ad Servers (AWS – Region 1, Region 2, ... Region n); Kafka Producers; Kafka Brokers; In-Memory Computation; Persistent; Real Time Dashboarding; Ad Placement]
Ad server log events are ingested through Kafka. Dimensions are computed in real time for Real Time Dashboarding.
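Dimension computation here means keeping running totals of each metric for every combination of dimension values, so a dashboard can slice the data any way it likes. The following is a minimal sketch under stated assumptions: the dimension keys, metric names, and sample events are hypothetical, and it runs over an in-memory list rather than a Kafka stream.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("advertiser", "region")   # hypothetical dimension keys
METRICS = ("impressions", "clicks")     # hypothetical metric names

def aggregate(events):
    """Sum each metric for every combination of dimension keys."""
    totals = defaultdict(lambda: dict.fromkeys(METRICS, 0))
    for event in events:
        for size in range(len(DIMENSIONS) + 1):
            for dims in combinations(DIMENSIONS, size):
                # The key records which dimensions and values this total covers.
                key = tuple((d, event[d]) for d in dims)
                for metric in METRICS:
                    totals[key][metric] += event[metric]
    return dict(totals)

events = [
    {"advertiser": "acme", "region": "us-east", "impressions": 100, "clicks": 3},
    {"advertiser": "acme", "region": "eu-west", "impressions": 50, "clicks": 1},
    {"advertiser": "globex", "region": "us-east", "impressions": 70, "clicks": 2},
]
totals = aggregate(events)
```

The empty key `()` holds the grand total, and a key such as `(("region", "us-east"),)` holds the totals for one region across all advertisers.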
Use Case 2: Sentiment Analysis
[Diagram: Scraped Data, Twitter API, Web Logs; Data Parsing; Data Enrichment; Sentiment Analysis; Persist; Application 1 ... Application n]
High performance, multi-customer, secure data ingestion. Real Time ETL for Sentiment Analysis.
Use Case 3: Log Analysis
[Diagram: CEF, Syslog, APIs; Log Data Parsing; Data Transformation; Data Enrichment; JDBC; Alerts; Persist; Application 1 ... Application n]
High performance, multi-customer, secure data ingestion. Real Time Log Analysis.
Use Case 4: Financial Data Fabrication
[Diagram: Financial Data, SMTP Logs, Historical; Data Parse; Data Transform; Encrypt; Compliance; Alert on Error; Archive; Persistent; Application 1 ... Application n]
Secure, fault-tolerant data ingestion, formatting, and archiving. Data access layer for application processing.
Use Case 5: Real Time Fraud Detection
[Diagram: Loan Applications, Credit Card, New Accounts; Kafka Producers; Kafka Brokers; Data Parsing; Data Enrichment; Rules for Fraud Detection; Alert; Application 1 ... Application n]
High performance, multi-customer, secure data ingestion. Real Time complex data processing against fraud rules.
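Processing against fraud rules can be pictured as enriching each incoming record and then evaluating a set of named predicates on it. The rules, field names, and lookup table below are all hypothetical, since the slides do not describe the actual rules; this is a sketch of the pattern only.

```python
# Hypothetical rules; real fraud rules are not described in the slides.
RULES = {
    "large_amount": lambda tx: tx["amount"] > 10_000,
    "country_mismatch": lambda tx: tx["card_country"] != tx["ip_country"],
}

def enrich(tx, customers):
    """Enrichment step: attach the card's home country from a lookup table."""
    return {**tx, "card_country": customers[tx["card"]]}

def check(tx):
    """Return the names of all rules that the transaction triggers."""
    return [name for name, rule in RULES.items() if rule(tx)]

customers = {"c1": "US", "c2": "DE"}  # stand-in for a reference data store
transactions = [
    {"card": "c1", "amount": 120, "ip_country": "US"},
    {"card": "c2", "amount": 15000, "ip_country": "BR"},
]
alerts = {tx["card"]: check(enrich(tx, customers)) for tx in transactions}
```

Keeping rules as data (a name mapped to a predicate) lets new rules be added without touching the processing loop.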
Use Case 6: Real Time Sensor Data
[Diagram: Sensor 1, Sensor 2, ... Sensor N; Kafka Producers; Kafka Brokers; Data Governance; Complex Event Processing; Predictive Maintenance; Persistent; Application 1 ... Application n]
High performance, multi-customer, secure data ingestion. Complex event processing with historical data for predictive maintenance.
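As a rough illustration of event processing against historical data, the sketch below compares each new sensor reading with the mean of the most recent readings and flags large jumps. The window size, threshold, and sample temperatures are assumptions chosen for the example, not values from the slides.

```python
from collections import deque

def detect_anomalies(readings, window=3, threshold=10.0):
    """Yield (index, value) when a reading exceeds the mean of the previous
    `window` readings by more than `threshold`."""
    history = deque(maxlen=window)
    for index, value in enumerate(readings):
        # Only compare once a full window of history is available.
        if len(history) == window:
            baseline = sum(history) / window
            if value - baseline > threshold:
                yield index, value
        history.append(value)

temps = [70.0, 71.0, 69.0, 70.0, 95.0, 72.0]  # made-up sensor readings
alerts = list(detect_anomalies(temps))
```

A real predictive maintenance pipeline would keep far longer history and use a more careful model; the point is only that each event is evaluated against state built from earlier events.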
Why DataTorrent RTS?
● Powered by Apache Apex
● In-memory Processing
● Reusable Malhar components
● Built-in fault tolerance and scalability
● Ease of development
● Reduced Time to Production + Support for DevOps
Resources
• Apache Apex - http://apex.apache.org/
• Subscribe to forums
  ᵒ Apex - http://apex.apache.org/community.html
  ᵒ DataTorrent - https://groups.google.com/forum/#!forum/dt-users
• Download - https://datatorrent.com/download/
• Twitter
  ᵒ @ApacheApex; Follow - https://twitter.com/apacheapex
  ᵒ @DataTorrent; Follow – https://twitter.com/datatorrent
• Meetups - http://meetup.com/topics/apache-apex
• Webinars - https://datatorrent.com/webinars/
• Videos - https://youtube.com/user/DataTorrent
• Slides - http://slideshare.net/DataTorrent/presentations
• Startup Accelerator – Free full featured enterprise product
  ᵒ https://datatorrent.com/product/startup-accelerator/
• Big Data Application Templates Hub – https://datatorrent.com/apphub
We Are Hiring
• [email protected]
• Developers/Architects
• QA Automation Developers
• Information Developers
• Build and Release
• Community Leaders