Sparkflows Use Cases
TRANSCRIPT
Use Cases to Build & Deploy in < 30 min
Self-Serve Big Data Analytics & Applications
Agenda
Introduction
Sparkflows Solution
Use Cases
100+ Building Blocks
ETL, ML, OCR, NLP, Connect to various Sources/Sinks
Workflow Editor
Powerful Schema Inference, Schema Propagation, Interactive Execution
Visualization & Dashboards
Prebuilt Workflows
Introduction
Workflow Editor
Sparkflows Solution
Rich Visualizations & Dashboards
100s of Pre-built Nodes
Batch & Streaming Engine
Interactive Execution
Easy Deployment & Configuration
Pre-built Workflows
Telco Churn Prediction
Housing Price Prediction
Bike Sharing Analysis
NY Taxi Data Analysis
MovieLens Recommendations
Sparkflows Product Stack
Streaming Data
Kafka
Flume
Data Sources
HIVE/HBase
HDFS/S3
Solr
RDBMS
Apache Spark Cluster
Databricks AWS
IBM Bluemix
On Prem
Azure
Data Sinks
HIVE/HBase
HDFS/S3
Solr
RDBMS
Visualizations/Dashboards
Machine Learning
Classification, Regression, Clustering, Collaborative Filtering, Save/Load Model, Predict, Cross-Validator
NLP
NER, Sentiment
OCR
Tesseract
Visualization
Line Chart, Bar Chart, Pie Chart, Updating Dashboards
File Formats
CSV/TSV, Parquet, JSON, Avro, PDF, Images, Whole Files
Feature Generation
Tokenization, TF, IDF, OneHotEncoder, StringIndexer, Imputer, Scaler
Data Sources/Sinks
HDFS, S3, Kafka, Flume, Twitter, HBase, Solr, Elastic Search
ETL
Joins, Unions, Filter, SQL, Scala, Python, GeoIP, ConcatColumns, Column Filter, Dedup
Languages
SQL, Scala, Jython, Java
Some of the Building Blocks / Nodes
Use Cases in < 30 minutes
Self-Serve Big Data Analytics: Do Big Data analytics with drag and drop, using 100+ building blocks.
ETL Pipelines: Build ETL pipelines with ease, and incorporate SQL, Scala, and Jython in them.
NLP: Perform NLP on Big Data with OpenNLP and Stanford CoreNLP.
OCR: Perform OCR on millions of images with Tesseract.
Streaming Analytics: Read from Kafka, perform complex transforms, generate graphs, and write out to Solr, HBase, etc.
Use Cases in < 30 minutes
Machine Learning: Perform machine learning on huge datasets with drag and drop.
Entity Resolution: Perform large-scale entity resolution on data from multiple channels.
Log Analytics: Build a log analytics platform with Kafka, Spark, Solr/Elastic Search, and Hue.
Format Conversion: Convert Big Data from one format to another.
Load data into Solr, ES, HBase: Easily load data into Solr, Elastic Search, HBase, etc.
Use Cases in < 30 minutes
Custom Nodes: Create custom nodes and drop them in the Library/Workflow Editor.
Dashboards: Combine various outputs of workflows into a dashboard.
Self-Serve Data Analytics
[Diagram: workflow running on Spark]
Read: CSV, AVRO, JSON, Parquet
Nodes: Read, Row Filter / Rename Col, JOIN, SQL / Scala / Jython, Random Forest
Outputs: Graph, Graph, Model, Dashboard
Save: Solr, HBase, Elastic Search, HIVE
ETL – Build ETL pipelines with ease
[Diagram: workflow running on Spark]
Sources: CSV, HIVE (ReadCSV, ReadHIVE)
Nodes: Filter, Filter, JOIN, SQL
Sinks: Solr, ES, HBase, HIVE (LoadSolr, LoadES, LoadHBase, LoadHIVE)
ETL – Connect various SQL for powerful pipelines
[Diagram: workflow running on Spark]
Sources: CSV, HIVE (ReadCSV, ReadHIVE)
Nodes: SQL, SQL, SQL, SQL
Sinks: Solr, ES, HBase, HIVE (LoadSolr, LoadES, LoadHBase, LoadHIVE)
NLP – Perform distributed NLP on Big Data
[Diagram: workflow running on Spark]
Sources: PDF, CSV (ReadPDF, ReadCSV)
Nodes: NLP, NLP, JOIN
Sinks: Solr, ES, HBase, HIVE (LoadSolr, LoadES, LoadHBase, LoadHIVE)
OCR – Perform distributed OCR on Big Data
[Diagram: workflow running on Spark]
Sources: PDF (ReadPDF, plus extract images)
Nodes: OCR
Sinks: Solr, ES, HBase, HIVE (LoadSolr, LoadES, LoadHBase, LoadHIVE)
Streaming Analytics – With Kafka & Spark Streaming
[Diagram: workflow running on Spark]
Sources: Kafka (ReadKafka)
Nodes: Transform (apply various transforms), Graph
Sinks: Solr, ES, HBase, HIVE (LoadSolr, LoadES, LoadHBase, LoadHIVE)
Machine Learning – With Spark ML
[Diagram: workflow running on Spark]
Sources: HIVE
Nodes: Transform (apply various transforms), Split, Logistic Regression, Score, Evaluate
Entity Resolution – Applying various distance algorithms & scoring
[Diagram: workflow running on Spark]
Inputs: DataSet 1, DataSet 2
Nodes: Join & Transform, Dedup, Filter low Scores
Also shown: HIVE
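The scoring and "filter low scores" steps on the Entity Resolution slide can be sketched in plain Python. This is only an illustration on a single machine, not Sparkflows code: it assumes `difflib.SequenceMatcher` as a stand-in for the unnamed distance algorithms, and the record fields, the `resolve` helper, and the 0.85 threshold are all hypothetical.

```python
from difflib import SequenceMatcher
from itertools import product


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity score for two strings, ignoring case and outer spaces."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def resolve(dataset1, dataset2, fields, threshold=0.85):
    """Score every cross-dataset pair and keep only the high-scoring ones.

    The pair score is the plain average of the per-field similarities
    (an assumed scoring rule; a real pipeline might weight fields).
    """
    matches = []
    for left, right in product(dataset1, dataset2):
        score = sum(similarity(left[f], right[f]) for f in fields) / len(fields)
        if score >= threshold:  # the "filter low scores" step
            matches.append((left["id"], right["id"], round(score, 3)))
    return matches


# Hypothetical records arriving from two different channels.
dataset1 = [
    {"id": "a1", "name": "Jon Smith", "city": "New York"},
    {"id": "a2", "name": "Maria Garcia", "city": "Austin"},
]
dataset2 = [
    {"id": "b1", "name": "John Smith", "city": "new york"},
    {"id": "b2", "name": "Wei Chen", "city": "Seattle"},
]

matches = resolve(dataset1, dataset2, fields=["name", "city"])
for left_id, right_id, score in matches:
    print(left_id, right_id, score)
```

Comparing every pair is quadratic in the number of records, so at the scale the slide describes a join or blocking key would normally narrow the candidate pairs before scoring.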
Log Analytics
[Diagram: workflow running on Spark]
Sources: Apache Logs, Kafka (ReadKafka)
Nodes: Parse Apache Logs, IP2Geo, SQL, Graph
Save: Solr, HBase, Elastic Search, HIVE
Also shown: HUE
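As a rough illustration of the "Parse Apache Logs" step on the Log Analytics slide, the sketch below parses lines in the Apache Common Log Format with a regular expression. It is not the Sparkflows node itself: the pattern, the `parse_line` helper, and the sample lines are assumptions made for this example, and the Kafka and IP2Geo steps are left out.

```python
import re
from collections import Counter
from datetime import datetime

# Common Log Format: host ident user [time] "request" status size
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line):
    """Parse one access-log line into a dict, or return None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    # Apache writes "-" when no response body was sent.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    return record


# Made-up sample input, including one malformed line.
sample_lines = [
    '192.0.2.10 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '192.0.2.11 - alice [10/Oct/2023:13:55:40 +0000] "POST /login HTTP/1.1" 302 -',
    'not a log line',
    '192.0.2.10 - - [10/Oct/2023:13:56:01 +0000] "GET /missing HTTP/1.1" 404 153',
]

records = [r for r in map(parse_line, sample_lines) if r is not None]
status_counts = Counter(r["status"] for r in records)
print(status_counts)
```

Returning `None` for unparseable lines lets the caller drop or count bad input instead of failing the whole batch. The parsed records are plain dicts, so they could be aggregated, graphed, or written to a search index in later steps.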
Small Files Problem
[Diagram: workflow running on Spark]
Read: CSV, HIVE
Nodes: Coalesce
Save: CSV, HIVE
Format Conversion
[Diagram: workflow running on Spark]
Read: CSV, AVRO, JSON, Parquet
Save: CSV, AVRO, JSON, Parquet
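The read-then-save pattern on the Format Conversion slide can be shown at small scale with the Python standard library. This sketch covers only CSV and JSON, since Avro and Parquet need third-party libraries, and it runs on one machine rather than on Spark. The function names and sample data are hypothetical.

```python
import csv
import io
import json


def csv_to_json_lines(csv_text: str) -> str:
    """Convert CSV text with a header row into JSON Lines (one object per row)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return "\n".join(json.dumps(row) for row in reader)


def json_lines_to_csv(json_text: str) -> str:
    """Convert JSON Lines back to CSV, taking the columns from the first record."""
    rows = [json.loads(line) for line in json_text.splitlines() if line.strip()]
    if not rows:
        return ""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Made-up sample data.
csv_text = "id,city,fare\n1,New York,12.5\n2,Boston,8.0\n"

json_lines = csv_to_json_lines(csv_text)
round_trip = json_lines_to_csv(json_lines)
print(json_lines)
```

Every value stays a string here because CSV carries no type information. A real conversion to a typed format such as Avro or Parquet would also need a schema, which is where schema inference comes in.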
Loading Data into Solr, Elastic Search, HBase, HIVE
[Diagram: workflow running on Spark]
Read: CSV, AVRO, JSON, Parquet
Save: Solr, HBase, Elastic Search, HIVE
Custom Nodes – Create & Use Custom Nodes which add custom features
[Diagram: workflow running on Spark]
Inputs: DataSet 1, DataSet 2
Nodes: Custom Node, Join & Transform, Custom Node
Also shown: HIVE
Dashboards – Combine output of various Workflows/Nodes into a Dashboard
THANK YOU