Real-world Applications of Streaming Analytics - StreamAnalytix Webinar
DESCRIPTION
On-demand webcast ‘Real-world Applications of Streaming Analytics’ available at http://bit.ly/1AeGdxM
TRANSCRIPT
© 2014 Impetus Technologies
Recorded version available at http://bit.ly/1AeGdxM
WEBINAR
Real World Applications of Streaming Analytics
Recent Webcast Recap – Archived on the Website
• Real-time Streaming Analytics for Enterprises based on Apache Storm
• Real-time Streaming Analytics: Business Value, Use Cases, and Architectural Considerations
Agenda
• Why rapid growth and demand for real-time analytics
• StreamAnalytix – Product Overview
• Real World Case Studies: Business Problem, Solution Architecture and Outcomes
• Q&A
Brief Intro
• Big Data Solutions & Services company
– Unique in depth, expertise – started implementing in 2008
– Proven with customer success
– IP and Products
• We deliver business impact from Big Data solutions
– Technology expertise
– Data Science
– Business Analytics
• Serving Fortune 1000 companies since 1996
– Large-scale and mission critical software platforms
• HQ: Los Gatos, CA; 1500 people
– Offshore operations in 3 cities in India
Drivers for Real-time Streaming Analytics
• Fleet Operations & Logistics
• Security
• Mobile Devices and Apps
• Energy Industry
• IT Operations
Drivers for Real-time Streaming Analytics
You and I: The ‘CUSTOMER’
Drivers for Real-time Streaming Analytics
• Multi-channel engagement in real-time
• Context-sensitive service
• Happy customers, Loyalty, Revenue, Profits, Growth
Drivers for Real-time Streaming Analytics
Business Operations
Business Analytics
Real-time Streaming Analytics
Real-time Business Analytics – The “Batch Gap”
• The batch workflow is too slow
• Views are out of date
[Figure: timeline up to "Now". Older data has been absorbed into batch views; the most recent data (just a few hours) is not yet absorbed.]
Blended View – Historical and NOW
[Figure: timeline t up to "now". Hadoop works great on the older part of the timeline; Storm works on the part closest to now; together they give a blended view.]
Big Data and Fast Data Combined
[Figure: incoming data feeds both a Batch Layer (all data, batch re-compute, pre-computed information) and a Speed Layer (real-time increment, pre-computed information). The Serving Layer merges the batch views and the real-time views to answer queries.]
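The merge step in the serving layer can be illustrated with a small sketch. It assumes both views are simple per-key counters held in memory; the function name `merge_views` and the sample keys are hypothetical and are not part of StreamAnalytix, Hadoop, or Storm.

```python
from collections import Counter


def merge_views(batch_view, realtime_view):
    """Blend a pre-computed batch view with the real-time increment.

    Assumption: both views map a key to a count, and the real-time view
    only covers data the batch layer has not absorbed yet.
    """
    merged = Counter(batch_view)
    merged.update(realtime_view)  # adds counts key by key
    return dict(merged)


# Hypothetical page-view counts: batch covers history, real time covers "now".
batch_view = {"page_a": 1000, "page_b": 250}
realtime_view = {"page_a": 12, "page_c": 3}
print(merge_views(batch_view, realtime_view))
```

A query then reads from the merged result, so it sees both the historical totals and the events from the last few hours.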
Poll
Where are you in the process of implementing real-time streaming analytics?
Enterprise Class Real time Streaming Analytics Platform
A Product developed and offered by
At a Glance
StreamAnalytix is a software platform that enables enterprises to analyze and
respond to events in real-time at Big Data scale. It is designed to rapidly build
and deploy streaming analytics applications for any industry vertical, any data
format, and any use case.
StreamAnalytix Block Diagram
Case Studies - Real World Applications
Case Study 1 – Intelligence Solutions Company
Problem: Numerous "non-voice" communications
[Figure: Basic Schematic Architecture]
Case Study 1 – Intelligence Solutions Company
Problem statement
• Classify streaming text in real-time based on topic
• Sentiment Analysis on the stream in real-time
• 250 million messages a day
• Variety: weblogs, chats, emails, tweets etc.
• Accuracy: Classification - 99.99%; Sentiment analysis - 80%
• 20 Predefined Categories: "Arts_culture_entertainment", "law_crime_justice", "disaster_accident", "economy_finance", "education", "environment_weather", "health", "lifestyle", "politics", "religion", "science", "society", "sports", "conflict_war", "literature", "computing", "labor", "travel", "governance_government", "human_interest"
Case Study 1 – Intelligence Solutions Company
Problem statement
• English and Arabic content
• Other languages = “other” (no metadata)
• Data very raw
– Had CSS and JavaScript files
– To be categorized as “scripts”
• Ingest, Store, Index, Query
– Metadata and raw binary data
– Petabytes
• Query SLA
– On any 4 hour window of "cold data"
– 4 to 5 seconds
– ETSI compliant encryption
Case Study 1 – Intelligence Solutions Company
Stages: Content Extraction and Preprocessing, Classification, Sentiment Analysis
• Tokenization of words based on delimiters (space)
• Elimination of all “Stop Words”, non-contributory words
• Removal of non-ASCII and non-UTF-8 characters
• Models built offline and scoring online
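The preprocessing steps listed on this slide could look roughly like the sketch below. It is an illustration only: the stop-word list is a tiny placeholder, lowercasing is an added assumption, and "removal of non-UTF-8" is interpreted here as dropping byte sequences that do not decode as UTF-8.

```python
# Placeholder stop-word list; a real deployment would use a fuller,
# language-specific list.
STOP_WORDS = {"the", "a", "an", "is", "of", "and", "to", "in"}


def preprocess(raw):
    """Turn raw bytes into a list of contributory tokens."""
    # Drop bytes that are not valid UTF-8 instead of failing on them.
    text = raw.decode("utf-8", errors="ignore")
    # Tokenize on whitespace delimiters (lowercasing is an assumption).
    tokens = text.lower().split()
    # Eliminate stop words.
    return [tok for tok in tokens if tok not in STOP_WORDS]


print(preprocess(b"The stock market is \xff rising"))
```

The resulting token list is what an offline-built model would score online.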
Case Study 1 – Intelligence Solutions Company
Real-time Classification
• 20 categories; multiple labels if applicable
• Semantic similarity approach based on matrix decomposition
• Language independent (with caveats)
• Low latency achieved by a two-step process
– Pre-processing
– Numerical computation
Case Study 1 – Intelligence Solutions Company
Sentiment Analysis
• Dictionary or Lexicon approach; Unsupervised model
• Prepared offline with matrix decomposition
• Polarities assigned to adjectives (+, -, 0)
– Surrounding words could negate, amplify etc.
– Clusters of words treated separately
– Feature extraction possible for distinct sentiment
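A lexicon-based scorer with negation and amplification can be sketched as follows. The word lists and weights are invented for illustration; the lexicon described on the slide was prepared offline with matrix decomposition, which this sketch does not attempt to reproduce.

```python
# Hypothetical lexicon: polarity per adjective (+1, -1, 0).
POLARITY = {"good": 1, "great": 1, "bad": -1, "terrible": -1, "okay": 0}
NEGATORS = {"not", "never"}
AMPLIFIERS = {"very": 2.0, "extremely": 3.0}


def sentiment_score(tokens):
    """Sum adjective polarities, letting the preceding words modify them."""
    score = 0.0
    for i, tok in enumerate(tokens):
        if tok not in POLARITY:
            continue
        value = float(POLARITY[tok])
        prev = tokens[i - 1] if i > 0 else None
        if prev in AMPLIFIERS:
            value *= AMPLIFIERS[prev]
            # Look one word further back for a negator ("not very good").
            prev = tokens[i - 2] if i > 1 else None
        if prev in NEGATORS:
            value = -value
        score += value
    return score


print(sentiment_score(["the", "service", "was", "not", "very", "good"]))
```

A positive total suggests positive sentiment, a negative total suggests negative sentiment, and zero is neutral.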
Case Study 1 – Intelligence Solutions Company
Learnings - Analytics
• Language independent technique worked well
• 50-60 documents per topic was sufficient
• Arabic content
– Is not 100% tokenizable – no spaces
– Did not hamper accuracy significantly
– Needed language expert to test model (for any foreign language)
Case Study 1 – Intelligence Solutions Company
Learnings - Architecture
• Task lends itself well to parallelization and scale out
• StreamAnalytix is a good fit – linear scale out
• Flexible topology
• Event size/throughput trade-off
• Unique sharding and indexing for query optimization
• Many more types of use-cases possible
Case Study 2 – Hosted Contact Center Solution
[Figure: IVR, Queue, Agent]
Problem Statement
• Reactive – Customer service complaints on “What happened to my call?”
– Diagnostics: easier, faster
• Proactive – Business teams want to understand dominant call paths
– Lower “Queue” time
• Proactive – Abandoned call analysis
– Hang up on IVR/hold
Technical Requirements
Real-Time Dashboard and Alerts
• Ability to show counters on the existing Log Monitoring dashboard, e.g. number of inbound calls per tenant
• SLA-based alarms – ability to generate alarms based on SLA threshold values over a moving time window per tenant
Log Aggregation
• Stream raw log events from multiple remote servers
• Filter incoming log events before storage
• Index/search of log events
Auto Correlate Logs in Real-time
• Correlate log events arriving at different time intervals based on System ID, Channel ID, Call ID
• Visualize the complete call path for a particular ID
IVR Dominant Path
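The correlation and dominant-path requirements can be illustrated with a simplified in-memory sketch. The event fields (`system_id`, `channel_id`, `call_id`, `ts`, `stage`) and both function names are assumptions made for this example; a streaming implementation would keep this state incrementally rather than over a finished list.

```python
from collections import Counter, defaultdict


def correlate(events):
    """Group log events per call and order each group by timestamp.

    Returns a dict mapping (system_id, channel_id, call_id) to the call
    path, i.e. the tuple of stages the call went through.
    """
    calls = defaultdict(list)
    for ev in events:
        key = (ev["system_id"], ev["channel_id"], ev["call_id"])
        calls[key].append(ev)
    return {
        key: tuple(e["stage"] for e in sorted(evs, key=lambda e: e["ts"]))
        for key, evs in calls.items()
    }


def dominant_path(paths):
    """Return the most frequent call path and how many calls took it."""
    return Counter(paths.values()).most_common(1)[0]


def _ev(channel, call, ts, stage):
    # Helper for building hypothetical sample events.
    return {"system_id": "s1", "channel_id": channel, "call_id": call,
            "ts": ts, "stage": stage}


# Events arrive out of order and interleaved across calls.
events = [
    _ev("c1", "1", 3, "AGENT"), _ev("c2", "2", 1, "IVR"),
    _ev("c1", "1", 1, "IVR"), _ev("c2", "2", 2, "QUEUE"),
    _ev("c1", "1", 2, "QUEUE"), _ev("c2", "2", 4, "AGENT"),
    _ev("c3", "3", 5, "IVR"),
]
paths = correlate(events)
print(dominant_path(paths))
```

A call whose path ends at IVR or QUEUE, like call 3 here, is a candidate for abandoned-call analysis.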
Outcome, Next steps
• Successfully solved key problems
– Call log aggregation, indexing and search
– Real-time call path picture
– Dominant path analytics
• Next steps
– Sentiment analysis in real-time (chat)
– Audio to text: Sentiment analytics on transcript
– Rich real-time dashboarding and live counters
Case Study 3 – Digital Content Provider
• Scholarly journals, educational, research content
• Institutional Subscribers – 1000s of users each
• Business wants real-time visibility and analytics of customer behavior patterns
Problem Statement
• 10s of millions of events per day
– Clickstream data – complex XML events
• Real-time ETL
– Complex XML events parsed, filtered in real-time
• Clickstream Analytics:
– Double click detection
– BOT detection
• Recommendation engine
– Upsell/cross-sell
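Double click detection on a clickstream can be sketched as a small stateful filter. The 2-second window, the field names (`user`, `item`, `ts`), and the function name are assumptions for illustration; the slides do not state how the production pipeline defines a double click.

```python
def drop_double_clicks(events, window_seconds=2.0):
    """Drop a click if the same user clicked the same item just before.

    Assumes events are ordered by timestamp (seconds).
    """
    last_seen = {}  # (user, item) -> timestamp of the latest click
    kept = []
    for ev in events:
        key = (ev["user"], ev["item"])
        prev = last_seen.get(key)
        if prev is None or ev["ts"] - prev > window_seconds:
            kept.append(ev)
        last_seen[key] = ev["ts"]
    return kept


clicks = [
    {"user": "u1", "item": "article-1", "ts": 0.0},
    {"user": "u1", "item": "article-1", "ts": 0.5},  # double click
    {"user": "u2", "item": "article-1", "ts": 0.6},
    {"user": "u1", "item": "article-1", "ts": 5.0},
]
print([c["ts"] for c in drop_double_clicks(clicks)])
```

The same keyed-state pattern could be extended toward BOT detection, for example by counting clicks per user over a longer window.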
Case Study 3 – Digital Content Provider
Data Flow and Real-time Pipeline Design
Case Study 4 – Web Application SLA Monitoring
• Healthcare insurance exchange software platform
• Server response time to front end application is a key metric
• Complaints from key customers (potential revenue impact)
• Triggered need for aggressive monitoring and alerting system
Problem Statement
• Alert if response breaches 4 second threshold
• Real-time counters/dashboard for a variety of metrics
• Monthly report
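The threshold alert and a live breach counter can be sketched together. The 4-second threshold comes from the slide; the 60-second moving window, the class name `SlaMonitor`, and its methods are assumptions made for this example.

```python
from collections import deque


class SlaMonitor:
    """Flag slow responses and count breaches over a moving time window."""

    def __init__(self, threshold_seconds=4.0, window_seconds=60.0):
        self.threshold = threshold_seconds
        self.window = window_seconds
        self.breaches = deque()  # timestamps of breaches inside the window

    def observe(self, ts, response_seconds):
        """Record one response; return True if it breaches the SLA."""
        # Evict breaches that have fallen out of the moving window.
        while self.breaches and ts - self.breaches[0] > self.window:
            self.breaches.popleft()
        breached = response_seconds > self.threshold
        if breached:
            self.breaches.append(ts)
        return breached

    def breach_count(self):
        """Number of breaches currently inside the window."""
        return len(self.breaches)


monitor = SlaMonitor()
for ts, resp in [(0, 1.2), (10, 4.5), (20, 6.0)]:
    if monitor.observe(ts, resp):
        print(f"ALERT at t={ts}s: response took {resp}s")
print("breaches in window:", monitor.breach_count())
```

The per-event return value drives alerts, while `breach_count` is the kind of value a real-time counter or dashboard would display.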
Data Flow and Real-time Pipeline Design
[Figure: Remote Node with Syslog and StreamAnalytix Agent; Syslog Server; Kafka Server (Kafka via TCP); StreamAnalytix Real-Time Pipeline with Index, Store, Down Stream System, SLA Events, SLA Alerts, Real-Time Counters, Report generation]
StreamAnalytix Agent Features
• The agent can publish to multiple destinations
• The agent can send encrypted data (optional)
Successful outcomes with all early customers
• Tier 1 Healthcare Insurance Carrier – variety of use-cases
• Major Credit Card Brand and Bank – variety of use-cases
• End-point Security Application – On-prem and SaaS
• Mobile Field Devices – Real-time monitoring, predictive analytics
A few others in process
Q&A
Email us at [email protected]
www.StreamAnalytix.com
Request: On-premise and Cloud based trial and/or Proof of concept
StreamAnalytix Product Highlights
An “App Server” for real-time apps – on-premise and cloud
Focus on your business logic - leave infra to us
Handle all the 3V’s of Big Data on one platform
Significant time to market acceleration
Seamless integration with Hadoop and NoSQL
Key Features
• High Speed Data Ingestion
• Elastic Scaling – Volume, Velocity
• Data Parsing - Variety
• Pluggable Persistence
• Real-time Index and Search
• Dynamic Message Routing
• Rule Based Alert
• Pluggable Workflow Management
• Fault Tolerance and Data Integrity
• Optimized for High Performance