Hadoop as an Analytic Platform: Why Not?
DESCRIPTION
The Briefing Room with William McKnight and Actian
Live Webcast on October 14, 2014
Watch the archive: https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=135528d85baa96a07850bd35961d459d

Integrating Hadoop with existing data sources, workflows and analytics can be a real challenge. While some components, like Hive and Spark, can give SQL access to Hadoop data, there isn’t much that enables Hadoop to be treated as a genuine BI and analytics platform, capable of running multiple jobs that serve multiple users and multiple applications. But what if you could turn Hadoop into a versatile, high-performance development platform, forgoing all the pain of figuring out how and where to manage big data?

Register for this episode of The Briefing Room to hear veteran Analyst William McKnight as he discusses the fairly swift evolution of Hadoop’s capabilities. He’ll be briefed by Jim Hare of Actian, who will tout his company’s latest addition to its Analytic Platform: Hadoop SQL Edition. He will show how Actian has leveraged Hadoop and its scale-out file system to create a fully functioning platform, providing everything from an analytic database to machine learning. Visit InsideAnalysis.com for more information.
TRANSCRIPT
Grab some coffee and enjoy the pre-show banter before the top of the hour!
The Briefing Room
Hadoop as an Analytic Platform: Why Not?
Twitter Tag: #briefr
Mission
• Reveal the essential characteristics of enterprise software, good and bad
• Provide a forum for detailed analysis of today’s innovative technologies
• Give vendors a chance to explain their product to savvy analysts
• Allow audience members to pose serious questions... and get answers!
Topics
2014 Editorial Calendar at www.insideanalysis.com/webcasts/the-briefing-room
This Month: ANALYTIC PLATFORMS
November: DISCOVERY & VISUALIZATION
December: INNOVATORS
Executive Summary
• Don’t build CARRIAGES for highways
• Focus on NEW opportunities
• SLOWLY wean off old systems
A NEW ERA of Architecture
Analyst: William McKnight
William is President of McKnight Consulting Group. His clients have included 17 of the Global 2000. Many clients have gone public with their success stories. His team's implementations have won multiple Best Practices awards. William is an Entrepreneur of the Year Finalist, a frequent best practices judge and an expert witness. He has hundreds of articles and dozens of white papers in publication. William has also given numerous keynote presentations worldwide at major conferences and has given hundreds of public seminars and webinars. William’s experience ranges from taking his company to placement on the Inc. 500 and the Dallas 100 to selling a multi-million dollar consulting firm. He is a passionate communicator and motivator, and a former IT VP of a Fortune 50 company.
Actian
• Actian is a database and software development company
• The Actian Analytics Platform connects to data and Big Data sources to perform actionable and advanced analytics
• Actian recently released Hadoop SQL Edition, a component that enables SQL access on data stored in Hadoop
Guest: Jim Hare
Jim Hare is Senior Director of Product Marketing for the Actian Analytics Platform, helping organizations transform big data into business value. Prior to Actian, he was Director of Marketing at IBM, responsible for go-to-market strategy and messaging for the big data platform. Prior to joining IBM in 2008, Jim was vice president of product marketing and business development at Celequest, a California-based operational business intelligence vendor, which was acquired by Cognos in 2007. He has over 16 years of deep experience in enterprise software, business intelligence, business process management, business activity monitoring, big data, and automated software testing & monitoring. Jim holds an MS in Systems Management from the University of Southern California, and an undergraduate degree from the University of Colorado at Boulder.
Confidential © 2014 Actian Corporation
Hadoop as an Analytic Platform: Why Not?
Jim Hare, 14 October 2014
Actian is Delivering Transformational Value
$140M Revenues + Profitable
10,000+ Customers
Global Presence: 8 worldwide offices, 7x24 multinational support model
“Actian is now very powerfully positioned in the big data and analytics markets.” Robin Bloor
“Actian has assembled all of the next generation IPs into a single analytics platform, allowing users a level of flexibility in data interaction that competitors have not been able to match.” siliconANGLE
Emergence of Hadoop as the Data Reservoir
Low Cost Storage for New Data & Offload
Benefits of Hadoop
• Scalable - store large data sets across low cost servers
• Cost effective - 1/10th the cost of traditional data storage
• Flexible - quickly and easily land any data in raw format
• Fast Access - 'maps' data wherever it is located on a cluster
• Resilient - data is replicated to other nodes in the cluster
Hadoop Deployment Survey Results
Only 22% of Hadoop projects are in production today!
[Chart: Hadoop deployment stage of survey respondents. Legend: Exploring and Educating, Conducting POC, Developing First Solution, Pilot First Solution, First Solution Deployed, Supporting Multiple Analytics. Value labels on the chart: 8%, 14%, 10%, 41%, 16%, 11%.]
Source: SandHill Group Research, “How do you Hadoop? A Survey of Big Data Practitioners”, May 2014
Because It Isn’t Easy to Analyze Hadoop Data
• Batch performance is slow
• Lengthy time to discover insights
• Expensive skills
• Lack of data access & security
• Data preparation is time-consuming
• Analytics complexity
Organizations are Replicating Hadoop Data to Overcome Analytic Challenges
Relational Data Store
OLTP, ERP, CRM
Unstructured docs, emails
Server logs
Social/Web data
Sensor, machine data
Geolocation
Clickstream
Discovery
Analytics
Predictive
BI
Hadoop
• Duplicate storage & infrastructure costs
• More IT resources to manage
• Network bandwidth usage
• Less accuracy from data sampling
• Slower time to analysis results
Creating Transformational Value from Hadoop Data
Actian Transforms Hadoop from a Data Reservoir into a High Performance Analytics Platform
Data Explosion
Transformational Value
• Customer Delight
• Competitive Advantage
• World-Class Risk Management
• Disruptive New Business Models
Discovery Analytics
Time-Sensitive Analytics
• Highest performing, most industrialized Hadoop analytics platform
• Only end-to-end analytic processing natively in Hadoop
• Most consumable, accessible, manageable Hadoop analytics
Actian Analytics Platform™
Analyze
Act
Connect
Hadoop
7 Ingredients Added to Hadoop to Unlock Value
1. High speed integration to on-board data from any data source and any type
Connect to Any Data Anywhere
ENTERPRISE Data
SOCIAL Data
MACHINE Data
CLOUD Data
LEGACY Data
DEVICE Data
200+ Connectors
Elastic and Secure
Embeddable High Throughput Engine
Drag and Drop Workflow Designs
Capture Data Feeds in Batch or Real-Time
Expandable Plugin Framework
High Volume Parallel Data Processing
• Schedule
• Transform
• Validate
• Aggregate
• Reformat
• Join
• Orchestrate
2. Visual Framework for connecting, blending, & enriching data, data science discovery, building and testing predictive models
7 Ingredients Added to Hadoop to Unlock Value
Connect, Blend & Enrich, Discover, Build & Test Models
Coding
3. 1500 KNIME Operators + R analytics running in parallel on HDFS + Hadoop = The Open Source Trifecta
7 Ingredients Added to Hadoop to Unlock Value
Gartner Magic Quadrant for Advanced Analytics Platforms
Source: Gartner (February 2014)
Complete End-to-End Analytics on Hadoop
Source: 2013 Rexer Analytics Survey
4. High-Performance, YARN-based data processing engine running on HDFS
7 Ingredients Added to Hadoop to Unlock Value
Actian DataPrep
LEADER On-Node Processing
Read Write Prepare Analyze Read Write Analyze
Optimizer
High Performance, Parallelized Processing on HDFS Without Any Programming
Actian Analytics Platform
Hadoop – Leader Node
Optimized, On-HDFS Processing
Query Pipelining, CPU Pipelining
Optimize
• Reuse and share all components, from operators to workflows
• Choose from five sets of operators: Connections, Transformation, Data Quality, Analytics, Data Science
• Automatically detect resources, plan optimal utilization, and parallelize all workloads on Hadoop
• Use dual pipeline parallelism to accelerate performance 30X
• Run fully optimized processing directly on the Hadoop node via YARN
• Take processing to where the data lives; runs natively on Hortonworks
Visual Framework
Manage the entire analytic process in a visual framework with no coding required.
5. High-Performance, vector processing engine as the pattern for SQL on Hadoop
7 Ingredients Added to Hadoop to Create Value
Vector-based SQL Processing Natively on HDFS
[Architecture diagram labels: HADOOP, YARN, HDFS; Namenode; Datanode HDFS (repeated); X100 (repeated); Actian Vector; Visual Data & Analytics Workbench; Blend & Enrich; Data Science & Analytics; Read, Load; SQL; High Performance, Industrialized SQL Database; High Performance, Parallelized Data Flow Engine]
Industrialized
• Standards - ANSI SQL 92 plus advanced analytics
• Optimized - mature, proven planner and optimizer
• Secure - native DBMS security
• Reliable - full ACID-compliance
• Manageable - YARN certified
• Performance - 30X faster than Impala
• Scalable - unlimited expansion as Hadoop cluster grows
• Native - runs natively on top of HDFS via YARN
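To make the general idea of vector-based processing concrete, here is a minimal sketch in Python. It is not Actian's X100 implementation; it only contrasts evaluating a filter and a sum one value at a time with evaluating them over fixed-size batches (vectors) of a column. The column data, the `VECTOR_SIZE` value, and all function names are assumptions made up for illustration.

```python
from typing import Iterator, List

VECTOR_SIZE = 1024  # assumed batch size, chosen only for illustration


def sum_tuple_at_a_time(prices: List[float], threshold: float) -> float:
    """Evaluate the filter and the aggregate one value at a time."""
    total = 0.0
    for price in prices:
        if price > threshold:
            total += price
    return total


def vectors(column: List[float], size: int = VECTOR_SIZE) -> Iterator[List[float]]:
    """Yield consecutive fixed-size slices (vectors) of a column."""
    for start in range(0, len(column), size):
        yield column[start:start + size]


def sum_vectorized(prices: List[float], threshold: float) -> float:
    """Run each operator over a whole vector before moving to the next vector."""
    total = 0.0
    for vec in vectors(prices):
        selected = [p for p in vec if p > threshold]  # filter operator on the vector
        total += sum(selected)                        # aggregate operator on the vector
    return total


# Hypothetical column of whole-number prices; both strategies must agree.
prices = [float(i % 100) for i in range(10_000)]
assert sum_tuple_at_a_time(prices, 50.0) == sum_vectorized(prices, 50.0)
```

In pure Python the two versions run at similar speed. The point of the sketch is the shape of the computation: operators work on batches of column values, which is the property a compiled engine can exploit.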
6. Extreme-Performance, super-low latency, massively parallel analytics engine
7 Ingredients Added to Hadoop to Unlock Value
Actian Analytics Platform: Next Generation Big Data Analytics
Actian Analytics Platform™
• Connections for Any Data
• Massively Parallel Integration
• Visual Framework for Data and Analytic Workflows
• Libraries of Analytics
• High Performance Data Science Natively on Hadoop
• Sophisticated, Low Latency Analytics in Database
• Real-Time Analytic Services
[Diagram labels: Enterprise Data, Machine Data, Social Data, SaaS Data, Hadoop, Data Warehouse, Amazon Redshift, Business Processes, Users, Machines, Applications]
7. Blueprints to accelerate analytics application development and value creation
7 Ingredients Added to Hadoop to Unlock Value
Big Data 2.0 Media Mix Modeling Blueprint
Stages: CONNECT, ANALYZE, ACT
Data sources:
• CRMdb: All Relevant Account Info and Demographics
• EDWdb: All Relevant Sales Histories
• EDWdb: Marketing Vehicle Details
• Hadoop Logs: Detailed ePOS Receipts
• Hadoop Text Files: Campaign Response Notes
Workflow labels:
• BUILD CUSTOMER PROFILE
• AGGREGATE SALES DATA
• SKU LEVEL SALES DATA BY GEO
• CAPTURE MARKETING VEHICLES
• PREPARE FOR TEXT ANALYTICS
• JOIN, DERIVE, AGGREGATE, PREPARE
• CUSTOMER MATCH WITH CAMPAIGNS
• VEHICLE RESULTS AT GEO, STORE, SKU AND CUSTOMER LEVEL
• SENTIMENT AND CONTENT ANALYSIS
• MARKETING IMPACT ANALYSIS
• IMPACT FORECAST ANALYSIS
• MARKETING MIX SALES CONTRIBUTION YEARLY CHANGE REPORT
• SALES VOLUME, EFFECTIVENESS, EFFICIENCY AND ROI REPORT
• NEW MEDIA MIX OPTIMIZATION
• MAXIMIZE REVENUE FROM MARKETING
• MINIMIZE MARKETING SPEND TO REVENUE RATIO
Learn more about the Actian Analytics Platform – Hadoop SQL Edition
1) Stay tuned for several exciting announcements on 16 October 2014 at the Strata Conference in NYC!
2) Visit Actian at Booth 225 for a demo of Actian Analytics Platform - the Highest Performing Analytics & SQL in Hadoop
3) Download and try it out yourself: bigdata.actian.com/sql-in-Hadoop
www.actian.com facebook.com/actiancorp @actiancorp
Thank You
Perceptions & Questions
Analyst: William McKnight
ANALYTICS: A BUSINESS IMPERATIVE
Formed from SUMMARIES of data
Tied to Business Actions
Continual Re-evaluation
e.g., Customer Segmentation and Profit
Adding Big Data!
ANALYTICS EXAMPLES
Number of customers in each customer state (optionally by product or multiple products)
Average balance of customers by geo
Average start date in each customer lifetime value decile by geo and device
New number of customers in each state
Propensity to churn by age band and device
Cost of acquisition by age and gender
Average session duration by cost of acquisition
Session duration differences between first and tenth session
Network with highest uptime last month
Number of calls per session
Best performing ad network by day part in a geo, age band and device
And on and on and on and on….
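Two of the examples above, customers per customer state and average balance by geo, can be sketched in a few lines of Python. This is only an illustration of how such summary analytics are formed from detail records; the records, field names, and function names are hypothetical and do not come from the presentation.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical customer records; field names and values are invented for illustration.
customers = [
    {"id": 1, "state": "active", "geo": "TX", "balance": 120.0},
    {"id": 2, "state": "active", "geo": "TX", "balance": 80.0},
    {"id": 3, "state": "churned", "geo": "CA", "balance": 0.0},
    {"id": 4, "state": "active", "geo": "CA", "balance": 200.0},
]


def customers_per_state(rows):
    """Count how many customers are in each customer state."""
    return Counter(row["state"] for row in rows)


def average_balance_by_geo(rows):
    """Group balances by geo, then average each group."""
    balances = defaultdict(list)
    for row in rows:
        balances[row["geo"]].append(row["balance"])
    return {geo: mean(values) for geo, values in balances.items()}


print(customers_per_state(customers))
print(average_balance_by_geo(customers))
```

Each analytic is a summary (a count or an average per group) rather than a detail row, which matches the earlier point that analytics are formed from summaries of data.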
ANALYTICS ACTION
SMARTER MARKETING
Spend + Media Arbitrage
Opportunities + Incremental Direct Marketing Spend
Improvement:
Map Media Buys to the Best Customer Demographic
Do sponsorships align with customer base?
Monitored transactions, renewals, customer care calls
Leveraged data to pitch right product, right time
Decrease in marketing cost
Increase in revenue, profit, customer satisfaction
VEHICLES FOR BIG DATA
Data Warehouse
Regional and Departmental Views
ADS
Applications & Engines
Operational Analytics & Hot Views
Data Marts: Independent, Dependent
Relational Data
Conformed Dimensions
Last Year
This Year
Next Year
THE EVER-EXPANDING DATA WAREHOUSE
• Enterprise Data Warehouse users face huge annual upgrade expenses
• To avoid this spend, organizations are looking for lower cost alternatives.
• Movement of data to tape not desired, because data is offline and not available for analytics
• Moving infrequently used data to Hadoop is a cost-effective, online option that preserves ability to query
Cost
DATA WAREHOUSE EXPANSION
1. As data volume increases exponentially, the cost of warehousing also rises
2. Offload data to a less expensive Hadoop cluster to save on data management costs
3. Combine Hadoop data with DW data for a more comprehensive view of history
4. Add operational data for greater insight and agility in analytics and BI
[Diagram labels: HDFS (three nodes)]
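The offload-and-combine idea can be sketched as follows. This is a simplified illustration under stated assumptions, not a description of any product: rows are tiered by a hypothetical `last_accessed` date, the one-year cutoff is arbitrary, and plain Python lists stand in for the warehouse and the Hadoop cluster.

```python
from datetime import date, timedelta

# Hypothetical rows: (order_id, last_accessed, amount). Values are invented for illustration.
rows = [
    (1, date(2014, 10, 1), 50.0),
    (2, date(2012, 3, 15), 75.0),
    (3, date(2014, 9, 20), 20.0),
    (4, date(2011, 7, 4), 10.0),
]


def split_by_age(rows, today, max_age_days=365):
    """Keep recently used rows in the warehouse tier; mark older rows for offload."""
    cutoff = today - timedelta(days=max_age_days)
    warehouse = [r for r in rows if r[1] >= cutoff]
    hadoop = [r for r in rows if r[1] < cutoff]
    return warehouse, hadoop


def total_amount(*tiers):
    """Query across both tiers so offloaded history stays part of the answer."""
    return sum(amount for tier in tiers for (_, _, amount) in tier)


warehouse, hadoop = split_by_age(rows, today=date(2014, 10, 14))
print(len(warehouse), len(hadoop))
print(total_amount(warehouse, hadoop))
```

Unlike a move to tape, the offloaded rows remain online, so a query can still combine them with the warehouse rows.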
QUESTIONS FOR ACTIAN
Where should analytics be created – in a relational environment or in Hadoop?
Where should they be analyzed? Do we have enough tools in a Hadoop environment to do analysis there?
How do businesses analyze a combination of structured and unstructured data?
Is it as simple as ‘structured data to the data warehouse or analytic one-offs and unstructured data to Hadoop’?
Is using Hadoop as a data refinery the best use of Hadoop?
Does any data go to both environments? Or do just summaries get shared?
Can price/performance of a database vendor’s product be superior to an open source product?
Upcoming Topics
www.insideanalysis.com
2014 Editorial Calendar at www.insideanalysis.com/webcasts/the-briefing-room
2015 Editorial Calendar coming soon!
This Month: ANALYTIC PLATFORMS
November: DISCOVERY & VISUALIZATION
December: INNOVATORS
THANK YOU for your ATTENTION!
Opening slide image courtesy of Wikimedia Commons