Hadoop as an Analytic Platform: Why Not?

Upload: inside-analysis

Post on 08-Jul-2015

Category:

Technology


DESCRIPTION

The Briefing Room with William McKnight and Actian

Live Webcast on October 14, 2014

Watch the archive: https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=135528d85baa96a07850bd35961d459d

Integrating Hadoop with existing data sources, workflows and analytics can be a real challenge. While some components, like Hive and Spark, can give SQL access to Hadoop data, there isn’t much that enables Hadoop to be treated as a genuine BI and analytics platform, capable of running multiple jobs that serve multiple users and multiple applications. But what if you could turn Hadoop into a versatile, high performance development platform, forgoing all the pain of figuring out how and where to manage big data?

Register for this episode of The Briefing Room to hear veteran Analyst William McKnight as he discusses the fairly swift evolution of Hadoop’s capabilities. He’ll be briefed by Jim Hare of Actian, who will tout his company’s latest addition to its Analytic Platform: Hadoop SQL Edition. He will show how Actian has leveraged Hadoop and its scale out file system to create a fully functioning platform, providing everything from an analytic database to machine learning.

Visit InsideAnalysis.com for more information.

TRANSCRIPT

Page 1: Hadoop as an Analytic Platform: Why Not?

Grab some coffee and enjoy the pre-show banter before the top of the hour!

Page 2: Hadoop as an Analytic Platform: Why Not?

The Briefing Room

Hadoop as an Analytic Platform: Why Not?

Page 3: Hadoop as an Analytic Platform: Why Not?

Twitter Tag: #briefr

The Briefing Room

Welcome

Host: Eric Kavanagh

@eric_kavanagh

Page 4: Hadoop as an Analytic Platform: Why Not?

Mission

•  Reveal the essential characteristics of enterprise software, good and bad
•  Provide a forum for detailed analysis of today’s innovative technologies
•  Give vendors a chance to explain their product to savvy analysts
•  Allow audience members to pose serious questions... and get answers!

Page 5: Hadoop as an Analytic Platform: Why Not?

Topics

2014 Editorial Calendar at www.insideanalysis.com/webcasts/the-briefing-room

This Month: ANALYTIC PLATFORMS

November: DISCOVERY & VISUALIZATION

December: INNOVATORS

Page 6: Hadoop as an Analytic Platform: Why Not?

Executive Summary

•  Don’t build CARRIAGES for highways
•  Focus on NEW opportunities
•  SLOWLY wean off old systems

A NEW ERA of Architecture

Page 7: Hadoop as an Analytic Platform: Why Not?

Analyst: William McKnight

William is President of McKnight Consulting Group. His clients have included 17 of the Global 2000. Many clients have gone public with their success stories. His team's implementations have won multiple Best Practices awards. William is an Entrepreneur of the Year Finalist, a frequent best practices judge and an expert witness. He has hundreds of articles and dozens of white papers in publication. William has also given numerous keynote presentations worldwide at major conferences and has given hundreds of public seminars and webinars. William’s experience ranges from taking his company onto the Inc. 500 and the Dallas 100 to selling a multi-million dollar consulting firm. He is a passionate communicator and motivator, and a former IT VP of a Fortune 50 company.

Page 8: Hadoop as an Analytic Platform: Why Not?

Actian

•  Actian is a database and software development company
•  The Actian Analytics Platform connects to data and Big Data sources to perform actionable and advanced analytics
•  Actian recently released Hadoop SQL Edition, a component that enables SQL access on data stored in Hadoop

Page 9: Hadoop as an Analytic Platform: Why Not?

Guest: Jim Hare

Jim Hare is Senior Director of Product Marketing for the Actian Analytics Platform, helping organizations transform big data into business value. Prior to Actian, he was Director of Marketing at IBM, responsible for go-to-market strategy and messaging for the big data platform. Prior to joining IBM in 2008, Jim was vice president of product marketing and business development at Celequest, a California-based operational business intelligence vendor, which was acquired by Cognos in 2007. He has over 16 years of deep experience in enterprise software, business intelligence, business process management, business activity monitoring, big data, and automated software testing & monitoring. Jim holds an MS in Systems Management from the University of Southern California, and an undergraduate degree from the University of Colorado at Boulder.

Page 10: Hadoop as an Analytic Platform: Why Not?

Confidential © 2014 Actian Corporation

Hadoop as an Analytic Platform: Why Not?

Jim Hare

14 October 2014

Page 11: Hadoop as an Analytic Platform: Why Not?

Actian is Delivering Transformational Value

$140M Revenues + Profitable

10,000+ Customers

Global Presence: 8 worldwide offices, 7x24 multinational support model

“Actian is now very powerfully positioned in the big data and analytics markets.” Robin Bloor

“Actian has assembled all of the next generation IPs into a single analytics platform, allowing users a level of flexibility in data interaction that competitors have not been able to match.” siliconANGLE

Page 12: Hadoop as an Analytic Platform: Why Not?

Emergence of Hadoop as the Data Reservoir

Low Cost Storage for New Data & Offload

Page 13: Hadoop as an Analytic Platform: Why Not?

Benefits of Hadoop

•  Scalable - store large data sets across low cost servers
•  Cost effective - 1/10th the cost of traditional data storage
•  Flexible - quickly and easily land any data in raw format
•  Fast Access - 'maps' data wherever it is located on a cluster
•  Resilient - data is replicated to other nodes in the cluster
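The resilience point can be illustrated with a toy model. The sketch below is not how HDFS actually places replicas (real placement is rack-aware); it assumes a simple round-robin layout, a replication factor of 3 (the HDFS default), and hypothetical node names, and checks that every block survives the loss of two nodes.

```python
REPLICATION_FACTOR = 3  # HDFS default replication; the placement below is a simplification


def place_blocks(num_blocks, nodes, replication=REPLICATION_FACTOR):
    """Assign each block to `replication` distinct nodes, round-robin (toy policy)."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block: {nodes[(block + i) % len(nodes)] for i in range(replication)}
        for block in range(num_blocks)
    }


def readable_blocks(placement, failed_nodes):
    """Return the blocks that still have at least one replica on a live node."""
    failed = set(failed_nodes)
    return {block for block, holders in placement.items() if holders - failed}


# Hypothetical five-node cluster holding ten blocks.
nodes = ["node1", "node2", "node3", "node4", "node5"]
placement = place_blocks(num_blocks=10, nodes=nodes)

# Lose two nodes: each block has three replicas on distinct nodes, so one always survives.
survivors = readable_blocks(placement, failed_nodes={"node2", "node4"})
print(len(survivors), "of", len(placement), "blocks still readable")
```

With three replicas per block, data only becomes unreadable when all three holders of some block fail at once.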

Page 14: Hadoop as an Analytic Platform: Why Not?

Hadoop Deployment Survey Results

Only 22% of Hadoop projects are in production today!

[Chart: share of respondents by Hadoop adoption stage. Stages: Exploring and educating; Conducting POC; Developing First Solution; Pilot First Solution; First Solution Deployed; Supporting Multiple Analytics. Percentages shown, in extraction order: 41%, 16%, 8%, 14%, 10%, 11%.]

Source: SandHill Group Research, “How do you Hadoop? A Survey of Big Data Practitioners”, May 2014

Page 15: Hadoop as an Analytic Platform: Why Not?

Because It isn’t Easy to Analyze Hadoop Data

•  Batch performance is slow
•  Lengthy time to discover insights
•  Expensive skills
•  Lack of data access & security
•  Data preparation is time-consuming
•  Analytics complexity

Page 16: Hadoop as an Analytic Platform: Why Not?

Organizations are Replicating Hadoop Data to Overcome Analytic Challenges

[Diagram labels: OLTP, ERP, CRM; Unstructured docs, emails; Server logs; Social/Web data; Sensor, machine data; Geolocation; Clickstream; Hadoop; Relational Data Store; Discovery; Analytics; Predictive; BI]

•  Duplicate storage & infrastructure costs
•  More IT resources to manage
•  Network bandwidth usage
•  Less accuracy from data sampling
•  Slower time to analysis results

Page 17: Hadoop as an Analytic Platform: Why Not?

Creating Transformational Value from Hadoop Data

Actian Transforms Hadoop from a Data Reservoir into a High Performance Analytics Platform

•  Highest performing, most industrialized Hadoop analytics platform
•  Only end-to-end analytic processing natively in Hadoop
•  Most consumable, accessible, manageable Hadoop analytics

Transformational Value: Customer Delight; Competitive Advantage; World-Class Risk Management; Disruptive New Business Models

[Diagram labels: Data Explosion; Hadoop; Actian Analytics Platform™ (Connect, Analyze, Act); Discovery Analytics; Time-Sensitive Analytics]

Page 18: Hadoop as an Analytic Platform: Why Not?

7 Ingredients Added to Hadoop to Unlock Value

1. High speed integration to on-board data from any data source and any type

Connect to Any Data Anywhere

•  200+ Connectors
•  Embeddable High Throughput Engine
•  Drag and Drop Workflow Designs
•  Capture Data Feeds in Batch or Real-Time
•  Expandable Plugin Framework
•  High Volume Parallel Data Processing
•  Elastic and Secure

ENTERPRISE Data, SOCIAL Data, MACHINE Data, CLOUD Data, LEGACY Data, DEVICE Data

Schedule • Transform • Validate • Aggregate • Reformat • Join • Orchestrate

Page 19: Hadoop as an Analytic Platform: Why Not?

7 Ingredients Added to Hadoop to Unlock Value

1. High speed integration to on-board data from any data source and any type
2. Visual Framework for connecting, blending, & enriching data, data science discovery, building and testing predictive models

[Diagram labels: Connect; Blend & Enrich; Discover; Build & Test Models; Coding]

Page 20: Hadoop as an Analytic Platform: Why Not?

7 Ingredients Added to Hadoop to Unlock Value

1. High speed integration to on-board data from any data source and any type
2. Visual Framework: connecting, blending, & enriching data, data science discovery, building and testing predictive models
3. 1500 KNIME Operators + R analytics running in parallel on HDFS + Hadoop = The Open Source Trifecta

[Figure: Gartner Magic Quadrant for Advanced Analytics Platforms. Source: Gartner (February 2014)]

Page 21: Hadoop as an Analytic Platform: Why Not?

Complete End-to-End Analytics on Hadoop

Source: 2013 Rexer Analytics Survey

Page 22: Hadoop as an Analytic Platform: Why Not?

7 Ingredients Added to Hadoop to Unlock Value

1. High speed integration to on-board data from any data source and any type
2. Visual Framework: connecting, blending, & enriching data, data science discovery, building and testing predictive models
3. 1500 KNIME Operators + R analytics running in parallel on HDFS + Hadoop = The Open Source Trifecta
4. High-Performance, YARN-based data processing engine running on HDFS

Actian DataPrep

LEADER On-Node Processing

Read Write Prepare Analyze Read Write Analyze

Optimizer

Page 23: Hadoop as an Analytic Platform: Why Not?

High Performance, Parallelized Processing on HDFS Without Any Programming

Actian Analytics Platform

Visual Framework: Manage the entire analytic process in a visual framework with no coding required.

•  Reuse and share all components from operators to workflows
•  Choose from five sets of operators: Connections, Transformation, Data Quality, Analytics, Data Science
•  Automatically detect resources, plan optimal utilization, and parallelize all workloads on Hadoop
•  Use dual pipeline parallelism to accelerate performance 30X
•  Run fully optimized processing directly on the Hadoop node via YARN
•  Take processing to where the data lives; runs natively on Hortonworks

[Diagram labels: Hadoop - Leader Node; Optimized, On-HDFS Processing; Query Pipelining; CPU Pipelining; Optimize]

Page 24: Hadoop as an Analytic Platform: Why Not?

7 Ingredients Added to Hadoop to Create Value

1. High speed integration to on-board data from any data source and any type
2. Visual Framework: connecting, blending, & enriching data, data science discovery, building and testing predictive models
3. 1500 KNIME Operators + R analytics running in parallel on HDFS + Hadoop = The Open Source Trifecta
4. High-Performance, YARN-based data processing engine running on HDFS
5. High-Performance, vector processing engine as the pattern for SQL on Hadoop

Page 25: Hadoop as an Analytic Platform: Why Not?

Vector-based SQL Processing Natively on HDFS

High Performance, Industrialized SQL Database

•  Standards: ANSI SQL 92 plus advanced analytics
•  Optimized: mature, proven planner and optimizer
•  Secure: native DBMS security
•  Reliable: full ACID-compliance
•  Manageable: YARN certified
•  Performance: 30X faster than Impala
•  Scalable: unlimited expansion as Hadoop cluster grows
•  Native: runs natively on top of HDFS via YARN

[Diagram labels: Visual Data & Analytics Workbench (Read, Load, Blend & Enrich, Data Science & Analytics); High Performance, Parallelized Data Flow Engine; Actian Vector; SQL; Industrialized; HADOOP (YARN, HDFS); Namenode; Datanodes on HDFS, each running X100]

Page 26: Hadoop as an Analytic Platform: Why Not?

7 Ingredients Added to Hadoop to Unlock Value

1. High speed integration to on-board data from any data source and any type
2. Visual Framework: connecting, blending, & enriching data, data science discovery, building and testing predictive models
3. 1500 KNIME Operators + R analytics running in parallel on HDFS + Hadoop = The Open Source Trifecta
4. High-Performance, YARN-based data processing engine running on HDFS
5. High-Performance, vector processing engine as the pattern for SQL on Hadoop
6. Extreme-Performance, super-low latency, massively parallel analytics engine

Page 27: Hadoop as an Analytic Platform: Why Not?

Actian Analytics Platform: Next Generation Big Data Analytics

Actian Analytics Platform™

•  Connections for Any Data
•  Massively Parallel Integration
•  Visual Framework for Data and Analytic Workflows
•  Libraries of Analytics
•  High Performance Data Science Natively on Hadoop
•  Sophisticated, Low Latency Analytics in Database
•  Real-Time Analytic Services

[Diagram labels: Enterprise Data; Machine Data; Social Data; SaaS Data; Hadoop; Data Warehouse; Amazon Redshift; Business Processes; Users; Machines; Applications]

Page 28: Hadoop as an Analytic Platform: Why Not?

7 Ingredients Added to Hadoop to Unlock Value

1. High speed integration to on-board data from any data source and any type
2. Visual Framework: connecting, blending, & enriching data, data science discovery, building and testing predictive models
3. 1500 KNIME Operators + R analytics running in parallel on HDFS + Hadoop = The Open Source Trifecta
4. High-Performance, YARN-based data processing engine running on HDFS
5. High-Performance, vector processing engine as the pattern for SQL on Hadoop
6. Extreme-Performance, super-low latency, massively parallel analytics engine
7. Blueprints to accelerate analytics application development and value creation

Page 29: Hadoop as an Analytic Platform: Why Not?

Big Data 2.0 Media Mix Modeling Blueprint

Stages: CONNECT, ANALYZE, ACT

Data sources:
•  CRMdb: All Relevant Account Info and Demographics
•  EDWdb: All Relevant Sales Histories
•  Hadoop Logs: Detailed ePOS Receipts
•  EDWdb: Marketing Vehicle Details
•  Hadoop Text Files: Campaign Response Notes

Workflow steps (diagram labels):
•  BUILD CUSTOMER PROFILE
•  AGGREGATE SALES DATA
•  SKU LEVEL SALES DATA BY GEO
•  JOIN, DERIVE, AGGREGATE, PREPARE
•  CAPTURE MARKETING VEHICLES
•  PREPARE FOR TEXT ANALYTICS
•  SENTIMENT AND CONTENT ANALYSIS
•  CUSTOMER MATCH WITH CAMPAIGNS
•  VEHICLE RESULTS AT GEO, STORE, SKU AND CUSTOMER LEVEL
•  MARKETING IMPACT ANALYSIS
•  IMPACT FORECAST ANALYSIS

Outputs and goals (diagram labels):
•  MARKETING MIX SALES CONTRIBUTION YEARLY CHANGE REPORT
•  SALES VOLUME, EFFECTIVENESS, EFFICIENCY AND ROI REPORT
•  NEW MEDIA MIX OPTIMIZATION
•  MAXIMIZE REVENUE FROM MARKETING
•  MINIMIZE MARKETING SPEND TO REVENUE RATIO

Page 30: Hadoop as an Analytic Platform: Why Not?

Learn more about the Actian Analytics Platform - Hadoop SQL Edition

1) Stay tuned for several exciting announcements on 16 October 2014 at the Strata Conference in NYC!
2) Visit Actian at Booth 225 for a demo of Actian Analytics Platform - the Highest Performing Analytics & SQL in Hadoop
3) Download and try it out yourself: bigdata.actian.com/sql-in-Hadoop

Page 31: Hadoop as an Analytic Platform: Why Not?

Thank You

www.actian.com    facebook.com/actiancorp    @actiancorp

Page 32: Hadoop as an Analytic Platform: Why Not?

Perceptions & Questions

Analyst: William McKnight

Page 33: Hadoop as an Analytic Platform: Why Not?

ANALYTICS: A BUSINESS IMPERATIVE

Formed from SUMMARIES of data

Tied to Business Actions

Continual Re-evaluation

i.e., Customer Segmentation and Profit

Adding Big Data!

Page 34: Hadoop as an Analytic Platform: Why Not?

ANALYTICS EXAMPLES

•  Number of customers in each customer state (optionally by product or multiple products)
•  Average balance of customers by geo
•  Average start date in each customer lifetime value decile by geo and device
•  New number of customers in each state
•  Propensity to churn by age band and device
•  Cost of acquisition by age and gender
•  Average session duration by cost of acquisition
•  Session duration differences between first and tenth session
•  Network with highest up time last month
•  Number of calls per session
•  Best performing ad network by day part in a geo, age band and device

And on and on and on and on….
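Most of the examples above are summaries: a grouping plus an aggregate. As a minimal sketch, the first two can be written as SQL GROUP BY queries. The `customers` table, its columns, and the sample rows are hypothetical, "state" is assumed to mean a lifecycle state such as active or lapsed, and SQLite stands in for whichever SQL engine is actually used.

```python
import sqlite3

# In-memory database with a hypothetical customers table and sample rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, state TEXT, geo TEXT, balance REAL)"
)
conn.executemany(
    "INSERT INTO customers (state, geo, balance) VALUES (?, ?, ?)",
    [
        ("active", "east", 120.0),
        ("active", "west", 80.0),
        ("lapsed", "east", 40.0),
        ("active", "east", 200.0),
    ],
)

# Number of customers in each customer state.
customers_per_state = dict(
    conn.execute("SELECT state, COUNT(*) FROM customers GROUP BY state").fetchall()
)

# Average balance of customers by geo.
avg_balance_by_geo = dict(
    conn.execute("SELECT geo, AVG(balance) FROM customers GROUP BY geo").fetchall()
)

print(customers_per_state)
print(avg_balance_by_geo)
```

The remaining examples follow the same shape with more grouping columns (age band, device, day part) or a different aggregate.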

Page 35: Hadoop as an Analytic Platform: Why Not?

ANALYTICS ACTION

Page 36: Hadoop as an Analytic Platform: Why Not?

SMARTER MARKETING

Spend + Media Arbitrage

Opportunities + Incremental Direct Marketing Spend

Improvement:

Map Media Buys to the Best Customer Demographic

Do sponsorships align with customer base?

Monitored transactions, renewals, customer care calls

Leveraged data to pitch right product, right time

Decrease in marketing cost

Increase in revenue, profit, customer satisfaction

Page 37: Hadoop as an Analytic Platform: Why Not?

VEHICLES FOR BIG DATA

Data Warehouse

Regional and Departmental

Views

ADS

Applications & Engines

Operational Analytics & Hot Views

Data Marts Independent

Dependent

Relational Data

Conformed Dimensions

Page 38: Hadoop as an Analytic Platform: Why Not?

THE EVER-EXPANDING DATA WAREHOUSE

•  Enterprise Data Warehouse users face huge annual upgrade expenses.
•  To avoid this spend, organizations are looking for lower cost alternatives.
•  Moving data to tape is undesirable because the data goes offline and is unavailable for analytics.
•  Moving infrequently used data to Hadoop is a cost-effective, online option that preserves the ability to query.

[Chart: Cost by Last Year, This Year, Next Year]

Page 39: Hadoop as an Analytic Platform: Why Not?

DATA WAREHOUSE EXPANSION

1. As data volume increases exponentially, cost of warehousing rises also
2. Offload data to less expensive Hadoop cluster to save on data management costs
3. Combine Hadoop data with DW data for a more comprehensive view of history
4. Add operational data for greater insight and agility in analytics and BI

[Diagram labels: HDFS, HDFS, HDFS]
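One way to picture combining offloaded data with warehouse data is a single view over both tiers. This is a sketch under stated assumptions, not a description of any vendor's mechanism: the table names `dw_sales` and `hadoop_sales`, the view `all_sales`, and the figures are hypothetical, and two SQLite tables stand in for the warehouse and the Hadoop-resident history.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dw_sales (sale_year INTEGER, amount REAL);      -- recent, frequently queried rows
    CREATE TABLE hadoop_sales (sale_year INTEGER, amount REAL);  -- stand-in for offloaded history
    INSERT INTO dw_sales VALUES (2013, 100.0), (2014, 150.0);
    INSERT INTO hadoop_sales VALUES (2010, 60.0), (2011, 70.0), (2012, 90.0);

    -- One logical table over both tiers, so queries need not know where a row lives.
    CREATE VIEW all_sales AS
        SELECT sale_year, amount, 'warehouse' AS tier FROM dw_sales
        UNION ALL
        SELECT sale_year, amount, 'hadoop' AS tier FROM hadoop_sales;
""")

# Full history in one query, spanning warehouse and offloaded rows.
history = conn.execute(
    "SELECT sale_year, SUM(amount) FROM all_sales GROUP BY sale_year ORDER BY sale_year"
).fetchall()
print(history)
```

The offloaded rows stay online and queryable, which is the contrast the previous slide draws with tape.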

Page 40: Hadoop as an Analytic Platform: Why Not?

QUESTIONS FOR ACTIAN

Where should analytics be created – in a relational environment or in Hadoop?

Where should they be analyzed? Do we have enough tools in a Hadoop environment to do analysis there?

How do businesses analyze a combination of structured and unstructured data?

Is it as simple as ‘structured data to the data warehouse or analytic one-offs and unstructured data to Hadoop’?

Is using Hadoop as a data refinery the best use of Hadoop?

Does any data go to both environments? Or do just summaries get shared?

Can price/performance of a database vendor’s product be superior to an open source product?

Page 41: Hadoop as an Analytic Platform: Why Not?

Page 42: Hadoop as an Analytic Platform: Why Not?

Upcoming Topics

www.insideanalysis.com

2014 Editorial Calendar at www.insideanalysis.com/webcasts/the-briefing-room

2015 Editorial Calendar coming soon!

This Month: ANALYTIC PLATFORMS

November: DISCOVERY & VISUALIZATION

December: INNOVATORS

Page 43: Hadoop as an Analytic Platform: Why Not?

THANK YOU for your ATTENTION!

Opening slide image courtesy of Wikimedia Commons