2015 02 12 talend hortonworks webinar challenges to hadoop adoption

©2015 Talend Inc. Challenges to Hadoop Adoption: If You Can Dream It, You Can Build It. February 12, 2015

Upload: hortonworks

Post on 14-Jul-2015


TRANSCRIPT

Page 1: 2015 02 12 talend hortonworks webinar challenges to hadoop adoption

©2015 Talend Inc.

Challenges to Hadoop Adoption: If You Can Dream It, You Can Build It

February 12, 2015

Page 2: 2015 02 12 talend hortonworks webinar challenges to hadoop adoption

Welcome

A few logistical points:

• All participants are muted
• You may ask questions using the Q&A panel located on the bottom of the GoToWebinar applet
• Answers will be provided after the presentation
• If time is too short to address all questions, answers will be provided via email
• To receive a replay of our webinar today, please send us an email to [email protected]
• If you are experiencing connection problems, please use the Q&A panel to communicate

Page 3: 2015 02 12 talend hortonworks webinar challenges to hadoop adoption

Challenges to Hadoop Adoption: If You Can Dream It, You Can Build It

February 12, 2015

Page 4: 2015 02 12 talend hortonworks webinar challenges to hadoop adoption

Your Speakers Today

Jim Walker Director, Product Marketing

Shawn James Director, Alliances & Business Development

Mark Balkenende Sr. Sales Solution Architect

Page 5: 2015 02 12 talend hortonworks webinar challenges to hadoop adoption

Page 5 © Hortonworks Inc. 2011 – 2015. All Rights Reserved

Hadoop for the Enterprise: Implement a Modern Data Architecture with HDP

Winter 2015 Version 1.0

Hortonworks. We do Hadoop.

Page 6: 2015 02 12 talend hortonworks webinar challenges to hadoop adoption

Traditional systems under pressure

Challenges
• Constrains data to app
• Can’t manage new data
• Costly to scale

[Figure: Business Value, LAGGARDS vs. INDUSTRY LEADERS. Traditional data: ERP, CRM, SCM. New data: Clickstream, Geolocation, Web Data, Internet of Things, Docs and emails, Server logs. Data volume: 2.8 zettabytes in 2012, 40 zettabytes in 2020.]

Page 7: 2015 02 12 talend hortonworks webinar challenges to hadoop adoption

Hadoop emerged as foundation of new data architecture

Apache Hadoop is an open source data platform for managing large volumes of high-velocity, high-variety data.

• Built by Yahoo! to be the heartbeat of its ad & search business
• Donated to Apache Software Foundation in 2005 with rapid adoption by large web properties & early adopter enterprises
• Incredibly disruptive to current platform economics

Traditional Hadoop Advantages
✓ Manages new data paradigm
✓ Handles data at scale
✓ Cost effective
✓ Open source

Traditional Hadoop Had Limitations
• Batch-only architecture
• Single purpose clusters, specific data sets
• Difficult to integrate with existing investments
• Not enterprise-grade

[Diagram components: Application; Batch Processing (MapReduce); Storage (HDFS)]

Page 8: 2015 02 12 talend hortonworks webinar challenges to hadoop adoption

Modern Data Architecture emerges to unify data & processing

Modern Data Architecture
• Enable applications to have access to all your enterprise data through an efficient centralized platform
• Supported with a centralized approach to governance, security and operations
• Versatile to handle any applications and datasets no matter the size or type

[Diagram components: SOURCES (Clickstream, Web & Social, Geolocation, Sensor & Machine, Server Logs, Unstructured; Existing Systems: ERP, CRM, SCM); HDFS (Hadoop Distributed File System); YARN: Data Operating System (Interactive, Real-Time, Batch, Partner ISV); MPP; EDW; ANALYTICS (Data Marts, Business Analytics, Visualization & Dashboards); Applications]

Page 9: 2015 02 12 talend hortonworks webinar challenges to hadoop adoption

Hadoop adoption follows a predictable journey

From cost optimization to new analytic apps, and ultimately to a “data lake”

Page 10: 2015 02 12 talend hortonworks webinar challenges to hadoop adoption

Hadoop Driver: Cost optimization

• Archive Data off EDW: Move rarely used data to Hadoop as active archive, store more data longer
• Offload costly ETL process: Free your EDW to perform high-value functions like analytics & operations, not ETL
• Enrich the value of your EDW: Use Hadoop to refine new data sources, such as web and machine data for new analytical context

HDP helps you reduce costs and optimize the value associated with your EDW

[Diagram components: SOURCES (Clickstream, Web & Social, Geolocation, Sensor & Machine, Server Logs, Unstructured; Existing Systems: ERP, CRM, SCM); DATA SYSTEMS (HDP 2.2: ELT; Cold Data, Deeper Archive & New Sources; Enterprise Data Warehouse: Hot; MPP; In-Memory); ANALYTICS (Data Marts, Business Analytics, Visualization & Dashboards)]

Page 11: 2015 02 12 talend hortonworks webinar challenges to hadoop adoption

Hadoop Driver: Advanced analytic applications

Use case categories:
• Single View: Improve acquisition and retention
• Predictive Analytics: Identify your next best action
• Data Discovery: Uncover new findings

Use cases by industry:

Financial Services
• New Account Risk Screens; Trading Risk; Insurance Underwriting
• Improved Customer Service; Insurance Underwriting; Aggregate Banking Data as a Service
• Cross-sell & Upsell of Financial Products; Risk Analysis for Usage-Based Car Insurance; Identify Claims Errors for Reimbursement

Telecom
• Unified Household View of the Customer; Searchable Data for NPTB Recommendations; Protect Customer Data from Employee Misuse
• Analyze Call Center Contacts Records; Network Infrastructure Capacity Planning; Call Detail Records (CDR) Analysis
• Inferred Demographics for Improved Targeting; Proactive Maintenance on Transmission Equipment; Tiered Service for High-Value Customers

Retail
• 360° View of the Customer; Supply Chain Optimization; Website Optimization for Path to Purchase
• Localized, Personalized Promotions; A/B Testing for Online Advertisements; Data-Driven Pricing, improved loyalty programs
• Customer Segmentation; Personalized, Real-time Offers; In-Store Shopper Behavior

Manufacturing
• Supply Chain and Logistics; Optimize Warehouse Inventory Levels; Product Insight from Electronic Usage Data
• Assembly Line Quality Assurance; Proactive Equipment Maintenance; Crowdsource Quality Assurance
• Single View of a Product Throughout Lifecycle; Connected Car Data for Ongoing Innovation; Improve Manufacturing Yields

Healthcare
• Electronic Medical Records; Monitor Patient Vitals in Real-Time; Use Genomic Data in Medical Trials
• Improving Lifelong Care for Epilepsy; Rapid Stroke Detection and Intervention; Monitor Medical Supply Chain to Reduce Waste
• Reduce Patient Re-Admittance Rates; Video Analysis for Surgical Decision Support; Healthcare Analytics as a Service

Oil & Gas
• Unify Exploration & Production Data; Monitor Rig Safety in Real-Time; Geographic exploration
• DCA to Slow Well Decline Curves; Proactive Maintenance for Oil Field Equipment; Define Operational Set Points for Wells

Government
• Single View of Entity; CBM & Autonomic Logistic Analysis; Sentiment Analysis on Program Effectiveness
• Prevent Fraud, Waste and Abuse; Proactive Maintenance for Public Infrastructure; Meet Deadlines for Government Reporting

Page 12: 2015 02 12 talend hortonworks webinar challenges to hadoop adoption

Hadoop Driver: Enabling the data lake

Journey to the Data Lake with Hadoop

Data Lake Definition
• Centralized Architecture: Multiple applications on a shared data set with consistent levels of service
• Any App, Any Data: Multiple applications accessing all data affording new insights and opportunities.
• Unlocks ‘Systems of Insight’: Advanced algorithms and applications used to derive new value and optimize existing value.

Drivers:
1. Cost Optimization
2. Advanced Analytic Apps

Goal:
• Centralized Architecture
• Data-driven Business

[Figure: axes SCALE and SCOPE; labels DATA LAKE and Systems of Insight]

Page 13: 2015 02 12 talend hortonworks webinar challenges to hadoop adoption


Challenges to Hadoop Adoption

•  Where do I start? Why is this of value to me and my organization?

•  Hadoop is complex, what do I use for what?

•  It is too complex. I don’t have any trained Hadoop resources.

Many have been down this path…

Page 14: 2015 02 12 talend hortonworks webinar challenges to hadoop adoption

Connecting the Data-Driven Enterprise

Page 15: 2015 02 12 talend hortonworks webinar challenges to hadoop adoption

Main Challenges in the Data Integration Market

• BIG DATA: More data, less structure
• PRODUCTIVITY: Can’t keep up with demand
• COST: Expensive solutions
• SKILLS: Hard to find talent

Page 16: 2015 02 12 talend hortonworks webinar challenges to hadoop adoption

The Big Data Demand

4.4 MILLION JOBS IN BIG DATA BY 2015, but only one third of those jobs will be filled

Source: Gartner

Page 17: 2015 02 12 talend hortonworks webinar challenges to hadoop adoption

The Hadoop Ecosystem is Complex

Source: “Hadoop Ecosystem Overview”, Forrester 2014

Page 18: 2015 02 12 talend hortonworks webinar challenges to hadoop adoption

Talend Brings Unmatched Productivity

HAND-CODING
• Unproductive
• Need specialized skills
• Hard to maintain
• Limited support

TALEND ENTERPRISE
• 800+ components
• Generates optimized code
• Collaboration & management
• Gold support (SLAs)

Page 19: 2015 02 12 talend hortonworks webinar challenges to hadoop adoption

Future-Proof Architecture With Native Code Gen

• ETL: Day-to-day integration
• ELT: DW Appliance
• ESB: Messaging, Routing, Transformation
• HADOOP: Highly Scalable
• Spark

Page 20: 2015 02 12 talend hortonworks webinar challenges to hadoop adoption

Talend Big Data

[Diagram components: Legacy Systems; ERP; Internet of Things; DBMS / EDW; NoSQL; Web Logs; Standard Reports; Ad-hoc Query Tools; Data Mining; MDD/OLAP; Analytical Applications; Develop and Test; Operations Team; Studio. Functions: Ingestion, Map, Profile, Parse, Match, Cleanse, Standardize, Change Data Capture, Machine Learning, Share, Schedule.]

Benefits
• Native Access
• Future Proof Architecture
• Lowest TCO
• Increased Productivity

Select Icons made by Freepik, Situ Herrera, www.flaticon.com

Page 21: 2015 02 12 talend hortonworks webinar challenges to hadoop adoption

Easiest and Most Powerful Integration Solution for Big Data

Talend Big Data

Page 22: 2015 02 12 talend hortonworks webinar challenges to hadoop adoption

Main Challenges in the Data Market

• SCALABLE
• AGILE
• LOWEST TCO
• EASY

Page 23: 2015 02 12 talend hortonworks webinar challenges to hadoop adoption

1,800 Leading Brands Use Talend

• FINANCE & INSURANCE
• SERVICES
• MANUFACTURING & RETAIL
• PUBLIC SECTOR & EDUCATION

Page 24: 2015 02 12 talend hortonworks webinar challenges to hadoop adoption

Live Demo

Page 25: 2015 02 12 talend hortonworks webinar challenges to hadoop adoption

Key Takeaways

• See how Talend’s Big Data Platform addresses the Skills Gap
• See how Talend will increase your Big Data Productivity
• Agree Talend and Hortonworks have the technology and skills to satisfy your business requirements

Challenges recapped:
• BIG DATA: More data, less structure
• PRODUCTIVITY: Can’t keep up with demand
• SKILLS: Hard to find talent

Page 26: 2015 02 12 talend hortonworks webinar challenges to hadoop adoption

Demonstration Use Case

The objective of the use case was to identify data quality issues prior to loading data to the EDW without increasing the actual load window.

• Load 500 TB Compressed Files to HDFS
  - 3rd Party Sales/Prescribing files delivered by Vendor
• Compute Monthly Totals
  - Prior to loading to EDW, compare prior month’s totals to current month’s totals within new data files
• Display Comparison results in Analytical Tool
  - Display total Sales comparison for each Product to quickly show Data Quality issues before loading to EDW Staging
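The month-over-month comparison described in this use case can be sketched in a few lines of Python. This is only an illustration of the comparison logic, not the Talend job shown in the demo: the record layout (month, product, amount), the function names, the sample rows, and the 25% threshold are all assumptions made for the example.

```python
from collections import defaultdict


def monthly_totals(records):
    """Sum sales per (month, product) from (month, product, amount) tuples."""
    totals = defaultdict(float)
    for month, product, amount in records:
        totals[(month, product)] += amount
    return totals


def compare_months(totals, prior, current, threshold=0.25):
    """Flag products whose total changed by more than `threshold` (a fraction).

    The threshold is a hypothetical tolerance, not a value from the webinar.
    """
    flagged = {}
    products = {p for (m, p) in totals if m in (prior, current)}
    for product in sorted(products):
        before = totals.get((prior, product), 0.0)
        after = totals.get((current, product), 0.0)
        if before == 0.0:
            # No prior-month baseline: treat any new sales as an infinite change.
            change = float("inf") if after else 0.0
        else:
            change = (after - before) / before
        if abs(change) > threshold:
            flagged[product] = (before, after, change)
    return flagged


# Hypothetical sample rows standing in for the vendor's sales files.
records = [
    ("2015-01", "ProductA", 100.0),
    ("2015-01", "ProductA", 50.0),
    ("2015-01", "ProductB", 200.0),
    ("2015-02", "ProductA", 160.0),
    ("2015-02", "ProductB", 40.0),
]

totals = monthly_totals(records)
flagged = compare_months(totals, "2015-01", "2015-02")
for product, (before, after, change) in flagged.items():
    print(f"{product}: {before:.2f} -> {after:.2f} ({change:+.0%})")
```

At the data volumes named in the use case, the same aggregate-then-compare step would run as a distributed job on the cluster rather than in a single Python process; the sketch only shows what is being compared before the EDW load.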

Page 27: 2015 02 12 talend hortonworks webinar challenges to hadoop adoption

Typical 3rd Party Data Load

[Flow stages: Data Preparation; Warehouse Processing; Final Reports / Quality Check]

Bad Big Data quality issues result in lost time, resources & revenue

Page 28: 2015 02 12 talend hortonworks webinar challenges to hadoop adoption

Data Warehouse Optimization

[Flow stages: Data Preparation; Warehouse Processing; Final Reports / Quality Check]

Hadoop Cluster
✓ Upfront Quality Checks
✓ Identify Master records earlier
✓ Load Uncompressed data directly to DWH staging

Optimized Loading

Page 29: 2015 02 12 talend hortonworks webinar challenges to hadoop adoption

Live Demo

Page 30: 2015 02 12 talend hortonworks webinar challenges to hadoop adoption

Recap on the Demonstration

What stood out most?

• Hortonworks and Talend can help you reduce costs
• Offload costly ETL process
• Enrich the value of your EDW
• Graphical drag and drop visual environment showcasing Talend and Hortonworks

Page 31: 2015 02 12 talend hortonworks webinar challenges to hadoop adoption

Hortonworks/Talend Sandbox

• Graphical drag and drop visual environment showcasing Hortonworks
  - Visually see the results of integration process
• Accelerates data loading and transformation with Hadoop
  - Build and deploy MapReduce and Pig jobs on YARN
• Pre-built use cases: data warehouse optimization, clickstream data, Twitter sentiment, Apache weblogs
• Demonstrations of several NoSQL databases

Page 32: 2015 02 12 talend hortonworks webinar challenges to hadoop adoption

From Zero to Big Data in 10 Minutes

Download free: www.talend.com/hortonworks-sandbox

•  Get up and running in minutes, not weeks, with a big data Sandbox and demos

•  Includes: Sentiment analysis, ETL Offload, Log file analysis

•  Start working with Talend & Hortonworks today!

Page 33: 2015 02 12 talend hortonworks webinar challenges to hadoop adoption

Back up slides

Page 34: 2015 02 12 talend hortonworks webinar challenges to hadoop adoption

Reference Architecture

[Diagram components]
• HDFS2 (Redundant, Reliable Storage)
• YARN (Cluster Resource Management)
• BATCH (MapReduce); INTERACTIVE (Tez); STREAMING (Storm, Spark); GRAPH (Giraph); NoSQL (MongoDB); Events (Falcon); ONLINE (HBase); OTHER (Search)
• TAP (Ingestion): SQOOP, FLUME, HDFS API, HBase API, HIVE, 800+
• TRANSFORM (Data Refinement): PROFILE, PARSE, MAP, CDC, CLEANSE, STANDARDIZE, MACHINE LEARNING, MATCH
• DELIVER (as an API): ActiveMQ, Karaf, Camel, CXF, Kafka, Storm
• Meta, Security, MDM, iPaaS, Govern, HA