2015 02 12 talend hortonworks webinar challenges to hadoop adoption
TRANSCRIPT
1
©2015 Talend Inc.
Challenges to Hadoop Adoption: If You Can Dream It, You Can Build It
February 12, 2015
2
Welcome
A few logistical points:
• All participants are muted
• You may ask questions using the Q&A panel located at the bottom of the GoToWebinar applet
• Answers will be provided after the presentation
• If time is too short to address all questions, answers will be provided via email
• To receive a replay of our webinar today, please send us an email to [email protected]
• If you are experiencing connection problems, please use the Q&A panel to communicate
4
Your Speakers Today
Jim Walker Director, Product Marketing
Shawn James Director, Alliances & Business Development
Mark Balkenende Sr. Sales Solution Architect
Page 5 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Hadoop for the Enterprise: Implement a Modern Data Architecture with HDP
Winter 2015 Version 1.0
Hortonworks. We do Hadoop.
Traditional systems under pressure
Challenges
• Constrains data to app
• Can’t manage new data
• Costly to scale
[Figure: business value of traditional data (ERP, CRM, SCM) versus new data (clickstream, geolocation, web data, Internet of Things, docs and emails, server logs). Data volume grows from 2.8 zettabytes in 2012 to 40 zettabytes in 2020, separating industry leaders from laggards.]
Hadoop emerged as foundation of new data architecture
Apache Hadoop is an open source data platform for managing large volumes of high-velocity, high-variety data
• Built by Yahoo! to be the heartbeat of its ad & search business
• Donated to Apache Software Foundation in 2005 with rapid adoption by large web properties & early adopter enterprises
• Incredibly disruptive to current platform economics
Traditional Hadoop Advantages
✓ Manages new data paradigm
✓ Handles data at scale
✓ Cost effective
✓ Open source
Traditional Hadoop Had Limitations
• Batch-only architecture
• Single purpose clusters, specific data sets
• Difficult to integrate with existing investments
• Not enterprise-grade
Application
Storage HDFS
Batch Processing MapReduce
Modern Data Architecture emerges to unify data & processing
Modern Data Architecture
• Enable applications to have access to all your enterprise data through an efficient centralized platform
• Supported with a centralized approach to governance, security and operations
• Versatile enough to handle any application and dataset, no matter the size or type
[Diagram: sources (existing systems: ERP, CRM, SCM; clickstream, web & social, geolocation, sensor & machine, server logs, unstructured) feed HDFS (Hadoop Distributed File System) and YARN: Data Operating System, which runs interactive, real-time, batch and partner ISV workloads alongside the EDW and MPP systems. Analytics layers include data marts, applications, business analytics, and visualization & dashboards.]
Hadoop adoption follows a predictable journey
From cost optimization to new analytic apps, and ultimately to a “data lake”
Hadoop Driver: Cost optimization
• Archive data off EDW: Move rarely used data to Hadoop as an active archive, store more data longer
• Offload costly ETL process: Free your EDW to perform high-value functions like analytics & operations, not ETL
• Enrich the value of your EDW: Use Hadoop to refine new data sources, such as web and machine data, for new analytical context
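The active-archive idea above can be sketched in a few lines of Python. This is only an illustration and not HDP or Talend functionality: the `last_accessed` field, the 365-day threshold, and the `split_hot_cold` helper are all assumptions made for the example.

```python
from datetime import date, timedelta


def split_hot_cold(rows, today, max_idle_days=365):
    """Partition rows into (hot, cold) by how recently each was accessed.

    Hypothetical helper: `last_accessed` and the idle threshold are
    assumptions for illustration, not fields defined in the webinar.
    """
    cutoff = today - timedelta(days=max_idle_days)
    hot = [r for r in rows if r["last_accessed"] >= cutoff]   # stays in the EDW
    cold = [r for r in rows if r["last_accessed"] < cutoff]   # archive candidate
    return hot, cold


rows = [
    {"id": 1, "last_accessed": date(2015, 1, 20)},
    {"id": 2, "last_accessed": date(2012, 6, 1)},
    {"id": 3, "last_accessed": date(2014, 11, 3)},
]
hot, cold = split_hot_cold(rows, today=date(2015, 2, 12))
print("keep in EDW:", [r["id"] for r in hot])
print("move to archive:", [r["id"] for r in cold])
```

In practice the threshold would come from the warehouse's own usage statistics rather than a fixed number of days.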
HDP helps you reduce costs and optimize the value associated with your EDW
[Diagram: HDP 2.2 sits alongside the enterprise data warehouse. Sources (existing systems: ERP, CRM, SCM; clickstream, web & social, geolocation, sensor & machine, server logs, unstructured) land in HDP, which performs ELT and holds cold data, deeper archive and new sources. Hot data stays in the data systems (enterprise data warehouse, MPP, in-memory), which feed analytics (data marts, business analytics, visualization & dashboards).]
Hadoop Driver: Advanced analytic applications
• Single View: Improve acquisition and retention
• Predictive Analytics: Identify your next best action
• Data Discovery: Uncover new findings
Financial Services
• New Account Risk Screens; Trading Risk; Insurance Underwriting
• Improved Customer Service; Insurance Underwriting; Aggregate Banking Data as a Service
• Cross-sell & Upsell of Financial Products; Risk Analysis for Usage-Based Car Insurance; Identify Claims Errors for Reimbursement
Telecom
• Unified Household View of the Customer; Searchable Data for NPTB Recommendations; Protect Customer Data from Employee Misuse
• Analyze Call Center Contacts Records; Network Infrastructure Capacity Planning; Call Detail Records (CDR) Analysis
• Inferred Demographics for Improved Targeting; Proactive Maintenance on Transmission Equipment; Tiered Service for High-Value Customers
Retail
• 360° View of the Customer; Supply Chain Optimization; Website Optimization for Path to Purchase
• Localized, Personalized Promotions; A/B Testing for Online Advertisements; Data-Driven Pricing, improved loyalty programs
• Customer Segmentation; Personalized, Real-time Offers; In-Store Shopper Behavior
Manufacturing
• Supply Chain and Logistics; Optimize Warehouse Inventory Levels; Product Insight from Electronic Usage Data
• Assembly Line Quality Assurance; Proactive Equipment Maintenance; Crowdsource Quality Assurance
• Single View of a Product Throughout Lifecycle; Connected Car Data for Ongoing Innovation; Improve Manufacturing Yields
Healthcare
• Electronic Medical Records; Monitor Patient Vitals in Real-Time; Use Genomic Data in Medical Trials
• Improving Lifelong Care for Epilepsy; Rapid Stroke Detection and Intervention; Monitor Medical Supply Chain to Reduce Waste
• Reduce Patient Re-Admittance Rates; Video Analysis for Surgical Decision Support; Healthcare Analytics as a Service
Oil & Gas
• Unify Exploration & Production Data; Monitor Rig Safety in Real-Time; Geographic exploration
• DCA to Slow Well Decline Curves; Proactive Maintenance for Oil Field Equipment; Define Operational Set Points for Wells
Government
• Single View of Entity; CBM & Autonomic Logistic Analysis; Sentiment Analysis on Program Effectiveness
• Prevent Fraud, Waste and Abuse; Proactive Maintenance for Public Infrastructure; Meet Deadlines for Government Reporting
Hadoop Driver: Enabling the data lake
Data Lake Definition
• Centralized Architecture: Multiple applications on a shared data set with consistent levels of service
• Any App, Any Data: Multiple applications accessing all data, affording new insights and opportunities
• Unlocks ‘Systems of Insight’: Advanced algorithms and applications used to derive new value and optimize existing value
Drivers:
1. Cost Optimization
2. Advanced Analytic Apps
Goal:
• Centralized Architecture
• Data-driven Business
[Figure: Journey to the Data Lake with Hadoop, plotted on SCALE and SCOPE axes and rising toward the DATA LAKE and Systems of Insight.]
Challenges to Hadoop Adoption
• Where do I start? Why is this of value to me and my organization?
• Hadoop is complex. What do I use for what?
• It is too complex. I don’t have any trained Hadoop resources.
Many have been down this path…
14
Connecting the Data-Driven Enterprise
15
Main Challenges in the Data Integration Market
BIG DATA: More data, less structure
PRODUCTIVITY: Can’t keep up with demand
COST: Expensive solutions
SKILLS: Hard to find talent
16
The Big Data Demand
4.4 MILLION JOBS IN BIG DATA BY 2015, but only one third of those jobs will be filled
Source: Gartner
17
The Hadoop Ecosystem is Complex
Source: “Hadoop Ecosystem Overview”, Forrester 2014
18
Talend Brings Unmatched Productivity
HAND-CODING
• Unproductive
• Need specialized skills
• Hard to maintain
• Limited support
TALEND ENTERPRISE
• 800+ components
• Generates optimized code
• Collaboration & management
• Gold support (SLAs)
19
Future-Proof Architecture With Native Code Gen
ETL: Day-to-day integration
ELT: DW Appliance
ESB: Messaging, Routing, Transformation
HADOOP: Highly Scalable
Spark
20
Select icons made by Freepik, Situ Herrera, www.flaticon.com
Talend Big Data
[Diagram: sources (legacy systems, ERP, Internet of Things, DBMS / EDW, NoSQL, web logs) flow through Talend Big Data to targets (standard reports, ad-hoc query tools, data mining, MDD/OLAP, analytical applications, NoSQL). Talend Big Data comprises the Studio (develop and test, operations team) and the functions ingestion, map, profile, parse, match, cleanse, standardize, change data capture, machine learning, share and schedule, with native access.]
Benefits
• Future Proof Architecture
• Lowest TCO
• Increased Productivity
21
Easiest and Most Powerful Integration Solution for Big Data
Talend Big Data
22
Main Challenges in the Data Market
SCALABLE AGILE
LOWEST TCO EASY
23
1,800 Leading Brands Use Talend
FINANCE & INSURANCE
SERVICES
MANUFACTURING & RETAIL
PUBLIC SECTOR & EDUCATION
24
Live Demo
25
Key Takeaways
• See how Talend’s Big Data Platform addresses the Skills Gap
• See how Talend will increase your Big Data Productivity
• Agree Talend and Hortonworks have the technology and skills to satisfy your business requirements
BIG DATA More data, less structure
PRODUCTIVITY Can’t keep up with demand
SKILLS Hard to find talent
26
Demonstration Use Case
The objective of the use case was to identify data quality issues before loading data to the EDW, without increasing the actual load window.
• Load 500 TB Compressed Files to HDFS - 3rd Party Sales/Prescribing files delivered by Vendor
• Compute Monthly Totals - Prior to loading to EDW, compare prior month’s totals to current month’s totals within new data files
• Display Comparison Results in Analytical Tool - Display total Sales comparison for each Product to quickly show Data Quality issues before loading to EDW Staging
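The month-over-month check described in this use case can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the Talend job shown in the demo: the record layout `(product, month, amount)`, the 50% tolerance, and the function names are all hypothetical.

```python
from collections import defaultdict


def monthly_totals(records):
    """Sum sales per (product, month). Each record is (product, 'YYYY-MM', amount)."""
    totals = defaultdict(float)
    for product, month, amount in records:
        totals[(product, month)] += amount
    return totals


def flag_quality_issues(records, prior_month, current_month, tolerance=0.5):
    """Return {product: (prior_total, current_total)} for suspicious products.

    A product is flagged when it is missing from either month, when its prior
    total is zero, or when the relative change exceeds `tolerance`.
    The tolerance value is an assumption chosen for illustration.
    """
    totals = monthly_totals(records)
    products = sorted({p for (p, m) in totals if m in (prior_month, current_month)})
    flagged = {}
    for product in products:
        prior = totals.get((product, prior_month))
        current = totals.get((product, current_month))
        if prior is None or current is None or prior == 0:
            flagged[product] = (prior, current)
        elif abs(current - prior) / abs(prior) > tolerance:
            flagged[product] = (prior, current)
    return flagged


records = [
    ("A", "2015-01", 100.0), ("A", "2015-01", 50.0), ("A", "2015-02", 140.0),
    ("B", "2015-01", 200.0), ("B", "2015-02", 20.0),
    ("C", "2015-02", 75.0),
]
issues = flag_quality_issues(records, "2015-01", "2015-02")
for product, (prior, current) in issues.items():
    print(product, "prior:", prior, "current:", current)
```

Running the comparison before the warehouse load means a suspicious file (here, product B dropping sharply and product C having no prior history) is caught without lengthening the load window itself.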
27
Typical 3rd Party Data Load
Data Preparation Warehouse Processing Final Reports / Quality Check
Big Data quality issues result in lost time, resources & revenue
28
Data Warehouse Optimization
Data Preparation Warehouse Processing Final Reports / Quality Check
Hadoop Cluster
✓ Upfront Quality Checks
✓ Identify Master records earlier
✓ Load Uncompressed data directly to DWH staging
Optimized Loading
29
Live Demo
30
What stood out most?
Recap of the Demonstration
• Hortonworks and Talend can help you reduce costs
• Offload costly ETL process
• Enrich the value of your EDW
• Graphical drag and drop visual environment showcasing Talend and Hortonworks
31
Hortonworks/Talend Sandbox
• Graphical drag and drop visual environment showcasing Hortonworks - Visually see the results of integration process
• Accelerates data loading and transformation with Hadoop - Build and deploy MapReduce and Pig jobs on YARN
• Pre-built use cases: data warehouse optimization, clickstream data, Twitter sentiment, Apache weblogs
• Demonstrations of several NoSQL databases
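To give a feel for the Apache weblog use case, the map and reduce steps of such a job can be imitated in plain Python. This is only a sketch of the MapReduce pattern running in a single process: it is not code generated by Talend and does not run on YARN, and the regular expression, function names and sample lines are assumptions for illustration.

```python
import re
from collections import defaultdict

# Assumed pattern: the quoted request followed by the 3-digit status code,
# as found in Apache common-log-format lines.
LOG_PATTERN = re.compile(r'"[A-Z]+ \S+ [^"]*" (\d{3}) ')


def map_phase(line):
    """Emit (status_code, 1) for each parsable log line; skip anything else."""
    match = LOG_PATTERN.search(line)
    if match:
        yield (match.group(1), 1)


def reduce_phase(key, values):
    """Sum the counts collected for one status code."""
    return key, sum(values)


def run_job(lines):
    """Group mapper output by key (the shuffle step), then reduce each group."""
    grouped = defaultdict(list)
    for line in lines:
        for key, value in map_phase(line):
            grouped[key].append(value)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


sample_lines = [
    '10.0.0.1 - - [12/Feb/2015:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '10.0.0.2 - - [12/Feb/2015:10:00:05 +0000] "GET /missing HTTP/1.1" 404 512',
    '10.0.0.1 - - [12/Feb/2015:10:00:09 +0000] "POST /login HTTP/1.1" 200 128',
    'not a log line',
]
print(run_job(sample_lines))
```

On a real cluster the map and reduce functions would run in parallel across many nodes, with the framework handling the grouping step between them.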
32
From Zero to Big Data in 10 Minutes
Download free: www.talend.com/hortonworks-sandbox
• Get up and running in minutes, not weeks, with a big data Sandbox and demos
• Includes: Sentiment analysis, ETL Offload, Log file analysis
• Start working with Talend & Hortonworks today!
33
Back up slides
34
Reference Architecture
• TAP (Ingestion): SQOOP, FLUME, HDFS API, HBase API, HIVE, 800+
• TRANSFORM (Data Refinement): PROFILE, PARSE, MAP, CDC, CLEANSE, STANDARDIZE, MACHINE LEARNING, MATCH
• DELIVER (as an API): ActiveMQ, Karaf, Camel, CXF, Kafka, Storm
• Meta, Security, MDM, iPaaS, Govern, HA
• BATCH (MapReduce), INTERACTIVE (Tez), STREAMING (Storm, Spark), GRAPH (Giraph), NoSQL (MongoDB), Events (Falcon), ONLINE (HBase), OTHER (Search)
• YARN (Cluster Resource Management)
• HDFS2 (Redundant, Reliable Storage)