Syncsort, Tableau, & Cloudera Present: Break the Barriers to Big Data Insight
TRANSCRIPT
©2014 Cloudera, Inc. All rights reserved.
©2014 Cloudera, Syncsort, Tableau Inc. All rights reserved.
Agenda
• Data Warehouse Vision & Reality
• What is legacy data & why an Enterprise Data Hub
• Offloading legacy data and workloads to Hadoop
• Transform all types of data into self-service analytics
• Live Demonstration
• Customer case study
• Q&A
The Data Warehouse Vision - 1998
Data Integration & ETL Tools would enable a Single, Consistent Version of the Truth
[Diagram: Real-Time, Mainframe, Oracle, ERP, File and XML sources; ETL; Data Warehouse; Data Marts]
Data Warehouse Reality 2014
Data Integration & ETL Tools would enable a Single, Consistent Version of the Truth
[Diagram: Real-Time, Mainframe, Oracle, ERP, File and XML sources; ETL; Data Marts; Dormant Data; Staging / ELT. Callouts: New Reports, SLAs, New Column, Complete History]
The Data Warehouse Vision vs Reality
• Vision: Fresher data / Reality: Longer ELT batch windows
• Vision: Longer history data / Reality: Shorter data retention
• Vision: Faster analytics / Reality: Slower queries
• Vision: More data sources / Reality: Weeks/months just to add new data fields
• Vision: Lower costs / Reality: Growing costs
Mainframes | A Critical Source of Big Data
• Top 25 World Banks
• 9 of World’s Top Insurers
• 23 of Top 25 US Retailers
• 71% Fortune 500
• 30 Billion Bus. Transactions / day
Suits & Hoodies – Working Together
Integration Gaps
• Connectivity
• Data conversion (EBCDIC vs ASCII)
Expertise Gaps
• COBOL appeared in 1959, Hadoop in 2005
• Mainframe & Hadoop skills shortage
Security Gaps
• Hosts mission critical sensitive data
• Very difficult to install new software on MF
Costs Gaps
• Mainframe data is (expensive) Big Data
• Even FTP costs CPU cycles (MIPS)
Suits & Hoodies idea: Merv Adrian, Gartner Research.
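The data conversion gap on this slide (EBCDIC vs ASCII) can be illustrated with a minimal Python sketch. It is not the tooling presented in the deck. It assumes the mainframe text uses code page cp037, one common EBCDIC variant; real data may use a different code page, and the function name ebcdic_to_ascii is hypothetical.

```python
# Assumption: cp037 (EBCDIC US/Canada). Real mainframe data may use another code page.
EBCDIC_CODEPAGE = "cp037"


def ebcdic_to_ascii(raw: bytes, codepage: str = EBCDIC_CODEPAGE) -> bytes:
    """Decode EBCDIC text bytes, then re-encode them as ASCII.

    Raises UnicodeEncodeError if a character has no ASCII equivalent.
    """
    return raw.decode(codepage).encode("ascii")


# The same letters have different byte values in the two encodings:
# "A" is 0xC1 in cp037 and 0x41 in ASCII.
sample = "HELLO 2014".encode(EBCDIC_CODEPAGE)
converted = ebcdic_to_ascii(sample)
print(sample.hex(), "->", converted.decode("ascii"))
```

This covers character fields only. Binary and packed numeric fields cannot be converted this way and have to be interpreted according to the record layout.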
Expanding Data Requires A New Approach
1980s: Bring Data to Compute
Process-centric businesses use:
• Structured data mainly
• Internal data only
• “Important” data only
Now: Bring Compute to Data
Information-centric businesses use all data: multi-structured, internal & external data of all types
[Diagram: relative size & complexity of Data and Compute]
From Apache Hadoop to an enterprise data hub
✔ Open Source, Scalable, Flexible, Cost-Effective
✖ Managed
✖ Open Architecture
✖ Secure and Governed
[Diagram: BATCH PROCESSING (MAPREDUCE); STORAGE FOR ANY TYPE OF DATA: UNIFIED, ELASTIC, RESILIENT, SECURE; FILESYSTEM (HDFS)]
Core Apache Hadoop is great, but…
1) Hard to use and manage.
2) Only supports batch processing.
3) Not comprehensively secure.
From Apache Hadoop to an enterprise data hub
✔ Open Source, Scalable, Flexible, Cost-Effective
✔ Managed
✖ Open Architecture
✖ Secure and Governed
[Diagram: BATCH PROCESSING (MAPREDUCE); STORAGE FOR ANY TYPE OF DATA: UNIFIED, ELASTIC, RESILIENT, SECURE; FILESYSTEM (HDFS); SYSTEM MANAGEMENT (CLOUDERA MANAGER)]
From Apache Hadoop to an enterprise data hub
✔ Open Source, Scalable, Flexible, Cost-Effective
✔ Managed
✔ Open Architecture
✖ Secure and Governed
[Diagram: BATCH PROCESSING (MAPREDUCE); ANALYTIC SQL (IMPALA); SEARCH ENGINE (SOLR); MACHINE LEARNING (SPARK); STREAM PROCESSING (SPARK STREAMING); 3RD PARTY APPS; WORKLOAD MANAGEMENT (YARN); STORAGE FOR ANY TYPE OF DATA: UNIFIED, ELASTIC, RESILIENT, SECURE; FILESYSTEM (HDFS); ONLINE NOSQL (HBASE); SYSTEM MANAGEMENT (CLOUDERA MANAGER)]
From Apache Hadoop to an enterprise data hub
✔ Open Source, Scalable, Flexible, Cost-Effective
✔ Managed
✔ Open Architecture
✔ Secure and Governed
[Diagram: BATCH PROCESSING (MAPREDUCE); ANALYTIC SQL (IMPALA); SEARCH ENGINE (SOLR); MACHINE LEARNING (SPARK); STREAM PROCESSING (SPARK STREAMING); 3RD PARTY APPS; WORKLOAD MANAGEMENT (YARN); STORAGE FOR ANY TYPE OF DATA: UNIFIED, ELASTIC, RESILIENT, SECURE; FILESYSTEM (HDFS); ONLINE NOSQL (HBASE); DATA MANAGEMENT (CLOUDERA NAVIGATOR); SYSTEM MANAGEMENT (CLOUDERA MANAGER); SENTRY]
From Apache Hadoop to an enterprise data hub
CLOUDERA’S ENTERPRISE DATA HUB
✔ Open Source, Scalable, Flexible, Cost-Effective
✔ Managed
✔ Open Architecture
✔ Secure and Governed
[Diagram: BATCH PROCESSING (MAPREDUCE); ANALYTIC SQL (IMPALA); SEARCH ENGINE (SOLR); MACHINE LEARNING (SPARK); STREAM PROCESSING (SPARK STREAMING); 3RD PARTY APPS; WORKLOAD MANAGEMENT (YARN); STORAGE FOR ANY TYPE OF DATA: UNIFIED, ELASTIC, RESILIENT, SECURE; FILESYSTEM (HDFS); ONLINE NOSQL (HBASE); DATA MANAGEMENT (CLOUDERA NAVIGATOR); SYSTEM MANAGEMENT (CLOUDERA MANAGER); SENTRY]
Cloudera: Your Trusted Advisor for Big Data
Advance from Strategy to ROI with Best Practices and Peak Performance
• Partners
• Proactive & Predictive Support
• Professional Services
• Training
The Impact of ELT & Dormant Data on the EDW
• ELT drives up to 80% of database capacity
• Dormant (rarely used) data wastes premium storage
• ETL/ELT processes on dormant data waste premium CPU cycles
[Diagram: Hot, Warm and Cold Data; Transformations (ELT) of unused data]
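One way to picture the hot / warm / cold split is to classify each table by how recently it was last read. The sketch below is only an illustration: the slides do not define the three categories, so the thresholds, table names and dates are all hypothetical.

```python
from datetime import date

# Hypothetical thresholds; the slides do not say where hot, warm and cold begin.
WARM_AFTER_DAYS = 30
COLD_AFTER_DAYS = 365


def temperature(last_accessed: date, today: date) -> str:
    """Classify data as hot, warm or cold by days since it was last accessed."""
    age_days = (today - last_accessed).days
    if age_days >= COLD_AFTER_DAYS:
        return "cold"
    if age_days >= WARM_AFTER_DAYS:
        return "warm"
    return "hot"


# Made-up table names and last-access dates, for illustration only.
tables = {
    "orders_current": date(2014, 6, 1),
    "orders_2013": date(2014, 2, 1),
    "orders_2009": date(2011, 3, 15),
}
today = date(2014, 6, 15)
labels = {name: temperature(last, today) for name, last in tables.items()}
for name, label in labels.items():
    print(f"{name}: {label}")
```

Under these assumed thresholds, tables labelled cold would be the dormant data the slide refers to, and the first candidates to review for offload.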
Where to Start?
How to identify dormant data?
What workloads will deliver the biggest impact?
How will you access & move all your data?
Can you secure the new environment?
How do you optimize it?
How do you manage it?
How do you make it business-class?
What tools do you need?
How will you leverage all your data, including mainframes?
Offload Legacy Data & Workloads to The Enterprise Data Hub
One Framework. Blazing Performance, Iron-Clad Security, Disruptive Economics
Phase I: Identify
• Identify data & workloads most suitable for offload
• Focus on those that will deliver maximum savings & performance
Phase II: Offload
• Access and move virtually any data (e.g. mainframe) to Enterprise Data Hub with one tool
• Easily replicate existing staging workloads in Hadoop using a graphical user interface
Phase III: Optimize & Secure
• Deploy on premises and in Cloud
• Optimize the new environment
• Manage & secure all your data with business class tools
• Deliver self-service reporting
The Problem: Volume of Data
Businesses are struggling to unlock exploding data
The Problem: Diverse Data
Businesses and their people are struggling to unlock diverse data
The Problem: Old School Software
Traditional technologies are complicated, inflexible and slow moving
The Tableau Revolution
Fast and easy analytics for everyone
Flexible
Transform all types of data into self-service analytics
For Everyone
Ease of use leads to adoption across all departments and use cases
Case Study: Optimize EDW, Leading Financial Org
[Chart: Elapsed Time (min). HiveQL: 217 min; Syncsort DMX-h: 9 min]
Mainframe Offload (74-page COBOL copybook)
Development Effort
• Syncsort DMX-h: 4 hrs.
• Manual Coding: Weeks!
Benefits:
• Cut development time from weeks to hours
• Reduced complexity: 47 HiveQL scripts to 4 DMX-h graphical jobs
• Easily validate COBOL copybooks and find errors
• Mainframe Data available to business for analytics
• Staging & ELT moved out of RDBMS – Queries run faster
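For a sense of scale, the elapsed times quoted in this case study (217 min for HiveQL, 9 min for Syncsort DMX-h) can be turned into a speedup factor and a percentage reduction. The short calculation below uses only those two figures from the slide; the variable names are illustrative.

```python
# Elapsed times as quoted on the case-study slide.
hiveql_minutes = 217
dmxh_minutes = 9

# How many times faster the DMX-h run was than the HiveQL run.
speedup = hiveql_minutes / dmxh_minutes

# Share of the original elapsed time that was eliminated.
reduction_pct = (hiveql_minutes - dmxh_minutes) / hiveql_minutes * 100

print(f"Speedup: {speedup:.1f}x")
print(f"Elapsed time reduced by {reduction_pct:.1f}%")
```

The slide reports a single pair of timings, so these ratios describe that one workload rather than a general benchmark.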
Final Thoughts...
Rusty Sears, Vice President of Enterprise Data Services and Big Data at Regions Financial Corporation