© Copyright 2014 EMC Corporation. All rights reserved.
Breakout: Move to the Business Data Lake – Not as Hard as it Sounds
Michael Wood & Steve Jones
Agenda
• Introductions
• What is the Data Lake? (…and better yet, why?)
• Business Demands on Data
• Dealing with People and Technology Realistically
• No Rip and Replace / Evolve Towards Business Value
• Call to Action
What Do We Need to Change?
• Data Volume Exploding
• Importance of Analytics Accelerating
• Demand for Different Kinds of Data
Enterprise Data Systems
Limited by Schema! Limited by Cost!
Data that Doesn’t Fit is Discarded!
What if We Can Break Out?
BATTLE-TESTED MPP DATABASE
MPP QUERY ON HADOOP
IN-MEMORY DATA GRID
Store Everything! Analyze Anything!
Multiple Internal Views – Consistently Compromised
[Diagram: views labeled Corporate, Ad-hoc, LOB, Management, Operations, Market, Operations]
• LOB mart, Spreadsheets
• Line of business
• Transactional systems: CRM, ERP, PLM
• EDW, Corporate ODS
• Web
Dimensions shown on the diagram: Fit, Detail, Freshness, Fidelity
Why the Single View Fails
[Diagram: four divisions, each with its own functions and KPIs, beneath Corporate]
• Division 1: Sales, Finance, Supply chain, Marketing, R&D; Personal KPIs; Divisional KPIs
• Division 2: Sales, Finance, Supply chain, Marketing, R&D; Personal KPIs; Divisional KPIs
• Division 3: Sales, Finance, Supply chain, Marketing, R&D; Personal KPIs; Divisional KPIs
• Division 4: Sales, Finance, Supply chain, Marketing, R&D; Personal KPIs; Divisional KPIs
• Corporate: Corporate KPIs
Now agree on everything
And That Was When We Just Worked Internally…
• The volumes of data are exploding
• The ability to control and dictate in an ‘outside-in’ world is minimal
• More and more business value is beyond the core transactions
• The old approach of ‘a single view’ is impossible in a world of federated internal and external data
Core transactions
Remember…
Culture eats strategy for breakfast. – Peter Drucker
How Do Pivotal & Capgemini Deliver the Business Data Lake?
Govern where it matters
• Capgemini’s information governance approach
  - MDM & RDM data integrated
  - Information RADAR approach to identification
Encourage local requirements
• HAWQ – Traditional disk-based structured SQL
• Pivotal GemFire XD – Fast in-memory database
• Pivotal GemFire XD – Real-time analytics and integration
Distill on demand
• HAWQ
  - Structured SQL on Pivotal HD
• Pivotal Data Dispatch
  - Data movement and transformation
Store everything
• Pivotal HD
  - Low cost
  - Simplified deployment
Save 80% on Data Storage
Compress the time to value
Sell to the business and IT
Capgemini’s end to end value
What Does This Mean?
• Business driven: North America operations, Marketing campaign, EMEA data mart
• Distill: HAWQ
• Load everything, keep the history: HDFS
• Transactional systems (CRM, PLM, ERP), Sensor, Network, Web, Social Media, Market, Supplier
What Does This Mean?
• Business driven:
  - Customers, Orders, Inventory
  - Customers, Campaign, Contract
  - Customers, Orders, Invoices
• Distill: HAWQ
• Load everything, keep the history: HDFS
• Transactional systems (CRM, PLM, ERP), Sensor, Network, Web, Social Media, Market, Supplier
What Does This Mean?
• Business driven:
  - Customers, Orders, Inventory
  - Customers, Campaign, Contract
  - Customers, Orders, Invoices
• Distill: HAWQ
• Load everything, keep the history: HDFS
• Transactional systems (CRM, PLM, ERP), Sensor, Network, Web, Social Media, Market, Supplier
• Information governance: MDM and RDM
• The need to share: “We need a global view on customers”
  - Customers, Customers, Customers
• The global view: Customer, Revenue
Why the Business Data Lake Succeeds
[Diagram: four divisions, each with its own functions and KPIs, beneath Corporate]
• Division 1: Sales, Finance, Supply chain, Marketing, R&D; Personal KPIs; Divisional KPIs
• Division 2: Sales, Finance, Supply chain, Marketing, R&D; Personal KPIs; Divisional KPIs
• Division 3: Sales, Finance, Supply chain, Marketing, R&D; Personal KPIs; Divisional KPIs
• Division 4: Sales, Finance, Supply chain, Marketing, R&D; Personal KPIs; Divisional KPIs
• Corporate: Corporate KPIs
Now agree where it counts
Business Data Lake Architecture
[Architecture diagram; labels recovered from the slide]
• Sources: Real-time, Micro batch, Mega batch
• Ingestion Tier: Real-time ingestion, Micro batch ingestion, Batch ingestion
• Distillation Tier: HDFS storage (unstructured and structured data), In-memory, MPP database
• Processing Tier: Workflow management
• Insights Tier: Real-time insights, Interactive insights, Batch insights
• Query interfaces: SQL, NoSQL, MapReduce
• Action Tier
• Unified Operations Tier: System monitoring, System management
• Unified Data Management Tier: Data mgmt. services, MDM, RDM, Audit and policy mgmt.
How the Business Data Lake Works
• Structured data tier: Business mart, LOB ad-hoc analytics, LOB analytics hub
  - Models: Business mart model, LOB ad-hoc analytics model, LOB analytics model
  - LOB creates their model
• Distillation tier: one Map per source (eight shown)
  - LOB maps their model to the sources
• Data storage: each Source paired with its SDH (eight shown)
  - All data loaded ‘as is’ from sources with history automatically added
* SDH = Source Data History
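The flow on this slide (load 'as is', add history automatically, let the LOB map its own model onto the sources) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `DataStorage`, `HistoryRecord`, `distill`, and the sample CRM fields are hypothetical names invented for the sketch, not part of Pivotal HD, HAWQ, or any Capgemini tooling, and an in-memory list stands in for HDFS.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any


@dataclass(frozen=True)
class HistoryRecord:
    """One 'as is' copy of a source row plus its load time (an SDH entry)."""
    source: str
    loaded_at: datetime
    payload: dict[str, Any]


class DataStorage:
    """Hypothetical stand-in for the data storage tier: append-only, no schema."""

    def __init__(self) -> None:
        self._records: list[HistoryRecord] = []

    def load(self, source: str, payload: dict[str, Any]) -> HistoryRecord:
        # Store the row unchanged and stamp it; earlier versions are never overwritten.
        record = HistoryRecord(source, datetime.now(timezone.utc), dict(payload))
        self._records.append(record)
        return record

    def history(self, source: str) -> list[HistoryRecord]:
        return [r for r in self._records if r.source == source]


def distill(storage: DataStorage, source: str, mapping: dict[str, str]) -> list[dict[str, Any]]:
    """Project a source's history onto a LOB model.

    `mapping` is owned by the line of business: model field -> source field.
    """
    return [
        {model_field: record.payload.get(source_field)
         for model_field, source_field in mapping.items()}
        for record in storage.history(source)
    ]


storage = DataStorage()
storage.load("crm", {"cust_id": 1, "nm": "Acme", "region": "EMEA"})
storage.load("crm", {"cust_id": 1, "nm": "Acme Ltd", "region": "EMEA"})

# A marketing model that only needs two of the three source fields.
marketing_customers = distill(storage, "crm", {"customer_id": "cust_id", "name": "nm"})
```

The point of the sketch is the ordering: nothing is modeled at load time, and a second LOB could pass a different mapping over the same stored history without a reload.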
How the Corporate View Works
• Corporate view: Customer, Invoices, Orders
• Corporate standards: Master data and reference data
  - Customer x-ref
  - Customer MDM
• Information governance
• Local view:
  - BU1: Customer, Invoices, Orders
  - BU2: Customer, Invoices, Orders
  - BU3: Customer, Invoices, Orders
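One way to read this slide is that the corporate view is built like any local view, with the customer x-ref as the only governed piece. The Python sketch below illustrates that reading under stated assumptions: the business unit data, the local and master customer IDs, and the `corporate_view` function are all hypothetical and are not taken from any MDM product.

```python
from collections import defaultdict

# Hypothetical local views: each BU keeps its own customer identifiers.
bu_orders = {
    "BU1": [{"customer": "C-17", "amount": 100.0}],
    "BU2": [{"customer": "9001", "amount": 250.0}],
    "BU3": [{"customer": "ACME-UK", "amount": 40.0},
            {"customer": "LOCAL-ONLY", "amount": 5.0}],
}

# Customer x-ref maintained under information governance:
# (business unit, local customer id) -> master customer id.
customer_xref = {
    ("BU1", "C-17"): "M-001",
    ("BU2", "9001"): "M-001",
    ("BU3", "ACME-UK"): "M-002",
}


def corporate_view(local_orders, xref):
    """Total order value per master customer across all business units."""
    totals = defaultdict(float)
    for bu, orders in local_orders.items():
        for order in orders:
            master_id = xref.get((bu, order["customer"]))
            if master_id is None:
                # Customers without a cross-reference stay local and are
                # left out of the corporate view.
                continue
            totals[master_id] += order["amount"]
    return dict(totals)


corporate_totals = corporate_view(bu_orders, customer_xref)
```

The business units never change their own identifiers or models in this sketch; only the cross-reference has to be agreed on centrally.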
The New Philosophy
Business Data Lake
Store everything
Encourage local
Govern only the common
Treat global as a local view
It’s all about insight at the point of action
Call to Action
• Learn more about the Business Data Lake:
  - http://www.gopivotal.com/big-data/businessdatalake
• Learn about Capgemini’s capabilities:
  - http://www.capgemini.com/big-data-analytics/business-data-lake
• Partners can get involved at http://www.gopivotal.com/partners
• Visit the EMC booth to discover how the EMC Federation of Companies helps drive the Data Lake
• Follow us on Twitter!
  - Michael: @aBitCloudy
  - Steve: @mosesjones