Data Lake
BUILDING AGILE BIG DATA ANALYTICS PLATFORM
Rama Kattunga
Systems Director, Enterprise Analytics
About me
Experience
Worked @
Systems Director | Enterprise Analytics
Big Data, Strategies, Analytics
Playing with petabytes is a passion
Currently building a unified and unique data platform for healthcare
“Culture eats Strategy for …..”
Culture is today's major performance differentiator
Culture is the foundation for the strategy
What is Data Lake?
Data Lake
Ecosystem
“a place to store practically unlimited
amounts of data of any format, schema and
type that is relatively inexpensive and
massively scalable”
"If you think of a datamart as a store of bottled water –
cleansed and packaged and structured for easy
consumption – the data lake is a large body of water in a
more natural state. The contents of the data lake stream in
from a source to fill the lake, and various users of the lake
can come to examine, dive in, or take samples."
- James Dixon, Pentaho CTO
Water packaging
CRM, ERP, Finance -> ETL -> Business Area 1
CRM, ERP, Finance -> ETL -> Business Area 2
CRM, ERP, Finance -> ETL -> Business Area 3
Single Source of Truth
Today's Model: Traditional Extract Transform Load
Roles: Business, IT Supported, IT Pro, End Users
Sources: Existing Data, LOB Applications, Files, Data Marts
Stages: Extract, Data Quality, Transform & Load, Data Warehouse, Provision (Data Marts, Analysis Cubes)
Outputs: Analysis, Reports, Dashboards & Scorecards, Spreadsheets, Specialized Tools
Timeline: 6-9 Months
Change: $$$, 3-6 Months
Satisfaction: Low
High cost of rework
CRM, ERP, EMR
LOB, CORPORATE
Local Data, LOB Mart, EDW
Better approach: from ETL to EL, iterate, then T
Zones: Managed Self-Service, Production
Roles: End Users & Business, IT Support, IT Pro / IT Supported
Sources: Transactional Systems, LOB Applications, Files, Data Marts
Lake flow: Extract & Load, Data Lake, Rapid Experiment (POC, Pilot), Iterate, Transform, Keep / Kill
Production stages: Data Quality, Transform, Data Warehouse, Provision (Analysis Cubes)
Outputs: Analysis, Reports, Dashboards & Scorecards, Spreadsheets, Specialized Tools, Ad Hoc
Governance & Data Stewardship
Requirements
Common Platform
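The "EL, iterate then T" idea can be sketched in a few lines. This is only an illustration under stated assumptions, not the platform described in this deck: an in-memory SQLite database stands in for the lake, and the table, view, and function names (raw_charges, charges_by_source, extract_and_load, transform) are hypothetical. Source rows are loaded untouched, and the transform is a view that can be dropped and redefined on every iteration without reloading anything.

```python
import sqlite3

def extract_and_load(conn, rows):
    """EL: land source rows as-is (amounts stay text) in a raw table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_charges (source TEXT, amount TEXT)"
    )
    conn.executemany("INSERT INTO raw_charges VALUES (?, ?)", rows)

def transform(conn):
    """T: (re)define a view over the raw table; safe to rerun each iteration."""
    conn.execute("DROP VIEW IF EXISTS charges_by_source")
    conn.execute(
        """
        CREATE VIEW charges_by_source AS
        SELECT source, SUM(CAST(amount AS REAL)) AS total
        FROM raw_charges
        GROUP BY source
        """
    )

# Hypothetical sample rows; the raw table keeps them exactly as received.
conn = sqlite3.connect(":memory:")
extract_and_load(conn, [("EMR", "10.5"), ("ERP", "4"), ("EMR", "2.5")])
transform(conn)
totals = dict(
    conn.execute("SELECT source, total FROM charges_by_source").fetchall()
)
```

Because the raw rows are never rewritten, a transform that turns out to be wrong can be killed and replaced cheaply, which is the point of the Keep / Kill step.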
Where can we use Data Lakes?
Ingestion challenges with data sources such as EMR systems
No data left behind
Schema on read
Scaling
Reduced cost of data movement
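A minimal sketch of "schema on read", assuming raw records arrive as JSON lines: the records are stored untouched, and each consumer applies its own schema only when reading. The sample records, the vitals_schema mapping, and the read_with_schema helper are hypothetical and exist only for illustration.

```python
import json

# Raw records land in the lake as-is; fields and types are inconsistent.
RAW_LINES = [
    '{"patient_id": "p1", "heart_rate": "72", "unit": "ICU"}',
    '{"patient_id": "p2", "heart_rate": 88}',
    '{"patient_id": "p3", "note": "no vitals recorded"}',
]

def read_with_schema(lines, schema):
    """Project each raw record onto `schema` (field name -> caster) at read time."""
    rows = []
    for line in lines:
        record = json.loads(line)
        row = {}
        for field, cast in schema.items():
            value = record.get(field)
            # Missing fields become None instead of failing the whole read.
            row[field] = cast(value) if value is not None else None
        rows.append(row)
    return rows

# One consumer's view of the data; another consumer could use a different schema.
vitals_schema = {"patient_id": str, "heart_rate": int}
vitals = read_with_schema(RAW_LINES, vitals_schema)
```

No record is rejected at ingestion ("no data left behind"); fields that a given schema does not name, such as unit or note, stay in the raw store for later use.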
Challenges with Data Lakes?
Not a silver bullet
Not a replacement for Information Governance
Frustrating to business users if there is no schema or metadata
Size of the data
Security and data privacy
ONESource Reference Architecture
EMR & Revenue Cycle
ERP
Decision Support
External and Benchmark Data
Quality Data
Patient Experience
ACO
Images
Systems Management
Data Governance
Dashboards
Standard Reporting
Ad Hoc Reporting
Predictive Analysis
Advanced Visualization
Business Intelligence Platform
FINANCE APPS
Clinical APPS
ACO
HIE
Hadoop Distributed File System
SQL in Hadoop
Compute & Storage
Data Lake Platform
Security Management
Data Quality
Project DUMP iT
Source Systems
Clinical
Operational
Financial
Supply Chain
HR
Accounting
Sales and Mktg.
IT Analytics
Population
PHP
PMG
DATA NORMALIZATION
Data Quality Management
Reference Data Management
Data Quality Rules Engine
Business Rules Engine
Data Policy Management
Business and Data Definitions
Business and Data Traceability
Hierarchy Management
Enterprise Data Model
Data Enrichment
Patient Portal
Physician Portal
FIREWALL
Information Hub
CDR
Devices
da Vinci Robots
Portals
Thank You !
Rama.Kattunga@gmail