Webinar - Data Warehouse Augmentation: Cut Costs, Increase Power
TRANSCRIPT
Data Warehouse Augmentation: Cut Costs, Increase Power
October 26, 2016
• Award-winning provider of enterprise data lake management solutions:
  - Integrated data lake management platform
  - Self-service catalog and data preparation
• Data Lake Design and Implementation Services: POC, Pilot, Production, Operations, Training
• Data Science Professional Services
3 Zaloni Proprietary
About our speakers
Pradeep Varadan, Verizon Wireline, OSS Data Science Leader
Varadan is a data scientist and enterprise architect who specializes in data challenges within telecommunications. He is tasked with providing a competitive edge focused on utilizing data analytics to drive effective decision-making. He is skilled in creating systems that can be used to understand and make better decisions involving rapid technology shifts, customer lifestyle and behavior trends and relevant changes that impact the Verizon Network.
Scott Gidley, Zaloni, VP Product Management
Gidley is responsible for the strategy and roadmap of existing and future products within the Zaloni portfolio. He is a nearly 20-year veteran of the data management software and services market. Prior to joining Zaloni, he served as senior director of product management at SAS and was previously CTO and cofounder of DataFlux Corporation.
Zaloni Confidential and Proprietary - Provided under NDA
Current state of a corporate data flow architecture
[Diagram labels: Data Generators, Machines, Data Channels, Warehouses, Marts, Repositories, Data stores, BI/Reporting]
Key Challenges
Business Challenges:
• Increased processing time/reduced response
• Lack of data lineage/lack of visibility
• Constant CapEx for hardware upgrade
• Lack of access to history
IT Challenges:
• Multiple data transfers
• Multiple technology platforms with data copies
• Constant performance tuning for CPU
• Manual data offload for space management
Resource consumption
[Diagram labels: Sources, ETL, Staging, ETL, Warehouse, ELT/Reporting/Mining, Report Mart; Data Discovery, Analytics, BI]
Typical utilization of RDBMS resources
We expend almost all CPU for low business value ETL
[Chart: workloads plotted by Business Value against CPU]
• ETL to Stage
• Auditing (Landing tables query)
• Data Mining (Staging query)
• Ad-hoc Analysis (Warehouse query)
• ETL to Warehouse
• ETL to Reporting
• Reporting (Presentation table query)
*Size indicates frequency of use
~80% of system capacity used for batch processing (ELT)
Reduce cost of ELT/ETL by offloading to Hadoop
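As a rough illustration of the offloading argument, the arithmetic below estimates how much warehouse capacity is freed when part of the ELT workload moves to Hadoop. The ~80% ELT share comes from the slides; the 75% offload fraction and the function name `freed_capacity` are assumptions chosen only for this example.

```python
def freed_capacity(elt_share: float, offload_fraction: float) -> float:
    """Fraction of total warehouse capacity freed by offloading ELT work."""
    if not (0.0 <= elt_share <= 1.0 and 0.0 <= offload_fraction <= 1.0):
        raise ValueError("shares must be between 0 and 1")
    return elt_share * offload_fraction


elt_share = 0.80         # from the slides: ~80% of capacity goes to batch ELT
offload_fraction = 0.75  # assumption: three quarters of ELT moves to Hadoop

freed = freed_capacity(elt_share, offload_fraction)
remaining_elt = elt_share - freed

print(f"Capacity freed on the warehouse: {freed:.0%}")
print(f"ELT still running on the warehouse: {remaining_elt:.0%}")
```

Under these assumed numbers, the capacity available for reporting and ad-hoc analysis grows from roughly 20% to roughly 80% of the system without a hardware upgrade.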
The future of enterprise data flow
Legacy: Transactional Systems, Structured Data, ETL, EDW + Sandbox, Data Marts, BI/Reporting
Modern: T-Systems, Machines, ETL, Sandbox, EDW, Data Marts, BI/Reporting/Analytics
Future: Transactional Systems, Machine logs/IoT, Structured/Unstructured, Data Lake, ETL, Sandbox, EDW, Data Marts, Operational Dashboards/EDA/Mining/Reporting/Analytics
Data lakes are central to the modern data architecture
• Increased Agility
• New Insights
• Improved Scalability
Data lake challenges
Managing:
• Ingestion
• Visibility and Quality
• Privacy and Compliance
Delivering:
• Timeliness
• Reliance on IT
• Reusability
Building:
• Rate of Change
• Skills Gap
• Complexity
Data Lake 360°: A holistic approach to actionable big data
1. Enable the lake
• Leverage the full power of a scale-out architecture with an actionable, scalable data lake
2. Govern the data
• Improve data visibility, reliability and quality to reduce time-to-insight
• Safeguard sensitive data and enable regulatory compliance
3. Engage the business
• Foster a data-driven business through self-service data discovery and preparation
Enable the lake
• Managed Ingestion
  - Ability to ingest vast amounts of data
  - Ability to handle a wide variety of formats (streaming, files, custom) and sources
  - Build in repeatability through automation to pick up incoming data and apply pre-defined processing
• Metadata Management
  - Capture and manage operational, technical and business metadata
  - Provides visibility and reliability, key to finding data in the lake
  - Reduced time to insight for analytics
  - File and record level watermarking provides data lineage, enables audit and traceability
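The idea of capturing metadata at ingestion time and watermarking at file and record level can be sketched as follows. This is a minimal illustration, not the Zaloni platform's implementation: the `FileMetadata` fields, the `ingest` function, the ID format, and the sample file name are all hypothetical.

```python
import hashlib
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List, Tuple


@dataclass
class FileMetadata:
    file_id: str       # file-level watermark (hypothetical format)
    source: str        # business/operational metadata: where the data came from
    ingested_at: str   # operational metadata: ingestion timestamp (UTC)
    record_count: int  # technical metadata
    checksum: str      # technical metadata: SHA-256 of the payload


def ingest(source: str, lines: List[str]) -> Tuple[FileMetadata, List[Dict[str, str]]]:
    """Ingest raw lines, capture metadata, and stamp each record with a watermark."""
    file_id = uuid.uuid4().hex
    payload = "\n".join(lines).encode("utf-8")
    meta = FileMetadata(
        file_id=file_id,
        source=source,
        ingested_at=datetime.now(timezone.utc).isoformat(),
        record_count=len(lines),
        checksum=hashlib.sha256(payload).hexdigest(),
    )
    # Record-level watermark: every record carries the ID of the file it came from,
    # so any downstream row can be traced back to its original ingest.
    records = [
        {"record_id": f"{file_id}:{i}", "file_id": file_id, "value": line}
        for i, line in enumerate(lines)
    ]
    return meta, records


meta, records = ingest("crm_export.csv", ["alice,42", "bob,17"])
print(meta.record_count, records[0]["record_id"])
```

Because each record keeps its `file_id`, an audit question such as "which source file produced this row?" becomes a lookup rather than an investigation.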
Govern the data
• Data Lineage
  - See how data moves and how it is consumed in the data lake.
  - Safeguard data and reduce risk, always knowing where data has come from, where it is, and how it is being used.
• Data Quality
  - Rules-based data validation
  - Integration with the Managed Data Pipeline
  - Stats and metrics for reporting and actions
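Rules-based validation with stats for reporting might look like the sketch below. The rule names, the record fields (`customer_id`, `age`), and the `validate` function are assumptions made for illustration and are not taken from any specific product.

```python
from typing import Callable, Dict, List, Tuple

Rule = Callable[[dict], bool]

# Hypothetical rules: each returns True when the record is acceptable.
RULES: Dict[str, Rule] = {
    "customer_id_present": lambda r: bool(r.get("customer_id")),
    "age_in_range": lambda r: isinstance(r.get("age"), int) and 0 <= r["age"] <= 120,
}


def validate(records: List[dict], rules: Dict[str, Rule]) -> Tuple[List[dict], List[dict], dict]:
    """Split records into passed/rejected and collect per-rule failure counts."""
    passed, rejected = [], []
    failures = {name: 0 for name in rules}
    for record in records:
        failed = [name for name, rule in rules.items() if not rule(record)]
        for name in failed:
            failures[name] += 1
        (rejected if failed else passed).append(record)
    stats = {
        "total": len(records),
        "passed": len(passed),
        "rejected": len(rejected),
        "failures_by_rule": failures,
    }
    return passed, rejected, stats


sample = [
    {"customer_id": "C1", "age": 34},
    {"customer_id": "", "age": 51},     # fails customer_id_present
    {"customer_id": "C3", "age": 250},  # fails age_in_range
]
passed, rejected, stats = validate(sample, RULES)
print(stats)
```

In a pipeline, the `rejected` list would typically be routed to a quarantine area while `stats` feeds the reporting and alerting the slide mentions.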
Govern the data
• Data Security and Privacy
  - Differing permissions require enhanced data security
  - Mask or tokenize data before it is published in the lake for consumption
  - Policy-based security
• Data lifecycle management across tiered storage environments
  - Hot -> Warm -> Cold on an entity level based on policies/SLAs
  - Across on-premise and cloud environments
  - Provide data management features to automate scheduling and orchestration of data movement between heterogeneous storage environments
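A policy-driven Hot -> Warm -> Cold assignment at the entity level can be sketched as below. The thresholds (30 and 365 days), the use of last-access age as the criterion, and the entity names are assumptions for illustration; a real policy could equally be driven by SLAs or data volume.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class TierPolicy:
    warm_after_days: int  # move to warm storage after this many idle days
    cold_after_days: int  # move to cold storage after this many idle days


def assign_tier(last_accessed: date, today: date, policy: TierPolicy) -> str:
    """Return the storage tier for an entity based on days since last access."""
    age_days = (today - last_accessed).days
    if age_days >= policy.cold_after_days:
        return "cold"
    if age_days >= policy.warm_after_days:
        return "warm"
    return "hot"


policy = TierPolicy(warm_after_days=30, cold_after_days=365)  # assumed thresholds
today = date(2016, 10, 26)

# Hypothetical entities with their last-access dates.
entities = {
    "call_records": date(2016, 10, 20),
    "billing_history": date(2016, 6, 1),
    "legacy_tickets": date(2014, 1, 15),
}

tiers = {name: assign_tier(seen, today, policy) for name, seen in entities.items()}
print(tiers)
```

A scheduler would run such a check periodically and trigger the actual data movement between storage environments whenever an entity's tier changes.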
Engage the business
• Data Catalog
  - See what data is available across your enterprise
  - Contribute valuable business information to improve search and usage
  - Use a shopping cart experience to create a sandbox for ad-hoc and exploratory analytics
• Self-service Data Preparation
  - Blend data in the lake without a costly IT project
  - Perform interactive data-driven transformations
  - Collaborate and share data assets and transformations with peers
Data lake reference architecture
Sources:
• Sensors (or other time series data)
• Relational Data Stores (OLTP/ODS/DW)
• Logs (or other unstructured data)
• Social and shared data
Data Lake zones:
Transient Landing Zone
• Temporary store of source data
• Consumers are IT, Data Stewards
• Implemented in highly regulated industries
Raw Zone
• Original source data ready for consumption
• Consumers are ETL developers, data stewards, some data scientists
• Single source of truth with history
Trusted Zone
• Standardized on corporate governance/quality policies
• Consumers are anyone with appropriate role-based access
• Single version of truth
Refined Zone
• Data required for LOB specific views - transformed from existing certified data
• Consumers are anyone with appropriate role-based access
Sandbox
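The zone descriptions on this slide amount to a mapping from zones to the roles allowed to consume them, which can be sketched as a small access check. The Transient Landing and Raw zone consumers follow the slide; the specific roles listed for the Trusted and Refined zones are hypothetical, since the slide only says "anyone with appropriate role-based access", and the role and function names are assumptions.

```python
from typing import Dict, List, Set

# Zone -> roles allowed to read from it.
ZONE_ROLES: Dict[str, Set[str]] = {
    "transient_landing": {"it", "data_steward"},                # from the slide
    "raw": {"etl_developer", "data_steward", "data_scientist"}, # from the slide
    "trusted": {"business_analyst", "data_scientist", "data_steward"},  # assumed grants
    "refined": {"business_analyst", "data_scientist"},                  # assumed grants
}


def can_read(role: str, zone: str) -> bool:
    """Return True if the role has been granted read access to the zone."""
    return role in ZONE_ROLES.get(zone, set())


def readable_zones(role: str) -> List[str]:
    """List the zones a role may read, in alphabetical order."""
    return sorted(zone for zone in ZONE_ROLES if can_read(role, zone))


print(readable_zones("data_scientist"))
print(can_read("business_analyst", "raw"))
```

Keeping the grants in one table mirrors the "centralized data, decentralized access" idea: the data lives in one lake, while each zone exposes it only to the roles that should see it.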
Data lake reference architecture with Zaloni
Source System:
• File Data
• DB Data
• ETL Extracts
• Streaming
Sources:
• Sensors (or other time series data)
• Relational Data Stores (OLTP/ODS/DW)
• Logs (or other unstructured data)
• Social and shared data
Data Lake zones: Transient Landing Zone, Raw Zone, Refined Zone, Trusted Zone, Sandbox
DATA LAKE MANAGEMENT & GOVERNANCE PLATFORM: Metadata Management, Data Quality, Data Catalog, Security, APIs
Consumption Zone: Business Analysts, Researchers, Data Scientists, EDW, Data Marts
Four great reasons to augment with a data lake
• Save millions in storage costs
• Significantly speed up processing
• Maximize the data warehouse for BI
• Extract more value from all of your data
Centralized data, decentralized access
Business Users: Business Analyst, Business Manager, Data Scientist, Business SME, Operations Manager
Questions asked: What happened? What is happening? What will happen? What can we control? Can I see the data?
Data Lake
IT Team: IT Analyst, Programmer, DBA/Modeler, Data Scientist, Data Engineer
Activities: Code Analysis, App Implementation, App Prototype, Data Model, Code Development
Questions?
DATA LAKE MANAGEMENT AND GOVERNANCE PLATFORM
SELF-SERVICE DATA PREPARATION