
Uploaded by dataconomy-media on 12-Apr-2017. Category: Data & Analytics

TRANSCRIPT

Page 1: Sören Eickhoff, Informatica GmbH, "Informatica Intelligent Data Lake – Self Service for Data Analysts"

Informatica Intelligent Data Lake Self Service for Data Analysts

February 2017

Sören Eickhoff
Sales Consultant Central Europe
[email protected]

Page 2

Data Security

Cloud Data Management

Big Data Management

Data Integration

Master Data Management

Data Quality

#1 in 6 Data Categories …

Page 3

Data Platform

Data Lake

Use Case: Data Lake / Data Platform Reference Architecture

Landing Zone: Structured and unstructured enterprise and external data is landed in its raw form, normalized, and made ready for use.

Data Analyst, Data Scientist, Business, Data Steward, Data Modeler, Data Engineer

Discovery Zone: User sandbox for self-service access to data for exploration, data blending, hypothesis testing, analytics, and collaboration.

Production Zone: Sanitized transactional, master, and reference data, plus enriched data models certified for enterprise use.

Machine Device, Cloud

Documents and Emails

Relational, Mainframe

Social Media, Web

Logs

Improve Predictive Maintenance

Increase Operational Efficiency

Increase Customer Loyalty

Reduce Security Risk

Improve Fraud Detection

Page 4

Challenges Faced by the Business and IT Today

Data Analysts

• Can’t easily find trusted data

• Limited access to the data

• Frustrated by slow response from IT due to a long backlog

• Constrained by disparate desktop tools and manual steps

• No way to collaborate, share, and update curated datasets

IT

• Can’t cope with growing demand from the business

• No visibility into what the business is doing with the data

• Struggling to deliver value to the business

• Losing the ability to govern and manage data as an asset

Page 5

Informatica Data Lake Management

Data Lake Management

Enterprise Information Catalog

Intelligent Data Lake

Secure@Source

TITAN

Blaze

Big Data Management

Intelligent Streaming

Live Data Map (metadata integration)

Big Data Management (data integration)

Data Architect / Steward

Data Scientist / Analyst

InfoSec Analyst

Data Engineer

Page 6

Unified view into enterprise information assets

• Business-user oriented solution

• Semantic search with dynamic facets

• Detailed Lineage and Impact Analysis

• Business Glossary Integration

• Relationship discovery

• High-level data profiling

• Automatic classification with data domains

• Business classification with custom attributes

• Broad metadata source connectivity

• Big data scale

Enterprise Information Catalog

Page 7

Self-service data preparation with collaborative data governance

• Collaborative project workspaces

• Automated data ingestion

• Search data asset catalog

• Rapid blend of datasets

• Crowd-sourced data asset tagging & data sharing

• Automated data asset discovery & recommendations

• Rapid ‘industrialization’ of preparation steps into re-usable workflows

• Complete tracking of usage, lineage, and security

• Easily support Data Discovery Platforms

Intelligent Data Lake

Page 8

Enterprise-wide visibility into sensitive data risks

• Sensitive data classification & discovery

• Sensitive data proliferation analysis

• Who has access to sensitive data

• User activity on sensitive data

• Sensitive Data policy-based alerting

• Multi-factor risk scoring

• Identification of highest risk areas

• Integrates data security information from 3rd parties:

- Data stores, owner, classification

- Protection status

- User access info (LDAP, IAM) and activity logs (DB, Hadoop, Salesforce, DAM)

Secure@Source

Page 9

Big Data Management

Easily integrate more data faster from more data sources

Diagram: Informatica Big Data Management with Smart Executor. ETL/DI Servers run the Informatica Data Transformation Engine on dedicated DI servers. Capabilities: Data Connectivity, Data Integration, Data Masking, Data Quality, Data Governance. Hadoop Cluster (YARN, HDFS) engines: MapReduce, Hive on MapReduce, Tez, Spark Core; cluster-aware: Hive on Tez, Spark, Blaze.

• Visual development interface accelerates developer productivity

• Near universal data connectivity

• Complex data parsing on Hadoop

• Data profiling on Hadoop

• High-speed data ingestion and extraction

• Process and deliver data at scale on Hadoop

• Dynamic schemas and mapping templates

• Data Quality and Data Governance on Hadoop

Page 10

Take Big Data Management to the Next Level

Improving developer productivity: Dynamic Mappings, Re-use of PowerCenter & SQL Logic

Automatically profit from new technologies and choose the best option (Smart Optimizer)

MapReduce, Spark, Blaze

Generic source, generic target, rule-based logic

Page 11

Informatica Intelligent Streaming

• Brings streaming analytics capability into the Intelligent Data Platform

• Unified UI with multiple engines under the covers

• Frictionless integration: conversion/extension of batch mappings into a streaming context

• Abstracted from the runtime framework

Collect, ingest, and process data in real time and streaming

Diagram: Real-time source, Window transformation, Real-time target; Spark Streaming code generated
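The window transformation named on this slide groups an unbounded stream into finite time windows before aggregating. As a minimal sketch of that general technique, and not the Spark Streaming code the product generates, the Python below counts events per key in tumbling (fixed-size, non-overlapping) windows; the `(timestamp, key)` event shape and the function name are assumptions for illustration.

```python
from collections import defaultdict
from typing import Iterable


def tumbling_window_counts(
    events: Iterable[tuple[int, str]], window_size: int
) -> dict[tuple[int, str], int]:
    """Count events per key in fixed, non-overlapping time windows.

    events: hypothetical (timestamp_in_seconds, key) pairs.
    Returns {(window_start, key): count}.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    counts: dict[tuple[int, str], int] = defaultdict(int)
    for ts, key in events:
        # Align each timestamp to the start of the window that contains it.
        window_start = ts - (ts % window_size)
        counts[(window_start, key)] += 1
    return dict(counts)


stream = [(0, "sensor-a"), (3, "sensor-a"), (7, "sensor-b"), (12, "sensor-a")]
print(tumbling_window_counts(stream, window_size=10))
```

A sliding window would differ only in assigning each event to every window whose time range covers its timestamp.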

Page 12

Intelligent Data Lake – Deep Dive


Page 13

Data Analyst / Scientist

Who?

Prepare & Publish

Search & Discover

Share and Collaborate

Intelligent Data Lake

Page 14

How?

Applications & Databases

Internet of Things

3rd Party Data

Data Modeling Tools

BI Tools

Custom

Cloud

Data Access & Metadata Connectivity

Intelligent Metadata Foundation: Catalog, Classify, Index, Data Lineage

Data Relationships

Smart Domains

Data Profile

Data Discovery & Analysis Process

Recommend

Discover

Collaborate

Publish

Operationalize/Monitor

Prepare

Data Analyst / Scientist

Intelligent Data Lake

Page 15

Terminology: Intelligent Data Lake

Data Asset - Data you work with as a unit.

Project - A project contains data assets and worksheets.

Recipe - The steps taken to prepare data in a worksheet.

Data Publication - The process of making prepared data available in the data lake.

Data Preparation - The process of combining, cleansing, transforming, and structuring data from one or more data assets so that it is ready for analysis.

Page 16

Search and Discovery

Data discovery through a powerful search engine to find relevant data

Semantic search

Facet filtering by asset, resource type, latest, size, custom attributes…
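Facet filtering narrows a search result by attribute values, and dynamic facets typically show how many results carry each value. The Python sketch below illustrates both ideas over an in-memory list; the `Asset` fields, the `facet_filter` and `facet_counts` helpers, and the sample catalog are hypothetical and do not reflect the catalog's actual data model or API.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Asset:
    # Hypothetical catalog entry; field names are illustrative only.
    name: str
    asset_type: str
    resource_type: str
    size_mb: float
    custom_attributes: dict[str, str] = field(default_factory=dict)


def facet_value(asset: Asset, facet: str) -> Any:
    """Read a built-in field, falling back to the asset's custom attributes."""
    if hasattr(asset, facet):
        return getattr(asset, facet)
    return asset.custom_attributes.get(facet)


def facet_filter(assets: list[Asset], **facets: Any) -> list[Asset]:
    """Keep assets that satisfy every facet.

    A facet may be a single value, a set of accepted values,
    or a predicate (for example a size range check).
    """
    def matches(asset: Asset, facet: str, wanted: Any) -> bool:
        value = facet_value(asset, facet)
        if callable(wanted):
            return bool(wanted(value))
        if isinstance(wanted, (set, frozenset)):
            return value in wanted
        return value == wanted

    return [a for a in assets if all(matches(a, f, w) for f, w in facets.items())]


def facet_counts(assets: list[Asset], facet: str) -> Counter:
    """Count assets per facet value, e.g. to show next to each filter option."""
    return Counter(facet_value(a, facet) for a in assets)


catalog = [
    Asset("customers", "Table", "Hive", 120.0, {"domain": "CRM"}),
    Asset("orders", "Table", "Oracle", 800.0, {"domain": "Sales"}),
    Asset("q3_report", "Report", "Tableau", 2.5, {"domain": "Sales"}),
]
print([a.name for a in facet_filter(catalog, asset_type="Table", domain="Sales")])  # ['orders']
print([a.name for a in facet_filter(catalog, size_mb=lambda mb: mb < 500)])  # ['customers', 'q3_report']
print(facet_counts(catalog, "domain"))
```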

Page 17

Data Asset Overview

Overview with asset attributes and integrated profiling stats

Asset attributes collected from the source system

Asset attributes enriched by users to add business context

Column profiling stats, including null/unique/duplicate percentages, inferred data types, and data domains

Detailed stats include value and pattern distributions

Add a data asset to a project from any exploration view
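To make the column statistics concrete, here is a minimal Python sketch of a single-column profile. The metric definitions are assumptions chosen for illustration (null % over all rows; unique % and duplicate % over non-null rows, counting a value as unique when it occurs exactly once), and the pattern alphabet (`X`, `x`, `9`) is one common convention; the product's own definitions may differ.

```python
import re
from collections import Counter
from typing import Optional


def value_pattern(value: str) -> str:
    """Map A-Z to 'X', a-z to 'x' and 0-9 to '9'; keep other characters."""
    pattern = re.sub(r"[A-Z]", "X", value)
    pattern = re.sub(r"[a-z]", "x", pattern)
    return re.sub(r"[0-9]", "9", pattern)


def infer_type(values: list[str]) -> str:
    """Rough type inference: integer, then decimal, otherwise string."""
    def all_parse(cast) -> bool:
        try:
            for v in values:
                cast(v)
        except ValueError:
            return False
        return True

    if values and all_parse(int):
        return "integer"
    if values and all_parse(float):
        return "decimal"
    return "string"


def profile_column(column: list[Optional[str]]) -> dict:
    """Compute assumed null/unique/duplicate percentages and distributions."""
    total = len(column)
    non_null = [v for v in column if v is not None]
    freq = Counter(non_null)
    singles = sum(1 for v in non_null if freq[v] == 1)

    def pct(part: int, whole: int) -> float:
        return round(100.0 * part / whole, 1) if whole else 0.0

    return {
        "null_pct": pct(total - len(non_null), total),
        "unique_pct": pct(singles, len(non_null)),
        "duplicate_pct": pct(len(non_null) - singles, len(non_null)),
        "inferred_type": infer_type(non_null),
        "value_distribution": freq.most_common(),
        "pattern_distribution": Counter(map(value_pattern, non_null)).most_common(),
    }


zip_codes = ["10115", "80331", "10115", None, "2000A"]
print(profile_column(zip_codes))
```

A real profiler would run these aggregations in the cluster rather than in memory, but the per-column arithmetic is the same.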

Page 18

Business Glossary Integration

View Business Glossary Assets like Terms, Policies and Categories in the Catalog

View and navigate to related technical and business assets in the catalog

Page 19

Data Lineage

Interactively trace data origin through summarized lineage views for analysts

Use the lineage and impact sliders to drill down to the desired number of lineage levels on either side of the seed object.
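Conceptually, the lineage and impact sliders bound a breadth-first traversal of the lineage graph, upstream and downstream from the seed object. The Python sketch below shows that idea on a hypothetical edge list; the asset names, the `lineage_view` function, and the graph representation are illustrative assumptions, not how the catalog stores lineage.

```python
from collections import deque

# Hypothetical lineage graph: each asset maps to the assets it feeds.
EDGES: dict[str, list[str]] = {
    "crm.customers": ["stage.customers"],
    "erp.orders": ["stage.orders"],
    "stage.customers": ["lake.customer_orders"],
    "stage.orders": ["lake.customer_orders"],
    "lake.customer_orders": ["bi.revenue_report"],
}


def reverse_edges(edges: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert the graph so that upstream sources become reachable."""
    reversed_graph: dict[str, list[str]] = {}
    for source, targets in edges.items():
        for target in targets:
            reversed_graph.setdefault(target, []).append(source)
    return reversed_graph


def walk(graph: dict[str, list[str]], seed: str, max_depth: int) -> dict[str, int]:
    """Breadth-first search from seed; returns {asset: hops} within max_depth."""
    depth = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if depth[node] >= max_depth:
            continue  # slider limit reached; do not expand further
        for neighbor in graph.get(node, []):
            if neighbor not in depth:  # also guards against cycles
                depth[neighbor] = depth[node] + 1
                queue.append(neighbor)
    del depth[seed]
    return depth


def lineage_view(edges, seed, lineage_levels, impact_levels):
    """Upstream (lineage) and downstream (impact) assets within the slider settings."""
    return {
        "upstream": walk(reverse_edges(edges), seed, lineage_levels),
        "downstream": walk(edges, seed, impact_levels),
    }


print(lineage_view(EDGES, "lake.customer_orders", lineage_levels=2, impact_levels=1))
```

Moving a slider simply changes `max_depth` for one direction; the traversal itself stays the same.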

Page 20

Relationship View

Shows the ecosystem of the asset in the enterprise based on its associations with other assets

Get a 360-degree view of a data asset using the relationship view, including related tables, views, domains, reports, users, etc.

Zoom, find specific assets in the view, and filter by asset type

Expand relationship circles to get more details on relationship types and objects.

Page 21

Data Preparation (continued)

Excel-based data preparation on sample data

New formula definition with type-ahead

A large number of functions for all types of data: string, numeric, date, statistical, math, etc.

Advanced functionality such as Join, Merge, Aggregate, Filter, Sort, etc.

New values are calculated and shown right away

Page 22

Data Preparation (continued)

Excel-based data preparation on sample data

Column-level summary

Column value distributions

Column-level suggestions

Data preparation steps captured as a “Recipe”
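A recipe can be thought of as an ordered list of transformation steps that is recorded while working on the sample and can later be replayed unchanged on the full data. The following Python sketch models that idea; the `Recipe` class, the step helpers, and the 19% formula are hypothetical and only illustrate the record-and-replay concept.

```python
from typing import Any, Callable

Row = dict[str, Any]
Transform = Callable[[list[Row]], list[Row]]


def filter_rows(predicate: Callable[[Row], bool]) -> Transform:
    return lambda rows: [r for r in rows if predicate(r)]


def add_column(name: str, formula: Callable[[Row], Any]) -> Transform:
    # Builds new row dicts so the input data is never modified.
    return lambda rows: [{**r, name: formula(r)} for r in rows]


def sort_rows(column: str, descending: bool = False) -> Transform:
    return lambda rows: sorted(rows, key=lambda r: r[column], reverse=descending)


class Recipe:
    """Hypothetical model: an ordered, replayable list of named steps."""

    def __init__(self) -> None:
        self.steps: list[tuple[str, Transform]] = []

    def add(self, description: str, transform: Transform) -> "Recipe":
        self.steps.append((description, transform))
        return self

    def apply(self, rows: list[Row]) -> list[Row]:
        for _description, transform in self.steps:
            rows = transform(rows)
        return rows


recipe = (
    Recipe()
    .add("Filter: country = 'DE'", filter_rows(lambda r: r["country"] == "DE"))
    .add("Formula: gross = net * 1.19", add_column("gross", lambda r: round(r["net"] * 1.19, 2)))
    .add("Sort: gross descending", sort_rows("gross", descending=True))
)

sample = [
    {"id": 1, "country": "DE", "net": 100.0},
    {"id": 2, "country": "FR", "net": 50.0},
    {"id": 3, "country": "DE", "net": 200.0},
]
full_data = sample + [{"id": 4, "country": "DE", "net": 150.0}]

print([r["id"] for r in recipe.apply(sample)])     # steps tried out on the sample
print([r["id"] for r in recipe.apply(full_data)])  # same steps replayed on all data
```

Keeping each step as data plus a description is what makes the recipe both reviewable by users and replayable by an execution engine.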

Page 23

Data Publication

Execution of the data preparation steps on the actual data using an Informatica (Infa) mapping

Publish the output of the data preparation steps back to the lake

Recipe steps are translated into an Informatica mapping

The Informatica mapping is handed over to the BDM (Big Data Management) platform for execution on the actual data sources

The BDM platform uses MapReduce, Blaze, or Spark to execute the mapping

The mapping is available to ETL specialists, who can open it in the Informatica Developer tool to operationalize it

The user's credentials are used to access the underlying database.
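The translation of recipe steps into something an engine can execute can be illustrated by compiling declarative steps into a single query. The Python sketch below targets SQL purely as a stand-in: the product generates Informatica mappings, not necessarily SQL, and the step dictionary format and `compile_recipe_to_sql` function are hypothetical.

```python
from typing import Any


def compile_recipe_to_sql(source_table: str, steps: list[dict[str, Any]]) -> str:
    """Translate declarative preparation steps into one nested SQL query.

    Hypothetical step format:
      {"op": "filter", "condition": <SQL boolean expression>}
      {"op": "formula", "name": <new column>, "expr": <SQL expression>}
      {"op": "sort", "column": <column>, "descending": <bool>}
    """
    query = f"SELECT * FROM {source_table}"
    for i, step in enumerate(steps):
        alias = f"s{i}"  # every subquery needs its own alias
        op = step.get("op")
        if op == "filter":
            query = f"SELECT * FROM ({query}) AS {alias} WHERE {step['condition']}"
        elif op == "formula":
            query = f"SELECT {alias}.*, {step['expr']} AS {step['name']} FROM ({query}) AS {alias}"
        elif op == "sort":
            direction = "DESC" if step.get("descending") else "ASC"
            query = f"SELECT * FROM ({query}) AS {alias} ORDER BY {step['column']} {direction}"
        else:
            raise ValueError(f"unsupported step: {op!r}")
    return query


steps = [
    {"op": "filter", "condition": "country = 'DE'"},
    {"op": "formula", "name": "gross", "expr": "net * 1.19"},
    {"op": "sort", "column": "gross", "descending": True},
]
print(compile_recipe_to_sql("lake.orders", steps))
```

Because each step wraps the previous query as a subquery, a sort is only guaranteed to hold when it is the final step. Conditions and expressions are interpolated as-is, so a real implementation would need validation and quoting.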

Page 24

Organizations need ONE solution that helps them…

Easily Find & Catalog Data & Discover Relationships

Rapidly Prepare & Share Data Exactly When It Is Needed

Get Instant Access to Trusted & Secure Data for Advanced Analytics

Ingest, Cleanse, Integrate & Protect Data at Scale

Page 25

Forrester Wave™: Big Data Fabric, Q4 ’16

Page 26

Questions?