oil and gas big data edition

Big Data and The Informatica Platform 9/8/2015 David Ramirez Senior Solution Architect Oil and Gas Accounts

Upload: mark-kerzner

Post on 06-Jan-2017


TRANSCRIPT

Big Data and The Informatica Platform, 9/8/2015

David Ramirez, Senior Solution Architect, Oil and Gas Accounts

About Informatica

• Founded: 1993 (Nasdaq: INFA)
• 2014 Revenue: $1.2b
• Partners: 450+
• Major SI, ISV, OEM and On-Demand Leaders
• Customers: 5,000+
• > 70% of the Global 500
• Customers in 82 Countries
• Direct Presence in 26 Countries
• #1 in Customer Loyalty Rankings (7 Years in a Row)


B2B Data Exchange

Informatica supports the requirements of cross-organizational data exchange, so users apply familiar & trusted data integration tools and techniques to the growing practice of B2B data integration.

Cloud Data Integration

Enterprise Data Integration

Complex Event Processing

Informatica received high praise for its services from customers. For deployments involving systems monitoring use cases, Informatica offers a five-day stand‐up of RulePoint.

Ultra Messaging

In spite of the new entrants, Informatica remains the market leader in this highly demanding part of the messaging market.

Data Quality

Master Data Management

Application ILM

Proven Technology Leadership


Intelligent Data Lake: Manage Supply & Demand of Data

Problem:
• Analytics teams spend most of their time looking for and preparing data, not analyzing it
• Impact: project delays, cost overruns, missed opportunities

Data Lake Solution:
• A single place to manage the supply and demand of data
• Converts raw big data into fit-for-purpose, trusted, and secure information

80% of the work in big data projects is data intelligence

“I spend more than half my time integrating, cleansing, and transforming data without doing any actual analysis.”

“80% of the work in any data project is in cleaning the data”

“70% of my value is an ability to pull the data, 20% of my value is using data-science…”

Sources: (1) DJ Patil, Data Jujitsu; (2-3) Kandel, et al. Enterprise Data Analysis and Visualization: An Interview Study. IEEE Visual Analytics Science and Technology (VAST), 2012

First Pilot(s)

Data Warehouse Optimization

Data Discovery

Real-Time Operational Intelligence

Escalating return on data

Lower operational IT costs

Big Data Analytics

Operationalize Big Data Insights

Predictive Maintenance

Lower Total Cost of Care

Customer X/Up-Sell

Public Safety

Fraud Detection

Machine Device, Cloud

Documents and Emails

Relational, Mainframe

Social Media, Web Logs

Driven by IT

Driven by Business

Lower Infrastructure Cost

Added Business Value

What’s Hadoop?

Intelligent Data Lake: Platform for Big Data Projects

Informatica Knows the Data Lifecycle: Related Challenges

Source: Gartner

Informatica Platform

Data Ingestion

Refinement

Mastery/Delivery

Data Security

Data Retirement

• Data Quality
• Exception Management

• Any Platform, Application
• Structured, Unstructured
• Any latency

• Master Data Management
• Data Integration Hub

• Data Archive
• Records Retention/Discovery
• Data Masking

Informatica Platform Overview

Diagram labels:

Sources: Relational DB; .pdf, email; Databases; Unstructured Data; Big Data; Cloud; Application; Database; Partner Data (SWIFT, NACHA, HIPAA, …); Cloud Computing; Unstructured

Environments: Dev, Test, Prod, Archive (ILM-Archive)

Data quality cycle: 1. Profile, 2. Define Targets, 3. Analyze, 4. Build Rules, 5. Monitor

Components: DATA QUALITY (Informatica Data Quality), SECURITY, ETL (PowerCenter), MDM (Multi-Domain)

Master data domains: Materials, Wellhead, Customer

Outputs: Visualizations

Use cases: Data Warehouse, Data Migration, Test Data Management & Archiving, Master Data Management, Data Synchronization, B2B Data Exchange, Data Consolidation
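The overview names a data quality cycle that starts with profiling (1. Profile, then targets, analysis, rules, and monitoring). As a rough illustration of what a profiling step typically computes, here is a minimal Python sketch. It is not Informatica Data Quality's implementation; the function names, the record layout, and the sample wellhead records are all hypothetical.

```python
import re
from collections import Counter


def profile_column(values):
    """Summarize one column: null count, distinct count, value patterns."""
    non_null = [v for v in values if v not in (None, "")]
    # Reduce each value to a shape: every digit becomes 9, every letter becomes A.
    patterns = Counter(
        re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", str(v))) for v in non_null
    )
    return {
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        "patterns": dict(patterns),
    }


def profile(records):
    """Profile every column found in the first record."""
    columns = records[0].keys()
    return {c: profile_column([r.get(c) for r in records]) for c in columns}


# Hypothetical wellhead records, invented for illustration only.
records = [
    {"well_id": "TX-1001", "operator": "Acme"},
    {"well_id": "TX-1002", "operator": ""},
    {"well_id": "1003", "operator": "Acme"},
]

report = profile(records)
print(report["well_id"]["patterns"])  # two ID shapes hint at a formatting issue
```

A report like this is the kind of input a later "build rules" step could use, for example a rule that every `well_id` must match the dominant pattern.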

The Informatica DI Platform: Comprehensive, Unified, Open and Economical Platform

Data Sources Applications

Data Warehouse

MDM / PIM

Data Ingestion

Visualization

Data Governance

Data Security

Archiving

Replication

Data Streaming

Change Data Capture

Batch Load

Data Virtualization

Event-Based Processing

Data Integration Hub

Data Integration & Data Quality

Agile Analytics

Advanced Analytics

Machine Learning

Virtual Data Machine

Data Management Data Delivery

Machine Device, Cloud

Documents and Emails

Relational, Mainframe

Social Media, Web Logs

Mobile Apps

Visualization & Analytics

Real-Time Alerts

Batch Load

Pub / Sub

Data Service

Integrate & Prepare

Loose Coupling & Abstraction


1. Development Agility

Logical Data Objects

PRODUCT …CUSTOMER ORDER

Jumpstart/Accelerate Projects

Data Sources

1. Instant Business-IT Collaboration with Analyst Tool
2. Profile to Discover Data Patterns and Issues
3. Prototype and Validate Results
4. Fine-tune and Deploy Desired Solution in Days

Participants at each step: Business, IT

Common Repository

Entire Life Cycle Supported by PowerCenter Standard Edition 9.6


2. Enterprise Scalability

Scale-up As Your Needs Grow


IT

• High Availability
• Pushdown Optimization
• Enterprise Grid
• Concurrent Users
• Partitioned Data

Included in PowerCenter Advanced Edition 9.6
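Of the scalability features listed, pushdown optimization is the easiest to show in a few lines: instead of pulling rows into the integration engine and transforming them there, the work is expressed as SQL and run inside the database. The sketch below illustrates that general idea with Python's built-in `sqlite3` module. It is not how PowerCenter generates or submits SQL; the `production` table and both function names are hypothetical.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE production (well TEXT, barrels REAL)")
conn.executemany(
    "INSERT INTO production VALUES (?, ?)",
    [("W1", 100.0), ("W1", 50.0), ("W2", 75.0)],  # invented sample rows
)


def totals_in_engine(conn):
    """No pushdown: fetch every row, then aggregate in the client process."""
    totals = defaultdict(float)
    for well, barrels in conn.execute("SELECT well, barrels FROM production"):
        totals[well] += barrels
    return dict(totals)


def totals_pushed_down(conn):
    """Pushdown: the database aggregates and returns one row per well."""
    rows = conn.execute(
        "SELECT well, SUM(barrels) FROM production GROUP BY well"
    )
    return dict(rows.fetchall())


print(totals_in_engine(conn))
print(totals_pushed_down(conn))
```

Both functions return the same totals; the difference is how many rows cross the wire, which is where the scalability gain comes from on large tables.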


Manage Metadata for Better Data Insights

Data Lineage

Consolidated Metadata Catalog

Federated Business Glossary

Mainframe, Flat Files, Database, Data Modeling, BI Tools, ERP

Metadata Repository

Custom

Metadata Reports

3rd party BI

Metadata Bookmarks


Common Biz Language Via Business Glossary

Provide a common vocabulary of business terms

Easily search for glossary assets with workflow

Manage relationships with other assets

Manage business policies governing the assets

Analyst


3. Operational Confidence

Improve Operational Confidence With Automated Testing and Monitoring


End-to-End Agility

Stages shown: Requirements Gathering, Prototype & Validate, Develop, Test, Deploy, Monitor

Participants: Business, IT

Other labels: Business-IT Collaboration, Self Service, Business Satisfied

Automate Data Validation Testing

Data Validation Testing Capability

Components: Enterprise Data, PowerCenter, DVO Clients, DVO Repository & Warehouse, Reports

Actions: Define Tests, Execute Tests, Write Results

Database Views: V_Summary, V_Tests, V_Results (each shown with the sample fields Id: name, name: string, Price: integer, Date in: date, Date out: date, Salary: float)

Data Accessed

• Relational databases
• Flat files
• Mainframe data
• DW Appliances
• Cloud-based data
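The define tests, execute tests, write results flow described for data validation can be illustrated with a small source-to-target comparison. This is a minimal sketch using Python's built-in `sqlite3` module, not the Data Validation Option's actual test engine or repository schema. The `src` and `tgt` tables, the `TESTS` dictionary, and `run_tests` are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE src (id INTEGER, price INTEGER);
    CREATE TABLE tgt (id INTEGER, price INTEGER);
    INSERT INTO src VALUES (1, 10), (2, 20), (3, 30);
    INSERT INTO tgt VALUES (1, 10), (2, 20), (3, 35);
    """
)

# "Define tests": each test is an aggregate evaluated on both sides.
TESTS = {
    "row_count": "SELECT COUNT(*) FROM {table}",
    "price_sum": "SELECT COALESCE(SUM(price), 0) FROM {table}",
}


def run_tests(conn, source, target):
    """'Execute tests' and collect one result record per test."""
    results = []
    for name, sql in TESTS.items():
        src_val = conn.execute(sql.format(table=source)).fetchone()[0]
        tgt_val = conn.execute(sql.format(table=target)).fetchone()[0]
        results.append(
            {"test": name, "source": src_val, "target": tgt_val,
             "passed": src_val == tgt_val}
        )
    return results


# "Write results": here they are only printed; a real tool would persist them.
results = run_tests(conn, "src", "tgt")
for r in results:
    print(r)
```

In this invented data the row counts match but the price totals do not, which is the kind of silent load error automated validation is meant to catch.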

Proactively Monitor with PowerCenter 9.6

PowerCenter WS Hub

PowerCenter Repository

Environment Information

Automated Monitoring and Detection (Source Feeds, Rules/Templates, Watchlists, Alerts)

Steps shown:

• Configure / Build Rules
• Get Operating System, Database Statistics
• Get PowerCenter Statistics
• Monitor PowerCenter Operations
• Send Alerts to Stakeholders

Participants: Analyst, IT, IT Operations
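The monitoring flow on this slide (collect statistics, evaluate rules, alert stakeholders) follows a common rule-based pattern. The sketch below shows that pattern in plain Python. It does not use any PowerCenter or RulePoint API; the `Rule` class, the statistic names, and the thresholds are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    name: str
    check: Callable[[dict], bool]  # returns True when the rule is violated


def evaluate(rules, stats):
    """Return one alert message per violated rule."""
    return [f"ALERT: {rule.name}" for rule in rules if rule.check(stats)]


# Hypothetical rules an analyst might configure.
rules = [
    Rule("session ran longer than 30 minutes", lambda s: s["duration_min"] > 30),
    Rule("rejected rows present", lambda s: s["rejected_rows"] > 0),
]

# Hypothetical statistics for one run, invented for illustration.
stats = {"duration_min": 42, "rejected_rows": 0}

alerts = evaluate(rules, stats)
for message in alerts:
    print(message)  # a real setup would notify stakeholders instead
```

Keeping rules as data rather than hard-coded checks is what lets an analyst add or change them without touching the monitoring loop.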

Informatica on Hadoop: Informatica Execution on Hadoop Architecture

1. The entire Informatica mapping is translated to the optimal open source project.
2. Currently, the mapping is submitted to the Hadoop cluster as MapReduce.
3. Advanced mapping transformations are executed on Hadoop through User Defined Functions using Vibe.

Diagram labels: MapReduce, UDF, Flink
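To make the MapReduce and UDF terms concrete, here is a toy, single-process Python sketch of the map, shuffle, and reduce phases, with a cleansing function standing in for a user-defined function. It only illustrates the programming model. It is not how Informatica translates mappings or how Hadoop schedules jobs, and the function names and records are hypothetical.

```python
from collections import defaultdict


def clean_operator(name):
    """Stand-in for a user-defined function applied to each record."""
    return name.strip().upper()


def map_phase(records):
    """Emit (key, value) pairs, applying the UDF to build the key."""
    for operator, barrels in records:
        yield clean_operator(operator), barrels


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Collapse each key's values into a single total."""
    return {key: sum(values) for key, values in groups.items()}


# Invented sample records: (operator name, barrels produced).
records = [(" acme ", 100), ("Acme", 50), ("globex", 75)]

totals = reduce_phase(shuffle(map_phase(records)))
print(totals)
```

On a real cluster the map and reduce functions run in parallel across nodes, and the shuffle moves data between them over the network.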

INFA’s Unified Platform = Strong Time-to-Value

“Informatica and Microsoft are so much more consistent than their competitors [because] the platforms provided by these companies support transferable skills across projects more flexibly than do their rivals.”

TCO – Informatica vs. Hand Coding

Chart: Average Costs (3-year TCO) per project per end point. Series: Informatica, Hand Coding. Values shown: $8,500 and $11,500.

Informatica is Far More Productive than Hand Coding

Chart: Average Time to Develop by Project Type (Weeks). Series: Hand coding, Informatica. Project types: Master Data Management, Data Warehousing, Data Migration, Application Integration. Values shown: 2.4, 1, 2.4, 0.7, 5.3, 1.2, 2.7, 0.8.

Depending on the project, hand coding can take more than 4 weeks longer to develop!

Source: “Comparative Costs and Uses for Data Integration Platforms”, Bloor Research, March 2014
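The "more than 4 weeks longer" claim can be checked against the chart values with simple arithmetic. The sketch below assumes the eight time values are read as four (hand coding, Informatica) pairs in the order they appear. That pairing is an assumption, and the pairs are deliberately not tied to specific project types, since the chart layout is not recoverable from the transcript.

```python
# Chart values in the order shown, paired as (hand coding, Informatica).
# The pairing is an assumption; project-type labels are intentionally omitted.
weeks = [(2.4, 1.0), (2.4, 0.7), (5.3, 1.2), (2.7, 0.8)]

# Extra development weeks for hand coding in each pair.
extra_weeks = [round(hand - infa, 1) for hand, infa in weeks]
largest_gap = max(extra_weeks)

# Difference between the two 3-year TCO figures on the cost chart.
cost_gap = 11_500 - 8_500

print(extra_weeks)
print(largest_gap)
print(cost_gap)
```

Under that reading, the largest gap is the 5.3 versus 1.2 pair, which is consistent with the slide's statement.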

• Demo – Data Profiling on Hadoop

https://www.youtube.com/watch?v=Nd6UfuteiTY

Big Data – Data Profiling on Hadoop
