ten tools for ten big data areas 01 informatica

Post on 16-Apr-2017


Informatica Overview

Ten Tools for Ten Big Data Areas

Series 01 Big Data Integration

www.sparkera.ca

Ten Tools for Ten Big Data Areas – Overview

© Sparkera. Confidential. All Rights Reserved

10 Tools, 10 Areas

Areas shown in the diagram:

• Data Warehouse
• Data Platform
• Data Bus
• Programming
• Data Analytics
• Search and Index
• Visualization
• Data Integration
• Streaming
• Database

Tool taglines shown in the diagram:

• First ETL fully on YARN
• Data storing platform, data computing platform
• SQL & metadata
• Visualize with just a few clicks
• Powerful as Java, simple as Python
• Real-time streaming made easier
• Your own Google
• Lightning-fast cluster computing
• Real-time distributed data store
• High-throughput distributed messaging

Agenda

1 About data integration

2 About Informatica company and its approach

3 Informatica architecture, client, server components, developer tool overview

4 Informatica why and why not

5 Informatica job trend

A Little About DI – Data Integration

• DI involves combining data residing in different sources and providing users with a unified view of that data.

• The DI process is also called Enterprise Information Integration (EII).

• DI usually means ETL: extract, transform, load.

• About 80% of the effort in enterprise data projects is spent on DI work.

• Data cleansing, auditing, and master data management are usually considered alongside DI.
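The extract, transform, load steps above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Informatica implements it: the two sources (`CRM_CSV`, `BILLING_ROWS`), the `customer` table, and all function names are hypothetical, and an in-memory SQLite database stands in for the target.

```python
import csv
import io
import sqlite3

# Hypothetical source 1: a CSV export from a CRM system.
CRM_CSV = "id,full_name,country\n1,Ada Lovelace,uk\n2,Alan Turing,uk\n"

# Hypothetical source 2: records from a billing application.
BILLING_ROWS = [
    {"customer_id": 2, "name": "Alan Turing", "amount": "120.50"},
    {"customer_id": 3, "name": "Grace Hopper", "amount": "80.00"},
]


def extract():
    """Extract: read raw records from both sources."""
    crm = list(csv.DictReader(io.StringIO(CRM_CSV)))
    return crm, BILLING_ROWS


def transform(crm, billing):
    """Transform: map both sources onto one schema keyed by customer id."""
    unified = {}
    for row in crm:
        cid = int(row["id"])
        unified[cid] = {"id": cid, "name": row["full_name"].strip(),
                        "country": row["country"].upper(), "billed": 0.0}
    for row in billing:
        cid = int(row["customer_id"])
        rec = unified.setdefault(
            cid, {"id": cid, "name": row["name"], "country": None, "billed": 0.0})
        rec["billed"] += float(row["amount"])
    return [unified[k] for k in sorted(unified)]


def load(conn, records):
    """Load: write the unified records into the target table."""
    conn.execute(
        "CREATE TABLE customer "
        "(id INTEGER PRIMARY KEY, name TEXT, country TEXT, billed REAL)")
    conn.executemany(
        "INSERT INTO customer VALUES (:id, :name, :country, :billed)", records)
    conn.commit()


def run_etl():
    """Run the three steps in order and return the target connection."""
    conn = sqlite3.connect(":memory:")
    load(conn, transform(*extract()))
    return conn
```

Querying the `customer` table after `run_etl()` gives the unified view: one row per customer, regardless of which source the customer came from.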

About Informatica Company

• Founded in 1993
• 2014 revenue – US$1.05 billion
• Average growth rate – 17% per year
• Employees – 5,500+
• Customers – 5,000
• Customers include up to 70% of the global top 500 companies
• Partners – 500+
• Covers various businesses, industries, and government organizations, including telecommunications, health care, financial, and insurance services
• A company dedicated to data integration and management
• Bought out and taken private in August 2015

The Traditional Approach

Data sources shown: Application, Database, Partner Data (SWIFT, NACHA, HIPAA, …), Cloud Computing, Unstructured

• 87% of enterprises use hand-coding for data integration

• 75% of enterprises reported increased maintenance costs

Data projects shown: Data Warehouse, Data Migration, Test Data Management & Archiving, Master Data Management, Data Synchronization, B2B Data Exchange, Data Consolidation, Complex Event Processing, Ultra Messaging

The Informatica Approach

Data sources shown: Application, Database, Partner Data (SWIFT, NACHA, HIPAA, …), Cloud Computing, Unstructured

Data projects shown: Data Warehouse, Data Migration, Test Data Management & Archiving, Master Data Management, Data Synchronization, B2B Data Exchange, Data Consolidation, Complex Event Processing, Ultra Messaging

Informatica Latest Products v9.6

• Data Integration: PowerCenter, PowerExchange

• Master Data Management

• Cloud Integration

• Big Data: BDE (Informatica Developer), Big Data Parser

Informatica PowerCenter Overview

• An ETL (Extract, Transform, and Load) tool

• Its main advantages over other ETL tools are robustness, support for multiple operating systems, and high performance.

• It can read from a variety of different sources and write to as many targets, while transforming data in between.

• The architecture uses SOA concepts for better extensibility and high availability.

• Single sign-on access, built-in version control, GUI-based development, and built-in scheduling and monitoring

Informatica PowerCenter Architecture

Informatica PowerCenter Client Components

• Repository Manager – metadata management

• Designer – tool to build mappings for ETL logic

• Workflow Manager – tool to build and run sessions and workflows

• Workflow Monitor – tool to monitor running jobs

• Administration Console (browser based) – administration

Repository Manager

Navigate through multiple folders and repositories, handle export and import, and manage users and folders.

Designer

Create and debug mappings and mapplets, including sources, targets, and transformations, for the core ETL logic.

Workflow Manager

Create, schedule, and run sessions, workflows, and worklets that wrap mappings.
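The relationship between mappings, sessions, and workflows can be pictured with a small conceptual model. This is a rough sketch under assumptions, not Informatica's API: the classes `Mapping`, `Session`, and `Workflow` and the names `m_customer` and `wf_daily` are hypothetical and only mirror the roles described on these slides (a mapping holds the ETL logic, a session runs one mapping against a source and target, a workflow runs sessions in order).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass
class Mapping:
    """ETL logic: an ordered chain of row-level transformations."""
    name: str
    transformations: List[Callable[[Row], Row]]

    def apply(self, row: Row) -> Row:
        for transformation in self.transformations:
            row = transformation(row)
        return row


@dataclass
class Session:
    """Binds one mapping to a concrete source and target."""
    mapping: Mapping
    source: List[Row]
    target: List[Row] = field(default_factory=list)

    def run(self) -> int:
        # Copy each source row so the source data is left untouched.
        for row in self.source:
            self.target.append(self.mapping.apply(dict(row)))
        return len(self.source)


@dataclass
class Workflow:
    """Runs sessions in order and reports rows read per session."""
    name: str
    sessions: List[Session]

    def run(self) -> List[int]:
        return [session.run() for session in self.sessions]


def demo():
    """Build one mapping, wrap it in a session, and run it in a workflow."""
    upper = lambda r: {**r, "name": str(r["name"]).upper()}
    flag = lambda r: {**r, "loaded": True}
    mapping = Mapping("m_customer", [upper, flag])
    session = Session(mapping, source=[{"name": "ada"}, {"name": "alan"}])
    counts = Workflow("wf_daily", [session]).run()
    return counts, session.target
```

Keeping the mapping separate from the session is what lets the same ETL logic be reused against different sources and targets.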

Workflow Monitor

Monitor running statistics and control execution of workflows.

Administration Console

Monitor and manage the various Informatica services, licenses, etc.

Informatica PowerCenter Server Components

• Repository service: The Repository service manages the repository. It retrieves, inserts, and updates metadata in the repository database tables.

• Integration service: The Integration service runs sessions and workflows.

• Web services hub: The Web services hub receives requests from web service clients and exposes PowerCenter workflows as services.

• Informatica service: Overall service management and coordination

Informatica Big Data Edition Overview

Extract, load, and transform within the big data ecosystem.

Informatica BDE Component - Developer

BDE is an all-in-one tool and can push jobs to run fully on Hadoop.

Developer components
• Mapping – tool to build mappings for ETL logic
• Mapplet – reusable mapping
• Workflow – tool to build workflows
• Application – tool to deploy mappings and workflows

Others
• Monitoring Console (browser based) – job monitoring
• Administration Console (browser based) – administration

Why Informatica Products

• Proven technology leadership
• A track record of continuous innovation
• The most neutral, trusted partner – very focused
• Long history of customer success
• More than 5,000 industry leaders rely on Informatica
• Major banks and telecom, insurance, energy, health, and research companies are using Informatica in Toronto
• Easy to use and popular
• Can push jobs to Hadoop
• Connectors for many kinds of sources
• Performance and reliability

Side Effects - When Not to Use It

• High price: 150K+ to start

• Faces competition from ELT, which leverages the database for transformation. Informatica needs investment in an ETL server, and its pushdown optimization to the database has limitations.

• Scheduling, monitoring, and version control functions are limited

• BDE is relatively new, although the concept is great

• Alternatives: MS SSIS, Talend Studio, Pentaho Data Integration
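The ETL versus ELT point above can be made concrete with a small sketch. It is a generic illustration, not Informatica's pushdown optimization: `RAW_ORDERS`, the table names, and the function names are hypothetical, and SQLite stands in for the target database. The ETL variant aggregates in the integration process before loading; the ELT variant loads raw rows and lets the database do the transformation in SQL.

```python
import sqlite3

# Hypothetical raw (customer, amount) rows from a source system.
RAW_ORDERS = [("a", 10.0), ("b", 5.0), ("a", 2.5)]


def etl_totals(raw):
    """ETL style: aggregate outside the database, then load the result."""
    totals = {}
    for customer, amount in raw:
        totals[customer] = totals.get(customer, 0.0) + amount
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE totals (customer TEXT, total REAL)")
    conn.executemany("INSERT INTO totals VALUES (?, ?)", sorted(totals.items()))
    return conn


def elt_totals(raw):
    """ELT style: load raw rows first, then transform inside the database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE raw_orders (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?)", raw)
    conn.execute(
        "CREATE TABLE totals AS "
        "SELECT customer, SUM(amount) AS total FROM raw_orders GROUP BY customer"
    )
    return conn


def read_totals(conn):
    """Return the aggregated rows in a stable order for comparison."""
    return conn.execute(
        "SELECT customer, total FROM totals ORDER BY customer").fetchall()
```

Both variants produce the same `totals` table. The difference is where the work runs, which is why ELT can avoid a separate transformation server when the database is powerful enough.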

Informatica Job Trends

Levels and positions:

• Junior level (20%): ETL developer, Informatica developer, DW developer
• Middle level (40%): Sr. ETL developer, Data Specialist, ETL specialist, ETL designer, ETL Admin
• Expert level (40%): Big data ETL developer, BDE developer, Informatica architect, Informatica consultant

Tool usage percentage:

• PowerCenter: 80%
• Informatica Developer: 10%
• Other: 10%

www.sparkera.ca

BIG DATA is not only about data, but the understanding of the data and how people use data actively to improve their life.
