ten tools for ten big data areas 01 informatica


Upload: will-du

Post on 16-Apr-2017


TRANSCRIPT

Page 1: Ten tools for ten big data areas 01 informatica

Informatica Overview

Ten Tools for Ten Big Data Areas

Series 01 Big Data Integration

www.sparkera.ca

Page 2: Ten tools for ten big data areas 01 informatica


Ten Tools for Ten Big Data Areas – Overview

© Sparkera. Confidential. All Rights Reserved

10 Tools, 10 Areas

[Diagram] Areas: Data Warehouse, Data Platform, Data Bus, Programming, Data Analytics, Search and Index, Visualization, Data Integration, Streaming, Database

[Diagram] Tool taglines:

• First ETL fully on Yarn

• Data storing platform, data computing platform

• SQL & Metadata

• Visualize with just few clicks

• Powerful as Java, simple as Python

• Real-time streaming made easier

• Yours Google

• Lightning-fast cluster computing

• Real-time distributed data store

• High throughput distributed messaging

Page 3: Ten tools for ten big data areas 01 informatica


Agenda


1 About data integration

2 About Informatica company and its approach

3 Informatica architecture, client, server components, developer tool overview

4 Informatica why and why not

5 Informatica job trend

Page 4: Ten tools for ten big data areas 01 informatica

A Little About DI – Data Integration

• DI involves combining data residing in different sources and providing users with a unified view of that data.

• The DI process is also called Enterprise Information Integration (EII).

• DI usually means ETL: extract, transform, load.

• 80% of the effort in enterprise data projects is spent on DI work.

• Data cleansing, auditing, and master data management are usually considered together with DI.
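The extract, transform, load steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the two CSV sources, their column names, and the `customer_view` target table are hypothetical, and the code uses only the standard library rather than any Informatica product.

```python
import csv
import io
import sqlite3

# Two hypothetical sources with different layouts (illustrative data only).
CRM_CSV = "customer_id,full_name\n1,Ada Lovelace\n2,Alan Turing\n"
BILLING_CSV = "cust,amount\n1,120.50\n2,80.00\n1,30.25\n"


def extract(text):
    """Extract: read CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(crm_rows, billing_rows):
    """Transform: align keys and types, then total billing per customer."""
    totals = {}
    for row in billing_rows:
        cust_id = int(row["cust"])
        totals[cust_id] = totals.get(cust_id, 0.0) + float(row["amount"])
    unified = []
    for row in crm_rows:
        cust_id = int(row["customer_id"])
        unified.append((cust_id, row["full_name"].strip(), totals.get(cust_id, 0.0)))
    return unified


def load(rows, conn):
    """Load: write the unified rows into a target table."""
    conn.execute(
        "CREATE TABLE customer_view (id INTEGER PRIMARY KEY, name TEXT, total_billed REAL)"
    )
    conn.executemany("INSERT INTO customer_view VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl():
    """Run the three steps against an in-memory target database."""
    conn = sqlite3.connect(":memory:")
    load(transform(extract(CRM_CSV), extract(BILLING_CSV)), conn)
    return conn.execute(
        "SELECT id, name, total_billed FROM customer_view ORDER BY id"
    ).fetchall()
```

Calling `run_etl()` returns one row per customer with the billing total attached, which is the "unified view" the first bullet describes.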


Page 5: Ten tools for ten big data areas 01 informatica

About Informatica Company

• Founded in 1993

• 2014 revenue – US$1.05 billion

• Average growth rate – 17% per year

• Employees – 5500+

• Customers – 5000

• Customers include up to 70% of the global top 500 companies

• Partners – 500+

• Covers various businesses, industries, and government organizations, including telecommunications, health care, financial, and insurance services

• A company dedicated to data integration and management

• Taken private in a buyout in August 2015


Page 6: Ten tools for ten big data areas 01 informatica

The Traditional Approach

[Diagram] Sources: Application, Database, Partner Data (SWIFT, NACHA, HIPAA, …), Cloud Computing, Unstructured

• 87% of enterprises use hand-coding for data integration

• 75% of enterprises reported increased maintenance costs

[Diagram] Projects: Data Warehouse, Data Migration, Test Data Management & Archiving, Master Data Management, Data Synchronization, B2B Data Exchange, Data Consolidation, Complex Event Processing, Ultra Messaging


Page 7: Ten tools for ten big data areas 01 informatica

The Informatica Approach

[Diagram] Sources: Application, Database, Partner Data (SWIFT, NACHA, HIPAA, …), Cloud Computing, Unstructured

[Diagram] Projects: Data Warehouse, Data Migration, Test Data Management & Archiving, Master Data Management, Data Synchronization, B2B Data Exchange, Data Consolidation, Complex Event Processing, Ultra Messaging


Page 8: Ten tools for ten big data areas 01 informatica

Latest Informatica Products (v9.6)

• Data Integration: PowerCenter, PowerExchange

• Master Data Management

• Cloud Integration

• Big Data: BDE – Informatica Developer, Big Data Parser


Page 9: Ten tools for ten big data areas 01 informatica

Informatica PowerCenter Overview

• An ETL (Extract, Transform, and Load) tool

• Its main advantages over other ETL tools are robustness, cross-OS support, and high performance.

• It can read from a variety of different sources and write to as many targets, while transforming data in between.

• The architecture uses SOA concepts for better extensibility and high availability.

• Single sign-on access, built-in version control, GUI development, and built-in scheduling and monitoring


Page 10: Ten tools for ten big data areas 01 informatica

Informatica PowerCenter Architecture


Page 11: Ten tools for ten big data areas 01 informatica

Informatica PowerCenter Client Components

• Repository Manager – metadata management

• Designer – tool to build mappings for ETL logic

• Workflow Manager – tool to build and run sessions and workflows

• Workflow Monitor – tool to monitor running jobs

• Administration Console (browser based) – administration


Page 12: Ten tools for ten big data areas 01 informatica

Repository Manager

Navigate through multiple folders and repositories, export and import objects, and manage users and folders.


Page 13: Ten tools for ten big data areas 01 informatica

Designer

Create and debug mappings and mapplets, including the sources, targets, and transformations that make up the core ETL logic.
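As a conceptual analogy only (this is not the Designer or any Informatica API), a mapping can be thought of as a source feeding a chain of transformations into a target, and a mapplet as a reusable group of those transformations. All function names and sample rows below are hypothetical.

```python
from functools import reduce


def compose(*steps):
    """Chain row-level transformations left to right."""
    return lambda row: reduce(lambda acc, step: step(acc), steps, row)


# Individual transformations: each takes a row dict and returns a new one.
def trim_name(row):
    return {**row, "name": row["name"].strip()}


def upper_country(row):
    return {**row, "country": row["country"].upper()}


def add_label(row):
    return {**row, "label": f'{row["name"]} ({row["country"]})'}


# "Mapplet" analogue: a reusable group of transformations.
clean_customer = compose(trim_name, upper_country)


def run_mapping(source_rows, transformations, target):
    """'Mapping' analogue: source -> transformations -> target."""
    pipeline = compose(*transformations)
    for row in source_rows:
        target.append(pipeline(row))
    return target


source = [{"name": " Ada ", "country": "uk"}, {"name": "Alan", "country": "uk"}]
target = run_mapping(source, [clean_customer, add_label], [])
```

The same `clean_customer` group could be dropped into another `run_mapping` call unchanged, which is the reuse a mapplet is meant to provide.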


Page 14: Ten tools for ten big data areas 01 informatica

Workflow Manager

Create, schedule, and run the sessions, workflows, and worklets that wrap mappings.
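The relationship between these objects can be modelled roughly as follows. This is a hypothetical sketch, not Informatica's object model: a session runs one mapping, a worklet groups tasks for reuse, and a workflow is the top-level container that runs its tasks in order. Scheduling is left out, and the class and task names are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Union


@dataclass
class Session:
    """One run of a mapping (here just a callable returning a row count)."""
    name: str
    mapping: Callable[[], int]

    def run(self, log):
        log.append((self.name, self.mapping()))


@dataclass
class Worklet:
    """A reusable group of tasks that can be placed inside a workflow."""
    name: str
    tasks: List[Union["Session", "Worklet"]] = field(default_factory=list)

    def run(self, log):
        for task in self.tasks:
            task.run(log)


@dataclass
class Workflow(Worklet):
    """Top-level container that runs its tasks in order."""

    def start(self):
        log = []
        self.run(log)
        return log


# Hypothetical nightly load: dimensions first (as a worklet), then the fact table.
load_dims = Worklet(
    "wl_load_dimensions",
    [Session("s_dim_customer", lambda: 2), Session("s_dim_product", lambda: 5)],
)
workflow = Workflow("wf_nightly", [load_dims, Session("s_fact_sales", lambda: 10)])
log = workflow.start()
```

Here `log` records each session name with the row count its mapping returned, in execution order.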


Page 15: Ten tools for ten big data areas 01 informatica

Workflow Monitor

Monitor run statistics and control the execution of workflows.


Page 16: Ten tools for ten big data areas 01 informatica

Administration Console

Monitor and manage the various Informatica services, licenses, etc.


Page 17: Ten tools for ten big data areas 01 informatica

Informatica PowerCenter Server Components

• Repository service: The Repository service manages the repository. It retrieves, inserts, and updates metadata in the repository database tables.

• Integration service: The Integration service runs sessions and workflows.

• Web services hub: The Web services hub receives requests from web service clients and exposes PowerCenter workflows as services.

• Informatica service: The Informatica service handles overall service management and coordination.


Page 18: Ten tools for ten big data areas 01 informatica

Informatica Big Data Edition Overview

Extract, load, and transform within the big data ecosystem.


Page 19: Ten tools for ten big data areas 01 informatica

Informatica BDE Component - Developer

BDE is an all-in-one tool and can push jobs down to run fully on Hadoop.

Developer component

• Mapping – tool to build mappings for ETL logic

• Mapplet – reusable mapping

• Workflow – tool to build workflows

• Application – tool to deploy mappings and workflows

Others

• Monitoring Console (browser based) – job monitoring

• Administration Console (browser based) – administration


Page 20: Ten tools for ten big data areas 01 informatica

Why Informatica Products

• Proven technology leadership

• A track record of continuous innovation

• A neutral, trusted partner – very focused

• Long history of customer success

• 5000+ industry leaders rely on Informatica

• Major banks and telecom, insurance, energy, health, and research companies in Toronto use Informatica

• Easy to use and popular

• Can push jobs to Hadoop

• Connectors for many kinds of sources

• Performance and reliability


Page 21: Ten tools for ten big data areas 01 informatica

Side Effects - When It May Not Fit

• High price: 150K+ to start

• Challenged by ELT, which leverages the database for transformation. Informatica needs investment in an ETL server, and its pushdown optimization to the database has limitations.

• Scheduling, monitoring, and version control functions are limited

• BDE is relatively new, although the concept is great

• Alternatives - MS SSIS, Talend Studio, Pentaho Data Integration
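The ETL versus ELT trade-off mentioned in the list above can be sketched with SQLite standing in for the database. This is an assumption-laden illustration, not Informatica's pushdown feature: the `sales` table, its data, and the function names are hypothetical. In the ETL style the rows leave the database and are aggregated by the client process (the "ETL server"); in the ELT style the same work is expressed as SQL and runs inside the database.

```python
import sqlite3


def setup():
    """Create an in-memory database with a small hypothetical sales table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("east", 10.0), ("west", 5.0), ("east", 2.5)],
    )
    conn.execute("CREATE TABLE sales_by_region (region TEXT, total REAL)")
    return conn


def etl_style(conn):
    """ETL: pull rows out, aggregate in the client process, write results back."""
    sums = {}
    for region, amount in conn.execute("SELECT region, amount FROM sales"):
        sums[region] = sums.get(region, 0.0) + amount
    conn.executemany("INSERT INTO sales_by_region VALUES (?, ?)", sorted(sums.items()))


def elt_style(conn):
    """ELT: the database does the aggregation; no rows leave it."""
    conn.execute(
        "INSERT INTO sales_by_region "
        "SELECT region, SUM(amount) FROM sales GROUP BY region"
    )


def read_totals(conn):
    """Read back the aggregated target table in a stable order."""
    return conn.execute(
        "SELECT region, total FROM sales_by_region ORDER BY region"
    ).fetchall()
```

Both styles produce the same target table; the difference is where the computation runs and how much data has to move to get there.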


Page 22: Ten tools for ten big data areas 01 informatica

Informatica Job Trends

Level and positions

• Junior Level (20%): ETL developer, Informatica dev., DW developer

• Middle Level (40%): Sr. ETL developer, Data Specialist, ETL specialist, ETL designer, ETL Admin

• Expert Level (40%): Big data ETL dev., BDE developer, Informatica architect, Informatica consultant

Tool usage percentage

• PowerCenter: 80%

• Informatica Developer: 10%

• Other: 10%


Page 23: Ten tools for ten big data areas 01 informatica

www.sparkera.ca

BIG DATA is not only about data, but about understanding the data and how people actively use it to improve their lives.