How to Use the Right Tools for Operational Data Integration

Attribution-NonCommercial-No Derivative http://creativecommons.org/licenses/by-nc-nd/3.0/us/
How to Use the Right Tools for Operational Data Integration
Mark R. Madsen – March, 2009
http://ThirdNature.net

Upload: mark-madsen

Post on 27-Jan-2015


DESCRIPTION

Webcast on data integration outside the data warehouse, in operational contexts, and on how open source fits in this area. To download the slides or listen to a replay, look for "How to Use the Right Tools for Operational Data Integration" at http://www.talend.com/webinar/archive/

Detailed Description: Data integration tools were once used solely in support of data warehousing, but that has been changing over the past few years. The fastest growing area for data integration today is outside the data warehouse, whether it's one-time data movement for migrations and consolidations or real-time data synchronization for master data management projects. Data integration tools have proven to be faster, more flexible and more cost effective for operational data integration than the common practice of hand-coding or using application integration technologies. The developer focus of these technologies also makes them a prime target for open source commoditization.

During this presentation you will learn about the differences between analytical and operational data integration, technology patterns and options, and recommendations for how to begin using tools for operational data integration, including:
- How to map common project scenarios to integration architectures and tools
- The technology and market changes that favor use of tools for operational data integration
- The differing requirements for operational vs. analytic data integration
- Advantages of open source for data integration tasks

TRANSCRIPT

Page 1: How to Use the Right Tools for Operational Data Integration

Attribution-NonCommercial-No Derivative http://creativecommons.org/licenses/by-nc-nd/3.0/us/

How to Use the Right Tools for Operational Data Integration Mark R. Madsen – March, 2009 http://ThirdNature.net

Page 2: How to Use the Right Tools for Operational Data Integration

Slide 2, March 2009, Mark R. Madsen

What We’re Asked For

(simulation)

Page 3: How to Use the Right Tools for Operational Data Integration

How It Makes Us Feel

Page 4: How to Use the Right Tools for Operational Data Integration

How We Want to Feel

Page 5: How to Use the Right Tools for Operational Data Integration

Spending Priorities in IT

In 2007 and 2008 this is where the money went…but you can’t do most of these without data integration.

Sources: CIO Insight

Page 6: How to Use the Right Tools for Operational Data Integration

Technology Priorities in IT

Data integration moved up to the #3 spot for CIOs in 2008.

Sources: CIO Insight

Page 7: How to Use the Right Tools for Operational Data Integration

The Cost Problem Management Reacts To

Source: IDC

Page 8: How to Use the Right Tools for Operational Data Integration

Where We Often Are Today: Point to Point

Databases Documents Flat Files XML Services ERP Applications

Source Environments

Typical scenario:
• Disparate data
• Heterogeneous sources
• Point integration
• Minimal reuse
• No tools

Page 9: How to Use the Right Tools for Operational Data Integration

The Desired Future State

Databases Documents Flat Files XML Services ERP Applications

Source Environments

“Data as a platform” provides:
• Standards-based interfaces
• Single views of disparate source data
• Single point of access / integration
• Reuse of data

Data Platform

…but you can’t achieve this by writing more application code
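The gap between the point-to-point picture and the platform picture can be made concrete with a standard counting argument. It is not taken from the slides: if every one of n systems may need to exchange data with every other, the number of pairwise interfaces grows quadratically, while a shared data platform needs one connection per system. A minimal Python sketch, with illustrative function names:

```python
def point_to_point_interfaces(n_systems: int) -> int:
    """Worst case: every pair of systems gets its own hand-built interface."""
    return n_systems * (n_systems - 1) // 2


def platform_interfaces(n_systems: int) -> int:
    """Each system connects once to a shared data platform."""
    return n_systems


if __name__ == "__main__":
    for n in (3, 7, 20):
        print(n, point_to_point_interfaces(n), platform_interfaces(n))
```

Real environments rarely connect every pair, so the quadratic figure is an upper bound rather than a prediction.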

Page 10: How to Use the Right Tools for Operational Data Integration

Application versus Data Integration

Application Integration | Data Integration
Managing the flow of events | Managing the flow of data and access
Standardizes the transaction or service | Standardizes the data
Tools abstract the transport and system endpoints | Tools abstract the transport, system, representation and manipulation
Must write code at endpoints to manipulate data | Data structure, format and manipulation is abstracted
Focus on code - data as a byproduct | Focus on data - data as the product
Reusable functions, not data | Reusable data, not functions

Page 11: How to Use the Right Tools for Operational Data Integration

Analytic versus Operational Data Integration

Analytic | Operational
Most of a BI project’s effort is spent on data integration | Most of an application project is focused on features, not DI
Many disparate sources | One or a few sources
Generally unidirectional | One-way or bidirectional
Large data volumes | Large data volume for some, small volume for others
Usually loaded daily | Often loaded more often, varies based on project type
Low concurrency | Low to high concurrency
High latency | Low to high latency

Page 12: How to Use the Right Tools for Operational Data Integration

Architectural Models for Data Integration

[Diagram: two axes. Data Access Model: Physical vs. Virtual. Control: Distributed vs. Centralized.]

Page 13: How to Use the Right Tools for Operational Data Integration

Consolidation

Common operational DI scenarios where this model is appropriate:
• Migrations
• Upgrades
• Consolidations
• Managing master / reference data

Characteristics:
• Large data volumes to move or access
• One time data movement
• Usually unidirectional
• Transformation or cleansing required
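As an illustration of the consolidation characteristics listed here (one-time, unidirectional movement with cleansing), the following is a minimal Python sketch using the standard `sqlite3` module. The table names, columns and the cleansing rule are all hypothetical, and a DI tool would normally supply this plumbing rather than hand-written code.

```python
import sqlite3


def cleanse(name: str, country: str) -> tuple[str, str]:
    """Hypothetical cleansing rule: trim whitespace and normalise case."""
    return name.strip().title(), country.strip().upper()


def migrate(source: sqlite3.Connection, target: sqlite3.Connection) -> int:
    """Copy every legacy row into the target once, cleansing on the way."""
    target.execute(
        "CREATE TABLE IF NOT EXISTS customer "
        "(id INTEGER PRIMARY KEY, name TEXT, country TEXT)"
    )
    rows = source.execute("SELECT id, name, country FROM legacy_customer")
    cleaned = [(cid, *cleanse(name, country)) for cid, name, country in rows]
    with target:  # commits on success, rolls back on error
        target.executemany("INSERT INTO customer VALUES (?, ?, ?)", cleaned)
    return len(cleaned)


def demo() -> list[tuple]:
    """Build a tiny in-memory source, migrate it, and return the target rows."""
    source = sqlite3.connect(":memory:")
    source.execute(
        "CREATE TABLE legacy_customer (id INTEGER, name TEXT, country TEXT)"
    )
    source.executemany(
        "INSERT INTO legacy_customer VALUES (?, ?, ?)",
        [(1, "  ada lovelace ", "gb"), (2, "grace HOPPER", " us")],
    )
    target = sqlite3.connect(":memory:")
    migrate(source, target)
    return target.execute(
        "SELECT id, name, country FROM customer ORDER BY id"
    ).fetchall()
```

Calling `demo()` returns the two cleansed rows from the target database.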

Page 14: How to Use the Right Tools for Operational Data Integration

Propagation

Common scenarios:
• Copying data that can’t be accessed directly / remotely
• Synchronizing data
• Data cross-referencing
• Infrequent / one-time extracts

Characteristics:
• Can be one-way or bi-directional
• Often repetitive data movement
• Medium to large data volume (but not always)
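The repetitive, one-way case of propagation can be sketched as a small synchronization routine. This is an illustrative Python sketch only: the source and replica are modelled as dictionaries keyed by record id, which is an assumption made for brevity, and bi-directional synchronization would additionally need conflict resolution that is not shown.

```python
def propagate(source: dict, replica: dict) -> dict:
    """Bring `replica` in line with `source` and return a count of changes."""
    changes = {"inserted": 0, "updated": 0, "deleted": 0}
    for key, row in source.items():
        if key not in replica:
            replica[key] = row
            changes["inserted"] += 1
        elif replica[key] != row:
            replica[key] = row
            changes["updated"] += 1
    # Remove rows that no longer exist in the source.
    for key in list(replica.keys() - source.keys()):
        del replica[key]
        changes["deleted"] += 1
    return changes
```

Running the routine a second time with no intervening changes reports zero inserts, updates and deletes, which is the property that makes repeated propagation safe to schedule.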

Page 15: How to Use the Right Tools for Operational Data Integration

Federation

Common scenarios:
• Real-time / low latency data access
• Security / regulatory requirements that prevent copying data
• Impractical to create a central database (e.g. # sources, latency)
• Centralized data services

Characteristics:
• One-way
• Lower data volumes
• Higher concurrency
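Federation differs from the other two models in that nothing is copied: the combined view is assembled at request time from the live sources. A minimal Python sketch of that idea follows. The two in-memory stores and the lookup functions are hypothetical stand-ins for calls to separate systems.

```python
from typing import Callable, Optional

# Stand-ins for two separate operational systems (assumed for illustration).
CRM = {42: {"name": "Acme Ltd"}}
BILLING = {42: {"balance": 125.50}}


def crm_lookup(customer_id: int) -> Optional[dict]:
    return CRM.get(customer_id)


def billing_lookup(customer_id: int) -> Optional[dict]:
    return BILLING.get(customer_id)


def federated_customer(
    customer_id: int,
    sources: list[Callable[[int], Optional[dict]]],
) -> dict:
    """Build a single view by querying every source at request time."""
    view = {"id": customer_id}
    for lookup in sources:
        # A source that has no record contributes nothing to the view.
        view.update(lookup(customer_id) or {})
    return view
```

Because each call goes back to the sources, a change in the billing store is visible on the next request without any load step. The cost is that every request depends on the availability and response time of each source.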

Page 16: How to Use the Right Tools for Operational Data Integration

Choosing Models

There are some basic criteria and tradeoffs to consider:
• Data currency vs. latency
• Diversity of data sources
• Data cleansing & transformation
• Predictability of performance
• Access to the same data is needed via different interfaces
• Non-relational sources
• Frequency of access
• Data volumes
• And more…
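A first-pass filter over the three models can be written down as a rule of thumb. The Python sketch below is an assumption-laden simplification, not the speaker's method: it uses only three of the characteristics listed on the consolidation, propagation and federation slides, and the parameter names are invented for illustration.

```python
def suggest_model(
    *, copying_allowed: bool, needs_low_latency: bool, one_time: bool
) -> str:
    """Rule of thumb only; real choices weigh many more criteria."""
    # Federation: copying is prevented, or data must be read in near real time.
    if not copying_allowed or needs_low_latency:
        return "federation"
    # Consolidation: a one-time movement such as a migration or upgrade.
    if one_time:
        return "consolidation"
    # Propagation: repetitive movement or synchronization.
    return "propagation"
```

For example, `suggest_model(copying_allowed=True, needs_low_latency=False, one_time=True)` returns `"consolidation"`. Criteria such as data volume, source diversity and predictability of performance would need to be weighed separately.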

Page 17: How to Use the Right Tools for Operational Data Integration

A Handy Comparison Chart

Column headings: Criteria, Consolidation Model (Physical, Virtual)

Criteria:
• Data currency
• Query performance / latency
• Frequency of access
• Diversity of data sources
• Diversity of data types
• Non-relational data sources
• Transformation and cleansing
• Predictability of performance
• Multiple interfaces to same data
• Large query / data volume
• Need for history / aggregation

Page 18: How to Use the Right Tools for Operational Data Integration

Three Implementation Choices

• Write code! It’s fun! It’s easy! At first.
• Buy proprietary data integration tools
• Use available open source tools

Page 19: How to Use the Right Tools for Operational Data Integration

Hand-coded Integration

Why is this so common?
• DI is an afterthought on application projects
• It’s just data
• It’s hard to justify expensive tools for ODI
• Developers and DBAs don’t talk

The market is changing:
• Lower tolerance for the high cost of custom DI development and maintenance
• External data challenges
• Bad fit for consolidation projects

Products get better over time. Hand-written code gets worse.

Page 20: How to Use the Right Tools for Operational Data Integration

Buying Data Integration Tools

Buying is the usual alternative, mostly ETL tools.
• ETL vendors are branching out
• Many companies have ETL for BI

But…
• Poor fit for propagation and synchronization tasks
• Centralized servers
• Licensing costs / problems for consolidation tasks or broad use

Integration code is single-purpose, tools are multi-purpose. You should always go with tools – when you can afford them.

Page 21: How to Use the Right Tools for Operational Data Integration

Use of Tools vs. Hand Coding

[Bar chart: level of tool use (High Use, Medium Use, Low Use, None) for ETL, EDR, EII and EAI; vertical axis from 0% to 60%]

Source: TDWI, 2006

Page 22: How to Use the Right Tools for Operational Data Integration

Open Source: End of Buy vs. Build

Open source avoids the pitfalls of coding and gains the advantages of using tools.

• Tools can be distributed with little to no license restrictions

• Application projects budget for features, not glue

• Even basic tools have obvious operational advantages over hand-coding

Why build custom code when there are comparable tools available?

Page 23: How to Use the Right Tools for Operational Data Integration

Benefits Reported

After your organization adopted open source software, what was the primary benefit of its use?

• Flexibility: 31%
• Lower cost: 31%
• Reduced dependence on vendors: 15%
• Performance: 10%
• Reliability: 7%
• Security: 4%
• Other: 3%

Source: The 451 Group

Page 24: How to Use the Right Tools for Operational Data Integration

A Side Benefit of Flexibility

Comparison of time taken to evaluate tools

Source: Yankee Group

Page 25: How to Use the Right Tools for Operational Data Integration

Recommendations

1. Differentiate between analytic data integration and operational data integration

2. Stop hand-coding unless the problem really is trivial, and this includes table replication and DBA SQL scripts

3. Use the right data integration model for the problem

4. Augment existing data integration infrastructure with open source

5. Make open source the default option for data integration tools

Page 26: How to Use the Right Tools for Operational Data Integration

Creative Commons

Thanks to the people who made their images available via Creative Commons:
• red pill blue pill - http://www.flickr.com/photos/rcrowley/2540057217/
• red pill blue pill2 - http://www.flickr.com/photos/thomasthomas/258931782/
• happy dog jumping in meadow - http://flickr.com/photos/cenz/16128560/
• Writing code - http://flickr.com/photos/cdm/72250667/
• Woodworking - http://flickr.com/photos/rigoletto/126367565/
• Febo - http://flickr.com/photos/jshyun/1573065713/
• open_air_market_bologn - http://flickr.com/photos/pattchi/181259150/

Page 27: How to Use the Right Tools for Operational Data Integration

Thanks

Page 28: How to Use the Right Tools for Operational Data Integration

Creative Commons

This work is licensed under the Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/3.0/us/ or send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.