When QualityStage is a better ETL tool than DataStage



    Vincent McBurney, Feb 21, 2008

    QualityStage remains the undiscovered gem in the Information Server suite. I would go so far as to say it's the best standalone single purchase in the entire suite. Better than DataStage. And it costs the same as DataStage.

    In fact there is no other single data quality tool that can match it for scope and performance. It runs on a massively scalable parallel architecture, it has an intuitive GUI design across both ETL and data quality functions, and it has a huge variety of sources and targets including native parallel connectivity to Oracle, Teradata, SQL Server and DB2. You might find a combination of products from Trillium, Informatica, Ab Initio or Oracle that could match it, but then you would be dealing with more than one product and separate metadata repositories.

    This week IBM released two things: the QualityStage module for SERP 8.0, which provides Canada Post address certification and cheaper mailout rates, and a new free 900+ page RedBook, IBM WebSphere QualityStage Methodologies, Standardization, and Matching.

    QualityStage 6 and 7 were a bit dodgy - they were all wizard based and weren't truly client-server. More like client-MS Access-flat file-server. The best way to use QualityStage was to get DataStage and then shoehorn QualityStage into it using the plugin. The Access and file repositories became a problem in a multi-developer environment, with lost and conflicting changes and mismatched metadata. It was a bit like decorating a cake with paintball guns.

    Why QualityStage 8 is Light Years ahead of QualityStage 7

    QualityStage 8.0 changed all that. Much like Die Hard 4.0 it's a return to form for

    the product with the key ingredient being the Designer.

    Appearance: QualityStage 8.0 is built into the DataStage Designer, which is now awkwardly known as the DataStage and QualityStage Designer. This gives you the GUI data flow style interface and it lets you get to all settings via the GUI instead of having to dip into text files. True client-server with job locking, notification and release for multiple developers. New data quality bling bling such as frequency graphs and pass testing.

    ETL: With this release IBM ripped all the ETL steps out of QualityStage and replaced them with DataStage stages. QualityStage 8 has a subset of DataStage stages plus the data quality stages. It has all the source and target stages and the most popular parallel stages: Transformer, Lookup and Join. These are much more efficient, more powerful and easier to use than the old QualityStage functions. You get all the bonuses of DataStage: the source connector stages, the parallel framework, the common repository.

    QualityStage 8 is great standalone but most customers will probably be using it with Information Analyzer, Metadata Workbench and DataStage with the shared repository.

    Shopping for an ETL and data quality tool

    What should you buy when shopping for a data integration tool?

    - When you buy DataStage you get every ETL stage and no Quality stages.

    - When you buy QualityStage you get every Quality stage and most (but not

    all) ETL stages.

    - When you buy both products you get all stages.

    So there is a LOT of overlap between the two products, which is great for developers as DataStage developers already know about 50% of QualityStage. When you buy both you would expect a big discount due to the product overlap.

    When choosing between DataStage and QualityStage it's all about what

    QualityStage leaves out.

    When to Choose DataStage

    For starters there is the Slowly Changing Dimension stage that makes DataStage a

    better bet for Data Warehouses and dimensional models. In a Data Warehouse

    you might not want deep data quality processing - you might decide that really shit

    hot data quality work belongs in the source systems and not in the DW. The Slowly

    Changing Dimension stage will save you a lot of development time.

    Another feature of DataStage over QualityStage is the extensibility into custom

    stages and wrappers. If you are doing a lot of complex transformations and

    formulas you might prefer the ability to write special stages in DataStage. If you

    want to write a generic validation stage or a dynamic job that gets all its metadata

    from schema files you want DataStage.

    QualityStage breathes text, not air. Most of its special quality stages are about text

    fields - standardising them, matching them together. If you are mainly dealing with

    codes and numbers you might not need any of the quality stages.

    When to Choose QualityStage

    Most projects I've been on haven't needed custom stages or custom wrappers, so most of them would have been happy with QualityStage. This gives you all the quality stages up your sleeve for no extra cost. These days I would choose QualityStage by default and try to find a reason to switch to DataStage - or buy both!

    Data Migration is a big one for QualityStage as it lets you merge and clean your data

    for your brand spanking new application.

    QualityStage is essential to Master Data Management - in fact it's so important that IBM put it onto the InfoSphere MDM Server. This server has all of IBM's acquired MDM products on it - customer center and product center, plus future MDM centers. It comes with QualityStage - not DataStage. IBM is the only vendor bundling a data quality tool with an MDM tool and they can do it because QualityStage has so much in it. It's a fully fledged ETL tool with the merge, survivorship, standardisation and de-duplication functions required when you populate your MDM data. Plus it has SOA capabilities that go hand in hand with MDM SOA.

    The new Whopping Huge QualityStage Redbook

    QualityStage documentation can be a bit sparse in terms of examples and use

    cases. If only we had a 900 page guide that includes real examples, screen shots

    and product guidance. Oh, here's one: IBM WebSphere QualityStage

    Methodologies, Standardization, and Matching.

    This IBM Redbook publication documents the procedures for implementing IBM WebSphere QualityStage and related technologies using a typical merger/acquisition financial services business scenario. It is aimed at IT architects, Information Management specialists, and Information Integration specialists responsible for developing

    This is a massive PDF - over 900 pages and 12MB. It's a book and tutorial and research paper all in one. The authors are a team brought together from across the globe: Nagraj Alur (IBM project leader), Alok Kumar Jha (IBM software lab Bangalore), Barry Rosen (IBM Director in the Center for Excellence in Data Integration) and Torben Skov (IBM Denmark). You can apply for your own IBM residency to help write a RedBook at the residency information page.

    There is a bit of The Lion, the Witch and the Wardrobe in the QualityStage tool. It looks like DataStage but when you click on a data quality stage you are taken into a fantasy world of data quality with seemingly endless depth. Even though the palette looks like it holds just 10 stages, some of those stages are full of custom canvases, graphs, wizards, test forms etc.

    Below are just some of the functions covered by the RedBook with some quotations

    and screenshots borrowed.

    Address Cleansing

    The RedBook provides more detail on all the different types of address cleansing

    with examples and input and output data for each. The first two cost a bundle of

    extra money and the last two are free with QualityStage:

    - WAVES (Worldwide Address Verification and Enhancement System): corrects typographical and spelling errors, uses probabilistic matching of an address to a country specific reference file, covers 233 countries to the city level and 71 countries to the street level, resulting in more accurate mailing.

    - CASS (USA), SERP (Canada Post Software Evaluation and Recognition Program), DPID (Australia Post Delivery Point Identifier): validate and format data according to the standards of the postal body in each country, resulting in mail rate discounts.

    - MNS (Multinational Standardization Stage): looks at the text strings to work out how to standardize the country based on things like zip codes and country codes and separates the street and area information.

    - Country Rule Set: a specialised set of rules for just one country that can give you more control than MNS - especially for customisation and overrides.
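    To give a feel for what rule-based standardization of a text field involves, here is a minimal Python sketch. It is not QualityStage's rule set language and it is far simpler than any shipped rule set: the `STREET_TYPES` table, the `standardize_address` function and the output field names are all made up for illustration, and the postcode check is just an assumption that the last token is a 4 or 5 digit number.

```python
import re

# Hypothetical lookup table; a real country rule set would hold far more entries.
STREET_TYPES = {
    "st": "STREET", "str": "STREET", "street": "STREET",
    "rd": "ROAD", "road": "ROAD",
    "ave": "AVENUE", "avenue": "AVENUE",
}

def standardize_address(raw):
    """Split a free-text address into named, upper-cased fields."""
    tokens = re.findall(r"[A-Za-z0-9]+", raw)
    result = {"house_number": "", "street_name": "", "street_type": "",
              "locality": "", "postcode": ""}
    name_parts, locality_parts = [], []
    last = len(tokens) - 1
    for i, tok in enumerate(tokens):
        low = tok.lower()
        if i == 0 and tok.isdigit():
            result["house_number"] = tok
        elif i == last and tok.isdigit() and len(tok) in (4, 5):
            # Assumption: a trailing 4 or 5 digit token is a postcode.
            result["postcode"] = tok
        elif not result["street_type"] and low in STREET_TYPES and name_parts:
            # Map every known spelling to one standard street type.
            result["street_type"] = STREET_TYPES[low]
        elif not result["street_type"]:
            name_parts.append(tok.upper())
        else:
            # Anything after the street type is treated as area information.
            locality_parts.append(tok.upper())
    result["street_name"] = " ".join(name_parts)
    result["locality"] = " ".join(locality_parts)
    return result

print(standardize_address("12 main rd, Richmond 3121"))
```

    The point of the sketch is the shape of the work: tokenise the text, classify each token, and write standard values into separate columns that a later matching step can compare.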

    CASS, SERP and DPID offer the best certification on the market as they dig into a reference database from the official postal authority and deliver mail discounts where a match is found. WAVES is the next best as it also uses reference files to validate addresses and it can be purchased for individual countries, regions or worldwide. If you are not doing a lot of mailouts the standard out of the box Country Rule Set may be enough.

    As the RedBook shows, the standardization stages are seamlessly integrated with DataStage.

    Matching

    This is one of the deepest parts of the QualityStage tool as you delve into different ways to match or de-duplicate records. The tool lets you organise matches as a number of different passes as there are so many different ways to try and identify matches. The RedBook takes you through examples:

    Total statistics tab

    This tab provides you with statistical data in a graphical format for all the passes that you run.

    The cumulative statistics are of value only if you test multiple passes consecutively, in the order they appear in the match specification. The Total Statistics page displays the following information:

    - Cumulative statistics for the current runs of all passes in the match specification.

    - Individual statistics for the current run of each pass.

    - Charts that compare the statistics for the current run of all passes.
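    The idea of running several match passes in order, and tracking individual and cumulative statistics for each, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: QualityStage uses its own probabilistic weighting, while this sketch swaps in the standard library's `difflib.SequenceMatcher` ratio as a stand-in score. The `run_passes` function, the pass tuples and the sample records are all hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations

def similarity(a, b):
    # Stand-in score between 0 and 1; not QualityStage's probabilistic weight.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def run_passes(records, passes):
    """Run match passes in order.

    Each pass is (name, blocking_key_function, field_to_compare, cutoff).
    Only records that share a blocking key are compared with each other.
    """
    matched_pairs = set()
    stats = []
    for name, block_key, field, cutoff in passes:
        blocks = {}
        for rec in records:
            blocks.setdefault(block_key(rec), []).append(rec)
        new_pairs = 0
        for block in blocks.values():
            for a, b in combinations(block, 2):
                pair = (min(a["id"], b["id"]), max(a["id"], b["id"]))
                if pair in matched_pairs:
                    continue  # already found by an earlier pass
                if similarity(a[field], b[field]) >= cutoff:
                    matched_pairs.add(pair)
                    new_pairs += 1
        stats.append({"pass": name, "new_pairs": new_pairs,
                      "cumulative_pairs": len(matched_pairs)})
    return matched_pairs, stats

records = [
    {"id": 1, "name": "John Smith", "postcode": "3000", "phone": "555-0101"},
    {"id": 2, "name": "Jon Smith", "postcode": "3000", "phone": "555-0101"},
    {"id": 3, "name": "John Smith", "postcode": "3121", "phone": "555-0101"},
    {"id": 4, "name": "Mary Jones", "postcode": "3000", "phone": "555-0199"},
]
passes = [
    ("postcode then name", lambda r: r["postcode"], "name", 0.85),
    ("phone then name", lambda r: r["phone"], "name", 0.85),
]
pairs, stats = run_passes(records, passes)
for row in stats:
    print(row)
```

    The second pass picks up record 3, which the first pass could never see because it sits in a different postcode block. That is why the cumulative figures only make sense when the passes are run consecutively in order.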

    Survivorship

    Once you've done a match or a de-duplication you need to merge the records - this

    is known as survivorship. Only the best parts of each record should survive at the

    elimination council when all the votes get read.

    During the Survive stage, IBM WebSphere QualityStage takes the following actions:

    - Replaces existing data with better data from other records based on user specified rules

    - Supplies missing values in one record with values from other records on the same entity

    - Populates missing values in one record with values from corresponding records which have been identified as a group in the matching stage

    - Enriches existing data with external data
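    The survivorship actions listed above can be illustrated with a small Python sketch that builds one surviving record from a group of matched records, applying one rule per column. The rule functions (`longest`, `most_frequent`, `most_recent`), the `survive` function and the `updated` column are hypothetical names chosen for this example, not the Survive stage's own rule syntax.

```python
from collections import Counter

def longest(column, records):
    # Keep the longest value, on the assumption that it is the most complete.
    return max(records, key=lambda r: len(r[column]))[column]

def most_frequent(column, records):
    # Keep the value that appears most often in the group.
    return Counter(r[column] for r in records).most_common(1)[0][0]

def most_recent(column, records):
    # Keep the value from the most recently updated record.
    # Assumes an ISO formatted 'updated' column so string order is date order.
    return max(records, key=lambda r: r["updated"])[column]

def survive(group, rules):
    """Build one surviving record from a group of matched records."""
    survivor = {}
    for column, rule in rules.items():
        # Blank values never survive; this is how missing values get filled
        # from other records in the same group.
        candidates = [r for r in group if r.get(column)]
        survivor[column] = rule(column, candidates) if candidates else ""
    return survivor

group = [
    {"name": "J Smith", "phone": "", "email": "js@example.com", "updated": "2007-01-10"},
    {"name": "John Smith", "phone": "555-0101", "email": "john@example.com", "updated": "2008-02-01"},
    {"name": "John Smith", "phone": "555-0101", "email": "", "updated": "2006-05-05"},
]
rules = {"name": longest, "phone": most_frequent, "email": most_recent}
print(survive(group, rules))
```

    Each column can survive from a different source record, which is the difference between survivorship and simply keeping one of the duplicates.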

    As usual with a QualityStage feature you can choose the standard functions or dig

    deep for advanced functions:

    When you configure the Survive stage, you choose simple rules that are provided in the New Rules window or you select the Complex Survive Expression to create your own custom rules. You use some or all of the columns from the source file, add a rule to each column, and apply the data.

    The RedBook has examples for both simple rules and complex rules.

    The Wrap

    You cannot show everything QualityStage can do in one blog post or even one 900 page RedBook, but the RedBook will be a lot of help to new QualityStage developers who want to see real examples with screenshots. There is a lot of fun developers can have with QualityStage - it's got a lot more depth to each stage than a standard ETL function.

    Disclaimer: The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.

    Vincent McBurney is an IBM Information Champion for Information Integration.


    Robert Rich Feb 22, 2008

    Vincent,

    Great post.

    We totally agree with you which is why we're investing in solutions that plug into and

    leverage the platform.

    Robert

    dialntsdf05 Jun 12, 2008

    Thanks Vincent for this interesting post. I'm currently working as a PM in BI and I'm looking for an installation doc on DataStage (ETL, QualityStage and ProfileStage). Do you have any material on the subject please?

    Friendly

    Dial

    Vincent McBurney Jun 16, 2008

    The good news is that IBM have published a lot of information about installing the

    Information Server. Start at the Information Server Home and you'll see HTML

    documentation on installing, migrating and using the Information Server.

    sudha Aug 26, 2008

    Excellent article

    Ritu Sethi Jan 15, 2009

    very Informative Article

    kevindewhurst Jun 12, 2009

    I know this blog post is a little old but wanted to add that DataQualityFirst has now

    enhanced the capabilities of QualityStage and provided an accelerator that most

    feel should be used on every QualityStage implementation! www.dataqualityfirst.com

    Think QStage was powerful before? Try it with PartyQualityInsight!

    Sujata Bhattacharya Jun 30, 2009

    Hi Vincent

    This is an excellent posting on the strength and capabilities of Quality Stage. I love

    the product and have been working with the product for 10 plus years.

    -Sujata

    friendkak friend Jul 31, 2009

    Hi, what are the best practices that can be implemented using QS? Please post a few. Thanks,

    USER_1847760 Jan 6, 2010

    It is a good post by Vincent which gives a good idea about QualityStage.

    Malini Lakhani Sep 17, 2010

    What are the pros and cons of using Routines vs Rule sets for data validations in QS?

    balu balu Nov 17, 2011

    Dear Vincent,

    Very helpful article for those starting out with QS. Could you please let me know the place/path where we can find the sample files for QualityStage?



    Copyright 1998-2015 Ziff Davis, LLC (Toolbox.com). All rights reserved. All product names are trademarks of their respective companies. Toolbox.com is not affiliated with or endorsed by any company listed at this site.