data rules primer - kb.informatica.com · ix preface welcome to data explorer, the informatica...

120
Data Rules Primer Informatica Data Explorer (Version 9.0)

Upload: others

Post on 23-Aug-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

Data Rules Primer

Informatica Data Explorer(Version 9.0)

Page 2: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

Informatica Data Explorer Data Rules PrimerVersion 9.0.0

December 2009

Copyright (c) 1998-2009 Informatica Corporation. All rights reserved.

This software and documentation contain proprietary information of Informatica Corporation and are provided under a license agreement containing restrictions on use and disclosure and are also protected by copyright law. Reverse engineering of the software is prohibited. No part of this document may be reproduced or transmitted in any form, by any means (electronic, photocopying, recording or otherwise) without prior consent of Informatica Corporation. This Software may be protected by U.S. and/or international Patents and other Patents Pending.

Use, duplication, or disclosure of the Software by the U.S. Government is subject to the restrictions set forth in the applicable software license agreement and as provided in DFARS 227.7202-1(a) and 227.7702-3(a) (1995), DFARS 252.227-7013(c)(1)(ii) (OCT 1988), FAR 12.212(a) (1995), FAR 52.227-19, or FAR 52.227-14 (ALT III), as applicable.

The information in this product or documentation is subject to change without notice. If you find any problems in this product or documentation, please report them to us in writing.

Informatica, Informatica Platform, Informatica Data Services, PowerCenter, PowerCenterRT, PowerCenter Connect, PowerCenter Data Analyzer, PowerExchange, PowerMart, Metadata Manager, Informatica Data Quality, Informatica Data Explorer, Informatica B2B Data Transformation, Informatica B2B Data Exchange and Informatica On Demand are trademarks or registered trademarks of Informatica Corporation in the United States and in jurisdictions throughout the world. All other company and product names may be trade names or trademarks of their respective owners.

Portions of this software and/or documentation are subject to copyright held by third parties, including without limitation: Copyright DataDirect Technologies. All rights reserved. Copyright © Sun Microsystems. All rights reserved. Copyright © RSA Security Inc. All Rights Reserved. Copyright © Ordinal Technology Corp. All rights reserved. Copyright © Aandacht c.v. All rights reserved. Copyright Genivia, Inc. All rights reserved. Copyright 2007 Isomorphic Software. All rights reserved. Copyright © Meta Integration Technology, Inc. All rights reserved. Copyright © Intalio. All rights reserved. Copyright © Oracle. All rights reserved. Copyright © Adobe Systems Incorporated. All rights reserved. Copyright © DataArt, Inc. All rights reserved. Copyright © ComponentSource. All rights reserved. Copyright © Microsoft Corporation. All rights reserved. Copyright © Rouge Wave Software, Inc. All rights reserved. Copyright © Teradata Corporation. All rights reserved. Copyright © Yahoo! Inc. All rights reserved. Copyright © Glyph & Cog, LLC. All rights reserved.

This product includes software developed by the Apache Software Foundation (http://www.apache.org/), and other software which is licensed under the Apache License, Version 2.0 (the "License"). You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0. Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

This product includes software which was developed by Mozilla (http://www.mozilla.org/), software copyright The JBoss Group, LLC, all rights reserved; software copyright © 1999-2006 by Bruno Lowagie and Paulo Soares and other software which is licensed under the GNU Lesser General Public License Agreement, which may be found at http://www.gnu.org/licenses/lgpl.html. The materials are provided free of charge by Informatica, "as-is", without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability and fitness for a particular purpose.

The product includes ACE(TM) and TAO(TM) software copyrighted by Douglas C. Schmidt and his research group at Washington University, University of California, Irvine, and Vanderbilt University, Copyright (c) 1993-2006, all rights reserved.

This product includes software developed by the OpenSSL Project for use in the OpenSSL Toolkit (copyright The OpenSSL Project. All Rights Reserved) and redistribution of this software is subject to terms available at http://www.openssl.org.

This product includes Curl software which is Copyright 1996-2007, Daniel Stenberg, <[email protected]>. All Rights Reserved. Permissions and limitations regarding this software are subject to terms available at http://curl.haxx.se/docs/copyright.html. Permission to use, copy, modify, and distribute this software for any purpose with or without fee is hereby granted, provided that the above copyright notice and this permission notice appear in all copies.

The product includes software copyright 2001-2005 (C) MetaStuff, Ltd. All Rights Reserved. Permissions and limitations regarding this software are subject to terms available at http://www.dom4j.org/ license.html.

The product includes software copyright (c) 2004-2007, The Dojo Foundation. All Rights Reserved. Permissions and limitations regarding this software are subject to terms available at http:// svn.dojotoolkit.org/dojo/trunk/LICENSE.

This product includes ICU software which is copyright International Business Machines Corporation and others. All rights reserved. Permissions and limitations regarding this software are subject to terms available at http://source.icu-project.org/repos/icu/icu/trunk/license.html.

This product includes software copyright (C) 1996-2006 Per Bothner. All rights reserved. Your right to use such materials is set forth in the license which may be found at http://www.gnu.org/software/ kawa/Software-License.html.

This product includes OSSP UUID software which is Copyright (c) 2002 Ralf S. Engelschall, Copyright (c) 2002 The OSSP Project Copyright (c) 2002 Cable & Wireless Deutschland. Permissions and limitations regarding this software are subject to terms available at http://www.opensource.org/licenses/mit-license.php.

This product includes software developed by Boost (http://www.boost.org/) or under the Boost software license. Permissions and limitations regarding this software are subject to terms available at http:/ /www.boost.org/LICENSE_1_0.txt.

This product includes software copyright © 1997-2007 University of Cambridge. Permissions and limitations regarding this software are subject to terms available at http://www.pcre.org/license.txt.

This product includes software copyright (c) 2007 The Eclipse Foundation. All Rights Reserved. Permissions and limitations regarding this software are subject to terms available at http:// www.eclipse.org/org/documents/epl-v10.php.

Page 3: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

This product includes software licensed under the terms at http://www.tcl.tk/software/tcltk/license.html, http://www.bosrup.com/web/overlib/?License, http://www.stlport.org/doc/license.html, http://www.asm.ow2.org/license.html, http://www.cryptix.org/LICENSE.TXT, http://hsqldb.org/web/hsqlLicense.html, http://httpunit.sourceforge.net/doc/license.html, http://jung.sourceforge.net/license.txt , http://www.gzip.org/zlib/zlib_license.html, and http://www.openldap.org/software/release/license.html, http://www.libssh2.org, http://slf4j.org/license.html., and http://www.sente.ch/software/OpenSourceLicense.html

This product includes software licensed under the Academic Free License (http://www.opensource.org/licenses/afl-3.0.php), the Common Development and Distribution License (http://www.opensource.org/licenses/cddl1.php) the Common Public License (http://www.opensource.org/licenses/cpl1.0.php) and the BSD License (http://www.opensource.org/licenses/bsd-license.php).

This product includes software copyright © 2003-2006 Joe WaInes, 2006-2007 XStream Committers. All rights reserved. Permissions and limitations regarding this software are subject to terms available at http://xstream.codehaus.org/license.html. This product includes software developed by the Indiana University Extreme! Lab. For further information please visit http://www.extreme.indiana.edu/.

This Software is protected by U.S. Patent Numbers 5,794,246; 6,014,670; 6,016,501; 6,029,178; 6,032,158; 6,035,307; 6,044,374; 6,092,086; 6,208,990; 6,339,775; 6,640,226; 6,789,096; 6,820,077; 6,823,373; 6,850,947; 6,895,471; 7,117,215; 7,162,643; 7,254,590; 7,281,001; 7,421,458; and 7,584,422, international Patents and other Patents Pending..

DISCLAIMER: Informatica Corporation provides this documentation "as is" without warranty of any kind, either express or implied, including, but not limited to, the implied warranties of non-infringement, merchantability, or use for a particular purpose. Informatica Corporation does not warrant that this software or documentation is error free. The information provided in this software or documentation may include technical inaccuracies or typographical errors. The information in this software and documentation is subject to change at any time without notice.

NOTICES

This Informatica product (the “Software”) includes certain drivers (the “DataDirect Drivers”) from DataDirect Technologies, an operating company of Progress Software Corporation (“DataDirect”) which are subject to the following terms and conditions:

1. THE DATADIRECT DRIVERS ARE PROVIDED “AS IS” WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.

2. IN NO EVENT WILL DATADIRECT OR ITS THIRD PARTY SUPPLIERS BE LIABLE TO THE END-USER CUSTOMER FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, CONSEQUENTIAL OR OTHER DAMAGES ARISING OUT OF THE USE OF THE ODBC DRIVERS, WHETHER OR NOT INFORMED OF THE POSSIBILITIES OF DAMAGES IN ADVANCE. THESE LIMITATIONS APPLY TO ALL CAUSES OF ACTION, INCLUDING, WITHOUT LIMITATION, BREACH OF CONTRACT, BREACH OF WARRANTY, NEGLIGENCE, STRICT LIABILITY, MISREPRESENTATION AND OTHER TORTS.

Part Number: IDE-DRP-90000-0001

Page 4: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data
Page 5: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

Table of Contents

Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix

Informatica Resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . x

Informatica Customer Portal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . x

Informatica Documentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . x

Informatica Web Site . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . x

Informatica How-To Library . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . x

Informatica Knowledge Base . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . x

Informatica Multimedia Knowledge Base . . . . . . . . . . . . . . . . . . . . . . . . xi

Informatica Global Customer Support . . . . . . . . . . . . . . . . . . . . . . . . . . xi

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xii

Chapter 1: Introduction to Data Rules . . . . . . . . . . . . . . . . . . . . . . . . . 1

Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

Introduction to Data Explorer Data Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

About the Data Rules Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

Restrictions and Extensions in Data Explorer Data Rules . . . . . . . . . . . . . . . . 5

Choosing Between Data Rules and Drill Down . . . . . . . . . . . . . . . . . . . . . . . 6

Using Negative Logic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

Chapter 2: Invalid, Missing, or Nonstandard Values . . . . . . . . . . . . . . 9

Finding Invalid Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

Finding Missing Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

Finding Nonstandard Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

Using Length as a Quality Indicator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

Chapter 3: Nonstandard Formats . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

Pattern Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

Verifying the Format of a Field . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

Finding Dates in Nonstandard Formats . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

Verifying Character Formats for Varchar Strings . . . . . . . . . . . . . . . . . . . . . . 24

Verifying a Pattern Embedded in a Column . . . . . . . . . . . . . . . . . . . . . . . . . 25

Escaping from the Normal Pattern Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

Improving Performance With Patterns . . . . . . . . . . . . . . . . . . . . . . . . . . 27

v

Page 6: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

Chapter 4: Validating Date and Time Values . . . . . . . . . . . . . . . . . . . . 29

Using the CHECKDATE Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

Using the DATE Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

Pros and Cons of Providing the Date Format . . . . . . . . . . . . . . . . . . . . . 32

Using the DATE Function Without Specifying a Date Format . . . . . . . . . 34

Using the EXTRACT Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

Comparing Date and Time Fields to Literal Values . . . . . . . . . . . . . . . . . . . . 40

Date Tag Formats . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

Chapter 5: Calculations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .45

Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

Chapter 6: Interactions Between Multiple Fields . . . . . . . . . . . . . . . . 49

Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

Chapter 7: Special Topics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .57

NULL Value Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58

Example: Impact of NULLS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58

Caution on Using NULLS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

Using the ROWNUM( ) Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

Performance Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62

SQL Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63

Appendix A: Exploring Data with Data Explorer . . . . . . . . . . . . . . . . 65

Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66

Working with Data Rules in the Tag Viewer . . . . . . . . . . . . . . . . . . . . . . . . . 67

Context Menu Options for a Rule in the Tag Viewer . . . . . . . . . . . . . . . 69

Creating a Data Rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70

Specifying Data Source and Sample Data Options for Data Rule Execution . . 75

Sample Data Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75

Search Limits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76

Executing Data Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77

Results of Data Rule Execution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81

Editing a Data Rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82

vi Table of Contents

Page 7: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

Appendix B: WHERE Clauses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83

Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84

RDBMS WHERE Clauses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85

Sample RDBMS WHERE Clauses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85

Data Explorer Standard SQL Syntax WHERE Clauses . . . . . . . . . . . . . . . . . 86

About NULLs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89

Operator Order of Precedence. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89

IS PATTERN and IS NOT PATTERN . . . . . . . . . . . . . . . . . . . . . . . . . 90

Sample Data Explorer Standard SQL WHERE Clauses . . . . . . . . . . . . . . 91

Data Explorer Standard SQL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92

Translation Templates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92

Drill Down WHERE Clause Template . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95

Generic and Specific Drill Down Templates . . . . . . . . . . . . . . . . . . . . . . 95

Logical Operators in Generic Templates . . . . . . . . . . . . . . . . . . . . . . . . 96

Drill Down Template Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98

Customizing Drill Down Templates . . . . . . . . . . . . . . . . . . . . . . . . . . . 99

Appendix C: Troubleshooting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101

Common Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102

Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107

Table of Contents vii

Page 8: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

viii Table of Contents

Page 9: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

Preface

Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data through data profiling. Data Explorer is a complete set of data investigation, discovery, and mapping tools that allows you to quickly analyze multiple source systems and eases the process of scoping the size and complexity of a migration or data integration project.

ix

Page 10: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

Informatica Resources

Informatica Customer PortalAs an Informatica customer, you can access the Informatica Customer Portal site at http://my.informatica.com. The site contains product information, user group information, newsletters, access to the Informatica customer support case management system (ATLAS), the Informatica How-To Library, the Informatica Knowledge Base, the Informatica Multimedia Knowledge Base, Informatica Documentation Center, and access to the Informatica user community.

Informatica DocumentationThe Informatica Documentation team takes every effort to create accurate, usable documentation. If you have questions, comments, or ideas about this documentation, contact the Informatica Documentation team through email at [email protected]. We will use your feedback to improve our documentation. Let us know if we can contact you regarding your comments.

The Documentation team updates documentation as needed. To get the latest documentation for your product, navigate to the Informatica Documentation Center from http://my.informatica.com.

Informatica Web SiteYou can access the Informatica corporate web site at http://www.informatica.com. The site contains information about Informatica, its background, upcoming events, and sales offices. You will also find product and partner information. The services area of the site includes important information about technical support, training and education, and implementation services.

Informatica How-To LibraryAs an Informatica customer, you can access the Informatica How-To Library at http://my.informatica.com. The How-To Library is a collection of resources to help you learn more about Informatica products and features. It includes articles and interactive demonstrations that provide solutions to common problems, compare features and behaviors, and guide you through performing specific real-world tasks.

Informatica Knowledge BaseAs an Informatica customer, you can access the Informatica Knowledge Base at http://my.informatica.com. Use the Knowledge Base to search for documented solutions to known technical issues about Informatica products. You can also find answers to frequently asked questions, technical white papers, and technical tips. If you have questions, comments, or

x : Preface

Page 11: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

ideas about the Knowledge Base, contact the Informatica Knowledge Base team through email at [email protected].

Informatica Multimedia Knowledge BaseAs an Informatica customer, you can access the Informatica Multimedia Knowledge Base at http://my.informatica.com. The Multimedia Knowledge Base is a collection of instructional multimedia files that help you learn about common concepts and guide you through performing specific tasks. If you have questions, comments, or ideas about the Multimedia Knowledge Base, contact the Informatica Knowledge Base team through email at [email protected].

Informatica Global Customer SupportYou can contact a Customer Support Center by telephone or through the WebSupport Service. WebSupport requires a user name and password. You can request a user name and password at http://my.informatica.com.

Use the following telephone numbers to contact Informatica Global Customer Support:

North America / South America Europe / Middle East / Africa Asia / Australia

Toll Free +1 877 463 2435

Standard RateBrazil: +55 11 3523 7761 Mexico: +52 55 1168 9763 United States: +1 650 385 5800

Toll Free 00 800 4632 4357

Standard RateBelgium: +32 15 281 702France: +33 1 41 38 92 26Germany: +49 1805 702 702Netherlands: +31 306 022 797Spain and Portugal: +34 93 480 3760United Kingdom: +44 1628 511 445

Toll Free Australia: 1 800 151 830Singapore: 001 800 4632 4357

Standard RateIndia: +91 80 4112 5738

Informatica Resources xi

Page 12: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

xii : Preface

Page 13: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

C h a p t e r 1

Introduction to Data Rules

This chapter includes the following topics:

♦ Overview, 2

♦ Introduction to Data Explorer Data Rules, 3

♦ About the Data Rules Syntax, 4

♦ Restrictions and Extensions in Data Explorer Data Rules, 5

♦ Choosing Between Data Rules and Drill Down, 6

♦ Using Negative Logic, 7

1

Page 14: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

Overview

Data Explorer has a Data Rules feature that lets you create, save, and execute Data Rules against a Relation. This new functionality, which is implemented within the Rules tag, allows you to systematically explore and validate your data.

The Data Rules features were designed to run against any data source that can be accessed by of a Data Explorer importer.

This document assumes that you are familiar with SQL.

2 Chapter 1: Introduction to Data Rules

Page 15: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

Introduction to Data Explorer Data Rules

Data Explorer Data Rules provide a new set of functions to complement the leading-edge profiling capabilities of the Informatica Data Explorer. Data Rules let you explore data in ways that are not possible with column profiling alone. You can specify data relationships that should hold, based on business requirements, and can verify that those relationships do indeed hold on different instances of tables and flat files.

Data Rules are not business rules. By most definitions, business rules imply actions to be taken when given situations are triggered. In contrast, Data Explorer Data Rules are used to encode business requirements as data relationships.

The Data Explorer Data Rules functionality lets you encode business requirements in SQL-like syntax and then verify that the data relationships implied hold for any given instance of implicated data.

There are two places to look for information on the Data Explorer Data Rules functionality. First, this document provides a hands-on, example-driven guide to how to use Data Explorer Data Rules.

Introduction to Data Explorer Data Rules 3

Page 16: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

About the Data Rules Syntax

The Data Explorer Data Rule language is called RRuL (Relational Rule Language). RRuL is based on SQL.

Data Explorer had two conflicting goals when developing RRuL language syntax. First, the language had to be easy to use. Ideally, even non-technical users should be able to build and execute Data Rules. Second, the language had to be relationally complete, or in other words, could express any data relationship that could be defined by standard relational database operations.

After examining a variety of language types, we finally decided on a SQL-like syntax for RRuL. While it may not be the easiest language for non-technical users, many users of the Informatica Data Explorer already have some familiarity with SQL. In addition, SQL is the standard of relational database language. Virtually every relational database available, from desktop engines to large mainframe systems, supports SQL in some form. SQL is relationally complete. It can express any standard data relationship.

For these reasons, RRuL is based on the ANSI SQL/92 standard. Initially, there are several restrictions. This version of Data Explorer does not support the full standard yet. However, to satisfy the needs of data profiling users, Data Explorer provides some extensions to the standard (as have most database vendors as well). So, the Data Explorer RRuL language is an extended subset of ANSI SQL/92. For more information about language restrictions and extensions, see “Restrictions and Extensions in Data Explorer Data Rules” on page 5 and “WHERE Clauses” on page 83.

A Data Rule expressed in RRuL has the following general form:

RULE <rule-name>

SELECT * | <column-name-list>

FROM <table implied by context>

WHERE <where-clause>

A Data Rule contains a rule header (the RULE statement) followed by the rule body (the SELECT statement). In the this implementation of Data Rules, the table name is implied by the context of the rule. There will always be a current data source against which the Data Rule is to be run. That data source determines the table/Relation. For more information about restrictions for the WHERE clause, see “Restrictions and Extensions in Data Explorer Data Rules” on page 5 and “WHERE Clauses” on page 83.

4 Chapter 1: Introduction to Data Rules

Page 17: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

Restrictions and Extensions in Data Explorer Data Rules

The Data Explorer Data Rules feature does not support full ANSI SQL/92. The restrictions are determined by the data source against which you are running the Data Rule. There are two classes of environments in which Data Rules can be executed: Data Explorer flat files and RDBMSs. These environments determine the syntax restrictions for a given instance of a Data Rule.

In Data Explorer flat files, Data Explorer executes the SQL associated with the Data Rule internally. Running against flat files, you are restricted to the Data Explorer Standard SQL syntax. In general you are restricted to single table queries without aggregation, GROUP BY, ORDER BY, joins, or sub-queries, for more information on Data Explorer Standard SQL, see “WHERE Clauses” on page 83. We plan to reduce these restrictions in future releases.

In RDBMS execution, when your source is in an RDBMS instead of Data Explorer flat files, you can use some of the additional features provided by the database engine. In this case, the query will be passed through to the source database engine for execution. You still cannot use join syntax or aggregations, but you can use sub-queries. In addition, you can use scalar functions that are not necessarily valid in the Data Explorer Standard SQL Syntax.

The Data Explorer Standard SQL extension for PATTERN matching is available both for Data Explorer flat files and for RDBMS sources. With an RDBMS source, Data Explorer determines how to execute a query that returns as few rows as possible, based on the entirety of the WHERE clause. Then, Data Explorer applies pattern matching logic on the results set.

For maximum compatibility with future Data Explorer releases, you should code your Data Rules to conform to Data Explorer Standard SQL Syntax. In future releases, new and enhanced functionality may not support query pass-through. To aid this, there is a function in the rule entry dialog box to validate your rule against Data Explorer Standard SQL Syntax.

Restrictions and Extensions in Data Explorer Data Rules 5

Page 18: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

Choosing Between Data Rules and Drill Down

Although there is some similarity in this release of Data Explorer between the Drill Down capability and Data Rule execution, the two features are intended for entirely different applications.

Drill Down is primarily intended for the analyst to be able to view source data rows from various places in Data Explorer where summarized results are presented, such as value/frequency pairs in column profiling. This allows for in-depth analysis of the results of profiling activities in Data Explorer. Nonetheless, a Drill Down query can be saved as a Data Rule. For some rules, such as domain restriction constraints, this is a convenient way to get started developing a rule.

Data Rules are derived from the business requirements of the user. They express relationships that should hold in the data throughout time. Users can code and retain these rules and verify that multiple instances of a data source conform to the desired constraints. Since Data Rules are anchored to some object in a Schema, if you remove an object for any reason, you will lose the Data Rules you have associated with it. You can import a new set of sample data from the same data source or a new one without losing the saved rules.

6 Chapter 1: Introduction to Data Rules

Page 19: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

Using Negative Logic

Many of the Data Rules that are most useful in business situations are called constraints. These rules test whether a given relationship exists in the data when the rule is executed. Currently, Data Explorer only supports row-level queries, so the constraints that we are dealing with are row-level constraints.

There are several types of row-level constraints that are common in Data Rules:

♦ All. All rows in the result set must conform to the constraint specified.

♦ None. No rows in the result set can exist which match the rule selection criteria.

♦ Existence. Some rows (sometimes just one row) must exist that satisfy the expressed constraint.

All rules and None rules are very similar in nature. All rules can be expressed as None rules (and vice versa) by using negative logic in the WHERE clause.

An example of an All rule might be: For all rows in a personnel file, the hire date must be after the birth date. For a rule like that, we attempt to create an exception table. It is much more interesting to look at the rows that violate the rule than at the rules that satisfy the constraint. Normally, with a well-constructed rule, for most situations, there will be dramatically fewer exceptions. So for an All rule, we typically use negative logic in the WHERE clause to produce the exception table. To create the rule we just listed, we might code syntax something like:

RULE testdate01

SELECT *

FROM <table>

WHERE HireDate < BirthDate

As a result of running this rule, we will see a table that consists of all rows where the employee was born after they were hired. This table might be empty, indicating that no rows violate the rule.

An example of a None rule might be: No part time employees should be in Department 19. Again, we want to create an exception table, so we use reverse logic and code the rule like the following:

RULE nopart19

SELECT Name, Department, Supervisor

FROM <table>

WHERE EmpType = 'PartTime' AND DeptNo = 19

This rule will produce a table listing all part time employees in Department 19.

The third kind of rule that is of interest is the Existence rule. In this case, we are testing to validate that some row does exist that satisfies the given constraint. For example, there must be at least one temporary employee in Department 19. In this case, we do not code the negative of the desired condition. Instead of an exception table, we are creating an “existence

Using Negative Logic 7

Page 20: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

table” that demonstrates that the desired condition is met. For this rule, we might code the following:

RULE musthavetemp19

SELECT *

FROM <table>

WHERE EmpType = 'Temp' AND DeptNo = 19

The form of the WHERE clause in this rule is the same as in the previous rule, which was a None rule. This demonstrates that it is up to the user to evaluate the results of a given Data Rule execution to determine the quality of the data.

8 Chapter 1: Introduction to Data Rules

Page 21: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

C h a p t e r 2

Invalid, Missing, or Nonstandard Values

This chapter includes the following topics:

♦ Finding Invalid Values, 10

♦ Finding Missing Values, 14

♦ Finding Nonstandard Values, 16

♦ Using Length as a Quality Indicator, 18

9

Page 22: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

Finding Invalid Values

In this example, a company’s business rules dictate that the Employee_ID always starts with the two-character Region_code of the employee’s home office. To work on this data quality problem one region at a time, you could create a rule for each home office:

RULE "Region AA EmpID check"

SELECT "EMPID","LAST_NAME","REGION"

FROM <Use Table in Data Source>

WHERE Region = 'AA' and empid not like 'AA%'

RULE "Region AB EmpId check"

SELECT "EMPID","LAST_NAME","REGION"

FROM <Use Table in Data Source>

WHERE Region = 'AB' and empid not like 'AB%'

RULE "region DF EmpId check"

SELECT "EMPID","LAST_NAME","REGION"

FROM <Use Table in Data Source>

WHERE Region = 'DF' and empid not like 'DF%'

10 Chapter 2: Invalid, Missing, or Nonstandard Values

Page 23: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

The data fragment for this example is shown below.

Using this sample data, the Region AA rule returns this row:

Finding Invalid Values 11

Page 24: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

And the Region AB rule returns these rows:

The Region DF rule finds no rows.

Depending on the number of regions the company has, and if perhaps the plan is for each region’s administrative staff to work on the follow-up effort to clean up the data, this approach might be sufficient. If, instead, you want an overview of all Employee_IDs that do not conform, you could construct a more sophisticated rule using a substring function:

RULE "Region vs EmpId"

SELECT "EMPID","LAST_NAME","REGION"

FROM <Use Table in Data Source>

WHERE (SUBSTRING(EMPID from 1 for 2)

<> SUBSTRING (REGION from 1 for 2))

or EMPID is null or REGION is null

This rule would return:

Notice that this approach has the advantage of also highlighting any rows where the region is simply invalid.

Keep in mind that both the LIKE function and the SUBSTRING function are only valid with character strings. Numeric fields, dates, and times must be approached differently. In the case of numeric fields, the NOT BETWEEN construction could be used to build a Data Rule for

12 Chapter 2: Invalid, Missing, or Nonstandard Values

Page 25: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

each region, instead of using NOT LIKE. For more information on using Data Rules with dates, see “Validating Date and Time Values” on page 29.

Finding Invalid Values 13

Page 26: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

Finding Missing Values

In this example, a company’s business rule demands that every vendor must have a non-null City, State, and ZIP. To see if this is true in the data, use the IS NULL function.

RULE "city|state|zip are null"

SELECT "Vendor_NO","Vendor_city","Vendor_state","Vendor_zip"

FROM <Use Table in Data Source>

WHERE Vendor_zip IS NULL

or Vendor_city is null

or Vendor_state is null

When applied against this data:

14 Chapter 2: Invalid, Missing, or Nonstandard Values

Page 27: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

The rule would return these rows:

Finding Missing Values 15

Page 28: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

Finding Nonstandard Values

In this example, a company’s business rules allow premiums to be paid monthly, quarterly, semi-annually, or annually. To discover and examine those policies that do not conform, use a rule like this:

RULE "PremPmt Mode Valid"

SELECT "Policy_ID","Policy_issuedate","PremPmt_Mode"

FROM <Use Table in Data Source>

WHERE PremPmt_Mode not in ('M','Q','S','A')

Note: In this example, the column PremPmt_Mode is non-nullable.

Using this data:

16 Chapter 2: Invalid, Missing, or Nonstandard Values

Page 29: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

These rows are returned:

In the returned rows, there are some completely unexpected result values in the premium payment mode field (‘B’, ‘T’), as well as some where the case difference triggered their appearance (‘m’, ‘a’). Selecting the policy issue date in the rule gives valuable context information as to when this difference started to appear. The ‘B’ and ‘T’ might be new values allowed for payments (bi-weekly and twice a month?) which the company started accepting in 1996 and 1995, respectively. That means that the business rules have changed and that this rule might need to be updated accordingly.

To keep equivalent lowercase values such as ‘m’ and ‘a’ from being flagged by this Data Rule, use UPPER in the WHERE clause to cast PremPmt_Mode as always uppercase, forcing the lowercase values to be viewed (and compared) as uppercase. For example:

RULE "mode-case insensitive"

SELECT "Policy_ID","Policy_issuedate","PremPmt_Mode"

FROM <Use Table in Data Source>

WHERE UPPER(PremPmt_Mode) NOT IN ('M','Q','S','A')

When run on the same premium payment data sample returns these rows:

Finding Nonstandard Values 17

Page 30: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

Using Length as a Quality Indicator

The fact that the data occupying a column is shorter than expected could be an indicator of poor quality data.

For example, consider a 12-character field of fixed length that is labeled as Receipt which, when correctly filled in, will use all 12 characters. In other words, the length of the actual data will match the length of the field. Let’s say the need to gather receipt information was lost some time ago, and the column has instead become a place to store the license number, which has a length of eight or less. You might then have a rule like this:

RULE Receipt_Check

SELECT "RECEIPT"

FROM <Use Table in Data Source>

WHERE RECEIPT IS NOT NULL or

CHAR_LENGTH(RTRIM(RECEIPT)) < CHAR_LENGTH(RECEIPT)

This rule will return the rows where the Receipt field is not completely filled out or is null. When you examine that data, you might recognize that the contents are now license numbers instead of receipt numbers.

Another example is that if a field that is supposed to be last name has data with an actual length of only one or two characters, you can be suspicious that it is not a valid last name. It would certainly be worth taking a look at those rows. The following rule would accomplish this, if the LASTNAME field was a fixed length:

RULE LastName_Check

SELECT "LASTNAME"

FROM <Use Table in Data Source>

WHERE LASTNAME IS NULL OR CHAR_LENGTH(RTRIM (LASTNAME)) < 3

If LASTNAME is a variable-length field, you don’t need to include the function RTRIM in the WHERE clause, which can therefore be simplified to:

RULE LastName_Check_VariableField

SELECT "LASTNAME"

FROM <Use Table in Data Source>

WHERE LASTNAME IS NULL OR CHAR_LENGTH(LASTNAME) < 3

18 Chapter 2: Invalid, Missing, or Nonstandard Values

Page 31: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

C h a p t e r 3

Nonstandard Formats

This chapter includes the following topics:

♦ Overview, 20

♦ Pattern Syntax, 21

♦ Verifying the Format of a Field, 22

♦ Finding Dates in Nonstandard Formats, 23

♦ Verifying Character Formats for Varchar Strings, 24

♦ Verifying a Pattern Embedded in a Column, 25

♦ Escaping from the Normal Pattern Rules, 27

19

Page 32: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

Overview

Many columns in the business world have prescribed formats. Some examples are telephone numbers, zip codes, Vehicle Identification Numbers (VIN), and tracking codes (whose formats vary by carrier). A useful capability provided by Data Rules is pattern recognition.

Pattern profiling discovers the character format of the data, much like the picture clause in COBOL describes the format of the data. Once profiled, inferred patterns can be used to validate that the data conforms to a particular pattern.

20 Chapter 3: Nonstandard Formats

Page 33: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

Pattern Syntax

For Data Explorer, each pattern in the data is restricted to a string whose nth character represents the class of characters that can occupy the nth position.

For example, the string 10-JAN-2000 is represented by the pattern 99-UUU-9999. If the month name can appear in any case, the pattern 99-XXX-9999 is used. To make patterns more readable, a repeated symbol may be suffixed with “(n)”, where “n” represents the count. In the preceding example, 99-XXX-9999 could have been written as 9(2)-X(3)-9(4).

For a complete list of the symbols that represent the characters allowed at a given position in a pattern, see “IS PATTERN and IS NOT PATTERN” on page 90.

Pattern Syntax 21

Page 34: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

Verifying the Format of a Field

In this example, a company’s travel department tracks air travel e-ticket confirmations, and wants to check that the confirmation number matches a format that is valid for the issuing airline. Since each airline has its own format, a separate rule should be created for each airline. However, you might find some that use the same format, which allows you to combine the rules.

RULE "e-ticket Pattern"

SELECT Employee_ID, First_Name, Last_Name, Travel_date,

Issuing_Airline, Eticket_Confirmation

FROM <Use Table in Data Source>

WHERE (Issuing_Airline IN ('AA', 'NW') and

Eticket_Confirmation IS NOT PATTERN '9UUU9UU') or

(Issuing_Airline = 'CO' and

Eticket_Confirmation IS NOT PATTERN 'UU99UU99')

22 Chapter 3: Nonstandard Formats

Page 35: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

Finding Dates in Nonstandard Formats

If you have a character column whose contents are dates, you might find that several different formats were used during input over the years. This can be a problem if you want to change the column to use a database date data type, or if you simply want to have a predictable format for creating other rules. You can use patterns in a rule to determine how many rows are out of compliance with your intended format of “Mon DD, YEAR”:

RULE "Entry_date Pattern"

SELECT ROWNUM, Employee_ID, Entry_date

FROM <Use Table in Data Source>

WHERE Entry_date IS NOT PATTERN 'UUU 99, 9999'

Finding Dates in Nonstandard Formats 23

Page 36: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

Verifying Character Formats for Varchar Strings

To verify a character column as being all uppercase letters, you could use the IS PATTERN function. For example, if the column MyCharField is length 10, code the WHERE clause like this:

WHERE MyCharField IS PATTERN 'U(10)'

However, verifying that a column is all uppercase is more complicated if the column you want to check is a varchar where you don’t actually know the length. To check all the characters without going too far, you could approach it like this:

WHERE MyVarcharField IS PATTERN

'U('||STRING(CHAR_LENGTH(MyVarcharField))||')'

The following sections examine this complicated pattern in more depth:

♦ CHAR_LENGTH(). CHAR_LENGTH( MyVarcharField ) is used to get the number of characters for each value in the field, MyVarcharField. If MyVarcharField could have a length from 1 to 50, then CHAR_LENGTH will return a value between 1 and 50. For example, if MyVarcharField takes on a value of “COLORADO”, the value of CHAR_LENGTH( MyVarcharField ) is eight.

♦ STRING(). The STRING function casts its content be a string rather than a number. Because its content is the CHAR_LENGTH () function, this will resolve the returned length to be a string. In this example, it returns the string ‘8’.

♦ Concatenation ||. The double pipes is the string concatenation operator. In this case, the results of the STRING function are concatenated after the string ‘U(', which is then concatenated before the string, ‘)’ to form the final pattern string of ‘U(8)’.

In our example of when MyVarcharField equals “COLORADO”, this produces the pattern string 'U(8)'. If the next value of MyVarcharField is “TEXAS”, the pattern string becomes 'U(5)'. Thus the pattern string is always kept in sync with the current value being evaluated.

24 Chapter 3: Nonstandard Formats

Page 37: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

Verifying a Pattern Embedded in a Column

In this example, a company has a comment field (COMMENT1) where the users have included a constant string (REC# or rec#) followed by the relevant comment. The data has a prescribed pattern that it must conform to that you want to check, but you do not know exactly where in COMMENT1 the data appears.

This rule searches COMMENT1 for the string ‘REC#’, allowing for lowercase or uppercase, and checks that the following 8 characters match the pattern 9XX-99XX. If the field fails to match the pattern or if COMMENT1 is null, the row is returned.

RULE REC#_pattern_check

SELECT "CLIENT_NAME","COMMENT1"

FROM <Use Table in Data Source>

WHERE SUBSTRING(COMMENT1 from

position('REC#' in UPPER(COMMENT1)+4 for 8)

IS NOT PATTERN '9XX-99XX'

OR COMMENT1 is null

You could allow for the possibility of a blank or a dash intervening between REC# and the actual data by making the rule more elaborate. Notice that in the second substring, the AND condition, the start of the data is adjusted to be one character farther to the right (+5 instead of +4):

RULE REC#_pattern_check

SELECT "CLIENT_NAME","COMMENT1"

FROM <Use Table in Data Source>

WHERE (SUBSTRING(COMMENT1 from

position('REC#' in UPPER(COMMENT1)+4 for 8)

IS NOT PATTERN '9XX-99XX' AND

SUBSTRING(COMMENT1 from

position('REC#' in UPPER(COMMENT1)+5 for 8)

IS NOT PATTERN '9XX-99XX')

OR COMMENT1 is null

But there is still a problem with this rule. If the COMMENT1 does not actually contain the substring ‘REC#’ anywhere in it, then the POSITION portion of the rule, that is, “POSITION(‘REC#’ in UPPER(COMMENT1)+4 [or 5]” will resolve to the location 4 [or 5] within COMMENT1. This is because when it does not find ‘REC#’, it resolves to zero, to which it adds the requested offset.

As currently crafted, the rule will NOT return a row that lacks the string ‘REC#’ if the fourth through eleventh (or fifth through twelfth) characters of COMMENT1 happen to match the requested pattern of '9XX-99XX'. As that might be a problem, depending on the usage of

Verifying a Pattern Embedded in a Column 25

Page 38: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

COMMENT1, the rule can be strengthened to also return any row where COMMENT1 did not contain the string ‘REC#’:

WHERE

(SUBSTRING(COMMENT1 from position('REC#' in

UPPER(COMMENT1)+4 for 8) IS NOT PATTERN '9XX-99XX'

AND SUBSTRING(COMMENT1 from position('REC#' in

UPPER(COMMENT1)+5 for 8) IS NOT PATTERN '9XX-99XX')

OR COMMENT1 is null

OR POSITION('REC#' in UPPER(COMMENT1) = 0

26 Chapter 3: Nonstandard Formats

Page 39: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

Escaping from the Normal Pattern Rules

Certain characters have special meaning when they appear within a pattern. For example, 'U' means any uppercase alphabetic character, and '9' means any numeric character. For a full list of characters, see “Data Explorer Standard SQL Syntax WHERE Clauses” on page 86.

But what if you want to check for conformance to a pattern that includes one of the special characters of pattern notation? What if you want to know if the first two characters of a column are actually “U9” and the remaining two are any number? If you were to code:

FIELDX IS NOT PATTERN 'U999'

The test would be for any uppercase letter followed by any three numeric characters, which is not what you want. In a case like this, you must “escape” from the standard behavior for part of the pattern processing. This is done by using an escape clause in the pattern notation to signal you want to turn off the behavior of the special pattern elements. This is similar to the way the escape clause operates with SQL standard LIKE. In the escape clause, you indicate what character will trigger the change in behavior.

FIELDX IS NOT PATTERN '/U/999' ESCAPE '/'

That way, when the pattern processor detects the designated escape character, whatever it may be, it will disable special processing and treat next character as a normal character. In this fashion, you may escape as many characters in a pattern as you need.

Improving Performance With PatternsIf given a choice, use an IS PATTERN query over an IS NOT PATTERN because Data Explorer can execute IS PATTERN faster. Notice that “NOT ( x IS PATTERN p)” is semantically the same as “x IS NOT PATTERN p”. Therefore, negating IS PATTERN suffers the same performance penalty as IS NOT PATTERN. For more information, see “Performance Considerations” on page 62.

Escaping from the Normal Pattern Rules 27

Page 40: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

28 Chapter 3: Nonstandard Formats

Page 41: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

C h a p t e r 4

Validating Date and Time Values

This chapter includes the following topics:

♦ Using the CHECKDATE Function, 30

♦ Using the DATE Function, 32

♦ Using the EXTRACT Function, 37

♦ Comparing Date and Time Fields to Literal Values, 40

♦ Date Tag Formats, 41

29

Page 42: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

Using the CHECKDATE Function

Date type format enforcement in RDBMSs prevents invalid date data in a column from being a problem in those environments. If, however, you are processing a flat file or have stored dates in a character or numeric field, you may need to worry about invalid data in a date column. You might want to view the specific records with an actual invalid value in some date column. This is accomplished using the CHECKDATE function. In the following discussion:

♦ <date expression> is either a column containing a date value or a literal date.

♦ <date tag> indicates the format, if known, of the date contents. The <date tag> must be a valid date format tag. For more information, see “Date Tag Formats” on page 41.

To determine if the <date expression> can be parsed and is a valid date according to the Data Explorer date tags, use this rule:

CHECKDATE(<date expression>,'<date tag>') = 0

To determine if the <date expression> is null (or can be parsed and matches a null date rule), use this rule:

CHECKDATE(<date expression>,'<date tag>') = 1

To determine if the <date expression> cannot be parsed, or if it can be parsed but the date contents are not valid (for example, 9/31 or 2/29/1991), use this rule:

CHECKDATE(<date expression>,'<date tag>') = 2

The corollary determinations can also be used.

If the <date expression> is not a valid date or is null for any reason, use this rule:

CHECKDATE(<date expression>,'<date tag>') <> 0

If the <date expression> is not null for any reason, including not being able to convert it, use this rule:

CHECKDATE(<date expression>,'<date tag>') <> 1

If the <date expression> can be parsed and is a valid date or null, use this rule:

CHECKDATE(<date expression>,'<date tag>') <> 2

Here is an example of a rule to test the validity of a date:

RULE ValidateHireDate

SELECT "Agent_ID","Agent_Name","Hire_Date"

FROM <Use Table in Data Source>

WHERE CHECKDATE(Hire_Date, 'year/mm/dd') <> 0

30 Chapter 4: Validating Date and Time Values

Page 43: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

With the following input data:

This rule returns the following rows because the date is invalid:

The date format mentioned in the syntax corresponds to the date format tags found in the Column Profiling colprof.ini file. If there is no date format specified in a CHECKDATE, Data Explorer uses the same process that it applies in Column Profiling to infer the most likely date format and proceeds with the validation from there. If you know the date format and provide it, the performance of the Data Rule will improve and will be more predictable. For more information about date formats, see “Date Tag Formats” on page 41.

The date validation performed in Data Rules is more exacting than what occurs in Column Profiling. Because the focus in Column Profiling is on recognizing date formats, the validation done there is more basic. Months must be between 1 and 12, or equal to some acceptable variation of the alphabetic January, February, March, etc. Days must be between 1 and 31, but without considering context of the month or year of the date. With CHECKDATE, the context of month, day, and year values are considered together, so that a month with only 30 days will not be allowed to have a day of 31. Similarly, a date for the month of February can only have a 29 in the days if the year in that date is a valid leap year.

Using the CHECKDATE Function 31

Page 44: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

Using the DATE Function

If you are dealing with character string columns that contain dates, you might need to compare them to other date columns. The format of the DATE function is:

DATE(<expression>)

You can also use a more precise form of the DATE function:

DATE(<expression>,'<date format>')

The DATE() function tries to find a matching date format for the expression by searching for the first date format tag in the colprof.ini file that the data in the current row can satisfy. Then, it uses that format until data in a subsequent record conflicts with it.

As an example, consider this string: 04/01/2001. This would be valid in the context of the mm/dd/year format, so Data Explorer assumes that is the correct format and interprets it as April 01, 2001. But if a subsequent record contains 13/04/2000 in the same or different column, Data Explorer realizes that mm/dd/year no longer works and searches further through the date tags for a format that does work with current string. The next valid format is dd/mm/year, which Data Explorer switches to.

In the WHERE clauses of Data Rules, this switching can pose problems because the rows have already been passed by and either selected or not, based on one specific date format before Data Explorer can determine that really another date format is a better candidate. If Data Explorer switches date formats midstream, your results can be unpredictable.

Further, if there are two such DATE( <expression> ) in a query, the date processing may thrash between two different formats and performance may suffer. Therefore, as a general rule you should indicate the correct date format tag to explicitly convert the date in your rule as part of your comparisons.

The DATE function helps you make comparisons by explicitly converting the string to a true date before the comparison takes place. In this example, a company wants to verify that the Start_Date is not before the Hire_Date in a record. The Start_Date, being a character string, needs some conversion before it can be successfully compared. In this rule, the date format tag is included in the DATE function to tell Data Explorer that the string Start_Date is known to be in the format ‘mm/dd/year’. This rule can be used to verify that the hire date occurs before the start date:

RULE StartDateCheckImpliedConversion

SELECT "Agent_ID","Start_Date_char","Hire_Date"

FROM <Use Table in Data Source>

WHERE DATE(Start_Date_char,'mm/dd/year') < Hire_Date

Pros and Cons of Providing the Date FormatIn the example above, the Hire_Date was not explicitly converted by using DATE(Hire_Date, 'year/mm/dd'). This is because it was already known to be in a consistent format. This could have been known because the company was operating on an RDBMS where a column

32 Chapter 4: Validating Date and Time Values

Page 45: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

identified as a date or a datetime is always returned in display format. Or, the Hire_Date column might have been previously reviewed and cleaned. Unfortunately, Data Explorer automatically converts each such column to an internal date by implicitly using the DATE() function.

Given the following sample flat file data:

The rule to verify that the hire date occurred before the start date returns these five rows from the flat file:

The first two of those returned rows look like they shouldn’t have been returned, given that the Hire_Date is in the format of ‘year/mm/dd’. It appears that the Start_Date and the Hire_Date are equal to each other, something that our rule allows. But it turns out that since the Hire_Date was not explicitly converted to a date, the implicit conversion caused the unpredictable results. For most Data Rules, it is better to err on the side of explicit date conversions when coding your rules, to get the best results. Here is a revision of the preceding rule, this one forcing explicit conversion using the date format known to the user:

RULE StartDateCheckExplicitConv

SELECT "Agent_ID","Start_Date_char","Hire_Date"

FROM <Use Table in Data Source>

WHERE DATE(Start_Date_char,'mm/dd/year')

Using the DATE Function 33

Page 46: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

< DATE(Hire_date,'year/mm/dd')

This rule returns the three rows that show a Start_Date that is earlier than the Hire_Date, as it should:

Using the DATE Function Without Specifying a Date FormatThere are times when you might want to use the DATE function, but not supply the date format. For example, Data Explorer is accurate about figuring out dates that come in diverse formats, such as these:

♦ Nov 23, 1957

♦ September 23, 1959

♦ March 14 1963

♦ 25 April 1937

♦ 3/12/78

♦ 5-14-63

♦ 21-Sep-94

This is useful functionality. For example, if a company has a file with a date column of mixed freeform format, and wanted to test for the values being between two date literals, you could make this rule:

RULE OrderDateCheck

SELECT "ORDER_NO","ORDER_DATE"

FROM <Use Table in Data Source>

WHERE DATE(ORDER_DATE)

NOT BETWEEN DATE('04/06/1990','mm/dd/year')

AND DATE('01/01/1997','mm/dd/year')

Notice that while the DATE function is used on the column and the literals, the date format is only supplied for the literals. It’s important to provide the date format on the literals because at least one of those literals is ambiguous. If you look at the date 04/06/1990 and you do not have a date format to make the date clearer, you can not really be sure if it’s April 6 or June 4. The date format will clarify that. Better still is to use the ANSI date literal, DATE 'YYYY-MM-DD' here. For more information, see “Comparing Date and Time Fields to Literal Values” on page 40. Here is the rule re-written to use this syntax:

RULE OrderDateCheck

34 Chapter 4: Validating Date and Time Values

Page 47: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

SELECT "ORDER_NO","ORDER_DATE"

FROM <Use Table in Data Source>

WHERE DATE(ORDER_DATE)

NOT BETWEEN DATE '1990-04-06' AND DATE '1997-01-01'

In this case, you do not want to provide a date on the ORDER_DATE because you want Data Explorer to try to figure out what the format is in each row in order to get the correct results from your comparison. In other words, if you include a date format, in the rows where the date does not match the date format, the date converts to NULL. Even if the date values semantically meet the WHERE clause criteria, these rows are rejected because of the conversion to NULL. Thus, in this example of using a date format tag, fewer rows than expected are returned.

Here is a subset of the input data:

Using the DATE Function 35

Page 48: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

And a subset of the returned rows:

36 Chapter 4: Validating Date and Time Values

Page 49: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

Using the EXTRACT Function

You may have a situation where the allowable value of one column is determined by part of a date that is in another column. You might need to query against just the year or the month of the date. For examining a distinct part of a date, Data Explorer has the function EXTRACT. The format of EXTRACT function is:

EXTRACT( <part name> FROM <date expression> )

where the valid part names are YEAR, MONTH, DAY, HOUR, MINUTE, and SECOND.

The following is an example of using this function. If the valid range of commission codes is less than 20 for agents hired before the year 2000, between 20 and 29 inclusive for the year 2001, and 30 or greater for years 2002 and higher, you could code this rule:

RULE "CommCode for HireYear"

SELECT "Agent_ID","Hire_Date","Commission_Code"

FROM <Use Table in Data Source>

WHERE (EXTRACT(YEAR FROM Hire_date) < 2000 and

Commission_code > 19) OR

(EXTRACT(YEAR FROM Hire_Date) = 2001 and

Commission_code not between 20 and 29) OR

(EXTRACT(YEAR FROM Hire_Date) > 2001 and

Commission_code < 30)

This will extract the year from the date and, depending on its value, compare the commission code to the appropriate value or range of values.

With the following input data:

Using the EXTRACT Function 37

Page 50: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

These rows are returned:

Data Explorer also includes a DATEPART function that you can use to examine QUARTER, DAY OF YEAR, and FRACTION.

If you are familiar with standard SQL, you will recognize the similarity between the DATEPART function and the EXTRACT function. Compare:

EXTRACT(<part name> from <date expression>)

and

DATEPART(<part name>, <date expression>)

The difference between the two are basically in the allowable values of partname. They both support YEAR, MONTH, DAY, HOUR, MINUTE, and SECOND. DATEPART also supports DAY OF YEAR, QUARTER, and fraction up to 6 digits.

EXTRACT has support for TIMEZONE_HOUR and TIMEZONE_MINUTE and the scale of SECOND is implementation-defined. Further note that DATEPART only works with date-time expressions in the <date expression>, but EXTRACT also supports INTERVALS as the <date expression>.

The functions of EXTRACT that exceed the provisions of DATEPART (that is, TIMEZONE_HOUR and TIMEZONE_MINUTE in the part name, and INTERVALS in the date expression) can be passed through to an RDBMS, but may not be used on a flat file with Data Explorer.

It’s worth asking why you didn’t need to use the DATE function to explicitly convert the Hire_Date in this example, when the preceding example involving Hire_Date recommended doing so. The reason is that here, the focus is only on the year portion of the date. The year with four digits is not likely to be confused with any other part of the date, as are the potentially interchangeable day and month.

Of course, no harm would have been done in choosing to formulate the rule to include the DATE function:

WHERE (EXTRACT(YEAR FROM DATE(Hire_Date,'year/mm/dd')) < 2000 and

Commission_code > 19) OR

(EXTRACT(YEAR FROM DATE(Hire_Date,'year/mm/dd'))= 2001 and

Commission_code not between 20 and 29) OR

(EXTRACT(YEAR FROM DATE(Hire_Date,'year/mm/dd'))> 2001 and

38 Chapter 4: Validating Date and Time Values

Page 51: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

Commission_code < 30)

The same rows would have been returned.

Using the EXTRACT Function 39

Page 52: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

Comparing Date and Time Fields to Literal Values

For a literal string to be compared to a field with the data type of date, time, or timestamp, you must follow the ANSI standard of preceding it with the appropriate identifying keyword: DATE, TIME, or TIMESTAMP.

WHERE Hire_Date > DATE '2001-01-01'

WHERE Entry_Time < TIME '10:00:00'

WHERE Report_Tstamp > TIMESTAMP '2002-04-01 12:00:00.000000'

40 Chapter 4: Validating Date and Time Values

Page 53: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

Date Tag Formats

Table 4-1 lists readable date tag formats:

Table 4-1. Date Tag Formats

Rule Date Tag Example Comments

Dates including the name of the month:

date-1.0 Month dd, year February 13, 2001 Month names can be spelled out or abbreviated.

date-1.1 dd Month year 13 FEB 2001 Month names can be spelled out or abbreviated.

date-1.2 dd-Month-year 13/Feb/2001 Month names can be spelled out or abbreviated.

date-1.3 MonthYear February 2001 Month names can be spelled out or abbreviated.

date-1.4 Month-Year FEB-2001 Month names can be spelled out or abbreviated.

date-1.5 Month dd year Feb 13 2001 Month names can be spelled out or abbreviated.

date-1.6 year Month dd 2001 February 13 Month names can be spelled out or abbreviated.

Dates as series of separated numbers:

date-2.0 mm/dd/year 02/13/2001

date-2.1 dd/mm/year 13/02/2001

date-2.2 year/dd/mm 2001/13/02

date-2.3 year/mm/dd 2001/02/13

date-2.4 mm/year 02/2001

date-2.5 year/mm 2001/02

date-2.6 ISO 8601 Timestamp 2001-02-13T14:15:16

date-2.7 DB2 Timestamp 2001-02-13-14.15.16

date-2.8 ODBC Canonical 2001-02-13 14:15:16

Dates including the day name:

date-3.0 DayName, Month dd, year Tue, Feb 13, 2001 Day/month names can be spelled out or abbreviated.

date-3.1 DayName Month dd, year Tue Feb 13, 2001 Day/month names can be spelled out or abbreviated.

date-3.2 DayName, dd Month year Tuesday, 13 Feb 2001 Day/month names can be spelled out or abbreviated.

date-3.3 DayName, dd Month, year Tue, 13 February, 2001 Day/month names can be spelled out or abbreviated.

date-3.4 DayName dd Month year TUE 13 FEB 2001 Day/month names can be spelled out or abbreviated.

Dates composed of blocks of digits:

Date Tag Formats 41

Page 54: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

date-4.0 yymm 0102 Must be uncommented to activate.

date-4.1 mmyy 0201 Must be uncommented to activate.

date-4.2 mmyyyy 022001

date-4.3 yyyymm 200102

date-4.4 mmddyy 021301

date-4.5 ddmmyy 130201

date-4.6 yymmdd 010213

date-4.7 mmddyyyy 02132001

date-4.8 ddmmyyyy 13022001

date-4.9 yyyymmdd 20010213

date-4.10 yddd 1044 Must be uncommented to activate.

date-4.11 yyddd 01044

date-4.12 yyyyddd 2001044

date-4.13 ywwd 1072 Must be uncommented to activate.

date-4.14 yywwd 01072

date-4.15 yyyywwd 2001072

Dates with 7 digits and a century indicator:

date-4.16 cmmddyy 0021301

date-4.17 cddmmyy 0130201

date-4.18 cyymmdd 0010213

date-4.19 mmddyyc 0213010

date-4.20 ddmmyyc 1302010

date-4.21 yymmddc 0102130

date-4.22 xmddyy 021301 or 21301 Must be uncommented to activate.

date-4.23 xmddyyyy 02132001 or 2132001 Must be uncommented to activate.

date-4.24 xdmmyy 010601 or 10601 Must be uncommented to activate.

date-4.25 xdmmyyyy 01062001 or 1062001 Must be uncommented to activate.

Time only:

date-5.0 time 13:14:15.123456 Seconds and fractions may be omitted.

Quarterly dates:

date-6.0 qQyear 1Q2001

Table 4-1. Date Tag Formats

Rule Date Tag Example Comments

42 Chapter 4: Validating Date and Time Values

Page 55: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

date-6.1 yearQq 2001Q1

Week dates (with separator):

date-7.0 year-week-day of week 2001-7-2

date-7.1 yearWweekday 2001W072

date-7.2 year-Wweek-day 2001-W7-2

Table 4-1. Date Tag Formats

Rule Date Tag Example Comments

Date Tag Formats 43

Page 56: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

44 Chapter 4: Validating Date and Time Values

Page 57: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

C h a p t e r 5

Calculations

This chapter includes the following topic:

♦ Overview, 46

45

Page 58: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

Overview

You are not limited to straight comparisons of columns to literals, or of columns to other columns. You can embed calculations in the WHERE clause. Addition, subtraction, multiplication, and division are all allowed.

A very straightforward calculation would involve adding up several columns in a single row to check that their sum matches another column. For example, if you had a record of a client’s total insurance held with the company, you might need to review any rows where the sum of the Life_insurance_Premium, the Auto_insurance_Premium, and the Disability_Premium is not equal to the Total_Premium. This rule:

RULE VerifyTotPrem

SELECT "ClientID","LAST_NAME","LIFE_PREM",

"AUTO_PREM","DISAB_PREM","TOT_PREM"

FROM <Use Table in Data Source>

WHERE LIFE_PREM + AUTO_PREM + DISAB_PREM <> TOT_PREM

Using the following data:

Returns these rows:

For a more complicated example, consider a payroll application where the weekly timesheets coming in contain the hourly rate, the hours worked, and the tips reported by the employee.

46 Chapter 5: Calculations

Page 59: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

The company wants to determine the total gross for the employee and confirm that the effective hourly rate (gross divided by hours worked) is not less than the minimum wage of $5.35. This rule will make the necessary calculation and return those rows where the effective hourly rate is too low:

RULE VerifyReportedTips

SELECT "EMPID","LAST_NAME","HRLY_RATE",

"REPORTED_TIPS","HRS_WORKED"

FROM <Use Table in Data Source>

WHERE (HRS_WORKED * 5.35) > ((HRS_WORKED * HRLY_RATE) + REPORTED_TIPS)

Using the following sample data:

Returns these rows:

Overview 47

Page 60: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

In this example, the company has a business Data Rule that says that the premium a client is paying for disability insurance should be no more than 150% of their life insurance premium, and at least 10% of the life premium:

RULE CheckDisVSLifePrem

SELECT "ClientID","LAST_NAME","LIFE_PREM","DISAB_PREM"

FROM <Use Table in Data Source>

WHERE DISAB_PREM NOT BETWEEN (.10 * LIFE_PREM) AND (1.50 * LIFE_PREM)

Using the following data:

Returns these results:

48 Chapter 5: Calculations

Page 61: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

C h a p t e r 6

Interactions Between Multiple Fields

This chapter includes the following topic:

♦ Overview, 50

49

Page 62: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

Overview

An advantage of Data Rules functionality beyond column profiling is that Data Rules let you discover where the value of one field is not within the allowable values, based on another field in the same row. In this example, a company has a database that tracks loans. They want to show the rows for an auto loan that do not contain “VIN” in collateral, the rows where it is not an auto loan but the collateral does contain “VIN,” and the rows where LOAN_TYPE or COLLATERAL is null. This rule returns the desired rows:

RULE CollateralType

SELECT "Loan_ID","Loan_Type","Collateral"

FROM <Use Table in Data Source>

WHERE (LOAN_TYPE = 'AUTO' and COLLATERAL NOT LIKE '%VIN%')

OR (LOAN_TYPE <> 'AUTO' and COLLATERAL LIKE '%VIN%')

OR (LOAN_TYPE IS NULL) OR (COLLATERAL IS NULL)

Taking the loan data example further, what if the loan details vary according to the loan type? If the loan type is AUTO, then the loan amount must be between $3,000 and $50,000, and the loan term must be between 12 and 60 months. But if the loan type is REAL, then the amount boundaries are $10,000 to $500,000, and the loan term is between 36 and 360 months. This rule returns rows that does not meet the loan criteria:

RULE LoanTypeAmtTerm

SELECT "Loan_ID","Loan_Type","Loan_Amt","Loan_Term"

FROM <Use Table in Data Source>

WHERE (UPPER(LOAN_TYPE) = 'AUTO' and

(LOAN_AMT not between 3000 and 50000 or

LOAN_TERM not between 12 and 60)) or

(UPPER(LOAN_TYPE) = 'REAL' and

(LOAN_AMT not between 10000 and 500000 or

LOAN_TERM not between 36 and 360)) or

LOAN_TYPE is null or LOAN_AMT is null or LOAN_TERM is null

50 Chapter 6: Interactions Between Multiple Fields

Page 63: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

Using this input data:

Returns these results:

In the following example, a company wants to validate the coincident values of PAYSTATUS and MOSALARY on each row. In this rule, the salary can only be zero if status is I (Inactive) or T (intern); and salary must be positive if status is A (active) or D (disabled). Note that an intern’s salary can be zero or positive. Also, this rule should return rows where the PAYSTATUS is null or an invalid value:

RULE StatusVsSalary

SELECT "EMPID","LAST_NAME","PAYSTATUS","MOSALARY"

FROM <Use Table in Data Source>

WHERE PAYSTATUS IS NULL or

PAYSTATUS NOT IN ('A','D','I','T') or

Overview 51

Page 64: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

(MOSALARY <= 0 and PAYSTATUS IN ('A','D')) or

(MOSALARY > 0 and PAYSTATUS = 'I')

Using the input data overlead:

Returns these rows:

52 Chapter 6: Interactions Between Multiple Fields

Page 65: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

In this example, a company uses an employee’s Salary Class and Salary Code to determine the allowable range that the Monthly Salary can be. The payroll department might have a grid similar to the following that outlines the pay structure:

Although the Data Explorer Data Rules do not support a table lookup function, you can encode the information from that grid into a Data Rule to verify your data:

RULE SalClass&CodevsMOSALARY

SELECT "EMPID","LAST_NAME","SALARYCLASS","SALARYCODE","MOSALARY"

FROM <Use Table in Data Source>

WHERE (salaryclass = 'A' and salarycode = 1

and mosalary not between 2500 and 3000) OR

(salaryclass = 'A' and salarycode = 2

and mosalary not between 3000 and 3500) OR

(salaryclass = 'A' and salarycode = 3

and mosalary not between 3500 and 4200) OR

(salaryclass = 'B' and salarycode = 1

and mosalary not between 1500 and 2500) OR

(salaryclass = 'B' and salarycode = 2

and mosalary not between 2500 and 3750) OR

(salaryclass = 'B' and salarycode = 3

and mosalary not between 3750 and 5000) or

(Salaryclass = 'C' and salarycode = 1

and mosalary not between 4200 and 5400) OR

(salaryclass = 'C' and salarycode = 2

and mosalary not between 5400 and 6250) OR

(salaryclass = 'C' and salarycode = 3

and mosalary not between 6250 and 8500) OR

(salaryclass = 'D' and salarycode = 1

and mosalary not between 3500 and 5000) OR

Salary CodeSalary Class

1 2 3

A 2500-3000 3000-3500 3500-4200

B 1500-2500 2500-3750 3750-5000

C 4200-5400 5400-6250 6250-8500

D 3500-5000 5000-6600 6600-10000

Overview 53

Page 66: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

(salaryclass = 'D' and salarycode = 2

and mosalary not between 5000 and 6600) OR

(salaryclass = 'D' salarycode = 3

and mosalary not between 6600 and 10000) OR

SALARYCLASS is null or SALARYCODE is null

Using this input data:

54 Chapter 6: Interactions Between Multiple Fields

Page 67: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

Returns these rows:

In this example, the values of certain columns constrain the allowable values of other columns. Typically, a column of postal code may contain either a Canadian-style postal code or a U.S.-style ZIP code, depending on the country. In a company that mainly does business in Canada, the database might have a great many nulls in the country. But if the province code contains a valid Canadian province, that information should trigger the check to see if the postal code matches the Canadian pattern.

The rule has been further refined to also return rows where both the province and the country are null:

RULE CanadaPostalCode

SELECT "EmployeeID","Province","Country","Postal_Code"

FROM <Use Table in Data Source>

WHERE ((upper(Country) = 'CANADA' or

(Country is null and upper(Province)

in ('AB', 'BC', 'MB', 'NB', 'NF', 'NT', 'NS', 'NU', 'ON',

'PE', 'QC', 'SK', 'YT')

AND (Postal_Code is null or

Postal_code is NOT PATTERN 'U9U 9U9'))

or (Country is null and Province is null)

Overview 55

Page 68: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

This rule returns the following rows:

56 Chapter 6: Interactions Between Multiple Fields

Page 69: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

C h a p t e r 7

Special Topics

This chapter includes the following topics:

♦ NULL Value Considerations, 58

♦ Using the ROWNUM( ) Function, 61

♦ Performance Considerations, 62

♦ SQL Functions, 63

57

Page 70: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

NULL Value Considerations

When you develop rules, it is important to consider the effect that a null value might have on your rule. If a column contains a null value, it means that the real value is not known. Consequently, null values might behave in a comparison in a way that may not seem intuitive. You might not know whether the column’s value is equal to, greater than, less than, IN a list, BETWEEN extremes of a range, or LIKE a particular pattern, so it will not be returned in the set matching that criteria. The same is true for the negative of those conditionals (< >, NOT IN, NOT BETWEEN, etc.). So, when you deal with columns where there is a possibility that they are null, make sure to take that possibility into consideration in your conditionals and use the two special operators IS NULL and IS NOT NULL in your Data Rules.

Example: Impact of NULLSThis example involves a database that supports a payroll system. A rule is needed to highlight any records where the SalaryClass is not valid or the SalaryCode is not valid. SalaryClass is a character column with acceptable values of ‘A’, ‘B’, ‘C’, or ‘D’. SalaryCode is an integer that should range from 1 to 3. To find such records, use a rule like:

WHERE SalaryClass NOT IN ('A','B','C','D') or

SalaryCode NOT BETWEEN 1 and 3

However, either of those columns might contain a null, and we want to see the records where they are null. So, the rule should be expanded to:

WHERE SalaryClass IS NULL or

SalaryClass NOT IN ('A','B','C','D') or

SalaryCode IS NULL or

SalaryCode NOT BETWEEN 1 and 3

58 Chapter 7: Special Topics

Page 71: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

If we use the following data:

and apply the first rule, we get the following results:

NULL Value Considerations 59

Page 72: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

But if we execute the second, expanded rule, where the possibility of nulls is taken into account, we get these results:

Caution on Using NULLSIn the previous example, a possibly null column is compared against literal values. This is natural because the scope of the Data Rule functionality is oriented toward single-row processing, so you are most likely going to create your rules along a model where the targets you are comparing to are literals rather than other columns in the row.

A literal value is by definition not going to be null. But as you create any rules where the comparison targets are other columns, keep in mind that there is an impact if any of those target columns are null.

This gets particularly tricky when you have multiple target columns, such as [NOT] IN and [NOT] BETWEEN. Further, a null value appearing in a calculation will cause the calculation not to work. For more information about nulls and their behavior, see A Guide to the SQL Standard, Fourth Edition by C.J. Date with Hugh Darwen.

60 Chapter 7: Special Topics

Page 73: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

Using the ROWNUM( ) Function

The ROWNUM() function returns the number of the current row before it is acted upon by the WHERE clause. Thus, if a WHERE clause throws out a row, that row is still “counted” by ROWNUM(). While you cannot use ROWNUM() to limit the number of rows returned to a specific number, you can use ROWNUM() to limit the rows being looked at in the file to specific ranges in the file.

You can use ROWNUM() to place an upper limit in the file where rows are examined. For example:

WHERE ROWNUM() < 10000

You can use it to force the count of rows to return to start at a certain place in the file. For example:

WHERE ROWNUM() > 5000

You could combine those two conditions together to place a lower and upper limit on the rows in the file to be returned. For example:

WHERE ROWNUM() > 5000 and ROWNUM() < 10000

Or, you could include these conditions in any other WHERE clause to impose a limit on a specific search. For example:

WHERE STATE_CODE = 'AZ' and ROWNUM() > 5000 and ROWNUM() < 10000

One thing to keep in mind when you are dealing with a Select from an RDBMS, particularly when you have a ORDER BY clause in the rule, is that the rows may be returned in a different order and the ROWNUM action will be applied against the rows only after they are returned to Data Explorer.

Note: ORDER BY is not part of Data Explorer Standard SQL. It is therefore only supported to the extent that your target RDBMS supports it.

Using the ROWNUM( ) Function 61

Page 74: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

Performance Considerations

Execution against a large data source will, of course, take longer than a small data source.

Note: The progress bar actually can only indicate that work is still in progress. Since there is no way to know in advance how long it will take to satisfy a query, the progress bar does not necessarily indicate approximately how much work has been done or is yet to do.

Having the ability to apply SQL style Data Rules against a flat file is very handy, but it can be especially slow on large files. On an RDBMS data source, you will find that having indexes on the columns included in the WHERE clause will improve your speed.

Care must be taken when using the Data Explorer Standard SQL extensions to the ANSI/92 SQL syntax, such as:

♦ IS [NOT] PATTERN

♦ CHECKDATE

♦ ROWNUM

♦ DATEPART

Since these are not functions that can be directly executed in any database, potentially all rows must be fetched before these rules can be applied. Naturally, that takes longer. You can mitigate that if there are other conditions in your rules. When a rule contains Data Explorer Standard SQL that cannot be passed to the RDBMS, Data Explorer divides the work between the RDBMS and Data Explorer based on what parts the RDBMS can execute.

62 Chapter 7: Special Topics

Page 75: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

SQL Functions

Table 7-1 describes SQL functions for Oracle, ANSI, and flat files:

Table 7-1. SQL Functions for Oracle, ANSI, and Flat Files

Function Flat File ANSI Oracle 7 Oracle 8

position( IN ) position( IN ) position( IN ) instr() instr()

char_length()character_length()

char_length() char_length() length() length()

lower() lower() lower() lower() lower()

trim( LEADING FROM ) trim( ansi ) trim( ansi ) ltrim() trim( ansi )

trim( TRAILING FROM ) trim( ansi ) trim( ansi ) rtrim() trim( ansi )

trim( BOTH FROM ) trim( ansi ) trim( ansi ) rtrim( ltrim() ) trim( ansi )

number() number() cast() to_number() to_number()

|| || || concat() concat()

string() string() cast() to_char() to_char()

substring( FROM FOR ) substring( FROM FOR )

substring( FROM FOR )

substr() substr()

upper() upper() upper() upper() upper()

date() date() cast($1 as datetime)

to_date($1) to_date($1)

<dateConstant> date(‘${ISO 8601 Timestamp} ’$,’ISO 8601 Timestamp’)

timestamp '$ {ODBC Canonical}'

to_date('${ODBC Canonical}','YYYY-MM-DD HH24:MI:SS')

to_date('${ODBC Canonical}','YYYY-MM-DD HH24:MI:SS')

current_timestamp current_timestamp current_timestamp sysdate sysdate

current_time current_time current_time to_date(to_char(sysdate,'HH24:MI:SS'),'HH24:MI:SS')

to_date(to_char(sysdate,'HH24:MI:SS'),'HH24:MI:SS')

current_date current_timestamp current_date to_date(to_char(sysdate,'YYYY-MM-DD'),'YYYY-MM-DD')

to_date(to_char(sysdate,'YYYY-MM-DD'),'YYYY-MM-DD')

extract() extract( ANSI ) extract( ANSI ) extract( ANSI ) extract( ANSI )

SQL Functions 63

Page 76: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

Table 7-2 describes SQL functions for Informix, Sybase, Microsoft SQL Server, and UDB:

Table 7-2. SQL Functions for Informix, Sybase, Microsoft SQL Server, UDB

Function Informix SybaseMicrosoft SQL Server

UDB

position( IN ) charindex() charindex() locate()

char_length()character_length()

char_length() char_length() len() length()

lower() lower() lower() lower() lower()

trim( LEADING FROM ) trim( ansi ) ltrim() ltrim() ltrim()

trim( TRAILING FROM ) trim( ansi ) rtrim() rtrim() rtrim()

trim( BOTH FROM ) trim( ansi ) rtrim( ltrim() ) rtrim( ltrim() ) rtrim( ltrim() )

number() cast() convert() convert() double()

|| || + + || or concat()

string() cast() convert() convert() char()

substring( FROM FOR ) substr() substring() substring() substr()

upper() upper() upper() upper() upper()

date() to_date($1) convert(datetime,$1) convert(datetime,$1) date($1)

<dateConstant> DATETIME (${ODBC Canonical}) YEAR TO SECOND

'${Month dd year}' '${Month dd year}' '${DB2 Timestamp}'

current_timestamp current year to fraction(3)

getdate() getdate() current timestamp

current_time current hour to fraction(3)

convert(datetime, convert(char(10),getdate(),8), 8)

convert(datetime, convert(char(10),getdate(),8), 8)

current time

current_date current year to day

convert(datetime, convert(char(10),getdate(), 102), 102)

convert(datetime, convert(char(10),getdate(), 102), 102)

current date

extract() day( <date> ) month( <date> ) year( <date> )

datepart( <part>, <date> )

datepart( <part>, <date> )

second( <exp> )minute( <exp> )hour( <exp> ) day( <exp> )month( <exp> )year( <exp> )

64 Chapter 7: Special Topics

Page 77: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

A p p e n d i x A

Exploring Data with Data Explorer

This appendix includes the following topics:

♦ Overview, 66

♦ Working with Data Rules in the Tag Viewer, 67

♦ Creating a Data Rule, 70

♦ Specifying Data Source and Sample Data Options for Data Rule Execution, 75

♦ Executing Data Rules, 77

♦ Editing a Data Rule, 82

65

Page 78: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

Overview

This appendix describes how to use the Data Explorer interface to create and execute your Data Rules.

Data Rules were designed to run against any data source that can be accessed with a Data Explorer importer. To use a data rule, specify the data source and the Sample Data options when you execute the rule.

All Data Explorer objects may have rules associated with them. However, only rules against Tables, Columns, Keys, Indexes, and Sample Data may be executed.

Note: You cannot edit a Table WHERE clause when creating a rule.

66 Appendix A: Exploring Data with Data Explorer

Page 79: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

Working with Data Rules in the Tag Viewer

You can view, edit, email, delete, and execute data rules from the tag viewer. All of the tags for the object you most recently selected in a workspace viewer or in the Navigation tree appear in the tag viewer. Each icon at the top of the viewer represents a different kind of tag.

To see the rules for an object in the tag viewer, select the object in the Navigation tree or in a workspace viewer. To see only rules and no other tags, select the Rule icon.

Figure A-1 shows the Rule icon:

To view or edit the details of a specific rule, double-click it in the tag viewer.

Figure A-2 displays the Rule dialog box that appears:

The Rule dialog box includes basic information about the rule, a Text box, and a WHERE box.

Figure A-1. Rule Icon

Figure A-2. Rule Dialog Box

Rules Line

Working with Data Rules in the Tag Viewer 67

Page 80: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

Table A-1 describes the Rule dialog box fields:

The Text box is a free-form text area where you can add comments for the selected rule.

Note: When a rule is made from a Drill Down operation, this text box states: “Rule generated from Drill Down on YYYY/MM/DD HH:MM:SS”.

The WHERE box contains an optional WHERE clause that identifies the search criteria for the rule. For more information about using WHERE clauses in rules, see “WHERE Clauses” on page 83.

Note: For this release, there are some objects (such as Schemas, Domains, and Foreign Keys) for which you can create but not execute data rules. For these objects, do not enter anything in the WHERE box.

Table A-2 describes the buttons in the Rule dialog box:

Table A-1. Rule Dialog Box Fields

Field Description

Name The name of the rule tag. Data Explorer uses this as the name of the rule.

Author The name of the user who created the rule. The author is chosen from a list specified by your Data Explorer administrator.

Sponsor The name of the person who initiated the rule or is in charge of the project.

Type The type of rule, selected from a list specified by your Data Explorer administrator.

Rule The name for the rule to be included in the rule syntax. Data Explorer requires that this be the same as the name of the rule tag in the Name field. You cannot edit the Rule field.

Select The column or columns against which the rule is to be run.

From The table in the data source against which the rule is to be run. In Data Explorer, the data source is specific to a single table and the FROM statement always reads <Use Table in Data Source>.

Table A-2. Rule Dialog Box Buttons

Button Name Description

Insert Column... Inserts a column name you select into the Text or WHERE box. For more information, see “Creating a Data Rule” on page 70.

Insert Date Inserts today’s date into the Text or WHERE box. For more information, see “Creating a Data Rule” on page 70.

Execute... Executes the displayed rule. The Execute Rule dialog box appears and you have the opportunity to change the data source and Sample Data options, but not to select other rules. For more information about executing data rules, see “Executing Data Rules” on page 77.Note: All Data Explorer objects may have rules associated with them, however only rules against Tables, Columns, Keys, Indexes, and Sample Data may be executed.

OK Accepts the changes you have made to the rule and close the dialog box.

Cancel Closes the dialog box and discard any changes you have made.

68 Appendix A: Exploring Data with Data Explorer

Page 81: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

Context Menu Options for a Rule in the Tag ViewerFrom the context menu for a rule in the tag viewer, you can select several actions.

Table A-3 describes the Tag Viewer context menu options:

Table A-3. Tag Viewer Context Menu Options

Option Description

Email Sends a copy of the rule by email.

Edit Displays the Rule dialog to edit the rule.

Delete Deletes the rule. You can also delete the rule by selecting it and pressing the Delete key.

Set Table Data Source Sets the data source for the parent table of the object whose rules are displayed in the tag viewer. Use this option to choose the data source before choosing the Execute command.

Execute Executes the rule using the data source currently selected for the parent table of the object with which the rule is associated. To change the data source, choose Set Table Data Source before using the Execute command.

Tags Creates a new tag (which may be a new rule or another type of tag) for the object whose tags are displayed in the tag viewer.

Working with Data Rules in the Tag Viewer 69

Page 82: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

Creating a Data Rule

You can create a data rule from the following locations:

♦ Context menu for an item in the Navigation tree.

♦ Context menu for a selected item in a workspace viewer.

♦ Context menu in the tags viewer.

♦ Drill Down dialog.

To create a data rule from a context menu:

1. Select the object in the Navigation tree, right-click, and choose Tags > New Rule from the context menu.

-or-

Select the object in a workspace viewer, right-click, and choose Tags > New Rule from the context menu.

-or-

Display the tag viewer for the object, right-click, and choose Tags > New Rule from the context menu.

Note: All Data Explorer objects may have rules associated with them, however only rules against Tables, Columns, Keys, Indexes, and Sample Data may be executed.

The New Rule dialog box appears.

70 Appendix A: Exploring Data with Data Explorer

Page 83: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

By default, the Select field contains a * and the From field contains <Use Table in Data Source>.

Note: You cannot change the data source for the rule in the New Rule dialog box. If necessary, you can change the data source when you execute the rule.

2. Enter a name for the rule in the Name field.

If you do not specify a name, Data Explorer generates a name for the rule using the ID of the object to which the rule is attached.

3. Enter the appropriate information into the Author, Sponsor, and Type fields:

The Rule field contains the name for the rule which will be included in the rule syntax. Data Explorer requires that this be the same as the name of the rule tag in the Name field. You cannot edit the Rule field.

4. To select specific columns to display in the rule result set, instead of using the default of all (*) columns in the Table, click the ... button next to the Select field.

The Select dialog box appears.

By default, all columns are selected. To use a subset of the columns listed, select the columns against which the rule is to apply, then click OK.

5. In the New Rule dialog box, click in the WHERE clause text box.

The Insert Column... and Insert Date buttons become active.

Option Description

Author Select your name from the drop-down list. This list is created by your Data Explorer administrator.

Sponsor Select the name of the person who initiated the rule or is in charge of the project.

Type Select the appropriate rule type from the drop-down list.

Creating a Data Rule 71

Page 84: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

For this release, there are some objects, such as Schemas, Domains, and Foreign Keys, for which you can create but not execute data rules. For these objects, do not enter anything in the WHERE box.

6. If you want to save typing or do not remember the column name, you can select the column from a list of available columns for the Table. To select from a list of columns, click Insert Column.... The Select Column dialog box appears.

7. Select the column from the list, and then click OK.

The New Rule dialog box reappears.

To insert the current date in ODBC canonical format into the WHERE clause, click Insert Date.

8. Complete the WHERE clause by typing it into the WHERE text box.

If you formulate your own WHERE clause by prefixing your WHERE clause with the RDBMS tag, you must ensure that the clause matches the format required for the selected data source.

Data Explorer does not check the format of user-formulated WHERE clauses.

Note: Data Explorer does not limit the size of the WHERE clause, but some data sources have query size limits, so check the documentation for your data source.

9. Click in the Text text box.

The Insert Column... button and the Insert Date button become active.

♦ Enter comments about the rule.

♦ To insert the current date and time into the Text box, click Insert Date.

♦ To insert a Column into the Text box, follow the instructions in steps 4 to 9 of this procedure.

10. Click OK.

To create a rule using Drill Down:

1. Right-click the object for which you want to create a rule, and then choose Drill Down.

Drill Down is available from the context menus for the following:

♦ Columns in a table viewer in the workspace.

♦ Columns in the Navigation tree.

♦ Value frequencies in the properties viewer.

♦ Patterns in the properties viewer.

♦ Data types in the properties viewer.

♦ Rows in a Sample Data workspace viewer.

♦ Columns in a parents without children viewer.

♦ Columns in a parents without children viewer.

72 Appendix A: Exploring Data with Data Explorer

Page 85: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

♦ Columns in a conflicting rows viewer (for validate key or validate alternate key results).

♦ Columns in an overlapping rows viewer (for validate vertical merge results).

The Drill Down dialog box appears.

2. Use the Drill Down dialog box to create the search criteria.

You can use one of these drill-down criteria:

Criteria Description

Equal or Not Equal Use to search for rows where the column on which you drilled down has a value equal or not equal to the value you enter in the text box.

Is Null or Is Not Null Use to search for rows where the column value is NULL or is not NULL.

Between or Not Between Use to search for rows where the column value is between or not between the two values you enter in the text boxes.

Greater Than or Less Than Use to search for rows where the column value is greater or less than the value you enter in the text box.

Is Pattern or Is Not Pattern Use to search for rows where the column value matches or does not match the pattern you select from the drop-down menu or enter in the text box.

Creating a Data Rule 73

Page 86: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

Select an option in the Data Type from Data Source area to determine the data type for which the rule will search. You may choose from the following:

The data type determines the syntax used in the WHERE clause. For example, when testing for equality, the value is turned into a quoted literal for strings, left as a number for numeric data types, and converted to a date literal for dates.

3. Click the WHERE clause tab and verify that the WHERE clause is what you intended.

When Data Explorer generates the WHERE clause, it does so based on the data source selected, the Drill Down WHERE Clause template, and the information specified in the Drill Down dialog box. For more information, see the “WHERE Clauses” on page 83.

4. Click Make Rule.

When you click Make Rule, the Make Rule button becomes grayed out, and a Rule tag with the associated WHERE clause is created on the column associated with the object for which the Drill Down was initiated.

The name of the Rule is set to the text in the WHERE clause, and a comment is added to the text section stating that the rule was create from Drill Down.

5. Either execute the Drill Down, or Cancel.

In either case, the Data Rule has been saved on the column.

In Value Set or Not In Value Set

Use to search for rows where the column value is in or is not in the value set for the domain.

By Data Type Use to search for rows by data type.To search for rows where the column value does not have a particular data type, check the Is Not box and select the data type from the drop-down menu or type it in the text box.The data types in the drop-down list are the inferred and documented data types for the column.To search for rows where the column value has a particular data type, check the And Is box and select the data type from the drop-down menu or type it in the text box.You may use either option or both.

Option Description

String String data types such as char or varchar

Number Numeric data types such as number, decimal, or integer

Date Date data types such as date, time, or timestamp

Criteria Description

74 Appendix A: Exploring Data with Data Explorer

Page 87: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

Specifying Data Source and Sample Data Options for Data Rule Execution

When you execute a set of rules, you can specify the data source from the Data Source tab in the Execute Rule dialog. You can select options for the Sample Data against which the rules will be executed from the Sample Data Options tab. Because these options are the same no matter how you select the rules to execute, they are described separately in this section.

Sample Data OptionsThe Sample Data Options tab of the Execute Rule dialog box allows you to specify how many rows the system should return and how to sample those rows from the table.

Table A-4 describes the fields of the Select Sample Data Parameters region of the Sample Data Options tab.

Table A-4. Select Sample Data Parameters Fields

Field Description

Maximum Rows The maximum number of rows to load from the search results. This field defaults to 100 for rule execution. The default is 1000 for loading sample data. The maximum value for this field is 100,000. To load more than 100,000, do multiple samples or increase the Number of Samples to Load from 1 to load multiple samples.

Start At Row The row number from the search results set at which to start loading the data. This field defaults to 1.

Load Rows This option is selected by default. It loads consecutive rows from the search results until the maximum is reached or until all rows in the search results have been loaded.

Load Every Nth Row Select this option to load rows from the search results sequentially, starting with the specified start row and skipping N-1 rows after each row loaded. Rows are loaded until the end of the table or the maximum is reached, whichever occurs first.

Load Randomly N% Select this option to load rows randomly. Rows from the search results are examined in sequence, starting with the specified start row, and selected for loading with a random probability. Each row has an N % chance of being loaded. Rows are loaded until the end of the table or the maximum is reached, whichever comes first.Note: The set of rows that are returned when using the Load Randomly N% is different from one search to the next if all other things are equal. That is, the random number sequence is different for each Drill Down.If you want the random sequence to be the same from for every search, then in the [Environment] section of the importer configuration file, set RANDOM_SEED=nnn, where nnn is a number between 1 and 2147483647. If nnn is 0, the default random behavior is used.

Specifying Data Source and Sample Data Options for Data Rule Execution 75

Page 88: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

Search LimitsThe Search Limits region of the Sample Data Options tab allows you to control the number of rows read by Data Explorer when computing the number of rows that satisfy the data rule criteria. The default is to search all rows of the specified data source to obtain an exact count of the number of rows that satisfy the search criteria. The number of rows actually loaded into Data Explorer is controlled by the Sample Data Options region.

Table A-5 describes the fields of the Search Limits region:

N= The variable to be used when the Load Every Nth Row or Load Randomly N % option is selected. You can enter percentages to a precision of three decimal places (useful for very large datasets). The smallest percentage Data Explorer will accept is 0.001.

Number of Samples to Load Select this option to load several sets at the same time. The samples are loaded sequentially. Each sample begins loading at the row immediately following the last row imported for the previous sample.

Table A-5. Search Limits Fields

Field Description

Search All Rows This selection instructs the Data Explorer importer to count all of the rows that satisfy the search criteria. If it is not checked, the Data Explorer importer stops counting the number of rows after the Limit Rows to Search value is reached.

Limit Rows to Search This selection instructs the Data Explorer importer to return rows until it searches the number of rows specified. The Rows to Search field defaults to 10000.Note: If Search All Rows is not selected, the importer will fetch and count each row up to the limit. Depending on the number of rows specified, this may be slower than selecting Search All Rows.Note: When using Data Explorer Standard SQL extensions, the importer will be forced to examine all rows when executing the query. You may choose to set the Limit Rows to Search if you expect a large number of rows to be returned. If you expect few or no rows are to be returned, changing this selection has no effect.For more information, see “WHERE Clauses” on page 83.

Table A-4. Select Sample Data Parameters Fields

Field Description

76 Appendix A: Exploring Data with Data Explorer

Page 89: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

Executing Data Rules

From the Analysis panel, you can select a set of rules to execute. Because rules must run against a table’s data source, the set of rules you select can include only rules associated with the same table.

You can use one of the following methods to specify a set of rules:

♦ Choosing them individually.

♦ Selecting all rules associated with an object.

♦ Using all rules which meet search criteria you specify.

Executing all rules for an object can be useful when the set of rules for the object changes over time, because you can execute the rule set without remembering what the individual rules are.

Searching for a set of rules allows you to find and execute similar rules or logically related groups of rules.

When you select individual rules for execution, you may choose them from a list of rules, search for rules and then select from the results, or both. You select from the rules associated with an object or its child objects, such as a table or its columns.

To execute a set of rules you select:

1. Select the Analysis tab and expand the Data Rule Management category.

2. Click Execute Rule.

The Execute Rule dialog appears.

3. Select the General tab, then select the Select rule set button.

4. Select the Navigation tab.

Executing Data Rules 77

Page 90: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

5. Select the object against which you want to execute rules.

The rules associated with that object appear in the Rule Name list.

From the list, you can select rules you want to execute.

6. Click Add to Rule List to add the selected rules to the execution list.

7. Select the Search tab.

If necessary, select the schema and table to search for rules.

Note: If you selected rules from the Navigation tab and you choose a different table in the Search tab, the rules for the original table will be removed from the list.

Enter as many of these search options as you wish:

8. Click the Search button to search for rules using the specified options.

9. From the search results, you can select rules you want to execute. Click Add to Rule List to add the selected rules to the execution list.

10. To specify options for the Sample Data against which the rules will be executed, select the Sample Data Options tab.

For information about the Sample Data options, see “Sample Data Options” on page 75.

11. To change the data source, select the Data Source tab and specify the new data source.

12. Click Execute.

To execute all rules for a selected object:

1. Select the Analysis tab and expand the Data Rule Management category.

2. Click Execute Rule.

The Execute Rule dialog appears.

3. Select the General tab, and then select the Select by object button.

Option Description

Rule Text to search for in the rule name.

Type Rule type.

Author The user who created the rule.

Sponsor The name of the person who initiated the rule or is in charge of the project.

Text Text to search for in the rule’s text description.

Logic Text to search for in the rule’s logic (its WHERE clause).

78 Appendix A: Exploring Data with Data Explorer

Page 91: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

4. Select the Navigation tab.

5. In the tree in the Navigation tab, select the object against which you want to execute rules.

The rules for that object appear in the list for your reference.

If you select a different object, the rules for the new object replaces the original list. The set of rules which will be executed is not determined until you click Execute.

You cannot select individual rules using this option. If you want to select from the search results, see “To execute a set of rules you select:” on page 77.

6. To specify options for the Sample Data against which the rules will be executed, select the Sample Data Options tab. For information about the Sample Data options, see “Sample Data Options” on page 75.

7. To change the data source, select the Data Source tab and specify the new data source.

8. Click Execute. Data Explorer executes the rules for the currently selected object only.

To search for and execute a set of rules:

1. Select the Analysis tab and expand the Data Rule Management category.

2. Click Execute Rule. The Execute Rule dialog appears.

3. Select the General tab. Select the Select by search button.

Executing Data Rules 79

Page 92: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

4. Select the Search tab.

5. Select the schema and table to search for rules.

Enter as many of these search options as you wish:

Click the Search button at any time to see the results of the search you have specified. The set of rules which will be executed is not determined until you click Execute.

You cannot select individual rules using this option. If you want to select from the search results, see the procedure on page 77.

6. To specify options for the Sample Data against which the rules will be executed, select the Sample Data Options tab. For information about the Sample Data options, see “Sample Data Options” on page 75.

7. To change the data source, select the Data Source tab and specify the new data source.

Option Description

Rule Rule name

Type Rule type

Author The user who created the rule

Sponsor The name of the person who initiated the rule or is in charge of the project

Text Text to search for in the rule’s text description

Logic Text to search for in the rule’s logic (its WHERE clause)

80 Appendix A: Exploring Data with Data Explorer

Page 93: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

8. Click Execute. Data Explorer searches for rules using the options which were specified when you executed the search, then executes all the rules found.

Results of Data Rule ExecutionAfter data rule execution completes, any rows that violate the rule are displayed in a Sample Data viewer in the workspace. The properties viewer shows the number of rows in the sample, the number of rows selected by the query, and the number of rows in the table. If you limited the number of rows searched, the Query Count property reads “At least x rows,” since the exact count of rows is unknown without reading all rows of the data source.

Note: If no rows satisfy the search criteria, a message appears indicating “No records matched the Rule criteria.” If you have questions about a failed search, open the log reports viewer by double-clicking the Reports folder for the project in the Navigation tree. Find the search in the Reports and Logs List.

When you close the rule viewer, Data Explorer asks if you want to retain the results. Click Yes to save the results as a Sample Data set.

Note: Unless you click the Retain Sample button, the data set is discarded when you close the viewer. If you have questions about a discarded search, select Reports » List and find the search in the Reports and Logs List.

The rule execution results data sets appear in the Navigation tree within the Sample Data folder for the table. The results are identified by the rule name, for example, Rule (ValidateHireDate). Each rule data set has a note tag associated with it that contains the text of the rule used to generate it and the source of the data. If you executed multiple rules, each rule has a separate data set.

To display the rule viewer again, double-click the icon of the rule data set which you want to see.

In most ways, this data can be treated like any other Sample Data set. You may perform further exploration of the data and perform the following tasks:

♦ View column profiling for the resultant data.

♦ Perform table structural profiling on the resultant data.

♦ Add tags to the resultant Sample Data describing interesting findings.

Executing Data Rules 81

Page 94: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

Editing a Data Rule

The options used to edit a rule are the same as those used to create a rule. For more information on creating a data rule, see “Creating a Data Rule” on page 70.

To edit a rule:

1. Open the tag viewer for the object whose rule you want to edit.

2. Double-click the rule.

The Rule dialog box appears.

3. Update the rule in the same way that you create a rule.

82 Appendix A: Exploring Data with Data Explorer

Page 95: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

A p p e n d i x B

WHERE Clauses

This appendix includes the following topics:

♦ Overview, 84

♦ RDBMS WHERE Clauses, 85

♦ Data Explorer Standard SQL Syntax WHERE Clauses, 86

♦ Data Explorer Standard SQL, 92

♦ Drill Down WHERE Clause Template, 95

83

Page 96: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

Overview

In several instances within the Data Explorer, you have the option of using an SQL WHERE clause to define a subset of data. There are two classes of WHERE clauses within Data Explorer:

♦ RDBMS-specific. Can only be executed on the RDBMS for which the query was designed. RDBMS-specific WHERE clauses contain at least one construct that does not conform to Data Explorer Standard SQL syntax.

♦ Data Explorer Standard SQL. Can be executed by all Data Explorer importers. For this class of WHERE clauses, use Data Explorer Standard SQL syntax to construct the clauses.

84 Appendix B: WHERE Clauses

Page 97: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

RDBMS WHERE Clauses

The available operators depend on the SQL RDBMS from which you are importing. Data Explorer does not validate the WHERE clause, instead passing it to the RDBMS.

Any table-level WHERE clause is valid, including sub-queries, but joins are not. For more information about the structure of a WHERE clause, see your RDBMS SQL documentation.

Sample RDBMS WHERE ClausesThis WHERE clause describes a subset that includes only salaries that are greater than 2000:

sal > 2000

This WHERE clause describes a subset that includes only employees whose department numbers are 20 and whose names are unknown:

deptno = 20 and ename is NULL

This WHERE clause describes a subset that includes all employees whose job title is either manager or salesman:

job = 'manager' or job = 'salesman'

This WHERE clause uses the like predicate to describe a subset whose names are Smith or Smythe:

ename like ‘Sm%th%‘

This WHERE clause selects employees who are managers:

empid in (select mgrid from employee)

RDBMS WHERE Clauses 85

Page 98: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

Data Explorer Standard SQL Syntax WHERE Clauses

All Data Explorer importers accept Data Explorer Standard SQL syntax for importing sample data, drilling down on data, and executing Data Rules. However, the flat file importer and DB2 Load Utility Format importer require this syntax, since these importers are responsible for processing all of the queries directed at them.

The advantage of using the Data Explorer Standard SQL syntax in other situations is that WHERE clauses developed for one data source can be redirected and executed by another data source.

Data Explorer supports a WHERE clause syntax that is based on ANSI SQL.

Note: Operators are not case sensitive.

Table B-1 describes the Data Explorer Standard SQL WHERE clause syntax:

Table B-1. Data Explorer Standard SQL WHERE Clause Syntax

Element DefinitionANSI/92 Compliant

Column Name Column names are either literals, if the name consists of nothing but letters, numbers, and _ (underscore), or a double quoted string. For example: “Emp #”. If the column name is also a reserved word, it must be quoted using double quotes. Note: Column names are not case-sensitive

Yes

String Constant String constants are singly quoted strings. For example: ‘this is a string’.

Yes

Numeric Constant Number constants can be either an integer, a floating point, or a floating point in scientific notation.

Yes

Timestamp Constant Timestamp constants are literals with the format of 'YYYY-MM-DD HH:MM:SS[.FFFFFF]' that are preceded by the TIMESTAMP keyword. For example: TIMESTAMP '2002-11-03 12:33:52.341'.

Yes

Date Constant Date constants are literals with format of ‘YYYY-MM-DD' that are preceded by the DATE keyword. For example: DATE '2002-11-03'.

Yes

Time Constant Time constants are literals with the format of 'HH:MM:SS[.FFFFFF]' that are preceded by the TIME keyword. For example: TIME '12:33:52.341'.

Yes

IS [NOT] NULL Conditional that tests for NULL or the missing value. Yes

AND Logical AND. Yes

OR Logical OR. Yes

NOT Negation. Yes

= Equality. Yes

86 Appendix B: WHERE Clauses

Page 99: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

<> Inequality. Yes

>= Greater than or equal. Yes

> Greater than. Yes

<= Less than or equal. Yes

< Less than. Yes

x [NOT] LIKE y [ESCAPE c] String x is [not] like string y. ‘_’ matches exactly one character; ‘%’ matches zero or more characters. c provides an escape character definition to allow checking for occurrences of the characters ‘_’ and ‘%’ in y.

Yes

x IS [NOT] PATTERN y [ESCAPE c] String x matches [does not match] the specific pattern y. c provides an escape character definition to allow checking for occurrences of the pattern characters in y.For more information about pattern coding, see “IS PATTERN and IS NOT PATTERN” on page 90.

No

x IS [NOT] DATA_TYPE y String x is consistent [is not consistent] with the data type specified by string y.

No

x [NOT] BETWEEN y AND z Inclusive in [not in] range. Yes

x [NOT] IN (list) x is included [not included] in a list of values. Yes

+ Addition. Yes

- Subtraction. Yes

* Multiplication. Yes

/ Division. Yes

( ) Parentheses group expressions. Yes

- expression Unary minus. Yes

+ expression Unary plus. Yes

CHAR[ACTER]_LENGTH( s ) The length of strings. Yes

LOWER( s ) Converts to lowercase strings. Yes

UPPER( s ) Converts to uppercase strings. Yes

NUMBER( s ) Converts the string s to a number. No

ROWNUM( ) In general, returns the current row in the table before the current WHERE clause is acted on. However, if the query contains elements that are executed in the RDBMS and in the importer, the behavior is slightly different.

No

STRING( s ) Converts the number or date to a string. No

Table B-1. Data Explorer Standard SQL WHERE Clause Syntax

Element DefinitionANSI/92 Compliant

Data Explorer Standard SQL Syntax WHERE Clauses 87

Page 100: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

s1 || s2 Concatenates strings s1 and s2. Yes

SUBSTRING( s FROM n [ FOR m] ) Returns the substring of length m in s starting with the nth character. If m is not given, the string starting at the nth character to the end is returned.

Yes

LTRIM ( s ) Remove leading blanks from string s. This is equivalent to TRIM( LEADING FROM s ).

No

RTRIM( s ) Removes trailing blanks from string s. This is equivalent to TRIM( TRAILING FROM s ).

No

TRIM( w p FROM s ) Removes leading and/or trailing padding characters from string s. p is the character to trim. The default is ' ' (blank). Valid values for w are: LEADING, TRAILING, BOTH. This is equivalent to the ANSI SQL TRIM(<part> ['trimChar'] FROM 'string').

Yes

POSITION( y IN x ) If y is of length 0, the result is 1.Otherwise, if y occurs as a substring in x, the result is the start position of the first occurrence of y in x (positions are numbered starting at 1).Otherwise, the result is 0.

Yes

DATE( d [ ,s ] ) Converts string d to a date value based on the format described in the Data Explorer date tags.

No

CHECKDATE( d [ , s ] ) Verify that string d is a valid date based on the format described in the Data Explorer date tags.The values returned are:0 = Valid date/time.1 = Date/time is NULL.2 = Date/time is not a valid format for the Data Explorer date tag or is not a legal date/time and is not NULL.

No

DATEPART( p , d ) Extracts an individual datetime field (p) value from a date value (d). Valid values for p are: YEAR, QUARTER, MONTH, DAYOFYEAR, DAY, HOUR, MINUTE, SECOND, FRACTION.

No

EXTRACT( p FROM d ) Extracts an individual datetime field (p) value from a date value (d). Valid values for p are: YEAR, MONTH, DAY, HOUR, MINUTE, SECOND.

Yes

CURRENT_DATE Returns the current date. The current date is computed once at the beginning of the query and is used throughout of the duration of the query, however long.

Yes

Table B-1. Data Explorer Standard SQL WHERE Clause Syntax

Element DefinitionANSI/92 Compliant

88 Appendix B: WHERE Clauses

Page 101: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

About NULLs NULL is a missing or unknown value. As such, NULL is represented outside of the domain for the column. It is not one of the valid values in the domain, such as 0 or 99/99/99 or “ ” (empty string).

Nulls in an expression evaluate to “UNKNOWN”. If not tested for explicitly, rows that evaluate to UNKNOWN are not selected by the query.

NULL is not a literal, but a key word that is used in the IS [NOT] NULL conditional. Therefore, the proper way to test for NULL is <column> IS [NOT] NULL. Misformed queries such as, <column> = NULL, are not legal.

Operator Order of PrecedenceWhere parentheses do not explicitly indicate the grouping of operands with operators, the operands are grouped with the operator with the highest precedence. If two operators have the same precedence, the operand is grouped with the left or the right operator according to whether the operators are left-associative or right-associative. All operators with the same precedence level also have the same associative properties.

Table B-2 defines the operator order of precedence (highest to lowest) for the WHERE clause syntax.

CURRENT_TIME Returns the current time. The current time is computed once at the beginning of the query and is used throughout of the duration of the query, however long.

Yes

CURRENT_TIMESTAMP Returns the current date and time. The current timestamp is computed once at the beginning of the query and is used throughout of the duration of the query, however long.

Yes

Table B-2. Operator Order Precedence

Token Operator Associates

Column Names N/A

Constants

Built-In Functions function call N/A

+ - unary plus/minus right

/ * multiplicative left

+ - additive left

|| string catenation

Table B-1. Data Explorer Standard SQL WHERE Clause Syntax

Element DefinitionANSI/92 Compliant

Data Explorer Standard SQL Syntax WHERE Clauses 89

Page 102: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

IS PATTERN and IS NOT PATTERNPATTERN is a Data Explorer extension to ANSI SQL that allows the user to test for different kinds of character patterns in data. The syntax looks like:

WHERE SSN IS PATTERN ‘999-99-9999’

Table B-3 describes characters with special meanings in patterns:

A pattern to match U.S. phone numbers would be:

WHERE PhoneNum IS PATTERN ‘P999Qb999-9999’

[NOT] BETWEEN between predicate left

[NOT] LIKE like predicate N/A

IS [NOT] PATTERN pattern predicate N/A

IS [NOT] DATA_TYPE data type predicate N/A

[NOT] IN in predicate N/A

< > <= >= relational left

= <> equality/inequality left

NOT logical NOT right

AND logical AND left

OR logical OR left

Table B-3. Characters in Pattern

Character Matches in Pattern

‘B’ or ‘b’ or ‘ ’ Blank.

‘C’ or ‘c’ Any character.

‘L’ or ‘l’ Any lowercase alphabetic character.

‘P’ or ‘p’ Left parenthesis: ‘(‘.

‘Q’ or ‘q’ Right parenthesis: ‘)’.

‘T’ or ‘t’ Tab.

‘U’ or ‘u’ Any uppercase alphabetic character.

‘X’ or ‘x’ Any upper- or lower-case alphabetic character.

‘9’ Any numeric character.

‘(’ and ‘)’ Used to indicate repetition.

Any other character Itself.

Table B-2. Operator Order Precedence

Token Operator Associates

90 Appendix B: WHERE Clauses

Page 103: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

The escape character allows you to check for actual occurrences of the special characters defined in the above table. For example, to check for department identifiers of the form 2 digits followed by 3 uppercase alphabetic characters and that start with ‘79’, you could code a pattern like:

WHERE DeptNum IS PATTERN ‘7#9UUU’ ESCAPE ‘#’

The escape is necessary because if you had coded ‘79UUU’ the 9 in the pattern would have matched any digit, not just ‘9’. ‘9’ is the generic pattern character for a digit.

A slightly more interesting example of using the escape character in a pattern clause is the following:

WHERE PhoneNum IS PATTERN ‘\(999\)B999-9999’ ESCAPE ‘\’

Note that this is equivalent to the phone number example above using ‘P’ and ‘Q’.

The repetition operation improves readability by allowing you to specify multiple consecutive occurrences of the same character (or special character above) in the string. For example, to check for a string of 10 uppercase letters, you could code this pattern clause:

WHERE MyCharField IS PATTERN ‘U(10)’

Here we have used the repetition value 10 instead of coding the equivalent pattern ‘UUUUUUUUUU’.

Sample Data Explorer Standard SQL WHERE ClausesThe examples in this section are based on the following table:

Employee( EmpID, LastName, FirstName, Dept#, MgrID, Salary)

Select the first fifty employees to join the company:

EmpID <= 50

Select all employees who work in accounting (dept 109) that make between 50K and 100K:

“Dept#” = 109 and Salary BETWEEN 50000 and 100000

Select everyone whose first name starts with L:

UPPER( FirstName ) LIKE ‘L%’

Select all rows whose Salary field contains non-numeric data or NULLs. This query takes advantage of the fact that the “Number()” function returns NULL if the argument cannot be converted to a number:

Number( Salary ) IS NULL

This is a slightly different query to select only those rows whose Salary field contains non-numeric data:

Number( Salary ) IS NULL AND Salary IS NOT NULL

Data Explorer Standard SQL Syntax WHERE Clauses 91

Page 104: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

Data Explorer Standard SQL

The Data Explorer Standard SQL as described in the previous section is based on the ANSI SQL standard. This enables the user to write a Rule or Drill Down query without having to worry about the specifics of the RDBMS SQL syntax. This is useful in situations where the final RDBMS is not known or when the RDBMS may change as the data is moved from one staging area to another. Data Explorer automatically translates a query written with Data Explorer Standard SQL to SQL that can be executed by the specific RDBMS running the rule.

The Data Explorer Standard SQL implements extensions to ANSI SQL which have no direct translation to any RDBMS SQL. Data Explorer automatically handles this case by splitting the query into two components.

The first component contains portions of the query that the RDBMS understands and Data Explorer uses the power of the RDBMS to select rows matching this portion. The extension containing portions of the query are executed directly by the importer on these rows to produce the final result set. This splitting is done transparently by Data Explorer without any intervention by the user.

The user may also specify Rules and Drill Down queries that contain specific RDBMS syntax. These non-Data Explorer conforming queries are passed untranslated to the RDBMS for execution. The query may not contain Data Explorer SQL extensions as the RDBMS does not know how to process them, resulting in errors being returned. Avoid queries that do not conform to Data Explorer Standard SQL as they may not be supported in future releases.

Translation TemplatesData Explorer customizes the SQL to native RDBMS syntax through a set of translation templates. By default, the translation for any SQL syntax element is the Data Explorer Standard SQL syntax.

To customize the translation to a specific RDBMS, the user must define configuration variables in the appropriate importer configuration file for each non-matching syntax element. At a minimum, the following data type conversion configuration file variables should be define since they are Data Explorer extensions to the ANSI SQL standard:

♦ DATE()

♦ NUMBER()

♦ STRING()

In general, most importers have these variables pre-defined in the delivered .ini files and there is no work required by the user. However, the ODBC importer, with its ability to import and Drill Down to any source, may require configuration. The default installation sets up the ODBC importer for Microsoft SQL Server.

92 Appendix B: WHERE Clauses

Page 105: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

The ideodbc.ini file contains defined sections for all major databases. Use the following steps to select the appropriate database for your data source:

1. Bring up in an editor the ideodbc.ini configuration file and find the [ideStandardSQL] section.

2. Add to this section the name of the data source you wish to associate with a named set of translation templates.

The format is:

dataSourcePattern=DataBase

Where “dataSourcePattern” may be the name of the data source or a pattern string that can match the data source. For example, if the name of the data source is “RedRocks”, the dataSourcePattern may be “RedRocks” or “Red*”. Asterisk (*) matches 0 or more characters and question-mark (?) matches exactly one character. “DataBase” must refer to one of the pre-defined translation template sections. For example “ANSI” for the [ANSI] section.

3. Save the ODBC Configuration file.

Config VariableData Explorer Standard Syntax Element

Comments

WHERE_CHAR_LENGTH CHAR_LENGTH( $1 )

WHERE_CURRENT_DATE CURRENT_DATE

WHERE_CURRENT_TIME CURRENT_TIME

WHERE_CURRENT_TIMESTAMP CURRENT_TIMESTAMP

WHERE_DATE DATE( $1, $2 ) ‘$2’ is either a Data Explorer date tag or NULL. In general, only $1 is required for date conversions. This variable should be defined since this date conversion is a Data Explorer extension.

WHERE_DATE_CONST DATE ‘string’TIME ‘string’TIMSTAMP ‘string’

Date constants are in internal format and can be formatted to display format using any Data Explorer date tag. The conversion is ${‘<dateTag>’}. For example:${‘ODBC Canonical’}

WHERE_EXTRACT EXTRACT($d1 FROM $2) ‘$d1’ returns the date part literal since WHERE_EXTRACT is used translate both extract() and datepart().

WHERE_EXTRACT_YEAR EXTRACT( YEAR FROM $2 ) Note that $2 is used for the date.

WHERE_EXTRACT_MONTH EXTRACT( MONTH FROM $2 ) Note that $2 is used for the date.

WHERE_EXTRACT_DAY EXTRACT( DAY FROM $2 ) Note that $2 is used for the date.

WHERE_EXTRACT_HOUR EXTRACT( HOUR FROM $2 ) Note that $2 is used for the date.

Data Explorer Standard SQL 93

Page 106: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

WHERE_EXTRACT_MINUTE EXTRACT( MINUTE FROM $2 ) Note that $2 is used for the date.

WHERE_EXTRACT_SECOND EXTRACT( SECOND FROM $2 ) Note that $2 is used for the date.

WHERE_LOWER LOWER( $1 )

WHERE_NUMBER NUMBER( $1 ) This variable should be defined since this number conversion is a Data Explorer extension.

WHERE_POSITION POSITION( $1 IN $2 )

WHERE_STRCAT $1 || $2

WHERE_STRING STRING($1) This variable should be defined since this string conversion is a Data Explorer extension.

WHERE_SUBSTRING2 SUBSTRING( $1 FROM $2 )

WHERE_SUBSTRING3 SUBSTRING( $1 FROM $2 FOR $3 )

WHERE_TRIM_BOTH TRIM( BOTH FROM $1 )

WHERE_TRIM_LEADING TRIM( LEADING FROM $1 )LTRIM( $1 )

Because LTRIM() is automatically translated to the ANSI form (that is, ANSI TRIM() ), this variable should be defined if the native RDBMS does not support ANSI TRIM().

WHERE_TRIM_TRAILING TRIM( TRAILING FROM $1 )RTRIM( $1 )

Because RTRIM() is automatically translated to the ANSI form (that is, ANSI TRIM() ), this variable should be defined if the native RDBMS does not support ANSI TRIM().

WHERE_UPPER UPPER( $1 )

Config VariableData Explorer Standard Syntax Element

Comments

94 Appendix B: WHERE Clauses

Page 107: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

Drill Down WHERE Clause Template

The Drill Down templates that Data Explorer uses to create the WHERE clause for Drill Down operations are contained in the DrillDown section of the configuration file for each data source. These templates are used by the Drill Down dialog box to create Data Explorer Standard SQL that each importer then executes to return the results of the desired selection.

Figure B-1 outlines the process:

The Drill Down templates are written generically by the type of operation (such as a comparison), and by data type (such as “String”). When the Drill Down dialog box executes a Drill Down request, it searches for the Drill Down template that matches the operation and data type. It then expands the template using the data in the dialog box to a Data Explorer Standard SQL statement. This statement is then sent to the importer for execution. The importer then translates this SQL to an appropriate SQL dialect that the specific RDBMS can execute.

Generic and Specific Drill Down TemplatesThere are two types of templates that are defined for Drill Down: Generic and Specific. Generic templates, which are used for each class of operators that has the same syntactic structure. Generic is used when the sense of the operator is used in its default meaning. For example, the ‘=’, which tests for equality, can be used to search for exact matches. Generic templates can be overridden when it is desirable to change the default meaning for a class of operators. For an example, see “Customizing Drill Down Templates” on page 99.

Specific Drill Down templates are used when it is necessary to change the default sense of an operator to make it more general or to include other conditions, such as nulls. For example, you could change your equality check on strings to perform a “fuzzy” check instead, by inserting this line into the configuration file (preferably within the Drill Down Query section):

EQ_STR = "UPPER($col) LIKE UPPER('%$val$')"

Figure B-1. Drill Down WHERE Clause Template

Drill Down Templates

RDBMS

Data Explorer Standard SQL Templates

Data Explorer Standard SQL

Drill Down Dialog Box

Importer

Drill Down WHERE Clause Template 95

Page 108: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

Logical Operators in Generic TemplatesThe following sections outline each generic template and its specific equivalents for each data type.

Numeric TemplatesIn general, the numeric templates take the data in the Drill Down and apply it untranslated against the templates. For example:

LOGOP_NUM = "$col $op $val"

Table B-4 describes numeric templates:

String TemplatesIn general, the string templates take the data in the Drill Down and quote it to make it legal syntax. For example:

LOGOP_STR = "$col $op '$val'".

The string templates are also responsible for translating the pattern requests.

Table B-4. Numeric Templates

Generic Template Specific Template Operator Comments

LOGOP_NUM EQ_NUM = The logical comparison operators.

NE_NUM <>

LT_NUM <

LE_NUM <=

GT_NUM >

GE_NUM >=

INOP_NUM IN_NUM IN The “IN” list operators.

NIN_NUM NOT IN

BTWNOP_NUM BTWN_NUM BETWEEN The inclusive BETWEEN operators.

NBTWN_NUM NOT BETWEEN

NULOP_NUM INL_NUM IS NULL The test for NULL operators.

INNL_NUM IS NOT NULL

PATOP_NUM ISPAT_NUM IS PATTERN The test for IS PATTERN.

ISNPAT_NUM IS NOT PATTERN

96 Appendix B: WHERE Clauses

Page 109: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

Table B-5 describes string templates:

Date TemplatesIn general, the date templates take the data in the Drill Down and convert it to dates it to make it legal syntax. For example:

LOGOP_DTE = "$col $op TO_DATE(’$val', ’YYYY-MM-DD HH24:MI:SS’)"

Table B-5. String Templates

Generic Template Specific Template Operator Comments

LOGOP_STR EQ_STR = The logical comparison operators.

NE_STR <>

LT_STR <

LE_STR <=

GT_STR >

GE_STR >=

INOP_STR IN_STR IN The “IN” list operators.

NIN_STR NOT IN

BTWNOP_STR BTWN_STR BETWEEN The inclusive BETWEEN operators.

NBTWN_STR NOT BETWEEN

NULOP_STR INL_STR IS NULL The test for NULL operators.

INNL_STR IS NOT NULL

PATOP_STR ISPAT_STR IS PATTERN The test for IS PATTERN.

ISNPAT_STR IS NOT PATTERN

PATOPUC_STR ISUC_STR — Test for all uppercase letters, for example: UPPER( column) = column.

ISNUC_STR —

PATOPLC_STR ISLC_STR — Test for all lowercase letters, for example: LOWER( column) = column.

ISNLC_STR —

Drill Down WHERE Clause Template 97

Page 110: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

Table B-6 describes date templates:

Drill Down Template VariablesThe following variables are available for use in the templates. They are expanded in place, without white space around them:

♦ $col. The name of the column that is being drilled down on.

♦ $op. The operation being performed. It can have one of the following values:

− The operators =, <>, <, <=, >, or >=

− IN or NOT IN

− BETWEEN or NOT BETWEEN

− IS NULL or IS NOT NULL

− IS PATTERN or IS NOT PATTERN

♦ $val. The scalar value being Drill Down on, if available.

♦ $min. The minimum value in the range, if available.

♦ $max. The maximum value in the range, if available.

♦ $list;<prefix>;<suffix>;. The list of domain values, if available. Each domain value is expanded with the <prefix> pre-appended and <suffix> post-appended. Additionally, between each domain value a list separator character is added. The <prefix> and <suffix> may be empty. The prefix/suffix separator character is the first character after '$list' and

Table B-6. Date Templates

Generic Template Specific Template Operator Comments

LOGOP_DTE EQ_DTE = The logical comparison operators.

NE_DTE <>

LT_DTE <

LE_DTE <=

GT_DTE >

GE_DTE >=

INOP_DTE IN_DTE IN The “IN” list operators.

NIN_DTE NOT IN

BTWNOP_DTE BTWN_DTE BETWEEN The inclusive BETWEEN operators.

NBTWN_DTE NOT BETWEEN

NULOP_DTE INL_DTE IS NULL The test for NULL operators.

INNL_DTE IS NOT NULL

PATOP_DTE ISPAT_DTE IS PATTERN The test for IS PATTERN.

ISNPAT_DTE IS NOT PATTERN

98 Appendix B: WHERE Clauses

Page 111: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

must repeated. For example, $list;“;”; and $list/“/”/ are equivalent. The following variables may be used inside either the prefix or suffix:

− $fmt

− $val

♦ $fmt. The Data Explorer date tag format that was discovered during column profiling if the column contains a recognizable date format.

♦ $pat. The pattern format, without quotes, if available.

Customizing Drill Down TemplatesYou can customize any Drill Down template in the importer configuration file. This is accomplished by including the overriding line in the configuration file. It is a good idea to place them all together in the Drill Down Query Template section. If you place an octothorpe (#) in the first column of any line, it becomes a comment. In that way, you can de-activate an override without losing track of what was specified previously. For more information about importer configuration files, see the Data Explorer System Administration Guide.

While the default settings should work in most situations, there are reasons to customize a template, such as those in the following list:

♦ Use RDBMS specific syntax. There are many situations where using RDBMS specific syntax is required. For example, better date conversions.

♦ Special comparison. Some RDBMSs have special functions that try to match based on sound, rather than on spelling. For example, SOUNDEX() in Sybase.

♦ Case insensitive compares. Some RDBMSs have case-sensitive comparisons. To make the RDBMS use case-insensitive compares, use a template like “LOGOP_STR = UPPER($col) $op UPPER($val)”.

♦ Include nulls. To always include nulls in searches. For example:

"LOGOP_STR = $col $op '$val' or $col IS NULL"

Includes nulls in string searches. LOGOP_NUM and LOGOP_DTE can be similarly altered to include nulls in numeric and date searches, respectively.

Drill Down WHERE Clause Template 99

Page 112: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

100 Appendix B: WHERE Clauses

Page 113: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

A p p e n d i x C

Troubleshooting

This appendix includes a listing of common problems and solutions.

101

Page 114: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

Common Problems

Does the column contain a particular string of characters?

The column could be fixed-length character or variable-length character. Regardless of the string you embed in the LIKE pattern, you will have no control over whether the RDBMS returns instances that are exact matches as to uppercase or lowercase, or more forgiving. That behavior is determined by the RDBMS.

If you really want to find only the UPPER case instances, you will have to use the IS [NOT] PATTERN construction.

To show the rows where FIELDX does not contain the characters “TELNO” anywhere:

WHERE FIELDX NOT LIKE '%TELNO%'

To show the rows where FIELDX does not start with the characters “TELNO”:

WHERE FIELDX NOT LIKE 'TELNO%'

To show the rows where FIELDX does not end with the characters “TELNO”:

WHERE FIELDX NOT LIKE '%TELNO'

To show the rows where FIELDX does not contain the characters “TELNO” starting in the second position:

WHERE FIELDX NOT LIKE '_TELNO%'

Does one column include the full contents of another column?

Sometimes the contents of one column will also be completely included in the contents of another column, making the first column superfluous.

To show the rows where LONG_DESC includes the contents of SHORT_DESC:

WHERE LONG_DESC LIKE ’%' || SHORT_DESC ||' %’

Better (performing) is:

WHERE POSITION( SHORT_DESC IN LONG_DESC) > 0

In an address, is there a line of blanks followed by a line of non-blanks?

While Column Profiling will tell you how many times each address line is blank, it does not know when a higher address line is non-blank when a lower one is blank.

To show the rows where a nonblank address line follows a blank one:

WHERE (TRIM( TRAILING FROM ADDR1) = ''

AND TRIM( TRAILING FROM ADDR2) = '' )

OR (TRIM( TRAILING FROM ADDR3) <> ''

AND TRIM( TRAILING FROM ADDR2) = '' )

If nulls are a possibility in the various address lines, the rule must be more complex:

102 Appendix C: Troubleshooting

Page 115: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

WHERE ((ADDR1 IS NULL OR TRIM( TRAILING FROM ADDR1) = '')

AND (ADDR2 is NOT NULL and TRIM( TRAILING FROM ADDR2) = ''))

OR ((ADDR3 is NOT NULL and TRIM( TRAILING FROM ADDR3) <> '')

AND (ADDR2 IS NULL OR TRIM( TRAILING FROM ADDR2) = ''))

Are all three fields of a key null?

If a key is composed of three fields, it might be valid for one or more of the fields to be null, but not valid if all three are null.

To show the rows where all three fields are null:

WHERE KEY1 IS NULL and KEY2 IS NULL and KEY3 IS NULL

Is the ZIP code in the right format?

To show the rows where the ZIP code is not of the right pattern, either find (five numbers followed by blanks) or (five numbers - four numbers). This also shows rows where the zip is null:

WHERE ZIP IS NULL

OR (TRIM(TRAILING FROM ZIP) IS NOT PATTERN '9(5)'

AND TRIM(TRAILING FROM ZIP) IS NOT PATTERN '9(5)-9(4)')

How do I make a numeric compare on a character string column that contains numeric info?

Use NUMBER( ) to treat the string as a numeric. This is equivalent to similar functions in RDBMS executions, functions such as cast( ), to_number( ), convert( ), and double( ).

WHERE NUMBER(shipnchar) > 300000

Are there non-letters inside a column that ought to be only alphabetic?

The IS [NOT] PATTERN construction can be used to highlight the rows that have non-letter characters in alphabetic columns. In a PATTERN, ‘X’ means any upper- or lower-case letter. If you only want to allow uppercase letters, then substitute ‘U’ for ‘X’ in the pattern. For lowercase, use ‘L’.

To show the rows where there are characters that are non-alphabetic included in either of the varchar name fields, the city, or state (nulls not considered):

WHERE FIRSTNAME IS NOT PATTERN

'X('||STRING(CHAR_LENGTH(FIRSTNAME))||')'

OR LASTNAME IS NOT PATTERN

'X('||STRING(CHAR_LENGTH(LASTNAME))||')'

OR CITYNAME IS NOT PATTERN

'X('||STRING(CHAR_LENGTH(CITYNAME))||')'

OR STATE IS NOT PATTERN

Common Problems 103

Page 116: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

'X('||STRING(CHAR_LENGTH(STATE))||')'

Are there numbers inside a character column?

The preceding pattern checks to detect uppercase or lowercase letters that may not be what you really want, because there actually are some non-letters that would be acceptable in a field such as CITYNAME, for example, blanks or periods (as in New Canaan and St. Paul). Perhaps what you are really looking for is numbers that have gotten into a column. You can use LIKE to find those.

To show the rows where there are numbers within a given column:

WHERE CITYNAME LIKE '%1%' or CITYNAME LIKE '%2%' or

CITYNAME LIKE '%3%' or CITYNAME LIKE '%4%' or

CITYNAME LIKE '%5%' or CITYNAME LIKE '%6%' or

CITYNAME LIKE '%7%' or CITYNAME LIKE '%8%' or

CITYNAME LIKE '%9%' or CITYNAME LIKE '%0%'

Consider another situation. If a 15-char field labeled COUNTY, which is typically alphabetic, has actually been used to hold the license ID, which is typically numeric, we could use pattern to look for numerics;

WHERE COUNTY IS PATTERN '9('||STRING(CHAR_LENGTH(RTRIM(COUNTY)))||')'

Here we have added the function RTRIM() to the PATTERN. RTRIM() has the effect of trimming the trailing blanks, if any, from the column. So the length that is returned is the length of the data through the last non-blank character. The pattern will resolve to 9(1) through 9(15).

Does the column lack data?

Even a column that is not null may have essentially no data in it. It may be blank or zero.

To show the rows where a character column is null or blank:

WHERE FIELDX IS NULL or

FIELDX IS PATTERN 'B('||STRING(CHAR_LENGTH(FIELDX))||')'

To show the rows where a numeric field is null or zero:

WHERE FIELD9 IS NULL or

FIELD9 = 0

Is the date column the wrong format?

If you have a particular date format that you want your dates to conform to, a rule can test for that, using CHECKDATE.

To show the rows where the ISSUE_DATE does not conform to the format “dd-month-year”, or to show rows where even though following that format they are invalid dates:

WHERE CHECKDATE(ISSUE_DATE, 'dd-month-year') <>0

104 Appendix C: Troubleshooting

Page 117: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

Is the date column in the past?

If the date must be in the past, you can check it against the current date.

To show the rows where the ORDER_DATE is in the past:

WHERE DATE(ORDER_DATE) is NULL or

DATE(ORDER_DATE,'dd-month-year') < CURRENT_DATE

Is the column all uppercase?

Sometimes you require the contents of a character string column to be all uppercase.

To show the rows where FIELDX is not all uppercase alphabetic characters:

WHERE FIELDX IS NOT PATTERN

U('||STRING(CHAR_LENGTH(FIELDX))||')'

Common Problems 105

Page 118: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

106 Appendix C: Troubleshooting

Page 119: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

I n d e x

Ccontext menus

creating rules 70

Ddata rules

applying 66creating 70creating from context menu 70creating from drill down 72creating from Navigation tree 70data type in 74deleting 69drill-down search criteria 73editing 67, 69, 82email 69executing 77executing for an object 78executing from tag viewer 69generated names 71non-executable 66, 68, 70results 81Rule field and name 68, 71sample data options 75sample data sets 81saving results 81search limits 76

searching for 78, 79selecting 77table data source 69tag viewer context menu 69viewing 67WHERE clause 68

Drill Downdialog box 73

drill downcreating rules 72data type in 74rule text for 68search criteria 73

EExecute Rule

dialog box 75, 77, 78, 79

NNew Rule

dialog box 70

RRule

dialog box 67

107

Page 120: Data Rules Primer - kb.informatica.com · ix Preface Welcome to Data Explorer, the Informatica software product suite that provides a complete and accurate picture of enterprise data

rule viewerusing 81

rulesapplying 66creating 70creating from context menu 70creating from drill down 72creating from Navigation tree 70data type in 74deleting 69drill-down search criteria 73editing 67, 69, 82email 69executing 77executing for an object 78executing from tag viewer 69generated names 71non-executable 66, 68, 70results 81Rule field and name 68, 71sample data options 75sample data sets 81saving results 81search limits 76searching for 78, 79selecting 77table data source 69tag viewer context menu 69viewing 67WHERE clause 68

Ssample data options

data rule execution 75SQL WHERE clause 84

flat files 86RDBMS 85

Ttag viewer

rule context menu 69rules in 67

WWHERE clause 68

examples (flat files) 91examples (RDBMS) 85

108 Index