cloveretl/gui user's manual version 2.0

CloverGUI

User's guide

Javlin, a.s.

Tomas Waller

This User's Guide covers Release 2.0.x of CloverGUI.

Copyright © 2008 Javlin, a.s. All rights reserved.

Published 26-August-2008

Table of Contents

I. Installation Guide
  1. CloverGUI Overview
    What Is CloverGUI?
    What Is CloverEngine?
    What Is CloverServer?
    Web Information
  2. Installation Instructions
    The Way How You Should Download CloverGUI
      Downloading the Eclipse Platform
      Setting Workspace
      Downloading the Eclipse GEF Plugin
      Downloading the CloverGUI Plugin
    Creating a New Project
    Creating a New Graph
    Running Graphs
  3. Import
    Import Clover Projects
    Import Graphs
    Import Metadata
      Metadata from XSD
      Metadata from DDL
  4. Export
    Export Graphs
    Export Graphs to HTML
    Export Metadata to XSD
    Export Image
  A. Setting and Configuring Java Tools
    Setting Java Runtime Environment
    Installing Java Development Kit
  B. Import Other Examples

II. Objects, Structures and Tools
  5. CloverGUI Structures
    CloverGUI Perspective
    CloverGUI Panes
      Graph Editor with Palette of Components
      Navigator Pane
      Outline Pane
      Tabs Pane
  6. Building Transformation Graph
  7. Edges
    What Are the Edges?
    Connecting Components by the Edges
    Assigning Metadata to the Edges
    Propagating Metadata through the Edges
    Debugging the Edges
    Viewing the Data Flowing through the Edges
    Types of Edges
    Colors of the Edges
  8. Metadata
    Internal Metadata
      How You Can Create Internal Metadata
      Externalizing Internal Metadata
      Exporting Internal Metadata
    External (Shared) Metadata
      How You Can Create External (Shared) Metadata
      Linking External (Shared) Metadata
      Internalizing External (Shared) Metadata
    The Resources From Which You Can Extract Metadata
      Extracting Metadata from a Flat File
      Extracting Metadata from an XLS File
      Extracting Metadata from a Database
      Creating Metadata from a DBase File
      Creating Metadata by User
    Assigning Metadata to an Edge
    Editing Metadata
    Creating Database Table on the basis of Metadata and Database Connection
    Metadata Editor
      Record Pane
      Field Pane
      Filter Textarea
    Dynamic Metadata
  9. Database Connections
    Internal Database Connections
      How You Can Create Internal Database Connections
      Externalizing Internal Database Connections
    External (Shared) Database Connections
      How You Can Create External (Shared) Database Connections
      Linking External (Shared) Database Connection
      Internalizing External (Shared) Database Connections
    Browsing Database and Extracting Metadata from Database Tables
    Encrypting the Access Password
  10. Lookup Tables
    Creating Lookup Tables
      Simple Lookup Table
      Database Lookup Table
      Range Lookup Table
  11. Parameters
    Internal Parameters
      How You Can Create Internal Parameters
      Externalizing Internal Parameters
    External (Shared) Parameters
      How You Can Create External (Shared) Parameters
      Linking External (Shared) Parameters
      Internalizing External (Shared) Parameters
    Parameters Wizard
    Using Parameters
  12. Sequences
    Creating a Sequence
    Editing a Sequence
  C. JMS Connections
    Internal JMS Connections
      How You Can Create Internal JMS Connections
      Externalizing Internal JMS Connections
    External (Shared) JMS Connections
      How You Can Create External (Shared) JMS Connections
      Linking External (Shared) JMS Connection
      Internalizing External (Shared) JMS Connections
    Edit JMS Connection Wizard
    Encrypting the Authentication Password

III. Components Guide
  13. Introduction to Components
    Common Properties of Components
      Palette of Components
      Giving a Name to a Component
      Phases
      Enabling vs. Disabling Components vs. PassThrough Status
      Data Policy
      Locating Files with URL File Dialog
      Viewing Data in Readers and Writers
  14. Defining the Transformations
    Open Type Wizard
    Edit Value Wizard
    Transform Editor
  15. Readers
    File URL
    File Readers
      DataGenerator
      Flat File Readers
      UniversalDataReader
      Other Type File Readers
      CloverDataReader
      XLSDataReader
      DBFDataReader
    Database Readers
      Using JDBC Drivers
      DBInputTable
    Advanced Readers
      XMLExtract
      XMLXPathReader
      JMSReader
      LDAPReader
  16. Writers
    File URL
    File Writers
      Partitioning Data Flow into Different Output Files
      Trash
      Flat File Writers
      UniversalDataWriter
      Other Type File Writers
      CloverDataWriter
      XLSDataWriter
      StructuredDataWriter
    Database Writers
      Using JDBC Drivers
      DBOutputTable
      Using Database Bulk Loaders
      DB2DataWriter
      InformixDataWriter
      MSSQLDataWriter
      MySQLDataWriter
      OracleDataWriter
      PostgreSQLDataWriter
    Advanced Writers
      XMLWriter
      JMSWriter
      LDAPWriter
  17. Transformers
    Copying, Filtering and Sorting
      SimpleCopy
      SpeedLimiter
      ExtSort
      Dedup
      ExtFilter
    Concatenating, Gathering and Merging
      Concatenate
      SimpleGather
      Merge
    Partitioning and Intersection
      Partition
      DataIntersection
    Pure Transformers
      KeyGenerator
      Aggregate
      Reformat
      Denormalizer
      Normalizer
      XSLTransformer
  18. Joiners
    Join Types
      Inner Join
      Left Outer Join
      Full Outer Join
    Joining Components
      Transformations
      ApproximativeJoin
      ExtHashJoin
      ExtMergeJoin
      LookupJoin
      DBJoin
  19. Other Components
    Executing Components
      SystemExecute
      JavaExecute
      DBExecute
      RunGraph
    Non-Executing Components
      CheckForeignKey
      LookupTableReaderWriter
  20. Deprecated
    Flat File Readers
    DelimitedDataReader
    FixLenDataReader
    Flat File Writers
    DelimitedDataWriter
    FixLenDataWriter
  D. Defining Transformations in Java

IV. Transformation Language
  21. Clover Transformation Language
    Program Structure
    Comments
    Import
    Data Types
    Literals
    Variables
    Operators
      Arithmetic Operators
      Relational Operators
      Logical Operators
    Simple Statement and Block of Statements
    Control Statements
      Selection Statements
      Iteration Statements
      Jump Statements
    Functions
    Eval
    Parameters
    Sequences
    Lookup Tables
    Data Flows
    Mapping
  E. Clover TL Functions
    Conversion Functions
    Date Functions
    Mathematical Functions
    String Functions
    Miscellaneous Functions
  F. Clover Transformation Language Lite

List of Figures

2.1. The Eclipse Logo
2.2. You Are Asked To Select a Workspace
2.3. You Can Select the Following Workspace
2.4. The Eclipse Platform Introductory Screen
2.5. Downloading the Graphical Editing Framework
2.6. Install/Update Wizard
2.7. Searching for the Graphical Editing Framework
2.8. List of Mirrors for Download
2.9. The Eclipse License Agreement
2.10. The About Eclipse SDK Window
2.11. List of Installed Plugins
2.12. List of Update Sites
2.13. Adding the Clover Update Site
2.14. Selecting the Sites that Should Be Updated
2.15. CloverGUI Prompt
2.16. Clover Products to Install
2.17. Clover Has Been Installed
2.18. Creating a New Project
2.19. Selecting the New Project Wizard
2.20. Giving a Name to a New Project
2.21. CloverETL Examples Project
2.22. Opening the CloverETL Perspective
2.23. CloverETL Perspective
2.24. CloverETL Perspective with Highlighted Navigator Pane and the Project Folder Structure
2.25. Creating a New Graph
2.26. Giving a Name to a New Graph
2.27. Selecting the Parent Folder for the Graph
2.28. CloverETL Perspective with Highlighted Graph Editor
2.29. Graph Editor with a New Graph and the Palette of Components
2.30. Opening the Workspace.prm File
2.31. The Parameters Contained in the Workspace.prm File
2.32. Running a Graph from the Main Menu
2.33. Running a Graph from the Context Menu
2.34. Running a Graph from the Upper Tool Bar
2.35. Open Run Dialog
2.36. Setting Up Memory Size
2.37. Successful Data Parsing
2.38. Console Tab with an Overview of the Graph Processing
2.39. Counting Parsed Data
2.40. Enlarging the Font of Numbers
2.41. Setting the Font Size
3.1. Import (Main Menu)
3.2. Import (Context Menu)
3.3. Import Options
3.4. Import Projects
3.5. Import Graphs
3.6. Import Metadata from XSD
3.7. Import Metadata from DDL
4.1. Export Options
4.2. Export Graphs
4.3. Export Graphs to HTML
4.4. Export metadata to XSD
4.5. Export Image
A.1. Setting Java Runtime Environment

CloverGUI

ix

A.2. Preferences Wizard
A.3. Installed JREs Wizard
A.4. Adding a New JRE
A.5. Selecting New JRE Files
A.6. Selecting a JRE
A.7. Adding Java Development Kit
A.8. Searching for JDK Jars
A.9. Selecting JDK Jars
A.10. Adding JDK Jars
B.1. Import Examples (Main Menu)
B.2. Import Examples (Context Menu)
B.3. Import External Clover Project
B.4. Examples Selected
B.5. CloverETL Perspective with a Set of Projects
B.6. Newer Examples Files and Folders in the Navigator Pane
B.7. Older Examples Files and Folders in the Navigator Pane
B.8. Setting the WORKSPACE Parameter
B.9. Example Graph
5.1. CloverGUI Perspective
5.2. Graph Editor with an Opened Palette of Components
5.3. Closing the Graphs
5.4. Grid in the Graph Editor
5.5. A Graph before Selecting Auto-Layout
5.6. A Graph after Selecting Auto-Layout
5.7. Six New Buttons in the Tool Bar Appear Highlighted (Align Middle is shown)
5.8. Alignments from the Context Menu
5.9. Navigator Pane
5.10. Outline Pane
5.11. Another Representation of the Outline Pane
5.12. Properties Tab
5.13. Console Tab
5.14. Problems Tab
5.15. Clover - Graph Tracking Tab
5.16. Clover - Log Tab
7.1. Creating Metadata on the Empty Edge
7.2. Properties of an Edge
7.3. Filter Editor Wizard
7.4. View Data Dialog
7.5. Viewing Data
7.6. Hide/Show Columns when Viewing Data
7.7. View Record Dialog
7.8. Find Dialog
7.9. Copy Dialog
7.10. Selecting the Edge Type
7.11. Metadata in the Tooltip
8.1. Creating Internal Metadata in the Outline Pane
8.2. Creating Internal Metadata in the Graph Editor
8.3. Externalizing and/or Exporting Internal Metadata
8.4. Selecting a Location for a New Externalized and/or Exported Internal Metadata
8.5. Creating External (Shared) Metadata in the Main Menu and/or in the Navigator Pane
8.6. Internalizing External (Shared) Metadata
8.7. Extracting Metadata from Delimited Flat File
8.8. Extracting Metadata from Fixed Length Flat File
8.9. Setting Up Delimited Metadata
8.10. Setting Up Fixed Length Metadata
8.11. Extracting Metadata from XLS File
8.12. Extracting Internal Metadata from a Database
8.13. Database Connection Wizard
8.14. Selecting Columns for Metadata
8.15. Generating a Query
8.16. Original Libraries Tab of Java Build Path
8.17. Adding the Two Libraries for Extracting Metadata from DBASE File
8.18. Creating Java Application for Extracting Metadata from DBASE File
8.19. Selecting the Main Class
8.20. Adding the Main Class
8.21. Adding Arguments
8.22. Configuration for Extracting Metadata from DBASE File Has Been Created
8.23. Assigning Metadata to an Edge
8.24. Creating Database Table on the Basis of Metadata and Database Connection
8.25. Metadata Editor for a Delimited File
8.26. Metadata Editor for a Fixed Length File
9.1. Creating Internal Database Connection
9.2. Database Connection Wizard
9.3. Adding a new JDBC Driver into the List of Available Drivers
9.4. Defining Internal Database Connection
9.5. Externalizing Internal Database Connection
9.6. Creating External (Shared) Database Connection
9.7. Selecting Database Connection Item
9.8. Database Connection Wizard
9.9. Adding a new JDBC Driver into the List of Available Drivers
9.10. Defining External (Shared) Database Connection
9.11. Selecting a Folder for External (Shared) Database Connection
9.12. Internalizing External (Shared) Database Connection
9.13. Running a Graph with the Password Encrypted
10.1. Lookup Table Wizard
10.2. Simple Lookup Table Wizard
10.3. Edit Key Wizard
10.4. Simple Lookup Table Wizard with File URL
10.5. Simple Lookup Table Wizard with Data
10.6. Changing Data
10.7. Database Lookup Table Wizard
10.8. Query Editor Wizard
10.9. Appropriate Data for Range Lookup Table
10.10. Range Lookup Table Wizard
10.11. Define Range Lookup Table Key Wizard
10.12. Assigning End Fields to Start Fields
11.1. Creating Internal Parameters
11.2. Externalizing Internal Parameters
11.3. Internalizing External (Shared) Parameter
11.4. Example of a Parameter-Value Pair
12.1. Creating a Sequence
12.2. Editing a Sequence
12.3. A New Run of the Graph with the Previous Start Value of the Sequence
C.1. Edit JMS Connection Wizard
13.1. Selecting Components
13.2. Components in Palette
13.3. Removing Components from the Palette
13.4. Simple Renaming Components
13.5. Running a Graph with Various Phases
13.6. Running a Graph with Disabled Component
13.7. Running a Graph with Component in PassThrough Status
13.8. URL File Dialog
13.9. Viewing Data in Components
13.10. Viewing Data as Plain Text
13.11. Viewing Data as Grid
13.12. Plain Text Data Viewing
13.13. Grid Data Viewing
14.1. Open Type Wizard
14.2. Edit Value Wizard
14.3. Find Wizard
14.4. Go to Line Wizard
14.5. Transformations Tab of the Transform Editor
14.6. Copying the Input Field to the Output
14.7. Transformation Definition in CTL (Transformations Tab)
14.8. Mapping of Inputs to Outputs (Connecting Lines)
14.9. Editor with Fields and Functions
14.10. Transformation Definition in CTL (Source Tab)
14.11. Confirmation Message
14.12. Transformation Definition in CTL (Transform Tab of the Graph Editor)
14.13. Outline Pane Displaying Variables and Functions
14.14. Content Assist (Record and Field Names)
14.15. Content Assist (List of CTL Functions)
14.16. Error in Transformation
14.17. Converting Transformation to Java
14.18. Transformation Definition in Java
14.19. Older Transformation Definition in CTL Lite (Transformations Tab)
14.20. Older Transformation Definition in CTL Lite (Source Tab)
15.1. Sequences Dialog
15.2. A Sequence Assigned
15.3. Edit Key Dialog
15.4. XLS Mapping Dialog
15.5. XLS Fields Mapped to Clover Fields
16.1. Create Mask Wizard
17.1. Defining Sort Key and Sort Order
17.2. Ranges Editor
17.3. Source Tab of the Transform Editor in the Partition Component
17.4. Source Tab of the Transform Editor in the DataIntersection Component
17.5. Source Tab of the Transform Editor in the Reformat Component
17.6. Source Tab of the Transform Editor in the Denormalizer Component
17.7. Source Tab of the Transform Editor in the Normalizer Component
17.8. XSLT Mapping
17.9. An Example of Mapping
18.1. Source Tab of the Transform Editor in Joiners
18.2. Join Key Wizard (Master Key Tab)
18.3. Join Key Wizard (Slave Key Tab)
18.4. An Example of the Join Key Attribute in ApproximativeJoin Component
18.5. Matching Key Wizard (Master Key Tab)
18.6. Matching Key Wizard (Slave Key Tab)
18.7. An Example of the Join Key Attribute in ExtHashJoin Component
18.8. Hash Join Key Wizard
18.9. Join Key Wizard (Master Key Tab)
18.10. Join Key Wizard (Slave Key Tab)
18.11. Edit Key Wizard
19.1. Foreign Key Definition Wizard (Foreign Key Tab)
19.2. Foreign Key Definition Wizard (Primary Key Tab)
19.3. Foreign Key Definition Wizard (Foreign and Primary Keys Assigned)

Part I. Installation Guide

Chapter 1. CloverGUI Overview

This chapter gives an overview of three products of our CloverETL software family: CloverGUI, CloverEngine and CloverServer.

What Is CloverGUI?

CloverGUI is one of the CloverETL family of software products developed by the Opensys and Javlin companies. It is a powerful Java-based standalone application for data extraction, transformation and loading (ETL).

CloverGUI is provided as a plugin for the Eclipse Platform. Therefore, to work with CloverGUI, you must first download the Eclipse Platform.

Working with CloverGUI is much simpler than writing your own code for data parsing. Its graphical user interface makes building and running graphs easier and more comfortable.

What Is CloverEngine?

CloverEngine is another member of the CloverETL family of software products developed by the Opensys and Javlin companies. It is a Java-based application that you can integrate into other applications you are using.

What Is CloverServer?

CloverServer is the newest member of the CloverETL family of software products developed by the Opensys and Javlin companies. It is also based on Java and provides the full functionality of a server.

Web Information

In addition to this User's Guide, you can find a lot of useful information on the following sites:

• http://wiki.clovergui.net

• http://www.cloveretl.org

• http://www.opensys.eu

Chapter 2. Installation Instructions

This chapter explains how to download CloverGUI, how to create a new project and a new graph, and how to run graphs.

The Way How You Should Download CloverGUI

Previously, you had to download and install both the Java Runtime Environment (JRE) and the Java Development Kit (JDK). Clover tools now contain the Janino compiler, which you can use instead, so installing the JRE is sufficient. The JRE can be downloaded from the following site: http://java.sun.com/javase/downloads/index_jdk5.jsp. We suggest that you use Java 1.5, because Clover is being developed on it.
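If you are not sure which Java release is installed, a small program can report it. The following is a minimal sketch, not part of CloverGUI: the class JavaVersionCheck and its majorVersion method are hypothetical names, and the program only reads the standard java.version system property and compares its major version with the recommended Java 1.5. Run it with the same java executable that Eclipse will use.

```java
// Hypothetical helper, not part of CloverGUI: reports the major version of
// the Java runtime so you can compare it with the recommended Java 1.5.
public class JavaVersionCheck {

    /** The release this guide recommends (Java 1.5, i.e. major version 5). */
    static final int RECOMMENDED_MAJOR = 5;

    /**
     * Extracts the major version from a "java.version" string.
     * Older releases use "1.x.y_zz" (e.g. "1.5.0_22" means Java 5);
     * newer ones start with the major number (e.g. "11.0.2", "17", "21-ea").
     */
    static int majorVersion(String version) {
        String[] parts = version.split("[._\\-+]");
        int first = Integer.parseInt(parts[0]);
        if (first == 1 && parts.length > 1) {
            // Legacy scheme: the second number is the real major version
            return Integer.parseInt(parts[1]);
        }
        return first;
    }

    public static void main(String[] args) {
        String version = System.getProperty("java.version");
        int major = majorVersion(version);
        System.out.println("Java runtime: " + version + " (major version " + major + ")");
        if (major == RECOMMENDED_MAJOR) {
            System.out.println("This matches the recommended Java 1.5.");
        } else {
            System.out.println("This differs from the recommended Java 1.5.");
        }
    }
}
```

Compile and run it with javac JavaVersionCheck.java followed by java JavaVersionCheck. The parsing handles both the old "1.x" version strings and the newer scheme, so the sketch also works on releases later than Java 1.5.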

Downloading the Eclipse Platform

Once you have downloaded and installed the Java Runtime Environment, download the Eclipse Platform. Different Eclipse Platform packages are available for different operating systems. The Eclipse home site is http://www.eclipse.org.

The Eclipse Platform for both Windows and Linux can be downloaded from the following site: http://www.eclipse.org/downloads.

Once you have downloaded the Eclipse Platform, you only need to unpack its .zip file (Windows) or its .tar.gz file (Linux).

In Windows, the eclipse folder contains an eclipse.exe file, which starts up the Eclipse Platform.

In Linux, the folder contains an executable eclipse file.

Figure 2.1. The Eclipse Logo

Setting Workspace

When you double-click the eclipse.exe file (Windows) or run the executable eclipse file (Linux), the Eclipse Platform starts up and prompts you to select a location for the workspace folder, which is where your projects will be stored. In Windows, you will see the following prompt. Instead of the cloveruser folder shown in the figure, the path will contain your own username, e.g. johnsmith (C:\Users\johnsmith\workspace).

Figure 2.2. You Are Asked To Select a Workspace

You can accept the offered workspace location or choose another one. For example, you may want to have the Eclipse workspace inside the eclipse folder. In that case, enter a path inside that folder, such as C:\Users\johnsmith\Desktop\eclipse\workspace.

Figure 2.3. You Can Select the Following Workspace

These screenshots are taken from the MS Windows operating system; the corresponding screens in Linux look similar.

In Linux, you can choose /home/cloveruser/Desktop/eclipse/workspace, again with your own username (e.g. johnsmith) instead of cloveruser.

After you confirm the selected workspace, you will see the following window:


Figure 2.4. The Eclipse Platform Introductory Screen

Downloading the Eclipse GEF Plugin

After downloading the Eclipse Platform and selecting the workspace, you must download the Graphical Editing Framework (GEF) using the Eclipse update mechanism.

After opening the Eclipse Platform, you must choose Help → Software Updates → Find and Install...

Figure 2.5. Downloading the Graphical Editing Framework


After you click Find and Install..., a new window with two options appears. One option updates the currently installed features; the other installs new features. Select the option for installing new features:

Figure 2.6. Install/Update Wizard

Next, select some or all of the offered update sites. Here, we have chosen the Europa Discovery Site.

Figure 2.7. Searching for the Graphical Editing Framework

When you click the Finish button, you will be offered a list of mirrors for the download.


Figure 2.8. List of Mirrors for Download

Select the mirror to be used for searching and downloading new features. When all new features have been found, check the Graphical Editing Framework item.

Now you must accept the terms in the license agreements.

Figure 2.9. The Eclipse License Agreement

Click the Next button and then Finish. You will be asked to accept the installation and to restart the Eclipse Platform. After the restart, Eclipse GEF is installed. To verify this, choose Help → About Eclipse SDK and click the Plug-in Details button; the Graphical Editing Framework items appear in the list.


Figure 2.10. The About Eclipse SDK Window

Figure 2.11. List of Installed Plugins

Downloading the CloverGUI Plugin

Once you have downloaded the Eclipse GEF, you can download CloverGUI itself. The method varies depending on your license.

First, you must register an account at the company site: http://www.cloveretl.org/user/register.

You will then receive an e-mail with your login name and password. The e-mail also asks you to confirm the registration; until you do so, you cannot download the CloverGUI plugin.

Once you have confirmed the registration, you can download CloverGUI itself. Choose Help → Software Updates → Find and Install..., select the option for installing new features and click Next. You will see the following window:

Figure 2.12. List of Update Sites

Click the New Remote Site... button and fill in the two fields of the new window: type CloverGUI as the name and http://www.clovergui.net/eval-update as the URL.

The resulting window should look as follows:

Figure 2.13. Adding the Clover Update Site

After clicking the OK button, you can see the following window:


Figure 2.14. Selecting the Sites that Should Be Updated

When you click the Finish button, you will be prompted for the username and password you received in the registration e-mail.

Figure 2.15. CloverGUI Prompt

When you type your username and password and click the OK button, CloverGUI will be found. After selecting and expanding the CloverGUI item, the window should look like this:



Figure 2.16. Clover Products to Install

Now, you must click the Next button, select I accept the terms in the license agreements, and click the Next and Finish buttons. Again, you will be asked if you want to install new features. You must click Install or Install All. Then you will be asked to restart the Eclipse Platform once more. After that, you should see the Clover logo when you choose Help → About Eclipse SDK.

Figure 2.17. Clover Has Been Installed

This way, you have installed the Eclipse Platform along with the CloverGUI plugin.



Creating a New Project

When you want to create a new project, you must do it by choosing File → New → Project.

Figure 2.18. Creating a New Project

Now, you should expand the CloverETL item in the presented list, select CloverETL Project or CloverETL Examples Project and click the Next button.

Figure 2.19. Selecting the New Project Wizard

If you have selected the CloverETL Project item, you will be asked to give a name to your project. You can name it Project_01 and click Finish.



Figure 2.20. Giving a Name to a New Project

If you have selected the CloverETL Examples Project item, you will be presented with the following wizard:

Figure 2.21. CloverETL Examples Project

You can select any of the example projects by checking its checkbox. After you click Finish, the selected projects (or all of them) will appear in the Navigator pane.

After that, you will be asked to switch the Eclipse Platform to the CloverETL perspective.

Figure 2.22. Opening the CloverETL Perspective

Once you have confirmed it by clicking Yes, and if you have selected all of the example projects along with a new project, you can see the following window:



Figure 2.23. CloverETL Perspective

On the left side, there is the Navigator pane. In this pane, you can expand the Project_01 folder, for example. After expanding the project folder, you will be presented with its folder structure. There are subfolders for data (data-in, data-out, data-tmp), metadata (meta), connections (conn), lookup tables (lookup), sequences (seq), transformations (trans) and graphs (graph). The project folder also contains a workspace.prm file, in which some important project parameters are set.
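If you ever need the same layout outside the wizard, for example in a setup script, the folder structure listed above can be reproduced with plain Java. This is a minimal sketch: the class and method names are hypothetical, and the empty workspace.prm it creates is only a placeholder, not the file the wizard actually writes.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class ProjectSkeleton {

    // Subfolders of a new CloverETL project, as listed in this guide.
    static final List<String> SUBFOLDERS = Arrays.asList(
            "data-in", "data-out", "data-tmp", // data
            "meta", "conn", "lookup", "seq",   // metadata, connections, lookup tables, sequences
            "trans", "graph");                 // transformations, graphs

    /** Creates projectRoot with the standard subfolders and an empty workspace.prm. */
    static Path create(Path projectRoot) {
        try {
            for (String name : SUBFOLDERS) {
                // createDirectories also creates projectRoot itself and does nothing if the folder exists
                Files.createDirectories(projectRoot.resolve(name));
            }
            Path prm = projectRoot.resolve("workspace.prm");
            if (Files.notExists(prm)) {
                Files.createFile(prm); // placeholder only; the wizard writes real parameters
            }
            return projectRoot;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path root = Paths.get(System.getProperty("java.io.tmpdir"), "Project_01");
        System.out.println("Created " + create(root).toAbsolutePath());
    }
}
```

Calling create twice on the same folder is harmless, because existing folders and an existing workspace.prm are left as they are.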

Figure 2.24. CloverETL Perspective with Highlighted Navigator Pane and the Project Folder Structure



Creating a New Graph

Now you can create a graph for Project_01 by choosing File → New → ETL Graph.

(Once you have more projects in your workspace, you had better right-click the desired project in the Navigator pane and select New → ETL Graph from the context menu.)

Figure 2.25. Creating a New Graph

After clicking the item, you will be asked to give a name to the graph. For example, the name can be Project_01 too. But in most cases your project will contain more graphs, and you can give them names such as Project_01_###, or any other names that describe what the graphs should do.

Figure 2.26. Giving a Name to a New Graph



Remember that you can decide which parameters file should be included in this project along with the graph. This selection can be made in the textarea at the bottom of this window. You can locate another file by clicking the Browse... button and searching for the right one. You can even uncheck the checkbox, leaving the graph without any parameters file included.

We decided to have the workspace.prm file included.

Finally, click the Next button. The extension .grf will be added to the selected name automatically.
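If you follow the Project_01_### naming scheme suggested above, a small helper keeps the names consistent. This is a minimal sketch with a hypothetical helper name; it zero-pads the sequence number so that the names sort in numeric order, and it appends .grf only so that the result matches what appears in the Navigator pane (CloverGUI adds the extension itself).

```java
public class GraphNames {

    /** Returns e.g. "Project_01_001.grf" for ("Project_01", 1). */
    static String graphFileName(String projectName, int sequence) {
        if (sequence < 1 || sequence > 999) {
            throw new IllegalArgumentException("sequence must be 1..999 to fit the ### suffix");
        }
        // %03d pads with zeros: 1 -> 001, 42 -> 042
        return String.format("%s_%03d.grf", projectName, sequence);
    }

    public static void main(String[] args) {
        for (int i = 1; i <= 3; i++) {
            System.out.println(graphFileName("Project_01", i)); // Project_01_001.grf, _002, _003
        }
    }
}
```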

Figure 2.27. Selecting the Parent Folder for the Graph

By clicking Finish, you save the graph in the graph subfolder. Then, an item Project_01_001.grf appears in the Navigator pane, and a tab named Project_01_001.grf appears in the window.

Figure 2.28. CloverETL Perspective with Highlighted Graph Editor



You can see that there is a palette of components on the right side of the graph. This palette opens when you click the arrow above the Palette label.

Figure 2.29. Graph Editor with a New Graph and the Palette of Components

You can also look at the workspace.prm file: right-click its item in the Navigator pane and choose Open With → Text Editor from the context menu.

Figure 2.30. Opening the Workspace.prm File

You can see the parameters of your new project. The parameters of imported projects may differ from those of a new project.



Figure 2.31. The Parameters Contained in the Workspace.prm File

We suggest that you do not use backslashes in parameters. Use single forward slashes instead, or double backslashes.

Both Linux and Windows accept forward slashes, but if you use backslashes, Windows always needs double backslashes instead of single ones.

With single forward slashes, you can be sure that everything will work as you expect.
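The advice above is easy to understand once you see how backslashes are read. If workspace.prm is parsed with the escaping rules of java.util.Properties (an assumption; this guide does not name the parser), a single backslash starts an escape sequence, so \C silently turns into C, and a sequence such as \t even turns into a tab character. The following Java sketch, with a hypothetical class name, compares the three ways of writing a Windows path:

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class PrmSlashes {

    /** Parses one "KEY=value" line with java.util.Properties rules and returns the value. */
    static String load(String prmLine, String key) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(prmLine));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p.getProperty(key);
    }

    public static void main(String[] args) {
        // In Java source, "\\" stands for ONE backslash in the parsed text.
        System.out.println(load("WORKSPACE=C:\\Clover\\examples", "WORKSPACE"));     // single: C:Cloverexamples
        System.out.println(load("WORKSPACE=C:\\\\Clover\\\\examples", "WORKSPACE")); // double: C:\Clover\examples
        System.out.println(load("WORKSPACE=C:/Clover/examples", "WORKSPACE"));       // forward: C:/Clover/examples
    }
}
```

Under this assumption, single backslashes are lost without any error message, which is why forward slashes are the safest choice.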

Running Graphs

When you have already created or imported graphs into your projects, you can run them.

There are three ways of running a graph:

• You can select Run → Run as → CloverETL graph from the main menu.

• Or you can right-click in the Graph editor, then select Run as in the context menu and click the CloverETL graph item.

• Or you can click the green circle with white triangle in the tool bar located in the upper part of the window.



Figure 2.32. Running a Graph from the Main Menu

Figure 2.33. Running a Graph from the Context Menu



Figure 2.34. Running a Graph from the Upper Tool Bar

In each of these cases, you can also open the Open Run Dialog, fill in the project name, the graph name and other parameters, and click the Run button.

Figure 2.35. Open Run Dialog

In this Open Run Dialog, you can also set the Java memory size in megabytes. It is important to define the memory size, because the Java Virtual Machine needs this memory capacity to run the graphs. You must define the maximum memory size for the JVM by selecting the proper value:
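Assuming the value chosen in the Open Run Dialog is passed to the JVM as its maximum heap size (the standard -Xmx option; the guide does not say so explicitly), the following Java sketch shows how a size in megabytes maps to that option and how a running program can read the limit back with Runtime.maxMemory(). The helper name xmxArgument is hypothetical.

```java
public class HeapSize {

    /** Builds the JVM option for a maximum heap of the given size in MB, e.g. 512 -> "-Xmx512m". */
    static String xmxArgument(int megabytes) {
        if (megabytes <= 0) {
            throw new IllegalArgumentException("memory size must be positive");
        }
        return "-Xmx" + megabytes + "m";
    }

    public static void main(String[] args) {
        System.out.println("A 512 MB limit corresponds to: " + xmxArgument(512));
        // maxMemory() reports the heap ceiling of THIS JVM in bytes (Long.MAX_VALUE if there is no limit).
        long maxMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("Current JVM max heap: " + maxMb + " MB");
    }
}
```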



Figure 2.36. Setting Up Memory Size

Whichever of these three ways you use, the progress of running the graph can be seen in the Console.

Figure 2.37. Successful Data Parsing



Figure 2.38. Console Tab with an Overview of the Graph Processing

The counts of parsed records should also appear below the edges:

Figure 2.39. Counting Parsed Data

If you want, you can enlarge the font of these numbers. To do that, select Window → Preferences...



Figure 2.40. Enlarging the Font of Numbers

Then, expand the CloverETL item, select Tracking and type the desired font size into the Record number font size area. By default, it is set to 7.

Figure 2.41. Setting the Font Size


Chapter 3. Import

CloverGUI allows you to import Clover projects, graphs and/or metadata. If you want to import something, select File → Import... from the main menu.

Figure 3.1. Import (Main Menu)

Or right-click in the Navigator pane and select Import... from the context menu.

Figure 3.2. Import (Context Menu)


After that, the following window opens. When you expand the Clover ETL category, the window will look like this:

Figure 3.3. Import Options

Import Clover Projects

If you select the Import external Clover.ETL projects item, you can click the Next button and you will see the following window:

Figure 3.4. Import Projects

You can locate a directory or a compressed archive file (the right option must be selected by switching the radio buttons). If you locate a directory, you can also decide whether you want to copy the project to your workspace or only link it. If you want the project to be linked only, leave the Copy projects into workspace checkbox unchecked; otherwise, it will be copied. Linked projects can be shared by several workspaces. If you select an archive file, the list of projects contained in the archive will appear in the Projects area. You can select some or all of them by checking the checkboxes that appear along with them.


Import Graphs

If you select the Import graphs - version conversion item, you can click the Next button and you will see the following window:

Figure 3.5. Import Graphs

You must select the right graph(s) and specify from which directory into which folder the selected graph(s) should be copied. By switching the radio buttons, you decide whether the complete folder structure or only the selected folders should be created. You can also choose to overwrite existing sources without warning, and you can convert the graph(s) from version 1.x.x to version 2.x.x of GUI.


Import Metadata

You can also import metadata from XSD or DDL.

Metadata from XSD

If you select the Import metadata from XSD item, you can click the Next button and you will see the following window:

Figure 3.6. Import Metadata from XSD

You must select the right metadata and specify from which directory into which folder the selected metadata should be copied. By switching the radio buttons, you decide whether the complete folder structure or only the selected folders should be created. You can also choose to overwrite existing sources without warning. You can specify the delimiters or the default field size.


Metadata from DDL

If you select the Import metadata - transform from DDL item, you can click the Next button and you will see the following window:

Figure 3.7. Import Metadata from DDL

You must select the right metadata and specify from which directory into which folder the selected metadata should be copied. By switching the radio buttons, you decide whether the complete folder structure or only the selected folders should be created. You can also choose to overwrite existing sources without warning. You need to specify the delimiters.


Chapter 4. Export

CloverGUI allows you to export Clover graphs and/or metadata. If you want to export something, select File → Export... from the main menu. Or right-click in the Navigator pane and select Export... from the context menu. After that, the following window opens. When you expand the Clover ETL category, the window will look like this:

Figure 4.1. Export Options

Export Graphs

If you select the Export graphs item, you can click the Next button and you will see the following window:

Figure 4.2. Export Graphs

You must check the graph(s) that should be exported in the right pane. You must also find the output directory. In addition, you can select whether external (shared) metadata, connections and parameters should be internalized and inserted into the graph(s). This is done by checking the corresponding checkboxes. You can also remove gui tags from the output file by checking the Strip gui tags checkbox.

Export Graphs to HTML

If you select the Export graphs to HTML item, you can click the Next button and you will see the following window:

Figure 4.3. Export Graphs to HTML

You must select the graph(s) and specify to which output directory the selected graph(s) should be exported. You can also generate an index file of the exported pages and the corresponding graphs and/or images of the selected graphs. By switching the radio buttons, you select either the scale of the output images or their width and height. You can decide whether antialiasing should be used.


Export Metadata to XSD

If you select the Export metadata to XSD item, you can click the Next button and you will see the following window:

Figure 4.4. Export metadata to XSD

You must select the metadata and specify to which output directory the selected metadata should be exported.

Export Image

If you select the Export image item, you can click the Next button and you will see the following window:

Figure 4.5. Export Image


This option allows you to export images of the selected graphs only. You must select the graph(s) and specify to which output directory the images of the selected graph(s) should be exported. You can also specify the format of the output files: bmp, jpeg or png. By switching the radio buttons, you select either the scale of the output images or their width and height. You can also decide whether antialiasing should be used.
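The two sizing options (a scale factor, or an explicit width and height), the antialiasing switch and the three output formats can be illustrated with standard Java 2D and ImageIO. This is a minimal sketch under the assumption that an exported image is an ordinary raster image; it draws a placeholder instead of a real graph, and it is not CloverGUI's export code. The class and method names are hypothetical.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.ImageIO;

public class ImageExportSketch {

    /** Scale option: output size from a scale factor, e.g. 800 x 600 at 0.5 gives 400 x 300. */
    static int[] scaledSize(int width, int height, double scale) {
        return new int[] {(int) Math.round(width * scale), (int) Math.round(height * scale)};
    }

    /** Renders a placeholder picture and encodes it as "bmp", "jpeg" or "png". */
    static byte[] render(int width, int height, boolean antialias, String format) {
        // TYPE_INT_RGB (no alpha channel) can be written by the bmp, jpeg and png writers
        BufferedImage img = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = img.createGraphics();
        try {
            g.setRenderingHint(RenderingHints.KEY_ANTIALIASING,
                    antialias ? RenderingHints.VALUE_ANTIALIAS_ON : RenderingHints.VALUE_ANTIALIAS_OFF);
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, width, height);
            g.setColor(Color.BLACK);
            g.drawLine(0, 0, width - 1, height - 1); // diagonal lines show the effect of antialiasing
        } finally {
            g.dispose();
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            if (!ImageIO.write(img, format, out)) {
                throw new IllegalArgumentException("No ImageIO writer for format " + format);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        int[] scaled = scaledSize(800, 600, 0.5); // scale option
        int[] fixed = {1024, 768};                // width/height option
        for (String format : new String[] {"bmp", "jpeg", "png"}) {
            System.out.println(format + " scaled: " + render(scaled[0], scaled[1], true, format).length + " bytes");
            System.out.println(format + " fixed:  " + render(fixed[0], fixed[1], false, format).length + " bytes");
        }
    }
}
```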


Appendix A. Setting and Configuring Java Tools

This new release of CloverGUI contains the Janino compiler. Therefore, you can compile Java source code located either in external .java files or inside the graph. For this reason, you no longer need to install the Java Development Kit, nor do you need to set up a Java Runtime Environment: Janino does the compiling work for which the JDK would otherwise be needed. However, if you want to set a JRE or add JDK libraries, you can do it as shown in this Appendix A. Remember that you should use Java 1.5!

Setting Java Runtime Environment

When you want to set the JRE, you can do it by selecting Window → Preferences.

Figure A.1. Setting Java Runtime Environment

After clicking the item, you can see the following window:


Figure A.2. Preferences Wizard

Now you must expand the Java item and select the Installed JREs item as shown above. If you have installed JRE 1.6, you can see the following window:

Figure A.3. Installed JREs Wizard

You should switch from Java 1.6 to 1.5. To add the right JRE version, click the Add button, after which you will be presented with the following window:



Figure A.4. Adding a New JRE

Once you have found the right folder with the JRE (version 1.5), the libraries with .jar files appear in the JRE system libraries textarea.

Figure A.5. Selecting New JRE Files

After clicking the OK button, you will have two JREs, from which you must select the right one by checking it:



Figure A.6. Selecting a JRE

After doing this and clicking the OK button, you have prepared the right JRE for CloverGUI.



Installing Java Development Kit

As mentioned above, the new release of CloverGUI contains the Janino compiler. But if you want, you can install the JDK and add it to the project: select the project item, right-click it and select the Properties item in the context menu. Once more, we suggest that you use JDK 1.5!

Figure A.7. Adding Java Development Kit

Then you can select the Java Build Path item and its Libraries tab. You must click the Add External JARs button on it.

Figure A.8. Searching for JDK Jars



You can add all .jar files contained in the selected jdk folder into the Libraries tab. (The window below is taken from Windows Vista; you may see a different window in your OS.)

Figure A.9. Selecting JDK Jars

After confirming this, the .jar files will be added to the project as shown below.

Figure A.10. Adding JDK Jars


Appendix B. Import Other Examples

In addition to the Clover examples project, you can also download and import other Clover examples.

You can find examples at the following site: www.cloveretl.org/download/examples.

The structure of some older examples folders may differ from that of new projects. The older examples consist of one project only, instead of the four projects of the newer examples.

Figure B.1. Import Examples (Main Menu)


Figure B.2. Import Examples (Context Menu)

You must expand the Clover ETL item and select Import external Clover.ETL projects.

Figure B.3. Import External Clover Project

You can choose Select root directory or Select archive file by switching the radio button. When you choose Select archive file, you can locate the desired archive file. For example, you can download any of the .zip files containing our examples that are provided at the following site: www.cloveretl.org/download/examples. When you select one of these .zip files, an examples label appears checked in the window. Then you only need to click Finish.

(If you import a directory, you can also decide whether the examples should be copied or linked to the workspace. If you check the Copy projects into workspace checkbox, they will be copied there. Otherwise, the examples will only be linked. Linked projects can be shared by several workspaces.)


Figure B.4. Examples Selected

In the Navigator pane, the examples folder appears, or the folder of any other project you have imported.

Figure B.5. CloverETL Perspective with a Set of Projects

After expanding the examples folder in the Navigator pane, you can see the tree of graphs and other folders.



Figure B.6. Newer Examples Files and Folders in the Navigator Pane

Older examples had a different folder structure as you can see in the following screenshot:

Figure B.7. Older Examples Files and Folders in the Navigator Pane

When you want to work with any of the graphs (your own or imported ones), double-click the graph item in the Navigator pane to open the graph in the Graph Editor.

The mentioned folder structure of some of the older projects may differ from that of new projects. Also the workspace.prm file sets different parameters. To work with the graphs, you must set the WORKSPACE parameter in the following way:



Figure B.8. Setting the WORKSPACE Parameter

As you can see, the WORKSPACE parameter can be set with forward slashes in both Windows and Linux, and in Windows also with backslashes. However, in Windows, double backslashes must be used instead of single ones. So, in Windows, wherever a path contains single backslashes, you should either double each of them or replace each of them with a single forward slash. This applies to every single backslash contained in the PROJECT parameter in the workspace.prm file, and in any other place. However, we suggest that you use only forward slashes for Clover tools in both Windows and Linux. This is the best way to avoid the discrepancy.

When you double-click any of the graph items in the Navigator pane, the graph opens in the Graph Editor. It is displayed with the help of two tabs: as a graph and as source code.



Figure B.9. Example Graph

Part II. Objects, Structures and Tools


Chapter 5. CloverGUI Structures

This chapter presents a description of the appearance and structures of CloverGUI.

CloverGUI Perspective

The CloverGUI perspective consists of 4 panes:

Figure 5.1. CloverGUI Perspective

• Graph Editor with Palette of Components is in the upper right part of the window.

In this pane, you can build your graphs. The Palette of Components serves to select components, move them into the Graph Editor and connect them by edges. This pane has two tabs.

• Navigator pane is in the upper left part of the window.

There are folders and files of your projects in this pane. You can expand or collapse them and open any graph by double-clicking its item.

• Outline pane is in the lower left part of the window.

It shows all parts of the graph that is opened in the Graph Editor.

• Tabs pane is in the lower right part of the window.

You can see the data parsing process in these tabs.

CloverGUI Panes

Now we will present a more detailed description of the panes.

But first, a tip: if you want to maximize any tab of a pane, double-click the tab. The pane then extends to the size of the whole window. When you double-click it again, it returns to its original size.


Graph Editor with Palette of Components

The most important pane is the Graph Editor with Palette of Components.

To create a graph, you need to open the Palette tool by clicking the arrow located above the Palette label or by holding the cursor on the Palette label. You can close the Palette again by clicking the same arrow or simply by moving the cursor outside the Palette tool. You can also change the shape of the Palette by dragging its border in the Graph Editor and/or move it to the left side of the Graph Editor by clicking its label and dragging it to that location.

The name of the user who has created the graph and the name of its last modifier are saved to the Source tab automatically.

It is the Palette tool from which you can select a component and paste it into the Graph Editor. To paste the component, you only need to click the component label, move the cursor to the Graph Editor and click again. After that, the component appears in the Graph Editor. You can do the same with the other components.

Once you have selected and pasted more components to the Graph Editor, you need to connect them by edges taken from the same Palette tool. To connect two components by an edge, click the edge label in the Palette tool, move the cursor to the first component, connect the edge to the output port of the component by clicking, then move the cursor to the input port of another component and click again. This way, the two components will be connected. When you have finished working with edges, you must click the Select item in the Palette window.

After creating or modifying a graph, you must save it by selecting the Save item from the context menu. The graph becomes a part of the project in which it has been created. A new graph name appears in the Navigator pane. All components and properties of the graph can be seen in the Outline pane when the graph is opened in the Graph Editor.

Figure 5.2. Graph Editor with an Opened Palette of Components

If you want to close any of the graphs that are opened in the Graph Editor, you can click the cross at the right side of its tab. If you want to close several tabs at once, right-click any of the tabs and select a corresponding item from the context menu. There you have the items Close, Close other, Close All and some others. See below:



Figure 5.3. Closing the Graphs

From the main menu, you can also select the CloverETL item (but only when the Graph Editor is highlighted) and add a grid to the Graph Editor by selecting the Grid item.

Figure 5.4. Grid in the Graph Editor

By clicking the Graph auto-layout item, you can change the layout of the graph. For example, if you have opened the graph AggregateUnsorted.grf, you can see how its layout changes when you select the Graph auto-layout item. Before selecting this item, the graph looks like this:



Figure 5.5. A Graph before Selecting Auto-Layout.

Once you have selected the mentioned item, the graph could look like this:

Figure 5.6. A Graph after Selecting Auto-Layout.

Another possibility of what you can do with the Graph Editor is the following:

When you press and hold the left mouse button somewhere inside the Graph Editor and drag the mouse across the pane, a rectangle is drawn. If this rectangle surrounds some of the graph components when you release the mouse button, these components become highlighted. (In the graph below, these are the first and second components on the left.) After that, six buttons (Align Left, Align Center, Align Right, Align Top, Align Middle and Align Bottom) become active in the tool bar above the Graph Editor or Navigator panes. With their help, you can change the position of the selected components. See below:

Figure 5.7. Six New Buttons in the Tool Bar Appear Highlighted (Align Middle is shown)

You can do the same by right-clicking inside the Graph Editor and selecting the Alignments item from the context menu. Then, a submenu appears with the same items as mentioned above.

Figure 5.8. Alignments from the Context Menu

Remember that you can copy any highlighted part of any graph by pressing Ctrl+C and paste it with Ctrl+V after opening another graph.



Navigator Pane

In the Navigator pane, there is a list of your projects, their subfolders and files. You can expand or collapse them, view them and open them.

All graphs of the project are situated in this pane. You can open any of them in the Graph Editor by double-clicking the graph item.

Figure 5.9. Navigator Pane

Outline Pane

The Outline pane shows all components of the selected graph. There you can create or edit all properties of the graph components, edge metadata, database connections or JMS connections, lookups, parameters, sequences and notes. You can both create internal properties and link external (shared) ones. Internal properties are contained in the graph and are visible there. You can externalize the internal properties and/or internalize the external (shared) properties. You can also export the internal metadata. If you select any item in the Outline pane (component, connection, metadata, etc.) and press Enter, its editor will open.

Figure 5.10. Outline Pane

Note that the two buttons in the upper right part of the Outline pane switch between two representations of the pane:

By default, you can see the tree of components, metadata, connections, parameters, sequences, lookups and notes in the Outline pane. But when you click the second button from the left in the upper right part of the Outline pane, you will be switched to another representation of the pane. It will look like this:



Figure 5.11. Another Representation of the Outline Pane

In the Graph Editor, you can see a part of one of the example graphs, and in the Outline pane you can see the structure of the same graph. In addition, there is a light-blue rectangle in the Outline pane. Within this rectangle, you can see exactly the same part of the graph as in the Graph Editor. When you move the rectangle within the Outline pane, the Graph Editor shows the corresponding part of the graph, so the light-blue rectangle and the graph in the Graph Editor move together.

You can do the same with the help of the scroll bars on the right and bottom sides of the Graph Editor.

To switch to the tree representation of the Outline pane, you only need to click the first button from the left in the upper right part of the Outline pane.

Tabs Pane

In the lower right part of the window, there is a series of tabs.

• Properties tab

In this tab, you can view and/or edit the component properties. When you click a component, the properties of the selected component appear in this tab.

• Console tab

In this tab, you can see the process of reading, unloading, transforming, joining, writing and loading data.

• Problems tab

In this tab, you can see error messages, warnings, etc. When you expand any of the items, you can see their resources (name of the graph), their paths (path to the graph) and their locations (name of the component).

• Clover - graph tracking tab

In this tab, you can see a brief overview of a graph that has been run successfully: the names of the components grouped by phases (with the time each phase used in seconds and its capacity usage in percent), the status of all components, the CPU time used by them (in seconds), their CPU usage (in percent), the average number of bytes processed (in Bytes per second), the average number of rows processed (in rows per second), the total bytes processed (in Bytes, Kilobytes, etc.) and the total rows processed (in rows).

• Clover - Log tab

In this tab, you can see the entire log of the data parsing process that is created after running a graph. There can be a set of logs from several runs of graphs.
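The per-second averages in the Clover - graph tracking tab can be read as totals divided by elapsed time. The guide does not give the exact formula CloverGUI uses, so the following Java snippet is only a worked example under that assumption; the class and method names are hypothetical.

```java
public class Throughput {

    /** Average rate = total amount / elapsed seconds (assumed formula). */
    static double perSecond(long total, double elapsedSeconds) {
        if (elapsedSeconds <= 0) {
            throw new IllegalArgumentException("elapsed time must be positive");
        }
        return total / elapsedSeconds;
    }

    public static void main(String[] args) {
        long rows = 120_000;    // total rows processed by one component
        long bytes = 6_000_000; // total bytes processed by the same component
        double seconds = 4.0;   // elapsed time
        System.out.println("rows/s:  " + perSecond(rows, seconds));  // 30000.0
        System.out.println("bytes/s: " + perSecond(bytes, seconds)); // 1500000.0
    }
}
```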

Figure 5.12. Properties Tab

Figure 5.13. Console Tab

Figure 5.14. Problems Tab

Figure 5.15. Clover - Graph Tracking Tab

Figure 5.16. Clover - Log Tab


Chapter 6. Building Transformation Graph

To build a graph, you must select graph components, set up their properties, connect these components by edges, select data files and/or database tables that should be read or unloaded from, written or loaded to, create metadata describing the data, assign them to edges, create database connections or JMS connections, create lookup tables and/or create parameters and sequences. Once all of it is done, you can run the graph.

We present a more detailed description of the individual parts of every graph in the following chapters and sections.


Chapter 7. Edges

This chapter presents an overview of the edges. It describes what they are, how they can be connected to the components of a graph, how metadata can be assigned to them and propagated through them, how the edges can be debugged and how the data flowing through the edges can be seen.

What Are the Edges?

The edges represent data flowing from one component to another.

Connecting Components by the Edges

When you have selected and pasted at least two components to the Graph Editor, you must connect them by edges taken from the Palette tool. In order to connect two components by an edge, you must click the edge label in the Palette tool, move the cursor onto one of the two components, connect the edge to its output port by clicking the left mouse button on the component, then move the cursor to the input of another component and click again. This way, the two components will be connected.

Some components only receive data from their input port(s) and write it to data resources (Writers, including Trash). Other components read data from data resources or generate data, and send it to their output port(s) (Readers, including DataGenerator). Still other components both receive data and send it to other components (Transformers and Joiners). Finally, some components of the last group need to be connected to edges (non-executing components such as CheckForeignKey and LookupTableReaderWriter), while others do not (Executing Components). But almost all components must be connected by edges.

When you paste an edge into the graph as described, it always attaches to a component port. The number of ports of some components is strictly specified, while in others the number of ports is unlimited. If the number of ports is unlimited, a new port is created by connecting a new edge. When you have finished working with edges, you must click the Select item in the Palette tool or press Esc on the keyboard.

If you have already connected two components by an edge, you can move this edge to any other component. To do that, highlight the edge by clicking it, then move the cursor over the port to which the edge is connected (input or output) until the arrow cursor turns into a cross. Once the cross appears, you can drag the edge to any free port of any component. Remember that you can only replace an output port by another output port and an input port by another input port.

Assigning Metadata to the Edges

Metadata must be created and assigned to an edge. Until then, the edge has the form of a dashed line. Only after metadata have been created and assigned to the edge does the line become continuous.

You can create metadata as shown in the corresponding sections below; however, you can also double-click the empty (dashed) edge and select Create metadata from the menu. Or you can link an existing external metadata file by selecting Link shared metadata.


Figure 7.1. Creating Metadata on the Empty Edge

You can also assign metadata to an edge by right-clicking the edge, choosing the Select metadata item from the context menu and selecting the desired metadata from the list.

Propagating Metadata through the Edges

When you have already assigned metadata to the edge, you need to propagate the assigned metadata to other edges through a component.

To propagate metadata, open the context menu by right-clicking the edge and select the Propagate metadata item. The metadata will be propagated until they reach a component in which metadata can be changed (for example: reformat, join, etc.).

For the other edges, you must define another metadata and propagate them again if necessary.
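The propagation rule described above can be pictured as a walk along the edges that stops at components able to change metadata. The following Java sketch models it under two assumptions: propagation runs only downstream, and edges that already have metadata are left untouched. The Component and Edge classes are hypothetical and are not part of CloverGUI.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class MetadataPropagation {

    static final class Component {
        final String name;
        final boolean changesMetadata; // true for components such as reformat or join
        final List<Edge> outputs = new ArrayList<>();

        Component(String name, boolean changesMetadata) {
            this.name = name;
            this.changesMetadata = changesMetadata;
        }
    }

    static final class Edge {
        final Component from, to;
        String metadata; // null means a dashed edge without metadata

        Edge(Component from, Component to) {
            this.from = from;
            this.to = to;
            from.outputs.add(this);
        }
    }

    /** Assigns metadata to start and pushes it downstream until a metadata-changing component. */
    static void propagate(Edge start, String metadata) {
        Deque<Edge> queue = new ArrayDeque<>();
        start.metadata = metadata;
        queue.add(start);
        while (!queue.isEmpty()) {
            Component next = queue.poll().to;
            if (next.changesMetadata) {
                continue; // its output metadata must be defined separately
            }
            for (Edge out : next.outputs) {
                if (out.metadata == null) { // never overwrite metadata already assigned
                    out.metadata = metadata;
                    queue.add(out);
                }
            }
        }
    }

    public static void main(String[] args) {
        Component reader = new Component("Reader", false);
        Component sort = new Component("Sort", false);
        Component reformat = new Component("Reformat", true);
        Component writer = new Component("Writer", false);
        Edge e1 = new Edge(reader, sort);
        Edge e2 = new Edge(sort, reformat);
        Edge e3 = new Edge(reformat, writer);
        propagate(e1, "customers");
        System.out.println(e1.metadata + ", " + e2.metadata + ", " + e3.metadata); // customers, customers, null
    }
}
```

Each edge is queued at most once, because it is queued only at the moment it receives metadata, so the walk always terminates.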

Debugging the Edges

If errors occur, or if you obtain incorrect or unexpected results when running some of your graphs, you must debug the graph.

To do that, it is necessary to decide first where the problem may arise from. Then you must right-click the edges that are under your suspicion. Now you must click the Enable debug item from the context menu. After that, a bug icon appears on the edge, meaning that debugging will be performed.

You can do the same by opening the Properties tab of the Tabs pane and setting Debug mode to true. It is false by default.

Then you can set up other properties of the edge. First, decide how many records you want to view. Note that all of the records will be parsed, but only some of them are displayed for viewing. Type this number into the Debug max. records field.

Then you can decide whether to view the selected number of records only from the start (beginning with the first record), or to have them selected evenly from all of the records. For the latter, set the Debug sample data item to true: the desired number of records will then be selected evenly throughout all of the records. This property is false by default, which means that records are selected only from the start. Sometimes, however, it might be better to set this property to true.

Figure 7.2. Properties of an Edge
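The difference between Debug max. records with and without Debug sample data can be shown as two ways of picking records. The following is a minimal Python sketch that assumes "evenly" means a fixed stride across all records. The function name is hypothetical, and Clover's actual sampling strategy may differ.

```python
def select_debug_records(records, max_records, sample_data=False):
    """Pick at most `max_records` records for the debug view.

    sample_data=False: take records from the start (the default behaviour).
    sample_data=True:  spread the picks evenly over all records (assumed stride).
    """
    records = list(records)  # all records are "parsed", only some are kept
    if max_records <= 0:
        return []
    if not sample_data or len(records) <= max_records:
        return records[:max_records]
    step = len(records) / max_records
    return [records[int(i * step)] for i in range(max_records)]
```

For ten records and a limit of three, the default takes records 0, 1 and 2, while sampling takes 0, 3 and 6.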

Finally, you can create a filter expression for debugging. After clicking the Debug filter expression item, you can create or type the filter expression in the Filter editor wizard.

Figure 7.3. Filter Editor Wizard

This wizard consists of three panes. The left one displays the list of record fields with their names and data types. You can select any of them by double-clicking or by dragging and dropping; the field name then appears in the bottom area, prefixed with a dollar sign. You can also use functions from the right pane of the window. Below this pane are comparison signs and logical connectives. Any name, function, sign or connective you double-click appears in the bottom area, where you can edit it and complete the filter expression. You can validate the expression, cancel by clicking Cancel, or confirm the expression by clicking OK. Only the records that meet the filter expression can be viewed after running the graph. How to view them is described in the next section.

Viewing the Data Flowing through the Edges

In order to view the records that have flowed through the edge and met the filter expression, right-click the edge to open the context menu and click the View data item. A View data dialog opens. Note that here, too, you can create a filter expression in the same way as described above.

Select the number of records that should be displayed and confirm it by clicking OK.

Figure 7.4. View Data Dialog

The records are shown in another View data dialog, which displays them in a grid. You can sort the records by any column in ascending or descending order by clicking the column header.

Figure 7.5. Viewing Data

Above the grid, there are three labels: Edit, View, Hide/Show columns.

By clicking the Hide/Show columns label, you can select which columns should be displayed: all, none, or only selected. You can select any option by clicking it.

Figure 7.6. Hide/Show Columns when Viewing Data

By clicking the View label, you are presented with two options. You can decide whether to view unprintable characters as well, and you can choose to view a single record separately. Such a record appears in the View record dialog. At the bottom of this dialog there are arrow buttons that allow you to browse the records in sequence. Note that the rightmost button shows the last of the displayed records, not the last of the processed records.

Figure 7.7. View Record Dialog

By clicking the Edit label, you are presented with four options.

• You can enter the number of the record or line you want to see. After you type its number and click OK, the record is highlighted.

• Another option opens the Find dialog. This dialog contains a text area into which you can type an expression. If you check the Match case checkbox, the search is case sensitive. If you check the Entire cells checkbox, only cells that match the expression completely are highlighted. If you check the Regular expression checkbox, the expression you have typed is used as a regular expression. You can also choose whether to search by rows or by columns, and which columns to search: all, only visible, or one column from the list. Finally, you can choose whether to find all cells that meet the criterion or only one of them.

Figure 7.8. Find Dialog

• The last option lets you copy some of your records or parts of them. You can copy an entire record either as a string or as a record (in the latter case you can select the delimiter as well), or you can copy only some of the record fields. After clicking the OK button, choose the location into which the data should be copied and paste it there.

Figure 7.9. Copy Dialog
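The Find dialog options combine in a way that is easy to state as a predicate over cells. The following Python sketch is an illustration only. It models the grid as a list of rows, and the function and parameter names are hypothetical rather than part of CloverGUI.

```python
import re


def cell_matches(cell, expression, match_case=False, entire_cell=False, regex=False):
    """Apply the Match case / Entire cells / Regular expression options to one cell."""
    flags = 0 if match_case else re.IGNORECASE
    pattern = expression if regex else re.escape(expression)  # literal text unless regex
    if entire_cell:
        return re.fullmatch(pattern, cell, flags) is not None
    return re.search(pattern, cell, flags) is not None


def find_cells(rows, expression, by_rows=True, columns=None, find_all=True, **options):
    """Return (row, column) positions of matching cells in search order."""
    n_cols = max((len(row) for row in rows), default=0)
    cols = list(range(n_cols)) if columns is None else list(columns)
    if by_rows:  # scan row by row
        order = ((r, c) for r in range(len(rows)) for c in cols)
    else:        # scan column by column
        order = ((r, c) for c in cols for r in range(len(rows)))
    hits = []
    for r, c in order:
        if c < len(rows[r]) and cell_matches(str(rows[r][c]), expression, **options):
            hits.append((r, c))
            if not find_all:  # "find one" stops at the first hit
                break
    return hits
```

Passing `columns=[0]` corresponds to searching a single column from the list.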

Types of Edges

Every edge has an internal buffer. You can choose among edge types by clicking the Select edge item and then clicking one of the presented types.

Figure 7.10. Selecting the Edge Type

Types of edges can be set to one of the following:

• Direct edge. This type of edge has a buffer in memory, which allows faster data flow. This is the default edge type.

• Buffered edge. This type of edge also has a buffer in memory, but, if necessary, it can store data on disk as well, so the buffer size is unlimited. It has two buffers, one for reading and one for writing.

• Direct fast propagate edge. This is an alternative implementation of the Direct edge that also allows fast data flow.

• Phase connection edge. This edge cannot be selected, it is created automatically between two components withdifferent phase numbers.

If you do not want to specify an explicit edge type, you can leave the default option selected: Detect default. In such a case, Clover itself decides which edge type should be used.
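The idea behind the Buffered edge (memory first, disk when needed, so the size is unlimited) can be illustrated with a small first-in, first-out buffer. This Python sketch is an assumption-laden illustration, not Clover's implementation. It pickles records into a temporary file and does not model the separate read and write buffers mentioned above.

```python
import pickle
import tempfile
from collections import deque


class SpillingBuffer:
    """FIFO buffer keeping up to `memory_limit` records in memory, the rest on disk."""

    def __init__(self, memory_limit):
        self.memory_limit = memory_limit
        self.memory = deque()
        self.disk = tempfile.TemporaryFile()  # opened in binary read/write mode
        self.read_pos = 0   # offset of the next unread record in the spill file
        self.spilled = 0    # number of records currently waiting on disk

    def write(self, record):
        # Once anything is on disk, new records also go to disk to keep FIFO order.
        if self.spilled == 0 and len(self.memory) < self.memory_limit:
            self.memory.append(record)
        else:
            self.disk.seek(0, 2)  # append at the end of the spill file
            pickle.dump(record, self.disk)
            self.spilled += 1

    def read(self):
        if self.memory:
            return self.memory.popleft()
        if self.spilled:
            self.disk.seek(self.read_pos)
            record = pickle.load(self.disk)
            self.read_pos = self.disk.tell()
            self.spilled -= 1
            return record
        raise IndexError("buffer is empty")
```

A Direct edge would correspond to the memory part only, with the writer waiting whenever the buffer is full.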

Colors of the Edges

• When you connect two components by an edge, it is gray and dashed.

• After assigning metadata to the edge, it becomes solid, but still remains gray.

• When you click any metadata item in the Outline pane, all edges with the selected metadata become blue.

• If you click an edge in the Graph Editor, the selected edge becomes black and all of the other edges with the same metadata become blue. (In this case, metadata are shown in the edge tooltip as well.)

Figure 7.11. Metadata in the Tooltip

Chapter 8. Metadata

Every edge of any graph carries metadata. So each of your graphs that contains one or more edges contains metadata as well. Metadata can be either internal or external (shared).

If metadata are internal, they are part of the graph: they are contained in it, and you can see them when you look at the Source tab in the Graph Editor.

If metadata are external (shared), they are located outside the graph in a metadata file (in the meta folder by default). In the Source tab you can only see a link to this external file, and the metadata themselves are described in that file.

Let's suppose that you have several graphs that use the same data files, the same database tables or any other common data resource. All such graphs can use the same metadata. These metadata can either be stored in each of the graphs separately, or all of the graphs can share them.

It is simpler and more convenient to keep one metadata definition for several graphs in one location, i.e., to have one external file that is shared by and linked to all of the graphs using the same data resource. If each graph kept its own copy, every change to the metadata would have to be repeated in all of the other graphs. It is much better to change the desired property in only one location: the metadata file.

On the other hand, if you want to give one of your graphs to someone, you must give that person not only the graph but also its metadata. In such a case it is simpler to have the metadata contained in the graph.

CloverGUI helps you choose between internal and external metadata. If you have metadata in one or more files outside the graph, you can internalize them, that is, put them into your graph. After doing that, you do not need to hand over your graph and its metadata separately: you can give only the graph with internal metadata. The external (shared) metadata file still exists, but the metadata have become part of the graph. The person who receives your graph can then externalize these metadata, i.e., create a new metadata file and link it to the graph. It is also possible to export metadata. Exporting internal metadata creates an external (shared) metadata file, but the original internal metadata remain in your graph.

The same applies to connections (both database connections and JMS connections) and parameters. Connections and parameters can also be internal or external (shared), and it is possible to externalize internal connections and parameters and to internalize external (shared) ones. They cannot be exported, however. If you want to export them, externalize them and then internalize them again; the external connection or parameter file will remain.
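The three operations differ only in what happens to the graph and to the file. The following Python sketch makes this concrete by editing graph XML with the standard library. It assumes a graph shape in which an external definition is a `Metadata` element with a `fileURL` attribute and an internal one contains a `Record` child. Treat those element and attribute names, and the function names, as assumptions.

```python
import xml.etree.ElementTree as ET

# Assumed shape: <Graph><Global><Metadata id=".." fileURL=".."/></Global></Graph>
# for linked metadata, and <Metadata id=".."><Record ...>...</Record></Metadata>
# for internal metadata.


def _find_metadata(graph, metadata_id):
    for meta in graph.iter("Metadata"):
        if meta.get("id") == metadata_id:
            return meta
    raise KeyError(metadata_id)


def internalize(graph_xml, metadata_id, fmt_xml):
    """Copy a linked .fmt definition into the graph (the .fmt file is left alone)."""
    graph = ET.fromstring(graph_xml)
    meta = _find_metadata(graph, metadata_id)
    meta.attrib.pop("fileURL", None)  # drop the link to the external file
    meta.append(ET.fromstring(fmt_xml))
    return ET.tostring(graph, encoding="unicode")


def export(graph_xml, metadata_id):
    """Return the internal <Record> as standalone .fmt text; the graph is unchanged."""
    record = _find_metadata(ET.fromstring(graph_xml), metadata_id).find("Record")
    if record is None:
        raise ValueError(f"{metadata_id} is not internal metadata")
    return ET.tostring(record, encoding="unicode")


def externalize(graph_xml, metadata_id, file_url):
    """Like export, but also replace the internal definition with a link."""
    fmt_xml = export(graph_xml, metadata_id)
    graph = ET.fromstring(graph_xml)
    meta = _find_metadata(graph, metadata_id)
    for child in list(meta):
        meta.remove(child)
    meta.set("fileURL", file_url)
    return ET.tostring(graph, encoding="unicode"), fmt_xml
```

Export and externalize produce the same file content. Only externalize also rewrites the graph to point at that file.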

Internal Metadata

As mentioned above, internal metadata are part of a graph: they are contained in it and can be seen in its Source tab.

How You Can Create Internal Metadata

If you want to create internal metadata, you can do it in two ways:

• You can do it in the Outline pane.

In the Outline pane, select the Metadata item, right-click to open the context menu, and select the New metadata item there.

• You can do it in the Graph Editor.

In the Graph Editor, you must open the context menu by right-clicking any of the edges. There you can seethe New metadata item.

Figure 8.1. Creating Internal Metadata in the Outline Pane

Figure 8.2. Creating Internal Metadata in the Graph Editor

In both cases, after selecting the New metadata item, a submenu appears in which you can select how to define metadata.

Now you have three possibilities in either case. If you want to define metadata yourself, select the User defined item. If you want to extract metadata from a file, select the Extract from flat file or Extract from xls file item. If you want to extract metadata from a database, select the Extract from database item. These items create internal metadata only.

Externalizing Internal Metadata

Once you have created internal metadata as part of a graph, you may want to convert them to external (shared) metadata. You would then be able to use the same metadata for several graphs (the graphs would share them).

You can externalize internal metadata as follows: right-click one of the internal metadata items in the Outline pane, click Externalize metadata in the context menu, select the project you want to add the metadata to, expand that project, select the meta folder, rename the metadata file if necessary, and click Finish.

Then the internal metadata item disappears from the Metadata folder of the Outline pane, and a newly created metadata file appears at the same location.

The same metadata file appears in the meta subfolder in the Navigator pane.

Exporting Internal Metadata

This case is similar to externalizing metadata: you create a metadata file outside the graph in the same way, but the file is not linked to the original graph. Only a metadata file is created. You can then use such a file for several graphs as an external (shared) metadata file, as described in the previous sections.

You can export internal metadata as follows: right-click one of the internal metadata items in the Outline pane, click Export metadata in the context menu, select the project you want to add the metadata to, expand that project, select the meta folder, rename the metadata file if necessary, and click Finish.

After that, the Metadata folder in the Outline pane remains the same, but the newly created metadata file appears in the meta folder in the Navigator pane.

Figure 8.3. Externalizing and/or Exporting Internal Metadata

Figure 8.4. Selecting a Location for a New Externalized and/or Exported Internal Metadata

External (Shared) Metadata

As mentioned above, external (shared) metadata serve more than one graph. They are located outside the graph, which is why several graphs can share them.

How You Can Create External (Shared) Metadata

If you want to create shared metadata, you can do it in two ways:

• You can do it by selecting File → New → Other in the main menu.

To create external (shared) metadata, after clicking the Other item, you must select the CloverETL item,expand it and decide whether you want to define metadata yourself (Define by hand), extract them from a file(Flat file or XLS file), or extract them from a database (Database).

• You can do it in the Navigator pane.

To create external (shared) metadata, right-click to open the context menu and select New → Other... from it. In the list of wizards, select the CloverETL item, expand it, and decide whether you want to define metadata yourself (Define by hand), extract them from a file (Flat file or XLS file), or extract them from a database (Database).

Figure 8.5. Creating External (Shared) Metadata in the Main Menu and/or in the Navigator Pane

Linking External (Shared) Metadata

After their creation (see the previous sections), shared metadata must be linked to all of the graphs in which they should be used. Do this in the context menu by selecting New metadata → Link shared definition (for more information see Section "How You Can Create Internal Metadata"). After clicking it, select the metadata file in the File selection wizard.

Internalizing External (Shared) Metadata

Once you have created and linked external (shared) metadata, you may want to convert them to internal metadata, that is, put them into the graph. You would then be able to see their structure in the graph itself.

You can internalize an external (shared) metadata file into internal metadata by right-clicking one of the external (shared) metadata items in the Outline pane and clicking Internalize metadata in the context menu.

Then the external (shared) metadata item disappears from the Metadata folder of the Outline pane, and the newly created internal metadata item appears at the same location.

However, the original external (shared) metadata file still exists in the meta subfolder in the Navigator pane.

Figure 8.6. Internalizing External (Shared) Metadata

The Resources From Which You Can Extract Metadata

As mentioned above, metadata describe the structure of data. Since data may be contained in flat files, XLS files, DBF files, XML files or database tables, metadata must be extracted differently for each of these data resources. We will describe how to create or extract (for some files only) metadata from the resources mentioned above. Let's start with a flat file.

The description is valid for both internal and external (shared) metadata.

Extracting Metadata from a Flat File

If you want to extract metadata from a flat file, click the corresponding item (Flat file for shared definition or Extract from flat file for internal definition). A Flat file wizard then opens.

In that wizard, type the file name or locate the file with the Browse... button. Once you have selected the file, specify the Encoding and Record type options as well.

If the fields of a record are separated from each other by delimiters, select Delimited as the Record type option. If the fields have fixed sizes, select the Fixed Length option.

In the Input file pane below you can see the data from the file.

Figure 8.7. Extracting Metadata from Delimited Flat File

Figure 8.8. Extracting Metadata from Fixed Length Flat File

How to Extract Metadata from Delimited Files

After clicking the Next button, you can see more detailed information about the content of the input file and the delimiters in the Metadata wizard. It consists of four panes: two in the upper part of the window, the third in the middle, and the fourth at the bottom. Each pane can be expanded to the whole window by clicking the corresponding symbol in its upper right corner.

The two panes at the top are the panes described in Section "Metadata Editor", where setting up the metadata is explained in more detail. Clicking the symbol in the upper right corner of a pane expands the two panes to the whole window. The two upper panes are the Record pane and the Field pane. The Record pane also shows the Delimiters (for delimited files), the Sizes (for fixed length files), or both (for mixed files) of the fields. In the Field pane, after clicking one of the fields of the record, you can see the structure of that field: its Properties along with their values. Some Properties have default values, whereas others do not. Here, too, you can change Name, Type, Format, Nullable, Default, Delimiter, EOF as delimiter, Size, Autofilling, Locale and Shift. (For more details on how to change the metadata structure, see Section "Metadata Editor".)

In the middle there is the third pane. If you expand it to the whole wizard window, you will see the following:

Figure 8.9. Setting Up Delimited Metadata

In addition to the two upper panes, you can change some metadata settings in the third pane in the middle. Here you can specify whether the first line of the file contains the names of the record fields; if so, check the Extract names checkbox. You can also click a column header and choose whether to change the name of the field (Rename) or its data type (Retype). If there are no field names in the file, CloverGUI assigns default names of the form Field#. By default, the type of all record fields is set to string. You can change it to any other type by selecting it from the presented list: boolean, byte, cbyte, date, decimal, integer, long, numeric, string. (For a more detailed description see Section "Data Types".) You must also specify which delimiter is used in the file (Delimiter): comma, colon, semicolon, space, tabulator, or a sequence of characters. Finally, you can click the Reparse button to see the result of parsing the file again.
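The Extract names, Field# and Retype behaviour can be sketched in a few lines of Python. The sketch rests on two assumptions: the default names are taken to be numbered from 1 (Field1, Field2, ...), and plain splitting is used so that multi-character delimiters work, with no quote handling. The function names are hypothetical.

```python
CLOVER_TYPES = {"boolean", "byte", "cbyte", "date", "decimal",
                "integer", "long", "numeric", "string"}


def extract_delimited_metadata(text, delimiter=",", extract_names=False):
    """Return (name, type) pairs inferred from the first line of `text`."""
    lines = [line for line in text.splitlines() if line]
    if not lines:
        return []
    first = lines[0].split(delimiter)  # plain split: multi-char delimiters, no quoting
    if extract_names:
        names = [value.strip() for value in first]
    else:
        names = [f"Field{i}" for i in range(1, len(first) + 1)]  # numbering assumed
    return [(name, "string") for name in names]  # every field starts as string


def retype(fields, field_name, new_type):
    """Change the type of one field, accepting only the listed types."""
    if new_type not in CLOVER_TYPES:
        raise ValueError(f"unknown type: {new_type}")
    return [(n, new_type if n == field_name else t) for n, t in fields]


def rename(fields, old_name, new_name):
    """Change the name of one field."""
    return [(new_name if n == old_name else n, t) for n, t in fields]
```

For example, `extract_delimited_metadata("id;name\n1;Alice", ";", True)` yields two string fields named `id` and `name`. Retyping `id` to integer mirrors the Retype menu.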

At the bottom of the wizard, the fourth pane displays the data of the file.

If you are creating internal metadata, just click the Finish button. If you are creating external (shared) metadata, click the offered Next button, select the folder (meta) and the name of the metadata file, and click Finish. The extension .fmt is added to the metadata file automatically.
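External metadata end up as an XML file with the .fmt extension in the meta folder. The following Python sketch writes such a file for a delimited record. The `Record`/`Field` element names and the `name`, `type` and `fieldDelimiter` attributes are assumptions about the file layout, and the function names are hypothetical. Compare with a file produced by CloverGUI before relying on it.

```python
import os
import xml.etree.ElementTree as ET


def fmt_file_name(name):
    """Append the .fmt extension unless it is already present."""
    return name if name.endswith(".fmt") else name + ".fmt"


def build_fmt(record_name, fields, delimiter=";"):
    """Serialize (name, type) pairs as an assumed delimited <Record> definition."""
    record = ET.Element("Record", name=record_name, type="delimited",
                        fieldDelimiter=delimiter)
    for field_name, data_type in fields:
        ET.SubElement(record, "Field", name=field_name, type=data_type)
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(record, encoding="unicode")


def save_fmt(meta_dir, name, xml_text):
    """Write the definition into the meta folder and return the file path."""
    path = os.path.join(meta_dir, fmt_file_name(name))
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(xml_text)
    return path
```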

How to Extract Metadata from Fixed Length Files

After clicking the Next button, you can see more detailed information about the content of the input file and the delimiters in the Metadata wizard. It consists of four panes: two in the upper part of the window, the third in the middle, and the fourth at the bottom. Each pane can be expanded to the whole wizard window by clicking the corresponding symbol in its upper right corner.

The two panes at the top are the panes described in Section "Metadata Editor", where setting up the metadata is explained in more detail. Clicking the symbol in the upper right corner of a pane expands the two panes to the whole window. The two upper panes are the Record pane and the Field pane. The Record pane also shows the Delimiters (for delimited files), the Sizes (for fixed length files), or both (for mixed files) of the fields. In the Field pane, after clicking one of the fields of the record, you can see the structure of that field: its Properties along with their values. Some Properties have default values, whereas others do not. Here, too, you can change Name, Type, Format, Nullable, Default, Delimiter, EOF as delimiter, Size, Autofilling, Locale and Shift. (For more details on how to change the metadata structure, see Section "Metadata Editor".)

In the middle there is the third pane. If you expand it to the whole window, you will see the following:

Figure 8.10. Setting Up Fixed Length Metadata

If you want, you can also click a column header and choose one of the following: Rename, Resize, Retype. To change the name of a column, choose Rename. If there are no field names in the file, CloverGUI assigns default names of the form Field#. The type of all record fields is also set to string by default. To change the data type, choose Retype and select another type from the presented list: boolean, byte, cbyte, date, decimal, integer, long, numeric, string. (For a more detailed description see Section "Data Types".) You must also change the default sizes of the individual fields (Resize). You can also split, merge, add or remove columns, and you can change the sizes by moving the column borders.
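For fixed length files, the metadata boil down to a list of field sizes, and splitting or merging columns just rewrites that list. The following Python sketch shows this under the assumption that one character equals one position. The function names are hypothetical.

```python
def split_fixed(line, sizes):
    """Cut one fixed length line into fields of the given sizes."""
    fields, pos = [], 0
    for size in sizes:
        fields.append(line[pos:pos + size])
        pos += size
    return fields


def merge_columns(sizes, index):
    """Merge column `index` with the column to its right."""
    return sizes[:index] + [sizes[index] + sizes[index + 1]] + sizes[index + 2:]


def split_column(sizes, index, at):
    """Split column `index` after `at` characters."""
    if not 0 < at < sizes[index]:
        raise ValueError("split point must fall inside the column")
    return sizes[:index] + [at, sizes[index] - at] + sizes[index + 1:]
```

With sizes `[4, 10, 2]`, the line `"0001Alice     CZ"` yields `"0001"`, `"Alice     "` and `"CZ"`. Moving a column border corresponds to adjusting two neighbouring sizes by the same amount.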

At the bottom of the wizard, the fourth pane displays the data of the file.

If you are creating internal metadata, just click the Finish button. If you are creating external (shared) metadata, click the offered Next button, select the folder (meta) and the name of the metadata file, and click Finish. The extension .fmt is added to the metadata file automatically.

Extracting Metadata from an XLS File

When you want to extract metadata from an XLS file, select XLS file for external (shared) metadata or Extract from xls file for internal metadata from the context menu.

In the Sheet properties wizard that appears after clicking either of the two mentioned items, you must browseand locate the desired XLS file and click the Open button.

After that, some properties in the wizard are filled with values: Sheet name, Metadata row, Sample data row and Encoding. If they are not filled in, you can fill them in yourself. At the bottom, you can also see a data preview from the selected XLS file.

You can select the Sheet name. You may want to change the encoding as well.

As regards Metadata row and Sample data row: by default, Metadata row is set to 1 and Sample data row is set to 2. (Metadata row is the row that contains the names of the fields. Sample data row is the row from which data types are extracted. Together they give rise to the metadata description of the file.)

If the XLS file does not contain any row with field names, set Metadata row to 0. In such a case, the column codes (letters starting from A, etc.) serve as the names of the fields.

In the case of XLS files, data types are set correctly thanks to the Sample data row, and the formats are set to the right format types as well.
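The roles of Metadata row and Sample data row can be sketched as follows. The sheet is modelled as a list of rows of Python values, with row numbers counted from 1 as in the wizard. The value-to-type mapping in `guess_type` is an assumption for illustration, not the rule CloverGUI applies to real spreadsheet cells.

```python
import datetime


def column_letters(index):
    """Spreadsheet column code for a 0-based index: 0 -> 'A', 26 -> 'AA'."""
    letters = ""
    index += 1
    while index:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def guess_type(value):
    """Assumed mapping from a sample cell value to a CloverETL type name."""
    if isinstance(value, bool):   # check bool before int (bool is an int subclass)
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "numeric"
    if isinstance(value, datetime.date):  # also covers datetime.datetime
        return "date"
    return "string"


def xls_metadata(rows, metadata_row=1, sample_data_row=2):
    """Names come from metadata_row (0 = use column letters), types from sample_data_row."""
    sample = rows[sample_data_row - 1]
    if metadata_row == 0:
        names = [column_letters(i) for i in range(len(sample))]
    else:
        names = [str(name) for name in rows[metadata_row - 1]]
    return [(name, guess_type(value)) for name, value in zip(names, sample)]
```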

You can also select the Number of lines in preview. By default it is 100.

As the last step, click either the OK button (when creating internal metadata), or the Next button, then select the location (meta, by default) and choose a name (when creating an external (shared) metadata file). The extension .fmt is added to the name of the metadata file automatically.

Figure 8.11. Extracting Metadata from XLS File

Extracting Metadata from a Database

If you want to extract metadata from a database (by selecting the Database item for external (shared) definition or the Extract from database item for internal definition), you must have a database connection defined before extracting metadata.

In addition, if you want to extract internal metadata from a database, you can also right-click any connection item in the Outline pane and select New metadata → Extract from database.

Figure 8.12. Extracting Internal Metadata from a Database

After each of these three options, a Database Connection wizard opens.

Figure 8.13. Database Connection Wizard

In order to extract metadata, you must first create a database connection as shown in the corresponding section. Once it has been created, the User, Password and URL fields are filled in the Database Connection wizard.

Then you must click Next. After that, you can see a database schema.

Figure 8.14. Selecting Columns for Metadata

Now you have two possibilities:

Either you write a query directly, or you generate the query by selecting individual columns of database tables.

If you want to generate the query, hold Ctrl, highlight individual columns from individual tables by clicking them, and finally click the Generate button. The query is generated automatically.

Figure 8.15. Generating a Query

Once you have written or generated the query, you can check its validity by clicking the Validate button.
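What the wizard does with the selected columns can be approximated as building a SELECT statement and reading column names and types from the database. In the following Python sketch, SQLite from the standard library stands in for a real connection. The SQL-to-CloverETL type mapping and the function names are assumptions, and the table name is expected to come from a trusted source. Note that selecting columns from several tables without a join condition produces a cross join.

```python
import sqlite3

# Assumed mapping from declared SQL types to CloverETL type names.
SQL_TO_CLOVER = {"INTEGER": "integer", "REAL": "numeric", "TEXT": "string",
                 "NUMERIC": "decimal", "DATE": "date"}


def generate_query(selected):
    """Build a SELECT from (table, column) pairs picked with Ctrl+click."""
    tables = []
    for table, _ in selected:
        if table not in tables:
            tables.append(table)
    columns = ", ".join(f"{table}.{column}" for table, column in selected)
    return f"SELECT {columns} FROM {', '.join(tables)}"


def extract_table_metadata(connection, table, columns=None):
    """Return (name, clover_type, nullable) for the chosen columns of `table`."""
    result = []
    # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
    for _cid, name, declared, notnull, _default, _pk in connection.execute(
            f"PRAGMA table_info({table})"):
        if columns is None or name in columns:
            clover_type = SQL_TO_CLOVER.get(declared.upper(), "string")
            result.append((name, clover_type, not notnull))
    return result
```

Executing the generated query once (as the Validate button presumably does) is a cheap way to check that it parses before extracting metadata.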

Then click Next. The Metadata Editor opens, and you finish the extraction of metadata in it.

• By clicking the Finish button (in case of internal metadata), you will get internal metadata in the Outline pane.

• On the other hand, if you want to extract external (shared) metadata, click the Next button first. You will then be prompted to decide which project and which subfolder should contain your metadata file. After you expand the project, select the meta subfolder, specify the name of the metadata file and click Finish, the file is saved into the selected location.

Creating Metadata from a DBase File

If you want to create metadata from a dBase file, you need two .jar files. Both of them are provided with CloverGUI.

So, if you want to create dBase file metadata in one of your projects, first right-click the project name in the Navigator pane, click the Properties item in the context menu, and click Java Build Path.

Figure 8.16. Original Libraries Tab of Java Build Path

There, open the Libraries tab, click the Add External JARs... button, and locate the two .jar files mentioned above.

The two .jar files are the following:

C:\Users\cloveruser\Desktop\eclipse\plugins\com.cloveretl.gui_2.0.0\lib\lib\cloveretl.engine.jar

C:\Users\cloveruser\Desktop\eclipse\plugins\com.cloveretl.gui_2.0.0\lib\lib\commons-logging.jar

In your case, you may need to change the path to your eclipse folder, which is C:\Users\cloveruser\Desktop in our case. You may also need to change the numbers indicating the CloverGUI release, which are 2.0.0 here.

When you have found these two .jar files, add them to the libraries by clicking Open and then OK.

Figure 8.17. Adding the Two Libraries for Extracting Metadata from DBASE File

Then right-click in the Graph Editor and select Run As → Open Run Dialog... from the context menu. In the dialog, collapse all of the graphs, then select and double-click the Java Application item. A new configuration appears. Type dbasefile_metadata as its name in the Main tab (or choose any other name for this Java application).

Figure 8.18. Creating Java Application for Extracting Metadata from DBASE File

Now click the Search... button, locate the main class, and select its name from the list provided by the Select Main Type wizard.

Figure 8.19. Selecting the Main Class

Select the DBFAnalyzer name and double-click this item. The name org.jetel.database.dbf.DBFAnalyzer then appears in the Main class text area of the Main tab.

Figure 8.20. Adding the Main Class

Now switch to the Arguments tab. There, type data-in/DBASEFILENAME.DBF and (after one space) meta/metadatadbf.fmt. You can also type other names, depending on which dBase file you have in the data-in subfolder and which metadata file you want to create. For different dBase files you must specify different metadata files.

Figure 8.21. Adding Arguments

Now, when you click the Run button, the metadata file is created with the help of your dbasefile_metadata configuration.

Figure 8.22. Configuration for Extracting Metadata from DBASE File Has Been Created

Then you can do with the metadata file everything that has been described above for other external (shared) metadata. You can link this external (shared) dBase metadata file to each graph that uses the mentioned DBASEFILENAME.DBF, assign these metadata to edges, internalize the file and, if needed, externalize the metadata again.

This way, CloverGUI helps you create metadata from every dbase file.
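To see what information a dBase file offers for metadata, it helps to look at its header. The header lists one 32-byte descriptor per field (name, type code, length) after a 32-byte file header, terminated by byte 0x0D. The following Python sketch reads those descriptors from the classic dBase III layout. It is not how DBFAnalyzer is implemented. The type mapping and function names are assumptions, and the builder function exists only so that the parser can be tried without a real file.

```python
import struct

# Assumed mapping from dBase field type codes to CloverETL type names.
DBF_TO_CLOVER = {"C": "string", "N": "numeric", "F": "numeric", "D": "date", "L": "boolean"}


def read_dbf_fields(data):
    """Return (name, clover_type, length) for each field descriptor in a .DBF header."""
    header_length = struct.unpack_from("<H", data, 8)[0]  # bytes 8-9: header size
    fields = []
    offset = 32  # descriptors start after the 32-byte file header
    while offset + 32 <= header_length and data[offset] != 0x0D:  # 0x0D ends the list
        descriptor = data[offset:offset + 32]
        name = descriptor[:11].split(b"\x00", 1)[0].decode("ascii")
        type_code = chr(descriptor[11])
        length = descriptor[16]
        fields.append((name, DBF_TO_CLOVER.get(type_code, "string"), length))
        offset += 32
    return fields


def make_dbf_header(fields):
    """Build a minimal dBase III header (no records) for trying the parser."""
    descriptors = b"".join(
        name.encode("ascii").ljust(11, b"\x00") + type_code.encode("ascii")
        + bytes(4) + bytes([length, 0]) + bytes(14)
        for name, type_code, length in fields
    )
    header_length = 32 + len(descriptors) + 1  # +1 for the 0x0D terminator
    record_length = 1 + sum(length for _, _, length in fields)  # +1 deletion flag
    file_header = struct.pack("<BBBBIHH20x", 0x03, 108, 8, 26, 0,
                              header_length, record_length)
    return file_header + descriptors + b"\x0d"
```

On a real file, `read_dbf_fields(open("data-in/DBASEFILENAME.DBF", "rb").read())` would list the fields that a metadata file needs to describe.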

Creating Metadata by User

If you want to create metadata yourself (Define by hand for external (shared) definition or User defined for internal definition), proceed as follows:

After opening the Metadata wizard, add the desired number of fields by clicking the plus sign, and set up their names, data types, delimiters, sizes, formats and the other properties described above.

Once you have done all of that, click either OK for internal metadata or Next for external (shared) metadata. In the latter case, select the location (meta, by default) and a name for the metadata file. When you click OK, your metadata file is saved and the extension .fmt is added to it automatically.

Assigning Metadata to an Edge

When you have created metadata, you must assign them to an edge. Right-click the edge, choose the Select metadata item, and select the right metadata item from the metadata list.

Figure 8.23. Assigning Metadata to an Edge

Editing Metadata

When you want to edit already defined metadata, you can do it with the help of the same wizards.

• You can edit both internal and external (shared) metadata by double-clicking an edge. A Metadata editor opens.

• You can edit both internal and external (shared) metadata by using the context menu that is called up in the Outline pane by right-clicking the Metadata item.

• You can edit both internal and external (shared) metadata from the context menu that is called up in the Graph Editor by right-clicking an edge of the graph.

In all cases, select the Edit item. You will then work with the same wizard as when extracting metadata from a file.

• Internal and external (shared) metadata can also be edited as a part of the graph or as a separate XML file, respectively, if you open them in the Graph Editor.

For internal metadata, display the source code in the Source tab of the Graph Editor. For external (shared) metadata, go to the Navigator pane, select the project folder, expand it, select the metadata folder, select the metadata definition file (with the .fmt extension) and open it by choosing Open With → Text Editor. The metadata definition file then appears in the Graph Editor, where you can change its content.

If you select any metadata item in the Outline pane and press Enter, the Metadata editor opens.

Creating Database Table on the Basis of Metadata and Database Connection

As the last option, you can also create a database table on the basis of metadata (both internal and external).

When you select the Create database table item from either of the two context menus (opened from the Outline pane or the Graph Editor), a wizard opens with an SQL query that creates the database table.

Figure 8.24. Creating Database Table on the Basis of Metadata and Database Connection

You can edit the content of this window if you want.

When you select a connection to a database (for more details see Section "Database Connections"), the database table is created.

Metadata Editor

You can also open this editor by selecting a metadata item in the Outline pane and pressing Enter.

Here we will describe the appearance of this Metadata editor.

It consists of two panes: the Record pane and the Field pane.

In the Record pane, there are also the Delimiters (for delimited files), the Sizes (for fixed length files), or both (for mixed files) of the fields.

Remember that you must define mixed metadata by hand. You must specify both delimiters and sizes: delimiters for some fields, sizes for others.

In the Field pane, after clicking one of the fields of the record, you can see the structure of the selected field: its Properties along with their values. Some Properties have default values, whereas others do not. Here, too, you can change the Name, Type, Format, Nullable property, Default, Delimiter, EOF as delimiter, Size, Autofilling, Locale and Shift.

Field delimiters are usually the same for all fields of a record. The delimiter on the first row is therefore the default field delimiter, which is used for all fields except the last one (the delimiter after the last field is the record delimiter itself). Default field delimiters are displayed in gray for all fields that use them.

Nevertheless, you can set the delimiter of any field (including the record delimiter) to any other value.

Note that the delimiter on the first row is not the record delimiter; it is the default field delimiter. The record delimiter can be seen on the last row, also displayed in gray. The record delimiter is set to \r\n by default, but you can select any other record delimiter.

The system of delimiters in the Metadata editor works as follows:

The first column of the Metadata editor contains the numbers of the individual record fields. The field name belonging to each number is to the right of it, and the delimiter belonging to that field is to the right of the field name.

A delimiter follows its field within each record in the same way as it follows the field name on its row of the editor. For example, the delimiter on the row numbered "1" is the delimiter written right after the first field of every record.

This matters especially for the last row: the delimiter to the right of the last field name follows the last field of every record, so it is the record delimiter. In other words, the record delimiter is the delimiter shown on the last row of the Metadata editor.

If you want to change the record delimiter to any other value, click the second column in the first row of the Metadata editor. It contains the following label in bold: Record: recordname. If you click this label, it changes to recordname on a blue background, and the Field pane on the right shows the following properties: Default delimiter, Record delimiter, Name, Preview Attachment Metadata Row, Preview Attachment Sample Data Row, Preview Charset, Preview attachment, Skip first line, Type (type of metadata: delimited, fixed or mixed) and Locale. There you can set the Record delimiter property to any value.
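To make the delimiter rule concrete, the following Java sketch splits delimited text the way the editor rows describe it: the delimiter listed for field i is the one expected right after field i, and the entry for the last field is the record delimiter. It is only an illustration, not CloverETL's parser; the class name is hypothetical, quoting and escaping are ignored, and the eofAsDelimiter flag mimics the EOF as delimiter property described in the Field Pane section.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration (not a CloverETL API) of how per-field delimiters split records.
// delimiters[i] is the delimiter that FOLLOWS field i; the last entry is the record delimiter.
final class DelimitedRecordSketch {

    /** Parses all records in text; eofAsDelimiter lets the final field end at end of input. */
    static List<List<String>> parse(String text, String[] delimiters, boolean eofAsDelimiter) {
        List<List<String>> records = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            List<String> fields = new ArrayList<>();
            for (int i = 0; i < delimiters.length; i++) {
                int end = text.indexOf(delimiters[i], pos);
                if (end < 0) {
                    boolean lastField = i == delimiters.length - 1;
                    if (lastField && eofAsDelimiter) {
                        end = text.length(); // accept end of input instead of the record delimiter
                    } else {
                        throw new IllegalArgumentException("Missing delimiter after field " + (i + 1));
                    }
                }
                fields.add(text.substring(pos, end));
                // skip the delimiter (or stay at the end of input)
                pos = Math.min(end + delimiters[i].length(), text.length());
            }
            records.add(fields);
        }
        return records;
    }
}
```

With delimiters {";", ";", "\r\n"}, the text "1;Alice;Prague\r\n2;Bob;Brno" yields two records only if eofAsDelimiter is true, because the second record has no closing record delimiter.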

Below you can see an example of delimited metadata and another one of fixed length metadata. Mixed metadata would combine both cases: some fields would have a delimiter and no size, whereas others would have a size and no delimiter. Such metadata must be created by hand.

Figure 8.25. Metadata Editor for a Delimited File

Figure 8.26. Metadata Editor for a Fixed Length File

Record Pane

On the left side of the wizard, to the left of the Record pane, there are six buttons (from top to bottom): for adding or removing fields, and for moving fields to the top, up, down or to the bottom. Above these buttons, there are two arrows (for undoing and redoing, from left to right).

In addition, each column of the Record pane can be sorted in ascending or descending order by simply clicking its header.

Field Names

Here you can type the names of the fields. Each name is the same field name as in the Field pane: by changing it here and pressing Enter, you change the field name in the Field pane as well. We suggest you only use the following characters in field names: [a-zA-Z0-9_].
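The recommended character set can be checked with a regular expression before you type a name. This is a minimal sketch; the guide only suggests these characters, so nothing here assumes that CloverGUI rejects other names, and FieldNameCheck is a hypothetical name.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not part of CloverGUI): checks a field name against [a-zA-Z0-9_].
final class FieldNameCheck {
    // One or more letters a-z or A-Z, digits 0-9, or underscores, and nothing else.
    private static final Pattern RECOMMENDED = Pattern.compile("[a-zA-Z0-9_]+");

    static boolean isRecommended(String name) {
        return RECOMMENDED.matcher(name).matches(); // the whole name must match
    }
}
```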

Data and Record Types

Here you can select the data type. Each data type is the same as in the Field pane: by changing its value here and pressing Enter, you change the data type in the Field pane as well.

Data Types

• Boolean. This data type can have values either true or false.

• Byte. This data type is an array of bytes. Each byte has values from -128 to 127.

• CByte. This is a compressed array of bytes. Each byte has values from -128 to 127.

• Date. This data type serves to designate date. Its size is 8 bytes.

• Decimal. This data type is defined by precision and scale. Precision is the maximum number of digits contained in this numeric data type. Scale is the number of digits after the decimal point. Thus, data type decimal(6,2) can have values from -9999.99 to 9999.99. Its size depends on its precision.

• Integer. This data type can have values from $-2^{31}$ to $2^{31}-1$. Its size is 4 bytes.

• Long. This data type can have values from $-2^{63}$ to $2^{63}-1$. Its size is 8 bytes.

• Numeric. This data type can have the following values: 0, negative values from $-(2-2^{-52})\cdot 2^{1023}$ to $-2^{-1074}$, and positive values from $2^{-1074}$ to $(2-2^{-52})\cdot 2^{1023}$. Its size is 8 bytes.

• String. This data type is a sequence of characters. Every data type can be converted to a string in a simple way.
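The ranges listed above coincide with those of Java's byte, int, long and double types, and decimal(6,2) behaves like a java.math.BigDecimal with precision 6 and scale 2. The sketch below checks these correspondences in Java. Treating the Clover data types as these Java types is an assumption made for illustration, not something this guide states.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Sketch: compares the documented ranges with Java's built-in types (assumed mapping).
final class DataTypeRanges {
    private static final BigInteger TWO = BigInteger.valueOf(2);

    // Byte: each byte holds -128 .. 127
    static boolean byteMatches() {
        return Byte.MIN_VALUE == -128 && Byte.MAX_VALUE == 127;
    }

    // Integer: -2^31 .. 2^31 - 1
    static boolean integerMatches() {
        return BigInteger.valueOf(Integer.MIN_VALUE).equals(TWO.pow(31).negate())
            && BigInteger.valueOf(Integer.MAX_VALUE).equals(TWO.pow(31).subtract(BigInteger.ONE));
    }

    // Long: -2^63 .. 2^63 - 1
    static boolean longMatches() {
        return BigInteger.valueOf(Long.MIN_VALUE).equals(TWO.pow(63).negate())
            && BigInteger.valueOf(Long.MAX_VALUE).equals(TWO.pow(63).subtract(BigInteger.ONE));
    }

    // Numeric: largest magnitude (2 - 2^-52) * 2^1023, smallest positive 2^-1074.
    // Math.scalb multiplies by a power of two exactly for these values.
    static boolean numericMatches() {
        double largest = Math.scalb(2.0 - Math.scalb(1.0, -52), 1023);
        double smallest = Math.scalb(1.0, -1074);
        return largest == Double.MAX_VALUE && smallest == Double.MIN_VALUE;
    }

    // decimal(precision, scale): largest value is 10^(precision - scale) - 10^(-scale)
    static BigDecimal decimalMax(int precision, int scale) {
        return BigDecimal.TEN.pow(precision - scale).subtract(BigDecimal.ONE.movePointLeft(scale));
    }
}
```

For decimal(6,2), decimalMax(6, 2) gives 9999.99, the upper bound stated above.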

Record Types

• Delimited. This is the format of records in which every two adjacent fields are separated by a delimiter.

• Fixed. This is the format of records in which every field has a defined size.

• Mixed. This is the format of records in which some fields are separated by delimiters, whereas the other fields have defined sizes. It is a mixture of the two cases above.
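A mixed record can be read field by field: a field with a size consumes exactly that many characters, and a field with a delimiter runs up to that delimiter. The following Java sketch assumes exactly that (one spec per field, no quoting, Shift ignored); MixedRecordSketch and FieldSpec are hypothetical names, not CloverETL classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of reading one mixed record: each field has either a size or a delimiter.
final class MixedRecordSketch {

    /** A field is fixed (delimiter == null, size used) or delimited (delimiter set, size ignored). */
    static final class FieldSpec {
        final int size;
        final String delimiter;

        private FieldSpec(int size, String delimiter) {
            this.size = size;
            this.delimiter = delimiter;
        }

        static FieldSpec fixed(int size) { return new FieldSpec(size, null); }

        static FieldSpec delimited(String delimiter) { return new FieldSpec(0, delimiter); }
    }

    static List<String> parseRecord(String record, List<FieldSpec> specs) {
        List<String> values = new ArrayList<>();
        int pos = 0;
        for (FieldSpec spec : specs) {
            if (spec.delimiter == null) {
                // Fixed-size field: take exactly 'size' characters.
                int end = pos + spec.size;
                if (end > record.length()) throw new IllegalArgumentException("Record too short");
                values.add(record.substring(pos, end));
                pos = end;
            } else {
                // Delimited field: read up to the delimiter, then skip the delimiter itself.
                int end = record.indexOf(spec.delimiter, pos);
                if (end < 0) throw new IllegalArgumentException("Delimiter not found: " + spec.delimiter);
                values.add(record.substring(pos, end));
                pos = end + spec.delimiter.length();
            }
        }
        return values;
    }
}
```

For example, the record "2008Prague;CZ\n" read with a 4-character field followed by fields delimited by ";" and "\n" gives "2008", "Prague" and "CZ".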

Delimiters

Here you can select the delimiters separating two adjacent fields (pipe, comma, semicolon, colon, tabulator, line feed, carriage return, carriage return plus line feed, etc.). For each field, it is the same delimiter as in the Field pane: by changing its value here and pressing Enter, you change the delimiter in the Field pane as well.

Sizes

Here you can type the sizes of the fields in characters. For each field, it is the same size as in the Field pane: by changing its value here and pressing Enter, you change the size in the Field pane as well.

Field Pane

On the right side of the wizard, there is the Field pane. In this pane, the following properties can be set up:

• Default. This is the default value of the field. It is used if you set the Autofilling property to default_value.

• Delimiter. This is the same field delimiter as in the Record pane. By changing its value and pressing Enter, you change the delimiter in the Record pane as well.

• Name. This is the same field name as in the Record pane. By changing this name and pressing Enter, you change the field name in the Record pane as well.

• Nullable. This can be true or false. If it is set to true, the field value can be null. It is set to true by default, so fields are nullable by default.

• Size. This is the same size as in the Record pane. By changing its value and pressing Enter, you change the size in the Record pane as well.

• Type. This is the same data type as in the Record pane. By changing its value and pressing Enter, you change the data type in the Record pane as well.

• Autofilling. From this list, you can select one of the functions that components use to fill fields when they read data records. It cannot be used on the edges following the XMLExtract and CloverDataReader components.

• default_value. This function fills the specified record fields of the corresponding data type with the value specified as the Default property.

• global_row_count. This function fills the specified record fields of any numeric data type in multiple edges sequentially, in the order in which data records are sent out through the output ports. The numbering starts from 0. If data records are read from more data sources, the numbering continues through all data sources. On the other hand, if some edge does not include such a field, the corresponding numbers are omitted; the others are written to the specified fields.

• source_row_count. This function fills the specified record fields of any numeric data type in multiple edges sequentially, in the order in which data records are sent out through the output ports. If data records are read from more data sources, the numbering starts from 0 for each data source. If some edge does not include such a field, the corresponding numbers are omitted; the others are written to the specified fields.

• metadata_row_count. This function fills the specified record fields of any numeric data type for one metadata sequentially, in the order in which data records are sent out through the output ports. The numbering starts from 0. If data records are read from more data sources, the numbering continues through all data sources.

• metadata_source_row_count. This function fills the specified record field of any numeric data type for one metadata sequentially, in the order in which data records are sent out through the output ports. If data records are read from more data sources, the numbering starts from 0 for each data source.

• source_name. This function fills the specified record fields of string data type with the name of the data source from which records are read.

• source_timestamp. This function fills the specified record fields of date data type with the timestamp corresponding to the data source from which records are read.

• source_size. This function fills the specified record fields of any numeric data type with the size of the data source from which records are read.

• EOF as delimiter. This can be set to true or false according to whether the end of file is used as a delimiter. It is useful when your file does not end with a delimiter. If this property is not set to true, running a graph with such a data file fails. It is false by default.

• Format. This is a description of the format. For example, a date field can have the format dd/MM/yyyy or dd.MM.yyyy. Integer numbers have the format #, etc.

• Locale. This property can be set according to the localization of your computer. It can be useful for date formats or for the decimal separator, for example.

• Shift. This is the gap between the end of one field and the start of the next one when the fields are part of a fixed or mixed record and their sizes are set.
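The difference between global_row_count and source_row_count can be simulated as follows. This is a simplified Java sketch, not CloverETL code: it assumes a single edge whose records all contain the autofilled field, with sources read one after another, so the multi-edge behavior described above is not modeled. RowCountSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of two autofilling counters for records read from several sources.
final class RowCountSketch {

    /** Returns {global_row_count, source_row_count} for every record, in reading order. */
    static List<long[]> number(int[] recordsPerSource) {
        List<long[]> result = new ArrayList<>();
        long global = 0;                 // continues across all data sources
        for (int records : recordsPerSource) {
            long perSource = 0;          // restarts at 0 for each data source
            for (int i = 0; i < records; i++) {
                result.add(new long[] {global++, perSource++});
            }
        }
        return result;
    }
}
```

For two sources with 2 and 3 records, the global counter runs 0 to 4 while the per-source counter runs 0, 1 and then 0, 1, 2.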

Filter Textarea

In the Filter textarea, you can type any expression you want to search for among the fields of the Record pane. Note that the search is case sensitive.

Dynamic Metadata

In addition to metadata created or extracted using CloverGUI, you can also write a metadata definition in the Source tab of the Graph Editor pane. Unlike metadata defined in the GUI, metadata written in the Source tab cannot be edited in the GUI.

To define such metadata, open the Source tab and write the following:

<Metadata id="YourMetadataId" connection="YourConnectionToDB" sqlQuery="YourQuery"/>

Use any identifier as YourMetadataId, the DB connection that should be used to connect to the database as YourConnectionToDB, and the query that will be used to extract data from the database as YourQuery.

If you want to speed up the run of your graph, you can also append "where 1=0" to your query, or "and 1=0" if the query already contains a "where ..." clause.

This way, only metadata are extracted and no data are read. Remember that such metadata cannot be created in the GUI and are only generated at runtime.
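The "where 1=0" trick relies on standard JDBC behavior: a query that returns no rows still exposes its column descriptions through java.sql.ResultSetMetaData. The sketch below shows the idea with plain JDBC. It is not how CloverETL implements dynamic metadata, and both the class name and the naive WHERE detection are assumptions.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch (not CloverETL code) of reading column metadata without reading rows.
final class DynamicMetadataSketch {

    // Appends a condition that is never true, so the query returns no rows.
    // Naive: ignores GROUP BY/ORDER BY, "where" inside strings, and OR conditions
    // without parentheses (where "... or b=2 and 1=0" would still return rows).
    static String metadataOnly(String query) {
        boolean hasWhere = query.toLowerCase(Locale.ROOT).contains(" where ");
        return query + (hasWhere ? " and 1=0" : " where 1=0");
    }

    // Lists "label:type" for each column the driver reports for the (empty) result.
    static List<String> describe(Connection connection, String query) throws SQLException {
        List<String> columns = new ArrayList<>();
        try (Statement statement = connection.createStatement();
             ResultSet rs = statement.executeQuery(metadataOnly(query))) {
            ResultSetMetaData md = rs.getMetaData();
            for (int i = 1; i <= md.getColumnCount(); i++) { // JDBC columns are 1-based
                columns.add(md.getColumnLabel(i) + ":" + md.getColumnTypeName(i));
            }
        }
        return columns;
    }
}
```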

Chapter 9. Database Connections

If you want to parse data, you need some data sources. Sometimes you get data from files, in other cases from databases or other data sources.

Now we will describe how you can work with sources that are not files. To work with them, you need to make a connection to them. For now, we will describe only how to work with databases; some of the more advanced data sources using connections will be described later.

You can work with databases in two ways: either a client on your computer connects to a database located on a server by means of a client utility, or you use a JDBC driver. Here we will describe the database connections that use JDBC drivers. The other way (client-server architecture) will be described later, together with the components.

As in the case of metadata, database connections can be internal or external (shared). You can create them in two ways.

Internal Database Connections

As with metadata, internal database connections are part of a graph: they are contained in it and can be seen in its Source tab. This property is common to all internal structures.

How You Can Create Internal Database Connections

If you want to create an internal database connection, do it in the Outline pane by selecting the Connections item, right-clicking it and selecting Connections → Create internal.

Figure 9.1. Creating Internal Database Connection

A Database connection wizard opens. (You can also open this wizard by selecting a DB connection item in the Outline pane and pressing Enter.)

In the Database connection wizard, you must specify the name of the connection, your username, your access password and the URL of the database connection (hostname, database name or other properties). You also decide whether you want to encrypt the access password by checking the checkbox. And you need to select the JDBC specific property. You can also use the default one; however, it may not do everything you want.


To use a driver, click one of the available drivers in the list. If the desired JDBC driver is not in the list yet, load it by clicking the Plus sign located on the right side of the wizard ("Load driver from JAR"). The result can be as follows:

Figure 9.3. Adding a new JDBC Driver into the List of Available Drivers

If necessary, you can also add another JAR to the driver classpath (Add JAR to driver classpath). For example, some databases may need their license file to be added.

You can also add a user-defined property (Add user-defined property).

Note that you can also remove a driver from the list (Remove selected) by clicking the Minus sign.


CloverGUI already provides two built-in JDBC drivers that are displayed in the list of available drivers. They are the JDBC drivers for MySQL and PostgreSQL databases.

You can choose a JDBC driver from the list of available drivers. When you click any of them, a connection string hint appears in the URL textarea. You only need to modify this connection string.

Once you have selected the driver from the list, you only need to type your username and password for connecting to the database. You also need to replace "hostname" with the actual host name and type the right database name instead of the "database" word. Some other drivers provide different URLs that must be changed in a different way. You can also load an existing connection from one of the existing configuration files. And you can set up the JDBC specific property.

Figure 9.4. Defining Internal Database Connection

When all has been done, you can validate your connection by clicking the Validate connection button.
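For illustration, the following Java sketch fills the placeholders of a connection string hint and then opens and checks a connection with plain JDBC (DriverManager.getConnection and Connection.isValid), which is roughly what validating a connection involves. The hint text jdbc:mysql://hostname:3306/database is an assumed example, not necessarily the exact hint CloverGUI displays, and ConnectionSketch is a hypothetical name.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical sketch (not CloverGUI code) of completing a URL hint and validating a connection.
final class ConnectionSketch {

    // Replaces the "hostname" and "database" placeholders of an assumed hint such as
    // "jdbc:mysql://hostname:3306/database".
    static String fillUrlHint(String hint, String host, String database) {
        return hint.replace("hostname", host).replace("database", database);
    }

    // Opens a connection and asks the driver whether it is usable; any SQLException
    // (wrong URL, missing driver, bad credentials) counts as an invalid connection.
    static boolean validate(String url, String user, String password) {
        try (Connection connection = DriverManager.getConnection(url, user, password)) {
            return connection.isValid(5); // timeout in seconds
        } catch (SQLException e) {
            return false;
        }
    }
}
```

Calling validate requires the matching driver JAR on the classpath; without it, DriverManager reports that no suitable driver was found and the method returns false.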

After you click Finish, your internal database connection is created.


Externalizing Internal Database Connections

Once you have created an internal database connection as a part of a graph, it is contained and visible in the graph. You may want to convert it into an external (shared) database connection. Thus, you would be able to use the same database connection for more graphs (more graphs would share the connection).

You can externalize an internal database connection into an external (shared) one by right-clicking one of the internal database connection items in the Outline pane, clicking Externalize connection from the context menu, selecting the project you want to add the database connection into, expanding that project, selecting the conn folder, renaming the configuration file if necessary, and clicking Finish.

Figure 9.5. Externalizing Internal Database Connection

After that, the internal connection disappears from the Connections folder of the Outline pane, but a newly created configuration file appears at the same location.

The same configuration file appears in the conn subfolder in the Navigator pane.


External (Shared) Database Connections

As mentioned above, external (shared) database connections serve more than one graph. They are stored outside the graph, which is why more graphs can share them.

How You Can Create External (Shared) Database Connections

If you want to create an external (shared) database connection, you must do it by selecting File → New → Other...

Figure 9.6. Creating External (Shared) Database Connection

Then expand the CloverETL item and either click the Database connection item and Next, or double-click the Database connection item.


Figure 9.7. Selecting Database Connection Item

After that, a Database connection wizard opens. (You can also open this wizard by selecting a DB connection item in the Outline pane and pressing Enter.)

In the Database connection wizard, you must specify the name of the connection, your username, your access password and the URL of the database connection (hostname, database name or other properties). You also decide whether you want to encrypt the access password by checking the checkbox. And you need to select the JDBC specific property.


To use a driver, click one of the available drivers in the list. If the desired JDBC driver is not in the list yet, load it by clicking the Plus sign located on the right side of the wizard ("Load driver from JAR"). The result can be as follows:


Figure 9.9. Adding a new JDBC Driver into the List of Available Drivers

If necessary, you can also add another JAR to the driver classpath (Add JAR to driver classpath). For example, some databases may need their license file to be added.

You can also add a user-defined property (Add user-defined property).

Note that you can also remove a driver from the list (Remove selected) by clicking the Minus sign.

CloverGUI already provides two built-in JDBC drivers that are displayed in the list of available drivers. They are the JDBC drivers for MySQL and PostgreSQL databases.

You can choose a JDBC driver from the list of available drivers. When you click any of them, a connection string hint appears in the URL textarea. You only need to modify this connection string.

Once you have selected the driver from the list, you only need to type your username and password for connecting to the database. You also need to replace "hostname" with the actual host name and type the right database name instead of the "database" word. Some other drivers provide different URLs that must be changed in a different way. You can also load an existing connection from one of the existing configuration files. And you can set up the JDBC specific property.


Figure 9.10. Defining External (Shared) Database Connection

When all has been done, you can validate your connection by clicking the Validate connection button.

Then you only need to click the Next button and select the folder for your connection configuration file.

Figure 9.11. Selecting a Folder for External (Shared) Database Connection

After you click Finish, your external (shared) database connection is created.


Linking External (Shared) Database Connection

If you want to link an already existing external (shared) database connection, do it in the Outline pane by selecting the Connections item, right-clicking it and selecting Connections → Link shared connection. Then select one of the existing configuration files (extension .cfg).

Internalizing External (Shared) Database Connections

Once you have created and linked an external (shared) database connection, you may want to convert it into an internal database connection to put it into the graph. Thus, you would be able to see its structure in the graph itself.

You can internalize an external (shared) configuration file into an internal database connection by right-clicking one of the external (shared) database connection items in the Outline pane and clicking Internalize connection from the context menu.

Figure 9.12. Internalizing External (Shared) Database Connection

After that, the external (shared) database connection item disappears from the Connections folder of the Outline pane, but a newly created internal database connection item appears at the same location.

However, the original external (shared) configuration file still remains in the conn subfolder in the Navigator pane.

Browsing Database and Extracting Metadata from Database Tables

As you could see above (in Sections "Externalizing Internal Database Connections" and "Internalizing External (Shared) Database Connections"), in both of these cases the context menu contains two interesting items: Browse database and New metadata. They let you browse a database (if your connection is valid) and/or extract metadata from a selected database table. Such metadata are internal only, but you can externalize and/or export them.


Encrypting the Access Password

If you do not encrypt your access password, it is stored and visible in the configuration file (shared connection) or in the graph itself (internal connection). Thus, anyone who can open either of these two locations can read it.

Of course, this would not present any problem if you were the only one who had access to your graph and your computer. Nor would it be a problem if the password did not provide access to the whole database. But it does.

So, if you want or need to give someone one of your graphs, you must not give him or her the access password to the whole database. This is why it is important to encrypt your access password. Without doing so, you would be at great risk of an intrusion into your database or of other damage caused by whoever obtains the access password.

You can therefore give him or her the graph with the access password encrypted. This way, nobody can use the graph to change your database without your permission.

To hide your access password, check the Encrypt password checkbox in the Database connection wizard, type a new (encrypting) password that will be used to encrypt the original access password, and click the Finish button.

After that, you will not be able to run the graph by choosing Run as → CloverETL graph any more. To run the graph, you must now use the Open Run Dialog wizard. There, in the Main tab, type or browse for the name of the project, its graph name and parameter file and, most importantly, type the encrypting password in the Password textarea. The access password cannot be read any more: it has been encrypted and can be seen neither in the configuration file nor in the graph.

Figure 9.13. Running a Graph with the Password Encrypted

If you want to return to the unencrypted access password, type the encrypting password into the Database connection wizard and click Finish.
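The guide does not say which algorithm CloverGUI uses to encrypt the access password. To illustrate the principle (the stored value is useless without the separate encrypting password), here is a generic Java sketch built on the JDK's PBKDF2 and AES-GCM implementations. It is not CloverETL's scheme, and the stored format (salt, IV and ciphertext joined by colons) is invented for the example.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.PBEKeySpec;
import javax.crypto.spec.SecretKeySpec;

// Generic password-based encryption sketch; NOT the algorithm CloverETL actually uses.
final class PasswordCryptoSketch {
    private static final SecureRandom RANDOM = new SecureRandom();

    // Derives a 256-bit AES key from the encrypting password and a random salt.
    private static SecretKeySpec deriveKey(char[] encryptingPassword, byte[] salt) throws GeneralSecurityException {
        PBEKeySpec spec = new PBEKeySpec(encryptingPassword, salt, 100_000, 256);
        byte[] key = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256").generateSecret(spec).getEncoded();
        return new SecretKeySpec(key, "AES");
    }

    // Encrypts the access password; returns "salt:iv:ciphertext", each part Base64-encoded
    // (the standard Base64 alphabet never contains ':', so the parts can be split safely).
    static String encrypt(String accessPassword, char[] encryptingPassword) throws GeneralSecurityException {
        byte[] salt = new byte[16];
        byte[] iv = new byte[12];
        RANDOM.nextBytes(salt);
        RANDOM.nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, deriveKey(encryptingPassword, salt), new GCMParameterSpec(128, iv));
        byte[] ciphertext = cipher.doFinal(accessPassword.getBytes(StandardCharsets.UTF_8));
        Base64.Encoder b64 = Base64.getEncoder();
        return b64.encodeToString(salt) + ":" + b64.encodeToString(iv) + ":" + b64.encodeToString(ciphertext);
    }

    // Recovers the access password; a wrong encrypting password makes GCM authentication fail.
    static String decrypt(String stored, char[] encryptingPassword) throws GeneralSecurityException {
        String[] parts = stored.split(":");
        Base64.Decoder b64 = Base64.getDecoder();
        byte[] salt = b64.decode(parts[0]);
        byte[] iv = b64.decode(parts[1]);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, deriveKey(encryptingPassword, salt), new GCMParameterSpec(128, iv));
        return new String(cipher.doFinal(b64.decode(parts[2])), StandardCharsets.UTF_8);
    }
}
```

The design mirrors the workflow above: whoever holds only the stored string cannot recover the access password, while supplying the encrypting password at run time restores it.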

Chapter 10. Lookup Tables

When you are working with CloverGUI, you can also create and use Lookup Tables. These tables are data structures that allow fast access to stored data using a known key or an SQL query. This way you can reduce the need to browse databases or data files.

Creating Lookup Tables

If you want to create a lookup table, do it in the Outline pane by selecting the Lookups item, right-clicking it and selecting Lookup tables → Create lookup table. A Lookup table wizard opens. After selecting the lookup table type and clicking Next, you can specify the properties of the selected lookup table.

(You can also open this wizard by selecting a lookup item in the Outline pane and pressing Enter.)

Figure 10.1. Lookup Table Wizard

Simple Lookup Table

In the Simple lookup table wizard, you must set up the required properties:

In the Table definition tab, you must give a Name to the lookup table and select the corresponding Metadata and the Key that should be used to look up data records in the table. You can select the Charset and the Initial size of the lookup table (512 by default) and decide whether Byte mode should be used.

Figure 10.2. Simple Lookup Table Wizard

After clicking the button to the right of the Key area, you are presented with the Edit key wizard, which helps you select the Key.

Figure 10.3. Edit Key Wizard

By highlighting a field name in the Field pane and clicking the Right arrow button, you move that field name into the Key parts pane. You can move more fields into the Key parts pane. You can also change the position of any of them in the list of Key parts by clicking the Up or Down buttons. Key parts that are higher in the list have higher priority. (You can also remove any of them by highlighting it and clicking the Left arrow button.) When you have finished, click OK.
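Conceptually, a simple lookup table maps a key, built from the values of the key parts, to a record. The Java sketch below assumes a hash map keyed by the list of key values in the order chosen in the Edit key wizard. SimpleLookupSketch is hypothetical, and the initial size is simply passed on as the HashMap capacity by analogy with the Initial size property.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

// Hypothetical sketch (not CloverETL code) of a simple lookup table indexed by a composite key.
final class SimpleLookupSketch {
    private final List<String> keyParts;                      // key fields, in Edit key order
    private final Map<List<String>, Map<String, String>> table;

    SimpleLookupSketch(int initialSize, String... keyParts) {
        this.keyParts = Arrays.asList(keyParts);
        this.table = new HashMap<>(initialSize);              // cf. the Initial size property
    }

    /** Stores a record (field name -> value) under the values of its key fields. */
    void put(Map<String, String> record) {
        List<String> key = keyParts.stream().map(record::get).collect(Collectors.toList());
        table.put(key, record);
    }

    /** Looks up a record by key values given in the same order as the key parts. */
    Optional<Map<String, String>> get(String... keyValues) {
        return Optional.ofNullable(table.get(Arrays.asList(keyValues)));
    }
}
```

List keys work here because java.util.List defines equals and hashCode element by element, so a key built from a record matches a key built from lookup arguments.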

In the Data source tab, you can either locate the file URL, or type or paste some data into the Data textarea.

Figure 10.4. Simple Lookup Table Wizard with File URL

Figure 10.5. Simple Lookup Table Wizard with Data

You can click the Edit data button to change the data.

Figure 10.6. Changing Data

After all has been done, you can click OK and then Finish.

Database Lookup Table

When creating or editing a Database lookup table, check the Database lookup radio button and click Next. (See Figure "Lookup Table Wizard".)

Figure 10.7. Database Lookup Table Wizard

Then, in the Database lookup table wizard, you must give a Name to the lookup table and specify the Metadata and DB connection.

You can also check Store negative response key (if a key value has not been found in the table, this value is stored and will not be searched for again) and type a Max cache size value.

You must also type or edit the SQL query that serves to look up data records in the lookup table. (This query corresponds to the key used in a Simple lookup table.) To edit the query, click the Edit button; if your database connection is valid and working, you are presented with the Query editor wizard, where you can browse the database, generate a query, validate it and view the resulting data.
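Store negative response key and Max cache size describe a bounded cache placed in front of the SQL query. The following Java sketch illustrates that idea, assuming least-recently-used eviction (the guide does not name an eviction policy) and replacing the database query with a function so the example runs without a database. All names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical sketch (not CloverETL code) of a cached database lookup with negative caching.
final class CachedDbLookupSketch<K, V> {
    private final Function<K, Optional<V>> query;   // stands in for the SQL query against the DB
    private final boolean storeNegativeResponses;   // cf. "Store negative response key"
    private final Map<K, Optional<V>> cache;
    int queryCount = 0;                              // how often the "database" was queried

    CachedDbLookupSketch(Function<K, Optional<V>> query, int maxCacheSize, boolean storeNegativeResponses) {
        this.query = query;
        this.storeNegativeResponses = storeNegativeResponses;
        // Access-ordered LinkedHashMap that drops the least recently used entry
        // once it holds more than maxCacheSize entries (cf. "Max cache size").
        this.cache = new LinkedHashMap<K, Optional<V>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, Optional<V>> eldest) {
                return size() > maxCacheSize;
            }
        };
    }

    Optional<V> lookup(K key) {
        Optional<V> cached = cache.get(key);
        if (cached != null) {
            return cached;                           // hit: a stored value or a stored "not found"
        }
        queryCount++;
        Optional<V> result = query.apply(key);
        if (result.isPresent() || storeNegativeResponses) {
            cache.put(key, result);                  // misses are cached only if enabled
        }
        return result;
    }
}
```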

Figure 10.8. Query Editor Wizard

Now, you can click OK and then Finish.

Range Lookup Table

You can create a Range lookup table only if some fields of the records form ranges. That means the fields are of the same data type and can be assigned as a start and an end. You can see it in the following example:

Figure 10.9. Appropriate Data for Range Lookup Table

When you create a Range lookup table, check the Range lookup radio button and click Next. (See Figure "Lookup Table Wizard".)

Figure 10.10. Range Lookup Table Wizard

Then, in the Range lookup table wizard, you must give a Name to the lookup table and specify its Metadata.

You can select Charset and decide whether Byte mode and/or Internationalization should be used.

Clicking the Edit button opens the following wizard:

Figure 10.11. Define Range Lookup Table Key Wizard

There you can see two panes, Fields on the left and Ranges definition on the right.

Now select End fields and assign them to Start fields by dragging and dropping selected fields from the left pane to the End fields column in the right pane.

You can also use the buttons on the right side of the wizard.

Figure 10.12. Assigning End Fields to Start Fields

You must also specify whether each start or end field is included in the range or not. You can do it by selecting it in the corresponding column of the wizard and clicking it.
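A range lookup returns the entries whose start and end fields enclose the searched value, with each bound either included or excluded. Here is a minimal Java sketch over integer bounds; the class names are hypothetical, and the linear scan is chosen for clarity, not performance.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not CloverETL code) of a range lookup over integer start/end fields.
final class RangeLookupSketch {

    /** One entry: start and end values, whether each bound is included, and the looked-up value. */
    static final class Range {
        final int start;
        final boolean startIncluded;
        final int end;
        final boolean endIncluded;
        final String value;

        Range(int start, boolean startIncluded, int end, boolean endIncluded, String value) {
            this.start = start;
            this.startIncluded = startIncluded;
            this.end = end;
            this.endIncluded = endIncluded;
            this.value = value;
        }

        boolean contains(int x) {
            boolean afterStart = startIncluded ? x >= start : x > start;
            boolean beforeEnd = endIncluded ? x <= end : x < end;
            return afterStart && beforeEnd;
        }
    }

    private final List<Range> ranges = new ArrayList<>();

    void add(Range range) {
        ranges.add(range);
    }

    /** Returns the values of all ranges that contain x. */
    List<String> lookup(int x) {
        List<String> hits = new ArrayList<>();
        for (Range range : ranges) {
            if (range.contains(x)) {
                hits.add(range.value);
            }
        }
        return hits;
    }
}
```

With ranges [0, 18) labelled "child" and [18, 65) labelled "adult", the value 18 falls only into "adult" because the end of the first range is excluded.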

After that, you only need to click OK and then Finish.

Chapter 11. Parameters

When you work with your graphs, you sometimes need to create parameters. Like metadata and connections, parameters can be both internal and external (shared). Parameters simplify your work with graphs: every value, number, path, file name or attribute, etc. can be set up or changed with the help of parameters. Parameters are similar to named constants. They are stored in one place, and after the value of any of them is changed, the new value is used wherever the parameter appears.
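Parameters behave like named constants that are substituted wherever they are referenced. The Java sketch below assumes a ${NAME} reference syntax and a parameter file in java.util.Properties (name=value) layout; neither detail is described in this section, so treat both as illustrative assumptions. ParameterSketch is a hypothetical name.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch (not CloverETL code) of named-constant substitution.
final class ParameterSketch {
    // Assumed reference syntax: ${NAME}, where NAME uses letters, digits and underscores.
    private static final Pattern REFERENCE = Pattern.compile("\\$\\{([A-Za-z0-9_]+)\\}");

    // Loads name=value pairs (an assumed parameter-file layout).
    static Properties load(String text) {
        Properties params = new Properties();
        try {
            params.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException(e); // not expected for an in-memory reader
        }
        return params;
    }

    // Replaces every ${NAME} with its value; unknown names are left unchanged.
    static String resolve(String text, Properties params) {
        Matcher m = REFERENCE.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = params.getProperty(m.group(1), m.group(0));
            m.appendReplacement(out, Matcher.quoteReplacement(value)); // treat '$' and '\' literally
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Changing DATA_DIR in one place changes every path that references it, which is the benefit the chapter describes.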

Internal ParametersInternal parameters are stored in the graph, they can be seen there. If you want to change the value of someparameter, it is better to have external (shared) parameters. If you want to give someone your graph, it is better tohave internal parameters. It is the same as with metadata and connections.

How You Can Create Internal Parameters

To create internal parameters, select the Parameters item in the Outline pane, right-click it and select Parameters → Create internal parameter. A Graph parameters wizard appears. Set up the names and values and click Finish.

Figure 11.1. Creating Internal Parameters

Externalizing Internal Parameters

Once you have created internal parameters as part of a graph, you may want to convert them into external (shared) parameters. This lets you use the same parameters in several graphs, which then share them.

You can externalize an internal parameter into an external (shared) one by right-clicking its item in the Outline pane, clicking Externalize parameters in the context menu, selecting the project to which you want to add the parameter file, expanding that project and clicking OK.


Usually, however, a graph contains more than one internal parameter, and we suggest you externalize all of them into a single external (shared) parameter file.

To do so, you must first select them by holding down the Shift key while pressing the Up arrow or Down arrow key. Then open the context menu by right-clicking the selected items, click the Externalize parameters item, select the project to which you want to add the parameter file, expand the project and click OK.

Figure 11.2. Externalizing Internal Parameters

After that, the internal parameter or parameters disappear from the parameters folder in the Outline pane, and a newly created parameter file appears in the same location.

The same parameter file appears in the project folder in the Navigator pane. The .prm extension is given to the parameter file automatically.

External (Shared) Parameters

External (shared) parameters are stored outside the graph, in a separate file within the project folder. If you want to change the value of some of the parameters, it is better to have external (shared) parameters. But if you want to give someone your graph, it is better to have internal parameters. The same holds for metadata and connections.

How You Can Create External (Shared) Parameters

To create external (shared) parameters, select File → New → Other, expand the CloverETL item and click the Graph parameter file item. (See Section "How You Can Create External (Shared) Metadata".) Then click Next. A Graph parameters wizard appears, and you can create the parameters in it.

Linking External (Shared) Parameters

To link an existing external (shared) parameter file, select the Parameters item in the Outline pane, right-click it and select Parameters → Link parameter file. (See Section "How You Can Create Internal Parameters".) Then select one of the existing parameter files (extension .prm).

Internalizing External (Shared) Parameters

Once you have created and linked external (shared) parameters, you may want to convert them into internal parameters so that they are stored in the graph and you can see their structure in the graph itself. Note that one parameter file containing several parameters produces several internal parameters.

You can internalize an external (shared) parameter file into internal parameters by right-clicking the external (shared) parameters item in the Outline pane and clicking Internalize parameters in the context menu.

After that, the external (shared) parameters item disappears from the parameters folder in the Outline pane, and the newly created internal parameter items, usually more than one, appear in the same location.

However, the original external (shared) parameter file still remains in the project folder in the Navigator pane.

Figure 11.3. Internalizing External (Shared) Parameter

Parameters Wizard

(You can also open this wizard by selecting a parameters item in the Outline pane and pressing Enter.)

Clicking the Plus button on the right side adds a pair of words, "name" and "value", to the wizard. Each click on the Plus button adds a new line with name and value labels, and you must set up both the name and the value. To do so, click either of them to highlight it and change it to whatever you need. When you have set up all the names and values you want, click the Finish button (for internal parameters), or click the Next button and type the name of the parameter file. The .prm extension is added to the file automatically.

You also need to select the location of the parameter file in the project folder. Then click the Finish button, and the file is saved.


Figure 11.4. Example of a Parameter-Value Pair

Using Parameters

Suppose you have defined a parameter db_table whose value is employee, the name of a database table (as above). You can then use ${db_table} instead of employee wherever you refer to this database table.
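Parameter references behave like text substitution. The following minimal Java sketch shows the idea. It assumes that parameters are plain name/value strings and that a value may itself contain ${...} references. ParameterResolver and the PROJECT path are hypothetical. CloverGUI performs this resolution itself.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ParameterResolver {
    // Matches ${name}; parameter names are assumed to be letters, digits and underscores.
    private static final Pattern REFERENCE = Pattern.compile("\\$\\{([A-Za-z_][A-Za-z0-9_]*)\\}");
    private static final int MAX_DEPTH = 32;

    /** Replaces each ${name} with its value; unknown names are left untouched. */
    static String resolve(String text, Map<String, String> params) {
        return resolve(text, params, 0);
    }

    private static String resolve(String text, Map<String, String> params, int depth) {
        if (depth > MAX_DEPTH) {
            throw new IllegalStateException("Parameter references nested too deeply (cycle?): " + text);
        }
        Matcher m = REFERENCE.matcher(text);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = params.get(m.group(1));
            // A value may itself contain references, e.g. SEQ_DIR=${PROJECT}/seq.
            String replacement = value == null ? m.group() : resolve(value, params, depth + 1);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = Map.of(
                "db_table", "employee",
                "PROJECT", "/home/user/myproject",   // hypothetical project path
                "SEQ_DIR", "${PROJECT}/seq");
        System.out.println(resolve("SELECT * FROM ${db_table}", params));   // SELECT * FROM employee
        System.out.println(resolve("${SEQ_DIR}/sequencefile.seq", params)); // /home/user/myproject/seq/sequencefile.seq
    }
}
```

This shows why changing a value in one place is enough: every text that refers to ${db_table} picks up the new value the next time it is resolved.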


Chapter 12. Sequences

CloverGUI contains a tool for creating sequences of numbers. These can be used, for example, to number records: a new field is created in the records and filled with numbers taken from the sequence.

Creating a Sequence

To create a sequence, right-click the Sequence item in the Outline pane and choose Sequence → Create sequence from the context menu. A Sequence wizard appears. In it, you must provide the following:

• the name of the sequence
• its first number
• the incrementing step (in other words, the difference between every pair of adjacent numbers)
• the number of precomputed values that you want to be cached
• the name of the sequence file in which the numbers should be stored

The file name can be, for example, ${SEQ_DIR}/sequencefile.seq or ${SEQ_DIR}/anyothername. Note that this uses the SEQ_DIR parameter defined in the workspace.prm file, whose value is ${PROJECT}/seq. PROJECT is another parameter that defines the path to your project in the workspace.

Figure 12.1. Creating a Sequence

Editing a Sequence

To edit an existing sequence, select the sequence name in the Outline pane, right-click it to open the context menu and select the Edit item. A Sequence wizard appears. (You can also open this wizard by selecting a sequence item in the Outline pane and pressing Enter.)

This time the wizard differs from the one described above: it contains a new textarea with the current value of the sequence number, taken from a file. You can change any of the sequence properties, and you can reset the current value to its original value by clicking the button.


Figure 12.2. Editing a Sequence

When the graph was run again, the same sequence started from 1001:

Figure 12.3. A New Run of the Graph with the Previous Start Value of the Sequence

You can also see how the sequence numbers fill one of the record fields.
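The following hypothetical Java sketch models how the start value, step, cache size and stored value work together. It assumes that the cache reserves a whole block of numbers in the sequence file at once, so a new run continues after the reserved block. It also models the reset described above. CloverETL's actual sequence file handling may differ.

```java
/**
 * Hypothetical model of a number sequence with start value, step and cache size.
 * The "stored" field stands in for the value kept in the sequence file.
 */
final class SequenceSketch {
    private final long start;
    private final long step;
    private final int cacheSize;
    private long stored;      // value that would be written to the sequence file
    private long next;        // next number to hand out
    private int leftInCache;  // reserved numbers not yet handed out

    SequenceSketch(long start, long step, int cacheSize, long stored) {
        if (cacheSize < 1) {
            throw new IllegalArgumentException("cacheSize must be at least 1");
        }
        this.start = start;
        this.step = step;
        this.cacheSize = cacheSize;
        this.stored = stored;
        this.next = stored;
    }

    /** Hands out the next number, reserving a new block of cacheSize numbers when the cache is empty. */
    long nextValue() {
        if (leftInCache == 0) {
            next = stored;
            stored = next + step * cacheSize; // the whole block is marked as used up front
            leftInCache = cacheSize;
        }
        leftInCache--;
        long value = next;
        next += step;
        return value;
    }

    long storedValue() {
        return stored;
    }

    /** Resets the sequence to its original start value. */
    void reset() {
        stored = start;
        next = start;
        leftInCache = 0;
    }

    public static void main(String[] args) {
        SequenceSketch firstRun = new SequenceSketch(1, 1, 10, 1);
        for (int i = 0; i < 3; i++) {
            System.out.println(firstRun.nextValue()); // 1, 2, 3
        }
        // A later run continues from the stored value; unused cached numbers (4..10) are skipped.
        SequenceSketch secondRun = new SequenceSketch(1, 1, 10, firstRun.storedValue());
        System.out.println(secondRun.nextValue()); // 11
    }
}
```

A larger cache means the file is updated less often, at the cost of possible gaps between runs.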


Appendix C. JMS Connections

To receive and/or produce Java messages, you need JMS connections. Like metadata, parameters and database connections, these can be either internal or external (shared).

Internal JMS Connections

As with the other tools mentioned above (metadata, database connections and parameters), internal JMS connections are part of a graph: they are contained in it and can be seen in its Source tab. This is common to all internal structures.

How You Can Create Internal JMS Connections

To create an internal JMS connection, select the Connections item in the Outline pane, right-click it and select Connections → JMS internal connection. An Edit JMS connection wizard opens, in which you can define the JMS connection. Its appearance and the way you set up the connection are described below.

Externalizing Internal JMS Connections

Once you have created an internal JMS connection as part of a graph, you may want to convert it into an external (shared) JMS connection. This lets you use the same JMS connection in several graphs, which then share it.

You can externalize an internal JMS connection into an external (shared) one as follows:

1. Right-click its item in the Outline pane.
2. Click Externalize connection in the context menu.
3. Select the project to which you want to add the JMS connection, and expand that project.
4. Select the conn folder and rename the configuration file, if necessary.
5. Click Finish.

After that, the internal item disappears from the connections folder in the Outline pane, and a newly created configuration file appears in the same location.

The same configuration file appears in the conn subfolder in the Navigator pane.

External (Shared) JMS Connections

As mentioned above, external (shared) JMS connections serve more than one graph. They are stored outside the graph, which is why several graphs can share them.

How You Can Create External (Shared) JMS Connections

If you want to create an external (shared) JMS connection, you must do it by selecting File → New → Other...and then by expanding the CloverETL item and either by clicking the JMS connection item and then Next, orby double-clicking the JMS Connection item. (See Section "How You Can Create External (Shared) Metadata".)An Edit JMS connection wizard opens.

Linking External (Shared) JMS Connection

To link an existing external (shared) JMS connection, select the Connections item in the Outline pane, right-click it and select Connections → JMS shared connection. (See Section "How You can Create Internal Database Connection".) Then select one of the existing configuration files (extension .cfg).

Internalizing External (Shared) JMS Connections

Once you have created and linked an external (shared) JMS connection, you may want to convert it into an internal JMS connection so that it is stored in the graph and you can see its structure in the graph itself.

You can internalize an external (shared) configuration file into an internal JMS connection by right-clicking the external (shared) JMS connection item in the Outline pane and clicking Internalize connection in the context menu.

After that, the external (shared) JMS connection item disappears from the Outline pane connections folder, but,at the same location, a newly created internal JMS connection item appears.

However, the original external (shared) configuration file still remains in the conn subfolder in the Navigator pane.

Edit JMS Connection Wizard

The Edit JMS connection wizard contains eight textareas to be filled in: Name, Initial context factory class, Libraries, URL, Connection factory JNDI name, Destination JNDI, User and Password (the password to receive and/or produce the messages).

(You can also open this wizard by selecting a JMS connection item in the Outline pane and pressing Enter.)

Figure C.1. Edit JMS Connection Wizard

In the Edit JMS connection wizard, you must specify the name of the connection, Initial context factory class,select necessary libraries (you can add them by clicking the plus button), URL of the connection, Connectionfactory JNDI name, Destination JNDI name, your authentication username (User) and your authenticationpassword (Password). You can also decide whether you want to encrypt this authentication password. This canbe done by checking the Encrypt password checkbox. If you are creating the external (shared) JMS connection,you must select a filename for this external (shared) JMS connection and its location.
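For orientation, several wizard fields correspond to standard JNDI settings. The following Java sketch uses only the JDK's javax.naming API. It builds a JNDI environment from the Initial context factory class and URL fields and looks up the connection factory by its JNDI name. The class name, the ActiveMQ example values and this mapping of fields are illustrative assumptions. CloverETL creates the connection for you.

```java
import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.InitialContext;
import javax.naming.NamingException;

/** Hypothetical holder for the fields of the Edit JMS connection wizard. */
final class JmsConnectionSettings {
    final String initialContextFactoryClass;
    final String url;
    final String connectionFactoryJndiName;
    final String destinationJndiName;
    final String user;
    final String password;

    JmsConnectionSettings(String initialContextFactoryClass, String url, String connectionFactoryJndiName,
                          String destinationJndiName, String user, String password) {
        this.initialContextFactoryClass = initialContextFactoryClass;
        this.url = url;
        this.connectionFactoryJndiName = connectionFactoryJndiName;
        this.destinationJndiName = destinationJndiName;
        this.user = user;
        this.password = password;
    }

    /** JNDI environment built from the Initial context factory class and URL fields. */
    Hashtable<String, String> jndiEnvironment() {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, initialContextFactoryClass);
        env.put(Context.PROVIDER_URL, url);
        return env;
    }

    /**
     * Looks up the connection factory by its JNDI name. This works only when the provider's
     * libraries (the Libraries field) are on the classpath and the server at URL is reachable.
     * A JMS client would then cast the result to javax.jms.ConnectionFactory, call
     * createConnection(user, password), and look up destinationJndiName the same way.
     */
    Object lookupConnectionFactory() throws NamingException {
        Context ctx = new InitialContext(jndiEnvironment());
        try {
            return ctx.lookup(connectionFactoryJndiName);
        } finally {
            ctx.close();
        }
    }

    public static void main(String[] args) {
        // Example values for an ActiveMQ broker (an assumption; use your provider's values).
        JmsConnectionSettings s = new JmsConnectionSettings(
                "org.apache.activemq.jndi.ActiveMQInitialContextFactory",
                "tcp://localhost:61616",
                "ConnectionFactory",
                "dynamicQueues/orders",
                "clover", "secret");
        System.out.println(s.jndiEnvironment());
    }
}
```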


Encrypting the Authentication Password

If you do not encrypt your authentication password, it remains stored and visible in the configuration file (shared connection) or in the graph itself (internal connection). Anyone who can open either of these locations can therefore read it.

This would not be a problem if you were the only person with access to your graph and your computer, or if the password did not grant the right to receive and/or send messages. But it does grant that right.

So, if you want or need to give someone any of your graphs, you must not give them the authentication password. That is why it is important to encrypt it. Otherwise, anyone who obtained the authentication password could use it to intrude or cause other damage.

Instead, you can give them the graph with the authentication password encrypted. This way, nobody can use it to receive and/or produce the messages without your permission.

To hide your authentication password, check the Encrypt password checkbox in the Edit JMS connection wizard. Then type a new (encrypting) password, which will be used to encrypt the original authentication password, and click the Finish button.

After that, you can no longer run the graph by choosing Run as → CloverETL graph. To run it, you must use the Open Run Dialog wizard instead. There, in the Main tab, type or browse for the name of the project, the graph name and the parameter file. Most importantly, type the encrypting password in the Password textarea. The authentication password can no longer be read: it has been encrypted and can be seen neither in the configuration file nor in the graph.

If you want to restore your original authentication password, type the encrypting password into the Edit JMS connection wizard and click Finish.
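To see why the encrypting password must be supplied again at run time, consider generic password-based encryption. The Java sketch below derives a key from the encrypting password (PBKDF2) and encrypts the authentication password with AES-GCM. It illustrates the principle only. It is not the algorithm or storage format that CloverETL actually uses.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.PBEKeySpec;
import javax.crypto.spec.SecretKeySpec;

/** Generic password-based encryption sketch; not CloverETL's actual scheme. */
final class PasswordEncryptionSketch {
    private static final int SALT_LENGTH = 16;
    private static final int IV_LENGTH = 12;
    private static final int TAG_BITS = 128;
    private static final int ITERATIONS = 100_000;
    private static final SecureRandom RANDOM = new SecureRandom();

    /** Derives a 256-bit AES key from the encrypting password and a random salt. */
    private static SecretKeySpec deriveKey(char[] encryptingPassword, byte[] salt) throws GeneralSecurityException {
        PBEKeySpec spec = new PBEKeySpec(encryptingPassword, salt, ITERATIONS, 256);
        try {
            byte[] key = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256").generateSecret(spec).getEncoded();
            return new SecretKeySpec(key, "AES");
        } finally {
            spec.clearPassword();
        }
    }

    /** Returns Base64(salt | iv | ciphertext): what would be stored instead of the plain password. */
    static String encrypt(String authenticationPassword, char[] encryptingPassword) {
        try {
            byte[] salt = new byte[SALT_LENGTH];
            byte[] iv = new byte[IV_LENGTH];
            RANDOM.nextBytes(salt);
            RANDOM.nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, deriveKey(encryptingPassword, salt), new GCMParameterSpec(TAG_BITS, iv));
            byte[] ciphertext = cipher.doFinal(authenticationPassword.getBytes(StandardCharsets.UTF_8));
            ByteBuffer out = ByteBuffer.allocate(SALT_LENGTH + IV_LENGTH + ciphertext.length);
            out.put(salt).put(iv).put(ciphertext);
            return Base64.getEncoder().encodeToString(out.array());
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Encryption failed", e);
        }
    }

    /** Recovers the authentication password; a wrong encrypting password fails the GCM integrity check. */
    static String decrypt(String stored, char[] encryptingPassword) {
        byte[] all = Base64.getDecoder().decode(stored);
        if (all.length < SALT_LENGTH + IV_LENGTH + TAG_BITS / 8) {
            throw new IllegalArgumentException("Stored value is too short");
        }
        ByteBuffer in = ByteBuffer.wrap(all);
        byte[] salt = new byte[SALT_LENGTH];
        byte[] iv = new byte[IV_LENGTH];
        byte[] ciphertext = new byte[all.length - SALT_LENGTH - IV_LENGTH];
        in.get(salt).get(iv).get(ciphertext);
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, deriveKey(encryptingPassword, salt), new GCMParameterSpec(TAG_BITS, iv));
            return new String(cipher.doFinal(ciphertext), StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalArgumentException("Wrong encrypting password or corrupted value", e);
        }
    }

    public static void main(String[] args) {
        String stored = encrypt("jms-secret", "encrypting-pw".toCharArray());
        System.out.println(stored);                                          // differs on every run
        System.out.println(decrypt(stored, "encrypting-pw".toCharArray())); // jms-secret
    }
}
```

Without the encrypting password, the stored value cannot be turned back into the authentication password. For the same reason, a graph with an encrypted password needs that password at run time.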

Part III. Components Guide


Chapter 13. Introduction to Components

In the palette of components of the Graph Editor, all components are divided into 5 groups: Readers, Writers, Transformers, Joiners and Others. We will describe each group step by step. There is one more category, now called Deprecated. Its components should no longer be used, and we do not describe them.

So far we have discussed how to paste components into graphs. We will now discuss the properties of components and how to configure them. You can configure the properties of any graph component in the following ways:

• You can simply double-click the component in the Graph Editor.

• You can click the component or its item in the Outline pane and edit the items in the Properties tab.

• You can select the component item in the Outline pane and press Enter.

• You can also open the context menu by right-clicking the component in the Graph Editor or in the Outline pane. Then select the Edit item from the context menu and edit the items in the Edit component wizard.

Common Properties of Components

Some properties are common to all components: Component names, Phases, and Enabling vs. Disabling components vs. PassThrough status.

Data policy applies to some components only, but we describe it here as well. Remember also that you must specify a URL in some of the components.

This can be done with the help of URL File Dialog.

In addition to these properties, you can also choose which components should be displayed in the Palette of Components and which should be removed from it.

We also describe how you can view your input or output data.


Palette of Components

CloverGUI provides all components in the Palette of Components. However, you can choose which components should be included in the Palette. To do so, select Window → Preferences... from the main menu.

Figure 13.1. Selecting Components

After that, you must expand the CloverETL item and choose Components in Palette.

Figure 13.2. Components in Palette

In the window, you can see the categories of components. Expand the category you want and uncheck the checkboxes of the components you want to remove from the palette.



Figure 13.3. Removing Components from the Palette

Then close and reopen the graph, and the components will be removed from the Palette.

Giving a Name to a Component

Each component has a label on it that can be changed. Since your graph may contain many components, each with a specific function, you can name them according to what they do. Otherwise you would have many different components with identical names in your graph.

You can rename any component in any of the following four ways:

• You can rename the component in the Edit component dialog by specifying the Component name attribute.

• You can rename the component in the Properties tab by specifying the Component name attribute.

• You can rename the component by highlighting and clicking it.

If you highlight a component (by clicking the component itself or its item in the Outline pane), a hint appears showing the name of the component. When you then click the highlighted component, a rectangle appears below it, showing the Component name on a blue background. Change the name shown in this rectangle and press Enter. The Component name is changed and can be seen on the component.



Figure 13.4. Simple Renaming Components

• You can right-click the component and select Rename from the context menu. After that, the same rectangle asmentioned above appears below the component. You can rename the component in the way described above.

Phases

Each graph can be divided into a number of phases by setting phase numbers on its components. You can see the phase number in the upper left corner of every component.

Within a phase, the graph runs in parallel: all components and edges with the same phase number run simultaneously. If processing stops in some phase, the higher phases do not start. The next phase starts only after all processes within the current phase have finished successfully.

That is why phase numbers must never decrease in the direction in which the process runs: they may stay the same or increase, but they must not descend.

So, when you increase the phase number of a component, all components further along the graph that have the same phase number (but not those with higher phase numbers) automatically change their phase to the new value.



Figure 13.5. Running a Graph with Various Phases
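The phase rules above can be expressed as a small graph algorithm. This hypothetical Java sketch does three things:

• checks that phase numbers never descend along edges
• propagates a raised phase to downstream components
• groups components in the order in which the phases run

It interprets the automatic renumbering as lifting any downstream component that would otherwise descend. That interpretation is an assumption.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical model of a graph whose components carry phase numbers. */
final class PhaseSketch {
    final Map<String, Integer> phase = new LinkedHashMap<>();
    final Map<String, List<String>> downstream = new LinkedHashMap<>();

    void addComponent(String name, int phaseNumber) {
        phase.put(name, phaseNumber);
        downstream.putIfAbsent(name, new ArrayList<>());
    }

    void addEdge(String from, String to) {
        downstream.get(from).add(to);
    }

    /** True if no edge leads from a higher phase to a lower one. */
    boolean phasesValid() {
        for (Map.Entry<String, List<String>> e : downstream.entrySet()) {
            for (String to : e.getValue()) {
                if (phase.get(to) < phase.get(e.getKey())) {
                    return false;
                }
            }
        }
        return true;
    }

    /** Raises a component's phase and lifts every downstream component that would otherwise descend. */
    void raisePhase(String name, int newPhase) {
        if (newPhase < phase.get(name)) {
            throw new IllegalArgumentException("raisePhase can only increase a phase number");
        }
        phase.put(name, newPhase);
        Deque<String> work = new ArrayDeque<>();
        work.push(name);
        while (!work.isEmpty()) {
            String current = work.pop();
            for (String to : downstream.get(current)) {
                if (phase.get(to) < phase.get(current)) {
                    phase.put(to, phase.get(current));
                    work.push(to);
                }
            }
        }
    }

    /** Groups components by phase: a group runs in parallel, groups run in ascending order. */
    SortedMap<Integer, List<String>> executionPlan() {
        SortedMap<Integer, List<String>> plan = new TreeMap<>();
        phase.forEach((component, p) -> plan.computeIfAbsent(p, k -> new ArrayList<>()).add(component));
        return plan;
    }

    public static void main(String[] args) {
        PhaseSketch g = new PhaseSketch();
        g.addComponent("Reader", 0);
        g.addComponent("Reformat", 0);
        g.addComponent("Writer", 1);
        g.addEdge("Reader", "Reformat");
        g.addEdge("Reformat", "Writer");
        g.raisePhase("Reader", 1);             // Reformat is lifted to 1; Writer is already 1
        System.out.println(g.executionPlan()); // {1=[Reader, Reformat, Writer]}
    }
}
```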

Enabling vs. Disabling Components vs. PassThrough Status

By default, all components are enabled: once configured, they can parse data. However, you can turn off any group of components of any graph, because each component can be disabled. A disabled component becomes greyish and does not parse data when the process starts. The components lying further along the graph do not parse data either. The exception is a branch that another enabled component feeds further along the graph: data can flow into that branch through the enabled component. However, a component may send data to the disabled component or receive data from it. If such a component cannot parse data without the disabled component, the graph terminates with an error. The reason is that data parsed by a component must be sent on to other components; if that is not possible, parsing is impossible as well. You can disable a component in the context menu or in the Properties tab. The following example shows a case in which parsing is possible even with a component disabled:



Figure 13.6. Running a Graph with Disabled Component

Here the Concatenate component does not need the data records from the disabled component, which is why parsing is possible. If you disabled the Concatenate component instead, the readers before it would have no component to send their data records to, and the graph would terminate with an error.

If you want to process the graph in this example even with the Concatenate component turned off, you can set the component to the passThrough status. Data records then pass through the component from its input to its output ports, but the component does not change them. Unlike disabling, this can be done only in the Properties tab.

Figure 13.7. Running a Graph with Component in PassThrough Status



Data Policy

When you configure some of the components (some Readers), you must first decide what should be done when incorrect or incomplete records are parsed. You specify this with the Data Policy attribute, which offers three options:

• Strict. This data policy is set by default. It means that data parsing stops when the first error occurs. This data policy does not create any error port.

• Controlled. This data policy means that every error is logged, but incorrect records are skipped and data parsing continues. If you set the Data Policy attribute to this value, the log is handled in one of two ways. In UniversalDataReader only, a new output port is created through which the log can be sent to other components. In the other Readers, the log information is sent to stdout.

If you have set the Data policy attribute to controlled in UniversalDataReader, you need to decide which components should process the error information, or whether you only want to write it out. Connect the error port of the UniversalDataReader (whose Data policy attribute is set to controlled) with an edge to an input port. If you only want to write the information, use the input port of the selected writer; otherwise, use the input port of another processing component. Then assign metadata to this edge. These metadata must be created by hand. They consist of 4 fields: number of incorrect record, number of incorrect field, incorrect record, error message. The first two fields are of integer data type, the other two are strings. (For more detailed information on how you can create metadata by hand, see the corresponding section.)

• Lenient. This data policy means that incorrect records are set to their default values (if possible) and dataparsing continues.
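The difference between the three policies can be sketched in Java for a hypothetical reader that parses one integer field per line. The ErrorRecord class mirrors the four-field error metadata described above. The 1-based record numbering and the default value 0 used by the lenient policy are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical reader of one integer field per line, showing the three data policies. */
final class DataPolicySketch {
    enum Policy { STRICT, CONTROLLED, LENIENT }

    /** Mirrors the four-field error metadata: two integers followed by two strings. */
    static final class ErrorRecord {
        final int recordNumber;
        final int fieldNumber;
        final String record;
        final String message;

        ErrorRecord(int recordNumber, int fieldNumber, String record, String message) {
            this.recordNumber = recordNumber;
            this.fieldNumber = fieldNumber;
            this.record = record;
            this.message = message;
        }
    }

    static List<Integer> parse(List<String> lines, Policy policy, List<ErrorRecord> errorPort) {
        List<Integer> parsed = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            try {
                parsed.add(Integer.parseInt(line.trim()));
            } catch (NumberFormatException e) {
                int recordNumber = i + 1; // 1-based numbering is an assumption
                switch (policy) {
                    case STRICT:
                        // Stop at the first error; there is no error port.
                        throw new IllegalStateException("Record " + recordNumber + ": " + e.getMessage(), e);
                    case CONTROLLED:
                        // Send the bad record to the error port and skip it.
                        errorPort.add(new ErrorRecord(recordNumber, 1, line, e.getMessage()));
                        break;
                    case LENIENT:
                        // Replace the bad value with the field's default (0 here) and continue.
                        parsed.add(0);
                        break;
                }
            }
        }
        return parsed;
    }

    public static void main(String[] args) {
        List<String> input = List.of("10", "x", "30");
        List<ErrorRecord> errors = new ArrayList<>();
        System.out.println(parse(input, Policy.CONTROLLED, errors));            // [10, 30]
        System.out.println(errors.get(0).recordNumber + " " + errors.get(0).record); // 2 x
        System.out.println(parse(input, Policy.LENIENT, new ArrayList<>()));    // [10, 0, 30]
    }
}
```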

Locating Files with URL File Dialog

In some components you must also specify the URL of some files. These can be the files from which data should be read, the files to which data should be written, the files used to transform data flowing through a component, or other files. To specify such a file URL, you can use the URL File Dialog.

The URL File Dialog contains several textareas and panes. When you open it, the left pane shows your local file structure.

Figure 13.8. URL File Dialog



However, to find a file that is not stored locally, you must first specify the connection to the server in the upper textarea (Server). If the connection is valid, you can connect to the specified server.

To connect to the server, type the connection according to the following pattern: protocol://username:password@hostname:portnumber. After typing the connection in the Server textarea, click the Refresh button to display the file structure of the server in the left pane.

You can do the same by clicking the Connect to server button, the second button to the right of the Server textarea. This opens a Connection Settings wizard. In it, specify the Protocol to be used to connect to the server: HTTP, HTTPS, FTP, FTPS or SFTP. You also need to specify some of the following: the host name (Host), the port number (Port), your user name (Username) and your password (Password). You can validate the connection by clicking the Validate connection button. Then click the OK button to add the connection settings to the Server textarea. When you click the Refresh button, the file structure of the server appears in the left pane.

(If you use the HTTP protocol to connect to the server, you can set some parameters of this connection with the Http parameters button to the right of the Server textarea. Clicking this button opens a Property dialog, in which you can define properties along with their values. Type both the Property name and the Value and click the Down arrow button to add the property to the list in the pane. Finally, confirm the list by clicking the OK button. You can edit any of the listed properties and their values by clicking and changing the items. To remove one of them, click the item and click the Minus button. Clicking the Two crosses button removes all of the properties and values.)

Whether you are selecting local or remote files, the left pane shows a file structure from which you select the files. Double-clicking an item in the left pane adds its path and file name to the Path textarea, and the same item appears in the right pane at the same time. You can also add wildcards to this textarea by clicking either of the first two buttons to the right of it (with the ? or * sign). The third button refreshes the path and the file name in the textarea.

If you click the Two right arrows button located between the two panes, you add all of the files from the left pane to the File URLs pane on the right. If you click any of the file items in the left pane and click the Right arrow button, you add this file URL to the right pane. You can do the same by simply double-clicking the item in the left pane, after which the selected item appears in the right pane. If you have a path and a file name in the Path textarea, you can add it to the right pane by clicking the Down arrow button located above the right pane. In all of these cases, at least one file URL appears in the right pane. Then click the OK button to add this file URL or these file URLs to the Component editor wizard.

The resulting URL is of the following type:

protocol://username:password@hostname:portnumber/path/filename
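If you type such a URL by hand, assemble its parts carefully, because characters such as @ in a password have a special meaning in a URL. The following minimal Java sketch uses the JDK's java.net.URI. Its multi-argument constructor quotes characters that are not legal in each component. The host, credentials and path are placeholders, and FileUrlSketch is a hypothetical helper.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class FileUrlSketch {
    /** Builds protocol://username:password@hostname:port/path; URI quotes illegal characters per component. */
    static String build(String protocol, String user, String password, String host, int port, String path) {
        try {
            return new URI(protocol, user + ":" + password, host, port, path, null, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot build URL: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        String url = build("ftp", "user", "pass", "example.com", 21, "/data/in.txt");
        System.out.println(url); // ftp://user:pass@example.com:21/data/in.txt
        URI parsed = URI.create(url);
        System.out.println(parsed.getHost() + " " + parsed.getPort() + " " + parsed.getPath()); // example.com 21 /data/in.txt
    }
}
```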

Above the File URLs pane on the right you can see the button for creating the URL of the following type: port:$0.FieldName[:processingType]. After clicking the button, you can select FieldName and one of thefollowing processing types: source, discrete and stream.

Viewing Data in Readers and Writers

You can also view data in Readers and Writers using the context menu. To do that, right-click the desired component and select View data from the context menu.



Figure 13.9. Viewing Data in Components

After that, you can choose whether you want to see the data as plain text or as a grid. If you select the Plain text option, you can select Charset, but you cannot specify any filter expression.

Figure 13.10. Viewing Data as Plain Text

On the other hand, if you select the Grid option, you can select Filter expression, but no Charset.

Figure 13.11. Viewing Data as Grid

The result can be as follows in the Plain text mode:



Figure 13.12. Plain Text Data Viewing

Or in the Grid mode, it can be like the following:

Figure 13.13. Grid Data Viewing

The same can be done in some Writers, but only after the output file has been created.


Chapter 14. Defining the Transformations

This chapter describes how you can define transformations in the following components:

• Partition, DataIntersection, Reformat, Denormalizer and Normalizer

Except for Partition, all of these components require a transformation to be defined. In the Partition component, a transformation is required only if neither the Ranges nor the Partition key attribute is defined.

You can define the transformation in Java, in Clover transformation language or in Clover transformation language Lite.

However, we recommend that you do not use CTL Lite.

• ApproximativeJoin, ExtHashJoin, ExtMergeJoin, LookupJoin and DBJoin

A transformation is required in all of these components.

You can define the transformation in Java, in Clover transformation language or in Clover transformation language Lite.

However, we recommend that you do not use CTL Lite.

• JMSReader, JMSWriter and JavaExecute

Only JavaExecute requires a transformation, and it must be written in Java. In JMSReader and JMSWriter, the transformation is optional, but it must also be written in Java.

In all of these components, you can also use a compiled transformation class. To do that, use the Open Type wizard. In this case, the transformation is located outside the graph.

You can also use a transformation defined in a source file outside the graph. To locate the transformation source file, use the URL File Dialog. Each of the mentioned components can use this kind of transformation definition. The file must contain the definition of the transformation, written in one of the languages allowed for an internal transformation definition. In this case, too, the transformation is located outside the graph. (For more detailed information, see Section "Locating Files with URL File Dialog" above.)

To define the transformation in the graph itself, use the Transform editor (or the Edit value wizard in the case of the JMSReader, JMSWriter and JavaExecute components). In it, you can define a transformation that is located and visible in the graph itself. The languages available for writing transformations are listed above.

More details about how you should define the transformations can be found in the sections concerning correspond-ing components.

Some details about writing transformations in Java can be found in the corresponding Appendix.

Open Type Wizard

This wizard lets you select a class that defines the desired transformation. When you open it, type part of a class name. As you type, the classes matching the typed letters appear in the wizard, and you can select the right one.


Figure 14.1. Open Type Wizard

Edit Value Wizard

The Edit Value wizard contains a simple textarea in which you can write the transformation code for the JMSReader, JMSWriter and JavaExecute components.

Figure 14.2. Edit Value Wizard

When you click the Navigate button at the upper left corner, you will be presented with the list of possible options.You can select either Find or Go to line.

Figure 14.3. Find Wizard

If you click the Find item, another wizard opens. In it, you can set the following:

• the expression you want to find (Find textarea)
• whether to find the whole word only (Whole word)
• whether the case should match (Match case)
• the Direction in which to search: downwards (Forward) or upwards (Backward)

You select these options by checking the corresponding checkboxes or radio buttons.
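The Find options combine much as they do in most text editors. The following Java sketch is a hypothetical model of a search with Match case, Whole word and a Forward/Backward direction. It treats letters and digits as word characters, which is an assumption. It is not the wizard's actual code.

```java
/** Hypothetical model of the Find options: Match case, Whole word and Forward/Backward direction. */
final class FindSketch {
    /**
     * Returns the start index of the next match, or -1.
     * A forward search starts at 'from'; a backward search considers matches that end at or before 'from'.
     */
    static int find(String text, String term, int from, boolean forward, boolean matchCase, boolean wholeWord) {
        int n = term.length();
        if (n == 0) {
            return -1;
        }
        int i = forward ? Math.max(from, 0) : Math.min(from, text.length()) - n;
        while (i >= 0 && i + n <= text.length()) {
            // regionMatches with ignoreCase = !matchCase compares in place, without lower-cased copies
            if (text.regionMatches(!matchCase, i, term, 0, n) && (!wholeWord || isWholeWord(text, i, n))) {
                return i;
            }
            i += forward ? 1 : -1;
        }
        return -1;
    }

    /** A match is a whole word if it is not preceded or followed by a letter or digit. */
    private static boolean isWholeWord(String text, int start, int length) {
        int end = start + length;
        boolean boundaryBefore = start == 0 || !Character.isLetterOrDigit(text.charAt(start - 1));
        boolean boundaryAfter = end == text.length() || !Character.isLetterOrDigit(text.charAt(end));
        return boundaryBefore && boundaryAfter;
    }

    public static void main(String[] args) {
        String text = "Find the word; find it again, findings too.";
        System.out.println(find(text, "find", 0, true, true, false));              // 15 (case must match)
        System.out.println(find(text, "find", 0, true, false, false));             // 0
        System.out.println(find(text, "find", text.length(), false, false, true)); // 15 ("findings" is skipped)
    }
}
```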

If you click the Go to line item, a new wizard opens in which you must type the number of the line you wantto go to.



Figure 14.4. Go to Line Wizard

Transform Editor

Some components provide the Transform editor, in which you can define the transformation.

The Transform editor has two tabs: Transformations and Source.

The Transformations tab can look like this:

Figure 14.5. Transformations Tab of the Transform Editor

In the Transformations tab, you can define the transformation as a simple mapping of inputs to outputs. First, both input and output metadata must be defined and assigned. Only then can you define the desired mapping.

The Transform editor consists of several panes and tabs. The left pane shows the input fields of all input ports with their data types, and the right pane shows the output fields of all output ports with their data types. At the bottom of the middle section, there are three tabs: Variables, Sequences and Parameters.

To define the mapping, select one of the input fields, press and hold the left mouse button on it, drag it to the Transformations pane in the middle and release the button. The selected field name then appears in the Transformations pane.

The resulting expression has the form $portnumber.fieldname.

You can then do the same with other input fields. If you want to concatenate the values of several fields (even from different input ports, in the case of Joiners and the DataIntersection component), drag all of the selected fields to the same row of the Transformations pane. The resulting expression can look like this: $portnumber1.fieldnameA+$portnumber2.fieldnameB.

The port numbers can be the same or different: portnumber1 and portnumber2 can be 0, 1 or any other integer. (In all components, both input and output ports are numbered starting from 0.) In this way you have defined part of the transformation. You only need to assign these expressions to the output fields.
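
To make the expression syntax concrete, here is a minimal Java sketch that evaluates expressions such as $0.fieldnameA+$1.fieldnameB against one input record per port. It is only an illustration under stated assumptions: the class MappingExpression and its method are hypothetical, records are modeled as plain maps indexed by port number, and + is treated purely as string concatenation. It is not the evaluator that CloverGUI or CTL actually uses.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical evaluator for mapping expressions such as "$0.fieldnameA+$1.fieldnameB". */
final class MappingExpression {

    /**
     * Evaluates a '+'-separated list of $port.field references and concatenates the values.
     *
     * @param expression for example "$0.firstName+$1.lastName"
     * @param inputs     one record per input port, indexed by port number (starting from 0)
     */
    static String evaluate(String expression, List<Map<String, Object>> inputs) {
        StringBuilder result = new StringBuilder();
        for (String term : expression.split("\\+")) {
            String ref = term.trim();
            int dot = ref.indexOf('.');
            // A valid reference looks like $<port>.<field>, with at least one digit for the port.
            if (!ref.startsWith("$") || dot < 2) {
                throw new IllegalArgumentException("Expected $port.field, got: " + ref);
            }
            int port = Integer.parseInt(ref.substring(1, dot));
            String field = ref.substring(dot + 1);
            Map<String, Object> record = inputs.get(port);
            if (!record.containsKey(field)) {
                throw new IllegalArgumentException("No field " + field + " on port " + port);
            }
            result.append(record.get(field)); // '+' concatenates the field values
        }
        return result.toString();
    }
}
```

For example, with "Hello " in field fieldnameA on port 0 and "world" in fieldnameB on port 1, the expression $0.fieldnameA+$1.fieldnameB yields "Hello world".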

To assign an expression to the output, select the item in the Transformations pane in the middle, press and hold the left mouse button on it, drag it to the desired field in the right pane and release the button. The output field in the right pane then becomes bold.

In addition, there is a small empty circle to the left of each expression in the Transformations pane. Whenever an expression is mapped, its circle fills with blue. Map every expression in the Transformations pane to an output field in this way until all of the circles are blue. At that moment, the transformation is defined.

You can also copy any input field to the output by right-clicking the input item in the left pane and selecting Copy fields to... followed by the name of the output metadata:

Figure 14.6. Copying the Input Field to the Output

If you have not defined the output metadata before defining the transformation, you can define it here by copying and renaming the output fields using right-click. However, it is much simpler to define new metadata before defining the transformation. If you define the output metadata in the Transform editor, you will be informed that the output records are not known; you will have to confirm the transformation despite this error and then specify the delimiters in the metadata editor.

The resulting simple mapping can look like this:


Figure 14.7. Transformation Definition in CTL (Transformations Tab)

If you select any item in the left, middle or right pane, the corresponding items are connected by lines. See the example below:

Figure 14.8. Mapping of Inputs to Outputs (Connecting Lines)

By clicking the button that appears after you select a row of the Transformations pane, you can also open an editor for defining the transformation of each individual field. It contains a list of fields, functions and operators, and it also provides hints. See below:


Figure 14.9. Editor with Fields and Functions

Some of your transformations may be too complicated to define in the Transformations tab. In such cases, you can use the Source tab instead.

(Source tabs for individual components are displayed in corresponding sections concerning these components.)

Below you can see the Source tab with the transformation defined above. It is written in the Clover transformation language (CTL).

Figure 14.10. Transformation Definition in CTL (Source Tab)


In the upper right corner of either tab, there are two buttons: one for opening the transformation in a new tab of the Graph Editor (Open tab button) and one for converting the defined transformation to Java (Convert to Java button).

If you click the Open tab button, a new tab with the transformation opens in the Graph Editor. This is confirmed by the following message:

Figure 14.11. Confirmation Message

The tab can look like this:

Figure 14.12. Transformation Definition in CTL (Transform Tab of the Graph Editor)

If you switch to this tab, you can view the declared variables and functions in the Outline pane. (The tab can be closed by clicking the red cross in its upper right corner.)

The Outline pane can look like this:


Figure 14.13. Outline Pane Displaying Variables and Functions

Note that you can also use content assist by pressing Ctrl+Space.

If you press these keys inside an expression, content assist suggests what can be written there to define the transformation.

Figure 14.14. Content Assist (Record and Field Names)

If you press these keys outside any expression, content assist offers a list of functions that can be used to define the transformation.


Figure 14.15. Content Assist (List of CTL Functions)

If there is an error in your definition, the line is marked with a red circle containing a white cross, and more detailed information about the error appears in the lower left corner.

Figure 14.16. Error in Transformation

If you want to convert the transformation code into Java, click the Convert to Java button and select whether or not you want to use Clover preprocessor macros.


Figure 14.17. Converting Transformation to Java

After you make your selection and click OK, the transformation is converted into the following form:

Figure 14.18. Transformation Definition in Java

Older transformations used the Clover transformation language Lite (CTL Lite); we recommend that you no longer use it.

For comparison, the same transformation in CTL Lite could look like this in the Transformations tab:


Figure 14.19. Older Transformation Definition in CTL Lite (Transformations Tab)

In the Source tab, the same CTL Lite transformation could look like this:

Figure 14.20. Older Transformation Definition in CTL Lite (Source Tab)

You should convert older transformations in CTL Lite to a new form in CTL.

In the upper right corner you can again see two buttons. This time, you can convert the transformation either to Java (Convert to Java button) or to CTL (Convert to CTL button). Only transformations in CTL can be displayed as a new tab in the Graph Editor.

Chapter 15. Readers

Readers are mostly the initial components of graphs. They read data from data resources and send it to other graph components. This is why each reader must have at least one output port through which the data flows out. Readers can read data from files located on disk or from databases. They can also receive data through a connection using FTP, LDAP or JMS. Some readers can log information about errors. Among the readers, there is also the DataGenerator component, which generates data according to a specified pattern. In addition, some readers have an optional input port through which they can also receive data.

Remember that (for most readers) you can view part of the input data by right-clicking a reader and selecting the View data option. You will then be presented with the same View data dialog as when debugging edges (for more details, see Section "Viewing the Data Flowing through the Edges"). This dialog allows you to view the read data, even before the graph has been run.

File URL

To work with some of the components, you must set their File URL attribute.

Here are some examples of File URL values for reading data:

• /path/filename.txt

• /path/filename1.txt;/path/filename2.txt This way you can read two files located on your disk.

• /path/filename?.txt This way you can read the files on your disk that match the mask.

• /path/* This way you can read all of the files inside a folder.

• zip:/path/file.zip This way you can read the first file from the compressed file.

• zip:/path/file.zip#filename.txt This way you can read the specified file from the compressed file.

• gzip:/path/file.gz As above. Remember that gzip files cannot be read by CloverDataReader.

• gzip:/path/file.gz#filename.txt As above. Remember that gzip files cannot be read by CloverDataReader.

• ftp://user:password@server/path/filename.txt Remember that ftp cannot be used by CloverDataReader.

• http://server/path/filename.txt Remember that http cannot be used by CloverDataReader.

• https://server/path/filename.txt Remember that https cannot be used by CloverDataReader.

• zip:(ftp://user:password@server/path/file.zip)#filename.txt Remember that ftp cannot be used by CloverDataReader.

• zip:(http://server/path/file.zip)#filename.txt Remember that http cannot be used by CloverDataReader.

• zip:(zip:(ftp://user:password@server/path/name.zip)#file.zip)#filename.txt Remember that ftp cannot be used by CloverDataReader.

• gzip:(http://server/path/file.gz) Remember that gzip files cannot be read by CloverDataReader.

• port:$0.FieldName:source If this URL is used, the input port must be connected. The specified field of the data records received through this optional input port contains a URL from which data is read and parsed. The data type of this FieldName must be one of the following three: string, byte or cbyte.

• port:$0.FieldName:discrete If this URL is used, the input port must be connected. The specified field of each data record received through this optional input port represents one particular data source. The data type of this FieldName must be one of the following three: string, byte or cbyte.

• port:$0.FieldName:stream If this URL is used, the input port must be connected. The values of the specified field in all data records received through the input port are concatenated to form one particular data source. The data type of this FieldName must be one of the following three: string, byte or cbyte.

• - This way you can specify that data should be read from stdin. Remember that stdin cannot be used by CloverDataReader.
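
The restrictions on CloverDataReader scattered through the list above can be collected into one check. The following Java sketch is a hypothetical helper (FileUrlCheck is not part of CloverGUI) that rejects the URL forms the list marks as unusable: gzip files, ftp, http and https, stdin (-), and port: URLs, since CloverDataReader has no input port. Treating everything else, including masks and zip: archives, as supported is an assumption of the sketch.

```java
import java.util.Locale;

/** Hypothetical check of a File URL against the CloverDataReader restrictions listed above. */
final class FileUrlCheck {

    static boolean supportedByCloverDataReader(String fileUrl) {
        String url = fileUrl.trim();
        if (url.equals("-")) {
            return false; // stdin is not supported
        }
        if (url.startsWith("port:")) {
            return false; // CloverDataReader has no input port to read from
        }
        String lower = url.toLowerCase(Locale.ROOT);
        // Nested URLs such as zip:(ftp://...)#file.txt are checked as a whole:
        // any gzip, ftp, http or https part makes the URL unusable.
        for (String forbidden : new String[] {"gzip:", "ftp://", "http://", "https://"}) {
            if (lower.contains(forbidden)) {
                return false;
            }
        }
        return true; // plain paths, masks and zip: archives (assumed supported)
    }
}
```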

File Readers

These components read data from files. The only exception is DataGenerator, which does not read data but generates records according to a specified pattern. One component reads data from flat files: UniversalDataReader. The others read data in the internal Clover format (CloverDataReader), Excel files (XLSDataReader) and dBase files (DBFDataReader).

Unlike CloverDataReader (and DataGenerator, of course), the other file readers can also receive data through their optional input port.

DataGenerator

This component has at least one output port. Whenever you connect an edge to an output port, a new output port is created.

This component generates data according to a pattern instead of reading it from a file, a database or any other data resource.

When you use this component, you must decide which fields should be generated at random (Random fields) and which by a sequence (Sequence fields). The other fields will be constant. You must create a pattern (Record pattern) that looks like data from a delimited or fixed-length file. The Record pattern is a string containing all fields of the generated records (except random and sequence fields) in the form of a delimited record (with delimiters defined in the metadata on the output port) or a fixed-length record (with sizes defined in the metadata on the output port). Each field of the generated records thus holds a constant, random or sequence value.

First, specify how many records you want to generate (Number of records to generate). Then select which fields should be generated by a sequence and which at random.

To assign sequences, click the Sequence fields attribute. The Sequences dialog opens.

Figure 15.1. Sequences Dialog

This dialog consists of two panes: all of the graph sequences are on the left and all clover fields (the names of the fields in the metadata) are on the right. To assign a sequence to a clover field that should be generated by a sequence, press and hold the left mouse button on the desired sequence in the left pane, drag it to the desired clover field on the right and release the button.

Figure 15.2. A Sequence Assigned

You do not need to assign the same sequence to different clover fields, but it is possible; it is up to you. The dialog contains two buttons on its right side: one for canceling a selected mapping and one for canceling all mappings.

You must also specify the fields that should be generated at random. For each of them you can define its range (its minimum and maximum values). These values are of the corresponding data types according to the metadata. You can assign random fields in the Edit key dialog, which opens after you click the Random fields attribute.
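
As a rough model of how the three kinds of fields combine, here is a Java sketch of record generation. All names are hypothetical and several simplifications are assumptions: constant fields come from a delimited Record pattern, all sequence fields share one counter starting at 1, and random fields are integers drawn from an inclusive [min, max] range with a fixed seed so the output is repeatable. In CloverGUI itself, sequences and random ranges are configured in the dialogs described here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.regex.Pattern;

/** Hypothetical model of DataGenerator with constant, sequence and random integer fields. */
final class DataGeneratorSketch {

    static List<Map<String, Object>> generate(String recordPattern, String delimiter,
            List<String> constantFields, List<String> sequenceFields,
            Map<String, int[]> randomRanges, int numberOfRecords, long seed) {
        // The record pattern holds only the constant fields, delimited as in the metadata.
        String[] constants = recordPattern.split(Pattern.quote(delimiter), -1);
        if (constants.length != constantFields.size()) {
            throw new IllegalArgumentException("Pattern does not match the constant fields");
        }
        Random random = new Random(seed);
        List<Map<String, Object>> records = new ArrayList<>();
        for (int i = 0; i < numberOfRecords; i++) {
            Map<String, Object> record = new LinkedHashMap<>();
            for (int f = 0; f < constants.length; f++) {
                record.put(constantFields.get(f), constants[f]);
            }
            // One shared sequence starting at 1 (an assumption of this sketch).
            for (String field : sequenceFields) {
                record.put(field, Long.valueOf(i + 1L));
            }
            // Random values are drawn from the inclusive range [min, max].
            for (Map.Entry<String, int[]> range : randomRanges.entrySet()) {
                int min = range.getValue()[0];
                int max = range.getValue()[1];
                record.put(range.getKey(), min + random.nextInt(max - min + 1));
            }
            records.add(record);
        }
        return records;
    }
}
```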

Figure 15.3. Edit Key Dialog

The dialog contains the Fields pane on the left, the Random fields pane on the right and the Random ranges pane at the bottom. In the Random ranges pane, you can specify the range of the selected random field by typing specific values. You can move fields between the Fields and Random fields panes by clicking the Left arrow and Right arrow buttons.

Flat File Readers

The UniversalDataReader component reads data from flat files regardless of whether they are delimited, fixed-length or mixed. Flat files are simple text files with delimiters separating data records and fields, with defined sizes of data records and fields, or with both delimiters and sizes. Delimiters and sizes are defined in metadata.

This file reader can also receive data through its optional input port.

UniversalDataReader

This component has one optional input port and one or two output ports; the second output port is optional. You can extract metadata from a flat file.

If you connect an edge to the optional input port of the component, you must set the File URL attribute to port:$0.FieldName[:processingType]. Here processingType is optional and can be set to one of the following: source, discrete and stream. If it is not set explicitly, it defaults to discrete. (The meaning of these values is explained in the section "File URL" above.) The data type of this FieldName must be one of the following three: string, byte or cbyte.
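
The port:$0.FieldName[:processingType] form and its default can be illustrated with a small Java parser. PortUrl is a hypothetical class written for this example, not a CloverGUI API; it accepts only port 0, as in the form above, and rejects processing types other than source, discrete and stream.

```java
/** Hypothetical parser for File URLs of the form port:$0.FieldName[:processingType]. */
final class PortUrl {
    final String fieldName;
    final String processingType;

    private PortUrl(String fieldName, String processingType) {
        this.fieldName = fieldName;
        this.processingType = processingType;
    }

    static PortUrl parse(String url) {
        String prefix = "port:$0.";
        if (!url.startsWith(prefix)) {
            throw new IllegalArgumentException("Not a port URL: " + url);
        }
        String rest = url.substring(prefix.length());
        int colon = rest.indexOf(':');
        String field = colon < 0 ? rest : rest.substring(0, colon);
        // When no processing type is given, discrete is used by default.
        String type = colon < 0 ? "discrete" : rest.substring(colon + 1);
        if (field.isEmpty()) {
            throw new IllegalArgumentException("Missing field name: " + url);
        }
        if (!type.equals("source") && !type.equals("discrete") && !type.equals("stream")) {
            throw new IllegalArgumentException("Unknown processing type: " + type);
        }
        return new PortUrl(field, type);
    }
}
```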

This component reads data from flat files. It can read both delimited and fixed length data records depending onmetadata on its output port.

You must select which file should be read (File URL), which character set the records use (Charset), whether the first line contains the field names and must be skipped (Skip first line) and how many records should be read from the file (Max number of records); if you do not set the last attribute, the reader reads and sends out all records. You can also specify what to do with incorrect records (Data policy). If you switch to the controlled data policy, you can log information about errors and send it through the second (optional) output port to another component. This is why the component can have one optional error port.

Thus, if you have set the Data policy attribute to controlled in UniversalDataReader, you need to select the components that should process the error information, or you may only want to write it out. Connect an edge from the error port of the UniversalDataReader (in which the Data policy attribute is set to controlled) to the input port of the selected writer if you only want to write the information, or to the input port of another processing component. Then you only need to assign metadata to this edge. The metadata must consist of 4 fields: number of incorrect record, number of incorrect field, incorrect record, error message. The first two fields are of integer data type, the other two are strings. (For more detailed information on how you can create metadata by hand, see the corresponding section.) The field names can be arbitrary.

In addition to the attributes mentioned above, you may also define the following in this component:

Sometimes the first rows are only a header describing the data rather than data itself. In such a case, set the Skip rows attribute to the number of rows that must be skipped.

You can also define whether leading white space in fields should be skipped (Skip leading blanks).

You can also specify the Max error count attribute to limit the number of errors that can be tolerated before data parsing stops. The default value is 0.

The Quoted strings attribute can be set to true to allow fields surrounded by single or double quotes. It is false by default.

In this component, you can even treat multiple delimiters as a single one by setting the Treat multiple delimiters as one attribute to true. It is false by default. If your data contains multiple delimiters and you do not set this attribute to true, each pair of adjacent single delimiters within such a multiple delimiter is interpreted as enclosing a null field.
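
The difference made by Treat multiple delimiters as one can be shown with plain Java string splitting. This is a hypothetical illustration, not UniversalDataReader's parser; empty strings stand in for null fields.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical illustration of the Treat multiple delimiters as one attribute. */
final class DelimiterDemo {

    static List<String> split(String line, String delimiter, boolean treatMultipleAsOne) {
        String quoted = Pattern.quote(delimiter);
        // With the attribute on, a run of delimiters separates just two fields;
        // with it off, every pair of adjacent delimiters encloses an empty (null) field.
        String regex = treatMultipleAsOne ? "(?:" + quoted + ")+" : quoted;
        return Arrays.asList(line.split(regex, -1));
    }
}
```

With the line a;;;b, the attribute set to false gives the four fields a, (null), (null), b, while true gives the two fields a and b.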

You may also want to set the Trim strings attribute.

You can also set the Incremental file and Incremental key attributes. The Incremental key is a string to which information about the records already read is written. This key is stored in the Incremental file. This way, the component reads only new records on each run of the graph.
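
A minimal Java sketch of the idea behind incremental reading follows. The state format is an assumption: this hypothetical IncrementalReader stores, under the incremental key, the number of records read so far in a java.util.Properties file; CloverGUI's real incremental file may store different information.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Hypothetical model of incremental reading; the real incremental file format may differ. */
final class IncrementalReader {

    /** Returns only the records not read on previous runs and remembers the new position. */
    static List<String> readNew(List<String> allRecords, Path incrementalFile, String incrementalKey) {
        try {
            Properties state = new Properties();
            if (Files.exists(incrementalFile)) {
                try (Reader in = Files.newBufferedReader(incrementalFile)) {
                    state.load(in);
                }
            }
            // The value stored under the key is the number of records already read.
            int alreadyRead = Integer.parseInt(state.getProperty(incrementalKey, "0"));
            int from = Math.min(alreadyRead, allRecords.size());
            List<String> fresh = new ArrayList<>(allRecords.subList(from, allRecords.size()));
            state.setProperty(incrementalKey, Integer.toString(allRecords.size()));
            try (Writer out = Files.newBufferedWriter(incrementalFile)) {
                state.store(out, null);
            }
            return fresh;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```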

It is also possible to set the phase of parsing data (Phase), set the visual name shown on the component (Component name) and enable or disable the component (Enable).

Other Type File Readers

The other three components (CloverDataReader, XLSDataReader and DBFDataReader) read data from other types of files: internal Clover format files (CloverDataReader), Excel files (XLSDataReader) and dBase files (DBFDataReader). You can extract metadata from an Excel file and you can create metadata from a dBase file. (See Sections "Extracting Metadata from an Excel File" and "Creating Metadata from a DBase File".)

The last two file readers (XMLExtract and XMLXPathReader) read data from XML files, but these are more advanced topics and they are dealt with in the "Advanced Readers" section below.

Unlike CloverDataReader and XMLExtract, the other file readers can also receive data through their optional input port.

CloverDataReader


You must create metadata by hand, select existing metadata, or use the metadata file that was created when the data was written to the Clover file.

This component reads data stored in the internal Clover binary data format, which allows faster access to stored data. When you read such a file, you can also have an index file, which allows you to select individual records from the data file. In addition, the file you read can be compressed.

Remember that CloverDataReader cannot work with the ftp, http and https protocols or with gzip files, nor can it read data from stdin. It can only read data from common files or compressed zip files.

You must select which file should be read (File URL) and whether you want to read compressed data (Compressed data); if this is not set explicitly, it depends on whether the data file has the .zip extension. If you do not want to read all records, you can specify the Index file URL (needed if the index file is not stored in the same folder as the Clover data file or if it has a name other than datafilename.idx, where datafilename includes the extension of the data file) and the Start record and Final record parameters. In such a case, records are read starting from the Start record up to the Final record. If you do not set the Final record, CloverDataReader reads and sends out all records starting from the Start record.

If you read Clover data from a compressed file, remember that all files (data file, index file and metadata file) are compressed together in this file within subfolders as follows: DATA/datafilename, INDEX/datafilename.idx and META/datafilename.fmt. Again, datafilename includes its extension.
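
The naming rules for the index file and for the entries inside a compressed Clover file can be written down as a tiny Java helper. CloverFileNames is hypothetical, and the example file name orders.clv is made up; the only rule applied is the one stated above, that datafilename keeps its extension.

```java
/** Hypothetical helper deriving the companion file names described above. */
final class CloverFileNames {

    /** Default index file: the full data file name (including its extension) plus ".idx". */
    static String defaultIndexFile(String dataFile) {
        return dataFile + ".idx";
    }

    /** Entry names of the data, index and metadata files inside a compressed Clover file. */
    static String[] zipEntries(String dataFileName) {
        return new String[] {
            "DATA/" + dataFileName,
            "INDEX/" + dataFileName + ".idx",
            "META/" + dataFileName + ".fmt"
        };
    }
}
```

For a data file orders.clv, the default index file is orders.clv.idx, not orders.idx.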


XLSDataReader

This component has one optional input port and at least one output port. Whenever you connect an edge to an output port, a new output port is created. You can extract metadata from an xls file.


This component reads data stored in an Excel file. To read such data, you must first specify the sheet containing the data you want to read. You can specify either the Sheet number or the Sheet name. If you select or type both attributes, only the Sheet number is applied. Remember that at least one of them must be specified. (Note that the first sheet number is 0.)

You can use wildcards (*, ?) when specifying sheet names. When specifying sheet numbers, you can use a mask: a comma-separated list of items, each of which is a number or a range joined with a hyphen (number, minNumber-maxNumber, *-maxNumber, minNumber-*), for example 1,3,5-7,9-*.
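
One possible reading of the sheet number mask and the sheet name wildcards is sketched below in Java. SheetSelector is a hypothetical class; sheets are numbered from 0 as stated above, while keeping duplicates out, following the mask order and matching names case-sensitively are assumptions of the sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical interpretation of the sheet number mask and sheet name wildcards. */
final class SheetSelector {

    /** Returns the 0-based sheet numbers selected by a mask such as "1,3,5-7,9-*". */
    static List<Integer> selectNumbers(String mask, int sheetCount) {
        List<Integer> selected = new ArrayList<>();
        for (String item : mask.split(",")) {
            String part = item.trim();
            int dash = part.indexOf('-');
            int min;
            int max;
            if (dash < 0) {
                min = max = Integer.parseInt(part); // a single sheet number
            } else {
                String low = part.substring(0, dash).trim();
                String high = part.substring(dash + 1).trim();
                min = low.equals("*") ? 0 : Integer.parseInt(low);               // *-maxNumber
                max = high.equals("*") ? sheetCount - 1 : Integer.parseInt(high); // minNumber-*
            }
            for (int n = Math.max(min, 0); n <= Math.min(max, sheetCount - 1); n++) {
                if (!selected.contains(n)) {
                    selected.add(n);
                }
            }
        }
        return selected;
    }

    /** Matches a sheet name against a pattern where * is any text and ? is one character. */
    static boolean nameMatches(String pattern, String sheetName) {
        StringBuilder regex = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return sheetName.matches(regex.toString());
    }
}
```

For a workbook with 12 sheets, the mask 1,3,5-7,9-* selects sheets 1, 3, 5, 6, 7, 9, 10 and 11.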

You can specify which row contains the names of columns (Metadata row). If there is no such row, column codes are used as the names of the fields.

You can also specify the rows that should be read, from the Start row to the Final row.

Max number of records defines the maximum number of records to read from the file.

You can also set the Max error count attribute to limit the number of errors that can be tolerated before data parsing stops. It is set to 0 by default.

You may also want to specify the Field mapping attribute. This attribute is not required, and sometimes you do not need it, but in other cases a mapping must be defined. See below.

Mapping and Metadata

If you want to specify a mapping (Field mapping), click the row of this attribute. A button appears there; when you click it, the following dialog opens:

Figure 15.4. XLS Mapping Dialog

This dialog consists of two panes: Xls fields on the left and Mappings on the right. On the right side of the dialog, there are three buttons: for automatic mapping, for canceling one selected mapping and for canceling all mappings. Select an xls field in the left pane, press the left mouse button, drag it to the Xls fields column of the right pane and release the button. In this way, the selected xls field is mapped to one of the output clover fields. Repeat the same with the other xls fields. (Alternatively, click the Auto mapping button.)

Figure 15.5. XLS Fields Mapped to Clover Fields

Note that xls fields are derived automatically from xls column names when extracting metadata from the XLS file.

When you confirm the mapping by clicking OK, the resulting Field mapping attribute looks like this (for example): $OrderDate:=#D;$OrderID:=#A;

On the other hand, if you check the Return value with xls names checkbox in the XLS mapping dialog, the same mapping looks like this: $OrderDate:=ORDERDATE,D;$OrderID:=ORDERID,N,20,5;

The Field mapping attribute is thus a sequence of single mappings, each followed by a semicolon. The last semicolon is optional and can be omitted.

Each single mapping assigns an xls field to a clover field. The clover field is on the left side of the assignment, preceded by a dollar sign. The xls field is on the right side and is either the code of an xls column preceded by a hash, or the xls field as shown in the Xls fields pane.

The clover field and the xls field (or xls code) are joined by a colon followed by an equal sign (:=).
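
The Field mapping syntax is regular enough to parse in a few lines. The following Java sketch is a hypothetical parser (not CloverGUI code) that turns the attribute into a map from clover field to xls field, and converts a column code such as D into a column index; counting columns from 0 (A = 0) is an assumption of the sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical parser for Field mapping values such as "$OrderDate:=#D;$OrderID:=#A;". */
final class FieldMappingParser {

    /** Returns clover field -> right-hand side ("#CODE" or the xls field as written). */
    static Map<String, String> parse(String fieldMapping) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String single : fieldMapping.split(";")) {
            String item = single.trim();
            if (item.isEmpty()) {
                continue; // the last semicolon is optional
            }
            int assign = item.indexOf(":=");
            if (!item.startsWith("$") || assign < 2) {
                throw new IllegalArgumentException("Expected $cloverField:=xlsField, got: " + item);
            }
            result.put(item.substring(1, assign), item.substring(assign + 2));
        }
        return result;
    }

    /** Converts an xls column code such as "D" or "AA" to a 0-based column index. */
    static int columnIndex(String code) {
        int index = 0;
        for (char c : code.toCharArray()) {
            if (c < 'A' || c > 'Z') {
                throw new IllegalArgumentException("Not a column code: " + code);
            }
            index = index * 26 + (c - 'A' + 1);
        }
        return index - 1;
    }
}
```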

Remember that you do not need to read and send out all xls columns; you can read and send out only some of them.

Now we will describe how you can map selected or all xlsColumns to cloverFields:

• First, we suppose that you have specified Metadata row:

• If all clover fields are the same as the selected xls fields that should be mapped to them (regardless of their order), you do not need to define any mapping (Field mapping). In such a case, the selected xls fields are mapped to all clover fields according to their names.

• If all clover fields are the same as the selected xls fields that should be mapped to them (regardless of their order), but you want to preserve the original order of the selected xls fields or change it to an order other than that defined by the clover fields, you must specify a mapping (Field mapping) as shown above, using all clover fields and the selected xls fields.

• If only some clover fields are the same as some xls columns, you must also define a mapping (Field mapping) as shown above. The reason is that, in this case, the xls columns and clover fields would not all be mapped to each other by their names, and the unmatched clover fields would be empty.

• Second, we suppose that you have not specified any Metadata row:

• If you do not define any mapping (Field mapping), xls columns are mapped to all clover fields in the order of their appearance in the XLS file, starting from the first column of the XLS file. In this case, you should skip the metadata row (if there is any) by setting Start row to the first row that contains data.

• If you define a mapping (Field mapping), this time using only codes of xls columns preceded by a hash, the selected xls columns are mapped to all clover fields according to the defined mapping. In this case, you should skip the metadata row (if there is any) by setting Start row to the first row that contains data.
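
The default behavior described in this list, when no Field mapping is defined, can be modeled as follows. This hypothetical Java sketch maps clover fields to 0-based column indexes by name when a Metadata row exists and by order when it does not, returning -1 for a clover field that stays empty; exact, case-sensitive name matching is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of the default XLS mapping when no Field mapping is defined. */
final class DefaultXlsMapping {

    /**
     * @param cloverFields output metadata field names, in order
     * @param header       names from the Metadata row, or null if no Metadata row is set
     * @return clover field -> 0-based column index, or -1 if the field stays empty
     */
    static Map<String, Integer> resolve(List<String> cloverFields, List<String> header) {
        Map<String, Integer> mapping = new LinkedHashMap<>();
        for (int i = 0; i < cloverFields.size(); i++) {
            String field = cloverFields.get(i);
            if (header == null) {
                mapping.put(field, i); // by order, starting from the first column
            } else {
                mapping.put(field, header.indexOf(field)); // by name; -1 if no column matches
            }
        }
        return mapping;
    }
}
```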

DBFDataReader

This component has one optional input port and at least one output port. Whenever you connect an edge to an output port, a new output port is created.

If you connect an edge to the optional input port of the component, you must set the File URL attribute to port:$0.FieldName[:processingType]. Here processingType is optional and can be set to one of the following: source, discrete and stream. If it is not set explicitly, it defaults to discrete. (The meaning of these values is explained in the section "File URL" above.) The data type of this FieldName must be one of the following three: string, byte or cbyte.

This component reads data from dBase files (extension .DBF). It can read only fixed length data records.

When you use this component, you must specify which file should be read (File URL), which character set the records use (Charset) and what to do with incorrect records (Data policy). If you switch to the controlled data policy, you can log information about errors. In this component, the log information is sent to stdout.

How you can extract metadata from this type of file was already mentioned above.



Database Readers

So far we have talked about file readers, but often you want to read data from databases instead of files. In such cases you can read data using either a client that connects to the database or a JDBC driver. You can extract metadata from a database (see the corresponding section).

Using JDBC Drivers

Now we will describe the following component that uses JDBC drivers: DBInputTable.

DBInputTable


This component reads data from databases and can be used with various database systems. You only need to define a database connection. To do that, you must specify all of the following: the host name of the database server, the database name, the user name, the access password and the JDBC driver that should be used to connect to the database. Sometimes you must also define the port number of the database connection.

When you use this component, you must define a query, either by typing an SQL query in the graph (SQL query) or by specifying the location of a file containing the SQL query (Query URL). If you define both, the Query URL is applied. The database table can be specified in the query. You must choose one of the available database connections (DB connection) and specify what should be done with incorrect records (Data policy). If you switch to the controlled data policy, you can log information about errors. In this component, the log information is sent to stdout.

You may also want to change the number of records that are fetched from the database at a time (Fetch size).
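
The DB connection, query and Fetch size settings correspond to standard JDBC concepts, sketched below in Java using only java.sql and java.nio. The class DbInputSketch and its methods are hypothetical; the rule that the Query URL wins over the SQL query comes from the text above, and setFetchSize is only a hint that JDBC drivers may ignore.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

/** Hypothetical sketch of how DBInputTable-style settings map onto plain JDBC. */
final class DbInputSketch {

    /** The Query URL, read with the given Charset, wins over the SQL query when both are set. */
    static String effectiveQuery(String sqlQuery, Path queryUrl, Charset charset) {
        if (queryUrl != null) {
            try {
                return new String(Files.readAllBytes(queryUrl), charset).trim();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        if (sqlQuery == null) {
            throw new IllegalArgumentException("Either SQL query or Query URL must be set");
        }
        return sqlQuery;
    }

    /** Runs the query and counts the rows, asking the driver for fetchSize rows per round trip. */
    static int countRows(String jdbcUrl, String user, String password, String query, int fetchSize)
            throws SQLException {
        try (Connection conn = DriverManager.getConnection(jdbcUrl, user, password);
             Statement stmt = conn.createStatement()) {
            stmt.setFetchSize(fetchSize); // corresponds to the Fetch size attribute
            try (ResultSet rs = stmt.executeQuery(query)) {
                int rows = 0;
                while (rs.next()) {
                    rows++;
                }
                return rows;
            }
        }
    }
}
```

A JDBC URL combines the host, port and database name in a driver-specific way; for PostgreSQL, for example, it has the form jdbc:postgresql://host:port/database, and the driver must be on the classpath.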

Of course, you can also specify which character set (Charset) should be used when reading an external query file (Query URL).

Finally, it is also possible to set the phase of parsing data (Phase), set the visual name shown on the component (Component name) and enable or disable the component (Enable).

Advanced Readers

The components described above can read different types of files and databases. But some information is not contained in these two kinds of data resources, or it is contained in files with a more complicated structure. CloverGUI offers four additional advanced readers: XMLExtract, XMLXPathReader, JMSReader and LDAPReader. The first two components read XML files, JMSReader receives JMS (Java Message Service) messages and LDAPReader gets information from LDAP directories. XMLXPathReader can also receive data through its optional input port.

XMLExtract

This component has at least one output port. Whenever you connect an edge to an output port, a new output port is created. You must create metadata on the output port(s) by hand or select existing metadata.

This component reads data from XML files or from any other text files with an XML-like nested tree structure.

This component is faster than XMLXPathReader, which can read XML files too. It uses SAX technology. The mapping can start at a selected level and continue into deeper levels.

When you use this component, you must specify which file should be read (File URL). Sometimes you want to skip a number of records; you can do so with the Skip mappings attribute, which is 0 by default. You can also select how many records should be sent out (Max number of mappings); otherwise XMLExtract reads and sends out all data records.

These two attributes limit the records sent to the output ports: the number of records given by Skip mappings is skipped, and then at most the number given by Max number of mappings is sent out through the output ports one by one.
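
The combined effect of Skip mappings and Max number of mappings resembles skip and limit on a Java stream. The sketch below is hypothetical (MappingLimits is not a CloverGUI class), and treating a negative maximum as "no limit" is an assumption of the example.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical model of Skip mappings and Max number of mappings. */
final class MappingLimits {

    /** Skips skipMappings records, then keeps at most maxMappings (negative means no limit). */
    static <T> List<T> limit(List<T> mapped, long skipMappings, long maxMappings) {
        return mapped.stream()
                .skip(skipMappings)
                .limit(maxMappings < 0 ? Long.MAX_VALUE : maxMappings)
                .collect(Collectors.toList());
    }
}
```

With five mapped records, Skip mappings set to 1 and Max number of mappings set to 2, only the second and third records are sent out.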

Most importantly, you must define a mapping of the original file to the output ports. The number of output ports is not fixed; it depends on the selected Mapping.

You can also set the Use nested nodes attribute to true. Its meaning is described below.



Because the original XML files are nested trees, consisting of pairs of tags that surround either other pairs of tags or text representing data, the mapping of the original XML file to another data file, database table or other data resource must have a similar structure. Each mapping must use tags, but their nesting must be handled in a different way. Nested parts of the original XML file (different pairs of tags surrounding a series of other tag pairs) are sent to different output ports unless the Use nested nodes attribute is set to true. In that case, only some parts of the original XML file are sent to different output ports. However, the structure must be very similar in both the original file and the mapping.

• If your XML file is enclosed between a root start-tag <roottag> and the corresponding end-tag </roottag>, its mapping must have a similar structure that starts with <Mappings> as its start-tag and ends with </Mappings> as its end-tag. All other tags of the mapping must be located between these two terminal tags, and their structure must correspond to the structure of the original file. A series of tag pairs at the same level in the original XML file must also be at the same level, and between the corresponding surrounding tags, in the mapping; they must form a series of mapping tags of the same level. Tag pairs located deeper in the original file must also be located deeper in the mapping, at the corresponding place between the corresponding mapping tags; they must form a series of mapping tags at different levels.

• If you want to assign (map) some of the tags (elements) to some of the output ports, you must do it in the following way:

Between the enclosing <Mappings> and </Mappings> tags mentioned above, there must be a series of expressions of the following two types:

• Closed ("empty") tag like this:

<Mapping element="tagA" outPort="noA" xmlFields="eAA" cloverFields="eAB" parentKey="eAC" generatedKey="eAD"/>

• A nested structure like this:

<Mapping element="tagB" outPort="noB" xmlFields="eBA" cloverFields="eBB" parentKey="eBC" generatedKey="eBD">

A series of closed mapping tags as described above, a series of nested structures like this one, or a mixture of both types can be placed here.

</Mapping>

In the nested case, the elements in the middle can be a series of closed ("empty") tags, a series of nested structures, or a mixture of the two types. Remember that the levels and nesting of the mapping must correspond to the levels and nesting of the original file. Remember also that each of these mapping expressions must use a different output port number: a different output port must be assigned to each selected element.

Thus, if the element="sometag" expression corresponds to a pair of tags at some level of the original file (<sometag>...</sometag>), only the elements that lie at the same level or deeper can be sent to the output ports and mapped to metadata on the output:

Note that "noA", "noB", ..., "noD", etc. stand for the numbers of the component's output ports through which data is sent out; each number is enclosed in double quotes.

Now we must explain what expressions should be used in place of "eAA", "eAB", ..., "eDD", etc. They are all sequences of tag names (xmlFields and parentKey) or metadata field names (cloverFields and generatedKey) separated by semicolons and enclosed in double quotes. (For example, you can have: xmlFields="firstname;lastname;salary;address" cloverFields="fname;lname;slr;addr".)

• Some metadata fields on the output port belong to the next deeper level of the original XML file, in other words, to the children of the selected element. However, only those children that themselves have the <sometag>somevalue</sometag> form can supply values for the selected fields.

These sometag-s are the xmlFields mentioned above. It is up to you how many tags from this children level you select as the xmlFields to be sent to the output.

If the Use nested nodes attribute is set to true, you can also select tags located deeper in the original file as xmlFields.

The selected fields can be renamed (mapped) by setting xmlFields="eCA" cloverFields="eCB". This way, the xmlFields (names of tags) expressed by eCA are assigned to the cloverFields (names of metadata fields) expressed by eCB.

Remember that if the xmlFields are the same as the cloverFields, you do not need to map them to each other. In such a case, you only need an expression of the following type:

<Mapping element="tagC" outPort="noC" parentKey="eCC" generatedKey="eCD">

...

</Mapping>

• The other metadata fields on the output port belong to the next higher level of the original XML file, in other words, to the parent level. However, only those parts of the parent level that themselves have the <sometag>somevalue</sometag> form can supply values for the selected fields.

This is the parentKey mentioned above. It is up to you how many tags from this parent level you select as the parentKey to be sent to the output.

These fields can be renamed (mapped) by setting parentKey="eDC" generatedKey="eDD". This way, the parentKey (a series of tag names) expressed by eDC is assigned to the generatedKey (a combination of metadata field names) expressed by eDD.

Remember that if the next higher level of the original file does not contain any structure of the <sometag>somevalue</sometag> type, you have neither parentKey nor generatedKey. Then you only need an expression of the following type:

<Mapping element="tagD" outPort="noD" xmlFields="eDA" cloverFields="eDB">

...

</Mapping>

• Remember that the numbers of xmlFields and cloverFields in these expressions must be equal. Most importantly, you do not need to map xmlFields to cloverFields if the tags at the selected level of the original XML file and the field names in metadata on the output port are identical; in such a case, they are mapped to each other automatically by name. Remember also that you do not need to use all of the tags; you can limit yourself to some of them.

• There is one more possibility when mapping an XML file to the output ports. If you set the Use nested nodes attribute to true, tags that are located at deeper levels and (at the same time) consist of a series of <sometag>somevalue</sometag> expressions can be sent to the same port as the original level. Their sometag-s become new xmlFields and can also be renamed (mapped) to cloverFields (metadata fields) on the output.

• Thus, xmlFields are sequences of names of tags at some levels of the original XML file, and parentKey is also a sequence of tag names. cloverFields, on the other hand, are metadata field names and can be any names you choose. You can also change the generatedKey names if you want. You can even concatenate a series of tags contained in parentKey="eEC" (where eEC is a sequence of fields separated by semicolons) into one field of the string data type; generatedKey then names this single field, for example generatedKey="nameED".

• Also, if the keys mentioned above do not identify the outgoing records uniquely, you can use the following pair of expressions: sequenceField="metadatafieldAofthesequence" sequenceId="identificationofthesequence". For the next deeper level of the mapping, you can then use this new, artificial field as its parentKey (the values of the artificial sequenceField): parentKey="metadatafieldAofthesequence" generatedKey="metadatafieldBofthesequence".
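The idea of an artificial key can be illustrated with a short Java sketch. It assumes that each parent element receives the next value of a counter standing in for the sequence named by sequenceId (starting at 1 is an assumption), and that every child record copies that value into its generatedKey field. Records are modelled as plain maps; the class, method and field names are hypothetical and do not describe how XMLExtract allocates sequence values.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SequenceKeySketch {
    // Records sent to the parent port and to the child port (field name -> value).
    final List<Map<String, Object>> parentPort = new ArrayList<>();
    final List<Map<String, Object>> childPort = new ArrayList<>();
    // Stands in for the sequence named by sequenceId; starting at 1 is an assumption.
    private int nextValue = 1;

    // Each inner list holds the child values found under one parent element.
    void emit(List<List<String>> parents, String sequenceField, String generatedKey) {
        for (List<String> children : parents) {
            int key = nextValue++;
            Map<String, Object> parent = new LinkedHashMap<>();
            parent.put(sequenceField, key);      // the artificial sequenceField value
            parentPort.add(parent);
            for (String value : children) {
                Map<String, Object> child = new LinkedHashMap<>();
                child.put("value", value);
                child.put(generatedKey, key);    // parentKey = sequenceField, copied to generatedKey
                childPort.add(child);
            }
        }
    }
}
```

With two parent elements holding two and one children, the parents get keys 1 and 2, and the third child record carries the key 2.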

Example

For example, suppose your original XML file has the root tags <employees> and </employees>, between them there is a series of structures describing the employees of some company (for example, 100 employees), and the data of every employee is located between a pair of <employee> and </employee> tags. You can then assign all employees to the first port with the following expression:

<Mapping element="employee" outPort="0" .../>

Here "..." stands for the rest of the attributes as shown above.

(Remember that the ports are numbered starting from 0.)

Remember that all the information about each employee is converted to one record consisting of some number of fields, which is sent out through the selected output port (in this case, the first port). Thus, the number of records flowing through the selected port equals the number of individual employees: for a series of 100 pairs of <employee>...</employee> tags between a pair of <employees>...</employees> tags, 100 records describing employees are created and sent out through the selected output port.
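To make the one-record-per-element rule concrete, here is a Java sketch built on the JDK's DOM parser. It is not XMLExtract's implementation, only an illustration of what a mapping such as <Mapping element="employee" outPort="0" xmlFields="firstname;lastname" cloverFields="fname;lname"/> produces. The tag and field names are hypothetical. Only direct children of the element are read, which corresponds to Use nested nodes being false.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

final class ElementMappingSketch {
    // One output record per `element`; xmlFields are renamed to cloverFields by position.
    // An empty cloverFields keeps the tag names (automatic mapping by name).
    static List<Map<String, String>> map(String xml, String element, String xmlFields, String cloverFields) {
        String[] tags = xmlFields.split(";");
        String[] fields = cloverFields.isEmpty() ? tags : cloverFields.split(";");
        if (tags.length != fields.length) {
            throw new IllegalArgumentException("xmlFields and cloverFields must have the same number of items");
        }
        NodeList nodes = parse(xml).getElementsByTagName(element);
        List<Map<String, String>> records = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Map<String, String> record = new LinkedHashMap<>();
            for (int k = 0; k < tags.length; k++) {
                record.put(fields[k], childText((Element) nodes.item(i), tags[k]));
            }
            records.add(record);
        }
        return records;
    }

    // Text of the first direct child element named `tag`, or null if there is none.
    private static String childText(Element parent, String tag) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && n.getNodeName().equals(tag)) {
                return n.getTextContent();
            }
        }
        return null;
    }

    private static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }
}
```

With 100 <employee> elements, the returned list would hold 100 records, one per employee.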

XMLXPathReader

This component has one optional input port and at least one output port. Whenever you connect an edge to an output port, a new output port is created. You must create metadata on the output port(s) by hand or select some prepared metadata.


This component reads data from an XML file or any other text file with an XML-like nested tree structure.

It is not as fast as XMLExtract, but it can do more with the file nodes and map the file structure more flexibly. It uses DOM technology.

When you select this component, you must specify which file should be read (File URL). If you want to skip some number of records, specify the Skip mappings attribute; by default it is 0. You can also select how many records should be sent out from the file (Max number of mappings); otherwise XMLXPathReader reads and sends out all data records.

These two attributes limit the records sent to the outputs: the specified number of mappings is skipped first (Skip mappings), and at most the specified number of the remaining ones is then sent out through the output ports one by one (Max number of mappings).

The most important step is to define the mapping of the original file to the output ports. The number of output ports is not fixed; it depends on the Mapping you define.

You can also set the Data policy attribute to Strict, Controlled or Lenient; Strict is the default value. If you switch to the Controlled data policy, you can send the log information about errors to stdout.



Because the original XML files are nested trees in which pairs of tags surround either other pairs of tags or text representing data, the mapping of the original XML file to another data file, a database table or any other data resource must also have a very similar structure. The mapping also consists of tags, but it expresses the nesting in its own way. Unlike the XMLExtract component, the mapping of XMLXPathReader uses the XPath language, which makes it simpler to locate any tag at any level of the original XML file.
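The following Java sketch uses the JDK's javax.xml.xpath API on a DOM tree to show how an XPath mapping reads. The Context xpath selects the nodes that become records, and each relative XPath (like the xpath or nodeName of a Mapping element) is evaluated against the current context node to fill one clover field. It is an illustration under these assumptions, not XMLXPathReader's code; the class name and sample paths are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

final class XPathContextSketch {
    // contextXPath plays the role of <Context xpath="...">; each entry of `fields` maps a
    // cloverField name to a relative XPath, like <Mapping xpath="..." cloverField="..."/>.
    static List<Map<String, String>> read(String xml, String contextXPath, Map<String, String> fields) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList contexts = (NodeList) xpath.evaluate(contextXPath, doc, XPathConstants.NODESET);
            List<Map<String, String>> records = new ArrayList<>();
            for (int i = 0; i < contexts.getLength(); i++) {
                Node ctx = contexts.item(i);
                Map<String, String> record = new LinkedHashMap<>();
                for (Map.Entry<String, String> f : fields.entrySet()) {
                    // Relative paths are evaluated against the current context node.
                    record.put(f.getKey(), xpath.evaluate(f.getValue(), ctx));
                }
                records.add(record);
            }
            return records;
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot evaluate mapping", e);
        }
    }
}
```

A context path such as "/company/dept/employee" yields one record per employee, and a relative path such as "name/first" reaches a tag two levels below the context.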

• If your XML file is enclosed between a root start-tag <roottag> and the corresponding end-tag </roottag>, its mapping must start with a <Context someexpressions> start-tag and end with the </Context> end-tag. All other mapping tags must be located between these two enclosing tags, and their structure must correspond to the structure of the original file. A series of tag pairs at the same level of the original XML file must be represented by a series of mapping tags at the same level, between the enclosing mapping tags. Tag pairs located deeper in the original file must also be located deeper in the mapping, at the corresponding place between the corresponding mapping tags, so that the mapping tags form the same levels as the original file.

The start-tag of the mapping must look like this: <Context xpath="/roottag/tagA1/.../tagAj" outPort="0">. The number of tags in the path is up to you, but it defines the level where the mapping starts. Remember that you can map only the tags located at the j-th level or deeper, no matter how deep they are; you cannot map any tag located higher.

• Once you have selected the j-th level and the port number in the expression above, you can select tags at the levels below the j-th level and assign (map) them to clover fields (names of the fields in metadata on the selected output port).

These can be mapped in the following two ways:

Either the tags located directly below the j-th level surround values like this: <sometag>somevalue</sometag>. In such a case, you can map these tags to clover fields in the following way:

<Mapping nodeName="tagAj+1" cloverField="metadatafieldA"/>

<Mapping nodeName="tagBj+1" cloverField="metadatafieldB"/>

Or other tags located deeper (at the (j+k)-th level) have the <sometag>somevalue</sometag> form. In such a case, you can map these tags to clover fields in the following way:

<Mapping xpath="tagCj+1/.../tagCj+k" cloverField="metadatafieldC"/>

<Mapping xpath="tagDj+1/.../tagDj+k" cloverField="metadatafieldD"/>

Here, metadatafieldD in cloverField="metadatafieldD" is the name of the metadata field to which tagDj+k is mapped.

Remember that you need to map such tags (nodeNames) to clover fields only if you want to rename them in metadata. If you do not want to rename them and (at the same time) metadata contain a field with the same name, the tag and the field are mapped to each other automatically.

It is up to you how many tags from the original file you select and send to the output; you can limit yourself to some of them only.

Note that the series of mappings for all selected tags of one level must be enclosed in a <Context xpath="tagEA/tagEB/.../.../tagEG" outPort="noE"> and </Context> pair of tags.

• If you want to assign (map) some of the tags to other output ports, you must do it in the following way:

Between the enclosing <Context xpath="/roottag/tagA1/.../tagAj" outPort="0"> and </Context> tags mentioned above, there must be a series of expressions of the following two types:

• Closed ("empty") tag like this:

<Context xpath="tagEj+1/.../tagEj+m" outPort="noE" parentKey="eEA" generatedKey="eEB"/>

• A nested structure like this:

<Context xpath="tagFj+1/.../tagFj+m" outPort="noF" parentKey="eFA" generatedKey="eFB">

A series of mappings like those described above, a series of closed tags as described above, a series of nested structures like this one, or a mixture of all these types can be placed here.

</Context>

In the nested case, the elements in the middle can be a series of closed ("empty") tags, a series of nested structures, or a mixture of the two types. Remember that the levels and nesting of the mapping must correspond to the levels and nesting of the original file. Remember also that each of these mapping expressions must use a different output port number: a different output port must be assigned to each selected element. The parentKey is always taken from the level directly above the level containing the <tagGj+m> tag. Note also that an expression such as xpath="tagHj+1" alone is allowed; there is no need to go deeper than necessary.

Note that "noA", "noB", ..., "noF", etc. stand for the numbers of the component's output ports through which data is sent out; each number is enclosed in double quotes.

Now we must explain what expressions should be used in place of "eAA", "eAB", ..., "eFB", etc. They are all sequences of tag names (parentKey) or metadata field names (generatedKey) separated by semicolons and enclosed in double quotes. (For example, you can have: parentKey="firstname;lastname;salary;address" generatedKey="fname;lname;slr;addr".)

Thus, if the xpath="tagEA/tagEB/.../tagEG" expression ends at the (j+m)-th level and corresponds to some pair of tags (<sometag>...</sometag>) of the original file:

• Some metadata fields on the output port belong to the next higher level of the original XML file. However, only those parts of the parent level that themselves have the <sometag>somevalue</sometag> form can supply values for the selected fields.

This is the parentKey mentioned above. It is up to you how many tags from this parent level you select as the parentKey. Remember that you do not need to use all of the tags; you can limit yourself to some of them.

These fields can be renamed (mapped) by setting parentKey="eGA" generatedKey="eGB".

This way, the parentKey (a sequence of tag names separated by semicolons) expressed by eGA is mapped to the generatedKey (a sequence of metadata field names separated by semicolons) expressed by eGB.

• Remember that you do not need to map tags to cloverFields if the tags at the selected level of the original XML file and the field names in metadata on the output port are identical; in such a case, they are mapped to each other automatically by name. You do not need to use all of the tags; you can limit yourself to some of them.

• Also, if the keys mentioned above do not identify the outgoing records uniquely, you can use the following pair of expressions in your mapping: sequenceField="metadatafieldAofthesequence" sequenceId="identificationofthesequence". For the deeper part of the original file and its mapping, you can then use this new, artificial field as its parentKey (the values of the artificial sequenceField): parentKey="metadatafieldAofthesequence" generatedKey="metadatafieldBofthesequence".
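A nested Context with parentKey and generatedKey can be sketched in Java as follows. In this illustration (hypothetical names, not XMLXPathReader's code), the outer XPath selects the parent nodes and the inner XPath is evaluated relative to each parent to select the child records. Each tag named in parentKey is read from the parent node and stored in the child field named at the same position in generatedKey. Reading parentKey tags directly from the parent node is a simplifying assumption.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

final class NestedContextSketch {
    // parentXPath ~ the outer <Context xpath="...">; childXPath ~ the nested
    // <Context xpath="..." parentKey="..." generatedKey="..."> relative to each parent.
    static List<Map<String, String>> readChildren(String xml, String parentXPath, String childXPath,
            Map<String, String> childFields, String parentKey, String generatedKey) {
        String[] keys = parentKey.split(";");
        String[] generated = generatedKey.split(";");
        if (keys.length != generated.length) {
            throw new IllegalArgumentException("parentKey and generatedKey must have the same number of items");
        }
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList parents = (NodeList) xpath.evaluate(parentXPath, doc, XPathConstants.NODESET);
            List<Map<String, String>> records = new ArrayList<>();
            for (int i = 0; i < parents.getLength(); i++) {
                Node parent = parents.item(i);
                NodeList children = (NodeList) xpath.evaluate(childXPath, parent, XPathConstants.NODESET);
                for (int j = 0; j < children.getLength(); j++) {
                    Map<String, String> record = new LinkedHashMap<>();
                    for (Map.Entry<String, String> f : childFields.entrySet()) {
                        record.put(f.getKey(), xpath.evaluate(f.getValue(), children.item(j)));
                    }
                    // Copy the parent-level values (parentKey) into the child's generatedKey fields.
                    for (int k = 0; k < keys.length; k++) {
                        record.put(generated[k], xpath.evaluate(keys[k], parent));
                    }
                    records.add(record);
                }
            }
            return records;
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot evaluate mapping", e);
        }
    }
}
```

Each child record thus carries the key of the parent it was found under, which is what lets the records on the two ports be joined again later.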

JMSReader

This component has at least one output port. Whenever any output port is connected, a new output port is created.

It receives Java messages and sends out data records. The component implements the JmsMsg2DataRecord interface.

Once you have created the connection, you only need to specify it in the component (JMS connection).

To create such a connection, you must first specify all of the following: the name of the connection, Initial context factory class, available libraries, URL, Connection factory JNDI name, Destination JNDI, User and Password. (For more information about how to create a JMS connection, see the corresponding section.)

Sometimes you also need to specify some JMS message selector.

You can also define the processing transformation by specifying one of the following three attributes: Processor class, Processor code or Processor URL. (Processor class is the path and name of a class, jar or zip file located outside the graph. Processor code is a transformation written in Java and defined in the graph itself. Processor URL is the path and name of a file written in Java.)

If you want to define the Processor class attribute, click its item row; a button appears there, and when you click it, an Open Type wizard opens. In it, you must specify the desired type. (See Section "Open Type Wizard" for more information.)

If you want to define the Processor code attribute, click its item row; a button appears there, and when you click it, an Edit value wizard opens. In this wizard, you can define the transformation in Java. (See Section "Edit Value Wizard" for more information.)

If you want to define the Processor URL attribute, click its item row; a button appears there, and when you click it, a URL File Dialog opens. In it, you can locate the desired file. (See Section "Locating Files with URL File Dialog" for more information.)

It is also important to decide whether you want to limit the number of received messages and/or the processing time. This can be done by specifying the maximum number of messages (Max msg count), the timeout (Timeout), or both. Both attributes are set to 0 by default, which means that processing never stops: it is limited neither by the number of messages nor by time. If you want to limit either of them, set it to a positive number. The processing is then limited by the number of messages, by the processing time (the Timeout attribute is in milliseconds), or by both: the process stops when the specified number of messages has been received or when it has run for the specified time.
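The stopping rule described above can be written as a small Java predicate. This is only a sketch of the documented behavior, where 0 means "no limit" for both attributes. The class and method names are hypothetical and do not belong to JMSReader.

```java
final class JmsStopCondition {
    // maxMsgCount ~ Max msg count, timeoutMillis ~ Timeout (milliseconds); 0 disables a limit.
    static boolean shouldStop(int maxMsgCount, long timeoutMillis, int received, long elapsedMillis) {
        boolean countReached = maxMsgCount > 0 && received >= maxMsgCount;
        boolean timedOut = timeoutMillis > 0 && elapsedMillis >= timeoutMillis;
        return countReached || timedOut;   // whichever limit is reached first stops processing
    }
}
```

With both attributes left at 0, the predicate never returns true, matching the "never stop" default.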

You can also specify which character set (Charset) should be used when reading the external Processor URL.

Of course, as in the case of the other components, you can change the phase of parsing data (Phase), set the visualname located on the component (Component name) and enable/disable the component (Enable).


LDAPReader

This component has at least one output port. Whenever any output port is connected, a new output port is created.

The metadata on the output must precisely describe the structure of the read object.

This component reads information from an LDAP directory and converts it to Clover data records. It extracts the results of a search and converts them into Clover data records. All search results must have the same objectClass.

When you select this component, you must specify which directory should be read (Ldap URL). It has the following form: ldap://hostname:portnumber. You must also define the base distinguished name (Base DN), a sequence of attribute and value pairs separated by commas, for example: dc=example,dc=com.

You need to specify the filter used for the search (Filter). It is defined as a combination of attribute and value pairs, for example, sn=*.

You can also decide whether only the entry named by the base distinguished name should be searched (object), the level directly below it (onelevel), or the whole subtree below it (subtree). This is done by setting the Scope attribute to one of these values.

Sometimes you may need to define your user name (User) and your password (Password). Your user name can be similar to the following: cn=john.smith,dc=example,dc=com.
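For orientation, the attributes above correspond closely to a plain JNDI search in Java. The sketch below is not LDAPReader's code. It assumes the JDK's built-in LDAP provider (com.sun.jndi.ldap.LdapCtxFactory) and simple authentication, and the helper names are hypothetical. It maps the Scope values to the JNDI search scopes and counts the entries returned.

```java
import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.NamingEnumeration;
import javax.naming.NamingException;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;
import javax.naming.directory.SearchControls;
import javax.naming.directory.SearchResult;

final class LdapSearchSketch {
    // Maps the Scope attribute values to the JNDI search scopes.
    static int scope(String value) {
        switch (value) {
            case "object":   return SearchControls.OBJECT_SCOPE;
            case "onelevel": return SearchControls.ONELEVEL_SCOPE;
            case "subtree":  return SearchControls.SUBTREE_SCOPE;
            default: throw new IllegalArgumentException("Unknown scope: " + value);
        }
    }

    // Ldap URL, Base DN, Filter, Scope, User and Password as described above; returns the number of hits.
    static int countEntries(String ldapUrl, String baseDn, String filter, String scopeValue,
                            String user, String password) throws NamingException {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, ldapUrl);                 // e.g. ldap://hostname:portnumber
        if (user != null) {                                      // User and Password are optional
            env.put(Context.SECURITY_AUTHENTICATION, "simple");
            env.put(Context.SECURITY_PRINCIPAL, user);           // e.g. cn=john.smith,dc=example,dc=com
            env.put(Context.SECURITY_CREDENTIALS, password);
        }
        DirContext ctx = new InitialDirContext(env);
        try {
            SearchControls controls = new SearchControls();
            controls.setSearchScope(scope(scopeValue));
            NamingEnumeration<SearchResult> results = ctx.search(baseDn, filter, controls);
            int count = 0;
            while (results.hasMore()) {
                results.next();
                count++;
            }
            return count;
        } finally {
            ctx.close();
        }
    }
}
```

Calling countEntries requires a reachable LDAP server; the scope mapping can be checked on its own.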



Chapter 16. Writers

Writers are the final components of a transformation graph. Each writer must have at least one input port through which data flows to it from other graph components. Writers write data to files or database tables on disk, or send data over an FTP, LDAP or JMS connection. Among the writers, there is also the Trash component, which discards all the records it receives.

In all writers, it is important to decide whether you want to append data to the existing file, sheet or database table (the Append attribute for files, for example), or to replace the existing file, sheet or database table with a new one. The Append attribute is set to false by default, which means "do not append data, replace it".

You can also write data to one file or one database table with several writers of the same graph, but in such a case, the writers should run in different phases.

Remember that (for most writers) you can see part of the resulting data when you right-click a writer and select the View data option. You are then prompted with the same View data dialog as when debugging edges (for more details, see Section "Viewing the Data Flowing through the Edges"). This dialog lets you view the written data, but only after the graph has been run.

File URL

In order to work with some of the components, you must set their File URL attribute.

These are some examples of the File URL attributes for writing data.

• /path/filename.out

• /path/filename1.out;/path/filename2.out This way you can write two files on your disk.

• /path/filename$.out This way you can write files named according to the specified pattern on your disk. The dollar sign stands for the file number, from 0 up to 9.

• /path/filename$$.out This way you can write files named according to the specified pattern on your disk. The two dollar signs stand for the file number, from 00 up to 99.

• zip:/path/file$.zip This way you can write data to compressed zip file(s) whose name(s) correspond(s) to the specified pattern.

• zip:/path/file$.zip#filename.out This way you can write the specified file into the compressed zip file(s) on your disk.

• gzip:/path/file$.gz This way you can write data to compressed gzip file(s) whose name(s) correspond(s) to the specified pattern. Remember that CloverDataWriter cannot write data to any gzip file.

• ftp://user:password@server/path/filename.out This way you can write the specified file(s) to a remote server. Remember that CloverDataWriter cannot use ftp.

• port:$0.FieldName:discrete If this URL is used, the output port must be connected. The :discrete suffix is optional here (processing is discrete by default) and can be omitted. In each record sent out through this optional output port, the specified field represents one particular data source. The data type of this FieldName must be one of the following three: string, byte or cbyte.

• - This way you can write data to stdout. Remember that CloverDataWriter cannot write to stdout.
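The dollar-sign numbering in the patterns above can be sketched as follows. This Java helper is a hypothetical illustration, not CloverETL code. It replaces the first run of $ characters with a zero-padded file number of the same width; rejecting numbers that do not fit into that width is an assumption of the sketch.

```java
final class DollarPattern {
    // Replaces the first run of '$' signs with the file number, zero-padded to the run's width:
    // "filename$.out" covers 0-9, "filename$$.out" covers 00-99.
    static String fileName(String pattern, int index) {
        int start = pattern.indexOf('$');
        if (start < 0) {
            return pattern;                      // no numbering requested
        }
        int end = start;
        while (end < pattern.length() && pattern.charAt(end) == '$') {
            end++;
        }
        int width = end - start;
        if (index < 0 || String.valueOf(index).length() > width) {
            throw new IllegalArgumentException("File number " + index + " does not fit into " + width + " digit(s)");
        }
        return pattern.substring(0, start) + String.format("%0" + width + "d", index) + pattern.substring(end);
    }
}
```

For example, file number 7 becomes /path/filename7.out with one dollar sign and /path/filename07.out with two.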

File Writers

These components write data to files. Only Trash does not write data; it discards all incoming records. One component, UniversalDataWriter, writes data to flat files. The others write data in the internal Clover format (CloverDataWriter), to Excel files (XLSDataWriter) and to dBase files (DBFDataWriter), and StructuredDataWriter can even write records with a defined structure.

Unlike CloverDataWriter (and Trash, of course), the other file writers can also send out data through their optional output port.

Partitioning Data Flow into Different Output Files

Three components allow you to partition the incoming data flow and distribute the records among different output files: UniversalDataWriter, XLSDataWriter and StructuredDataWriter.

If you want to partition the data flow and write the records into different output files depending on a key value, you must specify the key for the partitioning (Partition key). It is a sequence of record field names separated by semicolons.

You must also decide whether the individual output files should be numbered or named according to field values. This is done by setting the Partition file tag attribute to either Number file tag or Key file tag.

The output files are numbered by default (Number file tag is the default value of the Partition file tag attribute).

In both cases, the File URL value will only serve as the base name for the output file names.

• If you want to give numbers to the output files, you must set the Partition file tag attribute to Number file tag.

If the File URL is the following: path/filebasename, the output file names will be constructed according to the following pattern: path/filebasename#.

The number of hashes depends on how many output files could be created; one hash corresponds to one digit. The pattern could therefore also be path/filebasename## or path/filebasename###. The files are numbered starting from 0.

Thus, the output files can be created according to the following pattern: path/filebasename0, path/filebasename1, ..., path/filebasename892, for example.

• If you want to give explicit names to the output files, you must set the Partition file tag attribute to Key file tag.

If the File URL is the following: path/filebasename, the output file names will be constructed according to the following pattern: path/filebasenamedistinguishingnames.

If the Partition key attribute has the form field1;field2;...;fieldN and the values of these fields are valueofthefield1, valueofthefield2, ..., valueofthefieldN, all the values are converted to strings and concatenated. The resulting value is used as distinguishingnames: valueofthefield1valueofthefield2...valueofthefieldN.

Thus, the output files can be created according to the following pattern: path/filebasenamevalueofthefield1valueofthefield2...valueofthefieldN.

For example, if the File URL attribute is path/out and firstname;lastname is the Partition key, the output files can be the following: path/outjohnsmith, path/outmarksmith, path/outmichaelgordon, etc.
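The Key file tag naming rule can be sketched in Java as below. This is a hypothetical helper, not CloverETL's API: each partition key value is converted to a string, and the strings are appended, in key order, to the File URL base name.

```java
import java.util.List;

final class KeyFileTag {
    // Builds the output file name for one record: base name + concatenated key values.
    static String fileName(String baseName, List<?> keyValues) {
        StringBuilder name = new StringBuilder(baseName);
        for (Object value : keyValues) {
            name.append(value);   // StringBuilder.append(Object) converts via String.valueOf
        }
        return name.toString();
    }
}
```

Records with the same key values therefore end up in the same file, for example path/outjohnsmith for every John Smith record.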

In addition, you can give the output files other names by using a lookup table (Partition lookup table) and specifying the Partition output fields attribute. This way, new names are given to the same output files. Partition output fields is a sequence of fields from the Partition lookup table, separated by semicolons. Records whose Partition key values are not contained in the Partition lookup table are written to the unassignedfilebasename file.


Trash

This component has one input port.

This is the simplest component. It discards all the records it receives. Nevertheless, it can still write the incoming data to a file (Debug print and Debug file URL attributes) or send it to stdout.

You may also want to set the character set used for encoding the data written to the output file(s) or sent to stdout (Charset).

You must also decide whether the data should be appended to the debug file or whether the file should be replaced (Debug append). The default is false, which means "do not append data, replace the file".

It is also possible to change the phase of parsing data (Phase), set the visual name located on the component(Component name) and enable/disable the component (Enable).

Flat File Writers

Only one component, UniversalDataWriter, writes data to flat files. It can also partition the incoming data flow and write it into different output files.

This file writer can also send data out through its optional output port.

UniversalDataWriter

This component has one input port and one optional output port.

This component writes data to flat files. It can write delimited, fixed-length and/or mixed data records depending on the metadata on its input port.

When you select this component, you must specify the file(s) to which the data should be written (File URL).

If you connect an edge to the optional output port of the component, you must set the File URL attribute to port:$0.FieldName[:processingType]. Here processingType is optional and can only be set to :discrete. (You can see the meaning of these attribute values in the "File URL" section above.) The output data type of this FieldName must be one of the following three: string, byte or cbyte.

You may also want to set the character set used for encoding the data written to the output file(s) (Charset).

If you want, you can write the names of the fields to the first row of the output file(s) (Write field names). It is set to false by default.

It is very important to decide whether the records should be appended to the existing file (Append) or whether the file should be replaced. This attribute is set to false by default ("do not append, replace the file").

You can also decide how many records should be skipped before writing to the output file(s) (Number of skipped records); it is 0 by default. You can also set a limit on the number of written records (Max number of records). If you do not specify these attributes, UniversalDataWriter writes all incoming data records to the output file(s).


You can also limit the maximum number of records in one file (Records per file) and/or the file size in bytes (Bytes per file). In such a case, if you want the incoming data records to be written to several output files rather than only one, you must use dollar signs in the output file base name (in File URL). This way, more output files are created and the data records are distributed among them.
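The effect of Records per file can be sketched as follows, assuming the File URL contains a single $ sign and files are numbered from 0. The Java helper below is hypothetical and ignores Bytes per file. It only shows how consecutive records are distributed among consecutively numbered files; unlike a real single $ pattern, it does not stop at file number 9.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class RecordsPerFileSketch {
    // Returns output file name -> records written to it, in the order they arrive.
    static Map<String, List<String>> split(List<String> records, int recordsPerFile, String fileUrl) {
        if (recordsPerFile <= 0) {
            throw new IllegalArgumentException("recordsPerFile must be positive");
        }
        Map<String, List<String>> files = new LinkedHashMap<>();
        for (int i = 0; i < records.size(); i++) {
            // Record i goes to file number i / recordsPerFile, substituted for the single '$'.
            String name = fileUrl.replace("$", String.valueOf(i / recordsPerFile));
            files.computeIfAbsent(name, k -> new ArrayList<>()).add(records.get(i));
        }
        return files;
    }
}
```

With five records and Records per file set to 2, the records land in /path/out0.txt, /path/out1.txt and /path/out2.txt, and the last file holds a single record.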

If you want to partition the data flow and distribute the incoming records among different output files, you must define the Partition key attribute and select the value of Partition file tag (either Number file tag or Key file tag). The default value of this attribute is Number file tag. If you want to give other names to these output files, you must specify Partition lookup table and Partition output fields.


Other Type File Writers

The other four components (CloverDataWriter, XLSDataWriter, DBFDataWriter and StructuredDataWriter) write data to other types of files: internal Clover format files (CloverDataWriter), Excel files (XLSDataWriter), dBase files (DBFDataWriter) and files with a more complicated structure (StructuredDataWriter).

Unlike CloverDataWriter, the other three writers can also send data out through their optional output port.

The last component (XMLWriter) writes data to XML files. This is a more advanced topic, and it is dealt with in the "Advanced Writers" section below.

XMLWriter can also send out data through its optional output port.

CloverDataWriter


This component writes data in the internal binary Clover data format, which allows faster access to the data.

When you select this component, you must specify the output file(s) to which the data should be written (File URL).

Remember that CloverDataWriter cannot work with the ftp, http, and https protocols or with gzip files, nor can it send data to stdout. It can only write data to common files or to compressed zip files.

When you write such a file, you can also create and save an index file, which allows you to subsequently select individual records from the data file, and a metadata file. You specify whether to save the index file with the Save index attribute and whether to save the metadata file with the Save metadata attribute. Both attributes are set to false by default. In addition, all the created file(s) can be compressed into one zip archive.


You can also limit the maximum number of records in one file (Records per file). In such a case, if you want the incoming data records to be written to several output files rather than just one, you must use dollar signs in the output file base name (in File URL). Several output files will then be created and the data records will be distributed among them.

You can also decide whether you want to compress the output data (Compress data). (This attribute is not required.)


• If you set this attribute to true, CloverDataWriter will compress the created file(s) into one output file, regardless of whether the output file name (File URL) has the zip extension.

• If you set this attribute to false, CloverDataWriter will not compress the created file(s), regardless of whether the output file name (File URL) has the zip extension. It will save all the created file(s) separately.

• If you do not specify this attribute, CloverDataWriter will compress the created file(s) into one output file only if the output file name (File URL) has the zip extension. Otherwise, it will save all the created file(s) separately.
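The three cases of Compress data reduce to a small decision function. This is a sketch, not the component's code: the name `should_compress` is hypothetical, `None` stands for "attribute not specified", and "has the zip extension" is assumed to mean that the name ends with `.zip` (compared case-insensitively).

```python
def should_compress(compress_data, file_url):
    """Decide whether CloverDataWriter packs its output into one zip file.

    compress_data is True, False, or None when the attribute is not set.
    """
    if compress_data is None:
        # Assumption: "has the zip extension" means the name ends with .zip.
        return file_url.lower().endswith(".zip")
    return compress_data


for setting in (True, False, None):
    print(setting, should_compress(setting, "employees.clv"), should_compress(setting, "out.zip"))
```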

If you do not compress the created file(s), they will be saved separately under the following names: datafilename (the data file), datafilename.idx (the index file), and datafilename.fmt (the metadata file). In all three names, datafilename includes its extension.

If you compress the created file(s), you can also set the compression level (Compress level). The Compress level attribute can be set to a number from 0 to 9, where 0 means "without compression". A higher number means better compression, but slower writing.

If the created data file is named datafilename, the final output file will have the following internal structure: DATA/datafilename, INDEX/datafilename.idx, and META/datafilename.fmt. Here, datafilename includes its extension in all three names. For example: DATA/employees.clv, INDEX/employees.clv.idx, META/employees.clv.fmt.

If you set the Compress level attribute to 0, all the created file(s) will be contained in the same output file with the same internal structure (see above), but they will not be compressed.
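The archive layout and the meaning of Compress level can be illustrated with Python's standard `zipfile` module. This sketch only reproduces the DATA/, INDEX/, and META/ entry names described above; the file contents are placeholders, not the real Clover binary, index, or metadata formats, and the function `build_archive` is hypothetical. Level 0 stores the entries uncompressed; levels 1 to 9 use deflate.

```python
import io
import zipfile


def build_archive(target, data_name, data, index=None, metadata=None, level=6):
    """Pack the data file and optional index/metadata files into one zip.

    Entry names follow the DATA/, INDEX/, META/ layout described above.
    level=0 stores the entries uncompressed; 1-9 use deflate.
    """
    if not 0 <= level <= 9:
        raise ValueError("Compress level must be between 0 and 9")
    method = zipfile.ZIP_STORED if level == 0 else zipfile.ZIP_DEFLATED
    with zipfile.ZipFile(target, "w", compression=method,
                         compresslevel=None if level == 0 else level) as zf:
        zf.writestr(f"DATA/{data_name}", data)
        if index is not None:
            zf.writestr(f"INDEX/{data_name}.idx", index)
        if metadata is not None:
            zf.writestr(f"META/{data_name}.fmt", metadata)


buffer = io.BytesIO()
build_archive(buffer, "employees.clv", b"placeholder data", index=b"idx", metadata="<Record/>")
print(zipfile.ZipFile(buffer).namelist())
# ['DATA/employees.clv', 'INDEX/employees.clv.idx', 'META/employees.clv.fmt']
```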

You can also decide how many records should be skipped before writing to the output file(s) (Number of skipped records). It is 0 by default. You can set a limit to the Max number of records. It is unlimited by default. Thus, if you do not specify these attributes, CloverDataWriter writes all incoming data records to the output file(s).


XLSDataWriter


When you select this component, you must specify the file to which the data should be written (File URL).


This component writes data to an Excel file. You must first specify the sheet to which you want to write, either by its number (Sheet number) or by its name (Sheet name). If you specify both attributes, only the Sheet name is applied and the Sheet number attribute is ignored. If the specified sheet does not exist, it will be created.

As the Sheet name, you can also use a series of Clover fields, each preceded by a dollar sign and separated by a colon, semicolon, or pipe. In that case, a new sheet is created for each different combination of values of these fields.
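Using fields in the Sheet name amounts to grouping records by the combination of values of those fields, one group per sheet. The sketch below shows only the grouping; how the real component turns a combination of values into the actual sheet name is not described here, so each group is simply keyed by its value tuple. The function `group_into_sheets` and the sample fields `country` and `year` are hypothetical.

```python
import re


def group_into_sheets(records, sheet_spec):
    """Group records by the values of the fields listed in sheet_spec.

    sheet_spec is a Sheet name such as "$country;$year": field names
    preceded by '$' and separated by ':', ';' or '|'. Each distinct
    combination of values gets its own group (one sheet per group).
    """
    parts = [p.strip() for p in re.split(r"[:;|]", sheet_spec) if p.strip()]
    if not parts or not all(p.startswith("$") for p in parts):
        raise ValueError(f"not a field-based sheet name: {sheet_spec!r}")
    fields = [p[1:] for p in parts]
    sheets = {}
    for record in records:
        combo = tuple(record[f] for f in fields)
        sheets.setdefault(combo, []).append(record)
    return sheets


sales = [{"country": "CZ", "year": 2007}, {"country": "DE", "year": 2007},
         {"country": "CZ", "year": 2007}, {"country": "CZ", "year": 2008}]
print(list(group_into_sheets(sales, "$country;$year")))
# [('CZ', 2007), ('DE', 2007), ('CZ', 2008)]
```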

You can also specify the row to which the names of the columns should be written (Metadata row). It is 0 by default, which means that the names of the columns are not written to the sheet. You can also specify where writing should start: the records will be written starting from the Start row and the Start column, which are 1 and A by default.

It is very important to decide whether the records should be appended to the existing sheet (Append to the sheet) or whether the sheet should be replaced. This attribute is set to false by default ("do not append, replace the sheet"). Note that this attribute concerns only the sheet, not the whole file.

You can also decide how many records should be skipped before writing to the output file(s) (Number of skipped records). It is 0 by default. You can set a limit to the Max number of records. If you do not specify these attributes, XLSDataWriter writes all incoming data records to the output file(s).

You can also limit the maximum number of records in one file (Records per file). If you want the incoming data records to be written to several output files rather than just one, you must use dollar signs in the output file base name (in File URL). Several output files will then be created and the data records will be distributed among them.



StructuredDataWriter

This component has one, two, or three input ports and one optional output port. The second and third input ports are optional; they can receive data for writing the header and the footer, respectively.

When you select this component, you must specify the file to which the data should be written (File URL).



This component writes data to files according to a pattern defined in its Body mask attribute. You must specify this attribute, and you can also define text that should be written at the beginning of the file (Header mask) and/or at the end of the file (Footer mask). The component can have up to three input ports: the first one is for the body of the file, and the second and third ones (if present) are for the header and the footer, respectively. If either of the last two ports is not connected, you can type in any static header and/or static footer yourself. Even for the body of the file, you can write any structure you want, even if the first input port is connected. However, you can also define the structure of the output using the data incoming through the first input port.

If you have connected all or some of the ports, you can define a mask as follows. When you click the Body mask, Header mask, or Footer mask attribute, a button appears on the right side of the line; clicking this button opens the Mask wizard. In this wizard, you can see the Metadata and Mask panes. At the bottom, there is the Auto XML button. If you click it, a simple XML structure appears in the Mask pane.


Figure 16.1. Create Mask Wizard

You only need to remove the fields you do not want to save to the output file, and you can also rename the suggested left side of the matchings. The matchings have the form <sometag=$metadatafield/>. By default, after you click the Auto XML button, you obtain an XML structure containing expressions like <metadatafield=$metadatafield/>. The left side of these matchings can be replaced by anything else, but the right side must remain the same: you must not change the field names preceded by a dollar sign on the right side of the matchings, because they are the names of the data fields.

Remember, however, that you do not need to use XML for the mapping at all. The mapping structure you select can be of any other type, but you must always use the metadata fields preceded by a dollar sign, because they represent the values of the corresponding data fields.
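The role of the dollar-sign fields in a mask can be illustrated by a simple substitution. This is a sketch of the idea only, not the StructuredDataWriter implementation: `auto_xml_mask` mimics the form of the Auto XML output described above, and `apply_mask` replaces each `$field` with the corresponding value of one record. Both function names and the sample fields are hypothetical.

```python
import re

FIELD_REF = re.compile(r"\$([A-Za-z_]\w*)")


def auto_xml_mask(field_names):
    """Build a mask in the form the Auto XML button produces."""
    return "\n".join(f"<{name}=${name}/>" for name in field_names)


def apply_mask(mask, record):
    """Replace each $field in the mask with the value of that field."""
    return FIELD_REF.sub(lambda m: str(record[m.group(1)]), mask)


mask = auto_xml_mask(["id", "name"]).replace("<name=", "<customer=")  # rename a left side
print(apply_mask(mask, {"id": 7, "name": "Ann"}))
print(apply_mask("Customer $name (#$id)", {"id": 7, "name": "Ann"}))  # Customer Ann (#7)
```

The second call shows a mask that is not XML at all, which the text above explicitly allows.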


You can also decide how many records should be skipped before writing to the output file(s) (Number of skipped records). It is 0 by default. You can set a limit to the Max number of records. If you do not specify these attributes, StructuredDataWriter writes all incoming data records to the output file(s).

You can also limit the maximum number of records in one file (Records per file) and/or the maximum file size in bytes (Bytes per file). If you want the incoming data records to be written to several output files rather than just one, you must use dollar signs in the output file base name (in File URL). Several output files will then be created and the data records will be distributed among them.



Database Writers

So far we have talked about file writers, but often you want to write data to databases instead of files. In such cases, you can write data either with a client or utility that connects to the database, or with a JDBC driver.

Using JDBC Drivers

Now we will describe a component that uses JDBC drivers: DBOutputTable. When using this component, you do not need any database client or other utility on your computer, but working with DBOutputTable is slower than working with database bulk loaders.


DBOutputTable

This component has one input port and two optional output ports. These output ports can be used for records that have been rejected by the database table (first port) and/or for so-called autogenerated columns (second port), which are supported by some database systems only. Metadata on the first optional output port can be the same as on the input port, or it can have an additional string field at the end containing the error message generated when parsing the record. Metadata on the second optional output port must correspond to the autogenerated columns of the selected database.

This component writes data to databases. Unlike the database writers for specific database systems, which rely on a client-server architecture or on special DB utilities, DBOutputTable can be used for various database systems, depending on the selected JDBC driver.

First, you need to define a database connection. You must choose one of the available database connections (DB connection). To create such a connection, you must specify all of the following: the host name of the database server, the database name, the user name, the access password, the JDBC specific, and the JDBC driver that should be used to connect to the database. Sometimes you also need to define the port number.

You must also specify the database table and define a mapping of Clover fields (names of the fields in metadata) to database fields. You can do it in the following way:

The database table can be specified either as an attribute of the component (DB table) or in a query. The query can be defined in the graph (SQL query) or in a file outside the graph (Query URL). You should define only one of these three attributes.

However, if you define not only DB table, but also SQL query or Query URL or both, the DB table attribute will be ignored. And if you specify SQL query along with Query URL, only the SQL query will be applied.
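The precedence rules just described can be written as a small function. The sketch also shows one plausible way a query that uses Clover fields preceded by dollar signs (the first mapping option described below) could be turned into a prepared statement with `?` placeholders; that conversion is an assumption for illustration, not the component's documented internals. Both function names are hypothetical.

```python
import re


def statement_source(db_table=None, sql_query=None, query_url=None):
    """Name the attribute DBOutputTable uses, following the precedence above."""
    if sql_query:
        return "SQL query"        # wins over Query URL and DB table
    if query_url:
        return "Query URL"        # wins over DB table
    if db_table:
        return "DB table"
    raise ValueError("specify DB table, SQL query or Query URL")


def to_placeholders(query):
    """Turn $field references into ? placeholders and return the field
    names in order (assumed conversion, for illustration only)."""
    fields = []

    def replace(match):
        fields.append(match.group(1))
        return "?"

    return re.sub(r"\$([A-Za-z_]\w*)", replace, query), fields


print(statement_source(db_table="employees", query_url="insert.sql"))  # Query URL
print(to_placeholders("INSERT INTO employees (id, name) VALUES ($id, $name)"))
# ('INSERT INTO employees (id, name) VALUES (?, ?)', ['id', 'name'])
```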

• If you define a query (regardless of whether it is SQL query in the graph or Query URL outside the graph), you have two ways to map Clover fields to DB fields:

• You can use selected Clover fields, each of them preceded by a dollar sign, in the query itself.

In such a case, you do not need to define any mapping: the Field mapping, Clover fields, and DB fields attributes need not be specified. Even if you specified any of them, it would be ignored.

• You can use question marks in the query as placeholders for the DB fields to which Clover fields should be mapped.

If you specify the Field mapping attribute, it will be used to map the specified Clover fields to the specified DB fields. The resulting values of the DB fields will be inserted into the DB table specified in the query.

If you want to map specified Clover fields to specified DB fields but do not use the Field mapping attribute, Clover fields will be mapped to DB fields automatically, according to their mutual order in the Clover fields and DB fields attributes. The resulting values of the DB fields will be inserted into the DB table specified by the query.

If you specify the Clover fields attribute alone, these fields will be mapped (in their order in the attribute) to the DB fields represented in the query by question marks.

Remember that if you specify both the Clover fields attribute and the DB fields attribute, these Clover fields will not be mapped to the question marks directly. They will first be mapped (in their mutual order) to the DB fields listed in the DB fields attribute, and then these values will be inserted into the corresponding columns of the DB table specified in the query.


(Field mapping is a sequence of expressions of the form $cloverField:=dbField, each followed by a semicolon. The last semicolon is optional and can be omitted.)

(The Clover fields and DB fields attributes are sequences of Clover field names or DB column names, respectively, separated by semicolons. The last name can also be followed by a semicolon, but this delimiter is optional and can be omitted.)

If you specify neither Field mapping, nor Clover fields, nor DB fields, Clover fields will be mapped to DB fields according to the order of the Clover fields in metadata. The number of Clover fields must equal the number of DB fields.

Remember that if you specify Field mapping along with the other two attributes, only Field mapping will be applied.

• If you define the DB table attribute:

• If you want to map Clover fields to DB fields and do not define any mapping, Clover fields will be mapped to DB fields automatically, according to their order in metadata. The number of Clover fields must equal the number of DB fields.

• If you want to map Clover fields in a different order, or if you want to map only some of them, you must define a mapping.

To define such a mapping, you must specify either the Field mapping attribute alone, or both the Clover fields attribute and the DB fields attribute (listing the fields that should be mapped to each other), or the Clover fields attribute alone.

Remember that if you specify the Field mapping attribute, both the Clover fields and DB fields attributes will be ignored.

Remember also that if you do not specify the Field mapping attribute but define the Clover fields and DB fields instead, these fields will be mapped to each other in the order of their appearance in these attributes. The resulting values of the DB fields will be inserted into the DB table specified in the DB table attribute.

If you specify the Clover fields attribute alone, these fields will be mapped to DB columns in the order of their appearance in this attribute.

Remember that if you specify Field mapping along with the other two attributes, only Field mapping will be applied.
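The mapping rules for the DB table case can be condensed into a resolver that parses the attribute syntax described above and applies the precedence in the same order. This is a sketch under stated assumptions: attribute values are plain strings, a `None` DB field means "matched to the table's columns by position", and the function names are hypothetical. It is not the component's actual code.

```python
def split_list(value):
    """Split a semicolon-separated attribute value; a trailing ';' is optional."""
    return [item.strip() for item in value.split(";") if item.strip()]


def parse_field_mapping(value):
    """Parse '$cloverField:=dbField;...' into (clover_field, db_field) pairs."""
    pairs = []
    for expr in split_list(value):
        clover, sep, db = expr.partition(":=")
        clover, db = clover.strip(), db.strip()
        if not sep or not clover.startswith("$") or not db:
            raise ValueError(f"bad mapping expression: {expr!r}")
        pairs.append((clover[1:], db))
    return pairs


def resolve_mapping(metadata_fields, field_mapping=None, clover_fields=None, db_fields=None):
    """Return (clover_field, db_field) pairs for the DB table case.

    db_field is None when the field is matched to the table's columns by
    position (Clover fields alone, or no mapping attribute at all).
    """
    if field_mapping:                      # wins over the other two attributes
        return parse_field_mapping(field_mapping)
    if clover_fields and db_fields:        # paired by mutual order
        clover, db = split_list(clover_fields), split_list(db_fields)
        if len(clover) != len(db):
            raise ValueError("Clover fields and DB fields must have the same length")
        return list(zip(clover, db))
    if clover_fields:                      # listed fields, by position
        return [(name, None) for name in split_list(clover_fields)]
    return [(name, None) for name in metadata_fields]  # metadata order


print(resolve_mapping(["id", "name"], field_mapping="$name:=full_name;$id:=emp_id;"))
# [('name', 'full_name'), ('id', 'emp_id')]
```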

You can define the maximum number of errors (Max error count) after which the process stops. If you set this attribute to -1, all errors will be ignored. The default value is 0.

You can also define what should be done if an error occurs. The Action on error attribute can be set to ROLLBACK or COMMIT; the default is COMMIT. You can specify the number of records that should be committed at once (Commit). If an error occurs, the last batch can be committed or rolled back.

If your database and/or JDBC driver supports batch mode for sending statements to the database, you should set the Batch mode attribute to true, since this can speed up data loading. You can also specify how many records should be loaded in one batch update (Batch size). It is set to 25 by default.
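The interplay of Commit, Max error count, and Action on error can be demonstrated with Python's built-in `sqlite3` module standing in for a JDBC connection. This is a sketch of the described behavior, not DBOutputTable itself: the function `load` and its parameter names are hypothetical, batch mode is not modelled, and an error is assumed to stop the load once the error count exceeds Max error count (so the default 0 stops at the first error).

```python
import sqlite3


def load(conn, sql, rows, commit=100, max_error_count=0, action_on_error="COMMIT"):
    """Execute `sql` once per row and return the rejected rows.

    Every `commit` successful records are committed. When the number of
    errors exceeds max_error_count (-1 = ignore all errors), the
    uncommitted records are committed or rolled back according to
    action_on_error, and loading stops with RuntimeError.
    """
    rejected, errors, pending = [], 0, 0
    for row in rows:
        try:
            conn.execute(sql, row)
        except sqlite3.Error as exc:
            errors += 1
            rejected.append((row, str(exc)))
            if max_error_count != -1 and errors > max_error_count:
                if action_on_error == "ROLLBACK":
                    conn.rollback()
                else:
                    conn.commit()
                raise RuntimeError(f"stopped after {errors} error(s)") from exc
        else:
            pending += 1
            if pending == commit:
                conn.commit()
                pending = 0
    conn.commit()
    return rejected


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY)")
bad = load(conn, "INSERT INTO emp VALUES (?)", [(1,), (2,), (2,), (3,)], max_error_count=-1)
print(len(bad), conn.execute("SELECT COUNT(*) FROM emp").fetchone()[0])  # 1 3
```

With the default Max error count of 0, the duplicate key stops the load: ROLLBACK discards the two uncommitted rows, while COMMIT keeps them.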

Although it is now deprecated, you can also set the Autogenerated columns attribute if the SQL query attribute contains only one query.

For Oracle and DB2 databases, they are the names of database columns that should be returned.

For Informix and MySQL, they are the field names of incoming records along with one additional field called"AUTO_GENERATED" that should be returned.

Remember that Batch mode makes it impossible to generate a key.


You may also want to set the character set (Charset) that should be used when reading the external query file (Query URL).


Using Database Bulk Loaders

Now we will describe database components that do not use JDBC drivers. They are all faster than DBOutputTable, but you need to have a specific database client or other utility installed on your computer in order to connect to the specific database server.

DB2DataWriter

This component has one optional input port and one optional output port. It can read data through the input port or from a file. If the input port is not connected to any other component, the data must be contained in a file that is specified in the component.

If you connect another component to the optional output port, it can serve to log information about errors. Metadata on this error port must have three fields: the number of the incorrect record (integer); either the number of the incorrect field (for delimited records) or the offset of the incorrect field (for fixed-length records), which is an integer in both cases; and the error message (string).

This component writes data to databases. It can only be used for the DB2 database system.

First, you need to install the DB2 database client on the computer with CloverGUI. Only then can you use this component. You must specify all of the following: the database name (Database), the name of the database table (Database table) you want to work with, your user name (User name), your access password (Password), and the mode for loading data (Load mode). For the Load mode attribute, you must select one of the following: insert, replace, restart, terminate. The default value is insert.

If the component does not receive data through the input port, you must specify the file from which it should read the data (Data file URL). When you read data from such an external file, you must define its metadata (File metadata). The columns are separated from each other by a one-character delimiter, and the last column is delimited by a line feed character (\n). The column delimiter is defined in the Column delimiter attribute. You can also define it in the Parameters attribute as the coldel variable; if you define both, the Column delimiter attribute is applied. Remember that the delimiter must not occur inside any record field.
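The delimiter rules above can be sketched as two small helpers: one picks the effective delimiter (the Column delimiter attribute wins over coldel in Parameters), the other formats one record as a delimited line and rejects fields that contain the delimiter or a line feed. This illustrates the stated rules only; the function names are hypothetical, Parameters is assumed to be already parsed into a dictionary, and raising an error when no delimiter is given is an assumption, since no default is stated here.

```python
def effective_delimiter(column_delimiter=None, parameters=None):
    """Column delimiter attribute wins over the coldel entry in Parameters."""
    delimiter = column_delimiter or (parameters or {}).get("coldel")
    if not isinstance(delimiter, str) or len(delimiter) != 1:
        raise ValueError("a one-character column delimiter is required")
    return delimiter


def format_row(values, delimiter):
    """Join one record's fields; the last column ends with a line feed."""
    for value in values:
        if delimiter in value or "\n" in value:
            raise ValueError(f"field {value!r} contains the delimiter or a line feed")
    return delimiter.join(values) + "\n"


sep = effective_delimiter(",", {"coldel": "|"})
print(repr(format_row(["1", "Smith", "Prague"], sep)))  # '1,Smith,Prague\n'
```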

This component allows you to map the original metadata fields (Clover fields) to the database fields. You must define either the Field mapping attribute, or both the Clover fields (Clover fields) and the database fields (DB fields) that should be mapped to each other. The latter are mapped in their order of appearance in these attributes, and their number must be equal in both attributes. Remember that if you have defined the Field mapping attribute, all fields listed in the Clover fields and DB fields attributes will be ignored. Field mapping is a sequence of expressions of the form $cloverField:=dbField, each followed by a semicolon; the last semicolon is optional and can be omitted. Note that you can map Clover fields to database fields even without listing the database fields, but the number of Clover fields must equal the number of database fields.

If you read data from the input port, you can specify how many records should be skipped before writing to the database (Number of skipped records). Remember that this does not apply to reading from the file specified in the Loader input file attribute. You can also set a limit to the Max number of records. If you set the rowcount variable in the Parameters attribute, the value of rowcount will be applied instead of the Max number of records attribute.

You can also set the Max warning count. This attribute limits the number of error messages and/or warnings.

You can also set the Max error count.


If you want to save rejected records, you must specify the Rejected records URL (on server) attribute. All rejected records will be saved to this location on the database server.

The DB2 command interpreter attribute defines the interpreter that should execute the DB2 commands (connect, load, disconnect). It has the form interpreter [parameters] ${} [parameters], where ${} stands for the name of the script file.
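The interpreter template can be expanded into a command line by substituting the script file name for the `${}` token. This is a minimal sketch: the function `interpreter_command` and the template `interp -x ${} -y` are hypothetical placeholders, not real DB2 command-line options. Building an argument list with `shlex.split` avoids passing the command through a shell.

```python
import shlex


def interpreter_command(template, script_path):
    """Expand an interpreter template into an argument list, replacing
    the ${} token with the path of the generated script file."""
    args = shlex.split(template)
    if "${}" not in args:
        raise ValueError("the template must contain the ${} token")
    return [script_path if arg == "${}" else arg for arg in args]


print(interpreter_command("interp -x ${} -y", "/tmp/load.db2"))
# ['interp', '-x', '/tmp/load.db2', '-y']
```

The resulting list could then be passed to a process launcher such as `subprocess.run`.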

Batch file URL defines the file in which the DB2 commands should be stored. Remember that the path must not contain white spaces.

If you are working on Linux, you can also set the Use pipe transfer attribute to true, so that the data incoming through the input port is sent to a pipe instead of a temporary file. It is false by default.

You may also want to set a series of parameters for working with the DB2 database system (Parameters). Each parameter must have the form key=value, or just key (if its value is true). Individual parameters must be separated from each other by a colon, semicolon, or pipe. Note that a colon, semicolon, or pipe can also be part of a parameter value, but in that case the value must be double-quoted.
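The Parameters syntax can be parsed with a small character scanner that splits on colons, semicolons, and pipes outside double quotes. This is a sketch of the syntax described above, not the component's parser. The function `parse_parameters` is hypothetical, and in the example only `coldel` and `rowcount` come from this section; `logfile` and `verbose` are made-up keys used to show a quoted value and a bare key. The same syntax is used by the MSSQL, MySQL, and PostgreSQL writers below.

```python
def parse_parameters(text):
    """Parse a Parameters string such as 'coldel=,;rowcount=100'.

    Items are key=value or a bare key (meaning true), separated by ':', ';'
    or '|'. A value containing a separator must be double-quoted; the
    quotes are removed.
    """
    items, current, in_quotes = [], [], False
    for ch in text:
        if ch == '"':
            in_quotes = not in_quotes
        if ch in ":;|" and not in_quotes:
            items.append("".join(current))
            current = []
        else:
            current.append(ch)
    if in_quotes:
        raise ValueError("unbalanced double quote")
    items.append("".join(current))

    params = {}
    for item in (i.strip() for i in items):
        if not item:
            continue
        key, sep, value = item.partition("=")
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]
        params[key.strip()] = value if sep else True
    return params


print(parse_parameters('coldel=,;rowcount=100|logfile="c:/tmp/a;b.log";verbose'))
# {'coldel': ',', 'rowcount': '100', 'logfile': 'c:/tmp/a;b.log', 'verbose': True}
```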


InformixDataWriter

This component has one optional input port and one optional output port. It can read data through the input port or from a file. If the input port is not connected to any other component, the data must be contained in a file that is specified in the component. If you connect another component to the optional output port, it can serve to log the rejected records and information about errors. Metadata on this error port must have the same fields as the incoming or read records, plus two additional fields at the end: the row number (integer) and the error message (string).

This component writes data to databases. It can only be used for the Informix database system.

First, you need to install the Informix dbload database utility on the computer with CloverGUI. In addition, it is very important that the database server runs on the same computer as both the dbload utility and CloverGUI, and that you are logged in as the root user. Only then can you use the dbload utility. Otherwise, in order to load data to the database, you must use the free load2 library instead of the dbload utility. The load2 library can be used even when the server is located on a remote computer.

You must specify the name of the database (Database); sometimes you also need to specify the name of the database server (Host). You must also specify either the database table (Database table) or a control script (Control script). If you do not specify a control script, a default script will be used. If you specify both the database table and the control script, the control script will be used. But remember that the control script will be ignored if you use the load2 library; in that case, you must specify the database table.

You must also set the Path to dbload utility attribute. It is the path to the dbload.exe or dbload executable. If the directory containing the utility is in your PATH variable, you only need to set this attribute to dbload.exe on Windows or dbload on Linux.
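The PATH rule above (and the same rule for the bcp, mysql, sqlldr, and psql utilities below) can be checked in advance with Python's `shutil.which`, which searches PATH for a bare name and checks a value containing a directory as given. The helper name `resolve_utility` is hypothetical; it only reports which executable would be found and does not reproduce how CloverGUI launches the utility.

```python
import shutil


def resolve_utility(path_attr):
    """Return the executable a "Path to ... utility" attribute refers to.

    A bare name such as 'dbload' or 'psql' is searched for in the PATH
    variable; a value containing a directory is checked as given.
    """
    found = shutil.which(path_attr)
    if found is None:
        raise FileNotFoundError(f"{path_attr!r} not found; add it to PATH or give a full path")
    return found
```

For example, `resolve_utility("dbload")` returns the full path of the utility if its directory is in PATH, and raises FileNotFoundError otherwise.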

If you have not connected any component to the input port, you must specify the file that should be read (Loader input file).

If you want to log the process of loading data to the database, you can set the name of the log file in the Error log URL attribute. If you do not specify any name, the default log file name (error.log) will be used.

When using the dbload utility, you can also specify the Ignore rows and Max error count attributes, meaning the number of rows that should be skipped and the number of errors after which the process stops, respectively.


The Max error count attribute is set to 10 by default. The Ignore rows attribute applies only when working with the dbload utility.

You can set the Commit interval attribute, which is 100 rows by default. You can also set the Column delimiter, which is a pipe by default, but remember that it must not occur inside any record field. The Column delimiter is used only when working with the dbload utility.

You may prefer to load data into the database with the free load2 library instead of the dbload utility. You can do so by setting the Use load utility attribute to true. In that case, you need to specify the following four properties: your user name (User name), your access password (Password), the Ignore unique key violation attribute, and the Use insert cursor attribute. Ignore unique key violation is set to false by default. Use insert cursor is set to true by default, which doubles data transfer performance; it is used only when working with the load2 library.


MSSQLDataWriter

This component has one optional input port and one optional output port. It can read data through the input port or from a file. If the input port is not connected to any other component, the data must be contained in a file that is specified in the component. If you connect another component to the optional output port, it can serve to log the rejected records and information about errors. Metadata on this error port must have the same fields as the records, plus three additional fields at the end: the number of the incorrect row (integer), the number of the incorrect column (integer), and the error message (string).

This component writes data to databases. It can only be used for the MSSQL database system.

First, you need to install the MSSQL database client on the computer with CloverGUI. Only then can you use this component. You must specify all of the following: the database name (Database), either the name of the database table (Database table) or the name of the database view (Database view) you want to work with, your user name (User name), and your access password (Password). If you are not the owner of the database table or view, you must also specify the name of the database owner (Database owner); if you are the owner, this is not necessary.

You must also set the Path to bcp utility attribute. It is the path to the bcp.exe or bcp executable. If the directory containing the utility is in your PATH variable, you only need to set this attribute to bcp.exe on Windows or bcp on Linux.

If the component does not receive data through the input port, you must specify the file from which it should get the data (Loader input file). You can select the Column delimiter; the default delimiter is the tab character. But remember that it must not occur inside any record field.

You may also want to set a series of parameters for working with MSSQL (Parameters). For example, you can set the port number. Each parameter must have the form key=value, or just key (if its value is true). Individual parameters must be separated from each other by a colon, semicolon, or pipe. Note that a colon, semicolon, or pipe can also be part of a parameter value, but in that case the value must be double-quoted.

Among the optional parameters, you can also set userName, password, or fieldTerminator, corresponding to the User name, Password, and Column delimiter attributes, respectively. If any of these three attributes is set, the corresponding parameter value will be ignored.
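The override rule can be sketched as a merge in which explicitly set attributes replace the userName, password, and fieldTerminator parameters. This is an illustration only: the function `effective_mssql_options` is hypothetical, Parameters is assumed to be already parsed into a dictionary, and the tab fallback reflects the default Column delimiter stated above.

```python
def effective_mssql_options(parameters, user_name=None, password=None, column_delimiter=None):
    """Merge the parsed Parameters with the explicit attributes.

    An explicitly set User name, Password or Column delimiter attribute
    overrides the userName, password or fieldTerminator parameter.
    """
    merged = dict(parameters)
    overrides = {"userName": user_name, "password": password,
                 "fieldTerminator": column_delimiter}
    for key, value in overrides.items():
        if value is not None:
            merged[key] = value
    merged.setdefault("fieldTerminator", "\t")  # default delimiter: tab
    return merged
```

For example, `effective_mssql_options({"userName": "x"}, user_name="clover")` keeps "clover" as userName and adds the tab as fieldTerminator.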



MySQLDataWriter

This component has one optional input port and one optional output port. It can read data through the input port or from a file. If the input port is not connected to any other component, the data must be contained in a file that is specified in the component. If you connect another component to the optional output port, it can serve to log the rejected records and information about errors. Metadata on this error port must have three fields: the number of the incorrect row (integer), the number of the incorrect column (integer), and the error message (string).

This component writes data to databases. It can only be used for the MySQL database system.

First, you need to install the MySQL database client on the computer with CloverGUI. Only then can you use this component. You must specify all of the following: the host name of the database server (Host), the database name (Database), the name of the database table (Database table) you want to work with, your user name (User name), and your access password (Password).

You must also set the Path to mysql utility attribute. It is the path to the mysql.exe or mysql executable. If the directory containing the utility is in your PATH variable, you only need to set this attribute to mysql.exe on Windows or mysql on Linux.

If the component does not receive data through the input port, you must specify the file from which it should get the data (Loader input file). You can select the Column delimiter; the default delimiter is the tab character. But remember that it must not occur inside any record field. You can also specify how many rows of the data file should be ignored when loading into the database (Ignore rows), and you can specify the Path to control script.

You may also want to set a series of parameters for working with MySQL (Parameters). For example, you can set the port number. Each parameter must have the form key=value, or just key (if its value is true). Individual parameters must be separated from each other by a colon, semicolon, or pipe. Note that a colon, semicolon, or pipe can also be part of a parameter value, but in that case the value must be double-quoted.


OracleDataWriter


This component writes data to databases. It can only be used for the Oracle database system.

First, you need to install the Oracle sqlldr database utility on the computer with CloverGUI. Only then can you use this component. You must specify all of the following: the name of the database table (Oracle table) you want to work with, the Transparent Network Substrate name (TNS name), and your user name (User name). Optionally, you can also specify the access password (Password).

Note that you can also specify the names of database table columns (DB column names).

You must also set the Path to sqlldr utility attribute. It is the path to the sqlldr.exe or sqlldr executable. If the directory containing the utility is in your PATH variable, you only need to set this attribute to sqlldr.exe on Windows or sqlldr on Linux.


Of course, you must decide what should be done with the database table. You have four options: Insert, Append, Replace, Truncate. This property is set by the Append attribute; its default value is Append.

You can also specify the Path to control script attribute. If you do not specify your own control script, the default one will be used.

You can also log the process of loading data to the database. This component has no output port for that purpose, but you can specify the file to which the log should be written (Log file name); its default name is loader.log. You can also specify the file to which incorrect records should be written (Bad file name); its default name is loader.bad. And you can set the Discard file name attribute, naming the file for records that did not meet the desired criteria; its default name is loader.dis.


PostgreSQLDataWriter

This component has one optional input port. It can read data through the input port or from a file. If the input port is not connected to any other component, the data must be contained in a file that is specified in the component.

This component writes data to databases. It can only be used with the PostgreSQL database system.

First, you need to install the PostgreSQL database client on the computer with CloverGUI. Only then can you use this component. You must specify all of the following: the host name of the database server (Host), the database name (Database), the name of the database table you want to work with (Database table) and your user name (User name).

You must also set the Path to psql utility attribute. It is the path to the psql.exe or psql executable. If the directory containing the utility is in your PATH variable, you only need to set this attribute to psql.exe on Windows or psql on Linux.

If the component does not receive data through the input port, you must specify the file from which it should read the data (Loader input file). You can select the Column delimiter. The default delimiter is the tab character. Remember that the delimiter must not occur inside any record field. You can also specify how many rows of the data file should be ignored (Ignore rows), and you can specify the Path to control script.
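The following sketch shows why the delimiter must not occur inside a field, and what Ignore rows does. It writes records in a tab-delimited form that such a loader input file might use. The helper names are hypothetical, and the component's own file format may differ, for example in how it quotes or escapes values.

```python
def to_loader_lines(records, delimiter="\t"):
    """Join each record's fields with the delimiter, refusing ambiguous fields."""
    lines = []
    for number, record in enumerate(records):
        fields = [str(value) for value in record]
        for field in fields:
            if delimiter in field:
                # The loader would split this field into two columns.
                raise ValueError(f"record {number}: field {field!r} contains the delimiter")
        lines.append(delimiter.join(fields))
    return lines


def skip_rows(lines, ignore_rows=0):
    """Drop the first ignore_rows lines, as the Ignore rows attribute describes."""
    return lines[ignore_rows:]
```

For instance, a field such as `"Smith\tJr."` is refused, because with the default tab delimiter it would be loaded as two columns.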

You may also want to set a series of parameters for working with PostgreSQL (Parameters), for example the port number. Each parameter must have the form key=value, or just key if its value is true. Individual parameters must be separated from each other by a colon, semicolon or pipe. A colon, semicolon or pipe can also be part of a parameter value, but in that case the value must be enclosed in double quotes.



Advanced Writers

The components described above can write data to files of various types and to databases. However, some information is not stored in either of these two resources, or it is stored in files with a more complex structure. CloverGUI offers you three additional advanced writers: XMLWriter, JMSWriter and LDAPWriter. XMLWriter creates XML files, JMSWriter sends JMS (Java Message Service) messages and LDAPWriter puts information into LDAP directories.

XMLWriter

This component writes data to an XML file with a nested tree structure. It can have multiple input ports, each supplying data for a different level of the resulting file. It uses SAX technology.

If you select this component, you must specify to which file the outgoing data should be written (File URL).


When specifying the output file name, you can use a pattern containing dollar signs, each of which stands for one digit from 0 to 9. If you want to split the output into several files, you can do it with the help of these dollar signs. If you do not limit the output file size, they are not necessary. For example, if you set the output file name to filename_$$.dat, the files filename_00.dat, filename_01.dat, etc. will be created as needed when the graph runs.
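Here is a minimal sketch of how a name pattern with dollar-sign placeholders turns into concrete file names, assuming each dollar sign is one digit and the digits are zero-padded as in the filename_00.dat example. The function name is hypothetical. Only the first contiguous run of dollar signs is handled, which is a simplifying assumption.

```python
import re


def output_file_name(pattern, index):
    """Replace the run of '$' placeholders in pattern with a zero-padded index."""
    match = re.search(r"\$+", pattern)
    if match is None:
        return pattern                 # no placeholders: a single output file
    width = len(match.group())         # one digit (0-9) per dollar sign
    if not 0 <= index < 10 ** width:
        raise ValueError(f"index {index} does not fit into {width} digit(s)")
    return pattern[:match.start()] + str(index).zfill(width) + pattern[match.end():]
```

With two dollar signs, at most 100 files (00 to 99) can be named this way.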

You can set the maximum number of mappings that can be written to one output file (Mappings per file). By default, all mappings are written to one file.

You can also limit the total number of written mappings by specifying Max number of mappings. It is unlimited by default.

If you want to skip some mappings at the beginning, you can set the Number of skipped mappings attribute. Its default value is 0.
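The three limits above (Number of skipped mappings, Max number of mappings, Mappings per file) decide which mappings end up in which output file. The sketch below assumes that skipping happens before the maximum is applied. The manual does not state this order, and the function name is hypothetical.

```python
def plan_output(total_mappings, mappings_per_file=None, max_mappings=None, skip=0):
    """Return, per output file, the indices of the mappings written to it."""
    selected = list(range(skip, total_mappings))   # Number of skipped mappings
    if max_mappings is not None:
        selected = selected[:max_mappings]         # Max number of mappings
    if not selected:
        return []
    if mappings_per_file is None:
        return [selected]                          # default: everything in one file
    return [selected[i:i + mappings_per_file]      # Mappings per file
            for i in range(0, len(selected), mappings_per_file)]
```

For example, with 10 mappings, 2 skipped, at most 7 written and 4 per file, the sketch writes mappings 2 to 5 to the first file and 6 to 8 to the second.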

You can set the Whole output to single line attribute to true. In such a case, the whole output XML file will be written to a single line. By default, this attribute is set to false.

By default, every output file uses the following tag as its root element: <root>. If you want to use another root element, you can specify the Name of root element in output XML file attribute. The root element is used only if more than one mapping is written to one output file.

If you do not want to use any root element, you can switch the Use root element attribute to false. Remember that an output file without a root element is not a valid XML file.

If you use a root element, you can specify Default namespace for root element and further namespaces (Namespaces for root element). The Default namespace for root element is the URI of a namespace. The Namespaces for root element attribute has the following form:

prefix1="URI1";...;prefixN="URIN"

Only if you use a root element can you also specify a DTD. This is done by setting the following two attributes: DTD public Id and DTD system Id. The resulting output XML file will then contain the following DTD declaration:

<!DOCTYPE [rootElement] PUBLIC "[dtdPublicId]" "[dtdSystemId]">


If you use a root element, you may also want to define an XSD schema location. Remember that in this case one of the namespaces must have the xsi prefix, for example: xsi="http://www.w3.org/2001/XMLSchema-instance"
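The root-element attributes above combine into a DOCTYPE line and a root start tag. The sketch below parses the prefix1="URI1";...;prefixN="URIN" form into xmlns declarations. It assumes that the schema location ends up in an xsi:schemaLocation attribute, which in XML Schema holds namespace/location pairs, and that this is why an xsi prefix must be declared. That mapping, the function names and the sample values are our assumptions, not the component's documented output.

```python
def parse_namespaces(spec):
    """Parse 'prefix1="URI1";...;prefixN="URIN"' into {prefix: URI}."""
    namespaces = {}
    for part in spec.split(";"):
        part = part.strip()
        if not part:
            continue
        prefix, _, uri = part.partition("=")
        namespaces[prefix.strip()] = uri.strip().strip('"')
    return namespaces


def root_prolog(root="root", default_ns=None, namespaces="",
                dtd_public_id=None, dtd_system_id=None, schema_location=None):
    """Return the DOCTYPE line (if any) followed by the root start tag."""
    declared = parse_namespaces(namespaces)
    lines = []
    if dtd_public_id and dtd_system_id:
        lines.append(f'<!DOCTYPE {root} PUBLIC "{dtd_public_id}" "{dtd_system_id}">')
    attributes = []
    if default_ns:
        attributes.append(f'xmlns="{default_ns}"')       # Default namespace
    for prefix, uri in declared.items():
        attributes.append(f'xmlns:{prefix}="{uri}"')     # Namespaces for root element
    if schema_location:
        if "xsi" not in declared:
            raise ValueError("a schema location needs a namespace with the xsi prefix")
        attributes.append(f'xsi:schemaLocation="{schema_location}"')
    lines.append("<" + " ".join([root] + attributes) + ">")
    return "\n".join(lines)
```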

You must define a mapping of the data flows incoming through the input ports to the output XML file(s). The number of input ports is not fixed. It depends on the Mapping of ports to XML structure attribute. The tags in both the mapping and the output file(s) have the form <element> (in this case, the Default namespace is used) or <someprefix:element> (if a namespace corresponding to someprefix is specified).

You may also want to set the character set that should be used when writing data to the output file(s) (Charset).



Because an XML file is a nested tree of tag pairs, each enclosing either other tag pairs or text representing data, you must define the mapping of the input data flows to the resulting output XML file in the same nested way. Each mapping is expressed by a pair of Mapping tags.

• If several input data flows come in through the input ports, you must decide which of them represent parents and which represent children.

• Then you must decide whether a given parent can have multiple children of a given type or at most one. For example, one customer can place many orders, but each order is placed by exactly one customer.

• To map the input data flows to the output XML file, you must link them in parent-child pairs in the following way:

• First, decide whether a parent can have multiple children of a given type or at most one.

• If the parent can have multiple children, define the following mapping structure for the children:

<Mapping element="ctagA" inPort="noA" key="cAA" keyToParent="cAB" fieldsAs="coptA" fieldsAsExcept="cAD" fieldsIgnore="cAE">

Some other mapping can be here. Its structure must be similar to this one.

</Mapping>

(Here, ctagA is the tag of a child, noA is the number of the input port, cAA is a key expression (a sequence of field names separated by semicolons), cAB is the keyToParent expression, coptA is an option (either elements or attributes), and cAD and cAE are expressions of the child (sequences of field names separated by semicolons).)

(For parent, there would be ptagB, pBA, poptB, etc. See below.)

In this case, one parent can have multiple children, which is why the keyToParent expression is used here. keyToParent is a key of the child whose values must equal the values of a key in the parent. It does not matter whether the key fields in the parent and the children have the same names. Only their values must be the same.

The nested tree structure of a parent and its children then looks like this:

<Mapping element="ptagB" inPort="noB" key="pBA" theotherattributesB>

<Mapping element="ctagC" inPort="noC" key="cCA" keyToParent="cCB(pBA)" theotherattributesC>

(Deeper levels of the XML file are expressed in a similar way.)


</Mapping>

</Mapping>

Note that the values of pBA in the key attribute of the parent must equal the values of cCB in keyToParent="cCB(pBA)", which is the key in the children.

If you want to create two different kinds of children of one parent at the same level, you must create a series of tag pairs like this:

<Mapping element="ctagD" inPort="noD" key="cDA" keyToParent="cDB(pBA)" theotherattributesD>


</Mapping>

<Mapping element="ctagE" inPort="noE" key="cEA" keyToParent="cEB(pBA)" theotherattributesE>


</Mapping>

In other words, the parent and its children share a common field or group of fields, named pBA in the parent and cDB or cEB in the children. The names may differ, but the values must be the same.

The keyToParent="cDB(pBA)" expression means the following: at the children level there is a cDB key that has the same values as the pBA key in the parent. The keys do not need to have the same names, but they must have the same values.

Here, ptagB is a tag of the parent and ctagC, ctagD and ctagE are tags of the children.

• If a parent can have at most one child, define the following mapping structure for this child:

<Mapping element="ctagF" inPort="noF" key="cFA" keyFromParent="pBA(cFA)" fieldsAs="coptF" fieldsAsExcept="cFD" fieldsIgnore="cFE">

Some other mapping can be here. Its structure must be similar to this one.

</Mapping>

In this case, one parent can have at most one child, which is why the keyFromParent expression is used here.

The keyFromParent="pBA(cFA)" expression means the following: at the parent level there is a pBA key that has the same values as the cFA key in the child. The keys do not need to have the same names, but they must have the same values.

The nested tree structure of the parent and its child then looks like this:

<Mapping element="ptagB" inPort="noB" key="pBA" theotherattributesB>

<Mapping element="ctagF" inPort="noF" key="cFA" keyFromParent="pBA(cFA)" theotherattributesF>

(Deeper levels of the XML file are expressed in a similar way.)

</Mapping>


</Mapping>

If you want to create two different unique children of one parent at the same level of the XML file, you must create a series of tag pairs like this:

<Mapping element="ctagF" inPort="noF" key="cFA" keyFromParent="pBA(cFA)" theotherattributesF>

</Mapping>

<Mapping element="ctagG" inPort="noG" key="cGA" keyFromParent="pBA(cGA)" theotherattributesG>

</Mapping>

Note that the values of cGA in the key of the child must equal the values of pBA in keyFromParent="pBA(cGA)" (pBA is the key of the parent).

Finally, you can also combine unique children with multiple children at the same level of the XML file:

<Mapping element="ptagB" inPort="noB" key="pBA" theotherattributesB>

<Mapping element="ctagH" inPort="noH" key="cHA" keyFromParent="pBA(cHA)" theotherattributesH>

</Mapping>

<Mapping element="ctagJ" inPort="noJ" key="cJA" keyToParent="cJB(pBA)" theotherattributesJ>

</Mapping>

</Mapping>

• This way you can define mapping for parent and its children or parent and its child.

Now we will describe what the fieldsAs, fieldsAsExcept and fieldsIgnore expressions mean.

When you use this component, you must also decide whether field values should be written as elements or as attributes. If you want them to be attributes, set fieldsAs="attributes". Otherwise, the value is elements by default. These options (attributes or elements) are the placeholders such as coptA in fieldsAs="coptA" or coptF in fieldsAs="coptF". With fieldsAsExcept="cYZ" you specify a list of fields that should be processed the other way: if fieldsAs is set to elements, all of the fields listed in cYZ are processed as attributes, and vice versa.

If you do not want to save some fields, regardless of whether they belong to children or parents (they may already be written elsewhere, in the parents or the children), you can list them, separated by semicolons and enclosed in double quotes, as the value of the fieldsIgnore attribute. This way you can hide the listed fields of the selected element (parent or child/children).
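Here is a small sketch of how fieldsAs, fieldsAsExcept and fieldsIgnore could decide where each field of one record goes, using Python's standard xml.etree.ElementTree. This is an illustration, not XMLWriter's code. The function name and the sample fields are hypothetical, and the semicolon-separated lists from the mapping are passed as Python tuples.

```python
import xml.etree.ElementTree as ET


def record_to_element(tag, record, fields_as="elements",
                      fields_as_except=(), fields_ignore=()):
    """Render one record as an XML element following the three options."""
    if fields_as not in ("elements", "attributes"):
        raise ValueError("fieldsAs must be 'elements' or 'attributes'")
    element = ET.Element(tag)
    for name, value in record.items():
        if name in fields_ignore:
            continue                            # fieldsIgnore: not written at all
        # fieldsAsExcept flips the default choice for the listed fields.
        as_attribute = (fields_as == "attributes") != (name in fields_as_except)
        if as_attribute:
            element.set(name, str(value))
        else:
            ET.SubElement(element, name).text = str(value)
    return element
```

Both `fields_as="elements"` with `id` as the exception and `fields_as="attributes"` with `name` as the exception produce `<customers id="1"><name>Smith</name></customers>` for the record `{"id": 1, "name": "Smith"}`.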

Note also that if you have customers and orders among your inputs, you can set element="customers" and element="orders". These are only names, and you can change them to any other names. With the names above, the output XML file will contain the <customers>, </customers>, <orders> and </orders> tags.

Ports are numbered starting from 0, so the first input port is inPort="0".

• As you can see above, if you have the following two expressions:


key="pAB" ... (in the parent),

key="cCD" keyFromParent="pAB(cCD)" (in the child),

keyFromParent contains a sequence of field names from the parent that are contained in the parent key, and the values of these parent fields equal the values of the fields in the child key. Thus, pAB and cCD are different sequences of field names (from the parent and the child, respectively), but their values are the same.

In this case, the parent can have only one child of this type (with the same ctagC).

• As you can see above, if you have the following expressions:

key="pAB" ... (in the parent)

key="cCD" keyToParent="cCD(pAB)" (in the child)

keyToParent contains a sequence of field names from the child key, and the values of these child fields equal the values of the fields in the parent key. Thus, pAB and cCD are different sequences of field names (from the parent and the child, respectively), but their values are the same.

In this case, the parent can have multiple children of this type (with the same ctagC).
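Both key relations come down to an equality join between a parent field and a child field. The difference is cardinality: keyToParent allows many children per parent, while keyFromParent allows at most one. The sketch below shows this with customers and orders, roughly corresponding to keyToParent="custid(id)". It is not XMLWriter's algorithm, the function and field names are hypothetical, and every field is written as an attribute to keep it short.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def nest(parents, children, parent_key, child_key,
         parent_tag, child_tag, at_most_one=False):
    """Nest child records under the parent whose key value they share.

    at_most_one=False mimics keyToParent (many children per parent);
    at_most_one=True mimics keyFromParent (at most one child per parent).
    """
    children_by_key = defaultdict(list)
    for child in children:
        children_by_key[child[child_key]].append(child)

    root = ET.Element("root")
    for parent in parents:
        parent_el = ET.SubElement(root, parent_tag,
                                  {k: str(v) for k, v in parent.items()})
        matches = children_by_key.get(parent[parent_key], [])
        if at_most_one and len(matches) > 1:
            raise ValueError(f"{parent_tag} {parent[parent_key]!r} has "
                             f"{len(matches)} {child_tag} children, expected at most one")
        for child in matches:
            ET.SubElement(parent_el, child_tag,
                          {k: str(v) for k, v in child.items()})
    return root
```

Each parent appears once, with all children whose key values match its own nested inside it. With at_most_one=True, a second matching child is treated as an error.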

JMSWriter


It receives data records and sends out JMS messages. The component implements the DataRecord2JmsMsg interface.

Once you have created the connection, you only need to specify it in the component (JMS connection).

To create such a connection, you must first specify all of the following: the name of the connection, Initial context factory class, available libraries, URL, Connection factory JNDI name, Destination JNDI, User and Password. (For more information about how to create a JMS connection, see the corresponding section.)

You can also define the processing transformation by specifying one of the following three attributes: Processor class, Processor code or Processor URL. (Processor class is the path and file name of a class, jar or zip file located outside the graph. Processor code is a transformation written in Java and defined in the graph itself. Processor URL is the path and file name of a file written in Java.)

To define the Processor class attribute, click its row. A button appears. Click it to open the Open Type wizard, where you specify the desired type. (See Section "Open Type Wizard" for more information.)

To define the Processor code attribute, click its row. A button appears. Click it to open the Edit value wizard, where you can define the transformation in Java. (See Section "Edit Value Wizard" for more information.)

To define the Processor URL attribute, click its row. A button appears. Click it to open the URL File Dialog, where you can locate the desired file. (See Section "Locating Files with URL File Dialog" for more information.)

You can also specify which character set (Charset) should be used when reading from external sources.



LDAPWriter

This component has one input port and one optional output port. If the optional output port is connected, rejected records are sent to it. Therefore, the metadata on this output port must be the same as the metadata on the input port. However, you cannot propagate the metadata through the component. You must assign it separately.

The component writes information to an LDAP directory. It provides the logic for updating the information in the LDAP directory.

The field names in the input metadata must exactly match the LDAP object attribute names. The Distinguished Name metadata field is required. Because LDAP attributes are multivalued, their values can be separated by a pipe. For this reason, only strings can be handled correctly, so the only supported metadata type is string.
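The rules above for input records (a required Distinguished Name field, string-only fields, pipe-separated multiple values) can be sketched as a conversion from one record to an LDAP entry. The field name dn and the function name are hypothetical. This is not LDAPWriter's code, only an illustration of the stated rules.

```python
def record_to_ldap_entry(record, dn_field="dn"):
    """Turn one input record into (distinguished name, {attribute: [values]})."""
    dn = record.get(dn_field)
    if not dn:
        raise ValueError("the Distinguished Name field is required")
    attributes = {}
    for name, value in record.items():
        if name == dn_field or value is None:
            continue                          # the DN itself and empty fields
        if not isinstance(value, str):
            raise TypeError(f"field {name!r}: only string fields are supported")
        attributes[name] = value.split("|")   # multivalued: pipe-separated
    return dn, attributes
```

A record with `objectClass` set to `top|person` yields an entry whose objectClass attribute has the two values `top` and `person`.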

When you select this component, you must specify the directory to which the information should be written (Ldap URL). It has the form ldap://hostname:portnumber.

You can also decide whether an entry should be added (Add entry) or removed (Remove entry), or whether its attributes should be replaced (Replace attributes) or removed (Remove attributes). This is done by setting the Action attribute.

Sometimes you may need to log in with your user name (User) and password (Password). Your user name can look like this: cn=john.smith,dc=example,dc=com.



Chapter 17. Transformers

These components have both input and output ports. They can combine several data flows with the same metadata, split one data flow into several flows, intersect two data flows (even with different metadata on the inputs) and perform even more complicated transformations of data flows.

Metadata can be propagated through some of these transformers, but not through the components that transform data flows in a more complicated manner. For those components, you must define the output metadata before configuring them.

Some of these transformers use transformations that have been described above.

Copying, Filtering and Sorting

These components have one input port and at least one output port. They copy data to one or more output ports (SimpleCopy, the only one of these components that can change metadata, although it does not have to), filter the data according to some criterion (Dedup or ExtFilter) or sort it according to the selected key (ExtSort).

SimpleCopy

This component has one input port and at least one output port. Whenever you connect an edge to any output port, a new output port is created. It is possible to propagate metadata through this component.

This component does not have to change metadata, but it can if you want. You can transform fixed-length metadata to delimited metadata and vice versa, but the number of fields and their data types must be preserved.

It copies every record that enters the component to all of the output ports. Thus, if you want to create several identical data flows, you can do it with the SimpleCopy component.

When you select this component, you can change the phase in which it runs (Phase), set the visual name displayed on the component (Component name) and enable or disable the component (Enable).

SpeedLimiter

This component has one input port and at least one output port. Whenever you connect an edge to any output port, a new output port is created.

(This component still belongs to the Others group in the palette of components, but it is similar to SimpleCopy except for the Delay attribute. For this reason, it is described in the "Transformers" section of this manual.)

You can propagate metadata through this component. It does not change metadata. However, the output metadata must have nearly the same structure as the input metadata (the number of fields, data types and sizes). The metadata name and even the field names may differ.

It delays the data flow on its way through the component. Every record is delayed by the same amount of time. Thus, the total execution time grows with the number of records going through SpeedLimiter.


When you select this component, you must specify the delay in milliseconds (Delay).
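The effect of Delay is easy to estimate: every record waits the same time, so the delay alone adds at least the record count multiplied by Delay to the run time. The sketch below imitates this with a generator that sleeps before passing each record on. It is an illustration of the behavior described above, not SpeedLimiter's code, and the function names are hypothetical.

```python
import time


def speed_limited(records, delay_ms):
    """Yield records one by one, sleeping delay_ms before passing each on."""
    for record in records:
        time.sleep(delay_ms / 1000.0)
        yield record


def minimum_added_time_ms(record_count, delay_ms):
    """Lower bound on the extra run time caused by the delay alone."""
    return record_count * delay_ms
```

For example, 10,000 records with a Delay of 5 ms add at least 50,000 ms (50 seconds) to the run.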

As with SimpleCopy, you can change the phase in which the component runs (Phase), set the visual name displayed on the component (Component name) and enable or disable the component (Enable).

ExtSort


ExtSort does not change metadata. Metadata can be propagated through the component.

It sorts all of the records according to a selected key (a combination of field names). When you specify the key field names, their order matters.

You can select the key field names using the Edit key dialog. Click the row of the Sort key attribute. A button appears. When you click it, the Edit key dialog opens, where you can select the fields that should form the Sort key attribute.

Select the fields you want and drag and drop them to the pane on the right. (You can also use the arrow buttons.) The topmost field name has the highest sorting priority. The priority decreases gradually towards the end of the list, so the last field name has the lowest sorting priority.

When you click the OK button, the selected field names turn into a sequence of the same field names separated by semicolons, which appears in the Sort key attribute row. (The first field name of the sequence has the highest sorting priority, and the priority decreases towards the end of the sequence.)

When you select this component, you must define both the key for sorting (Sort key) and the sorting order (Sort order, either Ascending or Descending). Click the button that appears in the Sort key attribute row and define the Sort key by clicking the arrow buttons or by dragging and dropping. When you move any item from the Field pane on the left to the Key column of the Key parts pane on the right, the default Sort order (Ascending) appears in the corresponding column.

Figure 17.1. Defining Sort Key and Sort Order

The resulting Sort key is a sequence of field names, each followed by the letter a or d in parentheses, separated by semicolons. It can look like this: FieldM(a);...;FieldN(d).
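Here is a sketch that reads a Sort key in the FieldM(a);...;FieldN(d) form and sorts records by it. It relies on a standard technique: stable sorting by each key field in reverse priority order, so the first field ends up with the highest priority. A field without a letter is treated as ascending, matching the default order. This is not ExtSort's implementation, and the function and field names are hypothetical.

```python
import re


def parse_sort_key(spec):
    """Turn 'FieldM(a);FieldN(d)' into [('FieldM', True), ('FieldN', False)]."""
    keys = []
    for part in spec.split(";"):
        part = part.strip()
        if not part:
            continue
        match = re.fullmatch(r"(\w+)(?:\((a|d)\))?", part)
        if match is None:
            raise ValueError(f"cannot parse sort key part {part!r}")
        keys.append((match.group(1), match.group(2) != "d"))  # True = ascending
    return keys


def ext_sort(records, spec):
    """Sort records by a Sort key string; the first field has top priority."""
    result = list(records)
    # Python's sort is stable (also with reverse=True), so sorting by the
    # lowest-priority field first and the highest-priority field last works.
    for field, ascending in reversed(parse_sort_key(spec)):
        result.sort(key=lambda record: record[field], reverse=not ascending)
    return result
```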


You can specify the buffer capacity for sorting records in memory (Buffer capacity), the directories for the temporary files that should be created (Temp directories URL), the number of temporary files (Number of tapes, an even number greater than two) and the initial capacity of the sorter (Sorter initial capacity, in records). The Number of tapes attribute is set to 6 by default. The sorting order is Ascending by default, and it is sufficient to specify the order by its initial letter. The temporary directories are specified as a list of names separated by semicolons.
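The attributes above suggest an external merge sort: records are sorted in memory in chunks of at most Buffer capacity, each sorted chunk (a run) is spilled to a temporary file, and the runs are merged. The manual does not describe ExtSort's algorithm, so the sketch below is a simplified stand-in. It uses one temporary file per run and a single k-way merge, not a fixed even number of tapes. Records must be JSON-serializable here, which is an assumption of this sketch only.

```python
import heapq
import json
import tempfile


def external_sort(records, key, buffer_capacity):
    """Yield records sorted by key, keeping at most buffer_capacity in memory."""
    run_files, buffer = [], []

    def spill():
        buffer.sort(key=key)                       # sort one run in memory
        run = tempfile.TemporaryFile(mode="w+")
        for record in buffer:
            run.write(json.dumps(record) + "\n")
        run.seek(0)
        run_files.append(run)
        buffer.clear()

    for record in records:
        buffer.append(record)
        if len(buffer) >= buffer_capacity:
            spill()
    if buffer:
        spill()

    # Lazily read every run back and merge them.
    runs = [(json.loads(line) for line in run) for run in run_files]
    try:
        yield from heapq.merge(*runs, key=key)
    finally:
        for run in run_files:
            run.close()
```

A larger buffer means fewer, longer runs and less merging work, at the cost of more memory.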


Dedup

This component has one input port and one or two output ports. The optional second output port receives rejected data if it is connected to another component.

The component does not change metadata. Metadata can be propagated through the component.

It serves to remove duplicate records. You must select a key (a combination of field names) according to which records are considered duplicates. It is very important that the input records be sorted according to the selected key. Otherwise, only duplicates that are adjacent to each other are removed, and a later group with the same key value is treated as a distinct group. Thus, you must sort the records by the selected key before removing duplicates.

When you select this component, you must specify the key for deduplication (Dedup key), a sequence of field names separated by semicolons. You must also decide which duplicate records should remain on the output (Keep, with the options First, Last and Unique), along with the Number of duplicates attribute, which defines how many duplicate records should be sent out. You can also decide whether two or more records whose dedup key fields are null should be considered equal (Equal NULL). This attribute is set to true by default.

If you set the Keep attribute to First and the Number of duplicates to 5, at most five records from the beginning of each group of duplicates will be sent to the first output port. If you set the Keep attribute to Last and the Number of duplicates to 10, at most ten records from the end of each group will be sent to the first output port. If you choose Unique, the Number of duplicates is ignored because only unique records are sent to the first output port. The rejected records are sent to the optional second output port if a component is connected to it. By default, the first records are kept. It is sufficient to specify any of the options by its first letter.
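The Keep, Number of duplicates and Equal NULL options can be sketched with itertools.groupby. Like Dedup, groupby only groups adjacent records with equal keys, which is exactly why the input must be sorted. The function name and the (kept, rejected) return shape are hypothetical. We also assume, as the text suggests, that the Number of duplicates applies to each group of duplicates.

```python
from itertools import groupby


def dedup(records, key_fields, keep="first", number=1, equal_null=True):
    """Return (kept, rejected), like Dedup's first and second output ports.

    Assumes records are already sorted by key_fields, as Dedup requires.
    """
    mode = keep.lower()[:1]          # "First"/"f", "Last"/"l", "Unique"/"u"
    if mode not in ("f", "l", "u"):
        raise ValueError(f"unknown Keep option: {keep!r}")
    if number < 1:
        raise ValueError("Number of duplicates must be at least 1")

    def group_key(record):
        values = tuple(record[f] for f in key_fields)
        if not equal_null and any(v is None for v in values):
            return object()          # fresh sentinel: never equal to another key
        return values

    kept, rejected = [], []
    for _, group in groupby(records, key=group_key):    # adjacent records only
        group = list(group)
        if mode == "f":
            kept += group[:number]
            rejected += group[number:]
        elif mode == "l":
            kept += group[-number:]
            rejected += group[:-number]
        else:                        # unique: keep only groups of size one
            (kept if len(group) == 1 else rejected).extend(group)
    return kept, rejected
```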


ExtFilter

This component has one input port and one or two output ports. The optional second output port receives rejected data if it is connected to another component.

It filters records according to a logical expression. It sends all records that satisfy the filter expression to the first output port, and all rejected records to the second port if it is connected.

This component does not change metadata. Metadata can be propagated through the component.

When you select this component, you must specify the expression according to which the filtering should be performed (Filter expression). The filter expression consists of subexpressions connected with logical operators (logical and, logical or) and parentheses that express precedence. In these subexpressions you can use a set of functions and a set of comparison operators (greater than, greater than or equal to, less than, less than or equal to, equal to, not equal to). The comparison operators can be selected in the Filter editor wizard as mathematical signs (>, >=, <, <=, ==, !=), and their textual abbreviations (.gt., .ge., .lt., .le., .eq., .ne.) can be used as well. Every record field value is referenced by the field name preceded by a dollar sign, for example $employeeid.
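To show that the sign and textual forms of the comparison operators are interchangeable, and how records are split between the two ports, here is a deliberately small sketch. It handles only a single comparison of the form $field operator value. There are no functions, logical operators or parentheses, and it is not Clover's expression evaluator. The helper names, and the use of Python literal syntax for the compared value, are our assumptions.

```python
import ast
import operator
import re

# Both spellings listed above map to the same comparison.
OPERATORS = {
    ">": operator.gt, ".gt.": operator.gt,
    ">=": operator.ge, ".ge.": operator.ge,
    "<": operator.lt, ".lt.": operator.lt,
    "<=": operator.le, ".le.": operator.le,
    "==": operator.eq, ".eq.": operator.eq,
    "!=": operator.ne, ".ne.": operator.ne,
}

# $field, then an operator (two-character signs tried first), then a value.
COMPARISON = re.compile(
    r"\s*\$(\w+)\s*(>=|<=|==|!=|>|<|\.gt\.|\.ge\.|\.lt\.|\.le\.|\.eq\.|\.ne\.)\s*(.+?)\s*"
)


def compile_comparison(text):
    """Turn '$field op value' into a predicate on a record dictionary."""
    match = COMPARISON.fullmatch(text)
    if match is None:
        raise ValueError(f"unsupported expression: {text!r}")
    field, op, literal = match.groups()
    value = ast.literal_eval(literal)   # e.g. 1000 or 'Smith'
    compare = OPERATORS[op]
    return lambda record: compare(record[field], value)


def ext_filter(records, predicate):
    """Split records into (first port, second port) like ExtFilter."""
    accepted, rejected = [], []
    for record in records:
        (accepted if predicate(record) else rejected).append(record)
    return accepted, rejected
```

`$salary .ge. 1000` and `$salary >= 1000` produce the same predicate.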


Concatenating, Gathering and Merging

These components combine the records incoming through various input ports with equivalent metadata and send the result to the output port while preserving the metadata structure.

Concatenate

This component has at least one input port and a common output port. Whenever you connect an edge to any input port, a new input port is created.

All input ports must receive data with the same metadata structure. You do not need to use a single metadata for all input ports, but if you use different metadata, they must have identical structure (the number of fields, field names, data types and sizes). Only the metadata names may differ. The component does not change the metadata structure.

It receives all of the records that enter the component, combines them (if the component has several input ports) and sends the result to the common output port while preserving the metadata on the output port.

First, the component receives all of the records incoming through the first input port and sends them to the common output port. After that, it appends all of the records incoming through the next input port. If the component has more than two input ports, the records are received and sent to the output in the order of the input ports.

If an input port contains no records, it is skipped.

When the last input port is reached and all of its records have been received and sent to the output port, the process terminates.


SimpleGather

This component has at least one input port and a common output port. Whenever you connect an edge to any input port, a new input port is created.

All input ports must receive data with the same metadata structure. You do not need to use a single metadata for all input ports, but if you use different metadata, they must have nearly identical structure (the number of fields and data types). The metadata names and even the field names may differ. The component does not change the metadata structure.


It receives all of the records that enter the component, combines them (if the component has several input ports) and sends the result to the common output port while preserving the metadata on the output port.

First, the component receives one record from the first input port and sends it to the common output port. Then it receives one record from the next input port and sends it as well. If the component has more than two input ports, records are received from the input ports cyclically, one at a time, going through the input ports in ascending order of their numbers, and are sent to the common output port.

When the last input port is reached, the process returns to the first input port or to the first input port through whichthe component can still receive some records.

If an input port contains no more records, it is skipped.

When the component receives the last record and sends it to the common output port, the process terminates.
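The difference between Concatenate and SimpleGather, as described in this manual, is only the order in which records leave the component. Concatenate takes each port completely, one port after another. SimpleGather takes one record from each port in turn, skipping exhausted ports. Here is a sketch of both orderings, with input ports modeled as Python lists and hypothetical function names.

```python
from itertools import chain


def concatenate(*ports):
    """All records of port 0, then all of port 1, and so on."""
    return list(chain(*ports))


def simple_gather(*ports):
    """One record per port in turn, cycling until every port is exhausted."""
    iterators = [iter(port) for port in ports]
    result = []
    while iterators:
        still_open = []
        for it in iterators:
            try:
                result.append(next(it))
            except StopIteration:
                continue             # exhausted port is skipped from now on
            still_open.append(it)
        iterators = still_open
    return result
```

With the ports `[1, 2, 3]`, `[]` and `["a", "b"]`, concatenate returns `[1, 2, 3, "a", "b"]`, while simple_gather returns `[1, "a", 2, "b", 3]`.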


Merge

This component has at least two input ports and a common output port.

All input ports must receive data with the same metadata structure. You do not need to use a single metadata for all input ports, but if you use different metadata, they must have identical structure (the number of fields, field names, data types and sizes). Only the metadata names may differ. The component does not change the metadata structure.

It receives all of the records that enter the component, merges them and sends the result to the common output port while preserving the metadata on the output port.

The records entering each input port must be sorted according to the same key (Merge key). The component sends them to the common output port sorted according to that key. The order of the selected field names matters.

If an input port contains no records, it is skipped.

When the component reads the last record and sends it to the common output port, the process terminates.
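Merging inputs that are already sorted by the same key is what Python's heapq.merge does, so it serves as a compact model of the behavior described above. This is not Merge's implementation. The function name and the field names in the example are hypothetical. As with the component, if an input is not sorted by the key, the output is not sorted either.

```python
import heapq
from operator import itemgetter


def merge_ports(merge_key, *ports):
    """Merge record lists that are each sorted by merge_key (a list of field names)."""
    key = itemgetter(*merge_key)   # field order matters: the first field compares first
    return list(heapq.merge(*ports, key=key))
```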



Partitioning and Intersection

These components receive the data that enter through one or two input ports and distribute the flow among several outputs (the Partition and DataIntersection components).

Partition


This component evaluates all incoming records according to a specified criterion, splits the data flow and distributes different records among different output ports. To split the data flow, you can define ranges of field values, exact values, etc.

For example, the records in which a given date is before the specified day can be sent to one output port and the other records to another one. You can also split the incoming data flow according to the alphabetical order of names, etc. You can specify any combination of field value ranges or other definitions expressed by Partition key and Ranges, or a partitioning defined by Partition class, Partition code or Partition URL.

Remember that this component does not require any partitioning transformation if Ranges or Partition key is defined.

The component does not change metadata. It is possible to propagate metadata through this component.

When you select this component, you must define how the incoming data flow should be split and how the records should be distributed among the output ports.

You can do it in the following ways:

• If you define any of the three attributes that specify how the incoming data flow should be split and the records distributed among the output ports (the Partition class, Partition or Partition URL attribute), that partitioning transformation will be used. In this case, you do not need to define the Partition key and/or Ranges attributes. If you define them, they will be ignored. (Partition class is the path and file name of a class, jar or zip file located outside the graph. Partition is a transformation defined in the graph itself in Java or in the internal Clover transformation language. Partition URL is the path and file name of a file written in Java or in the internal Clover transformation language.)

• If you do not define any partitioning transformation but you define both the Partition key (a sequence of fields separated by semicolons) and the Ranges (ranges of values of the key fields) attributes, the records will be distributed among the output ports according to the values of the key fields. Records whose key values fall into the same range are sent to the same output port, and the number of the output port corresponds to the order of the range among the defined ranges. Each range is defined as a pair of boundary values separated by a comma and enclosed in brackets: a round parenthesis means that the boundary value is excluded from the range, and an angle bracket means that it is included.

• If you do not define any partitioning transformation but (at the same time) you only define the Partition keywithout defining the Ranges, hash value will be calculated and used to part data flow and distribute the recordsamong all of the output ports.

• If you do not define any partitioning transformation but (at the same time) you only define the Ranges withoutdefining the Partition key, RoundRobin algorithm will be used to part data flow and distribute the recordsamong all of the output ports.
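The three key-based and range-based modes above can be modelled roughly as follows. This is a minimal Java sketch built on stated assumptions. The class PartitionSketch and its methods are invented for illustration. So are the use of Objects.hash as the hash function and the -1 result for a value outside all ranges. None of these are the CloverETL API or its actual hash function.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Hypothetical model of the key/range partitioning modes, not the CloverETL API.
class PartitionSketch {

    // One range such as "<0,10)": each boundary is either included or excluded.
    static final class Range {
        final int low;
        final int high;
        final boolean lowIncluded;
        final boolean highIncluded;

        Range(int low, boolean lowIncluded, int high, boolean highIncluded) {
            this.low = low;
            this.lowIncluded = lowIncluded;
            this.high = high;
            this.highIncluded = highIncluded;
        }

        boolean contains(int value) {
            boolean aboveLow = lowIncluded ? value >= low : value > low;
            boolean belowHigh = highIncluded ? value <= high : value < high;
            return aboveLow && belowHigh;
        }
    }

    // Partition key + Ranges: the port number is the position of the matching range.
    // Returns -1 if no range matches (the component's behaviour there is not modelled).
    static int byRange(int keyValue, List<Range> ranges) {
        for (int port = 0; port < ranges.size(); port++) {
            if (ranges.get(port).contains(keyValue)) {
                return port;
            }
        }
        return -1;
    }

    // Partition key only: hash the key field values and map the hash onto the ports.
    static int byHash(List<?> keyValues, int portCount) {
        return Math.floorMod(Objects.hash(keyValues.toArray()), portCount);
    }

    // Ranges only: round robin, record number i goes to port i mod portCount.
    static int roundRobin(long recordNumber, int portCount) {
        return (int) (recordNumber % portCount);
    }

    public static void main(String[] args) {
        List<Range> ranges = Arrays.asList(
                new Range(0, true, 10, false),   // <0,10)  -> port 0
                new Range(10, true, 20, true));  // <10,20> -> port 1
        System.out.println(byRange(10, ranges));                    // 1
        System.out.println(byHash(Arrays.asList("Smith", 42), 3));  // a port in 0..2
        System.out.println(roundRobin(7, 3));                       // 1
    }
}
```

The boundary flags mirror the parenthesis/angle-bracket convention described above. The exact syntax accepted by the Ranges editor is not modelled.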

Transformers


Figure 17.2. Ranges Editor


To define the Partition class attribute, click its item row and then the button that appears there. An Open Type wizard opens, in which you must specify the desired type. (See Section "Open Type Wizard" for more information.)

To define the Partition attribute, click its item row and then the button that appears there. A Transform editor opens, in which you can write the transformation in Clover transformation language or in Java.

To define the Partition URL attribute, click its item row and then the button that appears there. A URL File Dialog opens, in which you can locate the desired file. (See Section "Locating Files with URL File Dialog" for more information.)

You may also want to set the character set (Charset) that should be used when reading the transformation definition from an external Partition URL.

You may also decide whether locale-specific rules should be used (Use internationalization and Locale attributes). The first attribute is set to false by default and the second one to the system value.



Transformations

Here is an example of how the Source tab for defining the transformation looks.

Figure 17.3. Source Tab of the Transform Editor in the Partition Component

If you want to define a partitioning transformation in Clover transformation language, regardless of whether it is contained in the graph itself (Partition) or in a file outside the graph (Partition URL), you must use the getOutputPort() function. This function is required and returns an integer. It does not transform, change or remove records; it only assigns a number to each record. This number is the output port to which the record should be sent.

The function can be defined in the Transform editor using if or switch statements, or in any other way of selecting output ports:

function getOutputPort() {
    if (condition0) return 0;
    else if (condition1) return 1;
    ...
    else if (conditionN) return N;
}

The template above uses if statements.

You can also use a switch statement to select the output port numbers:

function getOutputPort() {
    switch (expression) {
        case (expression0): return 0;
        case (expression1): return 1;
        ...
        case (expressionN): return N;
        [default: return N+1;]
    }
}

You must define the conditions or expressions mentioned above; they select the output port for each individual record. Remember that the ports are numbered starting from 0.

For example, you can define the following partitioning transformation:

function getOutputPort() {
    if ($temperature > 0) return 0;
    else return 1;
}

In addition to this required function, you can define another function: init(). If you want to declare and initialize variables, or do anything else that should happen at the beginning of data processing by the component, do it within the init() function. This function is called only once, at the beginning. Unlike init(), the required getOutputPort() function is called many times (once for each incoming record), always after init().
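The following Java sketch models the call order described above for the temperature example. init() runs once. getOutputPort() runs once per record and only chooses a port, while the records themselves pass through unchanged. The class PartitionRouting and its methods are hypothetical. The sketch does not describe how Clover actually invokes a CTL transformation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the init() / getOutputPort() call order; not CloverETL code.
class PartitionRouting {

    static int initCalls = 0;

    static void init() {
        initCalls++;  // variables would be declared and initialized here
    }

    // Same rule as the CTL example: positive temperatures to port 0, the rest to port 1.
    static int getOutputPort(double temperature) {
        if (temperature > 0) return 0;
        else return 1;
    }

    // Distributes unchanged records into one list per output port.
    static List<List<Double>> route(List<Double> temperatures, int portCount) {
        List<List<Double>> ports = new ArrayList<>();
        for (int i = 0; i < portCount; i++) {
            ports.add(new ArrayList<>());
        }
        init();                                   // called once, before any record
        for (double t : temperatures) {
            ports.get(getOutputPort(t)).add(t);   // called once for every record
        }
        return ports;
    }

    public static void main(String[] args) {
        System.out.println(route(Arrays.asList(12.5, -3.0, 0.0, 7.0), 2));
        // [[12.5, 7.0], [-3.0, 0.0]]
    }
}
```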

You can open the transformation definition as a third (or further) tab of the graph, in addition to the Graph and Source tabs of the Graph Editor, by clicking the corresponding button in the upper right corner of the tab.

Once you have written your transformation, you can also convert it to Java code by clicking the corresponding button in the upper right corner of the tab.

DataIntersection

This component has two input ports and three output ports.

The metadata structure of incoming data records may differ between the two input ports; records arriving through different input ports may even have a different number of fields. However, the key fields (Join key and Slave override key) must be comparable.

The component does not change the metadata structure on the first and third output ports, but metadata cannot be propagated through this component to these ports. Metadata on the second output port may be different. You must first create metadata on the second output port according to the desired result or select prepared metadata. Only then can you define the transformation.

When you select this component, you must specify the key according to which the records from both input ports are compared and intersected (Join key). It is defined as a sequence of field names separated by semicolons.

Records that arrive only through the first input port are sent to the first output port; records that arrive only through the second input port are sent to the third output port. Records whose key values occur on both the first and the second input port are sent to the second output port. Remember that the records are compared only by the fields that make up the Join key; it does not matter whether the other fields differ.

You do not need to use the same key field names on both input ports; you can use different field names for the second input port (Slave override key). To do this, click the Slave override key attribute row. An Override key wizard opens with two panes: the Slave fields pane on the left and the Master key pane on the right. Select a field in the Master key pane, hold down the left mouse button, drag the field to the Slave fields pane and release the button. This way you can assign each field of the Join key on the first input port to the corresponding field on the second input port. You can also use the buttons of the wizard to pair the fields automatically (Auto pairing) or to reset some or all of the assignments you have made (reset and reset all buttons).


You must also decide whether records with duplicate key values should be used (Allow key duplicates). This attribute is set to false by default, which means that duplicates are not allowed: such records are discarded and only the last of them is sent to the transform() function.
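The routing rules of DataIntersection, together with the default handling of duplicate keys, can be sketched in Java as follows. This is a hypothetical model, not the component's implementation. Records are represented as maps, and the field names id, name, custId and city are made up. The string built for the second port merely stands in for a real transform().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of DataIntersection routing, not the CloverETL implementation.
// Input 0 is matched on joinKey and input 1 on slaveKey, mirroring a Join key
// paired with a differently named Slave override key.
class IntersectionSketch {

    // Builds a record from alternating field names and values.
    static Map<String, String> rec(String... namesAndValues) {
        Map<String, String> r = new HashMap<>();
        for (int i = 0; i + 1 < namesAndValues.length; i += 2) {
            r.put(namesAndValues[i], namesAndValues[i + 1]);
        }
        return r;
    }

    // Allow key duplicates = false: a later record with the same key replaces an earlier one.
    static Map<String, Map<String, String>> lastPerKey(List<Map<String, String>> in, String keyField) {
        Map<String, Map<String, String>> byKey = new LinkedHashMap<>();
        for (Map<String, String> r : in) {
            byKey.put(r.get(keyField), r);
        }
        return byKey;
    }

    // Returns one list per output port. Ports 0 and 2 would carry the original records
    // unchanged; only their keys are listed here to keep the output short.
    static List<List<String>> intersect(List<Map<String, String>> first, String joinKey,
                                        List<Map<String, String>> second, String slaveKey) {
        Map<String, Map<String, String>> masters = lastPerKey(first, joinKey);
        Map<String, Map<String, String>> slaves = lastPerKey(second, slaveKey);
        List<String> port0 = new ArrayList<>();
        List<String> port1 = new ArrayList<>();
        List<String> port2 = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> e : masters.entrySet()) {
            Map<String, String> slave = slaves.get(e.getKey());
            if (slave == null) {
                port0.add(e.getKey());                     // key only on input 0
            } else {
                // stand-in for transform(): a new record built from both inputs
                port1.add(e.getKey() + ":" + e.getValue().get("name") + "@" + slave.get("city"));
            }
        }
        for (String key : slaves.keySet()) {
            if (!masters.containsKey(key)) {
                port2.add(key);                            // key only on input 1
            }
        }
        return Arrays.asList(port0, port1, port2);
    }

    public static void main(String[] args) {
        List<Map<String, String>> customers = Arrays.asList(
                rec("id", "1", "name", "Ann"), rec("id", "2", "name", "Bob"));
        List<Map<String, String>> addresses = Arrays.asList(
                rec("custId", "2", "city", "Brno"), rec("custId", "2", "city", "Praha"),
                rec("custId", "3", "city", "Plzen"));
        System.out.println(intersect(customers, "id", addresses, "custId"));
        // [[1], [2:Bob@Praha], [3]]
    }
}
```

Key 2 occurs twice on the second input, and only its last record (Praha) reaches the mapping, as with the default Allow key duplicates setting.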

You must also specify a transformation by defining one of the following three attributes: Transform class, Transform or Transform URL. (Transform class is a path and a file name of a class, jar or zip file located outside the graph. Transform is the transformation defined in the graph itself using the Java language or the internal Clover transformation language. Transform URL is a path and a file name of a file written in Java or in the internal Clover transformation language.)

To define the Transform class attribute, click its item row and then the button that appears there. An Open Type wizard opens, in which you must specify the desired type. (See Section "Open Type Wizard" for more information.)

To define the Transform attribute, click its item row and then the button that appears there. A Transform editor opens, in which you can define the transformation as a simple transformation mapping, or write it in Clover transformation language or in Java.

To define the Transform URL attribute, click its item row and then the button that appears there. A URL File Dialog opens, in which you can locate the desired file. (For more information see Section "Locating Files with URL File Dialog" above.)

You must also decide whether two or more records with null values in some key fields should be considered equal (Equal NULL). This attribute is set to true by default.
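As a rough illustration of what Equal NULL changes, the hypothetical Java method below compares two composite keys field by field. It treats null fields as equal only when the flag is set. The same idea applies to the Equal NULL attribute of other components, such as Aggregate. The method itself is an assumption, not CloverETL code.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of an Equal NULL flag in composite key comparison; not CloverETL code.
class NullAwareKey {

    static boolean keysEqual(List<?> a, List<?> b, boolean equalNull) {
        if (a.size() != b.size()) {
            return false;
        }
        for (int i = 0; i < a.size(); i++) {
            Object x = a.get(i);
            Object y = b.get(i);
            if (x == null || y == null) {
                // two nulls match only when equalNull is set; a single null never matches
                if (!(x == null && y == null && equalNull)) {
                    return false;
                }
            } else if (!x.equals(y)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Object> k1 = Arrays.asList("Smith", null);
        List<Object> k2 = Arrays.asList("Smith", null);
        System.out.println(keysEqual(k1, k2, true));   // true
        System.out.println(keysEqual(k1, k2, false));  // false
    }
}
```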

You may also want to set the character set (Charset) that should be used when reading the transformation definition from an external Transform URL.


Transformations


Figure 17.4. Source Tab of the Transform Editor in the DataIntersection Component


If you want to define an intersection in Clover transformation language, regardless of whether it is contained in the graph itself (Transform) or in a file outside the graph (Transform URL), you must use the transform() function. This function is required. It maps the intersected data records to the second output port.

A record arriving through the first input port is sent to the first output port if its Join key value does not occur in the Slave override key of any record arriving through the second input port. A record arriving through the second input port is sent to the third output port if its Slave override key value does not occur in the Join key of any record arriving through the first input port. For the second output port, however, you must create new records from the records with matching Join key and Slave override key values. In other words, you must define a mapping from the metadata on both input ports to the metadata on the second output port. Remember that the ports are numbered starting from 0.

To define the transformation, you must have the output metadata defined.

In addition to this required function, you can define two other functions: init() and finished(). If you want to declare and initialize variables, or do anything else that should happen at the beginning of data processing by the component, do it within the init() function. If you want to free memory or delete temporary files, do it within the finished() function. Each of these functions is called only once: init() at the beginning, finished() at the end. Unlike them, the required transform() function is called many times, always after init() and before finished().



Pure Transformers

These components transform data flowing through them and change metadata on the output ports.

KeyGenerator

This component has one input port and one output port.

It is used before the ApproximativeJoin component. The newly created key serves as the Matching key in the ApproximativeJoin component.

It changes metadata by creating a new field named key and adding it to the end of the outgoing records. You must first create metadata on the component output by hand according to the desired result. These metadata are the same as the metadata on the input, with one added field of the defined data type at the end.

When you select this component, you must specify the field name(s) from which the key should be generated (Matching key). For each selected field you must also decide which characters to use, how many of them, and how they should be processed. The Matching key attribute is a sequence of expressions separated by semicolons. Each expression has the form fieldname [number][parameters].

To specify the properties of the key, click the Matching key attribute row and then the button that appears there. An Edit key wizard opens with three panes: Fields, Key parts and Matching key properties. First, select a field in the Fields pane and click the Right arrow button; the selected field name moves to the Key parts pane. Move all the fields that should generate the key from the Fields pane to the Key parts pane this way. When you click any of the fields in the Key parts pane, its options appear in the Matching key properties pane. There you must decide the following:

You must specify the number of characters from the selected field that should be used in the generated Matching key by typing the desired number in the number of letters to create key text area.

If you want to use only alphabetic or numeric characters in the generated Matching key, check the Alpha/numeric checkbox. You can then decide, by checking the corresponding checkboxes, whether to use only alphabetic characters (marked as the letter a in the parameters sequence), only numeric characters (marked as the letter n) or both of them (an).

By checking the remove blank space and/or strip diacritic checkboxes, you can decide whether white spaces (s) and diacritical marks (d), respectively, should be removed when the key is generated. The result can then consist of standard Latin letters, etc.

If you want to change the case of alphabetic characters in the generated Matching key, check the Case checkbox. Then choose, by checking the corresponding radio button, whether the characters selected from the field should be upper case (u) or lower case (l) in the key.

As a result of these selections, you obtain the sequence of semicolon-separated expressions mentioned above. In each expression, the number of characters to be used in the generated Matching key is used as the number, and the letters mentioned above are used as the parameters.

Suppose you want to use only two characters from the customer field, both alphabetic and numeric, with blank spaces and diacritical marks removed and alphabetic characters converted to upper case. Suppose further that for the order field you want only three alphabetic and numeric upper-case characters. You then obtain the following sequence of expressions: customer 2ansdu; order 3anu.
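A minimal Java sketch of how such an expression could be turned into a key. It assumes the parameters are applied in the order d, s, a/n, u/l, after which the first number characters are taken. The processing order, the handling of values shorter than number, and all names in the code are assumptions. The sketch only illustrates the meaning of the parameter letters.

```java
import java.text.Normalizer;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of the Matching key parameters; not KeyGenerator internals.
class KeyPartSketch {

    static String keyPart(String value, int length, String params) {
        String s = value;
        if (params.contains("d")) {
            // d: decompose accented letters, then drop the combining diacritical marks
            s = Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
        }
        if (params.contains("s")) {
            s = s.replaceAll("\\s", "");                       // s: remove white space
        }
        boolean alpha = params.contains("a");
        boolean numeric = params.contains("n");
        if (alpha || numeric) {                                // a, n or an: keep only these
            StringBuilder kept = new StringBuilder();
            for (char c : s.toCharArray()) {
                if ((alpha && Character.isLetter(c)) || (numeric && Character.isDigit(c))) {
                    kept.append(c);
                }
            }
            s = kept.toString();
        }
        if (params.contains("u")) {
            s = s.toUpperCase(Locale.ROOT);
        } else if (params.contains("l")) {
            s = s.toLowerCase(Locale.ROOT);
        }
        return s.substring(0, Math.min(length, s.length()));   // first "number" characters
    }

    // Parses a Matching key such as "customer 2ansdu; order 3anu" and joins the key parts.
    static String key(String matchingKey, Map<String, String> record) {
        StringBuilder key = new StringBuilder();
        for (String expression : matchingKey.split(";")) {
            String[] parts = expression.trim().split("\\s+");  // {"customer", "2ansdu"}
            String spec = parts[1];
            int digits = 0;
            while (digits < spec.length() && Character.isDigit(spec.charAt(digits))) {
                digits++;
            }
            int length = Integer.parseInt(spec.substring(0, digits));
            key.append(keyPart(record.get(parts[0]), length, spec.substring(digits)));
        }
        return key.toString();
    }

    public static void main(String[] args) {
        Map<String, String> record = new HashMap<>();
        record.put("customer", "Ji\u0159\u00ed Nov\u00e1k");  // Jiri Novak with diacritics
        record.put("order", "a1-b2");
        System.out.println(key("customer 2ansdu; order 3anu", record));  // JIA1B
    }
}
```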


Aggregate


It changes metadata by aggregating groups of records on the input port, applying some of the provided functions to each whole group and creating new records on the output port.

You must first create metadata on the component output according to the desired result or select prepared metadata. Only then can you create the transformation.

You must define the aggregation key (Aggregate key), a sequence of field names separated by semicolons. Records with the same key value form a group to which the functions selected from the provided list are applied. The data do not have to be sorted before this component.

You may want to specify whether the data on the input are sorted (Sorted input). This attribute is set to true by default, so if the input data are not sorted by the Aggregate key, set it to false.

You can also decide whether two or more records with null values in some fields should be considered equal (Equal NULL). This attribute is set to false by default.

You must also specify either the Aggregation mapping or the Old aggregation mapping attribute.

The Old aggregation mapping attribute must be used with release 2.1 of CloverEngine or older, and it must be defined by hand. It also works with newer releases, but its use is now deprecated.

The Aggregation mapping attribute must be used with newer releases of CloverEngine.


When you click the Aggregation mapping attribute row, an Aggregation mapping wizard opens, in which you define both the mapping and the aggregation. The wizard consists of two panes: the Input field pane on the left and the Aggregation mapping pane on the right. Map input fields from the left pane to output field names in the right pane. To do so, select an item in the left pane, hold down the left mouse button, drag it to the Mapping column in the row of the desired output field and release the button. The selected input field then appears in the Mapping column. Map all the desired input fields to the output fields this way. In addition, click a row in the Function column and select a function from the provided list; repeat this until you have defined all the desired functions. These functions are applied to all records of each group and the results are sent to the output.
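The Java sketch below illustrates why the Sorted input attribute matters. With sorted input, a group can be closed as soon as the key changes. Unsorted input requires keeping every group in memory. The functions used here (count, sum and minimum) and the record layout {key, amount} are illustrative assumptions, not the list offered by the wizard or the component's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of Aggregate-style grouping, not the CloverETL implementation.
// Each record is {key, amount}; every output row is "key,count,sum,min".
class AggregateSketch {

    // Sorted input: a group is finished as soon as the key changes,
    // so only the current group is kept in memory.
    static List<String> sorted(List<String[]> in) {
        List<String> out = new ArrayList<>();
        String key = null;
        long count = 0, sum = 0, min = Long.MAX_VALUE;
        for (String[] r : in) {
            if (key != null && !key.equals(r[0])) {   // key changed: emit the group
                out.add(key + "," + count + "," + sum + "," + min);
                count = 0;
                sum = 0;
                min = Long.MAX_VALUE;
            }
            key = r[0];
            long v = Long.parseLong(r[1]);
            count++;
            sum += v;
            min = Math.min(min, v);
        }
        if (key != null) {
            out.add(key + "," + count + "," + sum + "," + min);  // last group
        }
        return out;
    }

    // Unsorted input: every group stays in a map until all records have been read.
    static List<String> unsorted(List<String[]> in) {
        Map<String, long[]> groups = new LinkedHashMap<>();  // key -> {count, sum, min}
        for (String[] r : in) {
            long v = Long.parseLong(r[1]);
            long[] g = groups.computeIfAbsent(r[0], k -> new long[] {0, 0, Long.MAX_VALUE});
            g[0]++;
            g[1] += v;
            g[2] = Math.min(g[2], v);
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, long[]> e : groups.entrySet()) {
            long[] g = e.getValue();
            out.add(e.getKey() + "," + g[0] + "," + g[1] + "," + g[2]);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> input = Arrays.asList(
                new String[] {"A", "3"}, new String[] {"B", "2"}, new String[] {"A", "5"});
        System.out.println(unsorted(input));  // [A,2,8,3, B,1,2,2]
        System.out.println(sorted(input));    // [A,1,3,3, B,1,2,2, A,1,5,5]: A is split
    }
}
```

Treating unsorted data as sorted splits group A in two. This is the risk of leaving Sorted input at its default for unsorted data.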

You may also want to set the character set that should be used in the data flow (Charset).


Reformat


In principle, the component preserves the number of records in the data flow on its way from the input port to the output port(s). It can change the number of fields. It can change the format of the date data type, concatenate fields, reorder them, split fields, cut off parts of data, change letter case, convert data from one type to another or replace field values with other identifiers. Using a defined transformation, this component can perform many complex operations.

It changes metadata. You must first create metadata on the component output(s) by hand according to the desired transformation or select prepared metadata. Only then can you create the transformation. Different outputs can even have different metadata.

When you select this component, you must specify how the records should be reformatted on their way through the component (Transform class, Transform or Transform URL attributes). (Transform class is a path and a file name of a class, jar or zip file located outside the graph. Transform is the transformation defined in the graph itself using the Java language or the internal Clover transformation language. Transform URL is a path and a file name of a file written in Java or in the internal Clover transformation language.)

The transformation must implement the RecordTransform interface or inherit from the DataRecordTransform superclass. In the latter case, you only need to implement the transform() method.



To define the Transform URL attribute, click its item row and then the button that appears there. A URL File Dialog opens, in which you can locate the desired file. (See Section "Locating Files with URL File Dialog" for more information.)

You may also want to set the character set (Charset) that should be used when reading from an external Transform URL.



Transformations


Figure 17.5. Source Tab of the Transform Editor in the Reformat Component

If you want to define a transformation in Clover transformation language, regardless of whether it is contained in the graph itself (Transform) or in a file outside the graph (Transform URL), you must implement the transform() function, for example, in the following way:

function transform() {
    $0.Name := $0.fname + " " + $0.lname;
    $0.address := $0.address;
}

The example above assigns input field values to output field values. Name is built by concatenating fname, a space and lname, and address is copied unchanged. This transformation changes the format of records. The transform() function is required; it maps inputs to outputs.
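For comparison, here is the same mapping as a plain Java method. The classes Input and Output are hypothetical stand-ins for the input and output records. A real Java transformation would implement the RecordTransform interface or extend DataRecordTransform, and their signatures are not reproduced here.

```java
// Hypothetical Java equivalent of the CTL transform() above; not CloverETL types.
class ReformatSketch {

    static final class Input {
        final String fname, lname, address;

        Input(String fname, String lname, String address) {
            this.fname = fname;
            this.lname = lname;
            this.address = address;
        }
    }

    static final class Output {
        final String name, address;

        Output(String name, String address) {
            this.name = name;
            this.address = address;
        }
    }

    // Name := fname + " " + lname; address is copied unchanged.
    static Output transform(Input in) {
        return new Output(in.fname + " " + in.lname, in.address);
    }

    public static void main(String[] args) {
        Output out = transform(new Input("John", "Smith", "Main Street 1"));
        System.out.println(out.name + " | " + out.address);  // John Smith | Main Street 1
    }
}
```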






Denormalizer


It changes metadata by composing several records on the input port into one record on the output port. You must first create metadata on the component output by hand according to the desired result or select prepared metadata. Only then can you create the transformation.

The component receives records whose metadata are not convenient for some purposes. You may want to change the metadata and combine several records of one data flow into one new record whose metadata differ from those on the input. Different fields can be transferred to different new records.

For example, suppose you have a data flow of records collected over a number of years, with four fields: the year, the month, the air temperature and the pressure, one record for each of the twelve months of every year. You can combine these records into another data flow in which every record contains the temperature and pressure for a whole year. Such records have 25 fields: the first field contains the year and the other 24 contain the air temperature and pressure for the 12 months. The months are ordered from January (starting at the second field) to December (ending at the 25th field), so a month is expressed only by the position of its fields within the record. The number of records is twelve times smaller, and each record has 25 fields instead of only 4. The counterpart of this process is normalization.
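The reshaping in this example can be sketched in Java as follows. The sketch assumes the month is stored as a number from 1 to 12. It groups records with a map, so it shows only the field layout of the 25-field output record, not the sorted-input processing of the component itself. All names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the reshaping in the example, not the Denormalizer itself.
// Input records: {year, month (1-12), temperature, pressure}.
// Output records: 25 values, index 0 = year, then temperature and pressure for Jan..Dec.
class DenormalizeSketch {

    static Map<Integer, double[]> denormalize(List<double[]> monthly) {
        Map<Integer, double[]> yearly = new TreeMap<>();
        for (double[] r : monthly) {
            int year = (int) r[0];
            int month = (int) r[1];
            double[] row = yearly.computeIfAbsent(year, k -> {
                double[] fresh = new double[25];
                Arrays.fill(fresh, Double.NaN);   // NaN marks a month without data
                fresh[0] = year;
                return fresh;
            });
            row[2 * month - 1] = r[2];            // temperature: array index 1, 3, ..., 23
            row[2 * month] = r[3];                // pressure:    array index 2, 4, ..., 24
        }
        return yearly;
    }

    public static void main(String[] args) {
        List<double[]> monthly = Arrays.asList(
                new double[] {2007, 1, -2.5, 1012},
                new double[] {2007, 2, 0.5, 1008},
                new double[] {2008, 1, -4.0, 1020});
        for (double[] row : denormalize(monthly).values()) {
            System.out.println(Arrays.toString(Arrays.copyOf(row, 5)));  // year, Jan, Feb
        }
    }
}
```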

This component must receive data sorted according to a specified key. For this reason, when you select this component, you must specify the key (Key, a sequence of field names separated by semicolons) and the order of the incoming data (Sort order).

You can create the Key with the help of the Edit key wizard.

The Sort order attribute can be set to Ascending, Descending, Auto or Ignore. You can select the desired value by clicking the Sort order attribute row and choosing from the list. The Denormalizer treats records as one group only if their Key values are equal and they follow one another in the input, so it is important that the incoming records be ordered by the Key. The order can be ascending or descending (Ascending or Descending values). If you want Clover to detect the order of the incoming data automatically, set the Sort order attribute to Auto. You can also set this attribute to Ignore. But remember that if the records on the input port are not ordered by the Key, records with the same Key value that are not adjacent are processed as if they belonged to different groups.
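A short Java sketch of the grouping rule: a new group starts whenever the key differs from that of the previous record, so equal keys that are not adjacent form separate groups. The class AdjacentGroups is hypothetical and does not model the Auto detection.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the grouping rule, not the Denormalizer implementation:
// a group ends as soon as the next record has a different Key value.
class AdjacentGroups {

    static List<List<String>> group(List<String> keys) {
        List<List<String>> groups = new ArrayList<>();
        String previous = null;
        for (String key : keys) {
            if (groups.isEmpty() || !key.equals(previous)) {
                groups.add(new ArrayList<>());       // key changed: start a new group
            }
            groups.get(groups.size() - 1).add(key);
            previous = key;
        }
        return groups;
    }

    public static void main(String[] args) {
        // Unsorted input with Sort order = Ignore: 2007 ends up in two separate groups.
        System.out.println(group(Arrays.asList("2007", "2007", "2008", "2007")));
        // [[2007, 2007], [2008], [2007]]
    }
}
```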

In addition, you must specify the desired transformation by defining one of the following attributes: Denormalize class, Denormalize or Denormalize URL. (Denormalize class is a path and a file name of a class, jar or zip file located outside the graph. Denormalize is the transformation defined in the graph itself using the Java language or the internal Clover transformation language. Denormalize URL is a path and a file name of a file written in Java or in the internal Clover transformation language.)

To define the Denormalize class attribute, click its item row and then the button that appears there. An Open Type wizard opens, in which you must specify the desired type. (See Section "Open Type Wizard" for more information.)

To define the Denormalize attribute, click its item row and then the button that appears there. A Transform editor opens, in which you can write the transformation in Clover transformation language or in Java.

To define the Denormalize URL attribute, click its item row and then the button that appears there. A URL File Dialog opens, in which you can locate the desired file. (See Section "Locating Files with URL File Dialog" for more information.)


You may also want to set the character set (Charset) that should be used when reading from an external Denormalize URL.


Transformations


Figure 17.6. Source Tab of the Transform Editor in the Denormalizer Component

If you want to define a denormalization in Clover transformation language, regardless of whether it is contained in the graph itself (Denormalize) or in a file outside the graph (Denormalize URL), proceed as follows:

To define the transformation from input to output, you must use the following two functions: addInputRecord() and getOutputRecord(). These functions are required.

First of all, you must declare some variables. You can also initialize them.

Then assign the values of the fields of incoming records to these variables within the addInputRecord() function. The purpose of this function is to remember the group of input records. This can be done using variables as described above or in some other way.

Finally, you only need to assign the set of the defined variables to the output fields. This must be done within thegetOutputRecord() function.

Field values are assigned to variables using the equal sign (variable = $inputfield), whereas variables are assigned to output fields using the colon and equal sign together ($outputfield := variable).

To define the denormalization, you must have the output metadata defined.

See the following example:

int yearA;
string monthB;
int temperaturejan;
int pressurejan;
int temperaturefeb;
int pressurefeb;

function addInputRecord() {
    yearA = $year;
    if ($month == "January") {
        temperaturejan = $temperature;
        pressurejan = $pressure;
    }
    if ($month == "February") {
        temperaturefeb = $temperature;
        pressurefeb = $pressure;
    }
}

function getOutputRecord() {
    $field1 := yearA;
    $field2 := temperaturejan;
    $field3 := pressurejan;
    $field4 := temperaturefeb;
    $field5 := pressurefeb;
}

In addition to these required functions, you can define three other functions: init(), finished() and clean(). If you want to declare and initialize variables, or do anything else that should happen at the beginning of data processing by the component, do it within the init() function. (It would therefore be better to declare and initialize the variables above within the init() function.) If you want to free memory or delete temporary files, do it within the finished() function. Each of these two functions is called only once: init() at the beginning, finished() at the end. Unlike them, the required addInputRecord() and getOutputRecord() functions are called many times, always after init() and before finished(). If you want to reset the values of variables and/or delete temporary files between groups of records with different key values, do it within the clean() function. It is called many times, once after each group of records has been processed and the resulting record has been sent to the output port.
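The following hypothetical Java model mirrors the CTL example above and the call order just described. It shows why clean() is useful. Without clean(), the February values of one year leak into the output record of the next year when that year has no February record. It is not the CloverETL Java API. The class, its fields and the run() driver are assumptions made for illustration, and unset values are modelled as null.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the Denormalizer call order; not the CloverETL Java API.
class DenormalizerLifecycle {

    Integer yearA, temperaturejan, pressurejan, temperaturefeb, pressurefeb;

    void init() {
        // one-time setup would go here
    }

    void addInputRecord(int year, String month, int temperature, int pressure) {
        yearA = year;
        if (month.equals("January")) { temperaturejan = temperature; pressurejan = pressure; }
        if (month.equals("February")) { temperaturefeb = temperature; pressurefeb = pressure; }
    }

    List<Integer> getOutputRecord() {
        return Arrays.asList(yearA, temperaturejan, pressurejan, temperaturefeb, pressurefeb);
    }

    void clean() {
        yearA = temperaturejan = pressurejan = temperaturefeb = pressurefeb = null;
    }

    // Rows are {year, month, temperature, pressure}, already sorted by year.
    static List<List<Integer>> run(List<Object[]> rows, boolean callClean) {
        DenormalizerLifecycle t = new DenormalizerLifecycle();
        List<List<Integer>> out = new ArrayList<>();
        t.init();                                         // once
        for (int i = 0; i < rows.size(); i++) {
            Object[] r = rows.get(i);
            t.addInputRecord((Integer) r[0], (String) r[1], (Integer) r[2], (Integer) r[3]);
            boolean lastOfGroup = i + 1 == rows.size() || !rows.get(i + 1)[0].equals(r[0]);
            if (lastOfGroup) {
                out.add(t.getOutputRecord());             // once per group
                if (callClean) t.clean();                 // once per group
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Object[]> rows = Arrays.asList(
                new Object[] {2007, "January", -2, 1012},
                new Object[] {2007, "February", 1, 1008},
                new Object[] {2008, "January", -4, 1020});
        System.out.println(run(rows, true));   // 2008 row ends with null, null
        System.out.println(run(rows, false));  // 2008 row wrongly repeats 1, 1008
    }
}
```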



Normalizer


It changes metadata by decomposing each record on the input port into several records on the output port. You must first create metadata on the component output by hand according to the desired result. Only then can you define the transformation.

The component receives records whose metadata are not convenient for some purposes. You may want to change the metadata and split one record into several new records whose metadata differ from those on the input. Different fields can be transferred to different new records.


For example, suppose you have a data flow of records collected over a number of years, in which the air temperature and pressure for each of the twelve months of a year are stored together with the year as another field. You can split these records into another data flow in which every record contains the temperature and pressure for one month only, with the month stored in a new field. The resulting records have 4 fields: two describe the year and month, the other two contain the air temperature and pressure. The number of records is twelve times greater, and each record has only 4 fields instead of 25. The counterpart of this process is denormalization.
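The reverse reshaping can be sketched in Java as follows. It assumes the 25-field layout, with the year in the first field followed by temperature and pressure for January to December. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the normalization example, not the Normalizer itself.
// Input: 25 values, index 0 = year, then temperature and pressure for Jan..Dec.
// Output: twelve records {year, month (1-12), temperature, pressure}.
class NormalizeSketch {

    static List<double[]> normalize(double[] yearly) {
        List<double[]> monthly = new ArrayList<>();
        for (int month = 1; month <= 12; month++) {
            monthly.add(new double[] {yearly[0], month, yearly[2 * month - 1], yearly[2 * month]});
        }
        return monthly;
    }

    public static void main(String[] args) {
        double[] yearly = new double[25];
        yearly[0] = 2007;
        for (int i = 1; i < 25; i++) {
            yearly[i] = i;                      // placeholder measurements
        }
        for (double[] r : normalize(yearly)) {
            System.out.println(Arrays.toString(r));
        }
    }
}
```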

When you select this component, you must specify the desired transformation by defining one of the following three attributes: Normalize class, Normalize or Normalize URL. (Normalize class is a path and a file name of a class, jar or zip file located outside the graph. Normalize is the transformation defined in the graph itself using the Java language or the internal Clover transformation language. Normalize URL is a path and a file name of a file written in Java or in the internal Clover transformation language.)

To define the Normalize class attribute, click its item row and then the button that appears there. An Open Type wizard opens, in which you must specify the desired type. (See Section "Open Type Wizard" for more information.)

To define the Normalize attribute, click its item row and then the button that appears there. A Transform editor opens, in which you can write the transformation in Clover transformation language or in Java.

To define the Normalize URL attribute, click its item row and then the button that appears there. A URL File Dialog opens, in which you can locate the desired file. (See Section "Locating Files with URL File Dialog" for more information.)

You may also want to set the character set (Charset) that should be used when reading from an external Normalize URL.


Transformations


Figure 17.7. Source Tab of the Transform Editor in the Normalizer Component


If you want to define a normalization in Clover transformation language, regardless of whether it is contained in the graph itself (Normalize) or in a file outside the graph (Normalize URL), proceed as follows:

To define the transformation from input to output, you must use the following two functions: count() and transform(idx). These functions are required.

The count() function is a simple function defined in the transformation code in the following way:

function count() {
    return N;
}

Here, N is the number of records into which each incoming record is split. The function only determines the range of the index, that is, how many outgoing records are created from one incoming record.

The transform(idx) function receives each value of this index (idx) in turn and defines a mapping from input to output. The index takes N integer values, from 0 to N-1.

If you want to use variables in your code, you must declare them first. Any mapping you define must be placed at the end of a function or at the end of the whole program, in principle at the very end of a closed block.

Inside the transform(idx) function, define the transformation using the index values. You can use if or switch statements to select what should be done with each part of the incoming record.

To define the normalization, you must have the output metadata defined.

See the following example:

function map1() {
    $year := $Field1;
    $month := "January";
    $temperature := $Field2;
    $pressure := $Field3;
}

function map2() {
    $year := $Field1;
    $month := "February";
    $temperature := $Field4;
    $pressure := $Field5;
}

function transform(idx) {
    switch (idx) {
        case 0: map1();
        case 1: map2();
    }
}

function count() {
    return 2;
}

In addition to the required functions (count() and transform()), you can define three other functions: init(), finished() and clean().

- init(): declare and initialize variables here, and do anything else that should happen at the beginning of data processing by the component. (It is therefore better to declare and initialize any variables within the init() function.)
- finished(): free memory or delete temporary files here.

Each of these two functions is called only once: init() at the beginning and finished() at the end. Unlike them, the required count() and transform() functions are called many times, after init() and before finished().

- clean(): reset the values of some variables and/or delete temporary files between processing individual incoming records. It is called many times, once after each incoming record has been processed and the resulting group of outgoing records has been sent to the output port.
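The call order described above can be traced with a small sketch. `NormalizerLifecycle` is a hypothetical driver written only to show the sequence. It assumes count() is evaluated once per incoming record; it is not the actual Normalizer implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical driver that records the order in which init(), count(),
// transform(idx), clean() and finished() are called, as described in the text.
class NormalizerLifecycle {

    static List<String> run(int inputRecords, int n) {
        List<String> trace = new ArrayList<>();
        trace.add("init");                          // once, before any record
        for (int rec = 0; rec < inputRecords; rec++) {
            trace.add("count");                     // N outputs for this record
            for (int idx = 0; idx < n; idx++) {
                trace.add("transform(" + idx + ")");
            }
            trace.add("clean");                     // after the output group is sent
        }
        trace.add("finished");                      // once, after the last record
        return trace;
    }
}
```

For two incoming records and N = 2, the trace has ten entries: init, then two blocks of count, transform(0), transform(1), clean, and finally finished.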



XSLTransformer


This component transforms incoming data records based on the specified Xslt or Xslt file attribute. The first must be edited in the Edit value wizard, the second must be specified in the File URL dialog.

You can also define a Mapping using the following wizard.

Figure 17.8. XSLT Mapping

Assign the input fields from the Input fields pane on the left to the output fields by dragging and dropping them in the Input field column of the right pane. Select which of them should be transformed by setting the Transform data option to true. By default, fields are not transformed.

The resulting Mapping can look like this:

Figure 17.9. An Example of Mapping

You may also want to set the character set (Charset) that should be used when reading the Xslt file.



Chapter 18. Joiners

These components have both input and output ports. They serve to put together records with different metadata (possibly with a different number of fields) according to the specified key and transformation. They can join records incoming through several input ports. They can also join records incoming through input ports with records from a lookup table and/or a database table.

Metadata cannot be propagated through these components. You must first select the right metadata or create them by hand according to the desired result; only then can you define the transformation. For some of the output edges you can also select the metadata of an input edge, but even these metadata cannot be propagated through the component. These components use the kind of transformations described in the section concerning transformers.

Join Types

These components can work in the following three processing modes:

Inner Join
In this processing mode, only the driver records that correspond to some slave record(s) are processed.

Left Outer Join
In this processing mode, the driver records with no corresponding slave are processed as well.

Full Outer Join
In this processing mode, the transformation method is also called for the slave records without a corresponding driver.

Joining Components

The following five components serve to join data flows with different metadata: ApproximativeJoin, ExtHashJoin, ExtMergeJoin, LookupJoin and DBJoin.

In each component a transformation must be defined. You can use external .class or .java files, or write the transformation in the graph itself using the Transform editor.


Transformations


Figure 18.1. Source Tab of the Transform Editor in Joiners

In all of these Joiners you can define a transformation. You can do it using easy transformation mapping, the Clover transformation language or the Java language.

If you want to define a transformation using the Clover transformation language, you must use the transform() function. This applies whether the code is contained in the graph itself (Transform) or in a file outside the graph (Transform URL). This function is required.

Within this function you must define the selection of records based on driver and slave key values. You must also map the incoming records (and records from a lookup table or database) to the output port to which records sharing the same key values are sent.






ApproximativeJoin

This component has two input ports and between two and four output ports. The third and fourth output ports are optional and do not need to be connected. If they are connected, they send out records incoming through the first and the second input ports, respectively. The first input port serves as the driver, the second input port as the slave.

The component does not need to receive the same metadata on the two input ports; they do not even need to have the same number of fields. However, if the third output port is connected, it has the same metadata as the first input port. Likewise, the fourth output port has the same metadata as the second input port.

Nevertheless, neither of the input metadata can be propagated through the component to the third and fourth output edges. The metadata on the first and second output ports differ from the input metadata. You must first create metadata on the first and second output ports according to the desired result, or select some prepared metadata. Only then can you define the transformation.

Metadata on the first and second output ports can contain two additional fields of numeric data type. Their names must be "_total_conformity_" and "_keyName_conformity_". In the second name, replace keyName with the name of one of the fields of the Join key attribute. The computed conformity values will be written to these additional fields.

Joined records with greater conformity are sent to the first output port, and those with smaller conformity to the second output port.

The component receives and reads the records incoming through the input ports. For each driver record, it looks up the corresponding slave record.

- If no such slave record is found, the driver record is sent out through the third output port.
- Otherwise, the component computes the conformity of the driver and slave pair. Pairs whose conformity is greater than the specified limit are joined and sent out through the first output port.
- Pairs whose conformity is smaller, but which share the same matching key value, are joined and sent out through the second output port.
- Finally, slave records without a driver are sent out through the fourth output port.

(The conformity is based on the Levenshtein distance.)
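The routing rules above can be summarized in a few lines. `ApproximativeRouting` is a hypothetical helper that uses 0-based port indices (the manual counts ports from "first" to "fourth"). The text does not say where a conformity exactly equal to the limit goes; this sketch assumes it is treated as "smaller".

```java
// Hypothetical model of how ApproximativeJoin distributes records among its
// output ports; indices are 0-based (0 = first output port, ... 3 = fourth).
class ApproximativeRouting {

    static final int MATCHED = 0;          // conformity greater than the limit
    static final int SUSPICIOUS = 1;       // same matching key, smaller conformity
    static final int DRIVER_NO_SLAVE = 2;  // optional third port
    static final int SLAVE_NO_DRIVER = 3;  // optional fourth port (orphan slaves)

    // conformity is ignored when no slave was found for the driver.
    // Assumption: conformity == limit is treated as "smaller" (suspicious).
    static int routeDriver(boolean hasSlave, double conformity, double limit) {
        if (!hasSlave) {
            return DRIVER_NO_SLAVE;
        }
        return conformity > limit ? MATCHED : SUSPICIOUS;
    }
}
```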

You must define the key that should be used to join the records (Join key). You can define the key with the help of the Join key wizard. When you open the Join key wizard, you can see two tabs: the Master key tab and the Slave key tab.

Figure 18.2. Join Key Wizard (Master Key Tab)


In the Master key tab, you must select the driver (master) fields in the Fields pane on the left and drag and drop them to the Master key pane on the right. (You can also use the buttons.)

Figure 18.3. Join Key Wizard (Slave Key Tab)

In the Slave key tab, you can see the Fields pane (containing all slave fields) on the left and the Key mapping pane on the right.

You must select some of these slave fields and drag and drop them to the Slave key field column. This column is to the right of the Master key field column, which contains the master fields selected in the Master key tab in the first step. In addition to these two columns, there are six other columns that should be defined: Maximum changes, Weight and four columns representing the strength of comparison.

The Maximum changes property contains an integer: the number of letters that may be changed to convert one data value into another. It is used to compute the conformity. The conformity between two strings is 0 if more letters than this number must be changed to convert one string into the other.
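The sketch below shows one way a Levenshtein distance can be combined with the Maximum changes value. The zero-conformity cut-off follows the text. The linear scale for distances within the limit is an assumption made for illustration, because this guide does not give Clover's exact formula. `FieldConformity` is a hypothetical class.

```java
// Hypothetical per-field conformity: Levenshtein distance plus "Maximum changes".
class FieldConformity {

    // Classic dynamic-programming Levenshtein distance
    // (insertions, deletions and substitutions each cost 1).
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;                       // distance from "" to b[0..j)
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;                       // distance from a[0..i) to ""
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;                  // the row just computed becomes prev
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // 0 when more than maxChanges edits are needed (as stated in the text);
    // otherwise a linear scale towards 1 for identical strings (assumption).
    static double conformity(String a, String b, int maxChanges) {
        int d = levenshtein(a, b);
        if (d > maxChanges) {
            return 0.0;
        }
        return 1.0 - (double) d / (maxChanges + 1);
    }
}
```

For example, "kitten" and "sitting" are three edits apart. The conformity is therefore 0 when Maximum changes is 2, and 0.25 under this sketch's scale when it is 3.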

The Weight property defines the weight of the field in computing the similarity. The effective weight of each field is the quotient of the weight defined by the user for that field and the sum of all weights defined by the user.
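The weight normalization above amounts to dividing each weight by the sum of all weights. The sketch then combines per-field conformities into a total as a weighted sum. That combination step is an assumption for illustration, and `WeightedConformity` is a hypothetical class.

```java
// Hypothetical combination of per-field conformities using normalized weights
// w_i / (w_1 + ... + w_n); the weighted-sum total is an illustrative assumption.
class WeightedConformity {

    static double[] normalizedWeights(double[] weights) {
        double sum = 0.0;
        for (double w : weights) {
            sum += w;
        }
        double[] result = new double[weights.length];
        for (int i = 0; i < weights.length; i++) {
            result[i] = weights[i] / sum;
        }
        return result;
    }

    static double total(double[] fieldConformities, double[] weights) {
        double[] nw = normalizedWeights(weights);
        double total = 0.0;
        for (int i = 0; i < nw.length; i++) {
            total += nw[i] * fieldConformities[i];
        }
        return total;
    }
}
```

With user weights 3 and 1, the normalized weights are 0.75 and 0.25. Field conformities of 1.0 and 0.5 then give a total of 0.875.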

The strength of comparison can be identical, tertiary, secondary or primary.

If it is identical, only identical letters are considered equal.

If it is tertiary, upper and lower case letters are considered equal.

If it is secondary, diacritic letters and their Latin equivalents are considered equal.

If it is primary, letters with additional features such as peduncle, pink or circle and their Latin equivalents are considered equal.

You can change any boolean value by simply clicking it; this switches true to false and vice versa. You can also change any numeric value by clicking it and typing the desired value.

When you click OK, you will obtain a sequence of assignments of driver (master) fields to slave fields. Each field name is preceded by a dollar sign, and the assignments are separated by semicolons. Each slave field is followed by parentheses containing the six mentioned parameters separated by white spaces. The sequence will look like this:

$driver_field1=$slave_field1(parameters);...;$driver_fieldN=$slave_fieldN(parameters)


Figure 18.4. An Example of the Join Key Attribute in ApproximativeJoin Component

(When you create the Join key using the wizard, a semicolon is also added to the end of the sequence. However, this last semicolon is optional and can be omitted.)

You must also define the matching key for comparing driver and slave records (Matching key).

The Matching key needs to be generated before the ApproximativeJoin component by using a KeyGenerator component.

You can define the Matching key using the Matching key wizard. In the Master key tab, you only need to select the desired master (driver) field in the Fields pane on the left and drag and drop it to the Master key pane on the right. (You can also use the buttons.)

Figure 18.5. Matching Key Wizard (Master Key Tab)

In the Slave key tab, you must select one of the slave fields in the Fields pane on the left and drag and drop it to the Slave key field column in the Key mapping pane. This column is to the right of the Master key field column, which contains the master field selected in the Master key tab.

Figure 18.6. Matching Key Wizard (Slave Key Tab)


The result is a mapping expression of the following form: $driver_field=$slave_field. It can also be followed by a semicolon and a hash, but these two signs are optional and can be omitted.

As the two input ports do not need to receive the same metadata, their fields may bear different names. You may therefore want to specify the join key fields for the slave records (Slave override key) and the matching field names for these slaves (Slave override matching key).

To define these keys, click the corresponding attribute row; a button appears. When you click this button, an Edit key wizard opens, in which you can define the corresponding keys. However, these attributes are now deprecated.

You must also define the limit of conformity (Conformity limit (0,1)). This value divides the incoming records into two groups: records whose conformity is greater than the limit and records whose conformity is smaller. You must define a transformation for each of these two groups. The records with smaller conformity can be marked as "suspicious".

For the records with greater conformity you must specify some transformation by defining one of the followingthree attributes: Transform class, Transform or Transform URL.

For the records with smaller conformity (suspicious), you must also specify a transformation by defining one of the following three attributes: Transform class for suspicious, Transform for suspicious or Transform URL for suspicious.

(Transform class is the path and file name of a class, jar or zip file located outside the graph. Transform is the transformation defined in the graph itself using the Java language or the internal Clover transformation language. Transform URL is the path and file name of a file written in Java or in the internal Clover transformation language.)




You must do the same for the suspicious group.



ExtHashJoin

This component has at least two input ports and only one output port. Whenever the second or a higher-order input port is connected, a new input port is created. The first input port serves as the driver, the other input port(s) serve as slave(s).

This component does not need to receive the same metadata on the driver and slave input ports; they do not even need to have the same number of fields. The slave ports do not need to receive the same metadata either. No input metadata can be propagated through the component to the output edge. You must first create the metadata of the output edge by hand according to the desired result, or select some prepared metadata. Only then can you define the transformation.

The component first receives and reads the records incoming through the slave input ports. It creates one hash table from the records of every slave input port. These hash tables are held in memory, so they must be sufficiently small.

After that, for each driver record incoming through the driver input port, the component looks up the corresponding records in these hash tables. The records on the input ports do not need to be sorted. If such record(s) are found, the tuple of the driver record and the slave record(s) from the hash tables is sent to the transformation class. The transform method is called for each tuple of the driver and its corresponding slave records.

You can select the join type (Join type attribute). You can choose one of the following three options: Inner join, Left outer join, Full outer join. The default value is Inner join.

If this attribute is set to Inner join (the default processing mode), only the driver records for which all slave records exist are processed. If it is set to Left outer join, the driver records with no slave are processed as well. If the attribute is set to Full outer join, the transformation method is also called for the slave records without a driver record.

You must also decide whether slave records with duplicate key values should also be used to create the hash table (Allow slave duplicates). This attribute is set to false by default, which means duplicate records are not allowed: such records are discarded and only the last of them is used for the join.
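The build-and-probe behavior described above can be illustrated with a single-slave sketch. `HashJoinSketch` is hypothetical: records are `String[]`, the key is one column index, and only one slave port is modeled. The real component joins several slaves at once. Because `HashMap.put` overwrites earlier entries, duplicate slave keys keep only the last record, which matches the default Allow slave duplicates = false.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical single-slave hash join; each output element is {driver, slave},
// with null standing for a missing side.
class HashJoinSketch {

    enum JoinType { INNER, LEFT_OUTER, FULL_OUTER }

    static List<String[][]> join(List<String[]> drivers, int driverKey,
                                 List<String[]> slaves, int slaveKey, JoinType type) {
        // Build phase: slaves go into an in-memory hash table; put() overwrites,
        // so for duplicate keys only the last slave record is kept.
        Map<String, String[]> table = new HashMap<>();
        for (String[] s : slaves) {
            table.put(s[slaveKey], s);
        }
        // Probe phase: each driver looks up its slave; no sorting is required.
        List<String[][]> out = new ArrayList<>();
        Set<String> usedKeys = new HashSet<>();
        for (String[] d : drivers) {
            String[] s = table.get(d[driverKey]);
            if (s != null) {
                usedKeys.add(d[driverKey]);
                out.add(new String[][] {d, s});
            } else if (type != JoinType.INNER) {
                out.add(new String[][] {d, null});   // driver without a slave
            }
        }
        // Full outer join: also emit slaves that no driver matched.
        if (type == JoinType.FULL_OUTER) {
            for (Map.Entry<String, String[]> e : table.entrySet()) {
                if (!usedKeys.contains(e.getKey())) {
                    out.add(new String[][] {null, e.getValue()});
                }
            }
        }
        return out;
    }
}
```

Suppose drivers have keys 1 and 2 and slaves have keys 1, 1 and 3. The inner join then yields one pair, using the last slave with key 1. The left outer join yields two results and the full outer join yields three.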

You may also want to change the number of records that can be stored in one hash table (Hash table size attribute). The default size is 512. If there are more than 512 records, they can still be processed; however, the table must then be rehashed, which slows down the whole process.

The incoming records do not need to be sorted, but initializing the hash tables is time-consuming. It may therefore be good to specify how many records can be stored in them. If you decide to specify this attribute, set it to a value slightly greater than needed. For small sets of records, however, it is not necessary to change the default value.

You must define the key that should be used to join the records (Join key). You can define the Join key by typing it or by using the Hash Join key wizard.

The Join key attribute is a sequence of mapping expressions for all slaves, each of them followed by a hash (#). The last hash is optional and can be omitted. Each mapping expression is a sequence of pairs of field names from driver and slave records (in this order). The two names in a pair are joined by an equal sign, and each pair is followed by a semicolon. The last semicolon is optional and can be omitted.

Figure 18.7. An Example of the Join Key Attribute in ExtHashJoin Component

The order of these mappings must correspond to the order of the slave input ports. If the mapping for some slave input port is empty or missing, the mapping of the first slave input port is used instead.

Each of these mappings is a sequence of matchings separated by a colon, semicolon or pipe. For example: driver_field1=slave_field1|driver_field2=slave_field2|...|driver_fieldN=slave_fieldN.

- If some slave_fieldj is missing (in other words, if the subexpression looks like driver_fieldj=), it is assumed to be the same as driver_fieldj.
- If some driver_fieldk is missing, driver_fieldk from the first mapping is used instead.

(You can use semicolons instead of the pipes shown above.)
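The parsing rules above are precise enough to sketch. `HashJoinKeyParser` is a hypothetical helper, not Clover code. It splits on '#' for slave ports and on ':', ';' or '|' for matchings, and drops '$' signs. It also applies the three defaults from the text: a missing slave field, a missing driver field and a missing mapping. As a simplification, it assumes the first mapping is at least as long as any mapping that omits driver fields.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser for the ExtHashJoin "Join key" string.
// Result: for each slave port, a list of {driverField, slaveField} pairs.
class HashJoinKeyParser {

    static List<List<String[]>> parse(String joinKey, int slavePorts) {
        String[] mappings = joinKey.split("#");   // trailing empty parts are dropped
        List<List<String[]>> result = new ArrayList<>();
        for (int port = 0; port < slavePorts; port++) {
            String m = port < mappings.length ? mappings[port].trim() : "";
            if (m.isEmpty() && port > 0) {
                result.add(result.get(0));        // missing mapping: reuse the first one
                continue;
            }
            List<String[]> pairs = new ArrayList<>();
            String[] matchings = m.split("[:;|]");
            for (int k = 0; k < matchings.length; k++) {
                String item = matchings[k].trim();
                if (item.isEmpty()) {
                    continue;                     // tolerate stray separators
                }
                String[] sides = item.split("=", -1);
                String driver = sides[0].trim().replace("$", "");
                String slave = sides.length > 1 ? sides[1].trim().replace("$", "") : "";
                if (driver.isEmpty() && port > 0) {
                    driver = result.get(0).get(k)[0];  // k-th driver field of the first mapping
                }
                if (slave.isEmpty()) {
                    slave = driver;               // slave field defaults to the driver field
                }
                pairs.add(new String[] {driver, slave});
            }
            result.add(pairs);
        }
        return result;
    }
}
```

Take "$id=$cust_id;$name=;#=$pid;$name=$pname#" with three slave ports. The second pair of the first mapping becomes name=name. The second port's first driver field defaults to id. The third port reuses the first mapping.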

Driver (Master) key can be different for different slaves.

You can also use the Hash Join key wizard. When you click the Join key attribute row, a button appears in this row. Clicking this button opens the wizard.


Figure 18.8. Hash Join Key Wizard

In it, you can see tabs for all of the slave input ports. Each tab contains two panes: the Master fields pane on the left and the Key mapping pane on the right.

The left pane lists the driver field names. The right pane has two columns: Slave key field and Master key field mapped. The left column contains the field names of the corresponding slave input port.

To map a driver field to a slave field, select the driver field in the left pane, drag it to the Master key field mapped column in the right pane and release the mouse button. Do the same for each slave. Note that you can also use the Auto mapping button or the other buttons in each tab.

(If you create the Join key using the Hash Join key wizard, the wizard also adds a semicolon and a hash to the end of the mappings; see the example above. It also adds a dollar sign before each field name. Note that the last semicolon and the last hash are optional and can be omitted.)





Up to release 2.4 you had to make sure that your transformation could process even null records. From release 2.5 this is no longer necessary. Each null record is now substituted by a special null record for which all of the getValue methods return null instead of throwing an exception. If you want to take some action on a null record, you can compare it to NullRecord.NULL_RECORD.
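The substitution described above is an instance of the null-object pattern. The sketch below imitates the idea with hypothetical classes only (`NullObjectSketch`, `SimpleRecord`). It does not use or reproduce Clover's `NullRecord` class; in a real transformation you would compare against `NullRecord.NULL_RECORD` as the text says.

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical illustration of the null-object idea: a shared placeholder
// record whose getValue() returns null instead of throwing.
class NullObjectSketch {

    static class SimpleRecord {
        private final Map<String, Object> values;

        SimpleRecord(Map<String, Object> values) {
            this.values = values;
        }

        // An ordinary record rejects unknown fields.
        Object getValue(String field) {
            if (!values.containsKey(field)) {
                throw new IllegalArgumentException("no such field: " + field);
            }
            return values.get(field);
        }
    }

    // The single shared placeholder substituted for a missing record.
    static final SimpleRecord NULL_RECORD = new SimpleRecord(Collections.<String, Object>emptyMap()) {
        @Override
        Object getValue(String field) {
            return null;
        }
    };

    // A transformation can detect the placeholder by identity comparison.
    static String describe(SimpleRecord slave) {
        if (slave == NULL_RECORD) {
            return "no slave";
        }
        return String.valueOf(slave.getValue("name"));
    }
}
```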




ExtMergeJoin

This component has at least two input ports and only one output port. Whenever the second or a higher-order input port is connected, a new input port is created. The first input port serves as the driver, the other input port(s) serve as slave(s).

The metadata on the driver and slave port(s) do not need to be the same; they do not even need to have the same number of fields. However, this component must receive the same metadata on all of the slave input ports. No input metadata can be propagated through the component to the output edge. You must first create the metadata of the output edge according to the desired result, or select some prepared metadata. Only then can you define the transformation.

The component receives and reads the records incoming through the driver (master) and slave input ports. (The incoming records must be sorted according to the specified key.) After that, for each driver record incoming through the driver input port, the component looks up the corresponding slave records. If such record(s) are found, the driver record along with the slave record(s) is sent to the transformation class. The transform method is called for each combination of the driver and the corresponding slave record(s). The component joins the records according to the specified Join key, transforms them and sends them to the output port.
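Because both inputs are sorted by the key, the lookup can be done in a single forward pass over each stream. `MergeJoinSketch` is a hypothetical inner merge join of one driver and one slave stream, with `String[]` records compared by `String.compareTo`. It pairs every driver with every slave of the same key, so it behaves as if Allow slave duplicates were true.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical inner merge join; both lists must already be sorted ascending
// (by String order) on their key columns, as ExtMergeJoin requires.
class MergeJoinSketch {

    static List<String[][]> innerJoin(List<String[]> drivers, int dk,
                                      List<String[]> slaves, int sk) {
        List<String[][]> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < drivers.size() && j < slaves.size()) {
            String dKey = drivers.get(i)[dk];
            int cmp = dKey.compareTo(slaves.get(j)[sk]);
            if (cmp < 0) {
                i++;                        // no slave can match this driver any more
            } else if (cmp > 0) {
                j++;                        // skip a slave with no driver
            } else {
                // Find the end of the group of slaves sharing this key...
                int end = j;
                while (end < slaves.size() && slaves.get(end)[sk].equals(dKey)) {
                    end++;
                }
                // ...and combine it with every driver that has the same key.
                while (i < drivers.size() && drivers.get(i)[dk].equals(dKey)) {
                    for (int s = j; s < end; s++) {
                        out.add(new String[][] {drivers.get(i), slaves.get(s)});
                    }
                    i++;
                }
                j = end;
            }
        }
        return out;
    }
}
```

Unsorted input silently breaks this kind of algorithm, which is why the component requires records sorted by the Join key.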

You can also select the join type (Join type attribute). You can choose one of the following three options: Inner join, Left outer join, Full outer join. The default value is Inner join.

If this attribute is set to Inner join (the default processing mode), only the driver records for which all slave records exist are processed. If it is set to Left outer join, the driver records with no slave are processed as well. If the attribute is set to Full outer join, the transformation method is also called for the slave records without a driver record.

You must also decide whether slave records with duplicate key values should also be used (Allow slave duplicates). This attribute is set to false by default, which means duplicate records are not allowed: such records are discarded and only the last of them is used for the join.

You must define the key that should be used to join the records (Join key). The records on the input ports must be sorted according to the corresponding parts of the Join key attribute. You can define the Join key by typing it or by using the Join key wizard.

The Join key attribute is a sequence of individual key expressions for the driver and all of the slaves, each followed by a hash (#). The last hash is optional and can be omitted. The order of these expressions must correspond to the order of the input ports.

- The driver (master) key is a sequence of driver field names separated by a colon, semicolon or pipe. Each name should be preceded by a dollar sign.
- Each slave key is a sequence of slave field names separated by a colon, semicolon or pipe. The first name should be preceded by a dollar sign.
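The key syntax above can be decoded with a short parser. `MergeJoinKeyParser` is a hypothetical helper: it splits on '#' into per-port keys and on ':', ';' or '|' into field names, and strips a leading '$'. The first resulting list is the driver key and the rest are slave keys in port order. The sketch assumes no key expression is left empty.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser for the ExtMergeJoin "Join key" string.
class MergeJoinKeyParser {

    static List<List<String>> parse(String joinKey) {
        List<List<String>> keys = new ArrayList<>();
        for (String part : joinKey.split("#")) {
            List<String> fields = new ArrayList<>();
            for (String f : part.split("[:;|]")) {
                String name = f.trim();
                if (name.startsWith("$")) {
                    name = name.substring(1);       // the dollar sign is optional here
                }
                if (!name.isEmpty()) {
                    fields.add(name);
                }
            }
            if (!fields.isEmpty()) {
                keys.add(fields);
            }
        }
        return keys;   // keys.get(0) = driver key, then slave keys in port order
    }
}
```

For "$CustomerID;$Year;#$CustID;Year;#", the sketch returns the driver key [CustomerID, Year] and one slave key [CustID, Year].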

You can also use the Join key wizard. When you click the Join key attribute row, a button appears there. Clicking this button opens the wizard.

In it, you can see the tab for the driver (Master key tab) and the tabs for all of the slave input ports (Slave key tabs).

(If you create the Join key using the Join key wizard, the wizard also adds a semicolon and a hash to the end of the sequences. The last semicolon and the last hash are optional and can be omitted.)


Figure 18.9. Join Key Wizard (Master Key Tab)

The driver tab contains two panes: the Fields pane on the left and the Master key pane on the right. You can build the driver key expression by selecting fields in the Fields pane and moving them to the Master key pane with the Right arrow button.

Figure 18.10. Join Key Wizard (Slave Key Tab)

Each slave tab also contains two panes: the Fields pane on the left and the Key mapping pane on the right.

The left pane lists the slave field names. The right pane has two columns: Master key field and Slave key field. The left column contains the selected field names of the driver input port.

To map a driver field to a slave field, select the slave field in the left pane, drag it to the Slave key field column in the right pane and release the mouse button. Do the same for each slave. Note that you can also use the Auto mapping button or the other buttons in each tab.

The driver (master) key must be the same for all slaves.

You must also specify a transformation by defining one of the following three attributes: Transform class, Transform or Transform URL. (Transform class is the path and file name of a class, jar or zip file located outside the graph. Transform is the transformation defined in the graph itself using the Java language or the internal Clover transformation language. Transform URL is the path and file name of a file written in Java or in the internal Clover transformation language.)




Up to release 2.4 you had to make sure that your transformation could process even null records. From release 2.5 this is no longer necessary. Each null record is now substituted by a special null record for which all of the getValue methods return null instead of throwing an exception. If you want to take some action on a null record, you can compare it to NullRecord.NULL_RECORD.



LookupJoin

This component has one input port and one or two output ports. The second output port is optional and does not need to be connected.

The metadata on the input port and that of the lookup table do not need to be the same; they do not even need to have the same number of fields. Some of the records incoming through the input port can be sent out through the optional second output port if it is connected. The input port and the second output port therefore have the same metadata.

Nevertheless, the metadata on the input port cannot be propagated through the component to this output edge; you only need to select the metadata of the input edge for the second output edge. The metadata of the first output edge must be created according to the desired result, or you must select some prepared metadata. Only then can you define the transformation.

The component receives the data through the input port (driver) and from the lookup table (slave). After that, for each driver record incoming through the input port, the component looks up the corresponding slave records in the lookup table. If such record(s) are found, the driver record along with the slave record(s) is sent to the transformation class. The transform method is called for each pair of the driver and a corresponding slave record. The component joins the records according to the specified Join key, transforms them and sends them to the first output port.

Each driver record with no slave can be sent to the optional second output port if that port is connected to another component. If the component is switched to left outer join mode, however, no driver records are sent to the optional output port, because all of them are processed.

When you select this component, you must first specify the lookup table that should be used as the source of slave records (Lookup table). You must also decide whether the lookup table data stored in memory should be freed after the process finishes (Free lookup table after finishing). This attribute is set to false by default.

You must define the key that should be used to join the records (Join key). It is a sequence of field names from the input metadata separated by semicolons. You can define the key with the help of the Edit key wizard.


Figure 18.11. Edit Key Wizard


When you define the transformation, data records obtained from the lookup table are treated as if they were incoming through port 1 (which is virtual).




You can also change the join type (Left outer join attribute). You can select either left outer join (true) or inner join (false). The default value of this attribute is false.

By default the component uses the inner join type. It joins the records incoming through the input port with the records from the lookup table, but only if they have the same key value. Records incoming through the input port whose key value is not contained in the lookup table are not joined. Such records can be sent to the optional second output port if it is connected. If the optional second port is not connected, the component discards the driver records that have no corresponding slave.
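The routing rules for LookupJoin can be condensed into one loop. `LookupJoinSketch` is hypothetical. A `Map` stands in for the lookup table, and plain strings stand in for records. The joined output format "key+slave" is purely illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical model of LookupJoin routing between the first output port
// and the optional second output port.
class LookupJoinSketch {

    final List<String> joined = new ArrayList<>();     // first output port
    final List<String> unmatched = new ArrayList<>();  // optional second output port

    void process(List<String> driverKeys, Map<String, String> lookupTable,
                 boolean leftOuterJoin, boolean secondPortConnected) {
        for (String key : driverKeys) {
            String slave = lookupTable.get(key);
            if (slave != null) {
                joined.add(key + "+" + slave);          // driver joined with its slave
            } else if (leftOuterJoin) {
                joined.add(key + "+null");              // processed without a slave
            } else if (secondPortConnected) {
                unmatched.add(key);                     // inner join: sent out unchanged
            }
            // inner join with the second port unconnected: the driver is discarded
        }
    }
}
```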

If you switch to the left outer join, even the driver records with no slave record are processed, and none of them can be sent to the optional second output port.




DBJoin

This component has one input port and one or two output ports. The second output port is optional and does not need to be connected.

The metadata on the input port and that of the database table do not need to be the same; they do not even need to have the same number of fields. When the second output port is connected, it can receive some of the records incoming through the input port. The input port and the second output port therefore have the same metadata.

Nevertheless, the metadata on the input port cannot be propagated through the component to this output edge; you only need to select the metadata of the input edge for the second output edge. The metadata of the first output edge must be created by hand according to the desired result, or you must select some prepared metadata. Only then can you define the transformation.

The component receives the data through the input port (driver) and from the database (slave). After that, for each driver record incoming through the input port, the component looks up the corresponding slave records in the database table. If such record(s) are found, the driver record along with the slave record(s) is sent to the transformation class. The transform method is called for each pair of the driver and a corresponding slave record. The component joins the records according to the specified Join key and sends them out through the first output port.

Each driver record with no slave can be sent to the optional second output port if an edge is connected to this port. If the component is switched to left outer join mode, however, no driver records are sent to the optional output port, because all of them are processed.

When you select this component, you must first specify the database connection that should be used to connect to the database (DB connection); the component connects to the database through a JDBC driver. You must also define the query that should be sent to the database (SQL query). The database table serves as a dynamic DB lookup table and the source of slave records. You may also want to specify the metadata of the database table (DB Metadata). If you select no metadata, the component obtains the metadata from the query.

You must define the key that should be used to join the records (Join key). It is a sequence of field names from the input metadata separated by semicolons. You can define the key with the help of the same Edit key wizard as in the LookupJoin component (see above).


When you define the transformation, data records loaded from the database are treated as if they were incoming through port 1 (which is virtual).

If you want to define the Transform class attribute, click its row; a button appears there. When you click this button, an Open Type wizard opens, in which you must specify the desired file. (See Section "Open Type Wizard" for more information.)



If no transformation is defined, only the records from the database table (slaves) are sent to the output port, and only those slaves for which a corresponding driver exists.


You can also change the join type (Left outer join attribute). You can select either left outer join (true) or inner join (false). The default value of this attribute is false.

By default the component uses the inner join type. It joins the records incoming through the input port with the records from the database table, but only if they have the same key value. Records incoming through the input port whose key value differs from all values contained in the database table are not joined. Such records can be sent to the second output port if it is connected. If the optional second port is not connected, the component discards the driver records that have no corresponding slave.

If you switch to the left outer join, even the driver records with no slave record are processed, and none of them can be sent to the second output port.

You may change the number of sets of database records with different key values that can be stored in memory (Cache size). The default value is 100.
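A cache limited to a fixed number of key values can be sketched with `java.util.LinkedHashMap` in access order. `LookupCache` is hypothetical, and the `query` function stands in for running the SQL query. The text only states how many key values are kept. The least-recently-used eviction policy is an assumption chosen for this sketch, not documented Clover behavior.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical cache of database results keyed by join-key value,
// holding at most cacheSize keys (LRU eviction is an assumption).
class LookupCache {

    private final Map<String, List<String>> cache;
    private final Function<String, List<String>> query;  // stands in for the SQL query
    int queries = 0;                                       // number of real lookups

    LookupCache(int cacheSize, Function<String, List<String>> query) {
        this.query = query;
        // accessOrder = true: iteration order is least- to most-recently used.
        this.cache = new LinkedHashMap<String, List<String>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, List<String>> eldest) {
                return size() > cacheSize;                 // evict beyond the limit
            }
        };
    }

    List<String> get(String key) {
        List<String> rows = cache.get(key);               // also refreshes recency
        if (rows == null) {
            queries++;
            rows = query.apply(key);
            cache.put(key, rows);
        }
        return rows;
    }
}
```

With a cache size of 2 and the key sequence a, b, a, c, b, this sketch queries the database four times. The second "a" is served from memory, and "b" has been evicted by the time it is requested again.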




Chapter 19. Other Components

These components serve to fulfil tasks that have not been covered yet. We will describe them now.

Executing Components

These components execute system, Java or database commands or run Clover graphs.

SystemExecute

This component has one optional input port and one optional output port. The purpose of these ports is described below in this section.

When you select this component, you can connect either of the two ports, but you do not need to connect any of them. The two ports can have different metadata. You must create both metadata by hand or select some prepared metadata. Metadata cannot be propagated through the component.

If you do not connect the output port and you want to get some output, you need to specify the file to which data should be written (Output file URL). In such a case, you must decide whether the data should be appended to the output file (Append) or whether the file should be replaced by a new one. The value is false by default ("do not append data, replace the file").

You must specify the system commands that should be executed (System command). These can be defined with the help of the Edit value wizard.

You can also set the number of error lines (Number of error lines) that should be printed to the output file if errors occur.

You may also want to define which command interpreter should be used (Command interpreter). This attribute must have the following form: interpretername [parameters] ${} [parameters]. If the command interpreter is defined, the system commands are written to a temporary file and executed as a script by the interpreter. In such a case, the component replaces the ${} expression by the name of this script file.
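The ${} substitution described above can be pictured with the following Java sketch. It assumes the interpreter definition is split on whitespace and that the temporary script is written in UTF-8; the InterpreterCommand class is a hypothetical illustration, not the component's implementation.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of "interpretername [parameters] ${} [parameters]".
class InterpreterCommand {
    // Saves the system commands to a temporary script file and returns its path.
    static Path writeScript(String systemCommands) {
        try {
            Path script = Files.createTempFile("clover", ".script");
            Files.write(script, systemCommands.getBytes(StandardCharsets.UTF_8));
            return script;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Splits the interpreter definition on whitespace and replaces ${} with the script name.
    static List<String> expand(String interpreter, Path script) {
        List<String> command = new ArrayList<>();
        for (String token : interpreter.trim().split("\\s+")) {
            command.add(token.replace("${}", script.toString()));
        }
        return command;
    }
}
```

The resulting list could then be passed to java.lang.ProcessBuilder to start the interpreter; for example, an interpreter value of sh ${} would become sh followed by the path of the temporary script.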

If the command requires some data, the data can be sent to the component through the optional input port from another component. In such a case, the input port must be connected.


JavaExecute

This component has neither input port nor output port.

When you select this component, you must specify what should be executed.

You must define one of the following three attributes: Runnable class, Runnable or Runnable URL. (Runnable class is a path and a file name of some class, jar or zip file located outside the graph. Runnable is the transformation defined in the graph itself with the help of the Java language. Runnable URL is a path and a file name of some file written in the Java language.)

If you want to define the Runnable class attribute, click its item row; a button appears there. When you click this button, an Open Type wizard opens, in which you must specify the desired file. (See Section "Open Type Wizard" for more information.)

If you want to define the Runnable attribute, click its item row; a button appears there. When you click this button, an Edit value wizard opens, in which you can define the transformation in the Java language. (See Section "Edit Value Wizard" for more information.)

If you want to define the Runnable URL attribute, click its item row; a button appears there. When you click this button, a URL File Dialog opens, in which you can locate the desired file. (See Section "Locating Files with URL File Dialog" for more information.)

You can also specify some properties that should be used when executing the Java command (Properties).

You may also want to set the character set (Charset) that should be used when reading from the external Runnable URL.


DBExecute

This component has one optional input port and one optional output port. The purpose of these ports is described below in this section.

When you select this component, you can connect either of the two ports but, at the same time, you do not need to connect any of them. The two ports can have different metadata. You must create both metadata by hand or select some already prepared. Metadata cannot be propagated through the component.

When you select this component, remember that it uses a JDBC driver to connect to the database. Thus, you must create the database connection and specify the corresponding attribute: DB connection.

You must also specify an SQL query. You can do it in one of the following two ways: as the SQL query attribute or as the Query URL attribute. In other words, the query can be contained either in the graph itself or outside the graph in a file containing the query. If you define both attributes, only SQL query is applied.

In both cases, you can decide whether the SQL commands should be sent to stdout (Print statements). This attribute is set to false by default.

If you set the SQL query attribute, you can also define the statement delimiter for the query (SQL statement delimiter). The default delimiter is a semicolon. A query may consist of several statements separated from each other by the delimiter. These statements are executed one by one.
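Splitting a query into statements with the SQL statement delimiter can be sketched as below. This is a naive illustration that assumes the delimiter never occurs inside quoted SQL strings; the StatementSplitter class is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Naive sketch: split a query on the SQL statement delimiter (semicolon by default).
// Assumes the delimiter does not appear inside quoted SQL strings.
class StatementSplitter {
    static List<String> split(String query, String delimiter) {
        List<String> statements = new ArrayList<>();
        // Pattern.quote treats the delimiter literally, even if it contains regex characters.
        for (String part : query.split(Pattern.quote(delimiter))) {
            String statement = part.trim();
            if (!statement.isEmpty()) {
                statements.add(statement); // executed one by one, in this order
            }
        }
        return statements;
    }
}
```

A real SQL splitter would also need to skip delimiters inside string literals and comments.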

You must decide whether the query should be executed in a transaction (Transaction set). This attribute has the following three possible values: One statement, One set of statements, All statements. The default value is One statement. (The releases of CloverEngine older than 2.4 have two possible boolean values of this attribute. The default value for the older releases is false.) Remember that some database systems do not support transactions.

• If the value of the attribute is One statement, commit is performed after each query execution.

• If the value of the attribute is One set of statements, all statements are executed for each input record. Commit is performed after each set of statements. For this reason, if an error occurs during the execution of any statement, all statements executed for that record are rolled back.


• If the value of the attribute is All statements, commit is performed only after all statements. For this reason, if any error occurs, all operations are rolled back.
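The three Transaction set values differ only in where commit and rollback happen. The sketch below expresses this with standard JDBC calls (Connection.setAutoCommit, commit, rollback and Statement.execute). The TransactionSet enum and the run method are hypothetical, each inner list stands for the statements executed for one input record, and stopping at the first error is an assumption.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.List;

// Sketch of where commits happen for each Transaction set value (hypothetical names).
class TransactionSketch {
    enum TransactionSet { ONE_STATEMENT, ONE_SET_OF_STATEMENTS, ALL_STATEMENTS }

    // statementsPerRecord: the statements executed for each input record, in order.
    static void run(Connection conn, List<List<String>> statementsPerRecord, TransactionSet mode)
            throws SQLException {
        conn.setAutoCommit(false);
        try (Statement stmt = conn.createStatement()) {
            for (List<String> statements : statementsPerRecord) {
                for (String sql : statements) {
                    stmt.execute(sql);
                    if (mode == TransactionSet.ONE_STATEMENT) conn.commit();
                }
                if (mode == TransactionSet.ONE_SET_OF_STATEMENTS) conn.commit();
            }
            if (mode == TransactionSet.ALL_STATEMENTS) conn.commit();
        } catch (SQLException e) {
            // Undo everything since the last commit: one statement, one record's set, or all work.
            conn.rollback();
            throw e;
        }
    }
}
```

Because the rollback always reaches back to the last commit, the commit placement alone decides how much work an error undoes.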

You may want to specify whether the query should be treated as stored procedure/function calls that use the JDBC CallableStatement (Call as stored procedure). This attribute is set to false by default. If you switch this attribute to true, you may have to define at least one of the following two series of parameters: Query input parameters and Query output parameters.

• To call the stored procedure/function with input parameters, you must connect an edge to the input port, assign it some metadata fields and define which fields should be used as such parameters. They must be expressed with the help of the Query input parameters attribute.

This attribute must be of the form 1:=$inField1;...;n:=$inFieldN, since the parameters are numbered starting from 1. This way the listed input field names are mapped to the input parameters of the query.

• To call the stored procedure/function with a returned value and/or output parameters, you must connect an edge to the output port, assign it the necessary metadata fields and define to which fields such values or parameters should be mapped. They must be expressed with the help of the Query output parameters attribute.

This attribute must be of the form 1:=$outField1;...;n:=$outFieldN, since the parameters are numbered starting from 1. This way the returned value and the listed output parameters of the query are mapped to the output field names. The returned value is the first output parameter. If the command returns a result set, you need to specify the sequence of output metadata field names separated from each other by semicolons (Result set output fields).
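Both parameter attributes use the same index:=$field notation. A parser for it might look like the sketch below; ParameterMapping is a hypothetical helper, not part of CloverETL.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parser for mappings such as "1:=$inField1;2:=$inField2".
// Keys are JDBC parameter indexes (numbered from 1), values are metadata field names.
class ParameterMapping {
    static Map<Integer, String> parse(String mapping) {
        Map<Integer, String> result = new LinkedHashMap<>();
        for (String item : mapping.split(";")) {
            String entry = item.trim();
            if (entry.isEmpty()) continue;
            int sep = entry.indexOf(":=");
            if (sep < 0) throw new IllegalArgumentException("Missing ':=' in " + entry);
            int index = Integer.parseInt(entry.substring(0, sep).trim());
            if (index < 1) throw new IllegalArgumentException("Parameters are numbered from 1: " + entry);
            String field = entry.substring(sep + 2).trim();
            result.put(index, field.startsWith("$") ? field.substring(1) : field);
        }
        return result;
    }
}
```

With plain JDBC, each input entry would typically be bound with CallableStatement.setObject(index, value), while each output entry would be registered with registerOutParameter(index, sqlType) before execution and read with getObject(index) afterwards. How the component itself does this is not described here.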

You may also want to set the character set (Charset) that should be used when reading from the external Query URL.


RunGraph

This component has one optional input port and two optional output ports. When you select this component, you can connect any of the ports but, at the same time, you do not need to connect any of them. There are two ways to connect the ports, depending on how the component is configured. If both output ports are connected, they have the same metadata. The metadata structure is described in this section. If the input port and the first output port are connected, they have different metadata.

The ports can have different metadata. You must create all metadata by hand or select some already prepared. Metadata cannot be propagated through the component. The metadata on the ports must have the structure described below in this section.

This component serves to execute any of the prepared Clover graphs.

When you select this component, you must define the graph that should be executed in one of the following two ways:

One way is to set the Graph URL attribute. In this case, you do not need to connect the input port, but both output ports must be connected. The Graph URL attribute is a path and a file name that can be defined with the help of a URL File Dialog. The information on whether the execution of the specified graph was successful or not is sent to the output ports: the information about a successful execution is sent to the first output port, whereas the information about a failure is sent to the second output port. The metadata of the edges connected to these output ports must have the following five fields: graph, result, description, message, duration. The first four are of the string data type, the last one is of the decimal data type. (These metadata must be created by hand. See the corresponding section above.)


The graph field contains the path and the name of the graph that was executed. The result field contains one of the following: Finished OK, Aborted or Error. The description field contains a detailed description if the graph fails. The message field is a string value of org.jetel.graph.Result. The duration field contains the time of the graph execution in milliseconds.

The other way is to connect the input port, through which the component receives records whose first field is of the string data type and contains the path and the file name of the graph that should be executed. The second input field is optional. If it is defined and used, it contains a string with Clover command line arguments. If the input port is connected, only the first output port needs to be connected as well. The information on whether the execution of the specified graph was successful or not is sent to this output port. The metadata on the output port must have the same structure as described above.

If any of the graphs received through the input port fails, the Ignore graph fail attribute decides whether the execution continues or not. This attribute is set to false by default, so the execution stops if any of the graphs fails.
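The result metadata and the effect of Ignore graph fail can be summarized in Java. GraphResult mirrors the five fields listed above. RunGraphSketch, and the runGraph function that stands in for the real execution, are hypothetical. Sending the failed graph's result before stopping is an assumption.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// One output record with the five fields described above.
class GraphResult {
    final String graph;        // path and name of the executed graph
    final String result;       // "Finished OK", "Aborted" or "Error"
    final String description;  // detailed description if the graph fails
    final String message;      // string value of org.jetel.graph.Result
    final BigDecimal duration; // execution time in milliseconds

    GraphResult(String graph, String result, String description, String message, BigDecimal duration) {
        this.graph = graph;
        this.result = result;
        this.description = description;
        this.message = message;
        this.duration = duration;
    }
}

// Hypothetical sketch of processing graph names received through the input port.
class RunGraphSketch {
    static List<GraphResult> runAll(List<String> graphUrls, Function<String, GraphResult> runGraph,
                                    boolean ignoreGraphFail) {
        List<GraphResult> outputPort = new ArrayList<>();
        for (String url : graphUrls) {
            GraphResult r = runGraph.apply(url);
            outputPort.add(r); // the result record is sent out whether or not the graph succeeded
            if (!"Finished OK".equals(r.result) && !ignoreGraphFail) {
                break; // default behaviour: stop after the first failed graph
            }
        }
        return outputPort;
    }
}
```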

When you select this component, you must also decide which JVM should execute the specified graph (The same JVM). This attribute is set to true by default, so the graph is executed in the same JVM instance. If you set this attribute to false, you can define the following attributes in addition to those mentioned already:

First, you can define another JVM (Alternative JVM command line). It is java by default. You may also want to define the name of the main class that will execute the specified graph (Graph execution class). The default value of this attribute is org.jetel.main.runGraph. You can also specify some arguments of the command line (Command line arguments). You can also decide whether you want to log the result of the process to a file (Log file URL). If you want to log the process, you must specify the path and the file name of the log file by using a URL File Dialog. In addition, you can specify that the log information should be appended (Append to log file). This value is set to false by default ("do not append, replace the file"). The logged information is only a string describing the execution of the graph and whether it was successful or not.
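When The same JVM is false, the graph runs in a child process. The sketch below shows how such a command could be assembled with java.lang.ProcessBuilder. The argument order (JVM command, execution class, command line arguments, graph file) is an assumption, the classpath a real java call would need is omitted, and ExternalGraphCommand is hypothetical.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; the argument order and the missing classpath are simplifications.
class ExternalGraphCommand {
    static List<String> build(String jvmCommand, String executionClass, String commandLineArgs, String graphFile) {
        List<String> command = new ArrayList<>();
        command.add(jvmCommand);     // Alternative JVM command line ("java" by default)
        command.add(executionClass); // Graph execution class (org.jetel.main.runGraph by default)
        String args = commandLineArgs.trim();
        if (!args.isEmpty()) {
            command.addAll(Arrays.asList(args.split("\\s+"))); // Command line arguments
        }
        command.add(graphFile);
        return command;
    }

    // Configures, but does not start, the child process. logFile corresponds to Log file URL,
    // append to Append to log file (append to the file, or replace it).
    static ProcessBuilder configure(List<String> command, File logFile, boolean append) {
        ProcessBuilder pb = new ProcessBuilder(command).redirectErrorStream(true);
        if (logFile != null) {
            pb.redirectOutput(append ? ProcessBuilder.Redirect.appendTo(logFile)
                                     : ProcessBuilder.Redirect.to(logFile));
        }
        return pb;
    }
}
```

Calling start() on the returned ProcessBuilder would launch the graph in a separate JVM.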


Non-Executing Components

The following three components do not execute any tasks, but they do some other work.

CheckForeignKey

This component has two input ports and one or two output ports.

When you select this component, you must connect both input ports and at least one output port. The second output port is optional, but you can connect it if you want.

The metadata on the two input ports can be different. The metadata on the output port(s) can be the same as those on the first input port. They must at least have the same metadata structure (the number of fields, data types and sizes). Field names may differ. Nevertheless, metadata cannot be propagated through this component.

The component receives two data flows (the primary and the foreign). The foreign data flow is connected to the first input port and the primary data flow is connected to the second input port. The keys of both flows are compared. If a value of the foreign key is not found among the values of the primary key, the default value is assigned to the foreign key instead of its invalid value. Then all of the foreign records are sent to the first output port with the new foreign key values, and the original foreign records with invalid foreign key values can be sent to the optional second output port if it is connected.
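The checking logic can be sketched in a few lines of Java. For brevity, each foreign record is reduced to its key value and the primary keys are kept in a hash set. ForeignKeyCheck is a hypothetical illustration, not the component's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of CheckForeignKey; records are reduced to their foreign key values.
class ForeignKeyCheck {
    final List<String> firstPort = new ArrayList<>();  // all foreign records, invalid keys replaced
    final List<String> secondPort = new ArrayList<>(); // optional: original records with invalid keys

    void check(List<String> foreignKeys, Set<String> primaryKeys, String defaultForeignKey) {
        for (String fk : foreignKeys) {
            if (primaryKeys.contains(fk)) {
                firstPort.add(fk);
            } else {
                firstPort.add(defaultForeignKey); // sent on with the default key value
                secondPort.add(fk);               // original record with the invalid key
            }
        }
    }
}
```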


When you select this component, you must specify the foreign key (Foreign key).

In older versions of Clover, you had to specify both the primary and the foreign keys using the Primary key and the Foreign key attributes, respectively. They had the form of a sequence of field names separated from each other by semicolons. However, the use of Primary key is now deprecated.

The Foreign key is a sequence of individual assignments separated from each other by semicolons. Each of these individual assignments looks like this: $foreignField=$primaryKey. Even the last individual assignment is followed by a semicolon and a hash; however, these terminal characters are optional and can be omitted.
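The Foreign key notation can be parsed as in the following sketch. It treats the trailing semicolon and hash as optional, as stated above. ForeignKeyParser and the field names in the example are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parser for a Foreign key value such as "$custId=$id;$country=$code;#".
class ForeignKeyParser {
    static Map<String, String> parse(String foreignKey) {
        Map<String, String> pairs = new LinkedHashMap<>(); // foreign field -> primary field
        for (String assignment : foreignKey.split(";")) {
            String a = assignment.trim();
            if (a.isEmpty() || a.equals("#")) continue; // optional terminal characters
            String[] sides = a.split("=", 2);
            if (sides.length != 2) throw new IllegalArgumentException("Expected $foreign=$primary: " + a);
            pairs.put(strip(sides[0]), strip(sides[1]));
        }
        return pairs;
    }

    // Removes surrounding blanks and the leading '$' of a field reference.
    private static String strip(String field) {
        String f = field.trim();
        return f.startsWith("$") ? f.substring(1) : f;
    }
}
```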

To define Foreign key, you must select the desired fields in the Foreign key tab of the Foreign key definition wizard. Select the fields from the Fields pane on the left and move them to the Foreign key pane on the right.

Figure 19.1. Foreign Key Definition Wizard (Foreign Key Tab)

When you switch to the Primary key tab, you will see that the selected foreign fields appear in the Foreign key column of the Foreign key definition pane.

Figure 19.2. Foreign Key Definition Wizard (Primary Key Tab)

You only need to select some primary fields from the left pane and move them to the Primary key column of the Foreign key definition pane on the right.


Figure 19.3. Foreign Key Definition Wizard (Foreign and Primary Keys Assigned)

You must also define the default foreign key values (Default foreign key). This key is also a sequence of values of corresponding data types separated from each other by semicolons. The number and data types of the values must correspond to the metadata of the foreign key.

If you want to define the default foreign key values, click the Default foreign key attribute row and type the default values of all fields.

You may also want to set the Hash table size attribute. By default it is 512. Remember that this value should be greater than the number of unique primary key values.


LookupTableReaderWriter


You can connect the input port alone, the output port(s) alone, or both the input and the output port(s) at the same time. This component can be used as a writer, a reader, or both a reader and a writer at the same time.

When it is used as a reader, it reads data from the lookup table and sends it to the connected output edge(s). The input port is not connected.

When it is used as a writer, it reads data from the connected input edge and writes it to the lookup table. The output port is not connected.

When it is used both as a reader and a writer, it receives data from the connected input edge, updates the lookup table, reads all data from the lookup table and sends it out through the output port(s). Both the input port and the output port(s) are connected.
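The reader-and-writer mode (update first, then read everything) can be modelled with a java.util.Map standing in for the lookup table. The sketch below assumes each record is a {key, value} pair; LookupTableReaderWriterSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical model of the three modes, with a Map standing in for the lookup table.
class LookupTableReaderWriterSketch {
    // Writer part: every incoming record updates the lookup table (key -> value).
    static void write(Map<String, String> table, List<String[]> input) {
        for (String[] record : input) table.put(record[0], record[1]);
    }

    // Reader part: all records in the lookup table are sent to the output port(s).
    static List<String> read(Map<String, String> table) {
        List<String> output = new ArrayList<>();
        table.forEach((k, v) -> output.add(k + "=" + v));
        return output;
    }

    // Reader and writer: update the table first, then read all of its records.
    static List<String> writeThenRead(Map<String, String> table, List<String[]> input) {
        write(table, input);
        return read(table);
    }
}
```

The output therefore contains both the records that were already in the table and those added or updated from the input edge.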

Remember that the metadata of the lookup table can be the same as the metadata of the edge(s). The lookup table and the edge(s) must at least have nearly the same metadata structure (the number of fields, data types and sizes). Metadata names and even field names may differ.


When you select this component, you must specify the name of the lookup table (Lookup table). You must also decide whether the data stored only in memory should be freed after the process finishes (Free lookup table after finishing). It is set to false by default.



Chapter 20. Deprecated

This category includes some older components whose use is now deprecated. We suggest you do not use them. However, four of these eight components were used until recently. For this reason, we describe them in this chapter: DelimitedDataReader, FixLenDataReader, DelimitedDataWriter and FixLenDataWriter.

The other four were moved to the Deprecated category earlier, and we do not describe them here. They are Sort, Filter, HashJoin and MergeJoin.

You should use UniversalDataReader instead of DelimitedDataReader and FixLenDataReader, and UniversalDataWriter instead of DelimitedDataWriter and FixLenDataWriter.

Flat File Readers

The following two components (DelimitedDataReader and FixLenDataReader) read data from flat files with either delimited or fixed length metadata only. Delimiters and sizes are defined in metadata.

These file readers can also receive data through their optional input port.

DelimitedDataReader

The use of this component is deprecated; we suggest you use UniversalDataReader instead.

This component has one optional input port and at least one output port. Whenever you connect an edge to any output port, a new output port is created. You can extract metadata from a flat file.


This component reads data from flat files in which both fields and records are separated from each other by so called delimiters (a character or a sequence of characters). This is the reason why individual record fields must not contain these delimiter sequences. If a delimiter were contained in some fields, such fields would be split into parts or cut off, because their inner part, or their leading or trailing ends, would be considered a delimiter. When you are configuring this component, you must specify these delimiters of fields and records.

If you want to put a delimiter into some field, you can do it by surrounding this field value with single or double quotes. This way such a delimiter can be located inside a field value.

You must also decide which file should be read (File URL) and what character set is used in these records (Charset). Some files do not contain the names of the fields, whereas other files have them on the first line. In the latter case, you must set the Skip first line property to true. By default it is false. You can select how many records should be read from the file (Max number of records); otherwise the reader reads and sends out all records. You can also specify what to do in case of incorrect records (Data policy). If you switch to controlled data policy, you can log information about errors. In this component the log information is sent to stdout.

A limited number of rows may be only a header describing the data and not data itself. In such a case, you must set the Skip rows attribute to the number of rows that must be skipped.

Sometimes there are white spaces between a field and a delimiter. In such a case, you may want to set the Trim strings attribute to true, and the white spaces will be removed.
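The effect of quoting and of Trim strings on a single record can be illustrated with a simplified Java parser. It assumes a quote only opens at the start of a field and it does not handle escaped or doubled quotes. DelimitedLineParser is hypothetical and is not the reader's actual algorithm.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch: split one record into fields by a field delimiter, keeping delimiters
// that appear inside single- or double-quoted values. Escaped quotes are not handled.
class DelimitedLineParser {
    static List<String> parse(String line, String delimiter, boolean trimStrings) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        char quote = 0; // 0 means "not inside quotes"
        int i = 0;
        while (i < line.length()) {
            char c = line.charAt(i);
            if (quote != 0) {
                if (c == quote) quote = 0; else field.append(c); // closing quote or quoted content
                i++;
            } else if ((c == '"' || c == '\'') && field.toString().trim().isEmpty()) {
                quote = c; // opening quote at the start of a field
                field.setLength(0);
                i++;
            } else if (line.startsWith(delimiter, i)) {
                fields.add(finish(field, trimStrings));
                field.setLength(0);
                i += delimiter.length();
            } else {
                field.append(c);
                i++;
            }
        }
        fields.add(finish(field, trimStrings));
        return fields;
    }

    // Applies Trim strings: removes white spaces around the field value.
    private static String finish(StringBuilder field, boolean trimStrings) {
        return trimStrings ? field.toString().trim() : field.toString();
    }
}
```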




FixLenDataReader

The use of this component is deprecated; we suggest you use UniversalDataReader instead.

This component has one optional input port and at least one output port. Whenever you connect an edge to any output port, a new output port is created. You can extract metadata from a flat file.


This component reads data from flat files in which all fields have exactly defined sizes. When you are configuring this component, you must specify how many characters belong to each individual field of the records.

You must also decide which file should be read (File URL) and what character set is used in these records (Charset). If the file contains the names of the fields on the first line, you must skip this line by setting the Skip first line property to true. You can select how many records should be read from the file (Max number of records); otherwise the reader reads and sends out all records. You can also specify what to do in case of incorrect and/or empty fields (Data policy). If you switch to controlled data policy, you can log information about errors. In this component the log information is sent to stdout.

In addition to the attributes mentioned above, in this type of component, you also need to define the following:

Sometimes a limited number of rows is only a header describing the data and not data itself. In such a case, you must set the Skip rows attribute to the number of rows that must be skipped.

The Byte mode attribute is set to false by default. You can switch to byte mode by selecting or typing true in this component wizard. After that, a byte buffer is used for data parsing; otherwise, a char buffer is used. Byte mode is effective only for the byte and cbyte data types.

You must also define whether white spaces at the leading and/or trailing ends of the fields should be skipped (Skip leading blanks and/or Skip trailing blanks, respectively). Both of these attributes are true by default.
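Fixed-length parsing in char mode, together with Skip leading blanks and Skip trailing blanks, can be sketched as follows. Field sizes are given in characters, and an incomplete record yields shorter or empty trailing fields. FixedLengthParser is a hypothetical illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of fixed-length parsing in char mode: field i takes exactly sizes[i] characters.
class FixedLengthParser {
    static List<String> parse(String record, int[] sizes, boolean skipLeadingBlanks, boolean skipTrailingBlanks) {
        List<String> fields = new ArrayList<>();
        int pos = 0;
        for (int size : sizes) {
            // An incomplete record yields a shorter (possibly empty) field.
            int end = Math.min(pos + size, record.length());
            String value = pos < end ? record.substring(pos, end) : "";
            if (skipLeadingBlanks) value = value.replaceAll("^\\s+", "");
            if (skipTrailingBlanks) value = value.replaceAll("\\s+$", "");
            fields.add(value);
            pos += size;
        }
        return fields;
    }
}
```

For example, the 16-character record "  42Alice     NY" with field sizes 4, 10 and 2 and both skip options enabled gives the fields "42", "Alice" and "NY".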

You also need to decide whether empty records should be skipped. If you want them to be skipped, set the Skip empty attribute to true. It is false by default.

Remember that if you set the Byte mode attribute to true, the properties Skip leading blanks, Skip trailing blanks and/or Skip empty have no effect on the process of data parsing.

Some records may be incomplete. In such a case, you must decide whether you want such records to be parsed or not. If you want these records to be parsed, you must set the Enable incomplete attribute to true. By default, it is true for char mode and false for byte mode.

You may also want to trim strings (Trim strings).


Remember that a delimiter can be set in metadata even here, but it is used only to delimit individual records. Otherwise, you would have to specify how many fields are contained in one record instead of specifying the record delimiter.



Flat File Writers

The following two components (DelimitedDataWriter and FixLenDataWriter) write data to flat files with either delimited or fixed length metadata only. Delimiters and sizes are defined in metadata.

These file writers can also send data out through their optional output port.

DelimitedDataWriter

The use of this component is deprecated; we suggest you use UniversalDataWriter instead.


This component writes data to flat files in which both fields and records are separated from each other by so called delimiters (a character or a sequence of characters). This is the reason why individual record fields must not contain these delimiter sequences. If a delimiter were contained in some fields, such fields would subsequently (when read) be split into parts or cut off, because their inner part, or their leading or trailing ends, would be considered a delimiter. When you are configuring this component, you must specify these delimiters of fields and records.

When you select this component, you must specify the file(s) to which data should be written (File URL).





You can also decide how many records should be skipped before writing to the output file(s) (Number of skipped records). It is 0 by default. You can set a limit with the Max number of records attribute. If you do not specify these attributes, DelimitedDataWriter writes all incoming data records to the output file(s).

You can also limit the maximum number of records that can be contained in one file (Records per file) and/or the file size in bytes (Bytes per file). In such a case, if you want to write incoming data records to more than one output file, you must use dollar signs in the output file base name (in File URL). This way, several output files are created and the data records are distributed among them.
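Splitting output by Records per file can be sketched in Java. The naming rule used here is an assumption made for illustration: a run of dollar signs is replaced by the zero-padded file number, with the number of dollar signs setting the width, and numbering starts at 0. OutputFileSplitter is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of Records per file; the '$' naming rule is an assumption.
class OutputFileSplitter {
    // Replaces the first run of '$' in the base name with the zero-padded file number.
    static String fileName(String baseName, int fileNumber) {
        int start = baseName.indexOf('$');
        if (start < 0) return baseName;
        int end = start;
        while (end < baseName.length() && baseName.charAt(end) == '$') end++;
        String number = String.format("%0" + (end - start) + "d", fileNumber);
        return baseName.substring(0, start) + number + baseName.substring(end);
    }

    // Returns the target file of each record, starting a new file every recordsPerFile records.
    static List<String> assign(int recordCount, int recordsPerFile, String baseName) {
        List<String> targets = new ArrayList<>();
        for (int i = 0; i < recordCount; i++) {
            targets.add(fileName(baseName, i / recordsPerFile));
        }
        return targets;
    }
}
```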



FixLenDataWriter

The use of this component is deprecated; we suggest you use UniversalDataWriter instead.


This component writes data to flat files in which all fields have exactly defined sizes. When you are configuring this component, you must specify how many characters belong to each individual field of the records.

When you select this component, you must specify the file(s) to which the data should be written (File URL).


You may also want to set the character set (Charset) that should be used for encoding data written to the output file(s).

You can also specify what character should be used for padding fields (Field filler) and/or padding gaps between fields in output records (Record filler). The default field filler is a space; the default record filler is an equal sign.

If you specify some fillers, you can decide whether data should be aligned to the left or to the right. Set the Left align attribute to false if you want to align data to the right. By default this attribute is set to true (data is aligned to the left).
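Padding with the field filler and the Left align setting can be illustrated as follows. The record filler is not modelled, and cutting values longer than the field size is an assumption. FixedLengthFormatter is hypothetical.

```java
// Sketch of FixLenDataWriter-style padding: each value is padded to its field size with
// the field filler, aligned left or right. Values that are too long are cut (an assumption).
class FixedLengthFormatter {
    static String pad(String value, int size, char fieldFiller, boolean leftAlign) {
        if (value.length() >= size) return value.substring(0, size);
        StringBuilder padding = new StringBuilder();
        for (int i = value.length(); i < size; i++) padding.append(fieldFiller);
        return leftAlign ? value + padding : padding + value;
    }

    // Concatenates the padded fields of one record.
    static String format(String[] values, int[] sizes, char fieldFiller, boolean leftAlign) {
        StringBuilder record = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            record.append(pad(values[i], sizes[i], fieldFiller, leftAlign));
        }
        return record.toString();
    }
}
```

With a space as the field filler and Left align set to true, the value 42 in a five-character field becomes "42" followed by three spaces. With Left align set to false and 0 as the filler, it becomes "00042".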



You can also decide how many records should be skipped before writing to the output file(s) (Number of skipped records). It is 0 by default. You can set a limit with the Max number of records attribute. If you do not specify these attributes, FixLenDataWriter writes all incoming data records to the output file(s).

You can also limit the maximum number of records that can be contained in one file (Records per file) and/or the file size in bytes (Bytes per file). In such a case, if you want to write incoming data records to more than one output file, you must use dollar signs in the output file base name (in File URL). This way, several output files are created and the data records are distributed among them.

If you want to split the data flow and distribute the incoming records among different output files, you must define the Partition key attribute and select the value of Partition file tag (either Number file tag or Key field tag values). The default value of this attribute is Number file tag. If you want to give other names to these output files, you must specify Partition lookup table and Partition output fields.



Appendix D. Defining Transformations in Java

In the same way as you can define transformations in Clover transformation language (see Part IV), you can also define them in Java. If you want to write transformations in Java, you must add some jar files to the build path. You must at least add the same two jar files that were added for creating metadata from dBase files: cloveretl.engine.jar and commons-logging.jar. These files are contained in the following folder: pathtotheeclipsefolder/eclipse/plugins/com.cloveretl.gui_2.0.0/lib/lib.

If you need to use connections, sequences and/or other tools, you must also add other appropriate jar files. See Section "Creating Metadata from a DBase File" for more detailed information on how to add the mentioned jars.

Part IV. Transformation Language


Chapter 21. Clover Transformation Language

Clover transformation language (CTL) is used to define transformations in some components (in all Joiners, Partition, DataIntersection, Reformat, Denormalizer and Normalizer).

Program Structure

Each program written in CTL must have the following structure:

ImportStatements
VariableDeclarations
FunctionDeclarations
Statements
Mappings

Remember that the ImportStatements must be at the beginning of the program and the Mappings must be at its end. Both ImportStatements and Mappings may consist of several individual statements or mappings, each of which must be terminated by a semicolon. The middle part of the program can be interspersed: individual declarations of variables and functions and individual statements do not need to be in this order. But they must always use only already declared variables and functions! Thus, you need to declare a variable or function before you can use it in a statement or in another declaration of a variable or function.

Comments

Throughout the program you can use comments. These comments are not processed; they only serve to describe what happens within the program.

The comments are of two types: one-line comments and multiline comments. See the following two options:

• //This is a one-line comment.

• /* This is a multiline comment. */

Import

First of all, at the beginning of a program in CTL, you can import some of the existing programs in CTL. You can do it in one of the following ways:

• import 'fileURL';

• import "fileURL";

You must decide whether you want to use single or double quotes. Single quotes do not escape so called escape sequences. (For more details see Section "Literals" below.) For fileURL, you must type the URL of some existing source code file.

But remember that you must import such files at the beginning before any other declaration(s) and/or statement(s).


Data Types

In any program, you can use some variables. Their data types can be the following:

• int

This data type serves to store integer numbers.

To store a value, 32 bits are used.

Its range is from Integer.MIN_VALUE to Integer.MAX_VALUE (according to the Java integer data type), that is, from -2147483648 to +2147483647.

Its declaration looks like this: int identifier;

The default value is 0.

If you add an l letter to the end of any integer number, you can cast it to the long data type.

• long

This data type serves to store long numbers.


Its range is from Long.MIN_VALUE+1 to Long.MAX_VALUE (according to the Java long data type), that is, from -9223372036854775807 to +9223372036854775807.

Its declaration looks like this: long identifier;

The default value is 0.

Any integer number can be cast to this data type by adding an l letter to the end of the integer number.

• decimal

This data type serves to store decimal numbers with arbitrary precision.

Its declaration can look like this: decimal identifier;

or it can be: decimal (length,scale) identifier;

The default length and scale are 8 and 2, respectively.

The default values of DECIMAL_LENGTH and DECIMAL_SCALE are contained in the org.jetel.data.defaultProperties file.

You can cast any float number to the decimal data type by adding the d letter to the end of the float number.

• number (double)

This data type serves to store double numbers.


This data type has the following three special values: NaN, Infinity, -Infinity.

Its declaration looks like this: number identifier;

The default value is 0.0.



• string

This data type serves to store sequences of characters.

To store a string, each character is stored in 16 bits.

The declaration can look like this: string identifier;

The default value is an empty string.

• date

This data type serves to store date and time.

Its declaration looks like this: date identifier;

The default value is the current date and time.

• boolean

This data type serves for values of logical expressions.

It can be either true or false.

Its declaration looks like this: boolean identifier;

The default value is false.

• bytearray

This data type is an array of bytes whose length can be up to Integer.MAX_VALUE. It behaves similarly to the list data type (see below).

Its declaration can look like this: bytearray identifier;

or it can be: bytearray (size) identifier;

The default bytearray is an empty array.

• list

This data type is a container of any data type.

The list data type is indexed by integers.

Its declaration looks like this: list identifier;

The default list is an empty list.

Examples:

list list2; list2[5]=123;

Assignments:

• list1=list2;

It means that both lists reference the same elements.

• list1[ ]=list2;

It adds all elements of list2 to the end of list1.



• list1[ ]="abc";

It adds the "abc" string to the list1 as its new last element.

• list1[ ]=NULL;

It removes the last element of the list1.

• map

This data type is a container of any data type.

The map is indexed by strings.

Its declaration looks like this: map identifier;

The default map is an empty map.

Example: map map1; map1["abc"]=true;

The assignments are similar to those valid for a list.

• record

This data type is a set of fields of data.

The structure of record is based on metadata.

Its declaration looks like this: record (metadata) identifier;

Remember that the metadata ID must be used in the record declaration. Do not use the metadata name here!

This variable does not have any default value.

It can be indexed by both integer numbers and strings.
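The list assignments described above correspond closely to java.util.List operations. The following sketch is only an analogy that clarifies the semantics (sharing the same elements versus appending, and removing the last element). It is not how CTL is implemented, and CtlListAnalogy is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Java counterparts of the CTL list assignments described above (an analogy only).
class CtlListAnalogy {
    static List<Object> demo() {
        List<Object> list2 = new ArrayList<>();
        list2.add(123);

        List<Object> alias = list2;     // list1=list2;    both names reference the same elements
        alias.add(456);                 // so this change is visible through list2 as well

        List<Object> list1 = new ArrayList<>();
        list1.addAll(list2);            // list1[ ]=list2;  appends all elements of list2
        list1.add("abc");               // list1[ ]="abc";  appends one new last element
        list1.remove(list1.size() - 1); // list1[ ]=NULL;   removes the last element
        return list1;                   // contains 123 and 456
    }
}
```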

Literals

Literals serve to write values of the data types mentioned above.

• Integer

These literals represent the integer data type in decimal form.

They have the following form: -[0-9]+ or [0-9]+. For example, -25487 or 25487.

• Octal integer

These literals represent the integer data type in octal form, in other words, in the base-8 numeral system.

They have the following form: 0[0-7]+. For example, 0644.

• Hexadecimal integer

These literals represent the integer data type in hexadecimal form, in other words, in the base-16 numeral system.

They have the following form: 0x[0-9A-F]+. For example, 0x2AF3.


• Long integer

These literals represent the long data type. In other words, they can represent integer numbers too large for the int data type, such as numbers greater than 2^31 - 1.

They have the following form: [0-9]+L. For example, 9562307813123123L.

• Number (Double)

These literals represent floating point numbers in double precision format.

They are stored in 64 bits.

They have the following form: [0-9]+.[0-9]+.

For example, 452.126 is a double literal.

• Decimal

These literals represent decimal numbers with fixed precision.

They have the following form: [0-9]+.[0-9]+D.

For example, 235.32D is a decimal literal.

• Double quoted string

These literals can represent strings.

They are sequences of characters surrounded by double quotes.

To express unprintable characters, you can use so-called escape sequences such as the following pairs: \t (tabulator), \n (line feed), \r (carriage return), etc. These pairs of characters are translated to the corresponding unprintable characters.

A bare double quote must not appear inside a double quoted literal. If you need a double quote inside, precede it with a backslash: \". This pair is translated to a double quote character.

For example, "Hello\tworld!" is a double quoted representation of a string containing a tabulator.

• Single quoted string

These literals can represent strings.

They are sequences of characters surrounded by single quotes.

Unlike double quoted literals, they cannot express escape sequences.

A single quote alone must not be contained in a single quoted literal. However, it can be used if it is preceded by a backslash; the pair of backslash and single quote is translated to a single quote character.

For example, 'Hello\tworld!' is a single quoted representation of a string. Unlike in double quoted literals, the backslash and the letter t are not converted together to a tabulator. Pairs of a backslash and any character other than a single quote remain two separate characters.

• Date

These literals represent date.

They have the following form: yyyy-[M[M]]-[d[d]].


For example, 2008-06-12 is a date literal.

• Datetime

These literals represent date and time.

They have the following form: yyyy-[M[M]]-[d[d]] [h[h]]:[m[m]]:[s[s]].

For example, 2008-06-12 17:21:15 is a datetime literal.

• List of literals

These literals represent lists of other literals including lists, maps, records.

For example, ["Hello\tworld!", 9, 25.3, 2008-06-12, ['Hello\tworld!', 0x27,09]] is a list literal.
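The literal forms above can be checked mechanically. The following Java sketch (an illustration, not the actual CTL parser; LiteralForms is a hypothetical name) classifies numeric tokens with the patterns from the list, escaping the dot so that it matches only a decimal point, and shows how the escape rules differ between double and single quoted literals. Only the escape pairs named above (\t, \n, \r, \") are translated; the "etc." in the text suggests that CTL supports more, which this sketch does not model.

```java
import java.util.regex.Pattern;

class LiteralForms {
    // Numeric literal forms transcribed from the list above; the dot is escaped (\\.)
    // so that it matches only a decimal point. Order matters: 0644 also fits "integer".
    private static final String[][] NUMERIC_FORMS = {
        {"hexadecimal integer", "0x[0-9A-F]+"},
        {"octal integer", "0[0-7]+"},
        {"long", "[0-9]+L"},
        {"decimal", "[0-9]+\\.[0-9]+D"},
        {"double", "[0-9]+\\.[0-9]+"},
        {"integer", "-?[0-9]+"},
    };

    // Returns the name of the first form the whole token matches.
    static String classify(String token) {
        for (String[] form : NUMERIC_FORMS) {
            if (Pattern.matches(form[1], token)) {
                return form[0];
            }
        }
        return "not a numeric literal";
    }

    // Body of a double quoted literal: translate the escape pairs named in the text.
    // Any other backslash pair is kept unchanged in this sketch.
    static String unescapeDoubleQuoted(String body) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            if (c == '\\' && i + 1 < body.length()) {
                char translated;
                switch (body.charAt(i + 1)) {
                    case 't': translated = '\t'; break;
                    case 'n': translated = '\n'; break;
                    case 'r': translated = '\r'; break;
                    case '"': translated = '"'; break;
                    default: translated = '\0'; break;
                }
                if (translated != '\0') {
                    out.append(translated);
                    i++; // skip the character after the backslash
                    continue;
                }
            }
            out.append(c);
        }
        return out.toString();
    }

    // Body of a single quoted literal: only \' is translated; other pairs stay two characters.
    static String unescapeSingleQuoted(String body) {
        return body.replace("\\'", "'");
    }

    public static void main(String[] args) {
        for (String token : new String[] {"-25487", "0644", "0x2AF3", "452.126", "235.32D"}) {
            System.out.println(token + " -> " + classify(token));
        }
        System.out.println(unescapeDoubleQuoted("Hello\\tworld!")); // contains a real tab
        System.out.println(unescapeSingleQuoted("Hello\\tworld!")); // keeps backslash and t
    }
}
```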

Variables

To declare a variable, type the data type of the variable, a white space, the name of the variable and a semicolon.

The variable can be initialized later, or it can be initialized in the declaration itself. In either case, the value of the expression must be of the same data type as the variable.

Both cases of variable declaration and initialization are shown below:

• dataType variable;

...

variable=expression;

• dataType variable=expression;

Operators

The operators serve to create more complex expressions within the program. They can be arithmetic, relational and logical. The relational and logical operators create expressions with a boolean result. The arithmetic operators can be used in all expressions, not only in the logical ones.

Arithmetic Operators

The following operators combine the values of expressions of any data type except boolean. These operators can be used several times in one expression; in that case, you can express the priority of operations by parentheses. The result depends on the order of the operands.

• Addition

+

The operator above serves to sum the values of two expressions.

However, two boolean values or two dates cannot be added. To create a new value from two boolean values, use the logical operators instead.

Nevertheless, if you add any data type to a string, the second value is converted to a string automatically and concatenated with the first (string) summand. But remember that the string must be in the first place! Naturally, two strings can be summed in the same way. Note also that the concat() function is faster, so you should use it instead of adding summands to a string.

You can also add any numeric data type to a date. The result is a date in which the number of days is increased by the whole part of the number. Again, the date must be in the first place.

The sum of two numeric data types depends on the order of the summands. The resulting data type is the same as that of the first summand; the second summand is converted to the first data type automatically.

• Subtraction and Unary minus

-

The operator serves to subtract one numeric data type from another. Again, the resulting data type is the same as that of the minuend; the subtrahend is converted to the minuend data type automatically.

It can also subtract a numeric value from a date. The result is a date in which the number of days is reduced by the whole part of the subtrahend.

• Multiplication

*

The operator serves only to multiply two numeric data types.

• Division

/

The operator serves only to divide two numeric data types. Remember that you must not divide by zero. Dividing by zero throws TransformLangExecutorRuntimeException or gives Infinity (in case of a number data type).

• Modulus

%

The operator can be used for both floating-point data types and integer data types. It returns the remainder of division.

• Incrementing

++

The operator increments a numeric data type by one. It can be used for both floating-point data types and integer data types.

If it is used as a prefix, the number is incremented first and then it is used in the expression.

If it is used as a postfix, first, the number is used in the expression and then it is incremented.

• Decrementing

--

The operator decrements a numeric data type by one. It can be used for both floating-point data types and integer data types.

If it is used as a prefix, the number is decremented first and then it is used in the expression.

If it is used as a postfix, first, the number is used in the expression and then it is decremented.
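Several of the arithmetic rules above can be illustrated with their Java counterparts. In this sketch (hypothetical class ArithmeticSketch), adding a number to a date or subtracting it changes the day count by the whole part of the number, which is taken here as truncation toward zero (an assumption for negative numbers). Note that Java itself throws ArithmeticException for integer division by zero, whereas CTL, according to the text above, throws TransformLangExecutorRuntimeException; floating-point division by zero gives Infinity in both.

```java
import java.time.LocalDate;

class ArithmeticSketch {
    // date + number: the day count grows by the whole part of the number
    // (whole part = truncation toward zero, an assumption for negative numbers).
    static LocalDate addDays(LocalDate date, double number) {
        return date.plusDays((long) number);
    }

    // date - number: the day count shrinks by the whole part of the number.
    static LocalDate subtractDays(LocalDate date, double number) {
        return date.minusDays((long) number);
    }

    // Returns {value of ++i, value of the following i++, final i} for a given start.
    static int[] incrementOrder(int start) {
        int i = start;
        int prefix = ++i;  // incremented first, then used
        int postfix = i++; // used first, then incremented
        return new int[] {prefix, postfix, i};
    }

    public static void main(String[] args) {
        System.out.println(addDays(LocalDate.of(2008, 6, 12), 3.9));       // 2008-06-15
        System.out.println(subtractDays(LocalDate.of(2008, 6, 12), 12.2)); // 2008-05-31
        int[] r = incrementOrder(5);
        System.out.println(r[0] + " " + r[1] + " " + r[2]);                // 6 6 7
        System.out.println(7.5 % 2);  // modulus of floating-point values: 1.5
        System.out.println(1.0 / 0);  // floating-point division by zero: Infinity
    }
}
```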


Relational Operators

The following operators compare subexpressions and return a boolean result. Either of the listed signs can be used. If you choose the .operator. form, it must be surrounded by white spaces. These operators can be used several times in one expression; in that case, you can express the priority of comparisons by parentheses.

• Greater than

Either of the two signs below can be used to compare expressions of numeric, date and string data types. Both data types in the expressions must be comparable. The result can depend on the order of the two expressions if they are of different data types.

• >

• .gt.

• Greater than or equal to

Any of the three signs below can be used to compare expressions of numeric, date and string data types. Both data types in the expressions must be comparable. The result can depend on the order of the two expressions if they are of different data types.

• >=

• =>

• .ge.

• Less than

Either of the two signs below can be used to compare expressions of numeric, date and string data types. Both data types in the expressions must be comparable. The result can depend on the order of the two expressions if they are of different data types.

• <

• .lt.

• Less than or equal to

Any of the three signs below can be used to compare expressions of numeric, date and string data types. Both data types in the expressions must be comparable. The result can depend on the order of the two expressions if they are of different data types.

• <=

• =<

• .le.

• Equal to

Either of the two signs below can be used to compare expressions of any data type. Both data types in the expressions must be comparable. The result can depend on the order of the two expressions if they are of different data types.

• ==

• .eq.


• Not equal to

Any of the three signs below can be used to compare expressions of any data type. Both data types in the expressions must be comparable. The result can depend on the order of the two expressions if they are of different data types.

• !=

• <>

• .ne.

• Matches regular expression

The operator compares a string with a regular expression. For example, the regular expression "[^a-d].*" matches a string whose first character is any character except a, b, c or d (the exception is expressed by the ^ sign, and a-d means the characters from a to d), followed by any character (expressed by the dot) repeated zero or more times (expressed by *). Similarly, '[p-s]{5}' means exactly five characters, each of which is p, q, r or s. For a more detailed explanation of regular expressions, see java.util.regex.Pattern.

• ~=

• .regex.

• Contained in

This operator tests whether a value is contained in a list or in a map of other values.

• .in.
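The two regular expression examples above can be tried out directly with java.util.regex.Pattern, to which the text refers. This sketch assumes that ~= requires the whole string to match (as Pattern.matches does); if CTL only searched for a matching substring, results for longer strings would differ. RegexOperator is a hypothetical name.

```java
import java.util.regex.Pattern;

class RegexOperator {
    // Sketch of  text ~= regex  under the whole-string-match assumption.
    static boolean matchesRegex(String text, String regex) {
        return Pattern.matches(regex, text);
    }

    public static void main(String[] args) {
        System.out.println(matchesRegex("xyz", "[^a-d].*"));   // true: x is not in a-d
        System.out.println(matchesRegex("abc", "[^a-d].*"));   // false: starts with a
        System.out.println(matchesRegex("pqrsp", "[p-s]{5}")); // true: five characters from p-s
        System.out.println(matchesRegex("pqrs", "[p-s]{5}"));  // false: only four characters
    }
}
```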

Logical Operators

If an expression whose value must be of boolean data type is complex, it can consist of subexpressions (see above) joined by logical operators (AND, OR, NOT, EQUAL TO, NOT EQUAL TO). If you want to express priority in such an expression, you can use parentheses. For each operator below, you can choose either form (for example, && or and).

Every sign of the form .operator. must be surrounded by white space.

• Logical AND

• &&

• and

• Logical OR

• ||

• or

• Logical NOT

• !

• not

• Logical EQUAL TO

• ==


• .eq.

• Logical NOT EQUAL TO

• !=

• <>

• .ne.

Simple Statement and Block of Statements

A simple statement is an expression terminated by a semicolon. A block of statements is a series of simple statements, each terminated by a semicolon. The statements in a block can follow each other on one line or they can be on several lines, but each of them must be terminated by a semicolon. Sometimes a block of statements must be surrounded by curly braces (if it is part of another statement and must be executed as one statement). In this case, no semicolon is used after the closing curly brace.

Control Statements

Some statements serve to control the process of the program.

Selection Statements

These statements serve to branch the process of the program.

If Statement

On the basis of the Condition value, this statement decides whether the Statement should be executed. If the Condition is true, Statement is executed. If it is false, the Statement is ignored and the process continues after the If statement. Statement is either a simple statement or a block of statements.

• if (Condition) Statement

Unlike the previous form of the If statement (in which the Statement is executed only if the Condition is true), this form adds a statement that is executed when the Condition is false. Thus, if the Condition is true, Statement1 is executed; if it is false, Statement2 is executed. See below:

• if (Condition) Statement1 else Statement2

Statement2 can even be another If statement, also with an else branch:

• if (Condition1) Statement1 else if (Condition2) Statement3 else Statement4

Switch Statement

If you built a complex branching out of several nested If statements, the result could be hard to read. In such a case, it is much better to use the Switch statement.

The Expression is evaluated and the process branches according to its value. If the value of Expression equals the value of Expression1, Statement1 is executed. The same is valid for the other Expression/Statement pairs. If the value of Expression does not equal any of Expression1,...,ExpressionN, nothing is done and the process jumps over the Switch statement. If the value of Expression equals the values of several ExpressionK, the StatementK for each such K is executed.

• switch (Expression) { case Expression1:Statement1 case Expression2:Statement2 ... case ExpressionN:StatementN}

In the following case, if the value of Expression does not equal any of Expression1,...,ExpressionN, StatementN+1 is executed.

• switch (Expression) { case Expression1:Statement1 case Expression2:Statement2 ... case ExpressionN:StatementN default:StatementN+1}

Iteration Statements

These statements repeat a process in which inner Statements are executed cyclically until the Condition that limits the execution cycle becomes false.

For Loop

First, the Initialization is performed; then the Condition is evaluated and, if its value is true, the Statement is executed and finally the Iteration is made.

During the next cycle of the loop, the Condition is evaluated again and, if it is true, Statement is executed and Iteration is made. This way the process repeats until the Condition becomes false. Then the loop is terminated and the process continues with the rest of the program.

If the Condition is false at the beginning, the process skips the Statement and leaves the loop.

• for (Initialization;Condition;Iteration) { Statement}

Do-While Loop

First, the Statement is executed; then the process depends on the value of Condition. If its value is true, the Statement is executed again, the Condition is evaluated again, and the loop either continues (if it is true again) or stops and the process continues with the next statement (if it is false). Since the Condition is at the end of the loop, the Statement is executed at least once, even if the Condition is false from the beginning.

• do { Statement} while (Condition)

While Loop

This process depends on the value of Condition. If its value is true, the Statement is executed, then the Condition is evaluated again and the loop either continues (if it is true again) or stops and the process continues with the next statement (if it is false). Since the Condition is at the start of the loop, if it is false at the beginning, the Statement is not executed at all and the loop is skipped.

• while (Condition) { Statement}
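The practical difference between the three loops shows up when the Condition is false from the start. Java has the same for, do-while and while forms, so the following Java sketch (LoopSketch is a hypothetical name) counts how many times each loop body runs for a given limit.

```java
class LoopSketch {
    // while: the Condition is tested first, so the body may run zero times.
    static int whileIterations(int limit) {
        int count = 0;
        int i = 0;
        while (i < limit) {
            count++;
            i++;
        }
        return count;
    }

    // do-while: the body runs once before the Condition is tested.
    static int doWhileIterations(int limit) {
        int count = 0;
        int i = 0;
        do {
            count++;
            i++;
        } while (i < limit);
        return count;
    }

    // for: Initialization, then Condition, Statement, Iteration, Condition, ...
    static int forIterations(int limit) {
        int count = 0;
        for (int i = 0; i < limit; i++) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(whileIterations(0) + " " + doWhileIterations(0) + " " + forIterations(0)); // 0 1 0
        System.out.println(whileIterations(3) + " " + doWhileIterations(3) + " " + forIterations(3)); // 3 3 3
    }
}
```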

Jump Statements

Sometimes you need to control the process in a different way than by a decision based on the Condition value. To do that, you have the following options:

Break Statement

If you want to stop some subprocess, you can use the following word in the program:

• break

The subprocess stops and the process jumps to the higher level or to the next statement.

Continue Statement

If you want to stop only the current iteration of a loop, you can use the following word in the program:

• continue

The rest of the current iteration is skipped and the process jumps to the next iteration step.
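The following Java sketch (hypothetical class JumpSketch) shows both jump statements inside one for loop: continue skips the rest of the current iteration, while break leaves the loop entirely.

```java
class JumpSketch {
    // Sums the odd numbers below limit, stopping early when stopAt is reached.
    static int sumOdd(int limit, int stopAt) {
        int sum = 0;
        for (int i = 0; i < limit; i++) {
            if (i == stopAt) {
                break;    // leave the loop; execution continues after it
            }
            if (i % 2 == 0) {
                continue; // skip the rest of this iteration and go to the next one
            }
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumOdd(10, 100)); // 1+3+5+7+9 = 25
        System.out.println(sumOdd(10, 6));   // 1+3+5 = 9
    }
}
```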

Return Statement

In functions you can use the return word either alone or along with an expression (see the two options below). The return statement must be at the end of the function. If it were not at the end, all of the variableDeclarations, Statements and Mappings located after it would be ignored and skipped. A function without the return word or with the return word alone returns null, whereas a function with return expression returns the value of the expression. Remember that the data type of the expression must be the same as that of the declared return value.

• return

• return expression

Functions

You can also define your own functions in the following way:

• function functionName (arg1,arg2,...) { variableDeclarations Statements Mappings [return [expression]]}

You must put the return statement at the end (for more information about the return statement, see Section "Return Statement" above). Right before it, there can be some Mappings. The variableDeclarations and Statements must be at the beginning; they can even be interspersed, but remember that undeclared and uninitialized variables cannot be used. Therefore, we suggest that you first declare the variables and only then specify the Statements.


Eval

The following two functions allow you to parse, execute or insert a CTL expression into your CTL program.

The first function (eval(someExpression)) parses an expression and inserts it at the place where eval(someExpression) is executed.

The second function (eval_exp(someExpression)) parses and executes a CTL expression and cleans it up once it is executed. It is a good choice if you want to evaluate a mathematical expression or perform a simple task.

• eval()

• eval_exp()

Parameters

Parameters can be used in the Clover transformation language in the following way: ${nameOfTheParameter}. If you want such a parameter to be treated as a string, you must surround it by single or double quotes like this: '${nameOfTheParameter}' or "${nameOfTheParameter}".
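The quoting rule above makes sense if parameter references are replaced by their values as plain text before the CTL code is interpreted. Assuming that (it is not stated explicitly here), the following Java sketch (hypothetical class ParameterSubstitution) performs such a textual substitution and shows why an unquoted value ends up as bare program text while a quoted one becomes a string literal.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ParameterSubstitution {
    // Matches ${name}; restricting names to word characters is an assumption of this sketch.
    private static final Pattern PARAMETER = Pattern.compile("\\$\\{(\\w+)\\}");

    // Replaces each ${name} with the raw text of its value; unknown names are left unchanged.
    static String substitute(String code, Map<String, String> parameters) {
        Matcher matcher = PARAMETER.matcher(code);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String value = parameters.getOrDefault(matcher.group(1), matcher.group());
            matcher.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> parameters = Map.of("CITY", "Prague");
        // Unquoted: the value becomes bare program text (it would be read as an identifier).
        System.out.println(substitute("string s = ${CITY};", parameters));
        // Quoted: the value ends up inside a string literal.
        System.out.println(substitute("string s = \"${CITY}\";", parameters));
    }
}
```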

Sequences

In your graphs you can also use sequences. You can use them in CTL by placing the name of the sequence as an argument of the sequence() function.

You have three options depending on what you want to do with the sequence: you can get the current number of the sequence, get the next number of the sequence, or reset the sequence to its initial number.

The three options are:

• sequence(nameOfTheSequence).current

• sequence(nameOfTheSequence).next

• sequence(nameOfTheSequence).reset

Although these expressions return integer values, you may also want to get long or string values. This can be done in one of the following ways:

• sequence(nameOfTheSequence,long).current

• sequence(nameOfTheSequence,long).next

• sequence(nameOfTheSequence,string).current

• sequence(nameOfTheSequence,string).next

Lookup Tables

In your graphs you can also use lookup tables. You can use them in CTL by placing the name of the lookup table as an argument of the lookup(), lookup_next(), lookup_found() or lookup_admin() functions.

You have five options depending on what you want to do with the lookup table. You can create the lookup table, get the value of the specified field name from the lookup table associated with the specified key, get the next value of the specified field name from the lookup table, count the number of records with the same field name values (if the records are duplicated), or destroy the lookup table.

The key is a sequence of values of the field names separated by commas (not semicolons!). Thus, the key has the following form: keypart1,keypart2,...,keypartN.

The five options are:

• lookup_admin(nameOfTheLookupTable).init

• lookup(nameOfTheLookupTable,key).fieldName

• lookup_next(nameOfTheLookupTable).fieldName

• lookup_found(nameOfTheLookupTable)

• lookup_admin(nameOfTheLookupTable).free
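To make the relationship between lookup(), lookup_next() and lookup_found() concrete, here is a small in-memory Java model. It is a hypothetical sketch, not the CloverETL lookup table implementation: LookupModel and its methods are made up, the key is a plain comma-separated string, and the init/free administration done by lookup_admin() is not modelled.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LookupModel {
    private final Map<String, List<Map<String, String>>> records = new HashMap<>();
    private List<Map<String, String>> found = Collections.emptyList();
    private int position = 0;

    // Loads one record under a key of the form keypart1,keypart2,...,keypartN.
    void put(String key, Map<String, String> record) {
        records.computeIfAbsent(key, k -> new ArrayList<>()).add(record);
    }

    // Like lookup(name, key).fieldName: the field of the first record found, or null.
    String lookup(String key, String fieldName) {
        found = records.getOrDefault(key, Collections.emptyList());
        position = 0;
        return lookupNext(fieldName);
    }

    // Like lookup_next(name).fieldName: the field of the next record with the same key, or null.
    String lookupNext(String fieldName) {
        if (position >= found.size()) {
            return null;
        }
        return found.get(position++).get(fieldName);
    }

    // Like lookup_found(name): how many records share the last looked-up key.
    int lookupFound() {
        return found.size();
    }

    public static void main(String[] args) {
        LookupModel table = new LookupModel();
        table.put("1,abc", Map.of("name", "Alice"));
        table.put("1,abc", Map.of("name", "Bob"));
        System.out.println(table.lookup("1,abc", "name")); // Alice
        System.out.println(table.lookupFound());           // 2
        System.out.println(table.lookupNext("name"));      // Bob
        System.out.println(table.lookupNext("name"));      // null
    }
}
```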

Data Flows

This section describes how record fields are referenced. As you know, each component has some ports. Both input and output ports are numbered starting from 0, and they can also be referenced by their names. The names of data flows are the names of metadata, and the names of data fields are the names of metadata fields. Thus, to reference a field within a data flow, you must specify both the data flow and the data field, separated by a dot: the data flow on the left of the dot and the data field on the right. Each of them can be specified by either number or name. Therefore, there are the following four ways to reference a record field:

• flowNumber.fieldNumber

• flowNumber.fieldName

• flowName.fieldNumber

• flowName.fieldName

Mapping

Just as files with existing code must be imported at the beginning of the program, mapping must be placed at the end of the program.

When you want to do some mapping, you must do it in the following way:

Each mapping is an assignment of inputs to outputs. Since CTL is used in a component that has output ports, each mapping serves to assign (map) values to the output port(s).

The procedure is as follows: on the left side of a mapping, there is an output record field to which a value is assigned. On the right side, there are the candidate values. The two sides are joined by a colon and an equal sign (:=). The right side is a sequence of expressions separated by a white space, a colon and another white space. The number of these expressions is unlimited (but there can also be only one expression). The whole sequence is terminated by a semicolon. (For more information about how to reference record fields, see Section "Data Flows".)

When the mapping is being done, the expressions are evaluated from left to right, and the first expression found to be successful is mapped to the left side of the mapping.

• recordField:=expression1 : expression2 : expression3 : ... : expressionN;
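The left-to-right evaluation of mapping alternatives works like a chain of fallbacks. The Java sketch below (hypothetical MappingAlternatives class) tries each candidate in turn and returns the first that succeeds. What exactly counts as "successful" in CTL is not defined above; here it is assumed to mean that evaluating the expression does not throw an exception.

```java
import java.util.function.Supplier;

class MappingAlternatives {
    // Sketch of  field := expr1 : expr2 : ... : exprN;
    // Candidates are tried from left to right and the first one that succeeds is used.
    // Treating "succeeds" as "does not throw a runtime exception" is an assumption.
    @SafeVarargs
    static <T> T firstSuccessful(Supplier<T>... candidates) {
        for (Supplier<T> candidate : candidates) {
            try {
                return candidate.get();
            } catch (RuntimeException e) {
                // this alternative failed; try the next one
            }
        }
        throw new IllegalStateException("no alternative succeeded");
    }

    public static void main(String[] args) {
        // The first candidate fails (NumberFormatException), so the second one is mapped.
        int value = firstSuccessful(() -> Integer.parseInt("n/a"), () -> Integer.parseInt("42"));
        System.out.println(value); // 42
    }
}
```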

Appendix E. Clover TL Functions

The Clover transformation language has a set of functions at its disposal. We describe them here.

Conversion Functions

Sometimes you need to convert values from one data type to another. This can be done by using the following functions:

• bytearray base64byte(string arg);

The base64byte(string) function takes one string argument in base64 representation and converts it to an array of bytes. Its counterpart is the byte2base64(bytearray) function.

• string bits2str(bytearray arg);

The bits2str(bytearray) function takes an array of bytes and converts it to a string. Its counterpart is the str2bits(string) function.

• int bool2num(boolean arg);

The bool2num(boolean) function takes one boolean argument and converts it to either integer 1 (if the argument is true) or integer 0 (if the argument is false). Its counterpart is the num2bool(numeric) function.

• numerictype bool2num(boolean arg, typename numerictype);

The bool2num(boolean, typename) function accepts two arguments: the first is boolean and the other is the name of any numeric data type. It converts the first argument to the corresponding 1 or 0 in the numeric representation specified by the second argument. The return type of the function is the same as the second argument. Its counterpart is the num2bool(numeric) function.

• string byte2base64(bytearray arg);

The byte2base64(bytearray) function takes an array of bytes and converts it to a string in base64 representation. Its counterpart is the base64byte(string) function.

• string byte2hex(bytearray arg);

The byte2hex(bytearray) function takes an array of bytes and converts it to a string in hexadecimal representation. Its counterpart is the hex2byte(string) function.

• long date2long(date arg);

The date2long(date) function takes one date argument and converts it to a long type. Its value equals the number of milliseconds elapsed from January 1, 1970, 00:00:00 GMT to the date specified as the argument. Its counterpart is the long2date(long) function.

• int date2num(date arg, unit timeunit);

The date2num(date, unit) function accepts two arguments: the first is a date and the other is a time unit. The unit can be one of the following: year, month, day, hour, minute, second, millisecond. The function returns the specified time unit of the date as an integer. If the time unit is not contained in the date, the function returns 0. Remember that months are numbered starting from 0. Thus, date2num(2008-06-12, month) returns 5, and date2num(2008-06-12, hour) returns 0.

• string date2str(date arg, string pattern);

The date2str(date, string) function accepts two arguments: a date and a string. It converts the date according to the pattern specified as the second argument. Thus, date2str(2008-06-12, "dd.MM.yyyy") returns the following string: "12.06.2008". Its counterpart is the str2date(string, string) function.

• bytearray hex2byte(string arg);

The hex2byte(string) function takes one string argument in hexadecimal representation and converts it to an array of bytes. Its counterpart is the byte2hex(bytearray) function.

• date long2date(long arg);

The long2date(long) function takes one long argument and converts it to a date. It adds the argument number of milliseconds to January 1, 1970, 00:00:00 GMT and returns the result as a date. Its counterpart is the date2long(date) function.

• boolean num2bool(numerictype arg);

The num2bool(numeric) function takes one argument of any numeric data type representing 1 or 0 and returns boolean true or false, respectively.

• numerictype num2num(numerictype arg, typename numerictype);

The num2num(numerictype, typename) function accepts two arguments: the first is of any numeric data type and the second is the name of any numeric data type. It converts the first argument value to the numeric type specified as the second argument. The return type of the function is the same as the second argument. The conversion succeeds only if it is possible without any loss of information; otherwise the function throws an exception. Thus, num2num(25.4, int) throws an exception, whereas num2num(25.0, int) returns 25.

• string num2str(numerictype arg);

The num2str(numeric) function takes one argument of any numeric data type and converts it to its string representation. Thus, num2str(20.52) returns "20.52".

• string num2str(numerictype arg, int radix);

The num2str(numerictype, int) function accepts two arguments: the first is of any numeric data type and the second is an integer. It converts the first argument to its string representation in the numeral system with the radix given by the second argument. Thus, num2str(31, 16) returns "1F".

• bytearray str2bits(string arg);

The str2bits(string) function takes one string argument and converts it to an array of bytes. Its counterpart is the bits2str(bytearray) function.

• boolean str2bool(string arg);

The str2bool(string) function takes one string argument and converts it to the corresponding boolean value. The string can be one of the following four: "true", "1", "false", "0". The first two strings are converted to boolean true, the other two are converted to boolean false.

• date str2date(string arg, string pattern);

The str2date(string, string) function accepts two string arguments. It converts the first string to a date according to the pattern specified as the second argument. The pattern must correspond to the structure of the first argument. Thus, str2date("12.6.2008", "dd.MM.yyyy") returns the following date: 2008-06-12.

• date str2date(string arg, string pattern, string locale, boolean lenient);

The str2date(string, string, string, boolean) function accepts three string arguments and one boolean. It converts the first string to a date according to the pattern specified as the second argument. The pattern must correspond to the structure of the first argument. Thus, str2date("12.6.2008", "dd.MM.yyyy") returns the following date: 2008-06-12. The third argument defines the locale for the date. The fourth argument specifies whether the date interpretation should be lenient (true) or not (false). If it is true, the function tries to interpret the date even if it does not match the locale and/or pattern. If the function is called with three arguments only, the third one is interpreted as the locale (if it is a string) or as lenient (if it is a boolean).

• numerictype str2num(string arg);

The str2num(string) function takes one string argument and converts it to the corresponding numeric value. Thus, str2num("0.25") returns 0.25 if the function is declared with a double return type, but throws an exception if it is declared with an integer return type. The return type of the function can be any numeric type.

• numerictype str2num(string arg, typename numerictype);

The str2num(string, typename) function accepts two arguments: the first is a string and the second is the name of any numeric data type. It takes the first argument and returns its corresponding value in the numeric data type specified by the second argument. The return type of the function is the same as the second argument.

• numerictype str2num(string arg, typename numerictype, int radix);

The str2num(string, typename, int) function accepts three arguments: a string, the name of any numeric data type and an integer. It interprets the first argument as a number expressed in the numeral system with the radix given by the third argument and returns its value in the numeric data type specified as the second argument. The return type is the same as the second argument. The third argument can be 10 or 16 if the second argument is double, 10 if it is decimal, and any integer between Character.MIN_RADIX and Character.MAX_RADIX if it is int or long.

• string to_string(anytype arg);

The to_string(anytype) function takes one argument of any data type and converts it to its string representation.

• bool try_convert(anytype from, anytype to, string pattern);

The try_convert(anytype, anytype, string) function accepts three arguments: two are of any data type, the third is a string. The function tries to convert the first argument to the data type of the second one. If the conversion is successful, the second argument receives the value of the first argument and the function returns boolean true. If the conversion is not successful, the function returns boolean false and the first and second arguments retain their original values. The third argument is optional and it is used only if either of the first two arguments is a string. For example, try_convert("27.5.1942", dateA, "dd.MM.yyyy") returns true and dateA gets the value of 27 May 1942.
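Some of the numeric conversions above have direct Java equivalents, which may help when predicting their results. In this sketch (hypothetical class NumericConversions), BigDecimal.intValueExact reproduces the lossless rule of num2num (25.0 converts, 25.4 does not), and Long.toString and Long.parseLong handle radix conversion within the Character.MIN_RADIX to Character.MAX_RADIX range. Upper-casing the output to match the "1F" example is an assumption; Java itself returns lower-case digits.

```java
import java.math.BigDecimal;

class NumericConversions {
    // num2num-style check: convert to int only when no information is lost.
    // intValueExact throws ArithmeticException for 25.4 but returns 25 for 25.0.
    static boolean fitsIntExactly(String number) {
        try {
            new BigDecimal(number).intValueExact();
            return true;
        } catch (ArithmeticException e) {
            return false;
        }
    }

    // num2str(value, radix)-style formatting; upper-casing matches the "1F" example.
    static String toRadix(long value, int radix) {
        return Long.toString(value, radix).toUpperCase();
    }

    // str2num(text, long, radix)-style parsing; the radix must lie between
    // Character.MIN_RADIX (2) and Character.MAX_RADIX (36).
    static long fromRadix(String text, int radix) {
        if (radix < Character.MIN_RADIX || radix > Character.MAX_RADIX) {
            throw new IllegalArgumentException("radix out of range: " + radix);
        }
        return Long.parseLong(text, radix);
    }

    public static void main(String[] args) {
        System.out.println(fitsIntExactly("25.0")); // true
        System.out.println(fitsIntExactly("25.4")); // false
        System.out.println(toRadix(31, 16));        // 1F
        System.out.println(fromRadix("1F", 16));    // 31
    }
}
```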

Date Functions

When you work with dates, you may use the following functions:

• date dateadd(date arg, numerictype amount, unit timeunit);

The dateadd(date, numerictype, unit) function accepts three arguments: the first is a date, the second is of any numeric data type and the last is a time unit. The unit can be one of the following: year, month, day, hour, minute, second, millisecond. The function adds the amount of time units specified by the second and third arguments to the first argument and returns the result as a date.

• int datediff(date later, date earlier, unit timeunit);

The datediff(date, date, unit) function accepts three arguments: two dates and one time unit. It subtracts the second argument from the first argument and returns the resulting time difference, expressed as an integer number of the time units specified as the third argument. Thus, datediff(2008-06-18, 2001-02-03, year) returns 7, but datediff(2001-02-03, 2008-06-18, year) returns -7!

• date today();

The today() function accepts no arguments and returns the current date and time.

• date trunc(date arg);

The trunc(date) function takes one date argument and returns a date with the same year, month and day, but with the hour, minute, second and millisecond set to 0.

• long trunc(numerictype arg);

The trunc(numerictype) function takes one argument of any numeric data type and returns its truncated long value.

• null trunc(list arg);

The trunc(list) function takes one list argument, removes all its elements and returns null.

• null trunc(map arg);

The trunc(map) function takes one map argument, removes all its entries and returns null.
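A minimal Java sketch of dateadd, datediff and the trunc overloads, written with java.time. The mapping from CTL unit names to ChronoUnit values, the use of LocalDateTime for CTL dates, and the use of double for any numeric type are assumptions made for illustration. The sketch reproduces the datediff examples above (7 and -7) because ChronoUnit.between counts whole units and truncates toward zero.

```java
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch of dateadd, datediff and trunc using java.time (not Clover's implementation). */
class DateFunctionsSketch {

    /** Assumed mapping from the CTL unit names listed above to java.time units. */
    static ChronoUnit unit(String name) {
        switch (name) {
            case "year":        return ChronoUnit.YEARS;
            case "month":       return ChronoUnit.MONTHS;
            case "day":         return ChronoUnit.DAYS;
            case "hour":        return ChronoUnit.HOURS;
            case "minute":      return ChronoUnit.MINUTES;
            case "second":      return ChronoUnit.SECONDS;
            case "millisecond": return ChronoUnit.MILLIS;
            default: throw new IllegalArgumentException("unknown unit: " + name);
        }
    }

    /** dateadd: shifts arg by "amount" units of the given kind. */
    static LocalDateTime dateadd(LocalDateTime arg, long amount, String timeunit) {
        return arg.plus(amount, unit(timeunit));
    }

    /** datediff: later minus earlier, counted in whole units (truncated toward zero). */
    static long datediff(LocalDateTime later, LocalDateTime earlier, String timeunit) {
        return unit(timeunit).between(earlier, later);
    }

    /** trunc(date): keeps year, month and day; sets the time of day to 00:00:00.000. */
    static LocalDateTime trunc(LocalDateTime arg) {
        return arg.truncatedTo(ChronoUnit.DAYS);
    }

    /** trunc(numerictype): drops the fractional part (toward zero). */
    static long trunc(double arg) {
        return (long) arg;
    }

    /** trunc(list): empties the list and returns null. */
    static Object trunc(List<?> arg) {
        arg.clear();
        return null;
    }

    public static void main(String[] args) {
        LocalDateTime later = LocalDateTime.of(2008, 6, 18, 0, 0);
        LocalDateTime earlier = LocalDateTime.of(2001, 2, 3, 0, 0);
        System.out.println(datediff(later, earlier, "year"));   // 7
        System.out.println(datediff(earlier, later, "year"));   // -7
        System.out.println(dateadd(LocalDateTime.of(2008, 1, 31, 0, 0), 1, "month")); // 2008-02-29T00:00
        System.out.println(trunc(-3.7));                        // -3
        List<Integer> list = new ArrayList<>(Arrays.asList(1, 2, 3));
        trunc(list);
        System.out.println(list.isEmpty());                     // true
    }
}
```

Adding one month to 31 January yields 29 February 2008 here because java.time clamps to the last valid day of the month; the guide does not specify how dateadd handles this case.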

Mathematical Functions

You may also want to use some mathematical functions:

• numerictype abs(numerictype arg);

The abs(numerictype) function takes one argument of any numeric data type and returns its absolute value.

• number e();

The e() function accepts no arguments and returns Euler's number e.

• number exp(numerictype arg);

The exp(numerictype) function takes one argument of any numeric data type and returns the result of the exponential function of this argument, that is, e raised to the power of the argument.

• number log(numerictype arg);

The log(numerictype) function takes one argument of any numeric data type and returns the natural logarithm of this argument.

• number log10(numerictype arg);

The log10(numerictype) function takes one argument of any numeric data type and returns the logarithm of this argument to the base 10.

• number pi();

The pi() function accepts no arguments and returns the number pi.


• number pow(numerictype base, numerictype exp);

The pow(numerictype, numerictype) function takes two arguments of any numeric data types (which do not need to be the same) and returns the first argument (the base) raised to the power of the second argument (the exponent).

• number random();

The random() function accepts no arguments and returns a random double greater than or equal to 0.0 and less than 1.0.

• long round(numerictype arg);

The round(numerictype) function takes one argument of any numeric data type and returns the long value that is closest to this argument.

• number sqrt(numerictype arg);

The sqrt(numerictype) function takes one argument of any numeric data type and returns the square root of this argument.
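The mathematical functions correspond closely to methods of java.lang.Math. The sketch below assumes that correspondence and collapses all CTL numeric types to double (long for round). In particular, it shows pow with the base as the first argument and the exponent as the second, so pow(2, 10) is 1024.0. It is not Clover's implementation.

```java
/** Sketch of the documented math functions via java.lang.Math (assumed equivalents). */
class MathFunctionsSketch {
    static double e()                          { return Math.E; }
    static double pi()                         { return Math.PI; }
    static double exp(double arg)              { return Math.exp(arg); }      // e^arg
    static double log(double arg)              { return Math.log(arg); }      // natural logarithm
    static double log10(double arg)            { return Math.log10(arg); }    // base-10 logarithm
    static double pow(double base, double exp) { return Math.pow(base, exp); } // base^exp
    static double sqrt(double arg)             { return Math.sqrt(arg); }
    static long round(double arg)              { return Math.round(arg); }    // closest long
    static double random()                     { return Math.random(); }      // in [0.0, 1.0)

    public static void main(String[] args) {
        System.out.println(pow(2, 10));   // 1024.0
        System.out.println(log10(1000));  // 3.0
        System.out.println(round(2.5));   // 3
        System.out.println(sqrt(16));     // 4.0
    }
}
```

Java's Math.round resolves ties toward positive infinity; the guide does not state how CTL's round handles ties.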

String Functions

Some functions work with strings. They are the following:

• string char_at(string arg, numerictype index);

The char_at(string, numerictype) function accepts two arguments: the first is a string and the second is of any numeric data type. It returns the character located at the position in the string specified by the index.

• string concat(anytype arg1, ..., anytype argN);

The concat(anytype, ..., anytype) function accepts any number of arguments of any data type; the arguments do not need to be of the same type. It returns their concatenation. Arguments that are not strings are converted to their string representation before the concatenation is done. You can also concatenate the arguments using plus signs, but this function is faster for more than two arguments.

• string get_alphanumeric_chars(string arg);

The get_alphanumeric_chars(string) function takes one string argument and returns only the letters and digits contained in it, in the order of their appearance in the string. All other characters are removed.

• string get_alphanumeric_chars(string arg, boolean takeAlpha, boolean takeNumeric);

The get_alphanumeric_chars(string, boolean, boolean) function accepts three arguments: one string and two booleans. It returns the letters contained in the string if the second argument is true and the digits if the third argument is true.

• int index_of(string arg, string substring);

The index_of(string, string) function accepts two strings and returns the index of the first occurrence of substring in the string specified as the first argument.


• int index_of(string arg, string substring, int fromIndex);

The index_of(string, string, int) function accepts three arguments: two strings and one integer. It returns the index of the first occurrence of substring, searching from the character located at the position specified by the third argument.

• boolean is_ascii(string arg);

The is_ascii(string) function takes one string argument and returns a boolean value depending on whether the string can be encoded as an ASCII string (true) or not (false).

• boolean is_blank(string arg);

The is_blank(string) function takes one string argument and returns a boolean value depending on whether the string contains only white space characters (true) or not (false).

• boolean is_date(string arg, string pattern);

The is_date(string, string) function accepts two string arguments. It compares the first argument with pattern and returns a boolean value depending on whether the first argument can be converted to a date using this pattern (true) or not (false).

• boolean is_date(string arg, string pattern, string locale, boolean lenient);

The is_date(string, string, string, boolean) function accepts three string arguments and one boolean. It compares the first argument with the pattern given as the second argument, using the locale given as the third. If the string matches the pattern and can be converted to a date, the function returns true regardless of the fourth argument; otherwise, it returns false. However, if the fourth argument (lenient) is set to true, the function also tries to judge more leniently whether the string is a date, and returns true if it succeeds.

• boolean is_integer(string arg);

The is_integer(string) function takes one string argument and returns a boolean value depending on whether the string can be converted to an integer number (true) or not (false).

• boolean is_long(string arg);

The is_long(string) function takes one string argument and returns a boolean value depending on whether the string can be converted to a long number (true) or not (false).

• boolean is_number(string arg);

The is_number(string) function takes one string argument and returns a boolean value depending on whether the string can be converted to a double (true) or not (false).

• string join(string delimiter, anytype arg1, ..., anytype argN);

The join(string, anytype, ..., anytype) function accepts any number of arguments. The first is a string and the others may be of any data type; they do not need to be of the same type. Arguments that are not strings are converted to their string representation, and the arguments are put together with the first argument as the delimiter.

• string left(string arg, numerictype length);

The left(string, numerictype) function accepts two arguments: the first is a string and the second is of any numeric data type. It returns the substring of the length specified as the second argument, counted from the start of the string specified as the first argument.

• string lowercase(string arg);

The lowercase(string) function takes one string argument and returns another string with all characters converted to lowercase.


• string remove_blank_space(string arg);

The remove_blank_space(string) function takes one string argument and returns another string with all white space removed.

• string remove_diacritic(string arg);

The remove_diacritic(string) function takes one string argument and returns another string with diacritical marks removed.

• string replace(string arg, string regex, string replacement);

The replace(string, string, string) function accepts three string arguments. The second argument is a regular expression that is searched for in the first (original) string; every match is replaced by the third argument, and the resulting string is returned. Thus, replace("Hello", "[Ll]", "t") returns "Hetto".

• string right(string arg, numerictype length);

The right(string, numerictype) function accepts two arguments: the first is a string and the second is of any numeric data type. It returns the substring of the length specified as the second argument, counted from the end of the string specified as the first argument.

• string soundex(string arg);

The soundex(string) function takes one string argument and converts it to its Soundex code. The resulting string consists of the first letter of the argument followed by three digits. The digits are based on the consonants contained in the string, with similarly sounding consonants mapped to the same digit. Thus, soundex("word") returns "w600".

• list split(string arg, string regex);

The split(string, string) function accepts two string arguments, the second of which is a regular expression. The regular expression is searched for in the first argument, and the string is split into the parts located between its matches. The resulting parts of the string are returned as a list. Thus, split("abcdefg", "[ce]") returns ["ab", "d", "fg"].

• string substring(string arg, numerictype fromIndex, numerictype length);

The substring(string, numerictype, numerictype) function accepts three arguments: the first is a string and the other two are of any numeric data type, which do not need to be the same. The function returns the substring of the original string that starts at the position given by the second argument and contains the number of characters given by the third argument. If the second and third arguments are not integers, only their integer parts are used. Thus, substring("text", 1.3, 2.6) returns "ex".

• string translate(string arg, string searchingSet, string replaceSet);

The translate(string, string, string) function accepts three string arguments; the second and third arguments must contain the same number of characters. Each character of the first argument that also appears in the second argument is replaced by the character at the same position in the third argument. Thus, translate("hello", "leo", "pii") returns "hippi".

• string trim(string arg);

The trim(string) function takes one string argument and returns another string with leading and trailing white space removed.


• string uppercase(string arg);

The uppercase(string) function takes one string argument and returns another string with all characters converted to uppercase.
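The Java sketch below re-creates several of the string functions above so that their examples can be checked: replace, split, substring, translate, left, right and the three-argument get_alphanumeric_chars. It assumes Java regular-expression syntax for the regex arguments, and getAlphanumericChars is a hypothetical Java-style name for get_alphanumeric_chars. Out-of-range lengths are not handled; this is a sketch, not Clover's implementation.

```java
import java.util.Arrays;
import java.util.List;

/** Sketch of several documented string functions in plain Java (not Clover's code). */
class StringFunctionsSketch {

    /** left: the first "length" characters (assumes 0 <= length <= arg.length()). */
    static String left(String arg, int length) {
        return arg.substring(0, length);
    }

    /** right: the last "length" characters (same assumption as left). */
    static String right(String arg, int length) {
        return arg.substring(arg.length() - length);
    }

    /** substring: non-integer arguments are truncated to their integer parts. */
    static String substring(String arg, double fromIndex, double length) {
        int from = (int) fromIndex;
        return arg.substring(from, from + (int) length);
    }

    /** replace: every match of the regular expression is replaced. */
    static String replace(String arg, String regex, String replacement) {
        return arg.replaceAll(regex, replacement);
    }

    /** split: the parts between regex matches (Java drops trailing empty parts). */
    static List<String> split(String arg, String regex) {
        return Arrays.asList(arg.split(regex));
    }

    /** translate: char at position i of searchingSet becomes char i of replaceSet. */
    static String translate(String arg, String searchingSet, String replaceSet) {
        if (searchingSet.length() != replaceSet.length()) {
            throw new IllegalArgumentException("sets must have the same length");
        }
        StringBuilder out = new StringBuilder(arg.length());
        for (char c : arg.toCharArray()) {
            int i = searchingSet.indexOf(c);
            out.append(i >= 0 ? replaceSet.charAt(i) : c);
        }
        return out.toString();
    }

    /** get_alphanumeric_chars: keeps letters and/or digits as requested. */
    static String getAlphanumericChars(String arg, boolean takeAlpha, boolean takeNumeric) {
        StringBuilder out = new StringBuilder();
        for (char c : arg.toCharArray()) {
            if ((takeAlpha && Character.isLetter(c)) || (takeNumeric && Character.isDigit(c))) {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(replace("Hello", "[Ll]", "t"));              // Hetto
        System.out.println(split("abcdefg", "[ce]"));                   // [ab, d, fg]
        System.out.println(substring("text", 1.3, 2.6));                // ex
        System.out.println(translate("hello", "leo", "pii"));           // hippi
        System.out.println(left("hello", 2) + " " + right("hello", 3)); // he llo
    }
}
```

Java's String.split discards trailing empty strings; the guide does not say whether the CTL split function does the same.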

Miscellaneous Functions

The remaining functions can be classified as miscellaneous. They are the following:

• void breakpoint();

The breakpoint() function accepts no arguments and prints out all global and local variables.

• anytype iif(boolean con, anytype iftruevalue, anytype iffalsevalue);

The iif(boolean, anytype, anytype) function accepts three arguments: one boolean and two of any data type. The second and third arguments must be of the same data type, which is also the return type.

The function takes the first argument and returns the second if the first is true or the third if the first is false.

• boolean isnull(anytype arg);

The isnull(anytype) function takes one argument and returns a boolean value depending on whether the argument is null (true) or not (false). The argument may be of any data type.

• anytype nvl(anytype arg, anytype default);

The nvl(anytype, anytype) function accepts two arguments of any data type; both arguments must be of the same type. If the first argument is not null, the function returns its value. If it is null, the function returns the default value specified as the second argument.

• void print_err(anytype message);

The print_err(anytype) function accepts one argument of any data type and prints it out as a message on the error output.

• void print_err(anytype message, boolean printLocation);

The print_err(anytype, boolean) function accepts two arguments: the first is of any data type and the second is boolean. It prints out the message and, if the second argument is true, also the location of the error.

• void print_log(level loglevel, anytype message);

The print_log(level, anytype) function accepts two arguments: the first is the log level of the message specified as the second argument, which may be of any data type. The log level is one of the following: debug, info, warn, error, fatal. The function sends the message to a logger.

• void print_stack();

The print_stack() function accepts no arguments and prints out all variables from the stack.

• void raise_error(string message);

The raise_error(string) function takes one string argument and raises an error with the message specified as the argument.
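The semantics of iif, isnull and nvl can be expressed with Java generics, where a single type parameter T captures the rule that the value arguments and the result share one data type. This is a sketch, not Clover's implementation; the parameter name defaultValue replaces default, which is a reserved word in Java.

```java
/** Sketch of iif, isnull and nvl using Java generics (not Clover's implementation). */
class MiscFunctionsSketch {

    /** iif: the second argument if con is true, otherwise the third; both share type T. */
    static <T> T iif(boolean con, T iftruevalue, T iffalsevalue) {
        return con ? iftruevalue : iffalsevalue;
    }

    /** isnull: true exactly when the argument is null. */
    static boolean isnull(Object arg) {
        return arg == null;
    }

    /** nvl: arg itself unless it is null, in which case the default value. */
    static <T> T nvl(T arg, T defaultValue) {
        return arg != null ? arg : defaultValue;
    }

    public static void main(String[] args) {
        String name = null;
        System.out.println(nvl(name, "unknown"));    // unknown
        System.out.println(iif(isnull(name), 0, 1)); // 0
    }
}
```

As ordinary Java method arguments, both values passed to iif are evaluated before the choice is made; the guide does not state whether CTL's iif evaluates them lazily.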


Appendix F. Clover Transformation Language Lite

Transformations can be defined in Java or in Clover transformation language (CTL), but you can also use Clover transformation language Lite (CTL Lite). However, CTL Lite is now deprecated, and we suggest you write transformations in Java or CTL instead.

This appendix explains what CTL Lite looks like.

Input ports and output ports are represented by the words in and out.

Ports are represented by numbers. They are numbered starting from 0.

Field names are used to represent fields of records.

You can get record values using the $ sign.

The resulting expressions look like the following two: ${in.numberofinputport.fieldname} for inputs and ${out.numberofoutputport.fieldname} for outputs.

Input field values can be assigned to output fields using the equal sign.

Here is an example: ${out.0.name} = ${in.0.fname} + ${in.0.lname}

Parameters and sequences can also be used in CTL Lite. Their values must be expressed this way: ${par.parametername} and ${seq.sequencename}. The words par and seq indicate that a parameter or a sequence, respectively, is used in the expression, and parametername and sequencename are the names of the parameter and of the sequence.

If you want to get whole objects instead of field values, you can use the @ sign instead of $. The structure is the same: @{in.numberofinputport.fieldname} for inputs and @{out.numberofoutputport.fieldname} for outputs.
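To make the reference syntax concrete, the hypothetical Java helper below recognizes the CTL Lite forms described above (${in.N.field}, ${out.N.field}, their @{...} variants, ${par.name} and ${seq.name}) with a regular expression and reports their parts. It assumes that field, parameter and sequence names consist of word characters (letters, digits, underscore). It only illustrates the syntax and is not Clover's parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration of the CTL Lite reference syntax; not Clover's parser. */
class CtlLiteReferenceSketch {

    // Group 1: sigil ($ = field value, @ = whole object).
    // Groups 2-4: in/out, port number, field name. Groups 5-6: par/seq and its name.
    private static final Pattern REF = Pattern.compile(
            "([$@])\\{(?:(in|out)\\.(\\d+)\\.(\\w+)|(par|seq)\\.(\\w+))\\}");

    /** Lists each reference as "kind:in|out:port:field" or "kind:par|seq:name". */
    static List<String> references(String expression) {
        List<String> found = new ArrayList<>();
        Matcher m = REF.matcher(expression);
        while (m.find()) {
            String kind = m.group(1).equals("$") ? "value" : "object";
            if (m.group(2) != null) {
                found.add(kind + ":" + m.group(2) + ":" + m.group(3) + ":" + m.group(4));
            } else {
                found.add(kind + ":" + m.group(5) + ":" + m.group(6));
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // The assignment example from this appendix.
        System.out.println(references("${out.0.name} = ${in.0.fname} + ${in.0.lname}"));
        // prints [value:out:0:name, value:in:0:fname, value:in:0:lname]
    }
}
```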

All of these expressions can be used in defining transformations in some components, but we once more suggestyou use Clover transformation language and/or Java instead of CTL Lite.
