cloveretl training sample
DESCRIPTION
A sample from CloverETL training available on-site or onlineTRANSCRIPT
Training course goals
• Develop and support solutions based on CloverETL technology– Compose and debug transformations in CloverETL Designer– Connect to any number of data sources or sinks (files,
databases, web/cloud…)– Detect and react to errors in the data– Use CloverETL Server to process large amounts of data– Design and develop Job Flows to manage complex
processes– Support existing solutions using CloverETL
© 2013 Javlin; All rights reserved1
Course modules: basics
1. CloverETL introductionCloverETL product family, basic terminology
2. First steps in CloverETL DesignerBuilding transformations with basic components, reading and writing data
3. Error handlingProperly handling input data errors and runtime errors
4. Common components and CTL programmingCommonly used components and business rule development in CTL
5. DatabasesConnecting to databases and using them as data sources and targets
© 2013 Javlin; All rights reserved2
Course modules: advanced
6. Structured dataHandling complex data formats like XML or JSON, using web services
7. Advanced graph designComplex transformation components, Java transformations and more
8. CloverETL ServerIntroduction to CloverETL Server, its user interface and execution environment
9. JobflowsBuilding jobflows to manage your processes on CloverETL Server
10. Advanced CloverETL ServerAdvanced graph scheduling, using Launch Services and CloverETL Cluster
© 2013 Javlin; All rights reserved3
CloverETL product family
• CloverETL is a whole family of products– Support for broad range of usage scenarios– Purely Java-based – supported on many operating
systems• Windows, *nix, Linux, Mac
© 2013 Javlin; All rights reserved4
Java
CloverETL Engine
CloverETL Designer
CloverETL Server
Metadata
• Metadata describe record structure and format– Required for each edge used in the graph to define the format of the data flowing
through that edge
• Structure defines fields in the record– Unique (within record) field names– Data types to determine type of information which can be stored in the record.– Flat structure – no nesting is allowed
• Format defines rules for data input and output– Format of the record: delimited, fixed-length or mixed
• Delimiters only apply when working with files
– Parsing rules for readers and formatting rules for writers• Special formatting for numbers, date fields, …
© 2013 Javlin; All rights reserved5
© 2013 Javlin; All rights reserved6
Metadata types and fields
• Record type determines how to find the fields– Delimited: fields are separated by delimiters– Fixed-length: each fields has predefined number
of characters– Mixed: both types of fields in single record
• Fields can be of various types– Numeric: integer, long, number, decimal– Text: string– Boolean values: boolean– Date and time: date– Other: byte, cbyte– Containers: list or map of a primitive type
Transaction
transactionType
string
transactionId long
accountNumber long
timestamp date
amount decimal(20, 3)
3
1
2
5
4
© 2013 Javlin; All rights reserved7
1340817132,3293200814,D,59.940,201001021252431340817156,5357054331,C,6.720,20100116080136
1340817746,4270100470,D,194.920,20100323100706
Field ordering matters
• Ordering of the fields is very important– For parsing– For output formatting
• Data is read/written in the same order in which the fields are defined.
Transaction
transactionType
string
transactionId long
accountNumber long
timestamp date
amount decimal(20, 3)
3
1
2
5
4
Reformat and CTL code
• Transformation in Reformat can be written directly in CTL without using Visual Mode– Use all CTL features: control structures, error handling, logging…– Write comments explaining the complex parts of the code
• Editor supports syntax highlighting, autocomplete and on-the-fly code validation
© 2013 Javlin; All rights reserved8
© 2013 Javlin; All rights reserved9
Reformat code workflowinit
preExecute
backbacktransform
postExecute
transformOnError
Called during component initialization
Called before the first record is processed.
Main part of the transformation. Called once for each input record.Return value determines which port (if any) receives the result.
Called after the last record is processed immediately before component finishes.
Begin
End
Next
reco
rd
Error
Called only if transform caused an error.
© 2013 Javlin; All rights reserved10
Data denormalization
accountId customerId
balance created closed
9804568699
27345 2300.56 2011-11-14
1108193472
27345 -1739.05 2005-07-22
6054951154
27345 4500.60 2009-09-01 2010-04-30
9459175447
27345 3200.80 2011-03-08
customerId
totalBalance
accounts
27345 8262.91 [9804568699, 1108193472, 6054951154, 9459175447]
CustomerAccounts
AccountOriginal dataMultiple records grouped based on the key.
Denormalized dataSingle record containing values determined by processing the whole input group.
Denormalize
© 2013 Javlin; All rights reserved11
Denormalizer
• Converts data into denormalized form– Combine multiple records in a group into one output record– Output usually uses different metadata
• Required configuration– Transformation code
• Only CTL can be used, visual mode is not available
– Grouping• Group can be defined based on a key or group size• If key is used, data has to be sorted
© 2013 Javlin; All rights reserved12
Denormalize code workflowinit
preExecute
backbacktransform
postExecute
transformOnError
transform and append have their own error handler. Each handler interrupts the group and resumes processing as if the group was processed as a whole.append is called once for each
record in a group. It is typically used to update global variables which are then used in transform function.
transform is called once per group and is the only function which generates output records.
clean is called after each transform and can be used to clean-up internal variables.
Begin
End
Next
reco
rd
Error
backbackappend
back backclean
appendOnError
Next
gro
up