8/13/2019 Spoon 3 0 0 User Guide
Last Modified on October 26th, 2007
Pentaho Data Integration
Spoon 3.0 User Guide
Copyright 2007 Pentaho Corporation. Redistribution permitted. All trademarks are the property of their
respective owners. For the latest information, please visit our web site at www.pentaho.org.
1. Contents
1. Contents
2. About This Document
2.1. What it is
2.2. What it is not
3. Introduction to Spoon
3.1. What is Spoon?
3.2. Installation
3.3. Launching Spoon
3.4. Supported platforms
3.5. Known Issues
3.6. Screen shots
3.7. Command line options
3.8. Repository
3.8.1. Repository AutoLogin
3.9. License
3.10. Definitions
3.10.1. Transformation Definitions
3.11. Toolbar
3.12. Options
3.12.1. General Tab
3.12.2. Look & Feel tab
3.14. Search Meta data
3.15. Set environment variable
3.16. Execution log history
3.17. Replay
3.18. Generate mapping against target step
3.18.1. Generate mappings example
3.19. Safe mode
3.20. Welcome Screen
4. Creating a Transformation or Job
4.1. Notes
4.2. Screen shot
4.3. Creating a new database connection
4.3.1. General
4.3.2. Pooling
4.3.3. MySQL
4.3.4. Oracle
4.3.5. Informix
4.3.6. SQL Server
4.3.7. SAP R
4.3.8. Generic
4.3.9. Options
4.3.10. SQL
4.3.11. Cluster
4.3.12. Advanced
4.3.13. Test a connection
4.3.14. Explore
4.3.15. Feature List
4.4. Editing a connection
4.5. Duplicate a connection
4.6. Copy to clipboard
4.7. Execute SQL commands on a connection
4.8. Clear DB Cache option
4.9. Quoting
4.10. Database Usage Grid
4.11. Configuring JNDI connections
4.12. Unsupported databases
5. SQL Editor
5.1. Description
5.2. Limitations
6. Database Explorer
7. Hops
7.1. Description
7.1.1. Transformation Hops
7.1.2. Job Hops
7.2. Creating A Hop
7.3. Loops
7.4. Mixing rows: trap detector
7.5. Transformation hop colors
8. Variables
8.1. Variable usage
8.2. Variable scope
8.2.1. Environment variables
8.2.2. Kettle variables
8.2.3. Internal variables
9. Transformation Settings
9.1. Description
9.2. Transformation Tab
9.3. Logging
9.4. Dates
9.5. Dependencies
9.6. Miscellaneous
9.7. Partitioning
9.8. SQL Button
10. Transformation Steps
Pentaho Data Integration TM Spoon User Guide
10.1. Description
10.2. Launching several copies of a step
10.3. Distribute or copy?
10.4. Step error handling
10.5. Apache Virtual File System (VFS) support
10.5.1. Example: Referencing remote job files
10.5.2. Example: Referencing files inside a Zip
10.6. Transformation Step Types
10.6.1. Text File Input
10.6.2. Table input
10.6.3. Get System Info
10.6.4. Generate Rows
10.6.5. Deserialize from file (formerly Cube Input)
10.6.6. XBase input
10.6.7. Excel input
10.6.8. Get File Names
10.6.9. Text File Output
10.6.10. Table output
10.6.11. Insert / Update
10.6.12. Update
10.6.13. Delete
10.6.14. Serialize to file (formerly Cube File Output)
10.6.15. XML Output
10.6.16. Excel Output
10.6.17. Microsoft Access Output
10.6.18. Database lookup
10.6.19. Stream lookup
10.6.20. Call DB Procedure
10.6.21. HTTP Client
10.6.22. Select values
10.6.23. Filter rows
10.6.24. Sort rows
10.6.25. Add sequence
10.6.26. Dummy (do nothing)
10.6.27. Row Normaliser
10.6.28. Split Fields
10.6.30. Unique rows
10.6.31. Group By
10.6.32. Null If
10.6.33. Calculator
10.6.34. XML Add
10.6.35. Add constants
10.6.36. Row Denormaliser
10.6.37. Flattener
10.6.38. Value Mapper
10.6.39. Blocking step
10.6.40. Join Rows (Cartesian product)
10.6.41. Database Join
10.6.42. Merge rows
10.6.43. Sorted Merge
10.6.44. Merge Join
10.6.45. JavaScript Values
10.6.46. Modified Java Script Value
10.6.47. Execute SQL script
10.6.48. Dimension lookup
12.2.4. Job
12.2.5. Shell
12.2.6. Mail
12.2.7. SQL
12.2.8. Get a file with FTP
12.2.9. Table Exists
12.2.10. File Exists
12.2.11. Get a file with SFTP
12.2.12. HTTP
12.2.13. Create a file
12.2.14. Delete a file
12.2.15. Wait for a file
12.2.16. File compare
12.2.17. Put a file with SFTP
12.2.18. Ping a host
12.2.19. Wait for
12.2.20. Display Msgbox info
12.2.21. Abort job
12.2.22. XSL transformation
12.2.23. Zip files
12.2.24. Bulkload into MySQL
12.2.25. Get Mails from POP
12.2.26. Delete Files
12.2.27. Success
12.2.28. XSD Validator
12.2.29. Write to log
12.2.30. Copy Files
12.2.31. DTD Validator
12.2.32. Put a file with FTP
12.2.33. Unzip
12.2.34. Dummy Job Entry
13. Graphical View
13.1. Description
13.2. Adding steps or job entries
13.2.1. Create steps by drag and drop
13.3. Hiding a step
13.4. Transformation Step options (right-click menu)
13.4.1. Edit step
13.4.2. Edit step description
13.4.3. Data movement
13.4.4. Change number of copies to start
13.4.5. Copy to clipboard
13.4.6. Duplicate Step
13.4.7. Delete step
13.4.8. Hide Step
13.4.9. Show input fields
13.4.10. Show output fields
13.5. Job entry options (right-click menu)
13.5.1. Open Transformation
2. About This Document
2.1. What it is
This document is a technical description of Spoon, the graphical transformation and job designer of the
Pentaho Data Integration suite, also known as the Kettle project.
2.2. What it is not
This document does not attempt to describe in great detail how to create jobs and transformations for all
possible situations. Recognizing that different developers have different approaches to designing their data
integration solutions, Spoon empowers users with the freedom and flexibility to design solutions in the
manner they feel most appropriate to the problem at hand, and that is the way it should be!
Other documentation
Here are links to other documents that you might find interesting when you are building
transformations:
Flash demos, screen shots, and an introduction to building a simple transformation:
http:
3. Introduction to Spoon
3.1. What is Spoon?
Kettle is an acronym for "Kettle E.T.T.L. Environment". This means it has been designed to help you with
your ETTL needs: the Extraction, Transformation, Transportation and Loading of data.
Spoon is a graphical user interface that allows you to design transformations and jobs that can be run with
the Kettle tools Pan and Kitchen. Pan is a data transformation engine that is capable of performing a
multitude of functions such as reading, manipulating and writing data to and from various data sources.
Kitchen is a program that can execute jobs designed in Spoon and stored in XML or in a database repository.
Usually jobs are scheduled in batch mode to be run automatically at regular intervals.
NOTE: For a complete description of Pan or Kitchen, please refer to the Pan and Kitchen user guides.
Transformations and jobs can describe themselves using an XML file or can be put in a Kettle database
repository. This information can then be read by Pan or Kitchen to execute the described steps in the
transformation or run the job.
In short, Pentaho Data Integration makes data warehouses easier to build, update and maintain!
3.2. Installation
The first step is the installation of Sun Microsystems Java Runtime Environment version 1.4 or higher. You
can download a JRE for free at http:
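If you are unsure whether an installed JRE meets the "1.4 or higher" requirement, `java -version` (or the `java.version` system property) reports a string such as `1.4.2_19` or `1.5.0`. The sketch below, written in Python for brevity, compares such a string against the minimum. The function name `jre_meets_minimum` is hypothetical and is not part of Spoon or Kettle.

```python
import re


def jre_meets_minimum(version: str, minimum: tuple = (1, 4)) -> bool:
    """Return True if a Java version string (e.g. '1.4.2_19') is at least `minimum`.

    Hypothetical helper: only the leading 'major.minor' numbers are compared,
    so '1.4.2_19' is treated as (1, 4) and '1.6.0' as (1, 6).
    """
    match = re.match(r"(\d+)(?:\.(\d+))?", version.strip())
    if match is None:
        raise ValueError(f"unrecognised Java version string: {version!r}")
    major = int(match.group(1))
    minor = int(match.group(2) or 0)
    return (major, minor) >= minimum


print(jre_meets_minimum("1.4.2_19"))  # True
print(jre_meets_minimum("1.3.1"))     # False
```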
3.3. Launching Spoon
To launch Spoon on the different platforms, use one of the scripts provided:
Spoon.bat: launch Spoon on the Windows platform.
spoon.sh: launch Spoon on a Unix-like platform: Linux, Apple OSX, Solaris, ...
If you want to make a shortcut under the Windows platform, an icon ("spoon.ico") is provided to set the
correct icon. Simply point the shortcut to the Spoon.bat file.
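When Spoon is started from another program rather than from a shortcut, the choice between the two scripts can be made from the operating system name. A minimal Python sketch, assuming it is run from the Kettle installation directory; the helper name `spoon_command` is hypothetical:

```python
import platform
from typing import List, Optional


def spoon_command(system: Optional[str] = None) -> List[str]:
    """Build the command that starts Spoon with the script for this platform.

    `system` uses platform.system() naming ('Windows', 'Linux', 'Darwin', ...).
    Assumes the current working directory is the Kettle installation directory.
    """
    system = system or platform.system()
    if system == "Windows":
        # Spoon.bat is the Windows launcher; run it through cmd.exe.
        return ["cmd", "/c", "Spoon.bat"]
    # The other supported platforms are Unix-like and use spoon.sh.
    return ["sh", "spoon.sh"]
```

For example, `subprocess.run(spoon_command(), cwd=kettle_dir)` would start Spoon, where `kettle_dir` is a placeholder for your installation directory.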
3.4. Supported platforms
The Spoon GUI is supported on the following platforms:
Microsoft Windows: all platforms since Windows 95, including Vista
Linux GTK: on i386 and x86_64 processors, works best on Gnome
Apple's OSX: works both on PowerPC and Intel machines
Solaris: using a Motif interface (GTK optional)
AIX: using a Motif interface
HPUX: using a Motif interface (GTK optional)
FreeBSD: preliminary support on i386, not yet on x86_64
3.5. Known Issues
Linux
Occasional JVM crashes running SuSE Linux and KDE. Running under Gnome has no problems (detected on
SUSE Linux 10.1, but earlier versions suffer the same problem).
FreeBSD
Problems with drag and drop. The workaround is to use the right-click popup menu on the canvas (Insert new
step).
Please check the Tracker lists at http:
3.6. Screen shots
The Main tree in the upper-left panel of Spoon allows you to browse connections along with the jobs and
transformations you currently have open. When designing a transformation, the Core Objects palette in the
lower-left panel contains the available steps used to build your transformation, including input, output,
lookup, transform, joins, scripting steps and more. When designing a job, the Core Objects palette contains
the available job entry types.
Designing a Transformation
Designing a Job
These items are described in detail in the chapters below: 4. Database Connections, 7. Hops,
10. Transformation Steps, 12. Job Entries, 13. Graphical View.
$.7. Co((and ine otions
$hese are the command line options that you can use hen starting the *poon application@
-file=filename
$his option runs the specified transformation B.ktr @ /ettle $ransformation.
-logfile=Logging Filename
$his option allos you to specify the location of the log file. $he default is the standard output.
-level=Logging Level
$he level option sets the log level for the transformation being run.
These are the possible values:
Nothing: Do not show any output
Error: Only show errors
Minimal: Use minimal logging
Basic: This is the default basic logging level
Detailed: Give detailed logging output
Debug: Show very detailed output for debugging purposes.
Rowlevel: Detailed logging at a row level. Warning: this will generate a lot of data.
-rep=Repository name
Connect to the repository with name "Repository name".
Note: You also need to specify the options -user, -pass and -trans described below. The repository details are loaded from the file repositories.xml in the local directory or in the Kettle directory: $HOME
Use this option to select the transformation to run from the repository.
-job=Job Name
Use this option to select the job to run from the repository.
Important Notes:
On Windows, we advise you to use the /option:value format to avoid command line parsing problems by the MS-DOS shell.
Fields in italic represent the values that the options use.
It's important that if spaces are present in the option values, you use quotes or double quotes to keep them together. Take a look at the examples below for more info.
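The quoting rule above can be illustrated with a small Python sketch that assembles a Spoon command line for a POSIX shell. The launcher name `spoon.sh` and the helper names are hypothetical; only the option names and log level values come from this section. `shlex.join` (Python 3.8+) adds the quotes that keep a value such as `My Repository` together; on Windows you would instead follow the `/option:value` advice above.

```python
import shlex

# Log levels accepted by -level, as listed in this section.
LOG_LEVELS = ("Nothing", "Error", "Minimal", "Basic", "Detailed", "Debug", "Rowlevel")


def build_spoon_args(options):
    """Turn {"rep": "My Repository"} into ["-rep=My Repository"].

    Hypothetical helper for illustration; it is not part of Kettle.
    """
    level = options.get("level")
    if level is not None and level not in LOG_LEVELS:
        raise ValueError(f"unknown log level: {level}")
    return [f"-{name}={value}" for name, value in options.items()]


def to_shell_line(launcher, args):
    # shlex.join() quotes every argument containing spaces or other
    # shell-special characters, so each option stays a single word.
    return shlex.join([launcher, *args])


args = build_spoon_args({"rep": "My Repository", "level": "Basic"})
print(to_shell_line("spoon.sh", args))  # spoon.sh '-rep=My Repository' -level=Basic
```

Validating the log level up front catches a typo before Spoon is started, rather than relying on how the application treats an unknown value.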
3.8. Repository
Spoon provides you with the ability to store transformation and job files on the local file system or in the Kettle repository. The Kettle repository can be housed in any common relational database. This means that in order to load a transformation from a database repository, you need to connect to this repository. To do this, you need to define a database connection to this repository. You can do this using the repositories dialog you are presented with when you start up Spoon:
The information concerning repositories is stored in a file called "repositories.xml". This file resides in the hidden directory ".kettle" in your default home directory. On Windows this is C:\Documents and Settings\<username>\.kettle
Note: The complete path and filename of this file is displayed on the Spoon console.
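If you want to locate this file from a script, a minimal Python sketch follows. It assumes only what is stated above: the file is called repositories.xml and lives in the hidden .kettle directory of the user's home directory. The function name is hypothetical. `Path.home()` resolves the home directory for the current platform; on newer Windows versions this is C:\Users\<username> rather than C:\Documents and Settings\<username>.

```python
from pathlib import Path
from typing import Optional


def repositories_xml_path(home: Optional[Path] = None) -> Path:
    """Return <home>/.kettle/repositories.xml (home defaults to the current user's)."""
    base = home if home is not None else Path.home()
    return base / ".kettle" / "repositories.xml"


path = repositories_xml_path()
print(path, "exists" if path.is_file() else "not found")
```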
If you don't want this dialog to be shown each time Spoon starts up, you can disable it by unchecking the 'Present this dialog at startup' checkbox or by using the Options dialog under the Edit > Options menu. See also 2.14. Options.
Note: The default password for the admin user is also admin. You should change this default password right after the creation, using the Repository Explorer or the "Repository/User" menu.
The Repository login screen
3.10. Definitions
3.10.1. Transformation Definitions
Value: Values are part of a row and can contain any type of data: Strings, floating point Numbers, unlimited precision BigNumbers, Integers, Dates or Boolean values.
Row: a row consists of 0 or more values.
Output stream: an output stream is a stack of rows that leaves a step.
Input stream: an input stream is a stack of rows that enters a step.
Hop: a hop is a graphical representation of one or more data streams between 2 steps. A hop always represents the output stream for one step and the input stream for another. The number of streams is equal to the number of copies of the destination step (1 or more).
Note: a note is a descriptive piece of information that can be added to a transformation.
Job Definitions
Job Entry: a job entry is one part of a job and performs a certain task.
Hop: a hop is a graphical representation of the link between 2 job entries. Depending on the type of originating job entry, it can be set to execute the next job entry unconditionally, after successful execution or after failed execution.
Note: a note is a descriptive piece of information that can be added to a job.
3.11. Toolbar
The icons on the toolbar of the main screen are, from left to right:
Icon Description
Create a new job or transformation
Open transformation
3.12. Options
Kettle options allow you to customize a number of properties related to the behavior and look and feel of the graphical user interface. Examples include startup options, like whether or not to display tips and the Kettle Welcome Page, and user interface options, like fonts and colors. To access the options dialog, select Edit > Options... from the menubar.
3.12.1. General Tab
Feature | Description
Maximum Undo Level | This parameter sets the maximum number of steps that can be undone (or redone) by Spoon.
Default number of lines in preview dialog | This parameter allows you to change the default number of rows that are requested from a step during transformation previews.
Maximum nr of lines in the logging windows | Specify the maximum limit of rows to display in the logging window.
Show tips at startup? | This option sets the display of tips at startup.
Show welcome page at startup? | This option controls whether or not to display the welcome page when launching Spoon.
Options General tab
Feature | Description
Use database cache? | Spoon caches information that is stored on source and target databases. In some cases this can lead to incorrect results when you're in the process of changing those very databases. In those cases it is possible to disable the cache altogether instead of clearing the cache every time. NOTE: Spoon automatically clears the database cache when you launch DDL (Data Definition Language) statements towards a database connection. However, when using third-party tools, clearing the database cache manually may be necessary.
Open last file at startup? | Enable this option to automatically (try to) load the last transformation you used (opened or saved) from XML or repository.
Auto save changed files? | This option automatically saves a changed transformation before running.
Only show the active file in the main tree? | This option reduces the number of transformation and job items in the main tree on the left by only showing the currently active file.
Only save used connections to XML? | This option limits the XML export of a transformation to the connections used in that transformation. This comes in handy while exchanging sample transformations, to avoid including all defined connections.
Ask about replacing existing connections on open |
Feature | Description
Display tooltips? | This option controls whether or not to display tooltips for the buttons on the main toolbar.
3.12.2. Look & Feel tab
Feature | Description
Fixed width font | This is the font that is used in the dialog boxes, trees, input fields, etc.
Font on workspace | This is the font that is used on the graphical view.
Font for notes | This font is used in the notes that are displayed in the Graphical View.
Background color | Sets the background color in Spoon. It affects all dialogs too.
Workspace background color | Sets the background color in the Graphical View of Spoon.
Tab color | This is the color that is being used to indicate tabs that are active above the canvas.
Dialog middle percentage | By default, a parameter is drawn at 35% of the width of the dialog, counted from the left. You can change this with this parameter. This can be useful in cases where you use unusually large fonts.
Canvas anti-aliasing? | Some platforms like Windows, OSX and Linux support anti-aliasing through GDI, Carbon or Cairo. Check this to enable smoother lines and icons in your graph view. If you enable this and your environment doesn't work any more afterwards, change the value for option "EnableAntiAliasing" to "N" in file $HOME
3.13. Search Meta data
This option will search in any available fields, connectors or notes of all loaded jobs and transformations for the string specified in the Filter field. The Meta data search returns a detailed result set showing the location of any search hits. This feature is accessed by choosing Edit > Search Meta data from the menubar.
3.14. Set environment variable
The Set Environment Variable feature allows you to explicitly create and set environment variables for the current user session. This is a useful feature when designing transformations, for testing variable substitutions that are normally set dynamically by another job or transformation.
This feature is accessible by choosing Edit > Set Environment Variable from the menubar.
Note: This screen is also presented when you run a transformation that uses undefined variables. This allows you to define them right before execution time.
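To make variable substitution concrete, here is a minimal Python sketch. It assumes the ${NAME} notation for variable references; the helper names are hypothetical and this is not Kettle's implementation. Unknown names are left untouched so they can be reported before execution, which is the situation in which Spoon presents the Set Environment Variable dialog.

```python
import re

# Matches ${NAME}; NAME is anything up to the closing brace.
VARIABLE_REF = re.compile(r"\$\{([^}]+)\}")


def undefined_variables(text, variables):
    """Names referenced in text that have no value yet, sorted alphabetically."""
    referenced = {match.group(1) for match in VARIABLE_REF.finditer(text)}
    return sorted(referenced - set(variables))


def substitute(text, variables):
    """Replace each ${NAME} by its value; leave unknown references as they are."""
    return VARIABLE_REF.sub(lambda m: variables.get(m.group(1), m.group(0)), text)


session = {"INPUT_DIR": "/data/in"}
filename = "${INPUT_DIR}/${FILE_NAME}"
print(undefined_variables(filename, session))  # ['FILE_NAME']
session["FILE_NAME"] = "customers.txt"
print(substitute(filename, session))  # /data/in/customers.txt
```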
Show environment variables
This feature will display the current list of environment variables and their values. It is accessed by selecting the Edit > Show environment variables option from the menubar.
Search Meta data Dialog
Set Environment Variable Dialog
3.15. Execution log history
If you have configured your job or transformation to store log information in a database table, you can view the log information from previous executions by right-clicking on the job or transformation in the Main Tree and selecting 'Open History View'. This view will show
NOTE: The log history for a job or transformation will also open by default each time you execute the file.
3.16. Replay
The Replay feature allows you to re-run a transformation that failed. Replay functionality is implemented for Text File Input and Excel Input. It allows you to send files that had errors back to the source and have the data corrected. ONLY the lines that failed before are then processed during the replay, if a .line file is present. It uses the date in the filename of the .line file to match the entered replay date.
Transformation History Tab
3.17. Generate mapping against target step
In cases where you have a fixed target table, you will want to map the fields from the stream to their corresponding fields in the target output table. This is normally accomplished using a Select Values step in your transformation. The 'Generate mapping against target' option provides you with an easy-to-use dialog for defining these mappings; it automatically creates the resulting Select Values step, which can be dropped into your transformation flow prior to the table output step.
The 'Generate mapping against target' option is accessed by right-clicking on the table output step.
After defining your mappings, select OK and the Select Values step containing your mappings will appear on the workspace. Attach the mapping step into your transformation just before the table output step.
3.17.1. Generate mappings example
Here is an example of a simple transformation in which we want to generate mappings to our target output table:
Begin by right-clicking on the Table output step and selecting 'Generate mappings against target'. Add all necessary mappings using the Generate Mapping dialog shown above and click OK. You will now see that a Table output Mapping step has been added to the canvas:
Generate Mapping Dialog
Split hop before generating mappings
Finally, drag the generated Table output Mapping step into your transformation flow prior to the table output step:
3.18. Safe mode
In cases where you are mixing the rows from various sources, you need to make sure that these rows all have the same layout in all conditions. For this purpose, we added a "safe mode" option that is available in the Spoon logging window or on the Execute a Transformation
The welcome screen
5. Database Connections
A database connection describes the method by which Kettle can connect to a database. You can create connections specific to a job or transformation, or store them in the Kettle repository for re-use within multiple transformations or jobs.
5.1. Screen shot
5.2. Creating a new database connection
This section describes how to create a new database connection, including a detailed description of each connection property available in the Connection information dialog.
You begin creating a new connection by right-clicking on the 'Database Connections' tree entry and selecting 'New' or 'New Connection Wizard', by double-clicking on 'Database Connections', or simply by pressing F3.
The Connection information dialog
This will launch the 'Connection information' dialog shown above. The following topics describe the configuration options available on each tab of the Connection information dialog.
5.2.1. General
The General tab is where you set up the basic information about your connection, like the connection name, type, access method, server name and login credentials. The table below provides a more detailed description of the options available on the General tab:
Feature | Description
Connection Name | Uniquely identifies a connection across transformations and jobs
Connection Type | The type of database you are connecting to (e.g. MySQL, Oracle, etc.)
Method of access | This will be either Native (JDBC), ODBC, or OCI. Available access types are dependent on the type of database you are connecting to
Server host name | Defines the host name of the server on which the database resides. You can also specify the host by IP address
Database name | Identifies the database name you want to connect to. In case of ODBC, specify the DSN name here
Port number | Sets the TCP port number
Username | Optionally specifies the username to connect to the database
Password | Optionally specifies the password to connect to the database
5.2.2. Pooling
The Pooling tab allows you to configure your connection to use connection pooling and define options related to connection pooling, like the initial pool size, maximum pool size and connection pool parameters. The table below provides a more detailed description of the options available on the Pooling tab:
Creating a new database connection
Feature | Description
Use a connection pool | Check this option to enable connection pooling.
The initial pool size | Sets the initial size of the connection pool.
The maximum pool size | Sets the maximum number of connections in the connection pool.
Parameter Table | Allows you to define additional custom pool parameters.
5.2.3. MySQL
Because, by default, MySQL gives back complete query results in one block to the client (Kettle in this case), we had to enable "result streaming" by default. The big drawback of this is that it allows only 1 (one single) query to be open at any given time. If you run into trouble because of that, you can disable this option in the MySQL tab of the database connection dialog.
Another issue you might come across is that the default timeout in the MySQL JDBC driver is set to 0 (no timeout). This leads to a problem in certain situations, as it doesn't allow Kettle to detect a server crash or sudden network failure if it happens in the middle of a query or open database connection. This in turn leads to the infinite stalling of a transformation or job. To solve this, set the "connectTimeout" and "socketTimeout" parameters for MySQL in the Options tab. The value to be specified is in milliseconds: for a 2 minute timeout you would specify value 120000 (2 x 60 x 1000).
You can also review other options on the linked MySQL help page by clicking on the 'Show help text on option usage' button found on the Options tab.
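The millisecond arithmetic above, and the two Options-tab entries it produces, can be written out as a short Python sketch. Only the parameter names connectTimeout and socketTimeout come from the text; the helper names are hypothetical.

```python
from typing import Dict


def minutes_to_ms(minutes: float) -> int:
    # minutes x 60 seconds x 1000 milliseconds
    return round(minutes * 60 * 1000)


def mysql_timeout_options(minutes: float) -> Dict[str, str]:
    """Parameter name/value pairs to enter on the Options tab of a MySQL connection."""
    value = str(minutes_to_ms(minutes))
    return {"connectTimeout": value, "socketTimeout": value}


print(mysql_timeout_options(2))  # {'connectTimeout': '120000', 'socketTimeout': '120000'}
```

Using the same value for both parameters is just the simplest choice; nothing above requires them to be equal.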
5.2.4. Oracle
This tab allows you to specify the default data and index tablespaces which Kettle will use when generating SQL for Oracle tables and indexes.
This version of Pentaho Data Integration ships with the Oracle JDBC driver version 10.2.0. It is in general the most stable and recent driver we could find. However, if you do have issues with Oracle connectivity or other strange problems, you might want to consider replacing the 10.2 JDBC driver to match your database server. Replace files "ojdbc14.jar" and "orai18n.jar" in the directory libext
5.2.6. SQL Server
This tab allows you to configure the following properties specific to Microsoft SQL Server:
Feature | Description
SQL Server instance name | Sets the instance name property for the SQL Server connection.
Use .. to separate schema and table | Enable when using dot notation to separate schema and table.
Other properties can be configured by adding connection parameters on the Options tab of the Connection information dialog. For example, you can enable single sign-on login by defining the domain option on the Options tab as shown below:
From the jTDS FAQ on http:
5.2.7. SAP
5.2.9. Options
This tab allows you to set database-specific options for the connection by adding parameters to the generated URL. To add a parameter, select the next available row in the parameter table, choose your database type, then enter a valid parameter name and its corresponding value. For more database-specific configuration help, click the 'Show help text on option usage' button and a new browser tab will appear in Spoon with additional information about configuring the JDBC connection for the currently selected database type.
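Conceptually, each row of the parameter table ends up as a name=value pair appended to the generated JDBC URL. The Python sketch below illustrates that idea only; the exact URL Kettle generates may differ. The separators are arguments because they depend on the driver: MySQL Connector/J expects `?` before the first pair and `&` between pairs, while the jTDS SQL Server driver uses `;`. The host, database and domain names are made up.

```python
from typing import Dict


def append_url_options(base_url: str, options: Dict[str, str],
                       first_sep: str = "?", sep: str = "&") -> str:
    """Append name=value pairs to a JDBC URL (sketch; no escaping is performed)."""
    if not options:
        return base_url
    pairs = sep.join(f"{name}={value}" for name, value in options.items())
    return f"{base_url}{first_sep}{pairs}"


mysql_url = append_url_options(
    "jdbc:mysql://dbhost:3306/sales",
    {"connectTimeout": "120000", "socketTimeout": "120000"},
)
jtds_url = append_url_options(
    "jdbc:jtds:sqlserver://dbhost:1433/sales", {"domain": "CORP"}, first_sep=";", sep=";"
)
print(mysql_url)
print(jtds_url)
```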
5.2.10. SQL
This tab allows you to enter a number of SQL commands to be executed immediately after connecting to the database. This is sometimes needed for various reasons like licensing, configuration, logging, tracing, etc.
5.2.11. Cluster
This tab allows you to enable clustering for the database connection and create connections to the data partitions. To enable clustering for the connection, check the 'Use Clustering?' option.
To create a new data partition, enter a partition ID and the hostname, port, database, username and password for connecting to the partition.
Display options help in a Spoon browser
5.2.12. Advanced
This tab allows you to configure the following properties for the connection:
Feature | Description
Quote all identifiers in database | Places all identifiers (such as table and field names) between the database's quote characters.
Force all identifiers to lower case | Converts all identifiers to lower case.
Force all identifiers to upper case | Converts all identifiers to upper case.
5.2.13. Test a connection
The 'Test' button in the Connection information dialog allows you to test the current connection. An OK message will be displayed if Spoon is able to establish a connection with the target database.
5.2.14. Explore
The Database Explorer allows you to interactively browse the target database, preview data, generate DDL and much more. To open the Database Explorer for an existing connection, click the 'Explore' button found on the Connection information dialog or right-click on the connection in the Main tree and select 'Explore'. Please see Database Explorer for more information.
5.2.15. Feature List
Feature list: exposes the JDBC URL, class and various database settings for the connection, such as the list of reserved words.
5.3. Editing a connection
To edit an existing connection, double-click on the connection name in the main tree or right-click on the connection name and select 'Edit connection'.
5.4. Duplicate a connection
To duplicate an existing connection, right-click on the connection name and select 'Duplicate'.
5.5. Copy to clipboard
Accessed by right-clicking on a connection name in the main tree, this option copies the XML describing the connection to the clipboard.
Delete a connection
To delete an existing database connection, right-click on the connection name in the main tree and select 'Delete'.
5.6. Execute SQL commands on a connection
To execute SQL commands against an existing connection, right-click on the connection name and select 'SQL Editor'. See also SQL Editor for more information.
5.7. Clear DB Cache option
To speed up connections, Spoon uses a database cache. When the information in the cache no longer represents the layout of the database, right-click on the connection in the Main tree and select the 'Clear DB Cache...' option. This is commonly used when database tables have been changed, created or deleted.
5.8. Quoting
More and more people complained about the handling of reserved words, field names with spaces in them, field names with decimal points (.) in them, table names with dashes and other special characters in them, and so on. So we implemented a database-specific quoting system that allows you to use pretty much any name or character that the database is comfortable with.
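The decision a quoting system has to make can be sketched in a few lines of Python: leave plain identifiers alone and quote reserved words and names containing spaces, dots, dashes or other special characters. The reserved-word set below is a tiny illustrative sample and the default double-quote character is an assumption (MySQL, for example, uses backticks); Pentaho Data Integration keeps its own per-database lists and quote characters.

```python
import re

# Illustrative sample only; real reserved-word lists are much longer and
# differ per database.
SAMPLE_RESERVED = {"SELECT", "FROM", "WHERE", "ORDER", "GROUP", "TABLE", "USER"}
PLAIN_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def quote_identifier(name, open_quote='"', close_quote='"', reserved=SAMPLE_RESERVED):
    """Return name unchanged if it is safe, otherwise wrapped in quote characters."""
    if PLAIN_IDENTIFIER.fullmatch(name) and name.upper() not in reserved:
        return name
    # A closing quote inside the name is escaped by doubling it.
    escaped = name.replace(close_quote, close_quote * 2)
    return f"{open_quote}{escaped}{close_quote}"


for field in ["customer_id", "order", "first name", "unit.price", "sales-2007"]:
    print(quote_identifier(field))
```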
Pentaho Data Integration contains a list of reserved words for many (but not all) of the supported databases. To correctly implement quoting, we had to go for a strict separation between the schema (user
Database | Access Method | Server Name or IP Address | Database Name | Port # (default) | Username & Password
Oracle | Native | Required | Oracle database SID | Required (1521) | Required
Oracle | ODBC | | ODBC DSN name | | Required
Oracle | OCI | | Database TNS name | | Required
MySQL | Native | Required | MySQL database name | Optional (3306) | Optional
MySQL | ODBC | | ODBC DSN name | | Optional
AS/400 | ODBC | | ODBC DSN name | | Required
PostgreSQL | Native | Required | Database name | Required (5432) | Required
PostgreSQL | ODBC | | ODBC DSN name | | Required
Intersystems Caché | Native | Required | Database name | Required (1972) | Required
Intersystems Caché | ODBC | | ODBC DSN name | | Required
Sybase | Native | Required | Database name | Required (5001) | Required
Sybase | ODBC | | ODBC DSN name | | Required
Gupta SQL Base | Native | Required | Database Name | Required (2155) | Required
Gupta SQL Base | ODBC | | ODBC DSN name | | Required
dBase III, IV or 5.0 | ODBC | | ODBC DSN name | | Optional
Firebird SQL | Native | Required | Database name | Required (3050) | Required
Firebird SQL | ODBC | | ODBC DSN name | | Required
Hypersonic | Native | Required | Database name | Required (9001) | Required
MaxDB (SAP DB) | Native | Required | Database name | | Required
MaxDB (SAP DB) | ODBC | | ODBC DSN name | | Required
Ingres | Native | Required | Database name | | Required
Ingres | ODBC | | ODBC DSN name | | Required
Borland Interbase | Native | Required | Database name | Required (3050) | Required
Borland Interbase | ODBC | | ODBC DSN name | | Required
ExtenDB | Native | Required | Database name | Required (543…) | Required
ExtenDB | ODBC | | ODBC DSN name | | Required
Teradata | Native | Required | Database name | | Required
Teradata | ODBC | | ODBC DSN name | | Required
Oracle RDB | Native | Required | Database name | | Required
Oracle RDB | ODBC | | ODBC DSN name | | Required
H2 | Native | Required | Database name | | Required
H2 | ODBC | | ODBC DSN name | | Required
Netezza | Native | Required | Database name | Required (5480) | Required
Netezza | ODBC | | ODBC DSN name | | Required
IBM Universe | Native | Required | Database name | | Required
IBM Universe | ODBC | | ODBC DSN name | | Required
SQLite | Native | Required | Database name | | Required
SQLite | ODBC | | ODBC DSN name | | Required
Apache Derby | Native | Optional | Database name | Optional (1527) | Optional
Apache Derby | ODBC | | ODBC DSN name | | Optional
Generic (*) | Native | Required | Database name | Required (Any) | Required
Generic (*) | ODBC | | ODBC DSN name | | Optional
(*) The generic database connection also needs to specify the URL and Driver class in the Generic tab. We now also allow these fields to be specified using a variable. That way you can access data from multiple database types using the same transformations and jobs. Make sure to use clean ANSI SQL that works on all used database types in that case.
Note: It is important that the information stored in this file in the simple-jndi directory mirrors the content of your application server data sources.
5.11. Unsupported databases
If you want to access a database type that is not yet supported, let us know and we will try to find a solution. A few database types are not supported in this release because of the lack of sample database and
6. SQL Editor
6.1. Description
The Simple SQL Editor is an easy-to-use tool when you need to execute standard SQL commands for tasks like creating tables, dropping indexes and modifying fields. In several places throughout Spoon, the SQL Editor is used to preview and execute DDL (Data Definition Language) generated by Spoon, such as "create
7. Database Explorer
7.1. Description
The Database Explorer provides the ability to explore configured database connections. It currently supports tables, views and synonyms along with the catalog and
8. Hops
8.1. Description
A hop connects one transformation step or job entry with another. The direction of the data flow is indicated with an arrow on the graphical view pane. A hop can be enabled or disabled (for testing purposes, for example).
8.1.1. Transformation Hops
When a hop is disabled in a transformation, the steps downstream of the disabled hop are cut off from any data flowing upstream of the disabled hop. This may lead to unexpected results when editing the downstream steps. For example, if a particular step type offers a "Get Fields" button, clicking the button may not reveal any of the incoming fields as long as the hop is still disabled.
8.1.2. Job Hops
Besides the execution order, a job hop also specifies the condition on which the next job entry will be executed. You can specify the evaluation mode by right-clicking on the job hop:
"Unconditional" specifies that the next job entry will be executed regardless of the result of the originating job entry.
"Follow when result is true" specifies that the next job entry will only be executed when the result of the originating job entry was true, meaning successful execution, file found, table found, without error, evaluation was true, ...
"Follow when result is false" specifies that the next job entry will only be executed when the result of the originating job entry was false, meaning unsuccessful execution, file not found, table not found, error(s) occurred, evaluation was false, ...
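The three evaluation modes amount to a small decision function. The following Python sketch is illustrative only: the HopMode names and the (source, target, mode) tuple representation of hops are assumptions, not Kettle classes.

```python
from enum import Enum


class HopMode(Enum):
    UNCONDITIONAL = "Unconditional"
    ON_TRUE = "Follow when result is true"
    ON_FALSE = "Follow when result is false"


def should_follow(mode: HopMode, result: bool) -> bool:
    """Decide whether a hop leaving a job entry with this result is followed."""
    if mode is HopMode.UNCONDITIONAL:
        return True
    if mode is HopMode.ON_TRUE:
        return result
    return not result


def next_entries(hops, source, result):
    """Targets of all hops leaving source that fire for the given result."""
    return [target for src, target, mode in hops
            if src == source and should_follow(mode, result)]


hops = [
    ("Load file", "Mail success", HopMode.ON_TRUE),
    ("Load file", "Mail failure", HopMode.ON_FALSE),
    ("Load file", "Clean up", HopMode.UNCONDITIONAL),
]
print(next_entries(hops, "Load file", False))  # ['Mail failure', 'Clean up']
```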
8.2. Creating A Hop
You can easily create a new hop between 2 steps by one of the following options:
Dragging on the Graphical View between 2 steps while using the middle mouse button.
Dragging on the Graphical View between 2 steps while pressing the SHIFT key and using the left mouse button.
Editing a Job Hop
Editing a Transformation Hop
Selecting two steps in the tree, clicking right and selecting 'new hop'
Selecting two steps in the graphical view (CTRL + left mouse click), right-clicking on a step and selecting 'new hop'
Splitting A Hop
You can easily insert a new step into an existing hop between two steps by dragging the step (in the Graphical View) over the hop until the hop is drawn in bold. Release the left button and you will be asked if you want to split the hop. This works only with steps that have not yet been connected to another step.
8.3. Loops
Loops are not allowed in transformations because Spoon depends heavily on the previous steps to determine the field values that are passed from one step to another. If we allowed loops in transformations, we would often get endless loops and undetermined results.
Loops are allowed in jobs because Spoon executes job entries sequentially. Just make sure you don't build endless loops. This job entry can help you exit closed loops based on the number of times a job entry was executed.
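Because loops are forbidden in transformations but allowed in jobs, a design tool has to recognize when a set of hops closes a loop. A standard technique is a depth-first search that reports any hop leading back into the current path. The following Python sketch assumes hops are given as (from step, to step) name pairs; it shows the technique, not how Spoon performs its own check.

```python
from collections import defaultdict

WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished


def has_loop(hops):
    """True if the directed graph formed by the (source, target) hops has a cycle."""
    graph = defaultdict(list)
    for source, target in hops:
        graph[source].append(target)
    state = defaultdict(int)  # every step starts WHITE

    def visit(step):
        state[step] = GREY
        for nxt in graph[step]:
            if state[nxt] == GREY:  # hop back into the current path: a loop
                return True
            if state[nxt] == WHITE and visit(nxt):
                return True
        state[step] = BLACK
        return False

    return any(state[step] == WHITE and visit(step) for step in list(graph))


print(has_loop([("Text file input", "Filter rows"), ("Filter rows", "Table output")]))  # False
print(has_loop([("A", "B"), ("B", "C"), ("C", "A")]))  # True
```

A step reached twice through different paths (a diamond) is not a loop; only a hop back to a step that is still on the current path is.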
8.4. Mixing rows: trap detector
Mixing rows with different layouts is not allowed in a transformation. Mixing row layouts will cause steps to fail because fields cannot be found where expected or the data type changes unexpectedly.
The "trap detector" is in place to provide warnings at design time when a step is receiving mixed layouts:
In this case, the full error report reads:
We detected rows with varying number of fields, this is not allowed in a transformation. The first row contained 13 fields, another one contained 16 : [customer_tk=0, version=0, date_from=, date_to=, CUSTOMERNR=0, NAME=, FIRSTNAME=, LANGUAGE=, GENDER=, STREET=, HOUSENR=, BUSNR=, ZIPCODE=, LOCATION=, COUNTRY=, DATE_OF_BIRTH=]
Note: this is only a warning and will not prevent you from performing the task you want to do.
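The check behind such a warning can be sketched as comparing each row's layout (field names, order and value types) with the layout of the first row. In this Python sketch rows are plain dictionaries and the warning texts are invented; Kettle's row metadata and messages differ.

```python
def row_layout(row):
    """Field names in order, each paired with the type name of its value."""
    return [(name, type(value).__name__) for name, value in row.items()]


def trap_warnings(rows):
    """Warnings for rows whose layout differs from the layout of the first row."""
    if not rows:
        return []
    reference = row_layout(rows[0])
    warnings = []
    for number, row in enumerate(rows[1:], start=2):
        layout = row_layout(row)
        if len(layout) != len(reference):
            warnings.append(
                f"row {number}: {len(layout)} fields, the first row has {len(reference)}")
        elif layout != reference:
            warnings.append(f"row {number}: field names, order or types differ from the first row")
    return warnings


rows = [{"customer_tk": 0, "name": "Smith"},
        {"customer_tk": 1, "name": "Jones", "zipcode": "1000"}]
print(trap_warnings(rows))  # ['row 2: 3 fields, the first row has 2']
```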
8.5. Transformation hop colors
Transformation hops display in a variety of colors based on the properties and state of the hop. The following table describes the meaning behind a transformation hop's color:
Hop Color | Meaning
Green | Distribute rows: if multiple hops are leaving a step, rows of data will be evenly distributed to all target steps.
Red | Copies rows: if multiple hops are leaving a step, all rows of data will be copied to all target steps.
Yellow | Provides info for step, distributes rows
Magenta | Provides info for step, copies rows
Gray | The hop is disabled.
Black | The hop has a named target step.
Blue | Candidate hop using middle button + drag
Orange (dot line) | The hop is never used because no data will ever go there.
Red (bold dot line) | The hop is used for carrying rows that caused errors in source step(s).
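The difference between distributing rows (green hops) and copying rows (red hops) to several target steps can be illustrated with a short Python sketch. "Evenly distributed" is modelled here as round-robin, which is an assumption about the exact order in which Kettle hands out rows; step names and rows are invented.

```python
from itertools import cycle


def distribute_rows(rows, targets):
    """Each row goes to exactly one target, in turn (round-robin)."""
    if not targets:
        raise ValueError("at least one target step is required")
    result = {target: [] for target in targets}
    for row, target in zip(rows, cycle(targets)):
        result[target].append(row)
    return result


def copy_rows(rows, targets):
    """Every target receives all rows."""
    return {target: list(rows) for target in targets}


rows = [1, 2, 3, 4, 5]
print(distribute_rows(rows, ["Output A", "Output B"]))  # {'Output A': [1, 3, 5], 'Output B': [2, 4]}
print(copy_rows(rows, ["Output A", "Output B"]))
```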
9. Variables
9.1. Variable usage
Variables can be used throughout Pentaho Data Integration, including within transformation steps and job entries. Variables can be defined by setting them with the "Set Variable" step in a transformation or by setting them in the
9.2.2. Kettle variables
Because the scope of an environment variable (9.2.1. Environment variables) is too broad, Kettle variables were introduced to provide a way to define variables that are local to the job in which the variable is set. The "Set Variable" step in a transformation allows you to specify in which job you want to set the variable's scope (i.e. parent job, grandparent job or the root job).
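The parent/grandparent/root choice can be pictured as walking up a chain of nested jobs. The Python sketch below assumes that a variable set on a job is visible to that job and to everything it runs, modelled as a lookup that walks up the parent chain; the Job class and function names are hypothetical, not Kettle's API.

```python
from typing import Dict, Optional


class Job:
    def __init__(self, name: str, parent: Optional["Job"] = None):
        self.name = name
        self.parent = parent
        self.variables: Dict[str, str] = {}

    def lookup(self, key: str) -> Optional[str]:
        """Search this job, then its parent, grandparent, ... up to the root."""
        job = self
        while job is not None:
            if key in job.variables:
                return job.variables[key]
            job = job.parent
        return None


def scope_target(parent_job: Job, scope: str) -> Job:
    """parent_job is the job that runs the transformation with the Set Variable step."""
    if scope == "parent":
        return parent_job
    if scope == "grandparent":
        if parent_job.parent is None:
            raise ValueError("the parent job has no parent")
        return parent_job.parent
    if scope == "root":
        job = parent_job
        while job.parent is not None:
            job = job.parent
        return job
    raise ValueError(f"unknown scope: {scope}")


def set_variable(parent_job: Job, scope: str, key: str, value: str) -> None:
    scope_target(parent_job, scope).variables[key] = value


root = Job("nightly")
load_dwh = Job("load_dwh", parent=root)
load_dim = Job("load_customer_dim", parent=load_dwh)
set_variable(load_dim, "grandparent", "BATCH_DATE", "2007-10-26")
print(load_dim.lookup("BATCH_DATE"), root.lookup("BATCH_DATE"))  # 2007-10-26 None
```

Choosing the narrowest scope that still reaches every job needing the value keeps unrelated jobs unaffected, which is the reason Kettle variables were introduced over environment variables.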
9.2.3. Internal variables
The following variables are always defined:
Variable Name | Sample value
Internal.Kettle.Build.Date | 2007
10. Transformation Settings
10.1. Description
Transformation Settings are a collection of properties that describe the transformation and configure its behavior. Access Transformation Settings from the main menu under Transformation > Settings. The following sections provide a detailed description of the available settings.
10.2. Transformation Tab
The Transformation tab allows you to specify general properties about the transformation, including:
Setting | Description
Transformation name | The name of the transformation. Required information if you want to save to a repository
Description | Short description of the transformation, shown in the repository explorer
Extended description | Long extended description of the transformation
Status | Draft or production status
Version | Version description
Directory | The directory in the repository where the transformation is stored
Created by | Displays the original creator of the transformation.
Created at | Displays the date and time when the transformation was created.
Last modified by | Displays the user name of the last user that modified the transformation.
Transformation Settings
*etting %escription
,ast modified at %isplays the date and time hen the transformation as last modified.
10.$. Logging
$he ,ogging tab allos you to configure ho and here logging information is captured. *ettings include@
*etting %escription
READ log step | Use the number of read lines from this step to write to the log table. Read means: read from source steps.
INPUT log step | Use the number of input lines from this step to write to the log table. Input means: input from file or database.
WRITE log step | Use the number of written lines from this step to write to the log table. Written means: written to target steps.
OUTPUT log step | Use the number of output lines from this step to write to the log table. Output means: output to file or database.
UPDATE log step | Use the number of updated lines from this step to write to the log table. Update means: updated in a database.
REJECTED log step | Use the number of rejected lines from this step to write to the log table. Rejected means: error record.
Log connection | The connection used to write to a log table.
Log table | Specifies the name of the log table (for example L_ETL).
Use Batch-ID? | Enable this if you want to have a batch ID in the L_ETL file. Disable for backward compatibility with older versions of Spoon.
Use logfield to store logging in? | This option stores the logging text in a CLOB field in the logging table. This allows you to have the logging text together with the run results in the same table. Disable for backward compatibility with older versions of Spoon.
Use this, for example, if you find that the field DATE_LAST_UPD has a maximum value of 2004-05-29 23:00:00, but you know that the values for the last minute are not complete. In this case, simply set the offset to -60.
Maximum date difference | Sets the maximum date difference in the obtained date range. This will allow you to limit job sizes.
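The offset and the maximum date difference boil down to simple date arithmetic. The following Python sketch shows one plausible reading of these settings; the function name, the assumption that the range starts at the end of the previous run, and the rule that a maximum difference of 0 means “no limit” are assumptions, not documented Kettle behavior.

```python
from datetime import datetime, timedelta
from typing import Tuple


def date_range(previous_end: datetime,
               max_field_value: datetime,
               offset_seconds: float = 0.0,
               max_difference_seconds: float = 0.0) -> Tuple[datetime, datetime]:
    """Return a (start, end) window for incremental processing (illustrative sketch)."""
    # Shift the upper limit by the offset; a negative offset excludes the most
    # recent, possibly incomplete, records.
    end = max_field_value + timedelta(seconds=offset_seconds)
    start = previous_end
    # Cap the window so that a single run never covers more than the maximum
    # difference (0 is treated here as "no limit").
    limit = timedelta(seconds=max_difference_seconds)
    if max_difference_seconds > 0 and end - start > limit:
        end = start + limit
    return start, end
```

With the values from the example above, an offset of -60 seconds moves the upper limit from 23:00:00 back to 22:59:00, so the incomplete last minute is excluded from this run.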
10.5. Dependencies
The Dependencies tab allows you to enter all of the dependencies for the transformation. For example, if a
dimension depends on 3 lookup tables, we have to make sure that these lookup tables have not
changed. If the values in these lookup tables have changed, we need to extend the date range to force a
full refresh of the dimension. The dependencies allow you to look up whether a table has changed in case
you have a “date last changed” column in the table.
The 'Get dependencies' button will try to automatically detect dependencies.
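The dependency check can be sketched as follows. This is a hypothetical Python illustration: the FULL_REFRESH_START constant and the start_date function are assumptions, and Kettle's actual way of widening the date range may differ.

```python
from datetime import datetime
from typing import Dict

# Hypothetical "beginning of time" used to force a full refresh.
FULL_REFRESH_START = datetime(1900, 1, 1)


def start_date(previous_run_end: datetime,
               dependency_last_changed: Dict[str, datetime]) -> datetime:
    """Return the start of the date range, widened if any lookup table changed.

    `dependency_last_changed` maps a lookup table name to the maximum value
    of its "date last changed" column.
    """
    for last_changed in dependency_last_changed.values():
        if last_changed > previous_run_end:
            # A dependency changed since the last run: reprocess everything.
            return FULL_REFRESH_START
    return previous_run_end
```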
10.6. Miscellaneous
The Miscellaneous tab allows you to configure the following settings:
Setting | Description
Number of rows in rowsets | This option allows you to change the size of the buffers between the connected steps in a transformation. You will rarely
11. Transformation Steps
11.1. Description
A step is one part of a transformation. Steps can provide you with a wide range of functionality, ranging
from reading text files to implementing slowly changing dimensions. This chapter describes various step
settings, followed by a detailed description of available step types.
11.2. Launching several copies of a step
Sometimes it can be useful to launch the same step several times. For example, for performance reasons it
can be useful to launch a database lookup step 3 times or more. That is because database connections
usually have a certain latency. Launching the same step several times keeps the database busy on different
connections, effectively reducing the impact of that latency. You can launch several copies of a step in a
transformation simply by right-clicking on a step in the graphical view and then selecting “Change number
of copies to start”:
You will get this dialog:
[Figure: The “Step copies” popup menu]
[Figure: The step copies dialog]
If you enter 3, this will be shown:
It is the technical equivalent of this:
[Figure: Multiple step copies equivalent]
[Figure: Multiple step copies example]
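The latency argument can be illustrated outside Kettle with a thread pool. In this Python sketch, which is an analogy rather than how Kettle schedules step copies, each worker thread plays the role of one copy of a database lookup step, and time.sleep stands in for a hypothetical network round trip.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

# Hypothetical lookup table and latency, standing in for a remote database.
CUSTOMERS: Dict[int, str] = {1: "Smith", 2: "Jones", 3: "Brown", 4: "Lee"}
LATENCY_SECONDS = 0.05


def database_lookup(customer_id: int) -> str:
    """Simulate one lookup: most of the time is spent waiting, not computing."""
    time.sleep(LATENCY_SECONDS)
    return CUSTOMERS.get(customer_id, "unknown")


def lookup_all(ids: List[int], copies: int) -> List[str]:
    """Run the lookups with `copies` parallel workers, like several step copies."""
    with ThreadPoolExecutor(max_workers=copies) as pool:
        # map() returns results in input order; this sketch does not model
        # the row ordering of real step copies.
        return list(pool.map(database_lookup, ids))
```

With 3 workers, 6 lookups wait for roughly 2 round trips instead of 6, which is the effect described above.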
11.3. Distribute or copy
In the example above, green lines are shown between the steps. This indicates that rows are distributed
among the target steps. In this case, it means that the first row coming from step “A” goes to step
“Database lookup 1”, the second to “Database lookup 2”, the third to “Database lookup 3”, the fourth back
to “Database lookup 1”, etc.
However, if we right-click on step “A” and select “Copy data”, you will get the hops drawn in red:
“Copy data” means that all rows from step “A” are copied to all 3 target steps.
In this case it means that step “B” gets 3 copies of all the rows that “A” has sent out.
NOTE: Because all these steps run as different threads, the order in which the single rows arrive at step
“B” is probably not going to be the same as the order in which they left step “A”.
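The difference between the two hop types can be expressed as two small Python functions. This is a sketch of the row routing only; the function names are illustrative, and it deliberately ignores threading, so it does not model the arrival order described in the note.

```python
from itertools import cycle
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def distribute(rows: Sequence[T], targets: int) -> List[List[T]]:
    """Round-robin: row 1 to target 1, row 2 to target 2, ... then wrap around."""
    buckets: List[List[T]] = [[] for _ in range(targets)]
    for index, row in zip(cycle(range(targets)), rows):
        buckets[index].append(row)
    return buckets


def copy_rows(rows: Sequence[T], targets: int) -> List[List[T]]:
    """Copy: every target receives every row."""
    return [list(rows) for _ in range(targets)]
```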
11.4. Step error handling
Step error handling allows you to configure a step so that, instead of halting the transformation when an
error occurs, it passes the rows that caused an error to a different step. To configure error handling,
right-click on the step and select “Define Error handling...”.
In the example below, we artificially generate an error in the Script Values step when an ID is higher than
a given threshold.
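The idea behind step error handling can be sketched in Python: good rows continue on the main stream, and rows that raise an error are diverted to an error stream with a few extra fields. The threshold value, the step function and the extra field names (nr_errors, error_description, error_fields) are hypothetical stand-ins, since the original example's values are not preserved here.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]

# Hypothetical threshold; the value used in the original example is not preserved.
ID_THRESHOLD = 5


def script_values(row: Row) -> Row:
    """Stand-in for the Script Values step: fail on purpose for large IDs."""
    if int(row["id"]) > ID_THRESHOLD:
        raise ValueError(f"id {row['id']} is higher than {ID_THRESHOLD}")
    return {**row, "checked": True}


def run_with_error_handling(rows: List[Row],
                            step: Callable[[Row], Row]) -> Tuple[List[Row], List[Row]]:
    """Send good rows to the main output and failing rows to an error stream."""
    output: List[Row] = []
    errors: List[Row] = []
    for row in rows:
        try:
            output.append(step(row))
        except ValueError as exc:
            # Extra fields added to each error row (names are illustrative).
            errors.append({**row, "nr_errors": 1,
                           "error_description": str(exc), "error_fields": "id"})
    return output, errors
```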
To configure the error handling, you can right-click on the step involved and select the “Error handling...”
menu item:
[Figure: Step error handling settings]
NOTE: This menu item only appears when clicking on steps that support the new error handling code.
As you can see, you can add extra fields to the “error rows”:
This way, we can easily define new data flows in our transformations. The typical use case for this is an
alternative way of doing an Upsert (Insert/Update):
This transformation performs an insert regardless of the content of the table. If you put a primary key on
the ID (in this case the customer ID), the insert into the table will cause an error for every row whose ID
already exists. Because of the error handling, we can pass the rows in error to the update step. Preliminary
tests have shown this strategy of doing upserts to be 3 times faster in certain situations (with a low
updates to inserts ratio).
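The insert-then-update pattern can be reproduced with Python's built-in sqlite3 module. This sketch assumes a hypothetical customer table with id as primary key; it shows the row routing only, not Kettle's own output or update steps, and makes no claim about the performance figure quoted above.

```python
import sqlite3
from typing import Iterable, Tuple


def insert_then_update(conn: sqlite3.Connection,
                       rows: Iterable[Tuple[int, str]]) -> Tuple[int, int]:
    """Try an INSERT for every row; rows rejected by the primary key go to an UPDATE."""
    inserted = updated = 0
    for customer_id, name in rows:
        try:
            conn.execute("INSERT INTO customer (id, name) VALUES (?, ?)",
                         (customer_id, name))
            inserted += 1
        except sqlite3.IntegrityError:
            # "Error row": the ID already exists, so update it instead.
            conn.execute("UPDATE customer SET name = ? WHERE id = ?",
                         (name, customer_id))
            updated += 1
    conn.commit()
    return inserted, updated
```

For example, after creating the table with CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT) and loading IDs 1 and 2, a second call with IDs 2 and 3 inserts one row and updates one.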
11.5. Apache Virtual File System (VFS) support
Kettle provides support for the Apache Virtual File System (VFS) as an additional way to reference source
files, transformations and jobs from any location you like. For more information about VFS, visit Apache
Commons Virtual File System.
11.5.1. Example: Referencing remote job files
Here is a simple example of using VFS to reference the location of a job file we want to execute using
Kitchen:
sh
This allows us to reference the transformation as follows:
Note: You will not be able to save the job back to the web server in this example. That is not because we
do not support it, but because you don't have the permission to do so.
For more information on the almost endless list of possibilities with VFS, please visit:
http:
11.6. Transformation Step Types
11.6.1. Text File Input
11.6.1.1. General description
The Text File Input step is used to read data from a variety of different text file types. The most
commonly used formats include Comma Separated Values (CSV) files generated by spreadsheets
and fixed width flat files.
The Text File Input step provides the ability to specify a list of files to read, or a list of directories
with wild cards in the form of regular expressions. In addition, you can accept filenames from a
previous step, making filename handling even more generic.
The following sections describe in detail the available options for configuring the Text File Input
step.
11.6.1.2. File options
The table below provides a detailed description of the features available on the File tab:
Option | Description
File or directory | This field specifies the location and
Option | Description
Show file content | Displays the content of the selected file.
Show content from first data line | Displays the content from the first data line only for the selected file.
11.6.1.2.1. Selecting files to read data from
The File tab (shown above) is where you identify the file or files from which you want to read data.
To specify a file:
1. Enter the location of the file in the 'File or directory' field or click the 'Browse' button to
browse the local file system.
2. Click the 'Add' button to add a file to the list of 'Selected files' like this:
11.6.1.2.2. Selecting files using regular expressions
You can also have this step search for files by specifying a wild card in the form of a regular
expression. Regular expressions are more sophisticated than simply using '*' and '?' wild cards.
Here are a few examples of regular expressions:
Filename | Regular expression | Files selected
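The wild card matching can be tried out with Python's re module. This sketch assumes that the regular expression has to match the whole filename, not just part of it; the function name and filenames are made up for illustration, and Kettle itself uses Java regular expressions, whose syntax agrees with Python's for simple patterns like these.

```python
import re
from typing import Iterable, List


def select_files(filenames: Iterable[str], wildcard: str) -> List[str]:
    """Return the filenames whose whole name matches the regular expression."""
    pattern = re.compile(wildcard)
    return [name for name in filenames if pattern.fullmatch(name)]


files = ["customers-2007.txt", "customers-2006.txt", "orders.txt", "customers.csv"]
print(select_files(files, r"customers-\d{4}\.txt"))
```

Note that the shell-style wild card *.txt corresponds to the regular expression .*\.txt; passing *.txt itself to re.compile raises an error because * has nothing to repeat.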
This option allows even more flexibility in combination with other steps like “Get Filenames”. You
can construct your filename and pass it to this step. This way the filename can come from any
source: text file, database table, etc.
Option | Description
Accept filenames from previous steps | This enables the option to get filenames from previous steps.
Step to read filenames from | The step to read the filenames from.
Field in the input to use as filename | Text File Input will look in this field to determine the filenames to use.
11.6.1.3. Content specification
The Content tab allows you to specify the format of the text files that are being read. Here is a list
of the options on this tab:
Option | Description
File type | This can be either CSV or Fixed length. Based on this selection, Spoon will launch a different helper GUI when you press the “Get fields” button in the last “Fields” tab.
Separator | One or more characters that separate the fields in a single line of text. Typically this is ; or a tab.
Enclosure | Some fields can be enclosed by a pair of strings to allow separator characters in fields. The enclosure string is optional. If you repeat an enclosure inside an enclosed field, it is read as a literal character: with ' as the enclosure string, the text 'Not the nine o''clock news.' gets parsed as Not the nine o'clock news.
Allow breaks in enclosed fields? | This is an experimental feature which is currently disabled. Note: This functionality is implemented and available in the CSV Input step.
Escape | Specify an escape character (or characters) if you have escaped characters in your data. If you have \ as an escape character, the text 'Not the nine o\'clock news.' (with ' as the enclosure) will get parsed as Not the nine o'clock news.
Header / number of header lines | Enable this option if your text file has a header row (first lines in the file). You can specify the number of header lines.
Footer / number of footer lines | Enable this option if your text file has a footer row (last lines in the file). You can specify the number of footer lines.
Wrapped lines / number of wraps | Use this if you deal with data lines that have wrapped beyond a certain page limit. Note that headers and footers are never considered wrapped.
Paged layout / page size / doc header | You can use these options as a last resort when dealing with texts meant for printing on a line printer. Use the number of document header lines to skip introductory texts and the number of lines per page to position the data lines.
Compression | Enable this option if your text file is placed in a Zip or GZip archive. NOTE: At the moment, only the first file in the archive is read.
No empty rows | Don't send empty rows to the next steps.
Include filename in output | Enable this if you want the filename to be part of the output.
Filename field name | The name of the field that contains the filename.
Rownum in output? | Enable this if you want the row number to be part of the output.
Row number field name | The name of the field that contains the row number.
Rownum by file? | Allows the row number to be reset per file.
Option | Description
Format | This can be either DOS, UNIX or mixed. UNIX files have lines that are terminated by line feeds. DOS files have lines separated by carriage returns and line feeds. If you specify mixed, no verification is done.
Encoding | Specify the text file encoding to use. Leave blank to use the default encoding on your system. To use Unicode, specify UTF-8 or UTF-16. On first use, Spoon will search your system for available encodings.
Limit | Sets the number of lines that are read from the file. 0 means: read all lines.
Be lenient when parsing dates? | Disable this option if you want strict parsing of date fields. If lenient parsing is enabled, dates like Jan 32nd will become Feb 1st.
The date format Locale | This locale is used to parse dates that have been written in full, like “February 2nd, 2006”. Parsing this date on a system running in the French (fr_FR) locale would not work because February would be called Février in that locale.
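The Enclosure and Escape options can be illustrated with Python's csv module, used here as a stand-in for the step's own parser. In this sketch the ; separator and ' enclosure are example values, and the parse_line helper is hypothetical.

```python
import csv
import io
from typing import List


def parse_line(line: str, separator: str = ";", enclosure: str = "'",
               escape: str = "") -> List[str]:
    """Split one text line into fields, using Python's csv module as a stand-in parser."""
    if escape:
        # With an escape character, an escaped enclosure is kept as a literal.
        reader = csv.reader(io.StringIO(line), delimiter=separator,
                            quotechar=enclosure, escapechar=escape,
                            doublequote=False)
    else:
        # Without one, a doubled enclosure inside an enclosed field is one literal enclosure.
        reader = csv.reader(io.StringIO(line), delimiter=separator,
                            quotechar=enclosure, doublequote=True)
    return next(reader)
```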
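Lenient date parsing rolls invalid dates over instead of rejecting them. Python's datetime is always strict, so the sketch below emulates the rollover for the day of the month only; Java's lenient parsing also rolls over other fields such as months, which these hypothetical helpers do not cover.

```python
from datetime import date, timedelta


def lenient_date(year: int, month: int, day: int) -> date:
    """Build a date leniently: an out-of-range day rolls over into the next month(s)."""
    return date(year, month, 1) + timedelta(days=day - 1)


def strict_date(year: int, month: int, day: int) -> date:
    """Build a date strictly: an invalid day raises ValueError."""
    return date(year, month, day)
```

For example, lenient_date(2007, 1, 32) gives 1 February 2007, while strict_date(2007, 1, 32) raises ValueError.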
11.6.1.4. Error handling
The Error handling tab was added to allow you to specify how this step should react when errors
occur. The table below describes the options available for error handling:
Option | Description
Ignore errors? | Check this option if you want to ignore errors during parsing.
Skip error lines | Enable this option if you want to skip those lines that contain errors. Note that you can generate an extra file that will contain the line numbers on which the errors occurred. If lines with errors are not skipped, the fields that had parsing errors will be empty (null).
Error count field name | Add a field to the output stream rows. This field will contain the number of errors on the line.
Error fields field name | Add a field to the output stream rows. This field will contain the field names on which an error occurred.
Error text field name | Add a field to the output stream rows. This field will contain the descriptions of the parsing errors that have occurred.
Warnings file directory | When warnings are generated, they will be put in this directory. The name of that file will be <warning dir>
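The interplay of the error handling options can be sketched in Python for a single line of input. The field layout, the converters and the output field names (error_count, error_fields, error_text) are hypothetical; the sketch assumes “Ignore errors?” is enabled and does not write a warnings file.

```python
from datetime import datetime
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical field layout: (name, converter that raises ValueError on bad input).
FIELDS: List[Tuple[str, Callable[[str], object]]] = [
    ("id", int),
    ("amount", float),
    ("order_date", lambda s: datetime.strptime(s, "%Y/%m/%d").date()),
]


def parse_with_errors(values: List[str],
                      skip_error_lines: bool) -> Optional[Dict[str, object]]:
    """Convert one line's values, recording parsing errors in extra fields."""
    row: Dict[str, object] = {}
    failed: List[str] = []
    messages: List[str] = []
    for (name, convert), raw in zip(FIELDS, values):
        try:
            row[name] = convert(raw)
        except ValueError as exc:
            row[name] = None  # a field with a parsing error becomes empty (null)
            failed.append(name)
            messages.append(f"{name}: {exc}")
    if failed and skip_error_lines:
        return None  # the whole line is skipped
    row["error_count"] = len(failed)
    row["error_fields"] = ",".join(failed)
    row["error_text"] = "; ".join(messages)
    return row
```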
11.6.1.5. Filters
The Filters tab provides the ability to specify the lines you want to skip in the text file.
The table below describes the available options for defining filters:
Option | Description
Filter string | The string to look for.
Filter position | The position where the filter string has to be in the line. 0 is the first position in the line. If you specify a value below 0 here, the filter string is searched for in the entire line.
Stop on filter | Specify Y here if you want to stop processing the current text file when the filter string is encountered.
[Figure: Specifying text file filters]
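The three filter settings can be sketched as a single Python function. The function name and sample lines are illustrative assumptions; the sketch only shows which lines are skipped and where processing stops.

```python
from typing import Iterable, List


def apply_filter(lines: Iterable[str], filter_string: str, position: int = -1,
                 stop_on_filter: bool = False) -> List[str]:
    """Skip lines containing filter_string at `position` (anywhere if position < 0)."""
    kept: List[str] = []
    for line in lines:
        if position < 0:
            matched = filter_string in line
        else:
            matched = line.startswith(filter_string, position)
        if matched:
            if stop_on_filter:
                break  # stop processing the rest of this file
            continue  # skip only this line
        kept.append(line)
    return kept


lines = ["# comment", "1;Smith", "2;Jones", "END", "3;Brown"]
print(apply_filter(lines, "END", position=0, stop_on_filter=True))
```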
11.6.1.6. Fields
The Fields tab is where you specify the information about the name and format of the fields being
read from the text file. Available options include:
Option | Description
Name | Name of the field.
Type | Type of the field; can be either String, Date or Number.
Format | See Number Formats for a complete description of format symbols.
Length | For Number: total number of significant figures in a number; For String: total length of string; For Date: length of printed output of the string (e.g. 4 only gives back the year).
Precision | For Number: number of floating point digits; For String, Date, Boolean: unused.
Currency | Used to interpret numbers like 10,000.00 or 5.000,00.
Decimal | A decimal point can be a '.' (10,000.00) or ',' (5.000,00).
Grouping | A grouping can be a ',' (10,000.00) or '.' (5.000,00).
Null if | Treat this value as NULL.
Default | The default value in case the field in the text file was not specified (empty).
Trim type | Trim this field (left, right, both) before processing.
Repeat |
Used to quote special characters in a prefix or suffix, for example, "'#'#" formats 123 to "#123". To
create a single quote itself, use two in a row: "# o''clock".
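How Trim type, Null if, Default, Decimal and Grouping could combine when a text value is turned into a number is sketched below in Python. The order in which the settings are applied and the function itself are assumptions; the sketch uses Decimal to avoid floating point rounding and ignores the Currency, Length and Precision settings.

```python
from decimal import Decimal
from typing import Optional


def convert_number(raw: str, decimal_symbol: str = ".", grouping_symbol: str = ",",
                   trim_type: str = "both", null_if: Optional[str] = None,
                   default: Optional[str] = None) -> Optional[Decimal]:
    """Turn a text value into a number using per-field settings (illustrative sketch)."""
    if trim_type in ("left", "both"):
        raw = raw.lstrip()
    if trim_type in ("right", "both"):
        raw = raw.rstrip()
    if raw == "" and default is not None:
        raw = default  # the field was not specified (empty)
    if raw == "" or (null_if is not None and raw == null_if):
        return None  # treated as NULL
    # Drop grouping symbols first, then normalise the decimal symbol to '.'.
    normalised = raw.replace(grouping_symbol, "").replace(decimal_symbol, ".")
    return Decimal(normalised)
```

Input that is still not a number after this clean-up raises decimal.InvalidOperation.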
http://java.sun.com/j2se/1.4.2/docs/api/java/text/DecimalFormat.html
Scientific Notation
In a pattern, the exponent character immediately followed by one or more digit characters
indicates scientific notation. Example: "0.###E0" formats the number 1234 as "1.234E3".
11.6.1.6.2. Date formats
The information on Date formats was taken from the Sun Java API documentation, to be found
here: http:
11.6.2. Table input
11.6.2.1. General description
This step is used to read information from a database, using a connection and SQL. Basic SQL
statements are generated automatically.
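What the step does (run an SQL statement over a connection) and how a basic select statement can be generated from a table's columns are sketched below with Python's sqlite3 module. The table, the helper names and the use of SQLite's PRAGMA table_info are assumptions for illustration; Spoon's 'Get SQL select statement...' button may format the generated statement differently.

```python
import sqlite3
from typing import List, Tuple


def basic_select(conn: sqlite3.Connection, table: str) -> str:
    """Generate a basic SELECT that lists every column of `table`.

    The table name is inserted as-is, so only use trusted names.
    """
    columns = [info[1] for info in conn.execute(f"PRAGMA table_info({table})")]
    if not columns:
        raise ValueError(f"table not found: {table}")
    return "SELECT " + ", ".join(columns) + " FROM " + table


def table_input(conn: sqlite3.Connection, sql: str) -> List[Tuple]:
    """Run the SQL statement on the connection and return all rows."""
    return conn.execute(sql).fetchall()
```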
11.6.2.2. Options
Option | Description
Step name | Name of the step. This name has to be unique in a single transformation.
Connection | The database connection used to read data from.
SQL | The SQL statement used to read information from the database connection. You can also click the 'Get SQL select statement...' button to browse tables and automatically generate a basic select statement.
Enable lazy conversion | Lazy conversion will avoid unnecessary data type conversio