Data Stage
DESCRIPTION
DataStage real-time questions and answers, and stages (transcript)
What is the Compiler in DataStage? | Compilation process in DataStage
Compilation is the process of converting the GUI job design into machine-understandable code. During this process, DataStage checks all link requirements, mandatory stage property values, and any logical errors. The compiler produces OSH (Orchestrate Shell) code.
What is Modeling in DataStage?
Modeling is a logical and physical representation of the source system. There are two popular modeling tools: ERWIN and ER-Studio. The source system typically has an ER model, while the target system has both an ER model and a dimensional model. Dimension: a table designed from the client's perspective; dimension tables can be viewed in many ways. There are two approaches to modeling: Forward Engineering (F.E.) and Reverse Engineering (R.E.). F.E. is building the model from scratch, for example for a bank that requires a new data warehouse. R.E. is altering an existing model, for example adapting one bank's model for another bank.
Advantages of Datamart
A datamart is the access layer of the data warehouse environment: we create a datamart so users can retrieve data faster. The datamart is a subset of the data warehouse, meaning all the data available in the datamart is also available in the data warehouse. A datamart is created for a specific business purpose (for example a telecom database or a banking database). There are many reasons to create a datamart, and it has many advantages: it is easy to access frequently needed data when required by the client; we can give a group of users access to view the datamart when required; performance is good; it is easy to create and maintain; it is related to a specific business; and it costs less to create a datamart than a full data warehouse with huge space.
What are the types of Errors in Datastage.
You may get many errors in DataStage while compiling or running jobs. Some of them are as follows: a) Source file not found: you are trying to read a file that does not exist with that name. b) Fatal errors. c) Data type mismatches: occur when data types do not match between stages. d) Field size errors. e) Metadata mismatch. f) Data type size differs between source and target. g) Column mismatch. h) Process timed out: can occur when the server is busy.
What are the client components in Datastage 7.5X2 Version
In DataStage 7.5X2, there are 4 client components: 1) DataStage Designer 2) DataStage Director 3) DataStage Manager 4) DataStage Administrator. In the Designer, we create, compile, and run jobs. In the Director, we can view jobs, view logs, batch jobs, unlock jobs, schedule jobs, monitor jobs, and handle messages. In the Manager, we can import and export jobs and configure nodes. Using the Administrator, we can create, organize, and delete projects.
Server Components of Datastage 7.5x2 Version
There are three architecture components in DataStage 7.5x2: a) Repository b) Server (Engine) c) DataStage Package Installer. Repository: the environment where we create, design, compile, and run jobs; it contains components such as jobs, table definitions, shared containers, and routines. Server (Engine): runs the executable jobs that extract, transform, and load data into a data warehouse. DataStage Package Installer: a user interface used to install packaged DataStage jobs and plug-ins.
What is a Fact Table in data warehousing? Give an example.
A fact table is the central table in a dimensional model (star schema). It stores the measurable, quantitative data of a business process, along with foreign keys to the surrounding dimension tables. For example, a Sales fact table may contain date_key, product_key, store_key, quantity_sold, and sales_amount, where the key columns reference the Date, Product, and Store dimension tables.
Features of DataStage
DataStage features are: 1) Any-to-any (any source to any target) 2) Platform independent 3) Node configuration 4) Partition parallelism 5) Pipeline parallelism.
1) Any-to-any: DataStage can extract data from any source and load it into any target.
2) Platform independent: a job developed on one platform can run on any other platform. For example, a job designed for uniprocessor-level processing can run on an SMP machine.
3) Node configuration: a technique to create logical CPUs; a node is a logical CPU.
4) Partition parallelism: a technique for distributing the data across the nodes based on partition techniques. Partition techniques are a) key-based: 1) Hash 2) Modulus 3) Range 4) DB2; and b) keyless: 1) Same 2) Entire 3) Round Robin 4) Random.
5) Pipeline parallelism: extraction, transformation, and loading occur simultaneously.
Re-partitioning: redistributing data that is already distributed. Reverse partitioning is called collecting. Collecting methods are Ordered, Round Robin, Sort Merge, and Auto.
What is OLTP ? Uses of OLTP?
OLTP stands for Online Transaction Processing. It is characterized by a large number of short online transactions. The main emphasis of an OLTP system is very fast query processing, so end users get their data quickly. An OLTP system also maintains data integrity in multi-access environments, and its effectiveness is measured in transactions per second.
How does Hash Partition works in Datastage
The hash partition technique sends rows with the same key column values to the same partition. In DataStage, partition techniques are divided into two types: a) key-based and b) keyless. The key-based techniques are a) Hash b) Modulus c) Range d) DB2. The keyless techniques are a) Same b) Entire c) Round Robin d) Random. Example of hash partitioning: with the sample data e_id,dept_no 1,10 2,10 3,20 4,20 5,30 6,40, the data could be partitioned as follows: in the 1st partition 10,10; in the 2nd partition 20,20; in the 3rd partition 30; in the 4th partition 40.
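The behavior above can be sketched in Python. This is a hedged illustration, not DataStage code: the guarantee is only that equal keys land in the same partition; which partition number each key maps to depends on the hash function, unlike the idealized numbering in the example.

```python
def hash_partition(rows, key, num_nodes):
    """Distribute rows across num_nodes partitions by hashing the key column.
    Rows with the same key value always land in the same partition."""
    partitions = [[] for _ in range(num_nodes)]
    for row in rows:
        node = hash(row[key]) % num_nodes   # same key -> same node
        partitions[node].append(row)
    return partitions

rows = [
    {"e_id": 1, "dept_no": 10},
    {"e_id": 2, "dept_no": 10},
    {"e_id": 3, "dept_no": 20},
    {"e_id": 4, "dept_no": 20},
    {"e_id": 5, "dept_no": 30},
    {"e_id": 6, "dept_no": 40},
]

parts = hash_partition(rows, "dept_no", 4)
```

Every row is placed in exactly one partition, and both dept_no 10 rows share a partition, both dept_no 20 rows share a partition, and so on.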
How to create group id in Sort Stage in Datastage
Group IDs can be created in two different ways: a) Key Change Column b) Cluster Key Change Column. Both options are used to create group IDs. When either option is set to true, group IDs are created group-wise: data is divided into groups based on the key column, and the stage outputs 1 for the first row of every group and 0 for the remaining rows of each group. Which option to use depends on the data coming from the source: if the incoming data is not sorted, use Key Change Column; if the incoming data is already sorted, use Cluster Key Change Column.
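The flagging behavior described above can be sketched as follows. This is a Python illustration of what the stage produces, assuming the input is already grouped on the key; the output column name `keyChange` is used here for illustration.

```python
def add_key_change(rows, key):
    """Emit 1 for the first row of each key group, 0 for the rest."""
    flagged = []
    prev = object()          # sentinel that never equals a real key value
    for row in rows:
        flag = 1 if row[key] != prev else 0
        flagged.append({**row, "keyChange": flag})
        prev = row[key]
    return flagged

data = [
    {"dno": 10, "name": "siva"},
    {"dno": 10, "name": "ram"},
    {"dno": 20, "name": "tom"},
    {"dno": 20, "name": "tiny"},
    {"dno": 30, "name": "emy"},
]
result = add_key_change(data, "dno")
```

The first row of each dno group gets flag 1 and the rest get 0, i.e. the flags come out as 1, 0, 1, 0, 1.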
Sorting without a Sort stage
You can do it as a normal job first. To read the data using a Sequential File, design as follows: Seq. File ------------------------ Dataset. Open the properties of the Sequential File and give the file name (you can set the file path by browsing for the file). In Options, select True for First Line is Column Names if applicable; you can leave the rest of the options as they are. Now go to Columns, click Load, and select the table definition for the file you want to read (this should be the same file given in the properties). In the target Dataset, give a file name.
Now for the sorting: open the target Dataset's properties, go to Partitioning, select partitioning type Hash, select the key column (e_id, for example) from the available columns, tick Perform Sort, and click OK. Compile and run; the data will be sorted in the target.
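The effect of "Perform Sort" on a hash-partitioned link can be sketched like this: rows are first distributed by key, then sorted within each partition. A minimal Python illustration, not DataStage code:

```python
def partition_and_sort(rows, key_index, num_nodes):
    """Hash-partition rows on the key column, then sort within each partition."""
    parts = [[] for _ in range(num_nodes)]
    for row in rows:
        parts[hash(row[key_index]) % num_nodes].append(row)
    # Perform Sort: each partition is sorted independently on the key
    return [sorted(p, key=lambda r: r[key_index]) for p in parts]

data = [(3, "c"), (1, "a"), (2, "b"), (1, "d")]
sorted_parts = partition_and_sort(data, 0, 2)
```

Note that this yields data sorted within each partition, which is what the Dataset target sees per node; a fully ordered single output would additionally need a sorted collection method.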
How to Remove Special Characters data and load rest of the data
Here we are going to see how to remove rows containing special characters and load the rest of the rows into the target. Sometimes we get data with special characters added to some of the rows. If we want to remove the rows whose column values contain special characters, we can use the Alpha function: it drops the rows with special characters mixed in and loads the remaining rows into the target. Take a Sequential File to read the data and a Transformer stage to apply the business logic; in the Transformer stage's constraint, write the Alpha function, then drag and drop the columns to the target. Then compile and run.
Alpha Function
Checks if a string is alphabetic. If NLS is enabled, the result of this function depends on the current locale setting of the Ctype convention.
Syntax
Alpha (string)
string is the string or expression you want to evaluate.
Remarks
Alphabetic strings contain only the characters a through z or A through Z. Alpha returns 1 if the string is alphabetic, a null value if the string is a null value, and 0 otherwise.
Examples
These examples show how to check that a string contains only alphabetic characters:
Column1 = "ABcdEF%"                              * the "%" character is non-alpha
Column2 = (If Alpha(Column1) Then "A" Else "B")  * Column2 set to "B"
Column1 = ""                                     * note that the empty string is non-alpha
Column2 = (If Alpha(Column1) Then "A" Else "B")  * Column2 set to "B"
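The same behavior can be approximated in Python (an illustrative sketch of the semantics described in the Remarks, not the DataStage implementation): 1 for a purely alphabetic string, 0 otherwise, and None standing in for a DataStage null.

```python
def alpha(s):
    """Approximation of the DataStage Alpha function: 1 if the string
    contains only a-z or A-Z, 0 otherwise, None (null) for a null input."""
    if s is None:
        return None
    if s == "":
        return 0          # the empty string is non-alpha
    return 1 if all("a" <= c <= "z" or "A" <= c <= "Z" for c in s) else 0

# Mirroring the examples above:
column2 = "A" if alpha("ABcdEF%") else "B"   # the "%" makes it non-alpha
```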
DataStage Functions
Following is the list of DataStage String Functions
Function Usage
AlNum Can be used to check if the given string has only alphanumeric characters
Alpha TRUE if string is completely alphabetic
CompactWhiteSpace All consecutive whitespace is reduced to a single space
Compare Compares two strings for sorting
CompareNoCase Compares two strings, ignoring case
CompareNum Compares the first n characters of two strings
CompareNumNoCase Compares the first n characters of two strings, ignoring case
Convert Replace character in a string with the given character.
Count Count number of times a given substring occurs in a string
Dcount Returns count of delimited fields in a string
DownCase Change all uppercase letters in a string to lowercase
DQuote Enclose a string in double quotation marks
Field Return 1 or more delimited substrings
Index Find starting character position of substring
Left Finds leftmost n characters of string
Len Length of the string or total number of characters in a string
Num Return 1 if string can be converted to a number
PadString Return the string padded with the optional pad character and optional length
Right Finds Rightmost n characters of string
Soundex Returns a string which identifies a set of words that are phonetically similar
Space Return a string of N space characters
SQuote Enclose a string in single quotation marks
Str Repeat a string
StripWhiteSpace Return the string after removing all whitespace
Trim Remove all leading and trailing spaces and tabs. Also reduce the internal occurrences of spaces and tabs into one.
TrimB Remove all trailing spaces and tabs
TrimF Remove all leading spaces and tabs
Upcase Change all lowercase letters in a string to uppercase
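Illustrative Python equivalents for a few of the functions listed above. These are behavioral sketches of the documented semantics, not the DataStage implementations:

```python
def dcount(s, delim):
    """Dcount: number of delimited fields in a string."""
    return 0 if s == "" else s.count(delim) + 1

def field(s, delim, start, n=1):
    """Field: return n delimited substrings, starting at field `start` (1-based)."""
    return delim.join(s.split(delim)[start - 1:start - 1 + n])

def dquote(s):
    """DQuote: enclose a string in double quotation marks."""
    return '"' + s + '"'
```

For example, `dcount("a,b,c", ",")` is 3, and `field("a,b,c,d", ",", 2, 2)` returns the second and third fields joined by the delimiter.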
Surrogate Key in Datastage
A surrogate key is a unique identification key; it is an alternative to the natural key. A natural key may be an alphanumeric composite key, whereas a surrogate key is always a single numeric key. The surrogate key is used to generate key columns, for which characteristics can be specified. The Surrogate Key stage generates sequential, incremental, unique integers from a provided start point. It can have a single input link and a single output link.
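The generation behavior can be sketched as follows: sequential, unique integers from a given start point, independent of the natural key. A Python illustration; the column name `skey` is chosen for the example.

```python
def surrogate_keys(rows, start=1):
    """Attach a single-column numeric surrogate key to each row,
    counting sequentially from the provided start point."""
    return [{**row, "skey": start + i} for i, row in enumerate(rows)]

# Natural keys here are alphanumeric composites; the surrogate is a single integer.
customers = [{"natural_key": "AU-03-X"}, {"natural_key": "NZ-11-B"}]
keyed = surrogate_keys(customers, start=100)
```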
What is Lookup stage? Uses of Look up stage?The Look up stage is a processing stage that performs lookup operations.
It performs lookups on a dataset read into memory from any other parallel-job stage that can output data. The main use of the Lookup stage is to map short codes in the input dataset onto expanded information from a lookup table, which is then joined to the incoming data. For example, suppose we get data with customer names and addresses, and the data identifies the state with a two- or three-letter code, like MEL for Melbourne or SYD for Sydney, but we want the data to carry the full name of the state. Defining the code as the key column, the Lookup stage reads each line, uses the key to look up the state in the lookup table, and adds the state to a new column defined for the output link, so the full state name is added to each row based on the code. If the code is not found in the lookup table, the record is rejected. The Lookup stage can also be used to validate rows.
The Lookup stage is a processing stage that performs horizontal combining. It supports N inputs (for normal lookup), 2 inputs (for sparse lookup), 1 output, and 1 reject link. Up to DataStage version 7 there were only two types of lookup: a) normal lookup and b) sparse lookup. In DataStage version 8, enhancements added c) range lookup and d) caseless lookup. Normal lookup: all the reference records are copied into memory, and the primary records are cross-verified against them. Sparse lookup: each primary record is sent to the reference source and cross-verified there; we go for sparse lookup when there is not enough memory for the reference data and the primary record set is relatively small compared to the reference data. Range lookup: performs range checking on selected columns; for example, if we want to check salary ranges in order to find employee grades, we can use a range lookup.
In the Lookup stage properties you have a Constraints option. If you click the Constraints button, you get the options Continue, Drop, Fail, and Reject. If you select Continue, a left outer join operation is performed; if you select Drop, an inner join operation is performed.
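The mapping between the lookup-failure options and join behavior can be sketched as follows. A hedged Python illustration of the semantics, not DataStage code (the Reject option, which routes unmatched rows to a reject link, is omitted for brevity):

```python
def lookup(primary, reference, key, on_failure="continue"):
    """Join each primary row to its reference row on `key`.
    continue -> left outer join; drop -> inner join; fail -> error on miss."""
    ref_index = {r[key]: r for r in reference}
    out = []
    for row in primary:
        match = ref_index.get(row[key])
        if match is not None:
            out.append({**row, **match})
        elif on_failure == "continue":    # unmatched rows pass through
            out.append(row)
        elif on_failure == "drop":        # unmatched rows are dropped
            pass
        elif on_failure == "fail":
            raise ValueError(f"lookup failed for key {row[key]!r}")
    return out

primary = [{"e_state": "qld", "e_id": 100}, {"e_state": "tas", "e_id": 300}]
reference = [{"e_state": "qld", "full_state": "queensland"}]
```

With Continue both rows survive (one without full_state); with Drop only the matched row survives.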
Examples
Aggregator Stage and Filter Stage with example
If we have data as below: table_a dno,name 10,siva 10,ram 10,sam 20,tom 30,emy 20,tiny 40,remo. We need to get the records that occur multiple times into one target, and the records that are not repeated with respect to dno into another target. Take the job design as
Read and load the data in a Sequential File. In the Aggregator stage, select group = dno, aggregation type = count rows, count output column = dno_count (user defined). In the output, drag and drop the required columns, then click OK. In the Filter stage, set the first where clause to dno_count > 1 with output link = 0, and the second where clause to dno_count <= 1 with output link = 1. Drag and drop the outputs to the two targets, give the target file names, and compile and run the job. You will get the required data in the targets.
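The Aggregator-plus-Filter logic above can be sketched in Python: count the rows per dno, then split the stream with the two where clauses. An illustration of the job's logic, not DataStage code:

```python
from collections import Counter

rows = [
    (10, "siva"), (10, "ram"), (10, "sam"), (20, "tom"),
    (30, "emy"), (20, "tiny"), (40, "remo"),
]

dno_count = Counter(dno for dno, _ in rows)            # Aggregator: count rows per dno
repeated  = [r for r in rows if dno_count[r[0]] > 1]   # Filter: where dno_count > 1
singles   = [r for r in rows if dno_count[r[0]] <= 1]  # Filter: where dno_count <= 1
```

Here dno 10 and 20 go to the "repeated" target while the lone dno 30 and 40 rows go to the "singles" target.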
Funnel Stage with Real time example
Sometimes we get data in multiple files that belong to the same bank's customer information.
In that case we need to funnel the files to get the data from the multiple files into a single file (table).
For example, if we have the data in two files as below:
xyzbank1
e_id,e_name,e_loc
111,tom,sydney
222,renu,melbourne
333,james,canberra
444,merlin,melbourne
xyzbank2
e_id,e_name,e_loc
555,flower,perth
666,paul,goldenbeach
777,raun,Auckland
888,ten,kiwi
For Funnel take the Job design as
Read and Load the data into two sequential files.
Go to Funnel stage Properties and
Select Funnel Type = Continuous Funnel
( Or Any other according to your requirement )
Go to output Drag and drop the Columns
( Remember the source column structures should be the same ) Then click OK
Give file name for the target dataset then
compile and run the job
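The steps above can be sketched in Python. A real Continuous Funnel interleaves rows from its inputs as they arrive; this illustration simply chains the two identically structured inputs into one output:

```python
from itertools import chain

# Two inputs with the same column structure (e_id, e_name, e_loc)
xyzbank1 = [(111, "tom", "sydney"), (222, "renu", "melbourne")]
xyzbank2 = [(555, "flower", "perth"), (666, "paul", "goldenbeach")]

# Funnel: all rows from both inputs flow into a single output
funneled = list(chain(xyzbank1, xyzbank2))
```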
Column Generator with sample data example
Column Generator is a development/generating stage that is used to generate a column with sample data based on a user-defined data type.
Take Job Design as
Seq.File--------------Col.Gen------------------Ds
Take source data as a
xyzbank
e_id,e_name,e_loc
555,flower,perth
666,paul,goldencopy
777,james,aucland
888,cheffler,kiwi
In order to generate column ( for ex: unique_id)
First read and load the data in seq.file
Go to the Column Generator stage properties and select Column Method = Explicit.
In Column To Generate, give the column name (for example, unique_id).
In Output, drag and drop the columns.
In the Columns tab, write the column name; you can change the SQL data type for unique_id and give a suitable length.
Then compile and Run
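The effect of the job above can be sketched as follows: every source row gets an appended, generated column. A Python illustration; the generated column name `unique_id` follows the example above.

```python
rows = [
    {"e_id": 555, "e_name": "flower", "e_loc": "perth"},
    {"e_id": 666, "e_name": "paul", "e_loc": "goldencopy"},
    {"e_id": 777, "e_name": "james", "e_loc": "aucland"},
]

# Column Generator: append a generated column to each row
generated = [{**row, "unique_id": i + 1} for i, row in enumerate(rows)]
```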
Basic job example: Sequential File to Dataset
This is a basic job for DataStage learners, showing how to read data and how to load it into a target. To read the data using a Sequential File, design as follows: Seq. File ------------------------ Dataset. Open the properties of the Sequential File and give the file name (you can set the file path by browsing for the file). In Options, select True for First Line is Column Names if applicable; you can leave the rest of the options as they are. Now go to Columns, click Load, and select the table definition for the file you want to read (this should be the same file given in the properties). In the target Dataset, give a file name. Now compile and run; that's it, you will get the output.
Transformer Stage to filter the data
If our requirement is to filter the data department wise from the file below
samp_tabl 1,sam,clerck,10 2,tom,developer,20 3,jim,clerck,10 4,don,tester,30 5,zeera,developer,20 6,varun,clerck,10 7,luti,production,40 8,raja,production,40. Our requirement is to get the target data as below: in Target1 we need the dept 10 and dept 40 employees; in Target2 we need the dept 30 employees; in Target3 we need the dept 20 and dept 40 employees. Take the job design as below
Read and load the data in the source file. In the Transformer stage, drag and drop the data to the target tables and write the expressions in the constraints as below: dept_no=10 or dept_no=40 for Target1; dept_no=30 for Target2; dept_no=20 or dept_no=40 for Target3. Click OK, give the file names at the targets, and compile and run the job to get the output.
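The three Transformer constraints can be expressed as plain row filters, one per target link. A Python sketch of the logic, not DataStage code:

```python
rows = [
    (1, "sam", "clerck", 10), (2, "tom", "developer", 20),
    (3, "jim", "clerck", 10), (4, "don", "tester", 30),
    (7, "luti", "production", 40),
]

# Each target receives the rows whose dept_no satisfies its constraint
target1 = [r for r in rows if r[3] == 10 or r[3] == 40]   # dept 10 and 40
target2 = [r for r in rows if r[3] == 30]                 # dept 30
target3 = [r for r in rows if r[3] == 20 or r[3] == 40]   # dept 20 and 40
```

Note that a row can satisfy more than one constraint (dept 40 goes to both Target1 and Target3), which is exactly how Transformer constraints behave.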
Multiple Join stages to join three tables
If we have three tables to join and we don't have the same key column in all the tables, we cannot join them using one Join stage; in this case we use multiple Join stages. You can take sample data as below: soft_com_1 e_id,e_name,e_job,dept_no 001,james,developer,10 002,merlin,tester,20 003,jonathan,developer,10 004,morgan,tester,20 005,mary,tester,20 soft_com_2 dept_no,d_name,loc_id 10,developer,200 20,tester,300 soft_com_3 loc_id,add_1,add_2 200,melbourne,victoria 300,brisbane,queensland Take the job design as below
Read and load the data in three Sequential Files. In the first Join stage, go to Properties, select the key column dept_no and join type Inner, drag and drop the required columns in the output, and click OK. In the second Join stage, go to Properties, select the key column loc_id and join type Inner, drag and drop the required columns in the output, and click OK. Give a file name to the target file; that's it, compile and run the job.
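The two chained Join stages can be sketched in Python as two successive inner joins, first on dept_no and then on loc_id. An illustration of the job's logic under the sample data above:

```python
emp  = [(1, "james", "developer", 10), (2, "merlin", "tester", 20)]
dept = {10: ("developer", 200), 20: ("tester", 300)}              # dept_no -> (d_name, loc_id)
loc  = {200: ("melbourne", "victoria"), 300: ("brisbane", "queensland")}  # loc_id -> (add_1, add_2)

joined = []
for e_id, e_name, e_job, dept_no in emp:
    if dept_no in dept:                      # first Join stage: inner join on dept_no
        d_name, loc_id = dept[dept_no]
        if loc_id in loc:                    # second Join stage: inner join on loc_id
            add_1, add_2 = loc[loc_id]
            joined.append((e_id, e_name, d_name, add_1, add_2))
```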
Merge Stage Example
The Merge stage is a processing stage used to perform horizontal combining. It is one of the stages that performs this operation, like the Join stage and the Lookup stage; the differences between these stages lie in the data-volume expectations and input requirements. Example for the Merge stage:
Sample Tables MergeStage_Master cars,ac,tv,music_system BMW,avlb,avlb,Adv Benz,avlb,avlb,Adv Camray,avlb,avlb,basic Honda,avlb,avlb,medium Toyota,avlb,avlb,medium Mergestage_update1 cars,cooling_glass,CC BMW,avlb,1050 Benz,avlb,1010 Camray,avlb,900 Honda,avlb,1000 Toyota,avlb,950 MergeStage Update2 cars,model,colour BMW,2008,black Benz,2010,red Camray,2009,grey Honda,2008,white Toyota,2010,skyblue Take Job Design as below
Read and load the Data into all the input files.
In Merge Stage Take cars as Key column. In Output Column Drag and Drop all the columns to the output files.
Give a file name to the target/output file; if you want, you can also add reject links (up to n-1, one per update input).
Compile and Run the Job to get the required output.
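The merge above can be sketched in Python: a master input is combined with the update inputs on the key column cars, with each update contributing its extra columns to the matching master row. A simplified illustration of the stage's output, not DataStage code:

```python
master  = {"BMW":  {"ac": "avlb", "tv": "avlb", "music_system": "Adv"},
           "Benz": {"ac": "avlb", "tv": "avlb", "music_system": "Adv"}}
update1 = {"BMW": {"cooling_glass": "avlb", "CC": 1050},
           "Benz": {"cooling_glass": "avlb", "CC": 1010}}
update2 = {"BMW": {"model": 2008, "colour": "black"},
           "Benz": {"model": 2010, "colour": "red"}}

# Merge on the key "cars": master columns plus the columns of each update
merged = {
    car: {**cols, **update1.get(car, {}), **update2.get(car, {})}
    for car, cols in master.items()
}
```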
Look up stage with example
The Lookup stage is a processing stage used to perform lookup operations and to map short codes in the input dataset onto expanded information from a lookup table, which is then joined to the incoming data and output. For example, suppose we have the primary data as below. Table1 e_id,ename,e_state 100,sam,qld 200,jammy,vic 300,tom,Tas 400,putin,wa table1Ref e_state,full_state qld,queensland vic,victoria Take the job design as below
Read and load the two tables in Sequential Files. In the Lookup stage, drag and drop the primary columns to the output, join e_state from the primary table to e_state in the reference table, and drag and drop full_state to the output. In the properties, select Lookup Failure = Drop. Click OK, give the target file name, and compile and run the job.
What to choose: Join stage or Lookup stage in DataStage
How to choose between the Join stage and the Lookup stage:
We need to be careful when selecting stages, and think about job performance before choosing. Time is precious to the clients, so we want the job to complete in as little time as possible, and we should try our best to get good performance from the job.
Both the Join stage and the Lookup stage do the same thing: they combine the tables we have. So why was the Lookup stage introduced? The Lookup stage has some extra benefits that do not come with the Join stage.
The Lookup stage does not require the data to be sorted, while sorting is mandatory for the Join stage. In the Lookup stage, columns with different names can be joined, which is not possible in the Join stage: with Join, the key column names must match.
The Lookup stage supports reject links; if the requirement demands reject links, we can't go with the Join stage, because the Join stage doesn't support them. The Lookup stage also has an option to fail the job if the lookup fails, which is useful when the lookup is expected to succeed.
The Lookup stage keeps the reference data in memory, which yields better performance for smaller volumes of reference data. If you have a large amount of data, you need to go with the Join stage.