DataStage


DataStage real-time questions and answers, with stage examples


What is Compiler in Datastage | Compilation process in Datastage

Compilation is the process of converting the GUI job design into executable code, that is, into a machine-understandable form. During compilation, DataStage checks all the link requirements and the mandatory stage property values, and checks whether there are any logical errors. The compiler produces OSH (Orchestrate Shell) code.

What is Modeling of Datastage | Modeling of Datastage

Modeling is a logical and physical representation of the source system. Two widely used modeling tools are ERwin and ER/Studio. The source system typically has an ER model, while the target system has both an ER model and a dimensional model. Dimension: a table designed from the client's perspective; the same dimension table can be viewed in many ways. There are two modeling approaches: Forward Engineering (F.E.) and Reverse Engineering (R.E.). F.E. is the process of building the model from scratch, for example for a bank that requires a new data warehouse. R.E. is the process of altering an existing model, for example adapting one bank's model for another bank.

Advantages of Datamart

A datamart is the access layer of the data warehouse environment; we create a datamart so that users can retrieve data faster. The datamart is a subset of the data warehouse, which means all the data available in the datamart is also available in the data warehouse. A datamart is created for a specific business purpose (for example, a telecom database or a banking database). There are many reasons to create a datamart, and it has several advantages. It is easy to access frequently needed data when the client requires it. We can give a group of users access to view the datamart when it is required. Performance is good because the data set is smaller. A datamart is easy to create and maintain, it is focused on a specific business, and it costs far less to create a datamart than to build a full data warehouse with a huge amount of space.

What are the types of errors in Datastage?

You may get many errors in DataStage while compiling or running jobs. Some of them are as follows:
a) Source file not found: you are trying to read a file that does not exist under that name.
b) Fatal errors.
c) Data type mismatch: this occurs when data types do not match between stages in the job.
d) Field size errors.
e) Metadata mismatch.
f) Data type sizes differ between source and target.
g) Column mismatch.
h) Process time out: if the server is busy, this error can appear.

What are the client components in Datastage 7.5X2 Version

In DataStage 7.5X2 there are 4 client components:
1) DataStage Designer
2) DataStage Director
3) DataStage Manager
4) DataStage Administrator
In the Designer, we create the jobs, compile the jobs, and run the jobs. In the Director, we can view jobs and their logs, batch jobs, unlock jobs, schedule jobs, monitor jobs, and handle messages. In the Manager, we can import and export jobs and configure nodes. And using the Administrator, we can create projects, organize projects, and delete projects.

Server Components of Datastage 7.5x2 Version

There are three architecture components in DataStage 7.5x2:
a) Repository
b) Server (Engine)
c) DataStage Package Installer
Repository: the environment where we create, design, compile, and run jobs. Some of the components it contains are jobs, table definitions, shared containers, routines, etc.
Server (Engine): runs the executable jobs that extract, transform, and load data into the data warehouse.
DataStage Package Installer: a user interface used to install packaged DataStage jobs and plug-ins.

What is a Fact Table in Datawarehousing? Give an Example

A fact table is the central table in a dimensional model. It stores the measures of the business (numeric facts such as sales amount or quantity sold) together with foreign keys to the dimension tables. For example, a sales fact table may hold date_key, product_key, and store_key, plus the measures units_sold and revenue.

Features of Datastage | Datastage Features

DataStage features are:
1) Any to Any (any source to any target)
2) Platform independent
3) Node configuration
4) Partition parallelism
5) Pipeline parallelism
1) Any to Any: DataStage can extract data from any source and load it into any target.
2) Platform independent: a job developed on one platform can run on any other platform. For example, a job designed on a uniprocessor machine can also run on an SMP machine.
3) Node configuration: a technique for creating logical CPUs (a node is a logical CPU).
4) Partition parallelism: a technique for distributing the data across the nodes based on partition techniques. Partition techniques are a) key-based and b) keyless. The key-based techniques are 1) Hash 2) Modulus 3) Range 4) DB2. The keyless techniques are 1) Same 2) Entire 3) Round Robin 4) Random.
5) Pipeline parallelism: the process in which extraction, transformation, and loading occur simultaneously.
Re-partitioning is the redistribution of already distributed data. Reverse partitioning is called collecting; the collecting methods are Ordered, Round Robin, Sort Merge, and Auto.
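As a rough illustration of pipeline parallelism in plain code (a toy Python sketch, not DataStage itself; the sample rows are invented):

import io

# Toy sketch of pipeline parallelism: with generators, a row can be
# transformed and loaded while later rows are still being extracted,
# so the three phases overlap instead of running one after another.
source = io.StringIO("1,sam\n2,tom\n3,jim\n")   # stands in for a source file

def extract(f):
    for line in f:                       # rows stream out one at a time
        yield line.rstrip("\n").split(",")

def transform(rows):
    for row in rows:
        yield [col.upper() for col in row]   # sample transformation

def load(rows):
    for row in rows:
        print(",".join(row))             # stands in for the target stage

load(transform(extract(source)))         # all three stages run interleaved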

What is OLTP? Uses of OLTP?

OLTP stands for Online Transaction Processing. It is characterized by a large number of short online transactions. The main emphasis of an OLTP system is very fast query processing, in order to get data to the end-users quickly. An OLTP system also maintains data integrity in multi-access environments, and its effectiveness is measured by the number of transactions per second.

How does Hash Partition work in Datastage

The hash partition technique sends rows with the same key-column values to the same partition. In DataStage, partition techniques are divided into two types: a) key-based partition techniques and b) keyless partition techniques. The key-based techniques are a) Hash b) Modulus c) Range d) DB2. The keyless techniques are a) Same b) Entire c) Round Robin d) Random.
Example of the hash partition technique: suppose we have the sample data
e_id,dept_no
1,10
2,10
3,20
4,20
5,30
6,40
Partitioning on dept_no, the data could be distributed as:
1st partition: 10,10
2nd partition: 20,20
3rd partition: 30
4th partition: 40
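The same behavior sketched in plain Python (a mimic, not DataStage code; note that which partition a key lands in depends on the hash function, the guarantee is only that equal keys share a partition):

# Hash partitioning sketch: rows with the same dept_no always land in
# the same partition, because the partition number is a pure function
# of the key value.
rows = [(1, 10), (2, 10), (3, 20), (4, 20), (5, 30), (6, 40)]  # (e_id, dept_no)
NODES = 4

partitions = {n: [] for n in range(NODES)}
for e_id, dept_no in rows:
    partitions[hash(dept_no) % NODES].append((e_id, dept_no))

for node, part in partitions.items():
    print(f"partition {node}: {part}")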

How to create group id in Sort Stage in Datastage

Group IDs can be created in two different ways, using a) the Key Change Column option or b) the Cluster Key Change Column option. Both options are used to create group IDs: when we select either option and set it to True, the stage creates group IDs group-wise. The data is divided into groups based on the key column, and the stage outputs 1 for the first row of every group and 0 for the remaining rows of the group. Which option to use depends on the data coming from the source: if the incoming data is not sorted, use Key Change Column to create group IDs; if the incoming data is already sorted, use Cluster Key Change Column.
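A small sketch of what the key-change flag computes (plain Python mimicking the behavior; the sample rows are invented):

# Key-change flag: 1 for the first row of each key group in sorted
# data, 0 otherwise. Column names are illustrative, not a DataStage API.
rows = [(10, "siva"), (10, "ram"), (20, "tom"), (20, "tiny"), (30, "emy")]

prev_key = object()          # sentinel that never equals a real key
for dno, name in sorted(rows):
    key_change = 1 if dno != prev_key else 0
    prev_key = dno
    print(dno, name, key_change)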

How to do sorting without a Sort stage

You can do it as part of a normal job, as follows. If we want to read the data using a Sequential File stage, take the design: Seq. File ------------------------ Dataset. To read the data, open the properties of the Sequential File stage and give the file name; you can set the file path by clicking Browse for the file. In Options, select True for 'First line is column names' (if the first line is the column names), and leave the rest of the options as they are. Now go to the Columns tab, click Load, and select the table definition for the file you want to read (this should be the same file you gave in the properties). In the target Dataset stage, give a file name.

Now for the sorting process: open the target Dataset stage properties and go to the Partitioning tab. Select the partition type as Hash, and in Available Columns select the key column to be sorted on (for example e_id). Tick Perform Sort and click OK. Compile and run, and the data will be sorted in the target.
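In plain code terms, this link-level option amounts to roughly the following (a sketch of the behavior, not DataStage internals; the rows and node count are invented):

# Rows are hash-partitioned on the key column, then each partition
# is sorted on that key before being written to the target.
rows = [(3, "sam"), (1, "siva"), (4, "tom"), (2, "ram")]   # (e_id, name)
NODES = 2

parts = {n: [] for n in range(NODES)}
for row in rows:
    parts[hash(row[0]) % NODES].append(row)   # hash partitioning

for n in parts:
    parts[n].sort(key=lambda r: r[0])         # the "Perform Sort" option
print(parts)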

How to Remove Rows with Special Characters and Load the Rest of the Data

Here we are going to see how to remove the rows whose data contains special characters and load the rest of the rows into the target. Sometimes we get data with special characters added to some of the rows. If we want to remove the rows where special characters are mixed into a column, we can use the Alpha function: with Alpha, the rows containing special characters are dropped and the remaining rows are loaded into the target. Take a Sequential File stage to read the data and a Transformer stage to apply the business logic. In the Transformer stage, write the Alpha function in the constraint (for example, Alpha(InLink.col_name), which is true only for purely alphabetic values; the link and column names are whatever your job uses). Then drag and drop the columns into the target, and compile and run.

Alpha Function

Checks if a string is alphabetic. If NLS is enabled, the result of this function depends on the current locale setting of the Ctype convention.

Syntax

Alpha (string)

string is the string or expression you want to evaluate.

Remarks

Alphabetic strings contain only the characters a through z or A through Z. Alpha returns 1 if the string is alphabetic, a null value if the string is a null value, and 0 otherwise.

Examples

These examples show how to check that a string contains only alphabetic characters:

Page 5: Data Stage

Column1 = "ABcdEF%" * the "%" character is non-alpha
Column2 = (If Alpha(Column1) Then "A" Else "B") * Column2 set to "B"
Column1 = "" * note that the empty string is non-alpha
Column2 = (If Alpha(Column1) Then "A" Else "B") * Column2 set to "B"

DataStage Functions

Following is the list of DataStage String Functions

Function Usage

AlNum Can be used to check if the given string contains only alphanumeric characters

Alpha TRUE if string is completely alphabetic

CompactWhiteSpace Reduces all consecutive whitespace to a single space

Compare Compares two strings for sort

CompareNoCase Compares two strings, ignoring case

CompareNum Compares the first n characters of two strings

CompareNumNoCase Compares the first n characters of two strings, ignoring case

Convert Replace character in a string with the given character.

Count Count number of times a given substring occurs in a string

Dcount Returns count of delimited fields in a string

DownCase Change all uppercase letters in a string to lowercase

DQuote Enclose a string in double quotation marks

Field Return 1 or more delimited substrings

Index Find starting character position of substring

Left Finds leftmost n characters of string

Len Length of the string or total number of characters in a string


Num Return 1 if string can be converted to a number

PadString Return the string padded with the optional pad character and optional length

Right Finds Rightmost n characters of string

Soundex Returns a string which identifies a set of words that are phonetically similar

Space Return a string of N space characters

SQuote Encloses a string in single quotation marks

Str Repeat a string

StripWhiteSpace Return the string after removing all whitespace

Trim Remove all leading and trailing spaces and tabs. Also reduce the internal occurrences of spaces and tabs into one.

TrimB Remove all trailing spaces and tabs

TrimF Remove all leading spaces and tabs

TrimLeadingTrailing Returns a string with leading and trailing whitespace removed

Upcase Change all lowercase letters in a string to uppercase

Surrogate Key in Datastage

A surrogate key is a unique identification key. It is an alternative to the natural key: a natural key may be an alphanumeric composite key, whereas a surrogate key is always a single numeric key. The surrogate key is used to generate key columns, for which characteristics can be specified. The surrogate key generator produces sequential, incremental, unique integers from a given start point. It can have a single input link and a single output link.
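A plain-Python sketch of the idea (the column names are invented; a mimic of the behavior, not the DataStage stage):

from itertools import count

# Surrogate key sketch: attach a unique, sequential integer to each
# incoming row, independent of the natural key. The start point is
# configurable (here, 1).
rows = [{"cust_code": "AX-7"}, {"cust_code": "BY-3"}, {"cust_code": "CZ-9"}]
key_seq = count(start=1)                 # sequential, incremental, unique

for row in rows:
    row["cust_sk"] = next(key_seq)       # single numeric key per row
    print(row)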

What is Lookup stage? Uses of Lookup stage?

The Lookup stage is a processing stage that performs lookup operations. It operates on a data set read into memory from any other parallel-job stage that can output data. The main use of the Lookup stage is to map short codes in the input data set onto expanded information from a lookup table, which is then joined to the incoming data. For example, suppose we get data with customer names and addresses, where the data identifies the state with a two- or three-letter code, like mel for Melbourne or syd for Sydney, but we want the data to carry the full name of the state. Defining the code as the key column, the Lookup stage reads each input row, uses the key to look up the full name in the lookup table, and adds the state name to the new column defined on the output link, so the full state name is added to each row based on the code. If the code is not found in the lookup table, the record can be rejected.
The Lookup stage can also be used to validate rows. It is a processing stage that performs horizontal combining. It supports N inputs (for normal lookup) or 2 inputs (for sparse lookup), 1 output, and 1 reject link.
Up to DataStage version 7 we had only 2 types of lookup: a) normal lookup and b) sparse lookup. In DataStage version 8, enhancements were added: c) range lookup and d) caseless lookup.
Normal lookup: all the reference records are copied into memory, and the primary records are cross-verified against them.
Sparse lookup: each primary record is sent to the reference source and cross-verified there. We use a sparse lookup when the reference data is too large to hold in memory and the number of primary records is relatively small compared to the reference data.
Range lookup: performs range checking on selected columns. For example, to find employees' grades by checking which salary range each salary falls into, we can use a range lookup.

In the Lookup stage properties there is a constraints option. Clicking the constraints button gives options such as Continue, Drop, Fail, and Reject. Selecting Continue behaves like a left outer join; selecting Drop behaves like an inner join.


Aggregator Stage and Filter Stage with example

Suppose we have data as below:
table_a
dno,name
10,siva
10,ram
10,sam
20,tom
30,emy
20,tiny
40,remo
We need to send the records whose dno occurs multiple times to one target, and the records whose dno is not repeated to another target. Take the job design as below.

Read and load the data in a Sequential File stage. In the Aggregator stage, select Group = dno, Aggregation type = Count Rows, and Count Output Column = dno_count (user-defined). In the output, drag and drop the required columns, then click OK. In the Filter stage, set the first Where clause to dno_count>1 with Output link = 0, and the second Where clause to dno_count<=1 with Output link = 1. Drag and drop the outputs to the two targets, give the target file names, and compile and run the job. You will get the required data in the targets.
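The same grouping-and-routing logic, sketched in plain Python to show what the Aggregator and Filter stages compute (a behavioral mimic, not DataStage code):

from collections import Counter

# Aggregator: count rows per dno group; Filter: route repeated groups
# to one target and singleton groups to the other.
rows = [(10, "siva"), (10, "ram"), (10, "sam"), (20, "tom"),
        (30, "emy"), (20, "tiny"), (40, "remo")]

dno_count = Counter(dno for dno, _ in rows)

target1 = [r for r in rows if dno_count[r[0]] > 1]    # where dno_count > 1
target2 = [r for r in rows if dno_count[r[0]] <= 1]   # where dno_count <= 1
print(target1)
print(target2)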

Funnel Stage with Real time example

Sometimes we get data in multiple files that belong to the same bank's customer information.

In that case we need to funnel the files to get the data from the multiple files into a single file (table).

For example, if we have data in two files as below:

xyzbank1

e_id,e_name,e_loc

111,tom,sydney

222,renu,melbourne

333,james,canberra

444,merlin,melbourne

xyzbank2

e_id,e_name,e_loc

555,flower,perth

666,paul,goldenbeach

777,raun,Aucland

888,ten,kiwi

For Funnel take the Job design as

Read and Load the data into two sequential files.

Go to Funnel stage Properties and

Select Funnel Type = Continuous Funnel

( Or Any other according to your requirement )

Go to output Drag and drop the Columns

( Remember the source column structures should be the same ) Then click OK

Give a file name for the target dataset, then compile and run the job.
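A behavioral sketch of the funnel in plain Python (the in-memory blocks stand in for the two files above; a mimic, not DataStage code):

import csv, io

# Funnel sketch: concatenate rows from inputs that share the same
# column structure into one output stream.
xyzbank1 = io.StringIO("e_id,e_name,e_loc\n111,tom,sydney\n222,renu,melbourne\n")
xyzbank2 = io.StringIO("e_id,e_name,e_loc\n555,flower,perth\n666,paul,goldenbeach\n")

funneled = list(csv.DictReader(xyzbank1)) + list(csv.DictReader(xyzbank2))
for row in funneled:
    print(row["e_id"], row["e_name"], row["e_loc"])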


Column Generator with sample data example

Column Generator is a development/generating stage that is used to generate a column with sample data based on a user-defined data type.

Take Job Design as

Seq.File--------------Col.Gen------------------Ds

Take the source data as:

xyzbank

e_id,e_name,e_loc

555,flower,perth

666,paul,goldencopy

777,james,aucland

888,cheffler,kiwi

In order to generate column ( for ex: unique_id)

First read and load the data in seq.file

Go to Column Generator stage -- Properties -- Select column method as explicit

In column to generate = give column name ( For ex: unique_id)

In Output drag and drop

Go to the Columns tab and write the column name; you can change the data type for unique_id under SQL type and give it a suitable length.

Then compile and Run
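What the stage produces, sketched in plain Python (the column name unique_id follows the example above; a mimic of the behavior, not DataStage code):

# Column Generator sketch: add a new generated column (unique_id)
# to each incoming row.
rows = [("555", "flower", "perth"), ("666", "paul", "goldencopy"),
        ("777", "james", "aucland"), ("888", "cheffler", "kiwi")]

generated = [(*row, unique_id) for unique_id, row in enumerate(rows, start=1)]
for row in generated:
    print(row)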

Basic Job Example: Sequential File Stage to Dataset

This is a basic job for DataStage learners. It shows how we can read data and how we can load it into the target. If we want to read the data using a Sequential File stage, take the design: Seq. File ------------------------ Dataset. To read the data, open the properties of the Sequential File stage and give the file name; you can set the file path by clicking Browse for the file. In Options, select True for 'First line is column names' (if the first line is the column names), and leave the rest of the options as they are. Now go to the Columns tab, click Load, and select the table definition for the file you want to read (this should be the same file you gave in the properties). In the target Dataset stage, give a file name. Now compile and run; that's it, you will get the output.

Transformer Stage to filter the data

If our requirement is to filter the data department-wise from the file below:

samp_tabl
1,sam,clerk,10
2,tom,developer,20
3,jim,clerk,10
4,don,tester,30
5,zeera,developer,20
6,varun,clerk,10
7,luti,production,40
8,raja,production,40
Our requirement is to get the target data as below: in Target1 we need the employees of departments 10 and 40; in Target2 we need department 30; in Target3 we need departments 20 and 40. Take the job design as below.

Read and load the data in the source file. In the Transformer stage, drag and drop the data to the target tables and write the expressions in the constraints as follows:
dept_no=10 Or dept_no=40 for Target1
dept_no=30 for Target2
dept_no=20 Or dept_no=40 for Target3
Click OK, give file names at the targets, and compile and run the job to get the output.
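The three constraints, mimicked in plain Python so you can see which rows each target receives (a sketch of the logic, not DataStage code):

# Transformer-constraint sketch: each target gets the rows whose
# dept_no satisfies that target's constraint expression.
rows = [(1, "sam", "clerk", 10), (2, "tom", "developer", 20),
        (3, "jim", "clerk", 10), (4, "don", "tester", 30),
        (5, "zeera", "developer", 20), (6, "varun", "clerk", 10),
        (7, "luti", "production", 40), (8, "raja", "production", 40)]

target1 = [r for r in rows if r[3] == 10 or r[3] == 40]   # dept_no=10 Or dept_no=40
target2 = [r for r in rows if r[3] == 30]                 # dept_no=30
target3 = [r for r in rows if r[3] == 20 or r[3] == 40]   # dept_no=20 Or dept_no=40
print(target1, target2, target3, sep="\n")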

Multiple Join stages to join three tables

If we have three tables to join and we don't have the same key column in all of them, we cannot join the tables with one Join stage; in this case we use multiple Join stages. Take sample data as below (note that loc_id in soft_com_3 must match the loc_id values in soft_com_2 for the join to work):
soft_com_1
e_id,e_name,e_job,dept_no
001,james,developer,10
002,merlin,tester,20
003,jonathan,developer,10
004,morgan,tester,20
005,mary,tester,20
soft_com_2
dept_no,d_name,loc_id
10,developer,200
20,tester,300
soft_com_3
loc_id,add_1,add_2
200,melbourne,victoria
300,brisbane,queensland
Take the job design as below.

Read and load the data in three sequential files. In the first Join stage, go to Properties, select Key column = dept_no, and select Join type = Inner; drag and drop the required columns in the output and click OK. In the second Join stage, go to Properties, select Key column = loc_id, and select Join type = Inner; drag and drop the required columns in the output and click OK. Give a file name to the target file, then compile and run the job.
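The two-stage join, sketched in plain Python (dict lookups stand in for the inner joins; a mimic of the logic, not DataStage code):

# Join three tables with two successive inner joins:
# first on dept_no, then on loc_id, mirroring the two Join stages.
emp  = [("001", "james", "developer", 10), ("002", "merlin", "tester", 20)]
dept = {10: ("developer", 200), 20: ("tester", 300)}   # dept_no -> (d_name, loc_id)
locs = {200: ("melbourne", "victoria"), 300: ("brisbane", "queensland")}

step1 = [(e_id, e_name, dept_no, *dept[dept_no])            # join 1: key dept_no
         for e_id, e_name, e_job, dept_no in emp if dept_no in dept]
step2 = [(e_id, e_name, dept_no, d_name, loc_id, *locs[loc_id])  # join 2: key loc_id
         for e_id, e_name, dept_no, d_name, loc_id in step1 if loc_id in locs]
print(step2)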


Merge Stage Example

Merge Stage is a processing stage that performs horizontal combining. It is one of the stages that perform this operation, like the Join stage and the Lookup stage; the only differences between these stages are the data volumes they suit and their input requirements. Example for the Merge stage:


Sample tables:
MergeStage_Master
cars,ac,tv,music_system
BMW,avlb,avlb,Adv
Benz,avlb,avlb,Adv
Camray,avlb,avlb,basic
Honda,avlb,avlb,medium
Toyota,avlb,avlb,medium
MergeStage_Update1
cars,cooling_glass,CC
BMW,avlb,1050
Benz,avlb,1010
Camray,avlb,900
Honda,avlb,1000
Toyota,avlb,950
MergeStage_Update2
cars,model,colour
BMW,2008,black
Benz,2010,red
Camray,2009,grey
Honda,2008,white
Toyota,2010,skyblue
Take the job design as below.

Read and load the data in all the input files.

In the Merge stage, take cars as the key column. In the output, drag and drop all the columns to the output file.


 

Give a file name to the target/output file; if you want, you can add reject links (n-1).

Compile and run the job to get the required output.
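A plain-Python sketch of the combining logic (only two master rows shown; a mimic of the behavior, not DataStage code):

# Merge sketch: a master table combined with update tables on the key
# "cars"; each master row picks up the columns of its matching update
# rows. Update rows with no master match would go to a reject link.
master  = {"BMW": {"ac": "avlb", "tv": "avlb", "music_system": "Adv"},
           "Benz": {"ac": "avlb", "tv": "avlb", "music_system": "Adv"}}
update1 = {"BMW": {"cooling_glass": "avlb", "CC": "1050"},
           "Benz": {"cooling_glass": "avlb", "CC": "1010"}}
update2 = {"BMW": {"model": "2008", "colour": "black"},
           "Benz": {"model": "2010", "colour": "red"}}

merged = {}
for cars, row in master.items():
    merged[cars] = {**row, **update1.get(cars, {}), **update2.get(cars, {})}
print(merged)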

Look up stage with example

Look Up stage is a processing stage and used to perform lookup operations and to map short codes in the input dataset into expanded information from a lookup table which is than joined to the incoming data and output. For example if we have the primary data as below. Table1 e_id,ename,e_state 100,sam,qld 200,jammy,vic 300,tom,Tas 400,putin,wa table1Ref e_state,full_state qld,queensland vic,victoria Take the job design as below 

 Read and load the two tables in sequential files. Go to lookup stage and Drag and drop the primary columns to the output. And Join e_state from primary table to the e_state in reference table And drag and drop the Full_state to the output. In properties select lookup failure as drop Now click ok Give Target file name and Compile & Run the Job here 
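The same lookup, sketched in plain Python (a mimic of the logic with Lookup Failure = Drop; not DataStage code):

# Lookup sketch: map the short e_state code to full_state via a
# reference table; with "lookup failure = drop", unmatched rows
# (tas and wa here) are dropped from the output.
primary = [(100, "sam", "qld"), (200, "jammy", "vic"),
           (300, "tom", "tas"), (400, "putin", "wa")]
reference = {"qld": "queensland", "vic": "victoria"}

output = [(e_id, ename, e_state, reference[e_state])
          for e_id, ename, e_state in primary if e_state in reference]
print(output)   # rows for tas and wa are dropped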

What to Choose: Join Stage or Lookup Stage in Datastage

How to choose between the Join stage and the Lookup stage?

We need to be careful when selecting stages, and we need to think about the performance of the job before choosing. Time is precious to the clients, so we need the job to run in as little time as possible and should try our best to give it good performance.

The Join stage and the Lookup stage both do the same thing: they combine the tables we have. So why was the Lookup stage introduced? The Lookup stage has some extra benefits that do not come with the Join stage. The Lookup stage does not require the data to be sorted, while sorting is mandatory for the Join stage. In the Lookup stage, columns with different column names can be joined, which is not possible in the Join stage: there, the column names must match. The Lookup stage supports reject links; if our requirement demands reject links, we can't go with the Join stage, because the Join stage doesn't support them. The Lookup stage also has an option to fail the job if a lookup fails, which is useful when the lookup is expected to always succeed.

The Lookup stage keeps the reference data in memory, which yields better performance for smaller volumes of data. If you have a large amount of data, you need to go with the Join stage.