DataStage FAQs




TCS

1. What are the source systems and target systems in your project? -> Source: flat files, Oracle (any database). -> Target: DB2 (8.1 PX), Oracle, SQL Server.

2. What are the ETL stages you have used in your project? -> Transformer, Sort, Join, Dataset, Lookup, Change Capture, Aggregator.

3. What are the differences between the Oracle and ODBC enterprise stages?

4. Which stage gives good performance?

-> Dataset stage

5. What is the difference between the Dataset and Sequential File stages?

-> SEQ FILE: 1. Extracts/loads from/to a sequential file; max size is 2 GB. 2. When used as a source, at compile time it is converted from ASCII into native format. 3. By default it is processed sequentially. 4. Doesn't support null values. 5. Processed at the server. 6. Supports .csv, .txt, .xls, etc. 7. Does not support lookup filesets.

-> DATASET:

1. Used as an intermediate stage. 2. At compile time, conversion is not required. 3. Datasets are processed on the local system, so performance improves as the server is not loaded. 4. Supports only the .ds extension. 5. The 2 GB limit is not there. 6. Supports lookup filesets.

6. What are the different processing stages you have used in your project?

-> Transformer, Modify, Change Apply, Merge, Sort, Surrogate Key Generator.

7. What is a shared container? What is the purpose of using this stage?

-> Creates a reusable object that many jobs within the project can include.


-> When we go for a parallel shared container, the logic can be reused across many jobs.

8. What is the funnel stage?

-> This stage is used to combine multiple inputs (with the same metadata) into a single output. 1. CONTINUOUS FUNNEL: no particular order (loads without ordering). 2. SORT FUNNEL: loads in a particular order (ascending or descending). 3. SEQUENCE FUNNEL: reads the first input fully, then the second input, and so on.

9. What is the difference between the Join and Merge stages?

10. What is the difference between the Join and Lookup stages?

11. Suppose I have source data records and the lookup data is about 50,000 records. Which stage is preferred for this requirement?

-> Lookup

12. Suppose I have 50,000 records in the source and 100 records in the lookup. Which stage is preferred for this requirement? -> Lookup

13. What is the difference between a normal transformer and a basic transformer? -> A Basic Transformer compiles in "BASIC language" whereas a Normal Transformer compiles in "C++".

-> A Basic Transformer does not run on multiple nodes, whereas a Normal Transformer can run on multiple nodes, giving better performance.

-> A Basic Transformer takes less time to compile than the Normal Transformer. -> The Basic Transformer stage can only be used on SMP (Symmetric Multiprocessing) systems, not on MPP (Massively Parallel Processing) or cluster systems. USAGE: -> A Basic Transformer should be used in server jobs. -> A Normal Transformer should be used in parallel jobs, as it will run on multiple nodes there, giving better performance.

14. What is the difference between parallel and server jobs? -> The basic difference between server and parallel jobs is the degree of parallelism.


-> Server job stages do not have built-in partitioning and parallelism mechanisms for extracting and loading data between different stages. -> Server jobs compile into BASIC (interpreted pseudo-code) and parallel jobs compile into OSH (Orchestrate shell script). -> The server Transformer supports BASIC transforms only, but parallel jobs support both BASIC and parallel transforms. -> Among file stages, the Hashed File is available only in server jobs; Complex Flat File, Data Set, and Lookup File Set are available in parallel jobs only. -> Parallel jobs are for maintaining a huge amount of data. -> Server jobs are for maintaining a low amount of data. -> Parallel jobs have more stages compared to server jobs. -> Parallel jobs support SMP and MPP. -> Server jobs support SMP.

15. What stage have you used to load the data into a Teradata table?

-> Teradata API stage

16. What is the multi-node concept in Teradata?

17. What is a slowly changing dimension? -> It describes how to capture the changes in the target over a period of time. -> It describes change data capture.

18. How do you capture the changes in a slowly changing dimension? -> Type I: Replace the old record with a new record with updated data, thereby losing the history. But a data warehouse has a responsibility to track the history effectively, which is where a Type I implementation fails.

-> Type II: Create a new additional dimension table record with the new value. This way we can keep the history. We can determine which dimension row is current by adding a current-record flag or a timestamp on the dimensional row.

-> Type III: In this type of implementation we create a new field in the dimension table which stores the old value of the dimension. When an attribute of the dimension changes, we push the updated value to the current field and the old value to the old field.

19. What is a surrogate key? -> A surrogate key is system-generated (non-meaningful from a business perspective). It is specified by uniqueness.


-> It is the system generated key which cannot be edited by the user. -> It is the primary key of the dimension table in warehouse. -> It is nothing but the sequence which generates the numbers for primary key column.

20. What is the difference between a normalized model and a dimensional model? -> Normalized data is held in a very simple structure; the data is stored in tables. -> Each table has a primary key and should contain data relating to one entity – so a normalized customer table contains only data about customers. -> We need to make logical connections between the entities (for example, this customer placed these orders). -> To do this we use a foreign key in the orders table to point to the primary key in the customer table, as sketched below.
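For illustration, a minimal SQL sketch of that customer/orders relationship (hypothetical table and column names, not from the original answer):

CREATE TABLE customer (
  customer_id NUMBER PRIMARY KEY,
  name        VARCHAR2(100)
);
CREATE TABLE orders (
  order_id    NUMBER PRIMARY KEY,
  customer_id NUMBER REFERENCES customer(customer_id)  -- foreign key to customer
);
-- "this customer placed these orders":
SELECT c.name, o.order_id
FROM customer c JOIN orders o ON o.customer_id = c.customer_id;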

21. What is a star schema? -> A relational database schema organized around a central table (fact table) joined to a few smaller tables (dimension tables) using foreign key references. -> The fact table contains raw numeric items that represent relevant business facts. -> A star schema is a way of organizing the tables so that results can be retrieved from the database easily and quickly in the warehouse environment. Usually a star schema consists of one or more dimension tables around a fact table, which looks like a star; that is how it got its name. A typical query against one is sketched below.
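A sketch of a typical star-schema query (hypothetical fact and dimension names): each dimension is joined to the fact table by its foreign key, and the numeric facts are aggregated.

SELECT d.year, p.category, SUM(f.sales_amt) AS total_sales
FROM   sales_fact f
JOIN   date_dim    d ON f.date_key    = d.date_key
JOIN   product_dim p ON f.product_key = p.product_key
GROUP BY d.year, p.category;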

22. What are ODS and EDW? -> ODS means Operational Data Store. -> It maintains transactional (current) data. -> In an ODS the data is volatile (it can be changed). -> EDW means Enterprise Data Warehouse. -> It maintains historical data for a long time. -> In an EDW, data is non-volatile (it can't change).

23. When do you use an ODS and an EDW? -> An ODS can be described as a snapshot of the OLTP system. It acts as a source for the EDW (Enterprise Data Warehouse). -> The ODS is more normalized than the EDW. Also, an ODS doesn't store any history. Normally the dimension tables remain at the ODS (SCD types can be applied in the ODS), whereas the facts flow to the EDW. -> The EDW maintains historical data. -> It is very useful for decision-making for the enterprise.

24. What is a data mart?


-> A data mart is a subset of an organizational data store, usually oriented to a specific purpose or major data subject, that may be distributed to support business needs.

25. If I want to delete a dataset file in UNIX, how do I do that?

-> orchadmin rm <dataset>.ds

26. How do you schedule the jobs?

-> Using DataStage Director -> Tools

27. How do you schedule the jobs in a job sequence? -> 1. Open DataStage. -> 2. Open a new Job Sequence. -> 3. Save it with some name. -> 4. Your job sequence is created. -> 5. Now drag in the job sequence stages as per your business requirement.

28. What are a normal view and a materialized view? When do you use which view?

VIEW

-> A view takes the output of a query and makes it appear like a virtual table. You can use a view in most places where a table can be used. -> All operations performed on a view affect data in the base table and so are subject to the integrity constraints and triggers of the base table. -> A view can be used to simplify SQL statements for the user or to isolate an application from any future change to the base table definition. -> A view can also be used to improve security by restricting access to a predetermined set of rows or columns.

MATERIALIZED VIEW

-> Materialized views are schema objects that can be used to summarize, precompute, replicate, and distribute data, e.g., to construct a data warehouse. -> A materialized view provides indirect access to table data by storing the results of a query in a separate schema object, unlike an ordinary view, which does not take up any storage space or contain any data. -> The existence of a materialized view is transparent to SQL, but when used for query rewrites it will improve the performance of SQL execution. -> An updatable materialized view lets you insert, update, and delete. A minimal contrast of the two is sketched below.
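A minimal sketch contrasting the two, using the emp table from the Oracle answers later in this document (exact options vary by Oracle version):

-- Ordinary view: no storage; the query re-runs on each access.
CREATE VIEW dept_sal_v AS
  SELECT deptno, SUM(sal) AS total_sal FROM emp GROUP BY deptno;

-- Materialized view: the result set is stored and can be refreshed.
CREATE MATERIALIZED VIEW dept_sal_mv
  REFRESH ON DEMAND
AS SELECT deptno, SUM(sal) AS total_sal FROM emp GROUP BY deptno;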


29. What are buildops?

-> Buildops are good if users need custom coding but do not need dynamic (runtime-based) input and output interfaces. -> A buildop provides a simple means of extending beyond the functionality provided by PX, but does not use an existing executable (like the wrapper).

Reasons to use a buildop include: -> Speed / performance -> Complex business logic that cannot be easily represented using existing stages, like advanced lookups across a range of values, custom surrogate key generation, or calculating rolling aggregates -> Build once, reusable everywhere within the project, no shared container necessary -> Can combine functionality from different stages into one

30. In DataStage, which language code is used? -> OSH (Orchestrate shell language)

31. Why do you use the copy stage? Can you sort in the copy stage?

-> To copy the data to multiple targets. -> Yes, we can sort in the copy stage.

32. In what other stage can we do a copy? -> Transformer

33. What is the sort stage? In what other stage can we sort?

-> Sorting (ascending or descending) the data based on a key. -> Using the copy stage we can sort.

34. Why is the transformer stage so costly in development? How is this stage different from other stages?

-> Ref. OSH (Orchestrate shell script): -> Multi-purpose stage. -> Counters can hold values from previous rows to make comparisons. -> Can hold a derivation to be used in multiple field derivations. -> Can be used to control execution of constraints. -> It compiles into C++ generated object code.


35. What is the modify stage? What are the different things we can do in the modify stage? -> Modify column types -> Null handling -> Date/time handling -> String handling -> Add or drop columns

36. What is the column generator stage? -> It adds columns to the incoming data and generates mock data for these columns for each data row processed.

37. Instead of using the column generator stage, can we generate a column in any other stage?

-> Runtime column propagation (RCP)

38. What is RCP?

-> Runtime column propagation (RCP). -> If your job encounters extra columns that are not defined in the metadata when it actually runs, it will adopt these extra columns and propagate them through the rest of the job.

39. What are the differences between the Lookup, Join, and Merge stages?

40. Can we track rejections in the join stage? -> No

41. Why do we use the funnel stage? -> The funnel stage copies multiple input datasets (with the same metadata) to a single output dataset. -> The operation is useful for combining separate datasets into a single large dataset. -> It is a processing stage that combines data from multiple input links into a single output link.

42. Which one will give better performance between the ODBC & Oracle Enterprise stages?

-> Oracle Enterprise (parallel and easy native connection)

43. How do we reject records in the sequential file stage? -> Using the reject link

44. In what other stages can we reject records? -> All except join


45. What is the filter stage?

-> To select a portion of the data for checking. -> Filters the data based on a condition.

46. What is the difference between job parameters and environment variables? -> Basically, an environment variable is a predefined variable that we can use while creating a DS job. -> We can set it either at the project level or the job level. -> Once we set a specific variable, it will be available in the project/job.

47. What is the aggregator stage? -> It is used to calculate summaries for a group of records. -> You specify the columns to be aggregated.

48. What is the shared container stage?

-> Creates a reusable object that many jobs within the project can include. -> When we go for a parallel shared container, the logic can be reused across many jobs.

49. Have you used sequencer jobs? What are the different triggers you have used? -> OK (conditional), Failed, Warning, Custom, UserStatus, Unconditional

50. Explain about yourself: what is your work experience, and how do you rate yourself on data warehousing?

51. What model are we doing here? -> Star schema

52. Suppose we have two flat files and we want to build a scenario with one fact table and two dimensions. How do you build this, and what transformations do you use?

-> One transformer, one lookup, two targets

53. Why do you use a shared container to read a UNIX flat file and not some other stage?


54. What is that other stage?

55. What is the Teradata version you are using?

56. What are your targets and sources? -> Target: DB2 -> Source: flat files

57. Who will create environment variables? -> The admin

58. How do you register a plug-in?

59. Suppose you take three jobs and want to run those jobs using a sequencer. How do you do that? Explain. -> Arrange them sequentially with links.

60. How do the jobs run in a sequencer: in parallel or sequentially? -> We can run any type (both server and parallel).

61. What are the different stages involved in the sequencer job? Explain them.

62. What is the difference between star & snowflake schemas? For what purpose will you use them?

63. Do you know normalization techniques?

64. Suppose you have built summarized tables and we need to improve performance. What do you do?

65. What type of performance will you get?

66. How do you improve load performance? -> Increase memory size

67. What is an aggregator? What does it do?

POLARIS-INTERVIEW


68. What is the size of the database in your project? -> 110 GB

69. What is the size of data extracted in the extraction process? -> Less than 1 GB

70. How many data marts are there in your project? -> 2

71. How many fact and dimension tables are in your project? -> 4 fact tables, 10 dimension tables

72. What is the size of the fact table in your project?

73. How many dimension tables did you have in your project, and name some dimensions? -> Book, country, counter, subaccount, counterparty, security, legal entity, intercompany, trnstype, project.

74. How many measures have you created? -> 45

75. How do you enter into Oracle?

JDA

76. What is the difference between local & environment variables?

77. What is a routine? Have you used routines in your project?

78. What are after and before routines?

79. What are stage variables? Where do we create stage variables? Why do we use stage variables?

80. How do we access the Oracle database from DataStage? -> Oracle Enterprise, ODBC

81. Suppose there is no client version on your local machine and Oracle is installed on the UNIX box. How do you get access to the Oracle connection?

82. Can you create an environment variable from your Designer? -> No

83. Suppose there are no DataStage client components installed on your local machine. How do you compile and run the job?

84. Suppose a single job is accessed in three sequencer jobs. Can we run the three sequencer jobs at a time? -> Yes, all (provided the job allows multiple instances).

85. What are job controls? When do we specify these controls, and where?

86. How does a sequential file execute at run time in PX: in sequential or parallel mode? -> Sequentially


87. How do we make a sequential file stage run in parallel mode? -> Set Number of Readers Per Node = 2

IBM INTERVIEW QUESTIONS (15TH OCT’07)

88. Have you built separate data marts for financial and manufacturing aspects?

89. Which schema have you used?

90. Were you loading data into the DWH or a staging area? If loading the data into the DWH, how do you load the data into the target table?

91. How many fact and dimension tables have you used?

92. Tell me about funnel stage

93. Which stage in server jobs is equivalent to funnel stage?

94. Difference between merge stage and funnel stage

95. Explain about SCD type-2?

96. What is the order of execution: stage variables, constraints, derivations? -> Yes, that is the order: stage variables first, then constraints, then derivations.

97. How did you use to get the source?

98. Difference between the data set stage and the file set stage

99. If we use an SFS as source and hash files as reference, with a transformer stage, pulling data into the target without using any constraints, which kind of join are we simulating?

100. What is a factless fact? Give example?

101. What is a semi-additive fact? Give an example.

102. How to go for SCD type 2 in server jobs?

WIPRO INTERVIEW QUESTIONS (26TH OCT’07)


103. Tell me about your project. Was it a Datawarehousing project?

104. How many dimensions and fact tables have you developed? Name some of the fact and dimension tables.

105. Have you been a part of the production team? -> No

106. Name some of the stages used by you

107. Difference between join and look-up stage

108. Difference between funnel and merge stage

109. What is the difference between stage variable and parameters?

110. If the look-up table has more than 10,000 records and the source table has considerably fewer records, which stage will you use: look-up or join? -> Sparse lookup

111. What is a configuration file?

112. What is a node?

113. What are the different types of partitions? What is the default?

114. Difference between same and entire partition

115. What different things can be done by using transformer stage

116. What all things you can do using copy stage

117. If I have a file containing date as one of the columns and I have 5 years of data, and I have to load the data at run time according to user requirements, which is 6 months of data, how do I go about it? -> Using a job parameter (a sketch follows below).
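A hedged sketch of the job-parameter approach: DataStage substitutes #parameter# references in a stage's user-defined SQL at run time (pStartDate and the sales table are hypothetical names).

-- User-defined SQL in the source stage; #pStartDate# is replaced at run time:
SELECT * FROM sales
WHERE  order_date >= TO_DATE('#pStartDate#', 'YYYY-MM-DD');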

TERADATA INTERVIEW QUESTIONS (26th Oct ’07)

DATA STAGE


118. What is a configuration file?

119. Have you ever created a configuration file? -> No

120. If you have a 4-CPU box, what kind of configuration file do you suggest, i.e., how many processing nodes? (A sketch of a configuration file follows below.)
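For reference, a minimal sketch of what a parallel configuration file looks like (hypothetical host name and paths; shown with two nodes, extendable to four for a 4-CPU box):

{
  node "node1" {
    fastname "etlhost"
    pools ""
    resource disk "/data/ds" {pools ""}
    resource scratchdisk "/scratch/ds" {pools ""}
  }
  node "node2" {
    fastname "etlhost"
    pools ""
    resource disk "/data/ds" {pools ""}
    resource scratchdisk "/scratch/ds" {pools ""}
  }
}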

121. What is pipeline parallelism? How is it different from partition parallelism?

122. What is a data set? How is it different from an SFS?

123. In DataStage, where do we store job names, parameters, etc.? What is the name of the place/database where we store them?

124. What do you mean by restartability of a sequencer?

125. What is meant by multiple instances

126. Have you developed any routines or custom transforms

127. What is meant by shared containers? What are the different types of shared containers? What is the difference between them?

128. Difference between the look-up stage, join stage, and merge stage?

129. How to optimize and tune the parallel jobs for better performance

130. What is an environment variable

ORACLE

131. What is a correlated subquery

132. What is bitmap-indexing

133. What is a materialized view

UNIX

134. What is shell scripting?

DWH

135. What is the difference between a fact table and a dimension table

136. What is a conformed dimension?


137. What is a factless fact

138. What is the best architecture?

139. Explain SCD type III

IBM Interview 14th and 15th June-2007

140. What is invocation id?

141. What is RCP?

142. Difference between stage and environmental variables?

143. What is array size and transaction size?

144. In a sequence of 5 jobs, if it aborts at the 3rd job and we run the sequence again, from where does it start processing?

145. Difference between sparse and normal lookup, and where do we use sparse? Can you give one situation?

146. Data set will be stored in two formats. What are they?

147. How do you invoke the UNIX scripts in Data stage jobs?

148. What is Email task?

149. When do you use a separate sort stage versus the inbuilt sort utility in a stage? What is the difference between these two?

150. Have you run the jobs with odd number of nodes? If not why?

151. How many nodes have you used in your project?

152. Have you created environmental variables?

153. Difference between sequential file and data set?

154. If a job is aborted, can you run it without compiling?

155. How do you commit the records while loading?

156. What is the purpose of check points?


157. What are job Controls?

158. How to call the stored procedures in data stage jobs? Can you specify constraints?

159. How do you compile stored procedures?

160. If a job has 3 stages and nodes=3, how many processes do you get?

161. A sequential file has 100 records. How to use first 19 records in the job?

162. How to check the number of nodes while running the job in UNIX environment?

163. Have you used the load option?

164. A transformer stage is running in parallel. How can you change to sequential?

165. What is preserve partitioning?

166. Configuration file is specified at project level and at job level. Which one will override the other?

-> The job-level file overrides the project-level one.

167. Suppose in UNIX you don’t know some commands, and while running the jobs you need those commands. How can you get them?

168. What is isolation? -> See below (275)

170. What is NLS? -> National Language Support

171. What command is used to search for a string?

-> The grep command is used to find any string in a file.

172. What commands do you know in UNIX? -> grep (search in a file) -> echo (display text)


-> chmod (change mode) -> rm (remove) -> cp (copy) -> ls (list files)

173. How do you start and stop a UNIX server?

-> In a UNIX environment, you can use the pd_start script to manually start and stop the server processes. -> Syntax: # pd_start {start|restart|stop|status}

174. Performance difference between join and merge: which takes more memory? -> Performance: join performs better. -> Join takes more memory.

175. What is the Email loop activity?

-> Notification Activity: used for sending emails to user-defined recipients from within DataStage.

176. A sequential file has 4 duplicate records. How do you get the 3rd record?

-> No, it is not possible directly in a sequential file.

177. What types of joins can be specified in the lookup stage?

-> Inner and left outer

178. What routines did you write?

179. If the job is aborted while running in UNIX, what error message do you get?

180. What is the vi command used for? -> The default editor that comes with the UNIX operating system is called vi (visual editor). -> Editing

181. Which version of UNIX are you using?

-> 5.2

182. How do you connect to UNIX?

-> Using the command line in admin -> OSH commands -> orchadmin

183. Is the link collector present in parallel jobs or not? -> Yes

184. To join 2 sequential files, which stage do you use? -> Join or Merge

185. Performance difference between the ODBC, Oracle Enterprise, and Dynamic RDBMS stages? -> Probably Oracle Enterprise, because of the native connection.

186. What performance techniques do you use while developing jobs?

-> Using partitioning methods -> Types of parallelism (pipeline and partition) -> Using hash keys -> Enabling and increasing cache memory

INTERVIEW QUESTIONS

ORACLE Satyam Teynampet 17/04/2004

187. What are inline views? -> An inline view is a SELECT statement in the FROM clause of another SELECT statement.

188. What are a subquery & a correlated subquery? Explain with examples.

-> In a subquery, the inner query is executed only once. Depending upon the results of the inner query, the outer query is evaluated.

-> ex: select * from emp where deptno=( select deptno from dept where dname='hyd');

-> In a correlated subquery, the inner query is evaluated once for each row processed by the parent statement or outer query.


-> SELECT empnum, name FROM emp e WHERE sal > (SELECT avg(sal) FROM emp WHERE department = e.department);

189. What is normalization? Please state all forms with examples.

-> Normalization is a process for reducing the redundancy of the data. There are 5 normal forms; mainly the first 3 forms are used, and a 4th one is Boyce-Codd normal form. -> First Normal Form (1NF): A relation is said to be in 1NF if it has only single-valued attributes; neither repeating groups nor arrays are permitted.

-> Second Normal Form (2NF): A relation is said to be in 2NF if it is in 1NF and every non-key attribute is fully functionally dependent on the primary key.

-> Third Normal Form (3NF): We say that a relation is in 3NF if it is in 2NF and has no transitive dependencies. -> Eliminate columns not dependent on the key: if attributes do not contribute to a description of the key, remove them to a separate table, as sketched below.
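A tiny 3NF sketch (hypothetical tables): dept_name depends on dept_no rather than on the employee key (a transitive dependency), so it moves to its own table.

-- Before: emp(emp_id, emp_name, dept_no, dept_name)
-- After (3NF):
CREATE TABLE dept (
  dept_no   NUMBER PRIMARY KEY,
  dept_name VARCHAR2(50)
);
CREATE TABLE emp3nf (
  emp_id   NUMBER PRIMARY KEY,
  emp_name VARCHAR2(50),
  dept_no  NUMBER REFERENCES dept(dept_no)
);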

190. If I select from two tables, how do I ensure which table is searched first?

191. What are triggers? What are row-level and statement-level triggers? -> Triggers are an action that is performed if a condition is met within the code. -> Row-Level Triggers -> Statement-Level Triggers -> Before Trigger -> After Trigger -> Schema Triggers -> Database-Level Triggers

-> Row-level trigger: fires once for each row that is affected by the triggering event. -> Statement-level trigger: fires once per triggering statement, regardless of the number of rows affected by the triggering event. A row-level example is sketched below.
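A minimal row-level trigger sketch against the emp table (the emp_audit table is hypothetical); it fires once per affected row:

CREATE OR REPLACE TRIGGER emp_sal_audit
AFTER UPDATE OF sal ON emp
FOR EACH ROW                     -- row-level: once per affected row
BEGIN
  INSERT INTO emp_audit (empno, old_sal, new_sal, changed_on)
  VALUES (:OLD.empno, :OLD.sal, :NEW.sal, SYSDATE);
END;
/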


192. Implicit and explicit cursors: do they work for an UPDATE statement? -> PL/SQL declares a cursor implicitly for all SQL data manipulation statements, including queries that return only one row. -> An explicit cursor or a cursor FOR loop is used for queries that return more than one row. -> An implicit cursor is used for all SQL statements: Declare, Open, Fetch, Close. Explicit cursors are used to process multi-row SELECT statements; an implicit cursor is used to process INSERT, UPDATE, DELETE, and single-row SELECT ... INTO statements. An explicit-cursor sketch follows below.
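A short explicit-cursor sketch using the emp table from the other answers:

DECLARE
  CURSOR c_emp IS SELECT empno, sal FROM emp;   -- explicit cursor
  v_empno emp.empno%TYPE;
  v_sal   emp.sal%TYPE;
BEGIN
  OPEN c_emp;
  LOOP
    FETCH c_emp INTO v_empno, v_sal;
    EXIT WHEN c_emp%NOTFOUND;                   -- stop at end of result set
    DBMS_OUTPUT.PUT_LINE(v_empno || ': ' || v_sal);
  END LOOP;
  CLOSE c_emp;
END;
/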

Satyam G.N.Chetty Road 28/03/2004

193. Write a statement using inline views. -> A small example: -> select * from ( select * from dept);

194. Write a SQL query to fetch the sum of the top 5 salaries, department-wise.

-> (One possible query is sketched below.)
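A sketch of one possible answer, assuming the standard emp(deptno, sal) table used in the other answers:

SELECT deptno, SUM(sal) AS top5_sal
FROM (
  SELECT deptno, sal,
         RANK() OVER (PARTITION BY deptno ORDER BY sal DESC) AS rnk
  FROM emp
)
WHERE rnk <= 5
GROUP BY deptno;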

195. What is a snapshot-too-old error? -> If you have lots of updates, long-running SQL, and too small an UNDO, the ORA-01555 error will appear. -> The SNAPSHOT_TOO_OLD error will come when the query goes into a Cartesian product or an infinite loop. -> Another case: suppose you update a large number of rows at a time without saving the records. In this case you can use a commit statement for every 500 records to avoid this problem, or ask the DBA to extend the tablespace for this segment.

196. What is the difference between a function and a procedure? -> 1. A function must return a value, while a procedure may or may not return any value.

-> 2. A function can return only a single value at a time, while a procedure can return one, many, or none.

-> 3. A function can be called from SQL statements, while procedures can't.


197. What if I give a return parameter in Function as NULL?

198. How do I compile a PL/SQL procedure from the UNIX prompt? -> x := OSCommand_Run('/home/test/myoscommand.sh')
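The OSCommand_Run call above appears to be from a custom library. A more common sketch (hypothetical credentials and script name) is to run the DDL through SQL*Plus from the shell: sqlplus scott/tiger @create_proc.sql, where create_proc.sql contains the CREATE OR REPLACE PROCEDURE statement.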

199. A view has three base tables. One of the columns of a table is removed. What happens to the view? -> An error occurs; the view becomes invalid.

200. How do I enable the above view again?

201. How do you read a flat file using UTL_FILE?

-> UTL_FILE allows PL/SQL programs to both read and write any operating system files that are accessible from the server on which your database instance is running.

ex: utl_file_dir = c:\sampledata
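A minimal UTL_FILE read sketch (the DATA_DIR directory object and file name are hypothetical):

DECLARE
  f    UTL_FILE.FILE_TYPE;
  line VARCHAR2(4000);
BEGIN
  f := UTL_FILE.FOPEN('DATA_DIR', 'input.txt', 'r');  -- directory, file, mode
  LOOP
    UTL_FILE.GET_LINE(f, line);   -- raises NO_DATA_FOUND at end of file
    NULL;                         -- process each line here
  END LOOP;
EXCEPTION
  WHEN NO_DATA_FOUND THEN
    UTL_FILE.FCLOSE(f);
END;
/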

202. What are ref cursors? -> A cursor whose query can be changed at runtime is called a ref cursor. -> A ref cursor is a dynamic type of cursor which is used to provide reusability of a cursor. -> Ref cursors, also known as dynamic cursors, can be associated with as many SELECT statements as you want, at different times; you can associate them with different SELECT statements dynamically. -> There are two types of dynamic cursor: 1) strong (with a return type) and 2) weak (no return type). A weak-ref-cursor sketch follows below.
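A small weak-ref-cursor sketch showing the "query chosen at run time" idea, using the emp and dept tables from the earlier answers:

DECLARE
  c   SYS_REFCURSOR;                        -- weak ref cursor: no fixed return type
  v_n VARCHAR2(100);
BEGIN
  IF TO_CHAR(SYSDATE, 'DY') = 'MON' THEN
    OPEN c FOR SELECT ename FROM emp;       -- query bound at run time
  ELSE
    OPEN c FOR SELECT dname FROM dept;
  END IF;
  FETCH c INTO v_n;
  CLOSE c;
END;
/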

203. What are the built-in packages you have used?

204. What does a change capture stage do?

-> Same question below (see 240).

205. What does a transformer stage do? -> Derivations -> Constraints -> Expressions for constraints and derivations can reference: -> job parameters -> functions -> system variables and constants -> stage variables -> external routines

206. What is a sequencer job, and why do we use it? -> To control the flow of jobs. -> It executes jobs sequentially.

207. Why do we run jobs through shell scripts?

208. What is making a job multiple-instance?

-> Same question below

209. What is the link ordering in DataStage? -> Same question below (see 250)

210. What is the combinability mode in the Teradata stage?

211. What is a hashed file, and where do we use it?

-> Same question below

235. What is making the job multiple-instance?


-> same question below

236. What is the fast multi node & fast default node?

->

237. What is a routine? What does it do? -> Same question below

238. What is a shared container? How & where do you use shared containers? -> Same question below

239. Tell me a few runtime errors you struggled with. -> If a transformer is used, check the C++ compiler. -> For buildop errors, try building the buildop from the command line. -> Check environment variable settings. -> Examine the generated OSH. -> Some stages do not support RCP. -> Null handling errors.

240. What does the change capture stage do?

-> The change capture stage captures the changes between two different datasets and generates a new column called change code. The change code has the values:

0-copy 1-insert 2-delete 3-edit/update

241. Tell me a few stages you used in your development environment.

-> (Explain any stages you used.)

242. What are the differences between server jobs, parallel jobs & mainframe jobs?


-> Server jobs. These are available if you have installed DataStage Server. They run on the DataStage server, connecting to other data sources as necessary. Here performance is low, i.e., speed is less.

-> Parallel jobs. These are only available if you have installed Enterprise Edition. Parallel jobs can be run on SMP, MPP, and cluster machines; here performance is high, i.e., speed is high.

-> These run on DataStage servers that are SMP, MPP, or cluster systems. They can also run on a separate z/OS (USS) machine if required.

243. What are the differences between server shared containers & parallel shared containers?

-> Server shared containers can be used in both server & parallel jobs, but parallel containers cannot be used in server jobs. -> When we go for a parallel shared container, the logic can be reused across many jobs. -> Server shared containers contain server stage types; parallel shared containers contain parallel stage types.

244. Where does the repository come into the picture?

-> The repository resides in a specified database. It holds all the metadata and all the respective mapping information. -> It contains jobs, routines, table definitions, etc.

245. What is the reject file ordering?

246. What are the responsibilities of DataStage Manager?

-> Import and export metadata -> Back up and restore the project


-> Use the Manager to store and manage reusable metadata for the jobs you define in Designer. -> This metadata includes table and file layouts and routines for transforming extracted data.

247. What is the analysis tool? What do we do with it?

-> Probably usage analysis

248. What do we do with Reporting Assistant?

249. What does the change capture stage do? -> The change capture stage captures the changes between two different datasets and generates a new column called change code. The change code has the values:

0-copy 1-insert 2-delete 3-edit/update

250. What is the link ordering in DataStage? -> You can edit the order of the input and output links from the Link Ordering tab in the stages.

251. Compare the Informatica & DataStage tools in terms of performance.

-> The main difference lies in parallelism: DataStage uses the parallelism concept through node configuration, whereas Informatica does not.

-> Partitioning: DataStage PX provides many more robust partitioning options than Informatica. You can also re-partition the data whichever way you want.

-> Parallelism: Informatica does not support full pipeline parallelism (although it claims to).

-> File lookup: Informatica supports flat-file lookup, but the caching is horrible. DataStage supports hashed files, lookup filesets, and datasets for much more efficient lookups.


-> Merge/Funnel: DataStage has very rich functionality for merging or funnelling streams. In Informatica the only way is to do a Union, which, by the way, is always a Union-all.

252. How many jobs have you created?

-> 100+ jobs in the last six months

253. Explain a few of the jobs you created.

-> Sort -> Aggregator -> Remove Duplicates -> Surrogate Key -> Transformer

254. How do you create stage variables in the transformer stage?

-> Transformer stage -> stage properties -> variables

255. What is the combinability mode in the Teradata stage?

256. What is the DataStage debugger? Where does it run?

-> It checks various locations. -> It checks and debugs a portion of the data. -> Using stages like Peek, Head, Tail, ...

257. How do you test your jobs at the client place?

-> Using DataStage Director

258. If the input & output column data types are not matched and the proper conversion process is not used, then there is an error like this: -> Fatal error: data type mismatch -> Bad conversion error

259. When checking operator: When binding output schema variable "outRec": When binding output interface field "CPSC_CODE" to field "CPSC_CODE": Converting nullable source to non-nullable result; a fatal runtime error could occur (use the modify operator to specify the value to which null should be converted).
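A hedged sketch of the fix this message suggests, in the Modify stage's specification syntax (the column name is taken from the error above; the replacement value 'XX' is a hypothetical default):

CPSC_CODE = handle_null(CPSC_CODE, 'XX')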

260. What is the architecture of DataStage?

-> Client-server architecture

261. How do you define and use job parameters? -> Create them in job properties. -> They are used at runtime.

262. What are stage variables, system variables, and environment variables?

-> Stage variable: an intermediate processing variable that retains its value during a read and doesn't pass the value into the target column. -> System variable: system variables have a predefined type and structure that cannot be changed. When an expression is stored into a system variable, it is converted to the variable's type, if necessary and possible.

263. How do you use routines in DataStage?

-> Write the routine in C or C++, create the object file, and place the object in the lib directory. -> Then open Designer, go to routines, and configure the path and routine names. -> Routines are stored in the Routines branch of the DataStage Repository, where you can create, view, or edit: 1) transform functions, 2) before/after job subroutines, 3) job control routines.

264. What is the difference between a shared container and a local container? -> A shared container can be shared between jobs and is great for re-usability. -> A local container can only be used in the job in which it is created; it is useful for simplifying job design by breaking the job into different sections.

265. How do you connect to Oracle?

-> Using ODBC connections

266. What are the TrimB and TrimF functions in DataStage?

-> Trim: removes spaces and tabs (or %stripchar%) -> TrimB: removes all trailing spaces and tabs -> TrimF: removes all leading spaces and tabs

267. Please explain any ETL process that you have developed.

-> (Explain any stages, including your job.)

268. What is a hash file? What are the types of hash files?

-> Used for lookups; it is like a reference table. -> It is also used in place of ODBC/OCI tables for better performance. -> Two types: static hash file and dynamic hash file.

269. If you make any changes in a shared container, will they reflect in all the jobs wherever you used this shared container? -> Yes

270. Have you written any custom routines in your project? If so, explain.

271. How do you make read-only routines in DataStage? Is it possible, and if so, how?

-> Yes, there is an option for that.

272. What are the different field operators in routines?

273. How do you get log info into a file?

275. What is an isolation level, and when do you use it?

-> Transactions not only ensure the full completion (or rollback) of the statements that they enclose but also isolate the data modified by the statements. -> The isolation level describes the degree to which the data being updated is visible to other transactions.


276. What is before job subroutine/After job subroutine? When do you use them?

-> Before-stage subroutine: contains the name (and value) of a subroutine that is executed before the stage starts to process any data. -> After-stage subroutine: contains the name (and value) of a subroutine that is executed after the stage has processed the data.

277. How do you clear the lock of a job?

-> DS Director -> Job -> Cleanup Resources

278. How do you back up and restore the project? -> Using DataStage Manager (DataStage/WebSphere 7.5 PX) -> Using DataStage Director (DataStage/Information Server 8.1)

279. What is Clear Status File, and when do you use it?

-> DS Director -> Job -> Clear Status File

280. What is Cleanup Resources, and when do you use it?

-> DS Director -> Job -> Cleanup Resources (we use it when a job is locked)

281. Can I join a flat file and Oracle and load into Oracle? Is this possible? -> Yes

282. How can you do incremental aggregation in DataStage? -> Maybe using change data capture (CDC)

283. While loading some data into the target, suddenly there is a problem and the loading process stops. How can you resume loading from the records that were left? -> Restart where the error stopped it (using restartable checkpoints).

284. What are the general problems that you face in DataStage?


-> If a transformer is used, check the C++ compiler. -> For buildop errors, try building the buildop from the command line. -> Check environment variable settings. -> Examine the generated OSH. -> Some stages do not support RCP. -> Null handling errors.

285. Is it possible to roll back a set of jobs after some jobs are executed, if the latest job fails?

-> No

286. Could DS generate a data dictionary of the source database and target database? -> Yes

287. What are the various reports that could be generated using this tool?

-> Usage analysis

288. Could DS show record length and total DB size for source and target, based on existing source data, to arrive at target database sizing?

-> No

289. Does DS support data scrambling? -> Yes

290. Compare the ETL features of DS with those of its competitors. A chart is essential. -> Yes; parallelism (see the comparison in 251).

291. Does DS generate an audit trail?

292. What other reports could DS provide apart from the Impact Analysis document? Is a customizable reporting feature available?

-> Yes (there is an option)

293. Could DS generate test cases to verify the veracity of mappings, etc., at design time, to validate the mappings and transforms?


-> Yes

294. Is there any mechanism available in the Transformer stage to show the left-out and unmapped fields from the source and target stages?

-> RCP

295. Could DS handle object-relational database table structures? -> Yes

296. Is metadata export of a copybook, and of the data in the same, available in DS? -> Yes

297. How is data compression handled in DS?

-> Compress stage


CSC (Hyderabad) – 01/06/07.

298. How are you getting your source? -> Flat files

299. How many routines have you made so far? -> 4

300. How will you connect to ODBC? -> Control Panel -> Administrative Tools -> Data Sources (ODBC) -> System DSN -> DSN name, TNS name, password

301. Is ODBC used to connect to the database or to store the data?

-> To connect to the database

302. Your output from ETL is used by others to analyze the data. So, what type of data warehouse is it?

-> Enterprise data warehouse

303. Your project is a slowly changing dimension; how frequently are you getting your data, i.e., weekly/monthly? -> Weekly or monthly, per user requirements

304. What are the different source systems? -> Flat files, Oracle

WIPRO (Chennai) – 01/06/07.

305. What is a surrogate key? -> A surrogate key is system-generated (non-meaningful from a business perspective); it is specified by uniqueness. -> It is a system-generated key which cannot be edited by the user. -> It is the primary key of the dimension table in the warehouse. -> It is nothing but the sequence which generates the numbers for the primary key column.

306. What is the difference between a primary key and a surrogate key?

-> A primary key is a combination of unique and not-null. -> A surrogate key is a system-generated sequence key.

307. Where do we use the primary key and where do we use the surrogate key? -> A primary key is a combination of unique and not-null (e.g., empid, customerid, ...). -> A surrogate key is a system-generated sequence key (when the primary key is repeated many times, we generate a surrogate key).

308. How do you get the timestamp as a date type in Oracle?

-> SELECT CURRENT_TIMESTAMP FROM dual; -> The TIMESTAMP datatype stores all the information that the DATE datatype stores, but also includes fractional seconds.
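If the goal is specifically a DATE value, a small sketch:

SELECT CAST(SYSTIMESTAMP AS DATE) FROM dual;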

309. What is multiple instances?

-> The same job runs simultaneously on the same computer.

310. How do you compare two tables using multiple instances?

-> If your goal is to compare two tables, send the output of a select * from my_table order by my_key to a file for each table and then use a compare utility (like UNIX diff).
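Inside the database, a hedged alternative sketch is a set difference in both directions (table_a and table_b are hypothetical):

SELECT * FROM table_a MINUS SELECT * FROM table_b;  -- rows only in A
SELECT * FROM table_b MINUS SELECT * FROM table_a;  -- rows only in B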

311. How do you control jobs in DataStage? -> Yes, using job properties. -> Job control language: it is used to run a number of jobs at a time, with or without using loops.

312. What is the use of parameters? -> Dynamic value passing at runtime (while the job is running). -> So your job will be dynamic: it takes the parameters and performs the work.

313. How do you give the parameters? Is it by hard-coding or passing? -> Passing

314. If the data is in a text file, do you hard-code the parameters or pass them?

315. How do you pass the parameters?

-> In job parameters at run time

316. How do you trigger the job in Windows?

317. If the data is in 10 files and must be loaded into a table, which stage will you prefer? -> If they have the same metadata, a Sequential File stage (or a File Set).

318. What is the difference between a dataset and a sequential file?

SEQ FILE:

-> Executes in sequential mode (reads and writes are also sequential) -> Storage capacity: 2 GB; stored on hard disk -> Doesn't support lookup filesets -> Supports any extension, like .txt, .csv, ... -> Performance is lower compared to a dataset

DATASET:

-> Executes in parallel (reads and writes are also parallel) -> Storage capacity is larger; stored in native internal format -> Supports lookup filesets -> Supports only the .ds extension -> Performance is higher compared to a sequential file

319. How can you see a dataset in UNIX and Windows?

-> orchadmin dump <dataset>.ds

320. How can you delete a dataset in UNIX?

-> orchadmin rm <dataset>.ds

321. What is the function of a shared container?

-> Creates a reusable object that many jobs within the project can include. -> When we go for a parallel shared container, the logic can be reused across many jobs.

322. You have sequential files in a funnel stage; the records of the first file have to come first in the target, the second file's records next, and finally the third's. How can you achieve this?

-> Sequence funnel

323. What is the difference between the remove duplicates & sort stages? -> REMOVE DUPLICATES: just removes the duplicate data based on a key. -> SORT: just sorts the data (ascending or descending) based on a key; removing duplicates is also possible in the sort stage.

324. Is it not possible to delete the duplicates in the sort stage? -> No, it is possible (the sort stage can also remove duplicates, as noted above).