Wednesday, December 26, 2012

Sequential Files in DataStage

1)Sequential Files are of two types -* Fixed Width file * Delimited File

*Fixed Width
example
123abc111
456cd 789

 Column  Column Length
id              3
name         3
code          3
                 --
                 9
total length=9  123+abc+111=9 or 456 +cd' ' + 789=9

Columns can be declared with relevant data types(char preferred)  with proper length mentioned in column tab of input file properties.

if the length mentioned in column tab is less than that present in the record it produces "SHORT READ" error on running the job.

this method ensures proper check of accuracy where one can refer the properties of input seeing the column tab

Other Method is the use of transformer to cut the record using left ,right etc string functions.this is difficult for maintenance purpose.

It depends on the purpose of job which active stage has to be used.If only segregating the column is the purpose follow the first method and use copy active stage and load into target.  

Sometimes the client demands report of error consisting of which record,name of row,column value,which part of column etc produced an error.such data can be created using routines written in C/C++ in transformer stage.
Use of quotes depends on the input file and can be set accordingly.
Besides these,one has to make sure to set following three properties as  the MUST in input

Record Length=Fixed
Record Delimiter=UNIX Newline
Delimiter=None
-----------------------------------------------------------------------------------------------------
*Delimited File

Delimited file is a sqquential file where records consist of delimiters to separate columns.
Eg.

id,name,code
1,abc,888
2,def,999

or

"1",abc,"888"
"2","def","999"

Delimiter can be anything comma,hash,etc

Note: 1)

id,name,code
1,ab,c,888
2,def,999

such files are not accepted or allowed since delimiter is part of data ab,c

Delimiter SHOULD NOT BE PART OF DATA

2)
id,name,code
"1","ab,c","888"
"2","def","999"

this is accepted since quotes are used to separate data .Here Delimiter can be used as part of data.

One must make sure when using delimited file
i)no of delimiter should be same in all the records
ii)the record having inconsistent no of delimiters will produce a warning on running job and wont be present in target but the job will RUN.




Friday, May 18, 2012

How to remove duplicates without using "Remove Duplicates" stage in DataStage

Step 1:Use Copy stage and use hash partitioning

Step 2:mark option perform sort to get sorted records


Step3: mark option stable( to get first value as distinct value in case of duplicates)

Step 4:mark option unique to get DISTINCT records

Step 5:select key fields for sorting and partitioning usage.

Thursday, May 17, 2012

Sort and Remove Duplicates stages in DataStage

Step 1: Drag all the stages.here in example we have chosen a  sequential file(.csv file to be specific) as the source and a complex flat file as the destination file

Input data


Step 2: the records in source are sorted in sort stage.Make sure to choose a key in order to sort the records based on the key field.In the example, the key field is chosen as shipdate.
Step 3: In the Remove duplicate stage also we need to specify the key (shipdate chosen in the example) based on which it will search for the duplicate record(s) and remove them.Hence the output will contain only disctinct records based on the key specified.
Output file will be

Hence we get distcinct records with each of them having unique shipdate value.

Tuesday, May 15, 2012

Using Column Export stage in DataStage


The Column Export stage is a restructure stage. It can have a single input link, a single output link and a single rejects link.



The Column Export stage exports data from a number of columns of different data types into a single column of data type ustring, string, or binary. It is the complementary stage to Column Import
  





Monday, May 14, 2012

Using Pivot in DataStage

                                                        


Here the source is taken as an excel file(.csv extension) and the data contains

Name Qty1 Qty2 Qty3
abc      10     20    30
def       70     60    50

After pivot stage (as described in figure) the derivation done is as follows in output column
                  Derivation
Name         Name
Qty             Qty1 Qty2 Qty3

(please make sure the datatype should be same in derivation column)


hence,The destination (flat file in thi example) will be

Name   Qty
abc       10
abc       20
abc       30
def        70
def        60
def        50


Using funnel and filter in DataStage


This is an example of parallel job in datastage

 Description:-


Using two excel files with same structure (condition for funnel).the output of funneling stage is input for filter stage giving where clause as id=1 which is loaded into another excel file as output

what is the main difference between lookup ,join and merge in DataStage


LOOK UP STAGE:
* One output link and one reject link.

* Faster if the volume of data is less than the memory size.

* If the size is too huge then DS uses scrap files to store the
   remaining data and that hinders the performance.



JOIN STAGE:
* Can have any no. of input links

* Does not use memory, look up is done on table level,
   so would be better to use if the data volume is pretty  huge.

* Doesn't have any reject link.

* Only one output link.



MERGE STAGE:
*Can have any number of input link and update

  links.

* Need to have a key column.

* Can have any number of reject link.

* Only one output link.