Friday, May 18, 2012

How to remove duplicates without using "Remove Duplicates" stage in DataStage

Step 1:Use Copy stage and use hash partitioning

Step 2:mark option perform sort to get sorted records


Step3: mark option stable( to get first value as distinct value in case of duplicates)

Step 4:mark option unique to get DISTINCT records

Step 5:select key fields for sorting and partitioning usage.

Thursday, May 17, 2012

Sort and Remove Duplicates stages in DataStage

Step 1: Drag all the stages.here in example we have chosen a  sequential file(.csv file to be specific) as the source and a complex flat file as the destination file

Input data


Step 2: the records in source are sorted in sort stage.Make sure to choose a key in order to sort the records based on the key field.In the example, the key field is chosen as shipdate.
Step 3: In the Remove duplicate stage also we need to specify the key (shipdate chosen in the example) based on which it will search for the duplicate record(s) and remove them.Hence the output will contain only disctinct records based on the key specified.
Output file will be

Hence we get distcinct records with each of them having unique shipdate value.

Tuesday, May 15, 2012

Using Column Export stage in DataStage


The Column Export stage is a restructure stage. It can have a single input link, a single output link and a single rejects link.



The Column Export stage exports data from a number of columns of different data types into a single column of data type ustring, string, or binary. It is the complementary stage to Column Import
  





Monday, May 14, 2012

Using Pivot in DataStage

                                                        


Here the source is taken as an excel file(.csv extension) and the data contains

Name Qty1 Qty2 Qty3
abc      10     20    30
def       70     60    50

After pivot stage (as described in figure) the derivation done is as follows in output column
                  Derivation
Name         Name
Qty             Qty1 Qty2 Qty3

(please make sure the datatype should be same in derivation column)


hence,The destination (flat file in thi example) will be

Name   Qty
abc       10
abc       20
abc       30
def        70
def        60
def        50


Using funnel and filter in DataStage


This is an example of parallel job in datastage

 Description:-


Using two excel files with same structure (condition for funnel).the output of funneling stage is input for filter stage giving where clause as id=1 which is loaded into another excel file as output

what is the main difference between lookup ,join and merge in DataStage


LOOK UP STAGE:
* One output link and one reject link.

* Faster if the volume of data is less than the memory size.

* If the size is too huge then DS uses scrap files to store the
   remaining data and that hinders the performance.



JOIN STAGE:
* Can have any no. of input links

* Does not use memory, look up is done on table level,
   so would be better to use if the data volume is pretty  huge.

* Doesn't have any reject link.

* Only one output link.



MERGE STAGE:
*Can have any number of input link and update

  links.

* Need to have a key column.

* Can have any number of reject link.

* Only one output link.