Thursday, May 17, 2012

Sort and Remove Duplicates stages in DataStage

Step 1: Drag all the required stages onto the canvas. In this example, the source is a sequential file (a .csv file, to be specific) and the destination is a complex flat file.

Input data


Step 2: The records from the source are sorted in the Sort stage. Make sure to choose a key so that the records are sorted on that key field. In this example, the key field is shipdate.
Step 3: In the Remove Duplicates stage we also need to specify the key (shipdate in this example). The stage uses this key to find duplicate records and remove them, so the output contains only distinct records with respect to the specified key.
Output file

Hence we get distinct records, each with a unique shipdate value.
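
For readers who want to see the same logic outside DataStage, here is a minimal Python sketch of what the Sort and Remove Duplicates stages do together. It is not DataStage code. The sample rows, the orderid column, and the function name sort_and_remove_duplicates are made up for illustration, and the sketch assumes that shipdate is stored as an ISO-style string (YYYY-MM-DD), so that sorting the text also sorts the dates, and that the first record of each group of duplicates is the one kept.

```python
import csv
import io
from itertools import groupby
from operator import itemgetter

# Hypothetical sample input; the real data from the post is not reproduced here.
SAMPLE_CSV = """orderid,shipdate
1,2012-05-03
2,2012-05-01
3,2012-05-03
4,2012-05-02
5,2012-05-01
"""


def sort_and_remove_duplicates(csv_text, key="shipdate"):
    """Sort rows on `key`, then keep one row per distinct key value."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))

    # Sort step: order the records by the key field.
    # sorted() is stable, so rows with equal keys keep their input order.
    sorted_rows = sorted(rows, key=itemgetter(key))

    # Remove-duplicates step: groupby() only groups adjacent rows with equal
    # keys, which is why the data has to be sorted first.
    # Assumption: the first row of each group is the one retained.
    return [next(group) for _, group in groupby(sorted_rows, key=itemgetter(key))]


result = sort_and_remove_duplicates(SAMPLE_CSV)
for row in result:
    print(row["orderid"], row["shipdate"])
```

The sketch also shows why the Sort stage comes before the Remove Duplicates stage: duplicates are detected by comparing neighbouring records, so records with the same key must already sit next to each other.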
