Wednesday, December 26, 2012

Sequential Files in DataStage

Sequential files are of two types:
* Fixed Width file
* Delimited file

*Fixed Width File
Example:
123abc111
456cd 789

Column    Length
id        3
name      3
code      3
          --
Total     9
Total record length = 9: "123" + "abc" + "111" = 9 characters, or "456" + "cd " + "789" = 9 characters (the name is padded with a trailing space).
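The positional layout above can be illustrated outside DataStage with a small Python sketch. It is only an illustration of how a 3 + 3 + 3 layout maps onto the sample records; the names `LAYOUT` and `split_fixed` are made up for this example and are not part of DataStage.

```python
# Column layout taken from the example above: (column name, width).
LAYOUT = [("id", 3), ("name", 3), ("code", 3)]


def split_fixed(record, layout=LAYOUT):
    """Cut a fixed-width record into named columns by position."""
    fields = {}
    pos = 0
    for name, width in layout:
        fields[name] = record[pos:pos + width]
        pos += width
    return fields


print(split_fixed("123abc111"))  # id=123, name=abc, code=111
print(split_fixed("456cd 789"))  # name keeps its trailing space: "cd "
```

Note that the padding space in "cd " is part of the column value; trimming it is a separate decision.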

Declare each column with a suitable data type (Char preferred) and the correct length on the Columns tab of the input file properties.

If the lengths declared on the Columns tab do not match the length actually present in the record, the job reports a "SHORT READ" error at run time.
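A length mismatch of this kind can be caught before the job runs by checking the file against the expected record length. The following is a minimal Python sketch, assuming the 9-character layout from the example and newline-terminated records; `check_lengths` is a hypothetical helper, not a DataStage feature.

```python
EXPECTED_LENGTH = 9  # 3 + 3 + 3 from the example layout


def check_lengths(lines, expected=EXPECTED_LENGTH):
    """Return (line_number, actual_length) for each record of the wrong length."""
    problems = []
    for number, line in enumerate(lines, start=1):
        record = line.rstrip("\n")  # the record delimiter is not part of the data
        if len(record) != expected:
            problems.append((number, len(record)))
    return problems


sample = ["123abc111\n", "456cd789\n", "789xyz2222\n"]
print(check_lengths(sample))  # the second record is too short, the third too long
```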

This method makes the layout easy to verify, because the column definitions can be checked at any time on the Columns tab of the input properties.

The other method is to cut the record in a Transformer using string functions such as Left and Right. This is harder to maintain.

Which active stage to use depends on the purpose of the job. If the only purpose is to split the record into columns, follow the first method, use a Copy stage, and load into the target.

Sometimes the client asks for an error report showing which record, the name of the row, the column value, and which part of the column produced an error. Such data can be produced with routines written in C/C++ and called from the Transformer stage.
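The shape of such an error report can be sketched in Python (the actual routine would be written in C/C++ as noted above). Everything here is an assumption made for illustration: the `report_errors` helper, the report fields, and the validation rule that id and code must be numeric.

```python
LAYOUT = [("id", 3), ("name", 3), ("code", 3)]


def report_errors(records, layout=LAYOUT):
    """List one entry per bad column: record number, column, value, position, reason."""
    errors = []
    for rec_no, record in enumerate(records, start=1):
        pos = 0
        for column, width in layout:
            value = record[pos:pos + width]
            # Hypothetical rule for this sketch: id and code must be numeric.
            if column in ("id", "code") and not value.isdigit():
                errors.append({
                    "record": rec_no,
                    "column": column,
                    "value": value,
                    "start": pos + 1,  # 1-based position of the column in the record
                    "reason": "not numeric",
                })
            pos += width
    return errors


for err in report_errors(["123abc111", "45Xcd 789"]):
    print(err)
```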
Whether to use quotes depends on the input file and can be set accordingly.
Besides these, the following three properties MUST be set on the input:

Record Length=Fixed
Record Delimiter=UNIX Newline
Delimiter=None
-----------------------------------------------------------------------------------------------------
*Delimited File

A delimited file is a sequential file in which the columns of each record are separated by a delimiter.
E.g.

id,name,code
1,abc,888
2,def,999

or

"1",abc,"888"
"2","def","999"

The delimiter can be any character: comma, hash, etc.

Note: 1)

id,name,code
1,ab,c,888
2,def,999

Such files are not accepted, because the delimiter is part of the data (ab,c).

Delimiter SHOULD NOT BE PART OF DATA

2)
id,name,code
"1","ab,c","888"
"2","def","999"

This is accepted, because quotes enclose each value. Here the delimiter can be part of the data.
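The effect of quoting can be seen with Python's standard `csv` module, used here only as a stand-in for the Sequential File stage's quote handling. The helper name `parse` is invented for this sketch.

```python
import csv
import io


def parse(text, delimiter=",", quotechar='"'):
    """Parse delimited text into a list of rows (each row is a list of strings)."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter, quotechar=quotechar))


quoted = 'id,name,code\n"1","ab,c","888"\n"2","def","999"\n'
unquoted = "id,name,code\n1,ab,c,888\n2,def,999\n"

print(parse(quoted)[1])    # three fields: the comma inside "ab,c" is data
print(parse(unquoted)[1])  # four fields: the comma in ab,c is read as a delimiter
```

With quotes the second row keeps three columns; without them the same row splits into four and no longer matches the header.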

When using a delimited file, keep in mind:
i) The number of delimiters should be the same in all records.
ii) A record with an inconsistent number of delimiters produces a warning when the job runs and will not be present in the target, but the job will still RUN.
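A file can be checked for inconsistent delimiter counts before it reaches the job. The sketch below uses Python's `csv` module and treats the header row as the reference column count; `split_by_field_count` is a hypothetical helper, and separating rows into good and rejected lists only mimics the behaviour described above rather than reproducing DataStage itself.

```python
import csv
import io


def split_by_field_count(text, delimiter=","):
    """Separate rows whose field count matches the header from those that do not."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    good, rejected = [], []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        if len(row) == len(header):
            good.append(row)
        else:
            rejected.append((line_no, row))
    return header, good, rejected


data = "id,name,code\n1,abc,888\n2,def\n3,ghi,777\n"
header, good, rejected = split_by_field_count(data)
print(good)      # rows with the expected number of fields
print(rejected)  # (line number, row) for each inconsistent record
```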



