
Batch Data Cleansing

Hi Guys,

Batch data cleansing is a good feature when you want to limit the size of the log file. However, if the table is incrementally loaded and you need to do a full load of it, the current implementation is not ideal.

I suggest checking whether the valid table is empty before deciding how to insert data into it.

The current implementation (19.6.5):

INSERT INTO ValidTable(Fields)
SELECT T.Fields
FROM [T-View] T
INNER JOIN #NonexistentRows N ON T.[DW_Id] = N.[DW_Id]
WHERE T.DW_Id BETWEEN @batchFirstId AND @batchNextId

In a full load scenario on an incremental table, this could be optimised like this:

INSERT INTO ValidTable(Fields)
SELECT T.Fields
FROM [T-View] T
WHERE T.DW_Id BETWEEN @batchFirstId AND @batchNextId

This eliminates the need to create the #NonexistentRows temp table, as well as the join between the T-View and #NonexistentRows on every batch insert.
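To make the idea concrete, the generated statement could simply branch on whether the valid table already contains rows. The sketch below is illustrative only; the @validTableIsEmpty variable and the bracketed object names are placeholders following the snippets above, not the actual generated code:

-- Check once, before the batch loop, whether the valid table holds any rows
DECLARE @validTableIsEmpty BIT =
    CASE WHEN EXISTS (SELECT 1 FROM ValidTable) THEN 1 ELSE 0 END;

IF @validTableIsEmpty = 1
BEGIN
    -- Full load: every incoming row is new, so skip #NonexistentRows entirely
    INSERT INTO ValidTable(Fields)
    SELECT T.Fields
    FROM [T-View] T
    WHERE T.DW_Id BETWEEN @batchFirstId AND @batchNextId;
END
ELSE
BEGIN
    -- Incremental load: keep the existing join against #NonexistentRows
    INSERT INTO ValidTable(Fields)
    SELECT T.Fields
    FROM [T-View] T
    INNER JOIN #NonexistentRows N ON T.[DW_Id] = N.[DW_Id]
    WHERE T.DW_Id BETWEEN @batchFirstId AND @batchNextId;
END

Doing the emptiness check once, before the batch loop starts, keeps the per-batch cost unchanged in the normal incremental case.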

 

Regards,

Lars

 
