Follow

How to handle deletes in a source based incremental load scenario

I often get the question about how to handle deletes in the source system on source based incrementally loaded tables in TX.

The source based incremental load method requires a field in the source table that can help us identify new or modified records – and this will be included in the selection rule for the table. Deleted records in the source system are obviously not affected by the selection rule and as a result the deleted data will still be present in the TX Staging Area and Data Warehouse.

Does this mean that we should not do incremental load on source tables where deletes could occur?

No! – but we need to find a solution. How do we get those deleted records deleted from our data?

I have with success used the following method.

The scenario is that I have a large table that I am loading incrementally:

p1.png

p2.png

To figure out which records has been deleted, I load the primary key fields from the table into the same staging area:

p3.png

I have selected not to have a physical valid table on the “DeleteCheck” table:

p4.png

The “DeleteCheck” table will contain the complete list of records that still exist in the source system. The records are uniquely identified by the primary key field(s).

Create a Script Action that deletes the rows in the Incrementally loaded that no longer exists in the “DeleteCheck” table. 

 p5.png

Set the Script Action as a Post Data Cleansing Script on the Incrementally loaded table:

p6.png

To ensure the correct execution order in managed Thread Execution, remember to setup table dependencies on the incrementally loaded table, so it becomes dependant on the “DeleteCheck” table:

p7.png

Note: If you want to do “Soft Delete” instead of physically removed deleted rows from the table, then you could change the script into an update statement instead – and update an “DeletedDateTime” column.

Warning! Be careful about executing the table individually. If the script is executed and there is no records at all in the “DeleteCheck” table, then all records in the incrementally loaded table is deleted and a full load is the only recovery.

Was this article helpful?
0 out of 0 found this helpful
Have more questions? Submit a request

2 Comments

  • 0
    Avatar
    Charles Lorenz

    This was a very helpful and awesome document.

  • 0
    Avatar
    Jarle Opstad

    Any plans of automating this?

    In some cases I have run the check directly against the source-table. Seems to work ok if the source table has index on the primary key used.

Please sign in to leave a comment.