Solved

Incremental load based on history table with soft deletes

  • 26 April 2023
  • 13 replies
  • 327 views

Userlevel 2

I have table A with history enabled, all columns SCD type 2 except the PK and a record is marked as deleted when deleted in the source (with a seperate record).

In the next data area / data warehouse (DSA) i want to create an incremental table B based on table A, but I do not want to have deleted records from A in table B.
When follwing the incremental rule wizzard the incremental selection rule can be set based on the incrementaltimestamp but deletes can not be handled because table A has no deletes (is thomstone =1).

How to solve this the easiest? 
I can only think of taking along the Isthombstone field to table B and delete the records in a post script but this should be possible musch easier I would say.

Suggestions?

 

icon

Best answer by Roy V 9 May 2023, 11:14

View original

13 replies

Userlevel 5
Badge +5

Hi Roy

I am not 100% sure if this is your issue, but here is what I would do if I applied incremental load to a history table using deletes.

I have this history table with the following content.

So to avoid transferring the deleted rows, I made a data selection rule on my mapped table, like this.

This gave the following result.

 

Userlevel 2

Thanks Thomas.

But now one of the active records in DWh.History gets deleted so [Is Thomstone] will be set to 1.

Because of the selection rule in the MDW table this deleted record will not be passed to the MDW table.

So the MDW table has an active record that is deleted in the History table.

The question is how to achieve this record is deleted in the MDW table as it should not be there any more.

 

Userlevel 5
Badge +7

Hi,

because the last table in the chain is incremental it will not perform any such postprocessing on the valid table. You would need to write a post-process script to handle this I think. Note that if the tables are large this kind of processing becomes expensive.

Userlevel 2

thanks Rory, problem is indeed this delete afterwards is expensive and timeconsuming.
An MDW view based on the MDW table with a filter to exclude the deleted records from the source might be the best solution then wihen it is the last table in the chain (with [Is Thombstone] from the HIST table as a column in the MDW table in the example from Thomas) 
.

Userlevel 5
Badge +5

Hi Roy

Are you saying that rows are deleted from the source that should not be deleted?

I mean you say active records become tombstone, which means that it will have all data except the primary keys removed.

This was also part of my test and the row did get removed from my incremental table.

I then made it return and it would again be added to table b as it got a new DW_TimeStamp.

Userlevel 5
Badge +7

You could add a Custom View in the DSA on top of the original table with the filter you require and incrementally load this to MDW (as long as you have an appropriate field for Incremental Selection Rule).

Userlevel 2

so you are saying Thomas:
that an active record (with [HIST].[SCD Is Current] = 1)  in the first run is added to the incremental table in MDW.

That record is removed in the source so a new record is added in HIST with [SCD Thombstone] = 1.

When the MDW incremental table is processed again that record is deleted from the incremental table?

So when you set handle hard deletes on an incremental table it uses the ‘ SCD Is Thombstone’  on the source table if it has that column? 
 

Userlevel 5
Badge +5

Hi Roy

I missed explaining an option, which may be what makes it work for you or not.

I had to use SCD Is current in the selection rule as well, for it to be updated after it was removed initially.

 

If you still want rows that is not current in your incremental table and at the same time remove them when they are both, I would add an custom field, set to type 0, which you then update with a external script so it contains an 1 when this is the case.

 

This gives some other issues, such as primary key violation errors.

 

So I don’t know if it can work with IS Current not accounted for.

You can see how I tested it with these two projects. Add init first and run it once, then add changed and apply it to the existing project and see how it behaves with the script applied.

Userlevel 2

thanks Thomas , i will do some tests as well to see what happens in certain situations.
 

with incremental delete handling i would expect it compares primary keys in source and destination but it would be great that in case the source table is a historic table that it compares primary key in combination with the Thombstone field with the primary keys in the destination table

Userlevel 5
Badge +5

Hi Roy

I mean the other suggestions might work as well, I just found it odd that it could not work with the default options.

For keeping the non current rows in the incremental table, you would need to use an alternate primary key, possibly a supernatural key the comprises of all these fields. Then it could add them, but if you ever ran a full load they would be removed from the incremental table.

Userlevel 2

Hi Thomas, 

I did a test and the behaviour of TimeXtender is as I want it to be.
For determining which records should be deleted from the incremental table TimeXtender uses the primary keys of the mapping table with the selection rules applied.   
So in the DSA. etl.TESTROY_ICL table I did a run and a record is now marked as deleted, When execution my incremental table (see below) the incremental delete in the end checks if all the primary keys are available in the DSA with the filter SCD Is Current = 1 and Tombstone = 0.
As Thomstone = 1 (in the DSA) the record is deleted from the incremental table, as I hoped but did not know for sure yet but now I do. 
 

Of course if I remove the Tombstone Equal 0 part also removed records will remain but in this case that is not what I wanted.

Userlevel 5
Badge +5

Hi Roy

So you got it to work for you, that is good.

While I can’t see it in the image, I assume the history table also runs with incremental load.

I set your response to best answer. I hope you agree with that.

Badge +2

Hi,
we have a simular issue where we want the Tombstone flag (from the BU) to be loaded into the DSA and MDW table (as a soft delete marker).
We have History (all type I) on the BU ODX table with soft delete (tombstone flag).
On DSA we setup incremental load based on BU ODX DW_Timestamp.  Now it seems that setting the Tombstone flag on the BU ODX table doesn’t update the BU ODX DW_Timestamp, and as such this change is not loaded into DSA (incremental rule will not pick up this record).
To my opinion the DW_Timestamp column should also be updated when the Tombstone flag is set.  That would solve the issue.
Any ideas ?

Reply