Solved

Incremental loading on table using a custom table insert with a union

  • 9 March 2023
  • 8 replies
  • 197 views

Badge

I have 3 tables in my src data warehouse:

  • src.SalesOrderItemsDelta, this table gets filled every day with a delta (changes today vs yesterday) of our order lines. Via 

     

  • src.SalesOrderItems, this table gets filled every Saturday night with all the order lines available.

     

  • src.preFactSalesOrderItems, this table gets filled via a custom table insert with the following table insert:
    SELECT
    [SalesOrderItemID]
    ,[SapClient]
    ,[SalesOrderNumber]
    ,[SalesOrderItemNumber]
    ,[SalesOrganization]
    ,[DistributionChannel]
    ,[Division]
    ,[FaboryArticleNumber]
    ,[SoldToCustomerCode]
    ,[BinCode]
    ,[CreatedOnDate]
    ,[CreatedOnDateID]
    ,[ChangedOnDate]
    ,[ReasonForRejectionCode]
    ,[PromisedDeliveryDate]
    ,[PromisedDeliveryDateID]
    ,[CommunicatedDeliveryDate]
    ,[CommunicatedDeliveryDateID]
    ,[PlantCode]
    ,[SalesAmount]
    ,[SalesCurrency]
    ,[ItemCategoryCode]
    ,[OrderedQuantity]
    ,[BinQuantity]
    ,[SalesOrderCategoryCode]
    ,[SalesOfficeCode]
    ,[ConfirmedDeliveryDate]
    ,[ConfirmedDeliveryDateID]
    ,[CommittedDeliveryDate]
    ,[CommittedDeliveryDateID]
    ,[RequestedDeliveryDate]
    ,[RequestedDeliveryDateID]
    ,[DeliveryStatusCode]
    ,[ShipToCustomerCode]
    ,[DocumentType]
    ,[CustomerPONumber]
    ,[WebshopOrderID]
    ,[PurchaseOrderType]
    ,[SalesDocumentType]
    ,[Location]
    ,[BinMainID]
    ,[MaterialSalesID]
    ,[MaterialGeneralID]
    ,[MaterialPlantID]
    ,[SoldToCustomerID]
    ,[CustomerArticleNumberID]
    ,[DW_TimeStamp] AS [srcDW_TimeStamp]
    ,[OrderReasonCode]
    ,[OrderReasonDescription]
    FROM [SalesOrderItemsDelta]
    UNION
    SELECT
    [SalesOrderItemID]
    ,[SapClient]
    ,[SalesOrderNumber]
    ,[SalesOrderItemNumber]
    ,[SalesOrganization]
    ,[DistributionChannel]
    ,[Division]
    ,[FaboryArticleNumber]
    ,[SoldToCustomerCode]
    ,[BinCode]
    ,[CreatedOnDate]
    ,[CreatedOnDateID]
    ,[ChangedOnDate]
    ,[ReasonForRejectionCode]
    ,[PromisedDeliveryDate]
    ,[PromisedDeliveryDateID]
    ,[CommunicatedDeliveryDate]
    ,[CommunicatedDeliveryDateID]
    ,[PlantCode]
    ,[SalesAmount]
    ,[SalesCurrency]
    ,[ItemCategoryCode]
    ,[OrderedQuantity]
    ,[BinQuantity]
    ,[SalesOrderCategoryCode]
    ,[SalesOfficeCode]
    ,[ConfirmedDeliveryDate]
    ,[ConfirmedDeliveryDateID]
    ,[CommittedDeliveryDate]
    ,[CommittedDeliveryDateID]
    ,[RequestedDeliveryDate]
    ,[RequestedDeliveryDateID]
    ,[DeliveryStatusCode]
    ,[ShipToCustomerCode]
    ,[DocumentType]
    ,[CustomerPONumber]
    ,[WebshopOrderID]
    ,[PurchaseOrderType]
    ,[SalesDocumentType]
    ,[Location]
    ,[BinMainID]
    ,[MaterialSalesID]
    ,[MaterialGeneralID]
    ,[MaterialPlantID]
    ,[SoldToCustomerID]
    ,[CustomerArticleNumberID]
    ,[DW_TimeStamp] AS [srcDW_TimeStamp]
    ,[OrderReasonCode]
    ,[OrderReasonDescription]
    FROM [SalesOrderItems]
    WHERE [SalesOrderItemID] NOT IN (SELECT [SalesOrderItemID] FROM [SalesOrderItemsDelta]) AND [DW_TimeStamp] >= DATEADD(HOUR, -24, GETDATE())

In my STG data warehouse we have the table stg.FactSalesOrderItems which loads the src.preFactSalesOrderItems via incremental loading. During the weekend execution we have full load enabled for this table.

 

But the problem we see is that the stg.FactSalesOrderItems only loads data during the weekend, although the information is available in src.preFactSalesOrderItems.

Any ideas how to solve this?

 

 

icon

Best answer by sven.debie 3 May 2023, 12:47

View original

8 replies

Badge

Additional information, the data is really there in src.preFactSalesOrderItems:

 

Userlevel 6
Badge +5

Hi Sven

My colleague is attempting to replicate the specific issue you experience with the union in a custom table insert. It should work once it is in the _V table with PK values set and a date time field to base the incremental load on.

I have another idea.

If you use normal table inserts like this.

I added a custom IncrementalTimeStamp field and I mapped the DW_TimeStamp field from the source.

I made an extra field in the Full load table where I created a field that contained a small query that gives back the current day. I applied this rule to the one insert so it will only be allowed to run on Saturdays.

Then whenever you execute it it will contain the delta load except on saturdays where it will be overwritten with the full load.

This should work for incremental load if you set it up like so.

 

Badge

This is similar to my setup, except I use a custom table insert. But the src.preFactSalesOrderItems contains all the right data. But the incremental load rule filters out too much apparently, so it doesn't load anything.

Userlevel 6
Badge +5

Hi
I can see that my setup should not work, for some reason it works fine, it really should be the IncrementalTimeStamp field I base my incremental rule on in the MDW area.

It should look like so.

1

2

3

I changed the rule to add the missing rows.

1

2

3

So this setup works for me. It adds the new rows correctly. Once it hits Saturday it will run the other insert as well and it should add all rows to the table with a higher date and add it to the MDW table.

Badge

The example given will not work. As src.SalesOrderItems will only be loaded on Saturday. So the contents of that table will be the same for the rest of the week and will be updated again in the following week. So it will do a full load every day instead of an incremental load.

Userlevel 6
Badge +5

Hi @sven.debie 

We had some back and forth in teams, where I seem to remember that you found a solution/workaround of this issue.

Just so we could find a post to set as the solution to this.

 

Badge

We have tried the suggested solutions and it didn't work for us, so we created a pre-processing step via databricks and then put the outcomes into a separate table and load the increments created via databricks from that table.

Userlevel 6
Badge +5

Regarding incremental load and multiple mapped tables, we have a guide that explains how it would be done when two tables are mapped into an data area.

 

Reply