Incremental load in Data Warehouse Instances


Relates to TimeXtender 6024.1 and later versions

Incremental Load Rules in a Data Area

Tables in a Data Area can have several mappings, and each mapping has its own incremental load rules instead of one overarching rule. This means that a table with multiple mapped ODX tables can have one mapped table run a full load, another run an incremental load with deletes, and a third run an incremental load with updates.

Incremental loading takes the max batch number from the incremental table and the source table, then compares it with the primary key table (i.e. the _PK table) to determine whether there are updates, deletes, or new rows.
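As a rough illustration of that comparison, the sketch below uses the hypothetical [NAV].[Country] names from the example later in this article and simplified field names; the code TimeXtender actually generates differs.

-- Sketch: identify new rows by batch number (hypothetical names).
-- The last processed batch is recorded in the incremental (_I) table;
-- rows in the raw (_R) table with a higher batch number are new.
DECLARE @LastBatch BIGINT =
(
    SELECT ISNULL(MAX([ODX Batch Number]), -1)
    FROM [NAV].[Country_I]
);

SELECT R.*
FROM [NAV].[Country_R] R
WHERE R.[ODX Batch Number] > @LastBatch;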

To set up incremental load for a table in a Data Area, you first need to set up incremental load on the data source in the ODX instance to which the table is mapped. For more information, see Setting up Incremental Load in an ODX Instance.

Setting up incremental load on tables

Mapping a table from an ODX source to a Data Area table

When you map in a new table from an ODX, the data extraction setting for the table is set to Automatic. If the table has an incremental load rule applied in the source, the table will load incrementally. This is represented by an "I" icon on the mapping in the Mappings folder under the table.

Below is an example of a table that is brought in from the ODX to the Data Warehouse instance, where the mapped table has an incremental load rule applied in the ODX data source.
 

You can enable soft or hard deletes by right-clicking the table, selecting Table Settings, and opening the Data Extraction tab. There is no need to change the data extraction setting from Automatic to Incremental Load, since the table already loads incrementally based on the incremental mapping, but you can change it to Incremental Load if you wish.

If no incremental load rule is applied in the source, the mapping for the table will not display an "I" icon, and a full load will occur when the table is executed. See the example below.

Data Extraction is still set to Automatic by default, but since there is no incremental rule applied in the source, the table mapping is not incremental and the table will automatically be fully loaded. There is therefore no need to change the data extraction setting from Automatic to Full Load, but you can change it to Full Load if you wish.

If you want to force a table with an incremental load mapping to always run a full load, you can change the setting to Full Load here. Doing this will gray out the delete options.

Mapping multiple tables to a Data Area table from multiple ODX tables

Incremental rules are set on each individual mapped table, so you can have one mapping running with incremental load and another without, with no issues. This does not require changing the setting from Automatic.

 

Mapping tables from another Data Area

If you are mapping a table to a table from another Data Area and the source table has an incremental rule set up in the ODX, we recommend using the Automate feature to base the incremental rule on the auto-suggested IncrementalTimeStamp field. This IncrementalTimeStamp field is mapped to the DW_TimeStamp field of the source table. If the source table does not have an incremental rule set up in the ODX, we recommend setting up the incremental rule on a date field (e.g. a modified date field) instead.

When the rule has been applied, the incremental load table will contain the incremental rule.

This requires that the data extraction setting is set to Incremental Load and not Automatic.
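Conceptually, such a rule only transfers rows whose DW_TimeStamp is newer than the highest value moved in the previous execution. A minimal sketch of that selection, assuming hypothetical [Staging].[Country] names and a stored last value:

-- Sketch of a DW_TimeStamp-based incremental selection between Data Areas.
-- @LastValue stands in for the incremental value TimeXtender stored after
-- the previous execution; all names here are hypothetical.
DECLARE @LastValue DATETIME = '2022-12-01T00:00:00';

SELECT S.*
FROM [Staging].[Country] S
WHERE S.[DW_TimeStamp] > @LastValue;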

If you are mapping more than one table and have these set up for incremental load, it will still create one rule per mapped table.

If a table has one mapping from an ODX instance and another mapping from a table in another Data Warehouse instance, it is not possible to use incremental load, because one uses the DW_TimeStamp field and the other uses an ODX batch number. You will have to split them into two tables and then merge them at a later point.

The incremental table

The _I table has seven fields; a sketch of its approximate schema follows the field list below.
 

  • Last id field
    • This is the max DW_Id of the table before the latest execution.
  • Mapping id field
    • This is the GUID value of the mapping between the two tables. It is used to make sure the correct table is always used.
  • Status field
    • If you open the _I table during execution, this field will contain a NULL value; after a successful execution it is set to OK.
  • Create time field
    • This is the time when the incremental value was added for the table.
  • ODX Batch Number field
    • This is the reference to the batch number in the _R table, used to determine the largest batch.
  • ODX Batch Timestamp field
    • This is the creation date of the latest full load folder (for a data lake store) or table (for a SQL store), depending on the store type.
  • ODX Batch Number Full Load field
    • This shows whether an execution was a full load or not.
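Put together, the _I table's shape is roughly as follows. This is a sketch only: the column names below are inferred from the field descriptions above, and the DDL TimeXtender generates may differ.

-- Approximate shape of an incremental (_I) table; column names and
-- types are inferred from the field descriptions and may not match
-- the generated DDL.
CREATE TABLE [NAV].[Country_I]
(
    [Last id]                    BIGINT           NULL, -- max DW_Id before the latest execution
    [Mapping id]                 UNIQUEIDENTIFIER NULL, -- GUID of the table mapping
    [Status]                     NVARCHAR(10)     NULL, -- NULL while executing, 'OK' afterwards
    [Create time]                DATETIME         NULL, -- when the incremental value was added
    [ODX Batch Number]           BIGINT           NULL, -- largest batch number in the _R table
    [ODX Batch Timestamp]        DATETIME         NULL, -- creation date of the latest full load
    [ODX Batch Number Full Load] BIT              NULL  -- whether the execution was a full load
);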

Here is how it looks when two tables are mapped.

You can also see here that the table has been executed at least twice but has not received new data from the source, which is why the ODX fields contain NULL values.

 

The raw table

Two fields are added when adding data from an ODX, to conform to how incremental load works; they are sketched after the list below.

  • ODX Batch Number field
    • The reference batch number from the incremental table
  • ODX Batch Id field
    • The id of the batches. This is necessary because you can map more than one table, each of which has a batch 0, and you would not be able to tell them apart otherwise.
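In a SQL store, these surface as two extra columns on the raw table, roughly as below (names approximated from the descriptions above):

-- Approximate ODX fields on a raw (_R) table; the generated names
-- and types may differ.
ALTER TABLE [NAV].[Country_R]
    ADD [ODX Batch Number] BIGINT NULL, -- reference batch number from the _I table
        [ODX Batch Id]     INT    NULL; -- distinguishes batches from different mappings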

Here is how it looks when there is new data in the raw table.

If you relate that to the incremental table, the batch numbers are the same.

Here is what the reference looks like in the data lake. Each file has a higher number, which is the batch number in the other tables.

Here is what the reference looks like in a SQL store: a field containing the batch number.
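A quick way to see that relationship in a SQL store is to count rows per batch in the raw table (hypothetical table name again):

-- Count the rows per batch in the raw table.
SELECT [ODX Batch Number], COUNT(*) AS [Rows]
FROM [NAV].[Country_R]
GROUP BY [ODX Batch Number]
ORDER BY [ODX Batch Number];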

Changes to the valid table

The only change is that, since we are not using a specific field in the source to generate an Incremental field, the valid table does not contain one.
 

Deletes

As in the other parts, it is also possible to set up deletes on the Data Area tables.

It works by comparing the new rows with the primary key table and then either deleting the row (hard delete) or changing the Is Tombstone field value from False to True (soft delete).

 

You can see how this is done in the code by looking at the data cleansing procedure.

Here it is for the Country table, using hard deletes.

-- Incremental load: primary key hard deletes

IF EXISTS
(
    -- Only run deletes when the table has incremental load history
    SELECT TOP 1 1
    FROM [NAV].[Country_I]
)
BEGIN
    DELETE V
    FROM [NAV].[Country] V
    WHERE
        -- The key is no longer present in the primary key (_PK) table...
        NOT EXISTS
        (
            SELECT TOP 1 1
            FROM [NAV].[Country_PK] P
            WHERE P.[Company] = V.[Company]
                AND P.[Code] = V.[Code]
        )
        -- ...and it has not arrived as a new row in the raw (_R) table
        AND NOT EXISTS
        (
            SELECT TOP 1 1
            FROM [NAV].[Country_R] R
            WHERE R.[Company] = V.[Company]
                AND R.[Code] = V.[Code]
        )
END
You can also see in the same procedure how the updates and soft deletes are applied.
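For comparison, a soft delete follows the same pattern but flags the row instead of removing it. A minimal sketch, reusing the same hypothetical Country table (the generated procedure may differ in detail):

-- Sketch of the soft delete variant: rows that have disappeared from
-- the source are kept but marked as tombstones.
UPDATE V
SET V.[Is Tombstone] = 1
FROM [NAV].[Country] V
WHERE
    NOT EXISTS
    (
        SELECT TOP 1 1
        FROM [NAV].[Country_PK] P
        WHERE P.[Company] = V.[Company]
            AND P.[Code] = V.[Code]
    )
    AND NOT EXISTS
    (
        SELECT TOP 1 1
        FROM [NAV].[Country_R] R
        WHERE R.[Company] = V.[Company]
            AND R.[Code] = V.[Code]
    );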
