How incremental loading works

There are two forms of incremental loading in TX DWA - source-based and target-based.  These two forms have different characteristics, and come with certain trade-offs.  Determining when and where to use these incremental techniques is much easier if you understand the basics of how they work.

Source based incremental loading

This form of incremental loading is the faster of the two, and so is used in most cases.  

Enabling this option requires that the source table have a reliable natural or surrogate key, and a reliable incremental field.  This field must be set to a larger value when a record has changed.  Usually, this will be a single field called something like "ModifiedDateTime" or "TimeStamp."

When you enable this incremental option and deploy and execute the table, TX DWA will fully load the table and create an _I table, which records the largest incremental field value from a given load.  This allows each subsequent load to only transfer records with an incremental value larger than the one recorded in the _I table, which is the set of new or updated records.  

If you are using certain adapters TX DWA may create an _INCR table instead of an _I table.  The _INCR table works on the same principle that an _I table does, but records incremental values for each source company or account in the source system. 

After the initial load, the valid table will not be truncated unless there is a change in a data selection rule, incremental selection rule, or primary key.  A full reload can also be triggered by the user.

During data cleansing, TX DWA will compare the transferred records to the records in the valid table using on the table's primary key.  Existing records will be updated and new records will be inserted, in that order.

Note that unchanged records will not be refreshed in an incremental load.  That means means transformations, lookups and new fields will only be updated for new or changed records.  We have listed some ways of dealing with this limitation here

If you have enabled deletes, TX DWA will create a _PK table when you deploy this option.  This table maintains a live record of all primary key combinations present in the source, and is populated after the _R table.  If a record is present in the valid table but not in the _PK table and _R table, that record will be flagged or removed. This step happens after the update and insert steps mentioned above.

Target based incremental loading

This incremental option also requires that the source table have a reliable natural or surrogate key, but does not rely on an incremental field or fields.  Instead, target based incremental load detects changes in one or more specified non-key fields and updates any existing records accordingly.

This method can be slow, but it may give a performance boost over a full load in certain circumstances.  It also allows the user to choose whether the load inserts new records, updates existing records, deletes records no longer present in the source, or any combination of the three.  In some cases, this flexibility can be extremely valuable.  

When you enable target based incremental loading, TX DWA will fully load the table and add two additional fields - a hash of the table's primary key called the Incremental Hash Key, and a hash of the fields you wish to detect changes in, called the Incremental Value Key.  

Just as with source-based incremental loading, the table will be populated after the first load and will remain populated moving forward, barring the changes listed in the previous section or a user command.

Unlike source based incremental loading, all records will be copied over from the source.  

During data cleansing, these records will then be compared to the valid table on their Incremental Hash Key.  Where the records exist and the Incremental Value Key has changed, that record will be updated.  Where there is no match for the Incremental Hash Key in the new data set, the record will be inserted.  Finally, where the Incremental Hash Key is present in the valid table, but not in the new data, the record will be deleted.

One, two, or all of these functions may occur depending on how the incremental load is configured, but they will occur in the order listed above.

Again, new fields will not be populated nor old fields refreshed in existing records where no change is detected in the source.

Was this article helpful?
1 out of 1 found this helpful


Please sign in to leave a comment.