Solved

Trying to Understand Incremental Performance

  • 8 January 2020
  • 2 replies
  • 31 views

So based on what I'm reading about TX, it seems to me that even if only say 5 records changed on a source file, TX loads the entire source table into a Raw table (all 40 gigs of it) and then finds the 5 records.

Is there a way to get it to just pull the 5 changed records into TX?  
My incremental executions seem to take about 50% of the time of a full execution even though there is little to no data change.

icon

Best answer by JTreadwell 8 January 2020, 23:55

View original

2 replies

Userlevel 3
Badge +5

Hi Mark, 

Thanks for posting. With incremental load configured correctly, given your scenario, it will only load the 5 records. You can see how many records have been transferred by previewing the raw table. Right click the table, click preview table, then click the "instance" drop-down and click Raw. This will only include the records transferred during the last load. 

You can also use the "execution overview log" also in the right click menu. This will show the time of the last select few executions. You can also switch the "measures" drop-down to show raw row counts or valid row counts.

You say 50% of the execution. What is the time for a full load vs an incremental load and how many records are we dealing with in both loads?

Badge

To add to this, one thing that might be contributing to your slower load times is if you have deletes enabled for the incremental load.  If that's the case, the load should only bring over the five new or updated records to put in the raw table.  However, it will also need to perform extra steps to determine if a record has been deleted. 

To do this, Discovery Hub brings in a copy of all of the primary keys in the source system table, and compares this key list against the local copy of the table.  For large tables, this can be a time-intensive process.  If you have a very narrow table with a composite key, using one of the delete options could potentially lead to the decrease in incremental performance time you are reporting.  

If that's the case, and if this table is very large and changes in it are very rare, a periodic reload of the table during a weekend or other downtime window might be a better solution than checking for deletes every single load. 

Reply