Follow

How to configure segmented table executions

When backloading very large tables it may become necessary to load a table in smaller segments or "chunks". The following instructions explain how you can configure a segmented table execution in Discovery Hub using a combination of project variables, data selection rules, and Incremental load. These instructions use an example table called [InvoiceLines] and an example date field called [LastEditedWhen]. 

Prerequisites

In order to support this configuration, you must enable Incremental Load, so the table must have a reliable primary key and field with an incremental value that increases with each insert AND update. This is typically a ModifiedDate field, but other data types may also work as well.

Configuration 

  • Enable Incremental Load on the table you wish to configure segmented table execution, in this example [InvoiceLines]. The Incremental selection rule MUST be the SAME FIELD used in the project variable in the next step, in this case [LastEditedWhen]. At this point, you should deploy the table, but don't execute.
  • This configuration will load data based on the maximum value available in the ODX. So if you prefer to backload all available data you will want to truncate the table during the next execution. 
  • Create a Dynamic Project Variable called [DynamicCeiling]

2019-01-29_11h35_33.png

  • Type: Dynamic, Resolve: Every Time, Context: ODX

2019-01-29_11h44_38.png

  • Click "Script Editor" and insert this script: SELECT ISNULL(DATEADD(month, 1, MAX([LastEditedWhen])),'2000-01-01') FROM [InvoiceLines]

2019-01-29_11h43_51.png

This script will select the greatest value in the [LastEditedWhen] field in the [InvoiceLines] table in the ODX, and add 1 month (This is the segment of data that will be loaded during each execution. You may adjust this increment as desired). If the table is empty it will use the static date of 2000-01-01. This static date should be updated to your desired value, hypothetically 2 weeks after the earliest ModifiedDate in the source table. NOTE: The date field used in this script must be the same field used as the incremental selection rule on the table. 

  • Be sure to click the “show value translation” radio button to validate the script is working correctly. If the table is empty, the result should be your static value. If correct click OK and return to the project.

2019-01-29_12h01_28.png

  • Add a Custom Data Selection Rule on the desired table [InvoiceLines] in the data source.

2019-01-29_12h05_42.png

  • Drag in the same Date field used previously from the right side pane, type "<=" (less than or equal), then drag in the project variable from the right side pane. IMPORTANT: you must place single quotes around the variable in the script. Your custom data select rule should look like this:

2019-01-29_13h05_46.png

With this configuration, during every execution, discovery hub will check for the maximum value in this date field in the ODX. It will then add 1-month to that date and only load records within that segment. Loading 1 additional month of data with each subsequent execution. Your source table should now look like this: 

2019-01-29_13h08_44.png

NOTE: If you DO NOT wish to load the table from the earliest date available in the source you must add an additional data selection rule to the table. For example, if you only wanted to load data after the year 2004 you could add an additional constraint to the existing data selection rule such as LastEditedWhen <= 2005-01-01. 

2019-01-29_13h17_45.png

Execution

  • Include all three tables (ODX, DSA, and MDW) in a single execution package. With this configuration, all three tables will load the same segment of data configured in the ODX. 
  • Finally, you can schedule this execution package to recur using your desired frequency. After enough subsequent executions, this table will be fully backloaded with all data in the source. 
Was this article helpful?
1 out of 1 found this helpful

2 Comments

  • 0
    Avatar
    Mark Chinsky

    So I've gotten this working using an integer my ERP solution creates as an incremental number for every record created or modification in the system among all tables.  That is what I use for my incremental loads to ensure I get new and modified records.

    That has always worked but is causing a problem with this.

    The problem is this is a part transaction history table.  This client, did some massive costing adjustments 8 years later on a large range of transactions.  Prior to this, the last sysrevid was about 20,788,887.

    Then suddenly in the file it jumps to 59,827,490.  This far exceeds the chunk 'increment'.  I don't have a 'last edited' date.

    If I increase the chunk increment big enough to handle this jump, it causes too many transactions to load per chunk prior to this large 'gap'.

    Any thoughts or is there a SQL formula I could use?  I do have a system creation date & time, and a sysrowid, which is a unique number when created, but the article says it must be the same as used by the incremental load and neither one of those will work for modified rows.

    Thoughts?

     

    Edited by Mark Chinsky
  • 0
    Avatar
    Lennaert van den Brink

    In reply to Mark's comment: I had a similar issue with a client of mine and there I solved it by running the chunking until it stalled at the gap, then manually adjust the dynamic variable to make a bigger jump and execute once. After that I reverted the dynamic variable back to the smaller segments and let it continue as before.

    In addition to the solution itself: With large tables the max(lasteditwhen) can become very slow as the table itself is getting larger and larger. One solution I found to optimizing this is when you are able to load incrementally (with a table that large you should strive to do that anyway). In that case you can use the increment value in the _I table as a replacement for the lasteditwhen field since those will give the same result. A max over 1 record is significantly faster then a max over several million...

Please sign in to leave a comment.