Hey all,
We use incremental load in the ODX on a datetime field, with an offset of 3 days (also configured in the ODX). We've noticed that when we load a table from the ODX to the data warehouse (even with a full load), the _R table can contain duplicate records.
In our Data Lake, each increment is a parquet file containing the last 3 days of data; since we load every day, consecutive files can overlap by 2 days. I believe this overlap is the cause of the duplicates in our _R table.
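To make the overlap concrete, here is a minimal sketch (not TimeXtender code, just an illustration with a hypothetical `extract_window` helper) of how a 3-day extraction window combined with daily runs re-extracts the same days twice:

```python
from datetime import date, timedelta

# Hypothetical model of the daily extraction: each run pulls rows whose
# datetime falls within the last 3 days (the ODX offset), so two
# consecutive runs overlap by 2 days and re-extract the same rows.
def extract_window(run_day: date, offset_days: int = 3) -> set[date]:
    return {run_day - timedelta(days=d) for d in range(offset_days)}

run_1 = extract_window(date(2024, 1, 3))  # covers Jan 1, 2, 3
run_2 = extract_window(date(2024, 1, 4))  # covers Jan 2, 3, 4

overlap = run_1 & run_2  # Jan 2 and Jan 3 land in both parquet files
print(sorted(overlap))   # rows from these days appear twice across files
```

If both parquet files are read into the Raw table without deduplication on the primary key, every row from the overlapping days shows up twice, which matches the symptoms we see.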
What happens next is that the error report gets spammed with "Violation of primary key constraint" messages (which are accurate), although our Valid table does end up with the expected result.
A full-load transfer task resolves the issue, but of course we don't want to run that every time.
Is this the intended design? Is there a documentation page that explains how the ODX handles these parquet files before inserting them into the Raw table?
Any suggestions or insights on how to handle this issue would be greatly appreciated.