Hey all,
We use incremental load in the ODX on a datetime field, with an offset of 3 days (also configured in the ODX). We've noticed that when we load a table from the ODX to the data warehouse (even with a full load), the _R table can contain duplicate records.
In our Data Lake, each increment is a parquet file containing the last 3 days of data; since we load every day, consecutive files can overlap by 2 days. I believe this overlap is the cause of the duplicates in our _R table.
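To make the overlap concrete, here is a minimal sketch (not TimeXtender code, just an illustration with a hypothetical `extract_window` helper) of how a 3-day extraction window combined with daily runs re-extracts the same days twice:

```python
from datetime import date, timedelta

# Hypothetical model of the daily extraction: each run pulls rows whose
# datetime falls within the last 3 days (the ODX offset), so two
# consecutive runs overlap by 2 days and re-extract the same rows.
def extract_window(run_day: date, offset_days: int = 3) -> set[date]:
    return {run_day - timedelta(days=d) for d in range(offset_days)}

run_1 = extract_window(date(2024, 1, 3))  # covers Jan 1, 2, 3
run_2 = extract_window(date(2024, 1, 4))  # covers Jan 2, 3, 4

overlap = run_1 & run_2  # Jan 2 and Jan 3 land in both parquet files
print(sorted(overlap))   # rows from these days appear twice across files
```

If both parquet files are read into the Raw table without deduplication on the primary key, every row from the overlapping days shows up twice, which matches the symptoms we see.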
What happens next is that the error report gets spammed with "Violation of primary key constraint" messages (which are accurate), although our Valid table does end up with the expected result.
A full-load transfer task resolves the issue, but of course we don't want to run that every time.
Is this the intended design? Is there a documentation page that explains how the ODX handles these parquet files before inserting them into the Raw table?
Any suggestions or insights on how to handle this issue would be greatly appreciated.