We are using an Azure Data Lake Gen2 storage container as our Ingest instance.
I have set up a load job (incr_load) from a SQL Server database to the Ingest instance.
This works fine during the day, but when we get into the next day the same job does a full load of my tables in the database. Why is this happening?
I really need to solve this asap :-)
regards,
Bjørn A.
Hi @bjorn
I have some questions.
The full load option needs to be unchecked, and since that is the default I would assume it is already set that way. Right?
How is the incremental rule set up for the data source?
When a full load happens you will see a new DATA_<some datetime> folder that your parquet files are stored in. Do you see this when it starts to do full loads?
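Once you get access to the storage container, a quick way to check this is to list the folders under the table's directory in the data lake. Here is a rough sketch in Python using the azure-storage-file-datalake package; the account name, container name, and table folder path are placeholders you would have to swap for your own setup.

```python
# Sketch: list the DATA_<datetime> folders for one table in the ADLS Gen2 container.
# Assumes: pip install azure-storage-file-datalake azure-identity
# <account>, <container> and the table folder path are placeholders for your setup.
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

service = DataLakeServiceClient(
    account_url="https://<account>.dfs.core.windows.net",
    credential=DefaultAzureCredential(),
)
file_system = service.get_file_system_client("<container>")

# Each full load writes a new DATA_<datetime> folder, so new folders every
# morning would confirm that the job is doing full loads overnight.
for path in file_system.get_paths(path="<path to the table folder>", recursive=False):
    if path.is_directory:
        print(path.last_modified, path.name)
```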
Hi @Thomas Lind,
I have added a transfer task like this
My setup for incremental load is like this
I run this job/task through the TimeXtender O&DQ Desktop
I need to talk to my client to get access to the storage container; I am working on that…
regards,
Bjørn A.
Hi @bjorn
Why don’t you check for Updates?
Updates are necessary for the incremental load table to be able to compare rows that were already transferred once before. Otherwise it will just add the new rows without checking whether they already exist.
The requirement for Updates is that you are sure the tables have primary keys, and that the corresponding fields in the Prepare Data Area table are also set as primary key fields.
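If you want to verify the primary key side on the source itself, something like the script below lists the primary key columns per table, so you can confirm that every incrementally loaded table actually has one. This is only a sketch using pyodbc with a placeholder connection string, not something TimeXtender needs in order to work.

```python
# Sketch: list the primary key columns per table in the source SQL Server database,
# so you can confirm the incrementally loaded tables actually have primary keys.
# Assumes: pip install pyodbc. The connection string is a placeholder.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=<server>;DATABASE=<database>;Trusted_Connection=yes;"
)

sql = """
SELECT kcu.TABLE_SCHEMA, kcu.TABLE_NAME, kcu.COLUMN_NAME
FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS AS tc
JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE AS kcu
  ON kcu.CONSTRAINT_NAME = tc.CONSTRAINT_NAME
 AND kcu.TABLE_SCHEMA = tc.TABLE_SCHEMA
WHERE tc.CONSTRAINT_TYPE = 'PRIMARY KEY'
ORDER BY kcu.TABLE_SCHEMA, kcu.TABLE_NAME, kcu.ORDINAL_POSITION
"""

# Print schema.table: column for every primary key column found.
for schema, table, column in conn.cursor().execute(sql):
    print(f"{schema}.{table}: {column}")
```

Updates will only work for the tables that show up in this list, since it needs a key to match existing rows.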
Thanks @Thomas Lind,
I will add “Updates”, run the task, and check the status tomorrow morning.
best regards,
Bjørn A.
Hi @bjorn
It may be necessary to run a synchronize against the Ingest instance from the prepare instance that hosts the table.
Doing so should not make any visible changes.
Hi @Thomas Lind
All TX tasks running incrementally worked as expected during the evening.
I stopped all TX tasks and saw that there was data in all tables. Overnight, all the Parquet files were deleted.
I don't have access to Azure at my customer's, unfortunately...
Are there any jobs/tasks in Azure that run automatically and can delete the Parquet files? How can I stop this or set the interval myself?
best regards,
Bjørn A.
Hi @bjorn
You really do need access to the data lake container to see what is going on.
You can read about the Storage Management Task here.
That is the only option for removing Parquet files. It should be noted that it only removes the old full-load files, which would not be taken into account anyway.
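Once you do get access, a quick way to see what is happening overnight is to list the remaining Parquet files with their last-modified times and compare the output before and after the nightly run. Again just a sketch with placeholder names, using the same azure-storage-file-datalake package as above.

```python
# Sketch: recursively list every remaining .parquet file with its last-modified
# time, so the container contents can be compared before and after the nightly run.
# <account> and <container> are placeholders; run with a credential that can read the lake.
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

service = DataLakeServiceClient(
    account_url="https://<account>.dfs.core.windows.net",
    credential=DefaultAzureCredential(),
)
file_system = service.get_file_system_client("<container>")

for path in file_system.get_paths(recursive=True):
    if not path.is_directory and path.name.endswith(".parquet"):
        print(path.last_modified, path.name)
```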