Solved

How to use Rollup task



Hello Community!

When running a rollup task, the resulting rollup file ends up in a separate folder inside the same full load folder.

 

  1. After the rollup, new incremental files continue to appear in the DATA folder and the old incremental files are still there. What have we achieved by doing the rollup? Is the new rollup file being used at all?
  2. How can we delete the old incremental files given that the delete Storage Management task only removes entire full load folders and we don’t perform full loads?

Currently I don’t understand how we can clean up the many incremental files, both for load performance and storage management.

Please note: We cannot rely on doing regular full loads because of the size of the tables.

Thanks!

TX version: 6745.1

Connector: TimeXtender SQL Data Source 21.0.2.1

We are using Azure Data Lake and ADF.

Best answer by Christian Hauggaard

Hi @sigsol 

The rollup merges multiple batches into a single file. For example, batches 1, 2 and 3 become one file containing all of their data. The existing individual batch files are not deleted, for the following reason. Suppose a table in a Prepare instance is currently on batch 2. If the single batch files for 1, 2 and 3 were deleted and only the rollup file remained, TimeXtender Data Integration could not determine which data is missing for that table in the Prepare instance, and the only solution would be a full load. Keeping the single files allows this transfer to use the batch 3 file instead of the rollup file.

However, whenever a transfer needs every batch contained in a rollup file, TimeXtender Data Integration uses the rollup file instead of the single files. Using the same example, if there were also a second rollup file containing batches 4, 5 and 6, this transfer would use the single file for batch 3 and the rollup file for batches 4, 5 and 6, which improves load performance.
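To make the example concrete, here is a minimal Python sketch of the selection behavior described above. It is only an illustration of the explanation in this post, not TimeXtender's actual implementation: the `Rollup` class, the `plan_transfer` function and the file labels (`batch_3`, `rollup_4-6`) are all hypothetical names, and it assumes a rollup file is used only when the transfer needs every batch it contains.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rollup:
    """Hypothetical model of one rollup file covering batches first..last."""
    first: int
    last: int


def plan_transfer(last_loaded, latest_batch, rollups):
    """Return illustrative file labels, in batch order, needed to bring a
    table from `last_loaded` up to `latest_batch`.

    Assumption: a rollup file is usable only if the transfer needs all of
    its batches, i.e. the next missing batch is the rollup's first batch.
    """
    by_first = {r.first: r for r in rollups}
    plan = []
    batch = last_loaded + 1
    while batch <= latest_batch:
        rollup = by_first.get(batch)
        if rollup is not None and rollup.last <= latest_batch:
            # Every batch in this rollup is still missing: read one big file.
            plan.append(f"rollup_{rollup.first}-{rollup.last}")
            batch = rollup.last + 1
        else:
            # Only part of a rollup (or none) applies: use the single file.
            plan.append(f"batch_{batch}")
            batch += 1
    return plan


rollups = [Rollup(1, 3), Rollup(4, 6)]

# Table already on batch 2: single file for batch 3, then the 4-6 rollup.
print(plan_transfer(2, 6, rollups))  # ['batch_3', 'rollup_4-6']

# Table with nothing loaded yet: both rollup files can be used.
print(plan_transfer(0, 6, rollups))  # ['rollup_1-3', 'rollup_4-6']
```

The first call mirrors the case in this post: because the table is on batch 2, the rollup for 1 to 3 cannot be used, which is why the single file for batch 3 has to stay available.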

For more information please see the following article:

"When a data source is configured with frequent incremental loads without the occasional full load, the Ingest Instance may generate a lot of small files in data lake storage. The Rollup option offers to roll up (or consolidate) individual files into larger files, to increase load performance."

Please let me know if you have any follow-up questions.


8 replies

Thomas Lind
Community Manager
  • Community Manager
  • 951 replies
  • October 1, 2024

Hi @sigsol 

The above explains the current way this works; it hasn’t changed yet.

Are you running without updates and deletes in the incremental rule?


  • Author
  • Contributor
  • 25 replies
  • October 1, 2024

Hi @Thomas Lind 

Your link is my previous question about getting the Rollup task to run, not how it works when it’s running. So I can’t see how that answers any of my questions.


Thomas Lind
Community Manager
  • Community Manager
  • 951 replies
  • October 1, 2024

Hi @sigsol 

My point is that as far as I am aware no changes have been made to how it works, so if you see this behavior as new, it is not.

Do you now have the suggested setup without updates or deletes in the incremental setup?


  • Author
  • Contributor
  • 25 replies
  • October 1, 2024

Hi @Thomas Lind 

My questions are not about anything being new. This is the first time I’m actually using the rollup feature, and I couldn’t find an answer to these questions. The rollup apparently works fine and I get a rollup file in a new folder. Then I have these questions:

  1. After the rollup, new incremental files continue to appear in the DATA folder and the old incremental files are still there. What have we achieved by doing the rollup? Is the new rollup file being used at all?
  2. How can we delete the old incremental files given that the delete Storage Management task only removes entire full load folders and we don’t perform full loads?

  • Author
  • Contributor
  • 25 replies
  • October 1, 2024

@Thomas Lind I expected that the rollup task would replace the numerous incremental files with the rollup file, but this does not happen. I then don’t see the point of the rollup task.


  • Author
  • Contributor
  • 25 replies
  • October 14, 2024

Hi @Christian Hauggaard 

Thanks for a great explanation of the functional aspect of the Rollup task! I absolutely see the point now.

So it seems that TimeXtender does not have an option to delete the old incremental files. What are your recommendations to free up space in the scenario where doing a full load is not an option? 

Off topic, but I think the words “primary key” are missing from this note in the article:

Note: Rollup incremental data with ADF option currently does not support tables which have incremental load with <primary key> updates and deletes enabled


Christian Hauggaard
Community Manager

Hi @sigsol 

There is currently no other option besides doing a full load. Please create a product idea here: https://support.timextender.com/ideas

I have updated the article now, thank you for pointing this out.

