Troubleshooting data cleansing rules

  • 22 February 2023
  • 2 replies



I have a project that is loading data from the ledger table into the staging database with approx, 5m records.

When manually executing the load, the ADO.NET transfer is taking 5-6 minutes, but the cleansing rules can take hours, to the point I have to terminate the load.

Is there any troubleshooting I can perform on the cleansing steps? I know I have taken the easy option on joins where the data types differ, and wonder if that is now compounding the issue?

Any tips or pointers greatly appreciated.




Best answer by Thomas Lind 22 February 2023, 13:22

View original

2 replies

Userlevel 6
Badge +5

Hi Richard

What makes a cleansing procedure take long is how many transformations that is done on each row and how much memory is available.

If you use all memory it will start using swap memory on your C drive, which is very slow compared to normal memory.

Using the batch data cleansing option you can decrease the amount of memory used as it will cleanse the amount of row set in each batch then release the memory and do it again with the next batch.
We had a question about this, I will refer to.

Another thing is how the transformations are done. Say you have three of or four fields all from one other table added as lookup fields. This can be done in one statement, instead of four if you lookup setting is set to Merge all if possible.

Lastly some specific joins are really slow, namely larger than smaller than join fields. If that is the case, you should attempt to split out values on dates to get it to be equal instead.

I hope some of these helps, otherwise specify what this specific table does.

Hi Thomas,

Thanks for the rapid response!

I checked the table settings and the “Merge all if possible” option was already on. However, batch cleansing was not, so I turned this on and followed the guidance in the article by setting the batch size as 250,000. This resulted in transfer times of ~4m30s (similar to before) and cleansing time of 7m30s!

After my initial post I restarted the application and SQL servers and this did improve things so I also need to get in the habit of closing the project down to release the memory as well.

Many thanks!