Hi all,
I'd like to start a discussion about the batch cleansing option. You can find this option under Table Settings → Performance → Enable batch data cleansing.
This option is disabled by default. I'm looking for best practices on when to use it and how to determine the best batch cleansing size.
I normally enable it on big tables (5,000,000+ rows) because it reduces the disk space used for logging. In some cases it also improves performance and results in faster loading of the table. However, I always struggle to determine the correct batch size. I know the answer will be ‘it depends’, but I'm trying to find some best practices for this option.
For now, I start with a batch cleansing size of 250,000 rows and increase it in steps of 250,000 rows. I write down the timings and pick the fastest setting. The best setting depends on the number of rows and the number of transformations in the table.
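If you can trigger a load of the table from a script, the stepwise tuning loop above could be automated roughly like this. This is only a sketch under assumptions: `run_load` is a hypothetical placeholder for whatever sets the batch cleansing size and runs one load of the table (it is not part of any product API), and the 2,000,000-row upper limit is an arbitrary example value.

```python
import statistics
import time
from typing import Callable, Dict, Tuple


def sweep_batch_sizes(
    run_load: Callable[[int], None],
    start: int = 250_000,
    step: int = 250_000,
    max_size: int = 2_000_000,
    repeats: int = 1,
    timer: Callable[[], float] = time.perf_counter,
) -> Tuple[int, Dict[int, float]]:
    """Time a load for each batch size start, start + step, ... up to max_size.

    run_load is a hypothetical callable: it should apply the given batch
    cleansing size and run one full load of the table.
    Returns the fastest batch size and the median timing (seconds) per size.
    """
    if start <= 0 or step <= 0 or repeats <= 0 or max_size < start:
        raise ValueError("need start, step, repeats > 0 and max_size >= start")

    timings: Dict[int, float] = {}
    for size in range(start, max_size + 1, step):
        runs = []
        for _ in range(repeats):
            t0 = timer()
            run_load(size)  # hypothetical: set batch size, then load the table
            runs.append(timer() - t0)
        # Median of the repeated runs, to dampen noise from other server load
        timings[size] = statistics.median(runs)

    best = min(timings, key=timings.get)  # size with the lowest timing
    return best, timings


if __name__ == "__main__":
    # Stand-in load that just sleeps; replace with a real (hypothetical) trigger
    def fake_load(size: int) -> None:
        time.sleep(0.01)

    best, timings = sweep_batch_sizes(fake_load, max_size=1_000_000)
    for size, seconds in timings.items():
        print(f"{size:>10,} rows per batch: {seconds:.3f} s")
    print(f"fastest: {best:,} rows per batch")
```

Setting `repeats` above 1 runs each size several times and keeps the median, which may help when other jobs share the server. Because the best size depends on row count and the number of transformations, the sweep would need to be rerun per table.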
I'm curious: are there any best practices, tips or tricks for this?