Solved

Max Threads when daisy chaining execution packages

  • 8 August 2023
  • 6 replies
  • 90 views

Hello!

I want to experiment a little and check if changing my Max Threads setting can improve execution times. I have one big package which includes multiple other packages as Steps.

If I change the Max Threads setting in the main package, do the packages from the included steps inherit this setting, or do I have to manually change the setting for each included step?

Side question: I have plenty of data for execution times at 4 threads. My plan is to do three runs each at 3, 5 and maybe even 6 threads and compare the averages. Is that a good setup?

Thanks.

Best answer by rory.smith 8 August 2023, 14:37

6 replies

Hi @Mvest 

The settings will not be passed on to the sub-packages. However, if you use the Merge Steps option in the big package, it can start one sub-package before another is done.

I would have the big package run with 1 thread and without Merge Steps, and then use 4, 5 or 6 threads with Merge Steps in each sub-package.

How many threads to use depends on how fast your server is. I have seen that using 8 or 12 threads was slower than using 6, because running 12 tables at once took more memory, which made it slow.

Try out the different thread settings. If your Managed Execution setting is ExecutionTime, execution will become faster after a few runs compared to using ExecutionNumber.
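
To illustrate the comparison the original poster has in mind, here is a minimal sketch (in Python, with made-up timing figures rather than real TX log data) of comparing average execution times per thread setting:

```python
from statistics import mean, stdev

# Hypothetical execution times in minutes, three runs per thread setting.
# Real numbers would come from the TX execution log.
runs = {
    3: [42.0, 44.5, 41.8],
    4: [38.2, 37.9, 39.1],
    5: [36.5, 35.8, 37.0],
    6: [36.9, 37.2, 36.4],
}

for threads, times in sorted(runs.items()):
    print(f"{threads} threads: mean={mean(times):.1f} min, stdev={stdev(times):.2f}")

# Pick the setting with the lowest average run time.
best = min(runs, key=lambda t: mean(runs[t]))
print(f"Lowest average: {best} threads")
```

With only three runs per setting, also glance at the spread (stdev): if two settings differ by less than the run-to-run noise, the difference is probably not meaningful.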

Dear @Mvest ,

As far as I know, the main package's settings do not overrule those of the other packages.

But if you run multiple packages from one main package, I would turn on 'Merge Steps'.

Be careful to only turn it on when your lineage is good; otherwise it will not work. When you turn on Merge Steps, my assumption is that it will overrule the other packages' settings, as it merges all the steps across all the packages, builds a new refresh schedule, and runs that over the threads you have set in the main package.

Hope this helps

= Daniel

Hi,

there are some tricky details to this:

  • nested packages (Execution Packages in the Include Steps) run under the calling package's conditions (retries, threads, etc.) if Merge Steps is on. Otherwise they run sequentially under their own settings.
  • subsequent packages (under Run Package) keep their own settings - note that condition evaluation differs between scheduled and manual runs here.

  • Depending on your database server's resources and what your TX load does, you will hit a bottleneck (for Azure SQL DB it is usually Log I/O); you can usually increase threads until you start reaching it. You may also find that different parts of your project have different workloads, i.e. transfer vs. cleanse, which could benefit from a subsequent-package setup or a non-merged nested setup.
  • Merge Steps unravels all the (sub)steps in your Include Steps and optimizes their order under the settings of the calling package. This should lead to the fastest execution, with the least amount of empty space in your Gantt chart.
  • Be wary of complex nested structures with many levels and/or Excluded steps; these can be very expensive for TX to calculate.
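
As a conceptual illustration of that unravelling (a simplified model with hypothetical table names, not TX's actual scheduler), dependency-aware parallel execution can be sketched as waves over a dependency graph:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical table dependency graph: table -> tables it depends on.
deps = {
    "Stage_A": set(), "Stage_B": set(),
    "Cleanse_A": {"Stage_A"}, "Cleanse_B": {"Stage_B"},
    "Mart": {"Cleanse_A", "Cleanse_B"},
}

def execution_waves(deps):
    """Group tables into waves; each wave depends only on earlier waves."""
    remaining, done, waves = dict(deps), set(), []
    while remaining:
        ready = [t for t, d in remaining.items() if d <= done]
        if not ready:
            raise ValueError("cyclic dependency")
        waves.append(sorted(ready))
        done.update(ready)
        for t in ready:
            del remaining[t]
    return waves

def run(deps, max_threads):
    for wave in execution_waves(deps):
        # Tables within a wave have no mutual dependencies,
        # so they can safely run on up to max_threads threads.
        with ThreadPoolExecutor(max_workers=max_threads) as pool:
            list(pool.map(lambda t: print("loading", t), wave))

run(deps, max_threads=4)
```

In this toy model, "Stage_A" and "Stage_B" load in parallel first, then both cleanse tables, then "Mart" - which mirrors why a correct lineage matters: the scheduler can only parallelize what the dependency graph says is independent.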

@Thomas Lind Thanks, I will set up multiple test cases, starting with the main package at 1 thread and the sub-packages at 4 threads with Merge Steps turned on (but only if the order of execution does not matter, or is TX smart enough to see the dependencies and execute the tables in the right order even when multiple threads are running?).

@rory.smith I am definitely already hitting 100% I/O on certain steps in the package, with an average of about 70% for the entire run. Do you mean that more threads are pointless when you are already at 100%? We do sometimes get 'transaction log is full' errors on production.

Hi,

if you want 4 threads overall, set the main package to 4 threads with Merge Steps on and ExecutionTime as the managed execution option. TX knows the dependency order across your project, as long as you are not using a lot of custom SQL without parameters / object dependencies set.

If you are running Azure PaaS databases, 100% I/O means you will be throttled by a governor process, and more threads will only add overhead and make things slower. If you are running IaaS, that can be slightly different, as you control the OS level and configuration settings on the databases can matter. I typically see #threads <= #cores, with equality only when you are not I/O-bound.
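
The throttling effect can be illustrated with a toy throughput model (all figures here are made up; real limits depend on your service tier and workload):

```python
# Toy model: each thread can push 30 MB/s of log writes, but a
# governor caps total throughput at 100 MB/s (hypothetical figures).
PER_THREAD = 30
IO_CAP = 100

def throughput(threads):
    """Effective throughput: linear scaling until the I/O cap is hit."""
    return min(threads * PER_THREAD, IO_CAP)

for n in range(1, 9):
    print(f"{n} threads -> {throughput(n)} MB/s")

# From 4 threads on, the cap is reached: extra threads add memory
# and scheduling overhead without adding any throughput.
```

This is why measuring, as suggested above, beats guessing: the knee of the curve sits wherever your actual bottleneck is, not at a fixed thread count.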

Thanks, I will add that as one of my test cases.
