Skip to main content
Solved

How to configure Max Threads for execution package

  • January 31, 2022
  • 4 replies
  • 242 views

sierd.zuiderveld
Contributor
Forum|alt.badge.img+1

Hi, I am a bit unclear on how to configure Max Threads for an execution package.

The way I perceive it now is: if I run on 4 cores, I choose 4 Max Threads, if I run on 16 cores, I choose 16 Max Threads. 
i.e. N Cores -> N Max Threads. 
Is this the correct approach or are there other factors that need to be accounted for? 

Best answer by Thomas Lind

Hi Brian
Besides running in more than one thread, you would normally also use one of three ManagedExecutions options.


These will make sure that the tables are executed in the correct order. So two tables that share lookup fields will not be executed at the same time, it will be executed after the source is done with the data cleansing before starting transfer from it.
If you run in two threads, for example, it will still only use one in this case.

View original
Did this topic help you find an answer to your question?

4 replies

JTreadwell
Employee
Forum|alt.badge.img+5
  • Employee
  • 179 replies
  • January 31, 2022

Hi Sierd, This is a great question. 

In this usage, max threads is referring to concurrency or How many SQL processes do you want to run simultaneously during an execution? 

So, while processor performance can have an impact, this doesn't directly correlate to the number of cores on your processor. The setting that you choose ultimately depends on the overall performance of your environment which includes many factors such as data size, ETL complexity, RAM, I/O, and Processor speed. 

Typically, I recommend taking the average of 3 full project executions at 3, 5, 7, and 9 threads. The execution history logs make is very easy to see the time of each of these runs and determine which will give you the overall fastest execution. 


Forum|alt.badge.img
  • Contributor
  • 51 replies
  • February 1, 2022

Just wanted to add that we struggled with this as well and ran a free trial of this: https://www.spotlightcloud.io/

It gives you a nice overview of which components (CPU / MEM / Storage) are used the most with the different thread settings.

Of course you can only do this on your own VMs (unless your customer agrees). For our case we noticed that we had too much CPU in our VMs and not enough memory, so we did some reconfiguration.

On the plus side it also says which indexes are missing for maximum performance, so it helps adding indexes in your project :).


  • Starter
  • 1 reply
  • June 22, 2022

Is it correct understood that when we talk about concurrency using Max Treads, then if you have staging tables that use some lookup fields from a let's say a header table and you have ex. 4 max treads you can't be sure that the line table doesn't inherit data from the header table if they are executed at the same time? Also you need to create separate executing packages ex. Staging, DWH, Cubes?  


Thomas Lind
Community Manager
Forum|alt.badge.img+5
  • Community Manager
  • 1035 replies
  • Answer
  • June 22, 2022

Hi Brian
Besides running in more than one thread, you would normally also use one of three ManagedExecutions options.


These will make sure that the tables are executed in the correct order. So two tables that share lookup fields will not be executed at the same time, it will be executed after the source is done with the data cleansing before starting transfer from it.
If you run in two threads, for example, it will still only use one in this case.


Reply


Cookie policy

We use cookies to enhance and personalize your experience. If you accept you agree to our full cookie policy. Learn more about our cookies.

 
Cookie settings