How to configure Max Threads for execution package
Hi, I am a bit unclear on how to configure Max Threads for an execution package.
The way I perceive it now is: if I run on 4 cores, I choose 4 Max Threads, if I run on 16 cores, I choose 16 Max Threads. i.e. N Cores -> N Max Threads. Is this the correct approach or are there other factors that need to be accounted for?
Page 1 / 1
Hi Sierd, This is a great question.
In this usage, max threads is referring to concurrency or How many SQL processes do you want to run simultaneously during an execution?
So, while processor performance can have an impact, this doesn't directly correlate to the number of cores on your processor. The setting that you choose ultimately depends on the overall performance of your environment which includes many factors such as data size, ETL complexity, RAM, I/O, and Processor speed.
Typically, I recommend taking the average of 3 full project executions at 3, 5, 7, and 9 threads. The execution history logs make is very easy to see the time of each of these runs and determine which will give you the overall fastest execution.
It gives you a nice overview of which components (CPU / MEM / Storage) are used the most with the different thread settings.
Of course you can only do this on your own VMs (unless your customer agrees). For our case we noticed that we had too much CPU in our VMs and not enough memory, so we did some reconfiguration.
On the plus side it also says which indexes are missing for maximum performance, so it helps adding indexes in your project .
Is it correct understood that when we talk about concurrency using Max Treads, then if you have staging tables that use some lookup fields from a let's say a header table and you have ex. 4 max treads you can't be sure that the line table doesn't inherit data from the header table if they are executed at the same time? Also you need to create separate executing packages ex. Staging, DWH, Cubes?
Hi Brian Besides running in more than one thread, you would normally also use one of three ManagedExecutions options.
These will make sure that the tables are executed in the correct order. So two tables that share lookup fields will not be executed at the same time, it will be executed after the source is done with the data cleansing before starting transfer from it. If you run in two threads, for example, it will still only use one in this case.