Skip to main content

Hello,

we're using Timextender with ADF for the data movement between our Data Lake storage and SQL Database. Running the execute from TimeXtender of rerun it within ADF the performance is poor (slow!).

Our setup at the moment, regarding to the specs within the topic "Create a Demo Template Project TX SaaS”. We wanted to start basic and then tweak the settings for better performance. Executing the first

 

SQL Database

  • Service tier: General Purpose
  • Compute tier: Serverless
  • Hardware:    Standard (Gen5), Min/Max vCores 10 and 30GB Min/Max Memory
  • Auto-pause delay: 1 hour

Integration Runtime - type Azure

  • Computer Size: small

We tried using a different Computer Size on the IR and also changed the SQL DB Tier for more DTU. No performance improvement after changing.

Looking at one of the jobs we are transferring 2048 rows from our Data Lake to our Azure SQL.

The Queue is taking most of the time. We did some research on the support page of TimeXtender (No topics) and also at the Microsoft community. There are several topics about Slow Performance when using the function “ForEach”:

" Update queries at the same time as the lookup which is trying to update the table is within foreach. With the increase in the number of concurrent queries running in the database , it might have blocked the resource class and in turn one query is blocking other queries. (link to source)”.

 

Wondering if this is a familiar problem and can we fix it using TimeXtender or do we need to tweak more settings within ADF? If so, this will require also (more) knowledge of ADF. As TimeXtender is building up the Pipelines and Linked Services I'm wondering if it is wise to change settings on the ADF side.

Tahnk you and hopefully somebody can help us.

 

 

Did some reading about Integration Runtime and there is a difference. We're using the Self-hosted and the Managed Virtual Network Integration:

Activity execution time varies when the dataset is based on different Integration Runtime.

  • Symptoms: Simply toggling the Linked Service dropdown in the dataset performs the same pipeline activities, but has drastically different run-times. When the dataset is based on the Managed Virtual Network Integration Runtime, it takes more time on average than the run when based on the Default Integration Runtime.

  • Cause: Checking the details of pipeline runs, you can see that the slow pipeline is running on Managed VNet (Virtual Network) IR while the normal one is running on Azure IR. By design, Managed VNet IR takes longer queue time than Azure IR as we are not reserving one compute node per service instance, so there is a warm up for each copy activity to start, and it occurs primarily on VNet join rather than Azure IR.

 


We found the solution in a “new” functionality:

“Managed virtual network provides customers with a secure and manageable data integration solution. But due to the limitation of architecture, we need to provision computes in managed virtual network each time we execute an activity. This can lead to relatively long queue times. Especially when you have small jobs that are executed sequentially, it’s not very efficient. So we introduce a TTL feature that allows users to reserve computes and these computes won’t be released within TTL period after the last activity execution.

Tested this function and it reduces (almost removes) the Queue on the second and further copy actions.

Topic can be closed.


Thanks for this, this helped me to greatly reduce execution time with ADF. For those who may be wondering, the TTL setting which needs to be enabled and then set is the ‘hidden’ one which appears when you enable ‘copy compute core’: 

  1. click on the managed vnet runtime you are using
  2. click on 'virtual network'
  3. expand the 'advanced section'
  4. set 'copy compute scale' to enable
  5. voila; you can now select a time to live in this section

Happy holidays, 

Andrew


Reply