This article describes several concurrency and parallel-processing scenarios in TimeXtender Data Integration (TDI), the expected behavior in each, and how to troubleshoot concurrency issues in Ingest Instance data sources.
Symptoms
You experience sporadic failures within the Ingest Instances load from source systems. The sources vary and there is never a consistent table/connection that is failing. The tables do pass when executed manually.
You notice intermittent errors during the scheduled execution of tasks, but re-executing a failed task succeeds. You suspect the cause is throughput or table locking (concurrency) rather than a communication issue.
Which data transfers involve Ingest Instances
The Ingest Server handles both inbound and outbound Ingest transfers. So, if a Prepare Instance table is sourced from the Ingest Instance, anytime that table is executed, the Ingest Transfer step is accomplished by the Ingest Server Service. In other words, any table which shows a mapping to the Ingest Instance source will invoke the Ingest Server Service when it is Executed.
Source table data extraction to Ingest Instance storage (on Data Lake or SQL storage) - triggered by an Ingest task, running manually, with data on demand, or scheduled.
Transfer from the Ingest Storage to a staging Prepare Instance - triggered by an Execute operation on Prepare Instance object (manual, queued, or scheduled package).
There are a few places to review and manage concurrency:
Concurrent execution threads in each Ingest data source's Advanced Settings.
Since Ingest Instance to Prepare Instance transfers are processed by the Ingest Server, any table executed also adds to the number of concurrent threads. These are the outbound transfers that you see in the Ingest Instance Execution log. Keep an eye on scheduled Execution packages in any environment as well as manual Executions.
Review schedules, tasks & threads
Number of threads on execution packages (including the default package) containing tables mapped to an Ingest Instance.
Number of ingest tasks set to run at the same time a 'high-thread' execution package runs.
Currently running TDI and Ingest Instance Service helper processes in Task Manager
Schedules of tasks in Ingest Instance data sources and Execution packages - try to adjust the timing of scheduled tasks so they do not overlap with other tasks or with outbound stage transfers, or use the Enable data on demand option instead of scheduling transfer tasks.
How much data is being extracted/transformed by these tasks/connections? Depending on the architecture and the volume of data passing through the Ingest Server system, monitor its baseline and peak performance to determine whether an alternate design or scaling of CPU/memory may help.
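When reviewing schedules for overlap, it can help to list the expected run windows side by side. The following is a minimal sketch in Python, not a TimeXtender feature: the entry names, start times, and durations are made up for illustration, and the helper functions `to_interval` and `find_overlaps` are hypothetical. Take real start times and typical durations from your own schedules and execution logs. The sketch assumes all windows fall within the same day.

```python
from datetime import datetime, timedelta
from itertools import combinations

# Hypothetical schedule entries: (name, start time, expected duration in minutes).
windows = [
    ("Ingest task: ERP full load", "01:00", 45),
    ("Ingest task: CRM incremental", "01:30", 10),
    ("Execution package: Nightly", "01:40", 60),
    ("Execution package: Hourly", "03:00", 15),
]

def to_interval(start, minutes):
    """Convert an HH:MM start and a duration into a (begin, end) pair."""
    begin = datetime.strptime(start, "%H:%M")
    return begin, begin + timedelta(minutes=minutes)

def find_overlaps(entries):
    """Return pairs of entry names whose time windows intersect."""
    overlaps = []
    for (name_a, start_a, dur_a), (name_b, start_b, dur_b) in combinations(entries, 2):
        a_begin, a_end = to_interval(start_a, dur_a)
        b_begin, b_end = to_interval(start_b, dur_b)
        # Two half-open windows intersect when each starts before the other ends.
        if a_begin < b_end and b_begin < a_end:
            overlaps.append((name_a, name_b))
    return overlaps

for pair in find_overlaps(windows):
    print("Overlap:", " <-> ".join(pair))
```

With the sample values, the ERP full load overlaps both the CRM incremental task and the Nightly package, so those are the candidates to reschedule or to move to data on demand.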
How to increase the timeout for Ingest Instance data source, ADLS, and ADF
For very large data transfers, increase the timeouts on the Data Lake storage. Do not set the timeout to infinite (0); it is always better to have an actual timeout.
When using Azure Data Factory to connect to an on-premises SQL source (via a Self-Hosted Integration Runtime) and extracting a large number of rows, you may get a timeout error similar to the following:
System.Net.Http.HttpRequestException: Response status code does not indicate success: 500 (Operation could not be completed within the specified time.).
Edit the ADF data source settings in the portal
Increase Connection timeout from 30 to 7200.
FAQ on concurrency and parallel processing
How many tasks can I run in parallel in an Ingest Instance, when I have multiple data sources and multiple tasks inside each data source?
For a specific data source, you can run only 1 task at a time. If you start multiple tasks there, the first task will run and the rest will wait (i.e. they are pending).
Multiple data sources can run 1 task each at the same time. So, if you have n data sources, each may run 1 task at the same time, making a total of n tasks running in the Ingest Instance.
At some point, the number of parallel tasks will be limited by your machine capacity. Resize the VM running the Ingest Server to increase the number of vCPUs and, with it, the number of concurrent tasks allowed.
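The one-task-per-data-source rule can be pictured with a small Python model. This is only an illustration of the rule as described above, not how the Ingest Server is implemented; the function `plan_tasks` and the data source and task names are hypothetical, and the model ignores the machine-capacity limit.

```python
def plan_tasks(queued):
    """Split queued tasks into running and pending.

    `queued` maps a data source name to its tasks in the order they were started.
    Under the rule above, only the first task of each data source runs.
    """
    running, pending = {}, {}
    for source, tasks in queued.items():
        if tasks:
            running[source] = tasks[0]
            pending[source] = list(tasks[1:])
    return running, pending

# Hypothetical data sources and task names.
queued = {
    "ERP": ["Task A", "Task B"],
    "CRM": ["Task C"],
    "Web shop": ["Task D", "Task E"],
}

running, pending = plan_tasks(queued)
print(len(running), "tasks running in parallel:", running)
print("Pending:", pending)
```

With three data sources, three tasks run at once (one per data source), and Task B and Task E wait until the running task in their own data source has finished.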
Two data sources extract data from the same table in a source database. Can I run an Ingest Transfer task in each data source at the same time?
Yes.
Can I Execute a default or named package in my Prepare Instance (manual or Scheduled) at the same time when ingest tasks are running? Both the ingest task and execution package operate on the same table in various stages. Ingest stores data on a data lake.
Yes, you can. Ideally, it should not cause an error, but the package may wait if it depends on the completion of an ingest task.
If an ingest task is extracting new data into its storage, the Execution package will NOT fetch the "previous" (existing) version of the table from the data lake. It waits until the ingest task has written the new version of the data to the data lake, and then pulls the new data into the Prepare Instance.
You can turn on Data On Demand on the data source to be sure the tables in the ingest store get updated before a transfer from the Ingest Instance to the Prepare Instance is done.
How does the "Multiple threads" setting in the ingest task or execution package contribute to concurrency?
Each thread may fetch a different table at the same time.
Does reducing the "Multiple threads" setting help resolve some errors with concurrent tasks, multiple threads and package executions?
Yes.
How many Execution packages (manual or scheduled) should I run at the same time in a Development environment?
Only 1
The check that looks for currently running packages only applies to scheduled instances of an execution package.
If you also start a scheduled package manually, the scheduler will not recognize it. In that scenario, both the manual and scheduled version of the execution may run at once.
This overlap may cause unexpected results. It can lead to deadlocks and/or 'faulty' data because steps are not executed in the correct order. Treat the two overlapping executions as a single package whose internal constraints are unknown to each other.
The service only runs one execution package per Prepare Instance per check.
The service will not start an Execution Package if a previously scheduled execution of that package, or of any other package, is still running.
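The difference between scheduled and manual runs can be sketched as a toy model in Python. This is an interpretation of the behavior described above, not the actual service logic: the `Run` class, the `scheduler_would_start` function, and the instance and package names are all hypothetical, and scoping the check to a single Prepare Instance is an assumption of the model.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Run:
    package: str
    instance: str   # Prepare Instance the package belongs to
    origin: str     # "scheduled" or "manual"

def scheduler_would_start(instance, running):
    """Toy check: only scheduled runs on the same instance block a new scheduled start."""
    return not any(
        r.instance == instance and r.origin == "scheduled" for r in running
    )

# A scheduled run blocks the next scheduled start on the same instance.
print(scheduler_would_start("Prepare A", [Run("Nightly", "Prepare A", "scheduled")]))  # False

# A manual run is invisible to the check, so the two executions can overlap.
print(scheduler_would_start("Prepare A", [Run("Nightly", "Prepare A", "manual")]))  # True
```

The second case is the risky one: the model starts the scheduled package even though a manual execution of the same package is still in progress.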
I have multiple packages in multiple Prepare instances. I have scheduled these to run at the same time. Will the execution service run these at the same time?
Yes. If the relevant instances have been selected in the execution service configuration, the packages will run in parallel.
I have multiple Prepare instances sharing the same Ingest Instance data sources. The Execution service is checked for each instance. Can the execution packages run at the same time?
Yes, but consider the pressure put on the source systems as you are reading the same data from different instances at the same time.
How to sequence packages to run one after another?
Edit the package settings. In the Post Execution section, under the Run Package option, select the name of the next package.
I have a number of packages scheduled to run repeatedly with short intervals. How does TimeXtender pick a package to run? Does that "collide" with ingest tasks?
One execution package can be executed per prepare instance per starting time. To avoid possible conflicts, be sure to space the scheduled start times of each of your packages by at least five minutes from one another.
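A quick way to check the five-minute spacing is to sort the scheduled start times and compare neighbors. The Python sketch below is illustrative only: the package names and times are invented, the `too_close` helper is hypothetical, and it assumes all packages belong to the same Prepare Instance and start on the same day.

```python
from datetime import datetime, timedelta

MIN_GAP = timedelta(minutes=5)

def too_close(start_times, min_gap=MIN_GAP):
    """Return pairs of neighboring packages whose start times are less than min_gap apart."""
    ordered = sorted(
        (datetime.strptime(start, "%H:%M"), name) for name, start in start_times.items()
    )
    return [
        (a_name, b_name)
        for (a_time, a_name), (b_time, b_name) in zip(ordered, ordered[1:])
        if b_time - a_time < min_gap
    ]

# Hypothetical package names and scheduled start times.
schedule = {"Sales": "02:00", "Finance": "02:03", "Inventory": "02:10"}
print(too_close(schedule))
```

Here Sales and Finance start only three minutes apart, so one of them should be moved; Finance and Inventory are seven minutes apart and pass the check.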
I see a number of ExecutionEngine_x64 processes running in Task Manager when I run many ingest tasks
Those are helper processes launched by the Ingest Server service. Each task started on the Ingest Server launches one of these.
Performance optimization
Transfers from the Ingest Instance to the Prepare Instance - The Ingest Instance Server downloads the parquet files from the Data Lake into memory on the VM, then writes the data to the Prepare Instance's SQL database.
You have a few options to optimize:
Use Azure Data Factory (ADF) to transfer from the Ingest Instance to the Prepare Instance - This may significantly increase speed but may come with a minor cost for the Azure resource.
Increase the Memory Limit on the Ingest Instance Data Lake Storage account. For example, if it was set to 8 GB of your 14 GB available on the server, you could increase it to 10-12 GB. This should increase transfer speed slightly without any extra cost.
Increase threads. Test your scenario with 2, 6, 10 threads, and so on, and see which setting works best.
Increase the Azure SQL database Compute tier. Review the Azure metrics: if your SQL database is maxing out at times, it could make use of more compute resources, but this may increase cost.
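The idea of testing several thread counts and keeping the best one can be illustrated with a small, self-contained Python timing harness. This is a toy model, not a TimeXtender API: in TDI the thread count is a setting on the ingest task or execution package, and you would compare durations from the execution logs. Here `transfer_table` is a hypothetical stand-in that only sleeps, and the table names are invented.

```python
import time
from concurrent.futures import ThreadPoolExecutor

TABLES = [f"table_{i}" for i in range(12)]  # hypothetical table names

def transfer_table(name):
    """Stand-in for one table transfer; sleeps instead of moving data."""
    time.sleep(0.05)
    return name

def time_run(thread_count):
    """Transfer all tables with the given number of threads and return elapsed seconds."""
    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=thread_count) as pool:
        done = list(pool.map(transfer_table, TABLES))
    assert len(done) == len(TABLES)
    return time.perf_counter() - started

results = {n: time_run(n) for n in (2, 6, 10)}
for n, seconds in results.items():
    print(f"{n} threads: {seconds:.2f} s")

best = min(results, key=results.get)
print("Fastest in this toy run:", best, "threads")
```

In a real environment the gain from extra threads depends on CPU, memory, and the load on the source and target systems, so a higher thread count is not automatically faster and may increase the chance of the concurrency errors described earlier.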