Solved

ODX Transfer task running indefinitely

  • 14 February 2024
  • 20 replies
  • 164 views

Hello, 

TimeXtender: 20.10.40.64
ODX: 20.10.31

We are using the ODX and TimeXtenders SQL database connector to get data from a synpase database. 

Sometimes the synpase databas seems to have problems, where a query runs and doesn’t complete (a seperate issue we are working on).

However what happens in TimeXtender in the ODX is that the transfer task keeps running indefinitely and never times out, leading timextender execution package to run but just sits there waiting for the odx transfer task to finish. However nothing fails anywhere, so we do not get an alert.

We need some way to be able to capture this.

Is there anyway to set up one or both of these things in TimeXtender or if you have any other ideas?

1. Kill the transfer task if it takes more then X minutes
2. Get an alert on failure or long running task

Thank you!
 

icon

Best answer by Christian Hauggaard 8 March 2024, 13:05

View original

20 replies

Userlevel 5
Badge +7

Hi @Victor ,

 

I assume your timeouts on the source, ODX storage, ADF if applicable are all short enough that they should have triggered?

Hello Rory,

The ODX sql connector har standard setting (15s command timeout)

The ODX date lake has a connection timeout of 100s and a command timeout of 7200s. 

The task we had ran for alittle over 3 hours.

Userlevel 5
Badge +7

Hi @Victor ,

 

It could be that the TX SQL connector runs the extraction in batches, if so it could be that the command timeout is per batch and did not reach two hours for that batch yet. If you know that the extract should be shorter I would set the timeout to be ~1.5 times the longest extract time you expect.

It’s possible and we will try some different settings.

But the main question I have is there no way to recieve a notification for ODX transfer tasks similar to the execution packages? Wether they fail, successful with errors or any way to monitor the length of loads?

Userlevel 5
Badge +7

Hi @Victor ,

in the 20.10.x series you can only get mail notifications for critical errors regarding the ODX Server service itself as far as I am aware. In the new release you can use on-demand execution on an ODX source to trigger a table's reload when a DWH table gets executed. In that way you can use the DWH mail notification to send a mail on a failure in the ODX.

If you really really need to you could investigate the sqlite backlog database that is synched locally from the cloud repository but you would need to be careful to avoid locking situations.

 


Over this weeend we had a similar issue with another connector and with another source. 


This time a task that is supposed to take about 20 minutes ran for about 80 hours without failing.

Surely there has to be some way to stop sources from getting stuck on transfer, which makes execution packages not run and brings down production. Or at the very least some way to flag when this is happening so we can catch it.

Does the ODX connection and command timeout setting just not work? This is a big problem.
 

Userlevel 6
Badge +5

Hi @Victor 

You mention that the synapse database you are connecting to seems to have problems - “where a query runs and doesn’t complete”. Could you please provide some more insight as to why the query does not complete? Are you using managed queries for this TimeXtender SQL data source?

You also mention that you encountered the same issue for another data source - which data source was this? 

Do you currently have retries setup on your transfer task?

 

Hello!

Both have restart on fail with 3 attempts. One has 2 minutes between and 10 on the other. 

For the synpase database, we are not sure why it happened and that is being investigated separately. The the root issues lies within the synpase.

Same with the other source, they had a crash in the database during the weekend so the root cause lies with the source.

This has only happened about 5 times in 2 years. However, the main thing we need is finding a way to timeout transfers and not have it takes 40-50h+ before we manually notice it.

As an example, our qlik reports loading from these sources directly correctly timed out. 

Currently I am building a script in the repository that sends an email when a execution packages takes longer then X hours. This will help us identify it in the future earlier, but getting timeout to work correctly is important to the customer.

Userlevel 6
Badge +5

Hi @Victor 

In order to troubleshoot, can you please try disabling retries on the transfer task?

Furthermore, are you using managed queries for this TimeXtender SQL data source or simply selecting tables?

Can you also please confirm which data source you are using for the other data source which encountered a database crash during the weekend?

Userlevel 6
Badge +5

Hi @Victor could you please provide an update on my questions above when you get the chance so we can troubleshoot further? Thanks

Hello,

We are selecting tables from the source. There is a query table in one of the sources, but not the others.

It has happened on a total of 3 different sources, and happened again this friday on a new source (2 in list below).

1.One synapse database (with timextender sql database connector) - with retries on transfer task
2.One regular sql database (with azure datafactory connector) - without retries on transfer task
3.Another sql database (with a ODBC driver connector) - with retries on transfer task


We could remove retries and see if it comes back, but that will eventually bring it’s own issues.

All we want to achieve if for the tasks to fail and not get stuck on running forever.


EDIT: forgot to answer the last question.
The database crash that caused one was the ODBC connector

Update:

Removing retries did not resolve the issue for the 2 tasks.

We are getting an increased amount of transfers getting stuck and running for long times without timing out.

Userlevel 6
Badge +5

@Victor thanks for confirming. For the ADF data source could you please try to lower the activity timeout? what is it currently set to? 

Regarding “We are getting an increased amount of transfers getting stuck and running for long times without timing out” - are these for the same data sources mentioned above or different data sources?

Userlevel 6
Badge +5

Hi @Victor 

You mention that “The ODX sql connector har standard setting (15s command timeout)”

Do you mean connect timeout?

What is your command timeout set to? Please see where to find command timeout below. Can you please attach some screenshots

 

 

Hello,

It is for the same data sources as mentioned above.

This is sql connector setup for the one we are having the biggest problem with.

The sql connectors look like this:

 


Command setup:

 

This is on timextender version: 20.10.31.64
ODX version: 20.10.31

Userlevel 6
Badge +5

Hi @Victor 

Could you please lower the command timeout to 300 to troubleshoot further? How many tables do you have for the transfer task for the TX SQL data source?

Hello,

We could try that , but the nightly full transfer for that task takes about 15 minutes. Will that not make it stop working?

The hourly transfer for that source is 25 tables with most of them being incremental.

Userlevel 6
Badge +5

@Victor I believe the command timeout is per command i.e. per table

I have made the change and will monitor it over the weekend and report back here.

Meanwhile, thank you very much for the support.

A combination of changing the command timeout and removing retries has solved the issues of it getting stuck “forever”.

We will try at a lter date to reintroduce retries to see if it can work with a shorter command timeout.

Thank you @Christian Hauggaard for the help!
 

Reply