Solved

Issue With Data Source Connection No Longer Working


I have a number of Oracle data connections (TimeXtender Oracle Data Source) which have been set up and working against the same data source for a few months now. However, just recently, one of these data connections stopped working and started reporting timeout issues after 1 hour of connecting. I have multiple cloned versions of this same Oracle connection, and it seems to be only one of the cloned connections which is having this issue.

 

I have since cloned one of the working connectors and added the 6 tables that are pulled by the failing connector (to mimic the failing connector setup) and this temporary connector runs without issue, completing in just a few minutes. As far as I can see, the 2 connectors are set up exactly the same way, its just the the original does not work whilst the new temporary test connection does.

 

It may not be related, but this connector started failing one day after I had performed an in-place upgrade of both the TX ODX server and desktop client. However, what I don’t understand is that after the upgrade I retested all the connections and they all processed without issue - it was only the following day that one of the connectors started to fail. Of course this may not be related but I thought I would mention it.

 

The provider version of the TimeXtender Oracle Data Source we upgraded to is 18.1.2.1 and the version of TX (ODX and client) is 6536.1.

 

I am reticent to remap my existing data pipelines to use tables pulled using my temporary working connector, especially without understanding why I am experiencing this issue or whether it could happen again, and would prefer to be able to get the failing connector working again, thereby preserving the integrity of my existing data pipelines.

 

Any advice would be appreciated.

icon

Best answer by Paul Bridge 16 April 2024, 15:46

View original

17 replies

Userlevel 5
Badge +5

Hi @Paul Bridge 

The other Oracle data sources that are fast is also in the new version of the provider and they continue to be fast after the upgrade?

Userlevel 5
Badge +7

Hi @Paul Bridge ,

could you have a look at the size of TempDB and see if your failing connector is spooling to that? I have seen a situation where the upgrade to that version suddenly increased load on TempDB (which takes a lot more time). You can also check the number of sockets associated with your ODX Server (netstat -an -p tcp), I saw a lot of sockets with increasing port numbers for the ODX Server in this case.

Hi @Thomas Lind 

Thanks for getting back to me. Yes, I only have one Oracle data source connection set up in the portal, which has been upgraded, and all the local ODX data source connections are using that, including the one that is failing.

Userlevel 6
Badge +5

Hi @Paul Bridge 

Do you have other transfer tasks running while the transfer task times out?

How many tables are selected in the data source / transfer task which times out?

Also can you please send screenshots of the advanced settings for both the working data source and the data source which is timing out?

 

Hi @Christian Hauggaard 

I can attempt to run the table transfer task manually when I know that there is nothing else running and it will still fail with a 1 hour timeout.

The original table transfer task had 8 tables selected, but even when removing all but 1 of the tables it still fails.

Failing Data Source - Advanced Settings

Working Data Source - Advanced Settings

 

 

 

Userlevel 6
Badge +5

Hi @Paul Bridge 

Thanks for confirming. Can you please send screenshots of the edit data source window for both the working data source and the one that is timing out so that confirm the connection names?

Also can you please try running transfer task and test connection for the data source that is timing out? Does the sync task complete successfully and is test connection successful? If the sync task is successful can you please try re-running the transfer task to see if it still times out?

Can you also please see if you find any relevant errors or warnings regarding this data source in the ODX service log?

 

Hi @Christian Hauggaard 

Please see below as requested - customer name has been obscured;

Failing Data Source

Working Data Source

 

When running the [sync] task this successfully completes. However, when running the [transfer] task this times out with a timeout error after 1 hour. And yes, I reran both this morning and the [sync] still completes ok and the [transfer] fails.

If I leave the [transfer] task to timeout after 1 hour the service log entry is as follows;

The task '6642' failed:
System.AggregateException: One or more errors occurred. ---> System.Data.SqlClient.SqlException: Execution Timeout Expired.  The timeout period elapsed prior to completion of the operation or the server is not responding. ---> System.ComponentModel.Win32Exception: The wait operation timed out
   --- End of inner exception stack trace ---
   at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
   at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
   at System.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)
   at System.Data.SqlClient.SqlDataReader.TrySetMetaData(_SqlMetaDataSet metaData, Boolean moreInfo)
   at System.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)
   at System.Data.SqlClient.SqlDataReader.TryConsumeMetaData()
   at System.Data.SqlClient.SqlDataReader.get_MetaData()
   at System.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString, Boolean isInternal, Boolean forDescribeParameterEncryption, Boolean shouldCacheForAlwaysEncrypted)
   at System.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, Boolean async, Int32 timeout, Task& task, Boolean asyncWrite, Boolean inRetry, SqlDataReader ds, Boolean describeParameterEncryptionRequest)
   at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method, TaskCompletionSource`1 completion, Int32 timeout, Task& task, Boolean& usedCache, Boolean asyncWrite, Boolean inRetry)
   at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method)
   at System.Data.SqlClient.SqlCommand.ExecuteReader(CommandBehavior behavior, String method)
   at System.Data.SqlClient.SqlCommand.ExecuteReader()
   at DataStorageEngine.SQL.SQLHelper.<>c__DisplayClass12_0.<GetExistingTableStructure>b__0(SqlCommand command, SqlTransaction transaction)
   at DataStorageEngine.SQL.SQLStorageExtensions.ExecuteCommand(SqlConnection connection, Int32 commandTimeout, Action`2 action, IsolationLevel isolationLevel)
   at DataStorageEngine.SQL.SQLHelper.GetExistingTableStructure(SqlConnection connection, Int32 commandTimeout, Guid dataSourceId)
   at DataStorageEngine.SQL.SQLStorageEngine.TransferData(IDataSourceEngine dataSourceEngine, ExecutionTaskModel taskModel, List`1 tableModels, Boolean forceFullLoad, SecurityExecutionModel securityExecutionModel)
   at ExecutionEngine.Action.StandardAction.Execute()
   at ExecutionEngine.Action.ExecutionAction.<.ctor>b__13_0()
   at System.Threading.Tasks.Task.Execute()
   --- End of inner exception stack trace ---
---> (Inner Exception #0) System.Data.SqlClient.SqlException (0x80131904): Execution Timeout Expired.  The timeout period elapsed prior to completion of the operation or the server is not responding. ---> System.ComponentModel.Win32Exception (0x80004005): The wait operation timed out
   at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
   at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
   at System.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)
   at System.Data.SqlClient.SqlDataReader.TrySetMetaData(_SqlMetaDataSet metaData, Boolean moreInfo)
   at System.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)
   at System.Data.SqlClient.SqlDataReader.TryConsumeMetaData()
   at System.Data.SqlClient.SqlDataReader.get_MetaData()
   at System.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString, Boolean isInternal, Boolean forDescribeParameterEncryption, Boolean shouldCacheForAlwaysEncrypted)
   at System.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, Boolean async, Int32 timeout, Task& task, Boolean asyncWrite, Boolean inRetry, SqlDataReader ds, Boolean describeParameterEncryptionRequest)
   at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method, TaskCompletionSource`1 completion, Int32 timeout, Task& task, Boolean& usedCache, Boolean asyncWrite, Boolean inRetry)
   at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method)
   at System.Data.SqlClient.SqlCommand.ExecuteReader(CommandBehavior behavior, String method)
   at System.Data.SqlClient.SqlCommand.ExecuteReader()
   at DataStorageEngine.SQL.SQLHelper.<>c__DisplayClass12_0.<GetExistingTableStructure>b__0(SqlCommand command, SqlTransaction transaction)
   at DataStorageEngine.SQL.SQLStorageExtensions.ExecuteCommand(SqlConnection connection, Int32 commandTimeout, Action`2 action, IsolationLevel isolationLevel)
   at DataStorageEngine.SQL.SQLHelper.GetExistingTableStructure(SqlConnection connection, Int32 commandTimeout, Guid dataSourceId)
   at DataStorageEngine.SQL.SQLStorageEngine.TransferData(IDataSourceEngine dataSourceEngine, ExecutionTaskModel taskModel, List`1 tableModels, Boolean forceFullLoad, SecurityExecutionModel securityExecutionModel)
   at ExecutionEngine.Action.StandardAction.Execute()
   at ExecutionEngine.Action.ExecutionAction.<.ctor>b__13_0()
   at System.Threading.Tasks.Task.Execute()
ClientConnectionId:358b1204-049e-4538-9078-7822ee246ad2
Error Number:-2,State:0,Class:11
ClientConnectionId before routing:bfe269c0-dced-4d52-984d-218627534393
Routing Destination:e6cf95551cf8.tr8896.uksouth1-a.worker.database.windows.net,11027<---

 

Which to me just seems to be reporting a timeout.

Userlevel 5
Badge +5

Hi @Paul Bridge 

It is only the old one that was not recently created that has this issue?

You mentioned that it only was a few of the tables that existed in the slow one, or maybe it was the other way?

Anyway my question is. If the slow one has tables that does not get used in the ones that are fast, does they continue to be fast once you add them to that?

Hi Thomas

Not sure what you mean with respect to slow and fast? The issue I have is working or not working 😀

In summary, I had approx 15 OCI JDE dataconnects set up that were all working, and had been working for a few months.

Then I performed an ODX Server and TX client upgrade (along with the recommendation to update the data providers) to fix a documentation bug I had logged. I then reran during the day the ODX job to rerun all the data connects and this ran without issue. And then the overnight ODX schedule also ran that night, again without issue.

However, since then, 1 of the 15 data connects no longer runs - it times out after 1 hour. The fact that I had performed an upgrade a day or so earlier I think is possibly/probably a red herring, as I performed a full manual test successfully and an overnight scheduled test successfully before I started having problems.

I have since cloned this failing data connect and ran that and it works fine. It just seems that the 1 original dataconnect has somehow got corrupted and no longer functions.

 

Userlevel 5
Badge +5

Hi @Paul Bridge 

A timeout is due to it being slow at doing the transfer, and the data amount being way larger than expected, or it being stuck accessing the data. So that is not what causes the error, that is just the result of it exceeding the allowed time. So I went with slow giving timeouts.

I have since cloned this failing data connect and ran that and it works fine. It just seems that the 1 original dataconnect has somehow got corrupted and no longer functions.

 

The above is what I wanted to focus on. if there are any differences other than this, E.G. specific tables not existing in the other. If that is not the case, then your suggestion is the logical conclusion. 

Why this happened for only that one and not all is also hard to replicate.

Hi Thomas

There are no other differences between the working and failing data connectors, except that I have now removed all but one of the tables from the failing data connector ( to make it as simple as possible) and it is still not working.

It sounds like this is not something that has been experienced before, and also something that we cannot replicate. If we are not able to get more information on what is causing the failure, and on visual inspection the 2 dataconnects seem identical, then it sounds like we may never understand what has happened here.

Userlevel 5
Badge +5

Hi @Paul Bridge 

That is most likely the case. 

Userlevel 5
Badge +7

Hi @Thomas Lind ,

wouldn't it be possible to compare cloud repository content for the failing connector vs. a successful one pulling the same data? That would at least isolate the problem to something local vs. repository-based.

Hi @Thomas Lind 

Is the suggestion by @rory.smith (thanks Rory) an option, as it would be good to understand as much as possible about what may have happened?

Userlevel 5
Badge +5

@Paul Bridge 

We can try that, but I am not able to see this and you aren’t either. I have made this a support ticket where I can involve those who should be able to look at the cloud repository.

Userlevel 5
Badge +5

Hi @Paul Bridge 

Should we solve this as well?

The ticket is easier to close, but if you can add a description of what you found there so we can set it as solved. While it just suddenly worked again isn’t really a solving method, some reply should be set as this.

Hi @Thomas Lind 

Yes, we should close off this question as well. In summary, we moved this request to a support ticket with the TX engineers who then asked a series of follow up questions. However, in answering the questions I checked again the failing TX connector and found it to be working again after a few weeks of it not working. As it is now not possible to replicate the issue I had we decided to close off the ticket on the basis that if it re-occurs I should re-open the support ticket.

Reply