Solved

Failing Metadata Sync Task

  • January 15, 2025
  • 4 replies
  • 41 views

Hi

Just upgraded the existing TX Ingest and Data Integration on our dev server to the latest version, 6848.1, and now I get a failure when running my ODX data source steps. Specifically, it seems to be what used to be known as the synchronise step, now called the import metadata task. Originally each data source had three steps (synchronise, transfer and storage management) and they used to run fine. Now I get seemingly random job-failure errors, so far only on what was the synchronise step.

If I run the failed step manually, it processes without issue. However, when I run the entire ODX job I have set up, one or more data source sync steps fail.

 

The execution failed with error:
Exception Type: Microsoft.Data.SqlClient.SqlException
Message: Transaction (Process ID 76) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
             Stack Trace: at Microsoft.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
                          at Microsoft.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, SqlCommand command, Boolean callerHasConnectionLock, Boolean asyncClose)
                          at Microsoft.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)
                          at Microsoft.Data.SqlClient.TdsParser.Run(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj)
                          at Microsoft.Data.SqlClient.SqlBulkCopy.RunParser(BulkCopySimpleResultSet bulkCopyHandler)
                          at Microsoft.Data.SqlClient.SqlBulkCopy.CopyBatchesAsyncContinuedOnSuccess(BulkCopySimpleResultSet internalResults, String updateBulkCommandText, CancellationToken cts, TaskCompletionSource`1 source)
                          at Microsoft.Data.SqlClient.SqlBulkCopy.CopyBatchesAsyncContinued(BulkCopySimpleResultSet internalResults, String updateBulkCommandText, CancellationToken cts, TaskCompletionSource`1 source)
                          at Microsoft.Data.SqlClient.SqlBulkCopy.CopyBatchesAsync(BulkCopySimpleResultSet internalResults, String updateBulkCommandText, CancellationToken cts, TaskCompletionSource`1 source)
                          at Microsoft.Data.SqlClient.SqlBulkCopy.WriteToServerInternalRestContinuedAsync(BulkCopySimpleResultSet internalResults, CancellationToken cts, TaskCompletionSource`1 source)
                          at Microsoft.Data.SqlClient.SqlBulkCopy.WriteToServerInternalRestAsync(CancellationToken cts, TaskCompletionSource`1 source)
                          at Microsoft.Data.SqlClient.SqlBulkCopy.WriteToServerInternalAsync(CancellationToken ctoken)
                          at Microsoft.Data.SqlClient.SqlBulkCopy.WriteRowSourceToServerAsync(Int32 columnCount, CancellationToken ctoken)
                          at Microsoft.Data.SqlClient.SqlBulkCopy.WriteToServer(DataTable table, DataRowState rowState)
                          at DataStorageEngine.SQL.SQLStorageEngine.<>c__DisplayClass69_0.<ImportDataSourceMetaData>b__0(IDbCommand command, IDbTransaction transaction)
                          at DataStorageEngine.SQL.SQLStorageExtensions.ExecuteCommand(IDbConnection connection, Int32 commandTimeout, Action`2 action, IsolationLevel isolationLevel)
                          at DataStorageEngine.SQL.SQLStorageEngine.ImportDataSourceMetaData(DataSourceModel dataSourceModel, List`1 tableModels)
                          at ExecutionEngine.Action.ExecutionAction.<.ctor>b__11_0()
 

Any advice?

Best answer by rory.smith


4 replies

rory.smith
TimeXtender Xpert
  • TimeXtender Xpert
  • 649 replies
  • January 15, 2025

Hi Paul,

are you scheduling the synchronise steps? You may actually not want to do that, as you might need human decisions here. The metadata process is really two parts: extracting the schema from the source, and applying it to the structure of the Ingest storage. For the error itself, I would raise a support ticket.
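To make the two-part split concrete, here is a rough sketch of the idea. Every name in it is a hypothetical illustration, not TimeXtender's API; the point is only that schema extraction and schema application can be separated by a review gate, so a human decision can sit between them.

```python
# Hypothetical sketch (not TimeXtender's API): phase 1 extracts the schema
# from the source, phase 2 applies it to the Ingest storage structure,
# gated by an approval callback that can demand manual review.

def extract_source_schema(source_tables):
    """Phase 1: read the current structure from the source."""
    return {name: list(columns) for name, columns in source_tables.items()}

def apply_to_ingest_storage(storage_schema, source_schema, approve):
    """Phase 2: apply the extracted schema to storage.

    `approve(table, old_cols, new_cols)` decides whether a change may be
    applied automatically; returning False holds the change for review.
    """
    applied, held = {}, {}
    for table, columns in source_schema.items():
        old = storage_schema.get(table)
        if old == columns or approve(table, old, columns):
            applied[table] = columns
        else:
            held[table] = columns  # needs a human decision
    new_storage = dict(storage_schema)
    new_storage.update(applied)
    return new_storage, held

source = {"orders": ["id", "amount", "placed_at"]}
storage = {"orders": ["id", "amount"]}

schema = extract_source_schema(source)
# auto-approve only purely additive changes (all old columns survive)
additive_only = lambda t, old, new: old is not None and set(old) <= set(new)
new_storage, held = apply_to_ingest_storage(storage, schema, additive_only)
print(new_storage)  # {'orders': ['id', 'amount', 'placed_at']}
print(held)         # {}
```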


  • Author
  • Contributor
  • 25 replies
  • January 15, 2025

Hi Rory

I am scheduling the synchronise steps as part of the overnight data ingest, as that is what I have always done. Is that not best practice, i.e. should the sync steps only ever be run manually?

I should add that I have manually run all the individual data source metadata sync steps first without any issue.


rory.smith
TimeXtender Xpert
  • TimeXtender Xpert
  • 649 replies
  • Answer
  • January 16, 2025

Hi,

whether or not to automatically synchronize is an “it depends” question. As an example (for a setup before the new Metadata functionality):

CSV/JSON/XML file source in a folder / SFTP / blob storage: if you synchronize on a schedule, you will obviously detect schema drift. The result will depend on how you set your source up: if there are rules on what columns to load, you may not have issues with new fields, but changes to field names will cause issues. One of the common problems I encountered was the source location being empty when a synchronization was triggered; this would wipe the Ingest/ODX structure and require more work to resolve. If you do not schedule, you can resolve the missing-files issue and continue loading data happily. Imagine an SFTP location that is emptied and refilled: if you synchronize while it is empty, you have an issue.

The new Metadata system splits the source structure from the Ingest storage structure, allowing for fewer problems on changes. If schema drift is strictly additive (new fields, new tables), there shouldn't be an issue other than performance drops. I have never encountered a situation where that is guaranteed, though. It is much more common for changes to also include field removals, field renames, field reorders, or a combination of many changes at once. This usually means human interaction is required to decide what should be done.
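The "strictly additive vs. anything else" distinction can be made concrete. A minimal sketch (a hypothetical helper, not part of any product) that classifies drift between the stored column list and a freshly extracted one:

```python
# Hypothetical helper, not part of TimeXtender: classify schema drift
# between the stored column list and the freshly extracted one.

def classify_drift(old_columns, new_columns):
    """Return 'none', 'additive', or 'breaking'.

    Additive drift (new columns only) is generally safe to apply
    automatically; removals, renames, or reorders are 'breaking'
    and usually need a human decision.
    """
    if old_columns == new_columns:
        return "none"
    old = set(old_columns)
    if old <= set(new_columns) and \
            [c for c in new_columns if c in old] == old_columns:
        # every old column survives, in its original relative order
        return "additive"
    return "breaking"

print(classify_drift(["id", "name"], ["id", "name"]))          # none
print(classify_drift(["id", "name"], ["id", "name", "email"])) # additive
print(classify_drift(["id", "name"], ["id", "full_name"]))     # breaking (rename)
print(classify_drift(["id", "name"], ["name", "id"]))          # breaking (reorder)
```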

There isn't a “best” approach, as it very much depends on the processes and context in which you operate. In the case of files in a location in the pre-Metadata system, there was a worst approach though: automatic sync.

For your error, I think it is important to know whether the deadlock is in your database or in the cloud repository. It shouldn't happen, though (in my opinion).
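Wherever the deadlock occurs, error 1205 ("chosen as the deadlock victim") is a transient error that SQL Server explicitly asks the client to handle: the message itself says "Rerun the transaction". A minimal retry sketch, using a stand-in exception class since the real error in the trace above is a Microsoft.Data.SqlClient.SqlException:

```python
import time

# Stand-in for the driver's exception; in the real .NET stack this would
# be Microsoft.Data.SqlClient.SqlException (deadlock victim, error 1205).
class DeadlockError(Exception):
    pass

def run_with_deadlock_retry(work, attempts=3, base_delay=0.05):
    """Run `work()`, retrying on deadlock with exponential backoff.

    SQL Server's error text literally says 'Rerun the transaction', so a
    bounded client-side retry is the standard mitigation; the last
    failure is re-raised so it is never silently swallowed.
    """
    for attempt in range(1, attempts + 1):
        try:
            return work()
        except DeadlockError:
            if attempt == attempts:
                raise
            time.sleep(base_delay * (2 ** (attempt - 1)))

# Demo: a 'transaction' that deadlocks twice, then succeeds.
calls = {"n": 0}
def flaky_import():
    calls["n"] += 1
    if calls["n"] < 3:
        raise DeadlockError("chosen as the deadlock victim")
    return "metadata imported"

print(run_with_deadlock_retry(flaky_import))  # metadata imported
```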


  • Author
  • Contributor
  • 25 replies
  • January 16, 2025

Hi Rory

Thanks for your reply.

I am pulling from named, static, non-changing fields in non-changing SQL tables, so I do not expect schema drift in this instance. Now that I think about it, that suggests there is not much point in refreshing the metadata each day, as nothing will change from one day to the next. If a new table or field were ever added, not having the scheduled metadata refresh step would not stop the job from running, and the metadata could be refreshed manually and the data pipeline updated in a controlled fashion if the new data needed to be ingested.

I think I'll remove the metadata sync steps from my overnight job and follow up with a support ticket with TX to specifically see if the failure I am getting is something they can investigate.

Thanks again for your detailed reply (to this and to all the other posts of yours I see).



