TimeXtender will load text files in file order (at least when using the Business Unit approach). In practice this means the order of rows in a text file is preserved. However, this is not guaranteed: if there is no field that can be used to order the data, you are depending on the current behaviour of SQL Server's bulk import implementation.
As DW_Id is a sequence number, it can be used to sort after data ingestion. It would be better if the source system could output an explicit sort key, as that would not depend on implicit behaviour and would be robust in parallel ingestion scenarios.
I do not know whether Azure Data Factory can split the transfer of a text file across threads in a pipeline; if it can, that could result in out-of-order extraction.
Thanks Rory.
To anyone reading - have you dealt with this in the past? We are unfortunately not able to get the vendor providing the extracts to include a row number, and the order of the rows is very important in our processing of the data.
We are considering using a shell script to add line numbers before loading, but in testing this is proving very slow (using PowerShell in a Windows environment).
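For what it's worth, a streaming tool such as awk can prepend line numbers without loading the whole file into memory, which tends to be much faster than naive PowerShell loops; on Windows this would need Git Bash, WSL, or similar. The file names below are illustrative, and if the file has a header row you would want to skip or special-case line 1:

```shell
# Prepend a 1-based line number to each row of input.csv, writing numbered.csv.
# NR is awk's built-in record (line) counter; the file is processed one line
# at a time, so memory use stays flat even for very large extracts.
awk '{ printf "%d,%s\n", NR, $0 }' input.csv > numbered.csv
```

An equivalent streaming approach in PowerShell would use a System.IO.StreamReader/StreamWriter pair rather than Get-Content, which by default buffers and decorates every line and is a common cause of the slowness described above.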
Cheers, Mark
Hi Mark - was this finally resolved?
I was just "passing by" and wondered: does each source row have a full DTG (date-time group) recording the activity that created it? That could serve as a row ordering key, with granularity down to the second of the DTG value, if that's enough for your purposes?
Just a thought.
Cheers,
Jon.