Solved

Tables end up empty after ADF load from ODX to DSA

  • 14 March 2023
  • 7 replies
  • 131 views

Badge +2

Hi,

We wonder if any of you have already run into this issue and perhaps have an idea on how to solve it?

Setup:

TX 20.10.38 using Azure SQL DB for the DWH and ADLS Gen2 for the ODX Server. ADF is used to load data from the ODX Server into the DSA. All databases have private endpoints, so ADF uses a self-hosted integration runtime (SHIR).

Issue:

From time to time (randomly, not always on the same tables, and for both incremental and full-load tables) we notice that the ADF pipeline takes the normal amount of time to load data from the ODX into the DSA R-table. However, the cleansing takes 0 (zero) seconds and the record count log reports 0 (zero) records in the R-table. Neither ADF nor the TimeXtender execution reports an error.

In the ODX service log we often notice this error: "Failed to encrypt sub-resource payload…" (not at exactly the same moment as the execution, so we are not sure it is related to the issue).
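For reference, this is roughly how we check which R-tables ended up empty after an execution. A minimal sketch using pyodbc; the connection string and table names below are hypothetical placeholders, not our actual setup.

```python
import pyodbc

# Hypothetical connection string for the DSA database - replace with your own.
CONN_STR = (
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=<dsa-server>.database.windows.net;"
    "Database=<dsa-db>;UID=<user>;PWD=<password>;Encrypt=yes;"
)

# Hypothetical raw tables that the ADF pipelines load into.
R_TABLES = ["dbo.Customer_R", "dbo.SalesOrder_R"]

with pyodbc.connect(CONN_STR) as conn:
    cur = conn.cursor()
    for table in R_TABLES:
        cur.execute(f"SELECT COUNT(*) FROM {table}")
        count = cur.fetchone()[0]
        if count == 0:
            print(f"WARNING: {table} is empty after the load")
        else:
            print(f"OK: {table} has {count} rows")
```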

Workaround: ADO.NET

We had a similar issue last year and started using ADO.NET to load data from the ODX Server into the DSA. We no longer had the issue in that environment, but recently we noticed memory pressure on the TX VM running the ODX Server service (and even had some out-of-memory errors when loading huge parquet files), so we decided to switch back to ADF…

 

Any ideas?

 

Best regards,

Peter


Best answer by Christian Hauggaard 8 June 2023, 16:27


7 replies

Userlevel 5
Badge +7

Hi @peter.jensen ,

 

I have not encountered that specific error. We have encountered other transient errors with ADF from time to time. You could also choose to use the AutoResolve integration runtime or a new Azure integration runtime (those are scalable and more configurable) for your ADLS → DSA transfers; that will reduce the load on your SHIR VM. As for your ADO.NET issue: when not using ADF to transfer from the lake to the DSA, you should use the "Limit memory use" setting to restrict the memory used to unpack parquet files.
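If it helps, transient ADF failures can also be spotted outside the TX logs by querying the factory's pipeline runs. A minimal sketch using the azure-mgmt-datafactory Python SDK; the subscription, resource group and factory names are placeholders.

```python
from datetime import datetime, timedelta, timezone

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import RunFilterParameters

# Placeholder identifiers - replace with your own.
SUBSCRIPTION_ID = "<subscription-id>"
RESOURCE_GROUP = "<resource-group>"
FACTORY_NAME = "<data-factory-name>"

client = DataFactoryManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# Look at the last 24 hours of pipeline runs and print anything that did not succeed.
now = datetime.now(timezone.utc)
filters = RunFilterParameters(
    last_updated_after=now - timedelta(days=1),
    last_updated_before=now,
)
runs = client.pipeline_runs.query_by_factory(RESOURCE_GROUP, FACTORY_NAME, filters)
for run in runs.value:
    if run.status != "Succeeded":
        print(run.run_start, run.pipeline_name, run.status, run.message)
```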

Badge +2

Hi @rory.smith ,

Thanks for the feedback.

We’ll look into the AutoResolve / Azure runtime.

As for the ADO.NET solution: we had set up 'Limit memory use', but we still ran into memory issues.

Userlevel 5
Badge +7

OK - TimeXtender itself may also be using more memory; larger projects make TX consume quite a lot of memory. You could use Resource Monitor to see what is consuming memory, and consider switching to a more memory-optimised VM family such as the E-series if you are not already running one of those.
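A quick way to see this without opening Resource Monitor is a small script that lists the top memory consumers on the VM. A minimal sketch using psutil; nothing TimeXtender-specific about it.

```python
import psutil

# Collect resident-set size per process and print the top 10 memory consumers.
procs = []
for p in psutil.process_iter(['name', 'memory_info']):
    mem = p.info['memory_info']
    if mem is not None:
        procs.append((mem.rss, p.info['name'] or ''))

for rss, name in sorted(procs, reverse=True)[:10]:
    print(f"{rss / 1024**2:8.0f} MB  {name}")
```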

Userlevel 4
Badge +5

Dear Peter,

I've had issues with RAM on the VM where the ODX server is running. This was because the number of rows was not capped: when you pull data from the ODX and do not set the maximum rows to copy, the ODX tasks will quickly take all the RAM.
When you go into the global database settings (or the MDW settings in the project) you need to set the maximum rows to copy, for example to 300,000. In previous TX versions this was set automatically, but not anymore, and often this field is empty, which means all rows. It will then load all the rows in one go instead of (bulk) loading them in batches of (in my case) 300,000 - see the sketch below.
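To illustrate why the cap matters: the sketch below (using pyarrow, with a hypothetical file name) reads a large parquet file in fixed-size batches, so at most one batch of rows is held in memory at a time, whereas an uncapped setting materialises the whole table at once.

```python
import pyarrow.parquet as pq

BATCH_SIZE = 300_000  # same cap as the "maximum rows to copy" example above

# Hypothetical parquet file exported by the ODX - replace with a real path.
pf = pq.ParquetFile("odx_export/customer.parquet")

for batch in pf.iter_batches(batch_size=BATCH_SIZE):
    # Each batch holds at most BATCH_SIZE rows, so peak memory stays bounded.
    print(f"processing {batch.num_rows} rows")
```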
 

Hope this helps

=Daniel

Userlevel 6
Badge +5

Hi @peter.jensen, did you manage to test the ideas above regarding the memory issue and the AutoResolve / Azure runtime?

Badge +2

Hi,
I am looking into it with TX support, but I did discover that the issue is most likely related to:

  • having multiple environments (TST and PRD)
  • using only one ADF for both environments
  • having schedules that load data from ODX to DSA at the same time in TST and PRD (using ADF to load from ODX to DSA)

We noticed that:

  • ADF pipeline objects have the same name for TST and PRD
  • ADF dataset objects (the ODX and DWH connections) have the same name for TST and PRD
  • TX regenerates the ADF objects each time it executes a DSA table
  • the ADF folder is just an attribute of an ADF object (it is not a physical folder)
    This means an ADF object name must be unique within the ADF instance (see the sketch after the next list).

And we ran into the following issue:

  • a pipeline for PRD ended up writing data into an R-table in TST (most likely because, at the same moment, a pipeline was generated in TST and overwrote the dataset object)
  • resulting in duplicate records in TST
  • resulting in an empty R-table in PRD
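A minimal sketch (azure-mgmt-datafactory Python SDK, placeholder resource names) of how we listed the TX-generated objects to confirm that TST and PRD share the exact same pipeline and dataset names in the one factory:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

# Placeholder identifiers - replace with your own.
SUBSCRIPTION_ID = "<subscription-id>"
RESOURCE_GROUP = "<resource-group>"
FACTORY_NAME = "<shared-data-factory>"

client = DataFactoryManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# If none of the names printed below contain an environment marker (TST/PRD),
# concurrent executions from both environments will regenerate and overwrite
# the same objects.
for ds in client.datasets.list_by_factory(RESOURCE_GROUP, FACTORY_NAME):
    print("dataset :", ds.name)
for pl in client.pipelines.list_by_factory(RESOURCE_GROUP, FACTORY_NAME):
    print("pipeline:", pl.name)
```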

Suggestions:

  • Preferred solution: TX should generate ADF object names that include the environment name (so object names are 100% sure to be unique)
    or
  • You should set up a separate ADF per environment
    or
  • You should make sure that the DSA schedules for TST and PRD do not overlap in time (hard to do)

 

Userlevel 6
Badge +5

Hi @peter.jensen, this issue has now been fixed in 20.10.43 (Fix ID 18659).
