Solved

CData REST API connector loads incomplete data when paging is used


Userlevel 3
Badge +2

Issue: tables with paging stop loading while data is still incomplete


We have an execution package with loads data from AFAS GetConnectors. There are 7 Business Units in TX, each loading data from a separate AFAS implementation.
 

 

Per implementation AFAS supports up to 20 parallel connections. From this perspective the Max. Threads 10 should not be a problem.

AFAS uses paging with record offset for tables that are too big to load. 

All Business Units have a similar set of tables. I’ll use AfasIncluzio as an example, as this one has the most data.

This table is currently the only one loaded with paging

  • TX_Financiele_Mutaties

When we look at the execution log on package level, we can see that the Total Time varies per day:
 

 

The AFAS API is quite stable in terms of speed. Hence I can tell that all executions under 15 minutes did not load all data. Further investigation learned that this only happens on table Tx_Financiele_Mutaties, which is currently the only one with paging.

This is the Execution Overview of that particular table:
 

 

And the row count:
 

When I filter the log on this particular request, I see that the load just stops instead of requesting the next page.
 

Some findings during today’s tests

  • Table seems to finish correctly when its executed individually
  • Table sometimes finishes correctly during a normal load (10 threads)
  • Other tables that are loaded at once (no paging) are always loaded correctly
  • So far I haven’t been able to reproduce the error when I lowered the Max Threads in the TX package to 8. However, AFAS states that we can use up to 20 concurrent connections per implementation (we have 7)

While lowering the Max Threads may be a workaround here, it is not an explanation for the behavior. Are there limits on concurrent CData connections? Are we overlooking something? Any other hypotheses?

We’re contacting AFAS as well, but their logging and support regarding issues is quite inaccessible.

I will send additional logging via DM.

Please also see legacy ticket 23426

Best regards,

Erik

 

TX version 20.10.39

CData REST API connector version 22.0.8423.0

icon

Best answer by erik.van.mastrigt 8 March 2023, 08:51

View original

10 replies

Userlevel 6
Badge +5

Hi Erik

Thanks for the log.

Just to be sure I understand.

If you created an execution where you used possibly 20 connections, without pagination, it would work fine, but once it does it does not add the correct amount of rows.

Userlevel 6
Badge +5

Hi Erik

To add to your findings, I have seen similar issues with the pagination setup. Here it was related to nested calls though and I am unsure if you use that.

Maybe it will behave as mixing nested calls and pagination when running many transfers at once and mixing that with pagination.

For that issue CData told me that my only option was to not do nested calls or use the old pagination method with a enum loop.

Userlevel 3
Badge +2

Hi Erik

Thanks for the log.

Just to be sure I understand.

If you created an execution where you used possibly 20 connections, without pagination, it would work fine, but once it does it does not add the correct amount of rows.

It sometimes does, it sometimes does not. See the image with the row count: each bar is a daily execution.

Without pagination I haven’t seen issues so far. We don’t use nested calls. I’ve sent you the RSD via email.

Userlevel 6
Badge +5

Hi Erik

That pagination sometimes do not work, should be a CData issue I will pass on your findings and see if they can come with a solution.

Userlevel 3
Badge +2

Update: we lowered the Max Threads to 6, but the issue persists. Last night it stopped at exactly 750.000 records while ca. 2,5 million are expected.

Userlevel 6
Badge +5

Thanks for the update.

Try to set the REST data source to one concurrent execution and see if it passes through.

I assume it will be slower still, but it should still attempt to execute all other tables in 6 threads.

If it works, try to increase the amount of concurrent threads until it does not work anymore.

Then I will pass it on to CData as well.

Userlevel 3
Badge +2

Hi Thomas,

I tested this but cannot reproduce the error while playing with the Max concurrent transfers.

What I did find out (so far, still testing if it goes well in our daily execution schema) is that the issue never occurs when I uncheck ‘Merge Steps’

 

The business units in this package have (among others) the same set of tables. So all of them have a table called ‘Tx_Financiele_Mutaties’. All tables ‘Tx_Financiele_Mutaties’ have pagination.

I noticed that if some of ‘Tx_Financiele_Mutaties’ are running concurrently, they all seem to stop around the same moment in time. Probably one of them is finished, but all of them seem to stop.

 

Per business unit the amount of records differs. ‘Tx_Financiele_Mutaties’ in business unit A has 2,6 million rows; in B 175K rows etc.

We’re currently monitoring the daily load where Merge Steps is unchecked.

Userlevel 6
Badge +5

Hi Erik

Thanks for doing this.
I would agree that what you see with the merge steps is the issue. Because if you set it to 1 in the data source like suggested and then has the same table across many BUs it will still try to run them at the same time.

Let me know how the daily load goes with this unchecked.

Userlevel 3
Badge +2

Hi Thomas,

We had a week without issues now. At this moment it’s quite assumable that the combination of Merge Steps and tables having the same name was the issue here. Solution is to uncheck the Merge Steps in the execution package.

Userlevel 3
Badge +2

A couple of analysis, also by CData later: the issue is that the RSD files used a different page size parameter. In cases like this, always use the same value for page size. 

CData:

In general. The HTTP pager stores the pagesize value in a static, which is probably triggering a race condition when that table is queried at the same time as another table with a different pagesize. [...] change the pagesize in the rsd to be 50k like the other scripts

 

Conclusion: when executing concurrent tables which use page sizing, always use the same page size value.

Reply