Solved

Sync Oracle slow for ODX server

  • 28 April 2023
  • 7 replies
  • 113 views

Hi,

I have a setup with TimeXtender 20.10.41.64, ODX, ADL, ADF, and a self-hosted IR, which I got working just fine.

I need to extract data from an IFS10 source (an Oracle database), but the sync is so slow that it times out and fails after 2 hours. The data source is provided by a third party, which means the customer doesn't have the ability to change any settings, and 2 hours should be enough time for a sync. The customer has a couple of TX environments.

After many tests and discussions with the customer, it turned out a similar problem had occurred before, in an environment not using the ODX server, where a sync took over 2 hours. That was fixed in TimeXtender release 20.10.3, issue “8889: Oracle slow synchronization - affects all bit specific data sources”. Since that release, the customer says the sync only takes about 10-15 minutes.

Could this be a problem in the ODX server as well? I managed to get a successful sync with version 20.10.40, which took about 2 hours, but after upgrading to 20.10.41 I've managed to get only 1 successful sync in 16 tries over 6 days, running the sync at different times of the day.

Is it possible to check whether issue 8889 could also affect the ODX server, and whether there is something that can be fixed?


Best answer by Thomas Lind 28 April 2023, 11:14


7 replies


Hi Aron

No, it should not be the same issue. The main reason it is slow at synchronizing is that it is an ADF data source. If you are able to connect to the Oracle source without using the ADF version, try it and see if that one isn't faster at synchronizing.

You can also compare it with a normal MS SQL database; it will also be slower at synchronizing when using the ADF option.

It has been a common request to increase the speed of this, but I do not know if anyone has looked at the issue yet.

Hi Thomas, 

Sorry for the late response on this topic. I have sincerely tried to find a solution to this issue. While I understand that using ADF can result in slower performance, there is something that I believe is unrelated to ADF itself, and it is causing me confusion.

The Oracle source I'm connecting to has three different Owners (schemas). The Owner is a setting that can be configured in the properties for the ADF connection.

One of the Owners is the system owner, which I'm not interested in. Another one is called IFSINFO and contains approximately 60,000 rows. The last one is called IFSAPP and contains a staggering 600,000(!) rows.

When I establish a connection with IFSINFO as the Owner, the synchronization process only takes a couple of minutes. This is because the query sent from TX to ADF instructs it to filter on Owner = IFSINFO.
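To illustrate what I mean, here is my own sketch of an owner-filtered lookup against the Oracle data dictionary. This is NOT the actual query TX generates (I don't have that in front of me), just the kind of shape I would expect:

```sql
-- Hypothetical sketch of an owner-filtered metadata lookup
-- (NOT the actual TX-generated SQL)
SELECT col.OWNER,
       col.TABLE_NAME,
       col.COLUMN_NAME,
       col.DATA_TYPE
FROM   ALL_TAB_COLUMNS col
WHERE  col.OWNER = 'IFSINFO';  -- the ~60,000 IFSINFO rows come back quickly
```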

However, when I create a connection with IFSAPP as the Owner, the synchronization process takes hours. Here's the catch: I'm only interested in about 15 tables, which amounts to roughly 100 rows (fields). Despite this, the query sent to ADF from TX requests all 600,000 rows belonging to the Owner IFSAPP, even though I’m only interested in 100 of them.

The significant difference in synchronization time, whether it's a few minutes or several hours, is directly related to the filtering (the where clause) created by TX and sent to ADF. Since the source remains the same, it cannot be attributed to the source or to ADF itself. It must be an issue with the query, which is created by TX. Am I correct in my understanding?

Even if I let the synchronization process run for hours, I am unable to make it work correctly. TX reports a timeout failure, although I can see in ADF that the job was successful.

We have advised the customer to utilize ADF in conjunction with the Data Lake, as it aligns with TimeXtender's recommendation for the optimal setup. This configuration effectively distributes a significant portion of the performance load to ADF, which is also the preferred approach.

I would greatly appreciate it if you could provide any assistance or support to ensure that this setup functions correctly.

Thanks a lot  


Hi @Spelling 

I have been trying to replicate it. Similar to you, I have two owners on our Oracle server.

One is the main one with a lot of tables, and the other has just one table in total, a copy of an already existing table from the large one.

What I did was create two Oracle ADF data sources, one pointing at the large owner and one at the small one. I made a rule so only the same table was selected in both.

[Screenshot: many tables, with only one selected]

[Screenshot: one table, with this one specifically selected]

So the times are a bit different, similar to what you experience.

However, compared to choosing all fields of both data sources, it is way faster with only one chosen.

Hi @Thomas Lind,

Thank you for your response and your efforts in trying to replicate the problem.

It appears that we are obtaining similar results, with the only distinction being the duration it takes. In your case, with multiple tables, it takes approximately 2 minutes, whereas in my scenario with many tables, it takes around 3 hours.

Regarding your statement, "However compared to choosing all fields of both data sources, it is way faster with only one chosen," are you suggesting that selecting just one field from the chosen table resulted in a faster process compared to selecting all fields?

I have already narrowed the selection down to the necessary fields and cannot reduce it further, but I understand the importance of testing and can do so for diagnostic purposes. In the end, though, it is crucial for me to retain all the fields that are currently selected.

Even though ADF shows a successful outcome (despite the extended duration), I encounter a failure and timeout in TX, as shown in the screenshots in my previous post. Perhaps there is a configuration or setting that can be adjusted to fix this?


Hi, I am experiencing an issue that may be related to this one, in TX 6436.1. I am trying to connect to a MySQL data source over a self-hosted integration runtime for ADF. There are about 5 different MySQL servers, and none of them are able to synchronize.

For example, one of them runs the pipeline for 8 hours and 45 minutes before it fails:

[Screenshot: ODX Server Synchronize Task]

Others are failing for various reasons after a couple of hours:

[Screenshot: other ODX Server Synchronize tasks]

Looking at the failed pipelines, I can see ridiculously slow data transfer speeds, like 10 bytes/s. What is interesting to me is that I can right-click the data source and click ‘select tables’. When I click search, it starts a pipeline in ADF that is done in a few seconds and shows me all of the tables:

[Screenshot: ‘select tables’ on the data source, searching with no filters]

Selecting columns also works as expected. So it seems it is able to retrieve all of the metadata without issue through ADF in a matter of seconds, but the pipeline run used for synchronize tasks does not work at all. This leads me to believe ODX Server is doing something strange with the synchronize task. 
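For reference, a plain metadata lookup of the kind that can finish in seconds would be something like this. This is just my own sketch against information_schema; I don't know what the ‘select tables’ pipeline actually runs:

```sql
-- My own sketch of a fast MySQL metadata lookup
-- (not necessarily what the 'select tables' pipeline executes)
SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, DATA_TYPE
FROM   information_schema.COLUMNS
WHERE  TABLE_SCHEMA = DATABASE();  -- current schema only
```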

I have also tried getting rid of ‘include all tables’ in the select tables window and adding all of the tables individually, but the same issue occurs. 

--

In a perhaps unrelated issue, I have an ODBC connection (not through ADF) which has been trying to sync for 22 hours without finishing, on this same ODX Server (not at the same time as the above). Other tools are able to show all metadata in a matter of seconds on the same machine…

This is becoming a serious block for us as we cannot start any transfers without a synchronization first completing on the data source, or explicitly creating rules for the tables we want to load. 

Kind regards,

Andrew

Hi Andrew,

From what I've learned and from what I'm reading in your post, I think the issue is the same.

What I have found is that the query created by TimeXtender and then sent to the source system via ADF is quite badly written. Why am I saying that? Because even if you only select to include one or two tables, the query sent to the source system via ADF doesn't filter on only those tables but includes all tables. And compared to a plain metadata query, it includes one or two (I don't remember exactly right now) joins to get some additional data it would like to have, and this, I believe, takes a long time to execute. What I have also found is that the joins they are doing, to determine whether some columns are keys or not, all come back flagged 0 in the end result in our cases, so those joins are in that sense useless.
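To make that concrete, here is my rough reconstruction of the shape I believe the generated query has: no table filter, plus constraint joins to flag key columns. I want to stress this is my own sketch based on what I've observed, not the actual TX SQL:

```sql
-- Rough reconstruction of the suspected slow query shape
-- (my own sketch, NOT the actual TX-generated SQL)
SELECT col.OWNER,
       col.TABLE_NAME,
       col.COLUMN_NAME,
       col.DATA_TYPE,
       CASE WHEN con.CONSTRAINT_TYPE = 'P' THEN 1 ELSE 0 END AS IS_KEY
FROM   ALL_TAB_COLUMNS col
LEFT JOIN ALL_CONS_COLUMNS cc
       ON  cc.OWNER       = col.OWNER
       AND cc.TABLE_NAME  = col.TABLE_NAME
       AND cc.COLUMN_NAME = col.COLUMN_NAME
LEFT JOIN ALL_CONSTRAINTS con
       ON  con.OWNER           = cc.OWNER
       AND con.CONSTRAINT_NAME = cc.CONSTRAINT_NAME
       AND con.CONSTRAINT_TYPE = 'P'
WHERE  col.OWNER = 'IFSAPP';
-- No TABLE_NAME filter: all ~600,000 metadata rows get scanned and joined,
-- even when only one or two tables are included in TX. A shape like this
-- can even duplicate rows when a column appears in several constraints.
```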

To me, the better solution would have been to loop over each table, and only over the ones that have been included. This would probably take only a few seconds (if only a few tables are included) or minutes (if quite a few tables are included).
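Something along these lines, per included table or with an IN list (the table names here are made up for illustration):

```sql
-- Sketch: the same lookup restricted to the included tables only
-- (table names are made up for illustration)
SELECT col.TABLE_NAME,
       col.COLUMN_NAME,
       col.DATA_TYPE
FROM   ALL_TAB_COLUMNS col
WHERE  col.OWNER = 'IFSAPP'
AND    col.TABLE_NAME IN ('CUSTOMER_ORDER', 'CUSTOMER_ORDER_LINE');
```

The dictionary can answer a query like that almost instantly, even on a schema as large as IFSAPP.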

We have tested adjusting different timeouts, both in TX and directly in ADF on the datasets. We often, but not always, manage to get the ADF pipeline to run successfully, but can still get a failure in TX anyway. I believe this is due to the time it has already taken to execute the query, plus the time TX then needs to handle all the data it gets back as a response, again even if we have only included one table.

I'm not sure if TX is putting any priority on this issue, but I would say it is a major blocker for us as well, and we have actually started looking at alternative solutions instead, because it has been quite hard to explain to customers why there is a problem extracting data due to this issue.

I like TimeXtender a lot, but it is really a huge problem that we are not even able to extract data using this solution.

Let's see what TX says, but I'm definitely interested in following up on this issue as well.

Regards,

Aron


Hi Spelling,

Agreed!

The performance of the queries that TimeXtender generates is all the more important when we are using ADF, since we are paying for compute time… if we spend 9 hours getting the metadata of a data source, that costs us money as well. It is hard to build trust with the client in the code TX generates when we run into issues like this.

Kind regards,

Andrew
