Solved

Poor performance with direct read on large tables with incremental load

  • 6 June 2024
  • 7 replies
  • 129 views

Badge

I am having some issues with the transfer step when loading incrementally using direct read. The problem seems to be that the generated procedures do not utilize the indices on the underlying tables, resulting in full PK scans on every run.

I was especially having problems when running on Azure General Purpose compute. Switching to Hyperscale helped some, but the transfer step still spends approx. 40 seconds scanning ~90M rows, causing bottlenecks.

When using ADO.NET I could set an index on the incremental rule field, speeding up the search to <1 sec.
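
For reference, the ADO.NET-side fix was just a nonclustered index on the incremental field. A minimal sketch, assuming placeholder names (the real TX staging table has more system fields than this):

-- Placeholder objects for illustration only; the actual table and
-- incremental rule field are whatever TX generates for the project.
CREATE TABLE dbo.StagingTable
(
    Id           int IDENTITY(1,1) PRIMARY KEY,
    DW_TimeStamp datetime2 NOT NULL
);

CREATE NONCLUSTERED INDEX IX_StagingTable_DW_TimeStamp
    ON dbo.StagingTable (DW_TimeStamp);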



The execution plan generated by TX looks like this:

[screenshot: execution plan showing a clustered index scan on the PK]

Here we get a PK index scan even though there is an index on dw-timestamp.

Testing on a subset of the data (~10 million rows):

[screenshot: execution plan for the ~10 million row subset]


Adding OPTION (RECOMPILE) to the procedure results in the procedure using the dw-timestamp index:

[screenshot: execution plan with an index seek on the dw-timestamp index]


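
For anyone who wants to reproduce this, the change is essentially appending the hint to the incremental SELECT inside the generated procedure. A rough sketch against the placeholder table above (the real generated procedure is more involved):

-- @LastMaxTimeStamp stands in for the incremental watermark the
-- generated procedure works with; name, type and value are assumptions.
DECLARE @LastMaxTimeStamp datetime2 = '2024-06-01';

SELECT Id, DW_TimeStamp
FROM   dbo.StagingTable
WHERE  DW_TimeStamp > @LastMaxTimeStamp
OPTION (RECOMPILE); -- embeds the actual watermark value at compile time,
                    -- letting the optimizer pick the index seek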

Is there a good reason why these procedures are generated without the recompile hint?

Is it something that could be added in a later release?

And is there a way to work around this other than reverting to ADO.NET transfers or SSIS? Scaling the DB/service tiers to compensate for something a simple index could handle better seems like a poor solution.

Might be semi-related to this post: ADO.NET Transfer consistently slow on incremental table with no new rows | Community (timextender.com)

Best answer by Christian Hauggaard 17 June 2024, 08:29

7 replies

Userlevel 3
Badge +1

Are you using supernatural keys? If you are using SNKs with read from store, it can become slow. The same applies if you have a lot of lookups in the table and have the option to keep lookup values up to date enabled.

Badge

Are you using supernatural keys? If you are using SNKs with read from store, it can become slow. The same applies if you have a lot of lookups in the table and have the option to keep lookup values up to date enabled.

The problem is reading from a simple-mode table using the TX-generated procedures (transfer step). There's no cleansing involved yet, and no transformations have been done on the staging table other than adding an index for the incremental field.
The same behaviour is observed when executing the generated procedure directly in SSMS.

Userlevel 6
Badge +7

Hi @andreas.skagestad ,

Interesting find. I wonder if this behaviour is due to the query plan being cached from an initial full-load scenario, where an index scan is appropriate and key lookups are not. In subsequent runs the cached plan is reused instead of being re-evaluated, until either the statistics are invalidated or something else changes drastically.
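
If that theory holds, a quick way to test it would be to evict just that plan and re-run the transfer, e.g. (the procedure name is a placeholder):

-- Marks the generated transfer procedure so that a fresh plan is
-- compiled on its next execution.
EXEC sp_recompile N'dbo.TX_TransferProcedure';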

I wondered whether, instead of OPTION (RECOMPILE), OPTIMIZE FOR (@parameter = value) would be a better strategy (depending on the recompile overhead), but I think not in this specific case. See Paul White's answer here: https://dba.stackexchange.com/questions/290245/date-index-seek-is-chosen-when-using-option-recompile-but-not-with-option-optimi

Your suggestion is probably faster than OPTIMIZE FOR due to the parameter embedding optimization; see Aaron Bertrand's answer here: https://dba.stackexchange.com/questions/75842/when-to-use-with-recompile
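
For comparison, the OPTIMIZE FOR variant would look roughly like this (placeholder names and value again); it compiles a single plan for the assumed value, so the initial full load and a small incremental run cannot both get their best plan:

DECLARE @LastMaxTimeStamp datetime2 = '2024-06-01';

SELECT Id, DW_TimeStamp
FROM   dbo.StagingTable
WHERE  DW_TimeStamp > @LastMaxTimeStamp
OPTION (OPTIMIZE FOR (@LastMaxTimeStamp = '2024-06-01'));
-- Unlike OPTION (RECOMPILE), the value is fixed at compile time and the
-- single cached plan is reused for every subsequent watermark.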

Note that I think this partially explains why I very occasionally see Azure SQL DB performance drop catastrophically, as this clustered index scan also results in more parallelisation waits.

@Christian Hauggaard : I think this should be investigated quickly as this can be a huge performance boost across the board.

Userlevel 6
Badge +5

Hi @andreas.skagestad can you please confirm which version of TimeXtender you are using?

Badge

Hi @Christian Hauggaard ,

The tests in the opening post are using version 20.10.45. I have also tested using version 20.10.51 and observe the same behaviour.

Userlevel 6
Badge +5

Hi @andreas.skagestad thank you for the feedback, I have passed it on to our product team and they will look into this during the next build cycle.

Userlevel 6
Badge +7

Hi,

Just came across this today:

One of the reasons for the reduced performance may be that using @variables in INSERT ... SELECT statements that fill tables can break minimal logging. This is resolved by using OPTION (RECOMPILE): see GENERATE_SERIES() Breaks Minimal Logging – SQLServerCentral, point 5.
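
Translated to this context, the pattern would be roughly the following sketch (placeholder names; the TABLOCK hint follows the usual minimal-logging requirements):

-- A @variable in the predicate can make the row estimate disqualify the
-- minimally logged bulk-load path; OPTION (RECOMPILE) embeds the runtime
-- value so the optimizer can requalify it.
CREATE TABLE dbo.TargetTable (Id int, DW_TimeStamp datetime2); -- placeholder

DECLARE @LastMaxTimeStamp datetime2 = '2024-06-01';

INSERT INTO dbo.TargetTable WITH (TABLOCK) (Id, DW_TimeStamp)
SELECT Id, DW_TimeStamp
FROM   dbo.StagingTable
WHERE  DW_TimeStamp > @LastMaxTimeStamp
OPTION (RECOMPILE);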
