Solved

PK table in ODX database is growing rapidly in terms of disk space

  • February 10, 2023
  • 9 replies
  • 254 views

bas.hopstaken
TimeXtender Xpert

Hi community,

I have a question about the ‘PK’ table that is created in the ODX when you enable the ‘Handle primary key updates’ and ‘Handle primary key deletes’ options.

I’ve noticed that the ODX database (SQL database) is growing rapidly in disk space after enabling an incremental load schedule that reloads data every 5 minutes. When I ran the default SQL Server ‘Disk usage by top tables’ report, I noticed that the PK tables are the biggest in terms of disk space.

My PK table for the GL Entry table contains 4.5 billion rows and is 95 GB, while my DATA table contains only 118 million records.

When I query the PK table and filter on a single primary key value, that value is stored 75 times in the table, once for every batch (odx_batch_number 0 – 74). Is this normal? I find it strange that my PK tables are the biggest tables in the ODX in terms of disk space.
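
For reference, I checked the duplication with a query along these lines (the table and key column names below are just from this environment, so substitute your own PK table and primary key column):

-- Illustration only: table and column names are examples, not fixed ODX names.
-- Shows how often a single primary key value is repeated, per ODX batch.
SELECT odx_batch_number, COUNT(*) AS rows_for_this_key
FROM dbo.[GL Entry_PK]            -- substitute the actual _PK table name
WHERE [Entry No_] = 12345         -- one primary key value from the source table
GROUP BY odx_batch_number
ORDER BY odx_batch_number;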

Even when I run the storage management task, it doesn’t clean the PK tables; they still contain the primary keys for every ODX batch load.

The customer is using version 20.10.37.64

Best answer by rory.smith

Hi Bas,

I expect having older PK states would be useful for replaying history; this way the ODX replaces the “type 2 every field in every table” approach. Handling deletes and updates is quite expensive in terms of performance, so it is generally not useful for every table.


9 replies

Thomas Lind
Community Manager
  • 1017 replies
  • February 10, 2023

Hi @bas.hopstaken 

It adds all PK values to the PK table every time you do an execution.

It is used to compare which records have been deleted or updated. It does not matter whether you use SQL or Data Lake storage for the data.

To decrease the size, run a full load and use the storage management task to only allow one version.

Also, this rollup feature may be able to reduce the size.

It does not apply to the SQL Storage type though.
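
To check the effect of the cleanup, a query against the standard SQL Server DMVs (a sketch, not a TimeXtender feature) lists the largest tables in the ODX database, similar to the ‘Disk usage by top tables’ report mentioned above:

-- Lists tables in the ODX database by reserved space, largest first,
-- so the _PK tables can be compared against their DATA counterparts.
SELECT
    s.name AS schema_name,
    t.name AS table_name,
    SUM(ps.row_count)                        AS row_count,
    SUM(ps.reserved_page_count) * 8 / 1024.0 AS reserved_mb
FROM sys.dm_db_partition_stats AS ps
JOIN sys.tables  AS t ON t.object_id = ps.object_id
JOIN sys.schemas AS s ON s.schema_id = t.schema_id
WHERE ps.index_id IN (0, 1)   -- heap or clustered index, i.e. the base table data
GROUP BY s.name, t.name
ORDER BY reserved_mb DESC;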


bas.hopstaken
TimeXtender Xpert
  • Author
  • 86 replies
  • February 13, 2023

Hi Thomas,

Thanks for the quick reply. I'm just wondering: is it intended to work like this?

I understand the purpose of the PK table, but saving only the last two loads/batches in this PK table should be enough, right?

Is it really necessary to save all primary keys for every load/batch?


rory.smith
TimeXtender Xpert
  • 649 replies
  • Answer
  • February 13, 2023

Hi Bas,

I expect having older PK states would be useful for replaying history; this way the ODX replaces the “type 2 every field in every table” approach. Handling deletes and updates is quite expensive in terms of performance, so it is generally not useful for every table.


bas.hopstaken
TimeXtender Xpert
  • Author
  • 86 replies
  • February 15, 2023

Hi @rory.smith,

Replaying history can be a reason to keep the PKs for all batches. I know this option is quite expensive in terms of performance, but I didn't know that the PKs for all batches are kept.

Nevertheless, it would be great if we had the option to choose how many batches should be kept in the PK tables. Maybe the storage management task could be used for this; that way we could save a lot of disk space on the PK tables.


rory.smith
TimeXtender Xpert
  • 649 replies
  • February 15, 2023

Hi @bas.hopstaken,

I agree: Storage Management tasks should have an option to clean the _PK tables (as this can be quite expensive). I would intuitively expect those tasks to clean that up as well.


Christian Hauggaard
Community Manager

Please post and upvote this in the idea section. Many thanks! 


Bitmetric_Maarten
Contributor

Hi All,

We are running into the same issue. The size of the PK table is absolutely ridiculous compared to the table it is being used for. Is there any progress on cleaning up the PK table from the TimeXtender side?


bas.hopstaken
TimeXtender Xpert
  • Author
  • 86 replies
  • January 22, 2024

Hi all, 

As a temporary workaround, I've created a dynamic SQL script to truncate all ‘PK’ tables:

_____________________________________________________________________

USE TX_ODX_LIVE;

DECLARE @TableName   NVARCHAR(128);
DECLARE @TableSchema NVARCHAR(128);
DECLARE @SQL         NVARCHAR(MAX);

-- Create a cursor to loop through the PK table names
-- (the underscores are escaped so they match literally instead of acting as LIKE wildcards)
DECLARE table_cursor CURSOR LOCAL FAST_FORWARD FOR
SELECT table_name, table_schema
FROM information_schema.tables
WHERE table_type = 'BASE TABLE' AND table_name LIKE '%[_]PK[_]%';

-- Loop through the tables and execute a TRUNCATE TABLE statement for each one
OPEN table_cursor;
FETCH NEXT FROM table_cursor INTO @TableName, @TableSchema;

WHILE @@FETCH_STATUS = 0
BEGIN
    SET @SQL = 'TRUNCATE TABLE ' + QUOTENAME(@TableSchema) + '.' + QUOTENAME(@TableName);
    EXEC sp_executesql @SQL;

    FETCH NEXT FROM table_cursor INTO @TableName, @TableSchema;
END

-- Close and deallocate the cursor
CLOSE table_cursor;
DEALLOCATE table_cursor;

_____________________________________________________________________

 

Please be aware: Do not run this script while reloads are running.
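
If you first want to see which tables would be affected, a dry-run variant like this only lists the statements without executing anything (same name pattern as in the script above):

-- Dry run: list the TRUNCATE statements without executing them.
SELECT 'TRUNCATE TABLE ' + QUOTENAME(table_schema) + '.' + QUOTENAME(table_name)
FROM information_schema.tables
WHERE table_type = 'BASE TABLE'
  AND table_name LIKE '%[_]PK[_]%';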


I created an idea for this a while ago:

https://support.timextender.com/ideas/add-option-to-clean-pk-tables-889

Let's upvote this idea and hope it will be implemented quickly. 


rory.smith
TimeXtender Xpert
  • 649 replies
  • January 22, 2024

Hi @bas.hopstaken ,

You are dropping the whole _PK set here, right? Doesn't that break the deletion handling / PK update handling feature? I would only drop everything older than the last state, but that is probably an expensive delete and might require index maintenance.
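
As a sketch of what I mean for a single table (the table and batch column names are assumptions taken from earlier in this thread, and the delete itself could still be expensive):

-- Keep only the most recent batch in one _PK table and remove all older states.
DELETE FROM dbo.[GL Entry_PK]
WHERE odx_batch_number < (SELECT MAX(odx_batch_number) FROM dbo.[GL Entry_PK]);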

As the _PK table seems to be required only for table-internal processing, you should be able to run a post-script on the _PK table after cleanse without (b)locking other processes. Depending on process duration, you might either do a simple post-process or run weekend maintenance or something like that.

 

I would really like to know what the reasoning behind storing all the states is, as I cannot imagine actually using the data without writing a whole load of complex SQL in some kind of disaster recovery scenario.

