Solved

PK table in ODX database is growing rapidly in terms of disk space

2 years ago
February 10, 2023
9 replies
272 views

bas.hopstaken
TimeXtender Xpert
91 replies

Hi community,

I’ve a question about the ‘PK’ table that is created in the ODX when you enabled the ‘Handle primary key updates’ and ‘Handle primary key deletes’.

I’ve noticed the ODX database (SQL Database) is growing rapidly in disk space after enable an incremental load schedule to reload data every 5 minutes. When I used the default SQL ‘Disk usage by top tables’ I noticed the PK tables are the biggest in terms of disk space.

My PK table of the GL Entry table contains 4.5 billion rows! And is 95 GB. While my DATA table contains only 118 million records.

When I query the PK table and filter the primay key in this table on 1 value. This value is saved 75 times in this table. For every odx_batch_number (odx_batchnumber 0 – 74). Is this normal? I think it is a bit strange that my PK tables are the biggest tables in the ODX in terms of disk space.

Even when I run the storage management task. It doesn’t clean the PK table’s. It always contains the primary key for every odx batch load.

The customer is using version 20.10.37.64

Best answer by rory.smith

Hi Bas,

I expect having older PK states would be useful for replaying history. This way the ODX replaces the “type 2 every field in every table” approach. Handle deletes and updates are quite expensive in performance, so they are not useful for every table generally.

View original

Did this topic help you find an answer to your question?

Thomas Lind
Community Manager
1100 replies
2 years ago
February 10, 2023

Hi @bas.hopstaken

It adds all PK values to the PK table every time you do a execution.

It is used to compare what fields are deleted or updated. It does not matter whether you use SQL or Data Lake to store the data in.

To decrease this size run a full load and use the storage management task to only allow 1 version.

Also this rollup feature may be able to reduce the size.

It does not apply to the SQL Storage type though.

bas.hopstaken
Author
TimeXtender Xpert
91 replies
2 years ago
February 13, 2023

Hi Thomas,

Thanks for the quick reply, I'm only wondering. Is it intended to work like this?

I understand the purpose of the PK table. But if you save the last 2 loads/batches in this PK table should be enough right?

Is it really needed to save all primary keys for every load/batch?

Bas Hopstaken (Cmotions)

rory.smith
TimeXtender Xpert
713 replies
Answer
2 years ago
February 13, 2023

Hi Bas,

bas.hopstaken
Author
TimeXtender Xpert
91 replies
2 years ago
February 15, 2023

Hi @rory.smith,

Replaying history can be a reason to save the PK's for all batches. I know that this option is quite expensive in performance. But I didn't know all PK's for all batches are saved.

Nevertheless, it would be great if we have the option to choose how many batches need to be saved in the PK tables. Maybe the storage management task can be used for this. This way we can save a lot of disk space for the PK tables.

Bas Hopstaken (Cmotions)

rory.smith
TimeXtender Xpert
713 replies
2 years ago
February 15, 2023

Hi @bas.hopstaken ,

I agree: Storage Management tasks should have an option to clean the _PK tables (as this could be quite expensive). I would intuitively expect those tasks to also clean that up I think.

Christian Hauggaard
Community Manager
1162 replies
2 years ago
February 20, 2023

Please post and upvote this in the idea section. Many thanks!

Bitmetric_Maarten
Contributor
14 replies
1 year ago
January 22, 2024

Hi All,

We are running into the same issue. The size of the PK table is absolutely ridiculous compared to the table it is being used for. Is there any progress on cleaning the PK table from TX?

bas.hopstaken
Author
TimeXtender Xpert
91 replies
1 year ago
January 22, 2024

Hi all,

As a temporary workaround, I've created a dynamic SQL script to truncate all ‘PK’ tables:

_____________________________________________________________________

USE TX_ODX_LIVE

DECLARE @TableName NVARCHAR(128)
DECLARE @TableSchema NVARCHAR(128)
DECLARE @SQL NVARCHAR(MAX)

-- Create a cursor to loop through table names
DECLARE table_cursor CURSOR FOR
SELECT table_name , table_schema
FROM information_schema.tables a
WHERE TABLE_TYPE = 'BASE TABLE' AND table_name LIKE '%_PK_%'

-- Loop through the tables and generate TRUNCATE TABLE statements
OPEN table_cursor
FETCH NEXT FROM table_cursor INTO @TableName, @TableSchema

WHILE @@FETCH_STATUS = 0
BEGIN
SET @SQL = 'TRUNCATE TABLE ' + @TableSchema +'.[' + @TableName + ']'
EXEC sp_executesql @SQL

FETCH NEXT FROM table_cursor INTO @TableName, @TableSchema
END

-- Close and deallocate the cursor
CLOSE table_cursor
DEALLOCATE table_cursor

_____________________________________________________________________

Please be aware: Do not run this script while reloads are running.

I created a idea for this a while ago:

https://support.timextender.com/ideas/add-option-to-clean-pk-tables-889

Let's upvote this idea and hope it will be implemented quickly.

Bas Hopstaken (Cmotions)

rory.smith
TimeXtender Xpert
713 replies
1 year ago
January 22, 2024

Hi @bas.hopstaken ,

you are dropping the whole _PK set here right? Does that not destroy the deletion handling / PK updating feature? I would only drop everything older than the last state, but that is probably an expensive deletion and might require index maintenance.

As the _PK table seems to be required only for table-internal processing, you should be able to post-script on the _PK table after cleanse without (b)locking other processes. Depending on process duration you might either do a simple post-process or run weeken maintenance or something like that.

I would really like to know what the reasoning behind storing all the states is, as I cannot imagine actually using the data without writing a whole load of complex SQL in some kind of disaster recovery scenario.

Reply

Rich Text Editor, editor1

Cookie policy

We use cookies to enhance and personalize your experience. If you accept you agree to our full cookie policy. Learn more about our cookies.

Cookie settings

We use 3 different kinds of cookies. You can choose which cookies you want to accept. We need basic cookies to make this site work, therefore these are the minimum you can select. Learn more about our cookies.

Basic
Functional

Normal
Functional + analytics

Complete
Functional + analytics + social media + embedded videos + marketing

Reply

Related topics

How to recognize deleted records in ODX Server Storageicon

Add and Configure Data Sources

What is an Ingest Instance Server

TimeXtender 6143.1

How to resolve Ingest Instance error: Property 'OwnerId' does not exist for 'object specified'

Most helpful members this week

Sign up

Login with SSO

Login to the community

Login with SSO

Scanning file for viruses.

This file cannot be downloaded

Cookie policy

Cookie settings