Solved

PK table in ODX database is growing rapidly in terms of disk space

  • 10 February 2023
  • 6 replies
  • 76 views

Userlevel 1
Badge +1

Hi community,

I’ve a question about the ‘PK’ table that is created in the ODX when you enabled the ‘Handle primary key updates’ and ‘Handle primary key deletes’.

I’ve noticed the ODX database (SQL Database) is growing rapidly in disk space after enable an incremental load schedule to reload data every 5 minutes. When I used the default SQL ‘Disk usage by top tables’ I noticed the PK tables are the biggest in terms of disk space.

My PK table of the GL Entry table contains 4.5 billion rows! And is 95 GB. While my DATA table contains only 118 million records.

When I query the PK table and filter the primay key in this table on 1 value. This value is saved 75  times in this table. For every odx_batch_number (odx_batchnumber 0 – 74). Is this normal? I think it is a bit strange that my PK tables are the biggest tables in the ODX in terms of disk space.

Even when I run the storage management task. It doesn’t clean the PK table’s. It always contains the primary key for every odx batch load.

The customer is using version 20.10.37.64

icon

Best answer by rory.smith 13 February 2023, 09:51

View original

6 replies

Userlevel 3
Badge +5

Hi @bas.hopstaken 

It adds all PK values to the PK table every time you do a execution.

It is used to compare what fields are deleted or updated. It does not matter whether you use SQL or Data Lake to store the data in.

To decrease this size run a full load and use the storage management task to only allow 1 version.

Also this rollup feature may be able to reduce the size.

 

It does not apply to the SQL Storage type though.

Userlevel 1
Badge +1

Hi Thomas,

Thanks for the quick reply, I'm only wondering. Is it intended to work like this?

I understand the purpose of the PK table. But if you save the last 2 loads/batches in this PK table should be enough right?

Is it really needed to save all primary keys for every load/batch? 

Userlevel 3
Badge +2

Hi Bas,

I expect having older PK states would be useful for replaying history. This way the ODX replaces the “type 2 every field in every table” approach. Handle deletes and updates are quite expensive in performance, so they are not useful for every table generally.

Userlevel 1
Badge +1

Hi @rory.smith,

Replaying history can be a reason to save the PK's for all batches. I know that this option is quite expensive in performance. But I didn't know all PK's for all batches are saved.

Nevertheless, it would be great if we have the option to choose how many batches need to be saved in the PK tables. Maybe the storage management task can be used for this. This way we can save a lot of disk space for the PK tables.

 

 

Userlevel 3
Badge +2

Hi @bas.hopstaken ,

 

I agree: Storage Management tasks should have an option to clean the _PK tables (as this could be quite expensive). I would intuitively expect those tasks to also clean that up I think.

Userlevel 4
Badge +5

Please post and upvote this in the idea section. Many thanks! 

Reply