12

Dedicated settings for snapshot tables

Snapshot fact tables are common in many data warehouse projects. Often the data that they store can not be re-loaded as the dw does the snapshotting and is it's only storage.

I dont think there is a bulletproof way to do this currently in TX in terms of performance and data safety. There should be dedicated settings for snapshot tables.
I've used incremental loading but that's not a safe way and I always take a backup of any snapshot tables before a deployment just to be safe.

I've also tried using the history setting but I found the performance to be really bad for big tables as the unnecessary comparison logic is such an overkill for what needs to done.

My suggestions for how this setting should work:

1. It should make sure that the valid table is never ever truncated or dropped (unless with a flashy warning dialog asking you to confirm)
2. You would set the snapshot value field.
3. It would do a delete for the incoming snapshot value before inserting the dataset.

Extra
----
4. It's features could be expanded to have configuration for defining the snapshotting value based on the exection time or a parameter, having it do a daily/weekly/monthly snapshot.
5. It could do a daily snapshot but then have a setting to only store the latest along with week-end/month-end snapshots.

 

 

17 comments

Please sign in to leave a comment.