Raw Only fields in history tables

Allow raw only fields in these tables.
For instance we have incremental loads but the date field is only needed in the raw step for TX to know which records to query.
Now we are forced to store it in the valid step as well.

Another example if you have a snowflake you do not want to store the business key but only the SK_ of the snowflaked table. You only need the business key in the RAW and Transform steps.

To decided if you have to update, deleted or insert you only need to have the SK_ in the hash calculation. The first step anyway in the cleansing procedure is to update the raw table with all lookup values and go from there using the transformation view to insert into the valid step. 

Performance wise it is not such a huge impact on one table but if you have a massive project with 100ths of dimension tables it will allow you to reduce some IO when writing to the valid step. Also you keep your valid tables small in terms of page size on sql server because you do not have the overhead of unneeded fields, this gives you better query performance for later queries/actions as well.


Please sign in to leave a comment.