In fact tables we are using subrogate keys from the dimension tables using supernatural keys.
The relationship between the fact table and the dimension table it’s using the bussines key and getting the subrogate key from the dimension table into fact table. But some times it doesn’t exists, and I’m using -1 to put that value.
What it’s the best way to include the unknow member (-1) in the dimension table?
Page 1 / 1
Hi @rvgfox ,
you can either use Custom Data (slightly less visible) or Table inserts from a master dummy table (slightly more involved).
I agree with Rory.
If you set -1 as the default value for fields in a fact table and wants it to be able to select this value in a dimension table, using custom data in the dimension table explaining what the -1 value is, would be my method as well.
Other methods are possible, as mentioned, but the custom data option is added to help with stuff like this. It will automatically be executed before running the data cleansing procedure.
Managing unknow members in fact and dimension tables
An "unknown member" in a fact table in relation to a dimension table refers to a record in the fact table that does not have a corresponding or matching value in the dimension table.
This can happen for several reasons, such as:
1. Data entry errors: For example, a typo in a product ID could cause it to find no match in the dimension table.
2. Delays in data updating: For example, if the data in the fact table is updated more frequently than the dimension table, new records might appear in the fact table that don't yet have a match in the dimension table. This case must be avoided using the order in loading the tables.
3. Data quality issues: In some cases, unknown members may indicate issues with data quality, such as inconsistency in data entry.
In TX to control this issue in the fact tables we’ll assign to the fields Id_sk_<dimension table> a fix value using a transformation:
-1 if the field it’s numeric
01/01/1900 if it’s a date
For example, in the next image I’m marking two cases but are the same in all surrogate keys:
To control the unknow member in the dimension tables, we’ll use a TX’s feature called “Add custom table insert”:
In the setup we must follow the next rules:
The name of the custom insert must be “Unknow member”.
The destination table must be the valid table (default value).
In the code we can see two parts:
Delete the record with sk_<table name>=-1 or ‘01/01/1900’ if it’s datetime.
Insert the record assigning the value to the sk_<table name> field and the meaningful values to the attributes that will be visible to the users in the model.
On time that the custom table insert has configurated, we can see it in:
@rvgfox have you considered using the “add related records” feature? I have used it to handle early arriving facts by adding distinct records from the fact table to the dim table. Since most dimension attributes are missing in the facts, fixed values with something like “missing in source system” are usually used for transparancy towards end user.
The produced code is something like this:
INSERT INTO DIMENSION_TABLE ( KeyField, Field1, Field2, Field3, ... )
SELECT DISTINCT KeyField, Field1, Field2, 'Missing in source system', ... FROM FACT_TABLE F
WHERE NOT EXISTS ( SELECT 1 FROM DIMENSION_TABLE D WHERE F.KeyField = D.KeyField )
Hi @pontus.berglund the performance of your solution is worse. In my case I am putting -1 in the unknown member in the fact table and it will be the nightly full load to update it if the correct value appears, but we need to the -1 value in the dimension.