Each day multiple transactions are added, each with a different transaction date. Note that in the example I have only shown the date part (not hours, minutes, seconds).
What I need is a table in the DSA with the latest update per key.
I created the same table, set the primary key to KEY, and added an incremental selection rule on the transaction date. I added history settings to update the value and transaction date based on the key.
Where it goes wrong is when we get multiple transactions for the same key in one day:
100, V100A, 2023-01-31 09
100, V100B, 2023-01-31 10
I traced the cleansing procedure and TX detects that the same key appears twice. So far so good. Next it puts the ID of the latest transaction into <table>_L and only processes the IDs that are not in <table>_L.
The result is that I get 100, V100A, 2023-01-31 09, which is incorrect!
Looks like every run can only process one update per key!
What is the solution ?
BTW, I am using old-school TimeXtender without the hub.
Hi Alain
You are not using the ODX, but a Business Unit, right?
If so I would set it to not use incremental load in the BU and then set it on the DWH level instead. You can then make each row unique by using a supernatural key or similar.
To avoid moving the same data again and again, I would make a dynamic variable for the date five days ago and then use that as a filter on the table.
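As an illustration only (the exact syntax depends on how the dynamic variable is set up in the project), a rolling five-day filter on the sample SRC1 table could look roughly like this:

    SELECT [KEY1], [DSC1], [TIMESTAMP]
    FROM [dbo].[SRC1]
    -- only move rows whose transaction date falls within the last five days;
    -- the cutoff would normally come from the dynamic variable mentioned above
    WHERE [TIMESTAMP] >= DATEADD(DAY, -5, CAST(GETDATE() AS date));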
If you are using an ODX I have another idea.
Hi Thomas
Yes, I am using a business unit. Transaction tables are huge, so I cannot reload the business unit table every time.
The problem is in fact that I cannot load two records with the same key at once into a table with history settings. It will only take the first and put the rest in error.
I have attached a project that simulates it. You'll see that transactions arrive from two sources, but if you delete SRC2 and run with SRC1 only, it will show what I mean.
The raw table will contain all the records. Then it goes wrong. I would expect that it would at least store the latest transaction, so that you end up with:
KEY1  DSC1   TIMESTAMP
100   S100C  2023-02-03 00:00:00.000
Note: it goes even more wrong when you have two sources; it considers the 100 record from the second source as the most recent one! I guess we can avoid this by using a view to populate the AGR table.
CREATE TABLE [dbo].[SRC1](
    [KEY1] [varchar](50) NOT NULL,
    [DSC1] [varchar](50) NULL,
    [TIMESTAMP] [datetime] NULL
) ON [PRIMARY]
Hi Alain
With the table you have shown, it can't contain the same key twice, even on different days; the duplicate will either be sent to error or update the existing row.
To avoid that, you need to give each row a unique primary key. One way to do this is to merge Key1 and Desc1 into another field and use that as the primary key.
Like I did here with a supernatural key.
Then each row will be unique and you can apply the key as a primary key.
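The supernatural key itself is a TimeXtender feature, but just to sketch the underlying idea in plain T-SQL (field names taken from the sample SRC1 table, the separator is arbitrary):

    SELECT [KEY1], [DSC1], [TIMESTAMP],
           -- concatenated business key: unique as long as KEY1 + DSC1 is unique per row
           CONCAT([KEY1], '|', [DSC1]) AS [CompositeKey]
    FROM [dbo].[SRC1];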
Thomas, I just want the record with the latest date. I don't want all the records. That's just my issue.
I cannot get just the latest record into a table.
Alain
Hi Alain!
Thanks for posting your question here in the community.
To solve this I would create a composite key based on multiple columns. Make sure that there are no NULL values in the selected columns. So in your example [KEY] & [timestamp], and place a selection rule on [timestamp] (in the screenshot below, StartDate) not empty.
Of course you can map multiple sources to this table and then use your DW_SourceCode system field in your composite key as well.
See screenshot below:
If the table is loaded incrementally, make sure that you deploy & execute and perform a FULL LOAD by selecting review tasks and selecting all deployment steps. See screenshot below:
Now you have all the data in one table and you can create a simple view to select the most recent record. See screenshot below:
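In SQL terms, a minimal sketch of such a view, assuming the merged table is called [dbo].[AGR] and has the columns from the example, could be:

    CREATE VIEW [dbo].[AGR_Latest] AS
    SELECT [KEY1], [DSC1], [TIMESTAMP]
    FROM (
        SELECT [KEY1], [DSC1], [TIMESTAMP],
               -- number the rows per key, newest transaction first
               ROW_NUMBER() OVER (PARTITION BY [KEY1] ORDER BY [TIMESTAMP] DESC) AS rn
        FROM [dbo].[AGR]
    ) t
    WHERE rn = 1;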
You can of course also do this directly on the source with a query table, but from your example I got the impression that you first want to merge all your sources.
Please let us know if this is what you're looking for.
Dissecting the cleansing process of TimeXtender
I have been trying things out and here is what I found, or at least what I derived from my findings. The problem is also known as 'time travel' (e.g. in Snowflake).
Problem statement
The table is simple: KEY, VALUE, Timestamp, where the timestamp is the timestamp of the transaction, not the loading time.
You get the first value of key 100 and the rest is ignored. Technically it may make sense, but for humans it is a debatable choice since we have a more recent record (see 4, 5, 6 in the source). The raw data does include all the records, which is just perfect.
Setting the history settings to 'create new record' doesn't change a thing.
This is due to the cleansing logic that is used, see below:
INSERT INTO [dbo].[AGRDEF_L] ([DW_Id], [ErrorNumber])
SELECT T.[DW_Id], 2
FROM (
    SELECT [DW_Id],
           ROW_NUMBER() OVER (PARTITION BY [KEY1] ORDER BY [DW_Id] ASC) AS [Row Number]
    FROM [dbo].[AGRDEF_T]
) T
WHERE T.[Row Number] > 1
Since you partition by key and order ascending, the record that entered the DWH first, hence has the lowest DW_Id, gets row number 1 and is kept; the other records are inserted into the error table and will not be loaded.
Workaround 1
The 'human' solution is to take the last record that was inserted, hence we have to change the ASC into DESC:
INSERT INTO [dbo].[AGR_L] ([DW_Id], [ErrorNumber])
SELECT T.[DW_Id], 2
FROM (
    SELECT [DW_Id],
           ROW_NUMBER() OVER (PARTITION BY [KEY1] ORDER BY [DW_Id] DESC) AS [Row Number]
    FROM [dbo].[AGR_T]
) T
WHERE T.[Row Number] > 1
As a result you'll pick up the record with the highest DW_Id if there are multiple records with the same key.
An easy solution, just change two letters! But I don't like doing this.
Workaround 2
Use a data selection rule and a view.
It basically does the same thing: it forces the selection of the last record.
This is probably the 'TimeXtender' way of doing things. Unfortunately it's quite a lot of fuss for changing two letters in code that is already there. One could argue that the default behavior should be DESC.
Workaround 3
Create a unique index on KEY and Timestamp. You'll basically keep all the transactions on record. This would correspond to what 'time travel' does.
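A sketch of that index, assuming the table is called [dbo].[AGR] as in the sample project:

    CREATE UNIQUE INDEX [UX_AGR_Key1_Timestamp]
        ON [dbo].[AGR] ([KEY1], [TIMESTAMP]);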
Note that in standard SQL you'll always have to take the transaction date into account in queries to get the right value at that point in time.
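For example, a point-in-time ('as of') lookup against such a table has to filter on the transaction date explicitly; a sketch, again assuming the [dbo].[AGR] table and an arbitrary reference date:

    -- value of key 100 as it was valid on 2023-02-01 (illustrative date)
    SELECT TOP (1) [KEY1], [DSC1], [TIMESTAMP]
    FROM [dbo].[AGR]
    WHERE [KEY1] = '100'
      AND [TIMESTAMP] <= '2023-02-01'
    ORDER BY [TIMESTAMP] DESC;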
This is an 'old' illness of relational databases, known as handling 'temporal' data.
Final word
‘It’s a strange world we are living in’
If you have multiple updates of the same key record in a single load, TimeXtender will only process one update, per default the one with the lowest DW_Id.
If that's not what you want, you'll have to use one of the workarounds.
Hi Alain,
Answering on your final word point:
"If you have multiple updates of the same key record in a single load, TimeXtender will only process one update, per default the one with the lowest DW_Id."
This depends on the settings in TimeXtender. The default primary key behavior is "Error", and that is why it is not processing the duplicates. These are the options:
Great, Alex. I didn't catch that one. Knew I needed your brains. Thanks!
Hi @alain.thire can you please help us by marking the best answer if you are satisfied with the above answer? If you have any follow up questions please let us know. Thanks in advance!
Hi,
A long time ago (more than six years) I had a ticket on this topic, inserting into history tables where you get more than one record per natural key per execution; the answer then was that the sort order would not be made settable. The option Alex pointed out is a good one as long as your error tables don't bloat, i.e. with fast transactions where attribute fields ping-pong between values many times.
You could also consider query tables with a sort order to make TX pick the correct record to keep, or query tables where you already aggregate the records away in the source. Alternatively, apply Simple mode (no primary key checking), split the load into key and value fields, and use an aggregate table on the keys to determine the Key+timestamp combination you want to pull value fields from. Or load all records and create a view with the record selection logic on top (probably good to define a covering index in that case).
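As a sketch of that last option, a covering index supporting a 'latest record per key' view could look like this (table and column names assumed from the sample project):

    CREATE NONCLUSTERED INDEX [IX_AGR_Key1_Timestamp]
        ON [dbo].[AGR] ([KEY1], [TIMESTAMP] DESC)
        INCLUDE ([DSC1]);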
Having the option to sort data movements would be even better of course :-)
Hi,
The problem becomes even more complex when you have different sources with transactions. You'll have to combine both sets into one (view or table) and apply the latest-transaction-date logic on that, as sketched below.
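A rough sketch of that combination step, assuming two staging tables [dbo].[SRC1] and [dbo].[SRC2] with identical columns, keeping only the newest transaction per key across both:

    SELECT [KEY1], [DSC1], [TIMESTAMP]
    FROM (
        SELECT [KEY1], [DSC1], [TIMESTAMP],
               ROW_NUMBER() OVER (PARTITION BY [KEY1] ORDER BY [TIMESTAMP] DESC) AS rn
        FROM (
            SELECT [KEY1], [DSC1], [TIMESTAMP] FROM [dbo].[SRC1]
            UNION ALL
            SELECT [KEY1], [DSC1], [TIMESTAMP] FROM [dbo].[SRC2]
        ) s
    ) t
    WHERE t.rn = 1;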
In fact, the raw area already gathers all the records correctly, taking into account the incremental settings. It would be nice if you could apply the logic on that; it would be the most generic solution.
I tried changing the key behavior as Alex suggested, but this doesn't solve the issue of being able to pick the most recent record into the valid table.
The most frustrating part to me is that the ORDER BY clause is already in the cleansing script; changing it solves the issue.
In the absence of that option, I agree we'll have to stick with good old SQL logic.