Solved

Loading transactions


I have a transaction table in the ODX with the following fields (simplified):

  • Key, Value, transactiondate

Example

100, V100, 2023-01-01
100, V100A,2023-01-31 09
100, V100B,2023-01-31 10

Each day multiple transactions are added, of course each with another transaction date.
Note: in the example I have only shown the date part (not hours, minutes, seconds).

What I need is a table in the DSA with the latest update.

I created the same table, set the primary key to KEY, and added an incremental selection rule on transactiondate. I added history settings to update the Value and transaction date based on the key.

Where it goes wrong is when we get multiple transactions for the same key in one day:

100, V100A,2023-01-31 09
100, V100B,2023-01-31 10

I traced down the cleansing procedure, and TX detects that the same key occurs twice. So far so good.
Next it puts the ID of the latest transaction into <table>_L and only processes the IDs that are not in <table>_L.

The result is that I get 100, V100A, 2023-01-31 09, which is incorrect!

It looks like every run can only process one update per key!

 

What is the solution ?

 

 

BTW, I am using old-school TimeXtender without the hub.

 

 


Best answer by rory.smith 2 February 2023, 11:24


11 replies

Userlevel 6
Badge +5

Hi Alain

You are not using the ODX, but a Business Unit, right?

If so, I would set it to not use incremental load in the BU and then set incremental load on the DWH level instead.
You can then make each row unique by using a supernatural key or similar.

To avoid moving the same data again and again, I would make a dynamic variable that returns the date five days ago and use that as a filter on the table.
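A minimal T-SQL sketch of what that variable and filter could look like (the table name [dbo].[Transactions] and the five-day window are just illustrations):

-- hypothetical dynamic variable: the date five days before today
DECLARE @CutoffDate date = DATEADD(DAY, -5, CAST(GETDATE() AS date));

-- used as a data selection rule / WHERE clause on the transaction table
SELECT [Key], [Value], [transactiondate]
FROM [dbo].[Transactions]
WHERE [transactiondate] >= @CutoffDate;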

If you are using an ODX I have another idea.

Hi Thomas

 

Yes, I am using a business unit.
Transaction tables are huge, so I cannot reload the business unit table every time.

The problem is in fact that I cannot load two records with the same key at once into a table with history settings. It will only take the first and put the rest in error.

I have attached a  project that simulates it.
You’ll see that transactions arrive from two sources, but you can delete SRC2 and run with only SRC1; it will show what I mean.

The raw table will contain all the records. Then it goes wrong. I would expect that it would at least store the latest transaction, so that you end up with:
KEY1    DSC1    TIMESTAMP
100    S100C    2023-02-03 00:00:00.000

Note: it goes even more wrong when you have two sources; it considers the 100 record from the second source as the most recent one! I guess we can avoid this by using a view to populate the AGR table.

 

 

CREATE TABLE [dbo].[SRC1](
    [KEY1] [varchar](50) NOT NULL,
    [DSC1] [varchar](50) NULL,
    [TIMESTAMP] [datetime] NULL
) ON [PRIMARY]
 

 

 

Userlevel 6
Badge +5

Hi Alain

With the table you have shown, it can’t contain the same key twice even if the rows are on different days; the duplicate will either be sent to error or update the existing row.

To avoid that you need to give each row a unique primary key. One way to do this is to merge Key1 and Desc1 into another field and use that as the primary key.

Like I did here with a supernatural key.

 

Then each row will be unique and you can apply the key as a primary key.
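Roughly, the merged field Thomas describes could look like this in plain T-SQL (SRC1 is taken from the example in this thread; the separator and the field name SupernaturalKey are just illustrations):

SELECT
    [KEY1],
    [DSC1],
    [TIMESTAMP],
    -- concatenated ("supernatural") key: unique per KEY1/DSC1 combination
    CONCAT([KEY1], '|', [DSC1]) AS [SupernaturalKey]
FROM [dbo].[SRC1];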

Thomas, I just want the record with the latest date. I don’t want all the records. That’s just my issue.

I cannot get just the latest record into the table.

Alain

Badge

Hi Alain!

 

Thanks for posting your question here in the community.

 

To solve this I would create a composite key based on multiple columns. Make sure that there are no NULL values in the selected columns. So in your example [KEY] & [timestamp], and place a selection rule on [timestamp] (StartDate in the screenshot below) not empty.

Of course you can map multiple sources to this table and then use your DW_SourceCode system field in your composite key as well.

See screenshot below:

 

If the table is loaded incrementally, make sure that you deploy & execute and perform a FULL LOAD by selecting review tasks and selecting all deployment steps. See screenshot below:

 

Now you have all the data in one table and you can create a simple view to select the most recent record. See screenshot below:
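A minimal T-SQL sketch of such a view, using the SRC1 names from earlier in this thread (the view name is illustrative):

CREATE VIEW [dbo].[SRC1_Latest] AS
SELECT [KEY1], [DSC1], [TIMESTAMP]
FROM (
    SELECT
        [KEY1], [DSC1], [TIMESTAMP],
        -- most recent transaction per key gets row number 1
        ROW_NUMBER() OVER (PARTITION BY [KEY1] ORDER BY [TIMESTAMP] DESC) AS rn
    FROM [dbo].[SRC1]
) x
WHERE x.rn = 1;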

 

You can of course also do this directly on the source with a query table, but from your example I got the impression that you first want to merge all your sources.

 

Please let us know if this is what you’re looking for.

Dissecting the cleansing process of TimeXtender

I have been trying things out, and here is what I found out, or at least what I derived from my findings. The problem is also known as ‘time travel’ (e.g. in Snowflake).

Problem statement

The table is simple: KEY, VALUE, Timestamp, where Timestamp is the timestamp of the transaction and not the loading time.

The test set is as follows:

DW_Id    KEY1    DSC1    TIMESTAMP    DW_Batch    DW_SourceCode    DW_TimeStamp
1    100    S100    2023-01-30 00:00:00.000    98    DEMO    2023-01-31 16:16:05.270
2    101    S101    2023-01-30 00:00:00.000    98    DEMO    2023-01-31 16:16:05.270
3    102    S102    2023-01-30 00:00:00.000    98    DEMO    2023-01-31 16:16:05.270
4    100    S100A    2023-02-01 00:00:00.000    98    DEMO    2023-01-31 16:16:05.270
5    100    S100B    2023-02-02 00:00:00.000    98    DEMO    2023-01-31 16:16:05.270
6    100    S100C    2023-02-03 00:00:00.000    98    DEMO    2023-01-31 16:16:05.270

Key is set to KEY1
History settings are set on ‘update’  DSC1 and Timestamp

The default behavior will result in 

DW_Id    KEY1    DSC1    TIMESTAMP    DW_Batch    DW_SourceCode    DW_TimeStamp    SCD Is Current
1    100    S100    2023-01-30 00:00:00.000    112    DEMO    2023-02-01 12:02:14.860    1
2    101    S101    2023-01-30 00:00:00.000    112    DEMO    2023-02-01 12:02:14.860    1
3    102    S102    2023-01-30 00:00:00.000    112    DEMO    2023-02-01 12:02:14.860    1

You get the first value of key 100; the rest is ignored. Technically it may make sense, but for humans it is a debatable choice, since we have a more recent record (see 4, 5, 6 in the source).
The raw data does include all the records, which is just perfect.

Setting the history settings to ‘create new record’ doesn’t change a thing.

This is due to the cleansing logic that is used; see below:

INSERT INTO [dbo].[AGRDEF_L] ([DW_Id], [ErrorNumber])
SELECT T.[DW_Id], 2
FROM
(
    SELECT
        [DW_Id]
        , ROW_NUMBER() OVER (PARTITION BY [KEY1] ORDER BY [DW_Id] ASC) AS [Row Number]
    FROM [dbo].[AGRDEF_T]
) T
WHERE
    T.[Row Number] > 1

Since it partitions by key and orders ascending, it keeps the record that was entered first in the DWH, i.e. the one with the lowest DW_Id, and inserts it. The other records are considered in error and will not be loaded.

Work around 1

The ‘human’ solution is to take the last record that was inserted, hence we have to change the ASC into DESC.
INSERT INTO [dbo].[AGR_L] ([DW_Id], [ErrorNumber])
SELECT T.[DW_Id], 2
FROM
(
    SELECT
        [DW_Id]
        , ROW_NUMBER() OVER (PARTITION BY [KEY1] ORDER BY [DW_Id] DESC) AS [Row Number]
    FROM [dbo].[AGR_T]
) T
WHERE
    T.[Row Number] > 1

As a result you’ll pick up the record with the highest DW_Id if there are multiple records with the same key.

An easy solution, just change two letters! But I don’t like doing this.

Work around 2

Use a data selection criterion and a view

 

It basically does the same: force the selection of the last record.
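As plain SQL, the selection logic amounts to something like this (whether TimeXtender accepts exactly this form as a data selection rule is an assumption; the AGR_T name follows the cleansing script shown above):

-- keep only the most recent record per key
SELECT T.[KEY1], T.[DSC1], T.[TIMESTAMP]
FROM [dbo].[AGR_T] AS T
WHERE T.[TIMESTAMP] = (
    SELECT MAX(T2.[TIMESTAMP])
    FROM [dbo].[AGR_T] AS T2
    WHERE T2.[KEY1] = T.[KEY1]
);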

This is probably the ‘TimeXtender’ way of doing things. Unfortunately it’s quite a lot of fuss for changing two letters in code that is already there. One could argue that the default behavior should be DESC.

Work around 3

Create a unique index on KEY and Timestamp. You’ll basically keep all the transactions on record. This corresponds to what ‘time travel’ does.
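For illustration, on the SRC1 example table such an index would look roughly like this (the index name is made up):

-- every (key, transaction time) combination becomes its own row
CREATE UNIQUE INDEX [UX_SRC1_KEY1_TIMESTAMP]
    ON [dbo].[SRC1] ([KEY1], [TIMESTAMP]);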

Note that in standard SQL you’ll then always have to take the transaction date into account in queries to get the right value at a given point in time.
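For example, an “as of” lookup for one key would look roughly like this (the @AsOf value is illustrative):

DECLARE @AsOf datetime = '2023-02-01';

-- value of key 100 as it was at @AsOf
SELECT TOP (1) [KEY1], [DSC1], [TIMESTAMP]
FROM [dbo].[SRC1]
WHERE [KEY1] = '100'
  AND [TIMESTAMP] <= @AsOf
ORDER BY [TIMESTAMP] DESC;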

This is an ‘old’ illness of relational databases, known as handling ‘temporal’ data.

Final word 

‘It’s a strange world we are living in’ 

  • If you have multiple updates of the same key record in a single load, TimeXtender will only process one update, by default the one with the lowest DW_Id.
  • If that’s not what you want, you’ll have to use one of the workarounds.
Badge

Hi Alain,

 

Answering on your final word point:

“If you have multiple updates of the same key record in a single load, TimeXtender will only process one update, by default the one with the lowest DW_Id.”

This depends on the settings in TimeXtender. The default primary key behavior is “error”, and that is why it is not processing. These are the options:

 

Great, Alex.
I didn’t catch that one. I knew I needed your brains.
Thanks

 

 

Userlevel 6
Badge +5

Hi @alain.thire can you please help us by marking the best answer if you are satisfied with the above answer? If you have any follow up questions please let us know. Thanks in advance!

 

Userlevel 5
Badge +7

Hi,

A long time ago (more than 6 years) I had a ticket on this topic: inserting into history tables where you get more than one record per natural key per execution. The answer then was that the sort order would not be made settable. The option Alex pointed out is a good one as long as your error tables don’t bloat, i.e. with fast transactions where attribute fields ping-pong between values many times.

You could also consider:

  • Query tables with a sort order to make TX pick the correct record to keep;
  • Query tables where you already aggregate the records away in the source;
  • applying Simple mode (no primary key checking), splitting the load into key and value fields, and using an aggregate table on the keys to determine the Key+timestamp combination you want to pull value fields from;
  • loading all records and creating a view with the record selection logic on top (probably good to define a covering index in that case; see the sketch below).
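For the last option, a possible shape for that covering index, assuming the view partitions by key and orders by timestamp descending (table, column and index names are illustrative):

-- supports PARTITION BY [KEY1] ORDER BY [TIMESTAMP] DESC without extra lookups
CREATE NONCLUSTERED INDEX [IX_AGR_KEY1_TIMESTAMP]
    ON [dbo].[AGR] ([KEY1], [TIMESTAMP] DESC)
    INCLUDE ([DSC1]);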

Having the option to sort data movements would be even better of course :-)

Hi,

The problem becomes even more complex if one has different sources with transactions. You’ll have to combine both sets into one (view or table) and apply the latest-transaction-date logic on that combined set.
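A rough sketch of that combination, using the SRC1/SRC2 tables from the attached project (the view name is illustrative):

CREATE VIEW [dbo].[SRC_Combined_Latest] AS
SELECT [KEY1], [DSC1], [TIMESTAMP]
FROM (
    SELECT
        [KEY1], [DSC1], [TIMESTAMP],
        ROW_NUMBER() OVER (PARTITION BY [KEY1] ORDER BY [TIMESTAMP] DESC) AS rn
    FROM (
        SELECT [KEY1], [DSC1], [TIMESTAMP] FROM [dbo].[SRC1]
        UNION ALL
        SELECT [KEY1], [DSC1], [TIMESTAMP] FROM [dbo].[SRC2]
    ) s
) x
WHERE x.rn = 1;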

In fact, the raw area already gathers all the records correctly, taking the incremental settings into account. It would be nice if you could apply the logic on that; it would be the most generic solution.

I tried changing the key behavior as Alex suggests, but this doesn’t solve the issue of being able to pick the most recent record into the valid table.

The most frustrating thing to me is that the ORDER BY clause is right there in the cleansing script; changing it solves the issue.

In the absence of that option, I agree we’ll have to stick with good old SQL logic.

 

 

 

 
