Solved

How can I conditionally fill down/flash fill NULL values with previous values based on certain criteria?

  • 11 April 2023
  • 5 replies
  • 519 views

Userlevel 2

I would like to be able to flash fill down NULL values in my DSA table with certain conditions.

In the table below I have multiple NULL values. Take for example the column ‘CardCode DUAL’.
Row 2 with Company key MTW and project 1201121979 shows for CardCode DUAL DB0006. I would like to show value DB0006 also for all other rows where company key = MTW and project = 1201121969.

Same for Route Bron column. I would like to fill down NULL values on the most recent NON Blank value for that Company_Key+Project combination.

I think it should be possible with a self join or self select, but not sure how. 

 

 

icon

Best answer by fwagner 11 April 2023, 14:58

View original

5 replies

Userlevel 2
Badge +3

Hi @KCMT,

thanks for your question!

There’s several ways you can achieve the fill down (or “fill forward” or “last non empty”) - four of them are described here:

https://www.andrewvillazon.com/forward-fill-values-t-sql

The most transparent and concise way to achieve this is the LAST_VALUE/FIRST_VALUE functions of T-SQL: https://learn.microsoft.com/en-us/sql/t-sql/functions/last-value-transact-sql?view=sql-server-ver16

 

SELECT 
FIRST_VALUE( [CardCode DUAL] ) OVER (
PARTITION BY [company key], [project]
ORDER BY some_timestamp_or_id
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS CardCode_DUAL_FILLED_FORWARD,

FIRST_VALUE( [Route Bron] ) OVER (
PARTITION BY [company key], [project]
ORDER BY some_timestamp_or_id
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS Route_Bron_FILLED_FORWARD

FROM [actual_table]

 

Within TimeXtender you would create a custom view that has the filled-forward columns, and then update the actual table from that view.

Please let us know if that info helps you taking the next steps.

Userlevel 5
Badge +7

Hi @KCMT and @fwagner ,

the general approach suggested by Frank will work - just be aware of a pitfall: https://sqlperformance.com/2019/08/sql-performance/t-sql-bugs-pitfalls-and-best-practices-window-functions (see under Implicit frame with FIRST_VALUE and LAST_VALUE) to get the correct LAST_VALUE.

You could also load the PK fields + fields to grab the “best” value from as a separate table and use an Aggregate table to find the value you want. Then use the Aggregate table as a lookup source.

Which approach you choose depends on the performance vs. clarity balance you wish to achieve.

 

Userlevel 2
Badge +3

thanks @rory.smith you’re absolutely correct about the balance between clarity and performance :-)

I adjusted the above SQL statement to be more aligned with best practices for anyone who’s looking for a reference

Userlevel 5
Badge +7

Just a little note: if your data has the filled value you want to propagate in a random records in the subset, you may want to use FIRST_VALUE with IGNORE NULLS. This is new for SQL Server 2022 and Azure SQL DB only.

So in a contrived AW2014 SalesOrderDetail example:

 

Userlevel 6
Badge +5

Hi @KCMT did the answers above resolve the issue? If so can you please help us by marking the best answer above? Please let us know if you have any follow up questions 

Reply