Question

Columns with source data type money behaving incorrectly


Hi All,

 

Hopefully you can help me with this one. We have a source table from our POS systems containing sales transactions. A few of the value columns we transfer are stored as the money data type, and we are seeing some odd behaviour in their values: differences of 0.01 down to 0.0001 appear out of nowhere in the transfer from ODX to DWH, and I don't know how this is happening.
The columns with the money data type are called 'Bedrag', 'Korting' and 'NettoBedrag'.
 

Take transaction 52. In the source the values are like this:
 

After the transfer with ODX, I looked in the Parquet file using ParquetViewer and it looks like this:
 

Still the same, nothing wrong here.

I imported the table without any modifications into TimeXtender and it looks like this:
 

After a transfer the data looks like this:
 

If you look closely, we lost 0.0001 on 'NettoBedrag' of the first record. We also lost 0.0001 on 'Bedrag' of the second record.

I've tried a few things to mitigate this. Other numeric data types, such as float, do not solve it. However, when the values are stored as varchar it does work:
 

 

Another workaround I found was creating a QueryTable with all data type conversions done on the source (to DECIMAL(18,4)), and that also worked out:
 


We transfer 8 sources of this POS system every night, so before I make the big change to Query Tables I wanted to post this here to see if anyone has an idea how this can happen and how to resolve it.

We are using TX 6645.1. ODX in ADLS, DWH in Azure SQL DB.

14 replies

Userlevel 6
Badge +7

Hi,

 

Money types are somewhat problematic in certain cases. Microsoft's own documentation states: "You can experience rounding errors through truncation, when storing monetary values as money and smallmoney. Avoid using this data type if your money or currency values are used in calculations. Instead, use the decimal data type with at least four decimal places." See: https://learn.microsoft.com/en-us/sql/t-sql/data-types/money-and-smallmoney-transact-sql?view=sql-server-ver16. More details can be found here: https://stackoverflow.com/questions/224462/storing-money-in-a-decimal-column-what-precision-and-scale .

I assume the problem stems from the fact that Parquet has no Money type, so TimeXtender will convert it to either a String or some Decimal type (hopefully with the correct precision and scale). Subsequently, when loading from ODX into the DWH there may be another conversion, to a Decimal(38,4), which is incorrect. Given that TimeXtender nowadays usually applies the correct precision (other than 38) when it is defined in the source, I expect the Parquet format to be the problem here; otherwise it would be Decimal(19,4). This can introduce differences in rounding, because the Money type uses banker's rounding while Decimal applies truncation (note that your deviating results are always lower than expected). This is why you usually store one more decimal than you require; how many you require depends on the compliance environment you are working in. Any calculation with Money (and with Decimals) requires enough space to capture the result without truncation; see https://learn.microsoft.com/en-us/sql/t-sql/data-types/precision-scale-and-length-transact-sql?view=sql-server-ver16 for precision and scale under mathematical operations.
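The truncation bias can be sketched with Python's decimal module (an illustration of the mechanism only, not of TimeXtender's actual code path): ROUND_HALF_EVEN mirrors the money type's banker's rounding, ROUND_DOWN mirrors truncation, and the truncated total never exceeds the rounded one. The sample values are made up.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_DOWN

q = Decimal("0.0001")  # scale 4, as used by money / decimal(19,4)

# Hypothetical source values with one extra decimal of precision:
values = [Decimal("147.81747"), Decimal("12.34563"), Decimal("0.99998")]

# Banker's rounding (what the money type uses) rounds to the nearest value;
# truncation simply drops the extra digits, so its result is never higher.
rounded = sum(v.quantize(q, rounding=ROUND_HALF_EVEN) for v in values)
truncated = sum(v.quantize(q, rounding=ROUND_DOWN) for v in values)

print(rounded)    # 161.1631
print(truncated)  # 161.1629 -- the deviation is always downwards
```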

 

I would check what data type is allocated to Money in the Parquet file; if it is something other than Decimal(19,4), that will be a cause for deviation.

Hi @rory.smith ,

Thanks for your reply.
The only way for me to inspect the Parquet file is the ParquetViewer program. It has a built-in metadata viewer, which shows the following:

{
  "Path": "Bedrag",
  "Type": "FLOAT",
  "TypeLength": null,
  "LogicalType": null,
  "RepetitionType": "OPTIONAL",
  "ConvertedType": "",
  "Children": []
},
{
  "Path": "Korting",
  "Type": "FLOAT",
  "TypeLength": null,
  "LogicalType": null,
  "RepetitionType": "OPTIONAL",
  "ConvertedType": "",
  "Children": []
},
{
  "Path": "NettoBedrag",
  "Type": "FLOAT",
  "TypeLength": null,
  "LogicalType": null,
  "RepetitionType": "OPTIONAL",
  "ConvertedType": "",
  "Children": []
}

It clearly isn't a decimal, but I'm not sure what I'm looking at. Is there another way to determine the data types in Parquet files?

What do you think is the right way to move forward? Would overriding data types provide a solution, or is a Query Table the only way to solve this?

Userlevel 6
Badge +7

Hi,

so the data is apparently written to Parquet as Float. That is incorrect (assuming the source is SQL Server and the column has a Money type) and will lead to interpretation issues. Floats and Decimals are both approximations of real numbers, each with different drawbacks and advantages. In this case, to preserve the exact value the chain should be: Money (source) → LogicalType: Numeric(19,4) (ODX) → Numeric(19,4) (DWH)
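The size of the deviation you saw matches what a 32-bit float can hold. A minimal sketch in plain Python (the actual ODX conversion path may differ in detail) shows a scale-4 money value losing exactly 0.0001 on a float32 round trip:

```python
import math
import struct

exact = 147.8174  # a money value with four decimals

# Round-trip through a 32-bit float, as happens when the value is
# written to Parquet with physical type FLOAT:
as_f32 = struct.unpack("f", struct.pack("f", exact))[0]
print(as_f32)  # 147.81739807128906 -- already slightly below the original

# Truncating back to scale 4 (as a decimal(x,4) load would) drops 0.0001:
truncated = math.floor(as_f32 * 10000) / 10000
print(truncated)  # 147.8173
```

Not every value loses precision this way: some decimals happen to land on or just above a representable float, which would explain why only some rows deviate.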

See: https://github.com/apache/parquet-format/blob/master/LogicalTypes.md

For now I would use QueryTables to work around this.

 

@Christian Hauggaard : I think this requires some investigation

Hi Rory,

Thanks for your advice. We'll go forward with Query Tables and converting the money data types to DECIMAL(19,4). 

I'm also curious whether this is considered a flaw in the ODX and whether it can be mitigated.

Kind regards,

 

Jogchum

Userlevel 6
Badge +5

Hi @JogchumSiR 

I am unable to reproduce the issue you describe in 6698.1 

I have added a table with the following 2 columns on my local SQL server, and inserted the following values

Then I created a TX SQL data source and mapped it to my ODX with data lake storage.

Then I executed the sync and the transfer task, and brought this table into my Azure SQL db

Do you notice anything different between our setups? Which data source are you using?

Userlevel 6
Badge +7

Hi @Christian Hauggaard ,

for other types of data source it may be different, but for SQL Server the money and smallmoney types should not be converted to decimal(38,4). It looks like the Parquet file is not getting the complete metadata, or the ODX → DWH conversion is not taking the metadata into account. While that should not influence the value getting loaded into the DWH, it will trip people up once they start multiplying or dividing those fields, and data type conversions can be performance killers.

If the source system has some kind of money type with more accuracy than SQL's 4 decimals, you can get rounding issues in the conversion: with a (19,6) source, 147.817430 becomes 147.8174 either way, but 147.817460 truncates to 147.8174 where banker's rounding would give 147.8175. I would expect any system dealing with multiple currencies to store with (x,6), as this is a standard used internationally.
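The two example values above can be checked with Python's decimal module (a sketch of the arithmetic only, assuming a scale-6 source value loaded into scale 4):

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_EVEN

q4 = Decimal("0.0001")  # target scale of money / decimal(19,4)

# 147.817430: truncation and banker's rounding agree
print(Decimal("147.817430").quantize(q4, rounding=ROUND_DOWN))       # 147.8174
print(Decimal("147.817430").quantize(q4, rounding=ROUND_HALF_EVEN))  # 147.8174

# 147.817460: truncation loses 0.0001 compared to banker's rounding
print(Decimal("147.817460").quantize(q4, rounding=ROUND_DOWN))       # 147.8174
print(Decimal("147.817460").quantize(q4, rounding=ROUND_HALF_EVEN))  # 147.8175
```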

 

Hi Both,

 

The only thing I can think of is a specific setting on the column, or the SQL Server version of the source. Our source runs SQL Server 2016 (on-prem). Our data set is also about 45 million rows, so something may be going wrong because of the size of the data set (just guessing).

 

Kind regards,

 

Jogchum

Userlevel 6
Badge +5

Hi @JogchumSiR 

I tested with SQL Server 2016 as the source

And I still get the following values in my Azure SQL db DW.

I am using the following (default) properties for money and smallmoney source fields; are you using the same or different settings?

 

Hi @Christian Hauggaard,

Those precisions match those of my columns. Also, I want to note that not every row was wrong; some went through just fine. I'm sorry, but I can't tell you more about how to recreate the issue. We did not change the precision or any other field setting ourselves.

Userlevel 6
Badge +7

Hi @JogchumSiR ,

 

are any of the money/smallmoney fields in your source calculated columns?

Hi @rory.smith ,

The columns hold plain values, populated only by the software. There is no calculation definition set on the column itself.

Userlevel 6
Badge +5

Hi @JogchumSiR, can you please try to set up a simple test as I have done, with one record: create the same table as you currently have (i.e. run the create-table script and give the table a new name) and insert a single record with the value 147.8174?

Do you get the same rounding error with this test?

Hi @Christian Hauggaard ,

I've retested this.
I created a separate table in SQL for this testing purpose:
 

Loaded into ODX and transferred from ODX to a testing Data Area in our dev environment:
 

The data is now different, as shown in the preview:
 

 

Hope this helps.

Userlevel 6
Badge +5

@JogchumSiR I have created a support ticket for this, so we can schedule a meeting to investigate further
