Skip to main content
Solved

Not trim trailing spaces in key column


Hi, we are loading a table from our source data base. A column which is part of the key, contains values that are similar but not the same. The difference is in a trailing space. Now, when TimeXtender reads this dataset, it claims a violation of uniqueness. The column is set to be VARCHAR(), which probably causes the system to trim the trailing space, resulting in a duplicate value.

As the source system recognizes this as different entries, we want to keep both. How to solve this?
 

 

13 replies

Userlevel 6
Badge +7

Hi,

if the trailing space is trimmed I would consider that an error as a space is a character. What is the source datatype and what connector are you using to extract?

Userlevel 4
Badge +6

Hey,

Trailing spaces are indeed a character and TX should regard them as such. It really looks like the connector has some kind of TRIM build in or maybe the length of the field is not big enough?

maybe this is just me but an difference between an entry which is a space would be something I would have the data owner change in the source. Aldo data validation and grouping can be very difficult if the difference between two values is a space.

Hi, 

Thanks for your swift replies.

@rory.smith 
We connect to the Oracle data source with provider 'Microsoft Oracle Data Provider for .NET'
The data source itself claims the column is VARCHAR2 type.
 

@daniel 
I agree that it would be more robust if the owner would not use these entries, but at the moment we have this in the data source and found out that we only process 1 entry (the latest). Which is basically an inaccurate representation of the data. So even if the owner would change the behavior, I'd like to improve the data processing.

By the way, we found this in more fields than just this one, so it is ‘generic’ behavior in our model.

Userlevel 6
Badge +7

Hi,

if it is a varchar2 there should also be a parameter indicating length semantic type (byte or char) and length. You may be able to configure deeper settings in the provider but for Oracle in the 20.10.x series this can be a bit more difficult. Are there any string-related settings in Edit source (potentially under advanced)?

Hey @rory.smith , are you now talking about settings in TX or in the source? The data length would be 40, while the character length is set to 10 (at least this is what metadata of the source describes).


But interestingly enough we know that TX sees both values, as it we can query the individual records from TX:
 

Querying individual entries

 

Userlevel 3
Badge +1

@rwetelin if your table in tx is excluding the records and this field is set to the primary key you can go to table settings(f4) and in the general tab set Primary key behavior to warning or none. It probably is on Use project settings, this ussualy is handle it as an error. Handeling it as an error means it will delete the second rule it finds and put's it in the error table. You can also expand your primary key if you like. But setting the behavior to warning or none for this table should fix the problem

 

 

Userlevel 6
Badge +7

Hi,

I was referring to both really. It could be that the fact that varchar2 can be specified by byte or character length is not being properly picked up by the source synchronization. I would expect to see an nvarchar(10) as data type in TimeXtender for an Oracle varchar2(40 byte) as you can put other encodings than single byte inside Oracle's varchar2 (which is why byte length is longer than char length).

Disabling primary key validation like @Bernarddb suggests would work, but would require you to handle that downstream. It might help you bide time until a proper fix is available. I would raise a support ticket with TimeXtender.

Userlevel 3
Badge +1

I Agree with Rory, my solution is only a fix in TX which i would not preffer but is a workaround to get around the bug. 

Thanks for your replies. Your suggestion indeed is a workaround but not a true fix and may have unwanted implications. Will see to get some support.

Userlevel 6
Badge +5

@rwetelin this is how SQL behaves for PKs, please see the following documentation for reference: https://learn.microsoft.com/en-us/sql/t-sql/language-elements/string-comparison-assignment?view=sql-server-ver16#remarks 

As shown in the example below the PK insert fails for values where one value has a trailing space  

 

This seems to be a data quality issue where the data owner should change the data in the source. If the data cannot be changed in the source, then a mapping table could be used to assign a more distinct value for the record - Exmon Data Management can be used to create such a mapping table, or Exmon Data Governance can be used to handle data quality issues.

Hi @Christian Hauggaard, thanks for the info! While kindly be aware that the constrain is applicable for SQL Server Database (SQL-92 standard) indeed, however, the data source as @rwetelin (my colleague) mentioned is Oracle Database which dons't give any error in this case. I agree with you logically the tailing space should not be part of the pk as best practice anyhow. But as a matter of fact in Oracle it is allowed and it is used to differentiate two records in our source in some cases. Thus even if the owner would change the behavior, we still want to look for a way to improve the data processing to bridge the gap.

Userlevel 6
Badge +5

@Xiaoqing Hu although it is an oracle data source, the PK is being applied on a SQL DW, which explains the behavior you describe. As mentioned, Exmon Data Management and Exmon Data Governance are great tools to handle these data quality/input issues. Please let me know if you have any further questions

@Christian Hauggaard  Thanks. We understand Exmon is a great tool.  Meanwhile we are checking if the gap can be fulfilled within TimeXtender ODX first as we are using a connector in TXD. We will raise a support ticket with TimeXtender and see.

Reply