Solved

Change Data Type in Cleansing

  • September 9, 2022
  • 2 replies
  • 142 views

Hi,

Let's assume I am importing from a CSV file. The data is very dirty and somehow I have strings in my int field "ID", so the field is interpreted as NVARCHAR(MAX). If I change the data type of the field (right click > Edit Field) from string back to integer, I receive a runtime error during execution. What I was expecting to see was corresponding entries in the _M and _L tables. Is this not part of data cleansing after all?

We have this problem all over and could not find a better solution than to build custom views with TRY_CAST etc., where we lose lineage, among other things.
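For illustration, the kind of view we build today looks roughly like this (a simplified sketch; the staging.Customers table and its columns are made-up names):

-- Keep only rows whose ID parses as an integer.
CREATE VIEW dbo.Customers_Clean AS
SELECT TRY_CAST(ID AS INT) AS ID, Name
FROM staging.Customers
WHERE TRY_CAST(ID AS INT) IS NOT NULL;

-- Rows with a non-numeric ID end up here instead of in the _M/_L logs.
CREATE VIEW dbo.Customers_Errors AS
SELECT ID, Name
FROM staging.Customers
WHERE TRY_CAST(ID AS INT) IS NULL;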

What is the best way to clean up such data (keep valid rows, write errors/warnings for invalid rows)? We are talking about 100+ tables, so I'm looking for a highly scalable solution here, not a one-time workaround.

Thanks and BR, Tobias


2 replies

fwagner
Employee
  • 33 replies
  • Answer
  • September 26, 2022

Hi Tobias,

Thank you for your question!

A quick solution I could come up with is to create custom data selection rules that separate valid from invalid records; a sketch of such a rule follows below.

Creating log messages for the invalid rows takes some additional work - so that may or may not be feasible for large numbers of tables.

This will allow you to separate out the dirty records from the clean ones.
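As a sketch (the screenshots from the original answer are not reproduced here, and the table/field names are made up), the conditions behind such rules could be plain SQL predicates:

-- Selection rule for the valid table: keep rows whose ID casts to INT.
TRY_CAST([ID] AS INT) IS NOT NULL

-- Selection rule for the invalid/quarantine table: the complement.
TRY_CAST([ID] AS INT) IS NULL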

I will propose to the product team that TRY_CAST be offered as an option, together with log messages for dirty records, in a more "natural/integrated" way.


fwagner
Employee
  • 33 replies
  • September 27, 2022

Hi Tobias,

My colleague Thomas suggested that you can also use RSD files to clean "dirty" columns.

I think this is a great solution, even for larger numbers of columns/tables.

These are the steps to take:

  • In your CSV data source, set these two parameters: Generate Schema Files and Schema Location.
  • Running the Synchronise task will generate an RSD file.
  • Modify the generated RSD files to clean the data on reading, for the columns you specify:
<api:script xmlns:api="http://apiscript.com/ns?v1" xmlns:xs="http://www.cdata.com/ns/rsbscript/2" xmlns:other="http://apiscript.com/ns?v1">

  <!-- Column definitions as generated by the Synchronise task -->
  <api:info title="dirtyfacts.csv" other:catalog="CData" other:schema="CSV" description="" other:version="20">
    <attr name="cleanint" xs:type="int" isnullable="true" other:internalname="cleanint" />
    <attr name="cleanstring" xs:type="string" isnullable="true" other:internalname="cleanstring" />
    <attr name="dirtyint" xs:type="int" isnullable="true" other:internalname="dirtyint" />
  </api:info>

  <api:set attr="uri" value="[_connection.uri]" />

  <api:script method="GET">
    <api:call op="csvproviderGet">

      <!-- Replace non-numeric dirtyint values with the default -999 while reading -->
      <api:set attr="_out1.dirtyint" value="[dirtyint | isnumeric([dirtyint], -999)]" />

      <api:push />

    </api:call>
  </api:script>

</api:script>

This will give you the cleaned columns, with invalid values already replaced by a default integer value.
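As a hypothetical illustration (the before/after screenshots from the original reply are not reproduced here), a source file such as:

cleanint,cleanstring,dirtyint
1,alpha,10
2,beta,oops
3,gamma,30

would be read with the non-numeric dirtyint value replaced by the default:

cleanint  cleanstring  dirtyint
1         alpha        10
2         beta         -999
3         gamma        30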

I hope this helps.



