Hi @Fench,
Did you run the Synchronize task after setting the data type overwrite? This needs to be done after adding or changing data type overwrites.
Hi,
I assume you are using CData CSV? Try generating an .rsd file and change the column type to be string (you can also define the columnsize if you know the max length to avoid varchar(2000) columns). As an alternative you can set the rowscan to scan the whole file, but that will obviously affect performance.
I believe the default settings will only scan the first x records and base the field types on that. Any parsing failure will then break the load.
Hi @bas.hopstaken,
Thanks for the quick reply, Yes, i did and repeated the procedure just to make sure. No change.
Hi @rory.smith,
Thanks. CData indeed. I’ll pick that up and report back after.
Hi @Fench
You have to change the field in a RSD file if you want to change the data type.
However this issue would also be resolved if you set the Row Scan Field to 0 and synchronized.
Then it would read all 23000 rows before deciding the data types and since the last row contains a character it would become a string.
Hi @Thomas Lind
Our IT dept would like to stick to the CSV file format rather than switching to RSD.
Your alternative solution “set the Row Scan Field to 0” works, the table gets fully populated now and is available in the mdw. It does take longer to process as expected, but at least it’s operational again. Thank you!
Remaining question is, what is the purpose of the ‘Override Data Types’ functionality in the ODX?
I’d expect it to suit exactly the purpose of individually setting the column data types as required in the subject case. Should that not be the way to go?
Hi,
the .rsd file is just metadata/script to tell CData what to do with the data. It is similar to the oldschool schema.ini files. The path data takes is the following:
source → 1) → source connector (CData) → 2) → ODX storage → 3) → DWH instance
You can affect types in the following places
- CData .rsd file: changes the target type of the data pulled from source
- ODX data type override
- Overriding the data type of the target field in the DWH instance
You are getting an error because CData by default does an approximate type scan (similar to Excel) which results in a numeric type which can never store alphanumeric data. Loading the data results in an error. The same happens if the scan results in a string of length 32 but there is a record with 128 characters later in the file. You then need to explicitly implement a LEFT in the .rsd or expand the field type length. The ODX Server's data type override only happens after CData has read and interpreted the data and therefore after the parsing error has occurred.
Hi @Fench
Yeah that it takes longer to synchronize is the downside, but is also rather uncommon to have changes in row 23000, so normally it isn’t needed to set it to 0 when having so many rows in the file.
If you used the Aggregate feature where you merge multiple files, using the Meta Data Discovery URI can be used to add a template file with the correct data types that all other files then needs to follow.
About the override feature.
For all of these CData file based providers the Data Type override do not have an effect as the data types are decided before we apply the conversions. So it will first incorrectly read the field as a int and then you apply the change on that, but the error happens before we get a chance to convert it.
For all those CData providers that isn’t file based you can use it fine as it behaves similar to a SQL data source.
@rory.smith
Thank you for the explanations on the data type determination, clear enough.
@Thomas Lind
“If you used the Aggregate feature” i’ll certainly keep that in mind.
Thanks all!