Follow

Connect to CSV with CData

Before you attempt this you need to add the CSV provider to your list of available sources.

In the business unit we already have two data source providers that can connect to CSV/TXT files and for the most part I would suggest using those. There is however a few times where this can be used. Especially because this can connect to a folder with multiple files with different columns and fields in each of them.

Update

It is now possible to aggregate multiple files with different names, but the same structure into a single file, with the file name as a field.

Go through this guide before you start.

Add a CData data source

Contents

CSV Information

Here is a link to the file I use in this example

http://mysafeinfo.com/api/data?list=moviesbyearnings2005&format=csv

This one is set up with elements like this.

Year,Rank,Title,Gross $,Month Opened,Theatres
2005,1,Star Wars: Episode III - Revenge of the Sith,380270577.000000,2015-05,3663
2005,2,The Chronicles of Narnia: The Lion the Witch and the Wardrobe,291710957.000000,2015-12,3853
2005,3,Harry Potter and the Goblet of Fire,290013036.000000,2015-11,3858
2005,4,War of the Worlds,234280354.000000,2015-06,3910

CData Setup

Add a CData data source and choose the CSV provider.

The first step in setting up a CData connector should always be to click the ? in the top right corner.

Sp_rgsm_lstegn.PNG

CSV guide from CData

Connecting to one file

You need to save the link in the information area as something csv and save it in a folder your user has rights to.

I saved mine in C:/Flat Files/Movies2005.csv. Then you just add a link to it in the Data Source field.

You can then set it up like this.

1.PNG

You just need to add the location of the file in the URI or Data Source field and then it should work.

Connecting to more than one file

If you have multiple files, you can connect to a folder. Like this

2.PNG

The location is containing multiple CSV files with different names but a similar setup.

Connect to CSV files on a FTP/SFTP server

You also have the option of connecting to a FTP server that contains CSV files. The connection string is as follows.

URI=ftps://localhost:990/folder1; User=user1; Password=password1;

I have a FTP server with a folder that contains some CSV files and here is how I set it up.

3.PNG

So I added my password to the Password field, my user account to the User field and in the URI field I added the location of the server. The default ftp port is 21.

If the files are stored on a SFTP server it works similarly, but you need to change the string slightly.

URI=sftp://127.0.0.1:22/folder1; User=user1; Password=password1;

mceclip0.png

So the URI needs to start with sftp:// and the default port is 22 instead.

The file is not using the default delimiter settings

As long as it is comma separated, this is all you need. If it is not comma separated there are two options.

1st option

There are two fields you can use to change the default settings.

The first field is the FMT field. Seemingly it has three options.

mceclip0.png

That is actually not the case. As explained here Link it can actually be filled with whatever sign you want to use as a delimiter.

Here I overwrote the std options with a 'pipe' | sign.

mceclip1.png

This when synchronized will give the same as if it was a , sign like default.

The other field is this Row Delimiter field.

mceclip2.png

So if your file doesn't use CR LF as the row delimiter, you can add it here.

2nd option

http://mysafeinfo.com/api/data?list=moviesbyearnings2005&format=csv&delimiter=|

It still needs to be saved in that folder in C:/Flat Files/Movies2005.csv. Additionally, you need to create a file called schema.ini. It has to be named exactly that, or it wont work.

You can read about the various parameters by googling csv and schema.ini, but for this we need it setup like this.

[Movies2005.csv]
Format=Delimited(|)

First off you need to have the full file name as the first line in square brackets. Then you need to add each change to the file that is not default, mainly delimiters and such.

Then when you synchronize it looks at this file and finds the correct delimiter. This also works for multiple files, they just need to be added below with the same setup.

Like this for a folder.

[Company.csv]
Format=Delimited(;) [Customer.csv]
Format=Delimited(;) [Date.csv]
Format=Delimited(;) [Item.csv]
Format=Delimited(;) [Sales Invoice Line.csv]
Format=Delimited(;)

Are you connecting to files on a FTP, they also need the ini file added to its folder.

Getting the data

After setting up the provider, all needed is to synchronize/read the data source.

After that you can set up incremental load and make data type overwrites.

If your data doesn't come out as you expecting, go through the CData guide and see if you can turn on a setting that will give you the correct look.

Aggregate multiple files into one

A new feature of CSV is the option to aggregate files in a folder into one.

I have this folder with some files that contain data from a specified month.

Other than that they are exactly the same. This is important for it to work, they have the exact same field names in all of the files. If not there are some options which is present in the issues segment.

I connect to the files and use the following settings.

mceclip0.png

URI: points to the folder where the files are stored. As described before, this could also be a folder on a FTP/SFTP server.

Aggregate Files: Needs to be set to True for the CSV provider to attempt to merge the files.

Specifically for this I also change the default Row Scan Depth to 10000. This is because I had no decimal points in a field for at least 100 rows, but there where one in a row above that number.

When I synchronize the data source it will generate a table called AggregatedFiles.

mceclip2.png

It contains data from all 10 files with the URI field showing the file name.

mceclip3.png

Issues

If you cannot control that the field names always is the same, you can set it to ignore the field names and skip the 1st row in each file. Such as below.

mceclip5.png

Another option is to point to a specific Metadata file in the Metadata Discovery URI field and pull the info from there.

The above settings will give this. Where all field names is called Col1 and so forth.

mceclip6.png

Dates are shown in the wrong timezone

Sometimes your datetime fields will have added or subtracted hours. This is due to it being converted to your timezone. The CData provider expects all dates will come in UTC and add hours according to the settings on your computer. It even knows about summertime.

If you do not want this to happen, you need to add this to the other field

mceclip0.png

ConvertDateTimeToGMT

Was this article helpful?
0 out of 0 found this helpful

2 Comments

  • 1
    Avatar
    Hansen, Mikkel Hecht DK - NOH

    Hi can you create a connection to a file folder and only include some but not all of the files in the folder, using some kind of wildcard, Regular expression or other?  like File_Actuals_*.CSV or like?

  • 1
    Avatar
    Nils Östborg

    Good question, Mikkel! And also, can you get the file name as part of the data? Other things like file size and OS timestamp? Thanks

Please sign in to leave a comment.