Skip to main content

You can use this provider to connect to Excel files. It can connect to multiple files at once and merge them based on specific options such as files, sheets or both. There also is options making it possible to connect to setups that is not straight forward, using table definitions, culture settings and more.

Configuration manual

The following settings can be used with the TimeXtender Excel data source connector.

Connection settings

These settings relate to the connection to where the Excel file is located.

Location

Drop down box where you can choose what type of location your Excel file is located at. 

The following location types are supported:

  • Local file or folder
  • Azure Blob Storage
  • AWS S3 Bucket
  • SharePoint or OneDrive
  • Google Cloud Storage
  • SFTP

Local file or folder

  • When the file is locally stored on some drive.

Azure Blob Storage settings

AWS S3 Bucket settings

Currently the connector only supports connecting with Access Key Id. You can learn more in AWS documentation.

  • AWS region: the related region string (eg. eu-west-2)
  • AWS access key ID: access key id to S3 bucket
  • AWS secret access key: secret access key to the S3 bucket
  • AWS bucket: the bucket name

SharePoint or OneDrive settings

Note: If pointing to a specific Excel file in SharePoint or OneDrive site root folder then enter path as filename.xlsx. If you would like to extract all the files in the SharePoint site then enter path as /. If you would like to extract excel files from a specific folder in the SharePoint site then enter the folder name. If it is a nested folder then enter the following path: foldername1/foldername2. If you would like to point to a specific file in a nested folder in a SharePoint site then enter the following path: foldername1/foldername2/filename.xlsx

  • SharePoint/OneDrive client ID: The Client ID of the app you use for access. The app used needs the following granted Application permissions Files.Read.All and Sites.Read.All. To be sure you have enough rights you may also add Group.Read.All and User.Read.All.
  • SharePoint/OneDrive client secret: The secret code that is set for the app used.
  • SharePoint/OneDrive Tenant ID: The tenant id of the company that hosts the SharePoint site.
  • SharePoint/OneDrive Drive ID: The drive id pointing to the Document Library you connect to. You can find these by using Postman and by looking at the Graph API collection Graph Fork. You can use the app to connect to this and locate the SharePoint folder under Application. In there you have two requests, https://graph.microsoft.com/v1.0/sites and https://graph.microsoft.com/v1.0/sites/{{SiteID}}/drives. Use sites to find the site ID and use the SiteID to find the Drive ID.

Google Cloud Storage settings

The connector supports GCM authentication with service account keys. You can learn more about them here: Create and delete service account keys  |  IAM Documentation  |  Google Cloud

  • Google credential file: Path to the Service Account private key (JSON or P12)
  • Google storage bucket name: the bucket name

SFTP settings

The connector supports SFTP authentication with password or a public key file. It is mandatory to use one of these methods.

  • SFTP host
  • SFTP port
  • SFTP user name
  • SFTP key path

Path

When “Location” is set to “Local file or folder” you can here provide the path to where the Excel file(s) are located. If a path to a folder is provided all files in the folder and if the option “Include sub-folders” is set then all files in the subfolders as well are ingested.

Include sub-folders

As mentioned above this option defines if sub-folders are also traversed to find Excel files to ingest.

Metadata URI

Here you can provide a separate path to where to read metadata from. It has the same behavior as Path and will read single or multiple files and also include/exclude subfolders, but is only used for metadata calls.

Excel settings

These settings relate to the behavior of the Excel reader and how to treat the content of the Excel documents.

Set first non-empty row as column names

Check if your Excel documents have header records. This will then use the first non-empty record to set the names of your resulting columns. If unchecked generic column names will be generated. E.g. “Column_1, Column_2, …”

Table definitions

Here you can provide a comma-separated list of table definitions to ingest from your Excel document(s). A table definition has the following structure:

<Table name>=<Sheet name>!<Starting cell>:<Ending cell>

E.g. “Table1=Sheet1!E6:H9”

The table will then be ingested with data from within the range provided in the definition. The definition is also dependent on the option “Has header record”, if checked the first record is used for column names. Aggregation options will also affect defined tables.

It is also possible to provide wildcards for some of the parts in the definition. A wildcard is defined by using a “*” star symbol. You can provide wildcards for “Sheet name”, this will then ingest data from all sheets within the range defined. This data is not aggregated unless aggregation options are selected. A wildcard can also be provided for the row part of the “Ending cell” part. E.g. “H*”. The reader will then read all rows within the defined columns.

Example of a table definition using wildcards: “Table1=*!E6:H*”

Only use defined tables

When checked only tables defined using table definitions will be ingested. If not checked all tables defined using table definitions will be ingested along with the whole sheet(s) tables from the file(s).

Aggregation

Here you can choose the type of aggregation you want to apply when ingesting Excel document(s). The options are:

  • None (All files & sheets in the path will be loaded as separate tables)
  • Files (Matching sheetnames and/or table definitions will be merged across files)
  • Sheets (All sheets and/or table definitions within files are merged)
  • Files & sheets (A combination of Files and Sheet aggregation)

None – No aggregation is applied and all tables are returned with unique names using the following naming structure:

<filename>_<sheet name> or <filename>_<sheet name>_<table definition name>

Files – Tables within files are named using the following naming structure:

<sheet name> and/or <sheet name>_<table definition name>

The data is then aggregated over files by table name.

Sheets – Data is aggregated over sheets and returned as a single table per file plus any table definition tables. Table definitions are also aggregated.

Files & sheets – Same as above, but in the order that Sheets are aggregated first for each file and then files are aggregated, will return a single table plus a single table for any table definitions.

Included sheets

Here you can specify a list of sheet names to select, by applying the names in a comma-separated bracketed list.

Infer datatypes

Check this to infer data types from your Excel documents. If unchecked, all data types will be strings.

Number of rows to infer data types

If you have checked “Infer datatypes”, you can here provide the number of rows to use for inferring the datatypes. This is because there is a small overhead of doing the inference. Setting the number to 0 or a negative number will result in all rows being used for inference.

Culture

Here you can specify the culture of the read Excel files. For example, if the source is different than yours or does not match InvariantCulture. 

Number decimal separator

Here you can provide the number decimal separator used when inferring numeric data types.

Date formats

Here you can specify a custom date format. You can apply more than one by applying them in a comma-separated format within brackets.

Misc behavior

Test connection

Clicking the test connection button has the following behavior:

  • If the path is empty or not set will return a failed connection
  • If a path is provided and the file can not be opened will return a failed connection
  • If a path is provided and the file or folder does not exist will return a failed connection

If none of the above a connection is returned as a success.

This means that we do not check whether the file or folder contains files that are Excel documents or if they are corrupted or contain any data.

Seems that path cannot be a UNC pathname? A drive letter does work. I’m I right?

If yes something for improvement?


Another question. It looks like that meta data is required. Is it?

In general it would be good to know what is required and what is optional.


@Erik de Vries please submit a product idea for UNC path name

the metadata URI is not required. I am able to connect to a local workbook simply by specifying the path property

 


I am connecting to a SharePoint environment that contains a folder with multiple Excel files in various structures. According to the documentation, I should be able to connect to this folder:

"Note: If pointing to a specific Excel file in SharePoint or OneDrive site root folder then enter path as filename.xlsx. If you would like to extract all the files in the SharePoint site then enter path as /. If you would like to extract excel files from a specific folder in the SharePoint site then enter the folder name. If it is a nested folder then enter the following path: foldername1/foldername2. If you would like to point to a specific file in a nested folder in a SharePoint site then enter the following path: foldername1/foldername2/filename.xlsx

Currently, I can only connect to a specific file within the folder using the path format:
foldername1/foldername2/filename.xlsx.
However, all other options fail, and I receive an error message:

Exception Type: System.IO.InvalidDataException
Message: End of Central Directory record could not be found.
             Stack Trace: at System.IO.Compression.ZipArchive.ReadEndOfCentralDirectory()
                          at System.IO.Compression.ZipArchive.Init(Stream stream, ZipArchiveMode mode, Boolean leaveOpen)
                          at Sylvan.Data.Excel.XlsxWorkbookReader..ctor(Stream iStream, ExcelDataReaderOptions opts)
                          at Sylvan.Data.Excel.ExcelDataReader.Create(Stream stream, ExcelWorkbookType fileType, ExcelDataReaderOptions options)
                          at TimeXtender.ODX.ExcelCustomDataSource.ExcelEngine.GetDataTables()
                          at TimeXtender.ODX.ExcelCustomDataSource.ExcelDataSource.GetTables(DataSourceQueryTableSettings _, Func`3 tableHasFilterMatch, Func`4 columnHasFilterMatch)
                          at DataSourceEngine.Custom.CustomSourceEngine.SynchronizeSource(List`1 existingTableModels, IDataStorageEngine dataStorageEngine)
                          at ExecutionEngine.Action.SynchronizeAction.Execute()
                          at ExecutionEngine.Action.ExecutionAction.<.ctor>b__11_0()
                          at System.Threading.Tasks.Task.Execute()

The documentation indicates that connecting in other ways should be possible. Could you provide more detailed examples or clarify the expected outcomes for these scenarios?


@DylanW. Which version of TDI and the data source are you using?

I am using TimeXtender Excel data source 22.0.0.0 and TDI 6822.1 and as shown below I can see multiple excel files when specifying my nested sharepoint folder (I tried both with “CSV/MyXL” and “CSV/MyXL/” and both worked) and then executing the import metadata task

Do you notice any differences between our setups? Did you execute the import metadata task?

 


@Christian Hauggaard TimeXtender Excel data source 22.0.0.0 and TDI  6766.1.
my settings:
 

Do you mean synchronize metadata with import metadata task?

Only difference is, the Victa folder is somehow  locked in sharepoint:

 


@DylanW. first you must execute the import metadata task to extract the latest metadata (e.g. new column info)

Then you synchronize (in the metadata manager) this will compare the new metadata that was imported, to the metadata currently committed. Once you have reviewed/made changes click OK to commit the changes based on the new metadata.

Therefore please run the import metadata task

Then synchronize metadata

Do you see the files now?

I have tried to reproduce the lock icon on my sharepoint folder by stopping the inheritence of permissions under manage access advanced settings for the folder, which search results online suggest but I still do not see the lock icon. Can you please test with a folder without the lock icon?


@Christian Hauggaard  Should this option be available in 6766.1?

Isn't the synchronize task doing exactly the same?


@DylanW. No please see the release notes

“Redesigned metadata synchronization and table selection: We've completely reimagined how you manage metadata and select tables in the Ingest layer. With these changes, we aim to make it easier to combat schema drifts, e.g. when you change data source providers, and put you in firm control of what goes into your Ingest storage. 'Synchronize Tasks' are now known as 'Metadata Import Tasks' and will no longer do a full synchronization of the data source. Rather, it will import the metadata from the data source and store it in the data storage of the Ingest. The Data Source Explorer has become the Metadata Manager and is now the place for synchronizing data sources - selecting and mapping tables in the data source to tables in the Ingest storage - all based on the metadata imported by the Metadata Import Tasks.“

Therefore the functionality for data source ingestion is different now. Please let me know if you have any follow up questions


@Christian Hauggaard TimeXtender Excel data source 22.0.0.0 and TDI  6766.1.
my settings:
 

Do you mean synchronize metadata with import metadata task?

Only difference is, the Victa folder is somehow  locked in sharepoint:

 

Hi ​@DylanW. 

Are you adding Documenten as a folder?
It is not a folder it is the Document Library. You can see the document libraries in the top.
 

The Document Library is the Drive ID. The folders is the level below that.

The path would be:

iDas Volvo Dealers\Dealers\Victa


@Thomas Lind  Hi Thomas, 

No I am not adding Documenten as a folder.

My path:

iDas Volvo Dealers\Dealers\Victa(same as yours and Christian's)

If I add the filename to the path it works fine.

Extraction of multiple excel files does not work.

I will update timextender as ​@Christian Hauggaard  suggested and try again! 

Or do you have a different solution?

 


Hi ​@DylanW. 

I see in the image that it is set to 

iDas Volvo Dealers/Dealers/Victa

Try to change the / to \ if it isn’t already changed.


If you still use 6766.1 it may be due to you checking the Include sub-Folders part. I can’t replicate the issue in my 6823.1 release, but I think it may have been a bug earlier if there were no sub level in the folder where you were looking.
I have these files

I made this setup to get these files.

 


Reply