Follow

Connecting to Data Sources

Connecting to Data Sources

Note: This chapter does not apply if you use an ODX server. Instead of connecting to data sources through the business unit, you should connect to data sources through the ODX server.

Data sources contain the data that you want to load into your data warehouse and use for analysis. Discovery Hub supports two different approaches for connecting to source systems: simple data sources and intelligent application adapters.

A data source connector simply connects to the source and enables you to browse the content of the source. The current version of Discovery Hub connects to the following data sources:

In addition to that, Discovery Hub can connect to other data sources using one of the following approaches:

An adapter is a component that enables you to easily extract and synchronize data from different source systems. The adapter knows how a given system organizes and stores data, which enables the adapter to simplify the table structure you see in Discovery Hub.

For instance, data for each company in a Dynamics NAV system is stored in a separate set of tables. The Dynamics NAV adapter merges these tables together and lets you select companies on a global level.

Discovery Hub includes Application Adapters for the following systems:

Data Extraction

Once you have added and configured a data source, the next step is to select the tables and fields you want to copy into the staging database. For more information, see Moving Data from a Data Source to a Staging Database.

Selection templates offer another way of selecting what data to copy into the staging database. For more information, see Selection Templates.

This section covers the settings for data extraction associated with data sources in general.

Integration services

On most data sources, you will find the setting Use Integration Services For Transfer that determines how Discovery Hub transfers data between a source and a destination table. You have the following options:

  • As Parent: The setting will be taken from the project setting 'Use Integration Services'.
  • Yes: A SQL Server Integration Services (SSIS) package is used. This requires that the SQL Server component Integration Services is installed on the machine that deploys and executes the tables.
  • No: ADO .NET is used. This does not require any SQL components except SQL Server Management Objects.

Using SSIS packages for transferring data is generally considered to be faster for transferring large amounts of data. When transferring data between tables with fewer records, using ADO .NET can sometimes be faster, as it takes time to load the SSIS packages from the SQL Server where they are stored before data transfer can begin.

It takes significantly longer to deploy SSIS packages than it does to deploy ADO .NET transfer. This setting can have a great impact on the overall deployment time of a project.

SSIS and ADO .NET use different technologies to transfer data. This means that if you get erroneous data through SSIS, you can sometimes get correct data using ADO .NET.

Simple Mode

Simple mode is a setting on tables on business units aimed at maximizing performance when you need to copy large amounts of data into a staging database to create an exact copy. When a table is in simple mode, everything but the most basic functionality is disabled:

  • Tables in simple mode do not support field transformations, field validations, conditional lookup fields.
  • Tables in simple mode only have the valid instance of the table unless incremental load is enabled.

Per default, a data source inherits the simple model setting from the business unit, but you can override the setting on the data source and on individual tables on the data source.

To enable simple mode for a data source

  • Right click the data source, click Data Source Settings and click Enable under Simple mode.

Guarding a Data Source

You can guard a data source in Discovery Hub, which prevents tables that get their data from the data source from being deployed, executed or both. In general, the guard feature is useful for tables that contain data that never changes, e.g. from a legacy system. You can also guard a single table. See Guarding a Table for more information.

To guard a data source

  • Right click the data source and click Data Source Settings. Select Guard on deployment to prevent Discovery Hub from deploying the table and/or Guard on execution to prevent Discovery Hub from executing the table.

Limiting Concurrent Transfers on Data Source

You can put a limit on the number of concurrent transfers from a specific data source. Some data sources can only handle a certain number of transfers before adding more transfers will actually slow down the overall performance of the transfer rather than speed it up.

To limit the amount of concurrent transfers from a data source

  • Right click the data source, click Data Source Settings and enter the allowed number in the Max concurrent transfers box. "0" equals unlimited.

Allowing a Data Source to Fail

If you have some source systems in your solution that are less than critical for your reporting, you can configure your solution so that the entire execution does not stop just because Discovery Hub cannot reach these noncritical systems. Instead, you can choose to keep the newest data from the system in question until fresh data can be fetched.

Follow the steps below to allow a data source to fail.

  1. Right click the data source and click Data Source Settings.
  2. Under When transfer fails, click the option you want to use if the transfer should fail. You have the following options:
    • Fail and stop the execution: Stops execution and reports the execution as failed (default setting).
    • Continue execution without data: Continues the execution, pretending that the source contains no data.
    • Continue with existing data: Continues execution, retaining the existing data from the source.

Note: If you use additional connectors, you need to configure this setting on the additional data source as well as the template data source.

If a data source, that you have allowed to fail, fails during execution, the following execution message will be recorded: “Execution was successful, but one or more data sources failed".

Filtering What Objects to Extract

For large data sources, the amount of tables and fields can make it hard to maintain an overview when you select what data you want to use in your project. To enable you to limit the number of objects brought into your project before the data selection stage, Discovery Hub contains a data extraction filter.

The data extraction filter is available on most data sources. To apply a data extraction filter, follow the steps below.

  1. Right click the relevant data source, click Edit [data source] and click Data Extraction Settings
  2. Under Objects to extract clear the checkboxes next to the object types you do not want to extract.
  3. Under Object filtering, click Add to add a rule that tables and views must pass to be included in the extraction. For instance, a "like: sales" rule in the schema column means that only tables and views in the "sales" schema will be included.
  4. Click OK to close the window and click OK in the Edit [data source] window to save the filter.
  5. Right click the data source and click Synchronize Data Source to apply the filter.

Disabling the Column Cache

When you select data for extraction from a source, you can filter, sort and group the available tables and columns to find the exact objects you are looking for.

With large data sources, loading the meta data for this can be time and memory consuming. Discovery Hub contains an option to toggle the caching of columns to cut the number of objects that are loaded and cached down to a manageable size. The downside is that you can then only filter on table names, not column names.

To disable the column cache

  • Right click on the source, click Data Source Settings and clear Enable full column caching under Data Extraction.

Note: This option is not available on Dynamics NAV and Dynamics AX adapters and text file data sources.

Data Type Overrides

You can override the data type of fields on the data source level. While you can choose the data type for each individual field in the data warehouse it is easier to do on the data source level if you use the same field more than once in the data warehouse.

Data type overrides are implemented as rules that can be added and ordered to create a hierarchy. For each field, Discovery Hub goes through the rules and applies the first rule that matches if there is any that do.

Under the data source in the tree, a field's original data type is displayed, while the overridden data type is displayed anywhere else.

Adding a Data Type Override to a single Field

To add a data type override to a single field on a data source, follow the steps below.

  1. Right click the field and click Override Data Type. In the Converts to list, click on the data type you want to use and then adjust any settings for the data type you have selected.
  2. Click OK.
  3. Right click on the data source and click Synchronize Data Source to apply the overrides.

Note: This creates a rule that matches the schema, table and field name of the field. If one of these names change, the rule will no longer match.

Adding a Data Type Override Rule

If you want to add a data type override that matches more than a single field, follow the steps below.

  1. Right click a data source, click Data Source Settings and click the Data type override tab.

  2. Click Add. A new rule is added to the list.
  3. Under Convert from group, choose what criteria a field should fulfill to match the rule. You can add criteria on schema, table and field name as well as the data type. In the list of operators for the data type, you can click on "<Unknown>" to match data types that are not compatible with SQL Server.
  4. Under Convert to, click on the data type you want to convert the matching fields to in the Data type list. Adjust any settings for the data type you have selected.
  5. Click OK.
  6. Right click on the data source and click Synchronize Data Source to apply the overrides.

Ordering data type overrides

Data type overrides are processed for each individual table. The first override from the top that matches the field is applied and any other matches are ignored. Follow the steps below to order the overrides.

  1. Right click a data source, click Data Source Settings and click the Data type override tab.
  2. Click on the override you want to reorder in the list and click on Move Up or Move Down to move the override up or down respectively.

Note: The default rule that matches any unknown data types cannot be reordered. However, you can edit the data type it converts to. The purpose of this rule is to ensure that all fields end up with a data type SQL Server can handle.

Query Tables

When you connect to a data source in Discovery Hub, you can simply use the read objects feature to list the content of the source and pick the tables and fields you want to use in your solution.

However, some AnySource providers allows you to connect to a data source, but cannot list the contents of it. In other cases, it is simply useful to be able to create a table, that does not already exist on a data source, from a query. To enable you to use these data sources in Discovery Hub, we have included the Query Tables feature. While the SQL behind ordinary tables is created by Discovery Hub, you write the query that brings query tables to life.

The following data sources and adapters support the query tables feature:

  • AnySource adapter
  • Dynamics AX adapter
  • Dynamics NAV adapter
  • Oracle data source
  • SQL data source

Adding Query Tables

To add a query table, follow the steps below.

  1. Connect to a data source using one of the supported data sources or adapters.
  2. Right click the data source, click Data Source Settings and the click the Query Tabletab.
  3. Click Add. A new table is added to the list.
  4. In the Name box, type a name for the table.
  5. (Optional) In the Schema box, type a schema to use.
  6. In Query, enter the query you want to use for creating the table. The query should contain a SELECT statement and follow the syntax required by the source.
  7. Select Subquery needed if you are using an alias in your query. Otherwise, selection rules will fail.
  8. Repeat step 3-6 to add the tables you need and click OK.
  9. Right click the data source and click Read Objects from Data Source. The tables are listed in the panel in the right-hand side of Discovery Hub and can be included in the project like any other table.

Handling Accounts in Dynamics NAV

When you create query tables for Dynamics NAV, you will have to consider how you handle accounts.

To get data from one account, remember the account in the FROM part of your statement:

SELECT * FROM [dbo].[MyCompany$MyTable]

To get data from multiple accounts, in the same way the Dynamics NAV Adapter does it, you can use placeholders:

SELECT * {0} FROM [dbo].[{1}$MyTable]

Discovery Hub will replace the digits in curly brackets during execution to create the following statement for each account:

SELECT
* ,CAST(‘MyCompany’ AS nvarchar(30)) AS (DW_Account)
FROM
[dbo].[MyCompany$MyTable]

Template Data Sources

If you have more than one data source with an identical data structure, you can connect all of them together and use them as if they were one and the same. This feature is known as template data sources.

Adding an Additional Connection

Once you have added the first of the identical data sources as usual, you can add the additional data sources to the first data source. Since all of the data sources will have a similar data structure, it does not matter which one is added first.

To add an additional connection, follow the steps below.

  1. Right-click a data source and click Add <data source> Data Source.

    An Add <data source> Data Source window appears.
  2. Configure this second data source with the necessary parameters. This additional data source will then appear under the original data source in an Additional Connections folder.

When tables and fields are added to the data source, these changes will seamlessly be applied to all of the data sources that under Additional Connections.

The "DW_SourceCode" field on tables on the data source contains the name of the source from which the row was copied. If you have a setup like the one displayed above with two sources, "SQL" and "SQL2", a table on this source would look something like the list below.

Selection Templates

Once you have selected all the tables you need from a data source, added primary keys and set up incremental load, you can export all this information to a selection template. The template can then be applied to another data source in the same or another project. The data sources do not need to be identical.

Selection templates are stored in XML format.

Exporting a selection template

To export a selection template

  • Right click a data source, click Selection Template and click Export. In the window that appears, chose a file name and folder and click Save.

The tables and fields selected, the primary keys chosen and the incremental load setup is saved in the exported file.

Importing a Selection Template

To import and apply a selection template to a data source, follow the steps below.

  1. Right click the data source, click Selection Template and click Import. The Import Selection Template window opens.
  2. Click Browse to find and select the template to import.
  3. Under Apply the following details from the template, remove the check mark from any parts of the template you do not want to apply. If you remove the check mark from Tables and fields, but keep Primary keys and/or Incremental Load Setup selected, these settings will be applied to the tables and fields already selected from the data source.
  4. Click OK.

Changing Data Source Providers

If you have moved your data to a new database, e.g. from Oracle Database to Microsoft SQL Server, you do not have to add a new data source to reflect this. Instead, you can change the type of your existing data source in Discovery Hub with the change provider feature.

To change the provider for at data source

  • Right click the data source, click on Change Provider and click on the database you want to change provider to. A connection settings window appears. Refer to the section on the relevant data source in this user guide to learn more about the settings.
Was this article helpful?
0 out of 0 found this helpful

0 Comments

Please sign in to leave a comment.