A data warehouse in Discovery Hub is a database on either SQL Server or Azure SQL Database where your data is stored for queries and analysis. Often, a Discovery Hub project consists of one data warehouse where you consolidate data from one or more staging databases and a number of data sources.
During execution of a project, Discovery Hub extracts data from the staging database or ODX data storage and transfers it to the data warehouse. Initially, the data resides in what are known as raw table instances in the data warehouse. Discovery Hub then applies data transformations and data cleansing rules and saves the resulting data in valid instances of the tables, ready for queries and analysis.
- In the Solution Explorer, right-click Data Warehouses, and click Add Data Warehouse.
- In the Name box, type a name for the data warehouse. The name cannot exceed 15 characters in length.
- In the Server name box, type the name of the server that you want to store the database on. For a named instance, type the server name followed by the instance name, e.g. "server\instance". Click the ellipsis (...) to choose one of the available servers in your local Active Directory, if any.
- In the Authentication list, click the mode of authentication you want to use. You have the following options:
- Windows Authentication: Use the logged-in Windows user's credentials for authentication.
- SQL Server Authentication: Use a login set up on the SQL Server. Enter the username and password in the corresponding fields.
- Azure AD Password Authentication: Use Azure AD credentials from a domain that is not federated with Azure AD. Enter the username and password in the corresponding fields.
- Azure AD Integrated Authentication: Use the logged-in Windows user's credentials for authentication, provided that the user is logged in with Azure AD credentials from a domain that is federated with Azure AD.
- In the Database box, select an existing database, or type the name of a new database, and then click Create.
- (Optional) In the Collation list, click on the collation you want to use. You can choose a specific collation or one of the following:
- <Application Default>: Uses the application default, which is Latin1_General_CI_AS.
- <Server Default>: Inherits the collation from the specified server.
- (Optional) In the Connection timeout box, enter the number of seconds to wait before terminating the attempt to connect to the server. Set it to 0 to wait indefinitely.
- (Optional) In the Command timeout box, enter the number of seconds to wait before terminating a command.
- In the Max. rows to copy box, enter the batch size to use for ADO.NET transfer. Enter 0 for an unlimited batch size.
- (Optional) In the Encrypt connection list, you can enable encryption of the connection, which is recommended when you are not in a private network (e.g. when your server is on Azure). You have the following options:
- No: The communication is not encrypted (default).
- Yes: The communication is encrypted. The server's certificate is verified by a certificate authority.
- Yes, trust server certificate: The communication is encrypted, but the server's certificate is not verified. This setting is not recommended for use on public networks.
- (Optional) In the Direct read list, you can enable direct read from the staging database(s) to the data warehouse database. With direct read enabled, data is transferred using a stored procedure containing a simple SELECT statement. Especially when Discovery Hub is not on the same machine as the SQL Server, this can give better performance than SSIS or ADO.NET, since transfers using those technologies pass through TimeXtender. For direct read to work, some prerequisites must be met: on SQL Server, the databases need to be on the same server; on Azure SQL Database, the staging data and the data warehouse need to be in the same database. You have the following options for direct read:
- Matching Server: Direct read is used if the data warehouse and staging database server names match.
- Matching Server and Database: Direct read is used if the data warehouse and staging database server names and database names match.
- In the Use SSIS for transfer list, click Yes to enable SQL Server Integration Services data transfer, No to disable it, or leave it at As Parent to respect the project setting.
- (Optional) Enter an SSIS server name in the SSIS Server box.
- Click Test Connection to verify that the connection is working.
- Click Advanced Settings to access the advanced settings for the data warehouse. These are all optional.
- If you want to add additional connection strings, enter them in the Connection String Properties box.
- If you are deploying your data warehouse on Azure Synapse Analytics/Azure SQL Data Warehouse, click SQL Data Warehouse in the Deployment target list. For other versions of SQL Server, the version and edition can usually be auto-detected, and the setting can be left at its default. However, you can use the option to set the deployment target explicitly if you experience issues. Note: If you use Azure Synapse Analytics, some options and settings are not available due to the differences between this and other flavors of SQL Server. However, additional table settings are available to control Azure Synapse Analytics-specific options. The differences are noted when applicable throughout the user guide.
- See Adding a Database Role for an explanation of the Drop role options setting.
- If your SSIS Server is installed under a different name than the database, enter the name in the SSIS Server Name box.
- Select Use Remote SSIS package execution to enable the execution of SSIS packages on a remote server and enter the details for the remote server below. Note that you will need to install the Remote SSIS Execution service on the remote server - see Installing the Remote SSIS Execution Feature.
- Under On incremental load using Azure Databricks, select the approach you want to use. You have the following options:
- Write through file: The ODX server downloads files with data and writes it into the data warehouse. If you have data selection and/or incremental load applied, Databricks is used to process the files before the download.
- Write directly: Data is written directly into the data warehouse through Databricks. This avoids the trip around the ODX server, but only works if the data warehouse is on Azure SQL Database.
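Taken together, the connection settings above map onto a familiar SqlClient-style connection string. The sketch below assembles one in Python purely for illustration; the property names follow common SqlClient conventions and are assumptions, not necessarily the exact keys Discovery Hub uses internally.

```python
def build_connection_string(server, database, auth="windows",
                            username=None, password=None,
                            encrypt="no", trust_server_certificate=False,
                            connection_timeout=15):
    """Assemble a SqlClient-style SQL Server connection string from the
    settings described above. Property names follow common SqlClient
    conventions; Discovery Hub's actual keys may differ."""
    parts = [f"Data Source={server}", f"Initial Catalog={database}"]
    if auth == "windows":
        # Windows Authentication: use the logged-in user's credentials.
        parts.append("Integrated Security=True")
    else:
        # SQL Server or Azure AD Password Authentication.
        parts.append(f"User ID={username}")
        parts.append(f"Password={password}")
    if encrypt == "yes":
        parts.append("Encrypt=True")
        if trust_server_certificate:
            # "Yes, trust server certificate": skip certificate verification.
            parts.append("TrustServerCertificate=True")
    parts.append(f"Connect Timeout={connection_timeout}")
    return ";".join(parts)

# A named instance is written as server\instance (names are hypothetical):
cs = build_connection_string(r"DWSERVER\SQL2019", "DataWarehouse",
                             auth="sql", username="dwh_user",
                             password="example-password", encrypt="yes")
print(cs)
```

Note how the Encrypt connection options map to two separate keywords: Yes sets only Encrypt, while Yes, trust server certificate additionally sets TrustServerCertificate.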
Cleaning up the Database
To prevent accidental data loss, deleting a table in Discovery Hub does not delete the physical table in the data warehouse or staging database. The downside is that tables deleted in Discovery Hub still take up space in the database. The SQL Database Cleanup Tool enables you to identify tables left behind by Discovery Hub and delete (drop) them to free up space. Note that database schemas are not deleted from the database. You will need to drop those manually in SQL Server after deleting them in Discovery Hub.
Warning: When you drop a table with the SQL Database Cleanup Tool, the table and the data it contains are deleted. Take care when you use the tool.
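Since schemas are not removed by the tool, they must be dropped manually in SQL Server afterwards. As a minimal sketch (the schema name is hypothetical), a guarded DROP SCHEMA batch could be generated like this:

```python
def drop_schema_script(schema):
    """Build a T-SQL batch that drops a schema only when it is empty.
    DROP SCHEMA fails while any object still belongs to the schema,
    so sys.objects is checked first."""
    return (
        f"IF NOT EXISTS (SELECT 1 FROM sys.objects "
        f"WHERE schema_id = SCHEMA_ID(N'{schema}'))\n"
        f"    DROP SCHEMA [{schema}];"
    )

# A schema left behind after its tables were deleted in Discovery Hub:
print(drop_schema_script("old_reporting"))
```

Run the generated batch in SQL Server Management Studio (or another query tool) against the data warehouse database after dropping the schema's leftover objects with the cleanup tool.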
Identifying and Deleting Tables with SQL Database Cleanup
To clean up your data warehouse or staging database, follow the steps below.
- Right-click a data warehouse or staging database, click Advanced, and click SQL Database Cleanup Tool. Discovery Hub will read the objects from the database and open the SQL Database Cleanup Tool window.
- The objects in the database that are no longer, or never were, part of the currently open project are listed.
- (Optional) Right-click a table, view, procedure, or function and click Script to display the SQL script behind the object.
- Right-click a table, view, procedure, or function and click Drop to drop the object from the database.
- If the item does not have subordinate items, click Yes when Discovery Hub asks you to confirm the drop.
- If the item has subordinate items, a window will open with a list of the objects that will be dropped. Clear the selection for any tables you want to keep and then click Drop.
Note: Discovery Hub will automatically clear the selection for any incrementally loaded tables to prevent accidental data loss. Discovery Hub will ask you to confirm if you want to drop an incrementally loaded table.
- When you have dropped all the objects you want to delete from the database, close the window.
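Conceptually, the list the cleanup tool builds is the set difference between the objects found in the database and the objects defined in the currently open project. A small Python sketch of that comparison (the object names are made up for illustration):

```python
def find_leftover_objects(project_objects, database_objects):
    """Return the objects present in the database but no longer (or
    never) part of the open project - the set the cleanup tool lists."""
    return sorted(set(database_objects) - set(project_objects))

# Hypothetical object names; a real project would read these from the
# project definition and the database's system catalog respectively.
project = {"dbo.Customer", "dbo.Order"}
database = {"dbo.Customer", "dbo.Order", "dbo.Customer_Old"}
print(find_leftover_objects(project, database))  # ['dbo.Customer_Old']
```

The comparison only flags objects; as described above, nothing is dropped until you explicitly confirm it in the tool.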