This guide covers how to create and add Azure Data Lake storage for the ODX in Discovery Hub.
If you are creating a Discovery Hub Environment from scratch, we highly recommend using one of the supported configuration options for deploying Discovery Hub in Azure.
If you have already deployed one of the Azure Marketplace templates with Azure Data Lake, you already have all of the necessary data lake resources and can skip step 1 and proceed to step 2 to register an application.
If you already have Discovery Hub deployed in Azure but have not yet configured an Azure Data Lake storage account, you can use this guide to add a Data Lake storage option by starting at step 1.
Complete the following steps to create Azure Data Lake Storage for the ODX in TimeXtender:
- Add Data Lake Storage
- Create an App Registration
- Enable App Registration access to Data Lake
- Add Azure Data Lake Storage in TimeXtender
1. Create an Azure Storage Account
Note: If you already have a Data Lake Storage account, you can skip this step.
- In the Azure portal, create a Data Lake Storage account that will be used to host the ODX database.
- Azure Portal -> Create a new Resource -> Storage account - blob, file, table, queue -> Create storage account
- Assign Subscription name, Storage account name, Location and other properties.
- Select Account kind = StorageV2 (general-purpose v2)
- Advanced tab -> set Hierarchical namespace to Enabled
For more details, refer to the Microsoft Azure documentation article Creating Azure Data Lake Storage Gen 2.
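The portal settings above can also be expressed as an Azure CLI call. The following is a minimal sketch, not part of the original guide: it only builds the argument list for `az storage account create` and checks the account name against Azure's naming rule (3 to 24 lowercase letters and digits). The function name, the example account, resource group, and region, and the default `Standard_LRS` SKU are assumptions for illustration.

```python
import re
import shlex

# Azure storage account names must be 3-24 characters,
# using lowercase letters and digits only.
_ACCOUNT_NAME = re.compile(r"[a-z0-9]{3,24}")


def storage_account_create_command(name, resource_group, location, sku="Standard_LRS"):
    """Build the `az storage account create` argument list (hypothetical helper).

    Mirrors the portal choices in this step: Account kind = StorageV2 and
    Hierarchical namespace = Enabled.
    """
    if not _ACCOUNT_NAME.fullmatch(name):
        raise ValueError(f"invalid storage account name: {name!r}")
    return [
        "az", "storage", "account", "create",
        "--name", name,
        "--resource-group", resource_group,
        "--location", location,
        "--sku", sku,  # assumption: pick the redundancy option you need
        "--kind", "StorageV2",
        "--enable-hierarchical-namespace", "true",
    ]


if __name__ == "__main__":
    # Example values are placeholders, not taken from the guide.
    cmd = storage_account_create_command("odxdatalake01", "rg-odx", "westeurope")
    print(shlex.join(cmd))
```

The printed command can be pasted into a shell where the Azure CLI is installed and signed in, or the list can be passed to `subprocess.run`.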
2. Create an App Registration
To access the data lake resources from Discovery Hub, you need to configure an App Registration in the Azure portal.
Note: The following steps for access control describe the minimum permissions required in most cases. In your production deployment, you may fine-tune those permissions to align with your business rules and compliance requirements. Refer to the Microsoft Azure documentation for details.
- In the Azure Portal menu, click on Azure Active Directory, then click on App Registrations in the menu bar on the left. Then click New Registration.
- Enter a name and select Accounts in this organizational directory only. The value of Redirect URI is the URL at which your application is hosted. Click Register when you are done.
- For the newly added App Registration, select Certificates & secrets and create a New Client Secret. The secret value appears after you click Add. Copy it and store it somewhere safe right away, because the value cannot be viewed again after it is saved.
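The App Registration yields three values that are used later: the tenant (directory) ID, the application ID, and the client secret. As a rough illustration of how such credentials are typically used, the sketch below builds an OAuth 2.0 client credentials token request against the Microsoft identity platform for the Azure Storage scope. This is an assumption about what happens behind the scenes, not a description of how Discovery Hub itself authenticates; the function name and example values are hypothetical, and the request is only constructed, not sent.

```python
import urllib.parse
import urllib.request


def build_token_request(tenant_id, application_id, client_secret):
    """Build (but do not send) a client credentials token request.

    Hypothetical helper: the endpoint and form fields follow the
    Microsoft identity platform v2.0 client credentials flow.
    """
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    form = {
        "grant_type": "client_credentials",
        "client_id": application_id,
        "client_secret": client_secret,
        # Scope for Azure Storage, which includes Data Lake Storage Gen2.
        "scope": "https://storage.azure.com/.default",
    }
    body = urllib.parse.urlencode(form).encode("ascii")
    return urllib.request.Request(url, data=body, method="POST")


if __name__ == "__main__":
    # Placeholder values; substitute the ones from your App Registration.
    request = build_token_request(
        "11111111-2222-3333-4444-555555555555",
        "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
        "example-secret",
    )
    print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` with real values is a quick way to confirm that the secret was copied correctly, since an invalid secret is rejected at this stage.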
3. Enable App Registration access to Data Lake
After the App Registration is created, you need to configure its access to the Data Lake.
- Go back to the resource group where your data lake resources are located and select the Data Lake storage account resource.
- In the menu bar on the left, select Access Control (IAM) and add a role assignment.
- Add the <App Registration Name> you just created to the role of Storage Blob Data Contributor of the resource.
Note: After you add or remove role assignments, wait at least 5 minutes before executing an ODX task. Changes can take up to 30 minutes to take effect. For more details, see the article Troubleshoot Azure RBAC.
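The role assignment in this step can be sketched as an Azure CLI call scoped to the storage account. The example below is illustrative only: the helper names and sample IDs are hypothetical, and it simply builds the resource scope string and the `az role assignment create` arguments, plus the earliest time an ODX task should run given the 5-minute wait mentioned in the note.

```python
from datetime import datetime, timedelta


def storage_account_scope(subscription_id, resource_group, account_name):
    """Return the Azure resource ID of the storage account (hypothetical helper)."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Storage/storageAccounts/{account_name}"
    )


def role_assignment_command(application_id, scope):
    """Build the `az role assignment create` argument list for the App Registration."""
    return [
        "az", "role", "assignment", "create",
        "--assignee", application_id,
        "--role", "Storage Blob Data Contributor",
        "--scope", scope,
    ]


def earliest_odx_run(assigned_at, wait_minutes=5):
    """Earliest time to execute an ODX task after changing role assignments."""
    return assigned_at + timedelta(minutes=wait_minutes)


if __name__ == "__main__":
    # All IDs and names below are placeholders.
    scope = storage_account_scope(
        "00000000-0000-0000-0000-000000000000", "rg-odx", "odxdatalake01"
    )
    print(" ".join(role_assignment_command("aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee", scope)))
    print(earliest_odx_run(datetime(2024, 1, 1, 12, 0)))
```

Scoping the assignment to the single storage account, rather than the whole resource group, keeps the permission close to the minimum described in the note in step 2.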
4. Add Azure Data Lake Storage in TimeXtender
Once the configuration in the Azure portal is complete, you can create an ODX Azure Data Lake Storage in Discovery Hub.
In the ODX Server tab, right-click and select Add Azure Data Lake Gen 2 Data Storage.
Name: Type the name you want to use for the storage.
Tenant ID: This is the [Directory ID] found under properties of Azure Active Directory.
Application ID: This is the Application ID of the App Registration created above. It can be found in the Azure portal under Azure Active Directory > App Registrations.
Application Key: Use the key (secret) you created under the application.
Account Name: Use the name of your Data Lake Store. Enter only the name of the resource, not the entire URL.
Container name: Enter the name of the container (click Create if it is a new container).
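Most connection failures at this point come from a mistyped ID or from pasting the full storage URL into Account Name. The sketch below shows one way to sanity-check the values before entering them in the dialog. It is not part of Discovery Hub: the class and function names are hypothetical, and the checks encode Azure's general naming rules (GUID-formatted IDs, account names of 3 to 24 lowercase letters and digits, container names of 3 to 63 lowercase letters, digits, and single hyphens).

```python
import re
import uuid
from dataclasses import dataclass
from urllib.parse import urlparse

_ACCOUNT = re.compile(r"[a-z0-9]{3,24}")
_CONTAINER = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def is_guid(value):
    """True if value is a GUID in the usual hyphenated form."""
    try:
        return str(uuid.UUID(value)) == value.lower()
    except ValueError:
        return False


def normalize_account_name(value):
    """Reduce a pasted URL or host name to the bare storage account name."""
    value = value.strip()
    if "://" in value:
        value = urlparse(value).hostname or ""
    return value.split(".")[0]


@dataclass
class AdlsStorageSettings:
    """The fields of the Add Azure Data Lake Gen 2 Data Storage dialog."""
    name: str
    tenant_id: str
    application_id: str
    application_key: str
    account_name: str
    container_name: str

    def __post_init__(self):
        self.account_name = normalize_account_name(self.account_name)

    def problems(self):
        """Return a list of human-readable problems; empty means all checks passed."""
        found = []
        if not self.name.strip():
            found.append("Name is empty")
        if not is_guid(self.tenant_id):
            found.append("Tenant ID is not a GUID")
        if not is_guid(self.application_id):
            found.append("Application ID is not a GUID")
        if not self.application_key:
            found.append("Application Key is empty")
        if not _ACCOUNT.fullmatch(self.account_name):
            found.append("Account Name is not a valid storage account name")
        name = self.container_name
        if not (3 <= len(name) <= 63 and _CONTAINER.fullmatch(name)):
            found.append("Container name is not a valid container name")
        return found
```

For example, constructing the settings with `account_name="https://odxdatalake01.dfs.core.windows.net/"` stores `odxdatalake01`, which is the form the Account Name field expects.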
(Optional) To add a Databricks account, check the box next to Use Azure Databricks.
Token: Enter the token needed to authenticate with Azure Databricks. See the Azure Databricks documentation on personal access tokens for how to generate one.
Cluster Name: Enter the name of the cluster you want to use or leave it as the default.
URL: Enter the URL of the Azure Databricks service you want to use or leave it as the default.
Adding a Databricks account enables the following features:
- The option to handle updated and deleted records on incremental load into the data lake
- Incremental load and selection rules when transferring data from data storage to the data warehouse, with significantly better performance
- Direct transfer from data lake to data warehouse
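If you use the Databricks option, it can help to confirm that the token, URL, and cluster name agree with each other before entering them. The sketch below is an illustration under stated assumptions, not something Discovery Hub requires: it builds a request to the Databricks REST endpoint `/api/2.0/clusters/list` with the token as a bearer credential and looks up a cluster by name in the parsed response. The helper names, workspace URL, and sample response are hypothetical, and the request is only constructed, not sent.

```python
import urllib.request


def build_cluster_list_request(workspace_url, token):
    """Build (but do not send) a request that lists clusters in a workspace."""
    base = workspace_url.rstrip("/")
    return urllib.request.Request(
        f"{base}/api/2.0/clusters/list",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def find_cluster(response_json, cluster_name):
    """Return the cluster entry with the given name, or None if it is absent."""
    for cluster in response_json.get("clusters", []):
        if cluster.get("cluster_name") == cluster_name:
            return cluster
    return None


if __name__ == "__main__":
    # Placeholder URL and token.
    request = build_cluster_list_request(
        "https://adb-1234567890123456.7.azuredatabricks.net/", "example-token"
    )
    print(request.get_method(), request.full_url)

    # Shape of a parsed response, trimmed to the fields used here.
    sample = {"clusters": [{"cluster_id": "0101-abc", "cluster_name": "odx-cluster"}]}
    print(find_cluster(sample, "odx-cluster"))
```

Sending the request with `urllib.request.urlopen` and parsing the body with `json.load` yields the dictionary that `find_cluster` expects.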
With the information provided above, you should be able to successfully create Azure Data Lake Storage in TimeXtender.