This guide explains how to create Azure Data Lake storage and add it as ODX storage in Discovery Hub.
STOP! If you are creating a Discovery Hub Environment from scratch, we highly recommend using one of the supported configuration options for deploying Discovery Hub in Azure.
If you have already deployed one of the Azure Marketplace templates that include Azure Data Lake, you already have all of the necessary data lake resources and can skip to Step 2 to register an application.
If you already have Discovery Hub deployed in Azure but don't have the necessary Azure Data Lake services, start at Step 1 to add a Data Lake storage option.
Complete the following steps to create Azure Data Lake Storage for the ODX in Discovery Hub:
- Add Data Lake Store
- Register Application
- Assign Application Role
- Add Azure Data Lake Storage in Discovery Hub
1. Add Data Lake Store
Note: If you already have a Data Lake Storage account, you can skip this step.
- In the Azure portal, create a Data Lake Storage account to host the ODX data. Discovery Hub supports both Azure Data Lake Storage Gen 1 and Gen 2. For Gen 2, enable the hierarchical namespace when you create the storage account.
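Each generation exposes a different endpoint for the same bare account name, which is why Discovery Hub later asks only for the name. The sketch below is a hypothetical helper (not part of Discovery Hub) that checks a proposed account name and builds the public endpoint URL. It assumes the naming rule is 3 to 24 lowercase letters or digits for both generations; "odxlake01" is a placeholder name.

```python
import re

# Public endpoint host suffixes for each Data Lake generation.
ENDPOINT_SUFFIX = {
    "gen1": "azuredatalakestore.net",  # ADLS Gen 1 account
    "gen2": "dfs.core.windows.net",    # ADLS Gen 2 (hierarchical namespace) account
}

# Assumed naming rule: 3-24 characters, lowercase letters and digits only.
ACCOUNT_NAME_RULE = re.compile(r"[a-z0-9]{3,24}")


def data_lake_endpoint(account_name: str, generation: str) -> str:
    """Validate an account name and return its HTTPS endpoint."""
    if generation not in ENDPOINT_SUFFIX:
        raise ValueError(f"generation must be 'gen1' or 'gen2', got {generation!r}")
    if not ACCOUNT_NAME_RULE.fullmatch(account_name):
        raise ValueError(f"{account_name!r} must be 3-24 lowercase letters or digits")
    return f"https://{account_name}.{ENDPOINT_SUFFIX[generation]}"


if __name__ == "__main__":
    print(data_lake_endpoint("odxlake01", "gen2"))  # https://odxlake01.dfs.core.windows.net
    print(data_lake_endpoint("odxlake01", "gen1"))  # https://odxlake01.azuredatalakestore.net
```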
2. Register Application
To access the data lake resources from Discovery Hub, you need to register an application in Azure Active Directory.
- In the Azure portal, click Azure Active Directory in the left column, then click App Registrations in the menu on the left. Then click New Application Registration.
- Enter a name and select Accounts in this organizational directory only. Redirect URI is the URL at which your application is hosted; it is optional for this setup, because Discovery Hub authenticates with the application key rather than an interactive sign-in. Click Register when you are done.
- Open the newly created app, go to API Permissions, and click Add a Permission.
- Find Azure Data Lake and select it. Next, under Select Permission, check the box that says Have full access to the Azure Data Lake Service.
- Click Done to save your changes, then go to Certificates & secrets and create a New Client Secret. The secret appears after you click Add. Because the key is hidden once it is saved, copy the application key and store it somewhere safe.
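The Tenant ID, Application ID and Application Key collected in this step are the three inputs of the OAuth 2.0 client credentials grant, the standard way an Azure AD app authenticates without a user. This guide does not describe how Discovery Hub calls Azure internally, so treat the following as an illustrative sketch: it builds, but does not send, a token request against the Azure AD v1.0 token endpoint, with the resource set to the Data Lake audience for each generation. All IDs and the secret are placeholders.

```python
from typing import Tuple
from urllib.parse import urlencode

# Token audience (resource) for each Data Lake generation.
RESOURCE = {
    "gen1": "https://datalake.azure.net/",
    "gen2": "https://storage.azure.com/",
}


def build_token_request(tenant_id: str, application_id: str,
                        application_key: str, generation: str) -> Tuple[str, str]:
    """Build the URL and form body of an Azure AD client-credentials request.

    Nothing is sent over the network; POST the body to the URL with
    Content-Type application/x-www-form-urlencoded to obtain a token.
    """
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/token"
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": application_id,       # Application ID of the registered app
        "client_secret": application_key,  # Application Key (client secret)
        "resource": RESOURCE[generation],
    })
    return url, body


if __name__ == "__main__":
    url, body = build_token_request(
        "00000000-0000-0000-0000-000000000000",  # placeholder Directory (tenant) ID
        "11111111-1111-1111-1111-111111111111",  # placeholder Application ID
        "example-secret",                        # placeholder Application Key
        "gen2",
    )
    print(url)
    print(body)
```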
3. Assign Application Role
After the application is registered, go back to the Data Lake Store you created earlier and make the app an owner of the resource. If you are using ADLS Gen 2, you also need to add the app to the Storage Blob Data Contributor role.
- Go back to the resource group where your data lake resources are located and select the Data Lake Store resource. In the menu on the left, select Access Control (IAM) and add a role assignment.
- Assign the Owner role on the resource to the app you just created.
- Assign the Storage Blob Data Contributor role on the resource to the app you just created. **This role is only required for ADLS Gen 2 storage.**
*You must be an OWNER of the resource to add an app as an owner.
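If you prefer scripting to the portal, the same role assignments can be made with the Azure CLI command `az role assignment create`. The hypothetical helper below only builds the command lines: it derives the resource scope from a subscription ID, resource group and account name (all placeholders) and adds the Storage Blob Data Contributor role only for Gen 2, mirroring the steps above.

```python
import shlex
from typing import List

# Resource provider path for each Data Lake generation.
PROVIDER = {
    "gen1": "Microsoft.DataLakeStore/accounts",
    "gen2": "Microsoft.Storage/storageAccounts",
}


def role_assignment_commands(app_id: str, subscription_id: str,
                             resource_group: str, account_name: str,
                             generation: str) -> List[List[str]]:
    """Return one `az role assignment create` argument list per required role."""
    if generation not in PROVIDER:
        raise ValueError(f"generation must be 'gen1' or 'gen2', got {generation!r}")
    roles = ["Owner"]
    if generation == "gen2":
        # Only ADLS Gen 2 needs the data-plane role in addition to Owner.
        roles.append("Storage Blob Data Contributor")
    scope = (f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
             f"/providers/{PROVIDER[generation]}/{account_name}")
    return [
        ["az", "role", "assignment", "create",
         "--assignee", app_id, "--role", role, "--scope", scope]
        for role in roles
    ]


if __name__ == "__main__":
    for cmd in role_assignment_commands(
            "11111111-1111-1111-1111-111111111111",  # placeholder Application ID
            "22222222-2222-2222-2222-222222222222",  # placeholder subscription ID
            "odx-rg", "odxlake01", "gen2"):
        # shlex.join quotes the multi-word role name for the shell.
        print(shlex.join(cmd))
```

As the note above says, whoever runs these commands must be an owner of the resource.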
4. Add Azure Data Lake Storage in Discovery Hub
After the configuration in the Azure portal is complete, you can add Azure Data Lake storage to the ODX in Discovery Hub.
In the ODX Server tab, right-click an ODX Server and click either Add Azure Data Lake Gen 1 Data Storage or Add Azure Data Lake Gen 2 Storage, depending on the type of storage account you configured. Then enter the following information:
Name: Enter the name you want to use for the storage.
Tenant ID: The Directory ID, found under Properties in Azure Active Directory.
Application ID: The Application (client) ID of the app you registered in Step 2. You can find it on the registered app's Overview page.
Application Key: The client secret you created for the application in Step 2.
Account Name: The name of your Data Lake Store. Enter only the resource name, not the full URL.
Folder/Container name: For ADLS Gen 1, enter the name of the folder you want to create. The folder does not need to exist in the Data Lake Store in advance. The root folder is always Operational data eXchange, and this folder is created under it. For ADLS Gen 2, enter the name of the container you want to create.
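The Account Name and Folder/Container name rules are easy to get wrong when values are pasted from the portal. The following hypothetical helpers (not part of Discovery Hub) reduce a pasted endpoint URL to the bare account name and show where the data lands. For Gen 1 this assumes the folder sits directly under the Operational data eXchange root; for Gen 2 the container name is checked against Azure's container naming rules (3 to 63 characters; lowercase letters, digits and single hyphens; starting and ending with a letter or digit).

```python
import re
from urllib.parse import urlparse

GEN1_ROOT = "Operational data eXchange"  # fixed Gen 1 root folder


def bare_account_name(value: str) -> str:
    """Turn 'https://name.dfs.core.windows.net/' (or just 'name') into 'name'."""
    value = value.strip()
    host = urlparse(value).hostname if "://" in value else value
    if not host:
        raise ValueError(f"no account name found in {value!r}")
    return host.split(".")[0]


def valid_container_name(name: str) -> bool:
    """Check a Gen 2 container name against Azure's container naming rules."""
    return (3 <= len(name) <= 63
            and re.fullmatch(r"[a-z0-9-]+", name) is not None
            and not name.startswith("-")
            and not name.endswith("-")
            and "--" not in name)


def storage_target(generation: str, folder_or_container: str) -> str:
    """Describe where the ODX data will be stored for the given input."""
    if generation == "gen1":
        # Gen 1: the folder is created under the root and need not exist yet.
        return f"/{GEN1_ROOT}/{folder_or_container}"
    if generation == "gen2":
        if not valid_container_name(folder_or_container):
            raise ValueError(f"invalid container name: {folder_or_container!r}")
        return folder_or_container
    raise ValueError(f"generation must be 'gen1' or 'gen2', got {generation!r}")


if __name__ == "__main__":
    print(bare_account_name("https://odxlake01.dfs.core.windows.net/"))  # odxlake01
    print(storage_target("gen1", "Sales"))  # /Operational data eXchange/Sales
    print(storage_target("gen2", "odx-sales"))  # odx-sales
```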
(Optional) To add a Databricks account, check the box next to Use Azure Databricks.
Token: Enter the token needed to authenticate with Azure Databricks. See the Azure Databricks documentation on personal access tokens for how to generate one.
Cluster Name: Enter the name of the cluster you want to use or leave it as the default.
URL: Enter the URL of the Azure Databricks service you want to use or leave it as the default.
**Adding a Databricks account enables the following features:**
- The option to handle updated and deleted records on incremental load into the data lake
- Incremental load and selection rules when transferring data from the data storage to the data warehouse, with significantly better performance
- Direct transfer from the data lake to the data warehouse
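The Token, URL and Cluster Name fields fit together the way any client of the Azure Databricks REST API uses them: the token is sent as a bearer token to the workspace URL, and a cluster is identified by its name. How Discovery Hub itself talks to Databricks is not covered in this guide, so the following is only an illustrative sketch built on the Clusters API 2.0 `clusters/list` endpoint; the workspace URL, token, cluster name and cluster ID are placeholders.

```python
import urllib.request
from typing import Optional


def clusters_list_request(workspace_url: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a Clusters API 2.0 list request."""
    url = workspace_url.rstrip("/") + "/api/2.0/clusters/list"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def find_cluster_id(clusters_response: dict, cluster_name: str) -> Optional[str]:
    """Return the cluster_id whose cluster_name matches, or None if absent."""
    for cluster in clusters_response.get("clusters", []):
        if cluster.get("cluster_name") == cluster_name:
            return cluster.get("cluster_id")
    return None


if __name__ == "__main__":
    req = clusters_list_request("https://adb-0000000000000000.0.azuredatabricks.net/",
                                "dapi-example-token")
    print(req.full_url)
    # Shape of a clusters/list response, trimmed to the two fields used here.
    sample = {"clusters": [{"cluster_id": "0101-000000-abcd123",
                            "cluster_name": "odx-cluster"}]}
    print(find_cluster_id(sample, "odx-cluster"))  # 0101-000000-abcd123
```

To actually call the API, pass the request to `urllib.request.urlopen` and decode the JSON body before looking up the cluster.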
With the information provided above, you should be able to create Azure Data Lake Storage successfully in Discovery Hub.