
This is a standard reference architecture for implementing TimeXtender Data Integration fully in Azure. The goal is to balance performance and cost when working in the cloud.

To prepare your TimeXtender Data Integration environment in Azure, here are the steps we recommend.

 

1. Create Application Server - Azure VM

To serve the TimeXtender Data Integration application in Azure, we recommend using an Azure Virtual Machine (VM), sized according to your solution's requirements.

Guide: Create Application Server - Azure VM

Considerations: 

  • Recommended Sizing: DS2_v2 (for moderate workloads). See Azure VM Sizes documentation for more detail.
  • This VM will host the services below and must remain running for TimeXtender Data Integration to function:
    • TimeXtender Ingest Service
    • TimeXtender Execution Service
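As a sketch, the VM can be provisioned with the Azure CLI. The resource group, names, region, and credentials below are placeholders (a Windows Server image is assumed, since the TimeXtender services run on Windows):

```shell
# Provision a Windows Server VM sized for moderate TimeXtender workloads.
# Resource group, names, and region are placeholders -- adjust to your environment.
az group create --name tx-rg --location westeurope

az vm create \
  --resource-group tx-rg \
  --name tx-app-server \
  --image Win2022Datacenter \
  --size Standard_DS2_v2 \
  --admin-username txadmin \
  --admin-password '<choose-a-strong-password>'
```

Choose a larger size than DS2_v2 if your solution's workload demands it.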

2. Create Storage for Ingest instance - Azure Data Lake Storage Gen2

ADLS Gen2 is a highly performant, economical, scalable, and secure way to store your raw data.

Guide: Create Storage for Ingest instance - Azure Data Lake Storage Gen2

Considerations: 

  • When creating the ADLS Gen2 data lake service, you must enable Hierarchical Namespaces
  • TimeXtender Data Integration writes files to the data lake in the Parquet file format, a highly compressed, columnar storage format.
  • It is possible for Ingest instances to store data in Azure SQL DB (rather than in a data lake), but this adds cost and complexity without adding functionality
  • When using Azure Data Lake for Ingest instance storage and SQL DB for Prepare instance storage, it is highly recommended to use Data Factory to transfer this data
  • ADLS requires a service principal, called an App Registration in Azure, for TimeXtender Data Integration to access your data lake. 
    • The Data Lake and ADF may share the same App Registration if desired. 
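A minimal Azure CLI sketch for creating the data lake, assuming the placeholder account name `txingestlake` and the resource group from step 1:

```shell
# Create an ADLS Gen2 storage account for the Ingest instance.
# --hns true enables Hierarchical Namespaces, which is required.
az storage account create \
  --name txingestlake \
  --resource-group tx-rg \
  --location westeurope \
  --sku Standard_LRS \
  --kind StorageV2 \
  --hns true

# Create a file system (container) to hold the Ingest instance data.
az storage fs create \
  --name ingest \
  --account-name txingestlake \
  --auth-mode login
```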

3. Prepare for Ingest and Transport - Azure Data Factory (optional)

For large data movement tasks, ADF provides strong performance and ease of use for both ingestion and transport.

Guide: Prepare for Ingest and Transfer - Azure Data Factory (recommended)

Considerations: 

  • When creating ADF resources use Gen2, which is the current default
  • A single ADF service can be used for both transport and ingestion
    • Ingestion from data source to Ingest instance storage
    • Transport from an Ingest instance to a Prepare instance
  • The option to use ADF is not available for all data source types, but many options are available.
  • ADF Data sources do not support Ingest Query Tables at this time. 
  • ADF's fault tolerance and performance come at a price, so monitor usage to keep costs under control
  • ADF requires a service principal, called an App Registration in Azure, for TimeXtender Data Integration to access your ADF service. 
    • The Data Lake and ADF may share the same App Registration if desired. 
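A hedged sketch of creating a shared App Registration with the Azure CLI; the display name, role choices, and resource IDs are placeholders to adapt to your setup:

```shell
# Create an App Registration (service principal) that TimeXtender can use
# for both the data lake and ADF; names and scopes below are placeholders.
appId=$(az ad app create --display-name tx-data-integration --query appId -o tsv)
az ad sp create --id "$appId"

# Generate a client secret to enter in TimeXtender.
az ad app credential reset --id "$appId"

# Grant the principal access to the data lake and the Data Factory.
az role assignment create --assignee "$appId" \
  --role "Storage Blob Data Contributor" \
  --scope "<storage-account-resource-id>"
az role assignment create --assignee "$appId" \
  --role "Data Factory Contributor" \
  --scope "<data-factory-resource-id>"
```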

4. Create Storage for Prepare instance - Azure SQL Database

With its ability to auto-pause, Azure SQL Single DB Serverless is a great, potentially cost-saving option for the Prepare instance storage.

Guide: Create Storage for Prepare instance - Azure SQL DB

Considerations: 

  • Recommended SQL Single DB (vCore - General Purpose) sizing:
    • Serverless -  Min 10 vCores - Max 10 vCores
    • Data Max Size - 50 GB
  • When Serverless is conducting load balancing, it will drop connections. Therefore, when using Serverless for your Prepare instance, set the Min vCores and Max vCores to the same level
  • The Serverless compute tier can be a cost-saving option if you do not require your database to be online more than 50% of the time. If you do require >50% uptime for the Data Warehouse, then the Provisioned compute tier will be more economical. 
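The recommended sizing above might be provisioned with the Azure CLI roughly as follows (server and database names are placeholders; the auto-pause delay is an assumed example value):

```shell
# Create a serverless Azure SQL single database for the Prepare instance.
# Min and max vCores are set to the same level (10) to avoid dropped
# connections during serverless load balancing.
az sql db create \
  --resource-group tx-rg \
  --server tx-sql-server \
  --name tx-prepare-db \
  --edition GeneralPurpose \
  --family Gen5 \
  --compute-model Serverless \
  --capacity 10 \
  --min-capacity 10 \
  --max-size 50GB \
  --auto-pause-delay 60
```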

5. Configure Power BI Premium Endpoint (Optional)

If you have Power BI Premium, you can deploy and execute Semantic Models in Deliver instances directly against the Power BI Premium endpoint.

Guide: Configure Power BI Premium Endpoint (Optional)

6. Estimate Azure Costs

Balancing cost and performance requires monitoring and forecasting of your services and needs.

Guide: Estimate Azure Costs

Considerations: 

  • Azure provides a pricing calculator to help you estimate your costs for various configurations

Note: this Azure pricing calculator does not include the cost of TimeXtender instances and usage

Quick question: is this reference architecture still valid for the current, non-legacy TimeXtender (TimeXtender Next Gen)?


@RLB yes this reference architecture is up-to-date and aligned with the latest version. 


Thank you Christian. It's the answer I was hoping to get.


Nice overview of steps!

Two questions though: the storage account (Azure Data Lake Gen2) in this manual is created with a public endpoint, which is not the most secure option as the account is connected to the Internet. Would it be possible to create it using a private link / endpoint only? That way it would only be accessible from within the Azure environment and on-prem (when a VPN Gateway or ExpressRoute is configured), but it would no longer have an Internet-connected endpoint. Any thoughts or issues with that? I guess the issue in that case is the screen within TimeXtender that is used to connect to the storage account; it may not allow connecting through a private endpoint.

Second question: would it be possible to use a managed identity (either system-assigned or user-assigned and tied to the application server VM) instead of the service principal for accessing the storage account and ADF, and simply assign this managed identity the correct roles to access both? In my opinion, this would bring the same results but is more secure (as there is no secret to manage).


@gjongeneel Thank you for your questions. We do not support use of managed identity currently. However, the use of private endpoints is possible for some infrastructure setups. If you create a Data Lake storage account that is setup to use private endpoints, then you should be able to create and use a container within this storage account. If you have further questions regarding how to set this up, then please submit a question.


@Christian Hauggaard Many thanks for your answers!

Are there any plans to support managed identity in the (near) future? It would greatly improve security and Microsoft often recommends it as a best practice. Specifically if one Azure resource needs to connect to another Azure resource.

What would be the modified configuration steps when defining the storage for the ODX instance if we wanted to connect to the Data Lake storage account through private endpoint (Step 4 in Use Azure Data Lake Storage for your ODX Instance | Community (timextender.com))? What other components could be set up with a private endpoint within the Azure SQL Database Reference Architecture?


@gjongeneel I don't know if you are still curious or not, but I've come across the same question about private endpoints when setting up our Azure environment for TimeXtender.

We've found out the following:

  • For Azure Data Lake Storage, you need to configure your private endpoint on dfs sub-resource, not blob. The ODX service looks to connect directly to the DFS url. If you want to use Azure Storage Explorer to browse the data, a private endpoint for blob is needed as this software connects to the blob endpoint.
    Also make sure when connecting from on-premise, you've setup the DNS conditional forwarders as mentioned in Microsoft's documentation.
  • For Azure SQL Database, configuring the endpoint is very straightforward. It worked ‘out-of-the-box’ for us. Some extra configuration is also needed here when connecting from on-premise.
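For reference, a private endpoint on the dfs sub-resource might be created along these lines with the Azure CLI (the endpoint name, VNet, subnet, and resource ID are placeholders, and a matching privatelink DNS zone is also needed):

```shell
# Private endpoint targeting the dfs sub-resource of the storage account,
# which is the endpoint the Ingest (ODX) service connects to.
az network private-endpoint create \
  --resource-group tx-rg \
  --name txlake-dfs-pe \
  --vnet-name tx-vnet \
  --subnet tx-subnet \
  --private-connection-resource-id "<storage-account-resource-id>" \
  --group-id dfs \
  --connection-name txlake-dfs-connection

# Private DNS zone so the account's dfs hostname resolves to the private IP.
az network private-dns zone create \
  --resource-group tx-rg \
  --name "privatelink.dfs.core.windows.net"
```

Repeat with `--group-id blob` if tools such as Azure Storage Explorer also need private access.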

Hope this helps you or anyone in the future.

Cheers!


Hi all,

 

just a few things for Private Endpoints:

  • For ADLS you need two endpoints, both blob and dfs, if you want to use the Portal's built-in viewer on the data in your blob. You will also need sufficient rights to do this, see: https://learn.microsoft.com/en-us/azure/storage/blobs/assign-azure-role-data-access?tabs=portal#assign-an-azure-role
  • If you start using ADF, you will need to generate Managed Private Endpoints to your ADLS and Azure SQL instances from there (ADF Studio). Those MPEs need to be approved from the resource you are generating them for
  • Don't forget to turn off all public networking to ensure all connections run through your VNET only. This means the storage explorer in the Portal will also only be reachable from machines inside the VNET.
