
Reference Architecture with Azure Synapse SQL Pool

Leverage Azure Synapse SQL Pool for big data with TimeXtender


 

This reference architecture describes how to implement TimeXtender with Azure Synapse Dedicated SQL Pool as MDW storage. It is designed for maximum performance when data becomes very large, for example at least 1 TB of data or tables with more than 1 billion rows. Dedicated SQL Pool uses a Massively Parallel Processing (MPP) architecture, which distributes processing across multiple compute nodes and makes analytical queries very performant. For more information, please see "When should I consider Azure Synapse Analytics?" in the article What is Azure Synapse Analytics?
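As a rough illustration, the size criteria above can be expressed as a simple check. This is a minimal Python sketch, not a TimeXtender feature: the WarehouseProfile class and the function name are hypothetical, and the thresholds are only the rules of thumb quoted in this article.

```python
from dataclasses import dataclass

# Rules of thumb quoted in this article; they are not hard limits.
MIN_TOTAL_TB = 1.0
MIN_LARGEST_TABLE_ROWS = 1_000_000_000


@dataclass
class WarehouseProfile:
    """Hypothetical summary of the expected data warehouse size."""
    total_size_tb: float
    largest_table_rows: int


def should_consider_dedicated_sql_pool(profile: WarehouseProfile) -> bool:
    """Return True if either size criterion from this article is met."""
    return (
        profile.total_size_tb >= MIN_TOTAL_TB
        or profile.largest_table_rows >= MIN_LARGEST_TABLE_ROWS
    )


print(should_consider_dedicated_sql_pool(WarehouseProfile(0.2, 50_000_000)))     # False
print(should_consider_dedicated_sql_pool(WarehouseProfile(0.4, 2_500_000_000)))  # True
```

Either criterion on its own is enough to make Synapse worth evaluating; below both thresholds, the Azure SQL DB architecture is usually the simpler choice.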

 

(Please note that a similarly named Azure Synapse service, Serverless SQL Pool, does not store data; it is used only for high-performance, low-cost queries on the Azure Data Lake Storage (Gen2) resource associated with the Synapse resource. For more information, please see Query ODX Parquet files with Azure Synapse Workspace.)

 

To prepare your TimeXtender environment in Azure, we recommend the following steps.

  1. Create Application Server - Azure VM
  2. Create Project Repository - Azure SQL DB
  3. Create ODX Storage - Azure Data Lake Storage Gen2
  4. Prepare for Ingestion - Azure Data Factory (recommended)
  5. Create MDW Storage - Azure Synapse SQL Pool (SQL DW)
  6. Create Azure Analysis Services Endpoint (Optional)
  7. Estimate Azure Costs

 


1. Create Application Server - Azure VM

To serve the TimeXtender application in Azure, we recommend using an Azure Virtual Machine (VM), sized according to your solution's requirements.
Guide:

Create a TimeXtender Application Server in Azure

Considerations:
  • Recommended Sizing: DS2_v2 (for moderate workloads)
  • If the Azure VM (App Server) hosts the ODX Server, it must remain running for TimeXtender to run.
  • This VM hosts the services that run TimeXtender:
    • ODX Service
    • Scheduler Service
    • Server Services


2. Create Project Repository - Azure SQL DB

When you save a project, metadata is written to the repository database, and when you open a project, this metadata is read from the project repository and presented in the UI.
Guide:

Use Azure SQL Single DB with TimeXtender

Considerations:
  • Recommended SQL Single DB (vCore - General Purpose) Sizing:
    • Provisioned - Min 2 vCores
    • Data Max Size - 50 GB
  • You may increase the vCores to decrease the time it takes to save a TimeXtender project. 
  • One Project Repository can contain multiple projects.
  • Each environment requires a separate Project Repository database.
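The rule that each environment needs its own repository database can be sketched as a simple consistency check. The environment and database names below are hypothetical, and TimeXtender does not provide this function; it only illustrates the rule.

```python
from collections import Counter

# Hypothetical layout: each environment points at its own repository database.
# (A repository may still hold several projects; this check does not need to model them.)
environment_repositories = {
    "Dev": "tx-repository-dev",
    "Test": "tx-repository-test",
    "Prod": "tx-repository-prod",
}


def shared_repositories(env_to_repository: dict[str, str]) -> list[str]:
    """Return repository databases assigned to more than one environment."""
    counts = Counter(env_to_repository.values())
    return sorted(repo for repo, n in counts.items() if n > 1)


print(shared_repositories(environment_repositories))               # []
print(shared_repositories({"Dev": "tx-repo", "Prod": "tx-repo"}))  # ['tx-repo']
```

An empty result means every environment has a separate repository, as recommended above.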


3. Create ODX Storage - Azure Data Lake Storage Gen2

ADLS Gen2 is a highly performant, economical, scalable, and secure way to store your raw data.
Guide:

Use Azure Data Lake Storage with TimeXtender

Considerations:
  • When creating the ADLS Gen2 data lake service, you must enable the hierarchical namespace.
  • TimeXtender writes files to the data lake in Parquet, a highly compressed, columnar storage format.
  • The ODX Server can also store data in Azure SQL DB, but this adds cost and complexity without adding functionality.
  • When using Azure Data Lake for the ODX and Azure SQL DB for the data warehouse, it is highly recommended to use Data Factory to transport the data. With Synapse as the data warehouse, as in this architecture, TimeXtender uses PolyBase for transport instead (see step 5).
  • ADLS requires a service principal, called an App Registration in Azure, for TimeXtender to access the data lake.
    • The data lake and ADF may share the same App Registration if desired.
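For context, an App Registration (service principal) authenticates with the OAuth 2.0 client-credentials flow of the Microsoft identity platform. The minimal Python sketch below only builds the token request and does not send it; the tenant ID, client ID, and secret are placeholders, and the function name is hypothetical. It is not a description of how TimeXtender itself calls Azure, but it shows how one App Registration can request tokens both for Azure Storage (the data lake) and for Azure Resource Manager, through which ADF resources are managed.

```python
from urllib.parse import urlencode

# Client-credentials scopes must use the "/.default" suffix.
STORAGE_SCOPE = "https://storage.azure.com/.default"       # Azure Storage, including ADLS Gen2
MANAGEMENT_SCOPE = "https://management.azure.com/.default"  # Azure Resource Manager (manages ADF resources)


def build_token_request(tenant_id: str, client_id: str, client_secret: str, scope: str) -> tuple[str, str]:
    """Build (url, form_body) for an OAuth 2.0 client-credentials token request.

    The request is only constructed here; POSTing the body to the URL with
    Content-Type application/x-www-form-urlencoded would return an access token.
    """
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": scope,
    })
    return url, body


# The same App Registration (client ID and secret) requests a token for each service.
for scope in (STORAGE_SCOPE, MANAGEMENT_SCOPE):
    url, body = build_token_request("<tenant-id>", "<client-id>", "<client-secret>", scope)
    print(url)
```

Sharing one App Registration keeps credential management simple; separate registrations allow narrower permissions per service.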


4. Prepare for Ingestion - Azure Data Factory (recommended)

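For large data movement tasks, ADF provides excellent performance and ease of use. It can handle both ingestion and transport, but in this architecture it is used for ingestion only, because transport to Synapse uses PolyBase.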
Guide:

Use ADF for Data Movement with TimeXtender

Considerations:
  • You cannot use ADF for transport to Synapse. PolyBase is the best practice for loading Synapse, and TimeXtender uses it automatically.
  • When creating ADF resources, use V2, which is the current default.
  • A single ADF service can be used for both ingestion and transport; with Synapse, it is used only for:
    • Ingestion from data source to ODX Storage
  • ADF is not available for all data source types, but many are supported.
  • ADF data sources do not support ODX Query Tables at this time.
  • ADF can be quite costly, although it provides fault-tolerant, high-performance data movement.
  • ADF requires a service principal, called an App Registration in Azure, for TimeXtender to access your ADF service.
    • The data lake and ADF may share the same App Registration if desired.
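The ingestion and transport considerations above can be summarized as a small decision sketch. The enum and function names are hypothetical and simplify the rules stated in this article; they are not part of TimeXtender.

```python
from enum import Enum


class Movement(Enum):
    ADF = "Azure Data Factory"
    POLYBASE = "PolyBase"
    OTHER = "a non-ADF option"


def ingestion_method(source_supports_adf: bool, is_odx_query_table: bool) -> Movement:
    """Ingestion: data source -> ODX Storage (ADLS Gen2)."""
    # ADF is preferred where available, but ADF data sources do not support ODX Query Tables.
    if source_supports_adf and not is_odx_query_table:
        return Movement.ADF
    return Movement.OTHER


def transport_method(mdw_is_synapse: bool) -> Movement:
    """Transport: ODX Storage -> MDW storage."""
    if mdw_is_synapse:
        # Chosen automatically by TimeXtender; the user interface does not show this.
        return Movement.POLYBASE
    # For ADLS -> Azure SQL DB, this article recommends ADF.
    return Movement.ADF


print(ingestion_method(source_supports_adf=True, is_odx_query_table=False).value)  # Azure Data Factory
print(transport_method(mdw_is_synapse=True).value)                                 # PolyBase
```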


5. Create MDW Storage - Azure Synapse SQL Pool (SQL DW)

As your organization's data grows and performance becomes a key consideration, Azure Synapse SQL Pool, a massively parallel processing database, can be a great option for data warehouse storage at scale.

Guide:

Use Azure Synapse SQL Pool (SQL DW)

Considerations:
  • Recommended Synapse Use Case Criteria:
    • More than 1 TB of data
    • Tables with at least 1 billion rows
    • Data warehouse tuning is needed to achieve maximum performance
  • Table data is split across 60 distributions, in a process called sharding. There are three distribution types (round-robin, replicated, and hash), each with its own ideal use cases.
  • TimeXtender automatically switches to PolyBase for transport when Synapse is used as the data warehouse. Note: the user interface does not indicate that this change has been made.
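To illustrate how round-robin and hash distributions spread rows across the 60 distributions, here is a minimal Python sketch. It assumes zlib.crc32 as a stand-in hash function, since the hash that Synapse actually uses is internal, and it assigns round-robin rows in sequence, whereas Synapse assigns them randomly but evenly. Replicated tables are not shown, because they keep a full copy of the table on each compute node rather than assigning rows to distributions.

```python
import zlib
from collections import Counter

NUM_DISTRIBUTIONS = 60  # a dedicated SQL pool spreads table data over 60 distributions


def hash_distribution(key: str) -> int:
    """Map a distribution-column value to a distribution number (0-59).

    zlib.crc32 is only a stand-in: Synapse's actual hash function is internal.
    """
    return zlib.crc32(key.encode("utf-8")) % NUM_DISTRIBUTIONS


def round_robin_distribution(row_number: int) -> int:
    """Spread rows evenly regardless of their values (sequential stand-in for Synapse's random assignment)."""
    return row_number % NUM_DISTRIBUTIONS


# 100,000 rows drawn from 1,000 distinct customer keys.
keys = [f"C{n % 1000}" for n in range(100_000)]

rr_counts = Counter(round_robin_distribution(i) for i in range(len(keys)))
hash_counts = Counter(hash_distribution(k) for k in keys)

print(len(rr_counts), min(rr_counts.values()), max(rr_counts.values()))  # 60 1666 1667
print(len(hash_counts), min(hash_counts.values()), max(hash_counts.values()))  # spread depends on the keys
# Under hashing, every row with the same key lands in the same distribution.
print(len({hash_distribution(k) for k in keys if k == "C42"}))  # 1
```

The design trade-off: a hash-distributed table co-locates rows that share a key, which helps joins and aggregations on that key, while round-robin guarantees an even spread without regard to row values.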


6. Create Azure Analysis Services Endpoint (Optional)

To serve the Semantic Model in TimeXtender, Azure Analysis Services provides enterprise-grade, scalable performance.
Guide:

Use AAS with TimeXtender

Considerations:
  • Recommended AAS tier: Developer - D1 (suitable for prototyping and modest workloads, but possibly not for production workloads)
    • There are three tiers, with multiple options for various use cases.
  • Like ADF, Azure Analysis Services requires a service principal, called an App Registration, for TimeXtender to connect to the service.
  • AAS can be quite costly, though it provides great performance if that fits the solution requirements.
  • Behind the scenes, TimeXtender stores semantic models as Analysis Services Tabular models.


7. Estimate Azure Costs

Balancing cost and performance requires monitoring and forecasting your services and needs.
Guide:

Azure Pricing Calculator*

Considerations:
  • Azure provides a pricing calculator to help you estimate your costs for various configurations.

*Please note, this Azure pricing calculator does not include the price of the TimeXtender License or Consulting services.
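The arithmetic behind a monthly estimate is simple: hourly rate multiplied by billed hours, summed across services. The sketch below assumes placeholder rates (not real Azure prices) and 730 hours per month; take actual rates from the Azure Pricing Calculator. The resource names and the Resource class are illustrative. It contrasts an always-on application server, which must remain running when it hosts the ODX Server, with a dedicated SQL pool that is paused outside business hours (while paused, compute is not billed, although storage still is).

```python
from dataclasses import dataclass

HOURS_PER_MONTH = 730  # 24 h * 365 days / 12 months


@dataclass
class Resource:
    name: str
    hourly_rate: float   # placeholder; use real rates from the Azure Pricing Calculator
    billed_hours: float  # hours per month the resource is running (billed)


def monthly_estimate(resources: list[Resource]) -> float:
    """Sum hourly rate x billed hours across resources."""
    return sum(r.hourly_rate * r.billed_hours for r in resources)


resources = [
    # Always on: the App Server must keep running when it hosts the ODX Server.
    Resource("Application Server VM", hourly_rate=1.0, billed_hours=HOURS_PER_MONTH),
    # Paused outside business hours: about 12 h x 22 working days.
    Resource("Dedicated SQL Pool (compute)", hourly_rate=10.0, billed_hours=12 * 22),
]
print(monthly_estimate(resources))  # 3370.0
```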

 
