This article describes the key benefits of Ingest Instances:
- Data Sources
- A growing list of providers for various data sources that are regularly updated.
- These data sources can be updated to new versions without upgrading TimeXtender itself.
- Reaping the performance benefits of Azure Data Factory (ADF)
- Access to ADF data sources
- When using an Ingest Instance server, ADF can be leveraged for transferring source data to storage and on to a prepare instance. The data pipeline architecture behind ADF offers improved performance and is highly scalable, meaning that transfer times can be significantly reduced.
- The TimeXtender Ingest Service (TIS) can also be installed on-premises and use on-premises data sources. This is possible because of the Self-hosted Integration Runtime, which allows the ADF engine to access and transfer on-premises data and allows TX to leverage the ADF pipeline architecture while the firewall remains in place as-is.
- Promoting the Data Lake Concept and Infrastructure
- An Ingest Instance Server does not transform data or allow original field names to be overridden during ingestion. This promotes the data lake concept by ensuring that data is initially ingested and stored in a raw format. Data scientists often request raw data, and storing data in its raw form also gives greater insight into data lineage (i.e. what the original source of the data is and how it is later transformed). It also makes the transfer of data from source to storage faster, because fewer transformations are applied. Note, however, that it is possible to “query tables” using SQL code when setting up data sources, and thereby apply transformations and rename fields, although this is generally not considered best practice.
- An Ingest Instance Server allows for data storage within the Azure Data Lake infrastructure, which cannot be achieved when using a Business Unit.
- The Ingest Instance Server creates a new version of the source data after each execution of a transfer task (e.g. one version of a file per transfer). TX projects pointing to the Ingest Instance server automatically use the latest version of the data. A storage management task can be configured and scheduled to delete or otherwise manage old versions of data and free up storage. This archival approach is made practical by inexpensive storage and lends itself to the data lake concept. It also allows for the creation of backup files, which may be used for recovery.
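
The versioning behaviour described above can be illustrated with a minimal Python sketch. It is not TimeXtender code: the `DataVersion` class, the function names, and the `keep_days` retention parameter are all hypothetical, chosen only to show the idea of "read the latest version, periodically prune old ones".

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class DataVersion:
    """One stored version of a source table or file (hypothetical model)."""
    name: str
    created: datetime


def latest_version(versions):
    """Return the most recent version, which consumers would read by default."""
    return max(versions, key=lambda v: v.created)


def versions_to_delete(versions, now, keep_days):
    """Pick versions older than keep_days, always sparing the latest one."""
    newest = latest_version(versions)
    cutoff = now - timedelta(days=keep_days)
    return [v for v in versions if v.created < cutoff and v is not newest]


if __name__ == "__main__":
    # One version per transfer task execution (illustrative dates).
    history = [
        DataVersion("v1", datetime(2024, 1, 1)),
        DataVersion("v2", datetime(2024, 1, 15)),
        DataVersion("v3", datetime(2024, 1, 30)),
    ]
    print(latest_version(history).name)
    stale = versions_to_delete(history, now=datetime(2024, 1, 31), keep_days=20)
    print([v.name for v in stale])
```

Sparing the newest version regardless of age is a design choice in this sketch, so that a cleanup run can never leave consumers without data to read.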
- Select data quickly and dynamically
- Selection of data is quick, as it is simple to dynamically choose which tables and columns from a data source are brought in (e.g. all tables from a particular schema, or tables with names containing a particular term).
- Alternatively, a simple selection option lets you individually choose which tables and fields are brought in from the data source.
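
As a rough illustration of rule-based selection, the sketch below filters a list of tables by schema or by a term contained in the table name. The `SelectionRule` class and `select_tables` function are hypothetical names used for this example only; they do not correspond to a TimeXtender API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SelectionRule:
    """A hypothetical rule: match on schema, on a name fragment, or on both."""
    schema: Optional[str] = None
    name_contains: Optional[str] = None

    def matches(self, schema, table):
        # Every condition set on the rule must hold; comparison ignores case.
        if self.schema is not None and schema.lower() != self.schema.lower():
            return False
        if (self.name_contains is not None
                and self.name_contains.lower() not in table.lower()):
            return False
        return True


def select_tables(tables, rules):
    """Keep each (schema, table) pair that satisfies at least one rule."""
    return [(s, t) for s, t in tables if any(r.matches(s, t) for r in rules)]


if __name__ == "__main__":
    available = [
        ("Sales", "Orders"),
        ("Sales", "Customers"),
        ("HR", "Employees"),
        ("HR", "OrderAudit"),
    ]
    rules = [SelectionRule(schema="Sales"), SelectionRule(name_contains="order")]
    print(select_tables(available, rules))
```

Because the rules are evaluated against whatever tables the source currently exposes, a table added to the `Sales` schema later would be picked up without changing the rules, which is the point of dynamic selection.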
- The TIS server improves team development and supports a server-client experience
- TimeXtender Data Integration (TDI) does not have to be installed on the Ingest Instance Server. Developers can install TDI on their own machines and load source data into storage through the Ingest Instance server transfer tasks, without the data passing through the developer machine. The data is transferred directly to the Ingest Instance destination server.
- This provides a much more server-client-like experience, as opposed to a desktop-client experience. Using Ingest Instances in combination with ADF transfer to prepare instances means that the TDI application becomes purely an orchestration tool, with data transferred without involving the application server.
- You can use incremental loading of tables into your Ingest Instance storage
- Doing this automatically sets up the tables in your Prepare Instance to load data incrementally from the Ingest Instance storage, with no additional setup required.
- Instead of generating a new folder containing all the data, a batch file containing only the new or updated rows is added to the existing folder.
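
The batch behaviour described above can be sketched in Python as follows. This is a conceptual illustration only, not how TimeXtender implements incremental loading: the `modified` incremental field, the `id` key, and both function names are assumptions made for the example.

```python
def incremental_batch(source_rows, last_value, field):
    """Return only the rows whose incremental field exceeds the last loaded value."""
    return [row for row in source_rows if row[field] > last_value]


def merge_batches(batches, key):
    """Apply batches in order so that later rows replace earlier rows with the same key."""
    merged = {}
    for batch in batches:
        for row in batch:
            merged[row[key]] = row
    return list(merged.values())


if __name__ == "__main__":
    # The existing folder holds one batch from the initial full load.
    folder = [[
        {"id": 1, "modified": 1, "qty": 5},
        {"id": 2, "modified": 2, "qty": 7},
    ]]
    # Current state of the source: row 2 was updated, row 3 is new.
    source = [
        {"id": 1, "modified": 1, "qty": 5},
        {"id": 2, "modified": 3, "qty": 9},
        {"id": 3, "modified": 4, "qty": 1},
    ]
    last_loaded = max(row["modified"] for batch in folder for row in batch)
    # Append a small batch to the same folder instead of rewriting everything.
    folder.append(incremental_batch(source, last_loaded, "modified"))
    print(merge_batches(folder, key="id"))
```

Only two rows are written in the second batch, while a reader that applies the batches in order still sees the full, current table.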