Partition Parquet Files (Solved)

We are loading data from an on-premises SQL Server into a data lake (the ingest instance). The data is stored as Parquet, but TX creates one huge Parquet file per table. We are using ADF to ingest the data into the ingest instance. Is there a way to load the data so that, when it lands in the data lake, the Parquet output for each table is split into parts? table_xxx_0001.parquet, table_xxx_0002.parquet, etc. (I know this is not how Parquet naming works, it is just an example.) Is this possible? We are having performance issues reading from Parquet into the Azure SQL DB prepare instance. We are using ADO.NET from ingest to prepare.
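For context, the layout we are after is roughly what the sketch below produces. It uses pyarrow to rewrite one big Parquet file as numbered parts; the file names and the 5M-row chunk size are placeholders, and it only illustrates the target layout, not how ADF itself would do it. (If the copy activity's Parquet sink in our ADF version exposes a "Max rows per file" setting plus a file name prefix, that would presumably be the no-code route, but we have not verified that on our setup.)

```python
# Illustrative only: split one large Parquet file into numbered part files.
# Paths and the chunk size are assumptions, not values from our environment.
import pyarrow as pa
import pyarrow.parquet as pq

source = "table_xxx.parquet"   # the single large file produced per table today
rows_per_part = 5_000_000      # tune so each part is a manageable size

pf = pq.ParquetFile(source)
buffer, buffered_rows, part = [], 0, 1

for batch in pf.iter_batches():
    buffer.append(batch)
    buffered_rows += batch.num_rows
    if buffered_rows >= rows_per_part:
        pq.write_table(pa.Table.from_batches(buffer),
                       f"table_xxx_{part:04d}.parquet")
        buffer, buffered_rows, part = [], 0, part + 1

if buffer:  # flush whatever is left over
    pq.write_table(pa.Table.from_batches(buffer),
                   f"table_xxx_{part:04d}.parquet")
```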
Best answer by rory.smith
Hi,
Azure SQL DB S6 at 400 DTU delivers at most 4 * 400 = 1600 IOPS at roughly 10 ms latency; as far as I am aware, the log rate follows from that.
Azure SQL DB Hyperscale Premium at 2 vCores would deliver 8,500 IOPS at 4 ms latency with a 150 MB/s log rate.
Price-wise, a 400 DTU database is more or less equivalent to a 2 vCore Hyperscale Premium database.
https://learn.microsoft.com/en-us/azure/azure-sql/database/service-tiers-dtu?view=azuresql
So I would check wait statistics to see whether I/O is your bottleneck here; if so, you could consider trying out a different type of PaaS database.
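If it helps, here is a rough sketch of that check run from Python with pyodbc (the connection string values are placeholders, and you can just as well run the query from SSMS): look at sys.dm_db_wait_stats and see whether I/O and log waits such as PAGEIOLATCH_* and WRITELOG sit at the top.

```python
# Rough sketch: list the top waits for the prepare database in Azure SQL DB.
# Connection string values are placeholders; the query itself is plain T-SQL.
import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=tcp:<your-server>.database.windows.net,1433;"
    "Database=<prepare-db>;Uid=<user>;Pwd=<password>;Encrypt=yes;"
)

# sys.dm_db_wait_stats is the database-scoped wait view in Azure SQL Database.
rows = conn.execute(
    "SELECT TOP 10 wait_type, wait_time_ms, waiting_tasks_count "
    "FROM sys.dm_db_wait_stats ORDER BY wait_time_ms DESC;"
).fetchall()

for wait_type, wait_time_ms, tasks in rows:
    # High PAGEIOLATCH_* / WRITELOG waits point at data or log I/O limits.
    print(f"{wait_type:<30} {wait_time_ms:>12} ms  ({tasks} waits)")
```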