Get latest parquet file from external blob storage

Question

Hello,

I am connecting to an external blob storage using a private endpoint. The blob storage is structured as follows:

parquet/
    2025-12-29/
        Acquisition.parquet
        Asset.parquet
        AssetBalanceSheet.parquet
        …
        User.parquet
        ValuationAssumption.parquet
    2025-12-30/
        Acquisition.parquet
        Asset.parquet
        …
    Etc.

I am using the TimeXtender Parquet Data Source provider. The connection works, but the results I get from the Metadata Manager are not what I expect.

Current settings
    Path: parquet/
    Include subfolders: Yes
    Included file types: parquet
    File aggregation pattern: Asset.parquet

Expected behavior
I expect the Metadata Manager to return a single table called "Asset", aggregated across all available date folders.

Actual behavior
Instead, all files/tables are returned, with what appears to be an "_<number>" suffix added for each additional date folder (e.g. Asset, Asset_1, Asset_2, etc.).

I tried many setting combinations:

Test 1: Exact filename, no wildcards
    Path: parquet/
    Include subfolders: Yes
    File aggregation pattern: Asset.parquet
    Expected: Should aggregate all Asset.parquet files into one table

Test 2: Multiple exact filenames
    Path: parquet/
    Include subfolders: Yes
    File aggregation pattern: Acquisition.parquet, Asset.parquet
    Expected: Two tables - one for Acquisition, one for Asset

Test 3: Wildcard at start
    Path: parquet/
    Include subfolders: Yes
    File aggregation pattern: *Asset.parquet
    Expected: All files ending with "Asset.parquet"

Test 4: Path wildcard with subfolder pattern
    Path: parquet/
    Include subfolders: Yes
    File aggregation pattern: */Asset.parquet
    Expected: All Asset.parquet in any subfolder

Test 5: Double wildcard
    Path: parquet/
    Include subfolders: Yes
    File aggregation pattern: **/Asset.parquet
    Expected: All Asset.parquet in any nested folder

Test 6: Date pattern in aggregation
    Path: parquet/
    Include subfolders: Yes
    File aggregation pattern: 20??-??-??/Asset.parquet
    Expected: All Asset.parquet in date-formatted folders

Test 7: Wildcard in path instead
    Path: parquet/*/
    Include subfolders: No
    File aggregation pattern: Asset.parquet
    Expected: Aggregate from all first-level subfolders

But all these tests returned the same metadata as in the image from before. The only test that somewhat did what I wanted it to do was:

Test 8: Specific date in path
    Path: parquet/2025-12-29/
    Include subfolders: No
    File aggregation pattern: (empty)
    Expected: One table per entity from that partition only

This returned one table per Parquet file in that date folder, which is correct, but it only works for a fixed date.

What I actually want is to ingest the newest available data each day, and as far as I know, the path cannot be made dynamic using a variable.

Should I be using a different data source provider, or should I contact my data supplier with a request to change their folder structure?

Thanks!
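
For illustration only, the "newest available data each day" requirement can be sketched outside TimeXtender as a small Python helper that picks the most recent date folder from a list of blob names. This is not a TimeXtender feature: the function name latest_partition and the hard-coded blob list are hypothetical, and in practice the names would come from whatever listing mechanism the storage account exposes.

```python
from datetime import date, datetime
from typing import Iterable, Optional


def latest_partition(blob_names: Iterable[str], root: str = "parquet/") -> Optional[str]:
    """Return the newest YYYY-MM-DD folder name found directly under root, or None."""
    newest_date: Optional[date] = None
    newest_folder: Optional[str] = None
    for name in blob_names:
        if not name.startswith(root):
            continue
        # First path segment after the root, e.g. "2025-12-29"
        folder = name[len(root):].split("/", 1)[0]
        try:
            parsed = datetime.strptime(folder, "%Y-%m-%d").date()
        except ValueError:
            continue  # skip anything that is not a date-named folder
        if newest_date is None or parsed > newest_date:
            newest_date, newest_folder = parsed, folder
    return newest_folder


# Hypothetical listing that mirrors the folder structure in the question
blobs = [
    "parquet/2025-12-29/Acquisition.parquet",
    "parquet/2025-12-29/Asset.parquet",
    "parquet/2025-12-30/Acquisition.parquet",
    "parquet/2025-12-30/Asset.parquet",
]

newest = latest_partition(blobs)
latest_path = f"parquet/{newest}/"
print(latest_path)
```

The resulting path has the same shape as the fixed path used in Test 8; how (or whether) such a value can be fed into the data source settings is left open here.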

Thomas Lind · Answer

Hi @Mvest

Did you try:
    File aggregation pattern: Asset*.parquet

I would also try:
    *Asset*.parquet

Or, as a completely different alternative, just:
    *Asset*
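
To see which of the file names from the question these wildcard patterns would select, here is a rough sketch using Python's fnmatch module against the file name part of each path. It assumes ordinary shell-style glob semantics; whether the TimeXtender provider matches patterns the same way is not confirmed in this thread, and the helper name matches is hypothetical.

```python
from fnmatch import fnmatchcase
from posixpath import basename

# File names taken from the folder structure in the question
paths = [
    "parquet/2025-12-29/Asset.parquet",
    "parquet/2025-12-29/AssetBalanceSheet.parquet",
    "parquet/2025-12-29/ValuationAssumption.parquet",
    "parquet/2025-12-30/Asset.parquet",
]


def matches(pattern, candidates):
    """Return the paths whose file name matches a shell-style pattern."""
    return [p for p in candidates if fnmatchcase(basename(p), pattern)]


for pattern in ("Asset.parquet", "Asset*.parquet", "*Asset*.parquet", "*Asset*"):
    print(pattern, "->", matches(pattern, paths))
```

Under these assumed semantics, the wildcard variants also pick up AssetBalanceSheet.parquet, so if the provider behaves similarly the aggregated table could mix two different entities.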

