
Hello, 

TimeXtender 20.10.67 with ODX. 

We have a source (Azure Data Lake) that will get a JSON file dumped with the following format: table.name/"date-when-file-dumped". The files will be 15 million+ rows and dumped at least once a week.

What I want to achieve is for the connector to only extract the latest file and have that change dynamically. I can't figure out an effective way to do this in the connector, as the URI connection string wants to point either to a specific filename or to take everything in a concat.

I can't figure out a way to change the URI dynamically, and the only option I can think of in theory is concatenating all the JSON in the connector, including resource columns, and doing a custom query table on the connector with a data selection rule (DSR) based on the MAX file date.

However, in a year or two there will be 50+ files (750 million rows in theory with the concat), and I am not sure how that solution will perform or scale.

Has anyone done something similar, or have a better idea of how to do this?

Thanks, 
Victor

Hi,

I would move the files to load into a separate folder for ingestion and move them back after a successful load. There may be other options depending on the source connector you are using, but this will work regardless.
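As a rough sketch, the move step could look something like this, assuming ADLS Gen2 and the Az.Storage PowerShell module (after Connect-AzAccount); the account name, the "raw" container and the "table.name-toload" folder are placeholders, not anything TimeXtender requires:

# Pick the newest JSON under the table's folder and move it into a
# dedicated ingestion folder that the ODX data source points at.
$ctx = New-AzStorageContext -StorageAccountName "mydatalake" -UseConnectedAccount

$latest = Get-AzDataLakeGen2ChildItem -Context $ctx -FileSystem "raw" -Path "table.name" |
    Where-Object { -not $_.IsDirectory -and $_.Path -like "*.json" } |
    Sort-Object -Property LastModified -Descending |
    Select-Object -First 1

# Move the newest file into the ingestion folder, keeping its filename.
Move-AzDataLakeGen2Item -Context $ctx -FileSystem "raw" -Path $latest.Path `
    -DestFileSystem "raw" -DestPath "table.name-toload/$(Split-Path $latest.Path -Leaf)"

The ODX data source then only ever points at the ingestion folder, so the URI never has to change and the row count per load stays at one file.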


Hi Rory!

Do you mean that the integrations should move the current file into a different folder as part of their dumping procedure, or that the connector has the possibility to do this?


Hi,

In 20.10.x you don't have a built-in way of doing this; I would run PowerShell around my reload. If you are running the full orchestration for 20.10.x, you can automate it there. Are you using CData or Enhanced connectors? In CData you should be able to build up the URI to the file from logic, I guess, but you would lose that capability once the CData connectors go away. In TDI, PowerShell can be added from within the TX process, which would be easier.
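The post-load half of that pattern could look roughly like this (same assumed Az.Storage module and placeholder container/folder names), run after the ODX transfer reports success so the ingestion folder is empty for the next drop:

# After a successful load, move everything out of the ingestion folder
# into an archive path so the next run only sees the newest file.
$ctx = New-AzStorageContext -StorageAccountName "mydatalake" -UseConnectedAccount

Get-AzDataLakeGen2ChildItem -Context $ctx -FileSystem "raw" -Path "table.name-toload" |
    Where-Object { -not $_.IsDirectory } |
    ForEach-Object {
        Move-AzDataLakeGen2Item -Context $ctx -FileSystem "raw" -Path $_.Path `
            -DestFileSystem "raw" -DestPath "table.name-archive/$(Split-Path $_.Path -Leaf)"
    }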

