Question

Help needed: 85 GB JSON (500 million+ rows) + ODX crash / memory timeout

  • January 28, 2026
  • 4 replies
  • 29 views

Hello, 

Technical: 20.10.67 using ODX
Connector: CData ADO.NET Provider for JSON (2024)
Settings:
Source: Azure Blob
Data Model=FlattenedDocuments;
Flatten Row Limit=1000000

I have a situation where an ODX transfer task is failing and crashing the ODX due to memory.

I need to jump in and help on a project where they need to get a very large JSON file (85 GB) into TX, because all previous attempts to ingest it have failed. I ran a test on a sample (49 MB), which resulted in roughly 550k rows and took 60 s with a transfer task. One page (5,000 rows) takes around 200-300 ms according to the logs.

If it scales linearly, that would mean upwards of 900 million rows (we do not know the actual row count) and it could take upwards of 30 hours. One easy problem was the timeout, which I fixed by increasing the ODX timeout from 2 hours to 24 hours during testing.
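For reference, the estimate is just the 49 MB sample scaled linearly (a rough back-of-envelope only):

# Rough linear extrapolation from the 49 MB sample (~550k rows in ~60 s)
sample_mb, full_gb = 49, 85
scale = full_gb * 1024 / sample_mb        # ~1,776x larger than the sample
rows = 550_000 * scale                    # ~975 million rows
hours = 60 * scale / 3600                 # ~30 hours if throughput stays flat
print(f"~{scale:.0f}x -> ~{rows / 1e6:.0f}M rows, ~{hours:.0f} h")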

Currently I can get it to run for around 6-7 hours, but then it fails because the ODX crashes with the following error: System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.

This is where I am currently stuck.

I need help to get the data into the system without crashing the entire ODX. I do not care if the transfer takes 30 hours, since it will only be done once a year.

Thoughts I have:

1. One possible issue I am not sure about: previously, at around 3 hours, the load would get the error “Over 250000 flattened rows would be generated from a single element. Please choose a different DataModel or increase the FlattenRowLimit”. I adjusted it to 1 million and it now gets past that point. Any lower value seems to fail. Could this be causing the problem?

2. The ODX is set up with a batch size of “single batch”. Can changing this to something like 300k possibly fix the memory failure? 

3. One idea I have is for the source to split the JSON into multiple files (see the sketch after this list). However, we would then need to aggregate the files in the connector and, I assume, run into the same problems? Or we would need to manually handle 100+ RSD files, which does not seem sustainable.

4. Possibly changing to TX's own enhanced connectors, but this CData JSON connector works for every other file, so I do not know if it will make any difference, since the failure is a memory issue rather than a timeout.

5. Any suggestions on page size? I couldn't see much of a difference between 1,000 and 5,000.
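For idea 3, this is roughly what I mean by splitting; a sketch in Python with the ijson streaming library, where the file names and the array path ("col1.item") are placeholders, not the real ones:

import itertools
import json

import ijson  # streaming JSON parser, reads the file without loading it all into memory

def split_json(src_path, out_prefix, chunk_size=100_000):
    """Write the repeating elements of a huge JSON file out as smaller chunk files."""
    with open(src_path, "rb") as src:
        records = ijson.items(src, "col1.item")  # placeholder path to the repeating element
        for i in itertools.count():
            chunk = list(itertools.islice(records, chunk_size))
            if not chunk:
                break
            with open(f"{out_prefix}_{i:04d}.json", "w") as out:
                json.dump(chunk, out, default=str)  # default=str because ijson yields Decimal numbers

split_json("measurements.json", "measurements_part")

Each chunk file should then be small enough to flatten on its own, though the question of aggregating 100+ files in the connector obviously remains.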


I am open to any suggestions or help if anyone has solved this or has a workaround.

Thank you, 
Victor

4 replies

Christian Hauggaard
Community Manager

Hi @Victor

Have you tried scaling up the application VM running the ODX service?

Please try to use the TimeXtender enhanced data source to see if this resolves the issue.


rory.smith
TimeXtender Xpert
  • January 28, 2026

Hi,

the JSON structure will be key to what is possible. 85GB in a single JSON is obviously not a good idea by itself, but if the data can easily be split that would help. Other than CData settings that may affect this, there is no way to cause ODX to split the file into smaller parquet chunks. This means it may require full in-memory loading of the JSON or a very large part of it, depending on the internal structure of the JSON.

The “Over x flattened rows” error is due to an internal limit in CData preventing the flattening/pivoting from blowing up your resource use. In this case one element (at the level you are pointing to in the XPath) requires unwrapping into more than 250K rows. This implies your structure may have subtables/elements that can be extracted separately by using explicit XPaths.

The TX Enhanced connector will want to load the whole file into memory and write out to a temporary parquet. Depending on cardinality you may require anywhere from 4 to more than 10 times the size of the file in free RAM to do that.

It may be possible to split the JSON into chunks of 1GB or so and load those. In that case the TX Enhanced connector should be able to aggregate the files, but I don't have any experience trying that with such large files.

I expect your best bet will be to use another tool (ADF for instance) to push JSON records from the file into SQL Server and then load that into TX. If you can create a table with a JSON field that contains the JSON for that record, you should be able to ingest that faster than directly ingesting a JSON file.
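If ADF is not an option, the same pattern can be scripted outside TX. A minimal sketch in Python with ijson and pyodbc, where the connection string, table name and JSON path are placeholders and the target table is assumed to exist already (a single NVARCHAR(MAX) column holding one JSON record per row):

import json

import ijson   # streaming JSON parser
import pyodbc  # ODBC connection to SQL Server

# Placeholder connection string and table: dbo.RawMeasurements (Doc NVARCHAR(MAX))
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=...;DATABASE=...;Trusted_Connection=yes"
)
cur = conn.cursor()

def flush(batch):
    # Insert a batch of JSON strings and commit, keeping memory use bounded
    cur.executemany("INSERT INTO dbo.RawMeasurements (Doc) VALUES (?)", batch)
    conn.commit()

with open("measurements.json", "rb") as f:
    batch = []
    for record in ijson.items(f, "col1.item"):  # placeholder path to the repeating element
        batch.append((json.dumps(record, default=str),))
        if len(batch) >= 50_000:
            flush(batch)
            batch = []
    if batch:
        flush(batch)

From there SQL Server's OPENJSON (or TX itself) can shred the Doc column into relational rows.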


Victor
  • Author
  • Participant
  • January 28, 2026


The JSON response looks like this (measurement data at an hourly level):

{"col1":[{"col2":"col3","col3":"val1","val2":true,"col3":[{"id":"val4","val5":[{"col5":"col6","values":[{"startDate":"2023-01-01 00:00:00","endDate":"2023-02-01 00:00:00","value":val5.1,"vol2":val3}, …

and so on per hour. 

JSON paths:
 <api:set attr="JSONPath" value="$.val1;$.va1.data;$.val1.data.val2;$.val1.data.val2.values"/>

We are using a SQL database for the ODX storage, so it is not an Azure Data Lake and there are no parquet files. Does that leave us any more options?

Would the batch size on the ODX make any difference, or is there no way to actually batch the JSON extraction to try and keep it from crashing?

 


rory.smith
TimeXtender Xpert
  • January 28, 2026

Hi,

if you are using SQL storage then I think the batch size also influences how data is sent to ODX storage. Your data does not look like a very nice structure, though: one record might already be very large and contain many JSON arrays that you may not want to unwind if the goal is to speed things up.
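If only the hourly measurements are needed in the end, another option is to avoid unwinding the outer arrays at all and stream just the innermost "values" entries out of the file before TX ever sees it. A rough sketch in Python with ijson; the prefix is guessed from the sample structure above and will need adjusting to the real file:

import csv

import ijson  # streaming JSON parser

# Pull only the innermost measurement rows out of the big file and write them to CSV,
# without materialising any of the surrounding structure.
with open("measurements.json", "rb") as src, open("values.csv", "w", newline="") as dst:
    writer = csv.writer(dst)
    writer.writerow(["startDate", "endDate", "value"])
    # Prefix guessed from the sample: col1[] -> col3[] -> val5[] -> values[]
    for v in ijson.items(src, "col1.item.col3.item.val5.item.values.item"):
        writer.writerow([v.get("startDate"), v.get("endDate"), v.get("value")])

In practice you would iterate one level higher so the parent ids stay attached to each row, but the principle is the same: only the deepest path is ever materialised, so memory stays flat regardless of file size.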