Hello,
Technical: 20.10.67 using ODX
Connector: CData ADO.NET Provider for JSON (2024)
Settings:
Source: Azure blob
Data Model=FlattenedDocuments;
Flatten Row Limit=1000000
I have a situation where an ODX transfer task fails and crashes the ODX service due to memory exhaustion.
I need to jump in and help on a project where they need to get a very large JSON file (85 GB) into TX, because all previous attempts to ingest it have failed. I ran a test with a sample (49 MB), which produced roughly 550k rows and took 60 s to run as a transfer task. From the logs, one page (5,000 rows) takes around 200-300 ms.
If scaling is linear, that would mean upwards of 900 million rows (we do not know the actual row count) and a runtime of up to 30 hours. One easy problem was the timeout, which I fixed by increasing the ODX timeout from 2 hours to 24 hours during testing.
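For reference, the back-of-envelope extrapolation from the sample run, assuming throughput scales linearly (a rough assumption, not a guarantee):

```python
# Extrapolate from the 49 MB sample to the 85 GB file, assuming linear scaling.
sample_mb, sample_rows, sample_seconds = 49, 550_000, 60
full_gb = 85

scale = (full_gb * 1024) / sample_mb      # ~1776x the sample size
est_rows = sample_rows * scale            # ~977 million rows
est_hours = sample_seconds * scale / 3600 # ~30 hours

print(f"~{est_rows / 1e6:.0f}M rows, ~{est_hours:.0f}h")
```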
Currently I can get it to run for around 6-7 hours, but then it fails because the ODX crashes with the following error: System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.
This is where I am currently stuck.
I need help getting the data into the system without crashing the entire ODX. I do not care if the transfer takes 30 hours, since it will only run once a year.
Some thoughts I have:
1. One possible issue I am not sure about: previously, at around 3 hours, the load would fail with the error "Over 250000 flattened rows would be generated from a single element. Please choose a different DataModel or increase the FlattenRowLimit". I raised the limit to 1 million and the load now gets past that point; any lower value seems to fail. Could this be causing the memory problem?
2. The ODX is set up with a batch size of "single batch". Could changing this to something like 300k fix the memory failure?
3. One idea is to have the source split the JSON into multiple files. However, we would then need to aggregate the files in the connector, and I assume we would run into the same problems? Otherwise we would have to manually maintain 100+ RSD files, which does not seem sustainable.
4. We could possibly switch to TX's own enhanced connectors, but this CData JSON connector works for every other file, so I do not know whether it would make any difference, since this is a memory failure rather than a connector issue.
5. Any suggestions on page size? I couldn't see much of a difference between 1000 and 5000.
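On idea 3, here is a rough sketch of how the file could be pre-split on the source side without ever loading it fully into memory. This is a Python sketch under assumptions I have not verified against the real file: it assumes the top-level value is a single JSON array of objects, and the file names, chunk size, and rows-per-file value are placeholders.

```python
import json
import os

def split_json_array(src_path, out_dir, rows_per_file=300_000, chunk_bytes=1 << 20):
    """Split a file containing one top-level JSON array of objects into
    several smaller JSON array files, without loading the whole file.

    Element boundaries are found with json.JSONDecoder.raw_decode over a
    sliding text buffer, so memory use stays bounded by chunk/element size.
    Returns the number of part files written.
    """
    decoder = json.JSONDecoder()
    os.makedirs(out_dir, exist_ok=True)
    buf = ""
    part, count, out = 0, 0, None

    def open_part(n):
        f = open(os.path.join(out_dir, f"part_{n:04d}.json"), "w", encoding="utf-8")
        f.write("[")
        return f

    with open(src_path, "r", encoding="utf-8") as src:
        # Skip up to and including the opening '[' of the top-level array.
        while "[" not in buf:
            chunk = src.read(chunk_bytes)
            if not chunk:
                raise ValueError("no JSON array found")
            buf += chunk
        buf = buf[buf.index("[") + 1:]

        while True:
            buf = buf.lstrip().lstrip(",").lstrip()
            if buf.startswith("]"):
                break  # end of the top-level array
            try:
                obj, end = decoder.raw_decode(buf)
            except json.JSONDecodeError:
                # Element (or closing bracket) not fully buffered yet: read more.
                chunk = src.read(chunk_bytes)
                if not chunk:
                    raise
                buf += chunk
                continue
            buf = buf[end:]
            if out is None:
                out = open_part(part)
            if count:
                out.write(",")
            out.write(json.dumps(obj))
            count += 1
            if count >= rows_per_file:
                out.write("]")
                out.close()
                out, count = None, 0
                part += 1

    if out is not None:
        out.write("]")
        out.close()
        part += 1
    return part
```

Each part file is itself a valid JSON array, so the connector could point at the output folder instead of one 85 GB file; whether the CData connector can then union the parts with a single schema is the open question from point 3.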
I am open to any suggestions or workarounds if anyone has solved this.
Thank you,
Victor