Hello,
Technical: 20.10.67 using ODX
Connector: CData ADO.NET Provider for JSON (2024)
Settings:
Source: Azure blob
Data Model=FlattenedDocuments;
Flatten Row Limit=1000000
I have a situation where an ODX transfer task fails and crashes the ODX service due to memory exhaustion.
I need to jump in and help on a project where they need to get a very large JSON file (85 GB) into TX, because all previous attempts to ingest it have failed. I ran a test with a sample (49 MB), which produced roughly 550k rows and took 60 s to run as a transfer task. From the logs, one page (5,000 rows) takes around 200-300 ms.
If scaling is linear, that would mean upwards of 900 million rows (we do not know the actual row count) and a runtime of up to 30 hours. One easy problem was the timeout, which I fixed by increasing the ODX timeout from 2 hours to 24 hours during testing.
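For reference, the back-of-envelope extrapolation from the sample run, assuming throughput scales linearly (a rough assumption, not a guarantee):

```python
# Extrapolate from the 49 MB sample to the 85 GB file, assuming linear scaling.
sample_mb, sample_rows, sample_seconds = 49, 550_000, 60
full_gb = 85

scale = (full_gb * 1024) / sample_mb      # ~1776x the sample size
est_rows = sample_rows * scale            # ~977 million rows
est_hours = sample_seconds * scale / 3600 # ~30 hours

print(f"~{est_rows / 1e6:.0f}M rows, ~{est_hours:.0f}h")
```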
Currently I can get it to run for around 6-7 hours, but then it fails because the ODX crashes with the following error: System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.
This is where I am currently stuck.
I need help getting the data into the system without crashing the entire ODX. I do not care if the transfer takes 30 hours, since it will only run once a year.
Some thoughts I have:
1. One possible issue I am not sure about: previously, at around 3 hours, the load would fail with the error "Over 250000 flattened rows would be generated from a single element. Please choose a different DataModel or increase the FlattenRowLimit". I raised the limit to 1 million and the load now gets past that point; any lower value seems to fail. Could this be causing the memory problem?
2. The ODX is set up with a batch size of "single batch". Could changing this to something like 300k fix the memory failure?
3. One idea is to have the source split the JSON into multiple files. However, we would then need to aggregate the files in the connector, and I assume we would run into the same problems? Otherwise we would have to manually maintain 100+ RSD files, which does not seem sustainable.
4. We could possibly switch to TX's own enhanced connectors, but this CData JSON connector works for every other file, so I do not know whether it would make any difference, since this is a memory failure rather than a connector issue.
5. Any suggestions on page size? I couldn't see much of a difference between 1000 and 5000.
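On idea 3, here is a rough sketch of how the file could be pre-split on the source side without ever loading it fully into memory. This is a Python sketch under assumptions I have not verified against the real file: it assumes the top-level value is a single JSON array of objects, and the file names, chunk size, and rows-per-file value are placeholders.

```python
import json
import os

def split_json_array(src_path, out_dir, rows_per_file=300_000, chunk_bytes=1 << 20):
    """Split a file containing one top-level JSON array of objects into
    several smaller JSON array files, without loading the whole file.

    Element boundaries are found with json.JSONDecoder.raw_decode over a
    sliding text buffer, so memory use stays bounded by chunk/element size.
    Returns the number of part files written.
    """
    decoder = json.JSONDecoder()
    os.makedirs(out_dir, exist_ok=True)
    buf = ""
    part, count, out = 0, 0, None

    def open_part(n):
        f = open(os.path.join(out_dir, f"part_{n:04d}.json"), "w", encoding="utf-8")
        f.write("[")
        return f

    with open(src_path, "r", encoding="utf-8") as src:
        # Skip up to and including the opening '[' of the top-level array.
        while "[" not in buf:
            chunk = src.read(chunk_bytes)
            if not chunk:
                raise ValueError("no JSON array found")
            buf += chunk
        buf = buf[buf.index("[") + 1:]

        while True:
            buf = buf.lstrip().lstrip(",").lstrip()
            if buf.startswith("]"):
                break  # end of the top-level array
            try:
                obj, end = decoder.raw_decode(buf)
            except json.JSONDecodeError:
                # Element (or closing bracket) not fully buffered yet: read more.
                chunk = src.read(chunk_bytes)
                if not chunk:
                    raise
                buf += chunk
                continue
            buf = buf[end:]
            if out is None:
                out = open_part(part)
            if count:
                out.write(",")
            out.write(json.dumps(obj))
            count += 1
            if count >= rows_per_file:
                out.write("]")
                out.close()
                out, count = None, 0
                part += 1

    if out is not None:
        out.write("]")
        out.close()
        part += 1
    return part
```

Each part file is itself a valid JSON array, so the connector could point at the output folder instead of one 85 GB file; whether the CData connector can then union the parts with a single schema is the open question from point 3.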
I am open to any suggestions or workarounds if anyone has solved this.
Thank you,
Victor