Solved

REST connector caching folder contains a Data_.raw and a Data_.xml. The result is no data

  • December 16, 2024
  • 5 replies
  • 71 views

  • Contributor
  • 25 replies

Hello,

 

I want to get metadata from my Azure data lake using the Blob API.

I wasn't seeing any data in the Ingest storage, so I turned on caching to file to try to see what's happening.

 

There are three files in my caching folder:

  • Data_.raw: The response of the call, i.e. my actual data. This looks excellent, except that it's a .raw file. Contents:
    <?xml version="1.0" encoding="utf-8"?>
    <EnumerationResults ServiceEndpoint="https://xxxx.blob.core.windows.net/" ContainerName="datalake">
    	<Prefix>my_prefix</Prefix>
    	<Blobs>
    		<Blob>
                ....
            </Blob>
    	</Blobs>
    	<NextMarker/>
    </EnumerationResults>

     

  • Data_.xml: Basically the same as Data_.raw, but with the entire content of Data_.raw placed as text inside a value element. That text also includes the XML declaration (so the document now has two declarations), and the angle brackets have been escaped (i.e. every `<` is now `&lt;` and every `>` is now `&gt;`).
    <?xml version="1.0" encoding="utf-8"?>
    <Table_flattening_name>
    	<value>
            &lt;?xml version="1.0" encoding="utf-8"?&gt;
            &lt;EnumerationResults 
                ServiceEndpoint="https://xxxx.blob.core.windows.net/" 
                ContainerName="datalake"&gt;
                &lt;Prefix&gt;my_prefix&lt;/Prefix&gt;
                &lt;Blobs&gt;
                    &lt;Blob&gt;
                        ...
                    &lt;/Blob&gt;
                &lt;/Blobs&gt;
            &lt;NextMarker /&gt;
            &lt;/EnumerationResults&gt;
        </value>
    </Table_flattening_name>

     

  • Data_transformed_1.xml: The result of my XSLT on Data_.xml

Data_transformed_1.xml contains one empty element, which is caused by Data_.xml being malformed. 
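As a diagnostic, the wrapped file can be checked to confirm that the original response is still intact inside the value element. The following is a minimal Python sketch, not part of the REST connector: the function name `unwrap_escaped_xml` is hypothetical, and the sample strings only mirror the structure shown above, with the elided blob content left out. Note that the wrapped document is itself well-formed XML; the response is simply stored as escaped text, which is why an XSLT that expects `EnumerationResults` elements finds nothing to match.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def unwrap_escaped_xml(wrapped: str, wrapper_child: str = "value") -> ET.Element:
    """Parse a wrapper document whose child element holds escaped XML as text.

    Hypothetical helper for inspection only; it assumes the wrapper layout
    shown in the post (root element with a single <value> child).
    """
    outer = ET.fromstring(wrapped.encode("utf-8"))
    node = outer.find(wrapper_child)
    if node is None or node.text is None:
        raise ValueError(f"no <{wrapper_child}> text found in wrapper document")
    # The parser has already turned &lt; and &gt; back into < and >.
    # Drop a leading BOM character and whitespace so the inner XML
    # declaration sits at the very start of the string.
    inner_text = node.text.lstrip("\ufeff \t\r\n")
    return ET.fromstring(inner_text.encode("utf-8"))


# Sample data shaped like the files in the post (blob entries omitted).
RAW = (
    '<?xml version="1.0" encoding="utf-8"?>\n'
    '<EnumerationResults ServiceEndpoint="https://xxxx.blob.core.windows.net/" '
    'ContainerName="datalake">'
    "<Prefix>my_prefix</Prefix><Blobs/><NextMarker/>"
    "</EnumerationResults>"
)
WRAPPED = (
    '<?xml version="1.0" encoding="utf-8"?>\n'
    "<Table_flattening_name><value>\n"
    + escape(RAW)
    + "\n</value></Table_flattening_name>"
)

inner = unwrap_escaped_xml(WRAPPED)
print(inner.tag)                 # EnumerationResults
print(inner.findtext("Prefix"))  # my_prefix
```

If the inner document parses cleanly like this, the response itself is fine and the problem lies in how it was wrapped before the XSLT ran.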

I can't really figure out what's going on. In other APIs I only had two files. Not sure what the Data_.raw file is doing, but everything would work if that file were Data_.xml. 

 

What could be causing this? Why is there a Data_.raw file? How can I fix this?

 


5 replies

Thomas Lind
  • Community Manager
  • 1031 replies
  • December 16, 2024

Hi ​@Benny 

It looks like there is no data in either example, unless the "...." is meant to stand in for actual data.

Of the two files, the .raw file is the raw response from the source, and the .xml file is that data after it has been converted to XML.

What are you connecting to? It looks like some kind of Microsoft service.


  • Author
  • Contributor
  • 25 replies
  • December 16, 2024

The ellipsis is indeed meant to replace actual data. That all looks fine. 

I figured that the .raw represents raw data. But in the caching folders of REST endpoints that do work, I don't see this file. There's just an XML with the raw data and a transformed.xml containing the transformed data.

So that leaves me puzzled as to what's happening, especially since the .xml is malformed.

 

I’m trying to get data from an Azure storage, specifically list the blobs in a certain container: https://learn.microsoft.com/en-us/rest/api/storageservices/list-blobs?tabs=microsoft-entra-id.

 

From other rest endpoints I connected to I expected to see something like:

Data_.xml --> Data_transformed.xml

Instead, the caching folder implies

Data_.raw --> Data_.xml --> Data_transformed.xml

Data_.raw contains exactly the XML I need. Some weird transformation is applied that turns it into the malformed Data_.xml. Then my table flattening XSLT is applied, which can't make sense of the malformed XML.


Thomas Lind
  • Community Manager
  • 1031 replies
  • December 19, 2024

Hi ​@Benny 

I have been trying to replicate this behavior, but it just creates one Data_.xml file.

Are you on version 6814.1 or newer, with version 7.1.0.0 of the REST data source?


  • Author
  • Contributor
  • 25 replies
  • December 19, 2024

Hello Thomas,

 

Ingest and DI were updated this morning to 6848.1. REST data source is on version 7.1.0.0.


Thomas Lind
  • Community Manager
  • 1031 replies
  • Answer
  • January 6, 2025

Hi ​@Benny 

Because we had this conversation in Zendesk I wanted to add what was done to resolve this for others to see.

The raw file has a BOM. This means that the < sign is not seen at the very beginning of the file; the BOM is read there instead.

The developers believe that this behavior will be fixed in the next release, where you can force the code to treat data as XML.

In the meantime, you may get it to work by simply turning off caching to a file and keeping it set to in memory.

The byte order mark (BOM) is a special marker added at the very beginning of a Unicode file encoded in UTF-8, UTF-16 or UTF-32. In UTF-16 and UTF-32 it indicates whether the file uses big-endian or little-endian byte order. In UTF-8, which has no byte-order ambiguity, the BOM is optional and only marks the file as UTF-8 (the bytes EF BB BF).

For others' reference: turning off caching to a file is what made it work, and Benny confirmed that it works.
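To make the effect of a BOM concrete, here is a small Python sketch. It is not the connector's code: `starts_with_angle_bracket` is a hypothetical stand-in for any check that expects XML to begin with `<`, and `strip_utf8_bom` is an illustrative helper. It only shows why three extra bytes at the start of a payload can make otherwise valid XML look like something else, and how the bytes can be inspected or removed outside the product.

```python
import codecs


def starts_with_angle_bracket(data: bytes) -> bool:
    """Hypothetical naive check: does the payload begin with '<'?"""
    return data[:1] == b"<"


def strip_utf8_bom(data: bytes) -> bytes:
    """Return the payload without a leading UTF-8 BOM (EF BB BF), if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# Sample payload: the same XML with a UTF-8 BOM in front of it.
xml_body = b'<?xml version="1.0" encoding="utf-8"?><EnumerationResults/>'
with_bom = codecs.BOM_UTF8 + xml_body

print(with_bom[:3].hex())                                   # efbbbf
print(starts_with_angle_bracket(with_bom))                  # False
print(starts_with_angle_bracket(strip_utf8_bom(with_bom)))  # True

# Python's "utf-8-sig" codec drops a leading BOM while decoding.
print(with_bom.decode("utf-8-sig") == xml_body.decode("utf-8"))  # True
```

Opening Data_.raw in a hex viewer and looking for `EF BB BF` as the first three bytes is a quick way to check whether a given response is affected.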

 

