Question

Getting BLOOB file into DVH

  • 15 March 2024
  • 8 replies
  • 63 views

I need some guidence saving contents ofan endpoint as BLOOB. I can see that the version we use doesnot have an option for BLOOB but may be Text or Binary with maximum length can give me what i need.

So I have this RSD that i created 

<api:script xmlns:api="http://apiscript.com/ns?v1" xmlns:xs="http://www.w3.org/2001/XMLSchema">
<!-- See Column Definitions to specify column behavior and use XPaths to extract column values from JSON. -->
<api:info title="GetAllStops" desc="Generated schema file." xmlns:other="http://apiscript.com/ns?v1">
<!-- You can modify the name, type, and column size here. -->
<attr name="kortnavn" xs:type="string" readonly="false" other:xPath="/json/kortnavn" />
<attr name="tittel" xs:type="string" readonly="false" other:xPath="/json/tittel" />
<attr name="virksomhet" xs:type="string" readonly="false" other:xPath="/json/virksomhet" />
<attr name="opprinnelse" xs:type="string" readonly="false" other:xPath="/json/opprinnelse" />
<attr name="navn" xs:type="string" readonly="false" other:xPath="/json/navn" />
<attr name="vedleggID" xs:type="string" readonly="false" other:xPath="/json/elements/id" />
</api:info>

<api:set attr="urlbase" value="https://some.url.io"/>
<api:set attr="filein.DataModel" value="DOCUMENT" />
<api:set attr="filein.JSONPath" value="$." />
<api:set attr="filein.URITemplate" value="[urlbase]?kortnavn=ABCD&fomDato=1999-01-01"/>
<api:set attr="filein.ElementMapPath#" value="/json/uuid" />
<api:set attr="filein.ElementMapName#" value="file_id" />

<api:set attr="stopin.DataModel" value="FLATTENEDDOCUMENT" />
<api:set attr="stopin.EnablePaging" value="TRUE" />
<api:set attr="stopin.JSONPath" value="$.;$.elements;" />
<api:set attr="stopin.URITemplate" value="[urlbase]/file_id"/>


<!-- The GET method corresponds to SELECT. Here you can override the default processing of the SELECT statement. The results of processing are pushed to the schema's output. See SELECT Execution for more information. -->
<api:script method="GET">
<api:set attr="filein.URI" value="[filein.URITemplate]"/>
<api:call op="jsonproviderGet" in="filein" out="fileout">
<api:set attr="stopin.URI" value="[stopin.URITemplate | replace('file_id', [fileout.file_id])]"/>
<api:call op="jsonproviderGet" in="stopin" out="stopout">
<api:set attr="out.kortnavn" value="[stopout.kortnavn | allownull()]"/>
<api:set attr="out.tittel" value="[stopout.tittel | allownull()]"/>
<api:set attr="out.virksomhet" value="[stopout.virksomhet | allownull()]"/>
<api:set attr="out.opprinnelse" value="[stopout.opprinnelse | allownull()]"/>
<api:set attr="out.vedleggID" value="[stopout.vedleggID | allownull()]"/>
<api:push item="out"/>
</api:call>
</api:call>
</api:script>
</api:script>

Now, the BLOOB file has another endpoint: [urlbase]/file_id/attachment/vedleggID
I have urlbase which is https://some.url.io
I have file_id from that code above
I have vedleggID from the code above

So the end pint loosk like this: [urlbase]/123abc456defghi/attachment/222eee444ggg66677

How do i now build the RSD futher to get the attachment into the DVH?


8 replies

I forgot to include the screen short for what the last endpoint looks like in POSTMAN

This is not in JSON format. 

The last endpoint is a BLOOB file 

I have comeup with a code

<api:script xmlns:api="http://apiscript.com/ns?v1" xmlns:xs="http://www.w3.org/2001/XMLSchema">
<!-- See Column Definitions to specify column behavior and use XPaths to extract column values from JSON. -->
<api:info title="GetAllStops" desc="Generated schema file." xmlns:other="http://apiscript.com/ns?v1">
<!-- You can modify the name, type, and column size here. -->
<attr name="kortnavn" xs:type="string" readonly="false" other:xPath="/json/kortnavn" />
<attr name="tittel" xs:type="string" readonly="false" other:xPath="/json/tittel" />
<attr name="virksomhet" xs:type="string" readonly="false" other:xPath="/json/virksomhet" />
<attr name="opprinnelse" xs:type="string" readonly="false" other:xPath="/json/opprinnelse" />
<attr name="navn" xs:type="string" readonly="false" other:xPath="/json/navn" />
<attr name="vedleggID" xs:type="string" readonly="false" other:xPath="/json/elements/id" />
<attr name="bloobfile" xs:type="string" readonly="false" other:xPath="dummy" />

</api:info>

<api:set attr="urlbase" value="https://some.url.io"/>
<api:set attr="filein.DataModel" value="DOCUMENT" />
<api:set attr="filein.JSONPath" value="$." />
<api:set attr="filein.URITemplate" value="[urlbase]?kortnavn=ABCD&fomDato=1999-01-01"/>
<api:set attr="filein.ElementMapPath#" value="/json/uuid" />
<api:set attr="filein.ElementMapName#" value="file_id" />

<api:set attr="stopin.DataModel" value="FLATTENEDDOCUMENT" />
<api:set attr="stopin.EnablePaging" value="TRUE" />
<api:set attr="stopin.JSONPath" value="$.;$.elements;$.elements.attachments" />
<api:set attr="stopin.URITemplate" value="[urlbase]/file_id"/>
<api:set attr="stopin.ElementMapPath#" value="/json/elements/attachments/id" />
<api:set attr="stopin.ElementMapName#" value="vedlegg_id" />
<api:set attr="stopin.URITemplate" value="[urlbase]/file_id/vedlegg/vedlegg_id"/>


<!-- The GET method corresponds to SELECT. Here you can override the default processing of the SELECT statement. The results of processing are pushed to the schema's output. See SELECT Execution for more information. -->
<api:script method="GET">
<api:set attr="filein.URI" value="[filein.URITemplate]"/>
<api:call op="jsonproviderGet" in="filein" out="fileout">
<api:set attr="stopin.URI" value="[stopin.URITemplate | replace('file_id', [fileout.file_id])]"/>
<api:call op="jsonproviderGet" in="stopin" out="stopout">
<api:set attr="out.kortnavn" value="[stopout.kortnavn | allownull()]"/>
<api:set attr="out.tittel" value="[stopout.tittel | allownull()]"/>
<api:set attr="out.virksomhet" value="[stopout.virksomhet | allownull()]"/>
<api:set attr="out.opprinnelse" value="[stopout.opprinnelse | allownull()]"/>
<api:set attr="out.vedleggID" value="[stopout.vedleggID | allownull()]"/>
<api:set attr="out.bloobfile" value="[stopout.bloobfile | allownull()]"/>
<api:push item="out"/>
</api:call>
</api:call>
</api:script>
</api:script>

This however gives me an error

Could not execute the specified command: Formatter replace failed in the evaluation of [stopin.URITemplate | replace('file_id', [fileout.file_id])| replace('vedlegg_id', [fileout.vedlegg_id])]. The error was:
Formatter [ vedlegg_id ] failed in the evaluation of [fileout.vedlegg_id]. The error was:
The value of the attribute could not be accessed: The attribute does not exist.
Value: https://some.url.io/0123abc456def/vedlegg/vedlegg_id
Parameters:
vedlegg_id
Unable to evaluate: [fileout.vedlegg_id]

So what am i doing wrong?

Userlevel 5
Badge +7

Hi @aftabp ,

 

as far as I can tell it cannot resolve the [fileout.file_id] where you are building up stopin.URI. It looks like it wants there to be a fileout.vedlegg_id, but not finding it. I think you may want to see if you can print of the attributes and set contents so you can track what is not going correctly.

Note that storing BLOBs in a database is usually not very good for storage and performance. If this files are larger than a few KB, I would store them in Azure Blob storage and store a link to the file in a database. Depending on whether you are using Azure SQL PaaS or IaaS SQL Server there are slightly different options.

The binary and varbinary types are what you typically would use in Azure SQL DB, those go up to 8000 bytes for ‘normal’ lengths and 2GB if you use varbinary(max). As they store inside the database, you will run into fragmentation issues and the hard limits on log i/o won't help either. I would strongly consider only grabbing the metadata and build up a file path to Blob storage that you store in the DB. Then run some Powershell to grab the actual files and store them in the blob storage under the path you prepared.

If you are running SQL Server on a VM you would probably be better of using the FILESTREAM type which store files in the file-system if they are larger than trivial. See: https://learn.microsoft.com/en-us/sql/relational-databases/blob/filestream-sql-server?view=sql-server-ver16

Userlevel 6
Badge +5

Hi @aftabp 

Does adding another enablepaging part make a difference?

    <api:set attr="filein.DataModel"         value="DOCUMENT" />
    <api:set attr="filein.EnablePaging"     value="TRUE" />
    <api:set attr="filein.JSONPath"         value="$." /> 
    <api:set attr="filein.URITemplate"         value="[urlbase]?kortnavn=ABCD&fomDato=1999-01-01"/>
    <api:set attr="filein.ElementMapPath#"     value="/json/uuid" />
    <api:set attr="filein.ElementMapName#"     value="file_id" />

 

@Thomas Lind 
I did what you suggested and shortened the code to only get necessary data, the error presists 

<api:script xmlns:api="http://apiscript.com/ns?v1" xmlns:xs="http://www.w3.org/2001/XMLSchema">
<!-- See Column Definitions to specify column behavior and use XPaths to extract column values from JSON. -->
<api:info title="GetAllStops" desc="Generated schema file." xmlns:other="http://apiscript.com/ns?v1">
<!-- You can modify the name, type, and column size here. -->
<attr name="filnavn" xs:type="string" readonly="false" other:xPath="dummy" />
<attr name="vedleggID" xs:type="string" readonly="false" other:xPath="dummy" />
<attr name="bloobfile" xs:type="string" readonly="false" other:xPath="dummy" />

</api:info>

<api:set attr="urlbase" value="https://som.url.io"/>
<api:set attr="filein.DataModel" value="DOCUMENT" />
<api:set attr="filein.JSONPath" value="$." />
<api:set attr="filein.URITemplate" value="[urlbase]?kortnavn=tilskudd/handikapprosjektet&fomDato=1999-01-01"/>
<api:set attr="filein.ElementMapPath#" value="/json/uuid" />
<api:set attr="filein.ElementMapName#" value="file_id" />

<api:set attr="stopin.DataModel" value="FLATTENEDDOCUMENT" />
<api:set attr="stopin.EnablePaging" value="TRUE" />
<api:set attr="stopin.JSONPath" value="$.;$.elements" />
<api:set attr="stopin.URITemplate" value="[urlbase]/file_id"/>

<api:set attr="bloobin.DataModel" value="FLATTENEDDOCUMENT" />
<api:set attr="bloobin.EnablePaging" value="TRUE" />
<api:set attr="bloobin.JSONPath" value="" />
<api:set attr="bloobin.URITemplate" value="[urlbase]/file_id/vedlegg/vedlegg_id"/>
<api:set attr="bloobin.ElementMapPath#" value="/json/elements/bilderHCprosjektet/filnavn" />
<api:set attr="bloobin.ElementMapName#" value="filnavn" />
<api:set attr="bloobin.ElementMapPath#" value="/json/elements/bilderHCprosjektet/id" />
<api:set attr="bloobin.ElementMapName#" value="vedlegg_id" />


<!-- The GET method corresponds to SELECT. Here you can override the default processing of the SELECT statement. The results of processing are pushed to the schema's output. See SELECT Execution for more information. -->
<api:script method="GET">
<api:set attr="filein.URI" value="[filein.URITemplate]"/>
<api:call op="jsonproviderGet" in="filein" out="fileout">

<api:set attr="stopin.URI" value="[stopin.URITemplate | replace('file_id', [fileout.file_id])]"/>
<api:call op="jsonproviderGet" in="stopin" out="stopout">

<api:set attr="bloobin.URI" value="[bloobin.URITemplate | replace('file_id', [fileout.file_id]) | | replace('vedlegg_id', [bloobout.vedlegg_id])]"/>
<api:call op="jsonproviderGet" in="bloobin" out="bloobout">


<api:set attr="out.filnavn" value="[bloobout.filnavn | allownull()]"/>
<api:set attr="out.vedleggID" value="[bloobout.vedleggID | allownull()]"/>
<api:set attr="out.bloobfile" value="[bloobout.bloobfile | allownull()]"/>
<api:push item="out"/>
</api:call>
</api:call>
</api:call>
</api:script>
</api:script>

 

So there are 3 levels to this:

  1. First one: i get UUID in filein.ElementMapName#
  2. The second: I get all the rows in stopin:URITemlate value=”[urlbase]/file_id”
  3. This is where it gets tricky, i cant seem to get hold of vedlegg_id to get the BLOOB file

 

Userlevel 6
Badge +5

 Try this

<api:script xmlns:api="http://apiscript.com/ns?v1" xmlns:xs="http://www.w3.org/2001/XMLSchema">
<!-- See Column Definitions to specify column behavior and use XPaths to extract column values from JSON. -->
<api:info title="GetAllStops" desc="Generated schema file." xmlns:other="http://apiscript.com/ns?v1">
<!-- You can modify the name, type, and column size here. -->
<attr name="filnavn" xs:type="string" readonly="false" other:xPath="dummy" />
<attr name="vedleggID" xs:type="string" readonly="false" other:xPath="dummy" />
<attr name="bloobfile" xs:type="string" readonly="false" other:xPath="dummy" />

</api:info>

<api:set attr="urlbase" value="https://som.url.io"/>
<api:set attr="filein.DataModel" value="DOCUMENT" />
<api:set attr="filein.EnablePaging" value="TRUE" />
<api:set attr="filein.JSONPath" value="$." />
<api:set attr="filein.URITemplate" value="[urlbase]?kortnavn=tilskudd/handikapprosjektet&fomDato=1999-01-01"/>
<api:set attr="filein.ElementMapPath#" value="/json/uuid" />
<api:set attr="filein.ElementMapName#" value="file_id" />

<api:set attr="stopin.DataModel" value="FLATTENEDDOCUMENT" />
<api:set attr="stopin.EnablePaging" value="TRUE" />
<api:set attr="stopin.JSONPath" value="$.;$.elements" />
<api:set attr="stopin.URITemplate" value="[urlbase]/{file_id}"/>

<api:set attr="bloobin.DataModel" value="DOCUMENT" />
<api:set attr="bloobin.EnablePaging" value="TRUE" />
<api:set attr="bloobin.JSONPath" value="$.elements.bilderHCprosjektet" />
<api:set attr="bloobin.URITemplate" value="[urlbase]/{file_id}/vedlegg/{vedlegg_id}"/>
<api:set attr="bloobin.ElementMapPath#" value="/json/elements/bilderHCprosjektet/filnavn" />
<api:set attr="bloobin.ElementMapName#" value="filnavn" />
<api:set attr="bloobin.ElementMapPath#" value="/json/elements/bilderHCprosjektet/id" />
<api:set attr="bloobin.ElementMapName#" value="vedlegg_id" />


<!-- The GET method corresponds to SELECT. Here you can override the default processing of the SELECT statement. The results of processing are pushed to the schema's output. See SELECT Execution for more information. -->
<api:script method="GET">
<api:set attr="filein.URI" value="[filein.URITemplate]"/>
<api:call op="jsonproviderGet" in="filein" out="fileout">

<api:set attr="stopin.URI" value="[stopin.URITemplate | replace('{file_id}', [fileout.file_id])]"/>
<api:call op="jsonproviderGet" in="stopin" out="stopout">

<api:set attr="bloobin.URI" value="[bloobin.URITemplate | replace('{file_id}', [fileout.file_id]) | | replace('{vedlegg_id}', [bloobout.vedlegg_id])]"/>
<api:call op="jsonproviderGet" in="bloobin" out="bloobout">
<api:set attr="out.filnavn" value="[bloobout.filnavn | allownull()]"/>
<api:set attr="out.vedleggID" value="[bloobout.vedleggID | allownull()]"/>
<api:set attr="out.bloobfile" value="[bloobout.bloobfile | allownull()]"/>
<api:push item="out"/>
</api:call>
</api:call>
</api:call>
</api:script>
</api:script>

Also, where do you get the vedlegg_id field from? If it is to be used for the third call it should be received from the second.

@Thomas Lind 
Yes, it gets vedlegg_id from [urlbase]/{file_id}

It workd fine up until [urlbase]/{file_id} as you can see from the message below. Its even able to find vedleggID (id in JSON) if i skip the BLOOB part 

[500] Could not execute the specified command: Formatter replace failed in the evaluation of [bloobin.URITemplate | replace('{file_id}', [fileout.file_id]) | | replace('{vedlegg_id}', [bloobout.vedlegg_id])]. The error was:
Item bloobout specified in the formatter, does not exist.
Value: https://https://som.url.io/1234567-xxxx-yyyyy-zzzzz-abvdefghi/vedlegg/{vedlegg_id}
Parameters:
{vedlegg_id}
Unable to evaluate: [bloobout.vedlegg_id]

Here is how the structure for  bilderHCprosjektet looks like.

The XPath for id is: /json/elements/bilderHCprosjektet/id

The last URL (https://https://som.url.io/1234567-xxxx-yyyyy-zzzzz-abvdefghi/vedlegg/{vedlegg_id}) gives just the file (a couple of kb) that i want to save in DWH in BLOOBFILE column

 

Userlevel 6
Badge +5

Hi @aftabp 

I sent you a private message.

I have an idea as to what you are missing, but I will need some extra info to apply it to the file.

Reply