Follow

Connect to JSON with a CData data source

Before you attempt this you need to add the JSON provider to your list of available sources.

Go through this guide before you start.

Add a CData data source

Contents

JSON Information

In many cases you can use this JSON provider instead of a REST provider, there are a few things you can do with this, that isn't available in REST. JSON can also do pagination, read about it here Pagination.

It should be mentioned that this provider can connect to JSONP files as well. Here is a link to the file I use in this example

http://mysafeinfo.com/api/data?list=top100rocksongs_billboard_2013&format=json&abbreviate=false&case=default

It contains the Top 100 Rock Songs (Billboard, 2013). The setup is as follows.

[
 { "ID": 1, "Sequence": 1, "Song": "STAIRWAY TO HEAVEN", "Performer": "Led Zeppelin" }, { "ID": 2, "Sequence": 2, "Song": "BOHEMIAN RHAPSODY", "Performer": "QUEEN" }, { "ID": 3, "Sequence": 3, "Song": "FREE BIRD", "Performer": "LYNYRD SKYNYRD" }, { "ID": 4, "Sequence": 4, "Song": "SMOKE ON THE WATER", "Performer": "DEEP PURPLE" }, { "ID": 5, "Sequence": 5, "Song": "COMFORTABLY NUMB", "Performer": "Pink Floyd" }
]

CData Setup

Add a CData data source and point to the JSON provider.

The first step in setting up a CData connector should always be to click the ? in the top right corner.

Sp_rgsm_lstegn.PNG

JSON guide from CData

You have two methods of connecting to the data.

Online resource

If it is on a online resource location you need to fill out the following fields.

URI: http://mysafeinfo.com/api/data?list=top100rocksongs_billboard_2013&format=json&abbreviate=false&case=default

You need to paste in the link to the JSON resource location.

JSON Path: <empty>

This is the the root object/element. If you want to be more specific there is some good examples in this link. JsonPath. Otherwise, just keep it empty, the JSON provider can work out the field automatically.

Local resource

If you have it saved as a .JSON file you need to fill out the following fields. I went to this page RAW data and copied the raw data and saved it in a file called Service.json.

URI: <path to file>

Add the path to the file in the data source field.

JSON Path:<empty>

The same as before.

Getting the data

Synchronize to get the available fields.

After that you can set up incremental load, make data type overwrites, data selection rules and use the Query Table tool.

You can also use additional data sources. You just need the additional sources to have the same metadata.

Advanced features

Automatically getting the correct data types

As standard CData will scan the first 100 rows and determine what data types these fields are. Sometimes this gives an error because the 101 row contained a longer string than was present in the previous rows. The solution simply is to change the Row Scan Depth field to a higher number. If you set it to 0 it will parse the whole file. Also in regards to JSON/XML files, this field doesn't only solve the issue of data type, but also what tables are available. If the file contains more than 100 rows in total these will not be added. So a good idea is to work out what the maximum amount of rows that are contained in the file and then add that or set it to 0 to be sure.

Parsing Hierarchical Data

If your file is containing nested objects, you now have the opportunity of seeing this. I used the Service.json file I downloaded earlier. It has this structure.

{
  "people": [
    {
      "personal": {
        "age": 20,
        "gender": "M",
        "name": {
          "first": "John",
          "last": "Doe"
        }
      },
      "vehicles": [
        {
          "type": "car",
          "model": "Honda Civic",
          "insurance": {
            "company": "ABC Insurance",
            "policy_num": "12345"
          },
          "maintenance": [
            {
              "date": "07-17-2017",
              "desc": "oil change"
            },
            {
              "date": "01-03-2018",
              "desc": "new tires"
            }
          ]
        },
        {
          "type": "truck",
          "model": "Dodge Ram",
          "insurance": {
            "company": "ABC Insurance",
            "policy_num": "12345"
          },
          "maintenance": [
            {
              "date": "08-27-2017",
              "desc": "new tires"
            },
            {
              "date": "01-08-2018",
              "desc": "oil change"
            }
          ]
        }
      ],
      "source": "internet"
    },
When I pull it in like this and use no other features, as shown previously. It gives me one table called people and when I preview it, each nested part is put into one field.

What you can do about this is to use the Data Model feature. As standard it is not used and it has two options besides that. Flatten Documents and Relational.

Relational creates a table for each nested part and creates ids to relate them to each other.

Like this.

People now looks like this.

FlattenDocument makes all these previous tables one large one.

Like this.

Using the JSON Format field

This will give you some more configuration options, to make your data look correctly.

  • JSON

    This is the default format and should be used in the majority of cases.

  • JSONRows

    This is a specific format in which data is returned in a relational format consisting of rows of data contained within primitive arrays. Column information is returned as well in a separate array.

    Note: Data Model does not apply when using this JSONFormat.

    Example: 1

    {
      "dataset": {
        "column_names": [
          "Name",
          "Age",
          "Gender"
        ],
        "data": [
          [
            "John Doe",
            37,
            "M"
          ],
          [
            "David Thomas",
            25,
            "M"
          ]
        ]
      }
    }	

    The JSONPath property requires special syntax to identify the column and row paths. The syntax consists of specifying a path for each using a "column:" and "row:" prefix. Using the example above, the JSONPath would be set to: column:$.dataset.column_names;row:$.dataset.data

    In the case that columns are returned in an object with additional data, an additional "columnname:" prefix can be specified to identify the path to the value containing the column name.

    Example: 2

    {
      "columns": [
        {
          "name":"first_name",
          "type":"text"
        },
        {
          "name":"last_name",
          "type":"text"
        }
      ],
      "rows": [
        [
          "John",
          "Doe"
        ],
        [
          "David",
          "Thomas"
        ]
      ]
    }

    In the above example, JSONPath would be set to: column:$.columns;columnname:$.columns.name;row:$.rows

  • LDJSON (Line-Delimited JSON)

    This format is used to parse line-delimited JSON files (also known as NDJSON or JSONLines). Line-delimited JSON files contain a separate JSON document on each line.

    Example LDJSON File:

    { "Name": "John Doe", "Age": 37, "Gender": "M" }
    { "Name": "David Thomas", "Age": 25, "Gender": "M" }
    { "Name": "Susan Price", "Age": 35, "Gender": "F" }

    The JSONPath value is treated the same as when using the regular JSON format. The only difference is that the root path ($.) is always used (therefore treating all the lines of JSON as it is contained within an array).

    In the above example, the JSONPath will be "$.", which will return 3 rows containing the columns: Name, Age, and Gender.

In the real world this is necessary to use LDJSON, if you have a file containing more than one table in it. Without this it would only show one of the tables, but with this it will split the file out into two or more tables with the name it has in the file.

Was this article helpful?
0 out of 0 found this helpful

0 Comments

Please sign in to leave a comment.