CData Performance

CData Providers - Performance Optimization and Troubleshooting

  • 16 February 2023
  • 0 replies
  • 759 views

Userlevel 3
Badge +3

This article describes general techniques to optimize performance of CData Providers and to troubleshoot errors or unexpected behavior during data extraction.  For specific provider-related techniques, refer to the provider documentation.

Performance

A CData provider’s performance is largely dependent on the data source and the amount of data.

Decreasing the amount of data retrieved will improve performance.

Queries with multiple joins, many columns, large number of rows, etc. take longer.

Using server-side filters to decrease the data set can improve performance. These filters are passed through the API request to the actual data source

Properties related to performance:

  • MaxRows - limits the number of rows returned when no aggregation or group by is used in the query
  • Verbosity - decreasing the verbosity level will reduce the amount of log writing
  • PageSize - affects the maximum number of results to return per page; If you get an error asking to request less data, set PageSize to a lower value.
  • BatchSize - useful when the server has limitations on the size of the request that can be submitted
  • Disable PseudoColumns / CustomFields to decrease the request size

Generating Logs

Logfiles can be generated by setting two properties:

  • Logfile – set to a file path and file name (for example,    Logfile=‘C:\Logs\REST_Log.txt’)
  • Verbosity – set to a number between 1 and 5 (for example,    Verbosity=3)

A Verbosity 3 log is usually sufficient for troubleshooting a technical issue. However, if logging is always on, the verbosity should not be set higher than 1 to avoid performance loss.

Verbosity Levels:

  • Logs the query, the number of rows returned by it, the start of execution and the time taken, and any errors.
  • 2  Everything above + cache queries and additional information about the request
  • 3  Everything above + HTTP headers and body of the request and response
  • 4  Everything above + transport-level communication with the data source, including SSL negotiation
  •  Everything above + communication with the data source and additional details that may be helpful in troubleshooting problems, including interface commands.

Note: Verbosity levels 3 and higher can capture sensitive client data, including:

  • 3  Full body of the request and response, this will include all data returned by the provider
  • 4  SSL certificates
  • 5  Any other transfer data, including binary transfer data

The provider masks sensitive values, such as passwords, in the connection string and requests. However, it is best practice to review a log for sensitive data before sharing outside of your organization.

Verify the provider version

The provider version can be obtained from the log file. It is printed out whenever a connection is opened. For example:

Opened REST connection. Version: 22.0.8354.0. Edition: [ADO.NET Provider .NET 4.0].

The version will help identify known issues (if any) that have been resolved in later builds, so you may update the provider as needed.

Identify whether the issue originates from Product/Application or the Provider

Review the error message to identify the source of the problem (e.g. an error message you are familiar with, or know that it stems from a product).

View the stack trace for the exception.  If it contains the CData class, the error is most likely coming from the provider.

Verify the query being pushed down to the provider. Depending on the issue, you can distinguish whether the problem is specific to the product or the provider.

Identify whether the issue originates from the Provider or the Datasource

The log displays the order of operations performed (such as retrieving metadata, executing queries, etc.) along with the requests and responses sent to the server. The error message will be displayed in the log as well and will help identify at what point/call/request the error occurred.

The request and response sequence can be reviewed to identify if there are any issues. The request can be verified to ensure that the format and data is correct based on the query being issued. The request may vary between providers.  Reviewing the data source specifications may help identify issues, e.g., malformed request, for further troubleshooting.

 The response can also be reviewed as it may contain additional information on errors (For example, the SharePoint providers may just return an HTTP response that needs to be looked at). The response can also be reviewed for the data returned. This is useful for cases where data is not being returned when you expect it to. This may imply the data most likely doesn’t exist on the server, does not match the query, or a filtering issue in the provider. To troubleshoot further, you may remove filters (e.g., WHERE clause) and test if the data is returned. If it does, then this narrows down the issue to the filter. If the filter is pushed down to the provider (e.g., there isn’t an extra layer performing the filter), an issue can be raised with the support team to investigate further.

Reviewing Logs

[INFO|Connec] lines contain information on the provider used, the version, the edition, and the connection string

Every CData provider has a major and minor version: i.e., 22.0.8413

    • 22 – Major version, updated every year
    • 8413 – Minor version, updated with daily builds

[EXEC|Parsed] will contain the query executed by the user

[HTTP|Req: 0] contains the request sent by the provider including the URL and headers; the integer will increment to track different requests

[HTTP|Res: 0] contains the headers and response from the API; integer will correspond to the Request.

The response can be used to determine if missing data was returned by the API.

It may also contain a more detailed error message.

OAuth Troubleshooting

The provider will automatically refresh the OAuth token (where applicable, based on support from the data source) when it expires. This is done when the InitiateOAuth property is set to GETANDREFRESH or REFRESH. In this scenario, upon making a connection, the provider will use the refresh token to obtain a new Access Token. By default, the Access Token is saved to disk via the OAuthSettingsLocation property. This file is unique per user/credentials. If there are multiple data sources with different credentials/accounts being used, then the OAuthSettingsLocation value will need to be set to unique files. If those are set to the same file, there will be authentication errors and conflicts as the access token stored in the file will not be correct for one of the data sources and may be overwritten when a refresh occurs.

For REST API, use Postman to verify connection

Sometimes, the server cannot perform the request due to a client side problem, and returns a 400 (Bad Request) status code. To cross-check whether the connection string options are working OK, test the same request in an external tool (like Postman) with the URL and Header.
If you are able to see the result in Postman, then test again with the CData provider in TimeXtender and generate a log in order to troubleshoot.

Timeout

Timeouts can occur because of the Timeout connection property or due to the data source.

The Timeout connection property can be set to 0, which means the provider will never time out.

An easy way to determine if the timeout is caused by the provider or the data source is to compare the timestamps of the initial request and timeout in a verbosity 2 log. If the timestamps are exactly the length that Timeout is set to, it is caused by the provider.

Review  Timeouts on databases for more details on timeout issues.


0 replies

Be the first to reply!

Reply