Follow

Selecting, Validating and Transforming Data

Selecting, Validating and Transforming Data

Selecting the right data from the source, validating it and transforming the data if needed are central parts of the data warehouse process.

In Discovery Hub you specify data selection rules to ensure that only the data needed for your analysis is extracted from the data source to the staging database.

On the staging database, you perform data cleansing by applying validation and transformation rules to the data. This ensures that only valid data is loaded into the data warehouse.

However, you can also apply selection, validation and transformation rules on a data warehouse. This is useful when you have moved data from different business units into the data warehouse and want to ensure the validity of the consolidated data.

Operators for selecting and validating data

When defining a data selection or validation rule, you can use the operators listed below.

Values must be either integers or letters. You can also specify a list of values by entering comma-separated values.

Operator

Definition

Not Empty

Selects records where the value of a field is not empty or NULL

Equal

Selects records where the value of a field is equal to the specified value

Greater Than

Selects records where the value of a field is greater than the specified value

Less Than

Selects records where the value of a field is less than the specified value

Not Equal

Selects records where the value of a field is not equal to the specified value

Greater or Equal

Selects records where the value of a field is greater than or equal to the specified value

Less or Equal

Selects records where the value of a field is less than or equal to the specified value

Min. Length

Selects records that contain at least the specified number of characters

Max. Length

Selects records that contain no more than the specified number of characters

List

Selects records where the value of a field is equal to one of the specified comma separated values

Empty

Selects records where the value of a field is empty or NULL

Not in List

Selects records where the value of a field is not equal to one of the specified comma separated values

Like

Selects records where the value of a field is similar to the specified value. A percent sign (%) can be used as a wildcard. For instance, "ABC%" will return all records where the value in the specified field starts with "ABC".

Not Like

Selects records where the value of a field is not similar to the specified value. A percent sign ( % ) can be used as a wildcard. For instance, "ABC%" will return all records where the value in the specified field does not start with "ABC".

Data Selection Rules

Data selection rules are used to specify a set of conditions that data extracted from a source table must satisfy. By applying selection rules, only the subset of data that you actually need is loaded into the data warehouse or staging database.

On data warehouses you can add data selection rules on both the table level and the source table level. If more than one source table delivers data to a data warehouse table, you can set up different rules for each source table, but you can also set up general rules.

On Qlik models, data selection rules are set on the source table level.

You can add usage conditions to selection rules based on project variables. This enables you to e.g. load less data in a development environment than in the production environment.

Adding a Data Selection Rule

  1. Expand the business unit that contains the data source you want to apply the selection rule to, expand Data Sources and then expand the relevant data source.
    - OR -
    Expand the data warehouse that contains the table you want to apply the selection rule to and expand Tables. If you want to apply the rule to a specific source table, expand the table and Mappings as well.
    - OR -
    Expand the Qlik model that contains the table you want to apply the selection rule to and expand Tables. If the table is concatenated, expand Mappings as well.
  2. Right-click the table, or source table, you want to add the selection rule to and click Add Data Selection Rule.
    The Data Selection pane appears in the right hand side of the window.
  3. Click the field you want to use in the selection rule.
  4. In the Operator list, click the operator you want to use. See Operators for Selecting and Validating Data.
  5. If applicable, type a value for the operator in the Valuebox.
  6. Click Add.

All selection rules that you have applied to a table are displayed below the relevant table.

Adding a Usage Condition to a Selection Rule

To add a usage condition to a selection rule based on a project variable, follow the steps below.

  1. Right click a selection rule and click Add Usage Condition. The Usage Condition panel is displayed in the right hand side of the application window.
  2. In the Usage Condition panel, click the variable you want to use.
  3. In the Operator list, click the operator you want to use. You have the following options:
    • Equal
    • NotEqual
    • GreaterThan
    • LessThan
    • GreaterEqual (Greater than or Equal to)
    • LessEqual (Less than or Equal to)
  4. In the Comparer list, click the general data type of the variable, which Discovery Hub will use in the comparison. You have the following options:
    • String
    • Date
    • Numeric
  5. Type the value you want to compare the parameter with in the Value box.
  6. Click Add to add the usage condition.

For more information about project variables, see Project Variables.

Data Validation Rules

Validation rules ensure a high level of accuracy and reliability of the data in the data warehouse and are used to discover invalid data. You can apply validation rules at the field level in the staging database or at field level in the data warehouse.

While data is cleansed on the staging database, it often has to be cleansed again if you have consolidated data from different business units in the data warehouse.

You can make a validation rule conditional if you want the rule to apply in specific situations only.

For each validation rule you apply to a field, you must also classify the severity of a violation. The following classifications are available:

SeverityDefinition
Warning The violation is not critical to the data quality and does not require immediate attention. The data is considered valid and will still be made available to the end users.
ErrorThe violation is critical to the data quality and requires immediate attention. The data is considered invalid and will not be made available to the end users.

Adding Data Validation Rules

You can add any number of validation rules to a field.

  1. Expand the business unit that contains the data source you want to apply the validation rule to, expand Data Sources, expand the relevant data source and expand the relevant table.
    - OR -
    Expand the data warehouse that contains the table you want to apply the selection rule to, expand Tables and expand the relevant table.
  2. Right-click the field, you want to apply the validation rule to, and click Field Validations. The Field Validations pane appears in the right-hand side of the window.
  3. Click the field you want to use in the validation rule.
  4. In the Operator list, click the operator you want to use. See Operators for Selecting and Validating Data.
  5. If applicable, type a value for the operator in the Value box.
  6. Click Error to specify that as the severity level or leave it at Warning.
  7. Click Add to add the rule.

Adding Conditions

You can add any number of conditions to your validation rules. Follow the steps below to add a validation rule.

  1. Locate the selection rule you want to modify.
  2. Right-click the rule and then click Add Condition. The Condition pane is displayed.

  3. In the Operator list, click the operator you want to use. See Operators for Selecting and Validating Data.
  4. In the Value field, type the value you want to use in the comparison.
    - OR -
    Click Field and click the field, you want to use in the comparison.
  5. Click Add to add the condition to the rule.

The condition is displayed below the validation rule or transformation rule it belongs to.

To View Validation Errors or Warnings

  1. On the Reports menu, click Errors or Warnings.
  2. In the Database list, click the database that contains the table you want to view errors or warnings for.
  3. In the Table list, click the relevant table. The No. of rows box displays the number of errors or warnings in the table and the rows that violate the rules are displayed in the pane below.
  4. Click any row to display the error or warning message in the Error Message or Warning Message box.

Data Transformation

Fields transformations lets you modify existing data in a number of ways. You can, for example, easily reverse the sign of numeric values, trim fields or return a specified number of characters from the original field value.

Adding Field Transformation Rules

  1. Expand the business unit that contains the data source you want to apply the validation rule to, expand Data Sources, expand the relevant data source and expand the relevant table.
    - OR -
    Expand the data warehouse that contains the table you want to apply the selection rule to, expand Tables and expand the relevant table.
  2. Right-click the field, you want to add a transformation rule to, and then click Field Transformations. The Field Transformation pane appears. In the pane, click the field you want to add a transformation to.
  3. In the Operator list, click the operator, you want to use, and then click Add.
    OperatorDescription
    UpperConverts all text values to upper-case
    LowerConverts all text values to lower-case
    FirstReturns the number of beginning characters specified by the user
    LastReturns the number of ending characters specified by the user
    TrimLeftTrims padded spaces from the left of the data
    TrimRightTrims padded spaces from the right of the data
    TrimTrims padded spaces from the left and right of the data
    FixedInserts a fixed value that is specified by the user
    CustomAllows for custom SQL code to be executed
    ReverseSignReverses the sign for numeric values

    TimeOnly

    Returns only the time portion of a datetime field

    DateOnly

    Returns only the date portion of a datetime field

    Replace

    Replaces one set of characters with another

  4. If you have selected First or Last as the operator, enter how many characters you want to include in the Length field.

Adding Conditions

You can add conditions to transformation rules in the same way that you add conditions to validation rules. See Adding Conditions

Was this article helpful?
0 out of 0 found this helpful

0 Comments

Please sign in to leave a comment.