Splitting data with the Validation Transform

Objective

After completing this lesson, you will be able to Split data using the validation transform.

The Validation Transform

The Validation transform enables the creation of validation rules and the moving of data into target objects based on whether they pass or fail validation.

Compared to the Case transform, the Validation transform has different features:

  • It has only two main target tables: One for rows passing the rule validation, one for rows failing the rule validation.
  • It generates a rule violation table that lists all the rules failed by each record.
  • It can directly replace incorrect or missing values with the desired data in the passing records target table.
  • It can generate validation reports for data stewards.

For example, you could have incoming data that you must validate in terms of date formats. You want to know which rows have a bad format, or no date at all, to inform the source system about this problem and try to solve it directly in the source. Also, you want to replace missing discounts with a default value of 0.

Validation Transform Use Cases

Use the Validation transform in your data flows to ensure that the data meets the required criteria.

  • Use the Validation transform to ensure that all values:
    • Are within a specific range
    • Have the same format
    • Do not contain a NULL value
  • Do NOT use the Validation transform to split your data into different tables. (Use the Case transform instead.)

Validation Transform Features

The Validation transform allows the defining of reusable business rules to validate each record and column.

  • The Validation transform qualifies data sets:
    • Based on rules.
    • With several rules per column allowed.
    • With simple (column) or complex (expression) rules.
  • Then applies chosen Actions on Failure:
    • Sends the record to the Pass schema.
    • Sends the record to the Fail schema.
    • Sends the record to both schemas.
    • Optionally substitutes the value in the Pass schema.

Validation Transform Options:

The Validation transform optionally collects:

  • Statistics for use in Validation reports in Management Console.
  • Sample data to display in those reports.

Output Schemas

Pass Output Schema

The Pass output schema is identical to the input schema.

If you choose to send failed data to the Pass output, Data Services does not track the results. It is advisable to substitute a value for failed data that is sent to the Pass output because Data Services does not add columns to the Pass output.

Fail Output Schema

Data Services adds two columns to the Fail output schema:

  • The DI_ERRORACTION column indicates where failed data is sent:
    • The letter B is used for Both( Pass and Fail) outputs.
    • The letter F is used for data sent only to the Fail output.
  • The DI_ERRORCOLUMNS column displays all error messages for columns with failed rules. The names of input columns associated with each message are separated by colons. For example, FAILED RULE(S): C1:C2 .

Rule Violation Statistics

As shown in the following figure, the rule violation table lists all the rules and columns that failed the validation, for each record.

The field Row_ID , which is also added to the Fail table, allows you to make the link back to the original data. In this example row 1 and 2 each failed for one validation rule (validZIP and validPhone ). Row 3 failed both rules.

Using the rule violation table, it is possible to create queries and reports to show all rows and columns that failed for a particular rule and count the number of failures per rule.

Conflict of Validation Rules

Whatever the number of rules violated for a record, only one action will be executed: Either Send to Pass, Send to Fail or Send to Both.

So, what happens if several rules with different actions fail ? Send to Fail will always be the winning partner. Then Send to Both will win over Send to Pass. Send to Pass will only be executed if all the violated rules have this same action.

Validation Outcome

  • Two columns are validated:
    • Action on Failure for one column is Send to Pass
    • Action on Failure for the other is Send to Fail
  • What are the possible outcomes?

Validation Rules and Actions

Rule Validation ResultAction
The record passes both rules.The record is Sent to Pass.
The record passes the Send to Fail rule, but fails the Send to Pass rule.The record is Sent to Pass.
The record passes the Send to Pass rule, but fails the Send to Fail rule.The record is Sent to Fail.
The record fails both rules.The record is Sent to Fail.

The Validation Transform Editor

The Validation transform editor allows you to create as many validation rules as required.

Those rules can temporarily be disabled, and enabled again.

You can define substitution values for the records that you send to the Pass schema or to both schemas.

Validation Rules

Validation Rule Creation

The validation rule consists of a condition and an action on failure:

  • Use the condition to describe what is required for valid data.

    For example, specify the condition IS NOT NULL if you do not want any NULLS in data passed to the specified target.

  • Use the Action on Failure area to describe what happens to invalid or failed data.

    Continuing with the above example, for any NULL values, select the Send to Fail option, to send all NULL values to a specified FAILED target table.

A Validation Rule can be based either on a validation function or on a simple expression.

If you use a validation function, you'll have to bind its parameters to constant values or columns.

The rule considers the column used in the expression or the columns used in the function parameters as the validated columns. If the rule fails, the rule violation table will record a failed value for this column.

Some columns can be used in the function but not considered as having values failing the rule. For example, the country can be passed to the function to determine the correct zip required format. So only the zip column is considered to be incorrect here, not the country.

Validation Transform Reminders

  • Action on Failure only applies when row fails the validation rule
  • Send to Fail takes precedence over others
  • Pass output can use substituted values
  • Fail output adds two columns
  • Collect statistics to be viewed in the Management Console.

    (Disable at execution time for better performance.)

Let's Try It

Let me guide you through the use of the Validation transform:

If you want to test it by yourself, go ahead:

Log in to track your progress & complete quizzes