Auditing dataflows

Objectives

After completing this lesson, you will be able to:

  • Audit dataflows

Auditing

It is possible to collect statistics on the data that flows out of any Data Services object, such as a source, transform, or target.

These statistics can help determine if the data flow has performed as intended.

The audit will allow you to know something is wrong, but you will then have to investigate to understand what happened. It could be erroneous source data as in the previous example, or something missing in the transform configuration, or an issue with the loading process.

Setting Up Audit

To use auditing you must:

  • Define audit points: Audit points collect and store runtime statistics about the data that flows out of objects.
  • Define rules: Use runtime statistics to define rules that verify that the data at specific points in a data flow is what you expect. Verification occurs when data reaches the following movement in the data flow:
    • Extracted from sources.
    • Processed by transforms.
    • Loaded into targets.
  • Generate a runtime notification: Notification include a list of audit rules that failed and the values of the audit statistics at the time of failure.
  • Display the audit statistics: After a job executes, the audit statistics identify the object or objects in the data flow that produced incorrect data.

Audit Points and Labels

An audit point represents the object in a data flow where statistics are collected. Audit a source, a transform, or a target in a data flow.

When audit points are defined on objects in a data flow, specify an audit function. An audit function represents the audit statistic that Data Services collects for a table, output schema, or column.

The available audit functions depend on the audit point type.

Audit Functions

Choose from these audit functions:

Data ObjectFunctionDescription
Table or output schemaCountThis function collects two statistics:
  • Good count for rows that were successfully processed
  • Error count for rows that generated some type of error if you enabled error handling

The datatype for this function is integer .

ColumnSumSum of the numeric values in the column. This function only includes the good rows. This function applies to columns with a datatype of integer , decimal , double , and real .
ColumnAverageAverage of the numeric values in the column. This function only includes the good rows. This function applies to columns with a datatype of integer , decimal , double , and real .
ColumnChecksumDetects errors in the values of the column by using the Checksum value. This function applies only to columns with a datatype of varchar .

An audit label represents the unique name in the data flow that Data Services generates for audit statistics. Audit labels are collected for each defined audit function on each audit point.

Audit Labels

If the audit point is on a table or output schema, these two labels are generated for the Count Audit function:

$Count_objectname

$CountError_objectname

If the audit point is on a column, the audit label is generated with this format:

$auditfunction_objectname

Note
An audit label can become invalid if an object that had an audit point defined on it is deleted or renamed. Invalid labels are listed as a separate node on the Labels tab. To resolve the issue, re-create the labels and delete the invalid items.

Defining Audit Rules and Actions

Use auditing rules when comparing audit statistics for one object against another object. For example, use an audit rule to verify that the count of rows from the source table is equal to the rows in the target table.

An audit rule is a Boolean expression, which consists of a left-hand-side (LHS), a Boolean operator, and a right-hand-side (RHS). The LHS can be a single audit label, multiple audit labels that form an expression with one or more mathematical operators, or a function with audit labels as parameters. In addition to the LHS, the RHS can also be a constant.

Examples of Audit Rules

  • $Count_CUSTOMER = $Count_CUSTDW
  • $Sum_ORDER_US + $Sum_ORDER_EUROPE = $Sum_ORDER_DW
  • round($Avg_ORDER_TOTAL) >= 10000

Audit Actions

Choose any combination of the actions listed for notification of an audit failure:

  • E-mail to List: Data Services sends a notification of which audit rule failed to the e-mail addresses listed in this option. Use a comma to separate the list of mail addresses, or specify a variable for the mailing list.
  • Script: Data Services executes the custom script created in this option.
  • Raise exception: When an audit rule fails, the Error Log shows the rule that failed. The job stops at the first audit rule that fails.

    This action is the default. If the action is cleared and an audit rule fails, the job completes successfully and the audit does not write messages to the job log.

If all three actions are chosen, Data Services executes them in the order presented.

Note

The E-mail to List option uses the smtp_to function to send an e-mail. Define the server and sender for the Simple Mail Tool Protocol (SMTP) in the Data Services Server Manager.

Audit Status

Audit status can be viewed in one of these locations:

Action on FailurePlaces where you can view audit information
Raise an exceptionJob Error Log, Metadata Reports
E-mail to listE-mail message, Metadata Reports
ScriptWherever the custom script sends the audit messages, Metadata Reports

Tracing Audit

Auditing can be enabled or disabled in the Job Execution Properties dialog box.

To view the results of the audit in the log, you can set Trace Audit Data to Yes in the Job Execution Properties dialog box.

Let's Try It

Let me guide you through the use of the Audit feature:

Log in to track your progress & complete quizzes