It is possible to collect statistics on the data that flows out of any Data Services object, such as a source, transform, or target.
These statistics can help determine if the data flow has performed as intended.
The audit will allow you to know something is wrong, but you will then have to investigate to understand what happened. It could be erroneous source data as in the previous example, or something missing in the transform configuration, or an issue with the loading process.
Setting Up Audit
To use auditing you must:
- Define audit points: Audit points collect and store runtime statistics about the data that flows out of objects.
- Define rules: Use runtime statistics to define rules that verify that the data at specific points in a data flow is what you expect. Verification occurs when data reaches the following movement in the data flow:
- Extracted from sources.
- Processed by transforms.
- Loaded into targets.
- Generate a runtime notification: Notification include a list of audit rules that failed and the values of the audit statistics at the time of failure.
- Display the audit statistics: After a job executes, the audit statistics identify the object or objects in the data flow that produced incorrect data.
Audit Points and Labels
An audit point represents the object in a data flow where statistics are collected. Audit a source, a transform, or a target in a data flow.
When audit points are defined on objects in a data flow, specify an audit function. An audit function represents the audit statistic that Data Services collects for a table, output schema, or column.
The available audit functions depend on the audit point type.
Choose from these audit functions:
|Table or output schema
|This function collects two statistics:
The datatype for this function is integer .
|Sum of the numeric values in the column. This function only includes the good rows. This function applies to columns with a datatype of integer , decimal , double , and real .
|Average of the numeric values in the column. This function only includes the good rows. This function applies to columns with a datatype of integer , decimal , double , and real .
|Detects errors in the values of the column by using the Checksum value. This function applies only to columns with a datatype of varchar .
An audit label represents the unique name in the data flow that Data Services generates for audit statistics. Audit labels are collected for each defined audit function on each audit point.
If the audit point is on a table or output schema, these two labels are generated for the Count Audit function:
If the audit point is on a column, the audit label is generated with this format:
Defining Audit Rules and Actions
Use auditing rules when comparing audit statistics for one object against another object. For example, use an audit rule to verify that the count of rows from the source table is equal to the rows in the target table.
An audit rule is a Boolean expression, which consists of a left-hand-side (LHS), a Boolean operator, and a right-hand-side (RHS). The LHS can be a single audit label, multiple audit labels that form an expression with one or more mathematical operators, or a function with audit labels as parameters. In addition to the LHS, the RHS can also be a constant.
Examples of Audit Rules
- $Count_CUSTOMER = $Count_CUSTDW
- $Sum_ORDER_US + $Sum_ORDER_EUROPE = $Sum_ORDER_DW
- round($Avg_ORDER_TOTAL) >= 10000
Choose any combination of the actions listed for notification of an audit failure:
- E-mail to List: Data Services sends a notification of which audit rule failed to the e-mail addresses listed in this option. Use a comma to separate the list of mail addresses, or specify a variable for the mailing list.
- Script: Data Services executes the custom script created in this option.
- Raise exception: When an audit rule fails, the Error Log shows the rule that failed. The job stops at the first audit rule that fails.
This action is the default. If the action is cleared and an audit rule fails, the job completes successfully and the audit does not write messages to the job log.
If all three actions are chosen, Data Services executes them in the order presented.
The E-mail to List option uses the smtp_to function to send an e-mail. Define the server and sender for the Simple Mail Tool Protocol (SMTP) in the Data Services Server Manager.
Audit status can be viewed in one of these locations:
|Action on Failure
|Places where you can view audit information
|Raise an exception
|Job Error Log, Metadata Reports
|E-mail to list
|E-mail message, Metadata Reports
|Wherever the custom script sends the audit messages, Metadata Reports
Auditing can be enabled or disabled in the Job Execution Properties dialog box.
To view the results of the audit in the log, you can set Trace Audit Data to Yes in the Job Execution Properties dialog box.
Let's Try It
Let me guide you through the use of the Audit feature: