Using automatic recovery mode

If a job with automatic recovery enabled fails during execution, you can run the job again in recovery mode. In recovery mode, Data Services retrieves the results of successfully completed steps and reruns only the incomplete or failed steps, under the same conditions as the original job.

Enabling Recovery

Recovery is not enabled by default when you execute a job. You must select the Enable recovery option in the job's execution properties.

When the option is enabled, Data Services records all the completed and failed tasks.

If the job fails, another option becomes available for the next job execution: Recover from last failed execution. The option is selected by default, so the job then runs in recovery mode.

Recovery Mode

In recovery mode, Data Services executes the steps or recovery units that did not complete successfully in a previous execution. This includes steps that failed and steps that generated an exception but completed successfully, such as those in a try/catch block. As in normal job execution, Data Services executes the steps in parallel if they are not connected in the work flow diagrams and in serial if they are connected.
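The skip-and-rerun behavior can be modeled outside Data Services. The following Python sketch is not Data Services code. It is a toy model in which a hypothetical run_job function keeps a set of completed step names and skips them in recovery mode. The step names (load_dimensions, load_facts), the completed set, and the simulated failure are all assumptions made for illustration, and the model runs steps serially only.

```python
def run_job(steps, completed, recovery=False):
    """Run steps in order; in recovery mode, skip steps recorded as completed.

    steps: list of (name, callable) pairs, hypothetical stand-ins for job steps.
    completed: set of step names, updated in place as steps succeed.
    Returns the names of the steps executed in this run.
    """
    if not recovery:
        completed.clear()  # a normal run starts from scratch
    executed = []
    for name, action in steps:
        if recovery and name in completed:
            continue  # keep the result from the earlier run
        executed.append(name)
        action()  # may raise; the step is then not recorded as completed
        completed.add(name)
    return executed


state = {"broken": True}  # simulated fault that is fixed before the rerun


def load_dimensions():
    pass


def load_facts():
    if state["broken"]:
        raise RuntimeError("simulated failure")


steps = [("load_dimensions", load_dimensions), ("load_facts", load_facts)]
completed = set()

try:
    run_job(steps, completed)  # first run fails in load_facts
except RuntimeError:
    pass

state["broken"] = False  # fix the error
rerun = run_job(steps, completed, recovery=True)
print(rerun)  # ['load_facts']
```

In this model only the failed step runs again; the step that completed in the first run is skipped.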

Let me explain with an example:

Recovery Requirements

In our example, to ensure that the fact tables are loaded with the data that corresponds properly to the data already loaded in the dimension tables:

The recovery job must use the same extraction criteria as the original job for the recovered data flow.
If the recovery job uses new extraction criteria, such as basing data extraction on the current system date, the data in the tables will not correspond to the data previously extracted by the completed data flow. If the recovery job uses new values, the job execution may also follow a completely different path through conditional steps or try/catch blocks.
The recovery job must follow the exact same execution path as the original job.
Data Services records any external inputs to the original job so that the recovery job can use these stored values and follow the same execution path.
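The idea of replaying recorded inputs can be illustrated with a small Python sketch. This is not how Data Services is implemented; the function name extraction_date, the recorded_inputs dictionary, and the dates are hypothetical and serve only to show why a recovery run should reuse the stored value instead of reading the current date again.

```python
from datetime import date


def extraction_date(recorded_inputs, recovery, today):
    """Return the date to use as the extraction criterion.

    A normal run reads the current date and records it. A recovery run
    reuses the recorded value so its criteria match the original run.
    """
    if recovery and "extraction_date" in recorded_inputs:
        return recorded_inputs["extraction_date"]
    recorded_inputs["extraction_date"] = today
    return today


recorded = {}
# Original run on one day, recovery run on the next day (arbitrary dates).
first = extraction_date(recorded, recovery=False, today=date(2024, 1, 1))
second = extraction_date(recorded, recovery=True, today=date(2024, 1, 2))
print(first == second)  # True
```

Because the recovery run receives the recorded date, the data it extracts lines up with what the original run already loaded.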

Using Try/Catch Blocks for Recovery

Data Services does not save the result of a try/catch block for reuse during recovery. If an exception is thrown inside a try/catch block, then during recovery Data Services re-executes the step that threw the exception and all subsequent steps.

Since the execution path with the try/catch block might be different in the recovered job, using variables set in the try/catch block could alter the results during automatic recovery.

For example, suppose that you create a job that sets the value of the variable $i within a try/catch block. If an exception occurs, the catch block sets an alternate value for $i. Subsequent steps depend on the value of $i.

During the first job execution, the first work flow contains an error that generates an exception. The exception is caught, and the catch block sets the variable to 0. The job then continues with the next step and executes the work flow selected by that value of $i. However, the job fails in the subsequent work flow, as shown in the following figure.

After fixing the error, you run the job in recovery mode. During the recovery execution, the first work flow no longer generates the exception. Thus the value of variable $i is different, and the job selects a different subsequent work flow, producing different results as shown in the following figure.
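The change of path can be reproduced with a short Python sketch. It is only an analogy for the Data Services job described above: the value 10 assigned when no exception occurs, the comparison against 1, and the names work_flow_A and work_flow_B are assumptions chosen for illustration.

```python
def first_work_flow(has_error):
    """Stand-in for the first work flow; raises while the error is present."""
    if has_error:
        raise ValueError("simulated error in the first work flow")


def choose_path(has_error):
    """Set i inside a try/except block, then pick the next work flow from i."""
    try:
        first_work_flow(has_error)
        i = 10  # assumed value when no exception occurs
    except ValueError:
        i = 0  # alternate value set when the exception is caught
    return "work_flow_A" if i > 1 else "work_flow_B"


original = choose_path(has_error=True)    # first execution, error present
recovered = choose_path(has_error=False)  # recovery execution, error fixed
print(original, recovered)  # work_flow_B work_flow_A
```

The two runs select different branches even though nothing in the job definition changed, which is why variables set in a try/catch block are risky in jobs that rely on automatic recovery.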

Recovery Units

In some cases, steps in a work flow depend on each other and are executed together. When there is a dependency, designate the work flow as a recovery unit. This requires the entire work flow to complete successfully. If the work flow does not complete successfully, Data Services executes the entire work flow during recovery, including the steps that executed successfully in prior work flow runs.
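The all-or-nothing behavior of a recovery unit can be sketched in Python. This is a simplified model, not Data Services code: the function run_unit, the unit name wf_unit, the step names, and the simulated failure are hypothetical. The unit is recorded as completed only when every step in it succeeds.

```python
def run_unit(unit_name, steps, completed, log):
    """Run every step of a recovery unit; record the unit only if all succeed.

    completed: set of unit names that finished in an earlier run.
    log: list that receives the name of each step as it is executed.
    """
    if unit_name in completed:
        return  # the whole unit finished earlier, nothing to rerun
    for name, action in steps:
        log.append(name)
        action()  # an exception leaves the unit unrecorded
    completed.add(unit_name)


flaky = {"fail": True}  # simulated fault that is fixed before the rerun


def step_a():
    pass


def step_b():
    if flaky["fail"]:
        raise RuntimeError("simulated failure")


steps = [("step_a", step_a), ("step_b", step_b)]
completed, log = set(), []

try:
    run_unit("wf_unit", steps, completed, log)  # fails in step_b
except RuntimeError:
    pass
first_run = list(log)

log.clear()
flaky["fail"] = False  # fix the error
run_unit("wf_unit", steps, completed, log)  # recovery run
print(first_run, log)  # ['step_a', 'step_b'] ['step_a', 'step_b']
```

In the recovery run, step_a executes again even though it succeeded the first time, because the unit as a whole did not complete.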

Conversely, you may need to specify that a work flow or data flow should execute only once. When the Execute only once setting is enabled, the job never re-executes the object after it completes successfully.

Caution

We do not recommend marking a work flow or data flow as Execute only once if the parent work flow is a recovery unit.