The first solution to manage error situations is to avoid them. Some error situations can easily be sidestepped by constructing jobs so that they take into account the issues that frequently cause them to fail.
But some situations, such as server failures, are unavoidable. You must then design your data flows to recover from the failure.
You can avoid errors in your data flows by:
- Using conditions so that the data flow only executes when the context is correct.
- Using a loop to start over until the context is correct.
Don't forget to add a way out in case the correct context is never reached!
One example is when an external file is required to run a job. In this situation, use the wait_for_file function or a while loop and the file_exists function to check that the file exists in a specified location before executing the job.
The while loop is a single-use object that is used in a work flow. The while loop repeats a sequence of steps as long as a condition is true.
Typically, the steps done during the while loop result in a change in the condition so that the condition is eventually no longer satisfied and the work flow exits from the while loop. If the condition does not change, the while loop does not end.
For example, you might want a work flow to wait until the system writes a particular file. Use a while loop to check for the existence of the file using the file_exists function. As long as the file does not exist, the work flow can go into sleep mode for a particular length of time before checking again.
As the system might never write the file, add another check to the loop, such as a counter, to ensure that the while loop eventually exits. In other words, change the while loop to check for the existence of the file and the value of the counter. As long as the file does not exist and the counter is less than a particular value, repeat the while loop. In each iteration of the loop, put the work flow in sleep mode and then increment the counter.
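The loop described above can be sketched in Python. This is an illustration of the logic only, not Data Services script; `poll_for_file` is a hypothetical helper, distinct from the built-in wait_for_file and file_exists functions mentioned above.

```python
import os
import time

def poll_for_file(path, poll_seconds=1, max_attempts=10):
    """Wait for a file to appear, giving up after max_attempts checks."""
    attempts = 0
    # Two-part while condition, as described above: keep looping only
    # while the file is absent AND the counter is below its limit.
    while not os.path.exists(path) and attempts < max_attempts:
        time.sleep(poll_seconds)  # sleep mode between checks
        attempts += 1             # the step that eventually ends the loop
    return os.path.exists(path)
```

If the file never appears, the counter guarantees the loop exits after `max_attempts` iterations instead of running forever.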
Levels of Data Recovery Strategies
Some errors cannot be anticipated and will cause the job to fail.
Executing a failed job again may result in duplication of rows that were loaded successfully during the first job run.
You must then find a way to recover your jobs so that the database is in a consistent state and the schedule can continue.
- Recover an entire database:
Use the standard RDBMS services to restore a crashed database. This option is outside the scope of this course.
- Use transactions:
Set up your data flow so that it is encased in a transaction and tables stay in a consistent state.
- Recover from partially loaded tables:
Avoid duplicate loading of rows.
- Recover missing values or rows:
Identify missing values and manage rows that could not be inserted.
- Manage exceptions:
Ensure that all exceptions are managed in a work flow.
Depending on the relationships between data flows in your application, you may use a combination of these techniques to recover from errors.
Setting Up Transactions
If your data flow loads a large amount of data, or loads dependent tables such as sales order and sales order item tables, you do not want some rows to be loaded while others are not.
To improve performance and use of resources, SAP Data Services by default loads tables in several transactions. You can set the "Rows per commit" option to determine how many rows will be processed in each transaction.
If the data flow fails, however, some rows might already have been committed to the table when the job stops.
To avoid this situation, you can configure Data Services to include the whole data flow in a single transaction.
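As an illustration only, the all-or-nothing behavior of a single transaction can be sketched with Python and sqlite3 (in Data Services itself this is configured through data flow and target table options, not code; the table and function names here are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_order (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE sales_order_item (order_id INTEGER, line INTEGER)")

def load_order(conn, order_id, items):
    """Load an order and its dependent item rows as one transaction."""
    try:
        with conn:  # commits on success, rolls back as a unit on error
            conn.execute("INSERT INTO sales_order VALUES (?)", (order_id,))
            for line in items:
                conn.execute("INSERT INTO sales_order_item VALUES (?, ?)",
                             (order_id, line))
        return True
    except sqlite3.Error:
        # The failed transaction was rolled back: no partial load remains.
        return False
```

Either the order and all of its items are committed, or none of them are, so the dependent tables can never disagree after a failure.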
Partially Loaded Data
To ensure that duplicate rows are not inserted, use one of the following options:
- Include the Table Comparison transform in the data flow to insert only the missing rows. This option works best for tables with many rows and few fields, such as fact tables.
- Select the Delete data from table before loading or Drop and re-create table option in the target table to replace data during each execution. This technique can be optimal when the changes to the target table are numerous compared to the size of the table.
- Change the target table options to use the auto-correct load feature when a table has fewer rows and more fields, such as dimension tables.
- Include an SQL command to remove previous partial updates before the table loads.
The auto-correct load checks the target table for existing rows before adding new rows to the table. Using the auto-correct load option, however, can slow jobs. Consider this technique when the target table is large and the changes to the table are relatively few.
Preload SQL commands can remove partial database updates that occur during incomplete execution of a step in a job. Typically, the preload SQL command deletes rows based on a variable that is set before the partial insertion step began.
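Auto-correct load behaves like an upsert: update the row if its key already exists, otherwise insert it. As a rough analogue of both techniques, here is a Python/sqlite3 sketch (assumptions: a `load_id` column standing in for the variable mentioned above, and SQLite 3.24+ for the upsert syntax; none of this is Data Services' actual implementation):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE target (key INTEGER PRIMARY KEY, value TEXT, load_id INTEGER)")

def reload_batch(conn, load_id, rows):
    """Re-run a load batch without duplicating rows from a failed attempt."""
    # Preload-SQL analogue: delete rows left behind by a partial earlier
    # run of this batch, keyed on a variable set before the load began.
    conn.execute("DELETE FROM target WHERE load_id = ?", (load_id,))
    # Auto-correct-load analogue: update the row if the key exists,
    # otherwise insert it (an upsert).
    conn.executemany(
        "INSERT INTO target VALUES (?, ?, ?) "
        "ON CONFLICT(key) DO UPDATE SET value = excluded.value, "
        "load_id = excluded.load_id",
        [(k, v, load_id) for k, v in rows])
    conn.commit()
```

Running the same batch twice leaves the table in the same state as running it once, which is exactly the property needed to safely re-execute a failed job.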
Missing Values or Rows
- Handle missing values with the Validation or Query transform.
- Handle missing rows with the Use overflow file option.
Missing rows are rows that cannot be inserted into the target table. For example, rows may be missing in instances where a primary key constraint is violated. Overflow files help to process this type of data problem.
When you specify an overflow file and Data Services cannot load a row into a table, Data Services writes the row to the overflow file instead. The trace log indicates the data flow in which the load failed and the location of the file. Use the overflow information to identify invalid data in the source or problems introduced in the data movement. Every new run overwrites the existing overflow file.
When using overflow files, you need to enter a file name and a file format.
For the file format, you have two options:
- If you select Write data, you can use Data Services to read the data from the overflow file, cleanse it, and load it into the target table.
- If you select Write sql, you can use the SQL commands to load the target manually when the target is accessible.
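The overflow-file behavior can be approximated in Python: rows that fail to insert, for example because of a primary key violation, are diverted to a file instead of stopping the whole load. This is a sketch of the idea in the Write data style, not Data Services' actual implementation:

```python
import csv
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (key INTEGER PRIMARY KEY, value TEXT)")

def load_with_overflow(conn, rows, overflow_path):
    """Insert rows; divert any row that cannot be loaded to an overflow file."""
    # Opening with mode "w" means each new run overwrites the overflow file,
    # matching the behavior described above.
    with open(overflow_path, "w", newline="") as overflow:
        writer = csv.writer(overflow)
        for row in rows:
            try:
                with conn:  # commit each row, or roll it back on failure
                    conn.execute("INSERT INTO target VALUES (?, ?)", row)
            except sqlite3.IntegrityError:
                writer.writerow(row)  # "Write data" style: the raw row data
```

The good rows reach the target while the rejected rows wait in the overflow file, where they can be inspected, cleansed, and reloaded.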