CDS Extraction with Generic Delta
The delta based on a date or time stamp is also referred to as generic delta; the concept is similar to that of the classic ABAP extractors. Generic delta in CDS has been available since the SAP S/4HANA 1809 on-premise release. It relies on a date/time element in the CDS view that reflects the changes of the underlying records. You can use the following field types as delta criteria:
- Date (ABAP type DATS)
- UTC Time Stamp
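For illustration, the following minimal sketch shows how a view could be enabled for generic delta extraction. The view name Z_I_SalesDocExtraction, the underlying table zsd_doc, and the element LastChangedAt are hypothetical; the only requirement is that the delta element is a persisted date or UTC time stamp field that the application updates with every change.

@AbapCatalog.sqlViewName: 'ZISDEXTR'
@AccessControl.authorizationCheck: #NOT_REQUIRED
@EndUserText.label: 'Sales documents for extraction (example)'
@Analytics.dataExtraction.enabled: true
-- element used as the generic delta criterion
@Analytics.dataExtraction.delta.byElement.name: 'LastChangedAt'
define view Z_I_SalesDocExtraction
  as select from zsd_doc
{
  key sales_document  as SalesDocument,
      net_amount      as NetAmount,
      -- persisted UTC time stamp, set by the application on every change
      last_changed_at as LastChangedAt
}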
When data is extracted through this framework, the system stores the Transaction Sequence Number (TSN) of the time of extraction (not the highest value found in the source). In the next delta run, only records whose delta field contains a higher time stamp are delivered. You can check the tables ODQSSN or ODQQUEDES for the stored TSNs. However, records may carry an earlier time stamp because they are written to the leading table with some delay after the extraction. To cover this, you specify a (lower) safety interval, similar to generic DataSources, by adding an annotation.
Using a time stamp is the preferred way for delta extraction. If no appropriate time stamp is available in the application tables or the CDS view, a date field can be used as an alternative.
Note
Choosing a date field is not recommended (see SAP Note 730373); use a time stamp instead. Checks for new records are done every 15 seconds.
The following annotations define the details of this delta approach:
- @Analytics.dataExtraction.delta.byElement.name: Application developers enable generic delta extraction with this annotation. Its value names the element that is used for filtering during generic delta extraction.
- @Analytics.dataExtraction.delta.byElement.maxDelayInSeconds: There is always a time delay between taking a UTC time stamp and the database commit. This annotation specifies the maximum possible delay in seconds. If you do not add it, a default delay of 1800 seconds (half an hour) is applied. This means that only records with a time stamp older than 1800 seconds (30 minutes) are extracted in the current run; the records that fall within this interval are extracted during the next delta run. This safety interval allows the delta to be captured accurately when records are stored with some delay. Make sure that the value is high enough: if you specify a delay of 10 seconds but storage may be delayed by 60 seconds, the system misses such records, because in the current run the record is not yet stored, and in the following run the system no longer checks for this time stamp. (A sketch using this annotation follows the list.)
- @Analytics.dataExtraction.delta.byElement.detectDeletedRecords: By using this annotation, the system will remember all key combinations of the view that were extracted in delta mode. If a key combination does not occur in the view anymore, this will automatically generate a delete image in the extracted data.
However, if you archive data, the archiving might appear as a deletion to the CDS-based ODP extraction framework. In data warehouse scenarios, you usually do not want such archiving deletions on the source system side to be propagated as deletions to your data warehouse. If database records in your source system are archived after one year, for example, you can specify an additional annotation.
- @Analytics.dataExtraction.delta.byElement.ignoreDeletionAfterDays: Used together with the previous annotation, this causes the extraction to ignore deletions of records that are older than the specified number of days.
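As a sketch, the header annotations of the hypothetical view Z_I_SalesDocExtraction from the earlier example could be extended with a safety interval as follows. The value of 900 seconds is an assumed example; choose a value that safely covers the maximum commit delay in your system.

-- header annotations of the hypothetical view Z_I_SalesDocExtraction sketched above
@Analytics.dataExtraction.enabled: true
@Analytics.dataExtraction.delta.byElement.name: 'LastChangedAt'
-- assumed safety interval of 15 minutes; without this annotation, 1800 seconds apply
@Analytics.dataExtraction.delta.byElement.maxDelayInSeconds: 900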
Note
Recommendations:
- Data records with an empty delta field (for example, ChangeDate) are only extracted during a "Delta Init with data" (first DTP execution). During regular delta loads, such records are ignored. Ensure that the delta field is filled by the application.
- We recommend using only persisted time stamp or date fields and refraining from virtually derived or calculated fields in CDS views. Calculated fields can lead to severe performance penalties.
- The detectDeletedRecords annotation is only feasible for low volumes of data (fewer than 1,000,000 data records) and must not be used for high-volume scenarios.
Using the delta field, the ODP framework determines up to which point a data consumer has already extracted records. On a subsequent extraction request, only records with a higher date/time stamp are collected for extraction. For a real-time delta subscription by a streaming process chain, the real-time daemon checks for new records in the CDS views approximately every 15 seconds by default. For non-real-time delta subscriptions, new records according to the delta criterion are pulled directly from the CDS view during extraction.
As a safeguarding measure, a safety interval can be specified. This interval accommodates technical delays such as waiting for a database commit on the application system side. A record with a time stamp falling into this safety interval is selected twice from the CDS view: once with extraction run 1 and once with extraction run 2. To find changed records, the ODP framework stores key data and hashes of all records belonging to the safety interval.
In extraction run 1, records whose time stamps fall into the interval from (start time stamp of extraction run 1 minus maxDelayInSeconds) up to the start time stamp of extraction run 1 are stored for later comparison. For example, if run 1 starts at 10:00:00 UTC and maxDelayInSeconds is 1800, records with time stamps from 09:30:00 UTC onward are stored. In the subsequent extraction run 2, records belonging to this time interval are selected again and compared against the previously saved records/hashes of extraction run 1. Only records in the safety interval whose hashes have changed are extracted again to reflect the changes in the target system.
Previously extracted records that have not changed within the safety interval are not extracted again. With the annotations described so far, you get newly created records and updates to existing records, but no deletions. The annotation @Analytics.dataExtraction.delta.byElement.detectDeletedRecords enables the view to detect deleted records as part of the generic delta mechanism. When you include this annotation, all key combinations of the extracted records are stored in a separate data storage. To identify deletions, all records of this data storage are compared against all records still available in the CDS view during each extraction run. Records that are no longer available in the view are sent to the consuming clients as deletions.
Note that this concept is only feasible for low volumes of data (roughly fewer than 1,000,000 data records) and must not be used for high-volume applications. Hence, this mechanism is applicable to small master data and text extractions. The annotation ignoreDeletionAfterDays is available to reduce the time frame for which records are considered in the deletion comparison. This gives you a trailing limit: only the extracted records falling into this time frame are compared against the records currently available in the CDS view.
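For instance, deletion detection could be combined with such a trailing limit in a small text view like the following sketch. The view Z_I_PlantTextExtraction, the underlying table zplant_text, its fields, and the 365-day limit are hypothetical examples used only to show how the annotations fit together.

@AbapCatalog.sqlViewName: 'ZIPLANTTXT'
@AccessControl.authorizationCheck: #NOT_REQUIRED
@EndUserText.label: 'Plant texts for extraction (example)'
@Analytics.dataExtraction.enabled: true
@Analytics.dataExtraction.delta.byElement.name: 'LastChangedAt'
-- generate delete images for key combinations that no longer exist in the view
@Analytics.dataExtraction.delta.byElement.detectDeletedRecords: true
-- ignore deletions of records whose delta time stamp is older than one year
@Analytics.dataExtraction.delta.byElement.ignoreDeletionAfterDays: 365
define view Z_I_PlantTextExtraction
  as select from zplant_text
{
  key plant           as Plant,
  key language        as Language,
      plant_name      as PlantName,
      -- persisted UTC time stamp used as the generic delta element
      last_changed_at as LastChangedAt
}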