Understanding the Replication Process

Objective

After completing this lesson, you will be able to describe the replication components and technologies.

The Replication Process

Replication typically means copying data directly from one system to another in real time, usually at the table level.

There are a few reasons why you might consider replication, including:

  • You want to duplicate data into another system that offers different tools and technology options, so that you can build a specific application there.

  • You want to distribute data to one or more targets in real time to share information.

  • You want to migrate data from an old system to a new system.

  • You've been accessing data remotely (without persistence) but performance has become unacceptable, perhaps due to increasing data volumes.

To set up data replication using SAP HANA, it's important to have an understanding of the underlying process and the database objects associated with it.

Replication Components

To replicate a source table, it’s necessary to set up access to that table. This is done by creating a remote source and a virtual table, as described in the previous unit.

In addition, you need to create the target table to store the replicated data. The target table can either have the same structure as the source table, be a subset of the source table with fewer columns, or have variations in the column definitions, for example, to reduce long strings where not all characters are needed.

Once you've created the virtual table and the target table, you can define the remote subscription. The target table then subscribes to the changes made to the data accessed through the virtual table.

Depending on the method used to implement this replication, those objects (virtual table, target table, remote subscription) are either created automatically or must be created manually.

Replication Technologies

Different replication technologies are available, depending on whether your SAP HANA system is on-premise or in the cloud and on the type of data source you want to replicate from.

The most common replication technologies are:

  • Log-based table replication

    Uses the database redo log to fetch changes on the source table and reproduce them. It's non-intrusive, and transactional integrity is assured because only committed transactions are replicated.

  • Trigger-based table replication

    Triggers are created in the source database to monitor the source table and capture all modified (updated or deleted) and new rows. The captured data is stored in a shadow table. A queue table is also created to record all modifications in the correct sequence. This technology is independent of the source database version and may offer more functions than log-based replication, such as replication of Large Objects (LOBs).

  • File replication

    This is the technology implemented by the FileAdapter and is used to replicate new rows in a file. Only append is supported.
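
To make the log-based idea more concrete, here is a minimal Python sketch of replaying a change log so that only committed transactions reach the target. It is a toy model, not how SAP HANA or any adapter is implemented: the LogRecord class, the replay_committed function, and the sample data are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LogRecord:
    txn: int                      # transaction id
    op: str                       # "insert", "update", "delete", or "commit"
    key: Optional[int] = None     # primary key of the affected row
    value: Optional[str] = None   # new row content (unused for delete/commit)


def replay_committed(log, target):
    """Apply only the changes of committed transactions to the target dict."""
    pending = {}  # txn id -> changes buffered until a commit marker is seen
    for rec in log:
        if rec.op == "commit":
            # Commit marker reached: apply the buffered changes in log order.
            for change in pending.pop(rec.txn, []):
                if change.op == "delete":
                    target.pop(change.key, None)
                else:  # insert or update
                    target[change.key] = change.value
        else:
            pending.setdefault(rec.txn, []).append(rec)
    return target


# Hypothetical log: transaction 1 commits, transaction 2 never does.
log = [
    LogRecord(1, "insert", 100, "Alice"),
    LogRecord(2, "insert", 200, "Bob"),
    LogRecord(1, "update", 100, "Alicia"),
    LogRecord(1, "commit"),
]
replica = replay_committed(log, {})
print(replica)  # {100: 'Alicia'}
```

Because changes are buffered per transaction and applied only when the commit marker appears, the uncommitted insert from transaction 2 never reaches the replica.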

Which Technology Is Used?

The technologies that are available depend on the source, the target, and the adapter chosen:

Not all SDI adapters support real-time replication. Real-time replication is sometimes referred to as real-time Change Data Capture.

Some adapters use log-based replication. Their names usually contain Log (for example, OracleLogReaderAdapter).

There are adapters that use trigger-based replication, such as HANAAdapter.

A number of adapters use specific, proprietary technologies, such as FileAdapter.

A few adapters require a specific setup in the source system to implement replication features.

Note

You can find a list of adapters and their capabilities in the help documentation: SAP HANA Smart Data Integration and SAP HANA Smart Data Quality - Configuration Guide for other SAP HANA Scenarios

If you're replicating to an on-premise SAP HANA database, you must use smart data integration (SDI) to connect to the data source. With an SAP HANA Cloud database as the replication target, however, you can also use smart data access (SDA) to replicate data from an SAP HANA database. The technology used is an optimized log-based replication, referred to as remote table replication (RTR).

Required Authorizations

Implementing table replication requires some specific privileges:

  • For the user specified in the remote source definition:

    Full access on the source schema.

    For example, if the source is an SAP HANA database, the user must have CREATE ANY privilege on the source schema.

  • For the user implementing the replication in the target SAP HANA database:
    • CREATE VIRTUAL TABLE and CREATE REMOTE SUBSCRIPTION on the remote source.
    • CREATE TABLE on the target schema.
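
As a rough illustration of the privileges listed above, the following Python sketch assembles the corresponding GRANT statements as strings. The function name, the schema names, and the user names are hypothetical. The sketch also assumes that the source is an SAP HANA database and that the CREATE ANY schema privilege is what provides the ability to create tables in the target schema; check both against your own system before using anything like this.

```python
def quote(name):
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def replication_grants(remote_source, source_schema, source_user,
                       target_schema, target_user):
    """Return GRANT statements grouped by the database they would run in."""
    return {
        # Run in the source database (assumed here to be SAP HANA).
        "source": [
            f"GRANT CREATE ANY ON SCHEMA {quote(source_schema)} "
            f"TO {quote(source_user)}",
        ],
        # Run in the target SAP HANA database.
        "target": [
            "GRANT CREATE VIRTUAL TABLE ON REMOTE SOURCE "
            f"{quote(remote_source)} TO {quote(target_user)}",
            "GRANT CREATE REMOTE SUBSCRIPTION ON REMOTE SOURCE "
            f"{quote(remote_source)} TO {quote(target_user)}",
            # Assumption: CREATE ANY on the schema covers table creation.
            f"GRANT CREATE ANY ON SCHEMA {quote(target_schema)} "
            f"TO {quote(target_user)}",
        ],
    }


# Hypothetical names, for illustration only.
grants = replication_grants("MY_SOURCE", "SALES", "SDI_USER",
                            "REPL", "DEV_USER")
for side, statements in grants.items():
    for statement in statements:
        print(f"[{side}] {statement};")
```

Grouping the statements by database makes it explicit that the first grant is issued on the source side, while the remaining ones are issued in the target SAP HANA database.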

Replication Implementation Steps

Here are the basic steps to implement replication:

  1. Create a remote source.
  2. Create a virtual table based on the remote source.
  3. Create the target table – this could be done before steps 1 and 2.
  4. Define the remote subscription using the virtual table as the provider and the target table as the receiver.
  5. Queue the remote subscription, which involves creating source triggers, shadow tables, and queue tables for trigger-based replication (only applicable to SDI remote sources).
  6. If using SDI remote sources, copy the initial source data into the target.
  7. Distribute the data, initiating real-time change data capture.
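
The steps above can be sketched as a sequence of SQL statements. The Python helper below only builds the statements as strings, in the order of steps 2 to 7. Step 1 is left out because the remote source definition depends on the adapter. All object names, the remote object path, and the column definitions are hypothetical, and the initial load with SELECT * assumes that the target table has the same structure as the virtual table.

```python
def quote(name):
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def replication_script(schema, virtual_table, target_table, subscription,
                       remote_object, column_ddl):
    """Return the SQL statements for steps 2 to 7, in execution order.

    remote_object is passed through verbatim because its path format
    depends on the adapter; column_ddl is the target column list.
    """
    vt = f"{quote(schema)}.{quote(virtual_table)}"
    tgt = f"{quote(schema)}.{quote(target_table)}"
    sub = f"{quote(schema)}.{quote(subscription)}"
    return [
        # Step 2: virtual table on top of the existing remote source.
        f"CREATE VIRTUAL TABLE {vt} AT {remote_object}",
        # Step 3: target table that stores the replicated rows.
        f"CREATE COLUMN TABLE {tgt} ({column_ddl})",
        # Step 4: virtual table as provider, target table as receiver.
        f"CREATE REMOTE SUBSCRIPTION {sub} ON {vt} TARGET TABLE {tgt}",
        # Step 5: start capturing changes without applying them yet.
        f"ALTER REMOTE SUBSCRIPTION {sub} QUEUE",
        # Step 6: initial load (SDI remote sources).
        f"INSERT INTO {tgt} SELECT * FROM {vt}",
        # Step 7: apply queued changes and continue in real time.
        f"ALTER REMOTE SUBSCRIPTION {sub} DISTRIBUTE",
    ]


# Hypothetical names and remote path, for illustration only.
script = replication_script(
    schema="REPL",
    virtual_table="VT_ORDERS",
    target_table="T_ORDERS",
    subscription="SUB_ORDERS",
    remote_object='"MY_SOURCE"."<NULL>"."<NULL>"."SALES.ORDERS"',
    column_ddl='"ID" INTEGER PRIMARY KEY, "STATUS" NVARCHAR(20)',
)
for statement in script:
    print(statement + ";")
```

Queuing the subscription before the initial load and distributing only afterwards means that changes made in the source during the load are captured rather than lost.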

In the following lessons, you’ll learn how to implement these steps using the different tools of SAP HANA.
