Setting Up SAP HANA System Replication

Objective

After completing this lesson, you will be able to set up SAP HANA system replication.

Overview of Configuration Steps

Configuration Steps to Set up SAP HANA System Replication

  1. Start the primary system.
  2. Create an initial data backup or a storage snapshot on the primary system.
  3. Enable system replication on the primary system.
  4. Prepare the secondary system for authentication by copying the system PKI SSFS .key and the .dat file from the primary system to the secondary system.
  5. Register the secondary system and establish a connection between the secondary and primary systems.

The configuration tasks on the primary and secondary systems to set up system replication are shown in the figure Setup of System Replication. With this configuration, you can recover from a data center outage by switching to a secondary site. The primary system stays online during this procedure.

Independent of the system replication operation mode, the first data transfer action is an automatic initial data load. The second data transfer action depends on the operation mode.

The following steps are performed during the setup of system replication:

  1. The primary system is informed to enable system replication.
  2. The secondary database is stopped. Its existing content is overwritten during the initial load, when a full copy of the data is shipped from the primary system at the first start of replication.
  3. The secondary system is instructed to connect to the primary system and announces that it is starting the system replication standby process.
    • This process is secured, for example, with certificates.
    • Only one command line tool is needed: hdbnsutil.
    • Both sides must have the same number of active and standby hosts with the same sizing (memory and CPU).
    • SAP HANA itself handles the relationships of, for example, scale-out setups on both sides (primary to secondary) and how communication is established with each counterpart.
    • Communication takes place internally between sites on TREXnet.

Note

If the network connection between the data centers is too slow for an initial data load (usually several terabytes), use snapshot data backups to initialize SAP HANA system replication.

Additional Configuration Steps to Enable SAP HANA System Replication

Starting with SAP HANA 2.0, additional configuration steps are required to set up SAP HANA system replication, because replication connections now use certificate-based authentication.

System replication with SAP HANA 2.0 requires authentication for the data and log shipping channels. The authentication is done using the certificates in the system PKI SSFS store. An additional manual setup step is required to exchange certificates in the system PKI SSFS store between primary and secondary sites. For more information, see SAP Note: 2369981.

Copy the system PKI SSFS KEY and DAT files from the primary site to the secondary site. The files can be found at the following locations:

/usr/sap/<SID>/SYS/global/security/rsecssfs/data/SSFS_<SID>.DAT

/usr/sap/<SID>/SYS/global/security/rsecssfs/key/SSFS_<SID>.KEY

For more information, see SAP Note 2369981 - Required configuration steps for authentication with HANA System Replication.

Note

If you installed XS advanced, you must also copy the XSA SSFS .key and the .dat file from the primary system to the secondary system in the following directories:

/usr/sap/<SID>/SYS/global/xsa/security/ssfs/data/SSFS_<SID>.DAT

/usr/sap/<SID>/SYS/global/xsa/security/ssfs/key/SSFS_<SID>.KEY

For more information, see SAP Note 2300936 - Host Auto-Failover & System Replication Setup with SAP HANA extended application services, advanced model.

The copied files become active during system restart. Therefore, it is recommended to copy the files when the secondary SAP HANA system is offline, for example, before registration.
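
The copy step can be scripted. The following is a minimal sketch, not an SAP-provided tool: it only builds the file paths listed above for a given SID and renders matching copy commands as text. The helper names, the use of scp, and the host name secondary-host are assumptions made for illustration; any secure copy method can be used instead.

```python
from pathlib import PurePosixPath
from typing import List


def ssfs_files(sid: str, xsa: bool = False) -> List[PurePosixPath]:
    """Return the SSFS .DAT and .KEY paths for a SID, as listed in this lesson."""
    base = PurePosixPath("/usr/sap") / sid / "SYS" / "global"
    # XS advanced keeps its own SSFS below xsa/security/ssfs.
    root = base / "xsa" / "security" / "ssfs" if xsa else base / "security" / "rsecssfs"
    return [root / "data" / f"SSFS_{sid}.DAT", root / "key" / f"SSFS_{sid}.KEY"]


def copy_commands(sid: str, secondary_host: str, xsa: bool = False) -> List[str]:
    """Render one scp command per file (assumption: scp as <sid>adm is allowed)."""
    user = f"{sid.lower()}adm"
    return [f"scp {path} {user}@{secondary_host}:{path}" for path in ssfs_files(sid, xsa)]


# Dry run: print the commands instead of executing them.
for command in copy_commands("H10", "secondary-host"):
    print(command)
```

Printing the commands first makes it easy to review them before running the copy while the secondary system is offline.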

Enablement of SAP HANA System Replication

System replication can be set up and managed on the command line with hdbnsutil, in the SAP HANA cockpit, in SAP HANA studio, or with SAP Landscape Management.

The following administration activities are possible with hdbnsutil, the SAP HANA cockpit, or SAP HANA studio:

  • Performing the initial setup, that is, enabling system replication and establishing the connection between two identical systems.

  • Monitoring the status of system replication to ensure that both systems are in sync.

  • Triggering takeover by the secondary system in the event of a disaster and failback once the original system is available again.

  • Disabling system replication.

Enable SAP HANA System Replication Using SAP HANA Cockpit

There are two ways to set up SAP HANA system replication in the SAP HANA cockpit:

  • Enable the primary system and then register the secondary system from the primary system in one configuration step.

  • Enable system replication on the primary system and then register the secondary system in a second step.

The steps to configure the primary and the secondary system using SAP HANA cockpit are outlined in the figure, Enable System Replication.

In the SAP HANA cockpit, open the database overview page of the system database (SYSTEMDB) and search for 'System Replication' to find the app. In the System Replication app, choose the Configure System Replication button to start the wizard.

You have enabled system replication and registered the secondary system with the primary system. The secondary system operates in recovery mode. All secondary system services constantly communicate with their primary counterparts, replicate and persist data and logs, and load data to memory. However, the secondary system does not accept SQL connections.

To set up SAP HANA system replication between two identical SAP HANA systems, you must first enable system replication on the primary system and then register the secondary system.

Enable SAP HANA System Replication with hdbnsutil

It is also possible to configure SAP HANA system replication with the command line tool hdbnsutil as <sid>adm at the OS level. The command line tool can be a part of a script, which executes further steps beyond system replication.

  1. Create a data backup of the primary system.

  2. Enable the primary system and give the primary system a logical name:

    hdbnsutil -sr_enable --name=PRIMARY

  3. Stop the secondary system:

    sapcontrol -nr <instance_number> -function StopSystem HDB

  4. Register the secondary system (choose replication mode and operation mode):

    hdbnsutil -sr_register --remoteHost=<primary hostname> --remoteInstance=<instance number> --replicationMode=<sync|syncmem|async> --operationMode=<delta_datashipping|logreplay> --name=SECONDARY

  5. Start the secondary system to start replication:

    sapcontrol -nr <instance_number> -function StartSystem HDB

Once the secondary system is started, the replication process starts automatically.
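
When the registration is part of a larger script, it can help to build and validate the hdbnsutil -sr_register call before running it. The following is a minimal sketch under stated assumptions: the function name and the instance number 00 are hypothetical, and only the options and mode values shown in step 4 are used.

```python
from typing import List

# Values taken from step 4 above. Other operation modes mentioned in this
# lesson (for example, logreplay_readaccess) are deliberately not covered.
REPLICATION_MODES = ("sync", "syncmem", "async")
OPERATION_MODES = ("delta_datashipping", "logreplay")


def sr_register_command(remote_host: str, remote_instance: str,
                        replication_mode: str, operation_mode: str,
                        site_name: str) -> List[str]:
    """Build the argument list for registering a secondary system."""
    if replication_mode not in REPLICATION_MODES:
        raise ValueError(f"unknown replication mode: {replication_mode}")
    if operation_mode not in OPERATION_MODES:
        raise ValueError(f"unknown operation mode: {operation_mode}")
    return [
        "hdbnsutil", "-sr_register",
        f"--remoteHost={remote_host}",
        f"--remoteInstance={remote_instance}",
        f"--replicationMode={replication_mode}",
        f"--operationMode={operation_mode}",
        f"--name={site_name}",
    ]


# Example: host name from this lesson, instance number 00 assumed.
print(" ".join(sr_register_command("wdflbmt7346", "00", "syncmem", "logreplay", "SECONDARY")))
```

The returned list can be passed to subprocess.run when the script is executed as <sid>adm on the stopped secondary system.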

Enable the Full Sync Option for SAP HANA System Replication

When activated, the full sync option for SAP HANA system replication ensures that a log buffer is shipped to the secondary system before a commit takes place on the local primary system.

Full Sync Option for SAP HANA System Replication

  • Switch the full sync option on and off:

    hdbnsutil -sr_fullsync --enable|--disable

  • Check the setting of the full sync option:

    Use SQL to display the column "FULL_SYNC" of view M_SERVICE_REPLICATION. The full sync option can have the following values:

    • DISABLED: Full sync is not configured at all

    • ENABLED: Full sync is configured, but it is not yet active

    • ACTIVE: Full sync mode is configured and active

The full sync option can be enabled for SYNC replication (that is, not for SYNCMEM). With the full sync option activated, transaction processing on the primary system blocks whenever the secondary system is not currently connected and newly created log buffers cannot be shipped to the secondary site. This behavior ensures that no transaction can be locally committed without shipping the log buffers to the secondary site. The full sync option can be switched on and off using the command: hdbnsutil -sr_fullsync --enable|--disable

This command changes the setting of the enable_full_sync parameter in the system_replication section of the global.ini file accordingly. However, in a running system, full sync does not become active immediately. This is done to prevent the system from blocking transactions immediately when setting the parameter to true. Instead, full sync has to first be enabled by the administrator. In a second step, it is internally activated when the secondary is connected and becomes ACTIVE.

The setting of the full sync option can be viewed using SQL in the M_SERVICE_REPLICATION system view.

The full sync option can have the following values:

  • DISABLED: Full sync is not configured at all. The parameter is enable_full_sync = false in the system_replication section of the global.ini file.

  • ENABLED: Full sync is configured, but it is not yet active, so transactions do not block in this state. To become active, the secondary has to connect and REPLICATION_STATUS must be ACTIVE.

  • ACTIVE: Full sync mode is configured and active. If the network connection to a connected secondary is closed, transactions on the primary side block in this state.

If full sync is enabled when an active secondary is currently connected, FULL_SYNC is immediately set to ACTIVE.
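
The transitions between the three values can be summarized in a small model. This is only an illustrative sketch of the behavior described above, not SAP HANA code: the class and method names are assumptions, and the model tracks a single secondary.

```python
from dataclasses import dataclass


@dataclass
class FullSyncModel:
    """Toy model of the FULL_SYNC value for one primary/secondary pair."""
    secondary_active: bool = False  # secondary connected, REPLICATION_STATUS is ACTIVE
    state: str = "DISABLED"

    def enable(self) -> None:
        # With an active secondary, full sync becomes ACTIVE immediately.
        self.state = "ACTIVE" if self.secondary_active else "ENABLED"

    def disable(self) -> None:
        self.state = "DISABLED"

    def secondary_connects(self) -> None:
        self.secondary_active = True
        if self.state == "ENABLED":
            self.state = "ACTIVE"

    def connection_closed(self) -> None:
        # The state stays as it is; only the connection is gone.
        self.secondary_active = False

    @property
    def transactions_block(self) -> bool:
        # Transactions block only if full sync is ACTIVE and the secondary is gone.
        return self.state == "ACTIVE" and not self.secondary_active


model = FullSyncModel()
model.enable()               # ENABLED: configured, transactions do not block
model.secondary_connects()   # ACTIVE: internally activated
model.connection_closed()    # primary now blocks until full sync is disabled
print(model.state, model.transactions_block)
```

The last step of the example is the situation the following caution refers to: the primary blocks until full sync is disabled or the secondary reconnects.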

Caution

If the secondary is stopped, disable FULL_SYNC. Otherwise, the primary blocks and it is not possible to stop it.

Note

Resolving a blocking situation of the primary caused by the enabled full sync option must be done with the hdbnsutil command, because a configuration changing command could also block in this state. This is also necessary if you want to shut down the currently blocking primary. Otherwise, it is not possible to stop it.

Compression Methods for Log and Data Shipping

SAP HANA system replication supports a number of compression methods for log and data shipping.

The following types of compression for log and data shipping are supported:

  • Log

    • Log buffer tail compression (enabled by default)

    • Log buffer content compression

  • Data

    • Data page compression

Log buffer tail compression is turned on by default. All log buffers are aligned to 4 KB boundaries by a filler entry. With log buffer tail compression, the filler entry is cut off from the buffer before sending it over the network and added again when the buffer has reached the secondary site. So only the net buffer size is transferred to the secondary site.

The filler entry is smaller than 4 KB, so this is also the maximum size reduction per shipped log buffer. If the log buffers are large, the achievable compression ratio is therefore limited.
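
A short calculation shows why the benefit of tail compression shrinks as log buffers grow. The sketch below treats the filler entry as plain padding up to the next 4 KB boundary, which is a simplifying assumption; the function name is hypothetical.

```python
ALIGNMENT = 4096  # log buffers are aligned to 4 KB boundaries


def tail_compression(net_size: int):
    """Return (padded_size, filler_size, saved_fraction) for a log buffer
    whose content (net size) is net_size bytes, with net_size > 0."""
    filler = -net_size % ALIGNMENT      # bytes needed to reach the next boundary
    padded = net_size + filler          # size of the buffer on the primary
    return padded, filler, filler / padded


# A small buffer saves a large share, a large buffer only a tiny one.
for size in (5_000, 1_000_000):
    padded, filler, saved = tail_compression(size)
    print(f"net={size} padded={padded} filler={filler} saved={saved:.1%}")
```

For a buffer with 5,000 bytes of content, 3,192 of the 8,192 aligned bytes are filler. For a buffer with 1,000,000 bytes of content, the filler is 3,520 bytes, well under one percent of the buffer.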

Log buffer and page content compression can be activated by parameter settings.

Log buffers and data pages shipped to the secondary site can be compressed using a lossless compression algorithm (LZ4). By default, content compression is turned off. You can turn it on by setting the following configuration parameters on the secondary site in the system_replication section of the global.ini file.

Configuration Parameters to Activate Compression

  • Enable compression of a log when it is sent to the secondary site:

    enable_log_compression = true

  • Enable compression of data when it is sent to the secondary site:

    enable_data_compression = true

Note

After changing these parameters, the secondary site needs to be reconnected to the primary site.
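
For reference, the resulting fragment of the global.ini file can be rendered as follows. This is a minimal sketch that only produces the text of the system_replication section with the two parameters named above; the function name is an assumption, and the sketch does not write to a real configuration file.

```python
import configparser
import io


def compression_fragment(log: bool = True, data: bool = True) -> str:
    """Render the [system_replication] settings for content compression."""
    config = configparser.ConfigParser()
    config["system_replication"] = {
        "enable_log_compression": str(log).lower(),
        "enable_data_compression": str(data).lower(),
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


# Settings to apply on the secondary site.
print(compression_fragment())
```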

Log and data compression is especially useful when system replication is used over long distances, for example, using the ASYNC replication mode.

The open source compression algorithm LZ4 has been selected because of its speed and compression ratios, and the relatively low time overhead introduced for compression/decompression. Log buffer content compression also works in combination with log buffer tail compression. Therefore, only the content part of the log buffer is compressed, without considering the filler entry.

The activation of the compression reduces the required network bandwidth, but at the same time there is some CPU overhead for compressing and decompressing the information. Using compression is particularly useful in the case of long distances between primary and secondary sites or in the case of bandwidth limitations.

Checking and Monitoring of SAP HANA System Replication

After setting up the secondary system for system replication, you can monitor the status of the replication between the primary and the secondary system using the following tools:

  • SAP HANA Cockpit

  • SAP HANA Studio

  • hdbnsutil

The current status of system replication can be checked with all of these tools.

The system replication status values are highlighted. The text explains the system replication status values details.

System Replication Status

  • Unknown: Secondary did not connect to primary since last restart of the primary.

  • Initializing: Initial data transfer in progress. In this state, the secondary is not usable at all.

  • Syncing: Secondary is syncing again (for example, after a temporary connection loss or restart of the secondary).

  • Active: Initialization or sync with primary is complete and secondary is continuously replicating. No data loss will occur in SYNC mode.

  • Error: Error occurred on the connection.

Monitoring System Replication with SAP HANA Cockpit

To monitor SAP HANA system replication, you can use the System Replication tile in the SAP HANA Cockpit.

If system replication is configured, the System Replication tile provides information about the type of landscape (2-tier or 3-tier), the replication mode between the primary and the tier-2 secondary, the operation mode, and the overall replication status.

The System Replication tile displays the following states at a glance:

  • Not configured (meaning system replication is not configured)

  • All services are active and in sync

  • All services are active, but not yet in sync

  • Errors in replication

To check the status of replication in detail, choose the System Replication tile. The System Replication overview screen displays a graphical representation of the system replication landscape, configuration, and status. At the top, the "chain" of systems with their replication modes is shown, containing further information about the sites and the network connections between them.

The System Replication screen provides the following information:

  • The name and role of the system, as well as the selected operation mode.

    For the operation modes logreplay and logreplay_readaccess, a retention time estimation is also displayed. This is an estimation of the time left before the primary system starts to overwrite the RetainedFree marked log segments, and a full data shipping becomes necessary to get the primary and secondary systems back in sync after a disconnect situation. The estimated log full time is an estimation of the time left before the primary system runs into a log full. The value shown in the header shows the situation into which the system could run first: log retention or log full.

  • If the SQL ports of the secondary system are open for read access.

  • The replication mode used between the systems.

  • The current average redo log shipping time and the average size of shipped redo log buffers.

    This describes how long it took on average to send redo log buffers to the secondary site, based on measurements over the last 24 hours.

On the primary system, the system replication tile is highlighted, showing the primary and secondary system status. The system replication overview app shows the detailed status per service.

In addition, detailed information on system replication is provided in the tabs shown in the figure, Details for the Status of a Specific Service.

Overview System Replication Tabs

Replicated Services

The Replicated Services tab provides information on the replication status per site and service.

Network

The Network tab provides information on the time it took to ship the redo log to the secondary system and to write the redo log to the local log volume on disk.

You can select the network connection that you want to analyze (for example, Network Site 1 to 2 or Network Site 2 to 3). The graph displayed compares the local write wait time with the remote write wait time monitored over the last 24 hours.

Log Replay

The Log Replay tab provides a graphical representation on the delay of the secondary system. This tab is displayed if the chosen operation mode for the system replication landscape is logreplay or logreplay_readaccess.

When this tab is activated for a secondary system, the log replay delay is shown for the last 24 hours.

Furthermore, in this tab you can select to visualize the estimated log retention time as well as the estimated log full time for all system replication relevant services.

Network Speed Check

The Network Speed Check tab provides a way to measure the network speed of the system replication host-to-host network channel mappings.

Network Security Settings

The Network Security Settings tab displays the specific network security details configured between the primary and the secondary systems.

Selecting one row in the Replicated Services tab shows the details for the corresponding service grouped thematically, as in the following example for the index server. Because this information is context-sensitive, you only see the information required for this system. Therefore, because this example system is running in the logreplay operation mode, no information on delta data shipping is shown here. However, the context-sensitive information about the log replay delay is displayed. The delta between Last Log Position and Replayed Log Position indicates how far the log replay is behind on the secondary.

Selecting a specific service shows the log position, savepoint, full data replica, and backlog details.

SAP HANA Cockpit for Secondary Management

The SAP HANA Cockpit distinguishes between a primary and a secondary system. On the SAP HANA Cockpit of the secondary system, the System Replication tile provides an initial overview of this site’s state. From the System Replication Overview, you can initiate a takeover.

On the secondary system, the system replication tile is highlighted, showing only the secondary system status. The system replication overview app shows the takeover button.

Monitoring System Replication with Command Line Tools and Scripts

Command Line Tools and Scripts to Monitor System Replication

  • hdbnsutil -sr_state

    Checks if the primary and the secondary sites have been successfully enabled for system replication.

  • landscapeHostConfiguration.py

    Checks the overall status of the primary system.

  • systemReplicationStatus.py

    Checks the overall status of the system replication.

The Python scripts are located in the directory $DIR_INSTANCE/exe/python_support.

Command: hdbnsutil -sr_state

Primary Site

    h10adm@wdflbmt7346:/> hdbnsutil -sr_state
    checking for active or inactive nameserver ...

    System Replication State
    ~~~~~~~~~~~~~~~~~~~~~~~~

    online: true

    mode: primary
    operation mode: primary
    site id: 1
    site name: PrimarySite

    is source system: true
    is secondary/consumer system: false
    has secondaries/consumers attached: true
    is a takeover active: false

    Host Mappings:
    ~~~~~~~~~~~~~~

    wdflbmt7346 -> [SecondarySite] wdflbmt7347
    wdflbmt7346 -> [PrimarySite] wdflbmt7346

    done.

Secondary Site

    h10adm@wdflbmt7347:/> hdbnsutil -sr_state
    checking for active or inactive nameserver ...

    System Replication State
    ~~~~~~~~~~~~~~~~~~~~~~~~

    online: true

    mode: syncmem
    operation mode: logreplay
    site id: 2
    site name: SecondarySite

    is source system: false
    is secondary/consumer system: true
    has secondaries/consumers attached: false
    is a takeover active: false
    active primary site: 1

    Host Mappings:
    ~~~~~~~~~~~~~~

    wdflbmt7347 -> [SecondarySite] wdflbmt7347
    wdflbmt7347 -> [PrimarySite] wdflbmt7346

    primary masters:wdflbmt7346

    done.
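
Because the output of hdbnsutil -sr_state consists mostly of "key: value" lines, it is easy to evaluate in a script. The following is a minimal sketch, assuming the output format shown in the two examples above; the function name is hypothetical, and the sample text is an excerpt of the secondary site output without the shell prompt line.

```python
from typing import Dict

SAMPLE_OUTPUT = """\
checking for active or inactive nameserver ...
System Replication State
~~~~~~~~~~~~~~~~~~~~~~~~
online: true
mode: syncmem
operation mode: logreplay
site id: 2
site name: SecondarySite
is source system: false
is secondary/consumer system: true
has secondaries/consumers attached: false
is a takeover active: false
active primary site: 1
Host Mappings:
~~~~~~~~~~~~~~
wdflbmt7347 -> [SecondarySite] wdflbmt7347
wdflbmt7347 -> [PrimarySite] wdflbmt7346
primary masters:wdflbmt7346
done.
"""


def parse_sr_state(output: str) -> Dict[str, str]:
    """Collect the 'key: value' lines of the output into a dictionary."""
    state = {}
    for line in output.splitlines():
        line = line.strip()
        if "->" in line:  # host mapping lines are skipped in this sketch
            continue
        key, separator, value = line.partition(":")
        if separator and value.strip():  # ignores headings such as "Host Mappings:"
            state[key.strip()] = value.strip()
    return state


state = parse_sr_state(SAMPLE_OUTPUT)
print(state["mode"], state["operation mode"], state["site name"])
```

A script can then check, for example, that the entry "is secondary/consumer system" is "true" before continuing with steps that only make sense on the secondary site.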

Script: landscapeHostConfiguration.py

You can also gather information about the overall status of the sites and the system replication using Python scripts.

The landscapeHostConfiguration.py script shows the status of the primary system:

  • SAP HANA is OK.

  • SAP HANA will be OK after a host auto-failover, for example.

  • Not enough instances are started and a takeover would be useful.

Note

The script does not tell you if the secondary system is ready for a takeover.

The script provides an overall status and a return code to match the overall host status.

A takeover is only recommended when the return code from the script is 1 (error).

Example:

    <sid>adm># python $DIR_INSTANCE/exe/python_support/landscapeHostConfiguration.py
    | Host  | Host   | Host   | ... | NameServer  | NameServer  | ... |
    |       | Active | Status |     | Config Role | Actual Role |     |
    | ----- | ------ | ------ | --- | ----------- | ----------- | --- |
    | host1 | yes    | ok     | ... | master 1    | master      | ... |
    | host2 | yes    | ok     | ... | master 2    | slave       | ... |

    overall host status: ok

The following host states are possible:

  • OK: System is OK.

  • WARNING: A host auto-failover to a standby host is taking place.

  • INFORMATION: The landscape is completely functional, but the current (actual) role of the host differs from the configured role.

  • ERROR: There are not enough active hosts.

Script: systemReplicationStatus.py

The systemReplicationStatus.py script shows the status of system replication.

Using systemReplicationStatus.py has the advantage of showing whether the secondary systems are in sync or not. This provides more confidence if a takeover is justified because if system replication was never in sync or is outdated, unexpected loss of data might occur.

Example:

    h10adm@wdflbmt7346:/> python $DIR_INSTANCE/exe/python_support/systemReplicationStatus.py
    | Database | Host        | Service Name | Site Name   | Secondary   | Secondary     | Replication |
    |          |             |              |             | Host        | Site Name     | Status      |
    | -------- | ----------- | ------------ | ----------- | ----------- | ------------- | ----------- |
    | SYSTEMDB | wdflbmt7346 | nameserver   | PrimarySite | wdflbmt7347 | SecondarySite | ACTIVE      |
    | H10      | wdflbmt7346 | xsengine     | PrimarySite | wdflbmt7347 | SecondarySite | ACTIVE      |
    | H10      | wdflbmt7346 | indexserver  | PrimarySite | wdflbmt7347 | SecondarySite | ACTIVE      |

    status system replication site "2": ACTIVE
    overall system replication status: ACTIVE

    Local System Replication State
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    mode: PRIMARY
    site id: 1
    site name: PrimarySite

The additional parameter --localhost (systemReplicationStatus.py --localhost) restricts the execution of the Python script to the host on which it is executed.

The script provides the following return codes:

  • 10: No System Replication

  • 11: Error

  • 12: Unknown

  • 13: Initializing

  • 14: Syncing

  • 15: Active
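
The return codes of both scripts can be combined in a monitoring script. The following is a minimal sketch based only on the codes stated in this lesson (return code 1 of landscapeHostConfiguration.py and the codes 10 to 15 listed above); the function name and the shape of the result are assumptions, and the sketch supports a takeover decision rather than making it.

```python
from typing import Dict, Union

# Return codes of systemReplicationStatus.py as listed above.
REPLICATION_RETURN_CODES = {
    10: "No System Replication",
    11: "Error",
    12: "Unknown",
    13: "Initializing",
    14: "Syncing",
    15: "Active",
}

LANDSCAPE_ERROR = 1  # landscapeHostConfiguration.py: error


def assess_takeover(landscape_rc: int, replication_rc: int) -> Dict[str, Union[bool, str]]:
    """Summarize what the two return codes say about a possible takeover."""
    return {
        # A takeover is only recommended if the primary reports an error.
        "takeover_recommended": landscape_rc == LANDSCAPE_ERROR,
        "replication_status": REPLICATION_RETURN_CODES.get(replication_rc, "Unrecognized"),
        # If replication was not in sync, a takeover may lose data.
        "in_sync": replication_rc == 15,
    }


print(assess_takeover(landscape_rc=1, replication_rc=15))
```

In practice, the return codes could be obtained from the returncode attribute of subprocess.run when the scripts are called as <sid>adm.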

Monitoring System Replication Using SQL Statements

You can also get system replication-specific information directly from system views.

System Views Providing Information About System Replication

  • M_SERVICE_REPLICATION

    Collects the history of data and log replication every hour.

  • M_SYSTEM_REPLICATION

    Provides general system replication-relevant information about the whole system.

Note

A set of complex SQL statements is available in SAP Note 1969700 - SQL Statement Collection for SAP HANA. The section Replication > System Replication includes some system replication-relevant statements. The Overview script provides information about the system replication landscape and the replication state for each service.

Monitoring System Replication Alerts

Specific alerts are issued by the primary system to warn you of potential problems.

System Replication Alerts

  • System Replication Connection Closed (Alert ID 78)

  • System Replication Configuration Parameter Mismatch (Alert ID 79)

  • System Replication Logreplay Backlog (Alert ID 94)

  • System Replication Increased Log Shipping Backlog (Alert ID 104)

The Connection Closed and Configuration Parameter Mismatch alerts are raised when a system replication connection is closed, or when there is a system replication configuration parameter mismatch.

The Logreplay Backlog alert is raised when the system replication logreplay backlog is increased. In this case, logreplay is delayed on the secondary site, causing a longer takeover time.

To identify the reason for the increased system replication logreplay backlog, check the state of the services on the secondary system. To get more information, monitor the secondary site. Possible causes for the increased system replication logreplay backlog can be, for example, a slow or non-functioning log replay, or a non-running service on the secondary system.

The Increased Log Shipping Backlog alert is raised when the system replication log shipping backlog increases. In this case, log shipping to the secondary system is delayed or does not work properly, so data that has not yet been shipped would be lost on the secondary system if a takeover were executed.

To identify the reason for the increased system replication log shipping backlog, check the status of the secondary system. Possible causes for the increased system replication log shipping backlog can be a slow network performance, connection problems, or other internal issues (for example, in the sync or syncmem replication modes).

Monitoring INI File Parameter Changes

Database parameters should be the same in the primary and secondary systems and are checked automatically. The configuration parameter checker reports on any differences between primary, secondary, and tier 3 secondary systems. In such a case, the parameter checker generates an alert.

With parameter replication activated, any changes made on the primary are automatically replicated to the secondary sites. Without this parameter replication activated, changes should be manually duplicated on the other system.

Parameter replication is off by default. It can be enabled and disabled on the primary site by using the following parameter:

[inifile_checker]/replicate = true | false

The parameter checker is on by default. It can be enabled and disabled on the primary site by using the following parameter:

[inifile_checker]/enable = true | false

Some parameters may have different settings on the primary and the secondary sites on purpose. One example is the global_allocation_limit parameter, where the secondary is used for other systems. By adding these parameters to the exclusion list, you can exclude them from checking.
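
The idea behind the parameter check and its exclusion list can be illustrated with a simple comparison. This is a conceptual sketch only, not the implementation of the SAP HANA parameter checker: the function name, the "section/parameter" key format, and the sample values are assumptions made for illustration.

```python
from typing import Dict, Iterable, Optional, Tuple


def parameter_mismatches(primary: Dict[str, str], secondary: Dict[str, str],
                         exclusions: Iterable[str] = ()) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return parameters whose values differ between the two sites,
    ignoring everything named in the exclusion list."""
    keys = (primary.keys() | secondary.keys()) - set(exclusions)
    return {
        key: (primary.get(key), secondary.get(key))
        for key in sorted(keys)
        if primary.get(key) != secondary.get(key)
    }


# Illustrative values only.
primary = {
    "system_replication/enable_log_compression": "true",
    "memorymanager/global_allocation_limit": "0",
}
secondary = {
    "system_replication/enable_log_compression": "false",
    "memorymanager/global_allocation_limit": "65536",
}

# global_allocation_limit differs on purpose, so it is excluded from the check.
print(parameter_mismatches(primary, secondary,
                           exclusions=["memorymanager/global_allocation_limit"]))
```

Only the unintended difference is reported, which corresponds to the case in which the parameter checker would generate an alert.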
