During a takeover, you switch your active system from the current primary system to the secondary system.
If your primary data center is not available, due to a disaster or for planned downtime for example, and a decision has been made to fail over to the secondary data center, you can perform a takeover on your secondary system.
In addition to the tools that may be used to monitor the overall system status when system replication is enabled, a script is provided with SAP HANA, which helps you decide when a takeover should be performed.
We recommend that you use third-party, external tools to check if hosts, the network, and the data center are still available.
In addition, a script called landscapeHostConfiguration.py is provided so that SAP HANA itself can communicate the status of the primary system. It can communicate the following statuses:
SAP HANA is OK.
SAP HANA will be OK after a host auto-failover, for example.
Not enough instances are started and a takeover would be useful.
A takeover is only recommended when the return code from the script is 1 (error).
Note
The script does not tell you if the secondary system is ready for a takeover.
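The return-code rule above can be sketched in a few lines of Python. This is a minimal illustration, not part of SAP HANA: the helper names `takeover_worth_considering` and `check_primary` are hypothetical, only the return code 1 mentioned above is interpreted, and it is assumed that the script can be started with `python` from the current working directory.

```python
import subprocess

# Return code that, according to the text above, signals an error state
# in which a takeover is worth considering.
RC_ERROR = 1


def takeover_worth_considering(return_code):
    """Return True only for the 'error' return code (1)."""
    return return_code == RC_ERROR


def check_primary(script="landscapeHostConfiguration.py"):
    """Run the status script and interpret its return code.

    Assumption: the script is reachable from the current working
    directory and 'python' resolves to a suitable interpreter.
    """
    return_code = subprocess.call(["python", script])
    return takeover_worth_considering(return_code)
```

As the note says, a positive result only describes the primary system; whether the secondary system is ready for a takeover has to be checked separately.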
If a takeover occurs, the secondary site finds the latest savepoint in the data disk area. This is the starting point for a usual database restart, but many large data packages (main indexes) are already preloaded in memory, as on the primary data center before the takeover, which speeds up the restart considerably. Based on this initial savepoint on the secondary data center, the log replay can start and roll the database forward to the latest point in time.

The following decision guideline can help you decide if a takeover is advisable.
Takeover Decision Guideline
There are three main questions involved in deciding whether or not a takeover will improve the situation.
- Can a takeover help at all?
No: Do not perform a takeover.
Yes: Proceed to question 2.
- Can a takeover reduce the downtime duration?
No: Do not perform a takeover.
Yes: Proceed to question 3.
- Can it be guaranteed that no data loss will result from the takeover?
No: Evaluate the risk of data loss in the case of a takeover against that of data loss in case of no takeover, and against the impact of a longer downtime to bring back the primary site instead.
Yes: Perform a takeover.
Note
For more information on how to answer these questions, see SAP Note: 2063657.
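The three questions form a simple decision chain, which the following sketch writes out as a Python function. The function name and the returned strings are illustrative only; answering the questions themselves still requires the analysis described in the SAP Note.

```python
def takeover_guideline(can_help, reduces_downtime, no_data_loss_guaranteed):
    """Walk through the three questions in order and return the outcome."""
    # Question 1: can a takeover help at all?
    if not can_help:
        return "do not perform a takeover"
    # Question 2: can a takeover reduce the downtime duration?
    if not reduces_downtime:
        return "do not perform a takeover"
    # Question 3: can data loss be ruled out?
    if not no_data_loss_guaranteed:
        return "evaluate data-loss risk and downtime impact first"
    return "perform a takeover"
```

For example, `takeover_guideline(True, True, False)` does not recommend a takeover outright but points to the risk evaluation described under question 3.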
You can use the getTakeoverRecommendation.py script to get takeover recommendations.
Takeover Recommendations are Given by the Script:
getTakeoverRecommendation.py
Evaluates the status returned by the Python scripts:
- landscapeHostConfiguration.py
- systemReplicationStatus.py
These three possible states are returned:
Takeover required
Not decidable
Possible
When the getTakeoverRecommendation script is called, it shows the takeover recommendation based on the current system state. However, when the primary system faces any error situation, the system replication status can no longer be determined. Therefore, the previous state should be saved and compared against the current state.
Example
Primary Site
This is a sample implementation of a Python script that uses getTakeoverRecommendation to act as a minimalist cluster manager:
import time
import subprocess
from getTakeoverRecommendation import TakeoverDecision

# Assumption: polling interval in seconds; adjust to your environment.
POLL_INTERVAL = 10

def main():
    wasInSync = False
    while True:
        # The script's return code is compared with the TakeoverDecision values.
        recommendation = subprocess.call(
            ["python", "getTakeoverRecommendation.py", "--sapcontrol=1"])
        if not wasInSync and recommendation == TakeoverDecision.Required:
            print("Primary defect & no sync => NO TAKEOVER")
        if wasInSync and recommendation == TakeoverDecision.Required:
            print("Primary defect & sync => TAKEOVER")
        # Remember whether the systems were in sync for the next cycle.
        wasInSync = recommendation == TakeoverDecision.Possible
        time.sleep(POLL_INTERVAL)

if __name__ == "__main__":
    main()
The output depends on the previous state combined with the result of the current call of getTakeoverRecommendation. If no sync state has been reached, a takeover is not advised. Once the systems are in sync, the next error of the primary system will suggest a takeover. Any subsequent negative return value resets the sync state, because it is no longer ensured that the replicated data is current.
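To make the state handling easier to follow, the same logic can be separated from the polling loop and replayed over a list of recommendations. This is only a sketch: the string labels below are hypothetical stand-ins for the three states, and the functions `step` and `replay` are not part of SAP HANA.

```python
# Hypothetical labels for the three states; the real script reports them
# in its own format.
REQUIRED = "required"
NOT_DECIDABLE = "not decidable"
POSSIBLE = "possible"


def step(was_in_sync, recommendation):
    """Return (advice, now_in_sync) for one polling cycle."""
    advice = None
    if recommendation == REQUIRED:
        advice = "TAKEOVER" if was_in_sync else "NO TAKEOVER"
    # Only a "possible" result confirms that the systems are in sync;
    # any other result resets the remembered state.
    return advice, recommendation == POSSIBLE


def replay(recommendations):
    """Feed a sequence of recommendations through step() and collect advice."""
    was_in_sync = False
    advice_log = []
    for recommendation in recommendations:
        advice, was_in_sync = step(was_in_sync, recommendation)
        advice_log.append(advice)
    return advice_log
```

Replaying the sequence required, possible, required, required yields NO TAKEOVER, no advice, TAKEOVER, NO TAKEOVER: the takeover is only suggested for the first error after a confirmed sync state.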
Tools for Performing a Takeover
The takeover can be triggered using the following tools:
The SAP HANA Cockpit
SAP HANA Studio
hdbnsutil
The following steps are performed:
Trigger a takeover to the secondary system in the event of a disaster.
Register the former primary system as a new secondary when it becomes available again.

Command Line Tool hdbnsutil
- Perform a takeover on the secondary site:
  hdbnsutil -sr_takeover
- When the former primary site is available again, it can be registered as the new secondary site:
  hdbnsutil -sr_register --remoteHost=<new primary hostname> --remoteInstance=<instance number> --replicationMode=<sync|syncmem|async> --operationMode=<delta_datashipping|logreplay> --name=<siteName>
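When the registration is scripted, it can help to assemble and validate the arguments before calling the tool. The following sketch builds the argument list for the -sr_register call using only the options named above; the function name is hypothetical, and the list would be passed to something like `subprocess.call` on the former primary host.

```python
REPLICATION_MODES = ("sync", "syncmem", "async")
OPERATION_MODES = ("delta_datashipping", "logreplay")


def sr_register_command(remote_host, remote_instance, replication_mode,
                        operation_mode, site_name):
    """Build the argument list for hdbnsutil -sr_register."""
    # Reject values that are not among the modes listed in the text.
    if replication_mode not in REPLICATION_MODES:
        raise ValueError("unknown replication mode: %s" % replication_mode)
    if operation_mode not in OPERATION_MODES:
        raise ValueError("unknown operation mode: %s" % operation_mode)
    return [
        "hdbnsutil", "-sr_register",
        "--remoteHost=%s" % remote_host,
        "--remoteInstance=%s" % remote_instance,
        "--replicationMode=%s" % replication_mode,
        "--operationMode=%s" % operation_mode,
        "--name=%s" % site_name,
    ]
```

Validating the two mode values up front catches typos before the command reaches the system.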
Note
External cluster management software can be used to perform the client reconnect after takeover. Some of SAP’s hardware partners offer an integration of SAP HANA high availability in their cluster management solutions.
Client Connection Recovery
In most cases, performing the takeover on the SAP HANA system alone is not enough. The client or application server must still be able to reach the SAP HANA system, no matter which site is currently the primary.
Methods for Client Connection Recovery
IP redirection
A virtual IP address is assigned to the virtual host name. In the case of a takeover, the virtual IP unbinds from the network adapter of the primary system and binds to the network adapter of the secondary system.
DNS redirection
In this scenario, the IP for the host name in the DNS is changed from the address of the primary system to the address of the secondary system.
Both methods have their advantages, but the choice is mostly determined by IT policies and the existing configuration. If there are no existing constraints, IP redirection has the clear benefit of being faster to process in a script than synchronizing changes of DNS entries over a global network.
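With either method, a script can verify where the virtual host name currently points by resolving it and comparing the result with the known site addresses. The sketch below is an illustration only: the function name, the host name, and the addresses in the usage example are hypothetical, and the resolver is passed in as a parameter so that the logic can be tried without a network.

```python
import socket


def active_site(virtual_host, site_addresses, resolve=socket.gethostbyname):
    """Return the name of the site the virtual host name currently points to.

    site_addresses maps a site name to its IP address. Returns None if the
    resolved address belongs to neither site.
    """
    address = resolve(virtual_host)
    for site, site_address in site_addresses.items():
        if site_address == address:
            return site
    return None
```

A call such as `active_site("hana-virtual", {"primary": "192.0.2.10", "secondary": "192.0.2.20"})` would return "secondary" once the redirection has taken effect. With DNS redirection, cached entries can delay this.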
SAP HANA offers the so-called "HA/DR providers" that are capable of informing external entities about activities inside SAP HANA scale-out (such as host auto-failover) and SAP HANA system replication setups. In a Python script, actions can be executed before or after certain SAP HANA activities, such as startup, shutdown, failover, takeover, connection change, and so on. One example of these HA/DR providers, or "hooks", is moving virtual IP addresses after a takeover in SAP HANA system replication.
Additionally, external cluster management software can be used to perform the client reconnect after takeover.
Monitoring View Providing Information About Takeover History
The monitoring view M_SYSTEM_REPLICATION_TAKEOVER_HISTORY provides information about takeovers in SAP HANA system replication (HSR) and about when HSR was activated or reactivated.
During a takeover, the content of the view is also moved to the system taking over, so that the complete takeover history is available.
Takeover History
Information provided by system view M_SYSTEM_REPLICATION_TAKEOVER_HISTORY |
---|
Execution end time for takeover of the transaction domain |
Execution start time for takeover of the transaction domain |
Master log position that has been reached by takeover |
Time that has been reached by takeover |
Master nameserver host at takeover time |
Operation mode at takeover time |
Replication mode at takeover time |
Replication status at takeover time |
Highest master log position that has been shipped before executing takeover |
Time of the last shipped log buffer before executing takeover |
Logical name provided by the site administrator at takeover time |
Generated ID of the secondary site at takeover time |
Source site master nameserver host at takeover time |
Logical name for the source site provided by the site administrator at takeover time |
Generated ID of the source site at takeover time |
Source site SAP HANA version |
End time of the takeover command |
Start time of the takeover command |
Indicates how the system went online: ONLINE: online takeover, OFFLINE: offline takeover, TIMETRAVEL: after time travel |
SAP HANA version for the site that is executing the takeover |
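The view can be read like any other monitoring view. The following sketch assumes an already opened cursor from a Python DB-API 2.0 driver connected to the system; the function name is hypothetical. Because the column names are not listed here, the sketch takes them from the cursor's description instead of hard-coding them.

```python
QUERY = "SELECT * FROM M_SYSTEM_REPLICATION_TAKEOVER_HISTORY"


def fetch_takeover_history(cursor):
    """Return the view's rows as a list of dicts keyed by column name.

    cursor is any Python DB-API 2.0 cursor connected to the system.
    """
    cursor.execute(QUERY)
    # The first element of each description entry is the column name.
    columns = [column[0] for column in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]
```

Each returned dictionary corresponds to one takeover or HSR activation event recorded in the view.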
Implementing Takeover Hooks
Takeover Hooks
Takeover hooks are provided by SAP HANA in the form of a Python script template.
Pre- and post-takeover actions are implemented in this script and are then executed by the name server before or after the takeover.
For this purpose, the SAP HANA name server provides a Python-based API that is called at important points of the host auto-failover and the system replication takeover process.
There are a number of pre-takeover, post-takeover, and general hooks available.
These so-called "hooks" can be used for arbitrary operations that need to be executed. One of the most important uses of the failover hooks is moving around a virtual IP address (in conjunction with STONITH).
There are other purposes, such as starting tools and applications on certain hosts after failover, or even stopping DEV or QA SAP HANA instances on secondary sites before takeover. Multiple failover hooks can be installed and used in parallel with a defined execution order.
The failover hooks are included in SAP HANA. SAP HANA comes with its own Python interpreter, which is used for interpreting the user defined failover hooks. The failover hook API also has a version number.
You can adapt Python files delivered with SAP HANA to create your own HA/DR provider. This allows you to integrate, for example, SAP HANA failover mechanisms into your existing scripts.
To create your own HA/DR provider, use the HADRDummy.py script (located in the $DIR_SYSEXE/python_support/hdb_ha_dr directory) as a template for implementing SAP HANA failover mechanisms in your own scripts.
After implementing the basic HA/DR provider, you can add the methods listed in the table Hook Methods to your provider.
Hook Methods
Name | Trigger |
---|---|
startup() | Beginning of nameserver’s start up phase |
shutdown() | Just before the nameserver exits |
failover() | As soon as the nameserver made a decision about the new role |
stonith() | As soon as the nameserver made the decision about the new role |
preTakeover() | As soon as the hdbnsutil -sr_takeover command is issued |
postTakeover() | As soon as all services with a volume return from their assign-call (open SQL port) |
srConnectionChanged() | As soon as one of the replicating services loses or (re-) establishes the system replication connection |
srServiceStateChanged() | As soon as the nameserver made a decision about the new state |
srReadAccessInitialized() | As soon as a tenant database or the system database is ready to accept SQL read queries on a read enabled secondary system |
As an example, the srServiceStateChanged() HA/DR Provider Hook reports changed service states. It notices that an SAP HANA service is currently stopping or crashing. This knowledge can be used to reduce the takeover (detection) time, especially in systems with huge index servers.
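The general shape of a provider can be illustrated with a stand-alone class that uses two of the method names from the table. This is a simplified sketch and not the SAP HANA API: a real provider is based on the HADRDummy.py template, and the base class, method signatures, and return values must be taken from that template. The `**kwargs` parameters and the return value 0 used here are assumptions.

```python
class RecordingProvider(object):
    """Stand-in for an HA/DR provider that records which hooks fired.

    Assumption: a real provider derives from the base class used in
    HADRDummy.py and must use the method signatures defined there.
    """

    def __init__(self):
        self.events = []

    def preTakeover(self, **kwargs):
        # Example use from the text: stop DEV or QA instances on this site.
        self.events.append(("preTakeover", kwargs))
        return 0  # assumed to signal success

    def postTakeover(self, **kwargs):
        # Example use from the text: move the virtual IP address.
        self.events.append(("postTakeover", kwargs))
        return 0  # assumed to signal success
```

Recording the calls in a list makes it possible to exercise the hook logic outside SAP HANA before the actual actions are added.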
Note
The procedure for creating a HA/DR provider, and the available hook methods, are described in detail in the SAP HANA Administration Guide.
Takeover with Handshake
The takeover with handshake ensures that all of the sent redo log is written to disk on the secondary system.
During a planned takeover, it is important to ensure that no data gets lost (all primary updates must be available on the secondary system), and the former primary system is isolated to avoid a split-brain situation with multiple active primary systems.
The takeover with handshake is ideal for a safe planned takeover while the primary is still running. All new writing transactions on the primary system are suspended and the takeover is only executed when the redo log is available on the secondary system. When performing a takeover with handshake, it is not required to check the replication status or to stop the old primary before the takeover.
You can trigger a takeover with handshake using hdbnsutil -sr_takeover --suspendPrimary on the secondary system.
If a primary service cannot be accessed, or a service replication is not active or in sync, the takeover will be aborted and reported as an error. In this case, there is no impact on the system and the replication remains as it was. The suspended primary service can be unblocked using the hdbnsutil -sr_register command.
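A script that triggers the takeover with handshake should treat an aborted takeover as a normal outcome rather than a failure of the script. The following sketch wraps the command shown above; the function name is hypothetical, and it assumes that hdbnsutil reports the aborted takeover through a non-zero return code. The runner is passed in as a parameter so that the logic can be tried without an SAP HANA installation.

```python
import subprocess

HANDSHAKE_COMMAND = ["hdbnsutil", "-sr_takeover", "--suspendPrimary"]


def takeover_with_handshake(run=subprocess.call):
    """Run the handshake takeover on the secondary and report the outcome.

    Assumption: hdbnsutil signals an aborted takeover with a non-zero
    return code.
    """
    return_code = run(HANDSHAKE_COMMAND)
    if return_code != 0:
        return "aborted: replication unchanged, check primary and sync state"
    return "takeover completed"
```

On the secondary system, the function would be called without arguments so that the command is executed through `subprocess.call`.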