Examining the Failure of an SAP HANA Worker Node

Objective

After completing this lesson, you will be able to identify the findings during a failure of a worker node.

Failure of a Worker Node

Business Example

As an SAP HANA database administrator, you need to understand the SAP HANA host auto-failover concept. To better understand this feature, you need to have hands-on experience with a worker node failing in a multi-host SAP HANA system.

Failure of an SAP HANA Node

Host auto-failover is a local fault recovery solution that can be used in addition to, or as an alternative to, system replication. One or more standby hosts are added to an SAP HANA system and configured to work in standby mode. While in standby mode, the databases on these hosts do not contain any data and do not accept requests or queries, which also means that they cannot be used for other purposes, such as quality assurance or test systems.

When a primary (worker) host fails, a standby host automatically takes its place. If neither the name server process hdbnameserver nor the hdbdaemon process responds to network requests (for example, because the instance is stopped or the operating system has been shut down or powered off), the host is marked as inactive and an auto-failover is triggered. Since the standby host may take over operation from any of the primary hosts, it needs shared access to all the database volumes. This can be accomplished by a shared, networked storage server, by using a distributed file system, or with vendor-specific solutions that use an SAP HANA programmatic interface, the Storage Connector API, to dynamically detach and attach (mount) networked storage upon failover.

To ensure data consistency at all times, it must be guaranteed that a failover does not happen (or at least does not succeed and cannot corrupt data) if the failed host can potentially still write data. To achieve this, SAP HANA host auto-failover uses a combination of heartbeats and fencing.

Heartbeat

The following types of heartbeat are used to check if another host is active as the coordinator before starting the current host as the coordinator or performing a failover:

  • TCP communication-based heartbeats:

    • Ping from name server to name server using the SAP HANA internal communication protocol

    • Ping from name server to hdbdaemon using the SAP HANA internal communication protocol

  • Storage-based heartbeats:

    The current coordinator name server periodically updates heartbeat files located on different storage partitions:

    • Shared storage for the SAP HANA binaries

    • Storage partition 1 for the coordinator node’s data

    These types of storage are typically connected through networks other than the inter-node network used for service-to-service communication (such as Fibre Channel for SAN storage or a dedicated Ethernet network for NFS), so these heartbeats provide an additional, independent check. A sketch of the idea is shown after this list.
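
The following is a minimal Python sketch of the idea behind the storage-based heartbeat: the current coordinator periodically rewrites small files with its host name and a random token so that other hosts can detect whether it is still writing. The file locations, the atomic-rename pattern, and the token format are illustrative assumptions, not the actual SAP HANA implementation; only the 10-second update interval and the hostname-plus-random-string content come from this lesson.

```python
import os
import socket
import time
import uuid

# Illustrative locations only; the real heartbeat files live on the shared
# storage for the SAP HANA binaries and on storage partition 1 of the
# coordinator's data.
HEARTBEAT_FILES = [
    "/hana/shared/heartbeat",        # shared storage for the SAP HANA binaries
    "/hana/data/mnt00001/heartbeat", # storage partition 1 (coordinator data)
]
INTERVAL_SECONDS = 10  # the coordinator updates the files every 10 seconds


def write_heartbeats():
    """Periodically rewrite each heartbeat file with hostname plus a random token."""
    while True:
        payload = "{0} {1}\n".format(socket.gethostname(), uuid.uuid4().hex)
        for path in HEARTBEAT_FILES:
            tmp_path = path + ".tmp"
            with open(tmp_path, "w") as f:
                f.write(payload)
                f.flush()
                os.fsync(f.fileno())   # make sure the update reaches the storage
            os.replace(tmp_path, path) # atomic rename so readers never see a partial file
        time.sleep(INTERVAL_SECONDS)


if __name__ == "__main__":
    write_heartbeats()
```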

Fencing

In rare cases, the heartbeats cannot determine whether another host is still alive, for example in split-brain situations where no communication is possible between the hosts. I/O fencing ensures that the presumably failed host can no longer access the data or log storage.

The SAP HANA Storage Connector API, together with a specific Storage Connector, allows the use of the following types of storage and network architecture to ensure proper I/O fencing:

  • SAN storage: the SAP HANA Fiber Channel Storage Connector [2] using SCSI-3 persistent group reservations (SCSI-3 PGR).

  • NFSv3: used without file locking, but with a Storage Connector provided by certified storage vendors. This type of Storage Connector implements a Shoot The Other Node In The Head (STONITH) call to reboot a failed host.

    If an NFSv3 client (that is, the SAP HANA server) dies, the file locks are not released on the NFS server side, resulting in a deadlock for any host that wants to access these files. Using the nolock mount option solves the locking problem, but with this option, the data is not protected against parallel reading and writing from different hosts. To solve this, STONITH must be implemented (a sketch of the idea follows this list).

  • NFSv4 or cluster file systems such as GPFS: used with file locks. A Storage Connector is not required here because these file locks reliably prevent access by the failed host. However, a STONITH-type Storage Connector is provided by some storage vendors to speed up the failover.
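
To make the STONITH idea more concrete, the following purely illustrative Python sketch shows what a stonith() call in an NFSv3-oriented Storage Connector can boil down to: rebooting the failed host through an out-of-band management interface so that it can no longer write to the shared storage. The use of ipmitool, the host-to-BMC mapping, and the credentials are assumptions for illustration; real connectors are supplied by certified storage or hardware vendors.

```python
import subprocess

# Hypothetical mapping from SAP HANA host names to their out-of-band
# management (BMC) addresses; a real connector would read this from its
# own configuration and credential store.
BMC_ADDRESSES = {
    "hanahost01": "10.0.0.101",
    "hanahost02": "10.0.0.102",
}


def stonith(failed_host):
    """Reboot the failed host so that it can no longer write to the shared NFSv3 storage.

    Purely illustrative: here the reboot is triggered by power cycling the
    host via IPMI; vendor connectors may use entirely different mechanisms.
    """
    bmc_address = BMC_ADDRESSES[failed_host]
    subprocess.check_call([
        "ipmitool", "-I", "lanplus",
        "-H", bmc_address,
        "-U", "admin", "-P", "changeme",  # illustrative credentials only
        "power", "cycle",
    ])
```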

Review the SAP HANA Multi-Host Configuration from the Command Line

Figure: Output of the Python script landscapeHostConfiguration.py in a terminal window.

The SAP HANA multi-host configuration can also be viewed at the operating system level. There is a Python script called landscapeHostConfiguration.py in the $DIR_INSTANCE/exe/python_support folder. Running the script as shown in the previous figure provides an overview of the configuration.

In addition to the columns visible in the SAP HANA cockpit 2.0 view, this script shows the following host columns:

  • STORAGE_CONFIG_PARTITION / Storage Partition (Configured - new in SPS 12): The stable sub-path to reassign the same storage partition after failovers.

  • WORKER_CONFIG_GROUPS / Worker Groups (Configured – new in HANA 2 SPS 00): The stable classification values to assign hosts to logical worker groups.

  • WORKER_ACTUAL_GROUPS / Worker Groups (Actual – new in HANA 2 SPS 00): The current classification values to assign hosts to logical worker groups.

The return code of the script can be consumed by cluster managers (for example, for SAP HANA system replication) to decide on the system health state, as follows:

  • 0 = Fatal. For example, database offline.

  • 1 = Error. For example, a failover did not happen, because there was no standby host available.

  • 2 = Warning. For example, a failover is possible.

  • 4 = OK.

  • 5 = Ignore. For example, the system has switched roles (failover), but is fully functional.

A return code >= 4 indicates normal system operation. The script can also be run while the system is stopped, but it then fills only a subset of the columns.
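
As an illustration of how a cluster manager or monitoring script might consume this return code, the following Python sketch runs landscapeHostConfiguration.py from the python_support folder and maps the exit code to the health states listed above. The return-code mapping and the rule that codes >= 4 indicate normal operation come from this lesson; the invocation through the DIR_INSTANCE environment variable and the surrounding logic are assumptions.

```python
import os
import subprocess
import sys

# Return codes as documented in this lesson.
HEALTH_STATES = {
    0: "Fatal (for example, the database is offline)",
    1: "Error (for example, a failover did not happen because no standby host was available)",
    2: "Warning (for example, a failover is possible)",
    4: "OK",
    5: "Ignore (roles have switched after a failover, but the system is fully functional)",
}


def check_landscape():
    """Run landscapeHostConfiguration.py and report the resulting health state."""
    script = os.path.join(
        os.environ["DIR_INSTANCE"], "exe", "python_support",
        "landscapeHostConfiguration.py",
    )
    # Assumes execution in the <sid>adm environment, where DIR_INSTANCE
    # and a Python interpreter are available.
    code = subprocess.call(["python", script])
    print("Return code {0}: {1}".format(code, HEALTH_STATES.get(code, "unknown")))
    if code >= 4:
        print("Normal system operation.")
    else:
        print("Degraded or failed state - check the host roles and failover status.")
    return code


if __name__ == "__main__":
    sys.exit(check_landscape())
```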

Host Failure Detection

Figure: Failure detection rules for worker hosts and coordinator hosts.

A host failure is any dysfunctional state of a host that affects the communication between the hosts of a distributed SAP HANA system. To check the functional state of a host, the name servers regularly send a ping over the internal network communication layer to the name servers on the other hosts. An additional ping to the hdbdaemon process is executed if the remote name server repeatedly does not reply. Only when both services fail to reply in time is the host considered to have failed.

A crash of a single service does not trigger a failover, because services are normally restarted by the hdbdaemon. If a service is not able to restart for any reason, it is assumed that it would not be able to start on another host either.

An exception is the name server aborting itself during startup when the storage connector returns an error. In this case, it instructs the hdbdaemon to shut down the whole database instance on the host, including the hdbdaemon itself, which allows failure detection and failover processing by the other hosts.

Checking Worker Hosts

  • The name server communication heartbeat: The current coordinator name server pings all other name servers every 10 seconds. If a name server was active and five pings have failed (either immediately or after a 60-second ping timeout), the name server is considered inactive. By pinging multiple times, SAP HANA can recover from short network outages without triggering a failover.

  • The hdbdaemon communication heartbeat: If a worker name server was considered inactive (or had set itself to inactive), the coordinator name server pings the worker hdbdaemon process. If the hdbdaemon ping fails (either immediately or after a 60-second ping timeout), the host is considered inactive and a failover is initiated. The sketch after this list models these two checks.
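
The following Python sketch models these two checks as simplified pseudocode: five failed name server pings, each failing either immediately or after the 60-second timeout, followed by a single hdbdaemon ping before the host is declared inactive. The ping callables are placeholders for the internal SAP HANA communication, not a real API.

```python
import time

PING_INTERVAL = 10    # the coordinator pings every worker name server every 10 seconds
PING_TIMEOUT = 60     # a single ping fails immediately or after a 60-second timeout
MAX_FAILED_PINGS = 5  # five failed pings mark a previously active name server as inactive


def worker_host_failed(host, ping_nameserver, ping_hdbdaemon):
    """Model of the two-stage worker host check.

    ping_nameserver and ping_hdbdaemon stand in for the internal SAP HANA
    communication; they must return True when the pinged service answers
    within the given timeout.
    """
    failed_pings = 0
    while failed_pings < MAX_FAILED_PINGS:
        if ping_nameserver(host, PING_TIMEOUT):
            return False              # an answer means the name server is still active
        failed_pings += 1
        time.sleep(PING_INTERVAL)
    # The name server is considered inactive; confirm with the hdbdaemon heartbeat.
    if ping_hdbdaemon(host, PING_TIMEOUT):
        return False                  # only the name server is down; no failover yet
    return True                       # both checks failed; the host is inactive, failover starts
```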

Checking the Coordinator Host

  • The name server communication heartbeat: Name server candidates that are not currently the coordinator ping the candidates with lower priority every 10 seconds. Together with the worker name server heartbeat described earlier (the current coordinator name server pings all other name servers), normally COORDINATOR 1 pings COORDINATOR 2 and COORDINATOR 3, and COORDINATOR 2 pings COORDINATOR 3. If a coordinator candidate does not receive any ping within 30 seconds, it pings the coordinator name server itself.

  • The hdbdaemon communication heartbeat: If the ping to the coordinator name server fails, the hdbdaemon process on the coordinator host is pinged. If the hdbdaemon does not answer within 60 seconds, the current coordinator host is considered inactive.

  • The name server storage heartbeat: The name server candidate host checks the heartbeat files for changes for a period of 60 seconds. These files are updated by the current coordinator name server every 10 seconds with the hostname and a random string. A failover begins only if none of the files shows any change for 60 seconds (see the sketch after this list).
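
Complementing the heartbeat-writer sketch shown earlier, the following Python sketch models the reading side of the storage heartbeat: a name server candidate watches the heartbeat files for 60 seconds and treats the coordinator as failed only if none of them changes. The file locations and the polling loop are again illustrative assumptions.

```python
import time

# Same illustrative heartbeat file locations as in the writer sketch above.
HEARTBEAT_FILES = [
    "/hana/shared/heartbeat",
    "/hana/data/mnt00001/heartbeat",
]
CHECK_WINDOW = 60   # the candidate watches the files for 60 seconds
POLL_INTERVAL = 10  # the coordinator rewrites them every 10 seconds


def read_heartbeats():
    """Read the current content (hostname plus random string) of each heartbeat file."""
    contents = {}
    for path in HEARTBEAT_FILES:
        try:
            with open(path) as f:
                contents[path] = f.read()
        except OSError:
            contents[path] = None
    return contents


def coordinator_storage_heartbeat_failed():
    """Return True only if no heartbeat file changed during the whole check window."""
    initial = read_heartbeats()
    deadline = time.monotonic() + CHECK_WINDOW
    while time.monotonic() < deadline:
        time.sleep(POLL_INTERVAL)
        if read_heartbeats() != initial:
            return False   # at least one file changed; the coordinator is still writing
    return True            # no change in any file for 60 seconds; a failover may begin
```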

Worker Host Failover to a Standby Host

When a failure is detected and a replacement host is determined, the actual failover process starts.

Figure: The scale-out system state before the failure and the state after the failover.

The previous figure is a visualization of a worker host failover to a standby host. On the left, the original state of the system is shown. On the right, the second host fails and its role is moved to the fourth host.

Failover step-by-step:

  1. Target host selection

    • If there is a standby host whose actual host roles exactly match those of the failing host, it is used.

    • If there is a standby host with one of the roles of the failing host, it is used.

    • If the failing host has an SAP HANA worker role, any unassigned standby host is used.

  2. The coordinator name server calls the stonith() method of all installed HA/DR provider hooks and the Storage Connector stonith() method. Typically, the stonith() method is only implemented in NFSv3-related Storage Connectors and reboots the failed host (a sketch of an HA/DR provider hook follows this procedure).

    Note

    If STONITH fails, failover is aborted and all hosts remain in their old roles.

  3. Swap actual services, host roles, storage partition number, and volume IDs of all services between both hosts in the topology and inform all other hosts.

  4. The coordinator name server (which selected the replacement host) calls the name server on the target host to perform the failover.

  5. The host that was promoted to a new role calls the Storage Connector's attach() method to acquire the correct storage partition (if applicable) and calls the failover() method of all installed HA/DR provider hooks.

    Note

    If this fails, the host stops. If there are still standby hosts available, another failover is triggered and this host is set to ERROR.

  6. Reconfigure running standby services to load their newly assigned volume.

    Note

    If this fails, this is like a service failure and does not initiate a further failover.

  7. Reconfigure hdbdaemon to start/stop services that should run on only one of the two hosts.

    Note

    If this fails, this is like a service failure and does not initiate a further failover.

The coordinator name server is the only entity in the whole system that can select a failover target host. Since the coordinator has mechanisms to avoid split-brain situations, a split-brain situation is conceptually not possible for worker hosts. If a worker host loses its connection to the coordinator name server, it waits and is notified by the new coordinator. If a worker host cannot connect to a coordinator during startup, it terminates itself.
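
Steps 2 and 5 above call the stonith() and failover() methods of installed HA/DR provider hooks. The following is a minimal sketch of such a hook, assuming the hdb_ha_dr.client.HADRBase base class that SAP HANA provides for Python HA/DR providers; the method signatures and tracer calls are simplified assumptions, and a real hook must additionally be registered in global.ini before the name server loads it.

```python
"""Minimal, illustrative sketch of an SAP HANA HA/DR provider hook.

Consult the SAP HANA administration documentation for the exact HA/DR
provider interface before implementing a real hook.
"""
from hdb_ha_dr.client import HADRBase


class ExampleFailoverHook(HADRBase):

    def about(self):
        return {"provider_company": "Example",
                "provider_name": "ExampleFailoverHook",
                "provider_description": "Illustrative hook for worker host failover",
                "provider_version": "1.0"}

    def stonith(self, failing_host, **kwargs):
        # Called in step 2 of the failover sequence: fence the failed host,
        # for example by triggering a reboot through a vendor interface.
        self.tracer.info("stonith() called for host {0}".format(failing_host))
        return 0  # 0 signals success

    def failover(self, **kwargs):
        # Called in step 5 on the host that takes over the new role,
        # for example to adjust virtual IP addresses or notify monitoring.
        self.tracer.info("failover() called on the replacement host")
        return 0
```

Returning a non-zero value from stonith() would signal a failure; as noted in step 2 above, a failed STONITH call aborts the failover and all hosts remain in their old roles.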
