Describing SAP HANA Scale-Out Systems

Objective

After completing this lesson, you will be able to describe SAP HANA scale-out systems

Introduction to SAP HANA Scale-Out Systems

Business Example

As an SAP HANA database administrator, you need to install and administer your company’s high-availability scale-out SAP HANA systems. You need to understand the basic concepts behind SAP HANA scale-out technology.

Scaling SAP HANA

Scaling the Data

One technique you can use to deal with planned data growth is to purchase more physical RAM than is initially required, set the allocation limit according to your needs, and then increase it over time as your data grows. Once you have reached the physical limits of a single server, you can scale out over multiple machines to create a distributed SAP HANA system. You can do this by distributing different schemata and tables to different servers (complete data and user separation). However, this is not always possible, for example, when a single fact table is larger than the server's RAM.
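As a minimal sketch of adjusting the allocation limit over time, the global allocation limit can be raised with an SQL console statement. The value below is a hypothetical example (the parameter is specified in MB, so 921600 MB corresponds to 900 GB); the appropriate value depends on your hardware and sizing:

    -- Raise the global allocation limit (value in MB; 921600 MB = 900 GB is an illustrative value)
    ALTER SYSTEM ALTER CONFIGURATION ('global.ini', 'SYSTEM')
      SET ('memorymanager', 'global_allocation_limit') = '921600'
      WITH RECONFIGURE;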

The most important strategy for scaling your data is data partitioning. Partitioning supports the creation of very large tables (billions of rows) by breaking them into smaller chunks that can be placed on different machines. Partitioning is transparent for most SQL queries and other data manipulations.
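A minimal sketch of hash partitioning in SAP HANA SQL, assuming a hypothetical SALES_FACT table in a three-host system; the host name and port in the move statement are placeholders for your own landscape:

    -- Create a column table split into three hash partitions
    CREATE COLUMN TABLE SALES_FACT (
      ORDER_ID   BIGINT,
      SALES_YEAR INTEGER,
      AMOUNT     DECIMAL(15,2),
      PRIMARY KEY (ORDER_ID)
    ) PARTITION BY HASH (ORDER_ID) PARTITIONS 3;

    -- Optionally relocate one partition to another host (placeholder host and index server port)
    ALTER TABLE SALES_FACT MOVE PARTITION 2 TO '<hostname>:<indexserver_port>';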

Scaling Performance

SAP HANA’s performance is derived from its efficient, parallel approach. The more computation cores your SAP HANA server has, the better the overall system performance.

Scaling performance requires a more detailed understanding of your workload and performance expectations. Using simulations and estimations of your typical query workloads, you can determine the expected load that a typical SAP HANA installation can comfortably manage. At the workload level, a rough prediction of scalability can be established by measuring the average CPU utilization while the workload is running. For example, an average CPU utilization of 45% may indicate that the load on the system can roughly double before individual query response times degrade significantly.
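As a back-of-the-envelope check of that reasoning, dividing 100% by the measured average utilization gives the rough load factor; for the 45% example this can even be computed directly on the database:

    -- Rough headroom estimate: 100 / 45 ≈ 2.2, i.e. roughly 2x load before saturation
    SELECT ROUND(100 / 45.0, 1) AS load_headroom_factor FROM DUMMY;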

Scaling the Application

Partitioning can be used to scale the application, as it supports an increasing number of concurrent sessions and complex analytical queries by spreading the calculations across multiple hosts. Particular care must be taken to distribute the data so that the majority of queries match the partition pruning rules. This accomplishes two goals: directing different users to different hosts (load balancing) and avoiding the network overhead of frequent data joins across hosts.
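A minimal sketch of how partition pruning can route a query to a single host, assuming a hypothetical table range-partitioned by year, with each year's partition placed on a different host:

    -- Range partitioning so that queries filtered by year touch only one partition
    CREATE COLUMN TABLE SALES_BY_YEAR (
      ORDER_ID   BIGINT,
      SALES_YEAR INTEGER,
      AMOUNT     DECIMAL(15,2)
    ) PARTITION BY RANGE (SALES_YEAR)
      (PARTITION 2022 <= VALUES < 2023,
       PARTITION 2023 <= VALUES < 2024,
       PARTITION OTHERS);

    -- The year predicate matches the pruning rule, so only one partition (one host) is read
    SELECT SUM(AMOUNT) FROM SALES_BY_YEAR WHERE SALES_YEAR = 2023;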

Scaling Hardware

SAP HANA is offered in a number of ways – as an on-premise appliance delivered in different configurations and "sizes" by certified hardware partners, via the tailored data center integration model, and as part of a cloud-based service. This creates different system design options with respect to scale-up and scale-out variations. To maximize performance and throughput, SAP recommends that you scale up as far as possible (acquire the configuration with the highest processor and memory specification for the application workload), before scaling out (for deployments with even greater data volume requirements).

Note

SAP HANA hardware partners have different building blocks for their scale-out implementations. Therefore, you should always consult with your hardware partner when planning your scale-out strategy.

Introducing High-Availability in an SAP HANA system

In the figure, Scale-up vs Scale-out, the SAP HANA systems are only increased in size, from 1 TB to 12 TB. In both scenarios, scale-up and scale-out, no high availability is introduced. High availability can only be introduced in a scale-out setup with the inclusion of standby nodes. A scale-out configuration with high availability is shown in the figure High Availability and Scale-out. One or more SAP HANA nodes can be configured as standby nodes. A standby node automatically takes over the operations of a failed host using the SAP HANA host auto-failover feature.

Two scale-out systems: one without a standby node, which is not highly available, and one with a standby node, which is highly available.

As soon as you introduce standby nodes in an SAP HANA scale-out configuration, you reserve resources for the event of a failure. These resources cannot be used in the active system. In the figure High Availability and Scale-out, one host is defined as the standby node. This means that the system now has only 11 active nodes, and the total database size is reduced from 12 TB to 11 TB.
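To see which hosts act as workers and which as standby, and whether a failover has occurred, you can query the landscape configuration. A sketch using the SYS.M_LANDSCAPE_HOST_CONFIGURATION monitoring view:

    -- Configured vs. actual roles per host; a standby host shows INDEXSERVER_CONFIG_ROLE = 'STANDBY'
    SELECT HOST, HOST_ACTIVE, HOST_STATUS,
           INDEXSERVER_CONFIG_ROLE, INDEXSERVER_ACTUAL_ROLE,
           NAMESERVER_CONFIG_ROLE,  NAMESERVER_ACTUAL_ROLE
      FROM SYS.M_LANDSCAPE_HOST_CONFIGURATION;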

A scale-up configuration, by default, has no high availability capabilities, because a scale-up system consists of a single server. A scale-up system can be made highly available by adding an additional standby host to the SAP HANA system, or by setting up a configuration using storage replication or system replication.

Multiple-host (Distributed) Systems

An SAP HANA system can comprise multiple isolated databases and may consist of a cluster of several hosts. This is referred to as a multiple-host, distributed system, or scale-out system, and supports scalability and availability.

An SAP HANA system is identified by a single system ID (SID) and contains one or more tenant databases and one system database. Databases are identified by an SID and a database name. From the administration perspective, there is a distinction between tasks performed at the system level and those performed at the database level. Database clients, such as the SAP HANA cockpit, connect to specific databases.
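From the system database, you can list the databases that belong to the system. A minimal sketch using the SYS.M_DATABASES view:

    -- Run in the system database: one row per database in the system
    SELECT DATABASE_NAME, DESCRIPTION, ACTIVE_STATUS
      FROM SYS.M_DATABASES;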

A scale-out system. The data and log volumes of the services that store data reside on shared storage. On the standby host, services wait to connect to the data and log volumes of failed services.

A host is a machine that runs parts of the SAP HANA system. The machine is comprised of CPU, memory, storage, network, and an operating system.

An SAP HANA instance is the set of components of a distributed system that are installed on one host. The figure High Available Scale-out System shows a distributed system that runs on four hosts. In this example, each instance has an index server and a name server.

One or more hosts can be configured to work in standby mode, so that if an active host fails, a standby host automatically takes its place. The index servers on standby hosts do not contain any data and do not receive any requests.

The index server contains all the database and processing components. Each index server is a separate operating system process and it also has its own disk volumes. When processing database operations, index servers may need to forward the execution of some operations to other servers that own the data involved in the operation.

In each SAP HANA system, there is one primary index server. It stores the metadata and contains the transaction manager that coordinates distributed transactions involving multiple index servers.
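You can check which service on which host currently acts as the coordinating (master) service. A sketch using the SYS.M_SERVICES monitoring view:

    -- Services per host; COORDINATOR_TYPE = 'MASTER' marks the coordinating service
    SELECT HOST, PORT, SERVICE_NAME, COORDINATOR_TYPE, ACTIVE_STATUS
      FROM SYS.M_SERVICES;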

Database clients can send their requests to any index server. If the contacted index server does not own all the data involved, it delegates the execution of some operations to other index servers, collects the result, and returns it to the database client.

In a distributed system, a central component is required that knows the topology and how data is distributed. This component is the name server. The name server knows which tables, table replicas, or partitions of tables are located on which index server.
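The distribution information maintained by the name server can also be inspected through monitoring views. As a sketch, the following query shows which host holds which partition of the hypothetical SALES_FACT table used earlier, using SYS.M_CS_TABLES:

    -- One row per column-store table partition, with the owning host and index server port
    SELECT HOST, PORT, SCHEMA_NAME, TABLE_NAME, PART_ID
      FROM SYS.M_CS_TABLES
     WHERE TABLE_NAME = 'SALES_FACT';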

When processing a query, the index servers ask the name server about the locations of the involved data. To prevent a negative impact on performance, the topology and distribution information is replicated and cached on each host. In each SAP HANA system, there is one primary name server that owns the topology and distribution data. This data is replicated to all other name servers, called secondary name servers. The secondary name servers write the replicated data to a cache in shared memory from where the index servers of the same instance can read it.

The primary name server has its own persistence where it stores name server data (topology and distribution data). The secondary name servers have no persistence because they are only holding replicated data.
