Performing Table Partitioning

Objective

After completing this lesson, you will be able to perform table partitioning tasks

Table Partitioning

Data Distribution in SAP HANA

In a multiple-host system, each index server is usually assigned to its own host for maximum performance. SAP HANA supports the following different ways of distributing data across the hosts:

  • A partitioned table splits its data in several blocks (partitions), and these partitions can be stored on different index servers.

  • Different tables can be assigned to different index servers.

  • A table can be replicated to multiple index servers, for better query and join performance.

When you create new tables, or partitions, they are distributed to the available hosts by the system. By default, a "round-robin" distribution method is used, but tables can also be positioned by using the table placement rules or by specifying a host and port number with the SQL CREATE TABLE statement in the location clause. This gives the developer complete control over the positioning of individual tables.

Specific applications may have predefined table distribution rules, and in some cases, configuration files and documentation are available in SAP Notes to help you to set up the necessary partitioning and table placement rules.

Introduction to Table Partitioning

The partitioning feature of the SAP HANA database splits column-store tables horizontally into disjunctive sub-tables or partitions. In this way, large tables can be broken down into smaller, more manageable parts.

Partitioning is only available for tables located in the column store. The row store does not support partitioning.

Hint

Partitioning is typically used in multiple-host systems, but it may also be beneficial in single-host systems.

Table partitioning is transparent for the application in a way that applications work properly with all partitioning strategies. Nevertheless, partitioning can have an impact on performance, so it can make a difference for the end user and the system load, both in a positive and negative way. To minimize the risk of performance regressions, it is important to implement a good partitioning strategy.

Partitioning is transparent for SQL queries and data manipulation language (DML) statements. The following additional data definition statements (DDL) for partitioning are available:

  • Perform the delta merge operation on certain partitions
  • Create table partitions
  • Re-partition tables
  • Merge partitions to one table
  • Add or delete partitions
  • Move partitions to other hosts

Note

After adding or removing hosts, it is recommended that you execute a redistribution operation. Based on its configuration, the redistribution operation suggests a new placement for tables and partitions in the system. If you confirm the redistribution plan, the redistribution operation redistributes the tables and partitions accordingly.

Tables can be partitioned using the range or hash method. The created partitions can be distributed to the available worker nodes.

When a table is partitioned, the split is done in such a way that each partition contains a different set of rows of the table. There are several alternatives available for specifying how the rows are assigned to the partitions of a table, for example, hash partitioning, round-robin partitioning, or partitioning by range.

Hash Partitioning

Hash partitioning is used to distribute rows to partitions equally for load balancing and to overcome the 2 billion row limitation. The number of the assigned partition is computed by applying a hash function to the value of a specified column. Hash partitioning does not require an in-depth knowledge of the actual content of the table.

For each hash partitioning specification, columns must be specified as partitioning columns. The actual values of these columns are used when the hash value is determined. If the table has a primary key, these partitioning columns must be part of the key. The advantage of this restriction is that a uniqueness check of the key can be performed on the local server. You can use as many partitioning columns as required to achieve a good variety of values for an equal distribution.

For more information about the SQL syntax for partitioning, see SAP HANA SQL and System Views Reference.

Round-Robin Partitioning

Round-robin partitioning is used to achieve an equal distribution of rows to partitions. However, unlike hash partitioning, you do not have to specify partitioning columns. With round-robin partitioning, new rows are assigned to partitions on a rotation basis. The table must not have primary keys.

Hash partitioning is usually more beneficial than round-robin partitioning for the following reasons:

  • The partitioning columns cannot be evaluated in a pruning step. Therefore, all partitions are considered in searches and other database operations.
  • Depending on the scenario, it is possible that the data in semantically-related tables resides on the same server. Some internal operations may then operate locally instead of retrieving data from a different server.

Range Partitioning

Range partitioning creates dedicated partitions for certain values or value ranges in a table. For example, a range partitioning scheme can be chosen to create a partition for each calendar month. Partitioning requires an in-depth knowledge of the values that are used or are valid for the chosen partitioning column.

Partitions may be created or dropped as needed and applications may choose to use range partitioning to manage data at a fine level of detail, for example, an application may create a partition for an upcoming month so that new data is inserted into that new partition.

Note

Range partitioning is not well suited for load distribution. Multi-level partitioning specifications address this issue.

When rows are inserted or modified, the target partition is determined by the defined ranges. If a value does not fit into one of these ranges, an error is raised. To prevent this, you can also define an 'others' partition for any values that do not match any of the defined ranges. The 'others' partitions can be created or dropped on-the-fly as required.

Range partitioning is similar to hash partitioning in that the partitioning column must be part of the primary key. Many data types are supported for range partitioning. See the list of data types in Partitioning Limits for the complete list.

Multi-Level Partitioning

Multi-level partitioning can be used to overcome the limitation of single-level hash partitioning and range partitioning, that is, the limitation of only being able to use key columns as partitioning columns. Multi-level partitioning makes it possible to partition by a column that is not part of the primary key.

Explicit Partition Handling for Range Partitioning

For all partitioning specifications involving range, it is possible to have additional ranges added and removed as necessary. This means that partitions are created and dropped as required by the ranges in use. In the case of multi-level partitioning, the desired operation is applied to all relevant nodes.

Note

If a partition is created and an others partition exists, the rows in the others partition that match the newly-added range are moved to the new partition. If the others partition is large, this operation may take a long time. If an others partition does not exist, this operation is fast as only a new partition is added to the catalog.

Range partitioning requires at least one range to be specified regardless of whether or not there is an others partition. When partitions are dropped, the last partition created cannot be dropped even if an others partition exists.

For range-range partitioning you have to specify whether a partition must be added or dropped on the first or second level by specifying the partitioning column.

Caution

The DROP PARTITION command deletes data. It does not move data to the others partition.

Time Selection Partitioning (Aging)

The SAP HANA database offers a special time selection partitioning scheme, also called "aging". Time selection or aging allows SAP Business Suite application data to be horizontally partitioned into different temperatures like hot and cold.

SAP Business Suite ABAP applications can use aging, which must not be used for customer or partner applications, to separate hot (current) data from cold (old) data. This is done by using time selection partitioning to perform the following actions:

  • Create partitions and re-partition
  • Add partitions
  • Allocate rows to partitions
  • Set the scope of Data Manipulation Language (DML) and Data Query Language (DQL) statements

Setting the DML and DQL scope is the most important aspect of time selection partitioning. It uses a date to control how many partitions are considered during SELECT, CALL, UPDATE, UPSERT, and DELETE. This date may be provided by the application with a syntax clause and it restricts the number of partitions that are considered.

Caution

Tables with time selection partitioning cannot be converted into any other kind of tables using ALTER TABLE.

Partitioning Advantages

    Caution

    Before a table is partitioned or re-partitioned, a delta merge operation is executed. Therefore, in the case of huge tables, you must partition them in good time so as not to run out of memory during the merge operation.

    Explicit partition handling

    In some cases, it can be useful that the application controls the creation and existence of partitions based on specific criteria, for example, by adding partitions to store the data for an upcoming month.

    Examples of Using Table Partitioning with SQL

    Tables can be partitioned, repartitioned, or merged using SQL statements.

    In the figure, Table Partitioning Examples, the table HA201_DEMO_TABLE is created with three partitions. The single-level partitioning specification is HASH on column Hay.

    The above example creates a new table that is partitioned during creation. In real life, it will also happen that you need to merge one or more partitioned tables, or partition an existing table. The use cases may be that the table is nearing the 2 billion records limit, or the query performance is not optimal. Using the mentioned SQL statements, you can merge partitioned tables, or partition existing tables in the SAP HANA database.

    Repartitioning

    There is no automatic repartitioning when threshold values are exceeded. Instead, this is proposed the next time the redistribution process is executed.

    The values entered for partitioning must be consistent with the physical landscape, especially the number of server nodes available.

    If repartitioning is necessary, tables are only repartitioned by doubling the number of existing (initial) partitions. This is done for performance reasons. The maximum number of (first-level) partitions reached by that process is defined by parameter global.ini > [table_placement] > max_partitions (default: 12).

    By default, the system does not create more partitions than the number of available hosts (or more specifically, possible locations). For example, if INITIAL_PARTITIONS is set to 3, but the distributed SAP HANA database has five possible locations, repartitioning from three to six partitions does not take place. A table can have more than one partition per host if the parameter global.ini > [table_placement] > max_partitions_limited_by_locations is set to false (default: true). This rule is disregarded if a higher number of first level partitions is required to partition groups with more than 2 billion records (global.ini > [table_placement] > max_rows_per_partition, default = 2,000,000,000).

    Note

    SAP Note: 2044468 - "FAQ: SAP HANA Partitioning" provides detailed information on SAP HANA partitioning.

    Table Distribution Editor: Additional Actions

    If a table is distributed to several partitions, it displays the host that stores each of these partitions. Existing partitions can be moved to different hosts by generating a specific redistribution plan. You can also balance table distribution after adding new hosts to the system. Check, optimize, compress, defrag, load table, delta merge, and evaluate repartitioning of tables that are not partitioned to other hosts as well.

    Tables can be partitioned, repartitioned, or merged using the table redistribution plan generator in SAP HANA cockpit.

    Note

    Before moving tables or partitions, the system checks that the host has sufficient memory.

    Changing how tables are distributed across hosts is a critical operation. Back up the landscape before executing a redistribution operation.

    Best Practices for Table Partitioning

    To create an optimal partitioning plan, you should try to follow the table partitioning best practices in the figure, Best Practices Table Partitioning.

    Follow the best practices for table partitioning, for example, keep tables, partitions, and key columns low.
    • Keep the number of partitioned tables low

      Only partition tables if you see a clear benefit without significant regressions.

    • Keep the number of partitions per table low

      An unnecessarily high amount of partitions result in overhead because some queries may have to access all partitions to find the data:

      • A high amount of network channels are opened and so the system is at risk of reaching the max channels limitation (SAP Note: 2222200) and running into network-related terminations.

      • Certain operations like the determination of column statistics (SAP Note: 2114710) have to be performed individually for each partition.

      So, consider the following general rules before defining a certain number of partitions:

      • If you partition tables due to the 2 billion limit, it is usually acceptable if individual partitions contain up to 1.5 billion records (less if you expect a significant future growth).

      • If you partition by date, you should avoid using granular ranges (such as days or weeks) resulting in a high amount of partitions.

      • If you use a RANGE partition on columns with data that is not evenly distributed (such as a number range column with multiple different number ranges), you should check the actual value distribution and define the range limits accordingly.

    • Keep the number of key columns low

      As few partition key columns as possible. It is useful to keep the number of partition key columns to a minimum for the following reasons:

      • Partition pruning of hash partitions can only be used if all underlying partitioning columns are specified with "=" or "IN" in the WHERE clause.

      • Determining partition pruning can be quite time consuming if many partition keys are involved.

        For more information, see SAP Note: 2000002 "What are typical approaches to tune expensive SQL statements?" and "Execution time higher than expected, negative impact by existing partitioning".

      In the case of hash partitioning, it is often useful to use only the most selective primary key column as the partition key column.

    • For SAP Suite on HANA, keep all partitions on same host

      In scale-out SAP Suite on HANA environments, it is advantageous to keep all partitions of a table on the same host. As of SPS08, this can achieved with an appropriate table placement configuration.

      As a fallback option, you can use a dummy first-level partitioning (for example, on MANDT) and perform the actual partitioning at the second level. In this case, all partitions are located on the same host.

    • Repartitioning rules

      When repartitioning, choose the new number of partitions as a multiple or divider of current number of partitions.

      If a table is already partitioned, it is most efficient to choose a new number of partitions that is a factor 2 multiple or divider of the current number of partitions (such as 4 -> 8 or 6 -> 3 partitions). Only in this case can the repartitioning happen in parallel on different partition groups and hosts ("parallel split/merge").

    • Avoid unique constraints

      When creating partitions, try to avoid creating additional unique constraints. Avoid partitioning tables with additional unique constraints (such as a unique secondary index), as the uniqueness checks impose significant overhead.

      Note

      SAP Note: 2000002 gives insight into SAP HANA SQL optimization, and describes symptoms that can be introduced by inadequate partitioning.

    Table Partitioning Monitoring Views

    The M_CS_PARTITIONS system view provides partition information about column tables. TABLE_NAME and SCHEMA_NAME columns are highlighted.

    The M_CS_PARTITIONS system view provides partition information of column tables.

    Code Snippet
    1
    select * from "M_CS_PARTITIONS" where "TABLE_NAME" = 'sap.hana.democontent.epm.data::PO.Item';

    The output shows the number of partitions. In the example of the figure, SQL Editor: Show Partitioned Table Information, we have three partitions.

    The M_CS_TABLES system view provides run time data for column tables or partitions of column tables.

    Code Snippet
    1
    select * from "M_CS_PARTITIONS" where "TABLE_NAME" = 'sap.hana.democontent.epm.data::PO.Item';

    The output shows which host the partition is located on, and how much memory is consumed by the table.

    The M_EFFECTIVE_TABLE_PLACEMENT system view provides information on the table placement location. This view also contains information about the partitioning thresholds. You can see the valid location(s) according to the configuration, the actual values for each partitioning parameter, and in the corresponding _MATCH columns, the reason (matching rule) for those.

    Use the table distribution application in SAP HANA cockpit to view the current table distribution.

    The table information is also available in SAP HANA Cockpit. In the View Current Table Distribution application, search for the required table and select it. In the pop-up, select the Show Runtime Data option. On the Runtime Data screen, the table definition is shown by default. Select the Partitions button to get an overview of how the table is partitioned and where the partitions are located. The partition range is also displayed.

    Table Consistency Checks

    To ensure consistency for partitioned tables, execute checks and repair statements, if required.

    You can call general and data consistency checks for partitioned tables to check, for example, that the partition specification, metadata, and topology are correct.

    Partitioning Consistency Check and Repair

    • General check: Consistency check

      CALL CHECK_TABLE_CONSISTENCY('CHECK_PARTITIONING', '<schema>', '<table>’)

    • Data check: General check plus check whether all rows are located in correct parts

      CALL CHECK_TABLE_CONSISTENCY('CHECK_PARTITIONING_DATA', '<schema>', '<table>’)

    • Repairing rows that are located in incorrect parts

      CALL CHECK_TABLE_CONSISTENCY('REPAIR_PARTITIONING_DATA', '<schema>', '<table>')

    Note

    The data checks can take a long time to run, depending on the data volume.

    Log in to track your progress & complete quizzes