In-Memory Column Tables

The SAP HANA Cloud database stores data in memory using either a column-based or row-based table. Application builders can use column tables for almost every use case.

Advantages of Column-Based Storage

Column tables have several advantages:

  • Higher data compression rates: In-memory columnar data storage natively uses compression. Advanced compression techniques can be applied because all of a column's data resides in memory rather than on disk.

  • Higher performance for column operations such as searching and mathematical operations: With columnar data organization, operations on single columns are as simple as iterating over an array, with the added benefits of parallel execution and a storage layout optimized for fast, efficient searches.

  • Elimination of the need for secondary indexes: In many cases, columnar storage removes the need for additional index structures, because storing data in columns already works like a built-in index on each column. Eliminating secondary indexes reduces memory usage, can improve write performance, and reduces development effort.

  • Elimination of materialized aggregates: The column store aggregates large amounts of data with high performance, without the need for materialized aggregates. The resulting leaner data architecture simplifies the data model, creates opportunities for more optimal code to implement business logic, and keeps data always current.

  • Parallelization: Column-based storage also simplifies parallel execution across multiple processor cores. In a column store, data is already vertically partitioned, so operations on different columns can easily be processed in parallel.
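To make the compression and column-scan points concrete, here is a minimal Python sketch of dictionary encoding, a common columnar compression technique. This is a conceptual illustration only, not SAP HANA's actual implementation; the city column and its values are invented for the example.

```python
# Conceptual sketch (not SAP HANA internals): dictionary encoding,
# a common columnar compression technique.

def dictionary_encode(column):
    """Replace each value with a small integer ID into a sorted dictionary."""
    dictionary = sorted(set(column))
    ids = {value: i for i, value in enumerate(dictionary)}
    return dictionary, [ids[value] for value in column]

# A low-cardinality column compresses well: many repeated strings
# become small integer IDs plus one copy of each distinct value.
cities = ["Berlin", "Paris", "Berlin", "Berlin", "Paris", "Madrid"] * 1000
dictionary, encoded = dictionary_encode(cities)

print(len(dictionary))   # 3 distinct values, stored once each
print(len(encoded))      # 6000 small integer IDs instead of 6000 strings

# Scanning a single column is a tight loop over one array,
# e.g. counting rows where city = 'Berlin':
berlin_id = dictionary.index("Berlin")
count = sum(1 for v in encoded if v == berlin_id)
print(count)             # 3000
```

The scan works directly on the encoded integer IDs, which is one reason column stores can search and aggregate without secondary indexes.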


Try it out!

  1. Let’s explore how to create Row and Column tables in SAP HANA Cloud. The syntax for creating a table in SAP HANA is as follows:
Code Snippet
CREATE TABLE <NAME_OF_TABLE>
  2. By default, the SQL command creates a column table. View the properties of a table by selecting the table name. Filter on GX_EMPLOYEES and then select Properties in the tab to the right.

Upon selecting Properties for the chosen table, additional details about the table appear, including its name, schema, size, and other relevant information.

  3. Look for the Table Type property; you will see it is listed as a COLUMN table.

  4. Select the Runtime Information tab to see runtime details such as memory usage, disk size, and record count.

To create a Row table, explicitly define the table type in the SQL statement.

  5. Open a new SQL console session and paste in the following SQL, which creates a row table called GX_EMPLOYEES_ROW and populates it with the same data as before:
Code Snippet
CREATE ROW TABLE GX_EMPLOYEES_ROW AS (SELECT * FROM HC_DEV.GX_EMPLOYEES);
  6. The newly created table can now be seen in the Database Explorer. Select this table to open its details.

  7. Now select the Runtime Information tab and observe the Memory Consumption details. The amount of memory required for a row table is higher than for the same data residing in a column table.

Column tables are more efficient and support faster query processing than row tables. For further information on data storage in SAP HANA Cloud, please review the SAP Help Portal.
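As a conceptual illustration of this difference (not SAP HANA internals), the following Python sketch contrasts a row layout with a columnar layout for synthetic employee data; the record count and salary values are invented for the example.

```python
# Conceptual sketch (not SAP HANA internals): why scanning one field is
# cheaper with a columnar layout. The data here is synthetic.
import array
import sys

n = 100_000

# Row layout: one record object per employee (id, salary).
row_store = [(i, 40_000 + (i % 50_000)) for i in range(n)]

# Column layout: one compact, typed array per column.
col_salary = array.array("q", (40_000 + (i % 50_000) for i in range(n)))

# A query touching only the salary column reads one contiguous array in
# the column layout, but must visit every record in the row layout.
total_row = sum(salary for _, salary in row_store)
total_col = sum(col_salary)
assert total_row == total_col

# The per-record tuples also carry far more object overhead than one
# typed array holding the same column of values.
tuple_overhead = sum(sys.getsizeof(rec) for rec in row_store)
print(tuple_overhead > sys.getsizeof(col_salary))  # True
```

The same effect, at database scale, is what the Memory Consumption comparison between GX_EMPLOYEES and GX_EMPLOYEES_ROW shows.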


Table Partitioning

The partitioning feature of the SAP HANA Cloud database splits column-store tables into smaller, more manageable parts.

Partitioning is transparent for SQL queries and data manipulation language (DML) statements. There are additional data definition language (DDL) statements for partitioning itself:

  • Create table partitions
  • Re-partition tables
  • Merge partitions into one table
  • Add or delete partitions
  • Perform the delta merge operation on specific partitions

When a table is partitioned, the split is done in such a way that each partition contains a different set of rows of the table. There are several alternatives available for specifying how the rows are assigned to the partitions of a table, for example, hash partitioning or partitioning by range.
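The row-assignment idea behind hash partitioning can be sketched in a few lines of Python. This is a conceptual illustration only, not SAP HANA's implementation; the partition count of three and the EMPLOYEE_ID column mirror the exercise in this unit.

```python
# Conceptual sketch (not SAP HANA internals): assigning rows to partitions
# with hash partitioning on a single column.

NUM_PARTITIONS = 3

def partition_for(employee_id: int) -> int:
    """A hash of the partitioning column, modulo the number of
    partitions, decides which partition each row lives in."""
    return hash(employee_id) % NUM_PARTITIONS

partitions = [[] for _ in range(NUM_PARTITIONS)]
for employee_id in range(100_000):
    partitions[partition_for(employee_id)].append(employee_id)

# Every row lands in exactly one partition, so a COUNT(*) over the table
# is simply the sum of the per-partition counts -- which is why queries
# need no changes after a table is partitioned.
print(sum(len(p) for p in partitions))  # 100000
```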

The following are the typical advantages of partitioning:

  • Large table sizes: Each partition, like a non-partitioned table, can store up to 2 billion records, so adding more partitions to a table increases its capacity.

  • Parallelization: Partitioning allows operations to be parallelized by using several execution threads for each table.

  • Partition pruning: Queries are analyzed to determine whether they match the partitioning specification of a table (static partition pruning) or the content of specific columns in aging tables (dynamic partition pruning). If a match is found, the database can determine exactly which partitions hold the queried data, and it avoids accessing, and loading into memory, the partitions that are not required.
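Static partition pruning can be sketched with range partitioning, where each partition covers a known interval of the partitioning column. This is a conceptual Python illustration only, not SAP HANA internals; the partition bounds and the EMPLOYEE_ID predicate are invented for the example.

```python
# Conceptual sketch (not SAP HANA internals): static partition pruning
# with range partitioning. Bounds are illustrative.

# Each partition covers a half-open range of EMPLOYEE_ID values: [low, high).
partition_ranges = [(0, 40_000), (40_000, 80_000), (80_000, 120_000)]
partitions = [list(range(low, high)) for low, high in partition_ranges]

def pruned_scan(query_low, query_high):
    """Scan only partitions whose range overlaps the query predicate,
    skipping (and never loading) the rest."""
    scanned = 0
    matches = 0
    for (low, high), rows in zip(partition_ranges, partitions):
        if high <= query_low or low >= query_high:
            continue  # pruned: this partition cannot contain matches
        scanned += 1
        matches += sum(1 for r in rows if query_low <= r < query_high)
    return scanned, matches

# A predicate on EMPLOYEE_ID in [10000, 20000) touches only the first
# partition; the other two are pruned without being read.
scanned, matches = pruned_scan(10_000, 20_000)
print(scanned, matches)  # 1 10000
```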


Try it out!

Explore how to partition a table using practical examples. For demonstration purposes, the GX_EMPLOYEES table is used, which contains a conveniently round 100,000 records. However, the following steps can be applied to any of the GX_* tables that have been created in the local schema.

Currently, the GX_EMPLOYEES table does not contain any partitions, and all 100,000 records are in a single table. Confirm by examining the metadata of the table in the SAP HANA Cloud Database Explorer. Upon inspection, observe that the Partitions tab is blank, indicating the absence of any partitions in the table.

By selecting the Columns tab, it can be seen that all columns reside on Partition ID = 0.

Now let’s partition the table.

  1. Open a new SQL console and paste in the following code:
Code Snippet
/* HASH Partitioning */
ALTER TABLE GX_EMPLOYEES PARTITION BY HASH(EMPLOYEE_ID) PARTITIONS 3;

This partitions the GX_EMPLOYEES table into three partitions, using a HASH partitioning function on the EMPLOYEE_ID column.

  2. Refresh the metadata screen for the table, and observe that GX_EMPLOYEES is now made up of three separate partitions, with roughly 33,000 records in each.

  3. Run the following SQL query in the console and observe the result:
Code Snippet
SELECT COUNT(*) FROM GX_EMPLOYEES;
  4. Upon querying the table, 100,000 records are returned. Although the table has been split into three partitions, no alterations are needed to the queries performed on the table. SAP HANA Cloud handles the partitioning transparently, ensuring that query results remain consistent and accurate without requiring any modifications to the queries themselves.

  5. Merge all the partitions and return to one single table with the following SQL statement:

Code Snippet
/* Merge Partitions */
ALTER TABLE GX_EMPLOYEES MERGE PARTITIONS;

Refresh the metadata screen for the table to see that it is a non-partitioned table again.


Data In Memory

SAP HANA Cloud is an ‘In-Memory’ database, meaning that it stores data in a computer’s main memory (RAM) instead of on traditional disks or solid-state drives (SSD). While most databases today have added more in-memory capabilities, they are still primarily disk-based storage databases. SAP HANA Cloud was built from the ground up to work with data in-memory and leverage other storage mechanisms as necessary to balance performance and cost. Retrieval from memory is much faster than from a disk or SSD, resulting in split-second response times.

Therefore, memory is a fundamental resource of the SAP HANA Cloud database. Understanding how the SAP HANA Cloud database requests, uses, and manages this resource is crucial to the understanding of SAP HANA Cloud.

Overview of HANA Used Memory

The dominant part of the used memory in the SAP HANA Cloud database is the space used by data tables. Separate measurements are available for column-store tables and row-store tables.

The column store is the part of the SAP HANA Cloud Database that manages data organized in columns in memory. This enables high data compression rates and faster aggregations.

The column store is optimized for both read and write operations. This is achieved through two data structures: Main storage, which is read-optimized and highly compressed, and Delta storage, which is write-optimized and collects changes until they are moved into Main storage by the delta merge operation.
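The interplay between the two structures can be sketched as follows. This is a simplified conceptual model only, not SAP HANA's actual main/delta implementation; the column values are invented for the example.

```python
# Conceptual sketch (not SAP HANA internals): a column with a compressed,
# read-optimized main store and a write-optimized delta store, plus a
# delta merge that folds the delta into the main store.

class Column:
    def __init__(self):
        self.main_dictionary = []   # sorted distinct values (compressed main)
        self.main_ids = []          # integer IDs into the dictionary
        self.delta = []             # uncompressed, append-only writes

    def insert(self, value):
        # Writes go to the delta store only -- cheap, no re-compression.
        self.delta.append(value)

    def merge_delta(self):
        # Delta merge: rebuild the compressed main store from all values.
        values = [self.main_dictionary[i] for i in self.main_ids] + self.delta
        self.main_dictionary = sorted(set(values))
        ids = {v: i for i, v in enumerate(self.main_dictionary)}
        self.main_ids = [ids[v] for v in values]
        self.delta = []

    def scan(self):
        # Reads combine main and delta so every row is visible.
        yield from (self.main_dictionary[i] for i in self.main_ids)
        yield from self.delta

city = Column()
for value in ["Berlin", "Paris", "Berlin"]:
    city.insert(value)

print(len(city.delta))            # 3 -- all rows still in delta
city.merge_delta()
print(len(city.delta))            # 0 -- delta emptied by the merge
print(len(city.main_dictionary))  # 2 distinct values in compressed main
print(sorted(city.scan()))        # ['Berlin', 'Berlin', 'Paris']
```

The exercise below reproduces this behavior on a real table: with auto merge disabled, inserted data sits in the delta store until MERGE DELTA compresses it into main storage.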

To demonstrate the concept, create a copy of the GX_EMPLOYEES table with Delta Merge initially disabled. This will allow us to examine the behavior of the table without the Delta Merge feature. Subsequently, the table will be populated with data for further analysis.

  1. Select the SQL icon in the top left corner of Database Explorer to open a new SQL Console.

  2. Copy and paste the following SQL statement and execute it by selecting the green Run icon or by pressing the F8 function key.
Code Snippet
/* Create Table with Auto Merge disabled */
CREATE COLUMN TABLE LOCAL_EMPLOYEES AS (SELECT * FROM GX_EMPLOYEES)
NO AUTO MERGE;

Note

When creating a table in SAP HANA Cloud, the delta merge feature is enabled by default; there is no need to explicitly specify how the table should handle the auto merge process during creation. In this situation, however, we specify NO AUTO MERGE to showcase the behavior without delta merge. For further information on the delta merge process, see the SAP Help Portal.

Now it is possible to check the details of this table by selecting it in the Catalog.

  3. Expand Catalog, navigate to Tables, and find the newly created table under the user schema.

Note

Filter on schema -

  4. Select the LOCAL_EMPLOYEES table to see its metadata.

Note a list of all the table columns, and their data types:

  5. Select the Runtime Information tab.

In the table details section, you will find additional information about the table, including the number of records, the table size in memory and on disk, and details about partitions and columns. The Memory Consumption section is the most relevant here: examining these metrics gives insight into the table's memory footprint and supports informed decisions about table management and optimization.

From here, the total memory size of the table can be seen, along with how much of that is currently in the Delta store and how much is in the Main store. Observe that all the data is currently in delta, and that the size of the table in memory is over 12 MB.

Note

As we created the table with the NO AUTO MERGE option, no delta merge has taken place yet, so all of the data remains in the Delta store.

Now let’s run some statements against this table and observe the results.

  6. Copy and paste the following SQL query into an SQL console and execute it by selecting the green Run icon or by pressing the F8 function key.
Code Snippet
SELECT * FROM LOCAL_EMPLOYEES;
  7. Once the results are returned, select Messages to see the execution statistics.

Observe the main attributes of the query such as Elapsed Time, Prepare Time and Peak Memory consumed.

  8. Now execute the statement again and observe the differences in these attributes.

The time for statement preparation has been reduced significantly, because the statement is now stored in the statement cache. The memory used has also been reduced by a large factor.
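The caching effect can be sketched with a toy plan cache. This is a conceptual Python illustration only, not SAP HANA's statement cache; the sleep call merely stands in for the cost of parsing and optimizing a statement.

```python
# Conceptual sketch (not SAP HANA internals): a statement cache that skips
# the expensive prepare step when the same SQL text is executed again.
import time

plan_cache = {}

def prepare(sql):
    time.sleep(0.01)          # stand-in for parsing/optimizing the statement
    return f"plan for: {sql}"

def execute(sql):
    if sql not in plan_cache:      # first execution: prepare and cache
        plan_cache[sql] = prepare(sql)
    return plan_cache[sql]         # later executions reuse the cached plan

sql = "SELECT * FROM LOCAL_EMPLOYEES"

start = time.perf_counter()
execute(sql)
first = time.perf_counter() - start

start = time.perf_counter()
execute(sql)
second = time.perf_counter() - start

# The second execution skips preparation entirely.
print(second < first)  # True
```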

The next step is to perform a delta merge on the table and observe the results.

  9. Copy and paste the following into an SQL console and execute it:
Code Snippet
MERGE DELTA OF LOCAL_EMPLOYEES;
  10. Now return to the tab that has the metadata of the LOCAL_EMPLOYEES table open, and select Refresh to observe the change in memory statistics.

The total memory consumption of the table has been reduced significantly, and the majority of the data is now stored in the table's Main storage area.

Let’s repeat the previous steps of running a Select SQL query against the table and observing the resulting statistics.

  11. Copy and paste the following SQL query and execute it:
Code Snippet
SELECT * FROM LOCAL_EMPLOYEES;
  12. Once the results are returned, select Messages to see the execution statistics.

Note that the execution time, prepare time, and memory used should all be quite similar to the second execution above.

From the latest execution of the statement, the following conclusions can be drawn:

  • By merging the table data from the Delta store to the Main store, the data becomes highly compressed.
  • Statement read performance is not negatively impacted by this highly compressed state.
  • Writing to the table is optimized by using the table's Delta store, which is row-based.

Data Loaded In-Memory

Regarding data loading, a column table in SAP HANA Cloud can be in one of three states:

  • Unloaded - none of the column store data is loaded to main memory.
  • Partially Loaded - parts of the column store data are loaded into main memory, for example, only a few columns recently used in a query.
  • Fully Loaded - all data of the column store is loaded into main memory.

Normally, SAP HANA Cloud manages the loading and unloading of tables with the aim to keep all relevant data in memory.

Table columns are loaded into memory as they are accessed. To avoid memory waste, unused columns remain on disk. When additional memory is required by the database, columns are unloaded on a least recently used basis.

Though usually not necessary, it is possible to manually load and unload tables or columns from memory as the following exercise demonstrates.
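The load-on-access and least-recently-used unloading behavior described above can be sketched as follows. This is a conceptual Python illustration only, not SAP HANA's memory manager; the two-column memory budget and the column names are chosen to echo the exercise.

```python
# Conceptual sketch (not SAP HANA internals): columns loaded into memory on
# first access and unloaded on a least-recently-used (LRU) basis when the
# memory budget is exceeded.
from collections import OrderedDict

MAX_LOADED = 2          # illustrative memory budget: at most 2 columns

loaded = OrderedDict()  # column name -> in-memory data, in LRU order

def read_column(name):
    """Load a column on first access; refresh its LRU position on reuse."""
    if name in loaded:
        loaded.move_to_end(name)                 # recently used again
    else:
        if len(loaded) >= MAX_LOADED:
            evicted, _ = loaded.popitem(last=False)  # unload the LRU column
            print(f"unloaded {evicted}")
        loaded[name] = f"<data of {name}>"       # load from disk
    return loaded[name]

read_column("EMPLOYEE_ID")
read_column("EMPLOYEE_SALARY")
read_column("EMPLOYEE_ID")        # touched again: now most recently used
read_column("EMPLOYEE_LASTNAME")  # exceeds budget: evicts EMPLOYEE_SALARY

print(sorted(loaded))  # ['EMPLOYEE_ID', 'EMPLOYEE_LASTNAME']
```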

  1. Select the LOCAL_EMPLOYEES table in the Explorer window on the left to open its metadata.

  2. In the Runtime Information tab, select Columns to see more details about the table columns.

  3. As already seen, running queries on this table that select all columns results in each column being loaded into memory (Loaded status = 'TRUE').

  4. Run the following SQL query again, just to observe the runtime:
Code Snippet
/* Columns loaded in memory */
SELECT * FROM LOCAL_EMPLOYEES;
  5. Observe a similar runtime to before.

  6. Force the table to be unloaded from memory. Copy and paste the following SQL, then execute it in the SQL console:
Code Snippet
UNLOAD LOCAL_EMPLOYEES;
  7. Go back to the tab with the metadata for the table still open, select Refresh, and observe the updated values. The Loaded column is now set to 'FALSE' for all columns of the table.

  8. Now run the SELECT * query again and observe the runtimes. Either return to the SQL console that was previously open and re-run the statement, or copy and paste the following SQL into a new console and execute it:
Code Snippet
/* Columns not loaded into memory */
SELECT * FROM LOCAL_EMPLOYEES;

Note

The statement execution time includes an initial load of the table back into memory.

  9. As the previous query used all columns, the table should be fully loaded into memory again. To check this, select Refresh in the table metadata tab and observe the values in the 'Loaded' column.

Partially Loading Data

In the above examples, the SELECT * query retrieves all of the data from the table, thus SAP HANA Cloud loads the entire table into memory.

Let’s observe the result of a query returning only a few columns from the table.

  1. Unload the entire table from memory, as it is currently fully loaded. Execute the following query in the SQL console:
Code Snippet
UNLOAD LOCAL_EMPLOYEES;

Note

Double-check that the table has been unloaded by refreshing the table metadata screen as before and verifying that the Loaded column shows 'FALSE' for all columns.
  2. Run the following query, which selects only four columns from our table:
Code Snippet
/* Select certain columns only - columns not yet in memory */
SELECT EMPLOYEE_ID,
       (EMPLOYEE_FIRSTNAME || ' ' || EMPLOYEE_LASTNAME) AS NAME,
       EMPLOYEE_SALARY
FROM LOCAL_EMPLOYEES
WHERE EMPLOYEE_SALARY > 50000;
  3. After the query has finished, refresh the metadata tab for the table and check the Loaded column. Only the four columns used in the query are set to TRUE. This means all data that was not relevant to the query was left on disk, and only the required data was moved into main memory.

  4. Now remove the table, as it is no longer required. Run the following statement:
Code Snippet
/* Remove Table */
DROP TABLE LOCAL_EMPLOYEES;

Well done! By now, you have acquired a solid understanding of in-memory column tables and the advantages they offer. You now have experience creating these types of tables, exploring table properties, and creating partitions for improved performance and data management.