Row and Column Store Architecture

Objectives
After completing this lesson, you will be able to:

After completing this lesson, you will be able to:

  • Describe row and column storage used by SAP HANA Cloud

Row and Column Store

In this lesson, we turn our attention to the core component of SAP HANA Cloud; the in-memory database.

Most traditional enterprise relational database tables are based on well-established row storage. Row storage is regarded as the optimal storage design for an on-line transactional processing (OLTP) application. An OLTP application requires fast, record-level updates where all columns in the record are usually needed for processing. When an application reads a record, all columns in the record are read, even if a column is not needed, it is still processed, For analytical applications, where queries often produce ad-hoc access to only selected columns, row storage is not efficient. Why would you process a column that is not needed in a query?

Whilst SAP HANA Cloud fully supports OLTP applications using row storage it also supports advanced, column-based storage and processing which is the optimal design for on-line analytical processing (OLAP) applications where queries require access to selected columns. OLAP applications typically work with high-volume tables that need to be quickly aggregated by ad-hoc queries. Column tables are highly efficient in this case.

Unlike many database that support either row or column storage, SAP HANA Cloud database supports both row tables and column tables in the same database.

Modern applications combine transactional processing with analytical processing so SAP HANA Cloud, with its combination of row and column storage and processing, is the ideal database on which to build such modern applications.

The figure, Row and Column Storage, illustrates how row and column tables store the data.

Column tables are highly efficient for analytical applications where requests for selections of data are not predictable. Queries from analytical applications that are sent to the database often require only a subset of the overall data in the table. Usually only a few columns are required from the table and also only a limited number of rows from the columns are needed. With column tables, only the required columns are processed so you avoid touching columns that will never be used. Also, the data is arranged efficiently with all values of a column appearing one after another. With row storage, we first read a value from a column then skip over the remaining unwanted columns until we come back to the required column to read the next required value. This is not efficient and harms performance.

With column store, SAP HANA Cloud scans columns of data incredibly fast so that additional indexes, although supported, are usually not required. This helps to reduce the complexity by avoiding the need to constantly create drop and rebuilding separate indexes.

It is easy to alter column tables, for example, by adding extra column or removing columns, without dropping and reloading data.

Column tables are optimized for parallel processing, as each CPU core is able to work on a separate column.

The downside to column tables is the cost of reconstructing complete records from the separately stored columns. Reconstruction typically occurs in transactional applications that requires the complete record for updating, copying, or deletion. Although it is possible to build transactional application on column tables, you might achieve better performance using row-based tables if all columns are always processed together in your application and no analytics / ad-hoc queries are run against the table.

Save progress to your learning plan by logging in or creating an account