Reducing the Data Footprint

Objectives
After completing this lesson, you will be able to:

  • Describe how SAP HANA Cloud reduces the data footprint

Data Footprint Reduction

Data in the SAP HANA Cloud column store tables is automatically compressed. The purpose of compression is to reduce the data footprint. It is not unusual to achieve up to 90% reduction in the data footprint.

The following are some of the benefits of a reduced data footprint:

  • You can fit entire enterprise databases into memory and avoid disk access, which ensures high performance for searches and calculations.

  • You can get more data into the CPU cache and therefore reduce main memory access, which improves performance further.

  • You can fit more data into your chosen SAP HANA Cloud configuration so that you maximize your investment.

How much the data can be reduced depends on the shape of the business data. Compression is most effective when there is a lot of repetition in the column values. Let's look at an example:

A multi-million row sales order table contains the column COUNTRY, which can only have the values Belgium, Denmark, France, Italy, or Spain. This means those five values repeat many times in the column, which is wasteful.

  1. Compression removes the repetition of values in a column by storing each distinct value only once in a special place called a dictionary store. For each unique dictionary value, an integer is generated. So, in our example, the dictionary store would contain just five rows, each with its own integer that represents the real country name.

  2. Indexes are then generated, which place each integer alongside the original record position.

  3. The original uncompressed data with its repetitions is no longer needed, so it is removed from the table to create a much smaller footprint.
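
The steps above can be illustrated with a minimal Python sketch. It is a simplified model for learning purposes, not the way SAP HANA Cloud implements compression internally. The function names, the sample rows, and the choice to sort the dictionary are assumptions made for this illustration.

```python
def dictionary_encode(values):
    """Return (dictionary, index_vector) for one column of values."""
    # Step 1: store each distinct value once. The position of a value
    # in the dictionary is the integer that represents it.
    # (Sorting is an assumption of this sketch.)
    dictionary = sorted(set(values))
    value_id = {value: i for i, value in enumerate(dictionary)}
    # Step 2: keep one integer per original record position.
    index_vector = [value_id[v] for v in values]
    # Step 3: the caller can now discard the original column.
    return dictionary, index_vector


def dictionary_decode(dictionary, index_vector):
    """Rebuild the original column from the compressed form."""
    return [dictionary[i] for i in index_vector]


# Hypothetical sample rows of the COUNTRY column.
country = ["France", "Belgium", "France", "Spain",
           "Italy", "Denmark", "France", "Spain"]

dictionary, index_vector = dictionary_encode(country)
print(dictionary)    # ['Belgium', 'Denmark', 'France', 'Italy', 'Spain']
print(index_vector)  # [2, 0, 2, 4, 3, 1, 2, 4]
```

However many rows the column has, the dictionary in this example never grows beyond five entries. Only the list of small integers grows with the table.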

Apart from reducing the storage, replacing repetitive string values with integers is very efficient for in-memory processing.

The example above is based on a compression technique called dictionary encoding. Dictionary encoding is a first-level compression technique and is applied to all columns of a column store table. Other compression techniques can then be applied on top of it as second-level compression. These include run-length encoding, cluster encoding, and prefix encoding.
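
To give a feel for second-level compression, here is a small Python sketch of run-length encoding applied to a list of dictionary integers. It shows only the general idea of the technique. The function names and sample data are assumptions for this illustration and do not reflect SAP HANA Cloud internals.

```python
from itertools import groupby


def run_length_encode(value_ids):
    """Collapse consecutive repeats into (value_id, run_length) pairs."""
    return [(vid, len(list(run))) for vid, run in groupby(value_ids)]


def run_length_decode(runs):
    """Expand (value_id, run_length) pairs back into the full list."""
    return [vid for vid, count in runs for _ in range(count)]


# Hypothetical dictionary integers in which equal values sit next to each other.
value_ids = [0, 0, 0, 2, 2, 4, 4, 4, 4, 1]

runs = run_length_encode(value_ids)
print(runs)  # [(0, 3), (2, 2), (4, 4), (1, 1)]
```

Run-length encoding pays off only when equal values are stored next to each other. A column with no consecutive repeats would need one pair per row and would not shrink.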

Compression techniques are deeply embedded in the SAP HANA Cloud database and work transparently for all users and developers. You can, however, view the compression details of each column table using the Database Explorer tool.

Compression is applied during the delta merge operation. This means that until the first delta merge is performed, data inserted into a column store table remains uncompressed. So if you fill a table with a large number of initial records, perform a delta merge as a priority to reduce the footprint.

Each subsequent delta merge operation regenerates the compression data to consider the newly inserted and updated records.
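
The relationship between inserts and the delta merge can be pictured with a toy Python model. This is a sketch under simplifying assumptions: the class ToyColumn and its methods are hypothetical, and the real delta merge in SAP HANA Cloud is considerably more involved.

```python
class ToyColumn:
    """Hypothetical model of one column: compressed part plus pending inserts."""

    def __init__(self):
        self.dictionary = []    # distinct values of the merged rows
        self.index_vector = []  # one integer per merged row
        self.delta = []         # newly inserted values, not yet compressed

    def insert(self, value):
        # New records are collected as-is until the next merge.
        self.delta.append(value)

    def delta_merge(self):
        # Rebuild the compressed form over the old and the new rows.
        rows = [self.dictionary[i] for i in self.index_vector] + self.delta
        self.dictionary = sorted(set(rows))
        value_id = {v: i for i, v in enumerate(self.dictionary)}
        self.index_vector = [value_id[v] for v in rows]
        self.delta = []


column = ToyColumn()
for value in ["Spain", "France", "Spain"]:
    column.insert(value)
print(column.dictionary, column.delta)  # [] ['Spain', 'France', 'Spain']

column.delta_merge()
print(column.dictionary, column.index_vector)  # ['France', 'Spain'] [1, 0, 1]

column.insert("Belgium")
column.delta_merge()
print(column.dictionary, column.index_vector)
# ['Belgium', 'France', 'Spain'] [2, 1, 2, 0]
```

Note how the second merge regenerates both the dictionary and the integers so that the newly inserted value is taken into account.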

Note
Compression is not relevant for row store tables.
