Developing Applications Running on SAP BTP Using SAP HANA Cloud

Key Technologies of SAP HANA Cloud database

Objectives
After completing this lesson, you will be able to:

  • Describe the Technology of the SAP HANA Cloud database

Moving the Database to Memory

New technology presents opportunities

SAP HANA Cloud has been developed from scratch with a design that takes full advantage of recent trends and advances in computing technology. SAP HANA Cloud was not built by taking an existing software product and building on top of it; it was a complete redesign and redevelopment. This total redevelopment was undertaken to ensure that it could provide a next-generation platform built on the very latest technology.

For example, historically, the high cost of memory meant that only small amounts were available to use. Limited memory was a serious bottleneck in the flow of data from disk to the CPU: it did not matter how fast the CPU was if data could not reach it quickly from the disk.

But in recent years, the cost of memory has fallen and continues to fall year on year. Hardware vendors now ship huge amounts of memory in their servers. Memory can now scale up to many terabytes, whereas previously gigabytes of memory were the norm.

Huge Computing Power Now Available

With huge amounts of memory available, we can now store the entire database of even the largest organizations completely in memory instead of on disk. This gives you instant access to all data and eliminates wait times caused by moving data from disk to memory. We can finally lose the mechanical spinning disk and the latency it brings, and rely on memory to provide all data instantly to the CPU. Solid-state drive (SSD) storage is faster than spinning disk, but it still cannot compete with memory.

To address large amounts of memory, we also use 64-bit operating systems. Traditional 32-bit operating systems cannot address the large amounts of memory now available.
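
A quick back-of-the-envelope calculation in Python shows the difference in addressable memory between 32-bit and 64-bit systems:

```python
# A 32-bit pointer can address at most 2**32 bytes of memory.
max_32bit = 2**32          # 4,294,967,296 bytes
print(max_32bit / 2**30)   # 4.0 -> roughly 4 GiB, far too small for today's servers

# A 64-bit pointer can in principle address 2**64 bytes.
max_64bit = 2**64
print(max_64bit / 2**40)   # 16,777,216 TiB -> more than enough headroom
```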

But it is not enough to have huge memory delivering data to the CPU if the bottleneck is then CPU performance. So, in addition to huge memory, CPU performance continues to improve at a phenomenal rate. We now have high-speed, multi-core CPUs that can take on complex tasks and break them up so they can be processed in parallel to provide incredible response times. This means that even the most complex analytical tasks, such as predictive analysis, can be carried out in real time. With a combination of huge memory sizes and fast multi-core CPUs, we now have access to enormous amounts of computing power. SAP HANA Cloud exploits multiple CPUs to distribute workloads in order to achieve optimal performance. As you add more CPUs, performance improves. We call this scaling up.
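
As a simple illustration of the scale-up idea, the sketch below splits an aggregation across CPU cores using Python's multiprocessing module. It demonstrates parallel processing in general, not SAP HANA Cloud's internal workload distribution:

```python
from multiprocessing import Pool
import os

def partial_sum(chunk):
    # Each CPU core aggregates its own slice of the data in parallel.
    return sum(chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))              # stand-in for a large column of values
    cores = os.cpu_count() or 4
    size = len(data) // cores + 1
    chunks = [data[i:i + size] for i in range(0, len(data), size)]

    with Pool(processes=cores) as pool:        # one worker per core: "scaling up"
        total = sum(pool.map(partial_sum, chunks))

    print(total == sum(data))                  # True: same result, work done in parallel
```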

With modern blade-server architecture, cloud providers can now add more memory and more CPUs to their servers very easily and quickly. This allows fast scaling up of the hardware to handle bigger workloads and data volumes. Once the limits of scale-up have been reached for the hardware, we can then look at scale-out: the deployment of extra worker nodes (more servers) to share the processing load.

But most databases were not designed to take advantage of such modern technology and would not know how to run optimally with large memory and multi-core CPUs. So SAP developed the SAP HANA Cloud database from scratch, specifically to run on the very latest hardware, so that applications could take advantage of in-memory data storage and massively parallel processing.

Put simply, the databases and applications needed to catch up with advances in hardware technology. So, a complete rewrite of the database (SAP HANA Cloud database), as well as the applications that run on the database (e.g. SAP S/4HANA) was required.

SAP built SAP HANA Cloud to fully exploit the latest hardware. SAP collaborated with leading hardware partners who shared the designs of their new CPU and cache architectures. This enabled SAP to develop SAP HANA Cloud in such a way that it could extract every last drop of power from the hardware.

Moving the database from disk to memory

Watch this video to learn about moving the database completely to memory.

Moving the Database Completely to Memory

In the past, databases were stored completely on disk and only the data requested by the applications would be moved to memory, where it was then passed to the CPU for processing. Due to its limited size, memory would soon become filled, and data in memory would need to be unloaded back to disk to make way for new data requests. A lot of disk swapping was normal, but this was harmful to the performance of the applications. Application workarounds were developed to try to reduce the swapping, but this was never a proper fix for the underlying problem of limited memory. With SAP HANA Cloud and the huge memory sizes available, you can now store a complete database in memory. This means that loading from disk to memory is not needed; all data is available instantly to the application at all times.

So how can we fit a complete database in memory? There are two key factors that work together to enable this:

  • Huge amounts of memory are now available. We have progressed from gigabytes (GB) of memory to terabytes (TB) of memory.

  • SAP HANA Cloud automatically compresses data. This compression reduces the data footprint of even the largest databases to a fraction of the original size, typically a reduction in the region of 90% (see the worked example below).
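
As a worked example of what a reduction in the region of 90% means for a hypothetical 10 TB database (the figures are illustrative only):

```python
# Hypothetical source database footprint on disk, in terabytes.
uncompressed_tb = 10.0
compression_rate = 0.90                    # "in the region of 90%" reduction

in_memory_tb = uncompressed_tb * (1 - compression_rate)
print(f"{uncompressed_tb} TB on disk -> roughly {in_memory_tb:.1f} TB in memory")
# 10.0 TB on disk -> roughly 1.0 TB in memory
```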

So disk is not needed?

We know we can move the entire database from disk to memory. So does this mean SAP HANA Cloud eliminates disk? The answer is: No, SAP HANA Cloud still uses disk.

Even though we can fit the entire database in memory, we usually don't want to do that.

Data in memory is classified as hot. Hot data is frequently used by the business and needs to perform very well. It is usually data that is very recent and of interest to many parties. It might also be data that is processed by customer-facing apps and needs to perform well. This data needs to be closest to the CPU for optimum read performance.

Infrequently used data can be classified as warm, which means that fast access is not so important. Warm data is stored on disk and loaded into memory only when needed. Most organizations would not want all data in memory, as they regard only part of it as hot. Memory costs are certainly falling, but compared to disk, memory is still very expensive. This means that you should deliberately size memory to fit only the hot data, rather than trying to fit the entire data of the organization in memory. When customers choose their SAP HANA Cloud size, they need to carefully calculate the memory requirements of the hot and warm data and choose a memory-to-disk ratio that provides great performance on the most important data and acceptable performance on the less important data.

When you deploy an SAP HANA Cloud tenant, you choose the key components that determine your required computing power: the number of CPUs, the memory size, and the disk size.
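
A simple sketch of the kind of hot and warm sizing calculation described above; the data volumes and the working-space overhead are hypothetical assumptions, not SAP sizing guidance:

```python
# Hypothetical data volumes after compression, in gigabytes.
hot_gb = 800       # frequently used (hot) data that must stay in memory
warm_gb = 2400     # infrequently used (warm) data that can live on disk

memory_needed_gb = hot_gb * 1.2            # assumed ~20% working-space overhead
disk_needed_gb = hot_gb + warm_gb          # disk holds the full persisted database

print(f"Memory to provision: at least {memory_needed_gb:.0f} GB")
print(f"Disk to provision:   at least {disk_needed_gb:.0f} GB")
```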

Row and Column Store

In this lesson, we turn our attention to the heart of SAP HANA Cloud: the in-memory database.

SAP HANA Cloud includes a fully-relational, ACID-compliant database.

Note
ACID is an acronym meaning that the database supports Atomicity, Consistency, Isolation, and Durability. This is a requirement for any database that must be 100% reliable for mission-critical applications. The database must guarantee data accuracy and integrity even when there are many simultaneous updates across multiple tables. It must also support data rollback and recovery.
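
To see what this means from an application's point of view, here is a minimal sketch of a transaction with commit and rollback, assuming the SAP HANA client for Python (hdbcli) is available and using hypothetical connection details and an illustrative ACCOUNTS table:

```python
from hdbcli import dbapi  # SAP HANA client for Python (assumed to be installed)

# Hypothetical connection details for an SAP HANA Cloud instance.
conn = dbapi.connect(address="<host>", port=443, user="<user>",
                     password="<password>", autocommit=False)
cursor = conn.cursor()

try:
    # Two updates that must succeed or fail together (atomicity).
    cursor.execute("UPDATE ACCOUNTS SET BALANCE = BALANCE - 100 WHERE ID = 1")
    cursor.execute("UPDATE ACCOUNTS SET BALANCE = BALANCE + 100 WHERE ID = 2")
    conn.commit()        # both changes become durable together
except dbapi.Error:
    conn.rollback()      # on any error, neither change is applied
finally:
    cursor.close()
    conn.close()
```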

Most traditional enterprise relational database tables are based on row storage. Row storage is regarded as the optimal storage design for an online transaction processing (OLTP) application. An OLTP application requires fast, record-level updates where all columns in the record are usually needed for processing.

Whilst SAP HANA Cloud fully supports OLTP applications using row storage, it also supports advanced, column-based storage and processing, which is the optimal design for online analytical processing (OLAP) applications. OLAP applications typically work with high-volume tables that need to be aggregated quickly by ad-hoc queries.

Unlike many databases that support either row or column storage, the SAP HANA Cloud database supports both row tables and column tables in the same database.

Modern applications combine transactional processing with analytical processing, so SAP HANA Cloud, with its row and column storage and processing, is the ideal database on which to build such modern applications.

Row and Column Storage

The figure, Row and Column Storage, illustrates how row and column tables store the data.

Column tables are highly efficient for analytical applications, where requests for selections of data are not predictable. Queries from analytical applications often require only a subset of the overall data in the table: usually only a few columns are needed, and often only a limited number of rows within those columns. With column tables, only the required columns are processed, so you avoid touching columns that will never be used. The data is also arranged efficiently, with all values of a column stored one after another. With row storage, we first read a value from a column, then skip over the remaining unwanted columns until we come back to the required column to read the next value. This is not efficient and harms performance.
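
A minimal pure-Python sketch of this access-pattern difference (an illustration of the idea only, not of SAP HANA Cloud internals):

```python
# The same three records in a row layout and a column layout.
rows = [
    {"id": 1, "country": "France", "amount": 100},
    {"id": 2, "country": "Italy",  "amount": 250},
    {"id": 3, "country": "France", "amount": 175},
]
columns = {
    "id":      [1, 2, 3],
    "country": ["France", "Italy", "France"],
    "amount":  [100, 250, 175],
}

# Analytical query: the total amount only needs the AMOUNT column.
total_row_store = sum(r["amount"] for r in rows)    # reads every full record
total_column_store = sum(columns["amount"])         # reads one contiguous column
print(total_row_store, total_column_store)          # 525 525
```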

With the column store, SAP HANA Cloud scans columns of data incredibly fast, so additional indexes, although supported, are usually not required. This helps to reduce complexity by avoiding the need to constantly create, drop, and rebuild separate indexes.

It is easy to alter column tables, for example by adding or removing columns, without dropping and reloading data.

Column tables are optimized for parallel processing, as each CPU core is able to work on a separate column.

The downside of column tables is the cost of reconstructing complete records from the separately stored columns. Reconstruction typically occurs in transactional applications that require the complete record for updating, copying, or deletion. Although it is possible to build transactional applications on column tables, you might see better performance using row-based tables if all columns are always processed together in your application and no analytics or ad-hoc queries are run against the table.

But these days applications increasingly blur the line between transactional and analytical processing (e.g. SAP S/4HANA). In this case, you must decide which storage method is best. Typically, column-based tables are recommended.

Note

It is not possible to define a table that is both row-based and also column-based.
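
The storage type is chosen when the table is defined. A minimal sketch, assuming the standard CREATE COLUMN TABLE / CREATE ROW TABLE syntax and reusing a cursor like the one in the transaction example above; the table and column names are hypothetical:

```python
# Storage type is fixed per table at definition time; names are hypothetical.
cursor.execute("""
    CREATE COLUMN TABLE SALES_ORDERS (
        ID      INTEGER PRIMARY KEY,
        COUNTRY NVARCHAR(20),
        AMOUNT  DECIMAL(15,2)
    )
""")

cursor.execute("""
    CREATE ROW TABLE APP_SESSIONS (
        SESSION_ID NVARCHAR(32) PRIMARY KEY,
        LAST_SEEN  TIMESTAMP
    )
""")
```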

Supporting Transactional and Analytical Applications

Watch this video to learn about the Delta Merge Concept.

Delta Merge Concept

SAP HANA Cloud database column tables utilize data compression techniques to encode and store data so that it is ready for super-fast column scans and calculations.

Whilst this produces excellent read performance, writing new data to the database presents a challenge.

This is because the data in column store tables is compressed, and inserting new data directly into the compressed data is not possible: the column store compression uses an encoding technique that analyzes the complete data set to generate a compressed version of the data. Adding new records directly would require that the already-compressed data is first uncompressed, the new records added, and then the entire data set re-compressed. Doing this for each newly arriving record would be costly, and most of the time would be spent uncompressing and re-compressing data.

The SAP HANA Cloud database overcomes this challenge by splitting a column store table into two areas: the delta store and the main store. All write operations use the delta store to stage data. The delta store data is uncompressed, which means new data can be inserted directly and speedily. The main store manages compressed data. This data is encoded and optimized for very fast read performance; we do not write directly into this store.

All read operations combine the main and delta stores, so that all data, regardless of whether it is in the main or delta store, is used by all applications. Developers do not need to be concerned with the technicalities of the delta and main stores. They simply request data from the table without being concerned about where the data currently resides.

Of course, as more data is added to the column store table, the delta store eventually fills up. The larger the delta store, the less efficient read operations become, as combining the main and delta stores becomes more and more costly. Applications can therefore begin to suffer from poor performance. In addition, database memory usage is not optimized and becomes increasingly filled, because delta store data is uncompressed.

When the delta store reaches a threshold (determined by system parameters or evaluated by an application), a delta merge operation is performed. A delta merge operation combines the delta store with the main store to create a new main store. The delta store is then emptied and the cycle begins again.
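
The cycle can be pictured with a highly simplified Python sketch of a single column; the class, names, and threshold are illustrative only and do not reflect SAP HANA Cloud's actual implementation:

```python
class ColumnStoreColumn:
    """Toy model of one column: compressed main store plus uncompressed delta store."""

    def __init__(self, merge_threshold=1000):
        self.dictionary = []            # main store: sorted distinct values
        self.value_ids = []             # main store: integer IDs referencing the dictionary
        self.delta = []                 # delta store: plain, uncompressed values
        self.merge_threshold = merge_threshold

    def insert(self, value):
        self.delta.append(value)        # writes go to the delta store only
        if len(self.delta) >= self.merge_threshold:
            self.delta_merge()          # threshold reached: merge into the main store

    def read_all(self):
        # Reads always combine main and delta, so applications see all data.
        return [self.dictionary[i] for i in self.value_ids] + list(self.delta)

    def delta_merge(self):
        # Re-encode main + delta into a new, compressed main store; empty the delta.
        all_values = self.read_all()
        self.dictionary = sorted(set(all_values))
        lookup = {v: i for i, v in enumerate(self.dictionary)}
        self.value_ids = [lookup[v] for v in all_values]
        self.delta = []

# Example usage with a tiny threshold so the merge triggers quickly.
country = ColumnStoreColumn(merge_threshold=3)
for value in ["France", "Italy", "France", "Spain"]:
    country.insert(value)               # the third insert triggers a delta merge
print(country.read_all())               # ['France', 'Italy', 'France', 'Spain']
```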

Data Footprint Reduction

Data Compression

The data in the SAP HANA Cloud column store tables is automatically compressed. This is done to reduce the data footprint. It is not unusual to achieve up to 90% reduction in the data footprint.

The following are some of the benefits of a reduced data footprint:

  • You can fit entire enterprise databases into memory and avoid disk access, which ensures high performance for searches and calculations.

  • You can get more data into the CPU cache, reducing main memory access and further improving performance.

  • You can fit more data into your chosen SAP HANA Cloud configuration so that you maximize your investment.

The amount of data reduction that can be achieved is determined by the shape of the business data. Compression is most effective when there is a lot of repetition in the column values. Let's look at an example:

A multi-million row sales order table contains the column COUNTRY, which can only have the values Belgium, Denmark, France, Italy, or Spain. This means those five values repeat many times in the column. This is wasteful repetition.

  1. Compression removes the repetition of values in a column by storing each distinct value only once in a special place called a dictionary store. For each unique dictionary value, an integer is generated. So, in our example, the dictionary store would contain just five rows, each with its own integer that represents the real country name.

  2. Indexes are then generated that place each integer alongside the original record position.
  3. We no longer need the original uncompressed data with its repetitions, and it is removed from the table to create a much smaller footprint (see the sketch after this list).
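
The COUNTRY example can be sketched in a few lines of Python; this illustrates the idea of dictionary encoding only, not SAP HANA Cloud's actual implementation:

```python
country = ["France", "Italy", "France", "Spain", "Belgium", "France", "Denmark", "Italy"]

# 1. Dictionary store: each distinct value stored once, with an integer value ID.
dictionary = sorted(set(country))               # ['Belgium', 'Denmark', 'France', 'Italy', 'Spain']
value_id = {v: i for i, v in enumerate(dictionary)}

# 2. Index vector: one small integer per row, replacing the repeated strings.
index_vector = [value_id[v] for v in country]   # [2, 3, 2, 4, 0, 2, 1, 3]

# 3. The original column is no longer needed; dictionary + index vector reproduce it.
assert [dictionary[i] for i in index_vector] == country
```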

Apart from reducing the storage, replacing repetitive string values with integers is very efficient for in-memory processing.

The example above is based on a compression technique called dictionary encoding. Dictionary encoding is a first level compression technique and is applied to all columns of a column store table. But there are many other compression techniques that can be further applied as a second level compression. These include run-length encoding, cluster encoding, and prefix encoding.
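
As an illustration of one second-level technique, here is a minimal run-length encoding sketch in Python (again just the idea, not SAP HANA Cloud's implementation):

```python
from itertools import groupby

# Value IDs for a sorted column often contain long runs of the same value.
value_ids = [0, 0, 0, 0, 1, 1, 2, 2, 2, 2, 2, 2]

# Run-length encoding stores each value once, together with the length of its run.
rle = [(v, len(list(run))) for v, run in groupby(value_ids)]
print(rle)   # [(0, 4), (1, 2), (2, 6)]

# Decoding restores the original sequence.
decoded = [v for v, count in rle for _ in range(count)]
assert decoded == value_ids
```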

Compression techniques are deeply embedded in the SAP HANA Cloud database and are invisible to all users and developers. It is possible to view the compression details of each column table using the Database Explorer tool.

Compression is applied during the delta merge operation. That means, until the first delta merge is performed, data inserted into a column store table remains uncompressed. So if you fill a table with a large number of initial records, a delta merge should be performed as a priority to reduce the footprint.

Each subsequent delta merge operation regenerates the compression data to consider the newly inserted and updated records.

Note
Compression is not relevant for row store tables.
