Analyzing Disk and I/O Issues

Objective

After completing this lesson, you will be able to analyze disk and I/O issues.

SAP HANA Storage Usage

SAP HANA Storage and I/O Usage

SAP HANA operates with all data in-memory, but also uses persistent storage to keep the data safe in case of a system failure. Data changes are stored immediately (synchronously) in the Log Volume, and the transaction data and undo data are stored asynchronously in the Data Volume using savepoints.

Data Volume explained
  • Contains SQL data and undo log information
  • Stores additional SAP HANA information, such as modeling data
  • Data kept in-memory to ensure maximum performance
  • Write process is asynchronous
Log Volume explained
  • Information about data changes (redo log)
  • Directly saved to persistent storage when transaction is committed (synchronous)
  • Cyclical overwrite (only after backup)
Screenshot of an I/O Pattern per Operation table with the following columns: Scenarios, Write Transaction, Savepoint, Snapshot, Delta merge, DB Restart, Fail-over, Take-over, ColumnStore table load, Data Backup, Log Backup, Database Recovery, and Queries.

Although SAP HANA is an in-memory database, I/O still plays a critical role in system performance. From an end user perspective, if there are issues with I/O performance, an application or the system as a whole runs sluggishly, is unresponsive, or can seem to hang.

In certain scenarios, data is read from or written to disk, for example during the COMMIT transaction. Normally, this is done asynchronously, but at certain points, synchronous I/O is performed. Even during asynchronous I/O, important data structures may be locked.

Here are some details for each of the scenarios:

Savepoint

A savepoint ensures that all changed, persistent data since the last savepoint is written to disk.

By default, the SAP HANA database triggers savepoints at five-minute intervals. Data is automatically saved from memory to the data volume located on disk. Depending on the type of data, the block sizes vary between 4 KB and 16 MB.

Savepoints run asynchronously to SAP HANA update operations. Database update transactions only wait at the critical phase of the savepoint, which usually takes microseconds.

Write Transactions

All changes to persistent data are captured in the redo log. SAP HANA asynchronously writes the redo log with I/O orders of 4 KB to 1 MB size into log segments. Transactions writing a commit into the redo log wait until the buffer containing the commit has been written to the log volume.

Delta Merge

The delta merge itself takes place in-memory. Updates to column store tables are stored in the delta storage. During the delta merge, these changes are applied to the main storage, where they are stored read-optimized and compressed. After the delta merge is complete, the new main storage is persisted in the data volume, that is, written to disk. The delta merge does not block parallel read and update transactions.

Data Backup

For a data backup, the current payload of the data volumes is read and copied to backup storage. When writing a backup, it is essential that there are no collisions with other transactional operations running against the database on the I/O connection.

Log Backup

Log backups store the content of a closed log segment. They are automatically and asynchronously created by reading the payload from the log segments, and writing them to the backup area.

Snapshot

SAP HANA database snapshots are used by certain operations, such as backup and system copy. They are created by triggering a system-wide consistent savepoint. The system keeps the blocks belonging to the snapshot at least until the drop of the snapshot. Detailed information about snapshots can be found in the SAP HANA Administration Guide.

Database Restart

At database startup, the services load their persistence, including catalog and row store tables, into memory. This means that the persistence is read from the storage. Additionally, the redo log entries written after the last savepoint are read from the log volume and replayed in the data area in-memory. When this is finished, the database is accessible. The bigger the row store, the longer it takes for the system to become available for operation again.

Database Recovery

The restore of a data backup reads the backup content from the backup device and writes it to the SAP HANA data volumes. The I/O write orders of the data recovery have a size of 64 MB. The redo log can be replayed during a database recovery, that is, the log backups are read from the backup device and the log entries are replayed.

Failover (Host Auto-Failover)

On the standby host, the services run in idle mode. Upon failover, the data and log volumes of the failed host are automatically assigned to the standby host. The standby host then has read and write access to the files of the failed active host. Row and Column Store tables (the latter on demand) must be loaded into memory. The log entries have to be replayed.

Takeover (System Replication)

The secondary system is already running. Its services are active but cannot accept SQL, and thus are not usable by the application. As in the database restart (described earlier), the row store tables need to be loaded into memory from persistent storage. If table preload is used, then most of the column store tables are already in-memory. During takeover, the replicated redo logs, shipped since the last data transport from primary to secondary, must be replayed.

SAP HANA Disk-related Alerts

Screenshot of SAP HANA Disk-related Alerts, as described in the following text.
Alert 2 - Disk usage
Determines what percentage of each disk containing data, log, and trace files is used. This includes space used by non-SAP HANA files.
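As a rough illustration of the kind of check behind Alert 2, the following Python sketch computes the used percentage of the file system that holds a given path. It uses only the standard library. The threshold values and the example path are illustrative assumptions, not SAP HANA defaults.

```python
import shutil


def disk_used_percent(path: str) -> float:
    """Return the used percentage of the file system that contains `path`."""
    usage = shutil.disk_usage(path)  # named tuple (total, used, free) in bytes
    if usage.total == 0:
        return 0.0
    return 100.0 * usage.used / usage.total


def classify(percent: float, warn: float = 90.0, critical: float = 95.0) -> str:
    """Map a usage percentage to a severity label (thresholds are hypothetical)."""
    if percent >= critical:
        return "critical"
    if percent >= warn:
        return "warning"
    return "ok"


if __name__ == "__main__":
    # "." is a placeholder; point this at the data, log, or trace file system.
    pct = disk_used_percent(".")
    print(f"{pct:.1f}% used -> {classify(pct)}")
```

Like the alert, this measures the whole file system, so space taken by non-SAP HANA files is included.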
Alert 28 - Most recent savepoint operation
Determines how long ago the last savepoint was defined, that is, when a complete, consistent image of the database was persisted to disk.
Alert 30 - Check internal disk full event

Determines whether or not the disks to which data and log files are written are full. A disk-full event causes your database to stop and must be resolved. This alert is issued when it is not possible to write to one of the disk volumes used for data, log, backup, or trace files. As well as running out of disk space, there are other possible causes. All causes lead to this alert.

Issues that may prevent SAP HANA from writing to disk include the following:

  • File system quota is exceeded.
  • File system ran out of inodes.
  • File system has errors (bugs).

In all cases, the solution is to free up disk space.
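A file system can still report free bytes while rejecting writes because its inodes are exhausted. The sketch below, which assumes a Unix-like host and uses Python's standard os.statvfs call, reports both values for a path. The function name is hypothetical, and user or group quotas are not covered by this check.

```python
import os


def free_space_report(path: str) -> dict:
    """Report free bytes and inode usage for the file system containing `path`."""
    st = os.statvfs(path)  # available on Unix-like systems only
    inode_used_percent = None
    if st.f_files > 0:  # some file systems do not report inode counts
        inode_used_percent = 100.0 * (st.f_files - st.f_ffree) / st.f_files
    return {
        # Space available to unprivileged processes, in bytes.
        "free_bytes": st.f_bavail * st.f_frsize,
        "inode_used_percent": inode_used_percent,
    }


if __name__ == "__main__":
    # "." is a placeholder for a data, log, backup, or trace directory.
    print(free_space_report("."))
```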

Alert 34 - Unavailable volumes
Determines whether or not all volumes are available.
Alert 50 - Number of diagnosis files
Determines the number of diagnosis files written by the system (excluding ZIP files). An unusually large number of files can indicate a problem with the database (for example, problems with trace file rotation or a high number of crashes).
Alert 51 - Size of diagnosis files
Identifies large diagnosis files. Unusually large files can indicate a problem with the database.
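The checks described for Alerts 50 and 51 can be approximated outside the database with a short Python sketch. It counts the files in a trace directory, skipping ZIP files, and lists those above a size limit. The function name, the directory argument, and the default size limit are assumptions chosen for illustration, not SAP HANA settings.

```python
from pathlib import Path


def summarize_diagnosis_files(trace_dir: str, large_bytes: int = 1_000_000_000):
    """Return (file count excluding ZIP files, [(name, size)] of large files)."""
    count = 0
    large = []
    for entry in Path(trace_dir).iterdir():
        # Skip subdirectories and ZIP archives, as the alert description does.
        if not entry.is_file() or entry.suffix.lower() == ".zip":
            continue
        count += 1
        size = entry.stat().st_size
        if size >= large_bytes:
            large.append((entry.name, size))
    # Largest files first.
    large.sort(key=lambda item: item[1], reverse=True)
    return count, large
```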
Alert 52 - Crashdump files
Identifies new crashdump files that have been generated in the trace directory of the system.
Alert 53 - Pagedump files
Identifies new pagedump files that have been generated in the trace directory of the system.
Alert 54 - Savepoint duration
Identifies long-running savepoint operations.
Alert 60 - Sync/Async read ratio
Identifies a bad trigger asynchronous read ratio. This means that asynchronous reads are blocking and behave almost like synchronous reads. This might have negative impact on SAP HANA I/O performance in certain scenarios.
Alert 61 - Sync/Async write ratio
Identifies a bad trigger asynchronous write ratio. This means that asynchronous writes are blocking, and behave almost like synchronous writes. This may have a negative impact on SAP HANA I/O performance in certain scenarios.
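To make the idea of a trigger ratio concrete, here is a minimal Python sketch. It assumes the ratio is the time spent triggering (submitting) asynchronous I/O divided by the total I/O time, so a value near 0 means requests return quickly and a value near 1 means they block almost like synchronous I/O. That definition, the function names, and the 0.5 limit are assumptions for illustration and should be checked against SAP Note 1999930.

```python
def trigger_ratio(trigger_time_us: int, total_io_time_us: int) -> float:
    """Assumed definition: trigger time divided by total I/O time."""
    if total_io_time_us <= 0:
        return 0.0  # no I/O recorded, nothing to evaluate
    return trigger_time_us / total_io_time_us


def looks_blocking(ratio: float, limit: float = 0.5) -> bool:
    """Flag ratios above a hypothetical limit as behaving like synchronous I/O."""
    return ratio > limit


if __name__ == "__main__":
    for trigger, total in [(10, 1000), (900, 1000)]:
        r = trigger_ratio(trigger, total)
        print(f"trigger={trigger}us total={total}us ratio={r:.2f} blocking={looks_blocking(r)}")
```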
Alert 77 - Database disk usage
Determines the total used disk space of the database. All data, logs, traces and backups are considered.
Alert 89 - Missing volume files
Determines if there is any volume file missing.
Alert 113 - Host open file count

Determines what percentage of total open file handles are in use. All processes are considered, including non-SAP HANA processes. Compare M_HOST_RESOURCE_UTILIZATION.OPEN_FILE_COUNT with M_HOST_INFORMATION.VALUE of M_HOST_INFORMATION.KEY open_file_limit.
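The comparison described above is a simple percentage. The Python sketch below assumes the two values have already been read from the monitoring views: the open file count as an integer, and the host information as a key/value mapping whose values are strings. The function name and the sample numbers are hypothetical.

```python
def open_file_percent(open_file_count: int, host_information: dict) -> float:
    """Percentage of the host's open file limit that is currently in use."""
    # Values in the key/value mapping are assumed to be strings.
    limit = int(host_information["open_file_limit"])
    if limit <= 0:
        raise ValueError("open_file_limit must be positive")
    return 100.0 * open_file_count / limit


if __name__ == "__main__":
    # Sample values for illustration only.
    info = {"open_file_limit": "100000"}
    print(f"{open_file_percent(5000, info):.1f}% of file handles in use")
```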

Note

For more information about SAP HANA alerts, see SAP Note 2445867 - "How-To: Interpreting and Resolving SAP HANA Alerts".

Monitoring the Persistence Storage

In the Database Directory screen, when a disk is close to full, the mini graph in the Disk column turns red and shows a very high disk usage percentage, but it does not show which disk is running out of space. To get more detailed information on the disk usage, choose the disk space indicator in the Disk column. The Performance Monitor app opens and shows the Disk Used and Disk Size KPIs. You can add further disk-related KPIs to get better insight into what is filling up the file system.

The Alerts column will show that there are alerts, but you can't see which alert was triggered. To get more detailed information on the alerts, you can choose the alert shown in the Alerts column. The Alerts app will open and the alert details are shown.

Screenshot showing the SAP HANA Cockpit Storage Information, as described in the preceding and following text.

In SAP HANA cockpit, disk-related information is found via the Disk Usage card and the Monitor Disk Volume application.

Screenshot of the Disk Usage Information page, highlighting information such as Disk Usage, Monitor Disk Volume, Disk performance monitor, and File Details, as described in the preceding and following text.

The Monitor Disk Volume application provides information about the size of the data and log volumes, the storage locations, the disk storage throughput, and the used page statistics in the data volume.

The Disk Usage card opens the Performance Monitor showing the Disk Size and Disk Used information over time. This information helps you to understand when the growth of the used disk space started and when the system ran out of space. By adding additional KPIs like Data Read/Write and Log Read/Write, you can determine if the disk full event is caused by SAP HANA writing huge amounts of data or log information.
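The same reasoning can be applied to exported Disk Used samples. The following Python sketch finds the first sample after which usage grew by at least a given number of bytes before the next sample. The data layout, the function name, and the sample values are assumptions for illustration and do not reflect an export format of the Performance Monitor.

```python
from typing import List, Optional, Tuple

Sample = Tuple[str, int]  # (timestamp label, disk used in bytes)


def growth_start(samples: List[Sample], min_delta: int) -> Optional[str]:
    """Return the timestamp of the first sample followed by a jump >= min_delta."""
    for (timestamp, used), (_, next_used) in zip(samples, samples[1:]):
        if next_used - used >= min_delta:
            return timestamp
    return None  # no interval grew by min_delta or more


if __name__ == "__main__":
    # Made-up samples: usage is flat, then starts climbing.
    samples = [("10:00", 100), ("10:05", 102), ("10:10", 150), ("10:15", 220)]
    print(growth_start(samples, min_delta=20))
```

Comparing the returned timestamp with the Data Write and Log Write KPIs for the same period can help show whether SAP HANA itself produced the growth.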

Screenshot showing additional details about storage usage in the Performance Monitor app.

Note

For more information about disk I/O analysis, see SAP Note 1999930 - "FAQ: SAP HANA I/O Analysis".
