Checking for SAP HANA Alerts

Objectives

After completing this lesson, you will be able to:
  • Check for SAP HANA alerts
  • Configure alert thresholds

Handling SAP HANA Alerts

SAP HANA Alerts

The SAP HANA database is continuously collecting and evaluating information about status, performance, and resource usage from all its components.

As an administrator, you actively monitor the status of the system and its services and the consumption of system resources. However, you are also alerted to critical situations, for example: a disk is becoming full, CPU usage is reaching a critical level, or a server has stopped.

A summary of all alerts in the database is available on the home page of the SAP HANA cockpit. To get more information about these alerts, and to analyze the historical occurrence of alerts, you can drill down into the Alerts application.

In addition, several configuration options are available so that you can tailor alerts in the SAP HANA database to your needs. For example, you can change the alerting thresholds, set up email notifications for alerts, and switch particular alerts on or off.

On the Alerts card, the alerts are counted and grouped by the 10 most important alert categories defined in SAP HANA. Use View By KPA (Key Performance Area) to switch between the Alert Categories and Alert KPA views. You can refresh the displayed data by using the SAP HANA Cockpit Refresh - Now button in the top-right corner.

To open the Alerts application, choose the Alerts card.

The internal monitoring infrastructure of the SAP HANA database is continuously collecting and evaluating information about status, performance, and resource usage from all components of the SAP HANA database. It also performs regular checks on the data in the system tables and views, and issues alerts to warn you of potential problems when configurable threshold values are exceeded. The priority of the alert indicates the severity of the problem, and depends on the nature of the check and the configured threshold values. For example, if 90% of available disk space is used, a low-priority alert is issued; if 98% is used, a high-priority alert is issued.

Priority      Description
Information   Action recommended to improve system performance or stability
Low           Medium-term action required to mitigate the risk of downtime
Medium        Short-term action required (few hours, days) to mitigate the risk of downtime
High          Immediate action required to mitigate the risk of downtime, data loss, or data corruption
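
The relationship between a measured value, the configured thresholds, and the resulting priority can be sketched as follows. This is a minimal illustration, not the statistics service's actual logic: the function name, the dictionary layout, and the medium threshold of 95 are assumptions, and only the 90 and 98 values come from the disk space example above.

```python
from typing import Dict, Optional

# Hypothetical thresholds for a disk-usage style checker, in percent.
# 90 and 98 mirror the example in the text; 95 is an assumed value that
# only serves to make the sketch complete.
THRESHOLDS: Dict[str, float] = {"low": 90.0, "medium": 95.0, "high": 98.0}


def classify_priority(measured: float, thresholds: Dict[str, float]) -> Optional[str]:
    """Return the highest priority whose threshold is reached, or None."""
    # Check the most severe level first so that it takes precedence.
    for priority in ("high", "medium", "low"):
        limit = thresholds.get(priority)
        # Assumption: reaching the threshold is enough to raise the alert.
        if limit is not None and measured >= limit:
            return priority
    return None


if __name__ == "__main__":
    for usage in (85.0, 91.5, 98.0):
        print(usage, classify_priority(usage, THRESHOLDS))
```

Checking the most severe level first means a value that crosses several thresholds is reported once, at its highest priority.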

Analyze Alerts

The most important alerts are shown on the Alerts card. This makes them highly visible and helps the database administrator to quickly investigate the shown alerts. To open the Alerts app, on the Overview page of the SAP HANA cockpit, choose the Alerts card. All of the latest alerts are displayed in list format on the left.

Find and select the alert that you want to analyze using the options available for filtering, searching, and sorting. Detailed information about the alert is shown on the right, including a graph displaying how often the alert has been issued over a certain time frame.

Select the time frame that you want to analyze. By default, the number of occurrences per hour over the last 24 hours is displayed.
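
To make the default view concrete, the following sketch groups alert timestamps into hourly buckets over the last 24 hours. It is only an illustration of the counting idea: the function name and the input format (a list of Python datetime objects) are assumptions and do not reflect how the cockpit computes the graph.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Dict, Iterable


def occurrences_per_hour(timestamps: Iterable[datetime], now: datetime,
                         hours: int = 24) -> Dict[datetime, int]:
    """Count alert occurrences per hour within the last `hours` hours."""
    start = now - timedelta(hours=hours)
    counts: Counter = Counter()
    for ts in timestamps:
        # Keep only occurrences inside the selected time frame.
        if start < ts <= now:
            # Truncate to the full hour to form the bucket key.
            counts[ts.replace(minute=0, second=0, microsecond=0)] += 1
    return dict(counts)
```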

To further investigate the displayed alert, the following additional options are available:

  • You can use the Filter fields (1) to specify alerts, priorities, time ranges or categories.
  • You can use the Type dropdown list (2) to switch between the current and past alerts in the system. This can be useful when investigating the root cause of a problem that might be caused by prior problems.
  • You can use the Sort table button (3) to change the sort order of the displayed alerts.
  • You can use the Details button (4) to check the full alert text and the last occurrence.
  • You can use the Alert Definition Details button (5) to view the next scheduled run and interval settings.
  • You can use the Proposed Solution button (6) to view a solution proposed by SAP.
  • You can use the Occurrences button (7) to review the history of the alert.
  • You can use the Check Now button (8) to execute the alert checker manually.
  • You can use the Edit Alert Definition button (9) to call up the alert configuration editor. With the alert checker application you can configure threshold settings, interval values, set up an email recipient for the alert, and activate or deactivate the alert checker.

    Caution

    Deactivating an alert checker is bad practice because it does not solve the problem. It only stops you from being notified about the problem.

When you select an alert, detailed information about the alert is displayed on the right. The following detailed information about an alert is available:

  • Category

    Displays the category of the alert checker that issued the alert.

    Alert checkers are grouped into categories, for example, those related to memory usage, those related to transaction management, and so on.

  • Next Scheduled Run

    Displays when the related alert checker is next scheduled to run.

    If the alert checker has been switched off (alert checker status Switched Off) or it failed the last time it ran (alert checker status Failed), this field is empty because the alert checker is no longer scheduled.

  • Interval

    Displays the frequency at which the related alert checker runs.

    If the alert checker has been switched off (alert checker status Switched Off) or it failed the last time it ran (alert checker status Failed), this field is empty because the alert checker is no longer scheduled.

  • Alerting Host and Port

    Displays the name and port of the host that issued the alert.

    In a system replication scenario, alerts issued by secondary system hosts can be identified here. This allows you to ensure availability of secondary systems by addressing issues before an actual failover.

  • Alert Checker

    Displays the name and description of the related alert checker.

  • Proposed Solution

    Displays the possible ways of resolving the problem identified in the alert, with a link to the supporting app, if available.

  • Past Occurrences of Alert

    A configurable graphical display that indicates how often the alert occurred in the past.

How Does It Work?

Several configuration options are available so that you can tailor alerting in the SAP HANA database to your needs (for example, changing alerting thresholds, switching off particular alerts, and setting up email notification of alerts).

As an SAP HANA database administrator, you need to monitor the status of the system and its services and the consumption of system resources. When critical situations arise, you need to be notified so that you can take appropriate action in a timely manner. For data center operation and resource allocation planning, you must analyze historical monitoring data. These requirements are met by SAP HANA's internal monitoring infrastructure.

These monitoring and alerting features of the SAP HANA database are performed by the statistics service. The statistics service is a central element of SAP HANA's internal monitoring infrastructure. It notifies you when critical situations arise in your systems and provides you with historical monitoring data for analysis. It collects statistical and performance information using SQL.

Alerting Framework Architecture

The statistics service collects and evaluates information about status, performance, and resource consumption from all components belonging to the system.

Monitoring and alert information are stored in database tables in a dedicated schema (_SYS_STATISTICS). From there, the information can be accessed by the SAP HANA cockpit. The data from system views is evaluated against certain threshold values, which can then trigger configured follow-up actions, such as an email notification.

The statistics service is implemented by a set of tables and SQLScript procedures in the master index server and by the statistics scheduler thread that runs in the master name server. The SQLScript procedures either collect data (data collectors) or evaluate alert conditions (alert checkers). Procedures are invoked by the scheduler thread at regular intervals, which are specified in the configuration of the data collector or alert checker. Data collector procedures read system views and tables, process the data (for example, if the persisted values need to be calculated from the read values) and store the processed data in measurement tables for creating the measurement history.

This scheduler thread is part of the statistics server that runs in the nameserver service. Calls are sent to the indexserver to call SQLScript procedures.

Alert checker procedures are scheduled independently of the data collector procedures. They read current data from the original system tables and views, not from the measurement history tables. After reading the data, the alert checker procedures evaluate the configured alert conditions. If an alert condition is fulfilled, a corresponding alert is written to the alert tables. From there, it can be accessed by monitoring tools that display the alert. It is also possible to have email notifications sent to administrators if an alert condition is fulfilled. Depending on the severity level of the alert, summary emails are sent at the appropriate frequency (hourly, every 6 hours, or daily). You can also trigger alert checker procedures directly from monitoring tools (for example, SAP HANA cockpit).
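
The idea of a scheduler that invokes data collectors and alert checkers independently, each at its own configured interval, can be sketched as follows. This is a simplified model and not the actual scheduler thread: the class, the procedure names, and the use of seconds as the interval unit are all assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScheduledProcedure:
    """One data collector or alert checker entry (hypothetical model)."""
    name: str
    kind: str          # "collector" or "checker"
    interval_s: int    # configured interval, assumed to be in seconds
    last_run_s: int    # time of the last run, in seconds


def due_procedures(procedures: List[ScheduledProcedure], now_s: int) -> List[str]:
    """Return the names of the procedures whose interval has elapsed."""
    return [p.name for p in procedures if now_s - p.last_run_s >= p.interval_s]


if __name__ == "__main__":
    # Hypothetical entries; collectors and checkers are scheduled independently.
    schedule = [
        ScheduledProcedure("collect_host_resources", "collector", 60, 0),
        ScheduledProcedure("check_disk_usage", "checker", 300, 0),
        ScheduledProcedure("check_backup_age", "checker", 3600, 0),
    ]
    print(due_procedures(schedule, now_s=300))
```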

Data Management in the Statistics Service

The following mechanisms exist to manage the volume of data collected and generated by the statistics service:

  • Configurable data retention period

    The data collected by the data collectors of the statistics service is deleted after a default number of days. The majority of collectors have a default retention period of 42 days. For a list of those collectors that have a different default retention period, execute the following statement:

    Code Snippet
    SELECT o.name, s.retention_days_default
      FROM _SYS_STATISTICS.STATISTICS_SCHEDULE s, _SYS_STATISTICS.STATISTICS_OBJECTS o
     WHERE s.id = o.id AND o.type = 'Collector' AND s.retention_days_default != 42
     ORDER BY 1;

    You can change the retention period of individual data collectors with the following SQL statement:

    Code Snippet
    UPDATE _SYS_STATISTICS.STATISTICS_SCHEDULE
       SET RETENTION_DAYS_CURRENT = <retention_period_in_days>
     WHERE ID = <ID_of_data_collector>;

    Note

    To determine the IDs of data collectors, execute the following statement:

    Code Snippet
    SELECT * FROM _SYS_STATISTICS.STATISTICS_OBJECTS WHERE type = 'Collector';

    Alert data in the _SYS_STATISTICS.STATISTICS_ALERTS table is also deleted by default after 42 days. You can change this retention period with the following statement:

    Code Snippet
    UPDATE _SYS_STATISTICS.STATISTICS_SCHEDULE
       SET RETENTION_DAYS_CURRENT = <retention_period_in_days>
     WHERE ID = 6002;

  • Maximum number of alerts

    By default, the number of alerts in the system (that is, rows in the table _SYS_STATISTICS.STATISTICS_ALERTS_BASE) cannot exceed 1,000,000. If this number is exceeded, the system starts deleting rows in increments of 10 percent, until the number of alerts is below the maximum.

    To change the maximum number of alerts permitted, add a row with the key internal.alerts.maxrows and the new maximum value to the table _SYS_STATISTICS.STATISTICS_PROPERTIES.

    Code Snippet
    INSERT INTO _SYS_STATISTICS.STATISTICS_PROPERTIES VALUES ('internal.alerts.maxrows', 500000);
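
The two mechanisms above, the retention period and the maximum number of alerts, can be illustrated with a small model. This is a sketch under stated assumptions, not the service's implementation: the function names are hypothetical, and the code assumes that each deletion step removes 10 percent of the configured maximum, which the text does not specify.

```python
from datetime import datetime, timedelta


def retention_cutoff(now: datetime, retention_days: int = 42) -> datetime:
    """Return the timestamp before which collected data would be deleted."""
    return now - timedelta(days=retention_days)


def trim_alert_count(row_count: int, max_rows: int = 1_000_000) -> int:
    """Return the number of alert rows left after trimming in 10-percent steps."""
    if row_count <= max_rows:
        # The maximum is not exceeded, so nothing is deleted.
        return row_count
    # Assumption: one increment is 10 percent of the configured maximum.
    step = max(1, max_rows // 10)
    # Delete in increments until the count is below the maximum.
    while row_count >= max_rows:
        row_count -= step
    return row_count


if __name__ == "__main__":
    print(retention_cutoff(datetime(2024, 2, 12)))
    print(trim_alert_count(1_000_001))
```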

Statistics Service in Multitenant Database Containers

In multiple-container systems, the statistics service runs as an embedded process in the (master) index server of every tenant database. Every database has its own _SYS_STATISTICS schema.

Monitoring tools such as the SAP HANA cockpit allow administrators in the system database to access certain alerts occurring in individual tenant databases. However, this access is restricted to alerts that identify situations with a potentially system-wide impact, for example, the physical memory on a host is running out. Alerts that expose data in the tenant database (for example, table names) are not visible to the system administrator in the system database.

Alert Definition

In the Alert Definitions application, you have an overview of all the available SAP HANA alerts. When you select an alert, you can set up email notifications and edit the properties for the selected alert. Optionally, use the Apply to other databases button to apply the alert definition to multiple databases. In the Select Databases dialog box, choose the target databases from the list, and select OK. If you have fixed the problem, or adjusted the threshold values to fit your situation, then you can start an alert checker run outside the default schedule by choosing the Check Now button. This rechecks the alert conditions and, depending on the situation, confirms the alert or triggers the alert again.

With the Configure Email feature, you can specify the email sender, SMTP server, SMTP port, and default recipients. After the email notification is set up, the SAP HANA database system sends notification emails to the default recipients whenever this alert occurs. To add a recipient specifically for an alert, use the Edit button.

Note

If you add a recipient specifically for an alert definition, the default recipients no longer receive notifications for that alert.

Configuring Alerts

In the Configure Alerts app you can configure the following:

  • Change the threshold values that trigger alerts of different priorities.

  • Set up email notifications so that specific people are informed when alerts are issued.

You can also perform the following actions on alert checkers:

  • Run alert checkers on a once-off basis regardless of their configured schedule or status.

  • Switch alert checkers on or off.

Alert Checker Details

When you select an alert checker in Alert Configuration, detailed information about the alert checker and its configuration is displayed on the right.

You can view the following detailed information about an alert checker:

  • Header information

    The name of the alert checker, its status, and the last time it ran.

  • Description

    A description of what the alert checker does, for example, what performance indicator it measures or what setting it verifies.

  • Alert Checker ID

    The unique ID of the alert checker.

  • Category

    The category of the alert checker.

    Alert checkers are grouped into categories, for example, those related to memory usage, those related to transaction management, and so on.

  • Threshold Values for Prioritized Alerting

    The values that trigger high, medium, low, and information alerts issued by the alert checker.

    The threshold values and the unit depend on what the alert checker does. For example, alert checker 2 measures what percentage of disk space is currently used, so its thresholds are percentage values.

    Note

    Thresholds can be configured for any alert checker that measures variable values that should stay within certain ranges, for example, the percentage of physical memory used, or the age of the most recent data backup. Many alert checkers verify only whether a certain situation exists or not. Threshold values cannot be configured for these alert checkers. For example, alert checker 4 detects service restarts. If a service was restarted, an alert is issued.

  • Interval

    How often the alert checker runs.

  • Schedule Active

    Indicates whether the alert checker is running automatically at the configured interval.

  • Proposed Solution

    Possible ways of resolving the problem identified by the alert checker.

Alert Checker Statuses

The status of an alert checker indicates whether it is running on schedule, has failed and has been disabled by the system, or has been switched off. The following statuses are possible:

  • Active

    The alert checker is running on schedule.

  • Failed

    The alert checker failed the last time that it ran (for example, due to a shortage of system resources), so the system disabled it.

    The alert checker remains disabled for a specific length of time before it is automatically re-enabled. The length of time is calculated based on the values in the following columns of the table STATISTICS_SCHEDULE (_SYS_STATISTICS):

    • INTERVALLENGTH

    • SKIP_INTERVAL_ON_DISABLE

    Once INTERVALLENGTH x SKIP_INTERVAL_ON_DISABLE has elapsed, the alert checker is re-enabled. The default values for all alert checkers mean that failed checkers remain disabled for one hour. Every 60 seconds, the system checks the status of every alert checker and whether the time to re-enablement has elapsed.

    You can also re-enable the alert checker manually by switching it back on in Alert Configuration.

  • Switched Off

    You switched off the alert checker schedule.

    If you want the alert checker to run again automatically, you must manually switch it back on.
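
The re-enablement rule for failed alert checkers is a simple product of the two columns named above. The sketch below illustrates the arithmetic only. It assumes that INTERVALLENGTH is expressed in seconds, and the sample values 900 and 4 are hypothetical, chosen so that the product matches the one-hour default mentioned in the text.

```python
def reenable_delay_seconds(interval_length_s: int, skip_interval_on_disable: int) -> int:
    """Time a failed checker stays disabled: INTERVALLENGTH x SKIP_INTERVAL_ON_DISABLE."""
    return interval_length_s * skip_interval_on_disable


def is_reenable_due(failed_at_s: int, now_s: int,
                    interval_length_s: int, skip_interval_on_disable: int) -> bool:
    """Return True once the disable period has fully elapsed."""
    delay = reenable_delay_seconds(interval_length_s, skip_interval_on_disable)
    return now_s - failed_at_s >= delay


if __name__ == "__main__":
    # Hypothetical values: 900 seconds x 4 skipped intervals = 3600 seconds.
    print(reenable_delay_seconds(900, 4))
    print(is_reenable_due(failed_at_s=0, now_s=1800,
                          interval_length_s=900, skip_interval_on_disable=4))
```

Because the system evaluates checker statuses every 60 seconds, the actual re-enablement would happen at the first evaluation after this delay has passed.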

Configure Alerting Thresholds

In many cases, you can configure the thresholds that trigger an alert. An alert checker can have a low, medium, or high priority threshold.

Thresholds can be configured for any alert checker that measures variable values that should stay within certain ranges, for example, the percentage of physical memory used, or the age of the most recent data backup. Many alert checkers verify only whether a certain situation exists or not. Threshold values cannot be configured for these alert checkers. For example, alert checker 4 detects service restarts. If a service was restarted, an alert is issued.

Alerts are issued when the alert checker records values that exceed the configured thresholds.

Switch Alerting On/Off

If you no longer want a particular alert to be issued, you can switch off the underlying alert checker so that it no longer runs automatically according to the schedule. Alert checkers that have been disabled by the system can also be switched back on manually.

In some situations you may want to stop a particular alert from being issued, either because it is unnecessary (for example, alerts that notify you when there are other alerts in the system) or because it is not relevant in your system (for example, backup-related alerts in test systems where no backups are performed).

Caution

If you switch off alerts, you may not be warned about potentially critical situations in your system.

You can switch an alert checker on again at any time. You may also want to switch on alert checkers that the system has disabled, such as checkers with the status Failed. The system automatically disables alert checkers when they fail to run, for example, due to a shortage of system resources.

The system automatically switches failed alert checkers back on after a certain length of time. For more information, see Alert Checker Statuses.

You can disable an alert for a particular table or schema. You can do this for the alerts "Record count of non-partitioned column-store tables" (ID 17) and "Table growth of non-partitioned column-store tables" (ID 20).

To exclude an alert from being issued for a particular table, use the following SQL statement:

Code Snippet
INSERT INTO _sys_statistics.statistics_exclude_tables VALUES (<alert_id>, '<schema_name>', '<table_name>');

To exclude an alert from being issued for all tables of a particular schema, use the following SQL statement:

Code Snippet
INSERT INTO _sys_statistics.statistics_exclude_tables VALUES (<alert_id>, '<schema_name>', null);

To re-enable the alerts, delete the entries from the table _sys_statistics.statistics_exclude_tables.
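
How the exclusion entries are interpreted can be sketched as follows. This is an illustrative model of the matching rule described above, not the alert checker's code: the function name and the tuple layout are assumptions, with None standing in for the SQL null that excludes a whole schema, and the schema and table names are invented for the example.

```python
from typing import Iterable, Optional, Tuple

# (alert_id, schema_name, table_name); table_name None means "all tables in schema".
ExcludeRow = Tuple[int, str, Optional[str]]


def is_excluded(rows: Iterable[ExcludeRow], alert_id: int,
                schema: str, table: str) -> bool:
    """Return True if the alert is suppressed for the given table."""
    for row_alert, row_schema, row_table in rows:
        if row_alert != alert_id or row_schema != schema:
            continue
        # A null table name excludes every table of the schema.
        if row_table is None or row_table == table:
            return True
    return False


if __name__ == "__main__":
    # Hypothetical entries for alerts 17 and 20.
    rows = [(17, "SALES", "ORDERS"), (20, "STAGING", None)]
    print(is_excluded(rows, 17, "SALES", "ORDERS"))
    print(is_excluded(rows, 20, "STAGING", "ANY_TABLE"))
```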

If you switched off the alert checker, its status changes to Switched Off and it is no longer scheduled to run automatically.

If you switched on the alert checker, its status changes to Active and it runs automatically again according to its configured schedule.

Set Up Email Notification

You can configure alert checkers so that you and other responsible administrators receive push notifications by email when alerts are issued.

If you want to be notified by email about new alerts when they are issued, you can set this up in Alert Configuration. You can configure one or more default recipients to be notified when any alert checker issues an alert. Also, if different people need to be notified about different alerts, you can configure dedicated recipients for these alert checkers.

Note the following behavior:

  • If you configure checker-specific recipients, default recipients are not notified.

  • If you delete all checker-specific recipients, default recipients are notified again, if configured.

  • You can configure checker-specific recipients regardless of whether or not default recipients are configured.

The configured recipients receive an email when an alert checker issues an alert. If the alert checker issues the same alert the next time it runs, no further emails are sent. However, when the alert checker runs and does not issue an alert, indicating that the issue is resolved or no longer occurring, a final email is sent.
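
The recipient rules and the email behavior described above can be modeled in a few lines. This is a sketch of the described behavior only, not the notification code in SAP HANA: the function names, the checker ID, the sample addresses, and the event labels "alert" and "resolved" are all assumptions.

```python
from typing import Dict, List, Optional, Sequence


def recipients_for(checker_id: int, default_recipients: Sequence[str],
                   checker_recipients: Dict[int, List[str]]) -> List[str]:
    """Checker-specific recipients replace the default recipients when present."""
    specific = checker_recipients.get(checker_id, [])
    return list(specific) if specific else list(default_recipients)


def email_events(run_results: Sequence[bool]) -> List[Optional[str]]:
    """For each checker run (True = alert issued), return the email sent, if any."""
    events: List[Optional[str]] = []
    previously_alerting = False
    for issued in run_results:
        if issued and not previously_alerting:
            events.append("alert")       # first occurrence: notify
        elif not issued and previously_alerting:
            events.append("resolved")    # issue gone: final email
        else:
            events.append(None)          # repeated alert or still quiet: no email
        previously_alerting = issued
    return events


if __name__ == "__main__":
    print(recipients_for(2, ["dba@example.com"], {2: ["storage@example.com"]}))
    print(email_events([False, True, True, False, False]))
```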

Check for Alerts Out of Schedule

In general, alert checkers run automatically according to a configured schedule. If necessary, you can run an alert checker on a once-off basis outside of its schedule.

In some cases, you may want to check for a particular alert outside of the alert checker's configured schedule, for example, to verify that the problem identified by a previous alert has been resolved.

Running an alert checker in this specific way does not affect its configured schedule.

Note

If you want to manually run an alert checker with the status Switched Off or Failed, you must switch it back on first.
