Health Monitoring

Objectives

After completing this lesson, you will be able to:
  • Describe the strategy, goals, and scope of Health Monitoring
  • Perform Health Monitoring and check the self-monitoring for SAP Cloud ALM
  • Summarize configuration possibilities in Health Monitoring

Strategy, Goals, and Scope of Health Monitoring

In the Health Monitoring application, you can check the health of your monitored services and systems from an application and customer perspective.

The following graphic explains the concept of Health Monitoring of services and systems:

Graphic explaining the concept of Health Monitoring in SAP Cloud ALM, focusing on collecting availability data, health status, metrics, and resource usage for SAP SaaS Services, SAP PaaS Services, and managed systems. With this data Health Monitoring allows for monitoring, analytics, and alerting.

The metrics are collected regularly and are used to calculate the overall health of a managed component. For SAP-based SaaS services, Health Monitoring executes application health checks. For customer applications built on SAP Business Technology Platform (SAP BTP), which represents the SAP PaaS service, it delivers technical metrics and events. To cover a hybrid landscape, metrics for lightweight system monitoring are also provided for SAP ABAP-based on-premise systems.

The image below displays the main capabilities:

Monitoring, Analytics, Alerting, and Resolution as key capabilities of Health Monitoring.

The application provides embedded alerting, including the capability to trigger notifications and corrective actions. The embedded analytics allow you to analyze trends and the root causes of discovered problems. In detail, the following features are available in Health Monitoring:

  • Monitoring: Get informed about the health of cloud services from an application perspective.
  • Analytics: Identify trends and usage of services and resources.
  • Alerting: Identify critical situations and notify the relevant users.
  • Resolution: Analyze the root cause of an issue and trigger operation automation procedures.

Health Monitoring and Self-Monitoring for SAP Cloud ALM

When you open the Health Monitoring app, you will be taken to the Home page where you can view an Overview of your health information.

Overview page of Health Monitoring with card tiles that contain additional information on quality indicators.

The home page looks similar to the home pages of other monitoring use cases in SAP Cloud ALM for Operations.

When you open the home page (1), the Overview screen is presented in the content area. Here you can see the current status of all services or systems that are connected to Health Monitoring. An overall health is calculated based on the number of metrics in warning or critical status.

This page shows the current health status of your services/systems in scope and your favorites. The health status is defined by a percentage, which determines its rating color.

The health percentage is determined by the mean of the health scores of the individual metrics:

Health Percentage

Metric Rating | Metric Health Score
Information | Not included in health calculation
OK | 100%
Warning | 50%
Critical | 0%
Fatal | The health of the entire service is 0%
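To make the averaging rule concrete, here is a minimal Python sketch of the Health Percentage table. The rating strings and the function name are illustrative assumptions, not an SAP Cloud ALM API; returning None when no metric contributes is also an assumption, since the lesson does not say what is shown in that case.

```python
from __future__ import annotations

from statistics import mean

# Health scores per metric rating, taken from the Health Percentage table.
# "Information" metrics are excluded; "Fatal" forces the whole service to 0%.
METRIC_HEALTH_SCORE = {"OK": 100.0, "Warning": 50.0, "Critical": 0.0}


def health_percentage(ratings: list[str]) -> float | None:
    """Return the health percentage for a list of metric ratings (hypothetical helper)."""
    if "Fatal" in ratings:
        return 0.0
    scores = [METRIC_HEALTH_SCORE[r] for r in ratings if r != "Information"]
    if not scores:
        return None  # assumption: nothing to rate
    return mean(scores)
```

For example, metrics rated OK, OK, Warning, and Information give (100 + 100 + 50) / 3, about 83.3%, because the Information metric is ignored.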

The service health rating depending on health percentage is as follows:

Service Health Rating

Service Health Percentage | Service Health Rating
Below 80% | Critical
80% to 99% | Warning
100% | OK
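The Service Health Rating table can be read as a simple threshold mapping. The following Python sketch is illustrative only; treating any fractional value from 80% up to, but not including, 100% as Warning is an assumption, because the table lists only whole percentages.

```python
def service_health_rating(percentage: float) -> str:
    """Map a service health percentage to its rating (hypothetical helper)."""
    if percentage < 80:
        return "Critical"
    if percentage < 100:  # assumption: 99.5% is still Warning
        return "Warning"
    return "OK"
```

For example, a service whose counted metrics are OK, OK, and Warning has a health percentage of about 83%, which is rated Warning; only a service with every counted metric at OK reaches 100%.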

For health details of the services displayed in a card, select the card title (2). The quality indicator (3) shows the worst data collection status of the components of the corresponding type. Select the icon to see the data collection status of each individual component, and choose a row to display its status message and technical information. Note that only components with data from the last 24 hours are taken into account; components without current data are not listed here.
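The quality indicator logic, the worst status among components that delivered data in the last 24 hours, can be sketched in Python as follows. The status names OK, Warning, and Error and their ordering are hypothetical; the lesson does not list the actual data collection statuses.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical data collection statuses, ordered from best to worst.
SEVERITY = {"OK": 0, "Warning": 1, "Error": 2}


@dataclass
class Component:
    name: str
    collection_status: str
    last_data_at: datetime  # latest data point, same timezone convention as `now`


def quality_indicator(components: list[Component], now: datetime) -> str | None:
    """Worst data collection status among components with data from the last 24 hours."""
    recent = [c for c in components if now - c.last_data_at <= timedelta(hours=24)]
    if not recent:
        return None  # no component has current data
    worst = max(recent, key=lambda c: SEVERITY[c.collection_status])
    return worst.collection_status
```

A component whose last data is older than 24 hours is skipped entirely, which matches the rule that such components are not listed.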

To display the Alerting page, select the alert number (field: Open Alerts) (4) in the card footer. The displayed alerts are filtered by the components of the corresponding card. In the footer of each card, the most severe status of the associated components regarding business service events is displayed as a calendar sheet icon (5). Depending on the color, the status is as follows:

Colors, Illustrating the Status 1

Color | Status
Green | Normal Operations
Blue | Maintenance
Orange | Degradation
Red | Disruption

The current status is displayed. The status Maintenance is also displayed if the corresponding event starts within the next hour. Choose the calendar sheet icon to see further information about the status.
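The status rules for the calendar sheet icon can be sketched as follows: an active event sets the status, a maintenance event that starts within the next hour already counts as Maintenance, and the card shows the most severe status of its components. This Python sketch is illustrative; the class and function names are hypothetical, and the severity order (Normal Operations, Maintenance, Degradation, Disruption) is an assumption the lesson does not state.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumed severity order, least to most severe (not defined in the lesson).
SEVERITY = ["Normal Operations", "Maintenance", "Degradation", "Disruption"]
COLOR = {"Normal Operations": "Green", "Maintenance": "Blue",
         "Degradation": "Orange", "Disruption": "Red"}


@dataclass
class ServiceEvent:
    kind: str  # "Maintenance", "Degradation", or "Disruption"
    start: datetime
    end: datetime


def component_status(events: list[ServiceEvent], now: datetime) -> str:
    """Status of one component from its active and imminent events."""
    statuses = ["Normal Operations"]
    for e in events:
        active = e.start <= now < e.end
        upcoming_maintenance = (
            e.kind == "Maintenance" and now < e.start <= now + timedelta(hours=1)
        )
        if active or upcoming_maintenance:
            statuses.append(e.kind)
    return max(statuses, key=SEVERITY.index)


def card_status(per_component: list[str]) -> str:
    """Most severe status across the components of one card."""
    return max(per_component, key=SEVERITY.index, default="Normal Operations")
```

For example, a maintenance window starting 30 minutes from now already turns the icon blue, while one starting in two hours does not.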

In the lower right corner of every card, an (i) button (6) is displayed. Choose it to see the individual services or systems of the corresponding service type, including their status, without leaving the overview page.

To add the service type card to your favorites or remove it, choose Add to Favorites / Remove from Favorites (7) in the lower right corner of the card.

The Monitoring page (1) shows the health of your selected managed components. The health of a component is defined by a percentage, which determines its rating color:

Screenshot of the Monitoring page in Health Monitoring showing services together with their status, number of alerts, and health.

For every managed component in the list, the following information is shown:

  • Name (2) and Type (3) of the managed component.
  • Status (4) regarding business service events.

Depending on the color of the icon, the status is as follows:

Colors, Illustrating the Status 2

Color | Status
Green | Normal Operations
Blue | Maintenance
Orange | Degradation
Red | Disruption

The current status is shown. The status Maintenance is also displayed if the corresponding event starts within the next hour. Choose the calendar sheet icon to see further information about the status.

  • The number of alerts (5) in the component.
  • The health percentage (6) of the component.

The health percentage is determined by the mean health score of the individual metrics:

Metric Health Score

Metric Rating | Metric Health Score
Information | Not included in health calculation
OK | 100%
Warning | 50%
Critical | 0%
Fatal | The health of the entire service is 0%

In the Message column (7), the service health rating is displayed in square brackets, followed by the number of metrics in each status. The Data Collection Status (8) specifies whether the data collection is running. To display the status message and the corresponding technical information, choose the corresponding icon. To display the health of a managed component in full detail, select the corresponding row (9) in the list.

After you have selected a managed component, the Health at Metric Level page displays all metrics of the component with their properties, ratings, and values, including their history:

Screenshots of the Monitoring page, Metric overview tab in Health Monitoring, showing all metrics displayed as tiles in different sections. When selecting a tile, more details are displayed.

Metrics have threshold settings that are determined by the managed component. You can, however, override these settings manually by creating custom rules. For more information, see SAP Help Portal – SAP Cloud ALM for Operations – Health Monitoring – Configuration.

You can display the metrics in two ways (1):

  • Metrics Overview.
  • All Metrics.

In the Metrics Overview, the monitored metrics of each service type are arranged in a preconfigured grouping (2), for example, ABAP System, HANA Database, or Application Jobs, which provides a quick overview of the health of the monitored object.

The metrics are displayed as tiles in different sections, which represent topic areas within the managed component. If different label values exist for a metric, the tile shows the metric values and ratings for every label value. If you choose a metric tile (3), all details of the corresponding metric (4) are displayed in a list similar to the All Metrics tab.

The main difference from a UI perspective is that, if you select a metric in the Metrics Overview, the label types of the metric are displayed as columns containing the different label values.

The following screenshots show how to use the All Metrics view:

Screenshots of the Monitoring page, All Metrics tab in Health Monitoring, showing all metrics displayed as lines. When selecting the link in column History, the data is displayed in suitable time periods.

In the All Metrics tab (1), all metric properties, values, and ratings, including the metric history (2), are displayed in a list.

Note

For more information about the displayed metric properties and the individual columns, see the in-app help (question mark icon) at the top right of the screen.

For each metric, you can display its historical values in the Metric History column (2). This lets you identify trends or investigate when a specific resource shortage occurred. Depending on the metric, the data is displayed in suitable time periods and resolutions; you can change both by choosing Select Time Frame (3). By default, only the metric with the selected label value is displayed, and this label value is shown above the graph. To display other label values, use a filter for the corresponding label type.

In the Alerting page (1) of the Health Monitoring application, the alerts that you have activated are displayed in a list (2):

Screenshot of the Alerts page in Health Monitoring, showing alert name and message as well as alert details.

Alerts originating from Health Monitoring are based on the rating of the related metric. For these alerts, the metrics with their properties are displayed in the alert details (3).

Select an alert to find out more. You can then carry out several actions (4):

  • Confirm the alert if it's resolved. This changes the alert status and removes it from the list. When the same threshold violation occurs again, the alert will reappear in the list.
  • Add comments to be saved in the alert action log.
  • Assign and remove processors.
  • Send email notifications.
  • Start operation flows.
  • Access associated tickets and create new ones.

Health Monitoring can also be used to perform a self-monitoring of the SAP Cloud ALM applications:

Screenshot on Self-Monitoring in Health Monitoring for SAP Cloud ALM, tracking the internal health of applications and data collection processes.

Each application can provide information about its internal health (for example, whether all necessary jobs are running) and about issues during data collection (for example, failed collectors). Additionally, you can see information about the HANA memory size of your SAP Cloud ALM tenant and the memory usage per application.

SAP Cloud ALM self-monitoring is automatically activated and covers metrics like:

  • SAP Cloud ALM HANA memory size.
  • Landscape Management: SLIS import job, number of services, number of systems, number of logical systems, number of endpoints.
  • Business Service Management: Triggering of Intelligent Event Processing (IEP) events.
  • Health Monitoring: Data Collection status, Job status.
  • Real User Monitoring: Memory Size.
  • Business Process Monitoring: KPI Collection Status.
  • External API Management: Number of successful notifications, number of failed notifications, number of pull API calls.

Our demonstration video guides you through the features of the Health Monitoring application, including:

  1. Navigating the Scope Selector
  2. Utilizing the Metric Overview and All Metrics sections
  3. Analyzing metrics over time with the Metric History feature

Additionally, you'll learn how to leverage self-monitoring capabilities to determine the memory consumption of individual applications within SAP Cloud ALM.

You can follow along with this demonstration either by using the SAP Cloud ALM Public Demo tenant or your own SAP Cloud ALM tenant, if this use case has already been set up. To do this, simply select the Operations group and then choose the Health Monitoring tile.

Caution

Please be aware that the content may vary from what is shown in the video in both cases.

Configuration Possibilities in Health Monitoring

Health Monitoring provides standard content for all supported products. This content is deployed automatically during the SAP Cloud ALM onboarding process and is updated regularly. The standard content includes metric descriptions, display settings for metrics, threshold settings for metrics, and events and metric assignments.

The screenshot below demonstrates how to begin the configuration process:

Screenshot of the Overview page of Health Monitoring indicating the steps to configure the monitoring of a managed component.

To monitor a managed component in the Health Monitoring application, you need to perform the following steps:

  1. Connect services and systems: Before you can use the Health Monitoring app in SAP Cloud ALM, you have to connect your subscribed services and systems to your SAP Cloud ALM instance. This process depends on the type of your managed component. For more information, see SAP Cloud ALM for Operations Expert Portal – Health Monitoring - Setup & Configuration.
  2. Add the managed component(s) to the scope: To do so, choose the Select a scope icon (1) and add the managed component to your scope.
  3. Then choose the Configuration icon (2) of the Health Monitoring application.

    Note

    Because in this area only components in scope are displayed, add your managed component to the scope first.

  4. Activate monitoring for the managed components: Whether data collection is activated automatically after you connect a managed component to SAP Cloud ALM depends on the type of the component. Therefore, always verify that the on/off slider for data collection (3) in the configuration panel is set to ON.
  5. To open the service configuration app (to change metrics, events, and alert notifications), choose the name of the corresponding managed component (4) in the configuration panel.

Default thresholds are provided out of the box for some metrics. Additionally, it is possible to set Custom Thresholds on each metric:

Screenshot of Health Monitoring showing details of the threshold configuration for the Host Memory Consumption with respect to a specific service.

Thresholds define in which cases a metric is rated as warning or critical.

Thresholds can be defined on:

  • The metric status (for example to ignore a warning)
  • The metric value (if the metric exceeds or falls below a specific value)
  • The metric usage (if a quota metric exceeds a specific percentage)
  • The metric text (for example if specific text patterns are contained or missing in the metric text)
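The four threshold types listed above can be illustrated with small rating functions. This Python sketch is a hypothetical model, not the Health Monitoring rule engine: the function names, the use of regular expressions for text patterns, and treating a value that reaches the limit as a breach (>= or <=) are all assumptions.

```python
from __future__ import annotations

import re


def rate_status(status: str, overrides: dict[str, str]) -> str:
    """Status threshold: remap a delivered rating, e.g. {"Warning": "OK"} ignores warnings."""
    return overrides.get(status, status)


def rate_value(value: float, warning: float, critical: float, exceeds: bool = True) -> str:
    """Value threshold: rate when the value exceeds (or, if exceeds=False, falls below) a limit."""
    def breached(limit: float) -> bool:
        return value >= limit if exceeds else value <= limit

    if breached(critical):
        return "Critical"
    if breached(warning):
        return "Warning"
    return "OK"


def rate_usage(used: float, quota: float, warning_pct: float, critical_pct: float) -> str:
    """Usage threshold: rate a quota metric by its consumption as a percentage of the quota."""
    return rate_value(100.0 * used / quota, warning_pct, critical_pct)


def rate_text(text: str, pattern: str, alert_if_found: bool, rating: str = "Critical") -> str:
    """Text threshold: rate when a pattern is contained (or missing) in the metric text."""
    found = re.search(pattern, text) is not None
    return rating if found == alert_if_found else "OK"
```

For example, a quota metric that uses 45 of 50 units is at 90%, which `rate_usage(45, 50, 80, 95)` rates as Warning.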

The following screenshots show the Event configuration:

Screenshot of Health Monitoring showing details of the event configuration for Job Application Status values with respect to a specific service.

If a critical event is detected (for example, because a custom threshold is reached), the following event reactions can be configured:

  • Create Alert: This will create an alert which is then visible on the Alerting page.
  • Send Email: One or multiple email recipients can be configured. They will receive a notification email when this event occurs.
  • Start Operation Flow: An operation flow will be triggered automatically when the event is detected.
  • Create ServiceNow Ticket: A ServiceNow ticket is created automatically when the event is detected.
  • Send Chat Notification: A message is sent to a chat application, for example, to MS Teams.

For a better history of your metrics, you can aggregate long-term data to hourly and daily values using the Configuration Screen:

Screenshot showing the configuration of the data retention time for Health Monitoring.

In the Application Settings area, you can define the Data Retention Time (1). The daily housekeeping job will calculate and store aggregates from the raw data before deleting it.
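The housekeeping step, aggregating raw values into hourly and daily buckets before deleting them, can be sketched in Python as follows. This is a hypothetical model: the lesson does not say which aggregates are stored, so min, max, and average are assumptions, as are the function names.

```python
from __future__ import annotations

from collections import defaultdict
from datetime import datetime
from statistics import mean


def aggregate(samples: list[tuple[datetime, float]], granularity: str) -> dict[datetime, dict[str, float]]:
    """Group raw (timestamp, value) samples into hourly or daily buckets.

    Stores min, max, and average per bucket (the aggregate functions are an assumption).
    """
    buckets: dict[datetime, list[float]] = defaultdict(list)
    for ts, value in samples:
        if granularity == "hourly":
            key = ts.replace(minute=0, second=0, microsecond=0)
        elif granularity == "daily":
            key = ts.replace(hour=0, minute=0, second=0, microsecond=0)
        else:
            raise ValueError(f"unknown granularity: {granularity}")
        buckets[key].append(value)
    return {k: {"min": min(v), "max": max(v), "avg": mean(v)} for k, v in sorted(buckets.items())}


def housekeeping(samples: list[tuple[datetime, float]], retention_cutoff: datetime):
    """Hypothetical daily job: aggregate raw samples older than the cutoff, then drop them."""
    expired = [(ts, v) for ts, v in samples if ts < retention_cutoff]
    kept = [(ts, v) for ts, v in samples if ts >= retention_cutoff]
    return kept, aggregate(expired, "hourly"), aggregate(expired, "daily")
```

Storing aggregates rather than raw points keeps long-term trends visible in the metric history while the raw data volume stays bounded by the retention time.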

To learn how to configure Health Monitoring settings, including activating or deactivating monitoring, adjusting application settings such as data retention time or modifying metric or event configurations, please watch our demonstration video:

You can follow along with this demonstration either by using the SAP Cloud ALM Public Demo tenant or your own SAP Cloud ALM tenant, if this use case has already been set up. To do this, simply select the Operations group and then choose the Health Monitoring tile.

Caution

Please be aware that the content may vary from what is shown in the video in both cases.
