Discovering Observability and Monitoring in SAP BTP, Cloud Foundry Runtime

Objective

After completing this lesson, you will be able to apply techniques and SAP services for setting up comprehensive monitoring systems in SAP BTP, Cloud Foundry runtime.

Observability and Monitoring in SAP BTP, Cloud Foundry Runtime

Introduction

As your company's flagship application gains traction and user traffic spikes, performance issues and unexpected errors are beginning to surface. To maintain a seamless user experience and ensure system reliability, you need to implement robust observability and monitoring strategies. In this lesson, you'll discover how to utilize tools within the SAP BTP, Cloud Foundry runtime to gain deep insights into your application's performance and rapidly address any issues.

Overview

Observability and monitoring are your windows into the inner workings of your software and systems. Observability enables you to understand what's happening inside your applications and infrastructure by analyzing the data they produce. Meanwhile, monitoring leverages this observability data to actively check and alert on the system's status, ensuring that performance remains optimal and issues are quickly addressed.

SAP BTP, Cloud Foundry runtime provides the following observability indicators:

Range of Indicators

  • Logs
    Definition: Qualitative records of events, errors, or transactions that help trace system changes and behavior.
    Source: Application logs from applications. System logs from Cloud Foundry components.
  • Metrics
    Definition: Quantitative measurements such as CPU usage, memory consumption, and network traffic are used to reflect system performance.
    Source: Container metrics. Custom metrics defined by developers.
  • Events
    Definition: Records of significant actions affecting resources, including Audit Events for granular insights into operations like deployments and user management.
    Source: Generated by applications and Cloud Foundry components when an event initiates.

Logs

Logs in Cloud Foundry originate from two main sources: your applications and the Cloud Foundry system components.

  • Application logs provide insights into the operations and issues within your applications. They can be defined by developers in terms of content and format. See App logging in Cloud Foundry for more details.
  • System logs offer a glimpse into how the platform manages and interacts with your applications. They're generated by Cloud Foundry’s internal components, such as Cloud Controller.
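
Because developers decide the content and format of application logs, one common choice is a single JSON object per line. The following is a minimal sketch, assuming the platform picks up whatever the application writes to standard output. The logger name "orders" and the order_id field are hypothetical and serve only as illustration.

```python
import json
import logging
import sys
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # "order_id" is a hypothetical custom field passed via `extra=`.
        if hasattr(record, "order_id"):
            entry["order_id"] = record.order_id
        return json.dumps(entry)


def build_logger(name: str = "orders") -> logging.Logger:
    """Create a logger that writes JSON lines to standard output."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.propagate = False  # avoid duplicate lines from the root logger
    return logger
```

A call such as build_logger().info("order created", extra={"order_id": "A-1001"}) would then emit one machine-readable line that downstream log tools can filter by field.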

The logging architecture of SAP BTP, Cloud Foundry runtime is built on the Firehose architecture. At its core, Loggregator collects and combines logs into a standardized format, then pushes them into the Firehose data stream.

You can access real-time logs using the cf logs command in the Cloud Foundry CLI. SAP BTP, Cloud Foundry runtime also provides a built-in Log Cache for short-term storage, facilitating real-time troubleshooting and analysis.
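
As a rough illustration of working with recently stored logs, the sketch below runs cf logs with the --recent flag and splits each line into its parts. It assumes the cf CLI is installed and logged in, and that lines follow the usual "timestamp [source] OUT|ERR message" layout. The helper names and the sample line in the comment are made up for this example.

```python
import re
import subprocess
from typing import List, NamedTuple, Optional


class LogLine(NamedTuple):
    timestamp: str
    source: str   # for example "APP/PROC/WEB/0" or "RTR/1"
    stream: str   # "OUT" or "ERR"
    message: str


# Assumed layout: "<timestamp> [<source>] OUT|ERR <message>", for example
# "2024-05-01T10:15:30.12+0000 [APP/PROC/WEB/0] ERR connection refused"
_LINE = re.compile(r"^\s*(\S+)\s+\[([^\]]+)\]\s+(OUT|ERR)\b\s*(.*)$")


def parse_log_line(line: str) -> Optional[LogLine]:
    """Return the parsed line, or None for banner and blank lines."""
    match = _LINE.match(line)
    return LogLine(*match.groups()) if match else None


def recent_logs(app_name: str) -> List[LogLine]:
    """Fetch recent logs for one app through the cf CLI and parse them."""
    result = subprocess.run(
        ["cf", "logs", app_name, "--recent"],
        capture_output=True, text=True, check=True,
    )
    parsed = (parse_log_line(line) for line in result.stdout.splitlines())
    return [entry for entry in parsed if entry is not None]


def errors_only(lines: List[LogLine]) -> List[LogLine]:
    """Keep only entries written to the error stream."""
    return [entry for entry in lines if entry.stream == "ERR"]
```

Calling errors_only(recent_logs("my-app")) would narrow a troubleshooting session to the error stream of a hypothetical app named my-app.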

For long-term data retention and in-depth analysis, forwarding logs to external systems becomes necessary. You can configure Syslog Drains to selectively route specific types of logs to a syslog server for centralized storage. Alternatively, you can set up nozzles, customizable plugins that integrate with the Firehose, to channel the entire Firehose stream to an external log management system, such as Splunk or Elasticsearch.
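
In standard Cloud Foundry, a syslog drain is typically set up as a user-provided service that carries the syslog URL and is then bound to the application. The sketch below only assembles the two cf CLI calls. The app name, drain name, and URL are placeholders, and your landscape may offer a managed alternative instead.

```python
import subprocess
from typing import List


def syslog_drain_commands(
    app_name: str, drain_name: str, syslog_url: str
) -> List[List[str]]:
    """Build the cf CLI calls that create a drain and bind it to an app."""
    return [
        # -l sets the syslog drain URL of the user-provided service.
        ["cf", "create-user-provided-service", drain_name, "-l", syslog_url],
        ["cf", "bind-service", app_name, drain_name],
    ]


def apply_commands(commands: List[List[str]]) -> None:
    """Run each command in order and stop at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)
```

For example, apply_commands(syslog_drain_commands("my-app", "my-drain", "syslog-tls://logs.example.com:6514")) would route the logs of the hypothetical app my-app to the placeholder endpoint.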

Once your logs are in an external storage system, you can employ different visualization tools to create dynamic and insightful dashboards. For example, Kibana is often employed with Elasticsearch to explore logs interactively.

Get an Overview of Cloud Foundry Application Logs

In this tutorial, you'll discover the application logs and the various methods used to view them.

Perform the steps you find here.

Metrics

Metrics in Cloud Foundry fall into two categories, container metrics and custom metrics, each serving distinct monitoring needs:

  • Container metrics reveal how your applications utilize the resources allocated to their containers, including CPU usage, memory consumption and disk space. Diego Cell collects these metrics approximately every 15 seconds and forwards them to Loggregator for aggregation. By default, container metrics are stored temporarily in Log Cache for quick retrieval, and accessible using Cloud Foundry CLI commands, such as cf app <app-name>. For long-term storage and historical analysis, metrics are usually forwarded to external systems like Prometheus through nozzles. Learn more at Container metrics.
  • Custom metrics are user-defined and can measure transaction rates, error counts, user engagement, or any other data relevant to your application's performance and business goals. Collecting them usually requires instrumenting your code, using language-specific libraries (for example, expvar for Go), third-party monitoring systems (for example, Prometheus), or SAP's Java library. Most metrics libraries allow metrics to be exposed via an HTTP endpoint. Alternatively, you can run an agent within the application’s container. These methods handle metrics collection independently from the application logic, offering flexibility in gathering and exporting custom metrics with minimal changes to the code.
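
To make the idea of exposing custom metrics over HTTP concrete, here is a minimal sketch that uses only the Python standard library and renders counters in the Prometheus text format. In practice you would more likely use an existing metrics library. The metric name, the /metrics path, and all class and function names are assumptions for this example.

```python
import threading
from http.server import BaseHTTPRequestHandler
from typing import Iterable, Type


class Counter:
    """A thread-safe counter that can only increase."""

    def __init__(self, name: str, help_text: str) -> None:
        self.name = name
        self.help_text = help_text
        self._value = 0
        self._lock = threading.Lock()

    def inc(self, amount: int = 1) -> None:
        if amount < 0:
            raise ValueError("counters can only increase")
        with self._lock:
            self._value += amount

    @property
    def value(self) -> int:
        with self._lock:
            return self._value


def render_metrics(counters: Iterable[Counter]) -> str:
    """Render counters in the Prometheus text exposition format."""
    lines = []
    for counter in counters:
        lines.append(f"# HELP {counter.name} {counter.help_text}")
        lines.append(f"# TYPE {counter.name} counter")
        lines.append(f"{counter.name} {counter.value}")
    return "\n".join(lines) + "\n"


def make_metrics_handler(counters: Iterable[Counter]) -> Type[BaseHTTPRequestHandler]:
    """Build a request handler that serves the counters on /metrics."""
    registry = list(counters)

    class MetricsHandler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            if self.path != "/metrics":  # hypothetical scrape path
                self.send_error(404)
                return
            body = render_metrics(registry).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; version=0.0.4")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return MetricsHandler
```

Application code would call inc() where the business event happens, for example on a hypothetical orders_created_total counter, while a scraper reads the endpoint on its own schedule. This keeps collection separate from application logic.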

Both types of metrics are usually stored directly in monitoring tools like Prometheus, enabling detailed analysis, alerts, and integration with other systems. For visualization, Grafana is a popular choice in the Cloud Foundry community.

Get an Overview of Cloud Foundry Application Container Metrics

In this tutorial, you'll gain a broad understanding of application container metrics.

Perform the steps you find here.

Get an Overview of Custom Metrics

In this tutorial, you'll gain an overview of custom metrics and instrumentation.

Perform the steps you find here.

Forward Logs and Container Metrics

In this tutorial, you'll discover how logs and container metrics are forwarded so the user can consume them.

Perform the steps you find here.

Audit Events

In Cloud Foundry, audit events provide insights into resource interactions by documenting actions such as application deployments, service bindings, user management, and so on. Unlike security events, which focus on access and authorization, or usage events, which are geared towards billing, audit events offer a more granular view into the activities that impact resources, recording details like the actor (the entity initiating the action), the target (the affected resource), and a timestamp.

Audit events are generated by applications and Cloud Foundry components, then captured by Loggregator and streamed through Firehose to various endpoints for access and analysis.

Watch the following video to discover how to monitor events for individual applications or across entire spaces.

Remember: because SAP BTP retains Cloud Foundry events for only 14 days, you should forward this data to an external storage solution for long-term persistence and advanced analytics.
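
The structure of an audit event (actor, target, timestamp) and the 14-day retention window can be modeled in a few lines. This is an illustrative sketch only. The class, its field names, and the helper functions are hypothetical and do not mirror the platform's actual API payload.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List

RETENTION = timedelta(days=14)  # retention period stated in this lesson


@dataclass(frozen=True)
class AuditEvent:
    event_type: str      # for example "audit.app.update"
    actor: str           # the entity that initiated the action
    target: str          # the affected resource
    timestamp: datetime

    def expires_at(self) -> datetime:
        """Point in time after which the platform no longer keeps the event."""
        return self.timestamp + RETENTION


def events_for_target(events: Iterable[AuditEvent], target: str) -> List[AuditEvent]:
    """Return the events affecting one resource, oldest first."""
    return sorted(
        (event for event in events if event.target == target),
        key=lambda event: event.timestamp,
    )


def expiring_within(
    events: Iterable[AuditEvent], now: datetime, window: timedelta
) -> List[AuditEvent]:
    """Return events that will leave the retention window within `window`."""
    return [
        event for event in events
        if now <= event.expires_at() <= now + window
    ]
```

A forwarding job could use a check like expiring_within to confirm that nothing reaches the end of the retention window before it has been copied to external storage.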

To better understand audit events, check out the following documentation: Audit events overview

Get an Overview of Events and the SAP Alert Notification System

In this tutorial, you'll discover the Cloud Foundry events and options to send alerts with the SAP Alert Notification Service.

Perform the steps you find here.

Health Checks

Health checks are automated tests that Cloud Foundry performs on your running application instances to determine their status, ensuring reliability and availability. Cloud Foundry employs liveness and readiness checks, which serve distinct functions:

  • Readiness checks ensure that an application instance is prepared to handle requests. If a readiness check fails, the instance is temporarily removed from the load balancer, preventing it from receiving traffic until it stabilizes. This is essential during updates or scaling operations to ensure graceful deployments.
  • Liveness checks verify that an application instance is active and running. If a liveness check fails, the instance is automatically restarted. This self-recovery mechanism helps mitigate system failures efficiently.

There are three types of health checks:

  • Port checks (default): Check if a TCP connection can be established on the application's port.
  • Process checks: Verify if the application's main process is running. It's useful for worker processes or applications that don't use standard ports.
  • HTTP checks (recommended): Perform an HTTP GET request to a specified endpoint and check for a 200 status code. This provides more accurate feedback for web applications.
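
For an HTTP check, the application has to offer an endpoint that answers with status 200 while it is healthy. The following is a minimal sketch using the Python standard library. The /health path and the dependencies_ok placeholder are assumptions, and a real application would usually add this route to its existing web framework.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

HEALTH_PATH = "/health"  # hypothetical endpoint path


def dependencies_ok() -> bool:
    """Placeholder: replace with real checks, such as a database ping."""
    return True


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != HEALTH_PATH:
            self.send_error(404)
            return
        healthy = dependencies_ok()
        body = b"ok" if healthy else b"unavailable"
        # Only a 200 response counts as a passed HTTP health check.
        self.send_response(200 if healthy else 503)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        pass  # keep health probes out of the application log


def make_server(port: int) -> ThreadingHTTPServer:
    """Create the server; call serve_forever() on the result to start it."""
    return ThreadingHTTPServer(("0.0.0.0", port), HealthHandler)
```

On Cloud Foundry the port would normally come from the PORT environment variable, for example make_server(int(os.environ["PORT"])).serve_forever(). The manifest attributes health-check-type: http and health-check-http-endpoint: /health would then point the platform at this endpoint.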

The life cycle of health checks can be summarized as follows:

  • Deployment: Health check configurations are defined in the application's manifest file (manifest.yml) or through the Cloud Foundry CLI during deployment. For details, please check Configuring health checks.
  • Initialization: Diego initiates and schedules the application instance along with the specified health checks.
  • Startup: Once the instance starts, health checks run every two seconds until either the application passes the checks or a specified timeout is reached (default: 60 seconds, configurable up to 10 minutes on SAP BTP, as described in the Deploying large applications guideline).
  • Ongoing monitoring: Once the application starts successfully, regular liveness and readiness checks are performed to continuously monitor its health.
  • Failure handling: Failing a liveness check leads to a restart of the app instance. Failing a readiness check results in the instance being removed from routing, though it continues running for potential recovery. When health checks fail, analyze your application logs for errors or warnings. If you're using a custom HTTP endpoint, ensure it's returning the correct response. Additionally, consider resource constraints (memory, CPU) and network connectivity as potential culprits. Eventually, if an instance repeatedly fails health checks, Cloud Foundry will give up on restarting it and mark it as crashed.
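
The startup phase described above (probe every two seconds until the check passes or the timeout expires) can be simulated in a few lines. This is a sketch of the described behavior, not the platform's implementation. The function name and the injectable clock and sleep parameters are assumptions that make the loop easy to test.

```python
import time
from typing import Callable


def wait_until_healthy(
    check: Callable[[], bool],
    interval: float = 2.0,    # seconds between probes, as described above
    timeout: float = 60.0,    # default startup timeout, as described above
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Probe `check` until it passes or the timeout would be exceeded."""
    deadline = clock() + timeout
    while True:
        if check():
            return True   # instance is considered started
        if clock() + interval > deadline:
            return False  # startup timeout reached, instance counts as failed
        sleep(interval)
```

Running this locally against your own health endpoint, with a check function that performs the HTTP request, gives a rough idea of whether the application becomes healthy within the configured timeout.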

To learn more about health checks, refer to the following documentation: Health checks in Cloud Foundry

Perform an Application Health Check

In this tutorial, you'll discover how the Application Health Check works and how to set it up for your application.

Perform the steps you find here.

Streamlining Observability and Monitoring with SAP BTP Services

SAP empowers you to effortlessly establish comprehensive observability and monitoring strategies for applications running on SAP BTP, Cloud Foundry runtime. This is achieved through a suite of robust services, including SAP Cloud Logging, SAP Alert Notification, and SAP Cloud Application Lifecycle Management (SAP Cloud ALM).

SAP Cloud Logging: Centralized Log Management and Analysis

SAP Cloud Logging enhances the observability capabilities for applications on SAP BTP by centralizing the collection, storage, visualization, and analysis of logs, metrics, and traces. This service seamlessly integrates with SAP BTP, Cloud Foundry runtime, accommodating various data formats and sources for comprehensive data collection and analysis. It allows you to manage the storage duration of your observability data with flexible retention policies. You can easily explore your logs, metrics, and traces through an intuitive web interface, leveraging customizable dashboards and predefined views to quickly identify trends and anomalies. Moreover, the service includes advanced features like alerting and anomaly detection to help you proactively address potential issues.

To begin using SAP Cloud Logging, create a service instance through either the Cloud Foundry CLI or the SAP BTP cockpit, and choose the service plan that best fits your business needs. Once the service instance is operational, you can start shipping logs and using the provided tools for in-depth analysis and visualization. Detailed instructions can be found in our SAP Help Portal.

SAP Alert Notification Service: Proactive Alerts and Notifications

SAP Alert Notification service further enhances your monitoring capabilities by providing a centralized platform for real-time alerts and notifications on operational changes and system health. It acts as a powerful proxy, collecting and managing alerts from various applications and services. It offers a comprehensive and growing catalog of alerts across SAP BTP. It allows you to tailor subscriptions to specific events or conditions relevant to your monitoring needs. Also, it supports multiple notification channels, from traditional emails to modern integrations like Slack, allowing you to reach stakeholders efficiently and effectively.

To get started with SAP Alert Notification, create a service instance in your SAP BTP subaccount through the Cloud Foundry CLI, a multitarget application descriptor, or the SAP BTP cockpit. Complete setup instructions and guidance on customizing alert conditions and actions can be found in the service's SAP Help Portal.

SAP Cloud ALM: a Central Hub for Application Lifecycle Management

SAP Cloud ALM is a cloud-based solution tailored for managing the entire lifecycle of SAP-provided SaaS and customer-built applications. With integrated observability and monitoring, SAP Cloud ALM offers insights into your application's performance through technical metrics, event monitoring, and embedded analytics. By enabling advanced alerting and automated corrective actions, such as service restarts, SAP Cloud ALM allows for proactive issue detection and resolution. As a centralized management interface, it streamlines the management of SaaS and custom applications, integrating with existing IT service management processes.

You can use SAP Cloud ALM to monitor your custom-built applications by integrating with the OpenTelemetry library, which involves minimal code changes thanks to auto-instrumentation and configuration options. Detailed guidance can be found here. At the same time, you can easily connect SAP services with SAP Cloud ALM through preconfigured and preinstrumented setups. Check our SAP Help Portal for the latest information.

Available at no additional cost under the Enterprise Level Support Contract, SAP Cloud ALM can be activated easily without additional hardware or complex configuration. Step-by-step instructions can be found in the following SAP missions:

Summary

This lesson has equipped you with the knowledge to set up and manage observability and monitoring within the SAP BTP, Cloud Foundry runtime. By understanding and applying these tools, you can ensure that your applications are reliable, performant, and secure.
