Introduction
Your company has recently deployed applications on the SAP BTP, Cloud Foundry runtime and needs to understand the scaling capabilities to handle fluctuations in application traffic and demand efficiently.
Horizontal and Vertical Scaling
SAP BTP, Cloud Foundry runtime offers powerful scaling capabilities, including horizontal and vertical scaling. The choice between horizontal and vertical scaling depends on the specific requirements of your application. Here are some key details about scaling capabilities in SAP BTP, Cloud Foundry runtime:
Horizontal Scaling
Also known as scaling out, horizontal scaling adds more instances of your application to handle increased load. In SAP BTP, Cloud Foundry runtime, this means increasing the number of application instances so that the application stays responsive and performant as the load grows.
You can scale your applications horizontally using the Cloud Foundry command line interface (CLI), the SAP BTP cockpit, or the Application Autoscaler service. The platform automatically distributes the instances across different virtual machines and availability zones to ensure high availability and fault tolerance.
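As a minimal sketch of the CLI route, the following Python helper builds the argument list for `cf scale` with the `-i` (instances) flag. The helper name and the application name `my-app` are placeholders chosen for illustration; the snippet only constructs the command and does not run it.

```python
import shlex


def horizontal_scale_command(app_name: str, instances: int) -> list[str]:
    """Build the `cf scale` argument list that sets the instance count."""
    if instances < 1:
        raise ValueError("instances must be at least 1")
    # `-i` sets the number of running instances of the application.
    return ["cf", "scale", app_name, "-i", str(instances)]


# "my-app" is a placeholder application name.
command = horizontal_scale_command("my-app", 3)
print(shlex.join(command))  # cf scale my-app -i 3
```

Once you are logged in and have targeted an org and space, the list could be passed to `subprocess.run(command, check=True)` to execute it.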
Vertical Scaling
Also known as scaling up, vertical scaling adds more resources, such as memory, disk, and CPU, to each application instance to handle increased load. In SAP BTP, Cloud Foundry runtime, this means increasing the memory and disk quota of your application instances. The platform allocates CPU automatically: applications get a guaranteed CPU share of ¼ core per GB of instance memory. The maximum instance memory per application is 16 GB, which allows vertical scaling up to 4 CPU cores. For the current memory limits, refer to the Limits section.
You can scale your applications vertically using the Cloud Foundry command line interface (CLI) and the SAP BTP cockpit. You define the memory and disk space for each application instance when pushing or scaling your application. Vertical scaling changes the resources assigned to each instance, which can improve performance. Note, however, that the additional resources may increase costs.
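The CPU arithmetic above can be checked with a short Python sketch. The constant and function names are illustrative; the ¼ core per GB ratio and the 16 GB ceiling are the values stated in the text.

```python
CPU_CORES_PER_GB = 0.25        # guaranteed share: 1/4 core per GB of instance memory
MAX_INSTANCE_MEMORY_GB = 16    # maximum instance memory stated in the text


def guaranteed_cpu_cores(memory_gb: float) -> float:
    """Return the guaranteed CPU share for an instance of the given memory size."""
    if not 0 < memory_gb <= MAX_INSTANCE_MEMORY_GB:
        raise ValueError(f"memory must be in (0, {MAX_INSTANCE_MEMORY_GB}] GB")
    return memory_gb * CPU_CORES_PER_GB


for memory_gb in (1, 4, 16):
    print(f"{memory_gb} GB -> {guaranteed_cpu_cores(memory_gb)} cores")
# 1 GB -> 0.25 cores
# 4 GB -> 1.0 cores
# 16 GB -> 4.0 cores
```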
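For the CLI route, a hedged sketch: the helper below builds a `cf scale` command using the `-m` (memory limit) and `-k` (disk limit) flags. The helper name, the application name `my-app`, and the sizes are placeholders; nothing is executed.

```python
from typing import Optional


def vertical_scale_command(
    app_name: str, memory: Optional[str] = None, disk: Optional[str] = None
) -> list[str]:
    """Build a `cf scale` argument list that changes memory and/or disk limits."""
    if memory is None and disk is None:
        raise ValueError("specify memory, disk, or both")
    command = ["cf", "scale", app_name]
    if memory is not None:
        command += ["-m", memory]  # memory limit per instance, e.g. "2G"
    if disk is not None:
        command += ["-k", disk]    # disk limit per instance, e.g. "1G"
    return command


# "my-app" and the sizes are placeholder values.
print(" ".join(vertical_scale_command("my-app", memory="2G", disk="1G")))
# cf scale my-app -m 2G -k 1G
```

Unlike changing the instance count, changing memory or disk limits with `cf scale` restarts the application, so plan vertical scaling for a time when a restart is acceptable.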
In multitenant applications, each tenant or consumer accesses the application through a dedicated URL. The application identifies each tenant by its unique tenant ID and uses that ID to distinguish requests from different consumer tenants, which ensures data isolation.
For multitarget applications (MTAs), the resources for each module, such as memory and disk quota, can be defined in the mta.yaml file. When you deploy the MTA, the platform automatically allocates the specified resources to each application instance.
Limits
SAP BTP, Cloud Foundry runtime imposes certain limits on resource usage and scaling to ensure efficient resource allocation and fair usage among applications. Administrators can customize these limits through organization and space quotas, which define maximum resource allocations for memory, CPU, disk space, and the number of instances per application. The following limits apply when scaling applications:
- Memory: The maximum memory you can allocate to an application instance is 16 GB.
- Disk Quota: The maximum disk quota you can allocate to an application instance is 10 GB.
- Application Package Size: The maximum application package size is 1.5 GB. If your application is larger than that, the deployment fails.
- MTA Archive Size: The maximum size of a multitarget application (MTA) archive is 500 MB. Deployment of larger archives is rejected.
- Number of Instances: The number of instances you can scale out to depends on the quota assigned to your organization and space in the SAP BTP, Cloud Foundry runtime environment.
Please refer to the SAP BTP - Specific Configuration documentation for the most current information on limit configurations for SAP BTP, Cloud Foundry runtime.
These limits are designed to ensure fair usage and maintain the platform's performance and stability. If your application requires additional resources, consider optimizing your application or distributing the workload across multiple applications or services.
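The limits listed in this section can be captured in a small pre-deployment check. This is an illustrative sketch only: the class and function names are hypothetical, and the numeric limits are copied from the list above and may change, so confirm them against the current documentation.

```python
from dataclasses import dataclass

# Limits as stated in this section; verify against current documentation.
MAX_MEMORY_GB = 16.0
MAX_DISK_GB = 10.0
MAX_APP_PACKAGE_GB = 1.5
MAX_MTA_ARCHIVE_MB = 500.0


@dataclass
class DeploymentPlan:
    memory_gb: float       # memory per application instance
    disk_gb: float         # disk quota per application instance
    app_package_gb: float  # size of the application package
    mta_archive_mb: float  # size of the MTA archive


def limit_violations(plan: DeploymentPlan) -> list[str]:
    """Return one message per limit that the plan exceeds (empty if none)."""
    checks = [
        ("instance memory", plan.memory_gb, MAX_MEMORY_GB, "GB"),
        ("disk quota", plan.disk_gb, MAX_DISK_GB, "GB"),
        ("application package", plan.app_package_gb, MAX_APP_PACKAGE_GB, "GB"),
        ("MTA archive", plan.mta_archive_mb, MAX_MTA_ARCHIVE_MB, "MB"),
    ]
    return [
        f"{name}: {value} {unit} exceeds the limit of {limit} {unit}"
        for name, value, limit, unit in checks
        if value > limit
    ]


plan = DeploymentPlan(memory_gb=32, disk_gb=4, app_package_gb=0.8, mta_archive_mb=120)
for message in limit_violations(plan):
    print(message)
# instance memory: 32 GB exceeds the limit of 16.0 GB
```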
Autoscaling
Autoscaling is an essential aspect of running applications in a cloud environment. SAP BTP, Cloud Foundry runtime supports autoscaling through the Application Autoscaler service, which automatically adjusts the number of running application instances according to a policy that you define. A policy can be based on specific schedules or on metrics such as CPU utilization, memory utilization, HTTP throughput, HTTP latency, or custom metrics. Applications can thus scale in or out dynamically, maintaining performance and resource efficiency under varying load conditions.
The Application Autoscaler service provides the following features:
- Dynamic Scaling: It adjusts the number of application instances automatically based on real-time application performance metrics.
- Scheduled Scaling: It adjusts the number of application instances automatically based on predefined schedules. This is useful for scenarios where predictable load changes occur.
- RESTful APIs: It provides APIs for managing autoscaling policies and retrieving autoscaling history.
- Autoscaling Dashboard: It provides a user interface for you to manage autoscaling policies and view autoscaling history.
To use the Application Autoscaler service, you must bind it to your application and define an autoscaling policy. Scaling policies with multiple rules are evaluated from top to bottom, and the first rule that matches determines the outcome of the policy evaluation.
Place the most critical rules at the top of the policy. Rules cannot combine multiple metrics using logical operations such as AND or OR. Scaling on just one metric is generally acceptable, provided that the scale-out and scale-in thresholds leave enough of a gap (hysteresis) to avoid oscillating between instance counts. Keep these considerations in mind when configuring scaling policies.
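To make the top-to-bottom, first-match behavior concrete, here is a simplified Python model. It is not the service's actual implementation: it ignores breach durations and cool-down periods, and the field names (`instance_min_count`, `scaling_rules`, `metric_type`, and so on) are modeled on the autoscaler policy format and should be treated as assumptions to check against the policy documentation.

```python
import operator

OPERATORS = {"<": operator.lt, "<=": operator.le, ">": operator.gt, ">=": operator.ge}

# Hypothetical policy; thresholds and adjustments are illustrative values.
policy = {
    "instance_min_count": 1,
    "instance_max_count": 5,
    "scaling_rules": [
        {"metric_type": "cpu", "operator": ">", "threshold": 80, "adjustment": "+1"},
        {"metric_type": "throughput", "operator": ">", "threshold": 100, "adjustment": "+2"},
        {"metric_type": "cpu", "operator": "<", "threshold": 20, "adjustment": "-1"},
    ],
}


def evaluate(policy: dict, metrics: dict, current_instances: int) -> int:
    """Return the target instance count; only the first matching rule applies."""
    for rule in policy["scaling_rules"]:
        value = metrics.get(rule["metric_type"])
        if value is None:
            continue  # no measurement for this metric, try the next rule
        if OPERATORS[rule["operator"]](value, rule["threshold"]):
            target = current_instances + int(rule["adjustment"])
            # Clamp the result to the configured instance range.
            return max(policy["instance_min_count"],
                       min(policy["instance_max_count"], target))
    return current_instances  # no rule matched, keep the current count


# Both the cpu rule and the throughput rule match, but only the first is applied.
print(evaluate(policy, {"cpu": 90, "throughput": 150}, current_instances=2))  # 3
```

In this model the `+2` throughput rule never fires while CPU is above 80, which is why rule order matters and why the two conditions cannot be combined with AND or OR.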
Streamlined Service Setup
- Create a service instance.
- Define a scaling policy.
- Bind the application to the service instance using the policy.
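The three steps above can be sketched as follows. The service offering name `autoscaler`, the plan `standard`, the instance and application names, and the policy values are all assumptions for illustration; check `cf marketplace` in your environment for the actual offering and plan names. The snippet only builds the policy and the command lists and does not execute anything.

```python
import json
import shlex

# Step 2: a hypothetical scaling policy (field names assumed from the
# autoscaler policy format; values are placeholders).
policy = {
    "instance_min_count": 1,
    "instance_max_count": 4,
    "scaling_rules": [
        {
            "metric_type": "cpu",
            "operator": ">",
            "threshold": 80,
            "breach_duration_secs": 120,
            "cool_down_secs": 300,
            "adjustment": "+1",
        }
    ],
}


def setup_commands(app_name: str, instance_name: str, policy_file: str) -> list[list[str]]:
    """Return the cf CLI commands for steps 1 and 3 (create, then bind with policy)."""
    return [
        # Step 1: create the service instance (offering and plan names are assumed).
        ["cf", "create-service", "autoscaler", "standard", instance_name],
        # Step 3: bind the app, passing the policy file as binding parameters.
        ["cf", "bind-service", app_name, instance_name, "-c", policy_file],
    ]


policy_json = json.dumps(policy, indent=2)  # content to save as policy.json
for command in setup_commands("my-app", "my-autoscaler", "policy.json"):
    print(shlex.join(command))
# cf create-service autoscaler standard my-autoscaler
# cf bind-service my-app my-autoscaler -c policy.json
```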

Submit custom metrics if the standard metrics are insufficient. To be consistent with the other metrics, submit them every 40 seconds per application instance. Note that the custom metrics API is limited to 30 requests per second. All metrics, including custom ones, are averaged over time and across application instances, so it is important that each application instance submits its measurements periodically.
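The sketch below illustrates two points from the paragraph above: what a per-instance submission might look like, and why every instance needs to report. The payload shape (`instance_index` plus a `metrics` list) is an assumption based on the autoscaler custom metrics API and should be verified against its documentation; the averaging function is a simplified model, not the service's actual algorithm. `CF_INSTANCE_INDEX` is the environment variable Cloud Foundry sets for each instance.

```python
import os
from statistics import mean
from typing import Optional

SUBMIT_INTERVAL_SECONDS = 40  # submission interval recommended in the text


def build_metric_payload(name: str, value: float, unit: str = "",
                         instance_index: Optional[int] = None) -> dict:
    """Build a custom metric payload for one application instance (assumed shape)."""
    if instance_index is None:
        # Cloud Foundry exposes the instance index to each running instance.
        instance_index = int(os.environ.get("CF_INSTANCE_INDEX", "0"))
    return {
        "instance_index": instance_index,
        "metrics": [{"name": name, "value": value, "unit": unit}],
    }


def average_across_instances(samples_by_instance: dict[int, list[float]]) -> float:
    """Average each instance's samples over time, then average across instances."""
    per_instance = [mean(samples) for samples in samples_by_instance.values() if samples]
    return mean(per_instance)


# "queue_length" is a hypothetical metric name.
print(build_metric_payload("queue_length", 42.0, instance_index=1))
print(average_across_instances({0: [10.0, 20.0], 1: [30.0, 50.0]}))  # 27.5
print(average_across_instances({0: [10.0, 20.0], 1: []}))            # 15.0
```

In the last call, instance 1 submitted nothing, so the average reflects only instance 0 and understates the load. This is the distortion that periodic submission from every instance avoids.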
Load Balancing
SAP BTP, Cloud Foundry runtime includes built-in load balancing capabilities to evenly distribute incoming traffic across all running instances of an application. This helps improve the overall performance, availability, and reliability of applications by effectively utilizing the available resources and handling traffic spikes.
The platform supports different load balancing strategies, including round-robin and least-connection. By default, it uses round-robin to distribute incoming requests evenly across all instances of an application. If your application has specific requirements or characteristics, you can choose a different load balancing strategy that better suits your needs.
For example, the least-connection strategy directs incoming requests to the instance with the fewest active connections, which can be beneficial for applications with long-lived connections or uneven workloads. The least-connection algorithm may also produce better results when using the Application Autoscaler service, because it directs traffic to the new instances created during scale-out, which initially have no active connections. In contrast, the round-robin algorithm spreads new requests evenly across all instances, so it may take longer to balance the load, because connections established before the new instances became available must complete first.
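A toy simulation illustrates the difference after a scale-out. It assumes two existing instances with 10 long-lived connections each and one new instance with none, then routes six new requests whose connections stay open. The function names and numbers are illustrative; this is not how the platform router is implemented.

```python
from itertools import cycle


def route_round_robin(active: list[int], new_requests: int) -> list[int]:
    """Assign new requests to instances in rotating order."""
    counts = list(active)
    order = cycle(range(len(counts)))
    for _ in range(new_requests):
        counts[next(order)] += 1
    return counts


def route_least_connection(active: list[int], new_requests: int) -> list[int]:
    """Assign each new request to the instance with the fewest active connections."""
    counts = list(active)
    for _ in range(new_requests):
        target = counts.index(min(counts))
        counts[target] += 1
    return counts


before = [10, 10, 0]  # the third instance was just added by a scale-out
print(route_round_robin(before, 6))       # [12, 12, 2]
print(route_least_connection(before, 6))  # [10, 10, 6]
```

With least-connection, all six new requests land on the new instance, while round-robin keeps adding load to the two instances that are already busy.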
Summary
SAP BTP, Cloud Foundry runtime offers powerful scaling capabilities, including horizontal scaling (increasing the number of instances) and vertical scaling (increasing memory and disk space). There are also limits on resource usage, customizable through organization and space quotas, and autoscaling capabilities to dynamically adjust the number of running instances based on predefined criteria. The platform also provides built-in load balancing to distribute incoming traffic evenly. These features are essential for maintaining optimal performance, resource utilization, and application availability.