Analyzing and Optimizing SQL Using Tools and Views

Objective

After completing this lesson, you will be able to use tools and views to analyze and optimize SQL.

Suboptimal SQL - Potential Symptoms

When facing performance problems on a SAP HANA system, it is not always clear right away what the reason is. It might be caused by the code of an application or by other reasons. For instance, an unload of data from the cache in SAP HANA can be due to contention of parallel long-running SQL. It also could be the result of a memory size configuration that does not allow processing appropriately, possibly due to heavy data growth over time. So when problems on SAP HANA occur, the first question to answer is: what is the root cause for it? If the answer is at least partly due to supposedly suboptimal SQL, then the following described methods can be used to analyze in detail what is, or are, the reasons for it and how to solve it in the next step.

  • Slow generation of reports in the backend
  • SAP HANA System not responding anymore while applications are running
  • Error messages: Out of memory situations (OOM) in the traces
  • Lack of memory according to SAP HANA Cockpit and Monitoring tools

Proactive Activities for Optimal SQL

The best way to prevent the creation of suboptimal SQL is undertaking training on how to create optimal SQL right from the start. Attending trainings before creating optimal SQL is not always feasible, and knowledge on how to do it the best way may have become inactive. Nevertheless, it appears to be a good investment in knowledge if you think about the following costs of making up SQL errors in the code (detect, evaluate, improve, test and bring to production again, and so on). Experience reveals that often suboptimal code shows up in inconvenient moments of projects.

Following are the proactive activities for optimal SQL:

  • Revise the data model for SAP HANA (in-memory, column based).
  • Stay trained on using in-memory methods and functions in SQL.
  • Stay current with new SAP HANA releases (new SQL Features).
  • Create easy to understand and transferable problem / solution documentation.
  • Provide of access to systems for backup teams (absences).
  • Workload Classes for the most urgent and important SQL.
  • Create, use and maintain a suitable naming convention for SQL objects.
  • Use a meaningful error messaging concept to find suboptimal SQL easier.

The data model is a core and important factor in creating optimal SQL. If the data model does not allow efficient code, optimization probably will fail, or at least only bring less results. Also the data modeler knowledge over time should be kept up-to-date to learn new features. Those new features may allow to solve an SQL performance problem in the code that could not be solved that way before. The Release Bulletins that come up with every Support Package Stack may offer new features of SQL. New hints can be mentioned in this context also.

In-memory column-based technology differs from the legacy RDBMS. Context switches between SAP HANA engines can cause poor performance but can be fixed. In the best case, context switches are not even allowed through proactive testing and monitoring. Also, organizational factors form a proactive approach to prevent suboptimal SQL from being written. Creating a list of best practices and how to access the system is another proactive measure. Also, assigning workload classes with higher or lower priority can push optimal execution of SQL in SAP HANA.

Last but not least, naming conventions for stored procedures, tables, and so on, connected with custom created error messaging, contribute to finding and improving SQL with poor performance.

Role Responsibilities for Suboptimal SQL

In real world projects, the question about who is responsible for taking care of the performance issues sometimes comes up, especially if it is SQL code that was handed over or inherited from third parties. This first barrier is not technical, but rather an organizational issue that needs to be addressed. Roles like Administrator, Data Modeler, or Developer are among those that are potentially capable of solving suboptimal SQL challenges. Someone has to take over the responsibility to analyze the problem and it is not clear which role will finally work on finding the solution for it. This could be time consuming and it is worthwhile to assign clearly who does what and when they do it.

In case of obvious inefficient system setups, the Administrator is usually responsible. If the execution of application code clearly causes slow SQL execution, the responsible role will be the author of this code. It is often the Developer who has to do some optimizing changes on the SQL Code where possible. If the data model happens to be the problem, for example, if migration from other database manufacturers towards SAP HANA took place, then the Data modeler appears to be the right person to find a solution.

  • Developer cannot grant privileges on system configuration.
  • Administrator is not permitted to enter a HDI Container deploying optimal code.
  • Data Modeler is not permitted to recompile a Stored Procedure executing the code in suboptimal way.
  • Problem Analyzer role would be beneficial to have – for example, setup in role rotation.

The role in-between that helps to find out what is happening and whom to assign the responsibility for finding the problem and then creating a solution can be described as the Problem Analyzer role. This could be set up to combine with existing roles like Architect, Developer, Administrator, or other roles. Part-time job rotation roles are an idea too. The role owner of Problem Analyzer pre-evaluates the root cause. For example, with poor performing Stored Procedure in Container XYZ - further steps may be assigned to the appropriately-trained person or the author to work on a solution and to transport it back to production.

Query Optimization Steps

The key to solving a performance issue is to find the root cause of the problem. This is the point in which the compilation process no longer leads to a result that matches performance expectations, though it may run correctly. This could be perceived as a contradiction (correct execution plan but poor performance). For this situation, SAP provides a great variety of tools. To be able to identify the underlying issues, you require a thorough understanding of the query optimization steps, as you see in the figure.

Diagram showing the query optimization steps.

On the way from Request to Answer, the Query Processor coordinates the Query Optimizer until all detail decisions have been made. There is an automatic Cost-Based Optimization process within SAP HANA that can be overruled by the Rule-Based Optimizer.

Executors like the SAP HANA Extended SQL Executor (ESX), which is a front-end execution engine, execute the SQL queries. The Enumerator phase (Rule Based vs. Cost-Based) ends with the selection of Column Engines, Row Engines, or an orchestration of those. Then, either a rewriting (rule-based) of the execution steps happens or logical or physical enumerators (cost-based) are applied until the execution plan has been created successfully. This process can be repeated and influenced by rules, for example, filter pushdown replacing joins could be a logical change that often appears as a solution. Hints provided by SAP can be used to test and change decisions of the optimizer and finally optimize the execution plan.

SQL Processing Steps

The basic SQL query undergoes several steps on the way to its execution plan. In order to analyze and finally optimize SQL in SAP HANA, it is necessary to understand firstly the process on how SQL is executed on the basis of an execution plan.

Table illustration showing the SQL processing steps.

An execution plan is the compiled, ready to execute, and selected translation of what the code intends to achieve so optimization can focus on this plan. Before this execution plan is created, several other steps have to be undertaken by SAP HANA.

These steps are executed by means of different components and in parallel if possible. After accepting an SQL request within a session, a syntax check and parsing in context of sufficient object and system privileges takes place in SQL front-end.

Then, the Query Optimizer, a very important concept in SQL Execution process, will evaluate the best and most efficient track for the result set which leads to the execution plan. Logical and physical optimization iterations are undertaken fully automatically. This happens under a cost-based condition.

In test or development systems, sometimes rule-based executions are used temporarily which makes parallelism either impossible or harder to apply. The regular cost-based evaluation triggers the comparison of enumerations and then leads to the best approach for that particular moment, which is represented by the execution plan to be used.

This plan can be reused to save compilation time in the future and cut down redundant operation steps (like syntax checks). The appropriate SAP HANA engines are then selected and will work on the generation of the result sets with the generated code according to the execution plan.

Extended SQL Executor

Since SAP HANA 2.0 SPS02, there are two additional processing engines in SAP HANA to execute SQL queries. They were introduced to offer even better performance under the precondition to not affect the functionality of SAP HANA. The new engines are active by default and are considered by the SQL optimizer during query plan generation. These are SQL Executor (ESX) and SAP HANA Execution Engine (HEX).

Diagram showing the SAP HANA Extended SQL Executor (ESX).

The SAP HANA Extended SQL Executor (ESX) is a front-end execution engine that replaces the row engine in part, but not completely. It retrieves database requests at session level and delegates them to lower-level engines like the Join Engine and Calculation Engine. Communication with other engines is simplified by using ITABs (internal tables) as the common exchange format.

The other engine, SAP HANA Execution Engine (HEX), is a query execution engine that will replace other SAP HANA engines such as the Join Engine and OLAP Engine in the long run by combining all into one single engine. It connects the SQL layer with the column store by creating an appropriate SQL plan during the prepare phase. Queries that are not supported by HEX, or where an execution is not considered beneficial, are automatically routed to the legacy engine.

Top Reasons for Unexpected SQL Query Performance Deterioration

Optimal SQL code may become suboptimal over time. Therefore, it is possible that applications may show a stagnation or deterioration in performance over time due to changing data metrics, changing data formats, or changed size relations between database artifacts through join-associated table evaluations. The cost-based optimizer for this reason is flexible and tries to optimize the execution plan as much as possible at creation time, though there are situations where this is not possible anymore and suddenly or subsequently a downfall in execution takes place.

The reason for this is not only connected to different size metrics of objects like tables, but could also be caused by new functionalities of SAP HANA after an upgrade. SAP HANA is a fast-growing technology and comprises, in its Support Package Stacks, large sets of new functionalities in each release. If new functionalities are released, for example, when new execution engines are introduced, it can have an impact on how the optimizer chooses to create the execution plan. Previously solid and long used execution plans might become suboptimal.

  • Upgrades of SAP HANA Platform with new functions and new default values
  • Evolution of the data model (optimal code may become suboptimal)
  • Memory Size exceeded (no temperature model and thereof size problems)
  • Mass processing of data at the same time (traffic jam effect)

Working through SAP HANA Release Bulletin documents shipped with each new SAP HANA upgrade is highly recommended. The steps are to be undertaken proactively on SAP HANA databases in order not to experience suboptimal code execution. Another advantage of establishing this routine is that new SAP HANA functionalities bring improvements that may have great potential for improvements in other work areas.

Too much growing data can lead to poor SQL performance and to the cache unloading data over time. Concepts and technologies like the temperature model or data tiering can protect from this deterioration and should be taken into account. Additionally, applying the flow of data into and out of SAP HANA gives back control over what to do with less important and less urgent data over time.

Last but not least, system performance deterioration can also have its origin in external factors, like the configuration of the overall system. Configuration (default vs. needed configuration) can be done in a way that performance of SQL execution within SAP HANA is not possible and activities on the application level would turn out to be ineffective. A systematic solution process is recommended between the SAP HANA system roles and shared objectives.

Analyzing Tools

Analyzing Tools for Suboptimal SQL

To find and focus on the suboptimal SQL is the very first step to solve poor performance. For this reason, there is a wide range of analyzing tools provided to the SAP HANA Developers who use SAP HANA. Some of those tools can help the developers to understand the group of objects involved, for instance, within the execution respective call of a stored procedure, or simply to understand the operators being used.

  • Explain Plan – lightweight tool
  • SQL Analyzer – main tool
  • Graph Analyzer – visualized plan (PlanViz)
  • SQL Plan Cache
  • SQL Traces (trace, step debug, time debug
  • System Views (e.g. expensive statements, plan_cache)
  • Mini Checks (also using System views)
  • Hints (rule vs. cost-based optimizations)

The hierarchy of operators can be visualized by the Graph Plan tool. We can see the excluded or included parts within the processing of SQL and find the place where time is consumed to a high degree.

The Explain Plan provides an ex-ante possibility within testing scenarios to evaluate how the cost based optimizer within SAP HANA will decide the route to take in order to create the result set. For instance, if an index was ignored by the optimizer, one can test using a hint in the code to force index usage and see if the SQL runs as expected.

SQL Traces show what happened and give a granular way of controlling what is intended to be traced. System views or Mini Checks, which are integrated in the Statement Library, can be used to evaluate information about suboptimal SQL.