Optimizing Joins

Objectives
After completing this lesson, you will be able to:

After completing this lesson, you will be able to:

• Optimize joins to improve calculation view performance

Join Optimization

Set Join Cardinalities

When you define a join, you can optionally set the cardinality to describe the relationship between the two data sources, that is, 1..1, 1..n, n..1, and n..m.

We recommend that you always set join cardinalities so the optimizer can decide if joins can be omitted at runtime based on the query that is being sent to the calculation view.

If tables are used as the data sources, and not calculation views, then you can use the Propose Cardinality option.

Caution
The cardinality setting is proposed based on the content of the joined tables at the time you select the Proposal button. The cardinality setting does not automatically update as the table content changes. You might need to come back and change the cardinality setting if the relationship in the data changes. Additionally, remember that you need to use realistic data if you plan to use the proposal. It is no good proposing cardinality on a few test records that later do not make sense on live data.

With a join definition including cardinalities, and under additional conditions, the calculation view instantiation can automatically prune a data source involved into a join node. Let's have a look at these conditions:

The key principle applied by the SQL optimizer is that it will prune a joined source, based on the join type and cardinalities, only if this cannot affect the result set. In other words, whenever the result set might be different depending on whether the join is executed or not, it will execute the join.

Caution
Note that this "default" join pruning is not triggered on an Inner join. To optimize execution time for an inner join, you can use the join type Referential (see the following).

Let's have a look at how cardinalities and join type actually allow the optimizer to prune.

In the three examples, based on the same starting point scenario, no column from the right table is requested at all.

• In case of a left outer join, the right table R can be ignored because the 1..1 cardinality tells the optimizer that executing the join will not impact the result set. Indeed, for each row in L, there would be at most one match in R.

• In the second example, using a left join but a 1..n cardinality, it is the other way round. No pruning can occur because the join could match more than one row of R with each row of L.

• Finally, in the case of a right outer join, it is not possible to prune the right table, whatever the cardinality. This is because it is not possible to know which rows of L should be kept in the result set without executing the join.

Referential Join Type

Among join types, one is called Referential. When you use a referential join, you tell the optimizer that any row from one source always has a match in the other source. For example, the sales item will always have a header. You also specify whether this property is true for the left source, the right one, or both; this is done with the Integrity Constraint column.

The Referential join can be considered as an inner join supporting optimization. With the actual cardinality defined, it always returns the same result set as an inner join, and uses the Integrity Constraint (in addition to other general pruning conditions settings mentioned previously) to decide if the join execution can be ignored (and one table pruned) to increase performance.

Note
The key difference with Outer or Text joins is that the Referential join behaves as an inner join, as already said. In addition, for a 1..1 cardinality setting and Integrity Constraint placed on both sources (left and right), it allows join optimization regardless of which source (right or left) has no column requested.

General Join Recommendations

Joins are a source of potential performance improvements. Make sure you consider all the recommendations when defining joins to ensure you get the best performance from your calculation views.

Join Column Optimization

We already covered the pruning of data sources in a join but you should learn about another optimization possibility. In the remaining data source, can the column that was part of the join also be pruned?

Background - The Default Behavior

By default, the column on which a join is defined will always be included in the result, regardless of whether the column was requested by the query.

The reason for this is to guarantee a consistent aggregation behavior concerning the join field, even when the field in not requested by the query. If this aggregation behavior changes depending on which columns are requested, this could lead to a change of the resulting values of measures.

Optimize Join Columns Option

When you select the Optimize Join Columns checkbox, you tell the calculation engine that – in case a join partner is pruned – you do not expect changes to the result set depending on whether the joined column from the queried data source is used for aggregation or not. This depends heavily on the type of aggregations that are performed by the calculation view. In particular, the SUM aggregate function is not sensitive to the grouping level, whereas the MAX or MIN are sensitive.

When Does Join Pruning Occur?

When the Optimize Join Columns option is active, pruning of join columns between two data sources, A and B, occurs when all four following conditions are met:

• The join type is Referential, Outer or Text (actually, the calculation view cannot be built if join type is Inner).

• Only columns from one join partner, A, are requested.

• The join column from A is NOT requested by the query.

• The cardinality on B side (the side of the join partner from which no column is requested) is ..1.

Caution
As you see, the optimization heavily relies on cardinality. Therefore, you must ensure that the cardinality is set according to the actual data model. If it is not, the Optimize Join Columns option will produce unstable (though sometimes faster) results.

Greedy Join Pruning

SAP HANA Cloud QRC 3/2022 has introduced a new type of join optimization called Greedy Join Pruning. It allows the SQL optimizer to prune a joined source if it is not queried at all (no column requested), regardless of the cardinalities and join type settings. This can significantly improve query runtime. However, you should use it only when the expected results do not depend on whether the join is executed or not.

To achieve this, you define the Greedy Pruning setting of a join as Left, Right, or Both. Left or Right refer to which joined source can be pruned in a greedy mode.

Note
When greedy pruning is enabled, it does not prevent other join pruning mechanisms. Therefore, even if a condition is not met for greedy join pruning (for example, the Greedy Pruning is Left but you request columns from the left table of the join), join pruning could still happen because the cardinalities, join type and so on, allow it.

Greedy Pruning can be defined at three different levels:

Greedy Pruning at Different Levels

LevelPrecedenceHow to Activate Greedy Pruning
One join of a calculation view3In the Join Definition tab, in the Properties pane, use the Greedy Pruning dropdown list.
All joins of a calculation view2In the Semantics, display the View Properties > Advanced tab. Then choose Add Execution Hints, search for Pruning and select the desired greedy pruning setting.
All calculation views used in an SQL query1

`SELECT <...> FROM <...> ('PLACEHOLDER' = ('ce_settings', '{"greedy_join_pruning": "<value>"}')`