Optimizing Joins

Objective

After completing this lesson, you will be able to Optimize joins to improve calculation view performance.

Join Optimization

Set Join Cardinalities

When you define a join, you can optionally set the cardinality to describe the relationship between the two data sources, that is, 1..1, 1..n, n..1, and n..m.

Based on the join cardinality setting, some joins can be omitted to improve run-time performance, but a wrong setting can lead to unexpected results. Therefore, you should always set a correct join cardinality. Follow a cardinality proposal where applicable (only tables as sources) and ensure continued correctness of the setting.

We recommend that you always set join cardinalities so the optimizer can decide if joins can be omitted at runtime based on the query that is being sent to the calculation view.

If tables are used as the data sources, and not calculation views, then you can use the Propose Cardinality option.

Caution

The cardinality setting is proposed based on the content of the joined tables at the time you select the Proposal button. The cardinality setting does not automatically update as the table content changes. You might need to come back and change the cardinality setting if the relationship in the data changes. Additionally, remember that you need to use realistic data if you plan to use the proposal. It is no good proposing cardinality on a few test records that later do not make sense on live data.

With a join definition including cardinalities, and under additional conditions, the calculation view instantiation can automatically prune a data source involved into a join node. Let's have a look at these conditions:

An SQL statement together with a screen capture of field mapping that obeys he following prerequisites for join pruning: a. No field is requested from the to-be-pruned table. b. The join type is outer, referential, or text. c. The join cardinality is either .. 1 for the to-be-pruded table or only measures with count distinct aggregation or no measures at all are requested.

The key principle applied by the SQL optimizer is that it will prune a joined source, based on the join type and cardinalities, only if this cannot affect the result set. In other words, whenever the result set might be different depending on whether the join is executed or not, it will execute the join.

Caution

Note that this "default" join pruning is not triggered on an Inner join. To optimize execution time for an inner join, you can use the join type Referential (see the following).

Let's have a look at how cardinalities and join type actually allow the optimizer to prune.

Three examples, one in which pruning is possible, two examples without pruning. For details, refer to the following text.

In the three examples, based on the same starting point scenario, no column from the right table is requested at all.

  • In case of a left outer join, the right table R can be ignored because the 1..1 cardinality tells the optimizer that executing the join will not impact the result set. Indeed, for each row in L, there would be at most one match in R.

  • In the second example, using a left join but a 1..n cardinality, it is the other way round. No pruning can occur because the join could match more than one row of R with each row of L.

  • Finally, in the case of a right outer join, it is not possible to prune the right table, whatever the cardinality. This is because it is not possible to know which rows of L should be kept in the result set without executing the join.

Referential Join Type

Among join types, one is called Referential. When you use a referential join, you tell the optimizer that any row from one source always has a match in the other source. For example, the sales item will always have a header. You also specify whether this property is true for the left source, the right one, or both; this is done with the Integrity Constraint column.

You should use Referential Join where applicable. A screen capture of the join properties shows joint type Referential with integrity constraint Right. This means, every entry in the right table has at least one match in the left table. Other options are Left or Both. This allows pruning under the conditions listed in the text above.

The Referential join can be considered as an inner join supporting optimization. With the actual cardinality defined, it always returns the same result set as an inner join, and uses the Integrity Constraint (in addition to other general pruning conditions settings mentioned previously) to decide if the join execution can be ignored (and one table pruned) to increase performance.

Note

The key difference with Outer or Text joins is that the Referential join behaves as an inner join, as already said. In addition, for a 1..1 cardinality setting and Integrity Constraint placed on both sources (left and right), it allows join optimization regardless of which source (right or left) has no column requested.

General Join Recommendations

Recommendation using joins: Always maintain the cardinality of the joins. Try to use n:1 and 1:1 for left outer joins, or 1:n and 1:1 for right outer joins. Try to reduce number of join fields. Avoid joining on calculated columns. Avoid type conversions at runtime. Check whether dynamic join and optimize join columns can be used. (Screen captures of bad design that needs a calculation or a type conversion in a join.)

Joins are a source of potential performance improvements. Make sure you consider all the recommendations when defining joins to ensure you get the best performance from your calculation views.

Join Column Optimization

We already covered the pruning of data sources in a join but you should learn about another optimization possibility. In the remaining data source, can the column that was part of the join also be pruned?

Background - The Default Behavior

By default, the column on which a join is defined will always be included in the result, regardless of whether the column was requested by the query.

The reason for this is to guarantee a consistent aggregation behavior concerning the join field, even when the field in not requested by the query. If this aggregation behavior changes depending on which columns are requested, this could lead to a change of the resulting values of measures.

Optimize Join Columns Option

When you select the Optimize Join Columns checkbox, you tell the calculation engine that – in case a join partner is pruned – you do not expect changes to the result set depending on whether the joined column from the queried data source is used for aggregation or not. This depends heavily on the type of aggregations that are performed by the calculation view. In particular, the SUM aggregate function is not sensitive to the grouping level, whereas the MAX or MIN are sensitive.

When Does Join Pruning Occur?

When the Optimize Join Columns option is active, pruning of join columns between two data sources, A and B, occurs when all four following conditions are met:

  • The join type is Referential, Outer or Text (actually, the calculation view cannot be built if join type is Inner).

  • Only columns from one join partner, A, are requested.

  • The join column from A is NOT requested by the query.

  • The cardinality on B side (the side of the join partner from which no column is requested) is ..1.

Caution

As you see, the optimization heavily relies on cardinality. Therefore, you must ensure that the cardinality is set according to the actual data model. If it is not, the Optimize Join Columns option will produce unstable (though sometimes faster) results.

Greedy Join Pruning

SAP HANA Cloud QRC 3/2022 has introduced a new type of join optimization called Greedy Join Pruning. It allows the SQL optimizer to prune a joined source if it is not queried at all (no column requested), regardless of the cardinalities and join type settings. This can significantly improve query runtime. However, you should use it only when the expected results do not depend on whether the join is executed or not.

To achieve this, you define the Greedy Pruning setting of a join as Left, Right, or Both. Left or Right refer to which joined source can be pruned in a greedy mode.

Note

When greedy pruning is enabled, it does not prevent other join pruning mechanisms. Therefore, even if a condition is not met for greedy join pruning (for example, the Greedy Pruning is Left but you request columns from the left table of the join), join pruning could still happen because the cardinalities, join type and so on, allow it.

Greedy Pruning can be defined at three different levels:

Screen captures of ways to define Greedy Pruning. First option: At the join level, choose the greedy pruning direction. Second option: use SQL query execution hint greedy_join_pruning: <value>. Third option: For an entire Calculation View, define an execution hint at the view properties on the Advanced tab, then add an execution hint, search for pruning, find the entry Left side greedy_join_pruning with a value 7.

Greedy Pruning at Different Levels

LevelPrecedenceHow to Activate Greedy Pruning
One join of a calculation view3In the Join Definition tab, in the Properties pane, use the Greedy Pruning dropdown list.
All joins of a calculation view2In the Semantics, display the View Properties > Advanced tab. Then choose Add Execution Hints, search for Pruning and select the desired greedy pruning setting.
All calculation views used in an SQL query1

Inside your SQL query, add the following hint clause:

SELECT <...> FROM <...> ('PLACEHOLDER' = ('ce_settings', '{"greedy_join_pruning": "<value>"}')

The values are as follows:

  • 1: Disabled
  • 7: Left
  • 11: Right
  • 15: Both

The precedence column in the previous table means that, for example, a greedy pruning setting defined for an entire calculation view overrules any greedy pruning setting defined in any of its joins. Similarly, a greedy pruning hint inside an SQL query overrules any setting defined for the entire calculation view or in a specific join.

Log in to track your progress & complete quizzes