Modeling in SAP HANA Cloud

Using Joins Nodes

Objectives
After completing this lesson, you will be able to:

After completing this lesson, you will be able to:

  • Use joins to combine data sources
  • Join more than two tables in a single join node
  • Work with non-equi joins
  • Use a Dynamic Join
  • Define join columns optimization

Connecting Tables with Join Nodes

One important activity when you create calculation views, is to express the relationships between the different data sources used by a model. Most often, this is done by using joins, a classical artifact in any relational database management system (RDBMS).

A specific node type, the Join node, is used in modeling to materialize joins between one or more data sources (tables, calculation views, and so on).

In addition, the Star Join node, which is used to model a star schema, also defines one or several joins between a main data sources (fact "table") and DIMENSION calculation views.

For each Join node, you must define which columns of the two joined sources must participate in the join, as well as the join type, as discussed in the lesson Connecting Tables. You can also specify the cardinality, with the help of the Propose Cardinality feature, but only if you are sure that the cardinality you provide corresponds to how the data is actually organized.

Sample Business Case and Data

To illustrate the behavior of the different types of joins in SAP HANA Cloud, consider the following tables:

  • Sales Order

  • Customer

  • State

The objective is to join these tables to retrieve the sales order amounts (facts) with the customer information, including the states in which the customers reside.

To begin with, you can make the following observations:

  • Sales Order 8 does not have a customer master record.

  • Customer TOM does not have any orders.

  • State TX does not have a description.

  • No customer resides in Alabama.

Join Type Summary

There are different types of joins. The inner and outer joins are similar to other database management systems. Others are specific to SAP HANA Cloud and address specific requirements such as optimization (for example, the Referential Join), ease of use with typical SAP Business Suite data model (for example, the Text join), and others. As an aside, some of the join types listed in the table consist of a specific option or configuration. For example, the Temporal Join is an Inner Join to which you add time-related conditions.

You can review the typical use cases for the different join types.

Note

This table builds on a scenario similar to the one just described (Sample Data for Business Example), assuming that a fact table (considered as the left table) is joined to a dimension table (considered as the right table).

Each join type will now be presented in detail.

Inner Join

The Inner Join is the most basic of the join types. It returns rows when there is at least one match on both sides of the join.

Inner Join in a CUBE Calculation View

The figure, Inner Join in a CUBE Calculation View, shows the behavior of Inner Joins in a CUBE calculation view.

With the sample scenario data, some facts are not retrieved because customer information is missing.

Left Outer Join

A Left Outer Join returns all rows from the left table, even if there are no matches in the right table.

Left Outer Joins and Design Time Filters

The figure, Left Outer Joins and Design Time Filters, shows the behavior of left outer joins with design time filters.

Left Outer Join in a CUBE Calculation View

The figure, Left Outer Joins in a CUBE calculation view, shows the use of left outer joins in a CUBE calculation view.

Compared with the Inner join, all the sales order data (including those with no corresponding customer information) is retrieved, but still an analysis of sales by customer or state will return irrelevant data.

Right Outer Join

A Right Outer Join returns all the rows from the right table, even if there are no matches in the left table.

Right Outer Join in a CUBE Calculation View

Full Outer Join

A Full Outer Join combines the behaviors of the Left and Right Outer Joins.

The result set is composed of the following rows:

  • Rows from both tables that match on joined columns

  • Rows from the left table with no match in the right table

  • Rows from the right table with no match in the left table

Caution

A Full Outer Join is supported by calculation views only, in the standard Join and Star Join nodes.

However, in a Star Join node, a full outer join can be defined only on one DIMENSION calculation view, and this view must appear last in the Star Join node.

Referential Join

SAP HANA Cloud offers a type of join that is optimized for performance: the Referential Join.

The key principle of a Referential Join is that, if referential integrity between two tables is ensured, then under some circumstances, the join between these tables will not be executed, which will save execution time.

The concept of referential integrity between two tables (A and B) means that, in the joined columns (1 or more columns from each table), there is always a match in table B for a row of column A, or the other way round, or both.

Let’s take the example of a CUBE with Star Join calculation view that is defined with many DIMENSION calculation views, and the joins all have cardinality n..1 (meaning, a fact is connected to at most one member or each dimension). Let’s assume that, by design, the source system ensures that all records in the fact table always have a match in all the DIMENSION calculation views (this is often the case).

So, when you execute a query on your CUBE calculation view, the calculation engine can prune any DIMENSION calculation view from which your query does not request any column. So the corresponding join will not be executed. This is where optimization occurs.

Referential Join

Relies on Referential IntegrityReferential integrity is the fact that matches exist between the joined tables (in one direction, for example left to right, or right to left, or both).
Optimized for performanceReferential Join is not executed in circumstances where the result (without the join) will be the same as if the join was executed.
Like an Inner Join when join is executedWhen a Referential Join is not pruned, it is executed as an Inner Join.

Defining a Referential Join

To define a Referential Join, you first add a Join node to the calculation view scenario, and assign two or more data sources. In the case of a CUBE with Star Join calculation view, you assign a lower node to the Star Join node and add one or more DIMENSION calculation view.

Then, you have to define the following settings:

  • Join Type

    This setting must be set to Referential.

  • Cardinality

    Cardinality must be specified. If it is not, the Referential Join that cannot be optimized.

  • Integrity Constraint

    This setting defines in which direction the referential integrity is guaranteed.

    • Left: Every entry in left table has at least one match in right table.

    • Right: Every entry in right table has at least one match in left table.

    • Both: Every entry in both tables has at least one match in the other table.

With the Integrity Constraint setting, you ensure that the optimization of the join occurs only when it is actually possible, because the cardinality alone is not enough. If Integrity Constraint is set to left, no join optimization is triggered if a query requests columns only from the right table.

Conditions for Referential Join Optimization

A join defined as a Referential Join between two tables or sources, A and B, is pruned (not executed) when all three following conditions are met:

  • No field is requested from B.

  • Integrity is placed on A.

  • The cardinality on B side is :1.

Note

When the cardinality on B side is not :1, the join will always be executed even if no column from B is requested. This is a requirement to get the correct number of rows in the output, which depends on the number of matching rows in B for each row in A.

Let’s take an example, based on the figure, Referential Join.

The Join is defined as Referential, the Integrity Constraint (Integrity) is placed on the Left "table". This means that any record from the SNWD_SO_I table has at least one match in the CVD_PD calculation view. However, the cardinality is n..1, which also tells us that this is at most one match.

So all in all, for each record of SNWD_SO_I, there is exactly one match in CVD_PD.

With this join definition, if a query selects NO column from the right "table" CVD_PD, then the join will not be executed.

Referential Joins must be used with caution because they assume that referential integrity is ensured at any time. Using Referential Joins in a context where referential integrity is not ensured might lead to different results depending on whether or not you select columns from one of the two data sources.

Note
If you consider a join between a fact table and its related attributes, keep in mind that facts without corresponding attributes violate referential integrity, while attributes without facts (for example, a customer without any order) do not.

Create a Dimension Calculation View

Text Join

A Text Join enables SAP HANA Cloud to handle the translation of attribute labels in a way that corresponds to how translation texts are stored in the master data. In particular, this way to store texts in different languages is heavily used in SAP systems, such as the SAP Business Suite.

Technically, a Text Join behaves like a Left Outer Join, with cardinality 1:1, but in addition you specify a language column, typically called SPRAS in SAP systems tables.

During join execution, the language of the end user querying the calculation view is used to retrieve descriptions from the text table (here, MAKT) in the corresponding language, based on the language column.

Note
In the back-end system, this design for master data tables is what allows to store a description for a single item (here, a given Material Number) in different languages in a dedicated master data tables (here, MAKT) without using a different column for each language, which would be more complicated to handle.

Text Join Example

The figure, Text Join Example, is a simplified example of a Text Join. Depending on the session language of the end user, the DESC column displays the product description in English or in German.

Temporal Join

It is possible to add a temporal condition to a join in order to find matching records from two tables based on a date. The records are matched only if a date column of one table is within a time interval defined by two columns of the other table.

This is useful to manage time-dependent attributes.

Caution
Temporal Joins are only supported in the Star Join of calculation views of the type CUBE with Star Join. The join type must be defined as Inner.

Temporal Join Example

In this example, the status of the customers can change over time, and this information is captured in a dedicated table (Customer Status). If you need to analyze the sales orders and include the status of each customer when they issued the order, you create an Inner Join on the ContactID column and add a temporal condition as follows:

  • Temporal column: Date (Sales Orders)

  • From Column: DateFrom (Customer Status)

  • To Column: DateTo (Customer Status)

  • Temporal Condition: Include Both

Note
  • Temporal conditions can be defined on columns of the following data types:

    • timestamp

    • date

    • integer

  • Only columns already mapped to the output of the Star Join node can be defined as Temporal Columns in the temporal properties of the join.

Star Join

The Star Join in calculation views of the type CUBE with Star Join is a node type, rather than a join type.

It is used to structure data in a star schema. The fact table (data source) of a Star Join can be any type of input node. However, only calculation views of the data category DIMENSION are allowed as input nodes for dimensions.

The type of joins between the fact and dimension tables within the star schema can be defined in the Star Join node. The available joins are as follows:

  • Referential Join

  • Inner Join

  • Left Outer Join

  • Right Outer Join

  • Full Outer Join, with some specific restrictions (see above)

  • Text Join

Shared Columns from DIMENSION Calculation Views

In a CUBE with Star Join Calculation View, the Columns tab of the Semantics separates columns into two categories:

  • Private

    Private columns are columns that are defined inside the calculation view itself. These can be measures or attributes. You have full control over these columns.

  • Shared

    Shared columns are columns that are defined "externally", in one or more DIMENSION Calculation Views that are referenced by your CUBE with Star Join Calculation View. On these columns, you have logically less control, because they are potentially "shared" with other CUBE with Star Join Calculation Views. Still, you can hide some of these columns to keep only the ones that you need.

Regarding the shared columns, their Name and Label properties cannot be changed, compared with a private column, but you can define an Alias Name and an Alias Label. Moreover, providing Alias Names is mandatory if column names from the underlying DIMENSION calculation views conflict with each other or conflict with the private column names.

Join Cardinality

The cardinality of a join defines how the data from two tables joined together are related, in terms of matching rows.

For example, if you join the Sales Order table (left table) with the Customer table (right table), you can define an n:1 cardinality. This cardinality means that several sales orders can be related to the same customer, but the opposite is not possible (you cannot have a sales orders that relates to several customers).

Caution

We recommend that you specify the cardinality only when you are sure of the content of the tables. If not, just leave the cardinality blank.

Validating a Join

A feature in the SAP Business Application Studio suggests the recommended cardinality, based on an analysis of the tables that are joined together.

Caution

This analysis of joined tables is performed at the moment you define the join. If the content of the table evolves after that, the cardinality you have defined might become incorrect.

For example, you are validating the join between the Sales Order and Customer tables, but your data contains only one sales order per customer. In this case, the join validation might suggest a 1..1 cardinality, which does not correspond to the expected scenario in real life.

Multi-Join

SAP HANA Cloud provides a new capability in join nodes, which is to allow more than two data sources in the same node.

Watch this video to learn about the Multi-Join.

In the figure, Multi-Join Scenarios, the multi-join order property only applies in scenarios 1 and 2, and affects joins J1 and J2. The precedence between joins J1 and J3 (in scenario 2) or J1 and J2 (in scenario 3) is not controlled by the multi-join order setting.

When does Multi-Join Priority Affect the Join Node Results?

With more than two data sources feeding a Join node, the result sometimes depend on the join execution order, but this is not always the case. For example:

  • When all joins are Inner Joins, the result set is generally the same regardless of the join execution order.

  • With a mix of Inner and Left Outer Joins, the result set can vary based on the join execution order.

So, it is up to the Modeler to decide between joining more than two tables in a single node, or sticking to joining "only" two tables in a given join node, based on the potential differences in behavior, and which approach provides a better legibility of the Calculation View design in the calculation scenario and/or mapping tabs.

Non-Equi Join

SAP HANA cloud provides a type of join, called Non-Equi Join, where the join condition is not represented by an = (equal) operator, for example, the value of column CUSTOMER_ID in the table ORDERSequals the value of column ID in table CUSTOMERS, but instead, is based on other comparison operators such as Greater than.

This new type of join allows more flexible conditions to address specific scenarios.

In the SAP Business Application Studio for SAP HANA, this comes as a new type of node in which you define which operator must be applied when evaluating the join, that is, when comparing the rows based on the values of their joined columns.

Defining a Non-Equi Join condition is possible for the following types of joins:

  • Inner

  • Left Outer

  • Right Outer

  • Full Outer

Hint
In a Non-Equi Join, the operator can be specified differently for each pair of joined columns, as illustrated in Example 1 below.

In Example 1, two tables contain a list of products to be delivered and a list of sub-tasks which provides the availability date of components.

The objective is to display, in a calculation view, which sub-tasks will not finish early enough to allow the company to meet the planned completion date for some products.

This requirement can be achieved with a Non-Equi Join with the following join conditions:

  • ProductsToBeDelivered.id Equal SUBTASKS.id

  • ProductsToBeDelivered.dueOn Less Than SUBTASKS.plannedDate

In Example 2, we assume that server rental prices vary from day to day.

To calculate the total cost for a rental period of several days, rental periods are "translated" into days by a Non-Equi Join involving the system date table M_TIME_DIMENSION. The result of this Non-Equi Join is then joined (classical join) with the table priceData that contains the rental price for each day.

Note
Additional flags such as Dynamic Join are not available

Dynamic Join

Enhancing Model Flexibility with Dynamic Join

In some scenarios, you want to allow data analysis at different levels of granularity with the same calculation view.

This is generally possible in an Aggregation node when measures support the aggregation at different levels, that is, when it is possible to report measures grouped by one set of columns or another. For example, calculating the total sales by country and product in one case, and by region and product in another case.

The figure, Dynamic Join, shows an example of a more complex scenario, where you want to present two different measures side by side, either by Country or Region:

  • The sales by product

  • The total sales (all products)

In this case, assuming that you model your calculation view with a Regular Join on Country and Region, you will get correct results if you analyze the data by country, but the results will be inconsistent if you analyze the data by region.

Note
On further analysis of the example, you can see that the details of total sales by region and product are inconsistent for products that are not sold in all the countries of the region (HT-1001)... or (to be specific) in all the countries of the region that report sales.

Benefits of a Dynamic Join

With a Dynamic Join, only the join columns requested in the query are brought into context and play a part in the join execution. As a consequence, the same calculation view can be used for both purposes, that is, to analyze data by country or by region.

Note
A Dynamic Join can be defined only with multi-column joins.

If we consider the behavior of the join from an aggregation perspective:

  • In a Regular (static) Join, the aggregation is executed after the join.

  • In a Dynamic Join, when a joined column is not requested by the client query, an aggregation is triggered to remove this column, and then the join is executed based only on the requested columns.

Caution
With a Dynamic Join, if none of the joined columns are requested by the client query, you get a query runtime error.

Join Columns Optimization

In this lesson, you have already learned about a powerful optimization approach for joins in SAP HANA Cloud, which is to use Referential Joins. Compared with an Inner Join, a Referential Join can be omitted, in some circumstances, without affecting the calculation view output. These circumstances mainly relate to referential integrity, cardinalities, and which of the joined data sources the requested columns come from.

Still, in scenarios where one of the join partners can be pruned, another optimization question is whether the joined column(s) in the queried data source, the one that is not pruned, should be kept.

Background - The Default Behavior

Per default, the field on which a join is defined will be requested during query processing, regardless of whether the field is necessary from a query point of view.

The key objective is to guarantee a consistent aggregation behavior concerning the join field, even when the field in not requested by the query. So, the purpose of the default behavior of query execution is to get a stable aggregation behavior. If this aggregation behavior changes depending on which columns are requested, this could lead to a change of the resulting values of measures.

Optimize Join Columns Option

When you select the Optimize Join Columns checkbox, you tell the calculation engine that – in case a join partner is pruned – you do not expect changes to the result set depending on whether the joined column from the queried data source is used for aggregation or not. This depends heavily on the type of aggregations that are performed by the calculation view. In particular, the SUM aggregate function is not sensitive to the grouping level, whereas the MAX or MIN are.

When Does the Optimization (Join Pruning) Occur?

When the Optimize Join Columns option is active, pruning of join columns between two data sources, A and B, occurs when all four following conditions are met:

  • The join type is Referential, Outer or Text (in particular, the calculation view cannot be built if join type is Inner).

  • Only columns from one join partner, A, are requested.

  • The join column from A is NOT requested by the query.

  • The cardinality on B side (the side of the join partner from which no column is requested) is :1.

Caution
As you see, the optimization heavily relies on cardinality. So, you must ensure that the cardinality is set according to the actual data model. If it is not, the Optimize Join Columns option will produce unstable (though sometimes faster) results.

Save progress to your learning plan by logging in or creating an account