Using Join Nodes

Objectives
After completing this lesson, you will be able to:

After completing this lesson, you will be able to:

  • Implement a join node to combine data sources
  • Join more than two tables in a single join node
  • Configure a non-equi join
  • Describe how a dynamic join can prevent incorrect aggregations

Connecting Data Sources using Join Nodes

One important activity when creating calculation views, is to express the relationships between the different data sources. Most often, this is done by using joins.

A specific node type, the Join node, is used in modeling to define joins between one or more data sources.

In addition to the join node, the Star Join node, which is used to model a star schema, also defines one or several joins between a main data sources (fact "table") and the dimension calculation views.

For each Join node, you must define which columns of the two joined sources must participate in the join condition, as well as the join type. You can also specify the cardinality, with the help of the Propose Cardinality feature.

Sample Business Case and Data

To illustrate the behavior of the different types of joins in SAP HANA Cloud, consider the following tables:

  • Sales Order

  • Customer

  • State

The objective is to join these tables to retrieve the sales order amounts (facts) with the customer information, including the states in which the customers reside.

To begin with, you can make the following observations so that you will better understand the behavior of each join type that is illustrated in the next section:

  • Sales Order 8 does not have a customer master record.

  • Customer TOM does not have any orders.

  • State TX does not have a description.

  • No customer resides in Alabama.

Inner Join

The Inner Join is the most basic of the join types. It returns rows when there is at least one match on both sides of the join.

Inner Join in a CUBE Calculation View

The figure, Inner Join in a CUBE Calculation View, shows the behavior of Inner Joins in a CUBE calculation view.

With the sample scenario data, some facts are not retrieved because customer information is missing.

Left Outer Join

A Left Outer Join returns all rows from the left table, even if there are no matches in the right table.

Left Outer Joins and Design Time Filters

The figure, Left Outer Joins and Design Time Filters, shows the behavior of left outer joins with design time filters.

Left Outer Join in a CUBE Calculation View

The figure, Left Outer Joins in a CUBE calculation view, shows the use of left outer joins in a CUBE calculation view.

Compared with the Inner join, all the sales order data (including those with no corresponding customer information) is retrieved, but still an analysis of sales by customer or state will return irrelevant data.

Right Outer Join

A Right Outer Join returns all the rows from the right table, even if there are no matches in the left table.

Right Outer Join in a CUBE Calculation View

Full Outer Join

A Full Outer Join combines the behaviors of the Left and Right Outer Joins.

The result set is composed of the following rows:

  • Rows from both tables that match on joined columns

  • Rows from the left table with no match in the right table

  • Rows from the right table with no match in the left table

Caution

A Full Outer Join is supported by calculation views only, in the standard Join and Star Join nodes.

However, in a Star Join node, a full outer join can be defined only on one DIMENSION calculation view, and this view must appear last in the Star Join node.

Referential Join

SAP HANA Cloud offers a type of join that is optimized for performance: the Referential Join.

The key principle of a Referential Join is that, if referential integrity between two tables is ensured, then under some circumstances, the join between these tables will not be executed, which will save execution time.

The concept of referential integrity between two tables (A and B) means that, in the joined columns, there is always a match in table B for a row of column A, or the other way round, or both.

Let’s take the example of a cube with star join calculation view that is defined with many dimension calculation views, and the joins all have cardinality n..1 (meaning, a fact is connected to, at most, one member or each dimension). Let’s assume that the source system that sent the data to SAP HANA Cloud ensures that all records in the fact table always have a match in all the dimension calculation views.

Note

A referential join can be implemented in a standard join node as well as a star join node.

So, when you execute a query on your cube calculation view, the calculation engine can prune any dimension calculation view from which your query does not request any column. So the corresponding join will not be executed. This is where optimization occurs.

Referential Join

Relies on Referential IntegrityReferential integrity is the fact that matches exist between the joined tables (in one direction, for example left to right, or right to left, or both).
Optimized for performanceReferential Join is not executed in circumstances where the result (without the join) will be the same as if the join was executed.
Like an Inner Join when join is executedWhen a Referential Join is not pruned, it is executed as an Inner Join.

Defining a Referential Join

To define a Referential Join, you first add a Join node to the calculation view scenario, and assign two or more data sources. In the case of a CUBE with Star Join calculation view, you assign a lower node to the Star Join node and add one or more DIMENSION calculation view.

Then, you have to define the following settings:

  • Join Type

    This setting must be set to Referential.

  • Cardinality

    Cardinality must be specified. If it is not, the Referential Join that cannot be optimized.

  • Integrity Constraint

    This setting defines in which direction the referential integrity is guaranteed.

    • Left: Every entry in left table has at least one match in right table.

    • Right: Every entry in right table has at least one match in left table.

    • Both: Every entry in both tables has at least one match in the other table.

With the Integrity Constraint setting, you ensure that the optimization of the join occurs only when it is actually possible, because the cardinality alone is not enough. If Integrity Constraint is set to left, no join optimization is triggered if a query requests columns only from the right table.

Conditions for Referential Join Optimization

A join defined as a Referential Join between two tables or sources, A and B, is pruned (not executed) when all three following conditions are met:

  • No field is requested from B.

  • Integrity is placed on A.

  • The cardinality on B side is :1.

Note

When the cardinality on B side is not :1, the join will always be executed even if no column from B is requested. This is a requirement to get the correct number of rows in the output, which depends on the number of matching rows in B for each row in A.

Let’s take an example, based on the figure, Referential Join.

The Join is defined as Referential, the Integrity Constraint (Integrity) is placed on the Left "table". This means that any record from the SNWD_SO_I table has at least one match in the CVD_PD calculation view. However, the cardinality is n..1, which also tells us that this is at most one match.

So all in all, for each record of SNWD_SO_I, there is exactly one match in CVD_PD.

With this join definition, if a query selects NO column from the right "table" CVD_PD, then the join will not be executed.

Caution

Referential Joins must be used with caution because they assume that referential integrity is ensured at any time. Using Referential Joins in a context where referential integrity is not ensured might lead to different results depending on whether or not you select columns from one of the two data sources.

Join Multiple Data Sources using Join Nodes

Text Join

Often, attributes provide labels in multiple languages to support the scenario where different business users require reports to be displayed in their own local language. SAP HANA Cloud calculation views support this scenario.

A Text Join enables the display of attribute labels according to the language of the business user. The prerequisite condition to using this feature is that the source data provides the attribute labels in various languages. Calculation views do not translate labels.

Technically, a Text Join behaves like a Left Outer Join, with cardinality 1:1, but in addition you specify a language column, typically called SPRAS in SAP systems tables.

During join execution, the language of the end user querying the calculation view is used to retrieve descriptions from the text table (here, MAKT) in the corresponding language, based on the language column.

Note
In the source system, this design for master data tables is what allows to store a description for a single item (here, a given Material Number) in different languages using a dedicated text table (here, MAKT). The alternative would be to use a different column in the main table to store each language, which would be more complicated to handle.

Text Join Example

Temporal Join

Often, master data stores historical values. For example, in the employee table there might be two records for one employee. One record represents the employees job position in the past, and the second record represents the employees job position today. Each record contains dates to represent the validity of the record. So how does a calculation view know which record to request?

It is possible to add a temporal condition to a join in order to find matching records from two tables based on a date. The records are matched only if a date column of one table is within a time interval defined by two columns of the other table.

Temporal Join Example

In this example, the status of the customers can change over time, and this information is captured in a dedicated table (Customer Status). If you need to analyze the sales orders and include the status of each customer when they issued the order, you create an Inner Join on the ContactID column and add a temporal condition as follows:

  • Temporal column: Date (Sales Orders)

  • From Column: DateFrom (Customer Status)

  • To Column: DateTo (Customer Status)

  • Temporal Condition: Include Both

Note
  • Temporal conditions can be defined on columns of the following data types:

    • timestamp

    • date

    • integer

  • Only columns already mapped to the output of the Star Join node can be defined as Temporal Columns in the temporal properties of the join.

  • Temporal Joins are only supported in the Star Join of calculation views of the type cube with star join. The join type must be defined as Inner.

Join Cardinality

The cardinality of a join defines how the data from two tables joined together are related, in terms of matching rows.

For example, if you join the Sales Order table (left table) with the Customer table (right table), you can define an n:1 cardinality. This cardinality means that several sales orders can be related to the same customer, but the opposite is not possible (you cannot have a sales orders that relates to several customers).

Caution

We recommend that you specify the cardinality only when you are sure of the content of the tables. If not, just leave the cardinality blank.

Validating a Join

A feature in the SAP Business Application Studio suggests the recommended cardinality, based on an analysis of the tables that are joined together.

Caution

This analysis of joined tables is performed at the moment you define the join. If the content of the table evolves after that, the cardinality you have defined might become incorrect.

For example, you are validating the join between the Sales Order and Customer tables, but your data contains only one sales order per customer. In this case, the join validation might suggest a 1..1 cardinality, which does not correspond to the expected scenario in real life.

Joining Multiple Data Source in a Join Node

A join requires, at minimum, two data sources. But it is possible to include more than two data sources in a join node.

Watch this video to learn about the Multi-Join.

In the figure, Multi-Join Scenarios, the multi-join order property only applies in scenarios 1 and 2, and affects joins J1 and J2. The precedence between joins J1 and J3 (in scenario 2) or J1 and J2 (in scenario 3) is not controlled by the multi-join order setting.

When does Multi-Join Priority Affect the Join Node Results?

With more than two data sources feeding a Join node, the result sometimes depend on the join execution order, but this is not always the case. For example:

  • When all joins are Inner Joins, the result set is generally the same regardless of the join execution order.

  • With a mix of Inner and Left Outer Joins, the result set can vary based on the join execution order.

So, it is up to the Modeler to decide between joining more than two tables in a single node, or sticking to joining "only" two tables in a given join node, based on the potential differences in behavior, and which approach provides a better legibility of the Calculation View design in the calculation scenario and/or mapping tabs.

Non-Equi Join

SAP HANA cloud provides a type of join, called Non-Equi Join, where the join condition is not represented by an = (equal) operator, for example, the value of column CUSTOMER_ID in the table ORDERSequals the value of column ID in table CUSTOMERS, but instead, is based on other comparison operators such as Greater than.

This new type of join allows more flexible conditions to address specific scenarios.

In the SAP Business Application Studio for SAP HANA, this comes as a new type of node in which you define which operator must be applied when evaluating the join, that is, when comparing the rows based on the values of their joined columns.

Defining a Non-Equi Join condition is possible for the following types of joins:

  • Inner

  • Left Outer

  • Right Outer

  • Full Outer

Hint
In a Non-Equi Join, the operator can be specified differently for each pair of joined columns, as illustrated in Example 1 below.

In Example 1, two tables contain a list of products to be delivered and a list of sub-tasks which provides the availability date of components.

The objective is to display, in a calculation view, which sub-tasks will not finish early enough to allow the company to meet the planned completion date for some products.

This requirement can be achieved with a Non-Equi Join with the following join conditions:

  • ProductsToBeDelivered.id Equal SUBTASKS.id

  • ProductsToBeDelivered.dueOn Less Than SUBTASKS.plannedDate

Dynamic Join

Enhancing Model Flexibility with Dynamic Join

In some scenarios, you want to allow data analysis at different levels of granularity with the same calculation view.

This is generally possible in an Aggregation node when measures support the aggregation at different levels, that is, when it is possible to report measures grouped by one set of columns or another. For example, calculating the total sales by country and product in one case, and by region and product in another case.

The figure, Dynamic Join, shows an example of a more complex scenario, where you want to present two different measures side by side, either by Country or Region:

  • The sales by product

  • The total sales (all products)

In this case, assuming that you model your calculation view with a Regular Join on Country and Region, you will get correct results if you analyze the data by country, but the results will be inconsistent if you analyze the data by region.

Note
On further analysis of the example, you can see that the details of total sales by region and product are inconsistent for products that are not sold in all the countries of the region (HT-1001)... or (to be specific) in all the countries of the region that report sales.

Benefits of a Dynamic Join

With a Dynamic Join, only the join columns requested in the query are brought into context and play a part in the join execution. As a consequence, the same calculation view can be used for both purposes, that is, to analyze data by country or by region.

Note
A Dynamic Join can be defined only with multi-column joins.

If we consider the behavior of the join from an aggregation perspective:

  • In a Regular (static) Join, the aggregation is executed after the join.

  • In a Dynamic Join, when a joined column is not requested by the client query, an aggregation is triggered to remove this column, and then the join is executed based only on the requested columns.

Caution
With a Dynamic Join, if none of the joined columns are requested by the client query, you get a query runtime error.

Save progress to your learning plan by logging in or creating an account