Working with the Union Node

Objective

After completing this lesson, you will be able to Combine data from different sources using the union node.

Union Node

A Union node is used to combine two or more data sets to a common set of columns.

Screen capture of the definition of a union node that combines actual and plan data. If you want to combine multiple result sets with identical structures into a single result set, you can use a union node. A mapping of the sources to the target is required and allows you to adapt structural differences. This can be done via a drag-and-drop interface.

Depending on how different the column names are in the data source, you can map the columns automatically by name (columns with identical names will be mapped automatically) or define the mapping manually.

A union node can be implemented in a dimension, cube and cube with star join calculation views.

Standard Union

Depending on the requirement, you can use one of the following approaches:

  • A standard union

  • A union with constant values

Schematic example of a standard union. Two views A and B have the same 3 fields, Customer, Amount and Flag. Flag is always A (actual) for view A, and P (plan) for view B. The result after aggregation lists the summed amounts per customer and flag.

A standard union is where, for each target column in the union, there is always one source column that is mapped. In the example, you see how both sources provide a mapped column to the union. The source column does not have to have the same name as the target column. For example, a source column month could be mapped to a target column period. Additionally, a standard union does not have to provide a mapping for each source column. In other words, there could be source columns that are left behind and play no part in the union. This scenario is useful when you want to combine measures from multiple sources into one column. As the data sources can provide an attribute that describes the type of measure (for example, Plan or Actual), it means that we do not lose the meaning of each row.

Union with Constant Values

Schematic example of a union with constant values. View A has fields Customer and Amount_P (amount, plan data). View B has the fields Customer and Amount_A (amount, actual data). The union has fields Customer, Amount_A and Amount_P. Fields that are not mapped are filled with constant values 0. The result after aggregation lists the summed Amount_A and Amount_P per customer.

A union with constant values is where you provide a fixed value that is used to fill a target column where the data source cannot provide a mapped column. For example, you have a data source that provides the actual amount and you also have a second data source that provides the planned amount. You decide to avoid combining these measures into one column as they would lose meaning, because there is no attribute to describe the meaning. In this case, you would map the measure to a separate target column and provide a constant value ‘0’ to the target column for the missing measure on each side.

The choice between a standard union and a union with constant values depends on the data you are using and the way you want the end users to report on data.

If it is more beneficial to present different measures in different columns, you can use a union with constant values so that you have a way to provide a value (probably zero) to a column where there is no source mapping. On the contrary, if it is easier to present measures in a single column and differentiate them with an attribute, such as an ‘amount type’, use the standard union.

Mapping Columns Based on their Names

It is possible to automatically map columns in a Union node based on their names. This helps to save time when the data structures have some similarities in terms of column names.

The Auto Map by Name feature can be used in two different ways, depending on whether you make a selection in the Data Source area before selecting the icon.

Two Options to Map Columns by Name Automatically

ScenarioResult
You have not selected any data source (default behavior)

All the columns of all data sources are added to the output.

Columns with matching names are mapped together.

You have selected one or several data source(s)

Only the columns of the selected data source(s) are added to the output.

The columns from the other data source(s) are added to the output only if they have a matching column in the selected data source(s). They are mapped to their matching columns.

Note

Selecting a data source before triggering Auto Map by Name is useful when one or several other data sources have a lot of columns that you do not want to include in the output.

Unmapped Columns in a Union Node

There could be instances when a union needs to be performed between data sources that do not provide the same data structure.

  • If column names are different but contain the same type of information (for example, CURRENCY versus CURRENCY_CODE, you must define the mapping manually. This can be done with drag and drop.

  • If a column exists in only one data source but you need it in the target, you map it to the target. Then, for the other source, you must decide whether you allow the data in this column to be null or you can define a constant value that will be assigned to all the rows from this data source.

As an example, you have a data source that provides the customer status attribute and you have a second data source where the customer status attribute cannot be provided; there is no column for this. You know this second data source contains only customers with the status Active. Therefore, you simply fill the target column with a constant value for this data source to Active for every row.

Even if you have a source column available for mapping, you can choose to use a constant value instead. This creates the effect of overriding the source value.

Manage Mapping of Unmapped Columns in a Union Node

Although we described how constants are often used when one of the data sources cannot, or should not, provide a value to the union target column, we can also use constants when none of the data sources can provide a value. To do this, we first create an empty column in the union target. We then define a constant value for each data source. For example, I would like to create a union between table A, that contains part-time employees’ data, and another table B, that contains full-time employees’ data. The employment status is very important for my analysis but neither table contains a column that indicates employment status. So, I simply add a new column to the union target called ‘Employment Status’ and define a constant value for one data source as Full Time and the other data source as Part Time. Basically, I have manually tagged each source with a fixed label that now appears in each row and I can use this new column for filtering, aggregation, and so on.

To set the constant value, right-click the target column and choose Manage Mappings.

Screen capture showing how to define a constant value. The example shows the context menu of the output column CLIENT of a union node mapping. Choose Manage Mappings. Then, on the Manage Mapping window, enter a source column or a constant value, for example the CLIENT source column if the source model is ACTUAL and constant value 800 if the source model is PLAN.

Empty Union Behavior

We know that the data sources to a union can provide different columns. So, what happens when a query requests columns that are present in one source but not another source? Do you want to be made aware that one source was not able to provide a row? Or is this not important?

Screen capture of the property called Empty Union Behavior on the Mapping tab of the union node. For details, refer to the following text.

To illustrate this feature, imagine you have a data source, A, that contains the columns Product and Product Group and another data source, B, that contains only the column Product. If a query requests only the column Product Group, how does data source B respond when it does not have this column?

The answer depends on the setting in the property, Empty Union Behavior, which is set for each data source. The default behavior, as you might expect, is to provide No Row for data source B.

However, there are times when you might prefer to know that no rows could be returned from data source B. In that case, you should proceed as follows:

  1. Add a constant column to the union output.

  2. Provide a constant value for each data source with a suitable value that help you identify each source.

  3. Change the property Empty Union Behavior to Row with Constants.

To test this, you simply consume the calculation view that contains the union, with a query that does not request any column from one of the data sources. In our case, stated previously, the query should request only the column Product Group. The results will then contain multiple rows from data source A, and also a single row with the constant value that you defined for the data source B and nulls will fill the empty columns that it could not provide.

Combine Two Data Sources with a Union Node

Union All or Union Distinct

A Union node generates a list of all values from the input data sources, even if values are repeated. For example, if two data sources both include the same customer, then the customer will appear twice in the output data set.

This is the equivalent of UNION ALL in SQL and is desirable in many cases.

Schematic example of a Calculation view. A lower level UNION node is displayed. It generates a UNION ALL result set with repeated entries. To obtain the result set of a UNION without duplicates, include an AGGREGATION node on top. Only distinct values remain.

If repeating values are not required, you should include an Aggregation node on top of the Union node to aggregate the attributes. We normally associate aggregation behavior with measures, such as SUM, AVE, and so on, but aggregation can also be performed on attributes. The effect of aggregating attributes is to simply produce a distinct list of values.

This is the equivalent of UNION DISTINCT (or just UNION) in SQL.

Log in to track your progress & complete quizzes