What is Union Pruning?
There are opportunities to significantly improve the performance of unions by implementing union pruning, There are three approaches to implementing pruning rules in unions. Two of these approaches are based on column values and one is based on column names. We will first cover union pruning using values.
For each data source in a union node, you can define an additional column and fill it with a constant value to explicitly tag the meaning of the data source. For example, you could assign the constant valueplan to the data source that represents plan data and assign the constant value actual to the data source that represents actual data. If a query filter - where clause - does not match the constant value for a specific source of a union, then the source is ignored. This is called explicit pruning. Explicit pruning helps performance by avoiding access to data sources that are not needed by the query.
In the example in the figure, Union Pruning, the union node has a new column defined in the union node. To assign the constant value we use the Manage Mapping pane, and enter a constant value for the new column per data source. In our case the new column is named temperature. In our case either the value old or new is assigned. This is a simple way to prune data sources in unions where the rules are simple and rarely change. Although a union node usually has just one constant column, you can create and map multiple constant columns and use AND/OR filter conditions in your query. But a key point is that if you don't explicitly refer to the constant column in the sending query, then the data source is always included in the union and is not pruned.
Explicit pruning is a very simple and direct way to implement union pruning but it requires maintenance of the constant value in each calculation view union node. Maintaining pruning rules inside each calculation views could result in high maintenance.
Implicit pruning is implemented by defining one or more pruning rules in a dedicated table called the Pruning Configuration Table. The pruning configuration table sits outside the calculation view and allows you to describe in greater detail the rules that determine when each source of a union is valid. This implicit pruning approach does not need to provide constant values in the union node for each data source inside the calculation view. With a union pruning configuration table, you can define multiple single values per column that are treated with ‘OR’ logic. You can also use ranges and also greater than or less than. You can also define values across different columns in a rule. Values across columns are treated with ‘AND’ logic. This gives much more flexibility over explicit pruning where you can only enter a single value per data source. It also means you can easily update the pruning rules as these are simply records in a table that is shared by all calculation views.
The first time you create a pruning configuration table, you will need to provide a table name. You then provide the pruning rules using various operators:
The possible operators are =, <, >, <=, >= and BETWEEN.
The BETWEENoperator refers to a closed interval (including the LOW and HIGH values).
If several pruning conditions exist for the same column of a view, they are combined with OR.
On different columns, the resulting conditions are combined with AND.
The conditions in a Pruning Configuration Table can be defined on columns of the following data types:
For example, you could define that a source to a union should only be accessed if the query is requesting the column YEAR = 2008 OR YEAR = 2010 OR YEAR between 2012 – 2014 AND column STATUS = ‘A’ OR STATUS = ‘X’.
This means that if the requested data is 2008 and status ‘C’, then the source is ignored. If the request is for 2013 and status ‘A’, then the source is valid.
We can use input parameters to direct the query to use only the union sources that contain the data required.
Whichever technique you choose, you are providing the optimizer with important information that it uses when making run time decisions regarding the union sources. This ensures the best possible performance by avoiding processing sources that are not needed.
Column-based pruning is another approach to eliminating data sources within unions that are not required by queries. But unlike the other approaches, which are based on checking column values, column-based pruning looks at whether specific columns have been requested by a query and if those columns are defined as important to a result set.
There are two parts to the definition of the rules.
- Map required columns to the union output
- Define Focus column(s).
In our graphic we see that both data sources could provide the column SALES but only one data source salesOrder2019 provides the column to the output.
We also see that the column SALES has been defined as a Focus Column which means that if a query requests this column and the data source cannot provide it then we are not interested in that data source, even if other columns (YEAR etc.) could be provided.
It is possible to define multiple Focus Columns and they can be attributes or measures or a mix of both.
This approach provides a way to define flexible pruning conditions that reacts to the calling query based on which columns are requested.