What is Union Pruning?
There are opportunities to significantly improve the performance of unions by implementing union pruning. There are three approaches to implementing pruning rules in unions. Two of these approaches are based on column values and one is based on column names. We will first cover union pruning using values.
For each data source in a union node, you can define an additional column and fill it with a constant value to explicitly tag the meaning of the data source. For example, you could assign the constant valueplan to the data source that represents plan data and assign the constant value actual to the data source that represents actual data. If a query filter - where clause - does not match the constant value for a specific source of a union, then the source is ignored. This is called explicit pruning. Explicit pruning helps performance by avoiding access to data sources that are not needed by the query.
In the example in the figure, Union Pruning, the union node has a new column defined in the union node. To assign the constant value, we use the Manage Mapping pane and enter a constant value for the new column per data source. In our case, the new column is named temperature. In our case, either the value old or new is assigned. This is a simple way to prune data sources in unions where the rules are simple and rarely change. Although a union node usually has just one constant column, you can create and map multiple constant columns and use AND/OR filter conditions in your query. However, a key point is that if you do not explicitly refer to the constant column in the sending query, then the data source is always included in the union and is not pruned.
Explicit pruning is a very simple and direct way to implement union pruning but it requires maintenance of the constant value in each calculation view union node. Maintaining pruning rules inside each calculation views could result in high maintenance.
Implicit pruning is implemented by defining one or more pruning rules in a dedicated table called the Pruning Configuration Table. The pruning configuration table sits outside the calculation view and allows you to describe in greater detail the rules that determine when each source of a union is valid.
This implicit pruning approach does not need to provide constant values in the union node for each data source inside the calculation view. With a union pruning configuration table, you can define multiple single values per column that are treated with ‘OR’ logic. You can also use the greater than and less than operators. You can as well define values across different columns in a rule. Values across columns are treated with ‘AND’ logic.
This gives much more flexibility over explicit pruning where you can only enter a single value per data source. It also means that you can easily update the pruning rules as these are simply records in a table that is shared by all calculation views.
The first time you create a pruning configuration table, you will need to provide a table name. You then provide the pruning rules using various operators:
The possible operators are =, <, >, <=, >= and BETWEEN.
The BETWEEN operator refers to a closed interval (including the LOW and HIGH values).
If several pruning conditions exist for the same column of a view, they are combined with OR.
On different columns, the resulting conditions are combined with AND.
The conditions in a Pruning Configuration Table can be defined on columns of the following data types:
For example, you could define that a source to a union should only be accessed if the query is requesting the column YEAR = 2008 OR YEAR = 2010 OR YEAR between 2012 – 2014 AND column STATUS = ‘A’ OR STATUS = ‘X’.
This means that if the requested data is 2008 and status ‘C’, then the source is ignored. If the request is for 2013 and status ‘A’, then the source is valid.
We can use input parameters to direct the query to use only the union sources that contain the data required.
Whichever technique you choose, you are providing the optimizer with important information that it uses when making runtime decisions regarding the union sources. This ensures the best possible performance by avoiding processing sources that are not needed.