Implementing Union Pruning

Objective

After completing this lesson, you will be able to Implement union pruning to improve performance of calculation views.

Union Pruning

What is Union Pruning?

There are opportunities to significantly improve the performance of unions by implementing union pruning. There are three approaches to implementing pruning rules in unions. Two of these approaches are based on column values and one is based on column names. We will first cover union pruning using values.

Constant Pruning

For each data source in a union node, you can define an additional column and fill it with a constant value to explicitly tag the meaning of the data source. For example, you could assign the constant valueplan to the data source that represents plan data and assign the constant value actual to the data source that represents actual data. If a query filter - where clause - does not match the constant value for a specific source of a union, then the source is ignored. This is called explicit pruning. Explicit pruning helps performance by avoiding access to data sources that are not needed by the query.

Series of screen captures showing an example of explicit Union pruning. Use the Manage Mapping dialog to populate a new field temperature with constant value old for a source with older data, and value new for a source with current data. When a query contains a clause ...WHERE temperature = 'old', the source for current data is pruned.

In the example in the figure, Union Pruning, the union node has a new column defined in the union node. To assign the constant value, we use the Manage Mapping pane and enter a constant value for the new column per data source. In our case, the new column is named temperature. In our case, either the value old or new is assigned. This is a simple way to prune data sources in unions where the rules are simple and rarely change. Although a union node usually has just one constant column, you can create and map multiple constant columns and use AND/OR filter conditions in your query. However, a key point is that if you do not explicitly refer to the constant column in the sending query, then the data source is always included in the union and is not pruned.

Explicit pruning is a very simple and direct way to implement union pruning but it requires maintenance of the constant value in each calculation view union node. Maintaining pruning rules inside each calculation views could result in high maintenance.

Implicit Pruning

Implicit pruning is implemented by defining one or more pruning rules in a dedicated table called the Pruning Configuration Table. The pruning configuration table sits outside the calculation view and allows you to describe in greater detail the rules that determine when each source of a union is valid.

This implicit pruning approach does not need to provide constant values in the union node for each data source inside the calculation view. With a union pruning configuration table, you can define multiple single values per column that are treated with ‘OR’ logic. You can also use the greater than and less than operators. You can as well define values across different columns in a rule. Values across columns are treated with ‘AND’ logic.

This gives much more flexibility over explicit pruning where you can only enter a single value per data source. It also means that you can easily update the pruning rules as these are simply records in a table that is shared by all calculation views.

Series of screen captures showing an example of implicit Union pruning. Use a configuration table to list conditions for each source: In the example, field ChangedAt fcontains values lower 2012-10-19 in the source for older data, and values starting with 2012-10-19 in the source for current data.

The first time you create a pruning configuration table, you will need to provide a table name. You then provide the pruning rules using various operators:

  • The possible operators are =, <, >, <=, >= and BETWEEN.

  • The BETWEEN operator refers to a closed interval (including the LOW and HIGH values).

  • If several pruning conditions exist for the same column of a view, they are combined with OR.

  • On different columns, the resulting conditions are combined with AND.

The conditions in a Pruning Configuration Table can be defined on columns of the following data types:

  • INT
  • BIGINT
  • VARCHAR
  • NVARCHAR
  • Date

For example, you could define that a source to a union should only be accessed if the query is requesting the column YEAR = 2008 OR YEAR = 2010 OR YEAR between 2012 – 2014 AND column STATUS = ‘A’ OR STATUS = ‘X’.

This means that if the requested data is 2008 and status ‘C’, then the source is ignored. If the request is for 2013 and status ‘A’, then the source is valid.

We can use input parameters to direct the query to use only the union sources that contain the data required.

Whichever technique you choose, you are providing the optimizer with important information that it uses when making runtime decisions regarding the union sources. This ensures the best possible performance by avoiding processing sources that are not needed.

Implement Value-based Union Pruning

Optimize Union Pruning

Watch this video to learn about optimizing Union Pruning.

Column-based Union Pruning

Column-based Pruning

Column-based pruning is another approach to eliminating data sources within unions that are not required by queries. However, unlike the other approaches, which are based on checking column values, column-based pruning looks at whether specific columns have been requested by a query and if those columns are defined as important to a result set.

Column-based pruning explained by screen captures. If a column is not mapped from one source, this source can be pruned for this column.

There are two parts to the definition of the rules.

  1. Map-required columns to the union output
  2. Define Focus column(s)

In our graphic, we see that both data sources could provide the SALES column, but only one data source salesOrder2019 provides the column to the output.

We also see that the SALES column has been defined as a Focus Column, which means that if a query requests this column and the data source cannot provide it then we are not interested in that data source, even if other columns (YEAR and so on) could be provided.

It is possible to define multiple Focus Columns and they can be attributes or measures or a mix of both.

This approach provides a way to define flexible pruning conditions that reacts to the calling query based on which columns are requested.

Note

It is possible to mix all three union pruning approaches, if it makes sense.

Implement Column-based Union Pruning

Column-Based Pruning

Watch this video to learn about how to optimize Union Pruning based on the columns that are included in the query executed on top of the Calculation View.

Log in to track your progress & complete quizzes