Working with Aggregation Nodes

Objective

After completing this lesson, you will be able to aggregate measures using the aggregation node

Aggregation Node

The purpose of an Aggregation node is to apply aggregate functions to measures based on one or several attributes. The node generates a result set that is grouped by the selected attribute columns and computes the selected measure with the specified function, such as SUM, MIN, AVG, and so on.

Screen capture of an Aggregation node of a calculation view. In this example, details on the Columns tab define aggregation function SUM for the measure Amount_Total and MAX for the measure Largest_Amount.

By default, the granularity of the aggregation is defined by the attribute columns that are mapped to the output of the aggregation node. In the example, you will get one aggregate row for each country/order pair, showing two measures: the total amount of each order, and the biggest line item of each order.

Note

If you want to get the total amount and biggest order amount at the country level, removing the Order column from the output is not enough because you will still get the biggest line item detail in each country. In that case, you should first aggregate both measures with SUM in one aggregation node (without the Order column), and add another aggregation node on top defined like in the example (but without the Order column).

If you exclude an attribute from an upper node (in the same calculation view or another one that consumes it), or in a query executed on top of this view, by default these attributes are ignored. In addition, the calculation view aggregates the result set on the remaining attributes.

A graphical calculation view in SAP HANA Cloud supports the following aggregate functions:

  • Sum (the default function)

  • Min

  • Max

  • Count

  • Average

  • Variance

  • Standard deviation

  • Median

Caution

These aggregate functions should be used very carefully in stacked scenarios to avoid incorrect results.

In an Aggregation node, a calculated column is always computed AFTER the aggregate functions. If you need to calculate a column BEFORE aggregating the data from this column, you have to define the calculated column in another node, for example a Projection node, executed BEFORE the aggregation node in the calculation tree.

Defining an Aggregation in the Client Query

When you execute an SQL query on top of a CUBE Calculation View (with or without Star Join), an attribute that is not selected is ignored, as discussed earlier. But the aggregation behavior depends on whether your SQL query includes a GROUP BY clause or not.

  • If your SQL query does not include a GROUP BY clause, each requested measure is aggregated as specified in the Semantics of the Calculation View.
  • If your SQL query includes a GROUP BY clause, the aggregate function you specify for each requested measure overwrites the one defined in the Semantics of the Calculation View. This provides a lot of flexibility, but can be error-prone in some scenarios. It is recommended to use this with care.

Note

The default data preview SQL statement for a CUBE calculation view (with or without star join) in SAP Business Application Studio and Database Explorer uses the second approach, thus showing you which aggregate function is applied to each measure as per the Semantics. Be careful with the data consistency if you change any aggregate function.

Defining Aggregation Attributes

From QRC 1/2023 onwards, you can provide in the Semantics node of a calculation view of type CUBE (with or without star join) information about which aggregation level is relevant and which is not.

Screen capture of an Aggregation node of a calculation view. In this example, details on the Columns tab define aggregation attribute CURRENCY_CODE for the measure GROSS_AMOUNT_SUM. For the consequence, see the following text

In this example, the GROSS_AMOUNT should not be aggregated without keeping the column CURRENCY_CODE in the result set, at least if the column is not converted. Indeed, this might result in adding amounts expressed in more than one currency, for example USD and EUR, making the output of the calculation view inconsistent.

For each measure of the calculation view, you can define one or several attributes that should be kept when aggregating data.

The information entered in the Semantics is meant to be used by reporting tools through the SAP HANA Analytics Catalog, which is made up of BIMC Views (schema _SYS_BI). The main table holding the aggregation attribute(s) information is BIMC_EXCEPTION_AGGREGATION_ATTRIBUTES.

Caution

The setting defined in the Semantics has no influence on the way aggregations are executed by the calculation engine. Besides, it does not prevent the execution of a query that might return inconsistent data because an aggregation attribute needed for a given column has not been included.

Controlling the Behavior of the Aggregation Node

When you work with Aggregation nodes, the list of columns requested by the client query can influence the way the aggregation is executed, especially in complex scenarios.

The following features can help you control the aggregation of measures, in order to build more flexible models:

  • Keep Flag

  • Transparent Filter

Keep Flag

Let’s consider a scenario where a data source contains the details of sales orders by order ID. The only measures available are quantity and price.

Note

Each order ID relates to one store, one customer, and only one product (in order to simplify the example).

You want to calculate the quantity and total sales for the product Mouse and the month of February.

Screen capture of a data preview showing the effect of the Keep Flag. In this example, the raw data contains four sales orders for the month of February and the product Mouse, each with a quantity of 2 and a price of 5. Without the Keep Flag property, the quantities would first sum to 8 and the prices would sum to 20, and then the calculation would result in an incorrect sales value of $160. But, with the Keep Flag property on the ORDER_ID field, the preview correctly displays $40 because the sales value is calculated per ORDER_ID.

Let's mandate a level of aggregation (Month, Product) that is generic. Hence, the total sales is calculated by multiplying the sum of quantities by the sum of unit prices.

Setting the Keep Flag property to true for the Order ID column forces the calculation to be triggered at the relevant level of granularity, even if this level is not the grouping level defined by the client query.

The Keep Flag option can also be defined on the shared columns in the Star Join node of a CUBE with Star Join calculation view (shared columns are the ones defined in the underlying DIMENSION calculation views).

Transparent Filter

In some scenarios, using a filter (where clause) in a client query forces a column to be used in the Group By columns set.

Screen capture of a data preview showing the effect of the Transparent Filter property. In this example, the raw data contains one record for customer Susan in store TigerDirect and two records for customer John in stores TigerDirect and Amazon. Without the Transparent Filter property on the Customer column, the store count would be 1 + 2 = 3, incorrectly missing the duplication of the TigerDirect store. But, with the Transparent Filter property, the StoreCount is correctly 2. For details, see the following text.

In this scenario, for example, calculating the number of stores that sold mice to John or Susan triggers an intermediate calculation of the StoreCount sum by product and by customer, instead of calculating the StoreCount by product.

Setting the Transparent Filter property to true for all models and nodes that reference the Customer column will stop the Customer column from being unnecessarily used in the Group By clause.

This property is required in the following situations:

  • When using stacked views where the lower views have distinct count measures

  • When queries executed on the upper calculation view contain filters on columns that are not projected

Note

The Transparent Filter setting, when defined on an output column of a node (that is, in the right pane of the Mapping tab), can only influence the calculation made in the upper node. In particular, if a filter defined in a lower node of a calculation view should be transparent for another view that consumes it, you must select the Transparent Filter property of all nodes from that node up to the top calculation view node.

Log in to track your progress & complete quizzes