Implementing Rank Nodes

The purpose of the Rank node is to enable the selection, within a data set, of the top or bottom 1, 2, ... n values for a defined measure, and to output these measures together with the corresponding attributes and, if needed, other measures.

For example, with a Rank node, you can easily build a subset of data that gives you the five products that generated the biggest revenue (considering the measure GROSS_AMOUNT) in each country. The country, in this example, defines a Logical Partition of the source data set.

Note

The Rank node itself does not perform any type of aggregation on the source data set (this important topic is discussed later on in this lesson).

Main Settings of a Rank Node

The main settings of a Rank node are as follows:

Main Settings of a Rank Node

Setting	Purpose
Aggregation Function	Define the key computation executed on the data set (see below)
Result Set Direction	Decide whether to extract the Top or the Bottom (Down) rows from the ordered data set
Result Set Type	Absolute or Percentage
Target Value and Offset	Define the number of rows to return
Generate Rank Column	Indicate if you want to output a rank column and specify its name
Logical Partition	Partition the source data set by one or several columns, before executing the rank computation
Dynamic Partition Elements	Define whether the partition can be adjusted automatically based on the columns that are selected by an upper node or an upper view/query that you execute on top of the current one
Sort Column	The columns that are used to order the data set to execute the ranking

Partitioning the Source Data Set

The source data set can be partitioned by one or several columns. This means that the extraction rule you define - for example, return the top five total sales amount - will be executed in each partition, for example, for each Country and each Year.

If you choose the Dynamic Partition Element, the columns listed in the Partition will be ignored if they are not requested by an upper node or top query. To follow the same example, you could return the top five total sales for each Country only, for each year only, just by excluding the column you do not need from your top query, and without redesigning your calculation view.

Choosing an Aggregation Functions

After partitioning the source data set by one or several columns and ordering it, an Aggregation Function is applied to the data set.

The Rank node offers four Aggregation Functions which can be classified into two categories:

Three functions computing the row numbers. These are as follows:
- Row
- Rank
- Dense Rank
One function computing the values of the sorted column, for example, a Sales amount.
This is Sum.

Table with sample data and different possible outputs depending on the applied aggregation function.

The figure, Aggregation Function, shows the behavior of the different aggregation functions on the same data set. Here is the description of each of them:

Aggregation Functions

Aggregation Function	Description
Row	Standard row numbering (ties have different ranks)
Rank	Olympic ranking (ties have the same rank)
Dense Rank	Enumerates every rank value (no gap in numbering sequence)
Sum	Adds up values of the Order By column

Row, Rank, and Dense Rank only differ in the way they deal with tie values (identical values in the sorted column). The Sum aggregation function generates a cumulative sum of the sorted column up to the current row.

Using Multiple Sort Columns

In scenarios where more than one sort column are defined, they are treated in sequence of appearance for the aggregation functions Row, Rank, and Dense Rank. This can be useful to better handle identical values in the first sort column.

The Sum aggregation function only uses the first Sort Column.

Generating the Result Set

The final stage is to extract the result set. This is done based on the Target Value setting.

Tables of sample data to illustrate the target value. Target value 2 always returns 2 rows if the aggregation function is Row, but may return more than 2 rows if the aggregation function is Rank. For the aggregation function Sum, the target value acts as a limit for the sum value.

The setting must be a rank value - for example, extract the data up to the rank 2), except for the Sum aggregation function where you set a cumulative value (for example, extract the data up to a cumulative Sales amount of 200).

The target value can be fixed, that is, defined in the Rank Node definition, or it can be set at runtime by means of an input parameter.

Note

The rank node returns the rows for which the computed ranking is lower or equal to the target value.

SAP HANA Cloud QRC 2/2024 introduces an additional capability: you can now set the Target Value setting to All Values. The rank node then sorts the value without eliminating any of them from the result set.

Working with Percentage Instead of Absolute Values

Instead of Absolute, which has been used up to now in our examples, you can set the Result Set Type to Percentage.

The figure, Percentage Result Type, shows the difference between the two methods.

Tables with sample data for Result Set Type Percentage for all possible aggregation types. Percentage value is calculated as absolute value of the current row divided by the maximum absolute value. The target value should be lower than 1, for example 0.5. The aggregation function may affect the number of rows returned.

With the Percentage Result Set Type, you can address requirements such as the following:

Return the 50% best-selling products for each customer (for example, with a Row aggregation function)
Return the best-selling products representing 30% of the total sales for each customer
(with a Sum aggregation function).

The figure, Percentage Result Type, shows in a frame the rows that would be included in the extracted data set.

Defining an Offset on the Result Set

It is possible to exclude a number of elements from the top of the sorted data set by defining an offset.

Tables with sample data for the offset property for various aggregation types. Target value 2 and offset 2 means that up to 2 further rows with rank higher than 2 are returned.

In this example, the rows ranked up to (and including) 2 are excluded. Then the rank node returns the following rows based on the target value you have set.

Another example is when you want to assign members of a statistic collection to their inter-quartile interval. You could define a rank node as follows:

Aggregation Function: Row
Result Set Type: Percentage
Target Value: 0.5
Offset: 0.25

Is an Aggregation Needed Before Ranking?

As already discussed, the Rank node can partition the source data, but it does NOT perform any aggregation on the source data.

In other words, the way the source data is structured has a major impact on the way the Rank node will compute this data.

For example, if your source data contains a Sales Order ID column, you might not process directly a ranking of best-selling products in each country, because there might be several rows for the same Country and Product (but different Sales Orders).

The effect of aggregation before ranking. This example shows a source table with country (used for logical partitioning), product, order id, and sales amount. Option 1: Do not aggregate, returns the products and amounts of the best individual orders per country. Option 2, Aggregate data by Country and Product before ranking, returns the products per country that sell best overall.

When the source data for the Rank node is a table, you must make sure that the data structure suits your modeling needs. If not, you might need to first aggregate data by adding an Aggregation node used as a data source by the Rank node.

When the source data for the Rank node is a CUBE calculation view, the aggregation defined in that calculation view is implicitly triggered based on the columns requested by the rank node, but a column that is mapped to the output of the Rank node but not consumed by the upper node will not be requested to the data source.

Caution

As is the case with other types of nodes, some columns are passed to the upper nodes even when they are not requested. For example, this is the case when the source calculation view has a sort order defined in the Semantics node (all the columns used for sorting are passed to consuming views). This is also the case when special settings such as Keep Flag are used.

Finally, keep in mind that further processing after a rank node (in the calculation view containing the rank node, or one that consumes it) might change the order of rows in your result set. To avoid this, in some scenarios, you might need to do one of the following:

Use an ORDER BY clause on the rank column in the query to keep the result set ordered
Use the Sort Result Set option in the semantics of the calculation view.

Assigning a Type to the Rank Column

Depending on how you want to use the ranking information, you can decide to assign to the rank column the type Attribute or Measure in the Semantics node.

Rank Column: Measure or Attribute?

Attribute
Assigning the type, Attribute, is a simple approach.
It is probably a bit less error-prone because you are not tempted to perform an irrelevant aggregation of ranking positions.
Measure
With the type, Measure, the rank column can provide some flexibility.
For example, if you set the default aggregate function to MAX, you can retrieve summarized data, such as the total sales generated by the five biggest orders in each country, while keeping the information about how many orders are actually totaled in each country. Indeed, there might be countries that have received less than five orders over the considered period, and this information could be of interest when analyzing the data.

Because the information in the rank column can differ a lot depending on the aggregation function and result set type you used, you might want to give it an explicit name if you find that Rank_Column (default column name) is too generic.

Implementing Rank Nodes

Rank Node

Main Settings of a Rank Node

Main Settings of a Rank Node

Partitioning the Source Data Set

Choosing an Aggregation Functions

Aggregation Functions

Using Multiple Sort Columns

Generating the Result Set

Working with Percentage Instead of Absolute Values

Defining an Offset on the Result Set

Is an Aggregation Needed Before Ranking?

Assigning a Type to the Rank Column

Rank Column: Measure or Attribute?

Use Rank Nodes