Implement Good Modeling Practices

Objectives
After completing this lesson, you will be able to:

After completing this lesson, you will be able to:

  • Implement good modeling practices

Understand the Instantiation Process

Before we dive into the detail of how to improve calculation view performance, it is helpful to learn a little about the way a calculation view responds to a query that calls it.

The calculation engine pre-optimizes queries before they are worked on by the SQL optimizer.

A calculation view is instantiated at run-time when a query is executed. During the instantiation process, the calculation engine simplifies the calculation view into a model that fulfills the requirements of the query. This results in a reduced model that can be further optimized by the calculation engine. For example, it considers settings such as dynamic join, join cardinality and union node pruning

After this, the SQL optimizer applies further optimizations and determines the query execution plan, for example, it determines the optimal sequence of operations, and also those steps that could be run in parallel.

The instantiation process transforms an original (one that you define) calculation view into an execution model based on a query that is run on top of a calculation view. The generated view is pruned of unnecessary elements from the original calculation view and is technically a column view that references one specific node of the original calculation view.

During the instantiation process, the query and the original calculation model are combined to build the optimized, execution calculation model.

Best Practices for Modeling

In this lesson, we will cover recommendations and best practices for data modeling with SAP HANA Cloud.

General Design Recommendations

Break down large models into individual calculation views so that they are easier to understand and also allow you to define reusable components, thus avoiding redundancy and duplicated maintenance as a consequence.

Try to develop calculation views in layers so that each layer consumes the lower layers. Any changes to the lowers layers immediately impacts the higher layers. The lowest layers should provide the foundation calculation views. The higher layers add increasing semantics to provide more business meaning to the views.

Note

Do not be concerned that breaking up large calculation views into individual building blocks will damage overall performance. At run time, the SQL optimizer attempts to flatten the entire stack of calculation views into a single executable statement. Remember the calculation view definitions, however many there are in the stack, represent a logical view of the data flow. How the optimizer constructs the final SQL query can never be known at the design stage by simply observing the calculation view definitions. Use SQL tools such as SQL Analyzer to check the generated SQL if required,

Aim for Query Unfolding

At run time, your calculation view is analyzed by the SQL processor. A key aim is to convert each calculation view operation into a corresponding SQL operator to convert the complete data flow graph into a single SQL statement. This is called unfolding. A benefit of unfolding to SQL is that only the SQL engine is needed and no other SAP HANA engine is called, such as the Calculation Engine or other SAP HANA Cloud engines such as Spatial or Graph. However, due to your calculation view design, there might be steps that cannot be translated into SQL operators. If these steps cannot be translated, they will be processed in the specialist SAP HANA engines which generate one or more materialized column views. These materialized column view are then read by SQL. So this means, instead of SQL reading directly from source tables, column views have to be first generated and then the SQL has to read from the interim column views. That would result in a partially unfolded query that might not perform as well as a completely unfolded query due to limited SQL optimizations.

Some nodes and operators cannot be unfolded when there is no equivalent SQL operator for the calculation view feature you are implementing.

It should be emphasized that although unfolding is generally desirable, sometimes unfolding can actually damage overall performance. For example, query compilation time can increase when the query has to be unfolded each time the calculation view is called with new selections. Therefore, it is possible to set a SQL hint that prevents unfolding for individual calculation views. This is found under Semantic NodePropertiesAdvanced.

You can set a global parameter to prevent unfolding for all calculation views.

You can check if your calculation view is unfolded by using the Analyze > Explain Plan in the SQL Console, where you can see the list of operators and data sources.

If a column view is used as a data source instead of a table, then you know this is not an unfolded query as the result has been materialized as an interim view and not read directly from a table. A completely unfolded query access only tables.

Also check the query compilation time versus the execution time to see how unfolding is affecting the overall run time.

It is possible to identify how many queries were not unfolded. Run a query on system view M_FEATURE_USAGE as follows:

Code snippet
SELECT
 	FEATURE_NAME,
 	CALL_COUNT 
FROM 
	"M_FEATURE_USAGE" 
WHERE 
	COMPONENT_NAME = 'CALCENGINE'
AND
	FEATURE_NAME 
IN 
  ('QUERY TOTAL','QUERY UNFOLDING')
Copy code

Unblock Filter Push-down

In your calculation view, if you have defined a node that is a source to more than one other node (in other words the data flow splits), then you need to be aware that this can potentially block filter push-down. The SQL optimizer does not want to break the semantics of the query and so by default it keeps the filter at the node at which it is defined (and this node might be high up in the stack causing large data sets to travel up the stack).

You can set the flag Ignore Multiple Outputs for Filter so that the blockage of filter push down is removed and the optimizer is free to push the filter down to the lowest level to get the best performance, but be very careful to check the query semantics are not broken and wrong results are produced.

Note
A similar setting can be applied at the view level in the semantic node View Properties > Advanced. It has the same effect but applies to scenarios where one calculation view is consumed multiple times by another calculation view.

Switch Calculation View Expressions to SQL

Expressions are used in many places within calculation views including calculated columns, restricted measures, and filter expressions.

For SAP HANA on-premise it is possible to write expression using either plain SQL or the native HANA language called column engine language. SAP recommends that you only use SQL and not column engine language to ensure best performance (Column Engine language limits query unfolding)

When you build calculation view expressions on SAP HANA Cloud, you can only choose SQL expression language. Column engine language is not available in SAP HANA Cloud.

Calculation views that were imported to SAP HANA Cloud from SAP HANA on-premise and include column engine language expressions will still run, but you should change the expression language to SQL from column engine, to achieve better performance. Once you select SQL language, the language selector is then grayed out so that you cannot return to Column Engine language.

You should be able to recreate column engine expressions to plain SQL. For example, the concatenate operator + is replaced with the plain SQL operator || or you can use the CASE keyword instead of If...Then.

Use Dedicated Views or Tables for ‘List of Values’

For performance issues, we generally recommend that you create dedicated calculation views or tables to generate the list of help values for variables and input parameters.

This approach enables a faster display of the list of values, without reading the entire view or table on which the calculation view is based.

This is also best practice so that the list of values is reusable and consistent from one calculation view to another, and users see the same selection values.

The list can be driven by simple fixed filters in the help calculation view or even by analytic privileges that offer a dynamic list restricted to what each user is allowed to choose from.

Column Pruning

Make sure you do not map columns that are not needed. This might sound like a fairly obvious recommendation, but it is very common for developers to simply map all columns without considering if and why there are used. They are often added 'just in case' they are needed. later. Too many attributes can lead to very large, granular data sets especially when only aggregated data is needed.

Optimal Filtering and Calculated Columns

Always try to filter as early as possible in the stack. The objective is to prune data as early as possible to avoid large data sets that can harm performance.

Avoid including calculated columns in filter expressions. It’s better to first create a new column based on the calculation, and then add a projection node on top to filter the result set based on the column (which is no longer calculated).

Calculate as high as you can in the stack so that you avoid calculating on granular data and instead calculate on aggregated data.

Optimal Aggregations

Always reduce the data set by introducing aggregations early in the modeling stack.

If you consume a calculation view that produces attributes, do not change them to measures, and vice-versa.

Choose a cube with star join to build dimensional OLAP models. Do not try to create OLPA models using standard join nodes which creates a relational model. These may not perform as well.

Partitioning Tables

Table Partitioning

Data in column store tables is separated into individual columns to support high performance on queries and also for better memory management. But it is also possible to subdivide the rows of column tables into separate blocks of data. We call this table partitioning. If column separation is vertical subdivision, think of table partitioning as horizontal subdivision.

Note
Partitioning only applies to column store tables and not row store tables. This is why it is a recommended to define tables as column store that will be used for querying large data sets.

Generating partitions is usually the responsibility of the SAP HANA Cloud administrator. There are many decisions to be made relating to partitions including the type of partition. For example, hash, round robin or range. Partitions can also include sub-partitions. The administrator uses monitoring tools to observe the performance of partitions and makes adjustments. What the modeler will need to do is to provide the information relating to the use of the data e.g. how queries will access the data, so that the partitions can be defined optimally.

Reasons for Partitioning a Column Store Table

Save progress to your learning plan by logging in or creating an account