In this lesson, we will cover recommendations and best practices for data modeling with SAP HANA Cloud.
General Design Recommendations
Break down large models into individual calculation views so that they are easier to understand and also allow you to define reusable components, thus avoiding redundancy and duplicated maintenance.
Try to develop calculation views in layers so that each layer consumes the lower layers. Any change to the lower layers immediately impacts the higher layers. The lowest layer should provide the foundation calculation views. The higher layers add increasing semantics to give the views more business meaning.
Do not be concerned that breaking up large calculation views into individual building blocks will damage overall performance. At run time, the SQL compiler attempts to flatten the entire stack of calculation views into a single executable model. Remember that the calculation view definitions, however many there are in the stack, are simply a logical view of the data flow and provide the essential logic. How the compiler assembles the final query execution cannot be predicted at design time simply by observing the calculation view definitions.
Avoid mixing CDS views with calculation views. It is tempting to reuse CDS views as data sources in your calculation views because they might provide a fast path to accessing data, such as master data. However, this is not recommended if you are looking for the very best performance.
It might seem obvious, but you must thoroughly test your calculation views on realistic data sets, not just a few sample records. This applies not just to data volume but also to the variety of data values. Make sure you include unexpected values, such as null values or a hierarchy that would produce orphan nodes, to ensure they are handled as you expect.
Aim for Query Unfolding
At run time, your calculation view is analyzed by the SQL processor. A key aim is to convert each calculation view operation into a corresponding SQL operator so that the complete data flow graph becomes a single SQL statement. This is called unfolding. A benefit of unfolding is that only the SQL engine is needed; no other engine is called, such as the calculation engine or the specialist SAP HANA Cloud engines for spatial or graph processing. However, due to your calculation view design, there might be steps that cannot be translated into SQL operators. Such steps are processed in the specialist SAP HANA engines, which generate one or more materialized column views. These materialized column views are then read by SQL. This means that instead of SQL reading directly from the source tables, column views must first be generated and the SQL then reads from these interim column views. The result is a partially unfolded query that might not perform as well as a completely unfolded query due to limited SQL optimizations.
Some nodes and operators cannot be unfolded when there is no equivalent SQL operator for the calculation view feature you are implementing.
It should be emphasized that although unfolding is generally desirable, sometimes unfolding can actually damage overall performance. For example, query compilation time can increase when the query has to be unfolded each time the calculation view is called with new selections. Therefore, it is possible to set a SQL hint that prevents unfolding for individual calculation views. This is found under Semantic Node → Properties → Advanced.
You can set a global parameter to prevent unfolding for all calculation views.
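For an individual query, unfolding can also be suppressed with the documented SAP HANA hint NO_CALC_VIEW_UNFOLDING. A minimal sketch follows; the schema and view names are placeholders:

```sql
-- Suppress unfolding for a single query against a calculation view.
-- "MYSCHEMA"."SALES_CV" is an illustrative calculation view name.
SELECT "PRODUCT", SUM("AMOUNT")
  FROM "MYSCHEMA"."SALES_CV"
 GROUP BY "PRODUCT"
 WITH HINT (NO_CALC_VIEW_UNFOLDING);
```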
You can check whether your calculation view is unfolded by using the SQL Analyzer, where you can see the list of operators and data sources. If a column view is used as a data source instead of a table, then you know the query is not unfolded, because the result has been materialized as an interim view rather than read directly from a table. A completely unfolded query accesses only tables.
Also check the query compilation time versus the execution time to see how unfolding is affecting the overall run time.
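One way to compare compilation and execution times is to query the SQL plan cache. The following sketch assumes the monitoring view M_SQL_PLAN_CACHE and its timing columns, which may vary by release; the view name filter is illustrative:

```sql
-- Compare preparation (compile) time with execution time for
-- statements that reference a given calculation view.
SELECT STATEMENT_STRING,
       PREPARATION_COUNT,
       TOTAL_PREPARATION_TIME,
       EXECUTION_COUNT,
       TOTAL_EXECUTION_TIME
  FROM M_SQL_PLAN_CACHE
 WHERE STATEMENT_STRING LIKE '%SALES_CV%';
```

If preparation time dominates because the query is recompiled for every new selection, preventing unfolding for that view may be the better trade-off.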
It is possible to identify how many queries were not unfolded. Run a query on system view M_FEATURE_USAGE as follows:
SELECT FEATURE_NAME, CALL_COUNT
  FROM "M_FEATURE_USAGE"
 WHERE COMPONENT_NAME = 'CALCENGINE'
   AND FEATURE_NAME IN ('QUERY TOTAL', 'QUERY UNFOLDING');
Set Join Cardinalities
When you define a join, you can optionally set the cardinality to describe the relationship between the two data sources, for example, 1:1, 1:n, n:m.
We recommend that you always set join cardinalities so the optimizer can decide if joins can be omitted at run time based on the query that is being sent to the calculation view.
If tables are used as the data sources, and not calculation views, then you can use the Propose Cardinality option.
One of the main benefits of having the correct cardinality set is that the SQL optimizer can prune an entire data source from a join if certain prerequisites are met.
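Conceptually, join pruning works as follows. The table and column names are illustrative, and the rewrite assumes the n..1 cardinality is correct so the join can neither multiply nor drop rows:

```sql
-- The view joins ORDERS (n) to PRODUCTS (1) on PRODUCT_ID.
-- A query that requests only ORDERS columns can be rewritten from:
SELECT O."PRODUCT_ID", SUM(O."AMOUNT")
  FROM "ORDERS" O
  LEFT OUTER JOIN "PRODUCTS" P
    ON O."PRODUCT_ID" = P."PRODUCT_ID"
 GROUP BY O."PRODUCT_ID";

-- ...to the equivalent, cheaper query with the join pruned away:
SELECT "PRODUCT_ID", SUM("AMOUNT")
  FROM "ORDERS"
 GROUP BY "PRODUCT_ID";
```

Without a declared cardinality, the optimizer cannot prove the two statements are equivalent and must execute the join.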
Referential Join Type
Consider using the Referential join type if your joins do not have to enforce data integrity and are defined only to provide access to additional columns where you already trust the integrity of the data (for example, a sales order item always has a header).
It might be possible for the SQL optimizer to prune the data source from the join if it is not needed, if certain conditions are met.
Unblock Filter Push-down
In your calculation view, if a node is a source to more than one other node (in other words, the data flow splits), be aware that this can block filter push-down. The SQL optimizer does not want to break the semantics of the query, so by default it keeps the filter at the node at which it is defined. If this node is high up in the stack, large data sets travel up the stack.
You can set the flag Ignore Multiple Outputs for Filter to remove this blockage so that the optimizer is free to push the filter down to the lowest level for the best performance. Be very careful to check that the query semantics are not broken and wrong results are not produced.
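The effect of filter push-down can be illustrated in plain SQL; the table and column names are illustrative:

```sql
-- Filter kept at the top of the stack: the full data set is
-- aggregated first, and the filter is applied to the result.
SELECT *
  FROM (SELECT "REGION", SUM("AMOUNT") AS "AMOUNT"
          FROM "SALES"
         GROUP BY "REGION")
 WHERE "REGION" = 'EMEA';

-- Filter pushed down: only the matching rows are ever aggregated.
SELECT "REGION", SUM("AMOUNT") AS "AMOUNT"
  FROM "SALES"
 WHERE "REGION" = 'EMEA'
 GROUP BY "REGION";
```

Here the two forms return the same result, so push-down is safe; when the data flow splits, the optimizer cannot always prove this, which is why the flag exists.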
General Join Recommendations
Joins are a source of potential performance improvements. Make sure you consider all the recommendations when defining joins to ensure you get the best performance from your calculation views.
Switch Calculation View Expressions to SQL
Expressions are used in many places within calculation views including calculated columns, restricted measures, and filter expressions.
For SAP HANA on-premise, it is possible to write expressions using either plain SQL or the native language called column engine language. SAP recommends that you use only SQL and not column engine language to ensure the best performance, as column engine language limits query unfolding.
When you build expressions using SAP HANA Cloud calculation views, you can only choose SQL expression language. Column engine language is not available.
Calculation views that were imported to SAP HANA Cloud from SAP HANA on-premise and include column engine language will still run, but you should change the language to SQL to achieve better performance. Once you select SQL language, the language selector is then grayed out so that you cannot return to Column Engine language.
You should be able to recreate column engine expressions in plain SQL. For example, the concatenation operator + is replaced with the plain SQL operator ||.
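A couple of common rewrites can be sketched as follows; the table and column names are illustrative:

```sql
-- Column engine style (SAP HANA on-premise only):
--   "FIRSTNAME" + ' ' + "LASTNAME"
--   if("STATUS" = 'X', 1, 0)
-- Equivalent plain SQL expressions:
SELECT "FIRSTNAME" || ' ' || "LASTNAME"              AS "FULL_NAME",
       CASE WHEN "STATUS" = 'X' THEN 1 ELSE 0 END    AS "STATUS_FLAG"
  FROM "EMPLOYEES";
```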
Use Dedicated Views or Tables for ‘List of Values’
For performance reasons, we generally recommend that you create dedicated calculation views or tables to generate the list of help values for variables and input parameters.
This approach enables a faster display of the list of values, without reading the entire view or table on which the calculation view is based.
This is also best practice so that the list of values is reusable and consistent from one calculation view to another, and users see the same selection values.
The list can be driven by simple fixed filters in the help calculation view or even by analytic privileges that offer a dynamic list restricted to what each user is allowed to choose from.
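The idea can be sketched with a plain SQL view over a small master-data table; in practice you would model this as a dedicated dimension calculation view, and all names here are illustrative:

```sql
-- A dedicated value-help view reads only the small master-data
-- table, not the large transactional view the variable filters.
CREATE VIEW "V_REGION_VALUES" AS
  SELECT DISTINCT "REGION_ID", "REGION_NAME"
    FROM "REGIONS";
```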
Make sure you do not map columns that are not needed. This might sound like a fairly obvious recommendation, but it is very common for developers to simply map all columns without considering if and why they are used. They are often added 'just in case' they are needed later. Too many attributes can lead to very large, granular data sets, especially when only aggregated data is needed.
Optimal Filtering and Calculated Columns
Always try to filter as early as possible in the stack. The objective is to prune data as early as possible to avoid large data sets that can harm performance.
Avoid including calculated columns in filter expressions. It is better to first create a new column based on the calculation, and then add a projection node on top to filter the result set on that column (which is no longer calculated at filter time).
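Why expressions in filters hurt can be seen in plain SQL. One common rewrite, assuming an illustrative ORDERS table, is to filter on the base column directly so the predicate can be pushed down to the data source:

```sql
-- Filtering on an expression forces it to be evaluated on every row:
SELECT * FROM "ORDERS"
 WHERE YEAR("ORDER_DATE") = 2024;

-- Filtering on the base column keeps the predicate pushable:
SELECT * FROM "ORDERS"
 WHERE "ORDER_DATE" >= '2024-01-01'
   AND "ORDER_DATE" <  '2025-01-01';
```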
Calculate as high as you can in the stack so that you avoid calculating on granular data and instead calculate on aggregated data.
Always reduce the data set by introducing aggregations early in the modeling stack.
If you consume a calculation view that produces attributes, do not change them to measures, and vice-versa.
Choose a cube with star join to build dimensional OLAP models. Do not try to create OLAP models using standard join nodes, which creates a relational model that may not perform as well.