Filtering data is often required when analyzing data, in order to reduce the result set. Typically, you might want to retrieve the sales details for a particular country or region, for a particular range of products, or specific customers (new customers, customers who haven’t ordered any product for more than one year). These three examples are all about data filtering based on attributes.
On the other hand, you might want to filter data based on measures. For example, you might want to list only the sales order for which the total amount exceeds a threshold.
The two approaches can also be combined, for example, when you want to retrieve the list of US customers who have posted more than 10 orders during the past month. In this case, filtering the data by customer country (US) can be done as early as possible, but you must compute the total number of orders per customer (for last month) before applying the threshold (10).
So, filtering is sometimes a question of mitigating between performance (filtering as early as possible) and the consistency of the result you get (filtering too early without care can lead to wrong results). This is particularly true when data is aggregated.
Filter Criteria: Hard-coded or Flexible?
From a functional perspective, a key question when defining filters is whether they are hard-coded in the calculation view, or provide a more flexible way to define the filter criteria at runtime. In a number of cases, it is possible to execute a query with a client tool on top of a calculation view, including filtering criteria.
A primary goal during modeling is to minimize data transfers between engines. This statement holds true both internally, that is, in the database between engines, and also between SAP HANA Cloud and the end user client application. For example, an end user will never need to display a million rows of data. Such a large amount of information just cannot be consumed in a meaningful way.
Whenever possible, data should be aggregated and filtered to a manageable size before it leaves the data layer.
When deciding which records should be reported upon, a best practice is to think at a set level, not a record level.
A set of data can be aggregated by a region, a date, or some other group, to minimize the amount of data passed between views.