Filtering is often required during data analysis to reduce the result set and make it more meaningful to the business user. Typically, you might want to retrieve the sales details for a particular country or region, for a particular range of products, or for specific customers (for example, new customers, or customers who haven't ordered any product for more than one year). These three examples relate to filtering based on attributes.
On the other hand, you might want to filter data based on measures. For example, you might want to list only the sales orders for which the total amount exceeds a threshold.
The two approaches can also be combined, for example, when you want to retrieve the list of US customers who have created more than 10 orders during the past month. In this case, filtering the data by country (US) can be done as early as possible in the data flow, but you must compute the total number of orders per customer (for the last month) before applying the threshold (10).
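In SQL terms, this combination maps naturally onto the WHERE and HAVING clauses. The following sketch uses an in-memory SQLite table with hypothetical names (the orders table, customer and country columns, and the sample data are all illustrative, not taken from the original text): the attribute filter on country is pushed down early in WHERE, while the measure filter on the order count can only be applied after aggregation, in HAVING.

```python
import sqlite3

# Hypothetical orders table; all names and data are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, country TEXT, order_id INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("Alice", "US", i) for i in range(12)]            # 12 orders: passes threshold
    + [("Bob", "US", i) for i in range(100, 105)]      # 5 orders: below threshold
    + [("Carla", "DE", i) for i in range(200, 215)],   # 15 orders: wrong country
)

rows = conn.execute("""
    SELECT customer, COUNT(*) AS order_count
    FROM orders
    WHERE country = 'US'          -- attribute filter: applied as early as possible
    GROUP BY customer
    HAVING COUNT(*) > 10          -- measure filter: applied after aggregation
""").fetchall()
print(rows)  # [('Alice', 12)]
```

Only Alice survives both filters: Bob is in the right country but below the threshold, and Carla exceeds the threshold but is filtered out early on the country attribute.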
So, filtering sometimes involves a trade-off between performance (filtering as early as possible) and the correctness of the result (filtering too early, without care, can lead to wrong results). This is particularly true when data is aggregated.
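To see how filtering too early can produce wrong results when data is aggregated, consider wanting only the orders whose total amount exceeds 100. The sketch below (hypothetical line-item table and data, for illustration only) contrasts the correct approach, aggregating first and then applying the threshold, with a premature filter on individual line items that silently drops an order whose items are individually small but whose total is large.

```python
import sqlite3

# Hypothetical line-item table; all names and data are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (order_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO items VALUES (?, ?)",
                 [(1, 60.0), (1, 70.0),    # order 1 totals 130
                  (2, 150.0),              # order 2 totals 150
                  (3, 40.0), (3, 30.0)])   # order 3 totals 70

# Correct: aggregate first, then apply the threshold on the total.
correct = conn.execute("""
    SELECT order_id, SUM(amount) FROM items
    GROUP BY order_id HAVING SUM(amount) > 100
    ORDER BY order_id
""").fetchall()

# Wrong: filtering line items before aggregation drops order 1, whose
# individual items are each below 100 but whose total is not.
wrong = conn.execute("""
    SELECT order_id, SUM(amount) FROM items
    WHERE amount > 100
    GROUP BY order_id ORDER BY order_id
""").fetchall()

print(correct)  # [(1, 130.0), (2, 150.0)]
print(wrong)    # [(2, 150.0)]
```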
Filter Criteria: Hard-coded in Calculation View or Flexible at Run-Time?
A key question when defining filters is whether they should be hard-coded in the calculation view, or supplied at runtime for more flexibility. For example, a reporting tool can execute a query that passes different filter criteria to the calculation view.
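The difference between the two options can be sketched with plain parameterized SQL (again using an in-memory SQLite table with hypothetical names as a stand-in for a calculation view): a hard-coded filter bakes the criterion into the query definition, while a bind parameter lets the consuming tool choose the criterion at run time without changing the query.

```python
import sqlite3

# Hypothetical sales table; all names and data are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (country TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("US", 100.0), ("DE", 200.0), ("US", 50.0)])

# Hard-coded filter: the criterion lives inside the query definition.
hard_coded = "SELECT SUM(amount) FROM sales WHERE country = 'US'"

# Flexible filter: the criterion is a bind parameter supplied at run time,
# e.g. by a reporting tool; the same query serves any country.
flexible = "SELECT SUM(amount) FROM sales WHERE country = ?"

us_total = conn.execute(hard_coded).fetchone()[0]
de_total = conn.execute(flexible, ("DE",)).fetchone()[0]
print(us_total)  # 150.0
print(de_total)  # 200.0
```

In SAP HANA itself, the flexible variant corresponds to exposing input parameters or variables on the calculation view, which consuming tools fill in when they run their queries.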
A primary goal during modeling is to minimize data transfers between the database and the front-end tools that consume the data. For example, an end user will never need to display a million rows of data; such a large amount of information simply cannot be consumed in a meaningful way.
Whenever possible, data should be aggregated and filtered down to a manageable size before it leaves the data layer on its way to the front-end reporting tool.
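The effect on transfer volume can be made concrete with a small sketch (hypothetical table and data): fetching the raw detail rows ships every record to the client, while aggregating in the data layer reduces the result to one row per group before it leaves the database.

```python
import sqlite3

# Hypothetical sales table; all names and data are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("North", float(i)) for i in range(5000)]
                 + [("South", float(i)) for i in range(5000)])

# Shipping the raw rows transfers every detail record to the front end...
raw = conn.execute("SELECT region, amount FROM sales").fetchall()

# ...while aggregating in the data layer reduces the transfer to one row
# per region.
summary = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()

print(len(raw))      # 10000
print(len(summary))  # 2
```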