Filtering Data

Objectives
After completing this lesson, you will be able to:

After completing this lesson, you will be able to:

  • Filter data

Using Filter Operations

Filtering data is often required when analyzing data, in order to reduce the result set. Typically, you might want to retrieve the sales details for a particular country or region, for a particular range of products, or specific customers (new customers, customers who haven’t ordered any product for more than one year). These three examples are all about data filtering based on attributes.

On the other hand, you might want to filter data based on measures. For example, you might want to list only the sales order for which the total amount exceeds a threshold.

The two approaches can also be combined, for example, when you want to retrieve the list of US customers who have posted more than 10 orders during the past month. In this case, filtering the data by customer country (US) can be done as early as possible, but you must compute the total number of orders per customer (for last month) before applying the threshold (10).

So, filtering is sometimes a question of mitigating between performance (filtering as early as possible) and the consistency of the result you get (filtering too early without care can lead to wrong results). This is particularly true when data is aggregated.

Filter Criteria: Hard-coded or Flexible?

From a functional perspective, a key question when defining filters is whether they are hard-coded in the calculation view, or provide a more flexible way to define the filter criteria at runtime. In a number of cases, it is possible to execute a query with a client tool on top of a calculation view, including filtering criteria.

A primary goal during modeling is to minimize data transfers between engines. This statement holds true both internally, that is, in the database between engines, and also between SAP HANA Cloud and the end user client application. For example, an end user will never need to display a million rows of data. Such a large amount of information just cannot be consumed in a meaningful way.

Whenever possible, data should be aggregated and filtered to a manageable size before it leaves the data layer.

When deciding which records should be reported upon, a best practice is to think at a set level, not a record level.

A set of data can be aggregated by a region, a date, or some other group, to minimize the amount of data passed between views.

Filters in Calculation View Nodes

In Calculation Views, filters can be applied to many node types. These include the following:

  • Projection

  • Union

  • Join

  • Rank

  • Aggregation

  • Star Join

As a consequence, you generally do not need to use a Projection Node for the sole purpose of filtering data.

To define a filter, you display the Filter Expression tab and define a valid SQL or Column Engine expression. It is recommended to use SQL expressions whenever possible, because they most often allow better optimization.

An example of filter expression is "COUNTRY"='US'. The expression can combine several columns and logical operators; for example, "COUNTRY"='US' OR "COUNTRY"='CA'.

Caution

When a column from a data source, for example COUNTRY, is mapped to the node output with a different name, for example, COUNTRY_CODE, the output column name must be used in the expression. In this case, the correct expression would be something like "COUNTRY_CODE"='US'. If you enter the expression "COUNTRY"='US', the expression might validate but building the calculation view will fail.

The name of a column that is used in a filter expression cannot be changed. This is to avoid creating inconsistencies. If you need to change a column name , first remove the filter expression, and set the (adapted) filter expression back after the name has been changed.

Where and When is the Filter Applied?

The behavior of filter expressions depends on the node where the filter is defined.

  • Filters defined in the top node

    In the top node, a filter expression is generally applied before the data processing defined in this node. For example, in an Aggregation or Star Join node, the filter is applied to the data before the aggregation is executed.

  • Filters defined in other nodes
  • In other nodes (NON top nodes), the filter expression is generally applied to the output of the node.

In some cases, a filter expression can be defined in one or another node of the calculation scenario without impacting its output. In other cases, however, particularly when defining filters on measures, the position of the filter expression in the calculation scenario can affect the result set.

Filtering the Data Set During Data Preview

When you query a Calculation View, all the filters are applied. Some of them can be pushed down to lower levels by the optimizer; in that case, the optimizer tries to ensure data consistency. There are additional ways to further restrict the result set at runtime:

  • Define filters graphically, with the Add Filter button in the ribbon of the Raw Data tab

    When applying the filters, the query against the entire data set is re-executed

    In the corresponding query, you can, in turn, edit some filters or add new ones (see below)

  • Edit the default Data Preview SQL Query to add filters and execute the modified query.

    A filter on an attribute is written as a WHERE clause. A filter on a measure is written as a HAVING clause (filtering based on an aggregate column) when the query include a GROUP BY clause; otherwise as a WHERE clause.

  • An additional filtering capability is offered in the Raw Data tab, in the dropdown list at the right of each column header.

    This is a text filter (matching the entered text string against the column's values) and it applies to the current data set: the query is not re-executed against the source calculation view to find all matching rows. So, this filtering functionality is more suited for activities such as basic testing.

Ignore Multiple Output for Filters

Nodes that are consumed by multiple nodes in a calculation view prevent filter push-down from consuming nodes. The purpose of this is to ensure semantic correctness.

However, if push-down does not violate semantic correctness, it most often makes the calculation view execution time shorter.

So, the flag Ignore Multiple Outputs For Filter can be set to enforce filter push-down. Its impact is local to the node where it is set.

Note
A similar setting can be defined at the View Level, in the Advanced Properties tab of the Semantics node. It has the same effect, but applies to scenarios where one calculation view is consumed multiple times by another.

Using Variables and Input Parameters for filtering purposes

Variables are also typically used in calculation views to filter data. They are a powerful filtering method, because you can define the filter condition at runtime; that is, when executing the view.

However, compared with filter expressions, variables allow you to filter a data set only by attributes, not by measures. To implement filtering on measures columns at runtime, you will need to use an input parameter and use the parameter value in a filter expression.

You will learn about Variables and Input Parameters later, in a dedicated lesson, Using Variables and Input Parameters.

Define and Use Filters

Client-Dependent Views

The CLIENT (sometimes also called SAP Client to avoid any confusion with the concept of Customer) is a general concept in SAP Systems such as SAP Business Suite and SAP Business Warehouse. The main purpose is to isolate different types of data (for example, development versus test data) based on a specific column (generally named CLIENT or MANDT, which stands for Mandant (a German word for Client). The values in this column are three-digit numbers, such as 001, 200, 800.

Almost all tables in the SAP Business Suite and SAP S/4HANA data models are client-dependent. Only a small number of tables do not have the client as the first primary key field.

SAP HANA Cloud handles the SAP Client column to enable client filtering when you work on SAP Applications such as the SAP ERP.

In the SAP Business Application Studio for SAP HANA, you have to explicitly define, in the properties of a source table, which column holds the SAP Client information.

Whenever you create SAP HANA models on SAP data sources that include a Client column, we recommend that you always define joins on this column. Otherwise, your view might return inconsistent results if there is data for several clients. For example, you might join data for clients 200 and 800 (cartesian product), which in general would not make sense.

You should also pay attention to aggregations when the SAP Client is not part of the requested columns, because this might result in an irrelevant aggregation.

Defining Client-Specific Filtering in Calculation Views

Client-specific filtering makes sense only when some (or all) of the source tables have an SAP Client column.

Watch this video to learn about creating a Client-Dependent View.

By default, a calculation view defined in the SAP Business Application Studio returns all the values, regardless of the SAP Client value. To implement client–based filtering, you proceed as follows:

  • Define, for all relevant data source tables, the SAP client column.

    This is done in the Mapping tab of the node where the source table is referenced, with the Client Column property.

  • Define how you want to retrieve data.

    This is done in the Semantics node of the Calculation View. You choose one of the three following options for the Default Client setting:

    • Cross-Client: All the data is retrieved regardless of the client number.

    • Fixed Client: You specify an SAP Client number (for example, 200) and the data sources are automatically filtered to include only the rows with this client number.

    • Session Client: The source tables are filtered dynamically based on a client value that is specified for each user in the USERS table of SAP HANA Cloud.

      The Session Client is the default setting applied to a new view.

Let’s consider a simple table that contains data for two different clients, 200 and 800. This information is stored in a column named MANDT. You create and build a calculation view based on this table, and then query the data with your user. The following table shows what data is retrieved depending on both the default client setting for the calculation view and whether or not the MANDT column has been defined as the (SAP) Client Column in the view properties.

Default Client Setting Effect

Default Client SettingEffect when NO Client Column is specifiedEffect when MANDT is defined as the Client Column
Cross-ClientAll rowsAll rows
Fixed Client: 200All rowsOnly rows for which MANDT=200
Session Client (use case 1: the user has no default client assigned)All rowsNo rows at all
Session Client (use case 2: the user has default client 800)All rowsOnly rows for which MANDT=800

Important Note Regarding the Technical User in the SAP Business Application Studio

In the SAP Business Application Studio, you connect with a regular user, but most of the tasks you perform with defining a calculation view and previewing its data are executed on your behalf by a technical user.

Because the technical user has no default client assigned, if you specify a Column Client property for a calculation view and set the Default Client to Session Client, the data preview in the Developer perspective will not retrieve any data.

There are two alternative approaches that can be used to bypass the technical user. These are as follows:

  • You can query the data from an external tool with your regular user (for example, STUDENT##)

  • You can display the column view data in the Database Explorer perspective from a connection to the classic database (for example, in the training environment, H00 DB). A query from the classic database connection does not involve the technical user.

    This can be done either by opening the column view or by executing an SQL query on top of it.

Session-Dependent Functions

To dynamically select table data in calculation views, you can use several functions. These functions return client or locale information for the connected user, and are as follows:

  • SESSION_CONTEXT('CLIENT')

    Returns the client's value based on the current user profile.

  • SESSION_CONTEXT('LOCALE')

    Returns the session's language in POSIX format (set by 'locale' parameter of JDBC/ODBC/ODBO connection, for example, en_en, de_de, de_at).

  • SESSION_CONTEXT('LOCALE_SAP')

    Returns the session's language following the SAP internal format (like the SPRAS column in the text tables of Master Data).

Example

SELECT field1, field2 FROM my_table WHERE mandt = SESSION_CONTEXT('CLIENT') AND spras = SESSION_CONTEXT('LOCALE_SAP')

It is also possible to refer to the system variable $$client$$ in a filter expression of a Calculation View. This variable picks up the client value stored in the user ID.

Define Filtering on SAP Client

Save progress to your learning plan by logging in or creating an account