Filtering Data

Objective

After completing this lesson, you will be able to Implement a filter to restrict data.

Using Filter Operations

Filtering data is frequently required when analyzing data to reduce the result set to make it more meaningful to the business user. Typically, you might want to retrieve the sales details for a particular country or region, for a particular range of products, or specific customers (new customers, customers who have not ordered any product for more than one year). These three examples relate to filtering based on attributes.

On the other hand, you might want to filter data based on measures. For example, you might want to list only the sales orders for which the total amount exceeds a threshold.

The two approaches can also be combined, for example, when you want to retrieve the list of US customers who have created more than 10 orders during the past month. In this case, filtering the data by country (US) can be done as early as possible in the data flow, but you must compute the total number of orders per customer (for last month) before applying the threshold (10).

Therefore, filtering is sometimes a question of mitigating between performance (filtering as early as possible) and the consistency of the result you get (filtering too early without care can lead to wrong results). This is particularly true when data is aggregated.

Comparing different options to restrict data. A fixed filter in a calculation view defines the restriction value at design time. The flexible prompt enables users to choose a value for the restriction at runtime.

Filter Criteria: Hard-coded in Calculation View or Flexible at Run-Time?

A key question when defining filters is whether they should be hard-coded in the calculation view, or provided at runtime to provide more flexibility. It is possible to execute a query with a reporting tool that passes different filter criteria to the calculation view.

A primary goal during modeling is to minimize data transfers between the database and the front end consuming tools. For example, an end user will never need to display a million rows of data. Such a large amount of information just cannot be consumed in a meaningful way.

Whenever possible, data should be aggregated and filtered to a manageable size before it leaves the data layer on its way to the front-end reporting tool.

Filters in Calculation View Nodes

In Calculation Views, filters can be applied to many node types. These include the following:

  • Projection

  • Union

  • Join

  • Rank

  • Aggregation

  • Star Join

To define a filter, you display the Filter Expression tab and define a valid expression using the SQL language.

An example of filter expression is "COUNTRY"='US'. The expression can combine several columns and logical operators, for example, "COUNTRY"='US' OR "COUNTRY"='CA'.

Caution

When a column from a data source (for example COUNTRY), is mapped to the node output with a different name (for example, COUNTRY_CODE), the output column name must be used in the expression. In this case, the correct expression would be something like "COUNTRY_CODE"='US'. If you enter the expression "COUNTRY"='US', the expression might validate but building the calculation view will fail.

The name of a column that is used in a filter expression cannot be changed. This is to avoid creating inconsistencies. If you need to change a column name, first remove the filter expression, and set the (adapted) filter expression back after the name has been changed.

Where and When is the Filter Applied?

The behavior of filter expressions depends on the node where the filter is defined.

  • Filters defined in the top node

    In the top node, a filter expression is generally applied before the data processing defined in this node. For example, in an Aggregation or Star Join node, the filter is applied to the data before the aggregation is executed.

  • Filters defined in other nodes
  • In other nodes (NON top nodes), the filter expression is generally applied to the output of the node.

Define and Use Filters

Filter Pushdown

Unblock Filter Push-down

By default, nodes that are consumed by multiple nodes prevent filter push-down from the consuming nodes to ensure semantic correctness.

If push-down does not violate semantic correctness, flag "Ignore Multiple Outputs For Filter" can be set to enforce filter push-down.

Screenshot of a calculation view scenario with one source projection then two branches and a union node on top. the Ignore multiple Outputs for Filter property. is selected on the source projection.

In your calculation view, if you have defined a node that is a source to more than one other node (in other words the data flow splits), then you need to be aware that this can potentially block filter push-down. The SQL optimizer does not want to break the semantics of the query and so by default it keeps the filter at the node at which it is defined (and this node might be high up in the stack causing large data sets to travel up the stack).

You can set the flag Ignore Multiple Outputs for Filter so that the blockage of filter push down is removed and the optimizer is free to push the filter down to the lowest level to get the best performance, but be very careful to check that the query semantics are not broken and wrong results are produced.

Note

A similar setting can be applied at the view level in the semantic node View Properties > Advanced. It has the same effect but applies to scenarios where one calculation view is consumed multiple times by another calculation view.

Client-Dependent Views

The CLIENT (sometimes also called SAP Client to avoid any confusion with the concept of customer) is a general concept in SAP Systems such as SAP S/4HANA, SAP BW/4HANA, and their predecessors. The main purpose is to isolate different types of data (for example, development versus test data) based on a specific column (generally named CLIENT or MANDT, which stands for Mandant (a German word for Client). The values in this column are three-digit numbers, such as 001, 200, 800.

Almost all tables in SAP applications are client-dependent. When you select data from an SAP application table, it usually only makes sense to request data for one client number. Mixing client data makes no sense.

The good news is that a calculation view can use the SAP client column to enable automatic client filtering when you work on data from SAP applications. You do not need to define an explicit filter on the client column each time you create a calculation view.

Whenever you create calculation views on SAP data sources that include a client column, we recommend that you always define joins on this column. Otherwise, your view might return inconsistent results if there is data for several clients. For example, you might join data for clients 200 and 800 (Cartesian product), which in general would not make sense.

Defining Client-Specific Filtering in Calculation Views

Client-specific filtering makes sense only when some (or all) of the source tables have an SAP Client column.

How to Define a Client Dependant View:

  1. Define the Client Column in the properties of the table.
  2. Set the Default Client property to Session Client.
  3. In the User Management application of your SAP HANA Cockpit (or using a SQL statement), assign a Session Client to the user.

When the user runs a query based on the displayed model, the user’s Session Client (in this example: 800) is used to filter the source data.

Watch this video to learn about creating a Client-Dependent View.

To implement client-based filtering, you proceed as follows:

  • For all relevant data source tables, define the SAP client column.

    This is done in the Mapping tab of the node where the source table is referenced, with the Client Column property.

  • Define how you want to retrieve data.

    This is done in the Semantics node of the Calculation View. You choose one of the three following options for the Default Client setting:

    • Cross-Client: All the data is retrieved regardless of the client number.

    • Fixed Client: You specify an SAP Client number (for example, 200) and the data sources are automatically filtered to include only the rows with this client number.

    • Session Client: The source tables are filtered dynamically based on a client value that is specified for each user in the USERS table of SAP HANA Cloud.

      The Session Client is the default setting applied to a new view.

Let’s consider a simple table that contains data for two different clients, 200 and 800. This information is stored in a column named MANDT. You create and build a calculation view based on this table, and then query the data with your user. The following table shows you the data that is retrieved depending on both the default client setting for the calculation view and whether or not the MANDT column has been defined as the (SAP) Client Column in the view properties.

Default Client Setting Effect

Default Client SettingEffect when NO Client Column is specifiedEffect when MANDT is defined as the Client Column
Cross-ClientAll rowsAll rows
Fixed Client: 200All rowsOnly rows for which MANDT=200
Session Client (use case 1: the user has no default client assigned)All rowsNo rows at all
Session Client (use case 2: the user has default client 800)All rowsOnly rows for which MANDT=800

Important Note Regarding the Technical User in SAP Business Application Studio

In the SAP Business Application Studio, you connect with a regular user, but most of the tasks you perform when defining a calculation view and previewing its data are executed on your behalf by a technical user.

As the technical user has no default client assigned, if you specify a Column Client property for a calculation view and set the Default Client to Session Client, the data preview in the Developer perspective will not retrieve any data.

There are two alternative approaches that can be used to bypass the technical user. These are as follows:

  • You can query the data from an external tool with your regular user (for example, STUDENT##).

  • You can display the column view data in the Database Explorer perspective from a connection to the classic database (for example, in the training environment, H00 DB). A query from the classic database connection does not involve the technical user.

    This can be done either by opening the column view or by executing an SQL query on top of it.

Define Filtering on SAP Client

Log in to track your progress & complete quizzes