Comparing Data Provisioning Options in SAP BW/4HANA

Objectives

After completing this lesson, you will be able to:

  • Compare Data Provisioning Options in SAP BW/4HANA

Introduction

Scenario

You are a consultant working at a large multinational corporation that relies heavily on data analytics to make informed business decisions. The company recently started a project to use SAP BW/4HANA to build an enterprise data warehouse.

How do you explain data provisioning for SAP BW/4HANA to your team? Let's start with the terminology related to data provisioning.

Ensure that the team is familiar with the different options available for data provisioning. Then, discuss the role of source systems in SAP BW/4HANA and how they're essential to providing accurate and up-to-date data for informed business decisions.

Terminology in Data Provisioning

Data provisioning is a broad term that refers to the acquisition of data from a source system to a target system. The word acquisition is preferred to data loading, because data can be acquired without the need to physically load it to a target system. In fact, with advances in technology, moving data around an organization is becoming less common. It's often easier to read the data remotely.

There are many reasons why data provisioning is needed, including:

  • Extracting data from business applications and loading it to a central data warehouse.
  • Providing real-time access to data sources for analytics.
  • Distributing data from a central system to regional systems.
  • Consolidating data from multiple systems into a central system.
  • Keeping systems in sync.
  • Migrating data from a legacy system to a new system.

The simplest data provisioning scenarios involve only two systems: the source and the target. Often, however, multiple systems are involved. For example, you might want to combine data from multiple source systems into a single target system. It can also be the other way around: a single source system distributes its data to multiple target systems. Finally, both can be combined: multiple source systems consolidate data, which is then distributed to multiple target systems.

First, let's specify the terminology related to data provisioning:

  • Virtualization means that a model uses data that already exists elsewhere, without creating new data. Virtual access is realized by reading on demand from the following sources:

    • From an existing table in the same system, for example, through a reference characteristic, a view, or a synonym.

    • From a remote source, for example, through a virtual table. This is called remote access.

  • Federation means that data from different sources are combined through virtual access.

  • Data replication, data load, and data ingestion are different terms for copying existing data and possibly transforming it. Ingestion implies that the data stems from another data source, often from files or other less structured data; this can also be true for replication and data load. New persistent data is created in the following ways:

    • By direct import, meaning manual or automatic data ingestion.

    • By batch load, scheduled once or in regular intervals.

    • By near-real-time or real-time replication, for example, as streaming (the source delivers changes frequently) or change data capture (changes are monitored through a trigger or a log).

  • Integration means that data from different sources are combined and optionally transformed through replication, often by batch load.
    • ETL means that data is transformed before it is saved (loaded).
    • ELT means that data is transformed after it is saved (loaded); that is, the transformation is performed when the data is accessed and presented. This approach becomes more attractive when enough RAM is available, especially with SAP HANA.
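The difference between ETL and ELT can be sketched in a few lines. The following is a minimal illustration only: it uses Python's built-in sqlite3 module as a stand-in for a target database, not SAP BW/4HANA or SAP HANA, and the table names, sample records, and helper functions are all hypothetical.

```python
import sqlite3

# Hypothetical raw records from a source system: (order_id, amount as text, currency).
RAW_ORDERS = [
    (1, " 100.50", "eur"),
    (2, "20", "USD "),
    (3, "7.25 ", "Eur"),
]


def clean(amount_text, currency):
    """Transformation step: parse the amount and normalize the currency code."""
    return float(amount_text.strip()), currency.strip().upper()


def run_etl(conn):
    """ETL: transform in the pipeline first, then persist only the cleaned rows."""
    conn.execute("CREATE TABLE orders_etl (order_id INTEGER, amount REAL, currency TEXT)")
    cleaned = [(oid, *clean(amt, cur)) for oid, amt, cur in RAW_ORDERS]
    conn.executemany("INSERT INTO orders_etl VALUES (?, ?, ?)", cleaned)
    return conn.execute("SELECT * FROM orders_etl ORDER BY order_id").fetchall()


def run_elt(conn):
    """ELT: persist the raw rows unchanged, then transform in the database on read."""
    conn.execute("CREATE TABLE orders_raw (order_id INTEGER, amount TEXT, currency TEXT)")
    conn.executemany("INSERT INTO orders_raw VALUES (?, ?, ?)", RAW_ORDERS)
    # The view holds the transformation logic; it runs whenever the data is accessed.
    conn.execute(
        """
        CREATE VIEW orders_elt AS
        SELECT order_id,
               CAST(TRIM(amount) AS REAL) AS amount,
               UPPER(TRIM(currency))      AS currency
        FROM orders_raw
        """
    )
    return conn.execute("SELECT * FROM orders_elt ORDER BY order_id").fetchall()
```

Calling both functions on the same connection, for example one created with sqlite3.connect(":memory:"), returns identical cleaned rows. The difference is where the transformation runs and what is stored: the ETL table holds only cleaned values, while the ELT table keeps the raw values and the view cleans them on every read.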

Data Persistence and Virtual Data Access in SAP BW/4HANA

The Virtual Data Paradigm

To meet changing business requirements and to respond to changes in technology, data warehouses have changed quite a bit. Traditionally, people wanted to put all of their data into a single data-warehouse-type environment to analyze it there. Besides strategic reporting, operational data analysis scenarios were often implemented in SAP Business Warehouse (SAP BW) due to a lack of ERP (Enterprise Resource Planning) reporting capabilities and performance limitations of ERP systems. The idea was to optimize the data warehouse for analytical processes on large data sets and to integrate data from different sources through batch load jobs. This is a physical data warehouse with the following disadvantages:

  • You must duplicate all your data, so you need more storage space.
  • You must copy all that data through the network, which clogs up and slows your network.
  • You double your cost and effort.
  • The data in the new data warehouse is never up to date.
  • The single data warehouse system itself cannot handle all the different types of data requirements, such as:
    • Unstructured data.
    • Graph-engine data.
    • Key-value pairs.
    • Spatial data.
    • A huge variety of data.

Data quality enhancement steps are resource-consuming. With new technology on the market, the decision about where to perform these steps and the role of SAP BW/4HANA in general must be reevaluated.

A data warehouse is still needed, with a focus on enterprise data warehousing, for example, to harmonize and consolidate data. But harmonization does not necessarily mean that you must store the data persistently. With SAP solutions for enterprise information management (EIM), you can also design a logical data warehouse directly on SAP HANA. A logical data warehouse appears to contain all the data, but the data largely stays in your source systems: you don't copy it through the network, and you don't store copies of it. The new strategy is to load the data from the various source databases, or ingest source files, only when required. You then combine the loaded data with other data that is stored elsewhere, and report on it.

Another use case is when you plan to retire legacy databases or consolidate various databases. This should not affect your users and their reporting. In this case, connect to the various back-end databases through SAP Vora, and transform data through SAP solutions for EIM, as a logical data warehouse.

SAP BW/4HANA - Architecture.

To integrate external data structures, typically from non-SAP sources, with SAP HANA solutions for EIM, use:

  • SAP HANA smart data access (SDA) with remote access to other databases.
  • SAP HANA smart data integration (SDI) with adapters to integrate various remote sources.
  • SAP HANA smart data quality (SDQ) with library functions for data cleansing and enhancement.

Within this technology, SAP provides a way to access data in real time or to persist the data coming from various sources into SAP BW/4HANA. Data that is stored in SAP HANA tables does not need to be duplicated. You can provide virtual data access using Open ODS Views and CompositeProviders. Implement batch load or real-time replication to physical DataStore Objects (advanced) only if complex transformations in SAP BW/4HANA are required.

Remember that you can use SAP HANA calculation views and native Structured Query Language (SQL) features to combine data. However, you can also use SAP BW/4HANA objects for federation.
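As a rough illustration of federation with plain SQL, the following sketch joins two separate databases on demand without creating a combined persistent table. It uses Python's built-in sqlite3 module and its ATTACH DATABASE statement as a stand-in; it does not use SAP HANA calculation views or SAP BW/4HANA objects, and the schema names erp and crm, the tables, and the sample rows are hypothetical.

```python
import sqlite3


def build_federated_connection():
    """Attach two independent in-memory databases to one connection."""
    conn = sqlite3.connect(":memory:")
    # Each ATTACH creates a separate database that stands in for one source system.
    conn.execute("ATTACH DATABASE ':memory:' AS erp")
    conn.execute("ATTACH DATABASE ':memory:' AS crm")
    conn.execute("CREATE TABLE erp.sales (customer_id INTEGER, revenue REAL)")
    conn.execute("CREATE TABLE crm.customers (customer_id INTEGER, region TEXT)")
    conn.executemany(
        "INSERT INTO erp.sales VALUES (?, ?)",
        [(1, 100.0), (2, 250.0), (1, 50.0)],
    )
    conn.executemany(
        "INSERT INTO crm.customers VALUES (?, ?)",
        [(1, "EMEA"), (2, "APJ")],
    )
    return conn


def revenue_by_region(conn):
    """Federated query: combine both sources at read time, storing nothing new."""
    query = """
        SELECT c.region, SUM(s.revenue)
        FROM erp.sales AS s
        JOIN crm.customers AS c ON c.customer_id = s.customer_id
        GROUP BY c.region
        ORDER BY c.region
    """
    return conn.execute(query).fetchall()
```

The join result exists only for the duration of the query. The main database, which plays the role of the consuming system here, contains no tables of its own.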

Depending on the circumstances, you can harmonize the data and create one version of truth with:

  • SAP HANA procedures.
  • SAP HANA calculation views.
  • SAP BW/4HANA data transfer processes.
  • SAP BW/4HANA CompositeProviders.

We recommend that you consider virtual options first. The Layered Scalable Architecture for SAP BW/4HANA (LSA++) reflects this philosophy.


Note
For more details about LSA++, see the learning journey Implementing Data Modeling Scenarios in SAP BW/4HANA.

To optimize SAP BW/4HANA, you must combine nonpersistent options (virtual access, federation) and persistent options (replication, load, and integration). The following table compares options for virtual access and data persistence on several levels (not only for SAP BW/4HANA, but also for the database and the source system).

Virtual Options and Persistent Options

| Criterion | Virtual Options (View, Remote Access) | Persistent Options (Load, Real-Time Replication) |
| --- | --- | --- |
| Data Access | When needed, could be often | Once, in any case |
| Use Case | Operative reporting, rare access | Strategic reporting, trends, history |
| Objects in SAP HANA | Remote Source (SDA or SDI) + Virtual Table; SAP HANA calculation view; Database (DB) Procedure; SAP HANA Core Data Services View (CDS View) | Replication Task (SDI); Flowgraph (SDI + SDQ = EIM) generating a batch task (to be scheduled), a real-time task (to be executed), or a DB procedure (to be scheduled) |
| Objects in SAP S/4HANA | DB View, ABAP CDS View | Sum table, Setup table |
| Objects in SAP BW/4HANA | Open ODS View for table; CompositeProvider for SAP HANA calculation view; Virtual InfoObject (on SAP HANA calculation view) | Streaming process chain + DataStore Object (advanced); Regularly scheduled process chain |
| Data Combination | Join, association (join on demand) | Read values during load |
| Storage Space | No additional storage | Tables filled redundantly in SAP HANA |
| Administrative Effort | No administration of data load processes | Requires administration / monitoring of data load process |
| Data Read Performance | Might be slow | Speedy |
