Introducing Data Warehousing, SAP BW/4HANA and the Role of SAP HA

Data Warehouse

A data warehouse is a core part of a business intelligence (BI) solution and its three key roles are as follows:

Acquire, integrate, and manage data from anywhere across an organization
Provide sophisticated data modeling capabilities on the acquired data
Present a single view of modeled data to all analytical applications and reporting solutions

A data warehouse collects data from across the entire enterprise from all source systems and either loads the data to the data warehouse periodically, or accesses data in real time. During the data acquisition, data is cleaned up. This usually means data is thoroughly checked for invalid or missing values. Correct values are generated, or data can even be rejected. Data is usually integrated from multiple source systems across an enterprise, for example, inventory and sales data managed in two separate systems can be integrated to build a complete picture of sales demand and stocking levels.

When data warehouses first appeared in the early 1990s, their role was to relieve the source systems (typically mission-critical systems that run the key business processes) of their reporting duties. As the demands of the analytical users were increasing, the source systems could not cope. The solution was to make a copy of the data in a separate dedicated analytical system: the data warehouse, where reporting could be managed. Data was periodically extracted and stored in the data warehouse; typically this was an overnight activity.

But in recent years, modern data warehouses provide not only those capabilities but also a permanent, remote connection to live data sources, so that data loading and storage to the data warehouse is not always necessary. Customers have to decide whether to extract and load, or to implement a remote connection. Modern data warehouses can connect to data that is located on-premise and also in the cloud.

The decision is usually based around performance requirements. For example, reporting performance can be poor if large amounts of live data need to be accessed in the remote sources, and then complex processing is needed in the data warehouse to clean up and harmonize the data sources before the data is ready to be consumed by the reporting tool. In this case, it might be better to periodically extract the data from the source systems and then process the data in the data warehouse and store the results, so that the data is ready for immediate consumption by the reporting tools. The downside to this is that you now have duplicated data (this is called data redundancy). Also, this approach means that data is no longer live and is valid only at the time of extraction. For some types of data, this is acceptable. But for data that must be up to date, extraction and loading is less desirable, and a live connection is usually implemented.

Once data is acquired and integrated, it is then usually transformed by adding additional business information to add more value to the data. For example, aggregate measures, calculate new values, convert currencies, apply hierarchies to the data. Finally, the transformed data is then exposed to analysis tools that are operated by the business user.

A key objective of a data warehouse is to provide the complete history of the organization’s entire data set that is accurate, and at all levels of granularity, right down to the original transaction (grain). It should be possible to ask any analytical question on the data and trust the answers. Performance should be acceptable. A well-built data warehouse avoids data redundancy by avoiding storing summarizations of data that is already stored in detail. As data volumes have increased, the traditional role of the data warehouse as a mass storage archive has changed in recent years, and now physical data storage is managed across the data warehouse storage layer, but also in other cheaper storage solutions, such as data lakes. But the key role of a data warehouse has not changed and still continues to be responsible for the management of all data, regardless of location, and to provide a single source for all reporting and analytical requirements.

A data warehouse is often referred to as a data mart. But a data mart is simply a data warehouse used to manage only a limited part of an organization's data, such as sales or procurement, and is usually managed locally. The data is often used for a specific use case and not shared. One of the risks of developing data marts is the creation of silos of data that create redundancies and inconsistencies. When a data warehouse manages the entire organization's data, it is referred to as an enterprise data warehouse (EDW).

SAP BW/4HANA

SAP BW/4HANA is a packaged solution that provides a complete tool set to rapidly deploy an enterprise-wide, on-premise data warehouse out of the box.

SAP BW/4HANA provides tools that support the connectivity of any source system, SAP and non-SAP. Data can be extracted, transformed, and loaded to SAP BW/4HANA either periodically – for example during the night – or even in real-time. Many source systems support the loading of only the data that has changed or is new since the last load. This is known as a delta load. One of the most popular sources of data is from the SAP ERP suites such as SAP Business Suite or SAP S/4HANA, where large numbers of ready-made extractors are provided by SAP. This could be considered as prewiring of the SAP sources to SAP BW/4HANA, and that is one of the most appealing aspects of SAP BW/4HANA for customers who already run SAP systems. For non-SAP sources, it is possible to develop extractors using the provided SAP BW/4HANA tools. Many sources support a remote connection to data, so that real-time views of operational data are possible and no loading and storing of data is necessary in SAP BW/4HANA.

SAP BW/4HANA provides tools for the development of dedicated objects that support advanced data modeling. A modeler creates a data flow that guides data all the way from the source systems into special storage objects such as DataStore Objects (advanced). As data travels through the flow, it is possible to define transformations to clean up data, and fill in missing values using lookups and formulas. During the load, data can be checked for quality and bad data can be flagged. Once data is loaded, it can then be modeled using tools that add additional business semantics, such as assigning currencies to measures, or defining hierarchies. The modeling layer is comprised of physical objects that store the data and logical objects that create shapes of data ready to be consumed by the business user.

Note

The picture shows an overview of SAP BW/4HANA objects. This learning journey will explain them later in detail.

In order to consume the data model, a developer creates a Query on top of the logical objects to define the specific layout of the report and any special requirements such as sub-totals, sorting order, extra calculations, and filters.

But a Query is not an end user reporting tool. A Query sits between the data model and the end report. A reporting tool is not provided with SAP BW/4HANA. Customers usually have their own preferences for which reporting tool they would like to deploy. SAP offers cloud and on-premise reporting tools such as SAP Analytics Cloud, SAP Analysis for Microsoft Office, and many more. SAP BW/4HANA also supports non-SAP reporting tools using open standards.

The role of SAP HANA in SAP BW/4HANA

The SAP BW/4HANA application server runs on an SAP HANA database. This offers the following advantages:

Better data access times for all processes, because data can be kept in-memory and in column store tables
Data is processed in SAP HANA rather than in SAP BW/4HANA application server, which offers better data processing times, for example in data loading, data management, planning, and reporting.

SAP HANA is not just a database used to store data. It contains multiple, powerful data processing engines. This means that SAP HANA is able to take on all data processing tasks that would normally be handled by the application running on the database. When data is processed in the database, it means that we do not need to move data from the database to the application layer for processing, and we do not send back the results for storage. No data movement is needed as all processing is done where the data lives.

So what does this mean for SAP BW/4HANA? It means that SAP HANA takes on the data processing tasks related to data loading, activation of data, summarizing data, currency conversion of data, planning function, query processing, and much more. And as SAP HANA is an in-memory database, we can expect very high performance. So not only do we avoid moving the data from the database, we also process data completely in memory.

SAP HANA supports the traditional row table architecture but also supports column store tables. Column store tables are optimal for analytical use cases. Data warehousing, and in particular querying, is the ideal use case for column tables and that is why almost all of the tables in SAP BW/4HANA are column store. The key reason for this is that queries almost never request all of the columns of a table. But when you read a row table, all columns are included in the read, even those that you did not request. When you read column tables, only the requested columns are read. This helps with performance and means that we do not fill memory with data that we did not request.

Hint

If you want to read more about columnar and row-based data storage, you can follow this link: https://help.sap.com/docs/HANA_SERVICE_CF/6a504812672d48ba865f4f4b268a881e/bd2e9b88bb571014b5b7a628fca2a132.html

Another benefit of column tables is that the data in the tables is automatically compressed when it is loaded. This massively reduces the footprint of the table, by as much as 90%. Compression essentially strips out all repetition of column values and a separate store of dictionary and index information is held in order to recode the original relationship between values. The benefit of compression is that the stored values are held in a machine-readable format and the CPU can process them much faster than with variable length alphanumeric strings. Also, a smaller footprint means we can store more data in memory.

In an SAP BW/4HANA landscape, SAP HANA’s main priority is to manage the database tables that belong to SAP BW/4HANA. However, SAP HANA also includes powerful tools to create data models right in the database, called calculation views. SAP BW/4HANA data models can be integrated with SAP HANA calculation views to create hybrid models. This is known as mixed modeling. Mixed models combine the best features of SAP HANA calculation views with SAP BW/4HANA modeling. CompositeProviders in SAP BW/4HANA can be used to combine the models using unions and/or joins.

Note

Mixed modeling scenarios are not in scope for this learning journey. However mixed modeling is covered in the course BW430.

Introducing Data Warehousing, SAP BW/4HANA and the Role of SAP HANA in SAP BW/4HANA

SAP BW/4HANA as a Data Warehouse

Data Warehouse

SAP BW/4HANA

The role of SAP HANA in SAP BW/4HANA