The Virtual Data Paradigm
To meet changing business requirements and respond to changes in technology, data warehouses have evolved considerably. Traditionally, organizations put all of their data into a single data warehouse environment and analyzed it there. Besides strategic reporting, operational data analysis scenarios were often implemented in SAP Business Warehouse (SAP BW) because ERP (Enterprise Resource Planning) systems lacked reporting capabilities and had performance limitations. The idea was to optimize the data warehouse for analytical processes on large data sets and to integrate data from different sources through batch load jobs. This approach, the physical data warehouse, has the following disadvantages:
- You must duplicate all your data, so you need more storage space.
- You must copy all that data through the network, which congests the network and slows it down.
- You double your cost and effort.
- The data in the new data warehouse is never up to date.
- The single data warehouse system itself cannot handle all the different types of data requirements, such as:
  - Unstructured data.
  - Graph-engine data.
  - Key-value pairs.
  - Spatial data.
  - A huge variety of data.
Data quality enhancement steps are resource-consuming. With new technology on the market, the decision about where to perform these steps and the role of SAP BW/4HANA in general must be reevaluated.
A data warehouse is still needed, with a focus on enterprise data warehousing, for example, to harmonize and consolidate data. But harmonization does not necessarily mean that you must store the data persistently. With SAP solutions for enterprise information management (EIM), you can also design a logical data warehouse directly on SAP HANA. A logical data warehouse appears to contain all the data, but the data largely stays in your source systems: you do not copy it through the network, and you do not store copies of it. The new strategy is to load the data from the various source databases, or ingest source files, only when required. You then combine the loaded data with data that is stored elsewhere, and report on the result.
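The difference between a persisted copy and virtual access can be shown with a minimal sketch. It uses Python's built-in `sqlite3` module purely as a stand-in for a source system and a warehouse. The table and view names (`source_orders`, `dw_orders`, `v_orders`) are hypothetical, and the code does not use any SAP HANA or SAP BW/4HANA API.

```python
import sqlite3

# Stand-in for a source system; SQLite is used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source_orders (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany("INSERT INTO source_orders VALUES (?, ?)", [(1, 100.0), (2, 250.0)])

# Persistent option: a batch load copies the rows into a second table.
conn.execute("CREATE TABLE dw_orders AS SELECT * FROM source_orders")

# Virtual option: a view stores no rows and reads the source at query time.
conn.execute("CREATE VIEW v_orders AS SELECT * FROM source_orders")

# A new order arrives in the source after the batch load has finished.
conn.execute("INSERT INTO source_orders VALUES (3, 75.0)")


def total(relation: str) -> float:
    """Sum the order amounts of a table or view (the name is trusted here)."""
    return conn.execute(f"SELECT SUM(amount) FROM {relation}").fetchone()[0]


persisted_total = total("dw_orders")  # stale until the next load runs
virtual_total = total("v_orders")     # includes the new order immediately
print(persisted_total, virtual_total)
```

The copied table needs extra storage and a reload to catch up, whereas the view always reflects the source but pays the read cost on every query.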
Another use case is when you plan to retire legacy databases or consolidate various databases. This should not affect your users and their reporting. In this case, connect to the various back-end databases through SAP Vora, and transform data through SAP solutions for EIM, as a logical data warehouse.
To integrate external data structures, typically from non-SAP sources, with SAP HANA solutions for EIM, use:
- SAP HANA smart data access (SDA) with remote access to other databases.
- SAP HANA smart data integration (SDI) with adapters to integrate various remote sources.
- SAP HANA smart data quality (SDQ) with library functions for data cleansing and enhancement.
Within this technology, SAP provides a way to access data in real time or to persist the data coming from various sources into SAP BW/4HANA. Data that is already stored in SAP HANA tables does not need to be duplicated. You can provide virtual data access using Open ODS Views and CompositeProviders. Implement batch load or real-time replication to physical DataStore Objects (advanced) only if complex transformations in SAP BW/4HANA are required.
Remember that you can use SAP HANA calculation views and native Structured Query Language (SQL) features to combine data. However, you can also use SAP BW/4HANA objects for federation.
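As a rough illustration of combining data through plain SQL at query time, the following sketch joins tables from two separate databases without copying rows between them. It again uses `sqlite3` as a stand-in. The two attached in-memory databases and the names `orders`, `customers`, and `crm` are assumptions made for the example, not SAP objects.

```python
import sqlite3

# Two separate in-memory databases stand in for two source systems.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS crm")

conn.execute(
    "CREATE TABLE main.orders (order_id INTEGER, customer_id INTEGER, amount REAL)"
)
conn.execute("CREATE TABLE crm.customers (customer_id INTEGER, region TEXT)")
conn.executemany(
    "INSERT INTO main.orders VALUES (?, ?, ?)",
    [(1, 10, 100.0), (2, 11, 250.0), (3, 10, 50.0)],
)
conn.executemany(
    "INSERT INTO crm.customers VALUES (?, ?)",
    [(10, "EMEA"), (11, "APJ")],
)

# Join on demand: the data is combined at query time, nothing is copied.
rows = conn.execute(
    """
    SELECT c.region, SUM(o.amount)
    FROM main.orders AS o
    JOIN crm.customers AS c ON c.customer_id = o.customer_id
    GROUP BY c.region
    ORDER BY c.region
    """
).fetchall()
revenue_by_region = dict(rows)
print(revenue_by_region)
```

In an SAP HANA landscape, the same idea would be expressed through the objects named in this lesson, such as virtual tables and calculation views, rather than through attached SQLite databases.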
Depending on the circumstances, you can harmonize the data and create one version of truth with:
- SAP HANA procedures.
- SAP HANA calculation views.
- SAP BW/4HANA data transfer processes.
- SAP BW/4HANA CompositeProviders.
We recommend that you consider virtual options first. The Layered Scalable Architecture for SAP BW/4HANA (LSA++) reflects this philosophy.
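The "virtual first" guideline can be written down as a small decision helper. This is a simplified sketch, not an official SAP rule set. The `Scenario` fields and the `recommend` function are hypothetical names, and the criteria are reduced to three points from this lesson: complex transformations, the need for history, and read performance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """Hypothetical description of a reporting requirement."""

    needs_history: bool            # strategic reporting, trends, history
    complex_transformations: bool  # transformations that require staged data
    read_performance_critical: bool = False


def recommend(scenario: Scenario) -> str:
    """Return 'persistent' or 'virtual', preferring virtual by default."""
    if scenario.complex_transformations or scenario.needs_history:
        return "persistent"
    if scenario.read_performance_critical:
        # Virtual access might be slow, so persist when reads must be fast.
        return "persistent"
    return "virtual"


operational = Scenario(needs_history=False, complex_transformations=False)
strategic = Scenario(needs_history=True, complex_transformations=False)
print(recommend(operational), recommend(strategic))
```

A real design decision would weigh more factors, such as data volume, source system load, and administrative effort.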
To optimize SAP BW/4HANA, you must combine nonpersistent options (virtual access, federation) and persistent options (replication, load, and integration). The following table compares options for virtual access and data persistence on several levels (not only for SAP BW/4HANA, but also for the database and the source system).
Virtual Options and Persistent Options
| | Virtual Options (View, Remote Access) | Persistent Options (Load, Realtime Replication) |
|---|---|---|
| Data Access | When needed, could be often | Once, in any case |
| Use Case | Operative reporting, rare access | Strategic reporting, trends, history |
| Objects in SAP HANA | Remote Source (SDA or SDI) + Virtual Table; SAP HANA calculation view; Database (DB) Procedure; SAP HANA Core Data Services View (CDS View) | Replication Task (SDI); Flowgraph (SDI + SDQ = EIM), generating a Batch task (to be scheduled), or Realtime task (to be executed), or DB-Procedure (to be scheduled) |
| Objects in SAP S/4HANA | DB-View, ABAP CDS View | Sum table, Setup table |
| Objects in SAP BW/4HANA | Open ODS View for table; CompositeProvider for SAP HANA calculation view; Virtual InfoObject (on SAP HANA calculation view) | Streaming process chain + DataStore Object (advanced); Regularly scheduled process chain |
| Data Combination | Join, association (join on demand) | Read values during load |
| Storage Space | No additional storage | Tables filled redundantly in SAP HANA |
| Administrative Effort | No administration of data load processes | Requires administration / monitoring of data load process |
| Data Read Performance | Might be slow | Speedy |