Influencing Context for Big Data Warehouse Architecture
A modern data warehouse is usually embedded in a complex environment. It must be able to handle large amounts of data from various areas and structures. From a simplified view, there are two main areas of data that a modern data warehouse usually has to deal with:
- Commercially-oriented data, such as Sales and Distribution (SD), Financial Accounting & Controlling (FI / CO), Production Planning (PP), and Materials Management (MM)
- Production-oriented data, such as machine/IoT data, delivered through a manufacturing execution system or big data processing frameworks (such as Hadoop). In some cases this data can also come from customers of the manufacturing company or other third parties.
Modern data warehouses must also be capable to provision the data to an established Business Intelligence with a wide set of different tools, for example, for analysis, planning, reporting, or predictive purposes. In short, a modern data warehouse connects business data, production data, and business users. This can be done excellently by utilizing SAP HANA.
Four Requirements of a Big Data Warehouse Architecture
DWH Architecture: Use Only What You Need
The reference data warehouse architecture introduced in this lesson consists of the following components/areas:
- Acquisition Layer, which consists of the Data-InHub, Replication Area, and Staging Area
- RAW DWH
- Business Integrated DWH
- Virtual Analytical Layer
- Data Mart
- Agile Area
Data is not persisted in the Data-InHub area. The following object types are used in this area:
- Source objects: SAP HANA objects (CDS, CalcViews, ABAP objects, etc.), virtual access to SAP extractors (ERP, BW, etc.), database tables (DB2, Oracle, etc.) , Flat files (XLSX, CSV, etc.), others (for more information, see SAP the EIM guide)
- Target objects: Virtual tables
In the replication area, source tables are replicated 1:1 in real time and persisted permanently. The following object types are used in this area:
- Source objects: SAP HANA tables, database tables (DB2, Oracle, etc.)
- Target objects: SAP HANA tables, replication tasks
In the staging area, data is persisted temporarily and loaded with the delta load technique. For data provisioning, a range of ETL tools can be used, for example: Smart Data Integration (SDI), SAP Data Services (BODS), other ETL tools (such as Informatica Powercenter, IBM InfoSphere Data Stage), and SAP Extractors. The following object types are used in this area:
- Source objects: Any
- Target objects: SAP HANA tables
RAW Data Warehouse
In the RAW Data Warehouse, data is persisted as raw, unprocessed information. The data structures conform 100% with the source data structures. Data can undergo historization processes which represent the history of the source data. Data is enriched with surrogate keys (in connection with the Business Integrated DWH). There is no source system overlapping integration, no quality control, no business transformation, and no end user access.
Business Integrated Data Warehouse
In the Business Integrated Data Warehouse, cross-application integration, harmonization, standardization, and surrogate key creation take place. Business transformation, enrichment, and calculation are performed. Quality management (such as bad data, dummy values, correction, and auto-completion) takes place. Data is also historized, versioned, and modeled utilizing the data vault modeling approach (hubs, satellites, links). There is no end user access to the Business Integrated Data Warehouse.
Virtual Analytical Layer
Master data and measures are transferred from lower architectural areas to analytical base objects. Master data is turned into business entities. Basic measures are enriched inside fact tables. Objects include business-relevant data. There is no attribute selection. Data is projected completely, without connectors. Measures are not aggregated. There is no denormalization in order to create dimensional structures.
The Virtual Analytical Layer follows the virtualization-first approach: Objects are virtual (HANA Calculation View), OLAP functionalities and parameters are utilized, an explicit analysis engine on SAP HANA (Calculation Engine) is used.
It is possible to run multiple instances at the same time and to persist data in the Virtual Analytical Layer through Architected Analytical Objects (for example, to counteract performance issues). Due to all the aspects mentioned above, the Virtual Analytical Layer requires a security concept.
The Data Mart layer provisions virtual analytical models (virtual analytical view) and represents the business perspective on an extraction from the data warehouse. It enables dimension cutting, fact aggregation, and information filtering and it utilizes the star and/or snowflake schema. It is possible to run multiple instances of the Data Mart layer at the same time. It is also possible to persist data through Architected Data Marts (for example to counteract performance issues).
The Agile area enriches data warehouse data with its own information. This area is governed by business users. End users are enabled to create their own models on a self-service basis, without the need of specialized IT knowledge. Furthermore, the Agile area enables prototyping in an agile environment. The target group of the Agile area is data preparation specialist departments/IT.
Corporate Memory 2.0
The Corporate Memory 2.0 area is the single place for structured data of the entire enterprise on SAP HANA SQL Data Warehouse with simple, fast data provisioning. The following object types are used in this area:
- Source objects: Any
- Target objects: SAP HANA tables