Outlining Integration Options

Objective

After completing this lesson, you will be able to describe integration options in SAP Datasphere.

Data Integration

Let's explore the integration options in SAP Datasphere related to data acquisition.

Acquiring Data Using CSV Files

The figure shows a screenshot of importing a CSV file in the Data Builder.

In SAP Datasphere, the Data Builder provides a CSV file upload function by default.

You can use the Import CSV File option for manual uploads of files smaller than 25 MB. No connection has to be defined for this feature.

It automatically creates a table with derived columns based on the file structure. You can apply various data wrangling and transformation rules, such as concatenate, split, extract, replace, change, or filter.

To load data into an existing table (without transformations), use Upload Data From CSV File in the table editor.
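The wrangling rules named above correspond to familiar string and filter operations. As an illustration only (in the Data Builder you apply these rules interactively, not via code), here is a Python sketch of split, concatenate, replace, and filter applied to CSV data; the sample rows and column names are invented for the example:

```python
import csv
import io

# Sample data standing in for an uploaded CSV file (illustrative only).
raw = """id,full_name,country
1,Lovelace; Ada,UK
2,Hopper; Grace,US
3,Turing; Alan,UK
"""

rows = list(csv.DictReader(io.StringIO(raw)))

wrangled = []
for row in rows:
    if row["country"] != "UK":                  # filter: keep only UK rows
        continue
    last, first = row["full_name"].split("; ")  # split: one column into two
    wrangled.append({
        "id": row["id"],
        "name": f"{first} {last}",              # concatenate: rebuild a display name
        "country": row["country"].replace("UK", "United Kingdom"),  # replace
    })

print(wrangled)
```

Each step mirrors one of the wrangling rules the import wizard offers; the point is the kind of transformation, not the exact mechanics.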

Note

For larger or automated loads, a generic SFTP connection has to be set up to connect to and access files on a Secure File Transfer Protocol (SFTP) server.

Data Integration Using Connections

Overview sketch showing the options to replicate or federate data. Remote tables are used for federation, snapshot replication, or real-time replication. A table in SAP Datasphere can be filled using a data flow, or by using a replication flow for 1:1 updates followed by a transformation flow.

Using connections to sources, SAP Datasphere offers several approaches to data integration:

  • Remote tables

    You can use remote tables to:

    • directly access data in the source (remote access).

    • copy the full set of data (snapshot or scheduled replication).

    • copy data changes in real-time (real-time replication).

  • Flows

    Three types of flows are offered:

    • Data flow: For ETL scenarios where data is extracted and transformed before it is loaded into the target.

    • Replication flow: For ELT scenarios where data is extracted and loaded first, then transformed.

    • Transformation flow: For post-load transformations on already loaded data.
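The difference between the three flow types is essentially the order of operations. The following is a conceptual illustration only; the function names are hypothetical and not SAP Datasphere APIs:

```python
# Conceptual sketch: the order of work that distinguishes a data flow (ETL)
# from a replication flow plus transformation flow (ELT).

source = [{"amount": "10"}, {"amount": "25"}]

def transform(rows):
    # Example transformation: cast string values to integers.
    return [{"amount": int(r["amount"])} for r in rows]

# Data flow (ETL): transform in transit, then persist the result.
etl_target = transform(source)

# Replication flow (ELT, step 1): land the data 1:1, untransformed.
staging = list(source)

# Transformation flow (ELT, step 2): transform the already-loaded data.
elt_target = transform(staging)

assert etl_target == elt_target  # same result, different order of work
print(etl_target)
```

Both paths end with the same data; the choice between them is about where the transformation effort happens, not about the outcome.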

Data Integration Architecture

Overview of data connectivity using OData, dpAgent, Cloud Connector, Amazon Athena, or Google BigQuery.

SAP Datasphere leverages different technologies to set up connections to sources. As a result, each connection provides different functionality, prerequisites, and user experience.

Remote table

Remote tables are available mainly for SAP sources. For this data integration approach, the SAP HANA Federation Framework currently relies primarily on SAP HANA Smart Data Integration (SDI) and its data provisioning framework (dpServer + dpAgent).

The dpAgent is a lightweight component running outside the SAP Datasphere environment. It hosts data provisioning adapters for connectivity to remote sources, enabling data federation and replication scenarios. The dpAgent acts as a gateway to SAP Datasphere providing secure connectivity between the database of your SAP Datasphere tenant and the adapter-based remote sources.

The dpAgent is managed by the Data Provisioning Server and is required for all SDI connections. Through the Data Provisioning Agent, the preinstalled data provisioning adapters communicate with the Data Provisioning Server for connectivity, metadata browsing, and data access. The Data Provisioning Agent connects to SAP Datasphere using JDBC. It must be installed on a host in your local network and configured for use with SAP Datasphere.

Some sources can establish a direct connection (for example, SAP SuccessFactors and SAP HANA Cloud). For a direct connection to SAP HANA on-premise, SAP HANA Smart Data Access (SDA) with the Cloud Connector is used.

Replication flow and data flow

The embedded SAP Data Intelligence (DI) runtime provides the data flow and replication flow functionality. In this scenario, DI connectors are used to reach remote sources. To connect to on-premise sources, the Cloud Connector is required as the link between SAP Datasphere and the source, and it must be set up appropriately before the connection is created.

The Cloud Connector serves as a link between SAP Datasphere and your on-premise sources and is required for connections that you want to use for the following use cases:

  • Data flows.

  • Replication flows.

  • Model import from SAP BW/4HANA model transfer connections (Cloud Connector is required for the live data connection of type tunnel that you need to create the model import connection).

  • In rare cases also for remote tables: only for SAP HANA on-premise via SDA.

Metadata Integration

Let's explore the integration options in SAP Datasphere related to metadata.

Import Metadata

The figure shows screenshots of the Import Entities wizard for metadata from SAP S/4HANA, SAP BW/4HANA, and SAP BW bridge.

You can use the Import Entities wizard to load metadata from your SAP S/4HANA Cloud and SAP S/4HANA on-premise connections via semantically-rich objects. The wizard creates Business Builder and Data Builder entities (along with all the objects on which they depend) in SAP Datasphere.

We recommend that, where possible, you use the Import Entities wizard for importing CDS views from these connection types, as it is able to leverage their rich semantics to import higher-level objects and to follow associations to dimensions, hierarchies, and text entities and include them in the import.

Note

Only SAP S/4HANA Cloud connections can include associated entities in the import. SAP S/4HANA on-premise connections cannot follow associations, but the information is included in each imported object's CSN definition, and associations will be automatically recreated in SAP Datasphere if their target entities are already present or are subsequently imported.

You can also use the Import Entities wizard to load metadata from your SAP BW/4HANA and SAP BW bridge connections via semantically-rich objects. The wizard creates Business Builder and Data Builder entities (along with all the objects on which they depend) in SAP Datasphere.

Modeling Integration

Let's explore the integration options in SAP Datasphere related to modeling functionality.

Integrating SAP HANA Calculation Views

The figure shows how data can be exchanged between HDI containers and SAP Datasphere spaces in the modeling scenarios described below.

It is possible to enable SAP SQL data warehousing on your SAP Datasphere tenant to exchange data between your HDI containers and your SAP Datasphere spaces without the need for data movement.

This integrated SAP SQL data warehousing approach allows you to add HDI containers to your space and exchange data between them:

  1. Use calculation views and other SAP Business Application Studio objects as sources for your SAP Datasphere local tables and data flows.

  2. Use SAP Datasphere views that are exposed for consumption as sources for your SAP Business Application Studio calculation views and flowgraphs.

  3. Use SAP Business Application Studio tables as targets for your SAP Datasphere data flows.

Note

For detailed configuration of these three scenarios, you can find more information here: Using HDI Containers with SAP Datasphere

In SAP Datasphere, you can also use SAP HANA calculation views, generated from SAP BW/4HANA objects, which are located in schema _SYS_BIC.

Open SQL Schema

The figure shows how external tools can be used with SAP Datasphere using Open SQL Schema.

An Open SQL Schema in SAP Datasphere provides a flexible SQL endpoint that enables secure data integration and modeling capabilities through database users. It creates a dedicated schema in the underlying SAP HANA database that allows both read and write operations, facilitating seamless data exchange between SAP Datasphere and external systems.

Here are some examples of how the Open SQL Schema can be used:

  • Create tables, views, and stored procedures using DDL and DML statements.
  • Write data to tables.
  • Create a table to act as a target to receive data written from a data flow.
  • Serve as a bridge for data extracted from third-party systems via middleware.
  • Leverage SAP HANA's embedded machine learning capabilities including APL, PAL, spatial engine, text processing, and graph algorithms for advanced analytics and predictive modeling.
  • Leverage data anonymization and data masking.
  • Data science: use tooling such as Jupyter notebooks.
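To make the first three use cases concrete, the sketch below shows the kind of DDL and DML you might run against an Open SQL Schema. SQLite stands in for the SAP HANA database so the example is self-contained and runnable anywhere; against a real tenant you would instead connect with the schema's database user (for example via the hdbcli Python client or a JDBC/ODBC tool), and the table and column names here are invented for illustration:

```python
import sqlite3

# SQLite stands in for the Open SQL Schema here (illustrative only).
conn = sqlite3.connect(":memory:")

# DDL: create a table that could later serve as a target for a data flow.
conn.execute("CREATE TABLE sales_inbound (order_id INTEGER, revenue REAL)")

# DML: write data, as a middleware tool extracting from a
# third-party system might do.
conn.executemany(
    "INSERT INTO sales_inbound VALUES (?, ?)",
    [(1, 99.5), (2, 150.0)],
)

# Read the data back, e.g. to verify the load.
total = conn.execute("SELECT SUM(revenue) FROM sales_inbound").fetchone()[0]
print(total)  # 249.5
conn.close()
```

In SAP Datasphere, a table created this way in the Open SQL Schema becomes visible to the space it is associated with and can then be used as a source or a data flow target.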