Introducing SAP Datasphere Integration Options

Objectives

After completing this lesson, you will be able to:

  • Describe Integration options

Integration of Data into SAP Datasphere

Introduction

SAP Datasphere provides a large set of default connections to access data from a wide range of sources, in the cloud or on-premise, from SAP or non-SAP sources, or from partner tools.

You define connections in a separate view that can be found in the main navigation on the left. Connections are individual objects in SAP Datasphere. You create and maintain them per SAP Datasphere space, so that only members of a specific space can make use of the related connections.

You can use connections created this way in different tools, such as the SQL View Builder, the Graphical View Builder, and Data Flows, to create and fill your data models.

To learn more about integrating SAP applications, refer to the how-to-paper by SAP at: First Guidance: Data Integration for ABAP Source Systems.

Each connection type supports a defined set of features. Depending on the connection type and the connection configuration, you can use a Connection for one or more of the following features:

  • Remote Tables

    The remote tables feature supports building views. After you have created a connection, a modeler can add a source object (usually a database table or view) from the connection to a view in the graphical view editor of the Data Builder. The source object is then deployed as a remote table.

    During import, the tables are deployed as remote tables. Depending on the connection type, you can use remote tables for the following tasks:
    • Directly access data in the source (remote access)

    • Copy the full set of data (snapshot or scheduled replication)

    • Copy data changes in real time (real-time replication)

  • Data Flows, Replication Flows and Transformation Flows

    The flow feature supports building data flows, replication flows, and transformation flows. After you have created a connection, a modeler can add a source object from the connection to a flow in the respective flow editors of the Data Builder to integrate and transform your data.

  • External Tools

    SAP Datasphere is open to SAP and non-SAP tools to integrate data into SAP Datasphere.

By default, when you import a remote table, its data is not replicated and must be accessed via federation from the remote system each time. You can improve performance by replicating the data to SAP Datasphere, and you can schedule regular updates (or, for many connection types, enable real-time replication) to keep the data fresh and up-to-date.

Data Integration Based On Data Flows

SAP Datasphere Data Flow functionality enables the definition of more advanced Extraction-Transformation-Load (ETL) flows that complement existing data federation and replication services.

In data flows, you can use a series of standard transformations without the need for programming knowledge. However, you can also create transformations based on scripts.

So, what is the difference between views and data flows? Views transform data at the moment it is read, without persisting it (although this will change in the future), and produce a single output structure. Data flows transform data and persist the results in one or more structures.

In a data flow, you can use views or tables that already exist in your SAP Datasphere tenant. Alternatively, you can use connections to get data from other systems. In that case, you first create all the necessary connections in your space.

Insert a script operator to transform incoming data with a Python script and output structured data to the next operator. The script operator receives the data from a previous operator. You provide the transformation logic as the body of the transform function in the script property of the operator. Incoming data is fed into the data parameter of the transform function, and the result of this function is returned to the output.

The script operator allows data manipulation and vector operations by providing support for the NumPy and Pandas modules. You can use non-I/O objects and functions of NumPy and Pandas via the aliases np and pd, respectively, without having to import them explicitly.

The incoming data parameter of the transform function is of type Pandas DataFrame. The input table is converted into a DataFrame and passed to the transform function as the data parameter. You are expected to provide a script that performs the intended transformation of the incoming DataFrame and returns a valid DataFrame from the transform function. It is important that the DataFrame returned by the transform function has the same column names, types, and order as specified for the output table. Otherwise, execution of the data flow fails.
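
For illustration, the following is a minimal sketch of such a transform function. The column names SalesOrder, NetAmount, and TaxAmount are hypothetical and would have to match your own source and output tables; depending on the editor, the def transform(data): line may already be provided, so that only the body needs to be written.

    def transform(data):
        # 'data' is a Pandas DataFrame built from the incoming rows.
        # Keep only rows with a positive net amount (hypothetical column).
        data = data[data["NetAmount"] > 0]
        # Derive a new column; pd and np are available without explicit imports.
        data["GrossAmount"] = data["NetAmount"] + data["TaxAmount"]
        # The returned DataFrame must match the output table's column names,
        # types, and order.
        return data[["SalesOrder", "NetAmount", "TaxAmount", "GrossAmount"]]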

In a data flow, the script operator can receive the incoming table in multiple batches of rows, depending on the size of the table. The transform function is then called multiple times, once for each batch of rows, and its data parameter contains only the rows of the given batch. As a result, the transform function cannot contain operations that require the complete table in the data parameter, such as removing duplicates.
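
To illustrate this batching behavior, the following sketch (again with hypothetical column names) contrasts a row-wise operation, which is safe per batch, with a table-wide deduplication, which is not:

    def transform(data):
        # Safe per batch: a row-wise calculation gives the same result whether
        # it sees the whole table or only one batch of rows.
        data["NetAmount"] = data["NetAmount"].round(2)
        # Not reliable: duplicates split across different batches never meet,
        # so table-wide deduplication cannot be done inside the script operator.
        # data = data.drop_duplicates(subset=["SalesOrder"])
        return data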

SAP Datasphere Replication Flow functionality complements the existing data federation and replication services (including snapshot and real-time replication via remote tables).

The intention of Replication Flow is to simplify the realization of data replication use cases in SAP Datasphere. Replication Flow functionality enables you to copy multiple data sets from the same source to the same target in a fast and easy way with simple projections and filters.

If more complex transformations are needed, the recently introduced Transformation Flow functionality can be used to establish a more advanced ELT (Extract, Load, Transform) flow.
