SAP Datasphere provides a large set of default connections for accessing data from a wide range of sources: in the cloud or on premises, and from SAP sources, non-SAP sources, or partner tools.
Connections are defined in a separate view that can be found in the main navigation on the left. Connections are individual objects in SAP Datasphere. They are created and maintained per SAP Datasphere space, which means that only members of a specific space can use the related connections.
Connections created in this way can be used in tools such as the SQL View Builder, the Graphical View Builder, and Data Flow to create and fill your data models.
To learn more about integrating SAP applications, we recommend the following how-to paper by SAP: "First Guidance: Data Integration for ABAP Source Systems" https://www.sap.com/documents/2021/06/e8238e12-e47d-0010-bca6-c68f7e60039b.html
Each connection type supports a defined set of features. Depending on the connection type and the connection configuration, a connection can be used for one or more of the following features:
- Remote Tables
The remote tables feature supports building views. After you have created a connection, a modeler can add a source object (usually a database table or view) from the connection to a view in the graphical view editor of the Data Builder. On import, the source object is deployed as a remote table. Depending on the connection type, you can use remote tables to:
directly access data in the source (remote access)
copy the full set of data (snapshot or scheduled replication)
copy data changes in real time (real-time replication)
- Data Flows
The data flow feature supports building data flows. After you have created a connection, a modeler can add a source object from the connection to a data flow in the data flow editor of the Data Builder to integrate and transform data.
- External Tools
SAP Datasphere is open to SAP and non-SAP tools that integrate data into SAP Datasphere.
- Model Import
Model import is a special feature for SAP BW/4HANA or SAP S/4HANA Cloud as a source. It supports importing metadata instead of having to rebuild it manually. After you have created a connection, a modeler can import source metadata from the connection on the entry page of the Business Builder. See details in the lesson "Transferring SAP BW/4HANA Models".
By default, when you import a remote table, its data is not replicated and must be accessed using federation each time from the remote system. You can improve performance by replicating the data to SAP Datasphere and you can schedule regular updates (or, for many connection types, enable real-time replication) to keep the data fresh and up-to-date.
Data Integration based on Data Flows
SAP Datasphere Data Flow functionality enables the definition of more advanced Extraction-Transformation-Load (ETL) flows that complement existing data federation and replication services.
In data flows, we can use a series of standard transformations without any programming knowledge, and we can also create transformations based on scripts.
So what is the difference between data views and data flows? Data views transform the data at the moment it is read, without persistence (although this will change in the future), and produce a single output structure. Data flows transform the data and persist the result in one or multiple structures.
In a data flow, we can use views or tables that already exist in our SAP Datasphere, or use connections to get data from other systems. In the latter case, we should first create all the necessary connections in our space.
Insert a script operator to transform incoming data with a Python script and output structured data to the next operator. The script operator receives the data from the previous operator. You provide the transformation logic as the body of the transform function in the script property of the operator. Incoming data is fed into the data parameter of the transform function, and the result of this function is returned to the output.
The script operator allows data manipulation and vector operations by providing support for NumPy and Pandas modules. Non-I/O objects and functions of NumPy and Pandas can be used with aliases np and pd, respectively, without any requirements to explicitly import them.
The incoming data parameter of the transform function is of type Pandas DataFrame. The input table is converted into a DataFrame and fed into the transform function as the data parameter. You are expected to provide a script for the intended transformation of the incoming DataFrame and to return a valid DataFrame from the transform function. The DataFrame returned from the transform function must have the same column names, types, and order as the table specified for the output. Otherwise, execution of the data flow fails.
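The contract described above can be tried out locally with a minimal sketch. The column names (ORDER_ID, CURRENCY, GROSS_AMOUNT, TAX_RATE, NET_AMOUNT) and the sample rows are hypothetical and only serve as illustration. The explicit imports are needed only to run the sketch outside SAP Datasphere; inside the script operator, np and pd are already available and only the body of transform is entered.

```python
# Local sketch only: inside the script operator, np and pd are provided
# without explicit imports, and only the function body is entered.
import numpy as np
import pandas as pd

# Hypothetical output table definition: names and order must match exactly.
OUTPUT_COLUMNS = ["ORDER_ID", "CURRENCY", "GROSS_AMOUNT", "TAX_RATE", "NET_AMOUNT"]


def transform(data):
    # Work on a copy so the incoming DataFrame is left untouched.
    out = data.copy()
    # Vectorized operations: no explicit loop over rows.
    out["CURRENCY"] = out["CURRENCY"].str.upper()
    out["NET_AMOUNT"] = np.round(out["GROSS_AMOUNT"] / (1.0 + out["TAX_RATE"]), 2)
    # Return the columns in the order expected by the output table.
    return out[OUTPUT_COLUMNS]


# Hypothetical sample input, standing in for the table of the previous operator.
sample = pd.DataFrame(
    {
        "ORDER_ID": [1, 2],
        "CURRENCY": ["eur", "usd"],
        "GROSS_AMOUNT": [119.0, 110.0],
        "TAX_RATE": [0.19, 0.10],
    }
)
result = transform(sample)
print(result)
```

Selecting the columns explicitly in the return statement is a simple way to keep the name and order of the output columns aligned with the output table, even if the input gains additional columns later.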
In a data flow, the script operator may receive the incoming table in multiple batches of rows, depending on the size of the table. This means that the transform function is called multiple times, once for each batch of rows, and that its data parameter contains only the rows of the given batch. Hence, operations that require the complete table within the data parameter, such as removing duplicates, are not possible.
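The effect of batching can be simulated locally. The following sketch assumes a hypothetical table with a single ID column that arrives in two batches; the actual batch boundaries are decided by the system and cannot be set in the script. It shows that removing duplicates per batch does not give the same result as removing duplicates on the complete table.

```python
# Local simulation only: the split into batches below is hypothetical.
import pandas as pd


def transform(data):
    # Removes duplicates, but only within the rows of the current batch.
    return data.drop_duplicates()


# Hypothetical complete table with duplicate IDs 1 and 2.
full_table = pd.DataFrame({"ID": [1, 2, 2, 3, 1]})

# Assume the table arrives in two batches: rows 0-2 and rows 3-4.
batches = [full_table.iloc[0:3], full_table.iloc[3:5]]

# transform is called once per batch; the outputs are appended one after another.
per_batch = pd.concat([transform(batch) for batch in batches], ignore_index=True)

# For comparison: the result if the complete table were available in one call.
whole_table = transform(full_table).reset_index(drop=True)

print(per_batch["ID"].tolist())    # ID 1 appears twice, once per batch
print(whole_table["ID"].tolist())  # each ID appears once
```

The duplicate inside the first batch is removed, but the ID that occurs in both batches survives, because each call of transform sees only its own rows.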
For more details, refer to the following sources:
- First Guidance: Data Integration for ABAP Source Systems https://www.sap.com/documents/2021/06/e8238e12-e47d-0010-bca6-c68f7e60039b.html
- Blog: Why SDI ABAP for virtual access in SAP Datasphere should be avoided https://blogs.sap.com/2022/05/11/why-sdi-abap-for-virtual-access-in-sap-data-warehouse-cloud-should-be-avoided/