Introduction to Standalone Connector

Objective

After completing this lesson, you will be able to set up the Standalone Connector.

What is the Standalone Connector?

The Standalone Connector handles the communication between a source system and SAP Signavio Process Intelligence. It can be used when the source system, including third-party systems, is not covered by one of the standard connectors in SAP Signavio Process Intelligence. The connector extracts data from the source system, transforms it into the event log format, and uploads it to Process Intelligence for analysis.

The ETL scripts run externally (outside of SAP Signavio Process Intelligence) and use the API to push the data to a process within the system.

The connector consists of multiple components working together to achieve this. These include:

  • A collection of extraction and transformation SQL scripts
  • A configuration file in YAML format
  • An SQLite database that tracks which data has already been loaded, so that recurring loads pick up the correct data each time
  • A Java application for triggering the actual extraction, transformation, and loading
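
Put together, a connector working directory might look like the following. This layout is illustrative only; file and folder names depend on your setup.

  signavio-connector/
  ├── signavio-connector.jar      # the Java application that triggers extraction, transformation, and loading
  ├── config.yaml                 # the YAML configuration file
  ├── connector.db                # SQLite database tracking which data was already loaded
  └── sql/
      ├── extraction/             # extraction SQL scripts
      └── transformation/         # transformation SQL scripts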

The following lesson describes how these components work together and how they can be deployed to provide Process Intelligence with the required data.

Let's continue with an SAP example to learn more about the functionality.

The connector works in three steps:

  • It uses an SAP technical (service) user to pull data from the source system and stores it in an S3 bucket.
  • It uses Athena to generate an event log file from the transformed S3 data and downloads this file.
  • It uploads the event log file to the Process Intelligence API.

ETL Setup Using the Standalone Connector

For an automated ETL to work, we first need to set up an environment for the connector to run in. To do this, we set up a virtual machine.

The following steps describe how to set this up.

Set Up the Virtual Machine

Staging Environment Setup

Depending on whether the data transformation can be performed in the source system, you might have to set up a dedicated staging environment. In most cases, a dedicated staging environment is much faster and better suited for process mining, and it also enables you to combine data from multiple source systems.

In the case of AWS, you need an account with both S3 for data storage and Athena for running the transformation scripts.
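
As a minimal sketch, assuming the AWS CLI is installed and configured with sufficient permissions (the bucket and workgroup names below are illustrative):

  # Create the staging bucket that will hold the raw and transformed data
  aws s3 mb s3://signavio-connector-staging

  # Create an Athena workgroup whose query results are written to the staging bucket
  aws athena create-work-group \
      --name signavio-connector \
      --configuration "ResultConfiguration={OutputLocation=s3://signavio-connector-staging/athena-results/}"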

Configuration - Connection

Once the environment setup is finished, the connector needs to be configured to fit the specific use case. This is done in the config.yaml file provided by SAP. It defines the actions the connector performs, as well as the connection configurations, table extraction configurations, and event collector configurations.

In this lesson, we will go over these parameters and provide a simplified example in which we extract sample data from an SAP system and prepare it for an Order to Cash analysis.

First, let's look at the connection parameters to begin configuring our connector.
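
As an illustrative sketch of the connection part of config.yaml (the authoritative structure comes from the template SAP provides; the key names below, such as connectionConfigurations, host, and user, are assumptions for illustration only):

  connectionConfigurations:             # assumed section name; see the SAP-provided template
    sourceSystem:
      type: sap                         # the SAP source system we extract from
      host: sap.example.com             # hostname of the source system (illustrative)
      user: SVC_SIGNAVIO                # the technical (service) user mentioned above
      password: ${SAP_PASSWORD}         # illustrative: read from the environment, never hard-coded
    stagingArea:
      type: athena                      # staging in AWS Athena, as set up above
      s3Bucket: signavio-connector-staging
      region: eu-central-1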


Extraction Configuration

Now that our connection is established, the next step is to define the extraction and the necessary data. This is done under tableSyncConfigurations, which also holds the parameters needed for delta loads. We start with the general parameters for each table that should be extracted, as sketched below.
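
A sketch of such a table entry under tableSyncConfigurations (the keyColumn and mostRecentRowColumn parameters are explained in the Execution section below; the remaining keys and values are illustrative assumptions):

  tableSyncConfigurations:
    - table: VBAK                                # illustrative Order to Cash example: sales document headers
      keyColumn: VBELN                           # key used to match rows across delta loads
      mostRecentRowColumn: AEDAT                 # change date; the newest row version wins when merging deltas
      extractionScript: sql/extraction/vbak.sql  # illustrative path to the extraction SQL script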


Transformation Configuration 

Now that we have both our source system and extraction information, the next step is to transform our source data into the event log format. For this, the transformation must produce three columns (case ID, event name, timestamp), defined under eventCollectorConfigurations, as sketched below.
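
A sketch of an entry under eventCollectorConfigurations (the section name comes from this lesson; the keys and the embedded SQL are illustrative assumptions):

  eventCollectorConfigurations:
    - name: order-to-cash
      transformationScript: sql/transformation/o2c_eventlog.sql   # illustrative path
      # The referenced script must produce the three required columns, for example:
      #   SELECT vbeln                AS case_id,
      #          'Create Sales Order' AS event_name,
      #          erdat                AS timestamp
      #   FROM vbak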


Execution

The connector is started as a Java application by running the command below in a terminal. First, go to the source directory of the connector, then execute:


java -jar signavio-connector.jar <command>

Commands: 

  • Based on the tableSyncConfiguration:
    • extract => extracts the raw table data from the source system using the defined extraction scripts and uploads it to the staging area, where it is saved as raw tables. Table names in the staging area depend on the names provided in tableSyncConfiguration.
    • createschema => generates the schema for the raw tables.
    • transform => optimizes the raw table schema and merges row updates in case rows overlap between different delta loads. Updates to data already extracted in a previous load are recognized based on the keyColumn and mostRecentRowColumn parameters.
  • Based on the eventCollectorConfiguration:
    • eventlog => creates the event log from the staging data based on the transformation scripts and uploads it to Process Intelligence.
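
Putting the commands together, a typical full run looks like this (the connector directory path is illustrative):

  cd /path/to/connector
  java -jar signavio-connector.jar extract        # pull raw tables into the staging area
  java -jar signavio-connector.jar createschema   # generate the schema for the raw tables
  java -jar signavio-connector.jar transform      # merge delta loads into optimized tables
  java -jar signavio-connector.jar eventlog       # build the event log and upload it to Process Intelligence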
