Design Business Processes with SAP Signavio Solutions

Introduction to Standalone Connector for Consultants

Objectives
After completing this lesson, you will be able to:

  • Set up the Standalone Connector

Standalone Connector

What is the Standalone Connector?

The standalone connector handles the communication between the source system and SAP Signavio Process Intelligence. It can be used when the source system (or any other third-party system) is not covered by one of the standard connectors in SAP Signavio Process Intelligence. The connector extracts data from the source system, transforms it into event log format, and uploads it to Process Intelligence for analysis.

However, the ETL scripts run externally (outside of SAP Signavio Process Intelligence), and the connector uses the API to push the data to a process within the system.

The connector consists of multiple components working together to achieve this:

  • a collection of extraction and transformation SQL scripts
  • a configuration file in YAML format
  • an SQLite database to ensure the correct data is loaded each time in case of regular loads
  • a Java application for triggering the actual extraction, transformation, and loading

The following lesson will describe how these components work together and can be deployed to serve Process Intelligence with the required data.

Let's continue by learning more about the functionality using an SAP example.

  1. The connector uses an SAP technical (service) user to pull data from the source system and stores it in an S3 bucket.
  2. The connector uses Athena to generate an event log file from the transformed S3 data and downloads this file.
  3. The connector uploads the event log file to the Process Intelligence API.

ETL Setup Using the Standalone Connector

In order for an automated ETL to work, we first need to set up an environment for the connector to run in. To do this, we need to set up a virtual machine.

  • Specifications
    • Set up a host with the following specification (sizing may vary depending on use case):
      • 10 GB RAM
      • single-core CPU
      • 10 GB free disk space
      • network connectivity to both the source system and Signavio
  • Installation
    • Install JDK 8 or higher (Oracle JDK or OpenJDK)
  • Microsoft Windows Environment
    • If the connector is running in a Microsoft Windows environment, you must also install the Microsoft Visual C++ 2010 Redistributable Package.
  • Download SAP Signavio Connector
    • Download and unpack the Signavio Connector, provided by SAP Signavio. Depending on your source system, additional steps might be required.

Staging Environment Setup

Depending on whether the data transformation can be performed in the source system, you might have to set up a dedicated staging environment. In most cases, a dedicated staging environment is much faster and better suited for process mining, and it also enables you to use multiple source systems.

In the case of AWS, an account is required with both S3 for data storage and Athena for running the transformation scripts.

Configuration - Connection

Once the environment setup is finished, the connector needs to be configured to fit the specific use case. This is done in the config.yaml file provided by SAP. It defines the actions required by the connector as well as the connection configurations, table extractions, and event collector configurations.

In this lesson, we will go over these parameters and provide a simplified example in which we extract example data from an SAP system and prepare it for an Order to Cash analysis.

First, let's look at each step to begin configuring our connector. A simplified sketch of these connection settings follows the list below.

  • SAP Signavio
    1. apiToken: can be found in PI under Process Settings →  API → New Token
    2. url: in the same settings as the API token under Upload Endpoint
  • Staging Area - these inputs can be provided by the administrator of your AWS account. You will need the following information:
    1. accessKey: access information provided by the account creator
    2. secretKey: access information provided by the account creator
    3. athenaWorkgroup: name of the Athena workgroup if multiple exist
    4. region: region your AWS instance runs in
    5. bucket: name of the S3 bucket the data will be stored in
    6. athenaDatabase: name of your database in Athena
  • Source System - the information to establish a connection will vary based on each system but some typical inputs are:
    • source: name of your source system
    • user: user in the system with necessary access rights
    • password: password of that user
    • logfile: name of the extraction log file
    • verbosity: amount of detail that will be stored in the log file
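To make these settings more tangible, below is a minimal sketch of how the connection section of config.yaml could look. The section names, key names, and nesting are illustrative assumptions based on the parameters above and may differ in your connector version; all values are placeholders.

# Illustrative sketch only - section names and nesting are assumptions, values are placeholders
signavio:
  apiToken: "<API token from Process Settings > API > New Token>"
  url: "<upload endpoint from the same settings page>"
staging:
  accessKey: "<provided by the account creator>"
  secretKey: "<provided by the account creator>"
  athenaWorkgroup: "primary"
  region: "eu-central-1"
  bucket: "<name of the S3 bucket>"
  athenaDatabase: "<name of the Athena database>"
source:
  source: "SAP_ERP"
  user: "<technical service user>"
  password: "<password of that user>"
  logfile: "extraction.log"
  verbosity: "INFO"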

Extraction Configuration 

Now that our connection is established, the next step is to define the extraction and the data that is required. This is done under tableSyncConfigurations, which also holds the parameters for delta loads. We start with the general parameters for each table that should be extracted; a simplified configuration sketch follows the list below.

  • General Parameters - These are set for each table that should be extracted. This includes:
    • name: name for the table in staging area
    • datasource: specify data source in case of multiple source systems
    • sqlFilePath: name of the SQL file to be used for extraction
    • keyColumn: key column of the table used for merging multiple rows in staging
    • mostRecentRowColumn: used to identify the newest row with the most recent information
  • Additional Parameters - The next parameters are for scheduled data loads, in case new data needs to be extracted on a regular basis to keep the investigation up to date. The delta column can be in either date or ID format, where an ID needs to be a continuously growing number. This includes:
    • name: name of the table column that distinguishes the different delta loads, mostly either a creation and/or change date
    • initial: the initial date or ID from where to start extraction on the first load
    • date case
      • format: format of the column (e.g. YYYY-MM-dd)
      • type: date in this case
    • ID case
      • idformat: format of the column (e.g. '%010d')
      • type: id in this case
  • Left & Right Boundaries - The extraction scripts will have a condition to make sure no data is missed or loaded twice during multiple delta loads. These conditions usually make use of a left and right boundary (normally dates) and extract all data between these values. :
    • Left boundary: the initial one for the first load and the value from the previous load on all future loads. These are saved in the previously mentioned log files on each load. 
    • Right boundary: is usually the current date (in date-based loads) or the newest ID (in ID-based loads).
    • When there are large source tables, partitions of the data are possible to reduce the load on the source system of the initial load. This requires additional parameters. Partitions:
      • name: name of the column to partition by
      • query: query to retrieve the specified column (e.g. SELECT column FROM table)
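Putting these parameters together, a simplified tableSyncConfigurations entry for an SAP sales order header table (VBAK) in an Order to Cash scenario might look like the sketch below. The grouping of the delta load parameters under an incremental block, as well as the file and column names, are assumptions for illustration; only the parameter names listed above are taken from the configuration description.

# Illustrative sketch only - structure, file names, and column names are assumptions
tableSyncConfigurations:
  - name: "VBAK"                          # table name in the staging area
    datasource: "SAP_ERP"                 # relevant when multiple source systems exist
    sqlFilePath: "extraction/vbak.sql"    # extraction script for this table
    keyColumn: "VBELN"                    # key column used for merging rows in staging
    mostRecentRowColumn: "AEDAT"          # identifies the newest row per key
    # delta load parameters (date case)
    incremental:
      name: "ERDAT"                       # column that distinguishes the delta loads
      initial: "2020-01-01"               # starting point for the first load
      type: "date"
      format: "YYYY-MM-dd"
    # optional partitioning for large source tables
    partitions:
      name: "VKORG"
      query: "SELECT VKORG FROM VBAK"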

Transformation Configuration 

Now that we have both our source system and extraction information, the next step is to transform our source data into the event log format. This requires three columns (case ID, event name, timestamp) and is configured under eventCollectorConfigurations. A simplified configuration sketch follows the list below.

  • Event Log Format - Under eventCollectorConfigurations, each event is defined with the following parameters:
    • name: name to distinguish different events during transformation, not the actual event name
    • collectFrom: where the source tables should be queried, either staging environment or the source system itself
    • sqlFilePath: name of the transformation script for this event
    • mapping: the following inputs will map the query results to the necessary event log columns. Usually they refer to results of the specified transformation script:
      • caseId: either reference to transformation script or column of source system where applicable
      • timestamp: fieldName: name of the timestamp column in the finished eventlog
      • timestamp: pattern: format of the timestamp e.g. (yyyy-M-d HH:mm:ss.SSS)
      • activityName: name of the event in finished eventlog
    • attributes: for date attributes (e.g. Invoice clearing date for the Invoice cleared event) the format needs to be specified here:
      • name: name of the event attribute
      • type: date
      • dateFormat: format of the timestamp (e.g. yyyy-M-d HH:mm:ss.SSS)
  • Use of Additional Information - Lastly, additional case attributes can be specified to enrich the analysis with more useful information. As with events, the actual creation happens in the transformation script, but for date attributes the format needs to be specified here as well. This is all done under caseAttributesConfigurations. This includes:
    • name: Case Attributes
    • collectFrom: where the source tables should be queried, either staging environment or the source system itself
    • sqlFilePath: name of the transformation script for the case attributes
    • mapping: used to allocate the events to the right case
      • caseId: reference to transformation script
    • attributes
      • name: name of the case attribute
      • type: date
      • dateFormat: format of the timestamp e.g. (yyyy-M-d HH:mm:ss.SSS)
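As an illustration, a simplified transformation configuration for an Invoice cleared event and for case attributes could look like the sketch below. The nesting, SQL file names, and attribute names are assumptions used only to show how the parameters above fit together.

# Illustrative sketch only - nesting, file names, and attribute names are assumptions
eventCollectorConfigurations:
  - name: "Invoice cleared"                          # internal name used during transformation
    collectFrom: "staging"                           # query the staging environment
    sqlFilePath: "transformation/invoice_cleared.sql"
    mapping:
      caseId: "case_id"                              # column returned by the transformation script
      timestamp:
        fieldName: "event_time"
        pattern: "yyyy-M-d HH:mm:ss.SSS"
      activityName: "Invoice cleared"                # event name in the finished event log
    attributes:
      - name: "Invoice clearing date"
        type: "date"
        dateFormat: "yyyy-M-d HH:mm:ss.SSS"

caseAttributesConfigurations:
  - name: "Case Attributes"
    collectFrom: "staging"
    sqlFilePath: "transformation/case_attributes.sql"
    mapping:
      caseId: "case_id"
    attributes:
      - name: "Order creation date"                  # placeholder case attribute
        type: "date"
        dateFormat: "yyyy-M-d HH:mm:ss.SSS"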

Execution 

The connector can be started as a Java application by running the following command in the terminal. First, go to the source directory of the connector, then execute the command. An example command sequence is shown after the command descriptions below.

Steps

java -jar signavio-connector.jar <command>

Commands based on the tableSyncConfiguration:

  • extract => extracts the raw table data from the source system by using the defined extraction scripts and uploads it to the staging area, where it is saved as raw tables. Table names in the staging area depend on the names provided in tableSyncConfiguration.
  • createschema => generates the schema for the raw tables 
  • transform => optimizes the raw table schema and merges row updates in case there are changes to overlapping rows between different delta loads. Updates to data that were already extracted in a previous load are recognized based on the keyColumn and mostRecentRowColumn parameters.

Commands based on the eventCollectorConfiguration:

  • eventlog => creates the event log out of the staging system based on the transformation scripts and uploads it to Process Intelligence.
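For example, assuming the configuration sketched above, a full run could chain the commands in the order in which they were described; the exact sequence for your setup may differ.

java -jar signavio-connector.jar extract
java -jar signavio-connector.jar createschema
java -jar signavio-connector.jar transform
java -jar signavio-connector.jar eventlog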
