Introduction to Data Transformation

Objective

After completing this lesson, you will be able to explain the data requirements and setup for data transformation.

Introduction to Data Transformation

Activities and interactions within source systems leave behind digital traces. These traces are generated as users engage with various software applications, producing a trail of timestamped events. These events capture essential information such as the activities performed, the order of execution, and the entities involved (e.g., users, systems, or departments). Digital traces serve as the raw data source for process mining analyses, offering a detailed and objective perspective on how processes unfold in practice.

Business objects represent the entities that compose a business process. These could include documents, orders, invoices, etc. Subsets of a business process act on individual business objects, and the compilation of all the business objects composes the entire business process. Using a procure-to-pay process as an example, the associated business objects are:

  • Purchase Requisition (PR)
  • Purchase Order (PO)
  • Invoice

All changes and transactions referring to these objects are stored in a database. With SAP Signavio Process Intelligence, those details can be explored: they are extracted and transformed in a way that allows every step to be retraced. The recreated steps are stored in an event log.

The event log is a chronological and structured record of events that have occurred within a business process. Each entry in the event log represents an occurrence of a specific activity at a particular point in time within the business process. Data transformation is the process of changing the format, structure, or values of raw data from ERP systems in order to create the event log.

An event log contains the mandatory fields case ID (e.g., order ID), event name (e.g., name of the activity), and timestamp (e.g., the time when the specific activity occurred), along with additional attributes.
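As a minimal sketch of this structure, the following Python snippet models an event with the three mandatory fields plus optional attributes. The class name Event, the helper trace, and all sample values are hypothetical and chosen for illustration only; they are not the schema used by SAP Signavio Process Intelligence.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass(frozen=True)
class Event:
    case_id: str         # mandatory: identifies the case, e.g. an order ID
    event_name: str      # mandatory: name of the activity
    timestamp: datetime  # mandatory: when the activity occurred
    attributes: dict = field(default_factory=dict)  # optional extra attributes


# Hypothetical sample data; the entries are deliberately not in time order.
event_log = [
    Event("PO-1001", "Create Purchase Order", datetime(2024, 1, 5, 9, 30), {"user": "alice"}),
    Event("PO-1001", "Receive Invoice", datetime(2024, 1, 12, 14, 0)),
    Event("PO-1001", "Approve Purchase Order", datetime(2024, 1, 6, 11, 15)),
]


def trace(log, case_id):
    """Return the activity names of one case in chronological order."""
    events = sorted((e for e in log if e.case_id == case_id), key=lambda e: e.timestamp)
    return [e.event_name for e in events]


po_trace = trace(event_log, "PO-1001")
```

Sorting by timestamp within a case ID is what turns the individual entries into the ordered sequence of steps for that case.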

Importance

Why is data transformation necessary to create an event log? Because the data is spread across different tables. We need to ensure that the extracted data is linked to its specific case by a unique identifier. Otherwise, how would a system know that Order ID 123 in the order table and Invoice ID 456 in the invoice table belong to the same case?
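The linking problem can be illustrated with a small sketch. It assumes, hypothetically, that each invoice row carries a reference to the order it belongs to (order_id); the table layouts, field names, and the function build_event_log are invented for this example and do not reflect any specific ERP schema.

```python
from datetime import datetime

# Hypothetical source tables, represented as lists of rows.
orders = [
    {"order_id": "123", "created_at": datetime(2024, 3, 1, 10, 0)},
]
invoices = [
    # Assumption: the invoice row stores the ID of the order it refers to.
    {"invoice_id": "456", "order_id": "123", "received_at": datetime(2024, 3, 9, 16, 30)},
]


def build_event_log(orders, invoices):
    """Combine both tables into one event log keyed by the order ID."""
    events = []
    for order in orders:
        events.append({
            "case_id": order["order_id"],
            "event_name": "Create Order",
            "timestamp": order["created_at"],
        })
    for invoice in invoices:
        events.append({
            # The order reference links the invoice to its case.
            "case_id": invoice["order_id"],
            "event_name": "Receive Invoice",
            "timestamp": invoice["received_at"],
            "invoice_id": invoice["invoice_id"],  # kept as an additional attribute
        })
    return sorted(events, key=lambda e: (e["case_id"], e["timestamp"]))


linked_log = build_event_log(orders, invoices)
```

Without the shared order reference, the invoice event could not be assigned to case 123; the transformation step is where such links are established.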

What's your Case?

Defining the correct case identifier (ID) is one of the most important points in data transformation. The case ID defines the scope of the process. It determines where the process starts and ends. In a procurement process, if the case ID is defined by the purchase document ID, every single request will be considered a new case, regardless of whether multiple requests are later combined into one order.

If the case ID is defined by the order ID, the data set will contain all orders as cases, regardless of their underlying purchase requests. A combination of both would also lead to cases for every purchase request. In the end, the answer depends on which business object or document should be analyzed in terms of its lifecycle. The case attributes and the event log are then built on the basis of the case ID.
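To make the effect of this choice concrete, the sketch below groups the same three events once by purchase request and once by order. The record layout (pr_ids, po_id), the sample values, and the function group_by_case are assumptions made for illustration; two requests are assumed to have been combined into a single order.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical records: two purchase requests combined into one order.
records = [
    {"activity": "Create Purchase Requisition", "pr_ids": ["PR-1"], "po_id": "PO-9",
     "timestamp": datetime(2024, 2, 1, 9, 0)},
    {"activity": "Create Purchase Requisition", "pr_ids": ["PR-2"], "po_id": "PO-9",
     "timestamp": datetime(2024, 2, 2, 9, 0)},
    {"activity": "Create Purchase Order", "pr_ids": ["PR-1", "PR-2"], "po_id": "PO-9",
     "timestamp": datetime(2024, 2, 3, 9, 0)},
]


def group_by_case(records, case_key):
    """Group activities into cases using the chosen field as the case ID."""
    cases = defaultdict(list)
    for record in sorted(records, key=lambda r: r["timestamp"]):
        value = record[case_key]
        # A record may relate to one ID or to several (a list of IDs).
        case_ids = value if isinstance(value, list) else [value]
        for case_id in case_ids:
            cases[case_id].append(record["activity"])
    return dict(cases)


cases_by_request = group_by_case(records, "pr_ids")  # one case per purchase request
cases_by_order = group_by_case(records, "po_id")     # one case per order
```

With the request as case ID, the data yields two cases that each include the shared order creation; with the order as case ID, it yields a single case containing both requisitions.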

Data Load

The last part of ETL (extract, transform, load) is the Data Load phase. This phase covers the tasks required to upload the transformed data into the Process Intelligence front end.