Identify Motivation and Challenges in Data Modeling

Objectives

After completing this lesson, you will be able to:

  • Identify motivation and challenges in data modeling

Motivation for Data Modeling

Difference in Values

Different sales results from different departments.

Values collected from information processing systems often drive business decisions. Therefore, numerous reports are distributed. But where do the values come from and how reliable are they?

Imagine the decision-maker asking for the total quantity that has been delivered to a specific customer, getting two different answers:

  • The storage department states that they sold 14 KG.
  • The sales department states that the company sold 15 KG.

Without a central data collection strategy, each department relies on their own calculations. If every department keeps their own, independent database, based on the same original business process, this is called a silo.

Silos are multiple copies of the same data source with different, independent steps of consolidation, harmonization, and enhancement.

The problems with silos are:

  • Misunderstandings in discussions.
  • Aggregation of errors.
  • Different beliefs about the truth and the recommended behavior.
  • Differences in values between different public reports.
Different departments show different sales results.

Why do different departments get different results? Let's look at an example with possible reasons for the differences:

  • Different database: Different source tables (such as an aggregated source table versus a details source table), different filter values, or different times, or different source systems regularly lead to different extraction results.
  • Different actuality of values: If, at an early stage, the wrong data was extracted from the source system that was corrected later. It's important to update the values in the reporting system.
  • Different times of extraction: Values in the source vary. New orders might be added, or status information may change. Then, values may differ depending on whether they're extracted in the morning or in the evening.
  • Different calculation definitions: How is a margin defined? Which costs must be distracted? Is a percentage value based on kilograms or liters?
  • Different steps of calculation: Is an aggregation over different products calculated before a unit conversion is calculated on an average conversion factor? Or is the conversion carried out with a product-specific conversion factor before values are aggregated? Is a share (percentage value) calculated before or after a unit conversion is carried out?
  • Different times of calculation: If conversion factors change, different conversion factors lead to different results.
  • No good quality management: You've no control over the reliability of the output.

Data Modeling Directive

Aspects of a Data Modeling Directive

Many companies decide to implement a data warehouse, such as SAP BW/4HANA, with a central storage of all enterprise information, and fast information processing.

However, even the best software solution must be accompanied by a good data modeling directive.

Aspects of a data modeling directive.

A data modeling directive is a guideline regarding what must be done: where, when, with what data, and in which way?

A data modeling directive has to address the following:

  • What business processes are in focus?
  • Are values needed for the entire enterprise, or for a local subsidiary?
  • What fields of data management are intended? Reporting actual values, predictive analysis, and operation planning?
  • To what level of detail are the values needed?
  • Are current values (up to the last second) essential? Or is a latency (a delay), acceptable? Is it sufficient to process data once per day?
  • Are historical values required? How far back? If values change, is it necessary to identify obsolete values?
  • In calculations, are intermediate results needed as well?

A data modeling directive must address what to do with the data:

  • Are the values read, copied, processed, or presented?
  • Are selection processes used to reduce the set of data?
  • Are tables joined with other tables?
  • Are values transformed? Are they harmonized?
  • Which values must be checked, for instance for referential integrity, against duplicates, for the correct data format, against mathematical errors such as a sum of more than 100%?
  • Is a result presented as a result on the screen, or is it permanently stored?
  • Is data archived or deleted?

A data modeling directive has to address when, and how often, these processes occur. Is it:

  • On demand? (Whenever a business user is interested in the values).
  • At a predefined point in time?
  • Periodically, for instance daily or monthly.
  • Every time the data has changed in the source?

A data modeling directive has to address where data is read and processed:

  • Which source systems are used as a database, and which target systems are used to store values?
  • Which tables or views contain the database?
  • Are calculations applied in the database, or in the application level?
  • Which database is used?
  • Which objects are needed to meet the approved requirements?

Finally, a data modeling directive must address how the data is treated:

  • Are values from imports using flat files, a database connection, or other technologies?
  • What are the join conditions?
  • What are the filter conditions?
  • Which calculations must be carried out?
  • How are plan values derived?
  • Which currencies or units are used, and how can values be converted from one currency to another one?
  • How are values aggregated?

For instance, you must define how key figures from the source are made available. For example, a directive could suggest a raw hub with basic key figures as they're imported from the data source. Business users can then create their own calculations and restrictions on those key figures.

A core hub is another solution that provides predefined, calculated, or restricted key figures for typical use cases. In this case, a data model must define how the new key figures are derived.

You can also provide both options.

Modeling Strategy with Different Reporting Requirements

Different Reporting Requirements

It's important to realize that different users in a company may have different requirements related to business reports.

Watch the video to learn about the different reporting requirements for ITelO.

Strategy - Combining Different Requirements

Sometimes, there are good ways to let two requirements coexist:

  1. Deliver a standard report with reliable truth, but allow business users to create a local copy with their changes. SAP provides several tools with flexible self-service features.
    • In SAP Analysis for Microsoft Office, you can:
      • Choose characteristics for drill-down.
      • Add calculations.
      • Create filters for the Top N results.
      • Create highlighting for values below an individually defined target value.
    • In SAP Analytics Cloud, business users can influence the graphical display.
  2. Deliver different options within the same project and namespace, such as:
    • Provide a strategic report with a long history and another report with more details for the current year or month.
    • Allow the users to choose one of two different hierarchy versions in one report.
    • List two alternative key figures, such as quantity in KG and quantity in number of pieces.
  3. Sometimes, you may need to establish different projects with their own local reports. However, to avoid conflicts about the correct interpretation of values, we recommended using separate key figures, or different filter values. Some examples of these reports are as follows:
    • Provide a German HR report without information about the number of sick days per employee, and an American report that orders employees by number of days on leave.
    • In a customer relationship management project, provide a local report showing the number of recurring orders. In a sales project, create another local report showing the number of open orders. However, a global strategy should detect that both use the number of orders, and provide a common ground with harmonized order key figures for both.

Strategy - Target Prioritization

Sometimes, you may decide to exclude certain content or keep it out of the project focus. There might be different reasons, such as:

  • It's impossible to satisfy both targets. For example, it’s impossible to have complex algorithms that are easy to understand at first sight.
  • It's time-consuming and expensive to produce such results.
  • There are legal or compliance barriers.
  • There are global standards. For example, if the global decision was made in favor of showing a standard product categorization based on the first digits of their technical names, other categorizations aren't supported.
  • For users, it's confusing to have different versions.
  • The data sources can't be trusted.
  • Other targets have a higher priority.

If there are conflicts, your priorities must be as follows:

  1. Correct values (true, in line with legal requirements, understandable)
  2. Required functionality
  3. Flexibility (being open for future requirements)
  4. Reporting performance
  5. Staging performance (correlates with low data volume)

If the conditions change, the targets might be reevaluated, and later projects may be set up to accommodate the targets that were left out.

To avoid confusion, you must aim for:

  • Valid values.
  • Comprehensible data flows.
  • A transparent data origin.
  • A well-structured model.
  • Clear responsibilities.
  • Standard calculations and enterprise-wide available data.
  • Single point of truth.
  • Access control (authorization).
Conflicting requirements from different users.

Therefore, Cathy's requirement about integrating dynamic values from the Internet is out of scope.

Case Study - Start

Analyze Business Segments

Different business segments have different processes, issues, and reporting requirements.

Let's first focus on analyzing the business segments of a company, because business segments have different processes, different issues, and different reporting requirements. You may hear the following questions:

  • What are your business processes?

  • What are the main reporting requirements of each business?

  • Where are the issues relating to analysis and reporting?
  • Where are strategic reports relevant, where is the focus on real-time insight, and where is harmonization across systems an issue?

In the example in the figure above, you identify storage and sales as business segments with urgent requests for new reports. To find reporting requirements, start by analyzing the business process of the business segment and the issues.

In storage, the business process involves an automatically controlled inflow and outflow of goods. You may not realize ahead of time that there's a danger of insufficient material stock. You need real time data analysis with good performance on a detailed level.

The sales team wants to integrate data from different source systems and therefore wants to generate reliable harmonized data. In the sales business process, orders are placed in different currencies. The different currencies must be converted to the company's currency. To guarantee reliable data, the results must be saved permanently.

Let's have a more detailed look at the requirements for sales in ITelO.

Requirements for sales in ITelO

Let's have a more detailed look at the requirements for storage in ITelO.

Requirements for storage in ITelO

Separate Business Segments

Storage and sales processes are involved in global reporting. The organization is split into origin, business segment, storage location and organization unit. Separation is organized into different systems and clients.

As a first step, the business segments were analyzed and in this case storage and sales are the two most important business segments to focus on for ITelO.

As a second step, you have to define how the organization is divided. In this case study, the main distinction is the origin of the data. This distinction is important because ITelO and Retailer King 3000 have different data models. Moreover, each organization consists of different organizational units. Each organization unit operates different storage locations (which is especially important for storage, but also for sales).

The third step is to investigate how this separation is represented in the technical systems.

Each business segment stores its data in a set of database tables. Make a note of any tables that are used by more than one business segment. In our case study, the product master data tables are used by both storage and sales.

Identify which field represents the organization unit. If you want to extract values only for one organization unit, apply a filter on this field.

When inspecting the relevant tables, ask yourself the following questions:

  • What fields always have unique values? What fields contain values that can be repeated in different rows?
  • Is there a check table for a field? Can you only enter values that are listed in other tables?
  • What values are likely to change and when?

Bringing Together Separate Segments

Bringing the Storage and Sales segments together.

In a fourth step, you define a target landscape. In the ITelO case study, the main task is to combine storage and sales values from different sources. For example, ITelO wants to see the sum of all sales volumes for the same category.

To define this task on a business level, check what separation must remain and what level of integration is intended.

To understand the integration task, first check what differences exist between corresponding business processes. Then, check the differences in the data model (table names, table fields, allowed values, and behavior when data changes).

After investigating the differences, you can design a model to integrate the data by following these steps (this process will be defined in more detail later in this course):

  1. Homogenize the master data.

    In this case study, it’s easy to integrate product master data because Retailer King 3000 sells the same products and uses the same product ID. However, Retailer King 3000 uses different master data such as categories and product prices. Decide which system is the leading system, that is, from which system the price and category information is taken. Alternatively, both options can be presented as a choice to the business user.

  2. Prepare transactional data.

    In this case study, storage and sales remain separate areas of reporting. For storage, both sources have tables of similar design, and a view that brings together both sources is sufficient. For the sales data, you must load harmonized values. Map data from different systems to the same format, then add a source distinguishing field to the key of each table.

  3. Define a query, view, or report.

    Define a query, a view, or a report that aggregates the sales values across both sources and joins the master data, and other external data.

Extra tasks involve the integration of other sources. In our case study, you want to compare sales and purchase amounts.

Log in to track your progress & complete quizzes