Implementing Joins

Objectives
After completing this lesson, you will be able to:

After completing this lesson, you will be able to:

  • Implement joins
  • Differentiate between inner joins and outer joins
  • Implement nested joins

Basics and Syntax of Joins

Definition of a Join

Sometimes it is necessary to combine two SQL data sources into one result set, consisting of columns of both data sources. This is called a join.

As an example, when reading flight connections, it can be necessary to read details on the flight carrier, too, for example, the name of the carrier.

The rows of the result set are determined by the join type and by join conditions between columns of the data sources. The join conditions start with keyword ON and define relations between columns of the data sources.

The join type is defined by the corresponding keyword. In ABAP SQL, supported join types are INNER JOIN, LEFT OUTER JOIN, and RIGHT OUTER JOIN.

We will discuss the different results of these join types soon. But let's have a look at the ABAP SQL syntax of joins, first.

The example combines carrier data (carrier ID and carrier name) and connection data (connection ID, departure airport, and destination airport) into one result set. The left data source is database table /DMO/CARRIER, whereas database table /DMO/CONNECTION serves as right data source. The keyword INNER JOIN between the two data source names specifies the join type.

The join condition after ON request the content of field CARRIER_ID of database table /DMO/CARRIER equals the content of field CARRIER_ID of database table /DMO/CONNECTION.

In the join conditions as well as in the field list, WHERE clause, and so on, the columns of data sources are fully identified by the name of the data source and the column name, separated by the tilde sign ( ~ ).

Note
From other SQL dialects, you may be used to the period or full stop sign ( . ). This had to be replaced by another sign in ABAP SQL, because the period sign already has the meaning of ending the statement.

It is possible to address the column names without the data source name. The pre-condition is that the column name only exists in one of the data sources and that the other data source does not contain a column of the same name.

In the example, only database table /DMO/CARRIER contains columns NAME and CURRENCY_CODE, whereas columns CONNECTION_ID, AIRPORT_FROM_ID, and AIRPORT_TO_ID only exist in database table /DMO/CONNECTION. Therefore, these column names are unique without preceding/DMO/CARRIER~ or /DMO/CONNECTION~.

On the other hand, both tables contain a column CARRIER_ID. Therefore, for this field /DMO/CARRIER~ or /DMO/CONNECTION~ is mandatory.

It is good practice to use aliases for the data source names to shorten the statement and increase readability. This is done using addition AS after the name of the data source. In the example, alias A is used for database table /DMO/CARRIER and alias C for database table /DMO/CONNECTION.

If a data source has an alias, it is mandatory to use the alias instead of the original data source name.

Required Aliases in Self-Joins

In some situations, aliases for data sources are more than just a syntax alternative. They become indispensable in statements that contain the same data source more than once.

Let's have a look at an example. Let's consider we have a table with flight connections (DMO/CONNECTION). We are interested in destinations that we can reach starting from Frankfurt International Airport (FRA). But instead of direct flights, we are looking for flights with one transfer.

The solution is a join of table /DMO/CONNECTION with itself. The first join condition requests that the second connection (2nd leg) starts from the airport at which the first connection (1st leg) ends. The second join condition, linked to the first with the AND operator, excludes combinations where the second connection is a direct return flight to the original departure airport.

The highlighted row in the result is a connection with LH 400 from Frankfurt (FRA) to New York (JFK) and from there with AA 0015 to San Francisco (SFO).

It is quite obvious that such a self-join would not work without the alias names a for the left usage of (DMO/CONNECTION and b for the right usage of /DMO/CONNECTION. Without the different aliases, it would not be possible to distinguish between the first connection and the second connection - neither in the join condition nor in the field list or WHERE clause.

Note
In this kind of self-join, the alias name is required in front of each column name because every column name exists in both data sources.

Try It Out: Syntax of Joins

  1. Create a new global class that implements interface IF_OO_ADT_CLASSRUN.
  2. Copy the following code snippet to the implementation part of method if_oo_adt_classrun~main( ):
    Code snippet
    
        SELECT FROM /dmo/carrier INNER JOIN /dmo/connection
    *    SELECT FROM /dmo/carrier AS a INNER JOIN /dmo/connection AS c
                 ON /dmo/carrier~carrier_id = /dmo/connection~carrier_id
    
             FIELDS /dmo/carrier~carrier_id,
                    /dmo/carrier~name AS carrier_name,
                    /dmo/connection~connection_id,
                    /dmo/connection~airport_from_id,
                    /dmo/connection~airport_to_id
    
              WHERE /dmo/carrier~currency_code = 'EUR'
               INTO TABLE @DATA(result).
    
        out->write(
          EXPORTING
            data   = result
            name   = 'RESULT'
        ).
    
    Expand
  3. Select CTRL + F3 to activate the class and F9 to execute it as a console app.
  4. Analyze the console output.
  5. Place the cursor in the first line (the line with keyword SELECT), and select CTRL + < to add a star sign (*) in the first column of that line.
  6. Remove the star sign (*) from the beginning of the next line. Analyze the syntax error that tells you that table name /DMO/CARRIER is obscured by an alias name.
  7. In the ON condition, replace table name /DMO/CARRIER with its alias name A and table name /DMO/CONNECTION with its alias name C.
  8. In the FIELDS list and the WHERE clause, remove the alias name and the tilde sign (~) in front of every field name where this is possible. You should find, that it is only mandatory in front of field carrier_id.
  9. Activate and execute again.

Join Types

Let's look at the differences between inner joins and outer joins.

This example illustrates the result of a left outer join. The two rows highlighted in the result set would not be there in an inner join. They correspond to carriers that do not have flight connections in database table /DMO/CONNECTION.

These rows only contain information about the carrier, that is, information from the left data source, table /DMO/CARRIER. The columns that usually contain information from the right data source, that is columns CONNECTION_ID, AIRPORT_FROM_ID, and AIRPORT_TO_ID, are filled with initial values. These values were entered by the ABAP runtime instead of the Null values returned by the database.

Note
The type of column CONNECTION_ID is based on ABAP built-in type N. The initial value for this type is not a series of blanks but a series of zeroes.

Try It Out: Inner Join and Outer Joins

  1. Create a new global class that implements interface IF_OO_ADT_CLASSRUN.
  2. Copy the following code snippet to the implementation part of method if_oo_adt_classrun~main( ):
    Code snippet
    
     SELECT FROM /dmo/Agency AS a
                    INNER JOIN /dmo/customer AS c
    *           LEFT OUTER JOIN /dmo/customer AS c
    *          RIGHT OUTER JOIN /dmo/customer AS c
                 ON a~city         = c~city
    
             FIELDS agency_id,
                    name AS Agency_name,
                    a~city AS agency_city,
                    c~city AS customer_city,
                    customer_id,
                    last_name AS customer_name
    
              WHERE ( c~customer_id < '000010' OR c~customer_id IS NULL )
                AND ( a~agency_id   < '070010' OR a~agency_id   IS NULL )
    
               INTO TABLE @DATA(result_Join).
    
    
        out->write(
          EXPORTING
            data   = result_join
            name   = 'RESULT_JOIN'
        ).
    
    Expand
  3. Select CTRL + F3 to activate the class and F9 to execute it as a console app.
  4. Analyze the console output.
    Note
    The code example retrieves pairs of customers and travel agencies with an address in the same city. It uses a WHERE clause to keep the result set small. So, with the INNER JOIN, only customer '000010' and travel agency '070010' are found. The WHERE clause is written in a form that works for outer joins, too. Conditions <field_name> IS NULL are needed to not exclude results with NULL values from the database.
  5. In the FROM clause of the SELECT statement, replace INNER JOIN with LEFT OUTER JOIN and activate again. Alternatively, you can comment out the line with INNER JOIN and uncomment the line with LEFT OUTER JOIN.
  6. Execute again, and compare the console output to the first output.
  7. In the FROM clause of the SELECT statement, replace LEFT OUTER JOIN with RIGHT OUTER JOIN and activate again. Alternatively, you can comment out the line with LEFT OUTER JOIN and uncomment the line with RIGHT OUTER JOIN.
  8. Execute again and compare the console output to the first two outputs.

Nested Joins

Join of Multiple Data Sources

In ABAP SQL, you can join more than two data sources at the same time. Let's begin with the join of two data sources, namely the join flight carriers and flight connections that we discussed earlier.

Now let's consider we want to add database table /DMO/AIRPORT to read the name and city of the departure airport (AIRPORT_FROM_ID).

To join the three data sources, we first surround the existing join with a pair of brackets. This first join now serves as left data source for the next join. We add the join keyword (INNER JOIN, LEFT OUTER JOIN, RIGHT OUTER JOIN), the right data source (/DMO/AIRPORT), an alias (AS apf), and the join condition for the additional join (ON c~airport_from_id = apf~airport_id).

Now let's consider we want to add database table /DMO/AIRPORT once again to also read the name and city of the destination airport (AIRPORT_TO_ID).

To implement this join of four data sources, we repeat what we did for the join of three data sources: We surround the existing join of three data sources with another pair of brackets and add the join keyword, right data source and join condition.

Note
You can repeat this until you reach the maximum number of data sources that can be linked to each other. Currently, up to 50 data sources (49 joins) are allowed in one SELECT statement.

Sequence of Joins

In the previous example, the next data source was always added on the right-hand side of the next join. The left data source was either a database table or a join. The right data source of all three joins was a database table.

Thanks to the brackets, it is also possible to use a join as the right data source of another join. In the lower example, we start with a join of /DMO/CONNECTION and /DMO/AIRPORT (AS apf). We surround it with brackets and use it as left data source of a join with /DMO/AIRPORT (AS apt). To add /DMO/AIRLINE, we surround the existing join with another pair of brackets. Then we add the left data source (/DMO/CARRIER) and the join keyword before the opening bracket and the ON-condition after the closing bracket.

Note
In our example, the two results are identical. But there are situations where the result can depend on the sequence in which the database evaluates the joins.

The Importance of Brackets

From a syntax point of view, the brackets are not necessary. But for the sake of readability, it is strongly recommended not to omit the brackets in nested joins. This should be illustrated by the following example.

The SELECT statement contains a join of four database tables but does not use any brackets. At a superficial glance, one could think that the join of /DMO/CARRIER and /DMO/CONNECTION is evaluated first because these tables are the first to appear in the FROM clause.

But this conclusion is wrong!

Without brackets, the sequence of evaluation is determined by the sequence of the ON conditions and not by the sequence of the data sources. On closer inspection, we see that the ON-condition that links tables /DMO/CARRIER (alias a) and /DMO/CONNECTION (alias c) to each other is placed at the very end of the FROM clause. Therefore, this join is the last to be evaluated.

Hint
Always use brackets in nested joins to avoid this kind of confusion.

Log in to track your progress & complete quizzes