Processing Strings Using Functions and Regular Expressions

Objectives

After completing this lesson, you will be able to:

  • Describe built-in string functions in ABAP
  • Work with built-in string functions in ABAP
  • Explain the use of regular expressions in ABAP

Built-in String Functions

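In addition to string concatenation with the && operator and formatting with string templates, ABAP offers a large variety of built-in string functions. Based on the type of the result, we distinguish between three categories of string functions:

  • Description functions return a numeric result, for example a length, a count, or an offset.
  • Processing functions return a character-like result.
  • Predicate functions return a truth value and can be used in logical expressions.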

Note

If the input of a built-in string function consists of several parameters, the name of the main input parameter is always val.

Parameters of String Functions

For some built-in string functions, the input consists of just one character-like data object. These functions do not use named input parameters; the input is specified directly within the parentheses. Description function NUMOFCHAR( ) is a good example of this type of function.

Functions with more than one input use named parameters, and the input is assigned to these parameters. Many parameters are shared by different built-in string functions. The most important parameters are the following:

VAL

You pass the text string to be processed by the function to the parameter val. You can also specify functional method calls, table expressions, and constructor expressions whose return value is convertible to the type string. Only elementary data types can be processed. If a character-like data object with a fixed length is specified, any trailing blanks are ignored. Non-character-like return values are converted to the data type string.

SUB

Parameter sub is used to pass a character string whose characters are to be searched for or inserted. Only arguments with elementary types can be specified. If a character-like data object with a fixed length is specified, any trailing blanks are ignored.

CASE

Searches and comparisons in string functions are case-sensitive by default, but this can be overridden if necessary, by using the parameter case. The parameter case requires the input of type abap_bool (C LENGTH 1) with the value of the constants abap_true ('X') or abap_false (' '). If case contains the value of abap_true, the search is case-sensitive; if it contains the value of abap_false, it is not.

OCC

In string functions where searches are performed, the parameter occ specifies the occurrence of a match. If occ is positive, the occurrences are counted from the left; if occ is negative, they are counted from the right. The values 1, 2, ... indicate the first, second, ... occurrences. The values -1, -2, ... indicate the last, second-to-last, ... occurrences. The default value of occ is 1.

Note

Except in the case of the replacement function replace, the value 0 raises exception CX_SY_STRG_PAR_VAL. If replace is used, the value 0 replaces all occurrences.

OFF and LEN

Parameter off is used to pass an offset and parameter len is used to pass a length. In functions where both off and len can be passed, they determine the subarea in which a string is to be processed.

The default value of off is generally 0 and the default value of len is set to the length of the complete string after the offset.

Note

Inappropriate combinations of values can lead to exception CX_SY_RANGE_OUT_OF_BOUNDS.
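
The following sketch illustrates the special role of occ = 0 and the effect of an unsuitable offset. It is a minimal example, not part of the original course material: the sample text and variable names are assumptions, and the snippet is meant to be placed in method if_oo_adt_classrun~main( ) like the other snippets in this lesson.

```abap
DATA text TYPE string VALUE `ABAP and ABAP Cloud`.

" occ = 0 is only allowed in replace( ): all occurrences are replaced
DATA(all_replaced) = replace( val = text sub = `ABAP` with = `abap` occ = 0 ).
out->write( all_replaced ).                    " abap and abap Cloud

" off restricts the search to the part of the string after the offset
DATA(pos) = find( val = text sub = `ABAP` off = 1 ).
out->write( |Offset: { pos }| ).               " Offset: 9

" An offset beyond the end of the string is expected to raise an exception
TRY.
    pos = find( val = text sub = `ABAP` off = 50 ).
  CATCH cx_sy_range_out_of_bounds INTO DATA(range_error).
    out->write( range_error->get_text( ) ).
ENDTRY.

" occ = 0 is not allowed in find( )
TRY.
    pos = find( val = text sub = `ABAP` occ = 0 ).
  CATCH cx_sy_strg_par_val INTO DATA(occ_error).
    out->write( occ_error->get_text( ) ).
ENDTRY.
```

Note that the offset returned by find( ) refers to the complete string, not to the subarea defined by off and len.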

Let's have a look at some examples to see the impact of these parameters:

Try It Out: Common Parameters of String Functions

  1. Create a new global class that implements interface IF_OO_ADT_CLASSRUN.
  2. Copy the following code snippet to the implementation part of method if_oo_adt_classrun~main( ):
    Code Snippet
    DATA text TYPE string VALUE ` Let's talk about ABAP `.
    DATA result TYPE i.

    out->write( text ).

    result = find( val = text sub = 'A' ).
    *
    * result = find( val = text sub = 'A' case = abap_false ).
    *
    * result = find( val = text sub = 'A' case = abap_false occ = -1 ).
    * result = find( val = text sub = 'A' case = abap_false occ = -2 ).
    * result = find( val = text sub = 'A' case = abap_false occ = 2 ).
    *
    * result = find( val = text sub = 'A' case = abap_false occ = 2 off = 10 ).
    * result = find( val = text sub = 'A' case = abap_false occ = 2 off = 10 len = 4 ).

    out->write( |RESULT = { result } | ).
  3. Press Ctrl + F3 to activate the class and F9 to execute it as a console app.
  4. Analyze the console output.
  5. Remove the comments in front of the different calls of function FIND( ) and activate the program again.
  6. Debug the program and analyze the content of data object result after each assignment. Can you understand the different results?
  7. Play around with the different parameters to get familiar with their meaning.

Description Functions

Length functions

An important group of description functions for strings are the length functions NUMOFCHAR( ) and STRLEN( ). Most of the time, the two functions return the same result. However, there is one exception: If the argument is of type string and contains one or more blanks at the end, the result of STRLEN( ) includes those trailing blanks, whereas NUMOFCHAR( ) ignores them. For arguments with fixed length, for example arguments of TYPE C or N, both functions ignore the blanks at the end.
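
A short sketch of the difference described above. The variable names and values are illustrative assumptions; the snippet is meant for method if_oo_adt_classrun~main( ).

```abap
DATA text_string TYPE string VALUE `ABAP  `.       " string with two trailing blanks
DATA text_char   TYPE c LENGTH 10 VALUE 'ABAP'.    " fixed length, padded with blanks

out->write( |strlen( string )    = { strlen( text_string ) }| ).      " 6
out->write( |numofchar( string ) = { numofchar( text_string ) }| ).   " 4
out->write( |strlen( c )         = { strlen( text_char ) }| ).        " 4
out->write( |numofchar( c )      = { numofchar( text_char ) }| ).     " 4
```

Only the first call counts the trailing blanks, because only there the argument is of type string and the function is STRLEN( ).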

Groups of search functions

There are two groups of search functions for strings:

  • Function COUNT( ) and the functions starting with COUNT_...( ) return the total number of occurrences of a search argument.
  • Function FIND( ) and the functions starting with FIND_...( ) return the position (offset) of one particular occurrence of a search argument.

Parameter sub in search functions

Functions COUNT( ) and FIND( ) can either search for a substring (optional parameter sub) or a regular expression (optional parameter pcre). We will discuss the regular expressions later in this course.

In the case of functions ending with _ANY_OF, the name of parameter sub is a bit misleading. Here, the value of parameter sub is not a substring but rather a list of characters. Instead of searching for the substring, that is, the exact combination of characters, these functions evaluate the individual characters and consider every character a match if it is part of the provided list. The functions ending with _ANY_NOT_OF work in a similar way, but here only those characters are considered a match that are different from all the characters in the list.
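
To make the different interpretations of sub concrete, here is a minimal sketch with an assumed sample text and character list, meant for method if_oo_adt_classrun~main( ). The comments show the values we would expect for this particular input.

```abap
DATA text  TYPE string VALUE `SAP BTP`.
DATA chars TYPE string VALUE `PT`.

" sub as a substring: the exact sequence PT does not occur in the text
out->write( |find             = { find( val = text sub = chars ) }| ).              " -1
out->write( |count            = { count( val = text sub = chars ) }| ).             " 0

" sub as a list of characters: every P and every T is a match
out->write( |find_any_of      = { find_any_of( val = text sub = chars ) }| ).       " 2
out->write( |count_any_of     = { count_any_of( val = text sub = chars ) }| ).      " 3

" sub as an exclusion list: every character other than P and T is a match
out->write( |find_any_not_of  = { find_any_not_of( val = text sub = chars ) }| ).   " 0
out->write( |count_any_not_of = { count_any_not_of( val = text sub = chars ) }| ).  " 4
```

The blank in the middle of the text is a character like any other, which is why COUNT_ANY_NOT_OF( ) counts S, A, the blank, and B.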

Note

As with function FIND( ), you can use the optional parameter occ to specify which occurrence you want to consider.

Watch some examples of the description functions.

Try It Out: Description Functions

  1. Create a new global class that implements interface IF_OO_ADT_CLASSRUN.
  2. Copy the following code snippet to the implementation part of method if_oo_adt_classrun~main( ):
    Code Snippet
    DATA result TYPE i.

    DATA text TYPE string VALUE ` ABAP `.
    DATA substring TYPE string VALUE `AB`.
    DATA offset TYPE i VALUE 1.

    * Call different description functions
    ******************************************************************************
    * result = strlen( text ).
    * result = numofchar( text ).

    result = count( val = text sub = substring off = offset ).
    * result = find( val = text sub = substring off = offset ).

    * result = count_any_of( val = text sub = substring off = offset ).
    * result = find_any_of( val = text sub = substring off = offset ).

    * result = count_any_not_of( val = text sub = substring off = offset ).
    * result = find_any_not_of( val = text sub = substring off = offset ).

    out->write( |Text = `{ text }`| ).
    out->write( |Substring = `{ substring }` | ).
    out->write( |Offset = { offset } | ).
    out->write( |Result = { result } | ).
  3. Press Ctrl + F3 to activate the class and F9 to execute it as a console app.
  4. Analyze the console output.
  5. Remove the comments in front of the different value assignments of variable result and activate the program again.
  6. Set a breakpoint and analyze the content of data object result after each assignment. Can you understand the different results?
  7. Play around with the start values for text and substring to get familiar with the functions.

Processing Functions

Watch the following video to learn more about the processing functions.

Hint

A good use case for SEGMENT( ) is the import and processing of data in comma-separated values (CSV) format.
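
As a rough sketch of this use case, the following snippet splits one line of comma-separated data into its fields. The sample line and the variable names are assumptions made for illustration; the snippet is meant for method if_oo_adt_classrun~main( ).

```abap
DATA csv_line TYPE string VALUE `LH,0400,FRA,JFK`.

" The number of fields is the number of separators plus one
DATA(field_count) = count( val = csv_line sub = ',' ) + 1.

" Read the fields one by one; index 1 addresses the first segment
DO field_count TIMES.
  out->write( |Field { sy-index }: { segment( val = csv_line sep = ',' index = sy-index ) }| ).
ENDDO.
```

Counting the separators first avoids requesting a segment that does not exist. Real CSV data can contain quoted fields with embedded commas, which this simple approach does not handle.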

Let's have a look at a few examples to illustrate the result of some string processing functions:

  • In the first example, the input string contains a mixture of uppercase and lowercase letters. Function TO_UPPER( ) transforms them all to uppercase letters.
  • Function TO_MIXED( ) translates the string into a mixture of uppercase and lowercase letters. It searches for a separator string, removes it, and transforms the first character after the separator to uppercase. All other characters are transformed to lowercase. Note that the parameter sep is optional, with the underscore sign (_) as its default value.
  • Function REVERSE( ) returns the characters in reversed order.
  • The first example for the SHIFT_LEFT( ) function sets parameter places to 2. Therefore, the first two characters (##) are removed from the result.
  • The second example for SHIFT_LEFT( ) sets parameter circular to 2. Therefore, the first two characters are not deleted but moved to the end.
  • The example of function SUBSTRING( ) extracts a substring of length 4, starting with an offset of 2 characters, which means it returns the characters at positions 3, 4, 5, and 6.
  • Function SUBSTRING_AFTER( ) searches for substring "is" and returns all characters after this occurrence.
  • Function SUBSTRING_FROM( ) does the same, but the result contains substring "is" as well.
  • The first example of function SEGMENT( ) looks for all occurrences of the separator string, a single underscore. It finds two and splits the input string into three segments. Then it returns the second segment.
  • The second example does the same but returns the third segment. Note that the separator string itself is not part of the result.
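
The calls described in this list can be sketched as follows. The input values are assumptions chosen to fit the descriptions, not necessarily the values used in the original examples. The snippet is meant for method if_oo_adt_classrun~main( ); the comments show the results we would expect for these inputs.

```abap
DATA mixed_case TYPE string VALUE `Hello World`.
DATA snake_case TYPE string VALUE `HELLO_WORLD`.
DATA padded     TYPE string VALUE `##ABAP`.
DATA alphabet   TYPE string VALUE `ABCDEFGH`.
DATA sentence   TYPE string VALUE `ABAP is fun`.
DATA segments   TYPE string VALUE `ONE_TWO_THREE`.

out->write( |to_upper        = { to_upper( mixed_case ) }| ).                      " HELLO WORLD
out->write( |to_mixed        = { to_mixed( snake_case ) }| ).                      " separator removed, mixed case
out->write( |reverse         = { reverse( padded ) }| ).                           " PABA##
out->write( |shift_left      = { shift_left( val = padded places = 2 ) }| ).       " ABAP
out->write( |shift_left      = { shift_left( val = padded circular = 2 ) }| ).     " ABAP##
out->write( |substring       = { substring( val = alphabet off = 2 len = 4 ) }| ). " CDEF
out->write( |substring_after = { substring_after( val = sentence sub = 'is' ) }| ). " ' fun' (with leading blank)
out->write( |substring_from  = { substring_from( val = sentence sub = 'is' ) }| ).  " is fun
out->write( |segment 2       = { segment( val = segments sep = '_' index = 2 ) }| ). " TWO
out->write( |segment 3       = { segment( val = segments sep = '_' index = 3 ) }| ). " THREE
```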

Try It Out: Processing Functions

  1. Create a new global class that implements interface IF_OO_ADT_CLASSRUN.
  2. Copy the following code snippet to the implementation part of method if_oo_adt_classrun~main( ):
    Code Snippet
    DATA text TYPE string VALUE ` SAP BTP, ABAP Environment `.

    * Change Case of characters
    **********************************************************************
    out->write( |TO_UPPER = { to_upper( text ) } | ).
    out->write( |TO_LOWER = { to_lower( text ) } | ).
    out->write( |TO_MIXED = { to_mixed( text ) } | ).
    out->write( |FROM_MIXED = { from_mixed( text ) } | ).

    * Change order of characters
    **********************************************************************
    out->write( |REVERSE = { reverse( text ) } | ).
    out->write( |SHIFT_LEFT (places)= { shift_left( val = text places = 3 ) } | ).
    out->write( |SHIFT_RIGHT (places)= { shift_right( val = text places = 3 ) } | ).
    out->write( |SHIFT_LEFT (circ) = { shift_left( val = text circular = 3 ) } | ).
    out->write( |SHIFT_RIGHT (circ) = { shift_right( val = text circular = 3 ) } | ).

    * Extract a Substring
    **********************************************************************
    out->write( |SUBSTRING = { substring( val = text off = 4 len = 10 ) } | ).
    out->write( |SUBSTRING_FROM = { substring_from( val = text sub = 'ABAP' ) } | ).
    out->write( |SUBSTRING_AFTER = { substring_after( val = text sub = 'ABAP' ) } | ).
    out->write( |SUBSTRING_TO = { substring_to( val = text sub = 'ABAP' ) } | ).
    out->write( |SUBSTRING_BEFORE= { substring_before( val = text sub = 'ABAP' ) } | ).

    * Condense, REPEAT and Segment
    **********************************************************************
    out->write( |CONDENSE = { condense( val = text ) } | ).
    out->write( |REPEAT = { repeat( val = text occ = 2 ) } | ).

    out->write( |SEGMENT1 = { segment( val = text sep = ',' index = 1 ) } | ).
    out->write( |SEGMENT2 = { segment( val = text sep = ',' index = 2 ) } | ).
  3. Press Ctrl + F3 to activate the class and F9 to execute it as a console app.
  4. Analyze the console output. Can you understand the results of the different functions?
  5. Play around with the start value for text and the values of the different parameters to get familiar with the functions and their results.

Predicate Functions

Predicate function CONTAINS( ) is true if a specified substring appears at least once in the input string. In other words, if CONTAINS( ) is true, function FIND( ) returns an offset of 0 or greater; if CONTAINS( ) is false, FIND( ) returns -1.

The same relation exists between predicate functions CONTAINS_ANY_OF( ) and CONTAINS_ANY_NOT_OF( ), and the corresponding description functions FIND_ANY_OF( ) and FIND_ANY_NOT_OF( ).

MATCHES( ) is a dedicated predicate function to compare the complete input string to a regular expression. We will look at regular expressions in the next section.
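
The relation between the predicate functions and the corresponding FIND functions can be sketched as follows. The sample text and search arguments are assumptions; the snippet is meant for method if_oo_adt_classrun~main( ).

```abap
DATA text TYPE string VALUE `ABAP Cloud`.

" A match at the very beginning of the string has offset 0
IF contains( val = text sub = 'ABAP' ).
  out->write( |contains: true,  find = { find( val = text sub = 'ABAP' ) }| ).    " find = 0
ENDIF.

" No match: contains( ) is false and find( ) returns -1
IF NOT contains( val = text sub = 'Java' ).
  out->write( |contains: false, find = { find( val = text sub = 'Java' ) }| ).    " find = -1
ENDIF.

" contains_any_of( ) treats sub as a list of characters
IF contains_any_of( val = text sub = 'xyzC' ).
  out->write( |contains_any_of: true, find_any_of = { find_any_of( val = text sub = 'xyzC' ) }| ).  " find_any_of = 5
ENDIF.
```

Because predicate functions return a truth value, they can be used directly in logical expressions such as IF, without a comparison.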

Regular Expressions in ABAP

A regular expression, or Regex for short, is a pattern of literal and special characters that describes a set of character strings. Regular expressions are often used in text searches, in "search-and-replace" operations, or to validate the content of character-like fields. A search using a regular expression is more powerful than a search for a simple character string because the regular expression represents a greater (potentially infinite) number of character strings and searches for them concurrently.

The syntax of regular expressions is widely standardized but there are differences between common standards such as Perl or POSIX and different syntax flavors such as XPath or XSD regular expressions.

In ABAP, the preferred standard is Perl Compatible Regular Expressions (PCRE), but ABAP also supports some other standards and flavors.

Note

The examples in this course follow the PCRE standard. For other standards and syntax flavors see the ABAP keyword documentation.

Examples of Regular Expressions

Let's have a look at some examples of regular expressions:

  • The simplest regular expression is a literal.
  • By using a pair of square brackets ( [ ] ), you can specify a set of characters that are allowed at the specific position. The first example specifies a character set that consists of the two individual characters B and S. Therefore, either letter B or S is allowed between the two letters A.
  • The next example uses a hyphen sign (-) to specify a range of allowed characters. B-D defines a character set that includes B, D, and all characters between them in lexical order. Of course, you can combine individual values and several ranges. [AL-NRX-Z], for example, defines a character set that consists of letters A, L, M, N, R, X, Y, Z.
  • The next example uses a ^ sign after the opening bracket to define an exclusion list rather than a positive list. [^LX] excludes characters L and X but allows all other characters.
  • To specify a character set that includes all available characters and excludes none, regular expressions use a single period (.). In the example, any character is allowed between the two As, but that character is required and only one such character is allowed.
  • For more flexibility, you can introduce quantifiers. A quantifier is a pair of curly brackets ( {} ). It specifies how often the element on its left has to be repeated. In the example, the 3 in the quantifier means that exactly three letters B are required between the two As.
  • Instead of one fixed value, a quantifier can specify a lower and an upper value for the repetition. In the example, at least one B is required between the two As, but two Bs are also allowed.
  • When you omit the upper value, you only set a minimum. In the example, there can be any number of letters B between the two As, but there must be at least one.
  • With ordinary brackets ( ), you group several elements together. A good use case is a group on the left of a quantifier. Instead of a single character, the entire group is repeated based on the numbers in the quantifier. In the example, the group BA is repeated between letters A and P.
  • Finally, we want to look at the union operator |. Outside of a group, it joins two patterns so that a string matches if it matches either of them. In the example, we use the union operator inside of a group. On the left of the closing literal AP, two patterns are allowed: either the literal AB or the single letter S.
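
The concepts in this list can be tried out with predicate function MATCHES( ). The patterns and test strings below are assumptions that follow the descriptions above; they are not necessarily the exact examples of the original illustration. The snippet is meant for method if_oo_adt_classrun~main( ).

```abap
TYPES: BEGIN OF ty_example,
         pattern TYPE string,
         text    TYPE string,
       END OF ty_example,
       ty_examples TYPE STANDARD TABLE OF ty_example WITH EMPTY KEY.

DATA(examples) = VALUE ty_examples(
  ( pattern = `ABA`       text = `ABA`    )    " literal
  ( pattern = `A[BS]A`    text = `ASA`    )    " character set: B or S
  ( pattern = `A[B-D]A`   text = `ACA`    )    " range: B, C, or D
  ( pattern = `A[^LX]A`   text = `ALA`    )    " exclusion list: L is not allowed
  ( pattern = `A.A`       text = `A?A`    )    " exactly one arbitrary character
  ( pattern = `AB{3}A`    text = `ABBBA`  )    " exactly three Bs
  ( pattern = `AB{1,2}A`  text = `ABBBA`  )    " one or two Bs: three are too many
  ( pattern = `AB{1,}A`   text = `ABBBBA` )    " at least one B
  ( pattern = `A(BA){2}P` text = `ABABAP` )    " group BA repeated twice
  ( pattern = `(AB|S)AP`  text = `SAP`    ) ). " union inside a group

LOOP AT examples INTO DATA(example).
  " matches( ) is true only if the complete text matches the pattern
  DATA(verdict) = COND string( WHEN matches( val = example-text pcre = example-pattern )
                               THEN `matches`
                               ELSE `does not match` ).
  out->write( |{ example-text } { verdict } { example-pattern }| ).
ENDLOOP.
```

With these inputs, all rows should report a match except the rows for [^LX] and {1,2}, which are deliberately chosen as counterexamples.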

Note

These examples are rather simple and their purpose is to illustrate some basic concepts. Many other concepts exist and by combining all these concepts you can build extremely powerful expressions. For the complete picture, start with the ABAP documentation from where you can navigate to further sources of information.

Built-In String Functions and Regular Expressions

Many built-in string functions offer a parameter pcre, especially when they perform searches. Parameter pcre is often an alternative to parameter sub. When you pass a sequence of characters to parameter pcre instead of parameter sub, the function interprets the input as a Perl Compatible Regular Expression (PCRE) and searches for substrings that match this pattern. If a function has both parameters, sub and pcre, you can only supply one of them.

Note

In the ABAP documentation you can find more parameters for regular expressions. One of them, parameter regex, was used for POSIX regular expressions. It is obsolete and you should not use it.

For functions like FIND( ) and COUNT( ) it is obvious that they perform searches for substrings. But other functions perform searches too. Examples are predicate function CONTAINS( ) and processing functions REPLACE( ), SUBSTRING_FROM( ), SUBSTRING_AFTER( ), and so on.

There are also built-in string functions that only work with regular expressions. Predicate function MATCHES( ) is true if the complete character string matches the regular expression. MATCH( ) works similarly to FIND( ). It searches in a character string for a substring that matches the regular expression. But where FIND( ) returns the offset of the match, MATCH( ) returns the matching substring itself.
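
A minimal sketch of the difference between FIND( ) and MATCH( ) with parameter pcre. The sample text and the pattern for a time in the format hh:mm are assumptions; the snippet is meant for method if_oo_adt_classrun~main( ).

```abap
DATA text TYPE string VALUE `Flight LH 0400 departs at 18:25`.

" Two digits, a colon, two digits
DATA(pattern) = `\d{2}:\d{2}`.

" find( ) returns the offset of the first match ...
DATA(found_offset) = find( val = text pcre = pattern ).
" ... whereas match( ) returns the matching substring
DATA(found_text) = match( val = text pcre = pattern ).
" count( ) accepts pcre as well; here it counts all digits
DATA(digit_count) = count( val = text pcre = `\d` ).

out->write( |Offset: { found_offset }| ).   " Offset: 26
out->write( |Match:  { found_text }| ).     " Match:  18:25
out->write( |Digits: { digit_count }| ).    " Digits: 8
```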

This video illustrates some examples with parameter PCRE.

Note

The matches( ) function is a sharp and elegant tool when you have to implement complicated validations for character-like input fields.
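
As an illustration of such a validation, the following sketch checks whether an input has the shape of an ISO date. The field name and the pattern are hypothetical; the snippet is meant for method if_oo_adt_classrun~main( ).

```abap
DATA date_input TYPE string VALUE `2024-02-29`.

" Assumed format rule: four digits, hyphen, two digits, hyphen, two digits
IF matches( val = date_input pcre = `\d{4}-\d{2}-\d{2}` ).
  out->write( |{ date_input } has the expected format| ).
ELSE.
  out->write( |{ date_input } does not have the expected format| ).
ENDIF.
```

This check only validates the format. A value such as 2024-99-99 would pass, so a plausibility check of the date itself would still be needed.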

Use String Processing Functions

In this exercise, you use string processing functions to enrich text symbols with the content of variables before you add them to the console output.

Template:

  • /LRN/CL_S4D401_TCS_TEXT_POOL (Global Class)

Solution:

  • /LRN/CL_S4D401_TCS_STRING_PROC (Global Class)

Task 1: Copy Template (Optional)

Copy the template class. If you finished the previous exercise, you can skip this task and continue editing your class ZCL_##_SOLUTION.

Steps

  1. Copy class /LRN/CL_S4D401_TCS_TEXT_POOL to a class in your own package (suggested name: ZCL_##_SOLUTION, where ## stands for your group number).

    1. In the Project Explorer view, right-click class /LRN/CL_S4D401_TCS_TEXT_POOL to open the context menu.

    2. From the context menu, choose Duplicate ....

    3. Enter the name of your package in the Package field. In the Name field, enter the name ZCL_##_SOLUTION, where ## stands for your group number.

    4. Adjust the description and choose Next.

    5. Confirm the transport request and choose Finish.

  2. Activate the copy.

    1. Press Ctrl + F3 to activate the class.

Task 2: Use String Processing Functions

Make sure you are logged on with the original language of your global class. Then adjust the implementation of method GET_DESCRIPTION of local class LCL_PASSENGER_FLIGHT. Replace the literal text in the string templates with translatable text from the text pool. For the first text, use a single text symbol with placeholders. Then use a string processing function to replace the placeholders with the attribute values.

Steps

  1. Display the original language of your ABAP class and make sure you are logged on in the same language.

    1. Open the source code of your global class ZCL_##_SOLUTION.

    2. From the tabs below the editor, choose the Properties tab.

    3. Take note of the Original Language property. It should be set to EN.

    4. In the Project Explorer view, analyze the description of your ABAP cloud project. After the system ID, the logon client, and the user ID, it displays the logon language.

    5. If the logon language does not match the original language of your ABAP class, create a new project with the correct logon language.

  2. Navigate to the implementation of method GET_DESCRIPTION in local class LCL_PASSENGER_FLIGHT.

    1. For example, you can open the Local Types tab in the editor and search for METHOD get_description.

    2. Alternatively, you can expand ZCL_##_SOLUTION and then LCL_PASSENGER_FLIGHT in the Outline view on the left and choose GET_DESCRIPTION.

  3. Comment out the first APPEND statement and replace it with an APPEND of a local variable of type STRING (suggested name: txt).

    1. Select all code rows that belong to the first APPEND statement and press Ctrl + <.

    2. Adjust the code as follows:

      Code Snippet
      * APPEND |Flight { carrier_id } { connection_id } on { flight_date DATE = USER } | &&
      *        |from { connection_details-airport_from_id } to { connection_details-airport_to_id } |
      *        TO r_result.

      DATA txt TYPE string.

      APPEND txt TO r_result.
  4. Before the APPEND statement, fill the data object with a string literal that reflects the previous string template. Use suitable placeholders where the string template contains embedded expressions, for example, &carrid&, &connid&, &date&, &from&, and &to&.

    1. Adjust the code as follows:

      Code Snippet
      DATA txt TYPE string.

      txt = 'Flight &carrid& &connid& on &date& from &from& to &to&'.

      APPEND txt TO r_result.
  5. Link the text literal to a new text symbol (suggested ID for the text symbol: 005).

    1. Adjust the code as follows:

      Code Snippet
      txt = 'Flight &carrid& &connid& on &date& from &from& to &to&'(005).
  6. Use a quick fix to create the new text in the text pool with a maximum length of 132.

    1. Place the cursor on (005) and press Ctrl + 1.

    2. Choose Create text 005 in text pool.

    3. Choose the slider to set the Maximum Length to 132 and choose Finish.

  7. Return to the ABAP code. Before the APPEND statement, use a suitable string processing function to replace placeholder &carrid& in string variable txt with the contents of instance attribute carrier_id.

    1. Adjust the code as follows:

      Code Snippet
      txt = 'Flight &carrid& &connid& on &date& from &from& to &to&'(005).

      txt = replace( val = txt sub = '&carrid&' with = carrier_id ).

      APPEND txt TO r_result.
  8. Repeat this for the other placeholders. Do not forget the formatting of the flight date according to the user's preferred date format.

    1. Adjust the code as follows:

      Code Snippet
      txt = 'Flight &carrid& &connid& on &date& from &from& to &to&'(005).

      txt = replace( val = txt sub = '&carrid&' with = carrier_id ).
      txt = replace( val = txt sub = '&connid&' with = connection_id ).
      txt = replace( val = txt sub = '&date&' with = |{ flight_date DATE = USER }| ).
      txt = replace( val = txt sub = '&from&' with = connection_details-airport_from_id ).
      txt = replace( val = txt sub = '&to&' with = connection_details-airport_to_id ).

      APPEND txt TO r_result.
  9. Optional: Replace the remaining literal text in this method with text symbols.

    1. Perform this step as before.

    2. Finally, your code should look similar to this:

      Code Snippet
      METHOD get_description.

      * APPEND |Flight { carrier_id } { connection_id } on { flight_date DATE = USER } | &&
      *        |from { connection_details-airport_from_id } to { connection_details-airport_to_id } |
      *        TO r_result.

        DATA txt TYPE string.
        txt = 'Flight &carrid& &connid& on &date& from &from& to &to&'(005).

        txt = replace( val = txt sub = '&carrid&' with = carrier_id ).
        txt = replace( val = txt sub = '&connid&' with = connection_id ).
        txt = replace( val = txt sub = '&date&' with = |{ flight_date DATE = USER }| ).
        txt = replace( val = txt sub = '&from&' with = connection_details-airport_from_id ).
        txt = replace( val = txt sub = '&to&' with = connection_details-airport_to_id ).

        APPEND txt TO r_result.

        APPEND |{ 'Planetype:'(006) } { planetype } | TO r_result.
        APPEND |{ 'Maximum Seats:'(007) } { seats_max } | TO r_result.
        APPEND |{ 'Occupied Seats:'(008) } { seats_occ } | TO r_result.
        APPEND |{ 'Free Seats:'(009) } { seats_free } | TO r_result.
        APPEND |{ 'Ticket Price:'(010) } { price CURRENCY = currency } { currency } | TO r_result.
        APPEND |{ 'Duration:'(011) } { connection_details-duration } { 'minutes'(012) }| TO r_result.

      ENDMETHOD.
  10. Activate and test the global class as a console app.

    Note

    You should not see any difference in the output. The improvement is that the texts are now translatable. However, this has no effect until a system administrator creates a translation project, which is beyond the scope of this course.
    1. Press Ctrl + F3.

    2. Press F9.
