Processing Strings Using Functions and Regular Expressions

Objectives
After completing this lesson, you will be able to:

  • Describe built-in string functions in ABAP
  • Work with built-in string functions in ABAP
  • Explain the use of regular expressions in ABAP

Built-in String Functions

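In addition to string concatenation with the && operator and formatting with string templates, ABAP offers a large variety of built-in string functions. Based on the type of the result, we distinguish between three categories of string functions:

  • Description functions, which return a numeric value
  • Processing functions, which return a character-like value
  • Predicate functions, which return a truth value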

Note
If the input of a built-in string function consists of several parameters, the name of the main input parameter is always val.

Parameters of String Functions

For some built-in string functions, the input consists of just one character-like data object. These functions do not use named input parameters; the input is specified directly within the parentheses. The description function NUMOFCHAR( ) is a good example of this type of function.

Functions with more than one input use named parameters, and the input is assigned to these parameters. Many parameters are shared by different built-in string functions. The most important parameters are the following:

VAL

You pass the text string to be processed by the function to the parameter val. You can also specify functional method calls, table expressions, and constructor expressions whose return value is convertible to the type string. Only elementary data types can be processed. If a character-like data object with a fixed length is specified, any trailing blanks are ignored. Non-character-like return values are converted to the data type string.

SUB

Parameter sub is used to pass a character string whose characters are to be searched for or inserted. Only arguments with elementary types can be specified. If a character-like data object with a fixed length is specified, any trailing blanks are ignored.

CASE

Searches and comparisons in string functions are case-sensitive by default, but you can override this with the parameter case. The parameter case expects an input of type abap_bool (C LENGTH 1) with the value of the constant abap_true ('X') or abap_false (' '). If case contains the value of abap_true, the search is case-sensitive; if it contains the value of abap_false, it is not.

OCC

In string functions where searches are performed, the parameter occ specifies the occurrence of a match. If occ is positive, the occurrences are counted from the left; if occ is negative, they are counted from the right. The values 1, 2, ... indicate the first, second, ... occurrences. The values -1, -2, ... indicate the last, second-to-last, ... occurrences. The default value of occ is 1.

Note
Except in the case of the replacement function replace, the value 0 raises exception CX_SY_STRG_PAR_VAL. If replace is used, the value 0 replaces all occurrences.

OFF and LEN

Parameter off is used to pass an offset and parameter len is used to pass a length. In functions where both off and len can be passed, they determine the subarea in which a string is to be processed.

The default value of off is generally 0 and the default value of len is set to the length of the complete string after the offset.

Note
Inappropriate combinations of values can lead to exception CX_SY_RANGE_OUT_OF_BOUNDS.
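
The following minimal sketch illustrates the special role of occ = 0 in replace( ) and the exception mentioned in the note. It is meant to be pasted into method if_oo_adt_classrun~main( ) of a class that implements IF_OO_ADT_CLASSRUN; the sample text and the replacement value are assumptions chosen for illustration.

```abap
DATA text TYPE string VALUE `ABAP and more ABAP`.

" occ = 0 is only allowed for replace( ): all occurrences are replaced
out->write( replace( val = text sub = `ABAP` with = `SQL` occ = 0 ) ).

" off = 4 skips the first ABAP, so the second one is found (offset 14)
out->write( |Offset = { find( val = text sub = `ABAP` off = 4 ) }| ).

TRY.
    " off + len (10 + 20) exceeds the length of the string (18)
    out->write( |Offset = { find( val = text sub = `ABAP` off = 10 len = 20 ) }| ).
  CATCH cx_sy_range_out_of_bounds.
    out->write( `off and len do not fit the string` ).
ENDTRY.
```

Note that find( ) returns the offset relative to the beginning of the complete string, not relative to the subarea defined by off.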

Let's have a look at some examples to see the impact of these parameters:

Try It Out: Common Parameters

  1. Create a new global class that implements interface IF_OO_ADT_CLASSRUN.
  2. Copy the following code snippet to the implementation part of method if_oo_adt_classrun~main( ):
    Code snippet
    
        DATA text   TYPE string VALUE `  Let's talk about ABAP  `.
        DATA result TYPE i.
    
        out->write(  text ).
    
        result = find( val = text sub = 'A' ).
    *
    *    result = find( val = text sub = 'A' case = abap_false ).
    *
    *    result = find( val = text sub = 'A' case = abap_false occ =  -1 ).
    *    result = find( val = text sub = 'A' case = abap_false occ =  -2 ).
    *    result = find( val = text sub = 'A' case = abap_false occ =   2 ).
    *
    *    result = find( val = text sub = 'A' case = abap_false occ = 2 off = 10 ).
    *    result = find( val = text sub = 'A' case = abap_false occ = 2 off = 10 len = 4 ).
    
        out->write( |RESULT = { result } | ).
    
  3. Select Ctrl + F3 to activate the class and F9 to execute it as a console app.
  4. Analyze the console output.
  5. Remove the comments in front of the different calls of function FIND( ) and activate the program again.
  6. Debug the program and analyze the content of data object result after each assignment. Can you understand the different results?
  7. Play around with the different parameters to get familiar with their meaning.

Description Functions

Length functions

An important group of description functions for strings are the length functions NUMOFCHAR( ) and STRLEN( ). Most of the time, the two functions return the same result. However, there is one exception: If the argument is of type string and contains one or more blanks at the end, the result of STRLEN( ) includes those trailing blanks, whereas NUMOFCHAR( ) ignores them. For arguments with fixed length, for example arguments of TYPE C or N, both functions ignore the blanks at the end.
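
The difference between the two length functions can be sketched as follows. The snippet is intended for method if_oo_adt_classrun~main( ), and the variable names are assumptions made for this illustration.

```abap
DATA text_string TYPE string      VALUE `ABAP  `.   " two trailing blanks
DATA text_char   TYPE c LENGTH 10 VALUE 'ABAP'.     " padded with blanks to length 10

" Type string: only numofchar( ) ignores the trailing blanks
out->write( |strlen    (string) = { strlen( text_string ) }| ).     " 6
out->write( |numofchar (string) = { numofchar( text_string ) }| ).  " 4

" Fixed length: both functions ignore the trailing blanks
out->write( |strlen    (c)      = { strlen( text_char ) }| ).       " 4
out->write( |numofchar (c)      = { numofchar( text_char ) }| ).    " 4
```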

Groups of search functions

There are two groups of search functions for strings:

Function COUNT( ) and the functions starting with COUNT_...( ) return the total number of occurrences of a search argument. Function FIND( ) and the functions starting with FIND_...( ) return the position (offset) of one particular occurrence of a search argument.

Parameter sub in search functions

Functions COUNT( ) and FIND( ) can either search for a substring (optional parameter sub) or a regular expression (optional parameter pcre). We will discuss the regular expressions later in this course.

In the case of functions ending with _ANY_OF, the name of parameter sub is a bit misleading. Here, the value of parameter sub is not a substring but rather a list of characters. Instead of searching for the substring, that is, the exact combination of characters, these functions evaluate the individual characters and consider each character that is part of the provided list a match. The functions ending with _ANY_NOT_OF work in a similar way, but here only those characters that are different from all the characters in the list are considered a match.

Note
As with the function FIND( ), you can use the optional parameter occ to specify which occurrence of sub you want to consider.
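
To make the different meaning of sub more tangible, here is a small sketch that passes the same list of digits to a substring search and to the _ANY_OF variants. The sample text is an assumption; the snippet is meant for method if_oo_adt_classrun~main( ).

```abap
DATA text   TYPE string VALUE `ABAP 7.58`.
DATA digits TYPE string VALUE `0123456789`.

" find( ) looks for the exact substring 0123456789: no match, result -1
out->write( |find            = { find( val = text sub = digits ) }| ).

" find_any_of( ) looks for any single character of the list: first digit at offset 5
out->write( |find_any_of     = { find_any_of( val = text sub = digits ) }| ).

" count_any_of( ) counts all characters that are part of the list: 3 digits
out->write( |count_any_of    = { count_any_of( val = text sub = digits ) }| ).

" find_any_not_of( ) returns the first character that is not in the list ABP: the blank at offset 4
out->write( |find_any_not_of = { find_any_not_of( val = text sub = `ABP` ) }| ).
```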

Watch some examples of the description functions.

Try It Out: Description Functions

  1. Create a new global class that implements interface IF_OO_ADT_CLASSRUN.
  2. Copy the following code snippet to the implementation part of method if_oo_adt_classrun~main( ):
    Code snippet
    
        DATA result TYPE i.
    
        DATA text      TYPE string VALUE `  ABAP  `.
        DATA substring TYPE string VALUE `AB`.
        DATA offset    TYPE i      VALUE 1.
    
    * Call different description functions
    ******************************************************************************
    *    result = strlen(     text ).
    *    result = numofchar(  text ).
    
        result = count(             val = text sub = substring off = offset ).
    *    result = find(             val = text sub = substring off = offset ).
    
    *    result = count_any_of(     val = text sub = substring off = offset ).
    *    result = find_any_of(      val = text sub = substring off = offset ).
    
    *    result = count_any_not_of( val = text sub = substring off = offset ).
    *    result = find_any_not_of(  val = text sub = substring off = offset ).
    
        out->write( |Text      = `{ text }`| ).
        out->write( |Substring = `{ substring }` | ).
        out->write( |Offset    = { offset } | ).
        out->write( |Result    = { result } | ).
    
  3. Select Ctrl + F3 to activate the class and F9 to execute it as a console app.
  4. Analyze the console output.
  5. Remove the comments in front of the different value assignments of variable result and activate the program again.
  6. Set a breakpoint and analyze the content of data object result after each assignment. Can you understand the different results?
  7. Play around with the start values for text and substring to get familiar with the functions.

Processing Functions

Hint
A good use case for SEGMENT( ) is the import and processing of data in comma-separated values (CSV) format.
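
As a rough sketch of this use case, the following snippet splits one line of comma-separated data into its fields. The sample line is an assumption, and the approach only covers simple data without quoted fields or separators inside the values. The code is meant for method if_oo_adt_classrun~main( ).

```abap
DATA csv_line TYPE string VALUE `LH,0400,FRA,JFK`.

" Number of fields = number of separators + 1
DATA(field_count) = count( val = csv_line sub = `,` ) + 1.

" Read the fields one by one; index counts the segments from the left
DO field_count TIMES.
  out->write( |Field { sy-index }: { segment( val = csv_line sep = `,` index = sy-index ) }| ).
ENDDO.

" A negative index counts the segments from the right
out->write( |Last field: { segment( val = csv_line sep = `,` index = -1 ) }| ).
```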

Try It Out: Processing Functions

  1. Create a new global class that implements interface IF_OO_ADT_CLASSRUN.
  2. Copy the following code snippet to the implementation part of method if_oo_adt_classrun~main( ):
    Code snippet
    
        DATA text TYPE string      VALUE ` SAP BTP,   ABAP Environment  `.
    
    * Change Case of characters
    **********************************************************************
        out->write( |TO_UPPER         = {   to_upper(  text ) } | ).
        out->write( |TO_LOWER         = {   to_lower(  text ) } | ).
        out->write( |TO_MIXED         = {   to_mixed(  text ) } | ).
        out->write( |FROM_MIXED       = { from_mixed(  text ) } | ).
    
    
    * Change order of characters
    **********************************************************************
        out->write( |REVERSE             = {  reverse( text ) } | ).
        out->write( |SHIFT_LEFT  (places)= {  shift_left(  val = text places   = 3  ) } | ).
        out->write( |SHIFT_RIGHT (places)= {  shift_right( val = text places   = 3  ) } | ).
        out->write( |SHIFT_LEFT  (circ)  = {  shift_left(  val = text circular = 3  ) } | ).
        out->write( |SHIFT_RIGHT (circ)  = {  shift_right( val = text circular = 3  ) } | ).
    
    
    * Extract a Substring
    **********************************************************************
        out->write( |SUBSTRING       = {  substring(        val = text off = 4 len = 10 ) } | ).
        out->write( |SUBSTRING_FROM  = {  substring_from(   val = text sub = 'ABAP'     ) } | ).
        out->write( |SUBSTRING_AFTER = {  substring_after(  val = text sub = 'ABAP'     ) } | ).
        out->write( |SUBSTRING_TO    = {  substring_to(     val = text sub = 'ABAP'     ) } | ).
        out->write( |SUBSTRING_BEFORE= {  substring_before( val = text sub = 'ABAP'     ) } | ).
    
    
    * Condense, REPEAT and Segment
    **********************************************************************
        out->write( |CONDENSE         = {   condense( val = text ) } | ).
        out->write( |REPEAT           = {   repeat(   val = text occ = 2 ) } | ).
    
        out->write( |SEGMENT1         = {   segment(  val = text sep = ',' index = 1 ) } |  ).
        out->write( |SEGMENT2         = {   segment(  val = text sep = ',' index = 2 ) } |  ).
    
  3. Select Ctrl + F3 to activate the class and F9 to execute it as a console app.
  4. Analyze the console output. Can you understand the results of the different functions?
  5. Play around with the start value for text and the values for the different parameters to get familiar with the functions and their results.

Predicate Functions

Predicate function CONTAINS( ) is true if a specified substring appears at least once in the input string. In other words, if CONTAINS( ) is true, function FIND( ) returns an offset of 0 or greater; if CONTAINS( ) is false, FIND( ) returns -1.

The same relation exists between predicate functions CONTAINS_ANY_OF( ) and CONTAINS_ANY_NOT_OF( ), and the corresponding description functions FIND_ANY_OF( ) and FIND_ANY_NOT_OF( ).

MATCHES( ) is a dedicated predicate function to compare the complete input string to a regular expression. We will look at regular expressions in the next section.
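
The relation between the predicate functions and the corresponding description functions can be sketched like this. The sample text and the search values are assumptions; the snippet is meant for method if_oo_adt_classrun~main( ).

```abap
DATA text TYPE string VALUE `ABAP Environment`.

IF contains( val = text sub = `ABAP` ).
  " The match is at the very beginning, so find( ) returns offset 0
  out->write( |ABAP found, find( ) returns { find( val = text sub = `ABAP` ) }| ).
ENDIF.

IF NOT contains( val = text sub = `Java` ).
  " No match: find( ) returns -1
  out->write( |Java not found, find( ) returns { find( val = text sub = `Java` ) }| ).
ENDIF.

" True as soon as one character of the list occurs in text (here: n)
IF contains_any_of( val = text sub = `xyzn` ).
  out->write( |Listed character at offset { find_any_of( val = text sub = `xyzn` ) }| ).
ENDIF.

" matches( ) compares the complete string with the regular expression
IF matches( val = text pcre = `ABAP .+` ).
  out->write( `The text starts with ABAP, followed by a blank and at least one more character` ).
ENDIF.
```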

Regular Expressions in ABAP

A regular expression, or Regex for short, is a pattern of literal and special characters that describes a set of character strings. Regular expressions are often used in text searches, in "search-and-replace" operations, or to validate the content of character-like fields. A search using a regular expression is more powerful than a search for a simple character string because the regular expression represents a greater (potentially infinite) number of character strings and searches for them concurrently.

The syntax of regular expressions is widely standardized but there are differences between common standards such as Perl or POSIX and different syntax flavors such as XPath or XSD regular expressions.

In ABAP, the preferred standard is Perl Compatible Regular Expressions (PCRE), but ABAP also supports some other standards and flavors.

Note
The examples in this course follow the PCRE standard. For other standards and syntax flavors see the ABAP keyword documentation.

Examples of Regular Expressions

Let's have a look at some examples of regular expressions:

  • The simplest regular expression is a literal.
  • By using a pair of square brackets ( [ ] ) you can specify a set of characters that are allowed at the specific position. The first example specifies a character set that consists of the two individual characters B and S. Therefore, either letter B or S is allowed between the two letters A.
  • The next example uses a hyphen sign (-) to specify a range of allowed characters. B-D defines a character set that includes B, D, and all characters between them in a lexical order. Of course, you can combine individual values and several ranges. [AL-NRX-Z], for example, defines a character set that consists of letters A, L, M, N, R, X, Y, Z.
  • The next example uses a ^ sign after the opening bracket to define an exclusion list rather than a positive list. [^LX] excludes characters L and X but allows all other characters.
  • To specify a character set that includes all available characters and excludes none, regular expressions use a single period (.). In the example, any character is allowed between the two As, but that character is required and only one such character is allowed.
  • For more flexibility, you can introduce quantifiers. A quantifier is a pair of curly brackets ( {} ). It specifies how often the element on its left has to be repeated. In the example, the 3 in the quantifier means that exactly 3 letter Bs are required between the two As.
  • Instead of one fixed value, a quantifier can specify a lower and upper value for the repetition. In the example, at least one B is required between the two As, but two Bs are also allowed.
  • When you omit the upper value, you only set a minimum. In the example, there can be any number of letters B between the two As but there must be at least one.
  • With ordinary brackets ( ) you group several elements together. A good use case is a group on the left of a quantifier. Instead of a single character, the entire group is repeated based on the numbers in the quantifier. In the example, the group BA is repeated between letters A and P.
  • Finally, we want to look at the union operator |. Outside of a group, it combines two patterns and unites their result sets. In the example, we use the union operator inside of a group. On the left of the closing literal AP, two patterns are allowed: either the literal AB or the single letter S.

Note
These examples are rather simple and their purpose is to illustrate some basic concepts. Many other concepts exist and by combining all these concepts you can build extremely powerful expressions. For the complete picture, start with the ABAP documentation from where you can navigate to further sources of information.
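
The concepts from the list above can be tried out with predicate function matches( ). The following sketch checks a few input values against patterns; the patterns and values are hypothetical examples written for this illustration and are not necessarily the ones from the lesson's figure. The snippet is meant for method if_oo_adt_classrun~main( ).

```abap
TYPES: BEGIN OF ty_check,
         pcre  TYPE string,
         input TYPE string,
       END OF ty_check,
       tt_checks TYPE STANDARD TABLE OF ty_check WITH EMPTY KEY.

" Hypothetical pattern/input pairs, one per concept
DATA(checks) = VALUE tt_checks(
  ( pcre = `ABAP`      input = `ABAP`   )   " literal: match
  ( pcre = `A[BS]A`    input = `ASA`    )   " character set: match
  ( pcre = `A[B-D]A`   input = `AEA`    )   " range B to D: no match
  ( pcre = `A[^LX]A`   input = `ALA`    )   " exclusion list: no match
  ( pcre = `A.A`       input = `AA`     )   " one character is required: no match
  ( pcre = `AB{3}A`    input = `ABBBA`  )   " exactly three Bs: match
  ( pcre = `AB{1,2}A`  input = `ABBBA`  )   " one or two Bs: no match
  ( pcre = `AB{1,}A`   input = `ABBBBA` )   " at least one B: match
  ( pcre = `A(BA){2}P` input = `ABABAP` )   " group BA repeated twice: match
  ( pcre = `(AB|S)AP`  input = `SAP`    ) ). " union inside a group: match

LOOP AT checks INTO DATA(check).
  " matches( ) is true only if the complete input matches the pattern
  DATA(verdict) = COND string( WHEN matches( val = check-input pcre = check-pcre )
                               THEN `match`
                               ELSE `no match` ).
  out->write( |{ check-pcre WIDTH = 12 }{ check-input WIDTH = 10 }{ verdict }| ).
ENDLOOP.
```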

Built-In String Functions and Regular Expressions

Many built-in string functions offer a parameter pcre, especially when they perform searches. Parameter pcre is often an alternative to parameter sub. When you pass a sequence of characters to parameter pcre instead of parameter sub, the function interprets the input as a Perl Compatible Regular Expression (PCRE) and searches for substrings that match this pattern. If a function has both parameters, sub and pcre, you can only supply one of them.

Note
In the ABAP documentation you can find more parameters for regular expressions. One of them, parameter regex, was used for POSIX regular expressions. It is obsolete and you should not use it.

For functions like FIND( ) and COUNT( ) it is obvious that they perform searches for substrings. But other functions perform searches too. Examples are predicate function CONTAINS( ) and processing functions REPLACE( ), SUBSTRING_FROM( ), SUBSTRING_AFTER( ), and so on.

There are also built-in string functions that only work with regular expressions. Predicate function MATCHES( ) is true if the complete character string matches the regular expression. MATCH( ) works similarly to FIND( ). It searches in a character string for a substring that matches the regular expression. But where FIND( ) returns the offset of the match, MATCH( ) returns the matching substring.
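
A short sketch of parameter pcre in different functions, assuming a sample text that contains two times in hh:mm format. The text, the pattern, and the variable names are assumptions for illustration; the snippet is meant for method if_oo_adt_classrun~main( ).

```abap
DATA text TYPE string VALUE `Flight LH0400 departs at 10:30, flight UA0941 at 13:45`.

" Two digits, a colon, two digits
DATA(time_pattern) = `\d{2}:\d{2}`.

" find( ) returns the offset of the first match
out->write( |find    = { find(  val = text pcre = time_pattern ) }| ).

" count( ) returns the number of matches (2 in this text)
out->write( |count   = { count( val = text pcre = time_pattern ) }| ).

" match( ) returns the matching substring; occ = 2 selects the second match (13:45)
out->write( |match   = { match( val = text pcre = time_pattern occ = 2 ) }| ).

" replace( ) with occ = 0 replaces every match of the pattern
out->write( |replace = { replace( val = text pcre = time_pattern with = `hh:mm` occ = 0 ) }| ).
```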

This video illustrates some examples with parameter PCRE.

Note
The matches( ) function is a sharp and elegant tool when you have to implement complicated validations for character-like input fields.
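
As an illustration of such a validation, the sketch below checks whether input values consist of exactly two uppercase letters followed by four digits. This rule and the sample inputs are assumptions made up for the example; the snippet is meant for method if_oo_adt_classrun~main( ).

```abap
DATA inputs TYPE STANDARD TABLE OF string WITH EMPTY KEY.

" Hypothetical validation rule: two uppercase letters followed by four digits
DATA(flight_pattern) = `[A-Z]{2}\d{4}`.

inputs = VALUE #( ( `LH0400` ) ( `lh0400` ) ( `LH400` ) ( `XLH0400` ) ).

LOOP AT inputs INTO DATA(input).
  " matches( ) checks the complete value, so extra characters make it invalid
  IF matches( val = input pcre = flight_pattern ).
    out->write( |{ input }: valid| ).
  ELSE.
    out->write( |{ input }: invalid| ).
  ENDIF.
ENDLOOP.
```

Only the first value passes: the second fails because the comparison is case-sensitive by default, the third has too few digits, and the fourth has an extra leading character.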
