C$REGEXP

This routine allows you to search strings using regular expressions.

Usage

CALL "C$REGEXP" 
    USING OP-CODE, parameters
    GIVING return-value

Parameters

Op-codes specify the operation to perform. Each operation is defined in acucobol.def and is described in detail below. Op-codes include:
Code  Operation
1     AREGEXP-GET-LEVEL
2     AREGEXP-COMPILE
3     AREGEXP-MATCH
4     AREGEXP-RELEASE-MATCH
5     AREGEXP-RELEASE
6     AREGEXP-NUMGROUPS
7     AREGEXP-GETMATCH
20    AREGEXP-LAST-ERROR

Parameters vary depending on the operation selected. They provide information and hold results.

return-value: Numeric data item.

Unless otherwise noted, each operation returns a value or a status in return-value. Its contents vary by operation and the result of the operation.

Description

This routine allows you to use a regular expression to search a text string.

A regular expression is a formula for matching strings that have a certain pattern. For a complete description of regular expressions, see the POSIX 1003.2 standard appropriate for your platform. Windows platforms use the CAtlRegExp library; UNIX platforms use the POSIX C routines native to the platform.

A simple use of C$REGEXP is outlined in the following steps.

  1. Use the AREGEXP-GET-LEVEL op-code to validate that the host platform provides support for regular expressions.
  2. Validate and compile your regular expression with op-code AREGEXP-COMPILE. Your program should include an error-handling routine in case the expression fails to compile.
  3. Use op-code AREGEXP-MATCH to apply a compiled regular expression to a string to search for a match. You may want to do this iteratively to find all matches in the string.
  4. Use op-codes AREGEXP-NUMGROUPS and AREGEXP-GETMATCH to work with subexpression matches.
  5. Manage the memory used by this routine with op-codes AREGEXP-RELEASE-MATCH and AREGEXP-RELEASE.
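
On UNIX hosts, C$REGEXP is described as using the platform's native POSIX C routines. The steps above can therefore be sketched with those routines directly. This C example illustrates that assumption and is not the C$REGEXP implementation:

- `regcomp` plays the role of AREGEXP-COMPILE.
- Repeated `regexec` calls that resume past each match play the role of AREGEXP-MATCH.
- `regfree` corresponds to AREGEXP-RELEASE.

The helper name `count_matches` is hypothetical.

```c
#include <regex.h>

/* Hypothetical helper: count the non-overlapping matches of an extended
   regular expression in text. Returns -1 if the pattern does not compile. */
int count_matches(const char *pattern, const char *text)
{
    regex_t re;        /* compiled expression (compare AREGEXP-COMPILE) */
    regmatch_t m;      /* offsets of the whole match                    */
    const char *p = text;
    int count = 0;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;     /* error-handling path for an invalid expression */

    /* Search repeatedly, resuming just past the previous match.
       REG_NOTBOL stops '^' from matching at a resumed position. */
    while (*p != '\0' &&
           regexec(&re, p, 1, &m, p == text ? 0 : REG_NOTBOL) == 0) {
        count++;
        if (m.rm_eo > m.rm_so)
            p += m.rm_eo;              /* non-empty match: skip past it */
        else if (p[m.rm_so] != '\0')
            p += m.rm_so + 1;          /* empty match: step one byte    */
        else
            break;                     /* empty match at end of text    */
    }

    regfree(&re);      /* compare AREGEXP-RELEASE */
    return count;
}
```

For example, `count_matches("[0-9]+", "a1b22c333")` returns 3. Passing `REG_NOTBOL` on resumed searches keeps `^` from matching partway through the original string, so `count_matches("^a", "aaa")` returns 1.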

Op-codes and Parameters

AREGEXP-GET-LEVEL (op-code 1)

This operation indicates whether regular expression support is available on the host. Its usage is:

CALL "C$REGEXP" USING AREGEXP-GET-LEVEL GIVING return-value

The value of return-value can be one of the following (defined in acucobol.def):

AREGEXP-NONE       0   Regular expression processing is not available
AREGEXP-WINDOWS    1   Windows regular expressions are supported
AREGEXP-POSIX      2   POSIX regular expressions are supported

AREGEXP-COMPILE (op-code 2)

This operation compiles a regular expression to ensure that it has a valid form, returning a handle to the compiled regular expression or NULL if there is an error. Its usage is:

CALL "C$REGEXP" USING AREGEXP-COMPILE, reg-expr, flags
                GIVING return-value

reg-expr

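Must be a NULL-terminated regular expression. The terminator is required because trailing spaces are significant in a regular expression, so the end of the expression cannot be found by trimming trailing spaces from the field.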
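
The terminator matters, and the POSIX routines that C$REGEXP is described as using on UNIX show why. Consider a space-padded COBOL PIC X field: if the pattern were taken at its full length, it would carry trailing spaces. In this C sketch (the helper `pattern_matches` is hypothetical), such a padded pattern no longer matches text that the trimmed pattern matches. Each trailing space is an ordinary character that must also appear in the input.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: 1 if pattern matches somewhere in text,
   0 if it does not, -1 if the pattern does not compile. */
int pattern_matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    /* REG_NOSUB: only a yes/no answer is needed, no offsets. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

Here `pattern_matches("abc", "xabcx")` is 1, but `pattern_matches("abc   ", "xabcx")` is 0: the three trailing spaces became part of the expression.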

flags

(Optional) is the sum of one or more of the following values (defined in acucobol.def):
AREGEXP_COMPILE_IGNORECASE    1    Ignore case when matching patterns. (Windows or UNIX)
AREGEXP_COMPILE_BASIC         2    Change the type of regular expression from extended to basic. (UNIX only) (For an explanation of extended and basic, see the POSIX 1003.2 standard.)
AREGEXP_COMPILE_NO_SPECIAL    4    Treat all characters as ordinary characters with no special meaning. (UNIX only)
AREGEXP_COMPILE_NO_SUB        8    When matching, determine only if there is a match, without returning the offsets of the match. (UNIX only)
AREGEXP_COMPILE_NEWLINE       16   Treat newlines as special (end-of-line marker) characters. (UNIX only)
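
Because flags is a sum of distinct powers of two, each option can be tested as a separate bit. The C sketch below maps a flags sum onto POSIX `regcomp` options. The mapping is an assumption based on the descriptions above and is not taken from the C$REGEXP implementation; `to_posix_cflags` is a hypothetical helper. NO_SPECIAL is left unmapped because POSIX defines no standard option for it (BSD and macOS offer the nonstandard `REG_NOSPEC`).

```c
#include <regex.h>

/* Flag values as listed for AREGEXP-COMPILE (pass the sum of one or more). */
enum {
    AREGEXP_COMPILE_IGNORECASE = 1,
    AREGEXP_COMPILE_BASIC      = 2,
    AREGEXP_COMPILE_NO_SPECIAL = 4,
    AREGEXP_COMPILE_NO_SUB     = 8,
    AREGEXP_COMPILE_NEWLINE    = 16
};

/* Hypothetical helper: translate a flags sum into POSIX regcomp() cflags.
   Illustrative only; NO_SPECIAL is deliberately ignored (no POSIX flag). */
int to_posix_cflags(int flags)
{
    int cflags = 0;

    if (!(flags & AREGEXP_COMPILE_BASIC))
        cflags |= REG_EXTENDED;        /* extended syntax unless BASIC is set */
    if (flags & AREGEXP_COMPILE_IGNORECASE)
        cflags |= REG_ICASE;
    if (flags & AREGEXP_COMPILE_NO_SUB)
        cflags |= REG_NOSUB;           /* report match/no-match only */
    if (flags & AREGEXP_COMPILE_NEWLINE)
        cflags |= REG_NEWLINE;         /* newline acts as a line boundary */
    return cflags;
}
```

For example, a flags value of 17 (1 + 16) requests case-insensitive, newline-sensitive extended matching: `REG_EXTENDED | REG_ICASE | REG_NEWLINE`.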

return-value contains a handle to the compiled expression, or NULL if an error occurred.

AREGEXP-MATCH (op-code 3)

This operation applies a compiled regular expression to a string and returns a match handle. To determine whether a match was found, check match-start: a value of 0 means there is no match. Its usage is:

CALL "C$REGEXP" USING AREGEXP-MATCH,
   reg-expr-handle, string, length, match-start, match-end
   GIVING return-value
reg-expr-handle is a handle to a regular expression returned by AREGEXP-COMPILE.
string is the string to test for a match.
length is the length of string. If length is zero, the size of string is used.
match-start returns the position in string where the matched text begins.
match-end returns the position one byte past the end of the matched text. To search the rest of string for additional matches, call AREGEXP-MATCH again starting at the match-end offset.
return-value contains a handle to the match or zero if no match is found or an error occurred.
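
POSIX `regexec` can illustrate the match-start and match-end convention. It reports 0-based offsets in `rm_so` and `rm_eo`, and `rm_eo` already points one past the match. The sketch below converts these offsets to 1-based positions. That conversion is an assumption inferred from match-start being 0 when there is no match; this page does not state it. `first_match` is a hypothetical helper.

```c
#include <regex.h>

/* Hypothetical helper: report the first match as 1-based positions,
   match_start = 0 meaning no match, match_end one byte past the match.
   Returns 1 on a match, 0 on no match, -1 if the pattern does not compile. */
int first_match(const char *pattern, const char *text,
                int *match_start, int *match_end)
{
    regex_t re;
    regmatch_t m;

    *match_start = *match_end = 0;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    if (regexec(&re, text, 1, &m, 0) == 0) {
        *match_start = (int)m.rm_so + 1;   /* POSIX offsets are 0-based */
        *match_end   = (int)m.rm_eo + 1;   /* one byte past the match   */
    }
    regfree(&re);
    return *match_start != 0;
}
```

For `first_match("b+", "abbc", &s, &e)`, s is 2 and e is 4. The matched text "bb" occupies positions 2 and 3, so a follow-up search would start at position 4.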

AREGEXP-RELEASE-MATCH (op-code 4)

This operation frees memory that is allocated when AREGEXP-MATCH is called. Return-value is not used. Its usage is:

CALL "C$REGEXP" USING AREGEXP-RELEASE-MATCH match-handle
match-handle is a handle to a match returned by AREGEXP-MATCH.

AREGEXP-RELEASE (op-code 5)

This operation frees the memory allocated when AREGEXP-COMPILE is called. Return-value is not used. Its usage is:

CALL "C$REGEXP" USING AREGEXP-RELEASE reg-expr-handle
reg-expr-handle is a handle to a regular expression returned by AREGEXP-COMPILE.

AREGEXP-NUMGROUPS (op-code 6)

This operation returns the number of substrings that matched any subgroups in the regular expression. Its usage is:

CALL "C$REGEXP" USING AREGEXP-NUMGROUPS match-handle
                GIVING return-value
match-handle is a handle returned by AREGEXP-MATCH.
return-value returns the number of subgroup matches.

Depending on the construction of a regular expression, it is possible for a subgroup of the regular expression to match multiple substrings. This operation reports the number of instances found in the last AREGEXP-MATCH operation. For more information, rules, and examples, see the POSIX 1003.2 documentation or one of the many books available on regular expressions.
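
For comparison, POSIX exposes the number of parenthesized groups in the `re_nsub` field of a compiled `regex_t`. It marks a group that took no part in a match with `rm_so == -1`. POSIX keeps only the last substring captured by a repeated group such as `(ab)+`, so it cannot count several instances of one group. The hypothetical `matched_groups` helper below therefore counts participating groups, and this may differ from what AREGEXP-NUMGROUPS reports for repeated subgroups.

```c
#include <regex.h>
#include <stddef.h>

#define MAX_GROUPS 10   /* whole match plus up to 9 groups */

/* Hypothetical helper: how many parenthesized subexpressions took part
   in the first match of pattern in text. Returns 0 if there is no match
   and -1 if the pattern does not compile. */
int matched_groups(const char *pattern, const char *text)
{
    regex_t re;
    regmatch_t m[MAX_GROUPS];
    int n = 0;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    if (regexec(&re, text, MAX_GROUPS, m, 0) == 0) {
        /* m[0] is the whole match; m[1..re_nsub] are the groups. */
        for (size_t i = 1; i <= re.re_nsub && i < MAX_GROUPS; i++)
            if (m[i].rm_so != -1)      /* -1: group did not participate */
                n++;
    }
    regfree(&re);
    return n;
}
```

With the alternation `(a)|(b)` applied to "b", only the second group participates, so the count is 1.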

AREGEXP-GETMATCH (op-code 7)

This operation returns the start and end indices, within the string passed to AREGEXP-MATCH, of the text that matched a given subexpression of the regular expression. Its usage is:

CALL "C$REGEXP" 
    USING AREGEXP-GETMATCH, match-handle, group, 
    idx-start, idx-end
    GIVING return-value

The parameters are defined as follows:

match-handle is a handle returned by AREGEXP-MATCH.
group is a number between 1 and the value returned by AREGEXP-NUMGROUPS.
idx-start returns the index in the string where the text matching the subexpression begins.
idx-end returns the index in the string where the text matching the subexpression ends.
return-value returns 1 if the operation succeeds, and zero if there is an error.
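
The same idea in POSIX terms: entry `group` of the `regmatch_t` array filled by `regexec` holds the offsets of that subexpression's match, with groups numbered from 1 as for AREGEXP-GETMATCH. The hypothetical `get_group` helper below copies that substring out. Like AREGEXP-GETMATCH, it returns 1 on success and 0 on error. It uses POSIX's one-past-the-end `rm_eo`; whether C$REGEXP's idx-end is inclusive or exclusive is not stated on this page.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

#define MAX_GROUPS 10   /* whole match plus up to 9 groups */

/* Hypothetical helper: copy the text matched by subexpression `group`
   (numbered from 1) into out. Returns 1 on success, 0 on error. */
int get_group(const char *pattern, const char *text, int group,
              char *out, size_t outsize)
{
    regex_t re;
    regmatch_t m[MAX_GROUPS];
    int ok = 0;

    if (group < 1 || group >= MAX_GROUPS || outsize == 0)
        return 0;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return 0;
    if ((size_t)group <= re.re_nsub &&          /* group must exist     */
        regexec(&re, text, MAX_GROUPS, m, 0) == 0 &&
        m[group].rm_so != -1) {                 /* and have participated */
        size_t len = (size_t)(m[group].rm_eo - m[group].rm_so);
        if (len < outsize) {                    /* room for terminator  */
            memcpy(out, text + m[group].rm_so, len);
            out[len] = '\0';
            ok = 1;
        }
    }
    regfree(&re);
    return ok;
}
```

Applied to "mail bob@example.com now" with the pattern `([a-z]+)@([a-z.]+)`, group 1 yields "bob" and group 2 yields "example.com". Group 3 fails because the pattern has only two groups.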

AREGEXP-LAST-ERROR (op-code 20)

This operation returns the code of the last error reported by a call to C$REGEXP. Its usage is:

CALL "C$REGEXP" USING AREGEXP-LAST-ERROR GIVING return-value

The error value is returned in return-value. The possible error values (defined in acucobol.def) have the following meanings:

AREGEXP-ERROR-NO-ERROR            0    No error
AREGEXP-ERROR-NO-MEMORY           1    Insufficient memory to handle the request
AREGEXP-ERROR-BRACE-EXPECTED      2    A closing brace is missing
AREGEXP-ERROR-PAREN-EXPECTED      3    A closing parenthesis is missing
AREGEXP-ERROR-BRACKET-EXPECTED    4    A closing bracket is missing
AREGEXP-ERROR-UNEXPECTED          5    An unknown error occurred
AREGEXP-ERROR-EMPTY-RANGE         6    An empty range was given
AREGEXP-ERROR-INVALID-GROUP       7    The group provided was invalid
AREGEXP-ERROR-INVALID-RANGE       8    An invalid range was given
AREGEXP-ERROR-EMPTY-REPEATOP      9    A repeat operator was given on an empty subexpression
AREGEXP-ERROR-INVALID-INPUT       10   The input was invalid
AREGEXP-ERROR-INVALID-HANDLE      11   The handle is not a regular expression handle or a match handle
AREGEXP-ERROR-INVALID-ARGUMENT    12   One of the arguments given is invalid
AREGEXP-ERROR-INVALID-CALL-SEQ    13   The C$REGEXP operations were called in an invalid sequence
AREGEXP-ERROR-NO-MATCH            14   The regular expression did not find a match in the given string
Note: If the error code returned does not match a value in the list, it may be that the value is coming from the host's regular expression library. See the documentation for the host's regular expression library.
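
On a UNIX host, the host library is the POSIX regex implementation. Those routines report their own codes, for example `REG_EBRACK` for an unmatched bracket, roughly the counterpart of AREGEXP-ERROR-BRACKET-EXPECTED. `regerror` turns such a code into a readable message. The sketch below assumes that unlisted values are these POSIX codes; this page does not confirm it. The helper `compile_error` is hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: compile pattern and return the POSIX error code
   (0 on success). On failure, msg receives the host library's message;
   on success, msg is set to the empty string. */
int compile_error(const char *pattern, char *msg, size_t msgsize)
{
    regex_t re;
    int rc = regcomp(&re, pattern, REG_EXTENDED);

    if (rc == 0) {
        regfree(&re);                  /* only a successful compile is freed */
        if (msgsize > 0)
            msg[0] = '\0';
    } else {
        regerror(rc, &re, msg, msgsize);   /* text of the host's message */
    }
    return rc;
}
```

Compiling "[abc" returns `REG_EBRACK` together with a library-specific message. The message wording varies between platforms, so programs should test the code rather than the text.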