NAME

       Tcl_RegExpMatch,     Tcl_RegExpCompile,    Tcl_RegExpExec,
       Tcl_RegExpRange, Tcl_GetRegExpFromObj, Tcl_RegExpMatchObj,
       Tcl_RegExpExecObj,  Tcl_RegExpGetInfo  -  Pattern matching
       with regular expressions


SYNOPSIS

       #include <tcl.h>

       int
       Tcl_RegExpMatchObj(interp, strObj, patObj)

       int
       Tcl_RegExpMatch(interp, string, pattern)

       Tcl_RegExp
       Tcl_RegExpCompile(interp, pattern)

       int
       Tcl_RegExpExec(interp, regexp, string, start)

       Tcl_RegExpRange(regexp, index, startPtr, endPtr)

       Tcl_RegExp                                                 |
       Tcl_GetRegExpFromObj(interp, patObj, cflags)               |

       int                                                        |
       Tcl_RegExpExecObj(interp, regexp, objPtr, offset, nmatches, eflags)|

       Tcl_RegExpGetInfo(regexp, infoPtr)                         |



ARGUMENTS

       Tcl_Interp   *interp   (in)      Tcl  interpreter  to  use
                                        for error reporting.  The
                                        interpreter may  be  NULL
                                        if  no error reporting is
                                        desired.                  |

       Tcl_Obj      *strObj   (in/out)                                   ||
                                        Refers to the object from |
                                        which to get  the  string |
                                        to  search.  The internal |
                                        representation   of   the |
                                        object  may  be converted |
                                        to a  form  that  can  be |
                                        efficiently searched.     |

       Tcl_Obj      *patObj   (in/out)                                   ||
                                        Refers to the object from |
                                        which to get a regular    |
                                        expression.  The compiled |
                                        regular expression is     |
                                        cached in the object.     |

       char         *string   (in)      String  to  check  for  a
                                        match   with   a  regular
                                        expression.

       char         *pattern  (in)      String in the form  of  a
                                        regular  expression  pat­
                                        tern.

       Tcl_RegExp   regexp    (in)      Compiled regular  expres­
                                        sion.    Must  have  been
                                        returned  previously   by
                                        Tcl_GetRegExpFromObj   or
                                        Tcl_RegExpCompile.

       char         *start    (in)      If string is just a  por­
                                        tion    of   some   other
                                        string,   this   argument
                                        identifies  the beginning
                                        of the larger string.  If
                                        it   isn't  the  same  as
                                        string, then no ^ matches
                                        will be allowed.

       int          index     (in)      Specifies  which range is
                                        desired:   0  means   the
                                        range   of   the   entire
                                        match, 1 or greater means
                                        the  range that matched a
                                        parenthesized sub-expres­
                                        sion.

       char         **startPtr(out)     The  address of the first
                                        character in the range is
                                        stored  here,  or NULL if
                                        there is no such range.

       char         **endPtr  (out)     The address of the  char­
                                        acter just after the last
                                        one  in  the   range   is
                                        stored  here,  or NULL if
                                        there is no such range.   |

       int          cflags    (in)                                       ||
                                        OR-ed combination of com­ |
                                        pilation flags. See below |
                                        for more information.     |

       Tcl_Obj      *objPtr   (in/out)                                   ||
                                        An object which contains  |
                                        the string to check for a |
                                        match with a regular      |
                                        expression.               |

       int          offset    (in)                                       ||
                                        The character offset into |
                                        the string where matching |
                                        should begin.  The  value |
                                        of   the  offset  has  no |
                                        impact  on   ^   matches. |
                                        This   behavior  is  con­ |
                                        trolled by eflags.        |

       int          nmatches  (in)                                       ||
                                        The  number  of  matching |
                                        subexpressions       that |
                                        should  be remembered for |
                                        later use.  If this value |
                                        is  0, then no subexpres­ |
                                        sion  match   information |
                                        will be computed.  If the |
                                        value is -1, then all  of |
                                        the  matching  subexpres­ |
                                        sions will be remembered. |
                                        Any  other  value will be |
                                        taken as the maximum num­ |
                                        ber  of subexpressions to |
                                        remember.                 |

       int          eflags    (in)                                       ||
                                        OR-ed  combination of the |
                                        values TCL_REG_NOTBOL and |
                                        TCL_REG_NOTEOL.       See |
                                        below for  more  informa­ |
                                        tion.                     |

       Tcl_RegExpInfo         *infoPtr(out)                              ||
                                        The address of the  loca­ |
                                        tion   where  information |
                                        about  a  previous  match |
                                        should   be   stored   by |
                                        Tcl_RegExpGetInfo.
_________________________________________________________________



DESCRIPTION

       Tcl_RegExpMatch determines  whether  its  string  argument
       matches pattern, where pattern is interpreted as a regular
       expression using the  rules  in  the  re_syntax  reference
       page.  If there is a match then Tcl_RegExpMatch returns 1.
       If there is no match then Tcl_RegExpMatch returns  0.   If
       an  error  occurs in the matching process (e.g. pattern is
       not  a  valid  regular  expression)  then  Tcl_RegExpMatch
       returns  -1 and leaves an error message in the interpreter
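       result.  Tcl_RegExpMatchObj is similar to  Tcl_RegExpMatch |
       except that it operates on the objects strObj and patObj   |
       instead of on strings.  Tcl_RegExpMatchObj is generally    |
       more efficient than Tcl_RegExpMatch, so it is the          |
       preferred interface.                                       |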

       Tcl_RegExpCompile,  Tcl_RegExpExec,  and   Tcl_RegExpRange
       provide  lower-level access to the regular expression pat­
       tern  matcher.   Tcl_RegExpCompile  compiles   a   regular
       expression  string  into  the internal form used for effi­
       cient pattern matching.  The return value is a  token  for
       this  compiled form, which can be used in subsequent calls
       to Tcl_RegExpExec or Tcl_RegExpRange.  If an error  occurs
       while compiling the regular expression then Tcl_RegExpCom­
       pile returns NULL and  leaves  an  error  message  in  the
       interpreter result.  Note:  the return value from Tcl_Reg­
       ExpCompile is only valid up to the next call to Tcl_RegEx­
       pCompile;   it is not safe to retain these values for long
       periods of time.

       Tcl_RegExpExec executes  the  regular  expression  pattern
       matcher.  It returns 1 if string contains a range of char­
       acters that match regexp, 0 if no match is found,  and  -1
       if  an  error occurs.  In the case of an error, Tcl_RegEx­
       pExec leaves an error message in the  interpreter  result.
       When searching a string for multiple matches of a pattern,
       it is important to distinguish between the  start  of  the
       original  string and the start of the current search.  For
       example, when searching for the  second  occurrence  of  a
       match,  the  string  argument might point to the character
       just after the first match;  however, it is important  for
       the  pattern matcher to know that this is not the start of
       the entire string, so that it doesn't allow ^ atoms in the
       pattern to match.  The start argument provides this infor­
       mation by pointing to the start of the overall string con­
       taining  string.   Start  will  be  less  than or equal to
       string;  if it is less than string then no ^ matches  will
       be allowed.

       Tcl_RegExpRange   may   be  invoked  after  Tcl_RegExpExec
       returns;  it  provides  detailed  information  about  what
       ranges  of  the  string matched what parts of the pattern.
       Tcl_RegExpRange returns a pair of  pointers  in  *startPtr
       and  *endPtr  that  identify  a range of characters in the
       source string for the most recent call to  Tcl_RegExpExec.
       Index  indicates  which  of  several ranges is desired: if
       index is 0, information  is  returned  about  the  overall
       range of characters that matched the entire pattern;  oth­
       erwise, information is returned about the range of charac­
       ters that matched the index'th parenthesized subexpression
       within the pattern.  If there is no range corresponding to
       index then NULL is stored in *startPtr and *endPtr.

       Tcl_GetRegExpFromObj, Tcl_RegExpExecObj, and               |
       Tcl_RegExpGetInfo are object interfaces that provide the   |
       most direct control of Henry Spencer's regular expression  |
       library.  Callers that need to modify compilation and      |
       execution options directly should use these interfaces     |
       instead of calling the internal regexp functions.  These   |
       interfaces handle the details of UTF to Unicode            |
       translations and improve performance through caching in    |
       the pattern and string objects.                            |

       Tcl_GetRegExpFromObj attempts to return a compiled regular |
       expression from  the  patObj.   If  the  object  does  not |
       already  contain  a  compiled  regular  expression it will |
       attempt to create one from the string in  the  object  and |
       assign  it  to  the internal representation of the patObj. |
       The return value of this function is of  type  Tcl_RegExp. |
       The  return value is a token for this compiled form, which |
       can be used in subsequent calls  to  Tcl_RegExpExecObj  or |
       Tcl_RegExpGetInfo.  If an error occurs while compiling the |
       regular expression then Tcl_GetRegExpFromObj returns  NULL |
       and  leaves  an  error  message in the interpreter result. |
       The regular expression token can be used as  long  as  the |
       internal  representation  of patObj refers to the compiled |
       form.  The cflags argument is a bitwise OR of zero or more |
       of  the  following  flags  that control the compilation of |
       patObj:                                                    |

         TCL_REG_ADVANCED                                                  ||
                Compile  advanced  regular  expressions (`AREs'). |
                This  mode  corresponds  to  the  normal  regular |
                expression  syntax accepted by the Tcl regexp and |
                regsub commands.                                  |

         TCL_REG_EXTENDED                                                  ||
                Compile  extended  regular  expressions (`EREs'). |
                This mode corresponds to the  regular  expression |
                syntax  recognized  by  Tcl  8.0 and earlier ver­ |
                sions.                                            |

         TCL_REG_BASIC                                                     ||
                Compile basic regular expressions (`BREs').  This |
                mode corresponds to the regular expression syntax |
                recognized  by common Unix utilities like sed and |
                grep.  This is the default if no flags are speci­ |
                fied.                                             |

         TCL_REG_EXPANDED                                                  ||
                Compile the regular expression (basic,  extended, |
                or advanced) using an expanded syntax that allows |
                comments and whitespace.  This mode  causes  non- |
                backslashed  non-bracket-expression  white  space |
                and #-to-end-of-line comments to be ignored.      |

         TCL_REG_QUOTE                                                     ||
                Compile  a  literal  string,  with all characters |
                treated as ordinary characters.                   |

         TCL_REG_NOCASE                                                    ||
                Compile  for  matching  that  ignores upper/lower |
                case distinctions.                                |

         TCL_REG_NEWLINE                                                   ||
                Compile   for   newline-sensitive  matching.   By |
                default, newline is a completely ordinary charac­ |
                ter  with  no  special  meaning in either regular |
                expressions or strings.   With  this  flag,  `[^' |
                bracket  expressions and `.' never match newline, |
                `^' matches an empty string after any newline  in |
                addition  to its normal function, and `$' matches |
                an empty string before any newline in addition to |
                its normal function.  TCL_REG_NEWLINE is the      |
                bitwise OR of TCL_REG_NLSTOP and TCL_REG_NLANCH.  |

         TCL_REG_NLSTOP                                                    ||
                Compile  for  partial newline-sensitive matching, |
                with the behavior of `[^' bracket expressions and |
                `.'  affected,  but  not  the behavior of `^' and |
                `$'.  In this mode, `[^' bracket expressions  and |
                `.' never match newline.                          |

         TCL_REG_NLANCH                                                    ||
                Compile  for  inverse  partial  newline-sensitive |
                matching,  with  the  behavior  of  `^'  and  `$' |
                (the ``anchors'') affected, but not the  behavior |
                of  `[^'  bracket  expressions  and `.'.  In this |
                mode `^' matches an empty string after  any  new­ |
                line  in addition to its normal function, and `$' |
                matches an empty string  before  any  newline  in |
                addition to its normal function.                  |

         TCL_REG_NOSUB                                                     ||
                Compile for matching that reports only success or |
                failure, not what was matched.  This reduces com­ |
                pile overhead and may improve performance.   Sub­ |
                sequent  calls to Tcl_RegExpGetInfo or Tcl_RegEx­ |
                pRange will not report any match information.     |

         TCL_REG_CANMATCH                                                  ||
                Compile  for  matching that reports the potential |
                to complete a partial match given more text  (see |
                below).                                           |

       Only    one    of    TCL_REG_EXTENDED,   TCL_REG_ADVANCED, |
       TCL_REG_BASIC, and TCL_REG_QUOTE may be specified.         |

       Tcl_RegExpExecObj executes the regular expression  pattern |
       matcher.  It returns 1 if objPtr contains a range of char­ |
       acters that match regexp, 0 if no match is found,  and  -1 |
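       if an error occurs.  In the case of an error,              |
       Tcl_RegExpExecObj leaves an error message in the           |
       interpreter result.  The nmatches value indicates to the   |
       matcher how many subexpressions are of interest.  If       |
       nmatches is 0, then no subexpression match information is  |
       recorded, which                                            |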
       may allow the matcher to make various  optimizations.   If |
       the  value  is  -1,  then all of the subexpressions in the |
       pattern are remembered.  If the value is a positive  inte­ |
       ger,  then  only  that  number  of  subexpressions will be |
       remembered.  Matching  begins  at  the  specified  Unicode |
       character  index  given by offset.  Unlike Tcl_RegExpExec, |
       the behavior of anchors is  not  affected  by  the  offset |
       value.   Instead the behavior of the anchors is explicitly |
       controlled by the eflags argument, which is a  bitwise  OR |
       of zero or more of the following flags:                    |

         TCL_REG_NOTBOL                                                    ||
                The starting character will not be treated as the |
                beginning  of  a  line  or  the  beginning of the |
                string, so `^' will not match there.   Note  that |
                this flag has no effect on how `\A' matches.      |

         TCL_REG_NOTEOL                                                    ||
                The last character in  the  string  will  not  be |
                treated  as  the  end of a line or the end of the |
                string, so `$' will not match there.   Note  that |
                this flag has no effect on how `\Z' matches.      |

       Tcl_RegExpGetInfo  retrieves  information  about  the last |
       match performed with a given  regular  expression  regexp. |
       The  infoPtr  argument  contains  a pointer to a structure |
       that is defined as follows:                                |

              typedef struct Tcl_RegExpInfo {                     |
                int nsubs;                                        |
                Tcl_RegExpIndices *matches;                       |
                long extendStart;                                 |
              } Tcl_RegExpInfo;                                   |

       The nsubs field contains a count of the number  of  paren­ |
       thesized subexpressions within the regular expression.  If |
       the TCL_REG_NOSUB flag was used, then this value will be   |
       zero.  The matches field points to an array of nsubs+1     |
       values that indicate the bounds of each subexpression      |
       matched.  The                                              |
       first  element in the array refers to the range matched by |
       the entire regular  expression,  and  subsequent  elements |
       refer  to  the  parenthesized  subexpressions in the order |
       that they appear in the pattern.  Each element is a struc­ |
       ture that is defined as follows:                           |

              typedef struct Tcl_RegExpIndices {                  |
                long start;                                       |
                long end;                                         |
              } Tcl_RegExpIndices;                                |

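       The start and end values are Unicode character indices     |
       relative to the offset location within objPtr where        |
       matching began.  The start index identifies the first      |
       character of the matched subexpression.  The end index     |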
       identifies  the  first  character after the matched subex­ |
       pression.  If the subexpression matched the empty  string, |
       then  start  and  end will be equal.  If the subexpression |
       did not participate in the match, then start and end  will |
       be set to -1.                                              |

       The extendStart field in Tcl_RegExpInfo is only set if the |
       TCL_REG_CANMATCH flag was used.  It  indicates  the  first |
       character  in  the string where a match could occur.  If a |
       match was found, this will be the same as the beginning of |
       the  current  match.  If no match was found, then it indi­ |
       cates the earliest point at which a match might  occur  if |
       additional text is appended to the string.


SEE ALSO

       re_syntax(n)


KEYWORDS

       match, pattern, regular expression, string, subexpression,
       Tcl_RegExpIndices, Tcl_RegExpInfo