   1  =head1 NAME
   2  
   3  perlreapi - perl regular expression plugin interface
   4  
   5  =head1 DESCRIPTION
   6  
As of Perl 5.9.5 there is a new interface for plugging in and using
regular expression engines other than the default one.
   9  
  10  Each engine is supposed to provide access to a constant structure of the
  11  following format:
  12  
    typedef struct regexp_engine {
        REGEXP* (*comp) (pTHX_ const SV * const pattern, const U32 flags);
        I32     (*exec) (pTHX_ REGEXP * const rx, char* stringarg, char* strend,
                         char* strbeg, I32 minend, SV* screamer,
                         void* data, U32 flags);
        char*   (*intuit) (pTHX_ REGEXP * const rx, SV *sv, char *strpos,
                           char *strend, U32 flags,
                           struct re_scream_pos_data_s *data);
        SV*     (*checkstr) (pTHX_ REGEXP * const rx);
        void    (*free) (pTHX_ REGEXP * const rx);
        void    (*numbered_buff_FETCH) (pTHX_ REGEXP * const rx, const I32 paren,
                                        SV * const sv);
        void    (*numbered_buff_STORE) (pTHX_ REGEXP * const rx, const I32 paren,
                                        SV const * const value);
        I32     (*numbered_buff_LENGTH) (pTHX_ REGEXP * const rx, const SV * const sv,
                                         const I32 paren);
        SV*     (*named_buff) (pTHX_ REGEXP * const rx, SV * const key,
                               SV * const value, U32 flags);
        SV*     (*named_buff_iter) (pTHX_ REGEXP * const rx, const SV * const lastkey,
                                    const U32 flags);
        SV*     (*qr_package)(pTHX_ REGEXP * const rx);
    #ifdef USE_ITHREADS
        void*   (*dupe) (pTHX_ REGEXP * const rx, CLONE_PARAMS *param);
    #endif
    } regexp_engine;
  37  
  38  When a regexp is compiled, its C<engine> field is then set to point at
  39  the appropriate structure, so that when it needs to be used Perl can find
  40  the right routines to do so.
  41  
In order to install a new regexp handler, C<$^H{regcomp}> is set
to an integer which (when cast appropriately) resolves to one of these
structures. When compiling, the C<comp> method is executed, and the
resulting regexp structure's engine field is expected to point back at
the same structure.
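
The round trip described above can be sketched in plain C without the
perl headers. Everything prefixed C<toy_> is a hypothetical stand-in
invented for this illustration (a cut-down engine table and regexp
struct), and C<intptr_t> plays the role of the integer stored in
C<$^H{regcomp}>. It is a sketch of the idea, not the code perl itself runs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-ins for REGEXP and regexp_engine. */
struct toy_regexp;

typedef struct toy_engine {
    struct toy_regexp *(*comp)(const char *pattern);
} toy_engine;

typedef struct toy_regexp {
    const toy_engine *engine;   /* must point back at the compiling engine */
    const char *precomp;        /* the pattern text that was compiled */
} toy_regexp;

static toy_regexp *toy_comp(const char *pattern);

/* The constant structure each engine exposes. */
static const toy_engine toy_engine_vtable = { toy_comp };

/* One static regexp keeps the sketch short; a real engine allocates. */
static toy_regexp toy_rx;

static toy_regexp *toy_comp(const char *pattern)
{
    toy_rx.engine  = &toy_engine_vtable;  /* point back at our own table */
    toy_rx.precomp = pattern;
    return &toy_rx;
}

/* What a pragma module would store in the hints hash: the address of
 * the engine table, expressed as an integer. */
intptr_t toy_engine_as_integer(void)
{
    return (intptr_t)&toy_engine_vtable;
}

/* The consumer side: cast the integer back to an engine pointer, call
 * its comp callback, and check that the result points back at it. */
toy_regexp *toy_compile_with_hint(intptr_t hint, const char *pattern)
{
    const toy_engine *eng = (const toy_engine *)hint;
    toy_regexp *rx;

    assert(eng != NULL && eng->comp != NULL);
    rx = eng->comp(pattern);
    assert(rx != NULL && rx->engine == eng);
    return rx;
}
```

In XS code the same two casts are normally spelled with the C<PTR2IV>
and C<INT2PTR> macros from the perl headers.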
  47  
  48  The pTHX_ symbol in the definition is a macro used by perl under threading
  49  to provide an extra argument to the routine holding a pointer back to
  50  the interpreter that is executing the regexp. So under threading all
  51  routines get an extra argument.
  52  
  53  =head1 Callbacks
  54  
  55  =head2 comp
  56  
  57      REGEXP* comp(pTHX_ const SV * const pattern, const U32 flags);
  58  
  59  Compile the pattern stored in C<pattern> using the given C<flags> and
  60  return a pointer to a prepared C<REGEXP> structure that can perform
  61  the match. See L</The REGEXP structure> below for an explanation of
  62  the individual fields in the REGEXP struct.
  63  
The C<pattern> parameter is the scalar that was used as the
pattern. Previous versions of perl would pass two C<char*> pointers
indicating the start and end of the stringified pattern. The following
snippet can be used to get the old parameters:
  68  
  69      STRLEN plen;
  70      char*  exp = SvPV(pattern, plen);
  71      char* xend = exp + plen;
  72  
Since any scalar can be passed as a pattern it's possible to implement
an engine that does something with an array (C<< "ook" =~ [ qw/ eek
hlagh / ] >>) or with the non-stringified form of a compiled regular
expression (C<< "ook" =~ qr/eek/ >>). Perl's own engine will always
stringify everything using the snippet above, but that doesn't mean
other engines have to.
  79  
  80  The C<flags> parameter is a bitfield which indicates which of the
  81  C<msixp> flags the regex was compiled with. It also contains
  82  additional info such as whether C<use locale> is in effect.
  83  
  84  The C<eogc> flags are stripped out before being passed to the comp
  85  routine. The regex engine does not need to know whether any of these
  86  are set as those flags should only affect what perl does with the
  87  pattern and its match variables, not how it gets compiled and
  88  executed.
  89  
  90  By the time the comp callback is called, some of these flags have
  91  already had effect (noted below where applicable). However most of
  92  their effect occurs after the comp callback has run in routines that
  93  read the C<< rx->extflags >> field which it populates.
  94  
  95  In general the flags should be preserved in C<< rx->extflags >> after
  96  compilation, although the regex engine might want to add or delete
  97  some of them to invoke or disable some special behavior in perl. The
  98  flags along with any special behavior they cause are documented below:
  99  
 100  The pattern modifiers:
 101  
 102  =over 4
 103  
 104  =item C</m> - RXf_PMf_MULTILINE
 105  
 106  If this is in C<< rx->extflags >> it will be passed to
 107  C<Perl_fbm_instr> by C<pp_split> which will treat the subject string
 108  as a multi-line string.
 109  
 110  =item C</s> - RXf_PMf_SINGLELINE
 111  
 112  =item C</i> - RXf_PMf_FOLD
 113  
 114  =item C</x> - RXf_PMf_EXTENDED
 115  
 116  If present on a regex C<#> comments will be handled differently by the
 117  tokenizer in some cases.
 118  
 119  TODO: Document those cases.
 120  
 121  =item C</p> - RXf_PMf_KEEPCOPY
 122  
 123  =back
 124  
 125  Additional flags:
 126  
 127  =over 4
 128  
 129  =item RXf_PMf_LOCALE
 130  
Set if C<use locale> is in effect. If present in C<< rx->extflags >>,
C<split> will use the locale dependent definition of whitespace
when RXf_SKIPWHITE or RXf_WHITE is in effect. Under ASCII, whitespace
is defined as per L<isSPACE|perlapi/ISSPACE>, and by the internal
macros C<is_utf8_space> under UTF-8 and C<isSPACE_LC> under C<use
locale>.
 137  
 138  =item RXf_UTF8
 139  
 140  Set if the pattern is L<SvUTF8()|perlapi/SvUTF8>, set by Perl_pmruntime.
 141  
 142  A regex engine may want to set or disable this flag during
 143  compilation. The perl engine for instance may upgrade non-UTF-8
 144  strings to UTF-8 if the pattern includes constructs such as C<\x{...}>
 145  that can only match Unicode values.
 146  
 147  =item RXf_SPLIT
 148  
 149  If C<split> is invoked as C<split ' '> or with no arguments (which
 150  really means C<split(' ', $_)>, see L<split|perlfunc/split>), perl will
 151  set this flag. The regex engine can then check for it and set the
 152  SKIPWHITE and WHITE extflags. To do this the perl engine does:
 153  
 154      if (flags & RXf_SPLIT && r->prelen == 1 && r->precomp[0] == ' ')
 155          r->extflags |= (RXf_SKIPWHITE|RXf_WHITE);
 156  
 157  =back
 158  
 159  These flags can be set during compilation to enable optimizations in
 160  the C<split> operator.
 161  
 162  =over 4
 163  
 164  =item RXf_SKIPWHITE
 165  
 166  If the flag is present in C<< rx->extflags >> C<split> will delete
 167  whitespace from the start of the subject string before it's operated
 168  on. What is considered whitespace depends on whether the subject is a
 169  UTF-8 string and whether the C<RXf_PMf_LOCALE> flag is set.
 170  
 171  If RXf_WHITE is set in addition to this flag C<split> will behave like
 172  C<split " "> under the perl engine.
 173  
 174  =item RXf_START_ONLY
 175  
 176  Tells the split operator to split the target string on newlines
 177  (C<\n>) without invoking the regex engine.
 178  
Perl's engine sets this if the pattern is C</^/> (C<plen == 1 && *exp
== '^'>), even under C</^/s>; see L<split|perlfunc/split>. Of course a
different regex engine might want to use the same optimizations
with a different syntax.
 183  
 184  =item RXf_WHITE
 185  
 186  Tells the split operator to split the target string on whitespace
 187  without invoking the regex engine. The definition of whitespace varies
 188  depending on whether the target string is a UTF-8 string and on
 189  whether RXf_PMf_LOCALE is set.
 190  
 191  Perl's engine sets this flag if the pattern is C<\s+>.
 192  
 193  =item RXf_NULL
 194  
 195  Tells the split operator to split the target string on
 196  characters. The definition of character varies depending on whether
 197  the target string is a UTF-8 string.
 198  
Perl's engine sets this flag on empty patterns. This optimization
makes C<split //> much faster than it would otherwise be. It's even
faster than C<unpack>.
 202  
 203  =back
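
As a rough illustration of how a C<comp> routine could choose among
the C<split> optimization flags described above, the following sketch
maps a pattern string to a set of flags. The C<TOY_RXf_*> constants and
C<toy_split_extflags> are hypothetical names with made-up bit values
(the real C<RXf_*> constants are defined in F<regexp.h>), and the
pattern tests simply mirror the cases this section lists for perl's own
engine.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical bit values; the real RXf_* constants live in regexp.h. */
#define TOY_RXf_SPLIT      0x01u
#define TOY_RXf_SKIPWHITE  0x02u
#define TOY_RXf_START_ONLY 0x04u
#define TOY_RXf_WHITE      0x08u
#define TOY_RXf_NULL       0x10u

/* Return the incoming flags plus any split optimization flags that the
 * pattern text qualifies for. */
unsigned toy_split_extflags(const char *precomp, size_t prelen, unsigned flags)
{
    unsigned extflags = flags;

    assert(precomp != NULL);

    if ((flags & TOY_RXf_SPLIT) && prelen == 1 && precomp[0] == ' ')
        extflags |= TOY_RXf_SKIPWHITE | TOY_RXf_WHITE; /* split ' ' */
    else if (prelen == 0)
        extflags |= TOY_RXf_NULL;                      /* split // */
    else if (prelen == 1 && precomp[0] == '^')
        extflags |= TOY_RXf_START_ONLY;                /* split /^/ */
    else if (prelen == 3 && memcmp(precomp, "\\s+", 3) == 0)
        extflags |= TOY_RXf_WHITE;                     /* split /\s+/ */

    return extflags;
}
```

An engine with a different pattern syntax would keep the flag logic and
replace only the string comparisons.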
 204  
 205  =head2 exec
 206  
 207      I32 exec(pTHX_ REGEXP * const rx,
 208               char *stringarg, char* strend, char* strbeg,
 209               I32 minend, SV* screamer,
 210               void* data, U32 flags);
 211  
 212  Execute a regexp.
 213  
 214  =head2 intuit
 215  
 216      char* intuit(pTHX_ REGEXP * const rx,
 217                    SV *sv, char *strpos, char *strend,
 218                    const U32 flags, struct re_scream_pos_data_s *data);
 219  
 220  Find the start position where a regex match should be attempted,
 221  or possibly whether the regex engine should not be run because the
 222  pattern can't match. This is called as appropriate by the core
 223  depending on the values of the extflags member of the regexp
 224  structure.
 225  
 226  =head2 checkstr
 227  
 228      SV*    checkstr(pTHX_ REGEXP * const rx);
 229  
 230  Return a SV containing a string that must appear in the pattern. Used
 231  by C<split> for optimising matches.
 232  
 233  =head2 free
 234  
 235      void free(pTHX_ REGEXP * const rx);
 236  
 237  Called by perl when it is freeing a regexp pattern so that the engine
 238  can release any resources pointed to by the C<pprivate> member of the
 239  regexp structure. This is only responsible for freeing private data;
 240  perl will handle releasing anything else contained in the regexp structure.
 241  
 242  =head2 Numbered capture callbacks
 243  
Called to get/set the value of C<$`>, C<$'>, C<$&> and their named
equivalents, C<${^PREMATCH}>, C<${^POSTMATCH}> and C<${^MATCH}>, as well as the
numbered capture buffers (C<$1>, C<$2>, ...).
 247  
 248  The C<paren> parameter will be C<-2> for C<$`>, C<-1> for C<$'>, C<0>
 249  for C<$&>, C<1> for C<$1> and so forth.
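
The mapping from C<paren> to a region of the subject string can be
sketched as follows. This is a standalone illustration, assuming byte
offsets and a successful match recorded in slot 0. C<toy_paren_pair>
mirrors C<regexp_paren_pair>, and C<toy_capture_range> is a hypothetical
helper, not a function from the perl core.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for regexp_paren_pair: byte offsets, -1 when unmatched. */
typedef struct toy_paren_pair {
    int start;
    int end;
} toy_paren_pair;

/* Resolve a paren number to a byte range of the subject string.
 * Returns 1 and fills *start and *len, or 0 if the variable is undefined.
 * sublen is the length of the subject string in bytes. */
int toy_capture_range(const toy_paren_pair *offs, int nparens, int sublen,
                      int paren, int *start, int *len)
{
    int s, e;

    assert(offs != NULL && start != NULL && len != NULL);

    if (offs[0].start == -1 || offs[0].end == -1)
        return 0;                       /* no successful match recorded */

    if (paren == -2) {                  /* $` : everything before the match */
        s = 0;
        e = offs[0].start;
    } else if (paren == -1) {           /* $' : everything after the match */
        s = offs[0].end;
        e = sublen;
    } else if (paren >= 0 && paren <= nparens) {
        s = offs[paren].start;          /* $& for 0, $1, $2, ... otherwise */
        e = offs[paren].end;
        if (s == -1 || e == -1)
            return 0;                   /* this group did not participate */
    } else {
        return 0;                       /* no such capture group */
    }

    *start = s;
    *len   = e - s;
    return 1;
}
```

For the subject C<"a-ook!"> matched against C</(o+)k/>, slot 0 holds
C<{2, 5}> and slot 1 holds C<{2, 4}>, so C<paren == -2> resolves to the
two bytes C<"a-"> and C<paren == -1> to the single byte C<"!">.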
 250  
The names have been chosen by analogy with L<Tie::Scalar> method
names, with an additional B<LENGTH> callback for efficiency. However
named capture variables are currently not tied internally but
implemented via magic.
 255  
 256  =head3 numbered_buff_FETCH
 257  
 258      void numbered_buff_FETCH(pTHX_ REGEXP * const rx, const I32 paren,
 259                               SV * const sv);
 260  
Fetch a specified numbered capture. C<sv> should be set to the scalar
to return. The scalar is passed as an argument rather than being
returned from the function because perl already has a scalar to store
the value in when the callback is called, so creating another one
would be redundant. The scalar can be set with C<sv_setsv>,
C<sv_setpvn> and friends, see L<perlapi>.
 267  
 268  This callback is where perl untaints its own capture variables under
 269  taint mode (see L<perlsec>). See the C<Perl_reg_numbered_buff_fetch>
 270  function in F<regcomp.c> for how to untaint capture variables if
 271  that's something you'd like your engine to do as well.
 272  
 273  =head3 numbered_buff_STORE
 274  
 275      void    (*numbered_buff_STORE) (pTHX_ REGEXP * const rx, const I32 paren,
 276                                      SV const * const value);
 277  
 278  Set the value of a numbered capture variable. C<value> is the scalar
 279  that is to be used as the new value. It's up to the engine to make
 280  sure this is used as the new value (or reject it).
 281  
 282  Example:
 283  
 284      if ("ook" =~ /(o*)/) {
 285          # `paren' will be `1' and `value' will be `ee'
 286          $1 =~ tr/o/e/;
 287      }
 288  
Perl's own engine will croak on any attempt to modify the capture
variables. To do this in another engine use the following callback
(copied from C<Perl_reg_numbered_buff_store>):
 292  
 293      void
 294      Example_reg_numbered_buff_store(pTHX_ REGEXP * const rx, const I32 paren,
 295                                      SV const * const value)
 296      {
 297          PERL_UNUSED_ARG(rx);
 298          PERL_UNUSED_ARG(paren);
 299          PERL_UNUSED_ARG(value);
 300  
 301          if (!PL_localizing)
 302              Perl_croak(aTHX_ PL_no_modify);
 303      }
 304  
 305  Actually perl will not I<always> croak in a statement that looks
 306  like it would modify a numbered capture variable. This is because the
 307  STORE callback will not be called if perl can determine that it
 308  doesn't have to modify the value. This is exactly how tied variables
 309  behave in the same situation:
 310  
    package CaptureVar;
    use base 'Tie::Scalar';

    sub TIESCALAR { bless [] }
    sub FETCH { undef }
    sub STORE { die "This doesn't get called" }

    package main;

    tie my $sv => "CaptureVar";
    $sv =~ y/a/b/;
 322  
 323  Because C<$sv> is C<undef> when the C<y///> operator is applied to it
 324  the transliteration won't actually execute and the program won't
 325  C<die>. This is different to how 5.8 and earlier versions behaved
 326  since the capture variables were READONLY variables then, now they'll
 327  just die when assigned to in the default engine.
 328  
 329  =head3 numbered_buff_LENGTH
 330  
 331      I32 numbered_buff_LENGTH (pTHX_ REGEXP * const rx, const SV * const sv,
 332                                const I32 paren);
 333  
Get the C<length> of a capture variable. There's a special callback
for this so that perl doesn't have to do a FETCH and run C<length> on
the result. Since the length is (in perl's case) known from an offset
stored in C<< rx->offs >>, this is much more efficient:

    I32 s1  = rx->offs[paren].start;
    I32 s2  = rx->offs[paren].end;
    I32 len = s2 - s1;
 342  
 343  This is a little bit more complex in the case of UTF-8, see what
 344  C<Perl_reg_numbered_buff_length> does with
 345  L<is_utf8_string_loclen|perlapi/is_utf8_string_loclen>.
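
To show why UTF-8 complicates the calculation: the offsets are byte
positions, while C<length> must report characters. A minimal standalone
sketch, assuming the bytes between the two offsets are already valid
UTF-8, is to count every byte that is not a continuation byte.
C<toy_utf8_char_count> is a hypothetical helper written for this
illustration and is not what C<Perl_reg_numbered_buff_length> calls.

```c
#include <assert.h>
#include <stddef.h>

/* Count the characters in nbytes of (assumed valid) UTF-8 by counting
 * the bytes that start a character. Continuation bytes have the bit
 * pattern 10xxxxxx and are skipped. */
size_t toy_utf8_char_count(const unsigned char *s, size_t nbytes)
{
    size_t i;
    size_t chars = 0;

    assert(s != NULL || nbytes == 0);

    for (i = 0; i < nbytes; i++) {
        if ((s[i] & 0xC0) != 0x80)   /* not a continuation byte */
            chars++;
    }
    return chars;
}
```

With a capture spanning bytes C<s1> to C<s2>, the call would look like
C<toy_utf8_char_count(subject + s1, s2 - s1)>.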
 346  
 347  =head2 Named capture callbacks
 348  
 349  Called to get/set the value of C<%+> and C<%-> as well as by some
 350  utility functions in L<re>.
 351  
There are two callbacks: C<named_buff> is called in all the cases the
FETCH, STORE, DELETE, CLEAR, EXISTS and SCALAR L<Tie::Hash> callbacks
would be on changes to C<%+> and C<%->, and C<named_buff_iter> in the
same cases as FIRSTKEY and NEXTKEY.

The C<flags> parameter can be used to determine which of these
operations the callbacks should respond to. The following flags are
currently defined:

Which L<Tie::Hash> operation is being performed from the Perl level on
C<%+> or C<%->, if any:
 363  
 364      RXapif_FETCH
 365      RXapif_STORE
 366      RXapif_DELETE
 367      RXapif_CLEAR
 368      RXapif_EXISTS
 369      RXapif_SCALAR
 370      RXapif_FIRSTKEY
 371      RXapif_NEXTKEY
 372  
 373  Whether C<%+> or C<%-> is being operated on, if any.
 374  
 375      RXapif_ONE /* %+ */
 376      RXapif_ALL /* %- */
 377  
 378  Whether this is being called as C<re::regname>, C<re::regnames> or
 379  C<re::regnames_count>, if any. The first two will be combined with
 380  C<RXapif_ONE> or C<RXapif_ALL>.
 381  
 382      RXapif_REGNAME
 383      RXapif_REGNAMES
 384      RXapif_REGNAMES_COUNT
 385  
 386  Internally C<%+> and C<%-> are implemented with a real tied interface
 387  via L<Tie::Hash::NamedCapture>. The methods in that package will call
 388  back into these functions. However the usage of
 389  L<Tie::Hash::NamedCapture> for this purpose might change in future
 390  releases. For instance this might be implemented by magic instead
 391  (would need an extension to mgvtbl).
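
The difference between the C<RXapif_ONE> and C<RXapif_ALL> flags on a
fetch can be sketched in standalone C. The sketch assumes the
Perl-level behaviour of the two hashes: C<%+> yields the leftmost
defined buffer with the given name, while C<%-> yields an entry for
every buffer with that name, matched or not. The C<TOY_> constants use
made-up values and C<toy_named_fetch> is a hypothetical helper that
returns buffer numbers rather than SVs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bit values; the real RXapif_* constants are in regexp.h. */
#define TOY_RXapif_ONE 0x1u   /* %+ */
#define TOY_RXapif_ALL 0x2u   /* %- */

typedef struct toy_paren_pair {
    int start;
    int end;
} toy_paren_pair;

/* parens[] lists the n buffer numbers that share one capture name.
 * Writes the selected buffer numbers to out[] (capacity >= n) and
 * returns how many entries were written. Under TOY_RXapif_ALL an
 * unmatched buffer is recorded as -1 so that positions are preserved. */
size_t toy_named_fetch(const toy_paren_pair *offs, const int *parens, size_t n,
                       unsigned flags, int *out)
{
    size_t i;
    size_t found = 0;

    assert(offs != NULL && parens != NULL && out != NULL);

    for (i = 0; i < n; i++) {
        int p = parens[i];
        int matched = offs[p].start != -1 && offs[p].end != -1;

        if (flags & TOY_RXapif_ALL) {
            out[found++] = matched ? p : -1;  /* one slot per buffer */
        } else if (matched) {
            out[found++] = p;                 /* leftmost defined buffer only */
            break;
        }
    }
    return found;
}
```

For a pattern such as C<< /(?<x>a)|(?<x>b)/ >> matched against C<"b">,
buffer 1 is unmatched and buffer 2 is matched, so the C<ONE> case
selects buffer 2 and the C<ALL> case reports C<-1> followed by C<2>.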
 392  
 393  =head3 named_buff
 394  
 395      SV*     (*named_buff) (pTHX_ REGEXP * const rx, SV * const key,
 396                             SV * const value, U32 flags);
 397  
 398  =head3 named_buff_iter
 399  
 400      SV*     (*named_buff_iter) (pTHX_ REGEXP * const rx, const SV * const lastkey,
 401                                  const U32 flags);
 402  
 403  =head2 qr_package
 404  
 405      SV* qr_package(pTHX_ REGEXP * const rx);
 406  
 407  The package the qr// magic object is blessed into (as seen by C<ref
 408  qr//>). It is recommended that engines change this to their package
 409  name for identification regardless of whether they implement methods
 410  on the object.
 411  
The package this method returns should also have the internal
C<Regexp> package in its C<@ISA>. C<< qr//->isa("Regexp") >> should always
be true regardless of what engine is being used.
 415  
An example implementation might be:
 417  
 418      SV*
 419      Example_qr_package(pTHX_ REGEXP * const rx)
 420      {
 421          PERL_UNUSED_ARG(rx);
 422          return newSVpvs("re::engine::Example");
 423      }
 424  
 425  Any method calls on an object created with C<qr//> will be dispatched to the
 426  package as a normal object.
 427  
 428      use re::engine::Example;
 429      my $re = qr//;
 430      $re->meth; # dispatched to re::engine::Example::meth()
 431  
 432  To retrieve the C<REGEXP> object from the scalar in an XS function use
 433  the C<SvRX> macro, see L<"REGEXP Functions" in perlapi|perlapi/REGEXP
 434  Functions>.
 435  
    void meth(SV * rv)
    PPCODE:
        REGEXP * re = SvRX(rv);
 439  
 440  =head2 dupe
 441  
 442      void* dupe(pTHX_ REGEXP * const rx, CLONE_PARAMS *param);
 443  
 444  On threaded builds a regexp may need to be duplicated so that the pattern
 445  can be used by multiple threads. This routine is expected to handle the
 446  duplication of any private data pointed to by the C<pprivate> member of
 447  the regexp structure.  It will be called with the preconstructed new
 448  regexp structure as an argument, the C<pprivate> member will point at
 449  the B<old> private structure, and it is this routine's responsibility to
 450  construct a copy and return a pointer to it (which perl will then use to
 451  overwrite the field as passed to this routine.)
 452  
 453  This allows the engine to dupe its private data but also if necessary
 454  modify the final structure if it really must.
 455  
 456  On unthreaded builds this field doesn't exist.
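
A standalone sketch of what a C<dupe> callback has to do: read the
B<old> private structure through C<pprivate>, build a deep copy, and
return it so that the caller can store it in the new regexp. The
C<toy_private> layout, the cut-down C<toy_regexp> and C<toy_dupe> are
all hypothetical. A real callback would also receive C<pTHX_> and the
C<CLONE_PARAMS> argument, and would likely use perl's own allocation
macros rather than C<malloc>.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical engine-private data: a compiled program of len bytes. */
typedef struct toy_private {
    size_t len;
    char  *program;
} toy_private;

/* Cut-down stand-in for the regexp structure. */
typedef struct toy_regexp {
    void *pprivate;
} toy_regexp;

/* Deep-copy the private data that rx->pprivate still points at.
 * Returns the new copy, or NULL if allocation fails. The caller is
 * expected to overwrite rx->pprivate with the returned pointer. */
void *toy_dupe(toy_regexp *rx)
{
    const toy_private *old;
    toy_private *copy;

    assert(rx != NULL && rx->pprivate != NULL);
    old = (const toy_private *)rx->pprivate;

    copy = malloc(sizeof *copy);
    if (copy == NULL)
        return NULL;

    copy->len = old->len;
    /* Allocate at least one byte so an empty program is not an error. */
    copy->program = malloc(old->len ? old->len : 1);
    if (copy->program == NULL) {
        free(copy);
        return NULL;
    }
    memcpy(copy->program, old->program, old->len);
    return copy;
}
```

The copy shares nothing with the original, so each thread can later
free its own private data independently in the C<free> callback.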
 457  
 458  =head1 The REGEXP structure
 459  
 460  The REGEXP struct is defined in F<regexp.h>. All regex engines must be able to
 461  correctly build such a structure in their L</comp> routine.
 462  
The REGEXP structure contains all the data that perl needs to be aware of
to properly work with the regular expression. It includes data about
optimisations that perl can use to determine if the regex engine should
really be used, and various other control info that is needed to properly
execute patterns in various contexts, such as whether the pattern is
anchored in some way, what flags were used during the compile, or whether
the program contains special constructs that perl needs to be aware of.
 470  
In addition it contains two fields that are intended for the private
use of the regex engine that compiled the pattern. These are the
C<intflags> and C<pprivate> members. C<pprivate> is a void pointer to
an arbitrary structure whose use and management is the responsibility
of the compiling engine. Perl will never modify either of these
values.
 477  
    typedef struct regexp {
        /* what engine created this regexp? */
        const struct regexp_engine* engine;

        /* what re is this a lightweight copy of? */
        struct regexp* mother_re;

        /* Information about the match that the perl core uses to manage things */
        U32 extflags;   /* Flags used both externally and internally */
        I32 minlen;     /* minimum possible length of string to match */
        I32 minlenret;  /* minimum possible length of $& */
        U32 gofs;       /* chars left of pos that we search from */

        /* substring data about strings that must appear
           in the final match, used for optimisations */
        struct reg_substr_data *substrs;

        U32 nparens;  /* number of capture buffers */

        /* private engine specific data */
        U32 intflags;   /* Engine Specific Internal flags */
        void *pprivate; /* Data private to the regex engine which
                           created this object. */

        /* Data about the last/current match. These are modified during matching */
        U32 lastparen;            /* last open paren matched */
        U32 lastcloseparen;       /* last close paren matched */
        regexp_paren_pair *swap;  /* Swap copy of *offs */
        regexp_paren_pair *offs;  /* Array of offsets for (@-) and (@+) */

        char *subbeg;  /* saved or original string so \digit works forever. */
        SV_SAVED_COPY  /* If non-NULL, SV which is COW from original */
        I32 sublen;    /* Length of string pointed by subbeg */

        /* Information about the match that isn't often used */
        I32 prelen;           /* length of precomp */
        const char *precomp;  /* pre-compilation regular expression */

        char *wrapped;  /* wrapped version of the pattern */
        I32 wraplen;    /* length of wrapped */

        I32 seen_evals;   /* number of eval groups in the pattern - for security checks */
        HV *paren_names;  /* Optional hash of paren names */

        I32 refcnt;       /* Refcount of this regexp */
    } regexp;
 525  
 526  The fields are discussed in more detail below:
 527  
 528  =head2 C<engine>
 529  
 530  This field points at a regexp_engine structure which contains pointers
 531  to the subroutines that are to be used for performing a match. It
 532  is the compiling routine's responsibility to populate this field before
 533  returning the regexp object.
 534  
Internally this is set to C<NULL> unless a custom engine is specified in
C<$^H{regcomp}>. Perl's own set of callbacks can be accessed in the struct
pointed to by C<RE_ENGINE_PTR>.
 538  
 539  =head2 C<mother_re>
 540  
 541  TODO, see L<http://www.mail-archive.com/perl5-changes@perl.org/msg17328.html>
 542  
 543  =head2 C<extflags>
 544  
This will be used by perl to see what flags the regexp was compiled
with. It will normally be set to the value of the flags parameter by
the L<comp|/comp> callback. See the L<comp|/comp> documentation for
valid flags.
 549  
 550  =head2 C<minlen> C<minlenret>
 551  
 552  The minimum string length required for the pattern to match.  This is used to
 553  prune the search space by not bothering to match any closer to the end of a
 554  string than would allow a match. For instance there is no point in even
 555  starting the regex engine if the minlen is 10 but the string is only 5
 556  characters long. There is no way that the pattern can match.
 557  
 558  C<minlenret> is the minimum length of the string that would be found
 559  in $& after a match.
 560  
 561  The difference between C<minlen> and C<minlenret> can be seen in the
 562  following pattern:
 563  
 564      /ns(?=\d)/
 565  
 566  where the C<minlen> would be 3 but C<minlenret> would only be 2 as the \d is
 567  required to match but is not actually included in the matched content. This
 568  distinction is particularly important as the substitution logic uses the
 569  C<minlenret> to tell whether it can do in-place substitution which can result in
 570  considerable speedup.
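
The pruning that C<minlen> enables amounts to simple arithmetic, shown
here as a standalone sketch. C<toy_candidate_starts> is a hypothetical
helper. It assumes lengths are counted in the same unit (bytes or
characters) on both sides and ignores every other optimisation the core
applies.

```c
#include <assert.h>

/* Number of start positions worth trying for a subject of length len
 * when any match needs at least minlen characters. Zero means the
 * regex engine does not need to run at all. */
int toy_candidate_starts(int len, int minlen)
{
    assert(len >= 0 && minlen >= 0);

    if (len < minlen)
        return 0;               /* the pattern cannot possibly match */
    return len - minlen + 1;    /* starts 0 .. len - minlen inclusive */
}
```

With C<minlen> 10 and a 5 character string the result is 0, which is the
case described above where starting the engine would be pointless.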
 571  
 572  =head2 C<gofs>
 573  
 574  Left offset from pos() to start match at.
 575  
 576  =head2 C<substrs>
 577  
Substring data about strings that must appear in the final match. This
is currently only used internally by perl's engine, but might be
used in the future by all engines for optimisations.
 581  
=head2 C<nparens>, C<lastparen>, and C<lastcloseparen>
 583  
 584  These fields are used to keep track of how many paren groups could be matched
 585  in the pattern, which was the last open paren to be entered, and which was
 586  the last close paren to be entered.
 587  
 588  =head2 C<intflags>
 589  
 590  The engine's private copy of the flags the pattern was compiled with. Usually
 591  this is the same as C<extflags> unless the engine chose to modify one of them.
 592  
 593  =head2 C<pprivate>
 594  
 595  A void* pointing to an engine-defined data structure. The perl engine uses the
 596  C<regexp_internal> structure (see L<perlreguts/Base Structures>) but a custom
 597  engine should use something else.
 598  
 599  =head2 C<swap>
 600  
 601  TODO: document
 602  
 603  =head2 C<offs>
 604  
A C<regexp_paren_pair> structure which defines offsets into the string being
matched which correspond to the C<$&> and C<$1>, C<$2> etc. captures. The
C<regexp_paren_pair> struct is defined as follows:
 608  
 609      typedef struct regexp_paren_pair {
 610          I32 start;
 611          I32 end;
 612      } regexp_paren_pair;
 613  
If C<< ->offs[num].start >> or C<< ->offs[num].end >> is C<-1> then that
capture buffer did not match. C<< ->offs[0].start/end >> represents C<$&> (or
C<${^MATCH}> under C<//p>) and C<< ->offs[paren].end >> matches C<$$paren> where
C<< $paren >= 1 >>.
 618  
 619  =head2 C<precomp> C<prelen>
 620  
 621  Used for optimisations. C<precomp> holds a copy of the pattern that
 622  was compiled and C<prelen> its length. When a new pattern is to be
 623  compiled (such as inside a loop) the internal C<regcomp> operator
 624  checks whether the last compiled C<REGEXP>'s C<precomp> and C<prelen>
 625  are equivalent to the new one, and if so uses the old pattern instead
 626  of compiling a new one.
 627  
 628  The relevant snippet from C<Perl_pp_regcomp>:
 629  
 630      if (!re || !re->precomp || re->prelen != (I32)len ||
 631          memNE(re->precomp, t, len))
 632          /* Compile a new pattern */
 633  
 634  =head2 C<paren_names>
 635  
This is a hash used internally to track named capture buffers and their
offsets. The keys are the names of the buffers; the values are dualvars,
with the IV slot holding the number of buffers with the given name and the
PV being an embedded array of I32.  The values may also be contained
independently in the data array in cases where named backreferences are
used.
 642  
 643  =head2 C<substrs>
 644  
Holds information on the longest string that must occur at a fixed
offset from the start of the pattern, and the longest string that must
occur at a floating offset from the start of the pattern. Used to do
Fast-Boyer-Moore searches on the string to find out if it's worth using
the regex engine at all, and if so where in the string to search.
 650  
 651  =head2 C<subbeg> C<sublen> C<saved_copy>
 652  
 653  Used during execution phase for managing search and replace patterns.
 654  
 655  =head2 C<wrapped> C<wraplen>
 656  
 657  Stores the string C<qr//> stringifies to. The perl engine for example
 658  stores C<(?-xism:eek)> in the case of C<qr/eek/>.
 659  
When using a custom engine that doesn't support the C<(?:)> construct
for inline modifiers, it's probably best to have C<qr//> stringify to
the supplied pattern. Note that this will create undesired patterns in
cases such as:
 664  
 665      my $x = qr/a|b/;  # "a|b"
 666      my $y = qr/c/i;   # "c"
 667      my $z = qr/$x$y/; # "a|bc"
 668  
 669  There's no solution for this problem other than making the custom
 670  engine understand a construct like C<(?:)>.
 671  
 672  =head2 C<seen_evals>
 673  
 674  This stores the number of eval groups in the pattern. This is used for security
 675  purposes when embedding compiled regexes into larger patterns with C<qr//>.
 676  
 677  =head2 C<refcnt>
 678  
 679  The number of times the structure is referenced. When this falls to 0 the
 680  regexp is automatically freed by a call to pregfree. This should be set to 1 in
 681  each engine's L</comp> routine.
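
The lifecycle implied here can be sketched in standalone C: the compile
step hands back a structure whose count is 1, each additional holder
increments it, and the structure is released when the count returns to
0. All C<toy_> names are hypothetical and C<malloc>/C<free> stand in for
perl's own allocation and C<pregfree> logic.

```c
#include <assert.h>
#include <stdlib.h>

/* Cut-down stand-in for the regexp structure. */
typedef struct toy_regexp {
    int   refcnt;
    void *pprivate;   /* engine-private data, owned by this struct */
} toy_regexp;

/* What a comp routine is expected to do: start the count at 1. */
toy_regexp *toy_rc_new(void)
{
    toy_regexp *rx = calloc(1, sizeof *rx);
    if (rx != NULL)
        rx->refcnt = 1;
    return rx;
}

/* Another holder takes a reference. */
toy_regexp *toy_rc_inc(toy_regexp *rx)
{
    assert(rx != NULL && rx->refcnt > 0);
    rx->refcnt++;
    return rx;
}

/* Drop a reference. Returns 1 if this was the last one and the
 * structure (with its private data) was freed, 0 otherwise. */
int toy_rc_dec(toy_regexp *rx)
{
    assert(rx != NULL && rx->refcnt > 0);

    if (--rx->refcnt > 0)
        return 0;

    free(rx->pprivate);   /* the engine's free callback would do this */
    free(rx);
    return 1;
}
```

An engine that forgets to set the initial count to 1 would either leak
the structure or have it freed while still in use.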
 682  
 683  =head1 HISTORY
 684  
 685  Originally part of L<perlreguts>.
 686  
 687  =head1 AUTHORS
 688  
 689  Originally written by Yves Orton, expanded by E<AElig>var ArnfjE<ouml>rE<eth>
 690  Bjarmason.
 691  
 692  =head1 LICENSE
 693  
 694  Copyright 2006 Yves Orton and 2007 E<AElig>var ArnfjE<ouml>rE<eth> Bjarmason.
 695  
 696  This program is free software; you can redistribute it and/or modify it under
 697  the same terms as Perl itself.
 698  
 699  =cut

