/se3-unattended/var/se3/unattended/install/linuxaux/opt/perl/lib/5.10.0/pod/ -> perlpodspec.pod (source)

   1  
   2  =head1 NAME
   3  
   4  perlpodspec - Plain Old Documentation: format specification and notes
   5  
   6  =head1 DESCRIPTION
   7  
   8  This document provides detailed notes on the Pod markup language.  Most
   9  people will only have to read L<perlpod|perlpod> to know how to write
  10  in Pod, but this document may answer some incidental questions to do
  11  with parsing and rendering Pod.
  12  
  13  In this document, "must" / "must not", "should" /
  14  "should not", and "may" have their conventional (cf. RFC 2119)
  15  meanings: "X must do Y" means that if X doesn't do Y, it's against
  16  this specification, and should really be fixed.  "X should do Y"
  17  means that it's recommended, but X may fail to do Y, if there's a
  18  good reason.  "X may do Y" is merely a note that X can do Y at
  19  will (although it is up to the reader to detect any connotation of
  20  "and I think it would be I<nice> if X did Y" versus "it wouldn't
  21  really I<bother> me if X did Y").
  22  
  23  Notably, when I say "the parser should do Y", the
  24  parser may fail to do Y, if the calling application explicitly
  25  requests that the parser I<not> do Y.  I often phrase this as
  26  "the parser should, by default, do Y."  This doesn't I<require>
  27  the parser to provide an option for turning off whatever
  28  feature Y is (like expanding tabs in verbatim paragraphs), although
  29  it implies that such an option I<may> be provided.
  30  
  31  =head1 Pod Definitions
  32  
  33  Pod is embedded in files, typically Perl source files -- although you
  34  can write a file that's nothing but Pod.
  35  
  36  A B<line> in a file consists of zero or more non-newline characters,
  37  terminated by either a newline or the end of the file.
  38  
  39  A B<newline sequence> is usually a platform-dependent concept, but
  40  Pod parsers should understand it to mean any of CR (ASCII 13), LF
  41  (ASCII 10), or a CRLF (ASCII 13 followed immediately by ASCII 10), in
  42  addition to any other system-specific meaning.  The first CR/CRLF/LF
  43  sequence in the file may be used as the basis for identifying the
  44  newline sequence for parsing the rest of the file.
  45  
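As a rough illustration of the newline rule above, here is a minimal Python sketch. Python is used only for illustration, and the names C<detect_newline> and C<split_lines> are hypothetical, not part of any Pod module. It takes the first CR/CRLF/LF sequence in the text as the newline sequence for the rest of the file, which the rule permits but does not require.

```python
import re

def detect_newline(text):
    """Return the first CR, LF, or CRLF sequence found, or None."""
    m = re.search(r'\r\n|\r|\n', text)
    return m.group(0) if m else None

def split_lines(text):
    """Split text into lines using the newline sequence detected first."""
    newline = detect_newline(text)
    if newline is None:
        return [text] if text else []
    lines = text.split(newline)
    # A trailing newline terminates the last line; it does not start a new one.
    if lines and lines[-1] == '':
        lines.pop()
    return lines
```

A stricter parser could instead split on all three sequences at once; this sketch only shows the "first sequence wins" option.
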
  46  A B<blank line> is a line consisting entirely of zero or more spaces
  47  (ASCII 32) or tabs (ASCII 9), and terminated by a newline or end-of-file.
  48  A B<non-blank line> is a line containing one or more characters other
  49  than space or tab (and terminated by a newline or end-of-file).
  50  
  51  (I<Note:> Many older Pod parsers did not accept a line consisting of
  52  spaces/tabs and then a newline as a blank line -- the only lines they
  53  considered blank were lines consisting of I<no characters at all>,
  54  terminated by a newline.)
  55  
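The two notions of "blank line" described above can be sketched as follows. This is a Python illustration under the assumption that the line has already had its newline removed; both function names are hypothetical.

```python
import re

BLANK = re.compile(r'\A[ \t]*\Z')

def is_blank(line):
    """True if the line (without its newline) holds only spaces and tabs."""
    return BLANK.match(line) is not None

def is_blank_legacy(line):
    """Older parsers: only a line with no characters at all counts as blank."""
    return line == ''
```

A line such as C<" \t "> is blank under the current definition but not under the older one, which is why paragraphs can split differently between old and new parsers.
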
  56  B<Whitespace> is used in this document as a blanket term for spaces,
  57  tabs, and newline sequences.  (By itself, this term usually refers
  58  to literal whitespace.  That is, sequences of whitespace characters
  59  in Pod source, as opposed to "EE<lt>32>", which is a formatting
  60  code that I<denotes> a whitespace character.)
  61  
  62  A B<Pod parser> is a module meant for parsing Pod (regardless of
  63  whether this involves calling callbacks or building a parse tree or
  64  directly formatting it).  A B<Pod formatter> (or B<Pod translator>)
  65  is a module or program that converts Pod to some other format (HTML,
  66  plaintext, TeX, PostScript, RTF).  A B<Pod processor> might be a
  67  formatter or translator, or might be a program that does something
  68  else with the Pod (like counting words, scanning for index points,
  69  etc.).
  70  
  71  Pod content is contained in B<Pod blocks>.  A Pod block starts with a
  72  line that matches C<m/\A=[a-zA-Z]/>, and continues up to the next line
  73  that matches C<m/\A=cut/> -- or up to the end of the file, if there is
  74  no C<m/\A=cut/> line.
  75  
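A minimal sketch of the Pod block rule above, in Python for illustration. The function name C<pod_blocks> is hypothetical, and the input is assumed to be a list of lines with newlines already removed. The "=cut" line is kept as the last line of its block, and a "=cut" seen outside any block raises an error, following the "=cut" rule given later in this document.

```python
import re

POD_START = re.compile(r'\A=[a-zA-Z]')
POD_CUT = re.compile(r'\A=cut')

def pod_blocks(lines):
    """Group lines into Pod blocks; each block is a list of lines."""
    blocks, current = [], None
    for line in lines:
        if current is None:
            if POD_CUT.match(line):
                # Starting a Pod block with "=cut" is an error.
                raise ValueError('"=cut" found outside a Pod block')
            if POD_START.match(line):
                current = [line]
        else:
            current.append(line)
            if POD_CUT.match(line):
                blocks.append(current)
                current = None
    if current is not None:  # no "=cut" before the end of the file
        blocks.append(current)
    return blocks
```
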
  76  =for comment
  77   The current perlsyn says:
  78   [beginquote]
  79     Note that pod translators should look at only paragraphs beginning
  80     with a pod directive (it makes parsing easier), whereas the compiler
  81     actually knows to look for pod escapes even in the middle of a
  82     paragraph.  This means that the following secret stuff will be ignored
  83     by both the compiler and the translators.
  84        $a=3;
  85        =secret stuff
  86         warn "Neither POD nor CODE!?"
  87        =cut back
  88        print "got $a\n";
  89     You probably shouldn't rely upon the warn() being podded out forever.
  90     Not all pod translators are well-behaved in this regard, and perhaps
  91     the compiler will become pickier.
  92   [endquote]
  93   I think that those paragraphs should just be removed; paragraph-based
  94   parsing  seems to have been largely abandoned, because of the hassle
  95   with non-empty blank lines messing up what people meant by "paragraph".
  96   Even if the "it makes parsing easier" bit were especially true,
  97   it wouldn't be worth the confusion of having perl and pod2whatever
  98   actually disagree on what can constitute a Pod block.
  99  
 100  Within a Pod block, there are B<Pod paragraphs>.  A Pod paragraph
 101  consists of non-blank lines of text, separated by one or more blank
 102  lines.
 103  
 104  For purposes of Pod processing, there are four types of paragraphs in
 105  a Pod block:
 106  
 107  =over
 108  
 109  =item *
 110  
 111  A command paragraph (also called a "directive").  The first line of
 112  this paragraph must match C<m/\A=[a-zA-Z]/>.  Command paragraphs are
 113  typically one line, as in:
 114  
 115    =head1 NOTES
 116  
 117    =item *
 118  
 119  But they may span several (non-blank) lines:
 120  
 121    =for comment
 122    Hm, I wonder what it would look like if
 123    you tried to write a BNF for Pod from this.
 124  
 125    =head3 Dr. Strangelove, or: How I Learned to
 126    Stop Worrying and Love the Bomb
 127  
 128  I<Some> command paragraphs allow formatting codes in their content
 129  (i.e., after the part that matches C<m/\A=[a-zA-Z]\S*\s*/>), as in:
 130  
 131    =head1 Did You Remember to C<use strict;>?
 132  
 133  In other words, the Pod processing handler for "head1" will apply the
 134  same processing to "Did You Remember to CE<lt>use strict;>?" that it
 135  would to an ordinary paragraph -- i.e., formatting codes (like
 136  "CE<lt>...>") are parsed and presumably formatted appropriately, and
 137  whitespace in the form of literal spaces and/or tabs is not
 138  significant.
 139  
 140  =item *
 141  
 142  A B<verbatim paragraph>.  The first line of this paragraph must begin
 143  with a literal space or tab, and this paragraph must not be inside a "=begin
 144  I<identifier>", ... "=end I<identifier>" sequence unless
 145  "I<identifier>" begins with a colon (":").  That is, if a paragraph
 146  starts with a literal space or tab, but I<is> inside a
 147  "=begin I<identifier>", ... "=end I<identifier>" region, then it's
 148  a data paragraph, unless "I<identifier>" begins with a colon.
 149  
 150  Whitespace I<is> significant in verbatim paragraphs (although, in
 151  processing, tabs are probably expanded).
 152  
 153  =item *
 154  
 155  An B<ordinary paragraph>.  A paragraph is an ordinary paragraph
 156  if its first line matches neither C<m/\A=[a-zA-Z]/> nor
 157  C<m/\A[ \t]/>, I<and> if it's not inside a "=begin I<identifier>",
 158  ... "=end I<identifier>" sequence unless "I<identifier>" begins with
 159  a colon (":").
 160  
 161  =item *
 162  
 163  A B<data paragraph>.  This is a paragraph that I<is> inside a "=begin
 164  I<identifier>" ... "=end I<identifier>" sequence where
 165  "I<identifier>" does I<not> begin with a literal colon (":").  In
 166  some sense, a data paragraph is not part of Pod at all (i.e.,
 167  effectively it's "out-of-band"), since it's not subject to most kinds
 168  of Pod parsing; but it is specified here, since Pod
 169  parsers need to be able to call an event for it, or store it in some
 170  form in a parse tree, or at least just parse I<around> it.
 171  
 172  =back
 173  
 174  For example: consider the following paragraphs:
 175  
 176    # <- that's the 0th column
 177  
 178    =head1 Foo
 179  
 180    Stuff
 181  
 182      $foo->bar
 183  
 184    =cut
 185  
 186  Here, "=head1 Foo" and "=cut" are command paragraphs because the first
 187  line of each matches C<m/\A=[a-zA-Z]/>.  "Stuff" is an ordinary paragraph.
 188  "I<[space][space]>$foo->bar" is a verbatim paragraph, because its first line starts
 189  with a literal whitespace character (and there's no "=begin"..."=end" region around).
 190  
 191  The "=begin I<identifier>" ... "=end I<identifier>" commands stop
 192  paragraphs that they surround from being parsed as ordinary or verbatim
 193  paragraphs, if I<identifier> doesn't begin with a colon.  This
 194  is discussed in detail in the section
 195  L</About Data Paragraphs and "=beginE<sol>=end" Regions>.
 196  
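The four paragraph types defined in this section can be told apart by a small decision procedure. The following Python sketch is one reading of those rules. The function name C<classify> and its C<begin_identifier> parameter are hypothetical, and only the innermost enclosing "=begin" region is considered.

```python
import re

COMMAND = re.compile(r'\A=[a-zA-Z]')
INDENTED = re.compile(r'\A[ \t]')

def classify(first_line, begin_identifier=None):
    """Classify a paragraph by its first line.

    begin_identifier is the identifier of the innermost enclosing
    "=begin" region, or None when the paragraph is not inside one.
    """
    if COMMAND.match(first_line):
        return 'command'
    if begin_identifier is not None and not begin_identifier.startswith(':'):
        return 'data'
    if INDENTED.match(first_line):
        return 'verbatim'
    return 'ordinary'
```

Applied to the example above, "=head1 Foo" is classified as a command paragraph, "Stuff" as ordinary, and the indented "$foo->bar" as verbatim.
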
 197  =head1 Pod Commands
 198  
 199  This section is intended to supplement and clarify the discussion in
 200  L<perlpod/"Command Paragraph">.  These are the currently recognized
 201  Pod commands:
 202  
 203  =over
 204  
 205  =item "=head1", "=head2", "=head3", "=head4"
 206  
 207  This command indicates that the text in the remainder of the paragraph
 208  is a heading.  That text may contain formatting codes.  Examples:
 209  
 210    =head1 Object Attributes
 211  
 212    =head3 What B<Not> to Do!
 213  
 214  =item "=pod"
 215  
 216  This command indicates that this paragraph begins a Pod block.  (If we
 217  are already in the middle of a Pod block, this command has no effect at
 218  all.)  If there is any text in this command paragraph after "=pod",
 219  it must be ignored.  Examples:
 220  
 221    =pod
 222  
 223    This is a plain Pod paragraph.
 224  
 225    =pod This text is ignored.
 226  
 227  =item "=cut"
 228  
 229  This command indicates that this line is the end of this previously
 230  started Pod block.  If there is any text after "=cut" on the line, it must be
 231  ignored.  Examples:
 232  
 233    =cut
 234  
 235    =cut The documentation ends here.
 236  
 237    =cut
 238    # This is the first line of program text.
 239    sub foo { # This is the second.
 240  
 241  It is an error to try to I<start> a Pod block with a "=cut" command.  In
 242  that case, the Pod processor must halt parsing of the input file, and
 243  must by default emit a warning.
 244  
 245  =item "=over"
 246  
 247  This command indicates that this is the start of a list/indent
 248  region.  If there is any text following the "=over", it must consist
 249  of only a nonzero positive numeral.  The semantics of this numeral are
 250  explained in the L</"About =over...=back Regions"> section, further
 251  below.  Formatting codes are not expanded.  Examples:
 252  
 253    =over 3
 254  
 255    =over 3.5
 256  
 257    =over
 258  
 259  =item "=item"
 260  
 261  This command indicates that an item in a list begins here.  Formatting
 262  codes are processed.  The semantics of the (optional) text in the
 263  remainder of this paragraph are
 264  explained in the L</"About =over...=back Regions"> section, further
 265  below.  Examples:
 266  
 267    =item
 268  
 269    =item *
 270  
 271    =item      *    
 272  
 273    =item 14
 274  
 275    =item   3.
 276  
 277    =item C<< $thing->stuff(I<dodad>) >>
 278  
 279    =item For transporting us beyond seas to be tried for pretended
 280    offenses
 281  
 282    =item He is at this time transporting large armies of foreign
 283    mercenaries to complete the works of death, desolation and
 284    tyranny, already begun with circumstances of cruelty and perfidy
 285    scarcely paralleled in the most barbarous ages, and totally
 286    unworthy the head of a civilized nation.
 287  
 288  =item "=back"
 289  
 290  This command indicates that this is the end of the region begun
 291  by the most recent "=over" command.  It permits no text after the
 292  "=back" command.
 293  
 294  =item "=begin formatname"
 295  
 296  This marks the following paragraphs (until the matching "=end
 297  formatname") as being for some special kind of processing.  Unless
 298  "formatname" begins with a colon, the contained non-command
 299  paragraphs are data paragraphs.  But if "formatname" I<does> begin
 300  with a colon, then non-command paragraphs are ordinary paragraphs
 301  or verbatim paragraphs.  This is discussed in detail in the section
 302  L</About Data Paragraphs and "=beginE<sol>=end" Regions>.
 303  
 304  It is advised that formatnames match the regexp
 305  C<m/\A:?[-a-zA-Z0-9_]+\z/>.  Implementors should anticipate future
 306  expansion in the semantics and syntax of the first parameter
 307  to "=begin"/"=end"/"=for".
 308  
 309  =item "=end formatname"
 310  
 311  This marks the end of the region opened by the matching
 312  "=begin formatname" command.  If "formatname" is not the formatname
 313  of the most recent open "=begin formatname" region, then this
 314  is an error, and must generate an error message.  This
 315  is discussed in detail in the section
 316  L</About Data Paragraphs and "=beginE<sol>=end" Regions>.
 317  
 318  =item "=for formatname text..."
 319  
 320  This is synonymous with:
 321  
 322       =begin formatname
 323  
 324       text...
 325  
 326       =end formatname
 327  
 328  That is, it creates a region consisting of a single paragraph; that
 329  paragraph is to be treated as an ordinary paragraph if "formatname"
 330  begins with a ":"; if "formatname" I<doesn't> begin with a colon,
 331  then "text..." will constitute a data paragraph.  There is no way
 332  to use "=for formatname text..." to express "text..." as a verbatim
 333  paragraph.
 334  
 335  =item "=encoding encodingname"
 336  
 337  This command, which should occur early in the document (at least
 338  before any non-US-ASCII data!), declares that this document is
 339  encoded in the encoding I<encodingname>, which must be
 340  an encoding name that L<Encode> recognizes.  (Encode's list
 341  of supported encodings, in L<Encode::Supported>, is useful here.)
 342  If the Pod parser cannot decode the declared encoding, it 
 343  should emit a warning and may abort parsing the document
 344  altogether.
 345  
 346  A document having more than one "=encoding" line should be
 347  considered an error.  Pod processors may silently tolerate this if
 348  the not-first "=encoding" lines are just duplicates of the
 349  first one (e.g., if there's a "=encoding utf8" line, and later on
 350  another "=encoding utf8" line).  But Pod processors should complain if
 351  there are contradictory "=encoding" lines in the same document
 352  (e.g., if there is a "=encoding utf8" early in the document and
 353  "=encoding big5" later).  Pod processors that recognize BOMs
 354  may also complain if they see an "=encoding" line
 355  that contradicts the BOM (e.g., if a document with a UTF-16LE
 356  BOM has an "=encoding shiftjis" line).
 357  
 358  =back
 359  
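The equivalence between "=for formatname text..." and the "=begin"/"=end" form, together with the advised formatname pattern, can be sketched as follows. This is a Python illustration and the function names are hypothetical. Note that Python spells the end-of-string anchor C<\Z> where the Perl regexp above uses C<\z>.

```python
import re

FORMATNAME = re.compile(r'\A:?[-a-zA-Z0-9_]+\Z')

def expand_for(paragraph):
    """Rewrite '=for formatname text...' as the equivalent three paragraphs."""
    m = re.match(r'\A=for[ \t]+(\S+)\s*(.*)\Z', paragraph, re.S)
    if m is None:
        raise ValueError('not a "=for formatname text..." paragraph')
    name, text = m.group(1), m.group(2)
    return ['=begin ' + name, text, '=end ' + name]

def is_advised_formatname(name):
    """True if name matches the advised formatname pattern."""
    return FORMATNAME.match(name) is not None
```

A parser that expands "=for" this way early on needs only one code path for regions, at the cost of losing the distinction between the two source forms.
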
 360  If a Pod processor sees any command other than the ones listed
 361  above (like "=head", or "=haed1", or "=stuff", or "=cuttlefish",
 362  or "=w123"), that processor must by default treat this as an
 363  error.  It must not process the paragraph beginning with that
 364  command, must by default warn of this as an error, and may
 365  abort the parse.  A Pod parser may allow a way for particular
 366  applications to add to the above list of known commands, and to
 367  stipulate, for each additional command, whether formatting
 368  codes should be processed.
 369  
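One way to apply the rule above is to split each command paragraph into its command name and remaining text, then check the name against the known list. The Python sketch below is illustrative only. The names C<split_command>, C<check_command>, and C<extra_commands> are hypothetical, and returning an error string stands in for whatever warning mechanism a real parser uses.

```python
import re

KNOWN_COMMANDS = {
    'head1', 'head2', 'head3', 'head4', 'pod', 'cut', 'over', 'item',
    'back', 'begin', 'end', 'for', 'encoding',
}

def split_command(paragraph):
    """Split a command paragraph into (command name, remaining text)."""
    m = re.match(r'\A=([a-zA-Z]\S*)\s*(.*)\Z', paragraph, re.S)
    if m is None:
        raise ValueError('not a command paragraph')
    return m.group(1), m.group(2)

def check_command(paragraph, extra_commands=()):
    """Return ((name, text), None) if known, else (None, error message)."""
    name, text = split_command(paragraph)
    if name not in KNOWN_COMMANDS and name not in extra_commands:
        # The paragraph is not processed; the caller should warn by default.
        return None, 'Unknown command "=%s"' % name
    return (name, text), None
```

The C<extra_commands> argument models the optional mechanism by which an application adds to the list of known commands.
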
 370  Future versions of this specification may add additional
 371  commands.
 372  
 373  
 374  
 375  =head1 Pod Formatting Codes
 376  
 377  (Note that in previous drafts of this document and of perlpod,
 378  formatting codes were referred to as "interior sequences", and
 379  this term may still be found in the documentation for Pod parsers,
 380  and in error messages from Pod processors.)
 381  
 382  There are two syntaxes for formatting codes:
 383  
 384  =over
 385  
 386  =item *
 387  
 388  A formatting code starts with a capital letter (just US-ASCII [A-Z])
 389  followed by a "<", any number of characters, and ending with the first
 390  matching ">".  Examples:
 391  
 392      That's what I<you> think!
 393  
 394      What's C<dump()> for?
 395  
 396      X<C<chmod> and C<unlink()> Under Different Operating Systems>
 397  
 398  =item *
 399  
 400  A formatting code starts with a capital letter (just US-ASCII [A-Z])
 401  followed by two or more "<"'s, one or more whitespace characters,
 402  any number of characters, one or more whitespace characters,
 403  and ending with the first matching sequence of two or more ">"'s, where
 404  the number of ">"'s equals the number of "<"'s in the opening of this
 405  formatting code.  Examples:
 406  
 407      That's what I<< you >> think!
 408  
 409      C<<< open(X, ">>thing.dat") || die $! >>>
 410  
 411      B<< $foo->bar(); >>
 412  
 413  With this syntax, the whitespace character(s) after the "CE<lt><<"
 414  (or whatever letter) and before the ">>" are I<not> renderable -- they
 415  do not signify whitespace, but are merely part of the formatting codes
 416  themselves.  That is, these are all synonymous:
 417  
 418      C<thing>
 419      C<< thing >>
 420      C<<           thing     >>
 421      C<<<   thing >>>
 422      C<<<<
 423      thing
 424                 >>>>
 425  
 426  and so on.
 427  
 428  =back
 429  
 430  In parsing Pod, a notably tricky part is the correct parsing of
 431  (potentially nested!) formatting codes.  Implementors should
 432  consult the code in the C<parse_text> routine in Pod::Parser as an
 433  example of a correct implementation.
 434  
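To make the two syntaxes concrete, here is a small recursive-descent sketch in Python. It is not the Pod::Parser algorithm, only an illustration under simplifying assumptions: it works on one paragraph at a time, does not validate the code letter, and closes any code still open at the end of the text. The name C<parse_codes> is hypothetical.

```python
import re

OPEN_MULTI = re.compile(r'([A-Z])(<{2,})\s+')   # e.g. "C<< "
OPEN_SINGLE = re.compile(r'([A-Z])<')           # e.g. "C<"

def parse_codes(text, pos=0, closer=None):
    """Return (nodes, next_pos); nodes mixes str and (letter, children)."""
    nodes, buf = [], []

    def flush():
        if buf:
            nodes.append(''.join(buf))
            buf.clear()

    while pos < len(text):
        if closer is not None:
            m = closer.match(text, pos)
            if m:                       # end of the code being parsed
                flush()
                return nodes, m.end()
        m = OPEN_MULTI.match(text, pos)
        if m:
            # Closing delimiter: whitespace, then as many ">" as "<" opened.
            inner = re.compile(r'\s+' + '>' * len(m.group(2)))
        else:
            m = OPEN_SINGLE.match(text, pos)
            inner = re.compile('>')
        if m:
            flush()
            children, pos = parse_codes(text, m.end(), inner)
            nodes.append((m.group(1), children))
        else:
            buf.append(text[pos])
            pos += 1
    flush()                             # unterminated codes are closed here
    return nodes, pos
```

With this sketch, all of the synonymous forms listed above parse to the same node, a "C" code containing the text "thing".
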
 435  =over
 436  
 437  =item C<IE<lt>textE<gt>> -- italic text
 438  
 439  See the brief discussion in L<perlpod/"Formatting Codes">.
 440  
 441  =item C<BE<lt>textE<gt>> -- bold text
 442  
 443  See the brief discussion in L<perlpod/"Formatting Codes">.
 444  
 445  =item C<CE<lt>codeE<gt>> -- code text
 446  
 447  See the brief discussion in L<perlpod/"Formatting Codes">.
 448  
 449  =item C<FE<lt>filenameE<gt>> -- style for filenames
 450  
 451  See the brief discussion in L<perlpod/"Formatting Codes">.
 452  
 453  =item C<XE<lt>topic nameE<gt>> -- an index entry
 454  
 455  See the brief discussion in L<perlpod/"Formatting Codes">.
 456  
 457  This code is unusual in that most formatters completely discard
 458  this code and its content.  Other formatters will render it with
 459  invisible codes that can be used in building an index of
 460  the current document.
 461  
 462  =item C<ZE<lt>E<gt>> -- a null (zero-effect) formatting code
 463  
 464  Discussed briefly in L<perlpod/"Formatting Codes">.
 465  
 466  This code is unusual in that it should have no content.  That is,
 467  a processor may complain if it sees C<ZE<lt>potatoesE<gt>>.  Whether
 468  or not it complains, the I<potatoes> text should be ignored.
 469  
 470  =item C<LE<lt>nameE<gt>> -- a hyperlink
 471  
 472  The complicated syntaxes of this code are discussed at length in
 473  L<perlpod/"Formatting Codes">, and implementation details are
 474  discussed below, in L</"About LE<lt>...E<gt> Codes">.  Parsing the
 475  contents of LE<lt>content> is tricky.  Notably, the content has to be
 476  checked for whether it looks like a URL, or whether it has to be split
 477  on literal "|" and/or "/" (in the right order!), and so on,
 478  I<before> EE<lt>...> codes are resolved.
 479  
 480  =item C<EE<lt>escapeE<gt>> -- a character escape
 481  
 482  See L<perlpod/"Formatting Codes">, and several points in
 483  L</Notes on Implementing Pod Processors>.
 484  
 485  =item C<SE<lt>textE<gt>> -- text contains non-breaking spaces
 486  
 487  This formatting code is syntactically simple, but semantically
 488  complex.  What it means is that each space in the printable
 489  content of this code signifies a non-breaking space.
 490  
 491  Consider:
 492  
 493      C<$x ? $y    :  $z>
 494  
 495      S<C<$x ? $y     :  $z>>
 496  
 497  Both signify the monospace (c[ode] style) text consisting of
 498  "$x", one space, "?", one space, "$y", one space, ":", one space, "$z".  The
 499  difference is that in the latter, with the S code, those spaces
 500  are not "normal" spaces, but instead are non-breaking spaces.
 501  
 502  =back
 503  
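The SE<lt>...> behavior described in the list above can be sketched in two steps: literal whitespace is first compacted as in any ordinary paragraph, and each remaining space then becomes a non-breaking space. This is a Python illustration. The function name is hypothetical, and U+00A0 is assumed as the output format's non-breaking space.

```python
import re

NBSP = "\u00a0"  # assumed representation of a non-breaking space

def render_s_content(text):
    """Compact literal whitespace, then make every space non-breaking."""
    collapsed = re.sub(r'[ \t\r\n]+', ' ', text)
    return collapsed.replace(' ', NBSP)
```

For the example above, the content "$x ? $y     :  $z" yields "$x", "?", "$y", ":", "$z" separated by single non-breaking spaces.
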
 504  
 505  If a Pod processor sees any formatting code other than the ones
 506  listed above (as in "NE<lt>...>", or "QE<lt>...>", etc.), that
 507  processor must by default treat this as an error.
 508  A Pod parser may allow a way for particular
 509  applications to add to the above list of known formatting codes;
 510  a Pod parser might even allow a way to stipulate, for each additional
 511  formatting code, whether it requires some form of special processing, as
 512  LE<lt>...> does.
 513  
 514  Future versions of this specification may add additional
 515  formatting codes.
 516  
 517  Historical note:  A few older Pod processors would not see a ">" as
 518  closing a "CE<lt>" code, if the ">" was immediately preceded by
 519  a "-".  This was so that this:
 520  
 521      C<$foo->bar>
 522  
 523  would parse as equivalent to this:
 524  
 525      C<$foo-E<gt>bar>
 526  
 527  instead of as equivalent to a "C" formatting code containing 
 528  only "$foo-", and then a "bar>" outside the "C" formatting code.  This
 529  problem has since been solved by the addition of syntaxes like this:
 530  
 531      C<< $foo->bar >>
 532  
 533  Compliant parsers must not treat "->" as special.
 534  
 535  Formatting codes absolutely cannot span paragraphs.  If a code is
 536  opened in one paragraph, and no closing code is found by the end of
 537  that paragraph, the Pod parser must close that formatting code,
 538  and should complain (as in "Unterminated I code in the paragraph
 539  starting at line 123: 'Time objects are not...'").  So these
 540  two paragraphs:
 541  
 542    I<I told you not to do this!
 543  
 544    Don't make me say it again!>
 545  
 546  ...must I<not> be parsed as two paragraphs in italics (with the I
 547  code starting in one paragraph and ending in another.)  Instead,
 548  the first paragraph should generate a warning, but that aside, the
 549  above code must parse as if it were:
 550  
 551    I<I told you not to do this!>
 552  
 553    Don't make me say it again!E<gt>
 554  
 555  (In SGMLish jargon, all Pod commands are like block-level
 556  elements, whereas all Pod formatting codes are like inline-level
 557  elements.)
 558  
 559  
 560  
 561  =head1 Notes on Implementing Pod Processors
 562  
 563  The following is a long section of miscellaneous requirements
 564  and suggestions to do with Pod processing.
 565  
 566  =over
 567  
 568  =item *
 569  
 570  Pod formatters should tolerate lines in verbatim blocks that are of
 571  any length, even if that means having to break them (possibly several
 572  times, for very long lines) to avoid text running off the side of the
 573  page.  Pod formatters may warn of such line-breaking.  Such warnings
 574  are particularly appropriate for lines that are over 100 characters long, which
 575  are usually not intentional.
 576  
 577  =item *
 578  
 579  Pod parsers must recognize I<all> of the three well-known newline
 580  formats: CR, LF, and CRLF.  See L<perlport|perlport>.
 581  
 582  =item *
 583  
 584  Pod parsers should accept input lines that are of any length.
 585  
 586  =item *
 587  
 588  Since Perl recognizes a Unicode Byte Order Mark at the start of files
 589  as signaling that the file is Unicode encoded as in UTF-16 (whether
 590  big-endian or little-endian) or UTF-8, Pod parsers should do the
 591  same.  Otherwise, the character encoding should be understood as
 592  being UTF-8 if the first highbit byte sequence in the file seems
 593  valid as a UTF-8 sequence, or otherwise as Latin-1.
 594  
 595  Future versions of this specification may specify
 596  how Pod can accept other encodings.  Presumably treatment of other
 597  encodings in Pod parsing would be as in XML parsing: whatever the
 598  encoding declared by a particular Pod file, content is to be
 599  stored in memory as Unicode characters.
 600  
 601  =item *
 602  
 603  The well-known Unicode Byte Order Marks are as follows:  if the
 604  file begins with the two literal byte values 0xFE 0xFF, this is
 605  the BOM for big-endian UTF-16.  If the file begins with the two
 606  literal byte values 0xFF 0xFE, this is the BOM for little-endian
 607  UTF-16.  If the file begins with the three literal byte values
 608  0xEF 0xBB 0xBF, this is the BOM for UTF-8.
 609  
 610  =for comment
 611   use bytes; print map sprintf(" 0x%02X", ord $_), split '', "\x{feff}";
 612   0xEF 0xBB 0xBF
 613  
 614  =for comment
 615   If toke.c is modified to support UTF-32, add mention of those here.
 616  
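The three Byte Order Marks listed in this item translate directly into a prefix test on the raw bytes of the file. This is a minimal Python sketch. The function name and the encoding labels it returns are illustrative choices.

```python
BOMS = [
    (b'\xfe\xff', 'UTF-16BE'),      # big-endian UTF-16
    (b'\xff\xfe', 'UTF-16LE'),      # little-endian UTF-16
    (b'\xef\xbb\xbf', 'UTF-8'),
]

def detect_bom(data):
    """Return (encoding, bom_length) for a leading BOM, or (None, 0)."""
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return encoding, len(bom)
    return None, 0
```

The returned length lets the caller skip the BOM before decoding the rest of the file.
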
 617  =item *
 618  
 619  A naive but sufficient heuristic for testing the first highbit
 620  byte-sequence in a BOM-less file (whether in code or in Pod!), to see
 621  whether that sequence is valid as UTF-8 (RFC 2279) is to check whether
 622  the first byte in the sequence is in the range 0xC0 - 0xFD
 623  I<and> whether the next byte is in the range
 624  0x80 - 0xBF.  If so, the parser may conclude that this file is in
 625  UTF-8, and all highbit sequences in the file should be assumed to
 626  be UTF-8.  Otherwise the parser should treat the file as being
 627  in Latin-1.  In the unlikely circumstance that the first highbit
 628  sequence in a truly non-UTF-8 file happens to appear to be UTF-8, one
 629  can cater to our heuristic (as well as any more intelligent heuristic)
 630  by prefacing that line with a comment line containing a highbit
 631  sequence that is clearly I<not> valid as UTF-8.  A line consisting
 632  of simply "#", an e-acute, and any non-highbit byte,
 633  is sufficient to establish this file's encoding.
 634  
 635  =for comment
 636   If/WHEN some brave soul makes these heuristics into a generic
 637   text-file class (or PerlIO layer?), we can presumably delete
 638   mention of these icky details from this file, and can instead
 639   tell people to just use appropriate class/layer.
 640   Auto-recognition of newline sequences would be another desirable
 641   feature of such a class/layer.
 642   HINT HINT HINT.
 643  
 644  =for comment
 645   "The probability that a string of characters
 646   in any other encoding appears as valid UTF-8 is low" - RFC2279
 647  
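The heuristic in this item can be written in a few lines. This Python sketch assumes the input is the raw bytes of a BOM-less file. The function name is hypothetical, and returning None for input with no highbit bytes is a choice of this sketch, since the heuristic itself says nothing about that case.

```python
def guess_encoding(data):
    """Guess 'UTF-8' or 'Latin-1' for BOM-less bytes; None if pure ASCII."""
    for i, byte in enumerate(data):
        if byte >= 0x80:  # the first highbit byte decides
            nxt = data[i + 1] if i + 1 < len(data) else None
            if 0xC0 <= byte <= 0xFD and nxt is not None and 0x80 <= nxt <= 0xBF:
                return 'UTF-8'
            return 'Latin-1'
    return None
```

A comment line of "#", a Latin-1 e-acute (0xE9), and a non-highbit byte makes this function return 'Latin-1', which is the escape hatch described above.
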
 648  =item *
 649  
 650  This document's requirements and suggestions about encodings
 651  do not apply to Pod processors running on non-ASCII platforms,
 652  notably EBCDIC platforms.
 653  
 654  =item *
 655  
 656  Pod processors must treat a "=for [label] [content...]" paragraph as
 657  meaning the same thing as a "=begin [label]" paragraph, content, and
 658  an "=end [label]" paragraph.  (The parser may conflate these two
 659  constructs, or may leave them distinct, in the expectation that the
 660  formatter will nevertheless treat them the same.)
 661  
 662  =item *
 663  
 664  When rendering Pod to a format that allows comments (i.e., to nearly
 665  any format other than plaintext), a Pod formatter must insert comment
 666  text identifying its name and version number, and the name and
 667  version numbers of any modules it might be using to process the Pod.
 668  Minimal examples:
 669  
 670    %% POD::Pod2PS v3.14159, using POD::Parser v1.92
 671  
 672    <!-- Pod::HTML v3.14159, using POD::Parser v1.92 -->
 673  
 674    {\doccomm generated by Pod::Tree::RTF 3.14159 using Pod::Tree 1.08}
 675  
 676    .\" Pod::Man version 3.14159, using POD::Parser version 1.92
 677  
 678  Formatters may also insert additional comments, including: the
 679  release date of the Pod formatter program, the contact address for
 680  the author(s) of the formatter, the current time, the name of the input
 681  file, the formatting options in effect, version of Perl used, etc.
 682  
 683  Formatters may also choose to note errors/warnings as comments,
 684  besides or instead of emitting them otherwise (as in messages to
 685  STDERR, or C<die>ing).
 686  
 687  =item *
 688  
 689  Pod parsers I<may> emit warnings or error messages ("Unknown E code
 690  EE<lt>zslig>!") to STDERR (whether through printing to STDERR, or
 691  C<warn>ing/C<carp>ing, or C<die>ing/C<croak>ing), but I<must> allow
 692  suppressing all such STDERR output, and instead allow an option for
 693  reporting errors/warnings
 694  in some other way, whether by triggering a callback, or noting errors
 695  in some attribute of the document object, or some similarly unobtrusive
 696  mechanism -- or even by appending a "Pod Errors" section to the end of
 697  the parsed form of the document.
 698  
 699  =item *
 700  
 701  In cases of exceptionally aberrant documents, Pod parsers may abort the
 702  parse.  Even then, using C<die>ing/C<croak>ing is to be avoided; where
 703  possible, the parser library may simply close the input file
 704  and add text like "*** Formatting Aborted ***" to the end of the
 705  (partial) in-memory document.
 706  
 707  =item *
 708  
 709  In paragraphs where formatting codes (like EE<lt>...>, BE<lt>...>)
 710  are understood (i.e., I<not> verbatim paragraphs, but I<including>
 711  ordinary paragraphs, and command paragraphs that produce renderable
 712  text, like "=head1"), literal whitespace should generally be considered
 713  "insignificant", in that one literal space has the same meaning as any
 714  (nonzero) number of literal spaces, literal newlines, and literal tabs
 715  (as long as this produces no blank lines, since those would terminate
 716  the paragraph).  Pod parsers should compact literal whitespace in each
 717  processed paragraph, but may provide an option for overriding this
 718  (since some processing tasks do not require it), or may follow
 719  additional special rules (for example, specially treating
 720  period-space-space or period-newline sequences).
 721  
 722  =item *
 723  
 724  Pod parsers should not, by default, try to coerce apostrophe (') and
 725  quote (") into smart quotes (little 9's, 66's, 99's, etc), nor try to
 726  turn backtick (`) into anything else but a single backtick character
 727  (distinct from an open quote character!), nor "--" into anything but
 728  two minus signs.  They I<must never> do any of those things to text
 729  in CE<lt>...> formatting codes, and never I<ever> to text in verbatim
 730  paragraphs.
 731  
 732  =item *
 733  
 734  When rendering Pod to a format that has two kinds of hyphens (-), one
 735  that's a non-breaking hyphen, and another that's a breakable hyphen
 736  (as in "object-oriented", which can be split across lines as
 737  "object-", newline, "oriented"), formatters are encouraged to
 738  generally translate "-" to non-breaking hyphen, but may apply
 739  heuristics to convert some of these to breaking hyphens.
 740  
 741  =item *
 742  
 743  Pod formatters should make reasonable efforts to keep words of Perl
 744  code from being broken across lines.  For example, "Foo::Bar" in some
 745  formatting systems is seen as eligible for being broken across lines
 746  as "Foo::" newline "Bar" or even "Foo::-" newline "Bar".  This should
 747  be avoided where possible, either by disabling all line-breaking in
 748  mid-word, or by wrapping particular words with internal punctuation
 749  in "don't break this across lines" codes (which in some formats may
 750  not be a single code, but might be a matter of inserting non-breaking
 751  zero-width spaces between every pair of characters in a word).
 752  
 753  =item *
 754  
 755  Pod parsers should, by default, expand tabs in verbatim paragraphs as
 756  they are processed, before passing them to the formatter or other
 757  processor.  Parsers may also allow an option for overriding this.
 758  
 759  =item *
 760  
 761  Pod parsers should, by default, remove newlines from the end of
 762  ordinary and verbatim paragraphs before passing them to the
 763  formatter.  For example, while the paragraph you're reading now
 764  could be considered, in Pod source, to end with (and contain)
 765  the newline(s) that end it, it should be processed as ending with
 766  (and containing) the period character that ends this sentence.
 767  
 768  =item *
 769  
 770  Pod parsers, when reporting errors, should make some effort to report
 771  an approximate line number ("Nested EE<lt>>'s in Paragraph #52, near
 772  line 633 of Thing/Foo.pm!"), instead of merely noting the paragraph
 773  number ("Nested EE<lt>>'s in Paragraph #52 of Thing/Foo.pm!").  Where
 774  this is problematic, the paragraph number should at least be
 775  accompanied by an excerpt from the paragraph ("Nested EE<lt>>'s in
 776  Paragraph #52 of Thing/Foo.pm, which begins 'Read/write accessor for
 777  the CE<lt>interest rate> attribute...'").
 778  
 779  =item *
 780  
 781  Pod parsers, when processing a series of verbatim paragraphs one
 782  after another, should consider them to be one large verbatim
 783  paragraph that happens to contain blank lines.  I.e., these two
 784  lines, which have a blank line between them:
 785  
 786      use Foo;
 787  
 788      print Foo->VERSION
 789  
 790  should be unified into one paragraph ("\tuse Foo;\n\n\tprint
 791  Foo->VERSION") before being passed to the formatter or other
 792  processor.  Parsers may also allow an option for overriding this.
 793  
 794  While this might be too cumbersome to implement in event-based Pod
 795  parsers, it is straightforward for parsers that return parse trees.
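
As a rough illustration of the unification described above, here is a
minimal Python sketch.  The paragraph representation (a list of
C<(kind, text)> tuples) and the name C<merge_verbatim> are assumptions
made for this example; they are not taken from any existing Pod parser.

```python
def merge_verbatim(paragraphs):
    """Join runs of adjacent verbatim paragraphs into one paragraph.

    `paragraphs` is a hypothetical list of (kind, text) tuples, where
    kind is "verbatim" or "ordinary".
    """
    merged = []
    for kind, text in paragraphs:
        if kind == "verbatim" and merged and merged[-1][0] == "verbatim":
            # Restore the blank line that separated the two paragraphs.
            prev_text = merged[-1][1]
            merged[-1] = ("verbatim", prev_text + "\n\n" + text)
        else:
            merged.append((kind, text))
    return merged
```

A parser offering the override option mentioned above could simply
skip this pass.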
 796  
 797  =item *
 798  
 799  Pod formatters, where feasible, are advised to avoid splitting short
 800  verbatim paragraphs (under twelve lines, say) across pages.
 801  
 802  =item *
 803  
 804  Pod parsers must treat a line with only spaces and/or tabs on it as a
 805  "blank line" such as separates paragraphs.  (Some older parsers
 806  recognized only two adjacent newlines as a "blank line" but would not
 807  recognize a newline, a space, and a newline, as a blank line.  This
 808  is noncompliant behavior.)
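
One possible way to honor this rule is sketched below in Python.  The
function name C<split_paragraphs> is hypothetical, and the sketch
assumes the whole document is already in memory as a single string.

```python
import re

def split_paragraphs(text):
    """Split Pod source into paragraphs, treating any line that holds
    only spaces and/or tabs (or nothing at all) as a blank line."""
    paragraphs, current = [], []
    # Accept CR LF, bare CR, and bare LF as newline sequences.
    for line in re.split(r"\r\n|\r|\n", text):
        if line.strip(" \t") == "":
            # A blank line ends the paragraph in progress, if any.
            if current:
                paragraphs.append("\n".join(current))
                current = []
        else:
            current.append(line)
    if current:
        paragraphs.append("\n".join(current))
    return paragraphs
```

Checking each line for emptiness after stripping spaces and tabs,
rather than searching for two adjacent newlines, is what avoids the
noncompliant behavior described above.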
 809  
 810  =item *
 811  
 812  Authors of Pod formatters/processors should make every effort to
 813  avoid writing their own Pod parser.  There are already several in
 814  CPAN, with a wide range of interface styles -- and one of them,
 815  Pod::Parser, comes with modern versions of Perl.
 816  
 817  =item *
 818  
 819  Characters in Pod documents may be conveyed either as literals, or by
 820  number in EE<lt>n> codes, or by an equivalent mnemonic, as in
 821  EE<lt>eacute> which is exactly equivalent to EE<lt>233>.
 822  
 823  Characters in the range 32-126 refer to those well known US-ASCII
 824  characters (also defined there by Unicode, with the same meaning),
 825  which all Pod formatters must render faithfully.  Characters
 826  in the ranges 0-31 and 127-159 should not be used (neither as
 827  literals, nor as EE<lt>number> codes), except for the
 828  literal byte-sequences for newline (13, 13 10, or 10), and tab (9).
 829  
 830  Characters in the range 160-255 refer to Latin-1 characters (also
 831  defined there by Unicode, with the same meaning).  Characters above
 832  255 should be understood to refer to Unicode characters.
 833  
 834  =item *
 835  
 836  Be warned
 837  that some formatters cannot reliably render characters outside 32-126;
 838  and many are able to handle 32-126 and 160-255, but nothing above
 839  255.
 840  
 841  =item *
 842  
 843  Besides the well-known "EE<lt>lt>" and "EE<lt>gt>" codes for
 844  less-than and greater-than, Pod parsers must understand "EE<lt>sol>"
 845  for "/" (solidus, slash), and "EE<lt>verbar>" for "|" (vertical bar,
 846  pipe).  Pod parsers should also understand "EE<lt>lchevron>" and
 847  "EE<lt>rchevron>" as legacy codes for characters 171 and 187, i.e.,
 848  "left-pointing double angle quotation mark" = "left pointing
 849  guillemet" and "right-pointing double angle quotation mark" = "right
 850  pointing guillemet".  (These look like little "<<" and ">>", and they
 851  are now preferably expressed with the HTML/XHTML codes "EE<lt>laquo>"
 852  and "EE<lt>raquo>".)
 853  
 854  =item *
 855  
 856  Pod parsers should understand all "EE<lt>html>" codes as defined
 857  in the entity declarations in the most recent XHTML specification at
 858  C<www.W3.org>.  Pod parsers must understand at least the entities
 859  that define characters in the range 160-255 (Latin-1).  Pod parsers,
 860  when faced with some unknown "EE<lt>I<identifier>>" code,
 861  shouldn't simply replace it with nullstring (by default, at least),
 862  but may pass it through as a string consisting of the literal characters
 863  E, less-than, I<identifier>, greater-than.  Or Pod parsers may offer the
 864  alternative option of processing such unknown
 865  "EE<lt>I<identifier>>" codes by firing an event especially
 866  for such codes, or by adding a special node-type to the in-memory
 867  document tree.  Such "EE<lt>I<identifier>>" may have special meaning
 868  to some processors, or some processors may choose to add them to
 869  a special error report.
 870  
 871  =item *
 872  
 873  Pod parsers must also support the XHTML codes "EE<lt>quot>" for
 874  character 34 (doublequote, "), "EE<lt>amp>" for character 38
 875  (ampersand, &), and "EE<lt>apos>" for character 39 (apostrophe, ').
 876  
 877  =item *
 878  
 879  Note that in all cases of "EE<lt>whatever>", I<whatever> (whether
 880  an htmlname, or a number in any base) must consist only of
 881  alphanumeric characters -- that is, I<whatever> must match
 882  C<m/\A\w+\z/>.  So "EE<lt> 0 1 2 3 >" is invalid, because
 883  it contains spaces, which aren't alphanumeric characters.  This
 884  presumably does not I<need> special treatment by a Pod processor;
 885  " 0 1 2 3 " doesn't look like a number in any base, so it would
 886  presumably be looked up in the table of HTML-like names.  Since
 887  there isn't (and cannot be) an HTML-like entity called " 0 1 2 3 ",
 888  this will be treated as an error.  However, Pod processors may
 889  treat "EE<lt> 0 1 2 3 >" or "EE<lt>e-acute>" as I<syntactically>
 890  invalid, potentially earning a different error message than the
 891  error message (or warning, or event) generated by a merely unknown
 892  (but theoretically valid) htmlname, as in "EE<lt>qacute>"
 893  [sic].  However, Pod parsers are not required to make this
 894  distinction.
 895  
 896  =item *
 897  
 898  Note that EE<lt>number> I<must not> be interpreted as simply
 899  "codepoint I<number> in the current/native character set".  It always
 900  means only "the character represented by codepoint I<number> in
 901  Unicode."  (This is identical to the semantics of &#I<number>; in XML.)
 902  
 903  This will likely require many formatters to have tables mapping from
 904  treatable Unicode codepoints (such as the "\xE9" for the e-acute
 905  character) to the escape sequences or codes necessary for conveying
 906  such sequences in the target output format.  A converter to *roff
 907  would, for example, know that "\xE9" (whether conveyed literally, or via
 908  a EE<lt>...> sequence) is to be conveyed as "e\\*'".
 909  Similarly, a program rendering Pod in a Mac OS application window, would
 910  presumably need to know that "\xE9" maps to codepoint 142 in MacRoman
 911  encoding that (at time of writing) is native for Mac OS.  Such
 912  Unicode2whatever mappings are presumably already widely available for
 913  common output formats.  (Such mappings may be incomplete!  Implementers
 914  are not expected to bend over backwards in an attempt to render
 915  Cherokee syllabics, Etruscan runes, Byzantine musical symbols, or any
 916  of the other weird things that Unicode can encode.)  And
 917  if a Pod document uses a character not found in such a mapping, the
 918  formatter should consider it an unrenderable character.
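
The rules in the preceding items for resolving the content of an
EE<lt>...> code can be sketched in Python as follows.  This is only an
illustration under stated assumptions: the names C<resolve_escape> and
C<EXTRA_NAMES> are invented here, Python's HTML 4 entity table stands in
for the XHTML entity declarations, "alphanumeric" is taken to mean
ASCII word characters, and the number prefixes (a leading "0x" for hex,
a leading "0" for octal) follow the convention described in perlpod.

```python
import re
from html.entities import name2codepoint

# Names required by the text that the HTML 4 table does not provide.
EXTRA_NAMES = {
    "sol": 0x2F,       # "/" (solidus, slash)
    "verbar": 0x7C,    # "|" (vertical bar, pipe)
    "lchevron": 0xAB,  # legacy name for laquo
    "rchevron": 0xBB,  # legacy name for raquo
    "apos": 0x27,      # "'" (apostrophe)
}

def resolve_escape(content):
    """Return the Unicode character named by the inside of an E<...>.

    Raises ValueError for syntactically invalid content and KeyError
    for a well-formed but unknown name.
    """
    if not re.fullmatch(r"\w+", content, flags=re.ASCII):
        raise ValueError("syntactically invalid E<> content: %r" % content)
    if re.fullmatch(r"0[xX][0-9a-fA-F]+", content):
        return chr(int(content, 16))   # hex, e.g. 0xE9
    if re.fullmatch(r"0[0-7]+", content):
        return chr(int(content, 8))    # octal, e.g. 0351
    if re.fullmatch(r"[0-9]+", content):
        return chr(int(content))       # decimal, always a Unicode codepoint
    if content in EXTRA_NAMES:
        return chr(EXTRA_NAMES[content])
    return chr(name2codepoint[content])
```

Using two different exception types mirrors the optional distinction
between syntactically invalid content and a merely unknown name; a
parser that does not make that distinction could catch both together.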
 919  
 920  =item *
 921  
 922  If, surprisingly, the implementor of a Pod formatter can't find a
 923  satisfactory pre-existing table mapping from Unicode characters to
 924  escapes in the target format (e.g., a decent table of Unicode
 925  characters to *roff escapes), it will be necessary to build such a
 926  table.  If you are in this circumstance, you should begin with the
 927  characters in the range 0x00A0 - 0x00FF, which is mostly the heavily
 928  used accented characters.  Then proceed (as patience permits and
 929  fastidiousness compels) through the characters that the (X)HTML
 930  standards groups judged important enough to merit mnemonics
 931  for.  These are declared in the (X)HTML specifications at the
 932  www.W3.org site.  At time of writing (September 2001), the most recent
 933  entity declaration files are:
 934  
 935    http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent
 936    http://www.w3.org/TR/xhtml1/DTD/xhtml-special.ent
 937    http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent
 938  
 939  Then you can progress through any remaining notable Unicode characters
 940  in the range 0x2000-0x204D (consult the character tables at
 941  www.unicode.org), and whatever else strikes your fancy.  For example,
 942  in F<xhtml-symbol.ent>, there is the entry:
 943  
 944    <!ENTITY infin    "&#8734;"> <!-- infinity, U+221E ISOtech -->
 945  
 946  While the mapping "infin" to the character "\x{221E}" will (hopefully)
 947  have been already handled by the Pod parser, the presence of the
 948  character in this file means that it's reasonably important enough to
 949  include in a formatter's table that maps from notable Unicode characters
 950  to the codes necessary for rendering them.  So for a Unicode-to-*roff
 951  mapping, for example, this would merit the entry:
 952  
 953    "\x{221E}" => '\(in',
 954  
 955  It is eagerly hoped that in the future, increasing numbers of formats
 956  (and formatters) will support Unicode characters directly (as (X)HTML
 957  does with C<&infin;>, C<&#8734;>, or C<&#x221E;>), reducing the need
 958  for idiosyncratic mappings of Unicode-to-I<my_escapes>.
 959  
 960  =item *
 961  
 962  It is up to individual Pod formatters to display good judgement when
 963  confronted with an unrenderable character (which is distinct from an
 964  unknown EE<lt>thing> sequence that the parser couldn't resolve to
 965  anything, renderable or not).  It is good practice to map Latin letters
 966  with diacritics (like "EE<lt>eacute>"/"EE<lt>233>") to the corresponding
 967  unaccented US-ASCII letters (like a simple character 101, "e"), but
 968  clearly this is often not feasible, and an unrenderable character may
 969  be represented as "?", or the like.  In attempting a sane fallback
 970  (as from EE<lt>233> to "e"), Pod formatters may use the
 971  %Latin1Code_to_fallback table in L<Pod::Escapes|Pod::Escapes>, or
 972  L<Text::Unidecode|Text::Unidecode>, if available.
 973  
 974  For example, this Pod text:
 975  
 976    magic is enabled if you set C<$Currency> to 'E<euro>'.
 977  
 978  may be rendered as:
 979  "magic is enabled if you set C<$Currency> to 'I<?>'" or as
 980  "magic is enabled if you set C<$Currency> to 'B<[euro]>'", or as
 981  "magic is enabled if you set C<$Currency> to '[x20AC]'", etc.
 982  
 983  A Pod formatter may also note, in a comment or warning, a list of what
 984  unrenderable characters were encountered.
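
For formatters that have neither of those modules to hand, a fallback
of the kind described above might look like the Python sketch below.
Note that it substitutes a different technique, Unicode compatibility
decomposition, for the lookup tables named in the text, and that the
function name C<ascii_fallback> and its C<placeholder> parameter are
assumptions of this example.

```python
import unicodedata

def ascii_fallback(char, placeholder="?"):
    """Return a US-ASCII stand-in for a single character."""
    if 32 <= ord(char) <= 126:
        return char
    # NFKD splits, for example, U+00E9 into "e" plus a combining accent.
    decomposed = unicodedata.normalize("NFKD", char)
    kept = "".join(
        c for c in decomposed
        if 32 <= ord(c) <= 126 and not unicodedata.combining(c)
    )
    # Nothing printable survived: the character is unrenderable here.
    return kept or placeholder
```

A formatter could pass a placeholder such as "[euro]" instead of the
default "?", and could record each character that fell through to the
placeholder for the warning list mentioned above.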
 985  
 986  =item *
 987  
 988  EE<lt>...> may freely appear in any formatting code (other than
 989  in another EE<lt>...> or in a ZE<lt>>).  That is, "XE<lt>The
 990  EE<lt>euro>1,000,000 Solution>" is valid, as is "LE<lt>The
 991  EE<lt>euro>1,000,000 Solution|Million::Euros>".
 992  
 993  =item *
 994  
 995  Some Pod formatters output to formats that implement non-breaking
 996  spaces as an individual character (which I'll call "NBSP"), and
 997  others output to formats that implement non-breaking spaces just as
 998  spaces wrapped in a "don't break this across lines" code.  Note that
 999  at the level of Pod, both sorts of codes can occur: Pod can contain an
1000  NBSP character (whether as a literal, or as a "EE<lt>160>" or
1001  "EE<lt>nbsp>" code); and Pod can contain "SE<lt>foo
1002  IE<lt>barE<gt> baz>" codes, where "mere spaces" (character 32) in
1003  such codes are taken to represent non-breaking spaces.  Pod
1004  parsers should consider supporting the optional parsing of "SE<lt>foo
1005  IE<lt>barE<gt> baz>" as if it were
1006  "fooI<NBSP>IE<lt>barE<gt>I<NBSP>baz", and, going the other way, the
1007  optional parsing of groups of words joined by NBSP's as if each group
1008  were in a SE<lt>...> code, so that formatters may use the
1009  representation that maps best to what the output format demands.
1010  
1011  =item *
1012  
1013  Some processors may find that the C<SE<lt>...E<gt>> code is easiest to
1014  implement by replacing each space in the parse tree under the content
1015  of the S, with an NBSP.  But note: the replacement should apply I<not> to
1016  spaces in I<all> text, but I<only> to spaces in I<printable> text.  (This
1017  distinction may or may not be evident in the particular tree/event
1018  model implemented by the Pod parser.)  For example, consider this
1019  unusual case:
1020  
1021     S<L</Autoloaded Functions>>
1022  
1023  This means that the space in the middle of the visible link text must
1024  not be broken across lines.  In other words, it's the same as this:
1025  
1026     L<"AutoloadedE<160>Functions"/Autoloaded Functions>
1027  
1028  However, a misapplied space-to-NBSP replacement could (wrongly)
1029  produce something equivalent to this:
1030  
1031     L<"AutoloadedE<160>Functions"/AutoloadedE<160>Functions>
1032  
1033  ...which is almost definitely not going to work as a hyperlink (assuming
1034  this formatter outputs a format supporting hypertext).
1035  
1036  Formatters may choose to just not support the S format code,
1037  especially in cases where the output format simply has no NBSP
1038  character/code and no code for "don't break this stuff across lines".
1039  
1040  =item *
1041  
1042  Besides the NBSP character discussed above, implementors are reminded
1043  of the existence of the other "special" character in Latin-1, the
1044  "soft hyphen" character, also known as "discretionary hyphen",
1045  i.e. C<EE<lt>173E<gt>> = C<EE<lt>0xADE<gt>> =
1046  C<EE<lt>shyE<gt>>.  This character expresses an optional hyphenation
1047  point.  That is, it normally renders as nothing, but may render as a
1048  "-" if a formatter breaks the word at that point.  Pod formatters
1049  should, as appropriate, do one of the following:  1) render this with
1050  a code with the same meaning (e.g., "\-" in RTF), 2) pass it through
1051  in the expectation that the formatter understands this character as
1052  such, or 3) delete it.
1053  
1054  For example:
1055  
1056    sigE<shy>action
1057    manuE<shy>script
1058    JarkE<shy>ko HieE<shy>taE<shy>nieE<shy>mi
1059  
1060  These signal to a formatter that if it is to hyphenate "sigaction"
1061  or "manuscript", then it should be done as
1062  "sig-I<[linebreak]>action" or "manu-I<[linebreak]>script"
1063  (and if it doesn't hyphenate it, then the C<EE<lt>shyE<gt>> doesn't
1064  show up at all).  And if it is
1065  to hyphenate "Jarkko" and/or "Hietaniemi", it can do
1066  so only at the points where there is a C<EE<lt>shyE<gt>> code.
1067  
1068  In practice, it is anticipated that this character will not be used
1069  often, but formatters should either support it, or delete it.
1070  
1071  =item *
1072  
1073  If you think that you want to add a new command to Pod (like, say, a
1074  "=biblio" command), consider whether you could get the same
1075  effect with a for or begin/end sequence: "=for biblio ..." or "=begin
1076  biblio" ... "=end biblio".  Pod processors that don't understand
1077  "=for biblio", etc, will simply ignore it, whereas they may complain
1078  loudly if they see "=biblio".
1079  
1080  =item *
1081  
1082  Throughout this document, "Pod" has been the preferred spelling for
1083  the name of the documentation format.  One may also use "POD" or
1084  "pod".  For the documentation that is (typically) in the Pod
1085  format, you may use "pod", or "Pod", or "POD".  Understanding these
1086  distinctions is useful; but obsessing over how to spell them, usually
1087  is not.
1088  
1089  =back
1090  
1091  
1092  
1093  
1094  
1095  =head1 About LE<lt>...E<gt> Codes
1096  
1097  As you can tell from a glance at L<perlpod|perlpod>, the LE<lt>...>
1098  code is the most complex of the Pod formatting codes.  The points below
1099  will hopefully clarify what it means and how processors should deal
1100  with it.
1101  
1102  =over
1103  
1104  =item *
1105  
1106  In parsing an LE<lt>...> code, Pod parsers must distinguish at least
1107  four attributes:
1108  
1109  =over
1110  
1111  =item First:
1112  
1113  The link-text.  If there is none, this must be undef.  (E.g., in
1114  "LE<lt>Perl Functions|perlfunc>", the link-text is "Perl Functions".
1115  In "LE<lt>Time::HiRes>" and even "LE<lt>|Time::HiRes>", there is no
1116  link text.  Note that link text may contain formatting.)
1117  
1118  =item Second:
1119  
1120  The possibly inferred link-text -- i.e., if there was no real link
1121  text, then this is the text that we'll infer in its place.  (E.g., for
1122  "LE<lt>Getopt::Std>", the inferred link text is "Getopt::Std".)
1123  
1124  =item Third:
1125  
1126  The name or URL, or undef if none.  (E.g., in "LE<lt>Perl
1127  Functions|perlfunc>", the name -- also sometimes called the page --
1128  is "perlfunc".  In "LE<lt>/CAVEATS>", the name is undef.)
1129  
1130  =item Fourth:
1131  
1132  The section (AKA "item" in older perlpods), or undef if none.  E.g.,
1133  in "LE<lt>Getopt::Std/DESCRIPTIONE<gt>", "DESCRIPTION" is the section.  (Note
1134  that this is not the same as a manpage section like the "5" in "man 5
1135  crontab".  "Section Foo" in the Pod sense means the part of the text
1136  that's introduced by the heading or item whose text is "Foo".)
1137  
1138  =back
1139  
1140  Pod parsers may also note additional attributes including:
1141  
1142  =over
1143  
1144  =item Fifth:
1145  
1146  A flag for whether item 3 (if present) is a URL (like
1147  "http://lists.perl.org" is), in which case there should be no section
1148  attribute; a Pod name (like "perldoc" and "Getopt::Std" are); or
1149  possibly a man page name (like "crontab(5)" is).
1150  
1151  =item Sixth:
1152  
1153  The raw original LE<lt>...> content, before text is split on
1154  "|", "/", etc, and before EE<lt>...> codes are expanded.
1155  
1156  =back
1157  
1158  (The above were numbered only for concise reference below.  It is not
1159  a requirement that these be passed as an actual list or array.)
1160  
1161  For example:
1162  
1163    L<Foo::Bar>
1164      =>  undef,                          # link text
1165          "Foo::Bar",                     # possibly inferred link text
1166          "Foo::Bar",                     # name
1167          undef,                          # section
1168          'pod',                          # what sort of link
1169          "Foo::Bar"                      # original content
1170  
1171    L<Perlport's section on NL's|perlport/Newlines>
1172      =>  "Perlport's section on NL's",   # link text
1173          "Perlport's section on NL's",   # possibly inferred link text
1174          "perlport",                     # name
1175          "Newlines",                     # section
1176          'pod',                          # what sort of link
1177          "Perlport's section on NL's|perlport/Newlines" # orig. content
1178  
1179    L<perlport/Newlines>
1180      =>  undef,                          # link text
1181          '"Newlines" in perlport',       # possibly inferred link text
1182          "perlport",                     # name
1183          "Newlines",                     # section
1184          'pod',                          # what sort of link
1185          "perlport/Newlines"             # original content
1186  
1187    L<crontab(5)/"DESCRIPTION">
1188      =>  undef,                          # link text
1189          '"DESCRIPTION" in crontab(5)',  # possibly inferred link text
1190          "crontab(5)",                   # name
1191          "DESCRIPTION",                  # section
1192          'man',                          # what sort of link
1193          'crontab(5)/"DESCRIPTION"'      # original content
1194  
1195    L</Object Attributes>
1196      =>  undef,                          # link text
1197          '"Object Attributes"',          # possibly inferred link text
1198          undef,                          # name
1199          "Object Attributes",            # section
1200          'pod',                          # what sort of link
1201          "/Object Attributes"            # original content
1202  
1203    L<http://www.perl.org/>
1204      =>  undef,                          # link text
1205          "http://www.perl.org/",         # possibly inferred link text
1206          "http://www.perl.org/",         # name
1207          undef,                          # section
1208          'url',                          # what sort of link
1209          "http://www.perl.org/"          # original content
1210  
1211  Note that you can distinguish URL-links from anything else by the
1212  fact that they match C<m/\A\w+:[^:\s]\S*\z/>.  So
1213  C<LE<lt>http://www.perl.comE<gt>> is a URL, but
1214  C<LE<lt>HTTP::ResponseE<gt>> isn't.
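
To make the six attributes concrete, here is a minimal Python sketch
that reproduces the examples above from the raw content of an
LE<lt>...> code.  The name C<parse_link> and the tuple layout are
assumptions of this example.  It does not handle formatting codes
inside the content, nor the older quote-less section syntax, and it
takes a parenthesized string in the name as the sign of a man page.

```python
import re

# Python spells Perl's \z (absolute end of string) as \Z.
URL_RE = re.compile(r"\A\w+:[^:\s]\S*\Z")

def parse_link(raw):
    """Return (text, inferred_text, name, section, kind, raw)."""
    if URL_RE.match(raw):
        # URL links carry no link text and no section.
        return (None, raw, raw, None, "url", raw)

    text, bar, target = raw.partition("|")
    if not bar:                       # no "text|" part at all
        text, target = "", raw
    name, _slash, section = target.partition("/")
    if len(section) >= 2 and section.startswith('"') and section.endswith('"'):
        section = section[1:-1]       # drop the optional quotes

    text = text or None
    name = name or None
    section = section or None
    kind = "man" if name and re.search(r"\(\S*\)", name) else "pod"

    if text is not None:
        inferred = text
    elif name and section:
        inferred = '"%s" in %s' % (section, name)
    elif section:
        inferred = '"%s"' % section
    else:
        inferred = name
    return (text, inferred, name, section, kind, raw)
```

For instance, C<parse_link('crontab(5)/"DESCRIPTION"')> yields the
same six values as the fourth example above.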
1215  
1216  =item *
1217  
1218  In case of LE<lt>...> codes with no "text|" part in them,
1219  older formatters have exhibited great variation in actually displaying
1220  the link or cross reference.  For example, LE<lt>crontab(5)> would render
1221  as "the C<crontab(5)> manpage", or "in the C<crontab(5)> manpage"
1222  or just "C<crontab(5)>".
1223  
1224  Pod processors must now treat "text|"-less links as follows:
1225  
1226    L<name>         =>  L<name|name>
1227    L</section>     =>  L<"section"|/section>
1228    L<name/section> =>  L<"section" in name|name/section>
1229  
1230  =item *
1231  
1232  Note that section names might contain markup.  I.e., if a section
1233  starts with:
1234  
1235    =head2 About the C<-M> Operator
1236  
1237  or with:
1238  
1239    =item About the C<-M> Operator
1240  
1241  then a link to it would look like this:
1242  
1243    L<somedoc/About the C<-M> Operator>
1244  
1245  Formatters may choose to ignore the markup for purposes of resolving
1246  the link and use only the renderable characters in the section name,
1247  as in:
1248  
1249    <h1><a name="About_the_-M_Operator">About the <code>-M</code>
1250    Operator</a></h1>
1251  
1252    ...
1253  
1254    <a href="somedoc#About_the_-M_Operator">"About the <code>-M</code>
1255    Operator" in somedoc</a>
1256  
1257  =item *
1258  
1259  Previous versions of perlpod distinguished C<LE<lt>name/"section"E<gt>>
1260  links from C<LE<lt>name/itemE<gt>> links (and their targets).  These
1261  have been merged syntactically and semantically in the current
1262  specification, and I<section> can refer either to a "=headI<n> Heading
1263  Content" command or to a "=item Item Content" command.  This
1264  specification does not specify what behavior should be in the case
1265  of a given document having several things all seeming to produce the
1266  same I<section> identifier (e.g., in HTML, several things all producing
1267  the same I<anchorname> in <a name="I<anchorname>">...</a>
1268  elements).  Where Pod processors can control this behavior, they should
1269  use the first such anchor.  That is, C<LE<lt>Foo/BarE<gt>> refers to the
1270  I<first> "Bar" section in Foo.
1271  
1272  But for some processors/formats this cannot be easily controlled; as
1273  with the HTML example, the behavior of multiple ambiguous
1274  <a name="I<anchorname>">...</a> is most easily just left up to
1275  browsers to decide.
1276  
1277  =item *
1278  
1279  Authors wanting to link to a particular (absolute) URL, must do so
1280  only with "LE<lt>scheme:...>" codes (like
1281  LE<lt>http://www.perl.org>), and must not attempt "LE<lt>Some Site
1282  Name|scheme:...>" codes.  This restriction avoids many problems
1283  in parsing and rendering LE<lt>...> codes.
1284  
1285  =item *
1286  
1287  In a C<LE<lt>text|...E<gt>> code, text may contain formatting codes
1288  for formatting or for EE<lt>...> escapes, as in:
1289  
1290    L<B<ummE<234>stuff>|...>
1291  
1292  For C<LE<lt>...E<gt>> codes without a "text|" part, only
1293  C<EE<lt>...E<gt>> and C<ZE<lt>E<gt>> codes may occur -- no
1294  other formatting codes.  That is, authors should not use
1295  "C<LE<lt>BE<lt>Foo::BarE<gt>E<gt>>".
1296  
1297  Note, however, that formatting codes and ZE<lt>>'s can occur in any
1298  and all parts of an LE<lt>...> (i.e., in I<name>, I<section>, I<text>,
1299  and I<url>).
1300  
1301  Authors must not nest LE<lt>...> codes.  For example, "LE<lt>The
1302  LE<lt>Foo::Bar> man page>" should be treated as an error.
1303  
1304  =item *
1305  
1306  Note that Pod authors may use formatting codes inside the "text"
1307  part of "LE<lt>text|name>" (and so on for LE<lt>text|/"sec">).
1308  
1309  In other words, this is valid:
1310  
1311    Go read L<the docs on C<$.>|perlvar/"$.">
1312  
1313  Some output formats that do allow rendering "LE<lt>...>" codes as
1314  hypertext, might not allow the link-text to be formatted; in
1315  that case, formatters will have to just ignore that formatting.
1316  
1317  =item *
1318  
1319  At time of writing, C<LE<lt>nameE<gt>> values are of two types:
1320  either the name of a Pod page like C<LE<lt>Foo::BarE<gt>> (which
1321  might be a real Perl module or program in an @INC / PATH
1322  directory, or a .pod file in those places); or the name of a UNIX
1323  man page, like C<LE<lt>crontab(5)E<gt>>.  In theory, C<LE<lt>chmodE<gt>>
1324  is ambiguous between a Pod page called "chmod", or the Unix man page
1325  "chmod" (in whatever man-section).  However, the presence of a string
1326  in parens, as in "crontab(5)", is sufficient to signal that what
1327  is being discussed is not a Pod page, and so is presumably a
1328  UNIX man page.  The distinction is of no importance to many
1329  Pod processors, but some processors that render to hypertext formats
1330  may need to distinguish them in order to know how to render a
1331  given C<LE<lt>fooE<gt>> code.
1332  
1333  =item *
1334  
1335  Previous versions of perlpod allowed for a C<LE<lt>sectionE<gt>> syntax
1336  (as in C<LE<lt>Object AttributesE<gt>>), which was not easily distinguishable
1337  from C<LE<lt>nameE<gt>> syntax.  This syntax is no longer in the
1338  specification, and has been replaced by the C<LE<lt>"section"E<gt>> syntax
1339  (where the quotes were formerly optional).  Pod parsers should tolerate
1340  the C<LE<lt>sectionE<gt>> syntax, for a while at least.  The suggested
1341  heuristic for distinguishing C<LE<lt>sectionE<gt>> from C<LE<lt>nameE<gt>>
1342  is that if it contains any whitespace, it's a I<section>.  Pod processors
1343  may warn about this being deprecated syntax.
1344  
1345  =back
1346  
1347  =head1 About =over...=back Regions
1348  
1349  "=over"..."=back" regions are used for various kinds of list-like
1350  structures.  (I use the term "region" here simply as a collective
1351  term for everything from the "=over" to the matching "=back".)
1352  
1353  =over
1354  
1355  =item *
1356  
1357  The non-zero numeric I<indentlevel> in "=over I<indentlevel>" ...
1358  "=back" is used for giving the formatter a clue as to how many
1359  "spaces" (ems, or roughly equivalent units) it should tab over,
1360  although many formatters will have to convert this to an absolute
1361  measurement that may not exactly match with the size of spaces (or M's)
1362  in the document's base font.  Other formatters may have to completely
1363  ignore the number.  The lack of any explicit I<indentlevel> parameter is
1364  equivalent to an I<indentlevel> value of 4.  Pod processors may
1365  complain if I<indentlevel> is present but is not a positive number
1366  matching C<m/\A(\d*\.)?\d+\z/>.
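
A possible reading of these rules, sketched in Python.  The name
C<parse_indentlevel> and the choice to fall back to the default of 4
after a complaint are assumptions of this example; the text above only
says that processors may complain.

```python
import re

# Python spells Perl's \z as \Z; re.ASCII keeps \d to the digits 0-9.
INDENT_RE = re.compile(r"\A(\d*\.)?\d+\Z", flags=re.ASCII)
DEFAULT_INDENT = 4.0

def parse_indentlevel(arg):
    """Return (indent, warning) for the text that follows "=over"."""
    arg = arg.strip()
    if arg == "":
        # No explicit indentlevel is equivalent to a value of 4.
        return DEFAULT_INDENT, None
    if INDENT_RE.match(arg) and float(arg) > 0:
        return float(arg), None
    return DEFAULT_INDENT, "bad =over indentlevel: %r" % arg
```

The explicit check for a value greater than zero is needed because the
pattern alone also accepts "0".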
1367  
1368  =item *
1369  
1370  Authors of Pod formatters are reminded that "=over" ... "=back" may
1371  map to several different constructs in your output format.  For
1372  example, in converting Pod to (X)HTML, it can map to any of
1373  <ul>...</ul>, <ol>...</ol>, <dl>...</dl>, or
1374  <blockquote>...</blockquote>.  Similarly, "=item" can map to <li> or
1375  <dt>.
1376  
1377  =item *
1378  
1379  Each "=over" ... "=back" region should be one of the following:
1380  
1381  =over
1382  
1383  =item *
1384  
1385  An "=over" ... "=back" region containing only "=item *" commands,
1386  each followed by some number of ordinary/verbatim paragraphs, other
1387  nested "=over" ... "=back" regions, "=for..." paragraphs, and
1388  "=begin"..."=end" regions.
1389  
1390  (Pod processors must tolerate a bare "=item" as if it were "=item
1391  *".)  Whether "*" is rendered as a literal asterisk, an "o", or as
1392  some kind of real bullet character, is left up to the Pod formatter,
1393  and may depend on the level of nesting.
1394  
1395  =item *
1396  
1397  An "=over" ... "=back" region containing only
1398  C<m/\A=item\s+\d+\.?\s*\z/> paragraphs, each one (or each group of them)
1399  followed by some number of ordinary/verbatim paragraphs, other nested
1400  "=over" ... "=back" regions, "=for..." paragraphs, and/or
1401  "=begin"..."=end" regions.  Note that the numbers must start at 1
1402  in each section, and must proceed in order and without skipping
1403  numbers.
1404  
1405  (Pod processors must tolerate lines like "=item 1" as if they were
1406  "=item 1.", with the period.)
1407  
1408  =item *
1409  
1410  An "=over" ... "=back" region containing only "=item [text]"
1411  commands, each one (or each group of them) followed by some number of
1412  ordinary/verbatim paragraphs, other nested "=over" ... "=back"
1413  regions, or "=for..." paragraphs, and "=begin"..."=end" regions.
1414  
1415  The "=item [text]" paragraph should not match
1416  C<m/\A=item\s+\d+\.?\s*\z/> or C<m/\A=item\s+\*\s*\z/>, nor should it
1417  match just C<m/\A=item\s*\z/>.
1418  
1419  =item *
1420  
1421  An "=over" ... "=back" region containing no "=item" paragraphs at
1422  all, and containing only some number of 
1423  ordinary/verbatim paragraphs, and possibly also some nested "=over"
1424  ... "=back" regions, "=for..." paragraphs, and "=begin"..."=end"
1425  regions.  Such an itemless "=over" ... "=back" region in Pod is
1426  equivalent in meaning to a "<blockquote>...</blockquote>" element in
1427  HTML.
1428  
1429  =back
1430  
1431  Note that with all the above cases, you can determine which type of
1432  "=over" ... "=back" you have, by examining the first (non-"=cut", 
1433  non-"=pod") Pod paragraph after the "=over" command.
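
That determination can be sketched in Python using the patterns given
above.  The function name C<classify_over_region> and the four labels
it returns are invented for this example, and the caller is assumed to
have already skipped any "=cut" and "=pod" paragraphs.

```python
import re

# Python spells Perl's \z as \Z.
BULLET_RE = re.compile(r"\A=item\s+\*\s*\Z")
BARE_RE = re.compile(r"\A=item\s*\Z")
NUMBER_RE = re.compile(r"\A=item\s+\d+\.?\s*\Z")

def classify_over_region(first_paragraph):
    """Classify a region from the first Pod paragraph after "=over"."""
    if BULLET_RE.match(first_paragraph) or BARE_RE.match(first_paragraph):
        return "bullet"   # a bare "=item" is tolerated as "=item *"
    if NUMBER_RE.match(first_paragraph):
        return "number"   # "=item 1" is tolerated as "=item 1."
    if re.match(r"\A=item\s", first_paragraph):
        return "text"     # "=item [text]", possibly spanning lines
    return "block"        # no "=item": like <blockquote> in HTML
```

The bullet and number patterns are tested before the general "=item"
test because a text item is defined as one that matches neither.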
1434  
1435  =item *
1436  
1437  Pod formatters I<must> tolerate arbitrarily large amounts of text
1438  in the "=item I<text...>" paragraph.  In practice, most such
1439  paragraphs are short, as in:
1440  
1441    =item For cutting off our trade with all parts of the world
1442  
1443  But they may be arbitrarily long:
1444  
1445    =item For transporting us beyond seas to be tried for pretended
1446    offenses
1447  
1448    =item He is at this time transporting large armies of foreign
1449    mercenaries to complete the works of death, desolation and
1450    tyranny, already begun with circumstances of cruelty and perfidy
1451    scarcely paralleled in the most barbarous ages, and totally
1452    unworthy the head of a civilized nation.
1453  
1454  =item *
1455  
1456  Pod processors should tolerate "=item *" / "=item I<number>" commands
1457  with no accompanying paragraph.  The middle item is an example:
1458  
1459    =over
1460  
1461    =item 1
1462  
1463    Pick up dry cleaning.
1464  
1465    =item 2
1466  
1467    =item 3
1468  
1469    Stop by the store.  Get Abba Zabas, Stoli, and cheap lawn chairs.
1470  
1471    =back
1472  
1473  =item *
1474  
1475  No "=over" ... "=back" region can contain headings.  Processors may
1476  treat such a heading as an error.
1477  
1478  =item *
1479  
1480  Note that an "=over" ... "=back" region should have some
1481  content.  That is, authors should not have an empty region like this:
1482  
1483    =over
1484  
1485    =back
1486  
1487  Pod processors seeing such a contentless "=over" ... "=back" region
1488  may ignore it, or may report it as an error.
1489  
1490  =item *
1491  
1492  Processors must tolerate an "=over" list that goes off the end of the
1493  document (i.e., which has no matching "=back"), but they may warn
1494  about such a list.
1495  
1496  =item *
1497  
1498  Authors of Pod formatters should note that this construct:
1499  
1500    =item Neque
1501  
1502    =item Porro
1503  
1504    =item Quisquam Est
1505  
1506    Qui dolorem ipsum quia dolor sit amet, consectetur, adipisci 
1507    velit, sed quia non numquam eius modi tempora incidunt ut
1508    labore et dolore magnam aliquam quaerat voluptatem.
1509  
1510    =item Ut Enim
1511  
1512  is semantically ambiguous, in a way that makes formatting decisions
1513  a bit difficult.  On the one hand, it could be a mention of an item
1514  "Neque", a mention of another item "Porro", and a mention of another
1515  item "Quisquam Est", with just the last one requiring the explanatory
1516  paragraph "Qui dolorem ipsum quia dolor..."; and then an item
1517  "Ut Enim".  In that case, you'd want to format it like so:
1518  
1519    Neque
1520  
1521    Porro
1522  
1523    Quisquam Est
1524      Qui dolorem ipsum quia dolor sit amet, consectetur, adipisci
1525      velit, sed quia non numquam eius modi tempora incidunt ut
1526      labore et dolore magnam aliquam quaerat voluptatem.
1527  
1528    Ut Enim
1529  
1530  But it could equally well be a discussion of three (related or equivalent)
1531  items, "Neque", "Porro", and "Quisquam Est", followed by a paragraph
1532  explaining them all, and then a new item "Ut Enim".  In that case, you'd
1533  probably want to format it like so:
1534  
1535    Neque
1536    Porro
1537    Quisquam Est
1538      Qui dolorem ipsum quia dolor sit amet, consectetur, adipisci
1539      velit, sed quia non numquam eius modi tempora incidunt ut
1540      labore et dolore magnam aliquam quaerat voluptatem.
1541  
1542    Ut Enim
1543  
1544  But (for the foreseeable future), Pod does not provide any way for Pod
1545  authors to distinguish which grouping is meant by the above
1546  "=item"-cluster structure.  So formatters should format it like so:
1547  
1548    Neque
1549  
1550    Porro
1551  
1552    Quisquam Est
1553  
1554      Qui dolorem ipsum quia dolor sit amet, consectetur, adipisci
1555      velit, sed quia non numquam eius modi tempora incidunt ut
1556      labore et dolore magnam aliquam quaerat voluptatem.
1557  
1558    Ut Enim
1559  
1560  That is, there should be (at least roughly) equal spacing between
1561  items as between paragraphs (although that spacing may well be less
1562  than the full height of a line of text).  This leaves it to the reader
1563  to use (con)textual cues to figure out whether the "Qui dolorem
1564  ipsum..." paragraph applies to the "Quisquam Est" item or to all three
1565  items "Neque", "Porro", and "Quisquam Est".  While not an ideal
1566  situation, this is preferable to providing formatting cues that may
1567  actually be contrary to the author's intent.
1568  
1569  =back
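
The rule above, that the type of an "=over" ... "=back" region can be determined from the first paragraph after the "=over" command, can be sketched roughly as follows.  This is an illustrative Python sketch, not part of any Pod processor: the function name, the returned labels, and the choice to treat a bare "=item" like "=item *" are all assumptions made for the example.  The input is assumed to be the list of paragraphs (as strings) found between "=over" and "=back".

```python
import re

# The three patterns quoted in the list above, in Python syntax
# (Python's \Z corresponds to Perl's \z).
NUMBERED = re.compile(r"\A=item\s+\d+\.?\s*\Z")
BULLET = re.compile(r"\A=item\s+\*\s*\Z")
BARE = re.compile(r"\A=item\s*\Z")


def classify_over_region(paragraphs):
    """Classify a region by its first paragraph that is not =cut or =pod.

    Returns one of the hypothetical labels "number", "bullet", "text",
    "block" (an itemless region), or "empty".
    """
    for paragraph in paragraphs:
        words = paragraph.split()
        if not words or words[0] in ("=cut", "=pod"):
            continue  # these never decide the region type
        if NUMBERED.match(paragraph):
            return "number"
        # Assumption of this sketch: a bare "=item" counts as a bullet.
        if BULLET.match(paragraph) or BARE.match(paragraph):
            return "bullet"
        if words[0] == "=item":
            return "text"
        return "block"
    return "empty"
```

For instance, a region whose first paragraph is "=item 1" would be classified as "number", and one that starts with an ordinary paragraph as "block".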
1570  
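
The numbering requirement for "=item I<number>" lists (start at 1, proceed in order, skip nothing, and accept the number with or without a trailing period) could be checked along these lines.  Again, this is only a hedged Python sketch; the function name and the wording of the messages are invented for illustration.

```python
import re

# Accepts both "=item 1" and "=item 1.", as processors must.
ITEM_NUMBER = re.compile(r"\A=item\s+(\d+)\.?\s*\Z")


def numbering_problems(item_paragraphs):
    """Return a list of messages for "=item" paragraphs that break the sequence."""
    problems = []
    expected = 1
    for paragraph in item_paragraphs:
        match = ITEM_NUMBER.match(paragraph)
        if match is None:
            problems.append(f"not a numbered item: {paragraph.strip()!r}")
            continue
        found = int(match.group(1))
        if found != expected:
            problems.append(f"expected item {expected}, found {found}")
        # Resynchronise on the number actually seen, so that one skipped
        # number produces one message rather than one per later item.
        expected = found + 1
    return problems
```

Whether a processor reports such problems as warnings or as errors is left open here.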
1571  
1572  
1573  =head1 About Data Paragraphs and "=begin/=end" Regions
1574  
1575  Data paragraphs are typically used for inlining non-Pod data that is
1576  to be used (typically passed through) when rendering the document to
1577  a specific format:
1578  
1579    =begin rtf
1580  
1581    \par{\pard\qr\sa4500{\i Printed\~\chdate\~\chtime}\par}
1582  
1583    =end rtf
1584  
1585  The exact same effect could, incidentally, be achieved with a single
1586  "=for" paragraph:
1587  
1588    =for rtf \par{\pard\qr\sa4500{\i Printed\~\chdate\~\chtime}\par}
1589  
1590  (Although that is not formally a data paragraph, it has the same
1591  meaning as one, and Pod parsers may parse it as one.)
1592  
1593  Another example of a data paragraph:
1594  
1595    =begin html
1596  
1597    I like <em>PIE</em>!
1598  
1599    <hr>Especially pecan pie!
1600  
1601    =end html
1602  
1603  If these were ordinary paragraphs, the Pod parser would try to
1604  expand the "EE<lt>/em>" (in the first paragraph) as a formatting
1605  code, just like "EE<lt>lt>" or "EE<lt>eacute>".  But since this
1606  is in a "=begin I<identifier>"..."=end I<identifier>" region I<and>
1607  the identifier "html" doesn't have a ":" prefix, the contents
1608  of this region are stored as data paragraphs, instead of being
1609  processed as ordinary paragraphs (or, if they began with spaces
1610  and/or tabs, as verbatim paragraphs).
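
The decision just described can be sketched as a small function.  This is a simplified Python illustration under stated assumptions: the function name and the returned labels are hypothetical, the identifier is passed in exactly as written after "=begin" (with any leading colon), and anything whose first line looks like a command is reported as a command paragraph.

```python
import re

# A command paragraph starts with "=" followed by an ASCII letter.
COMMAND = re.compile(r"\A=[a-zA-Z]")


def paragraph_kind(region_identifier, paragraph):
    """Say how a paragraph inside a =begin region would be treated."""
    if COMMAND.match(paragraph):
        return "command"
    if not region_identifier.startswith(":"):
        # No colon: the content is kept as data, indented or not.
        return "data"
    if paragraph[:1] in (" ", "\t"):
        return "verbatim"
    return "ordinary"
```

So a paragraph in a "=begin html" region is data, while the same text in a "=begin :biblio" region is an ordinary paragraph whose formatting codes are expanded.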
1611  
1612  As a further example: At time of writing, no "biblio" identifier is
1613  supported, but suppose some processor were written to recognize it as
1614  a way of (say) denoting a bibliographic reference (necessarily
1615  containing formatting codes in ordinary paragraphs).  The fact that
1616  "biblio" paragraphs were meant for ordinary processing would be
1617  indicated by prefacing each "biblio" identifier with a colon:
1618  
1619    =begin :biblio
1620  
1621    Wirth, Niklaus.  1976.  I<Algorithms + Data Structures =
1622    Programs.>  Prentice-Hall, Englewood Cliffs, NJ.
1623  
1624    =end :biblio
1625  
1626  This would signal to the parser that paragraphs in this begin...end
1627  region are subject to normal handling as ordinary/verbatim paragraphs
1628  (while still tagged as meant only for processors that understand the
1629  "biblio" identifier).  The same effect could be had with:
1630  
1631    =for :biblio
1632    Wirth, Niklaus.  1976.  I<Algorithms + Data Structures =
1633    Programs.>  Prentice-Hall, Englewood Cliffs, NJ.
1634  
1635  The ":" on these identifiers means simply "process this stuff
1636  normally, even though the result will be for some special target".
1637  I suggest that parser APIs report "biblio" as the target identifier,
1638  but also report that it had a ":" prefix.  (And similarly, with the
1639  above "html", report "html" as the target identifier, and note the
1640  I<lack> of a ":" prefix.)
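
As one possible shape for such an API, the following Python sketch splits the target of a "=begin" or "=for" command into the bare identifier and a flag for the colon.  The function name and the dictionary keys are assumptions for illustration, and the pattern accepts any run of non-whitespace as an identifier, which is looser than a real parser would be.

```python
import re

# Command name, optional ":" prefix, then the identifier itself.
TARGET = re.compile(r"\A=(begin|for)\s+(:?)(\S+)")


def parse_target(command_paragraph):
    """Return the command, target identifier, and colon flag, or None."""
    match = TARGET.match(command_paragraph)
    if match is None:
        return None
    command, colon, identifier = match.groups()
    return {"command": command, "target": identifier, "colon": bool(colon)}
```

With this, "=begin :biblio" reports the target "biblio" with the colon flag set, and "=begin html" reports the target "html" with the flag unset.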
1641  
1642  Note that a "=begin I<identifier>"..."=end I<identifier>" region where
1643  I<identifier> begins with a colon I<can> contain commands.  For example:
1644  
1645    =begin :biblio
1646  
1647    Wirth's classic is available in several editions, including:
1648  
1649    =for comment
1650     hm, check abebooks.com for how much used copies cost.
1651  
1652    =over
1653  
1654    =item
1655  
1656    Wirth, Niklaus.  1975.  I<Algorithmen und Datenstrukturen.>
1657    Teubner, Stuttgart.  [Yes, it's in German.]
1658  
1659    =item
1660  
1661    Wirth, Niklaus.  1976.  I<Algorithms + Data Structures =
1662    Programs.>  Prentice-Hall, Englewood Cliffs, NJ.
1663  
1664    =back
1665  
1666    =end :biblio
1667  
1668  Note, however, that a "=begin I<identifier>"..."=end I<identifier>"
1669  region where I<identifier> does I<not> begin with a colon should not
1670  directly contain "=head1" ... "=head4" commands, nor "=over", nor "=back",
1671  nor "=item".  For example, this may be considered invalid:
1672  
1673    =begin somedata
1674  
1675    This is a data paragraph.
1676  
1677    =head1 Don't do this!
1678  
1679    This is a data paragraph too.
1680  
1681    =end somedata
1682  
1683  A Pod processor may signal that the above (specifically the "=head1"
1684  paragraph) is an error.  Note, however, that the following should
1685  I<not> be treated as an error:
1686  
1687    =begin somedata
1688  
1689    This is a data paragraph.
1690  
1691    =cut
1692  
1693    # Yup, this isn't Pod anymore.
1694    sub excl { (rand() > .5) ? "hoo!" : "hah!" }
1695  
1696    =pod
1697  
1698    This is a data paragraph too.
1699  
1700    =end somedata
1701  
1702  And this too is valid:
1703  
1704    =begin someformat
1705  
1706    This is a data paragraph.
1707  
1708      And this is a data paragraph.
1709  
1710    =begin someotherformat
1711  
1712    This is a data paragraph too.
1713  
1714      And this is a data paragraph too.
1715  
1716    =begin :yetanotherformat
1717  
1718    =head2 This is a command paragraph!
1719  
1720    This is an ordinary paragraph!
1721  
1722      And this is a verbatim paragraph!
1723  
1724    =end :yetanotherformat
1725  
1726    =end someotherformat
1727  
1728    Another data paragraph!
1729  
1730    =end someformat
1731  
1732  The contents of the above "=begin :yetanotherformat" ...
1733  "=end :yetanotherformat" region I<aren't> data paragraphs, because
1734  the immediately containing region's identifier (":yetanotherformat")
1735  begins with a colon.  In practice, most regions that contain
1736  data paragraphs will contain I<only> data paragraphs; the above
1737  nesting is rare, but it is syntactically valid as Pod.  Note,
1738  however, that the handlers for some formats, like "html",
1739  will accept only data paragraphs, not nested regions; and they may
1740  complain if they see (targeted for them) nested regions, or commands,
1741  other than "=end", "=pod", and "=cut".
1742  
1743  Also consider this valid structure:
1744  
1745    =begin :biblio
1746  
1747    Wirth's classic is available in several editions, including:
1748  
1749    =over
1750  
1751    =item
1752  
1753    Wirth, Niklaus.  1975.  I<Algorithmen und Datenstrukturen.>
1754    Teubner, Stuttgart.  [Yes, it's in German.]
1755  
1756    =item
1757  
1758    Wirth, Niklaus.  1976.  I<Algorithms + Data Structures =
1759    Programs.>  Prentice-Hall, Englewood Cliffs, NJ.
1760  
1761    =back
1762  
1763    Buy buy buy!
1764  
1765    =begin html
1766  
1767    <img src='wirth_spokesmodeling_book.png'>
1768  
1769    <hr>
1770  
1771    =end html
1772  
1773    Now now now!
1774  
1775    =end :biblio
1776  
1777  There, the "=begin html"..."=end html" region is nested inside
1778  the larger "=begin :biblio"..."=end :biblio" region.  Note that the
1779  content of the "=begin html"..."=end html" region is data
1780  paragraph(s), because the immediately containing region's identifier
1781  ("html") I<doesn't> begin with a colon.
1782  
1783  Pod parsers, when processing a series of data paragraphs one
1784  after another (within a single region), should consider them to
1785  be one large data paragraph that happens to contain blank lines.  So
1786  the content of the above "=begin html"..."=end html" I<may> be stored
1787  as two data paragraphs (one consisting of
1788  "<img src='wirth_spokesmodeling_book.png'>\n"
1789  and another consisting of "<hr>\n"), but I<should> be stored as
1790  a single data paragraph (consisting of 
1791  "<img src='wirth_spokesmodeling_book.png'>\n\n<hr>\n").
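
A parser that first produces one event per paragraph could apply this recommendation in a later pass.  The Python sketch below assumes a hypothetical event stream of (kind, text) pairs, where each data paragraph's text ends in a single newline; neither the event format nor the function name comes from any existing parser.

```python
def coalesce_data(events):
    """Merge runs of adjacent ("data", text) events into one data paragraph."""
    merged = []
    for kind, text in events:
        if kind == "data" and merged and merged[-1][0] == "data":
            # Rejoin with the blank line that separated the two paragraphs.
            merged[-1] = ("data", merged[-1][1] + "\n" + text)
        else:
            merged.append((kind, text))
    return merged
```

Applied to the two data paragraphs of the "=begin html" region above, this yields a single paragraph with a blank line between the "<img ...>" and "<hr>" lines.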
1792  
1793  Pod processors should tolerate empty
1794  "=begin I<something>"..."=end I<something>" regions,
1795  empty "=begin :I<something>"..."=end :I<something>" regions, and
1796  contentless "=for I<something>" and "=for :I<something>"
1797  paragraphs.  I.e., these should be tolerated:
1798  
1799    =for html
1800  
1801    =begin html
1802  
1803    =end html
1804  
1805    =begin :biblio
1806  
1807    =end :biblio
1808  
1809  Incidentally, note that there's no easy way to express a data
1810  paragraph starting with something that looks like a command.  Consider:
1811  
1812    =begin stuff
1813  
1814    =shazbot
1815  
1816    =end stuff
1817  
1818  There, "=shazbot" will be parsed as a Pod command "shazbot", not as a data
1819  paragraph "=shazbot\n".  However, you can express a data paragraph consisting
1820  of "=shazbot\n" using this code:
1821  
1822    =for stuff =shazbot
1823  
1824  The situation where this is necessary is presumably quite rare.
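
To illustrate why the "=for" form works, here is a hedged Python sketch of how a parser might split a "=for" paragraph into its target and its content.  The function name is invented for the example.  Everything after the target is handed over as content, so a leading "=" in that content is never seen at the start of a paragraph.

```python
def split_for_paragraph(paragraph):
    """Split a "=for" paragraph into (target, content)."""
    parts = paragraph.split(None, 2)
    if len(parts) < 2 or parts[0] != "=for":
        raise ValueError("not a =for paragraph with a target")
    target = parts[1]
    # A contentless "=for target" paragraph yields empty content.
    content = parts[2] if len(parts) > 2 else ""
    return target, content
```

Here "=for stuff =shazbot" gives the target "stuff" and the content "=shazbot" followed by the paragraph's newline.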
1825  
1826  Note that =end commands must match the currently open =begin command.  That
1827  is, they must properly nest.  For example, this is valid:
1828  
1829    =begin outer
1830  
1831    X
1832  
1833    =begin inner
1834  
1835    Y
1836  
1837    =end inner
1838  
1839    Z
1840  
1841    =end outer
1842  
1843  while this is invalid:
1844  
1845    =begin outer
1846  
1847    X
1848  
1849    =begin inner
1850  
1851    Y
1852  
1853    =end outer
1854  
1855    Z
1856  
1857    =end inner
1858  
1859  The latter is improper because when the "=end outer" command is seen, the
1860  currently open region has the formatname "inner", not "outer".  (It just
1861  happens that "outer" is the formatname of a higher-up region.)  This is
1862  an error.  Processors must by default report this as an error, and may halt
1863  processing the document containing that error.  A corollary of this is that
1864  regions cannot "overlap" -- i.e., the latter block above does not represent
1865  a region called "outer" which contains X and Y, overlapping a region called
1866  "inner" which contains Y and Z.  Because it is invalid (as all
1867  apparently overlapping regions would be), it doesn't represent that, or
1868  anything at all.
1869  
1870  Similarly, this is invalid:
1871  
1872    =begin thing
1873  
1874    =end hting
1875  
1876  This is an error because the region is opened by "thing", and the "=end"
1877  tries to close "hting" [sic].
1878  
1879  This is also invalid:
1880  
1881    =begin thing
1882  
1883    =end
1884  
1885  This is invalid because every "=end" command must have a formatname
1886  parameter.
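
The nesting rules above amount to keeping a stack of open formatnames.  The following Python sketch shows one way a checker might do that; the function name and the message texts are assumptions, the input is assumed to be the document's command paragraphs in order, and a "=begin" with no formatname is simply skipped, since that case is outside the scope of this sketch.

```python
def check_region_nesting(command_paragraphs):
    """Return error messages for "=end" commands that do not properly nest."""
    errors = []
    open_regions = []  # stack of formatnames, innermost region last
    for paragraph in command_paragraphs:
        words = paragraph.split()
        if not words:
            continue
        command, args = words[0], words[1:]
        if command == "=begin" and args:
            open_regions.append(args[0])
        elif command == "=end":
            if not args:
                errors.append('"=end" has no formatname')
            elif not open_regions:
                errors.append(f'"=end {args[0]}" closes nothing')
            elif open_regions[-1] != args[0]:
                errors.append(
                    f'"=end {args[0]}" does not match open "=begin {open_regions[-1]}"'
                )
            else:
                open_regions.pop()
    return errors
```

This sketch collects every error it finds; a processor is equally free to halt at the first one.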
1887  
1888  =head1 SEE ALSO
1889  
1890  L<perlpod>, L<perlsyn/"PODs: Embedded Documentation">,
1891  L<podchecker>
1892  
1893  =head1 AUTHOR
1894  
1895  Sean M. Burke
1896  
1897  =cut
1898  
1899  

