Diffstat (limited to 'doc/pegdocs.txt')
-rw-r--r-- doc/pegdocs.txt 100
1 files changed, 54 insertions, 46 deletions
diff --git a/doc/pegdocs.txt b/doc/pegdocs.txt
index 4c557aed8..0a8fd8187 100644
--- a/doc/pegdocs.txt
+++ b/doc/pegdocs.txt
@@ -13,12 +13,12 @@ notation           meaning
 ``A / ... / Z``    Ordered choice: Apply expressions `A`, ..., `Z`, in this
                    order, to the text ahead, until one of them succeeds and
                    possibly consumes some text. Indicate success if one of
-                   expressions succeeded. Otherwise do not consume any text
+                   expressions succeeded. Otherwise, do not consume any text
                    and indicate failure.
 ``A ... Z``        Sequence: Apply expressions `A`, ..., `Z`, in this order,
                    to consume consecutive portions of the text ahead, as long
                    as they succeed. Indicate success if all succeeded.
-                   Otherwise do not consume any text and indicate failure.
+                   Otherwise, do not consume any text and indicate failure.
                    The sequence's precedence is higher than that of ordered
                    choice: ``A B / C`` means ``(A B) / Z`` and
                    not ``A (B / Z)``.
@@ -27,7 +27,10 @@ notation           meaning
 ``{E}``            Capture: Apply expression `E` and store the substring
                    that matched `E` into a *capture* that can be accessed
                    after the matching process.
-``$i``             Back reference to the ``i``th capture. ``i`` counts from 1.
+``{}``             Empty capture: Delete the last capture. No character
+                   is consumed.
+``$i``             Back reference to the ``i``th capture. ``i`` counts forwards
+                   from 1 or backwards (last capture to first) from ^1.
 ``$``              Anchor: Matches at the end of the input. No character
                    is consumed. Same as ``!.``.
 ``^``              Anchor: Matches at the start of the input. No character
@@ -41,20 +44,20 @@ notation           meaning
 ``E+``             One or more: Apply expression `E` repeatedly to match
                    the text ahead, as long as it succeeds. Consume the matched
                    text (if any) and indicate success if there was at least
-                   one match. Otherwise indicate failure.
+                   one match. Otherwise, indicate failure.
 ``E*``             Zero or more: Apply expression `E` repeatedly to match
                    the text ahead, as long as it succeeds. Consume the matched
                    text (if any). Always indicate success.
 ``E?``             Zero or one: If expression `E` matches the text ahead,
                    consume it. Always indicate success.
 ``[s]``            Character class: If the character ahead appears in the
-                   string `s`, consume it and indicate success. Otherwise
+                   string `s`, consume it and indicate success. Otherwise,
                    indicate failure.
 ``[a-b]``          Character range: If the character ahead is one from the
                    range `a` through `b`, consume it and indicate success.
-                   Otherwise indicate failure.
+                   Otherwise, indicate failure.
 ``'s'``            String: If the text ahead is the string `s`, consume it
-                   and indicate success. Otherwise indicate failure.
+                   and indicate success. Otherwise, indicate failure.
 ``i's'``           String match ignoring case.
 ``y's'``           String match ignoring style.
 ``v's'``           Verbatim string match: Use this to override a global
@@ -63,15 +66,15 @@ notation           meaning
 ``y$j``            String match ignoring style for back reference.
 ``v$j``            Verbatim string match for back reference.
 ``.``              Any character: If there is a character ahead, consume it
-                   and indicate success. Otherwise (that is, at the end of
+                   and indicate success. Otherwise, (that is, at the end of
                    input) indicate failure.
-``_``              Any Unicode character: If there is an UTF-8 character
-                   ahead, consume it and indicate success. Otherwise indicate
+``_``              Any Unicode character: If there is a UTF-8 character
+                   ahead, consume it and indicate success. Otherwise, indicate
                    failure.
 ``@E``             Search: Shorthand for ``(!E .)* E``. (Search loop for the
                    pattern `E`.)
 ``{@} E``          Captured Search: Shorthand for ``{(!E .)*} E``. (Search
-                   loop for the pattern `E`.) Everything until and exluding
+                   loop for the pattern `E`.) Everything until and excluding
                    `E` is captured.
 ``@@ E``           Same as ``{@} E``.
 ``A <- E``         Rule: Bind the expression `E` to the *nonterminal symbol*
@@ -79,7 +82,7 @@ notation           meaning
                    matching engine.**
 ``\identifier``    Built-in macro for a longer expression.
 ``\ddd``           Character with decimal code *ddd*.
-``\"``, etc        Literal ``"``, etc.
+``\"``, etc.       Literal ``"``, etc.
 ===============    ============================================================
 
 
@@ -128,51 +131,53 @@ notation           meaning
 Supported PEG grammar
 ---------------------
 
-The PEG parser implements this grammar (written in PEG syntax)::
+The PEG parser implements this grammar (written in PEG syntax):
 
-  # Example grammar of PEG in PEG syntax.
-  # Comments start with '#'.
-  # First symbol is the start symbol.
+    # Example grammar of PEG in PEG syntax.
+    # Comments start with '#'.
+    # First symbol is the start symbol.
 
-  grammar <- rule* / expr
+    grammar <- rule* / expr
 
-  identifier <- [A-Za-z][A-Za-z0-9_]*
-  charsetchar <- "\\" . / [^\]]
-  charset <- "[" "^"? (charsetchar ("-" charsetchar)?)+ "]"
-  stringlit <- identifier? ("\"" ("\\" . / [^"])* "\"" /
-                            "'" ("\\" . / [^'])* "'")
-  builtin <- "\\" identifier / [^\13\10]
+    identifier <- [A-Za-z][A-Za-z0-9_]*
+    charsetchar <- "\\" . / [^\]]
+    charset <- "[" "^"? (charsetchar ("-" charsetchar)?)+ "]"
+    stringlit <- identifier? ("\"" ("\\" . / [^"])* "\"" /
+                              "'" ("\\" . / [^'])* "'")
+    builtin <- "\\" identifier / [^\13\10]
 
-  comment <- '#' @ \n
-  ig <- (\s / comment)* # things to ignore
+    comment <- '#' @ \n
+    ig <- (\s / comment)* # things to ignore
 
-  rule <- identifier \s* "<-" expr ig
-  identNoArrow <- identifier !(\s* "<-")
-  prefixOpr <- ig '&' / ig '!' / ig '@' / ig '{@}' / ig '@@'
-  literal <- ig identifier? '$' [0-9]+ / '$' / '^' /
-             ig identNoArrow /
-             ig charset /
-             ig stringlit /
-             ig builtin /
-             ig '.' /
-             ig '_' /
-             (ig "(" expr ig ")")
-  postfixOpr <- ig '?' / ig '*' / ig '+'
-  primary <- prefixOpr* (literal postfixOpr*)
+    rule <- identifier \s* "<-" expr ig
+    identNoArrow <- identifier !(\s* "<-")
+    prefixOpr <- ig '&' / ig '!' / ig '@' / ig '{@}' / ig '@@'
+    literal <- ig identifier? '$' '^'? [0-9]+ / '$' / '^' /
+               ig identNoArrow /
+               ig charset /
+               ig stringlit /
+               ig builtin /
+               ig '.' /
+               ig '_' /
+               (ig "(" expr ig ")") /
+               (ig "{" expr? ig "}")
+    postfixOpr <- ig '?' / ig '*' / ig '+'
+    primary <- prefixOpr* (literal postfixOpr*)
 
-  # Concatenation has higher priority than choice:
-  # ``a b / c`` means ``(a b) / c``
+    # Concatenation has higher priority than choice:
+    # ``a b / c`` means ``(a b) / c``
 
-  seqExpr <- primary+
-  expr <- seqExpr (ig "/" expr)*
+    seqExpr <- primary+
+    expr <- seqExpr (ig "/" expr)*
 
 
 **Note**: As a special syntactic extension if the whole PEG is only a single
 expression, identifiers are not interpreted as non-terminals, but are
 interpreted as verbatim string:
 
-.. code-block:: nim
+  ```nim
   abc =~ peg"abc" # is true
+  ```
 
 So it is not necessary to write ``peg" 'abc' "`` in the above example.
 
@@ -182,22 +187,25 @@ Examples
 
 Check if `s` matches Nim's "while" keyword:
 
-.. code-block:: nim
+  ```nim
   s =~ peg" y'while'"
+  ```
 
 Exchange (key, val)-pairs:
 
-.. code-block:: nim
+  ```nim
   "key: val; key2: val2".replacef(peg"{\ident} \s* ':' \s* {\ident}", "$2: $1")
+  ```
 
 Determine the ``#include``'ed files of a C file:
 
-.. code-block:: nim
+  ```nim
   for line in lines("myfile.c"):
     if line =~ peg"""s <- ws '#include' ws '"' {[^"]+} '"' ws
                      comment <- '/*' @ '*/' / '//' .*
                      ws <- (comment / \s+)* """:
       echo matches[0]
+  ```
 
 PEG vs regular expression
 -------------------------