Diffstat (limited to 'doc/manual.txt')
-rw-r--r-- doc/manual.txt 565
1 files changed, 399 insertions, 166 deletions
diff --git a/doc/manual.txt b/doc/manual.txt
index cd982302f..04b8bb97b 100644
--- a/doc/manual.txt
+++ b/doc/manual.txt
@@ -11,6 +11,10 @@ Nimrod Manual
 About this document
 ===================
 
+**Note**: This document is a draft! Several of Nimrod's features need more
+precise wording. This manual will evolve into a proper specification some
+day.
+
 This document describes the lexis, the syntax, and the semantics of Nimrod.
 
 The language constructs are explained using an extended BNF, in
@@ -18,10 +22,11 @@ which ``(a)*`` means 0 or more ``a``'s, ``a+`` means 1 or more ``a``'s, and
 ``(a)?`` means an optional *a*; an alternative spelling for optional parts is
 ``[a]``. The ``|`` symbol is used to mark alternatives
 and has the lowest precedence. Parentheses may be used to group elements.
-Non-terminals are in lowercase, terminal symbols (including keywords) are in
-UPPERCASE. An example::
+Non-terminals start with a lowercase letter; abstract terminal symbols are in
+UPPERCASE. Verbatim terminal symbols (including keywords) are quoted
+with ``'``. An example::
 
-  if_stmt ::= IF expr COLON stmts (ELIF expr COLON stmts)* [ELSE stmts]
+  ifStmt ::= 'if' expr ':' stmts ('elif' expr ':' stmts)* ['else' stmts]
 
 Other parts of Nimrod - like scoping rules or runtime semantics are only
 described in an informal manner. The reason is that formal semantics are
@@ -90,8 +95,7 @@ Indentation consists only of spaces; tabulators are not allowed.
 The terminals ``IND`` (indentation), ``DED`` (dedentation) and ``SAD``
 (same indentation) are generated by the scanner, denoting an indentation.
 
-These terminals are only generated for lines that are not empty or contain
-only whitespace and comments.
+These terminals are only generated for lines that are not empty.
 
 The parser and the scanner communicate over a stack which indentation terminal
 should be generated: The stack consists of integers counting the spaces. The
@@ -100,14 +104,17 @@ If the current indentation token consists of more spaces than the entry at the
 top of the stack, a ``IND`` token is generated, else if it consists of the same
 number of spaces, a ``SAD`` token is generated. If it consists of fewer spaces,
 a ``DED`` token is generated for any item on the stack that is greater than the
-current. These items are then popped from the stack by the scanner. At the end
+current. These items are later popped from the stack by the parser. At the end
 of the file, a ``DED`` token is generated for each number remaining on the
 stack that is larger than zero.
 
 Because the grammar contains some optional ``IND`` tokens, the scanner cannot
 push new indentation levels. This has to be done by the parser. The symbol
 ``indPush`` indicates that an ``IND`` token is expected; the current number of
-leading spaces is pushed onto the stack by the parser.
+leading spaces is pushed onto the stack by the parser. The symbol ``indPop``
+denotes that the parser pops an item from the indentation stack. No token is
+consumed by ``indPop``.
+
 
 Comments
 --------
@@ -131,8 +138,8 @@ aligned to the preceding one, it does not start a new comment:
 Comments are tokens; they are only allowed at certain places in the input file
 as they belong to the syntax tree! This feature enables perfect source-to-source
 transformations (such as pretty-printing) and superior documentation generators.
-A side-effect is that the human reader of the code always knows exactly which
-code snippet the comment refers to.
+A nice side-effect is that the human reader of the code always knows exactly
+which code snippet the comment refers to.
 
 
 Identifiers & Keywords
@@ -159,9 +166,9 @@ case-sensitive and even underscores are ignored:
 **type** is a reserved word, and so is **TYPE** or **T_Y_P_E**. The idea behind
 this is that this allows programmers to use their own prefered spelling style
 and libraries written by different programmers cannot use incompatible
-conventions. The editors or IDE can show the identifiers as preferred. Another
-advantage is that it frees the programmer from remembering the exact spelling
-of an identifier.
+conventions. A Nimrod-aware editor or IDE can show the identifiers as
+preferred. Another advantage is that it frees the programmer from remembering 
+the exact spelling of an identifier.
 
 
 Literal strings
@@ -174,7 +181,7 @@ contain the following `escape sequences`:idx:\ :
   Escape sequence          Meaning
 ==================         ===================================================
   ``\n``                   `newline`:idx:
-  ``\r``                   `carriage return`:idx:
+  ``\r``, ``\c``           `carriage return`:idx:
   ``\l``                   `line feed`:idx:
   ``\f``                   `form feed`:idx:
   ``\t``                   `tabulator`:idx:
@@ -184,8 +191,7 @@ contain the following `escape sequences`:idx:\ :
   ``\'``                   `apostrophe`:idx:
   ``\d+``                  `character with decimal value d`:idx:;
                            all decimal digits directly
-                           following are used for the
-                           character
+                           following are used for the character
   ``\a``                   `alert`:idx:
   ``\b``                   `backspace`:idx:
   ``\e``                   `escape`:idx: `[ESC]`:idx:
@@ -194,15 +200,14 @@ contain the following `escape sequences`:idx:\ :
 ==================         ===================================================
 
 
-Strings in Nimrod may contain any 8-bit value, except embedded zeros
-which are not allowed for compability with `C`:idx:.
+Strings in Nimrod may contain any 8-bit value, except embedded zeros.
 
 Literal strings can also be delimited by three double squotes
 ``"""`` ... ``"""``.
 Literals in this form may run for several lines, may contain ``"`` and do not
 interpret any escape sequences.
-For convenience, when the opening ``"""`` is immediately
-followed by a newline, the newline is not included in the string.
+For convenience, when the opening ``"""`` is immediately followed by a newline, 
+the newline is not included in the string.
 There are also `raw string literals` that are preceded with the letter ``r``
 (or ``R``) and are delimited by matching double quotes (just like ordinary
 string literals) and do not interpret the escape sequences. This is especially
@@ -253,8 +258,8 @@ Numerical constants
 
 As can be seen in the productions, numerical constants can contain unterscores
 for readability. Integer and floating point literals may be given in decimal (no
-prefix), binary (prefix ``0b``), octal (prefix ``0o``) and
-hexadecimal (prefix ``0x``) notation.
+prefix), binary (prefix ``0b``), octal (prefix ``0o``) and hexadecimal 
+(prefix ``0x``) notation.
 
 There exists a literal for each numerical type that is
 defined. The suffix starting with an apostophe ('\'') is called a
@@ -262,7 +267,7 @@ defined. The suffix starting with an apostophe ('\'') is called a
 unless the literal contains a dot or an ``E`` in which case it is of
 type ``float``.
 
-The following table specifies type suffixes:
+The type suffixes are:
 
 =================    =========================
   Type Suffix        Resulting type of literal
@@ -295,11 +300,11 @@ the three tokens `{`:tok:, `..`:tok:, `}`:tok: and not the two tokens
 `{.`:tok:, `.}`:tok:.
 
 In Nimrod one can define his own operators. An `operator`:idx: is any
-combination of the following characters that are not listed above::
+combination of the following characters that is not listed above::
 
        +     -     *     /     <     >
        =     @     $     ~     &     %
-       !     ?     ^     .     |
+       !     ?     ^     .     |     \
 
 These keywords are also operators:
 ``and or not xor shl shr div mod in notin is isnot``.
@@ -348,16 +353,13 @@ Constants
 cannot change. The compiler must be able to evaluate the expression in a
 constant declaration at compile time.
 
-..
-  Nimrod contains a sophisticated
-  compile-time evaluator, so procedures declared with the ``{.noSideEffect.}``
-  pragma can be used in constant expressions:
-
-  .. code-block:: nimrod
+Nimrod contains a sophisticated compile-time evaluator, so procedures which
+have no side-effect can be used in constant expressions too:
 
-    from strutils import findSubStr
-    const
-      x = findSubStr('a', "hallo") # x is 1; this is computed at compile time!
+.. code-block:: nimrod
+  import strutils
+  const 
+    constEval = contains("abc", 'b') # computed at compile time!
 
 
 Types
@@ -414,8 +416,8 @@ intXX
 
 There are no `unsigned integer`:idx: types, only `unsigned operations`:idx:
 that treat their arguments as unsigned. Unsigned operations all wrap around;
-they may not lead to over- or underflow errors. Unsigned operations use the
-``%`` postfix as convention:
+they cannot lead to over- or underflow errors. Unsigned operations use the
+``%`` suffix as convention:
 
 ======================   ======================================================
 operation                meaning
@@ -453,7 +455,7 @@ floatXX
   implementation supports ``float32`` and ``float64``. Literals of these types
   have the suffix 'fXX.
 
-`Automatic type conversion`:idx: is performed in expressions where different 
+`Automatic type conversion`:idx: is performed in expressions where different
 kinds of integer types are used. However, if the type conversion
 loses information, the `EOutOfRange`:idx: exception is raised (if the error
 cannot be detected at compile time).
@@ -498,16 +500,15 @@ the resulting programs will still handle UTF-8 properly as UTF-8 was specially
 designed for this.
 Another reason is that Nimrod can support ``array[char, int]`` or
 ``set[char]`` efficiently as many algorithms rely on this feature. The
-`TUniChar` type is used for Unicode characters, it can represent any Unicode
-character. ``TUniChar`` is declared the ``unicode`` standard module.
+`TRune` type is used for Unicode characters; it can represent any Unicode
+character. ``TRune`` is declared in the ``unicode`` module.
 
 
 
 Enumeration types
 ~~~~~~~~~~~~~~~~~
-`Enumeration`:idx: types define a new type whose values consist only of the ones
-specified.
-The values are ordered by the order in enum's declaration. Example:
+`Enumeration`:idx: types define a new type whose values consist of the ones
+specified. The values are ordered. Example:
 
 .. code-block:: nimrod
 
@@ -528,8 +529,8 @@ with enumeration types.
 
 For better interfacing to other programming languages, the fields of enum
 types can be assigned an explicit ordinal value. However, the ordinal values
-have to be in ascending order. A field whose ordinal value that is not
-explicitly given, is assigned the value of the previous field + 1.
+have to be in ascending order. A field whose ordinal value is not
+explicitly given is assigned the value of the previous field + 1.
 
 An explicit ordered enum can have *wholes*:
 
@@ -545,7 +546,7 @@ and ``pred`` are not available for them either.
 
 Subrange types
 ~~~~~~~~~~~~~~
-A `subrange`:idx: type is a range of values from an ordinal type (the host
+A `subrange`:idx: type is a range of values from an ordinal type (the base
 type). To define a subrange type, one must specify it's limiting values: the
 highest and lowest value of the type:
 
@@ -566,7 +567,7 @@ A subrange type has the same size as its base type (``int`` in the example).
 String type
 ~~~~~~~~~~~
 All string literals are of the type `string`:idx:. A string in Nimrod is very
-similar to a sequence of characters. However, strings in Nimrod both are
+similar to a sequence of characters. However, strings in Nimrod are both
 zero-terminated and have a length field. One can retrieve the length with the
 builtin ``len`` procedure; the length never counts the terminating zero.
 The assignment operator for strings always copies the string.
@@ -585,7 +586,7 @@ arrays, they can be used in case statements:
 Per convention, all strings are UTF-8 strings, but this is not enforced. For
 example, when reading strings from binary files, they are merely a sequence of
 bytes. The index operation ``s[i]`` means the i-th *char* of ``s``, not the
-i-th *unichar*. The iterator ``unichars`` from the ``unicode`` standard
+i-th *unichar*. The iterator ``runes`` from the ``unicode``
 module can be used for iteration over all unicode characters.
 
 
@@ -611,9 +612,7 @@ constructed by the array constructor ``[]`` in conjunction with the array to
 sequence operator ``@``. Another way to allocate space for a sequence is to
 call the built-in ``newSeq`` procedure.
 
-A sequence may be passed to a parameter that is of type *open array*, but
-not to a multi-dimensional open array, because it is impossible to do so in an
-efficient manner.
+A sequence may be passed to a parameter that is of type *open array*.
 
 Example:
 
@@ -633,18 +632,36 @@ The lower bound of an array or sequence may be received by the built-in proc
 received by ``len()``. ``low()`` for a sequence or an open array always returns
 0, as this is the first valid index.
 
-The notation ``x[i]`` can be used to access the i-th element of ``x``. 
+The notation ``x[i]`` can be used to access the i-th element of ``x``.
 
 Arrays are always bounds checked (at compile-time or at runtime). These
 checks can be disabled via pragmas or invoking the compiler with the
 ``--bound_checks:off`` command line switch.
 
+An open array is also a means to implement passing a variable number of
+arguments to a procedure. The compiler converts the list of arguments
+to an array automatically:
+
+.. code-block:: nimrod
+  proc myWriteln(f: TFile, a: openarray[string]) =
+    for s in items(a):
+      write(f, s)
+    write(f, "\n")
+
+  myWriteln(stdout, "abc", "def", "xyz")
+  # is transformed by the compiler to:
+  myWriteln(stdout, ["abc", "def", "xyz"])
+
+This transformation is only done if the openarray parameter is the
+last parameter in the procedure header. The current implementation does not
+support nested open arrays.
+
 
 Tuples and object types
 ~~~~~~~~~~~~~~~~~~~~~~~
 A variable of a `tuple`:idx: or `object`:idx: type is a heterogenous storage
 container.
-A tuple or object defines various named *fields* of a type. A tuple also 
+A tuple or object defines various named *fields* of a type. A tuple also
 defines an *order* of the fields. Tuples are meant for heterogenous storage
 types with no overhead and few abstraction possibilities. The constructor ``()``
 can be used to construct tuples. The order of the fields in the constructor
@@ -691,7 +708,7 @@ the ``is`` operator can be used to determine the object's type.
     person: TPerson
   assert(student is TStudent) # is true
 
-Object fields that should be visible outside from the defining module, have to
+Object fields that should be visible from outside the defining module have to be
 marked by ``*``. In contrast to tuples, different object types are
 never *equivalent*.
 
@@ -730,7 +747,7 @@ An example:
   new(n)  # creates a new node
   n.kind = nkFloat
   n.floatVal = 0.0 # valid, because ``n.kind==nkFloat``, so that it fits
-  
+
   # the following statement raises an `EInvalidField` exception, because
   # n.kind's value does not fit:
   n.strVal = ""
@@ -783,8 +800,8 @@ point to and modify the same location in memory.
 
 Nimrod distinguishes between `traced`:idx: and `untraced`:idx: references.
 Untraced references are also called *pointers*. Traced references point to
-objects of a garbage collected heap, untraced references point to 
-manually allocated objects or to objects somewhere else in memory. Thus 
+objects of a garbage collected heap, untraced references point to
+manually allocated objects or to objects somewhere else in memory. Thus
 untraced references are *unsafe*. However for certain low-level operations
 (accessing the hardware) untraced references are unavoidable.
 
@@ -817,7 +834,7 @@ To deal with untraced memory, the procedures ``alloc``, ``dealloc`` and
 ``realloc`` can be used. The documentation of the system module contains
 further information.
 
-If a reference points to *nothing*, it has the value ``nil``. 
+If a reference points to *nothing*, it has the value ``nil``.
 
 Special care has to be taken if an untraced object contains traced objects like
 traced references, strings or sequences: In order to free everything properly,
@@ -904,9 +921,9 @@ Most calling conventions exist only for the Windows 32-bit platform.
 
 
 
-Statements
-----------
-Nimrod uses the common statement/expression paradigma: `Statements`:idx: do not
+Statements and expressions
+--------------------------
+Nimrod uses the common statement/expression paradigm: `Statements`:idx: do not
 produce a value in contrast to expressions. Call expressions are statements.
 If the called procedure returns a value, it is not a valid statement
 as statements do not produce values. To evaluate an expression for
@@ -943,7 +960,7 @@ Discard statement
 
 Syntax::
 
-  discardStmt ::= DISCARD expr
+  discardStmt ::= 'discard' expr
 
 Example:
 
@@ -962,11 +979,13 @@ Var statement
 
 Syntax::
 
-  colonOrEquals ::= COLON typeDesc [EQUALS expr] | EQUALS expr
-  varField ::= symbol ["*"] [pragma]
+  colonOrEquals ::= ':' typeDesc ['=' expr] | '=' expr
+  varField ::= symbol ['*'] [pragma]
   varPart ::= symbol (comma symbol)* [comma] colonOrEquals [COMMENT | IND COMMENT]
-  varSection ::= VAR (varPart
-                     | indPush (COMMENT|varPart) (SAD (COMMENT|varPart))* DED)
+  varSection ::= 'var' (varPart
+                     | indPush (COMMENT|varPart)
+                       (SAD (COMMENT|varPart))* DED indPop)
+
 
 `Var`:idx: statements declare new local and global variables and
 initialize them. A comma seperated list of variables can be used to specify
@@ -992,7 +1011,7 @@ char                            '\0'
 bool                            false
 ref or pointer type             nil
 procedural type                 nil
-sequence                        nil
+sequence                        nil (**not** ``@[]``)
 string                          nil (**not** "")
 tuple[x: A, y: B, ...]          (default(A), default(B), ...)
                                 (analogous for objects)
@@ -1007,12 +1026,12 @@ Const section
 
 Syntax::
 
-  colonAndEquals ::= [COLON typeDesc] EQUALS expr
-  constDecl ::= CONST
-           indPush
-                symbol ["*"] [pragma] colonAndEquals
-           (SAD symbol ["*"] [pragma] colonAndEquals)*
-           DED
+  colonAndEquals ::= [':' typeDesc] '=' expr
+
+  constDecl ::= symbol ['*'] [pragma] colonAndEquals [COMMENT | IND COMMENT]
+              | COMMENT
+  constSection ::= 'const' indPush constDecl (SAD constDecl)* DED indPop
+
 
 Example:
 
@@ -1031,7 +1050,7 @@ If statement
 
 Syntax::
 
-  ifStmt ::= IF expr COLON stmt (ELIF expr COLON stmt)* [ELSE COLON stmt]
+  ifStmt ::= 'if' expr ':' stmt ('elif' expr ':' stmt)* ['else' ':' stmt]
 
 Example:
 
@@ -1061,9 +1080,9 @@ Case statement
 
 Syntax::
 
-  caseStmt ::= CASE expr (OF sliceList COLON stmt)*
-                         (ELIF expr COLON stmt)*
-                         [ELSE COLON stmt]
+  caseStmt ::= 'case' expr ('of' sliceExprList ':' stmt)*
+                           ('elif' expr ':' stmt)*
+                           ['else' ':' stmt]
 
 Example:
 
@@ -1078,9 +1097,9 @@ Example:
 The `case`:idx: statement is similar to the if statement, but it represents
 a multi-branch selection. The expression after the keyword ``case`` is
 evaluated and if its value is in a *vallist* the corresponding statements
-(after the ``of`` keyword) are executed. If the value is not in any 
-given *slicelist* the ``else`` part is executed. If there is no ``else`` 
-part and not all possible values that ``expr`` can hold occur in a ``vallist``, 
+(after the ``of`` keyword) are executed. If the value is not in any
+given *slicelist* the ``else`` part is executed. If there is no ``else``
+part and not all possible values that ``expr`` can hold occur in a ``vallist``,
 a static error is given. This holds only for expressions of ordinal types.
 If the expression is not of an ordinal type, and no ``else`` part is
 given, control just passes after the ``case`` statement.
@@ -1094,7 +1113,7 @@ When statement
 
 Syntax::
 
-  whenStmt ::= WHEN expr COLON stmt (ELIF expr COLON stmt)* [ELSE COLON stmt]
+  whenStmt ::= 'when' expr ':' stmt ('elif' expr ':' stmt)* ['else' ':' stmt]
 
 Example:
 
@@ -1116,8 +1135,7 @@ exceptions:
 * The statements do not open a new scope if they introduce new identifiers.
 * The statements that belong to the expression that evaluated to true are
   translated by the compiler, the other statements are not checked for
-  syntax or semantics at all! This holds also for any ``expr`` coming
-  after the expression that evaluated to true.
+  semantics! However, each ``expr`` is checked for semantics.
 
 The ``when`` statement enables conditional compilation techniques. As
 a special syntatic extension, the ``when`` construct is also available
@@ -1129,7 +1147,7 @@ Raise statement
 
 Syntax::
 
-  raiseStmt ::= RAISE [expr]
+  raiseStmt ::= 'raise' [expr]
 
 Example:
 
@@ -1137,7 +1155,7 @@ Example:
   raise newEOS("operating system failed")
 
 Apart from built-in operations like array indexing, memory allocation, etc.
-the ``raise`` statement is the only way to raise an exception. 
+the ``raise`` statement is the only way to raise an exception.
 
 .. XXX document this better!
 
@@ -1152,11 +1170,11 @@ Try statement
 
 Syntax::
 
+  qualifiedIdent ::= symbol ['.' symbol]
   exceptList ::= [qualifiedIdent (comma qualifiedIdent)* [comma]]
-  tryStmt ::= TRY COLON stmt
-            (EXCEPT exceptList COLON stmt)*
-            [FINALLY COLON stmt]
-             
+  tryStmt ::= 'try' ':' stmt
+             ('except' exceptList ':' stmt)*
+             ['finally' ':' stmt]
 
 Example:
 
@@ -1176,6 +1194,8 @@ Example:
       echo("could not convert string to integer")
     except EIO:
       echo("IO error!")
+    except:
+      echo("Unknown exception!")
     finally:
       closeFile(f)
 
@@ -1203,7 +1223,7 @@ Return statement
 
 Syntax::
 
-  returnStmt ::= RETURN [expr]
+  returnStmt ::= 'return' [expr]
 
 Example:
 
@@ -1219,12 +1239,13 @@ sugar for:
   return result
 
 ``return`` without an expression is a short notation for ``return result`` if
-the proc has a return type. The `result`:idx: variable is always the return 
+the proc has a return type. The `result`:idx: variable is always the return
 value of the procedure. It is automatically declared by the compiler. As all
 variables, ``result`` is initialized to (binary) zero::
 
 .. code-block:: nimrod
-    proc returnZero(): int = nil # implicitely returns 0
+    proc returnZero(): int =
+      # implicitly returns 0
 
 
 Yield statement
@@ -1232,7 +1253,7 @@ Yield statement
 
 Syntax::
 
-  yieldStmt ::= YIELD expr
+  yieldStmt ::= 'yield' expr
 
 Example:
 
@@ -1252,7 +1273,7 @@ Block statement
 
 Syntax::
 
-  blockStmt ::= BLOCK [symbol] COLON stmt
+  blockStmt ::= 'block' [symbol] ':' stmt
 
 Example:
 
@@ -1277,7 +1298,7 @@ Break statement
 
 Syntax::
 
-  breakStmt ::= BREAK [symbol]
+  breakStmt ::= 'break' [symbol]
 
 Example:
 
@@ -1294,7 +1315,7 @@ While statement
 
 Syntax::
 
-  whileStmt ::= WHILE expr COLON stmt
+  whileStmt ::= 'while' expr ':' stmt
 
 Example:
 
@@ -1316,7 +1337,7 @@ Continue statement
 
 Syntax::
 
-  continueStmt ::= CONTINUE
+  continueStmt ::= 'continue'
 
 A `continue`:idx: statement leads to the immediate next iteration of the
 surrounding loop construct. It is only allowed within a loop. A continue
@@ -1340,7 +1361,7 @@ Assembler statement
 ~~~~~~~~~~~~~~~~~~~
 Syntax::
 
-  asmStmt ::= ASM [pragma] (STR_LIT | RSTR_LIT | TRIPLESTR_LIT)
+  asmStmt ::= 'asm' [pragma] (STR_LIT | RSTR_LIT | TRIPLESTR_LIT)
 
 The direct embedding of `assembler`:idx: code into Nimrod code is supported
 by the unsafe ``asm`` statement. Identifiers in the assembler code that refer to
@@ -1348,6 +1369,49 @@ Nimrod identifiers shall be enclosed in a special character which can be
 specified in the statement's pragmas. The default special character is ``'`'``.
 
 
+If expression
+~~~~~~~~~~~~~
+
+An `if expression` is almost like an if statement, but it is an expression.
+Example:
+
+.. code-block:: nimrod
+  p(if x > 8: 9 else: 10)
+
+An if expression always results in a value, so the ``else`` part is
+required. ``Elif`` parts are also allowed (but unlikely to be good
+style).
+
+
+Type conversions
+~~~~~~~~~~~~~~~~
+Syntactically a `type conversion` is like a procedure call, but a
+type name replaces the procedure name. A type conversion is always
+safe in the sense that a failure to convert a type to another
+results in an exception (if it cannot be determined statically).
+
+
+Type casts
+~~~~~~~~~~
+Example:
+
+.. code-block:: nimrod
+  cast[int](x)
+
+Type casts are a crude mechanism to interpret the bit pattern of
+an expression as if it were of another type. Type casts are
+only needed for low-level programming and are inherently unsafe.
+
+
+The addr operator
+~~~~~~~~~~~~~~~~~
+The `addr` operator returns the address of an l-value. If the
+type of the location is ``T``, the `addr` operator result is
+of the type ``ptr T``. Taking the address of an object that resides
+on the stack is **unsafe**, as the pointer may live longer than the
+object on the stack and can thus reference a non-existing object.
+
+
 Procedures
 ~~~~~~~~~~
 What most programming languages call `methods`:idx: or `funtions`:idx: are
@@ -1355,16 +1419,16 @@ called `procedures`:idx: in Nimrod (which is the correct terminology). A
 procedure declaration defines an identifier and associates it with a block
 of code. A procedure may call itself recursively. The syntax is::
 
-  param ::= symbol (comma symbol)* [comma] COLON typeDesc
-  paramList ::= [PAR_LE [param (comma param)* [comma]] PAR_RI] [COLON typeDesc]
-  
-  genericParams ::= BRACKET_LE (symbol [EQUALS typeDesc] )* BRACKET_RI
-  
-  procDecl ::= PROC symbol ["*"] [genericParams]
-               paramList [pragma]
-               [EQUALS stmt]
-               
-If the ``EQUALS stmt`` part is missing, it is a `forward`:idx: declaration. If
+  param ::= symbol (comma symbol)* [comma] ':' typeDesc
+  paramList ::= ['(' [param (comma param)* [comma]] ')'] [':' typeDesc]
+
+  genericParam ::= symbol [':' typeDesc]
+  genericParams ::= '[' genericParam (comma genericParam)* [comma] ']'
+
+  procDecl ::= 'proc' symbol ['*'] [genericParams] paramList [pragma]
+               ['=' stmt]
+
+If the ``= stmt`` part is missing, it is a `forward`:idx: declaration. If
 the proc returns a value, the procedure body can access an implicit declared
 variable named `result`:idx: that represents the return value. Procs can be
 overloaded. The overloading resolution algorithm tries to find the proc that is
@@ -1384,6 +1448,24 @@ is used if the caller does not provide a value for this parameter. Example:
     for i in 0..len(s) - 1:
       result[i] = toLower(s[i]) # calls toLower for characters; no recursion!
 
+Calling a procedure can be done in many different ways:
+
+.. code-block:: nimrod
+  proc callme(x, y: int, s: string = "", c: char, b: bool = false) = ...
+
+  # call with positional arguments # parameter bindings:
+  callme(0, 1, "abc", '\t', true)  # (x=0, y=1, s="abc", c='\t', b=true)
+  # call with named and positional arguments:
+  callme(y=1, x=0, "abd", '\t')    # (x=0, y=1, s="abd", c='\t', b=false)
+  # call with named arguments (order is not relevant):
+  callme(c='\t', y=1, x=0)         # (x=0, y=1, s="", c='\t', b=false)
+  # call as a command statement: no () needed:
+  callme 0, 1, "abc", '\t'
+
+
+A procedure cannot modify its parameters (unless the parameters have the
+type `var`).
+
 `Operators`:idx: are procedures with a special operator symbol as identifier:
 
 .. code-block:: nimrod
@@ -1391,19 +1473,79 @@ is used if the caller does not provide a value for this parameter. Example:
     # converts an integer to a string; this is a prefix operator.
     return intToStr(x)
 
-Calling a procedure can be done in many different ways:
+Operators with one parameter are prefix operators, operators with two
+parameters are infix operators. There is no way to declare postfix
+operators: All postfix operators are built-in and handled by the
+grammar explicitly.
+
+Any operator can be called like an ordinary proc with the '`opr`'
+notation. (Thus an operator can have more than two parameters):
 
 .. code-block:: nimrod
-  proc callme(x, y: int, s: string = "", c: char, b: bool = false) = ...
+  proc `*+` (a, b, c: int): int =
+    # Multiply and add
+    return a * b + c
+
+  assert `*+`(3, 4, 6) == `+`(`*`(3, 4), 6)
+
+
+
+Var parameters
+~~~~~~~~~~~~~~
+The type of a parameter may be prefixed with the ``var`` keyword:
+
+.. code-block:: nimrod
+  proc divmod(a, b: int, res, remainder: var int) =
+    res = a div b
+    remainder = a mod b
+
+  var
+    x, y: int
+
+  divmod(8, 5, x, y) # modifies x and y
+  assert x == 1
+  assert y == 3
+
+In the example, ``res`` and ``remainder`` are `var parameters`.
+Var parameters can be modified by the procedure and the changes are
+visible to the caller. The argument passed to a var parameter has to be
+an l-value. Var parameters are implemented as hidden pointers. The
+above example is equivalent to:
+
+.. code-block:: nimrod
+  proc divmod(a, b: int, res, remainder: ptr int) =
+    res^ = a div b
+    remainder^ = a mod b
+
+  var
+    x, y: int
+  divmod(8, 5, addr(x), addr(y))
+  assert x == 1
+  assert y == 3
+
+In the examples, var parameters or pointers are used to provide two
+return values. This can be done in a cleaner way by returning a tuple:
+
+.. code-block:: nimrod
+  proc divmod(a, b: int): tuple[res, remainder: int] =
+    return (a div b, a mod b)
+
+  var t = divmod(8, 5)
+  assert t.res == 1
+  assert t.remainder == 3
+
+Even more elegant is to use `tuple unpacking` to access the tuple's fields:
+
+.. code-block:: nimrod
+  var (x, y) = divmod(8, 5) # tuple unpacking
+  assert x == 1
+  assert y == 3
+
+Unfortunately, this form of tuple unpacking is not yet implemented.
+
+..
+  XXX remove this as soon as tuple unpacking is implemented
 
-  # call with positional arguments# parameter bindings:
-  callme(0, 1, "abc", '\t', true) # (x=0, y=1, s="abc", c='\t', b=true)
-  # call with named and positional arguments:
-  callme(y=1, x=0, "abd", '\t')   # (x=0, y=1, s="abd", c='\t', b=false)
-  # call with named arguments (order is not relevant):
-  callme(c='\t', y=1, x=0)        # (x=0, y=1, s="", c='\t', b=false)
-  # call as a command statement: no () needed:
-  callme 0, 1, "abc", '\t'
 
 
 Iterators and the for statement
@@ -1411,15 +1553,16 @@ Iterators and the for statement
 
 Syntax::
 
-  forStmt ::= FOR symbol (comma symbol)* [comma] IN expr [DOTDOT expr] COLON stmt
+  forStmt ::= 'for' symbol (comma symbol)* [comma] 'in' expr ['..' expr] ':' stmt
 
-  param ::= symbol (comma symbol)* [comma] COLON typeDesc
-  paramList ::= [PAR_LE [param (comma param)* [comma]] PAR_RI] [COLON typeDesc]
-  
-  genericParams ::= BRACKET_LE (symbol [EQUALS typeDesc] )* BRACKET_RI
-  
-  iteratorDecl ::= ITERATOR symbol ["*"] [genericParams] paramList [pragma]
-               [EQUALS stmt]
+  param ::= symbol (comma symbol)* [comma] ':' typeDesc
+  paramList ::= ['(' [param (comma param)* [comma]] ')'] [':' typeDesc]
+
+  genericParam ::= symbol [':' typeDesc]
+  genericParams ::= '[' genericParam (comma genericParam)* [comma] ']'
+
+  iteratorDecl ::= 'iterator' symbol ['*'] [genericParams] paramList [pragma]
+               ['=' stmt]
 
 The `for`:idx: statement is an abstract mechanism to iterate over the elements
 of a container. It relies on an `iterator`:idx: to do so. Like ``while``
@@ -1473,13 +1616,15 @@ Type sections
 Syntax::
 
   typeDef ::= typeDesc | objectDef | enumDef
-  genericParams ::= BRACKET_LE (symbol [EQUALS typeDesc] )* BRACKET_RI
 
-  typeDecl ::= TYPE
-           indPush
-                symbol ["*"] [genericParams] [EQUALS typeDef]
-           (SAD symbol ["*"] [genericParams] [EQUALS typeDef])*
-           DED
+  genericParam ::= symbol [':' typeDesc]
+  genericParams ::= '[' genericParam (comma genericParam)* [comma] ']'
+
+  typeDecl ::= COMMENT
+             | symbol ['*'] [genericParams] ['=' typeDef] [COMMENT|IND COMMENT]
+
+  typeSection ::= 'type' indPush typeDecl (SAD typeDecl)* DED indPop
+
 
 Example:
 
@@ -1504,6 +1649,8 @@ possible within a single ``type`` section.
 Generics
 ~~~~~~~~
 
+`Version 0.7.4: Complex generic types like in the example do not work.`:red:
+
 Example:
 
 .. code-block:: nimrod
@@ -1578,26 +1725,119 @@ Example:
     # this definition exists in the System module
     not (a == b)
 
-  writeln(5 != 6) # the compiler rewrites that to: writeln(not (5 == 6))
+  assert(5 != 6) # the compiler rewrites that to: assert(not (5 == 6))
 
 
 Macros
-~~~~~~
+------
 
-`Macros`:idx: are the most powerful feature of Nimrod. They should be used
-only to implement `domain specific languages`:idx:. They may lead to code
+`Macros`:idx: are the most powerful feature of Nimrod. They can be used
+to implement `domain specific languages`:idx:. But they may lead to code
 that is harder to understand and maintain. So one ought to use them sparingly.
-The usage of ordinary procs, iterators or generics is preferred to the usage of
-macros.
+
+While macros enable advanced compile-time code transformations, they
+cannot change Nimrod's syntax. However, this is no real restriction because
+Nimrod's syntax is flexible enough anyway.
+
+To write macros, one needs to know how the Nimrod concrete syntax is converted
+to an abstract syntax tree. (Unfortunately the AST is not yet documented.)
+
+There are two ways to invoke a macro:
+(1) invoking a macro like a procedure call (`expression macros`)
+(2) invoking a macro with the special ``macrostmt`` syntax (`statement macros`)
+
+
+Expression Macros
+~~~~~~~~~~~~~~~~~
+
+The following example implements a powerful ``debug`` command that accepts a
+variable number of arguments:
+
+.. code-block:: nimrod
+  # to work with Nimrod syntax trees, we need an API that is defined in the
+  # ``macros`` module:
+  import macros
+
+  macro debug(n: expr): stmt =
+    # `n` is a Nimrod AST that contains the whole macro expression
+    # this macro returns a list of statements:
+    result = newNimNode(nnkStmtList, n)
+    # iterate over any argument that is passed to this macro:
+    for i in 1..n.len-1:
+      # add a call to the statement list that writes the expression;
+      # `toStrLit` converts an AST to its string representation:
+      add(result, newCall("write", newIdentNode("stdout"), toStrLit(n[i])))
+      # add a call to the statement list that writes ": "
+      add(result, newCall("write", newIdentNode("stdout"), newStrLitNode(": ")))
+      # add a call to the statement list that writes the expression's value:
+      add(result, newCall("writeln", newIdentNode("stdout"), n[i]))
+
+  var
+    a: array [0..10, int]
+    x = "some string"
+  a[0] = 42
+  a[1] = 45
+
+  debug(a[0], a[1], x)
+
+The macro call expands to:
+
+.. code-block:: nimrod
+  write(stdout, "a[0]")
+  write(stdout, ": ")
+  writeln(stdout, a[0])
+
+  write(stdout, "a[1]")
+  write(stdout, ": ")
+  writeln(stdout, a[1])
+
+  write(stdout, "x")
+  write(stdout, ": ")
+  writeln(stdout, x)
+
+
+Statement Macros
+~~~~~~~~~~~~~~~~
+
+Statement macros are defined just as expression macros. However, they are
+invoked by an expression following a colon::
+
+  exprStmt ::= lowestExpr ['=' expr | [expr (comma expr)* [comma]] [macroStmt]]
+  macroStmt ::= ':' [stmt] ('of' [sliceExprList] ':' stmt
+                          | 'elif' expr ':' stmt
+                          | 'except' exceptList ':' stmt )*
+                           ['else' ':' stmt]
+
+The following example outlines a macro that generates a lexical analyser from
+regular expressions:
+
+.. code-block:: nimrod
+  import macros
+
+  macro case_token(n: stmt): stmt =
+    # creates a lexical analyser from regular expressions
+    # ... (implementation is an exercise for the reader :-)
+    nil
+
+  case_token: # this colon tells the parser it is a macro statement
+  of r"[A-Za-z_]+[A-Za-z_0-9]*":
+    return tkIdentifier
+  of r"[0-9]+":
+    return tkInteger
+  of r"[\+\-\*\?]+":
+    return tkOperator
+  else:
+    return tkUnknown
+
 
 
 Modules
 -------
 Nimrod supports splitting a program into pieces by a `module`:idx: concept.
-Each module needs to be in its own file. Modules enable 
+Each module needs to be in its own file. Modules enable
 `information hiding`:idx: and `separate compilation`:idx:. A module may gain
-access to symbols of another module by the `import`:idx: statement. 
-`Recursive module dependancies`:idx: are allowed, but slightly subtle. Only 
+access to symbols of another module by the `import`:idx: statement.
+`Recursive module dependencies`:idx: are allowed, but slightly subtle. Only
 top-level symbols that are marked with an asterisk (``*``) are exported.
 
 The algorithm for compiling modules is:
@@ -1622,12 +1862,12 @@ This is best illustrated by an example:
 
   # Module B
   import A  # A is not parsed here! Only the already known symbols
-            # of A are imported here.
+            # of A are imported.
 
-  proc p*(x: A.T1): A.T1 # this works because the compiler has already
-                         # added T1 to A's interface symbol table
-
-  proc p(x: A.T1): A.T1 = return x + 1
+  proc p*(x: A.T1): A.T1 =
+    # this works because the compiler has already
+    # added T1 to A's interface symbol table
+    return x + 1
 
 
 Scope rules
@@ -1649,7 +1889,7 @@ procedure or iterator overloading purposes.
 
 
 Tuple or object scope
-~~~~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~~
 The field identifiers inside a tuple or object definition are valid in the
 following places:
 
@@ -1659,16 +1899,14 @@ following places:
 
 Module scope
 ~~~~~~~~~~~~
-All identifiers in the interface part of a module are valid from the point of
-declaration, until the end of the module. Furthermore, the identifiers are
-known in other modules that import the module. Identifiers from indirectly
-dependent modules are *not* available. The `system`:idx: module is automatically
-imported in all other modules.
+All identifiers of a module are valid from the point of declaration until
+the end of the module. Identifiers from indirectly dependent modules are *not* 
+available. The `system`:idx: module is automatically imported in every other 
+module.
 
-If a module imports an identifier by two different modules,
-each occurance of the identifier has to be qualified, unless it is an
-overloaded procedure or iterator in which case the overloading
-resolution takes place:
+If a module imports an identifier from two different modules, each occurrence
+of the identifier has to be qualified, unless it is an overloaded procedure or
+iterator, in which case overloading resolution takes place:
 
 .. code-block:: nimrod
   # Module A
@@ -1680,7 +1918,7 @@ resolution takes place:
   # Module C
   import A, B
   write(stdout, x) # error: x is ambigious
-  write(sdtout, A.x) # no error: qualifier used
+  write(stdout, A.x) # no error: qualifier used
 
   var x = 4
   write(stdout, x) # not ambigious: uses the module C's x
@@ -1698,14 +1936,14 @@ Pragmas
 
 Syntax::
 
-  colonExpr ::= expr [COLON expr]
-  colonExprList ::= [ colonExpr (comma colonExpr)* [comma] ]
+  colonExpr ::= expr [':' expr]
+  colonExprList ::= [colonExpr (comma colonExpr)* [comma]]
 
-  pragma ::= CURLYDOT_LE colonExprList (CURLYDOT_RI | CURLY_RI)
+  pragma ::= '{.' optInd (colonExpr [comma])* [SAD] ('.}' | '}')
 
 Pragmas are Nimrod's method to give the compiler additional information/
 commands without introducing a massive number of new keywords. Pragmas are
-processed on the fly during parsing. Pragmas are always enclosed in the
+processed on the fly during semantic checking. Pragmas are enclosed in the
 special ``{.`` and ``.}`` curly brackets.
 
 
@@ -1718,7 +1956,7 @@ The compiler defines the target processor and the target operating
 system as conditional symbols.
 
 Warning: The ``define`` pragma is deprecated as it conflicts with separate
-compilation! One should use boolean constants as a replacement - this is 
+compilation! One should use boolean constants as a replacement - this is
 cleaner anyway.
 
 
@@ -1759,10 +1997,6 @@ compilation option pragmas
 --------------------------
 The listed pragmas here can be used to override the code generation options
 for a section of code.
-::
-
-  "{." pragma: val {pragma: val} ".}"
-
 
 The implementation currently provides the following possible options (later
 various others may be added).
@@ -1785,8 +2019,7 @@ warnings         on|off           Turns the warning messages of the compiler
 hints            on|off           Turns the hint messages of the compiler
                                   on or off.
 optimization     none|speed|size  Optimize the code for speed or size, or
-                                  disable optimization. For non-optimizing
-                                  compilers this option has no effect.
+                                  disable optimization.
 callconv         cdecl|...        Specifies the default calling convention for
                                   all procedures (and procedure types) that
                                   follow.