1 files changed, 1297 insertions, 0 deletions
diff --git a/doc/spec.txt b/doc/spec.txt
new file mode 100755
index 000000000..3bad06e97
--- /dev/null
+++ b/doc/spec.txt
@@ -0,0 +1,1297 @@
+====================
+Nimrod Specification
+====================
+
+:Author: Andreas Rumpf
+
+.. contents::
+
+
+About this document
+===================
+
+This document describes the lexis, the syntax, and the semantics of Nimrod.
+However, this is only a first draft. Some parts need to be more precise,
+features may be added to the language, etc.
+
+The language constructs are explained using an extended BNF, in
+which ``(a)*`` means 0 or more ``a``'s, ``a+`` means 1 or more ``a``'s, and
+``(a)?`` means an optional *a*; an alternative spelling for optional parts is
+``[a]``. The ``|`` symbol is used to mark alternatives
+and has the lowest precedence. Parentheses may be used to group elements.
+Non-terminals are in lowercase, terminal symbols (including keywords) are in
+UPPERCASE. An example::
+
+  if_stmt ::= IF expr COLON stmts (ELIF expr COLON stmts)* [ELSE stmts]
+
+
+Definitions
+===========
+
+The following defintions are the same as their counterparts in the
+specification of the Modula-3 programming language.
+
+A Nimrod program specifies a computation that acts on a sequence of digital
+components called `locations`. A variable is a set of locations that
+represents a mathematical value according to a convention determined by the
+variable's *type*. If a value can be represented by some variable of type
+``T``, then we say that the value is a *member* of ``T`` and ``T`` *contains*
+the value.
+
+An *identifier* is a symbol declared as a name for a variable, type,
+procedure, etc. The region of the program over which a declaration applies is
+called the *scope* of the declaration. Scopes can be nested. The meaning of an
+identifier is determined by the smallest enclosing scope in which the
+identifier is declared.
+
+An expression specifies a computation that produces a value or variable.
+Expressions that produce variables are called `designators`. A designator
+can denote either a variable or the value of that variable, depending on
+the context. Some designators are *readonly*, which means that they cannot
+be used in contexts that might change the value of the variable. A
+designator that is not readonly is called *writable*. Expressions whose
+values can be determined statically are called *constant expressions*;
+they are never designators.
+
+A `static error` is an error that the implementation must detect before
+program execution. Violations of the language definition are static
+errors unless they are explicitly classified as runtime errors.
+
+A `checked runtime error` is an error that the implementation must detect
+and report at runtime. The method for reporting such errors is via *raising
+exceptions*. However, an implementation may provide a means to disable these
+runtime checks. See the section *pragmas* for details.
+
+An `unchecked runtime error` is an error that is not guaranteed to be
+detected, and can cause the subsequent behavior of the computation to
+be arbitrary. Unchecked runtime errors cannot occur if only `safe`
+language features are used.
+
+
+Lexical Analysis
+================
+
+Indentation
+-----------
+
+Nimrod's standard grammar describes an `indentation sensitive` language.
+This means that all the control structures are recognized by the indentation.
+Indentation consists only of spaces; tabulators are not allowed.
+
+The terminals ``IND`` (indentation), ``DED`` (dedentation) and ``SAD``
+(same indentation) are generated by the scanner, denoting an indentation.
+Using tabulators for the indentation is not allowed.
+
+These terminals are only generated for *logical lines*, i.e. not for an empty
+line and not for a line with only whitespace or comments.
+
+The parser and the scanner communicate over a stack which indentation terminal
+should be generated: The stack consists of integers counting the spaces. The
+stack is initialized with a zero on its top. The scanner reads from the stack:
+If the current indentation token consists of more spaces than the entry at the
+top of the stack, a ``IND`` token is generated, else if it consists of the same
+number of spaces, a ``SAD`` token is generated. If it consists of fewer spaces,
+a ``DED`` token is generated for any item on the stack that is greater than the
+current. These items are then popped from the stack by the scanner. At the end
+of the file, a ``DED`` token is generated for each number remaining on the
+stack that is larger than zero.
+
+Because the grammar contains some optional ``IND`` tokens, the scanner cannot
+push new indentation levels. This has to be done by the parser. The symbol
+``IND_PUSH`` indicates that the ``IND`` token should be pushed onto the stack
+by the parser.
+
+An Example how this works::
+
+  if_stmt ::= IF expr COLON stmts (ELIF expr COLON stmts)* (ELSE stmts)?
+  stmts ::= IND_PUSH (stmt [SAD])+ DED | stmt [SAD]
+
+  if expr0:
+  stmt1       # would be valid because, SAD is not generated any longer!
+
+  if expr0:
+    stmt1     # scanner: IND; parser pushes 2 onto the stack
+  if expr3: stmt5 # DED; ... SAD eaten by stmt5
+  else: stmt6     #
+  if expr4: stmt7
+  if expr:
+    stmt1     # scanner: IND; the parser pushes 2 onto the stack
+    stmt2     # scanner: SAD (because indentation is 2)
+  elif expr2: # scanner: DED (because indentation is 0)
+    stmts3    # scanner: IND; the parser pushes 2 onto the stack
+  else:       # scanner: DED;
+    stmt4     # scanner: IND; the parser pushes 2 onto the stack
+              # scanner generates 1 DED, because end of file and 1 item on stack
+
+
+
+Identifiers & Keywords
+----------------------
+
+`Identifiers` in Nimrod can be any string of letters, digits
+and underscores, beginning with a letter. Two immediate following
+underscores ``__`` are not allowed::
+
+  letter ::= 'A'..'Z' | 'a'..'z'
+  digit ::= '0'..'9'
+  IDENTIFIER ::= letter ( ['_'] letter | digit )*
+
+The following `keywords` are reserved and cannot be used as identifiers::
+
+  ${keywords}
+
+Some keywords are unused; they are reserved for future developments of the
+language.
+
+Nimrod is a `style-insensitive` language. This means that it is not
+case-sensitive and even underscores are ignored:
+**type** is a reserved word, and so is **TYPE** or **T_Y_P_E**. The idea behind
+this is, that this allows programmers to use their own prefered spelling style.
+Editors can show the identifiers as configured.
+
+
+Literal strings
+---------------
+
+`Literal strings` can be delimited by matching double quotes, and can contain
+the following `escape sequences`:
+
+==================         ==================================
+  Escape sequence          Meaning
+==================         ==================================
+  ``\n``                   `newline`
+  ``\r``                   `carriage return`
+  ``\l``                   `line feed`
+  ``\f``                   `form feed`
+  ``\t``                   `tabulator`
+  ``\v``                   `vertical tabulator`
+  ``\\``                   `backslash`
+  ``\"``                   `quotation mark`
+  ``\'``                   `apostrophe`
+  ``\y``                   `character with number y`
+  ``\a``                   `alert`
+  ``\b``                   `backspace`
+  ``\e``                   `escape` `[ESC]`
+==================         ==================================
+
+
+Strings in Nimrod may contain any 8-bit value, except embedded zeros,
+which are not allowed for compability with `C`.
+
+Literal strings can also be delimited by three double squotes
+``"""`` ... ``"""``.
+Literals in this form may run for several lines, may contain ``"`` and do not
+interpret any escape sequences.
+For convenience, when the opening ``"""`` is immediately
+followed by a newline, the newline is not included in the string.
+`Raw string literals` are preceded with the letter ``r`` (or ``R``)
+and are delimited by matching double quotes (just like ordinary string
+literals) and do not interpret the escape sequences.
+
+
+Literal characters
+------------------
+
+Character literals are enclosed in single quotes ``''`` and can contain the
+same escape sequences as strings - with one exception: ``\n`` is not allowed
+as it may be wider than one character (often it is the pair CR/LF for example).
+
+
+Numerical constants
+-------------------
+
+`Numerical constants` are of a single type and have the form::
+
+  hexdigit ::= digit | 'A'..'F' | 'a'..'f'
+  octdigit ::= '0'..'7'
+  bindigit ::= '0'..'1'
+  INT_LIT ::= digit ( ['_'] digit )*
+            | '0' ('x' | 'X' ) hexdigit ( ['_'] hexdigit )*
+            | '0o' octdigit ( ['_'] octdigit )*
+            | '0' ('b' | 'B' ) bindigit ( ['_'] bindigit )*
+
+  INT8_LIT ::= INT_LIT '\'' ('i' | 'I' ) '8'
+  INT16_LIT ::= INT_LIT '\'' ('i' | 'I' ) '16'
+  INT32_LIT ::= INT_LIT '\'' ('i' | 'I' ) '32'
+  INT64_LIT ::= INT_LIT '\'' ('i' | 'I' ) '64'
+
+  exponent ::= ('e' | 'E' ) ['+' | '-'] digit ( ['_'] digit )*
+  FLOAT_LIT ::= digit (['_'] digit)*  ('.' (['_'] digit)* [exponent] |exponent)
+  FLOAT32_LIT ::= ( FLOAT_LIT | INT_LIT ) '\'' ('f' | 'F') '32'
+  FLOAT64_LIT ::= ( FLOAT_LIT | INT_LIT ) '\'' ('f' | 'F') '64'
+
+
+As can be seen in the productions, numerical constants can contain unterscores
+for readability. Integer and floating point literals may be given in decimal (no
+prefix), binary (prefix ``0b``), octal (prefix ``0o``) and
+hexadecimal (prefix ``0x``) notation.
+
+There exists a literal for each numerical type that are
+defined. The suffix starting with an apostophe ('\'') is called a
+`type suffix`. Literals without a type prefix are of the type ``int``, unless
+the literal contains a dot or an ``E`` in which case it is of type ``float``.
+
+The following table specifies type suffixes:
+
+=================    =========================
+  Type Suffix        Resulting type of literal
+=================    =========================
+  ``'i8``            int8
+  ``'i16``           int16
+  ``'i32``           int32
+  ``'i64``           int64
+  ``'f32``           float32
+  ``'f64``           float64
+=================    =========================
+
+Floating point literals may also be in binary, octal or hexadecimal
+notation:
+``0B0_10001110100_0000101001000111101011101111111011000101001101001001'f64``
+is approximately 1.72826e35 according to the IEEE floating point standard.
+
+
+
+
+Comments
+--------
+
+`Comments` start anywhere outside a string with the hash character ``#``.
+Comments run until the end of the line. Comments are tokens; they are only
+allowed at certain places in the input file as they belong to the syntax. This
+is essential for performing correct source-to-source transformations or
+documentation generators.
+
+
+Other tokens
+------------
+
+The following strings denote other tokens::
+
+       (     )     {     }     [     ]     ,  ;   [.    .]  {.   .}  (.  .)
+       :     =     ^     ..    `
+
+``..`` takes precedence over other tokens that contain a dot: ``{..}`` are the
+three tokens ``{``, ``..``, ``}`` and not the two tokens ``{.``, ``.}``.
+
+In Nimrod one can define his own operators. An `operator` is any
+combination of the following characters that are not listed above::
+
+       +     -     *     /     <     >
+       =     @     $     ~     &     %
+       !     ?     ^     .     |
+
+These keywords are also operators:
+``and or not xor shl shr div mod in notin is isnot``.
+
+
+Syntax
+======
+
+This section lists Nimrod's standard syntax in ENBF. How the parser receives
+indentation tokens is already described in the Lexical Analysis section.
+
+Nimrod allows user-definable operators.
+Binary operators have 8 different levels of precedence. For user-defined
+operators, the precedence depends on the first character the operator consists
+of. All binary operators are left-associative.
+
+================  ==============================================  ==================  ===============
+Precedence level    Operators                                     First characters    Terminal symbol
+================  ==============================================  ==================  ===============
+  7 (highest)                                                     ``$``               OP7
+  6               ``*    /    div   mod   shl  shr  %``           ``* % \  /``        OP6
+  5               ``+    -``                                      ``+  ~  |``         OP5
+  4               ``&``                                           ``&``               OP4
+  3               ``==  <= < >= > !=  in  not_in  is  isnot``     ``= <  > !``        OP3
+  2               ``and``                                                             OP2
+  1               ``or xor``                                                          OP1
+  0 (lowest)                                                      ``? @ ^ ` : .``     OP0
+================  ==============================================  ==================  ===============
+
+
+The grammar's start symbol is ``module``. The grammar is LL(1) and therefore
+not ambigious.
+
+.. include:: grammar.txt
+   :literal:
+
+
+
+Semantics
+=========
+
+Constants
+---------
+
+Constants are symbols which are bound to a value. The constant's value
+cannot change. The compiler must be able to evaluate the expression in a
+constant declaration at compile time. This means that most of the functions in
+the runtime library cannot be used in a constant declaration.
+
+Operators such as ``+, -, *, /, not, and, or, div, mod`` and the procedures
+``addr, ord, chr, sizeof, trunc, round, frac, odd, abs`` can be used, however.
+An implementation may restrict the usage of ``addr`` in constant expressions.
+
+
+Types
+-----
+
+All expressions have a type which is known at compile time. Thus Nimrod is
+statically typed. One can declare new types, which is in
+essence defining an identifier that can be used to denote this custom type
+when declaring variables further in the source code.
+
+These are the major type classes:
+
+* ordinal types (consist of integer, bool, character, enumeration
+  (and subranges thereof) types)
+* floating point types
+* string type
+* structured types
+* reference (pointer) type
+* procedural type
+
+
+Ordinal types
+~~~~~~~~~~~~~
+Ordinal types have the following characteristics:
+
+- Ordinal types are countable and ordered. This property allows
+  the operation of functions as Inc, Ord, Dec on ordinal types to be defined.
+- Ordinal values have a smallest possible value. Trying to count farther
+  down than the smallest value gives a checked runtime or static error.
+- Ordinal values have a largest possible value. Trying to count farther
+  than the largest value gives a checked runtime or static error.
+
+Signed integers, bool, characters and enumeration types (and subrange of these
+types) belong to ordinal types. Unsigned integer types are special in the way
+that over- and underflows generate no errors, but wrap around.
+
+Pre-defined numerical types
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+These integer types are pre-defined:
+
+``int``
+  the generic signed integer type; its size is platform dependant
+  (a compiler should choose the processor's fastest integer type)
+  this type should be used in general. An integer literal that has no type
+  suffix is of this type.
+
+intXX
+  an implementation may define additional signed integer types
+  of XX bits using this naming scheme (example: int16 is a 16 bit wide integer).
+  The current implementation supports ``int8``, ``int16``, ``int32``, ``int64``.
+  Literals of these types have the suffix 'iXX.
+
+
+There are no unsigned integer types, only *unsigned operations* that treat their
+arguments as unsigned. Unsigned operations all wrap around; they may not lead to
+over- or underflow errors.
+
+The following floating point types are pre-defined:
+
+``float``
+  the generic floating point type; its size is platform dependant
+  (a compiler should choose the processor's fastest floating point type)
+  this type should be used in general
+
+floatXX
+  an implementation may define additional floating point types of XX bits using
+  this naming scheme (example: float64 is a 64 bit wide float). The current
+  implementation supports ``float32`` and ``float64``. Literals of these types
+  have the suffix 'fXX.
+
+Automatic type conversion in expressions where different kinds
+of integer types are used is performed. However, if the type conversion
+loses information, the `EConvertError` exception is raised. An implementation
+may detect certain cases of the convert error at compile time.
+
+Automatic type conversion in expressions with different kinds
+of floating point types is performed: The smaller type is
+converted to the larger. Arithmetic performed on floating point types
+follows the IEEE standard.
+
+
+Boolean type
+~~~~~~~~~~~~
+The boolean type is named ``bool`` in Nimrod and can be one of the two
+pre-defined values ``true`` and ``false``. Conditions in while,
+if, elif, when statements need to be of type bool.
+
+This condition should hold::
+
+  ord(false) == 0 and ord(true) == 1
+
+The operators ``not, and, or, xor`` are defined for the bool type.
+The ``and`` and ``or`` operators perform short-cut evaluation.
+Example::
+
+  while p != nil and p.name != "xyz":
+    # p.name is not evaluated if p == nil
+    p = p.next
+
+
+The size of the bool type is implementation-dependant, typically it is
+one byte.
+
+
+Character type
+~~~~~~~~~~~~~~
+The character type is named ``char`` in Nimrod and uses the platform's
+native encoding. Thus on nearly every platform its size is one byte due to
+the popular UTF-8 encoding.
+Character literals are enclosed in single quotes ``''``.
+
+.. Note:: For platform-independant character handling is the ``encoding``
+          standard module.
+
+
+Enumeration types
+~~~~~~~~~~~~~~~~~
+Enumeration types define a new type whose values consist only of the ones
+specified.
+The values are ordered by the order in enum's declaration. Example::
+
+  type
+    TDirection = enum
+      north, east, south, west
+
+
+Now the following holds::
+
+  ord(north) == 0
+  ord(east) == 1
+  ord(south) == 2
+  ord(west) == 3
+
+Thus, north < east < south < west. The comparison operators can be used
+with enumeration types.
+
+An implemenation should store enumeration types with the minimal number of
+bytes required for the particular enum, unless efficiency would be affected by
+doing so.
+
+For better interfacing to other programming languages, the fields of enum
+types can be assigned an explicit ordinal value. However, the ordinal values
+have to be in ascending order appropriately. A field whose ordinal value that
+is not explicitly given, gets the value of the previous field + 1.
+
+An explicit ordered enum can have *wholes*::
+
+  type
+    TTokenType = enum
+      a = 2, b = 4, c = 89 # wholes are valid
+
+However, it is then not an ordinal anymore, so it is not possible to use these
+enums as an index type for arrays. The procedures ``inc``, ``dec``, ``succ``
+and ``pred`` are not available for them.
+
+
+Subrange types
+~~~~~~~~~~~~~~
+A subrange type is a range of values from an ordinal type (the host type).
+To define a subrange type, one must specify it's limiting values: the highest
+and lowest value of the type::
+
+  type
+    TSubrange = range[0..5]
+
+
+``TSubrange`` is a subrange of an integer which can only hold the values 0
+to 5. Assigning an other value to a variable of type ``TSubrange`` is a
+checked runtime error (or static error if it can be statically
+determined). Assignments from the base type to one of its subrange types
+(and vice versa) are allowed.
+
+An implemenation should give it the same size as its base type.
+
+
+String type
+~~~~~~~~~~~
+All string literals are of the type string. A string in Nimrod is very
+similar to a sequence of characters. However, strings in Nimrod both are
+zero-terminated and have a length field. One can retrieve the length with the
+builtin ``length`` procedure; the length never counts the terminating zero.
+The assignment operator for strings always copies the string.
+
+Strings are compared by their lexicographical order. All comparison operators
+are available. String can be indexed like arrays (lower bound is 0). Unlike
+arrays, they can be used in case statements::
+
+  case paramStr(i)
+  of "-v": incl(options, optVerbose)
+  of "-h", "-?": incl(options, optHelp)
+  else: write(stdout, "invalid command line option!\n")
+
+
+Structured types
+~~~~~~~~~~~~~~~~
+A variable of a structured type can hold multiple values at the same time.
+Stuctured types can be nested to unlimited levels. Arrays, sequences, records,
+objects and sets belong to the structured types.
+
+Array type
+~~~~~~~~~~
+Arrays are a homogenous type, meaning that each element in the array has the
+same type. Arrays always have a fixed length which is specified at compile time
+(except for open arrays). They can be indexed by any ordinal type. A parameter
+may leave out the index type in the declaration making it an
+*open array*. An open array ``A`` is always indexed by integers from 0 to
+``length(A)-1``.
+
+A sequence may be passed to a parameter that is of type *open array*, but
+not to a multi-dimensional open array, because it is impossible to do so in an
+efficient manner.
+
+An array expression may be constructed by the array constructor ``[]``.
+A constructed array is assignment compatible to a sequence.
+
+Example::
+
+  type
+    TIntArray = array[0..5, int]
+  var
+    x: TIntArray
+  x = [1, 2, 3, 4, 5, 6] # this is the array constructor
+
+The lower bound of an array may be received by the built-in proc
+``low()``, the higher bound by ``high()``. The length may be
+received by ``length()``.
+
+Arrays are always bounds checked (at compile-time or at runtime). An
+implementation may provide a means to disable these checks.
+
+
+Sequence type
+~~~~~~~~~~~~~
+Sequences are similar to arrays but of dynamic length which may change
+during runtime (like strings). A sequence ``S`` is always indexed by integers
+from 0 to ``length(S)-1`` and its bounds are checked. Sequences can also be
+constructed by the array constructor ``[]``.
+
+
+Record and object types
+~~~~~~~~~~~~~~~~~~~~~~~
+A variable of a record or object type is a heterogenous storage container.
+A record or object defines various named *fields* of a type. The assignment
+operator for records and objects always copies the whole record/object. The
+constructor ``[]`` can be used to initialize records/objects. A field may
+be given a default value. Fields with default values do not have to be listed
+in a record construction, all other fields have to be listed.
+::
+
+  type
+    TPerson = record   # type representing a person
+      name: string     # a person consists of a name
+      age: int = 30    # and an age which default value is 30
+
+  var
+    person: TPerson
+  person = (name: "Peter") # person.age is its default value (30)
+
+An implementation may align or even reorder the fields for best access
+performance. The alignment may be specified with the `align`
+pragma. If an alignment is specified the compiler shall not reorder the fields.
+
+The difference between records and objects is that objects allow inheritance.
+Objects have access to their type at runtime, so that the ``is`` operator
+can be used to determine the object's type. Assignment from an object to its
+parents' object leads to a static or runtime error (the
+`EInvalidObjectAssignment` exception is raised).
+::
+
+  type
+    TPerson = object
+      name: string
+      age: int
+
+    TStudent = object of TPerson # a student is a person
+      id: int                    # with an id field
+
+  var
+    student: TStudent
+    person: TPerson
+  student = (name: "Peter", age: 89, id: 3)
+  person = (name: "Mary", age: 17)
+  assert(student is TStudent) # is true
+  person = student # this is an error; person has no storage for id.
+
+
+Set type
+~~~~~~~~
+The `set type` models the mathematical notion of a set. The set's
+basetype can only be an ordinal type. The reason is that sets are implemented
+as bit vectors. Sets are designed for high performance computing.
+
+.. Note:: The sets module can be used for sets of other types.
+
+Sets can be constructed via the set constructor: ``{}`` is the empty set. The
+empty set is type combatible with any special set type. The constructor
+can also be used to include elements (and ranges of elements) in the set::
+
+  {'a'..'z', '0'..'9'} # This constructs a set that conains the
+                       # letters from 'a' to 'z' and the digits
+                       # from '0' to '9'
+
+These operations are supported by sets:
+
+==================  ==========================================================
+operation           meaning
+==================  ==========================================================
+  A + B             union of two sets
+  A * B             intersection of two sets
+  A - B             difference of two sets (A without B's elements)
+  A == B            set equality
+  A <= B            subset relation (A is subset of B or equal to B)
+  A < B             strong subset relation (A is a real subset of B)
+  e in A            set membership (A contains element e)
+  A >< B            symmetric set difference (= (A - B) + (B - A))
+  card(A)           the cardinality of A (number of elements in A)
+  incl(A, elem)     same as A = A + {elem}, but may be faster
+  excl(A, elem)     same as A = A - {elem}, but may be faster
+==================  ==========================================================
+
+Reference type
+~~~~~~~~~~~~~~
+References (similiar to `pointers` in other programming languages) are a way to
+introduce many-to-one relationships. This means different references can point
+to and modify the same location in memory. References should be used sparingly
+in a program. They are only needed for constructing graphs.
+
+Nimrod distinguishes between *traced* and *untraced* references. Untraced
+references are also called `pointers`. The difference between them is that
+traced references are garbage collected, untraced are not. Thus untraced
+references are *unsafe*. However for certain low-level operations (accessing
+the hardware) untraced references are unavoidable.
+
+Traced references are declared with the **ref** keyword, untraced references
+are declared with the **ptr** keyword.
+
+The ``^`` operator can be used to derefer a reference, the ``addr`` procedure
+returns the address of an item. An address is always an untraced reference.
+Thus the usage of ``addr`` is an *unsafe* feature.
+
+The ``.`` (access a record field operator) and ``[]`` (array/string/sequence
+index operator) operators perform implicit dereferencing operations for
+reference types::
+
+  type
+    PNode = ref TNode
+    TNode = record
+      le, ri: PNode
+      data: int
+
+  var
+    n: PNode
+  new(n)
+  n.data = 9 # no need to write n^.data
+
+As can be seen by the example, reference types are the only types that can be
+used in *implicit forward declarations*: TNode may be used before it is
+defined, because only a refence to it is needed.
+
+To allocate a new traced object, the built-in procedure ``new`` has to be used.
+To deal with untraced memory, the procedures ``alloc``, ``dealloc`` and
+``realloc`` can be used. The documentation of the system module contains
+further information.
+
+Special care has to be taken if an untraced object contains traced objects like
+traced references, strings or sequences: In order to free everything properly,
+the built-in procedure ``finalize`` has to be called before freeing the
+untraced memory manually!
+
+.. XXX finalizers for traced objects
+
+Procedural type
+~~~~~~~~~~~~~~~
+A procedural type is internally a pointer to procedure. Thus ``nil`` is an
+allowed value for variables of a procedural type. Nimrod uses procedural types
+to achieve `functional` programming techniques. Dynamic dispatch for OOP
+constructs can also be implemented with procedural types.
+
+Example::
+
+  type
+    TCallback = proc (x: int) {.cdecl.}
+
+  proc printItem(x: Int) = ...
+
+  proc forEach(c: TCallback) =
+    ...
+
+  forEach(printItem)  # this will NOT work because calling conventions differ
+
+A subtle issue with procedural types is that the calling convention of the
+procedure influences the type compability: Procedural types are only compatible
+if they have the same calling convention.
+
+Altough a Nimrod implementation may provide additional calling conventions
+the following shall always exist:
+
+``cdecl``
+    is used for interfacing with C; indicates that a proc shall
+    use the same calling convention as the C compiler.
+
+``inline``
+    indicates that the caller of the proc should not call it,
+    but rather inline its code in place for improved efficiency. Note that
+    this is only a hint for the compiler: It may completely ignore it and
+    it may inline procedures that are not marked as ``inline``.
+
+``closure``
+    indicates that the procedure expects a context, a `closure` that needs
+    to be passed to the procedure.
+
+
+Statements
+----------
+Nimrod uses the common statement/expression paradigma: Statements do not
+produce a value in contrast to expressions. Call expressions are statements.
+If the called procedure returns a value, it is not a valid statement
+as statements do not produce values. To evaluate an expression for
+side-effects and throwing its value away, one can use the ``discard``
+statement.
+
+Statements are separated into `simple statements` and `complex statements`.
+Simple statements are statements that cannot contain other statements, like
+assignments, calls or the ``return`` statement; complex statements can
+contain other statements. To avoid the `dangling else problem`, complex
+statements always have to be intended::
+
+  XXX
+
+
+
+Discard statement
+~~~~~~~~~~~~~~~~~
+
+Syntax::
+
+  discard_stmt ::= discard expr
+
+The `discard` statement evaluates its expression for side-effects and throws
+the expression's resulting value away. If the expression has no side-effects,
+this shall generate at least a warning from the compiler.
+
+
+Var statement
+~~~~~~~~~~~~~
+
+Syntax::
+
+  varlist ::= identlist [asgn_opr expr doc]
+  var_section ::= var varlist
+           |      var INDENT(x > I[-1]) PUSH(x) varlist
+                    { INDENT(x) varlist } POP
+
+`Var` statements simply declare new local and global variables and may
+initialize them. A comma seperated list of variables can be used to specify
+variables of the same type::
+
+  var
+    a: int = 0
+    x, y, z: int
+
+However, an initializer is not allowed for such a list as its semantic
+would be ambigious in some cases. If an initializer is given the type
+can be omitted: The variable is of the same type as the initializing
+expression.
+
+
+If statement
+~~~~~~~~~~~~
+
+Syntax::
+
+  if_stmt ::= PUSH(x = I[-1]) if expr ":" stmts
+                  { INDENT(x) elif expr ":" stmts }
+                  [ INDENT(x) else ":" stmts ]
+              POP
+
+The `if` statement is a simple way to make a branch in the control flow:
+The expression after the keyword ``if`` is evaluated, if it is true
+the corresponding statements after the ``:`` are executed. Otherwise
+the expression after the ``elif`` is evaluated (if there is an
+``elif`` branch), if it is true the corresponding statements after
+the ``:`` are executed. This goes on until the last ``elif``. If all
+conditions fail, the ``else`` part is executed. If there is no ``else``
+part, execution continues with the statement after the ``if`` statement.
+
+
+Case statement
+~~~~~~~~~~~~~~
+
+Syntax::
+
+  case_stmt ::= PUSH(I[-1]) case expr
+                INDENT(x>=I[-1]) of vallist ":" stmts
+                    { INDENT(x) of vallist ":" stmts }
+                    { INDENT(x) elif expr ":" stmts }
+                    [ INDENT(x) else ":" stmts ]
+                POP
+
+The `case` statement is similar to the if statement, but it represents
+a multi-branch selection. The expression after the keyword ``case`` is
+evaluated and if its value is in a *vallist* the corresponding statements
+(after the ``of`` keyword) are executed. If the value is no given *vallist*
+the ``else`` part is executed. If there is no ``else`` part and not all
+possible values that ``expr`` can hold occur in a ``vallist``, a static
+error shall be given. This holds only for expressions of ordinal types.
+If the expression is not of an ordinal type, and no ``else`` part is
+given, control just passes after the ``case`` statement.
+
+To suppress the static error in the ordinal case the programmer needs
+to write an ``else`` part with a ``nil`` statement.
+
+
+When statement
+~~~~~~~~~~~~~~
+
+Syntax::
+
+  when_stmt ::= PUSH(x=I[-1]) when expr ":" stmts
+                  { INDENT(x) elif expr ":" stmts }
+                  [ INDENT(x) else ":" stmts ]
+                POP
+
+The `when` statement is almost identical to the ``if`` statement with some
+exceptions:
+
+* Each ``expr`` has to be a constant expression of type ``bool``.
+* The statements do not open a new scope if they introduce new identifiers.
+* The statements that belong to the expression that evaluated to true are
+  translated by the compiler, the other statements are not checked for
+  syntax or semantics at all! This holds also for any ``expr`` coming
+  after the expression that evaluated to true.
+
+The ``when`` statement enables conditional compilation techniques. As
+a special syntatic extension, the ``when `` construct is also available
+within ``record`` or ``object`` definitions.
+
+
+Raise statement
+~~~~~~~~~~~~~~~
+
+Syntax::
+
+  raise_stmt ::= raise [qualified_identifier [comma expr]]
+
+Apart from built-in operations like array indexing, memory allocation, etc.
+the ``raise`` statement is the only way to raise an exception. The
+identifier has to be the name of a previously declared exception. A
+comma followed by an expression may follow; the expression must be of type
+``string`` or ``cstring``; this is an error message that can be extracted
+with the `getCurrentExceptionMsg` procedure in the module ``system``.
+
+If no exception name is given, the current exception is `re-raised`. The
+`ENoExceptionToReraise` exception is raised if there is no exception to
+re-raise. It follows that the ``raise`` statement *always* raises an
+exception.
+
+
+Try statement
+~~~~~~~~~~~~~
+
+Syntax::
+
+  try_stmt ::= PUSH(x=I[-1]) try ":" stmts
+                { INDENT(x)  except exceptlist ":" stmts }
+                [ INDENT(x)  except ":" stmts ]
+                [ INDENT(x)  finally ":" stmts ]
+               POP
+
+The statements after the ``try`` are executed in sequential order unless
+an exception ``e`` is raised. If the exception type of ``e`` matches any
+of the list ``exceptlist`` the corresponding statements are executed.
+The statements following the ``except`` clauses are called
+`exception handlers`.
+
+The empty ``except`` clause is executed if there is an exception that is
+in no list. It is similiar to an ``else`` clause in ``if`` statements.
+
+If there is a ``finally`` clause given, it is always executed after the
+exception handlers.
+
+The exception is *consumed* in an exception handler. However, an
+exception handler may raise another exception. If the exception is not
+handled, it is propagated through the call stack. This means that often
+the rest of the procedure - that is not within a ``finally`` clause -
+is not executed (if an exception occurs).
+
+
+Block statement
+~~~~~~~~~~~~~~~
+
+Syntax::
+
+  block_stmt ::= block [IDENTIFIER] ":" stmts
+
+The block statement is a means to group statements to a (named) `block`.
+Inside the block, the ``break`` statement is allowed to leave the block
+immediately. A ``break`` statement can contain a name of a surrounding
+block to specify which block is to leave.
+
+
+Break statement
+~~~~~~~~~~~~~~~
+
+Syntax::
+
+  break_stmt ::= break [IDENTIFIER]
+
+The break statement is used to leave a block immediately. If ``IDENTIFIER``
+is given, it is the name of the enclosing block that is to leave. If it is
+absent, the innermost block is leaved.
+
+
+While statement
+~~~~~~~~~~~~~~~
+
+Syntax::
+
+  while_stmt ::= WHILE expr COLON stmts
+
+The `while` statement is executed until the ``expr`` evaluates to false.
+Endless loops are no error. ``while`` statements open an `implicit block`,
+so that they can be aborted by a ``break`` statement.
+
+
+For statement & iterators
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Syntax::
+
+  for_stmt ::= PUSH(x=I[-1]) for exprlist in expr [".." expr] ":" stmts
+               POP
+
+The `for` statement is an abstract mechanism to iterate over the elements
+of a container. It relies on an *iterator* to do so. Like ``while``
+statements, ``for`` statements open an `implicit block`, so that they
+can be aborted by a ``break`` statement.
+
+XXX
+
+
+
+Assembler statement
+~~~~~~~~~~~~~~~~~~~
+Syntax::
+
+  asm_stmt ::= asm [CHAR_LITERAL] STRING_LITERAL
+
+The direct embedding of assembler code into Nimrod code is supported by the
+unsafe ``asm`` statement. Identifiers in the assembler code that refer to
+Nimrod identifiers shall be enclosed in a special character which can be
+specified right after the ``asm`` keyword. The default special character is
+``'!'``. An implementation does not need to support the assembler statement,
+giving a static error if it encounters one.
+
+
+Modules
+-------
+Nimrod supports splitting a program into pieces by a module concept. Modules
+make separate compilation possible. Each module needs to be in its own file.
+Modules consist of an interface and an implementation section. The interface
+section lists the symbols that can be imported from other modules. Thus modules
+enable `information hiding`. The interface section may not contain any
+code that is executable. This means that only the headers of procedures can
+appear in the interface. A module may gain access to symbols of another module
+by the `import` statement. Recursive module dependancies are allowed, but
+slightly subtle.
+
+The algorithm for compiling modules is:
+
+- Compile the whole module as usual, following import statements recursively
+- if we have a cycle only import the already parsed symbols (in the interface
+  of course); if an unknown identifier occurs then abort
+
+This is best illustrated by an example::
+
+  # Module A
+  interface
+  type
+    T1 = int
+  import B # the compiler starts parsing B
+
+  implementation
+
+  proc main() =
+    var i = p(3) # works because B has been parsed completely here
+
+  main()
+
+
+  # Module B
+  interface
+  import A  # A is not parsed here! Only the already known symbols
+            # of A are imported here.
+
+  proc p(x: A.T1): A.T1 # this works because the compiler has already
+                        # added T1 to A's interface symbol table
+
+  implementation
+
+  proc p(x: A.T1): A.T1 = return x + 1
+
+
+Scope rules
+-----------
+Identifiers are valid from the point of their declaration until the end of
+the block in which the declaration occurred. The range where the identifier
+is known is the `scope` of the identifier. The exact scope of an identifier
+depends on the way it was declared.
+
+Block scope
+~~~~~~~~~~~
+The *scope* of a variable declared in the declaration part of a block
+is valid from the point of declaration until the end of the block. If a
+block contains a second block, in which the identifier is redeclared,
+then inside this block, the second declaration will be valid. Upon
+leaving the inner block, the first declaration is valid again. An
+identifier cannot be redefined in the same block, except if valid for
+procedure or iterator overloading purposes.
+
+
+Record or object scope
+~~~~~~~~~~~~~~~~~~~~~~
+The field identifiers inside a record or object definition are valid in the
+following places:
+
+* To the end of the record definition
+* Field designators of a variable of the given record type.
+* In all descendent types of the object type.
+
+Module scope
+~~~~~~~~~~~~
+All identifiers in the interface part of a module are valid from the point of
+declaration, until the end of the module. Furthermore, the identifiers are
+known in other modules that import the module. Identifiers from indirectly
+dependent modules are *not* available. Identifiers declared in the
+implementation part of a module are valid from the point of declaration to
+the end of the module. The ``system`` module is automatically imported in
+all other modules.
+
+If a module imports an identifier by two different modules,
+each occurance of the identifier has to be qualified, unless it is an
+overloaded procedure or iterator in which case the overloading
+resolution takes place::
+
+  # Module A
+  interface
+  var x: string
+
+  # Module B
+  interface
+  var x: int
+
+  # Module C
+  import A, B, io
+  write(stdout, x) # error: x is ambigious
+  write(sdtout, A.x) # no error: qualifier used
+
+
+Messages
+========
+
+A Nimrod compiler has to emit different kinds of messages: `hint`,
+`warning`, and `error` messages. `errors` have to be emitted if the compiler
+encounters any static errors. If and when the other two message kinds are
+emitted is not specified, unless a message is requested with the
+``hint`` or ``warning`` pragma.
+
+Pragmas
+=======
+Pragmas are Nimrod's method to give the compiler additional information/
+commands without introducing a massive number of new keywords. Pragmas are
+processed on the fly during parsing. Pragmas are always enclosed in the
+special ``{.`` and ``.}`` curly brackets. There are a number of pragmas
+that a compiler has to process; a compiler may define additional pragmas
+not specified here.
+
+
+define pragma
+-------------
+The `define` pragma defines a conditional symbol. This symbol may only be
+used in other pragmas and in the ``defined`` expression and not in ordinary
+Nimrod source code. The conditional symbols go into a special symbol table.
+The compiler shall define the target processor and the target operating
+system as conditional symbols. See `Annex A <XXX>`_ for a list of specified
+processors and operating systems. The Syntax of the define pragma is::
+
+  define_pragma ::= curlydot_le "define" colon IDENTIFIER curlydot_ri
+
+
+undef pragma
+------------
+The `undef` pragma the counterpart to the define pragma. It undefines a
+conditional symbol. Syntax::
+
+  undef_pragma ::= curlydot_le "undef" colon IDENTIFIER curlydot_ri
+
+
+error pragma
+------------
+The `error` pragma is used to make the compiler output an error message with
+the given content. Compilation may abort after an error (or not). Syntax::
+
+  error_pragma ::= curlydot_le "error" colon STRING_LITERAL curlydot_ri
+
+
+fatal pragma
+------------
+The `fatal` pragma is used to make the compiler output an error message with
+the given content. In contrast to the ``error`` pragma, compilation
+is guaranteed to be aborted by this pragma. Syntax::
+
+  fatal_pragma ::= curlydot_le "fatal" colon STRING_LITERAL curlydot_ri
+
+
+warning pragma
+--------------
+The `warning` pragma is used to make the compiler output a warning message with
+the given content. Compilation continues after the warning. Syntax::
+
+  warning_pragma ::= curlydot_le "warning" colon STRING_LITERAL curlydot_ri
+
+
+hint pragma
+-----------
+The `hint` pragma is used to make the compiler output a hint message with
+the given content. Compilation continues after the hint. Syntax::
+
+
+  hint_pragma ::= curlydot_le "hint" colon STRING_LITERAL curlydot_ri
+
+
+
+compilation option pragmas
+--------------------------
+The listed pragmas here can be used to override the code generation options
+for a section of code.
+::
+
+  "{." pragma: val {pragma: val} ".}"
+
+
+An implementation should provide at least the following possible options (it can
+add various others). If an implementation does not recognize the option, a
+warning shall be given to the user.
+
+===============  ===============  ============================================
+pragma           allowed values   description
+===============  ===============  ============================================
+checks           on|off           Turns the code generation for all runtime
+                                  checks on or off.
+bound_checks     on|off           Turns the code generation for array bound
+                                  checks on or off.
+overflow_checks  on|off           Turns the code generation for over- or
+                                  underflow checks on or off.
+nil_checks       on|off           Turns the code generation for nil pointer
+                                  checks on or off.
+assertions       on|off           Turns the code generation for assertions
+                                  on or off.
+warnings         on|off           Turns the warning messages of the compiler
+                                  on or off.
+hints            on|off           Turns the hint messages of the compiler
+                                  on or off.
+optimization     none|speed|size  Optimize the code for speed or size, or
+                                  disable optimization. For non-optimizing
+                                  compilers this option has no effect.
+                                  Neverless they must parse it properly.
+callconv         cdecl|...        Specifies the default calling convention for
+                                  all procedures (and procedure types) that
+                                  follow.
+===============  ===============  ============================================
+
+Example::
+
+  {.checks: off, optimization: speed.}
+  # compile without runtime checks and optimize for speed
+
+
+push and pop pragmas
+--------------------
+The push/pop pragmas are very similar to the option directive,
+but are used to override the settings temporarily. Example::
+
+  {.push checks: off.}
+  # compile this section without runtime checks as it is
+  # speed critical
+  # ... some code ...
+  {.pop.} # restore old settings
+
+
+Annex A: List of conditional symbols
+====================================
+
+``posix`` is defined on any POSIX compatible operating system.
+
++----------------------------+-----------------+
+|  Operating System          |       Symbols   |
++============================+=================+
+| AIX                        |   aix, posix    |
++----------------------------+-----------------+
+| Compaq Tru64 UNIX          |   tru64, posix  |
++----------------------------+-----------------+
+| Digital UNIX               |   tru64, posix  |
++----------------------------+-----------------+
+| OSF/1                      |   tru64, posix  |
++----------------------------+-----------------+
+| FreeBSD                    |     freebsd,    |
+|                            |     posix,      |
+|                            |     bsd         |
++----------------------------+-----------------+
+| GNU/Linux                  |     linux,      |
+|                            |     posix       |
++----------------------------+-----------------+
+| HP-UX                      |     hpux,       |
+|                            |     posix       |
++----------------------------+-----------------+
+| Irix                       |     irix,       |
+|                            |     posix       |
++----------------------------+-----------------+
+| MacOS X                    |     macosx,     |
+|                            |     posix       |
++----------------------------+-----------------+
+| NetBSD                     |     netbsd,     |
+|                            |     posix,      |
+|                            |     bsd         |
++----------------------------+-----------------+
+| OpenBSD                    |     openbsd,    |
+|                            |     posix,      |
+|                            |     bsd         |
++----------------------------+-----------------+
+| Solaris                    |     solaris,    |
+|                            |     posix       |
++----------------------------+-----------------+
+| Windows (all variants)     |     windows     |
++----------------------------+-----------------+
+
+
++----------------------------+-----------------+
+|  Processor                 |       Symbols   |
++============================+=================+
+| Compaq Alpha               |       alpha     |
++----------------------------+-----------------+
+| HP Precision Architecture  |       hppa      |
++----------------------------+-----------------+
+| INTEL x86                  |       x86       |
++----------------------------+-----------------+
+| AMD/INTEL x86 64bit        |       x86_64,   |
+|                            |       amd64     |
++----------------------------+-----------------+
+| MIPS RISC                  |       mips      |
++----------------------------+-----------------+
+| IBM Power PC               |       powerpc   |
++----------------------------+-----------------+
+| SPARC                      |       sparc     |
++----------------------------+-----------------+
+| MicroSPARC                 |       sparc     |
++----------------------------+-----------------+
+| UltraSPARC                 |       sparc     |
++----------------------------+-----------------+
+
+On targets for 16, 32 or 64 bit processors the symbols ``cpu16``, ``cpu32``
+or ``cpu64`` shall be defined respectively. On little endian machines the
+symbol ``little_endian`` and on big endian ones the symbol ``big_endian``
+are defined.