summary refs log tree commit diff stats
path: root/doc/spec.txt
diff options
context:
space:
mode:
Diffstat (limited to 'doc/spec.txt')
-rwxr-xr-xdoc/spec.txt1297
1 files changed, 1297 insertions, 0 deletions
diff --git a/doc/spec.txt b/doc/spec.txt
new file mode 100755
index 000000000..3bad06e97
--- /dev/null
+++ b/doc/spec.txt
@@ -0,0 +1,1297 @@
+====================

+Nimrod Specification

+====================

+

+:Author: Andreas Rumpf

+

+.. contents::

+

+

+About this document

+===================

+

+This document describes the lexis, the syntax, and the semantics of Nimrod.

+However, this is only a first draft. Some parts need to be more precise,

+features may be added to the language, etc.

+

+The language constructs are explained using an extended BNF, in

+which ``(a)*`` means 0 or more ``a``'s, ``a+`` means 1 or more ``a``'s, and

+``(a)?`` means an optional *a*; an alternative spelling for optional parts is

+``[a]``. The ``|`` symbol is used to mark alternatives

+and has the lowest precedence. Parentheses may be used to group elements.

+Non-terminals are in lowercase, terminal symbols (including keywords) are in

+UPPERCASE. An example::

+

+  if_stmt ::= IF expr COLON stmts (ELIF expr COLON stmts)* [ELSE stmts]

+

+

+Definitions

+===========

+

+The following defintions are the same as their counterparts in the

+specification of the Modula-3 programming language.

+

+A Nimrod program specifies a computation that acts on a sequence of digital

+components called `locations`. A variable is a set of locations that

+represents a mathematical value according to a convention determined by the

+variable's *type*. If a value can be represented by some variable of type

+``T``, then we say that the value is a *member* of ``T`` and ``T`` *contains*

+the value.

+

+An *identifier* is a symbol declared as a name for a variable, type,

+procedure, etc. The region of the program over which a declaration applies is

+called the *scope* of the declaration. Scopes can be nested. The meaning of an

+identifier is determined by the smallest enclosing scope in which the

+identifier is declared.

+

+An expression specifies a computation that produces a value or variable.

+Expressions that produce variables are called `designators`. A designator

+can denote either a variable or the value of that variable, depending on

+the context. Some designators are *readonly*, which means that they cannot

+be used in contexts that might change the value of the variable. A

+designator that is not readonly is called *writable*. Expressions whose

+values can be determined statically are called *constant expressions*;

+they are never designators.

+

+A `static error` is an error that the implementation must detect before

+program execution. Violations of the language definition are static

+errors unless they are explicitly classified as runtime errors.

+

+A `checked runtime error` is an error that the implementation must detect

+and report at runtime. The method for reporting such errors is via *raising

+exceptions*. However, an implementation may provide a means to disable these

+runtime checks. See the section *pragmas* for details.

+

+An `unchecked runtime error` is an error that is not guaranteed to be

+detected, and can cause the subsequent behavior of the computation to

+be arbitrary. Unchecked runtime errors cannot occur if only `safe`

+language features are used.

+

+

+Lexical Analysis

+================

+

+Indentation

+-----------

+

+Nimrod's standard grammar describes an `indentation sensitive` language.

+This means that all the control structures are recognized by the indentation.

+Indentation consists only of spaces; tabulators are not allowed.

+

+The terminals ``IND`` (indentation), ``DED`` (dedentation) and ``SAD``

+(same indentation) are generated by the scanner, denoting an indentation.

+Using tabulators for the indentation is not allowed.

+

+These terminals are only generated for *logical lines*, i.e. not for an empty

+line and not for a line with only whitespace or comments.

+

+The parser and the scanner communicate over a stack which indentation terminal

+should be generated: The stack consists of integers counting the spaces. The

+stack is initialized with a zero on its top. The scanner reads from the stack:

+If the current indentation token consists of more spaces than the entry at the

+top of the stack, a ``IND`` token is generated, else if it consists of the same

+number of spaces, a ``SAD`` token is generated. If it consists of fewer spaces,

+a ``DED`` token is generated for any item on the stack that is greater than the

+current. These items are then popped from the stack by the scanner. At the end

+of the file, a ``DED`` token is generated for each number remaining on the

+stack that is larger than zero.

+

+Because the grammar contains some optional ``IND`` tokens, the scanner cannot

+push new indentation levels. This has to be done by the parser. The symbol

+``IND_PUSH`` indicates that the ``IND`` token should be pushed onto the stack

+by the parser.

+

+An Example how this works::

+

+  if_stmt ::= IF expr COLON stmts (ELIF expr COLON stmts)* (ELSE stmts)?

+  stmts ::= IND_PUSH (stmt [SAD])+ DED | stmt [SAD]

+

+  if expr0:

+  stmt1       # would be valid because, SAD is not generated any longer!

+

+  if expr0:

+    stmt1     # scanner: IND; parser pushes 2 onto the stack

+  if expr3: stmt5 # DED; ... SAD eaten by stmt5

+  else: stmt6     #

+  if expr4: stmt7

+  if expr:

+    stmt1     # scanner: IND; the parser pushes 2 onto the stack

+    stmt2     # scanner: SAD (because indentation is 2)

+  elif expr2: # scanner: DED (because indentation is 0)

+    stmts3    # scanner: IND; the parser pushes 2 onto the stack

+  else:       # scanner: DED;

+    stmt4     # scanner: IND; the parser pushes 2 onto the stack

+              # scanner generates 1 DED, because end of file and 1 item on stack

+

+

+

+Identifiers & Keywords

+----------------------

+

+`Identifiers` in Nimrod can be any string of letters, digits

+and underscores, beginning with a letter. Two immediate following

+underscores ``__`` are not allowed::

+

+  letter ::= 'A'..'Z' | 'a'..'z'

+  digit ::= '0'..'9'

+  IDENTIFIER ::= letter ( ['_'] letter | digit )*

+

+The following `keywords` are reserved and cannot be used as identifiers::

+

+  ${keywords}

+

+Some keywords are unused; they are reserved for future developments of the

+language.

+

+Nimrod is a `style-insensitive` language. This means that it is not

+case-sensitive and even underscores are ignored:

+**type** is a reserved word, and so is **TYPE** or **T_Y_P_E**. The idea behind

+this is, that this allows programmers to use their own prefered spelling style.

+Editors can show the identifiers as configured.

+

+

+Literal strings

+---------------

+

+`Literal strings` can be delimited by matching double quotes, and can contain

+the following `escape sequences`:

+

+==================         ==================================

+  Escape sequence          Meaning

+==================         ==================================

+  ``\n``                   `newline`

+  ``\r``                   `carriage return`

+  ``\l``                   `line feed`

+  ``\f``                   `form feed`

+  ``\t``                   `tabulator`

+  ``\v``                   `vertical tabulator`

+  ``\\``                   `backslash`

+  ``\"``                   `quotation mark`

+  ``\'``                   `apostrophe`

+  ``\y``                   `character with number y`

+  ``\a``                   `alert`

+  ``\b``                   `backspace`

+  ``\e``                   `escape` `[ESC]`

+==================         ==================================

+

+

+Strings in Nimrod may contain any 8-bit value, except embedded zeros,

+which are not allowed for compability with `C`.

+

+Literal strings can also be delimited by three double squotes

+``"""`` ... ``"""``.

+Literals in this form may run for several lines, may contain ``"`` and do not

+interpret any escape sequences.

+For convenience, when the opening ``"""`` is immediately

+followed by a newline, the newline is not included in the string.

+`Raw string literals` are preceded with the letter ``r`` (or ``R``)

+and are delimited by matching double quotes (just like ordinary string

+literals) and do not interpret the escape sequences.

+

+

+Literal characters

+------------------

+

+Character literals are enclosed in single quotes ``''`` and can contain the

+same escape sequences as strings - with one exception: ``\n`` is not allowed

+as it may be wider than one character (often it is the pair CR/LF for example).

+

+

+Numerical constants

+-------------------

+

+`Numerical constants` are of a single type and have the form::

+

+  hexdigit ::= digit | 'A'..'F' | 'a'..'f'

+  octdigit ::= '0'..'7'

+  bindigit ::= '0'..'1'

+  INT_LIT ::= digit ( ['_'] digit )*

+            | '0' ('x' | 'X' ) hexdigit ( ['_'] hexdigit )*

+            | '0o' octdigit ( ['_'] octdigit )*

+            | '0' ('b' | 'B' ) bindigit ( ['_'] bindigit )*

+

+  INT8_LIT ::= INT_LIT '\'' ('i' | 'I' ) '8'

+  INT16_LIT ::= INT_LIT '\'' ('i' | 'I' ) '16'

+  INT32_LIT ::= INT_LIT '\'' ('i' | 'I' ) '32'

+  INT64_LIT ::= INT_LIT '\'' ('i' | 'I' ) '64'

+

+  exponent ::= ('e' | 'E' ) ['+' | '-'] digit ( ['_'] digit )*

+  FLOAT_LIT ::= digit (['_'] digit)*  ('.' (['_'] digit)* [exponent] |exponent)

+  FLOAT32_LIT ::= ( FLOAT_LIT | INT_LIT ) '\'' ('f' | 'F') '32'

+  FLOAT64_LIT ::= ( FLOAT_LIT | INT_LIT ) '\'' ('f' | 'F') '64'

+

+

+As can be seen in the productions, numerical constants can contain unterscores

+for readability. Integer and floating point literals may be given in decimal (no

+prefix), binary (prefix ``0b``), octal (prefix ``0o``) and

+hexadecimal (prefix ``0x``) notation.

+

+There exists a literal for each numerical type that are

+defined. The suffix starting with an apostophe ('\'') is called a

+`type suffix`. Literals without a type prefix are of the type ``int``, unless

+the literal contains a dot or an ``E`` in which case it is of type ``float``.

+

+The following table specifies type suffixes:

+

+=================    =========================

+  Type Suffix        Resulting type of literal

+=================    =========================

+  ``'i8``            int8

+  ``'i16``           int16

+  ``'i32``           int32

+  ``'i64``           int64

+  ``'f32``           float32

+  ``'f64``           float64

+=================    =========================

+

+Floating point literals may also be in binary, octal or hexadecimal

+notation:

+``0B0_10001110100_0000101001000111101011101111111011000101001101001001'f64``

+is approximately 1.72826e35 according to the IEEE floating point standard.

+

+

+

+

+Comments

+--------

+

+`Comments` start anywhere outside a string with the hash character ``#``.

+Comments run until the end of the line. Comments are tokens; they are only

+allowed at certain places in the input file as they belong to the syntax. This

+is essential for performing correct source-to-source transformations or

+documentation generators.

+

+

+Other tokens

+------------

+

+The following strings denote other tokens::

+

+       (     )     {     }     [     ]     ,  ;   [.    .]  {.   .}  (.  .)

+       :     =     ^     ..    `

+

+``..`` takes precedence over other tokens that contain a dot: ``{..}`` are the

+three tokens ``{``, ``..``, ``}`` and not the two tokens ``{.``, ``.}``.

+

+In Nimrod one can define his own operators. An `operator` is any

+combination of the following characters that are not listed above::

+

+       +     -     *     /     <     >

+       =     @     $     ~     &     %

+       !     ?     ^     .     |

+

+These keywords are also operators:

+``and or not xor shl shr div mod in notin is isnot``.

+

+

+Syntax

+======

+

+This section lists Nimrod's standard syntax in ENBF. How the parser receives

+indentation tokens is already described in the Lexical Analysis section.

+

+Nimrod allows user-definable operators.

+Binary operators have 8 different levels of precedence. For user-defined

+operators, the precedence depends on the first character the operator consists

+of. All binary operators are left-associative.

+

+================  ==============================================  ==================  ===============

+Precedence level    Operators                                     First characters    Terminal symbol

+================  ==============================================  ==================  ===============

+  7 (highest)                                                     ``$``               OP7

+  6               ``*    /    div   mod   shl  shr  %``           ``* % \  /``        OP6

+  5               ``+    -``                                      ``+  ~  |``         OP5

+  4               ``&``                                           ``&``               OP4

+  3               ``==  <= < >= > !=  in  not_in  is  isnot``     ``= <  > !``        OP3

+  2               ``and``                                                             OP2

+  1               ``or xor``                                                          OP1

+  0 (lowest)                                                      ``? @ ^ ` : .``     OP0

+================  ==============================================  ==================  ===============

+

+

+The grammar's start symbol is ``module``. The grammar is LL(1) and therefore

+not ambigious.

+

+.. include:: grammar.txt

+   :literal:

+

+

+

+Semantics

+=========

+

+Constants

+---------

+

+Constants are symbols which are bound to a value. The constant's value

+cannot change. The compiler must be able to evaluate the expression in a

+constant declaration at compile time. This means that most of the functions in

+the runtime library cannot be used in a constant declaration.

+

+Operators such as ``+, -, *, /, not, and, or, div, mod`` and the procedures

+``addr, ord, chr, sizeof, trunc, round, frac, odd, abs`` can be used, however.

+An implementation may restrict the usage of ``addr`` in constant expressions.

+

+

+Types

+-----

+

+All expressions have a type which is known at compile time. Thus Nimrod is

+statically typed. One can declare new types, which is in

+essence defining an identifier that can be used to denote this custom type

+when declaring variables further in the source code.

+

+These are the major type classes:

+

+* ordinal types (consist of integer, bool, character, enumeration

+  (and subranges thereof) types)

+* floating point types

+* string type

+* structured types

+* reference (pointer) type

+* procedural type

+

+

+Ordinal types

+~~~~~~~~~~~~~

+Ordinal types have the following characteristics:

+

+- Ordinal types are countable and ordered. This property allows

+  the operation of functions as Inc, Ord, Dec on ordinal types to be defined.

+- Ordinal values have a smallest possible value. Trying to count farther

+  down than the smallest value gives a checked runtime or static error.

+- Ordinal values have a largest possible value. Trying to count farther

+  than the largest value gives a checked runtime or static error.

+

+Signed integers, bool, characters and enumeration types (and subrange of these

+types) belong to ordinal types. Unsigned integer types are special in the way

+that over- and underflows generate no errors, but wrap around.

+

+Pre-defined numerical types

+~~~~~~~~~~~~~~~~~~~~~~~~~~~

+These integer types are pre-defined:

+

+``int``

+  the generic signed integer type; its size is platform dependant

+  (a compiler should choose the processor's fastest integer type)

+  this type should be used in general. An integer literal that has no type

+  suffix is of this type.

+

+intXX

+  an implementation may define additional signed integer types

+  of XX bits using this naming scheme (example: int16 is a 16 bit wide integer).

+  The current implementation supports ``int8``, ``int16``, ``int32``, ``int64``.

+  Literals of these types have the suffix 'iXX.

+

+

+There are no unsigned integer types, only *unsigned operations* that treat their

+arguments as unsigned. Unsigned operations all wrap around; they may not lead to

+over- or underflow errors.

+

+The following floating point types are pre-defined:

+

+``float``

+  the generic floating point type; its size is platform dependant

+  (a compiler should choose the processor's fastest floating point type)

+  this type should be used in general

+

+floatXX

+  an implementation may define additional floating point types of XX bits using

+  this naming scheme (example: float64 is a 64 bit wide float). The current

+  implementation supports ``float32`` and ``float64``. Literals of these types

+  have the suffix 'fXX.

+

+Automatic type conversion in expressions where different kinds

+of integer types are used is performed. However, if the type conversion

+loses information, the `EConvertError` exception is raised. An implementation

+may detect certain cases of the convert error at compile time.

+

+Automatic type conversion in expressions with different kinds

+of floating point types is performed: The smaller type is

+converted to the larger. Arithmetic performed on floating point types

+follows the IEEE standard.

+

+

+Boolean type

+~~~~~~~~~~~~

+The boolean type is named ``bool`` in Nimrod and can be one of the two

+pre-defined values ``true`` and ``false``. Conditions in while,

+if, elif, when statements need to be of type bool.

+

+This condition should hold::

+

+  ord(false) == 0 and ord(true) == 1

+

+The operators ``not, and, or, xor`` are defined for the bool type.

+The ``and`` and ``or`` operators perform short-cut evaluation.

+Example::

+

+  while p != nil and p.name != "xyz":

+    # p.name is not evaluated if p == nil

+    p = p.next

+

+

+The size of the bool type is implementation-dependant, typically it is

+one byte.

+

+

+Character type

+~~~~~~~~~~~~~~

+The character type is named ``char`` in Nimrod and uses the platform's

+native encoding. Thus on nearly every platform its size is one byte due to

+the popular UTF-8 encoding.

+Character literals are enclosed in single quotes ``''``.

+

+.. Note:: For platform-independant character handling is the ``encoding``

+          standard module.

+

+

+Enumeration types

+~~~~~~~~~~~~~~~~~

+Enumeration types define a new type whose values consist only of the ones

+specified.

+The values are ordered by the order in enum's declaration. Example::

+

+  type

+    TDirection = enum

+      north, east, south, west

+

+

+Now the following holds::

+

+  ord(north) == 0

+  ord(east) == 1

+  ord(south) == 2

+  ord(west) == 3

+

+Thus, north < east < south < west. The comparison operators can be used

+with enumeration types.

+

+An implemenation should store enumeration types with the minimal number of

+bytes required for the particular enum, unless efficiency would be affected by

+doing so.

+

+For better interfacing to other programming languages, the fields of enum

+types can be assigned an explicit ordinal value. However, the ordinal values

+have to be in ascending order appropriately. A field whose ordinal value that

+is not explicitly given, gets the value of the previous field + 1.

+

+An explicit ordered enum can have *wholes*::

+

+  type

+    TTokenType = enum

+      a = 2, b = 4, c = 89 # wholes are valid

+

+However, it is then not an ordinal anymore, so it is not possible to use these

+enums as an index type for arrays. The procedures ``inc``, ``dec``, ``succ``

+and ``pred`` are not available for them.

+

+

+Subrange types

+~~~~~~~~~~~~~~

+A subrange type is a range of values from an ordinal type (the host type).

+To define a subrange type, one must specify it's limiting values: the highest

+and lowest value of the type::

+

+  type

+    TSubrange = range[0..5]

+

+

+``TSubrange`` is a subrange of an integer which can only hold the values 0

+to 5. Assigning an other value to a variable of type ``TSubrange`` is a

+checked runtime error (or static error if it can be statically

+determined). Assignments from the base type to one of its subrange types

+(and vice versa) are allowed.

+

+An implemenation should give it the same size as its base type.

+

+

+String type

+~~~~~~~~~~~

+All string literals are of the type string. A string in Nimrod is very

+similar to a sequence of characters. However, strings in Nimrod both are

+zero-terminated and have a length field. One can retrieve the length with the

+builtin ``length`` procedure; the length never counts the terminating zero.

+The assignment operator for strings always copies the string.

+

+Strings are compared by their lexicographical order. All comparison operators

+are available. String can be indexed like arrays (lower bound is 0). Unlike

+arrays, they can be used in case statements::

+

+  case paramStr(i)

+  of "-v": incl(options, optVerbose)

+  of "-h", "-?": incl(options, optHelp)

+  else: write(stdout, "invalid command line option!\n")

+

+

+Structured types

+~~~~~~~~~~~~~~~~

+A variable of a structured type can hold multiple values at the same time.

+Stuctured types can be nested to unlimited levels. Arrays, sequences, records,

+objects and sets belong to the structured types.

+

+Array type

+~~~~~~~~~~

+Arrays are a homogenous type, meaning that each element in the array has the

+same type. Arrays always have a fixed length which is specified at compile time

+(except for open arrays). They can be indexed by any ordinal type. A parameter

+may leave out the index type in the declaration making it an

+*open array*. An open array ``A`` is always indexed by integers from 0 to

+``length(A)-1``.

+

+A sequence may be passed to a parameter that is of type *open array*, but

+not to a multi-dimensional open array, because it is impossible to do so in an

+efficient manner.

+

+An array expression may be constructed by the array constructor ``[]``.

+A constructed array is assignment compatible to a sequence.

+

+Example::

+

+  type

+    TIntArray = array[0..5, int]

+  var

+    x: TIntArray

+  x = [1, 2, 3, 4, 5, 6] # this is the array constructor

+

+The lower bound of an array may be received by the built-in proc

+``low()``, the higher bound by ``high()``. The length may be

+received by ``length()``.

+

+Arrays are always bounds checked (at compile-time or at runtime). An

+implementation may provide a means to disable these checks.

+

+

+Sequence type

+~~~~~~~~~~~~~

+Sequences are similar to arrays but of dynamic length which may change

+during runtime (like strings). A sequence ``S`` is always indexed by integers

+from 0 to ``length(S)-1`` and its bounds are checked. Sequences can also be

+constructed by the array constructor ``[]``.

+

+

+Record and object types

+~~~~~~~~~~~~~~~~~~~~~~~

+A variable of a record or object type is a heterogenous storage container.

+A record or object defines various named *fields* of a type. The assignment

+operator for records and objects always copies the whole record/object. The

+constructor ``[]`` can be used to initialize records/objects. A field may

+be given a default value. Fields with default values do not have to be listed

+in a record construction, all other fields have to be listed.

+::

+

+  type

+    TPerson = record   # type representing a person

+      name: string     # a person consists of a name

+      age: int = 30    # and an age which default value is 30

+

+  var

+    person: TPerson

+  person = (name: "Peter") # person.age is its default value (30)

+

+An implementation may align or even reorder the fields for best access

+performance. The alignment may be specified with the `align`

+pragma. If an alignment is specified the compiler shall not reorder the fields.

+

+The difference between records and objects is that objects allow inheritance.

+Objects have access to their type at runtime, so that the ``is`` operator

+can be used to determine the object's type. Assignment from an object to its

+parents' object leads to a static or runtime error (the

+`EInvalidObjectAssignment` exception is raised).

+::

+

+  type

+    TPerson = object

+      name: string

+      age: int

+

+    TStudent = object of TPerson # a student is a person

+      id: int                    # with an id field

+

+  var

+    student: TStudent

+    person: TPerson

+  student = (name: "Peter", age: 89, id: 3)

+  person = (name: "Mary", age: 17)

+  assert(student is TStudent) # is true

+  person = student # this is an error; person has no storage for id.

+

+

+Set type

+~~~~~~~~

+The `set type` models the mathematical notion of a set. The set's

+basetype can only be an ordinal type. The reason is that sets are implemented

+as bit vectors. Sets are designed for high performance computing.

+

+.. Note:: The sets module can be used for sets of other types.

+

+Sets can be constructed via the set constructor: ``{}`` is the empty set. The

+empty set is type combatible with any special set type. The constructor

+can also be used to include elements (and ranges of elements) in the set::

+

+  {'a'..'z', '0'..'9'} # This constructs a set that conains the

+                       # letters from 'a' to 'z' and the digits

+                       # from '0' to '9'

+

+These operations are supported by sets:

+

+==================  ==========================================================

+operation           meaning

+==================  ==========================================================

+  A + B             union of two sets

+  A * B             intersection of two sets

+  A - B             difference of two sets (A without B's elements)

+  A == B            set equality

+  A <= B            subset relation (A is subset of B or equal to B)

+  A < B             strong subset relation (A is a real subset of B)

+  e in A            set membership (A contains element e)

+  A >< B            symmetric set difference (= (A - B) + (B - A))

+  card(A)           the cardinality of A (number of elements in A)

+  incl(A, elem)     same as A = A + {elem}, but may be faster

+  excl(A, elem)     same as A = A - {elem}, but may be faster

+==================  ==========================================================

+

+Reference type

+~~~~~~~~~~~~~~

+References (similiar to `pointers` in other programming languages) are a way to

+introduce many-to-one relationships. This means different references can point

+to and modify the same location in memory. References should be used sparingly

+in a program. They are only needed for constructing graphs.

+

+Nimrod distinguishes between *traced* and *untraced* references. Untraced

+references are also called `pointers`. The difference between them is that

+traced references are garbage collected, untraced are not. Thus untraced

+references are *unsafe*. However for certain low-level operations (accessing

+the hardware) untraced references are unavoidable.

+

+Traced references are declared with the **ref** keyword, untraced references

+are declared with the **ptr** keyword.

+

+The ``^`` operator can be used to derefer a reference, the ``addr`` procedure

+returns the address of an item. An address is always an untraced reference.

+Thus the usage of ``addr`` is an *unsafe* feature.

+

+The ``.`` (access a record field operator) and ``[]`` (array/string/sequence

+index operator) operators perform implicit dereferencing operations for

+reference types::

+

+  type

+    PNode = ref TNode

+    TNode = record

+      le, ri: PNode

+      data: int

+

+  var

+    n: PNode

+  new(n)

+  n.data = 9 # no need to write n^.data

+

+As can be seen by the example, reference types are the only types that can be

+used in *implicit forward declarations*: TNode may be used before it is

+defined, because only a refence to it is needed.

+

+To allocate a new traced object, the built-in procedure ``new`` has to be used.

+To deal with untraced memory, the procedures ``alloc``, ``dealloc`` and

+``realloc`` can be used. The documentation of the system module contains

+further information.

+

+Special care has to be taken if an untraced object contains traced objects like

+traced references, strings or sequences: In order to free everything properly,

+the built-in procedure ``finalize`` has to be called before freeing the

+untraced memory manually!

+

+.. XXX finalizers for traced objects

+

+Procedural type

+~~~~~~~~~~~~~~~

+A procedural type is internally a pointer to procedure. Thus ``nil`` is an

+allowed value for variables of a procedural type. Nimrod uses procedural types

+to achieve `functional` programming techniques. Dynamic dispatch for OOP

+constructs can also be implemented with procedural types.

+

+Example::

+

+  type

+    TCallback = proc (x: int) {.cdecl.}

+

+  proc printItem(x: Int) = ...

+

+  proc forEach(c: TCallback) =

+    ...

+

+  forEach(printItem)  # this will NOT work because calling conventions differ

+

+A subtle issue with procedural types is that the calling convention of the

+procedure influences the type compability: Procedural types are only compatible

+if they have the same calling convention.

+

+Altough a Nimrod implementation may provide additional calling conventions

+the following shall always exist:

+

+``cdecl``

+    is used for interfacing with C; indicates that a proc shall

+    use the same calling convention as the C compiler.

+

+``inline``

+    indicates that the caller of the proc should not call it,

+    but rather inline its code in place for improved efficiency. Note that

+    this is only a hint for the compiler: It may completely ignore it and

+    it may inline procedures that are not marked as ``inline``.

+

+``closure``

+    indicates that the procedure expects a context, a `closure` that needs

+    to be passed to the procedure.

+

+

+Statements

+----------

+Nimrod uses the common statement/expression paradigma: Statements do not

+produce a value in contrast to expressions. Call expressions are statements.

+If the called procedure returns a value, it is not a valid statement

+as statements do not produce values. To evaluate an expression for

+side-effects and throwing its value away, one can use the ``discard``

+statement.

+

+Statements are separated into `simple statements` and `complex statements`.

+Simple statements are statements that cannot contain other statements, like

+assignments, calls or the ``return`` statement; complex statements can

+contain other statements. To avoid the `dangling else problem`, complex

+statements always have to be intended::

+

+  XXX

+

+

+

+Discard statement

+~~~~~~~~~~~~~~~~~

+

+Syntax::

+

+  discard_stmt ::= discard expr

+

+The `discard` statement evaluates its expression for side-effects and throws

+the expression's resulting value away. If the expression has no side-effects,

+this shall generate at least a warning from the compiler.

+

+

+Var statement

+~~~~~~~~~~~~~

+

+Syntax::

+

+  varlist ::= identlist [asgn_opr expr doc]

+  var_section ::= var varlist

+           |      var INDENT(x > I[-1]) PUSH(x) varlist

+                    { INDENT(x) varlist } POP

+

+`Var` statements simply declare new local and global variables and may

+initialize them. A comma seperated list of variables can be used to specify

+variables of the same type::

+

+  var

+    a: int = 0

+    x, y, z: int

+

+However, an initializer is not allowed for such a list as its semantic

+would be ambigious in some cases. If an initializer is given the type

+can be omitted: The variable is of the same type as the initializing

+expression.

+

+

+If statement

+~~~~~~~~~~~~

+

+Syntax::

+

+  if_stmt ::= PUSH(x = I[-1]) if expr ":" stmts

+                  { INDENT(x) elif expr ":" stmts }

+                  [ INDENT(x) else ":" stmts ]

+              POP

+

+The `if` statement is a simple way to make a branch in the control flow:

+The expression after the keyword ``if`` is evaluated, if it is true

+the corresponding statements after the ``:`` are executed. Otherwise

+the expression after the ``elif`` is evaluated (if there is an

+``elif`` branch), if it is true the corresponding statements after

+the ``:`` are executed. This goes on until the last ``elif``. If all

+conditions fail, the ``else`` part is executed. If there is no ``else``

+part, execution continues with the statement after the ``if`` statement.

+

+

+Case statement

+~~~~~~~~~~~~~~

+

+Syntax::

+

+  case_stmt ::= PUSH(I[-1]) case expr

+                INDENT(x>=I[-1]) of vallist ":" stmts

+                    { INDENT(x) of vallist ":" stmts }

+                    { INDENT(x) elif expr ":" stmts }

+                    [ INDENT(x) else ":" stmts ]

+                POP

+

+The `case` statement is similar to the if statement, but it represents

+a multi-branch selection. The expression after the keyword ``case`` is

+evaluated and if its value is in a *vallist* the corresponding statements

+(after the ``of`` keyword) are executed. If the value is no given *vallist*

+the ``else`` part is executed. If there is no ``else`` part and not all

+possible values that ``expr`` can hold occur in a ``vallist``, a static

+error shall be given. This holds only for expressions of ordinal types.

+If the expression is not of an ordinal type, and no ``else`` part is

+given, control just passes after the ``case`` statement.

+

+To suppress the static error in the ordinal case the programmer needs

+to write an ``else`` part with a ``nil`` statement.

+

+

+When statement

+~~~~~~~~~~~~~~

+

+Syntax::

+

+  when_stmt ::= PUSH(x=I[-1]) when expr ":" stmts

+                  { INDENT(x) elif expr ":" stmts }

+                  [ INDENT(x) else ":" stmts ]

+                POP

+

+The `when` statement is almost identical to the ``if`` statement with some

+exceptions:

+

+* Each ``expr`` has to be a constant expression of type ``bool``.

+* The statements do not open a new scope if they introduce new identifiers.

+* The statements that belong to the expression that evaluated to true are

+  translated by the compiler, the other statements are not checked for

+  syntax or semantics at all! This holds also for any ``expr`` coming

+  after the expression that evaluated to true.

+

+The ``when`` statement enables conditional compilation techniques. As

+a special syntatic extension, the ``when `` construct is also available

+within ``record`` or ``object`` definitions.

+

+

+Raise statement

+~~~~~~~~~~~~~~~

+

+Syntax::

+

+  raise_stmt ::= raise [qualified_identifier [comma expr]]

+

+Apart from built-in operations like array indexing, memory allocation, etc.

+the ``raise`` statement is the only way to raise an exception. The

+identifier has to be the name of a previously declared exception. A

+comma followed by an expression may follow; the expression must be of type

+``string`` or ``cstring``; this is an error message that can be extracted

+with the `getCurrentExceptionMsg` procedure in the module ``system``.

+

+If no exception name is given, the current exception is `re-raised`. The

+`ENoExceptionToReraise` exception is raised if there is no exception to

+re-raise. It follows that the ``raise`` statement *always* raises an

+exception.

+

+

+Try statement

+~~~~~~~~~~~~~

+

+Syntax::

+

+  try_stmt ::= PUSH(x=I[-1]) try ":" stmts

+                { INDENT(x)  except exceptlist ":" stmts }

+                [ INDENT(x)  except ":" stmts ]

+                [ INDENT(x)  finally ":" stmts ]

+               POP

+

+The statements after the ``try`` are executed in sequential order unless

+an exception ``e`` is raised. If the exception type of ``e`` matches any

+of the list ``exceptlist`` the corresponding statements are executed.

+The statements following the ``except`` clauses are called

+`exception handlers`.

+

+The empty ``except`` clause is executed if there is an exception that is

+in no list. It is similiar to an ``else`` clause in ``if`` statements.

+

+If there is a ``finally`` clause given, it is always executed after the

+exception handlers.

+

+The exception is *consumed* in an exception handler. However, an

+exception handler may raise another exception. If the exception is not

+handled, it is propagated through the call stack. This means that often

+the rest of the procedure - that is not within a ``finally`` clause -

+is not executed (if an exception occurs).

+

+

+Block statement

+~~~~~~~~~~~~~~~

+

+Syntax::

+

+  block_stmt ::= block [IDENTIFIER] ":" stmts

+

+The block statement is a means to group statements to a (named) `block`.

+Inside the block, the ``break`` statement is allowed to leave the block

+immediately. A ``break`` statement can contain a name of a surrounding

+block to specify which block is to leave.

+

+

+Break statement

+~~~~~~~~~~~~~~~

+

+Syntax::

+

+  break_stmt ::= break [IDENTIFIER]

+

+The break statement is used to leave a block immediately. If ``IDENTIFIER``

+is given, it is the name of the enclosing block that is to leave. If it is

+absent, the innermost block is leaved.

+

+

+While statement

+~~~~~~~~~~~~~~~

+

+Syntax::

+

+  while_stmt ::= WHILE expr COLON stmts

+

+The `while` statement is executed until the ``expr`` evaluates to false.

+Endless loops are no error. ``while`` statements open an `implicit block`,

+so that they can be aborted by a ``break`` statement.

+

+

+For statement & iterators

+~~~~~~~~~~~~~~~~~~~~~~~~~

+

+Syntax::

+

+  for_stmt ::= PUSH(x=I[-1]) for exprlist in expr [".." expr] ":" stmts

+               POP

+

+The `for` statement is an abstract mechanism to iterate over the elements

+of a container. It relies on an *iterator* to do so. Like ``while``

+statements, ``for`` statements open an `implicit block`, so that they

+can be aborted by a ``break`` statement.

+

+XXX

+

+

+

+Assembler statement

+~~~~~~~~~~~~~~~~~~~

+Syntax::

+

+  asm_stmt ::= asm [CHAR_LITERAL] STRING_LITERAL

+

+The direct embedding of assembler code into Nimrod code is supported by the

+unsafe ``asm`` statement. Identifiers in the assembler code that refer to

+Nimrod identifiers shall be enclosed in a special character which can be

+specified right after the ``asm`` keyword. The default special character is

+``'!'``. An implementation does not need to support the assembler statement,

+giving a static error if it encounters one.

+

+

+Modules

+-------

+Nimrod supports splitting a program into pieces by a module concept. Modules

+make separate compilation possible. Each module needs to be in its own file.

+Modules consist of an interface and an implementation section. The interface

+section lists the symbols that can be imported from other modules. Thus modules

+enable `information hiding`. The interface section may not contain any

+code that is executable. This means that only the headers of procedures can

+appear in the interface. A module may gain access to symbols of another module

+by the `import` statement. Recursive module dependancies are allowed, but

+slightly subtle.

+

+The algorithm for compiling modules is:

+

+- Compile the whole module as usual, following import statements recursively

+- if we have a cycle only import the already parsed symbols (in the interface

+  of course); if an unknown identifier occurs then abort

+

+This is best illustrated by an example::

+

+  # Module A

+  interface

+  type

+    T1 = int

+  import B # the compiler starts parsing B

+

+  implementation

+

+  proc main() =

+    var i = p(3) # works because B has been parsed completely here

+

+  main()

+

+

+  # Module B

+  interface

+  import A  # A is not parsed here! Only the already known symbols

+            # of A are imported here.

+

+  proc p(x: A.T1): A.T1 # this works because the compiler has already

+                        # added T1 to A's interface symbol table

+

+  implementation

+

+  proc p(x: A.T1): A.T1 = return x + 1

+

+

+Scope rules

+-----------

+Identifiers are valid from the point of their declaration until the end of

+the block in which the declaration occurred. The range where the identifier

+is known is the `scope` of the identifier. The exact scope of an identifier

+depends on the way it was declared.

+

+Block scope

+~~~~~~~~~~~

+The *scope* of a variable declared in the declaration part of a block

+is valid from the point of declaration until the end of the block. If a

+block contains a second block, in which the identifier is redeclared,

+then inside this block, the second declaration will be valid. Upon

+leaving the inner block, the first declaration is valid again. An

+identifier cannot be redefined in the same block, except if valid for

+procedure or iterator overloading purposes.

+

+

+Record or object scope

+~~~~~~~~~~~~~~~~~~~~~~

+The field identifiers inside a record or object definition are valid in the

+following places:

+

+* To the end of the record definition

+* Field designators of a variable of the given record type.

+* In all descendent types of the object type.

+

+Module scope

+~~~~~~~~~~~~

+All identifiers in the interface part of a module are valid from the point of

+declaration, until the end of the module. Furthermore, the identifiers are

+known in other modules that import the module. Identifiers from indirectly

+dependent modules are *not* available. Identifiers declared in the

+implementation part of a module are valid from the point of declaration to

+the end of the module. The ``system`` module is automatically imported in

+all other modules.

+

+If a module imports an identifier by two different modules,

+each occurance of the identifier has to be qualified, unless it is an

+overloaded procedure or iterator in which case the overloading

+resolution takes place::

+

+  # Module A

+  interface

+  var x: string

+

+  # Module B

+  interface

+  var x: int

+

+  # Module C

+  import A, B, io

+  write(stdout, x) # error: x is ambigious

+  write(sdtout, A.x) # no error: qualifier used

+

+

+Messages

+========

+

+A Nimrod compiler has to emit different kinds of messages: `hint`,

+`warning`, and `error` messages. `errors` have to be emitted if the compiler

+encounters any static errors. If and when the other two message kinds are

+emitted is not specified, unless a message is requested with the

+``hint`` or ``warning`` pragma.

+

+Pragmas

+=======

+Pragmas are Nimrod's method to give the compiler additional information/

+commands without introducing a massive number of new keywords. Pragmas are

+processed on the fly during parsing. Pragmas are always enclosed in the

+special ``{.`` and ``.}`` curly brackets. There are a number of pragmas

+that a compiler has to process; a compiler may define additional pragmas

+not specified here.

+

+

+define pragma

+-------------

+The `define` pragma defines a conditional symbol. This symbol may only be

+used in other pragmas and in the ``defined`` expression and not in ordinary

+Nimrod source code. The conditional symbols go into a special symbol table.

+The compiler shall define the target processor and the target operating

+system as conditional symbols. See `Annex A <XXX>`_ for a list of specified

+processors and operating systems. The Syntax of the define pragma is::

+

+  define_pragma ::= curlydot_le "define" colon IDENTIFIER curlydot_ri

+

+

+undef pragma

+------------

+The `undef` pragma the counterpart to the define pragma. It undefines a

+conditional symbol. Syntax::

+

+  undef_pragma ::= curlydot_le "undef" colon IDENTIFIER curlydot_ri

+

+

+error pragma

+------------

+The `error` pragma is used to make the compiler output an error message with

+the given content. Compilation may abort after an error (or not). Syntax::

+

+  error_pragma ::= curlydot_le "error" colon STRING_LITERAL curlydot_ri

+

+

+fatal pragma

+------------

+The `fatal` pragma is used to make the compiler output an error message with

+the given content. In contrast to the ``error`` pragma, compilation

+is guaranteed to be aborted by this pragma. Syntax::

+

+  fatal_pragma ::= curlydot_le "fatal" colon STRING_LITERAL curlydot_ri

+

+

+warning pragma

+--------------

+The `warning` pragma is used to make the compiler output a warning message with

+the given content. Compilation continues after the warning. Syntax::

+

+  warning_pragma ::= curlydot_le "warning" colon STRING_LITERAL curlydot_ri

+

+

+hint pragma

+-----------

+The `hint` pragma is used to make the compiler output a hint message with

+the given content. Compilation continues after the hint. Syntax::

+

+

+  hint_pragma ::= curlydot_le "hint" colon STRING_LITERAL curlydot_ri

+

+

+

+compilation option pragmas

+--------------------------

+The listed pragmas here can be used to override the code generation options

+for a section of code.

+::

+

+  "{." pragma: val {pragma: val} ".}"

+

+

+An implementation should provide at least the following possible options (it can

+add various others). If an implementation does not recognize the option, a

+warning shall be given to the user.

+

+===============  ===============  ============================================

+pragma           allowed values   description

+===============  ===============  ============================================

+checks           on|off           Turns the code generation for all runtime

+                                  checks on or off.

+bound_checks     on|off           Turns the code generation for array bound

+                                  checks on or off.

+overflow_checks  on|off           Turns the code generation for over- or

+                                  underflow checks on or off.

+nil_checks       on|off           Turns the code generation for nil pointer

+                                  checks on or off.

+assertions       on|off           Turns the code generation for assertions

+                                  on or off.

+warnings         on|off           Turns the warning messages of the compiler

+                                  on or off.

+hints            on|off           Turns the hint messages of the compiler

+                                  on or off.

+optimization     none|speed|size  Optimize the code for speed or size, or

+                                  disable optimization. For non-optimizing

+                                  compilers this option has no effect.

+                                  Neverless they must parse it properly.

+callconv         cdecl|...        Specifies the default calling convention for

+                                  all procedures (and procedure types) that

+                                  follow.

+===============  ===============  ============================================

+

+Example::

+

+  {.checks: off, optimization: speed.}

+  # compile without runtime checks and optimize for speed

+

+

+push and pop pragmas

+--------------------

+The push/pop pragmas are very similar to the option directive,

+but are used to override the settings temporarily. Example::

+

+  {.push checks: off.}

+  # compile this section without runtime checks as it is

+  # speed critical

+  # ... some code ...

+  {.pop.} # restore old settings

+

+

+Annex A: List of conditional symbols

+====================================

+

+``posix`` is defined on any POSIX compatible operating system.

+

++----------------------------+-----------------+

+|  Operating System          |       Symbols   |

++============================+=================+

+| AIX                        |   aix, posix    |

++----------------------------+-----------------+

+| Compaq Tru64 UNIX          |   tru64, posix  |

++----------------------------+-----------------+

+| Digital UNIX               |   tru64, posix  |

++----------------------------+-----------------+

+| OSF/1                      |   tru64, posix  |

++----------------------------+-----------------+

+| FreeBSD                    |     freebsd,    |

+|                            |     posix,      |

+|                            |     bsd         |

++----------------------------+-----------------+

+| GNU/Linux                  |     linux,      |

+|                            |     posix       |

++----------------------------+-----------------+

+| HP-UX                      |     hpux,       |

+|                            |     posix       |

++----------------------------+-----------------+

+| Irix                       |     irix,       |

+|                            |     posix       |

++----------------------------+-----------------+

+| MacOS X                    |     macosx,     |

+|                            |     posix       |

++----------------------------+-----------------+

+| NetBSD                     |     netbsd,     |

+|                            |     posix,      |

+|                            |     bsd         |

++----------------------------+-----------------+

+| OpenBSD                    |     openbsd,    |

+|                            |     posix,      |

+|                            |     bsd         |

++----------------------------+-----------------+

+| Solaris                    |     solaris,    |

+|                            |     posix       |

++----------------------------+-----------------+

+| Windows (all variants)     |     windows     |

++----------------------------+-----------------+

+

+

++----------------------------+-----------------+

+|  Processor                 |       Symbols   |

++============================+=================+

+| Compaq Alpha               |       alpha     |

++----------------------------+-----------------+

+| HP Precision Architecture  |       hppa      |

++----------------------------+-----------------+

+| INTEL x86                  |       x86       |

++----------------------------+-----------------+

+| AMD/INTEL x86 64bit        |       x86_64,   |

+|                            |       amd64     |

++----------------------------+-----------------+

+| MIPS RISC                  |       mips      |

++----------------------------+-----------------+

+| IBM Power PC               |       powerpc   |

++----------------------------+-----------------+

+| SPARC                      |       sparc     |

++----------------------------+-----------------+

+| MicroSPARC                 |       sparc     |

++----------------------------+-----------------+

+| UltraSPARC                 |       sparc     |

++----------------------------+-----------------+

+

+On targets for 16, 32 or 64 bit processors the symbols ``cpu16``, ``cpu32``

+or ``cpu64`` shall be defined respectively. On little endian machines the

+symbol ``little_endian`` and on big endian ones the symbol ``big_endian``

+are defined.