author | Andreas Rumpf <rumpf_a@web.de> | 2008-08-23 11:16:44 +0200 |
---|---|---|
committer | Andreas Rumpf <rumpf_a@web.de> | 2008-08-23 11:16:44 +0200 |
commit | 07d5a8085bbcc21a1d9d06a2976ecc00e9c8d55b (patch) | |
tree | b07a53afeb56f4bba917c1a3a843f48dd25b62be /doc/manual.txt | |
parent | 916c25f9a70b68eb7a5e2c45d7cc2e10c6e3a525 (diff) | |
too many changes to list
Diffstat (limited to 'doc/manual.txt')
-rw-r--r-- | doc/manual.txt | 3487 |
1 files changed, 1747 insertions, 1740 deletions
diff --git a/doc/manual.txt b/doc/manual.txt index 8debb92a5..babd96813 100644 --- a/doc/manual.txt +++ b/doc/manual.txt @@ -1,1742 +1,1749 @@ -============= -Nimrod Manual -============= - -:Author: Andreas Rumpf -:Version: |nimrodversion| - -.. contents:: - - -About this document -=================== - -This document describes the lexis, the syntax, and the semantics of Nimrod. - -The language constructs are explained using an extended BNF, in -which ``(a)*`` means 0 or more ``a``'s, ``a+`` means 1 or more ``a``'s, and -``(a)?`` means an optional *a*; an alternative spelling for optional parts is -``[a]``. The ``|`` symbol is used to mark alternatives -and has the lowest precedence. Parentheses may be used to group elements. -Non-terminals are in lowercase, terminal symbols (including keywords) are in -UPPERCASE. An example:: - - if_stmt ::= IF expr COLON stmts (ELIF expr COLON stmts)* [ELSE stmts] - -Other parts of Nimrod - like scoping rules or runtime semantics are only -described in an informal manner. The reason is that formal semantics are -difficult to write and understand. However, there is only one Nimrod -implementation, so one may consider it as the formal specification; -especially since the compiler's code is pretty clean (well, some parts of it). - - -Definitions -=========== - -A Nimrod program specifies a computation that acts on a memory consisting of -components called `locations`:idx:. A variable is basically a name for a -location. Each variable and location is of a certain `type`:idx:. The -variable's type is called `static type`:idx:, the location's type is called -`dynamic type`:idx:. If the static type is not the same as the dynamic type, -it is a supertype of the dynamic type. - -An `identifier`:idx: is a symbol declared as a name for a variable, type, -procedure, etc. The region of the program over which a declaration applies is -called the `scope`:idx: of the declaration. Scopes can be nested. 
The meaning -of an identifier is determined by the smallest enclosing scope in which the -identifier is declared. - -An expression specifies a computation that produces a value or location. -Expressions that produce locations are called `l-values`:idx:. An l-value -can denote either a location or the value the location contains, depending on -the context. Expressions whose values can be determined statically are called -`constant expressions`:idx:; they are never l-values. - -A `static error`:idx: is an error that the implementation detects before -program execution. Unless explicitly classified, an error is a static error. - -A `checked runtime error`:idx: is an error that the implementation detects -and reports at runtime. The method for reporting such errors is via *raising -exceptions*. However, the implementation provides a means to disable these -runtime checks. See the section pragmas_ for details. - -An `unchecked runtime error`:idx: is an error that is not guaranteed to be -detected, and can cause the subsequent behavior of the computation to -be arbitrary. Unchecked runtime errors cannot occur if only `safe`:idx: -language features are used. - - -Lexical Analysis -================ - -Encoding --------- - -All Nimrod source files are in the UTF-8 encoding (or its ASCII subset). Other -encodings are not supported. Any of the standard platform line termination -sequences can be used - the Unix form using ASCII LF (linefeed), the Windows -form using the ASCII sequence CR LF (return followed by linefeed), or the old -Macintosh form using the ASCII CR (return) character. All of these forms can be -used equally, regardless of platform. - - -Indentation ------------ - -Nimrod's standard grammar describes an `indentation sensitive`:idx: language. -This means that all the control structures are recognized by indentation. -Indentation consists only of spaces; tabulators are not allowed. 
- -The terminals ``IND`` (indentation), ``DED`` (dedentation) and ``SAD`` -(same indentation) are generated by the scanner, denoting an indentation. - -These terminals are only generated for lines that are not empty or contain -only whitespace and comments. - -The parser and the scanner communicate over a stack which indentation terminal -should be generated: The stack consists of integers counting the spaces. The -stack is initialized with a zero on its top. The scanner reads from the stack: -If the current indentation token consists of more spaces than the entry at the -top of the stack, a ``IND`` token is generated, else if it consists of the same -number of spaces, a ``SAD`` token is generated. If it consists of fewer spaces, -a ``DED`` token is generated for any item on the stack that is greater than the -current. These items are then popped from the stack by the scanner. At the end -of the file, a ``DED`` token is generated for each number remaining on the -stack that is larger than zero. - -Because the grammar contains some optional ``IND`` tokens, the scanner cannot -push new indentation levels. This has to be done by the parser. The symbol -``indPush`` indicates that an ``IND`` token is expected; the current number of -leading spaces is pushed onto the stack by the parser. - -Comments --------- - -`Comments`:idx: start anywhere outside a string or character literal with the -hash character ``#``. -Comments consist of a concatenation of `comment pieces`:idx:. A comment piece -starts with ``#`` and runs until the end of the line. The end of line characters -belong to the piece. If the next line only consists of a comment piece which is -aligned to the preceding one, it does not start a new comment: - -.. code-block:: nimrod - - i = 0 # This is a single comment over multiple lines belonging to the - # assignment statement. The scanner merges these two pieces. - # This is a new comment belonging to the current block, but to no particular - # statement. 
- i = i + 1 # This a new comment that is NOT - echo(i) # continued here, because this comment refers to the echo statement - -Comments are tokens; they are only allowed at certain places in the input file -as they belong to the syntax tree! This feature enables perfect source-to-source -transformations (such as pretty-printing) and superior documentation generators. -A side-effect is that the human reader of the code always knows exactly which -code snippet the comment refers to. - - -Identifiers & Keywords ----------------------- - -`Identifiers`:idx: in Nimrod can be any string of letters, digits -and underscores, beginning with a letter. Two immediate following -underscores ``__`` are not allowed:: - - letter ::= 'A'..'Z' | 'a'..'z' | '\x80'..'\xff' - digit ::= '0'..'9' - IDENTIFIER ::= letter ( ['_'] letter | digit )* - -The following `keywords`:idx: are reserved and cannot be used as identifiers: - -.. code-block:: nimrod - :file: ../data/keywords.txt - -Some keywords are unused; they are reserved for future developments of the -language. - -Nimrod is a `style-insensitive`:idx: language. This means that it is not -case-sensitive and even underscores are ignored: -**type** is a reserved word, and so is **TYPE** or **T_Y_P_E**. The idea behind -this is that this allows programmers to use their own prefered spelling style -and libraries written by different programmers cannot use incompatible -conventions. The editors or IDE can show the identifiers as preferred. Another -advantage is that it frees the programmer from remembering the spelling of an -identifier. 
- - -Literal strings ---------------- - -`Literal strings`:idx: can be delimited by matching double quotes, and can -contain the following `escape sequences`:idx:\ : - -================== =================================================== - Escape sequence Meaning -================== =================================================== - ``\n`` `newline`:idx: - ``\r`` `carriage return`:idx: - ``\l`` `line feed`:idx: - ``\f`` `form feed`:idx: - ``\t`` `tabulator`:idx: - ``\v`` `vertical tabulator`:idx: - ``\\`` `backslash`:idx: - ``\"`` `quotation mark`:idx: - ``\'`` `apostrophe`:idx: - ``\d+`` `character with decimal value d`:idx:; - all decimal digits directly - following are used for the - character - ``\a`` `alert`:idx: - ``\b`` `backspace`:idx: - ``\e`` `escape`:idx: `[ESC]`:idx: - ``\xHH`` `character with hex value HH`:idx:; - exactly two hex digits are allowed -================== =================================================== - - -Strings in Nimrod may contain any 8-bit value, except embedded zeros -which are not allowed for compability with `C`:idx:. - -Literal strings can also be delimited by three double squotes -``"""`` ... ``"""``. -Literals in this form may run for several lines, may contain ``"`` and do not -interpret any escape sequences. -For convenience, when the opening ``"""`` is immediately -followed by a newline, the newline is not included in the string. -There are also `raw string literals` that are preceded with the letter ``r`` -(or ``R``) and are delimited by matching double quotes (just like ordinary -string literals) and do not interpret the escape sequences. This is especially -convenient for regular expressions or Windows paths: - -.. 
code-block:: nimrod - - var f = openFile(r"C:\texts\text.txt") # a raw string, so ``\t`` is no tab - - -Literal characters ------------------- - -Character literals are enclosed in single quotes ``''`` and can contain the -same escape sequences as strings - with one exception: ``\n`` is not allowed -as it may be wider than one character (often it is the pair CR/LF for example). -A character is not an Unicode character but a single byte. The reason for this -is efficiency: For the overwhelming majority of use-cases, the resulting -programs will still handle UTF-8 properly as UTF-8 was specially designed for -this. -Another reason is that Nimrod should support ``array[char, int]`` or -``set[char]`` efficiently as many algorithms rely on this feature. - - -Numerical constants -------------------- - -`Numerical constants`:idx: are of a single type and have the form:: - - hexdigit ::= digit | 'A'..'F' | 'a'..'f' - octdigit ::= '0'..'7' - bindigit ::= '0'..'1' - INT_LIT ::= digit ( ['_'] digit )* - | '0' ('x' | 'X' ) hexdigit ( ['_'] hexdigit )* - | '0o' octdigit ( ['_'] octdigit )* - | '0' ('b' | 'B' ) bindigit ( ['_'] bindigit )* - - INT8_LIT ::= INT_LIT '\'' ('i' | 'I' ) '8' - INT16_LIT ::= INT_LIT '\'' ('i' | 'I' ) '16' - INT32_LIT ::= INT_LIT '\'' ('i' | 'I' ) '32' - INT64_LIT ::= INT_LIT '\'' ('i' | 'I' ) '64' - - exponent ::= ('e' | 'E' ) ['+' | '-'] digit ( ['_'] digit )* - FLOAT_LIT ::= digit (['_'] digit)* ('.' (['_'] digit)* [exponent] |exponent) - FLOAT32_LIT ::= ( FLOAT_LIT | INT_LIT ) '\'' ('f' | 'F') '32' - FLOAT64_LIT ::= ( FLOAT_LIT | INT_LIT ) '\'' ('f' | 'F') '64' - - -As can be seen in the productions, numerical constants can contain unterscores -for readability. Integer and floating point literals may be given in decimal (no -prefix), binary (prefix ``0b``), octal (prefix ``0o``) and -hexadecimal (prefix ``0x``) notation. - -There exists a literal for each numerical type that is -defined. 
The suffix starting with an apostophe ('\'') is called a -`type suffix`:idx:. Literals without a type prefix are of the type ``int``, -unless the literal contains a dot or an ``E`` in which case it is of -type ``float``. - -The following table specifies type suffixes: - -================= ========================= - Type Suffix Resulting type of literal -================= ========================= - ``'i8`` int8 - ``'i16`` int16 - ``'i32`` int32 - ``'i64`` int64 - ``'f32`` float32 - ``'f64`` float64 -================= ========================= - -Floating point literals may also be in binary, octal or hexadecimal -notation: -``0B0_10001110100_0000101001000111101011101111111011000101001101001001'f64`` -is approximately 1.72826e35 according to the IEEE floating point standard. - - - -Other tokens ------------- - -The following strings denote other tokens:: - - ( ) { } [ ] , ; [. .] {. .} (. .) - : = ^ .. ` - -`..`:tok: takes precedence over other tokens that contain a dot: `{..}`:tok: are -the three tokens `{`:tok:, `..`:tok:, `}`:tok: and not the two tokens -`{.`:tok:, `.}`:tok:. - -In Nimrod one can define his own operators. An `operator`:idx: is any -combination of the following characters that are not listed above:: - - + - * / < > - = @ $ ~ & % - ! ? ^ . | - -These keywords are also operators: -``and or not xor shl shr div mod in notin is isnot``. - - -Syntax -====== - -This section lists Nimrod's standard syntax in ENBF. How the parser receives -indentation tokens is already described in the Lexical Analysis section. - -Nimrod allows user-definable operators. -Binary operators have 8 different levels of precedence. For user-defined -operators, the precedence depends on the first character the operator consists -of. All binary operators are left-associative. 
- -================ ============================================== ================== =============== -Precedence level Operators First characters Terminal symbol -================ ============================================== ================== =============== - 7 (highest) ``$`` OP7 - 6 ``* / div mod shl shr %`` ``* % \ /`` OP6 - 5 ``+ -`` ``+ ~ |`` OP5 - 4 ``&`` ``&`` OP4 - 3 ``== <= < >= > != in not_in is isnot`` ``= < > !`` OP3 - 2 ``and`` OP2 - 1 ``or xor`` OP1 - 0 (lowest) ``? @ ^ ` : .`` OP0 -================ ============================================== ================== =============== - - -The grammar's start symbol is ``module``. The grammar is LL(1) and therefore -not ambigious. - -.. include:: grammar.txt - :literal: - - - -Semantics -========= - -Constants ---------- - -`Constants`:idx: are symbols which are bound to a value. The constant's value -cannot change. The compiler must be able to evaluate the expression in a -constant declaration at compile time. +============= +Nimrod Manual +============= + +:Author: Andreas Rumpf +:Version: |nimrodversion| + +.. contents:: + + +About this document +=================== + +This document describes the lexis, the syntax, and the semantics of Nimrod. + +The language constructs are explained using an extended BNF, in +which ``(a)*`` means 0 or more ``a``'s, ``a+`` means 1 or more ``a``'s, and +``(a)?`` means an optional *a*; an alternative spelling for optional parts is +``[a]``. The ``|`` symbol is used to mark alternatives +and has the lowest precedence. Parentheses may be used to group elements. +Non-terminals are in lowercase, terminal symbols (including keywords) are in +UPPERCASE. An example:: + + if_stmt ::= IF expr COLON stmts (ELIF expr COLON stmts)* [ELSE stmts] + +Other parts of Nimrod - like scoping rules or runtime semantics are only +described in an informal manner. The reason is that formal semantics are +difficult to write and understand. 
However, there is only one Nimrod +implementation, so one may consider it as the formal specification; +especially since the compiler's code is pretty clean (well, some parts of it). + + +Definitions +=========== + +A Nimrod program specifies a computation that acts on a memory consisting of +components called `locations`:idx:. A variable is basically a name for a +location. Each variable and location is of a certain `type`:idx:. The +variable's type is called `static type`:idx:, the location's type is called +`dynamic type`:idx:. If the static type is not the same as the dynamic type, +it is a supertype of the dynamic type. + +An `identifier`:idx: is a symbol declared as a name for a variable, type, +procedure, etc. The region of the program over which a declaration applies is +called the `scope`:idx: of the declaration. Scopes can be nested. The meaning +of an identifier is determined by the smallest enclosing scope in which the +identifier is declared. + +An expression specifies a computation that produces a value or location. +Expressions that produce locations are called `l-values`:idx:. An l-value +can denote either a location or the value the location contains, depending on +the context. Expressions whose values can be determined statically are called +`constant expressions`:idx:; they are never l-values. + +A `static error`:idx: is an error that the implementation detects before +program execution. Unless explicitly classified, an error is a static error. + +A `checked runtime error`:idx: is an error that the implementation detects +and reports at runtime. The method for reporting such errors is via *raising +exceptions*. However, the implementation provides a means to disable these +runtime checks. See the section pragmas_ for details. + +An `unchecked runtime error`:idx: is an error that is not guaranteed to be +detected, and can cause the subsequent behavior of the computation to +be arbitrary. 
Unchecked runtime errors cannot occur if only `safe`:idx: +language features are used. + + +Lexical Analysis +================ + +Encoding +-------- + +All Nimrod source files are in the UTF-8 encoding (or its ASCII subset). Other +encodings are not supported. Any of the standard platform line termination +sequences can be used - the Unix form using ASCII LF (linefeed), the Windows +form using the ASCII sequence CR LF (return followed by linefeed), or the old +Macintosh form using the ASCII CR (return) character. All of these forms can be +used equally, regardless of platform. + + +Indentation +----------- + +Nimrod's standard grammar describes an `indentation sensitive`:idx: language. +This means that all the control structures are recognized by indentation. +Indentation consists only of spaces; tabulators are not allowed. + +The terminals ``IND`` (indentation), ``DED`` (dedentation) and ``SAD`` +(same indentation) are generated by the scanner, denoting an indentation. + +These terminals are only generated for lines that are not empty or contain +only whitespace and comments. + +The parser and the scanner communicate over a stack which indentation terminal +should be generated: The stack consists of integers counting the spaces. The +stack is initialized with a zero on its top. The scanner reads from the stack: +If the current indentation token consists of more spaces than the entry at the +top of the stack, a ``IND`` token is generated, else if it consists of the same +number of spaces, a ``SAD`` token is generated. If it consists of fewer spaces, +a ``DED`` token is generated for any item on the stack that is greater than the +current. These items are then popped from the stack by the scanner. At the end +of the file, a ``DED`` token is generated for each number remaining on the +stack that is larger than zero. + +Because the grammar contains some optional ``IND`` tokens, the scanner cannot +push new indentation levels. This has to be done by the parser. 
The symbol +``indPush`` indicates that an ``IND`` token is expected; the current number of +leading spaces is pushed onto the stack by the parser. + +Comments +-------- + +`Comments`:idx: start anywhere outside a string or character literal with the +hash character ``#``. +Comments consist of a concatenation of `comment pieces`:idx:. A comment piece +starts with ``#`` and runs until the end of the line. The end of line characters +belong to the piece. If the next line only consists of a comment piece which is +aligned to the preceding one, it does not start a new comment: + +.. code-block:: nimrod + + i = 0 # This is a single comment over multiple lines belonging to the + # assignment statement. The scanner merges these two pieces. + # This is a new comment belonging to the current block, but to no particular + # statement. + i = i + 1 # This a new comment that is NOT + echo(i) # continued here, because this comment refers to the echo statement + +Comments are tokens; they are only allowed at certain places in the input file +as they belong to the syntax tree! This feature enables perfect source-to-source +transformations (such as pretty-printing) and superior documentation generators. +A side-effect is that the human reader of the code always knows exactly which +code snippet the comment refers to. + + +Identifiers & Keywords +---------------------- + +`Identifiers`:idx: in Nimrod can be any string of letters, digits +and underscores, beginning with a letter. Two immediate following +underscores ``__`` are not allowed:: + + letter ::= 'A'..'Z' | 'a'..'z' | '\x80'..'\xff' + digit ::= '0'..'9' + IDENTIFIER ::= letter ( ['_'] letter | digit )* + +The following `keywords`:idx: are reserved and cannot be used as identifiers: + +.. code-block:: nimrod + :file: ../data/keywords.txt + +Some keywords are unused; they are reserved for future developments of the +language. + +Nimrod is a `style-insensitive`:idx: language. 
This means that it is not +case-sensitive and even underscores are ignored: +**type** is a reserved word, and so is **TYPE** or **T_Y_P_E**. The idea behind +this is that this allows programmers to use their own prefered spelling style +and libraries written by different programmers cannot use incompatible +conventions. The editors or IDE can show the identifiers as preferred. Another +advantage is that it frees the programmer from remembering the spelling of an +identifier. + + +Literal strings +--------------- + +`Literal strings`:idx: can be delimited by matching double quotes, and can +contain the following `escape sequences`:idx:\ : + +================== =================================================== + Escape sequence Meaning +================== =================================================== + ``\n`` `newline`:idx: + ``\r`` `carriage return`:idx: + ``\l`` `line feed`:idx: + ``\f`` `form feed`:idx: + ``\t`` `tabulator`:idx: + ``\v`` `vertical tabulator`:idx: + ``\\`` `backslash`:idx: + ``\"`` `quotation mark`:idx: + ``\'`` `apostrophe`:idx: + ``\d+`` `character with decimal value d`:idx:; + all decimal digits directly + following are used for the + character + ``\a`` `alert`:idx: + ``\b`` `backspace`:idx: + ``\e`` `escape`:idx: `[ESC]`:idx: + ``\xHH`` `character with hex value HH`:idx:; + exactly two hex digits are allowed +================== =================================================== + + +Strings in Nimrod may contain any 8-bit value, except embedded zeros +which are not allowed for compability with `C`:idx:. + +Literal strings can also be delimited by three double squotes +``"""`` ... ``"""``. +Literals in this form may run for several lines, may contain ``"`` and do not +interpret any escape sequences. +For convenience, when the opening ``"""`` is immediately +followed by a newline, the newline is not included in the string. 
+There are also `raw string literals` that are preceded with the letter ``r`` +(or ``R``) and are delimited by matching double quotes (just like ordinary +string literals) and do not interpret the escape sequences. This is especially +convenient for regular expressions or Windows paths: + +.. code-block:: nimrod + + var f = openFile(r"C:\texts\text.txt") # a raw string, so ``\t`` is no tab + + +Literal characters +------------------ + +Character literals are enclosed in single quotes ``''`` and can contain the +same escape sequences as strings - with one exception: ``\n`` is not allowed +as it may be wider than one character (often it is the pair CR/LF for example). +A character is not an Unicode character but a single byte. The reason for this +is efficiency: For the overwhelming majority of use-cases, the resulting +programs will still handle UTF-8 properly as UTF-8 was specially designed for +this. +Another reason is that Nimrod should support ``array[char, int]`` or +``set[char]`` efficiently as many algorithms rely on this feature. + + +Numerical constants +------------------- + +`Numerical constants`:idx: are of a single type and have the form:: + + hexdigit ::= digit | 'A'..'F' | 'a'..'f' + octdigit ::= '0'..'7' + bindigit ::= '0'..'1' + INT_LIT ::= digit ( ['_'] digit )* + | '0' ('x' | 'X' ) hexdigit ( ['_'] hexdigit )* + | '0o' octdigit ( ['_'] octdigit )* + | '0' ('b' | 'B' ) bindigit ( ['_'] bindigit )* + + INT8_LIT ::= INT_LIT '\'' ('i' | 'I' ) '8' + INT16_LIT ::= INT_LIT '\'' ('i' | 'I' ) '16' + INT32_LIT ::= INT_LIT '\'' ('i' | 'I' ) '32' + INT64_LIT ::= INT_LIT '\'' ('i' | 'I' ) '64' + + exponent ::= ('e' | 'E' ) ['+' | '-'] digit ( ['_'] digit )* + FLOAT_LIT ::= digit (['_'] digit)* ('.' (['_'] digit)* [exponent] |exponent) + FLOAT32_LIT ::= ( FLOAT_LIT | INT_LIT ) '\'' ('f' | 'F') '32' + FLOAT64_LIT ::= ( FLOAT_LIT | INT_LIT ) '\'' ('f' | 'F') '64' + + +As can be seen in the productions, numerical constants can contain unterscores +for readability. 
Integer and floating point literals may be given in decimal (no +prefix), binary (prefix ``0b``), octal (prefix ``0o``) and +hexadecimal (prefix ``0x``) notation. + +There exists a literal for each numerical type that is +defined. The suffix starting with an apostophe ('\'') is called a +`type suffix`:idx:. Literals without a type prefix are of the type ``int``, +unless the literal contains a dot or an ``E`` in which case it is of +type ``float``. + +The following table specifies type suffixes: + +================= ========================= + Type Suffix Resulting type of literal +================= ========================= + ``'i8`` int8 + ``'i16`` int16 + ``'i32`` int32 + ``'i64`` int64 + ``'f32`` float32 + ``'f64`` float64 +================= ========================= + +Floating point literals may also be in binary, octal or hexadecimal +notation: +``0B0_10001110100_0000101001000111101011101111111011000101001101001001'f64`` +is approximately 1.72826e35 according to the IEEE floating point standard. + + + +Other tokens +------------ + +The following strings denote other tokens:: + + ( ) { } [ ] , ; [. .] {. .} (. .) + : = ^ .. ` + +`..`:tok: takes precedence over other tokens that contain a dot: `{..}`:tok: are +the three tokens `{`:tok:, `..`:tok:, `}`:tok: and not the two tokens +`{.`:tok:, `.}`:tok:. + +In Nimrod one can define his own operators. An `operator`:idx: is any +combination of the following characters that are not listed above:: + + + - * / < > + = @ $ ~ & % + ! ? ^ . | + +These keywords are also operators: +``and or not xor shl shr div mod in notin is isnot``. + + +Syntax +====== + +This section lists Nimrod's standard syntax in ENBF. How the parser receives +indentation tokens is already described in the Lexical Analysis section. + +Nimrod allows user-definable operators. +Binary operators have 8 different levels of precedence. For user-defined +operators, the precedence depends on the first character the operator consists +of. 
All binary operators are left-associative. + +================ ============================================== ================== =============== +Precedence level Operators First characters Terminal symbol +================ ============================================== ================== =============== + 7 (highest) ``$`` OP7 + 6 ``* / div mod shl shr %`` ``* % \ /`` OP6 + 5 ``+ -`` ``+ ~ |`` OP5 + 4 ``&`` ``&`` OP4 + 3 ``== <= < >= > != in not_in is isnot`` ``= < > !`` OP3 + 2 ``and`` OP2 + 1 ``or xor`` OP1 + 0 (lowest) ``? @ ^ ` : .`` OP0 +================ ============================================== ================== =============== + + +The grammar's start symbol is ``module``. The grammar is LL(1) and therefore +not ambigious. + +.. include:: grammar.txt + :literal: + + + +Semantics +========= + +Constants +--------- + +`Constants`:idx: are symbols which are bound to a value. The constant's value +cannot change. The compiler must be able to evaluate the expression in a +constant declaration at compile time. .. - Nimrod contains a sophisticated - compile-time evaluator, so procedures declared with the ``{.noSideEffect.}`` - pragma can be used in constant expressions: - - .. code-block:: nimrod - - from strutils import findSubStr - const - x = findSubStr('a', "hallo") # x is 1; this is computed at compile time! - - -Types ------ - -All expressions have a `type`:idx: which is known at compile time. Thus Nimrod -is statically typed. One can declare new types, which is in -essence defining an identifier that can be used to denote this custom type. - -These are the major type classes: - -* ordinal types (consist of integer, bool, character, enumeration - (and subranges thereof) types) -* floating point types -* string type -* structured types -* reference (pointer) type -* procedural type -* generic type - - -Ordinal types -~~~~~~~~~~~~~ -`Ordinal types`:idx: have the following characteristics: - -- Ordinal types are countable and ordered. 
This property allows - the operation of functions as ``Inc``, ``Ord``, ``Dec`` on ordinal types to - be defined. -- Ordinal values have a smallest possible value. Trying to count farther - down than the smallest value gives a checked runtime or static error. -- Ordinal values have a largest possible value. Trying to count farther - than the largest value gives a checked runtime or static error. - -Integers, bool, characters and enumeration types (and subrange of these -types) belong to ordinal types. - - -Pre-defined numerical types -~~~~~~~~~~~~~~~~~~~~~~~~~~~ -These integer types are pre-defined: - -``int`` - the generic signed integer type; its size is platform dependant - (the compiler chooses the processor's fastest integer type) - this type should be used in general. An integer literal that has no type - suffix is of this type. - -intXX - additional signed integer types of XX bits use this naming scheme - (example: int16 is a 16 bit wide integer). - The current implementation supports ``int8``, ``int16``, ``int32``, ``int64``. - Literals of these types have the suffix 'iXX. - - -There are no `unsigned integer`:idx: types, only `unsigned operations`:idx: -that treat their arguments as unsigned. Unsigned operations all wrap around; -they may not lead to over- or underflow errors. 
Unsigned operations use the -``%`` postfix as convention: - -====================== ====================================================== -operation meaning -====================== ====================================================== -``a +% b`` unsigned integer addition -``a -% b`` unsigned integer substraction -``a *% b`` unsigned integer multiplication -``a /% b`` unsigned integer division -``a %% b`` unsigned integer modulo operation -``a <% b`` treat ``a`` and ``b`` as unsigned and compare -``a <=% b`` treat ``a`` and ``b`` as unsigned and compare -``ze(a)`` extends the bits of ``a`` with zeros until it has the - width of the ``int`` type -``toU8(a)`` treats ``a`` as unsigned and converts it to an - unsigned integer of 8 bits (but still the - ``int8`` type) -``toU16(a)`` treats ``a`` as unsigned and converts it to an - unsigned integer of 16 bits (but still the - ``int16`` type) -``toU32(a)`` treats ``a`` as unsigned and converts it to an - unsigned integer of 32 bits (but still the - ``int32`` type) -====================== ====================================================== - -The following floating point types are pre-defined: - -``float`` - the generic floating point type; its size is platform dependant - (the compiler chooses the processor's fastest floating point type) - this type should be used in general - -floatXX - an implementation may define additional floating point types of XX bits using - this naming scheme (example: float64 is a 64 bit wide float). The current - implementation supports ``float32`` and ``float64``. Literals of these types - have the suffix 'fXX. - -`Automatic type conversion`:idx: in expressions where different kinds -of integer types are used is performed. However, if the type conversion -loses information, the `EInvalidValue`:idx: exception is raised. Certain cases -of the convert error are detected at compile time. 
- -Automatic type conversion in expressions with different kinds -of floating point types is performed: The smaller type is -converted to the larger. Arithmetic performed on floating point types -follows the IEEE standard. Only the ``int`` type is converted to a floating -point type automatically, other integer types are not. - - -Boolean type -~~~~~~~~~~~~ -The `boolean`:idx: type is named ``bool`` in Nimrod and can be one of the two -pre-defined values ``true`` and ``false``. Conditions in while, -if, elif, when statements need to be of type bool. - -This condition holds:: - - ord(false) == 0 and ord(true) == 1 - -The operators ``not, and, or, xor, implies, <, <=, >, >=, !=, ==`` are defined -for the bool type. The ``and`` and ``or`` operators perform short-cut -evaluation. Example: - -.. code-block:: nimrod - - while p != nil and p.name != "xyz": - # p.name is not evaluated if p == nil - p = p.next - - -The size of the bool type is one byte. - - -Character type -~~~~~~~~~~~~~~ -The `character type`:idx: is named ``char`` in Nimrod. Its size is one byte. -Thus it cannot represent a UTF-8 character, only a part of one. -The reason for this is efficiency: For the overwhelming majority of use-cases, -the resulting programs will still handle UTF-8 properly as UTF-8 was specially -designed for this. -Another reason is that Nimrod can support ``array[char, int]`` or -``set[char]`` efficiently as many algorithms rely on this feature. The -`TUniChar` type is used for Unicode characters; it can represent any Unicode -character. ``TUniChar`` is declared in the ``unicode`` standard module. - - - -Enumeration types -~~~~~~~~~~~~~~~~~ -`Enumeration`:idx: types define a new type whose values consist only of the ones -specified. -The values are ordered by the order in the enum's declaration. Example: - -.. 
code-block:: nimrod - - type - TDirection = enum - north, east, south, west - - -Now the following holds:: - - ord(north) == 0 - ord(east) == 1 - ord(south) == 2 - ord(west) == 3 - -Thus, north < east < south < west. The comparison operators can be used -with enumeration types. - -For better interfacing to other programming languages, the fields of enum -types can be assigned an explicit ordinal value. However, the ordinal values -have to be in ascending order. A field whose ordinal value is not -explicitly given is assigned the value of the previous field + 1. - -An explicitly ordered enum can have *holes*: - -.. code-block:: nimrod - type - TTokenType = enum - a = 2, b = 4, c = 89 # holes are valid - -However, it is then not an ordinal anymore, so it is not possible to use these -enums as an index type for arrays. The procedures ``inc``, ``dec``, ``succ`` -and ``pred`` are not available for them either. - - -Subrange types -~~~~~~~~~~~~~~ -A `subrange`:idx: type is a range of values from an ordinal type (the host -type). To define a subrange type, one must specify its limiting values: the -highest and lowest value of the type: - -.. code-block:: nimrod - type - TSubrange = range[0..5] - - -``TSubrange`` is a subrange of an integer which can only hold the values 0 -to 5. Assigning any other value to a variable of type ``TSubrange`` is a -checked runtime error (or static error if it can be statically -determined). Assignments from the base type to one of its subrange types -(and vice versa) are allowed. - -A subrange type has the same size as its base type (``int`` in the example). - - -String type -~~~~~~~~~~~ -All string literals are of the type `string`:idx:. A string in Nimrod is very -similar to a sequence of characters. However, strings in Nimrod both are -zero-terminated and have a length field. One can retrieve the length with the -builtin ``len`` procedure; the length never counts the terminating zero.
-The assignment operator for strings always copies the string. - -Strings are compared by their lexicographical order. All comparison operators -are available. Strings can be indexed like arrays (lower bound is 0). Unlike -arrays, they can be used in case statements: - -.. code-block:: nimrod - - case paramStr(i) - of "-v": incl(options, optVerbose) - of "-h", "-?": incl(options, optHelp) - else: write(stdout, "invalid command line option!\n") - -Per convention, all strings are UTF-8 strings, but this is not enforced. For -example, when reading strings from binary files, they are merely a sequence of -bytes. The index operation ``s[i]`` means the i-th *char* of ``s``, not the -i-th *unichar*. The iterator ``unichars`` from the ``unicode`` standard -module can be used for iteration over all Unicode characters. - - -Structured types -~~~~~~~~~~~~~~~~ -A variable of a `structured type`:idx: can hold multiple values at the same -time. Structured types can be nested to unlimited levels. Arrays, sequences, -records, objects and sets belong to the structured types. - -Array and sequence types -~~~~~~~~~~~~~~~~~~~~~~~~ -`Arrays`:idx: are a homogeneous type, meaning that each element in the array -has the same type. Arrays always have a fixed length which is specified at -compile time (except for open arrays). They can be indexed by any ordinal type. -A parameter ``A`` may be an *open array*, in which case it is indexed by -integers from 0 to ``len(A)-1``. - -`Sequences`:idx: are similar to arrays but of dynamic length which may change -during runtime (like strings). A sequence ``S`` is always indexed by integers -from 0 to ``len(S)-1`` and its bounds are checked. Sequences can also be -constructed by the array constructor ``[]``. - -A sequence may be passed to a parameter that is of type *open array*, but -not to a multi-dimensional open array, because it is impossible to do so in an -efficient manner.
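The indexing rule for sequences stated above (a sequence ``S`` is indexed from 0 to ``len(S)-1``) can be sketched as follows. This is only an illustration that uses the constructs mentioned in this section; the type and variable names are chosen for the example.

.. code-block:: nimrod

  type
    TIntSeq = seq[int]    # a sequence of integers
  var
    s: TIntSeq
  s = [10, 20, 30]        # constructed with the array constructor
  for i in 0..len(s)-1:   # valid indices run from 0 to len(s)-1
    echo($s[i])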
- -An array expression may be constructed by the array constructor ``[]``. -A constructed array is assignment compatible to a sequence. - -Example: - -.. code-block:: nimrod - - type - TIntArray = array[0..5, int] # an array that is indexed with 0..5 - TIntSeq = seq[int] # a sequence of integers - var - x: TIntArray - y: TIntSeq - x = [1, 2, 3, 4, 5, 6] # [] is the array constructor; it is compatible - # with arrays, open arrays and - y = [1, 2, 3, 4, 5, 6] # sequences - -The lower bound of an array may be retrieved by the built-in proc -``low()``, the higher bound by ``high()``. The length may be -retrieved by ``len()``. - -Arrays are always bounds checked (at compile-time or at runtime). These -checks can be disabled via pragmas or invoking the compiler with the -``--bound_checks:off`` command line switch. - - -Tuples, record and object types -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -A variable of a `record`:idx: or `object`:idx: type is a heterogeneous storage -container. -A record or object defines various named *fields* of a type. The assignment -operator for records and objects always copies the whole record/object. The -constructor ``()`` can be used to initialize records/objects. A field may -be given a default value. Fields with default values do not have to be listed -in a record construction, all other fields have to be listed. - -.. code-block:: nimrod - - type - TPerson = record # type representing a person - name: string # a person consists of a name - age: int = 30 # and an age whose default value is 30 - - var - person: TPerson - person = (name: "Peter") # person.age is its default value (30) - -The implementation aligns the fields for best access performance. The alignment -is done in a way that is compatible with the way the C compiler does it. - -The difference between records and objects is that objects allow inheritance. -Objects have access to their type at runtime, so that the ``is`` operator -can be used to determine the object's type.
Assignment from an object to its -parent's object leads to a static or runtime error (the -`EInvalidObjectAssignment`:idx: exception is raised). - -.. code-block:: nimrod - - type - TPerson = object - name: string - age: int - - TStudent = object of TPerson # a student is a person - id: int # with an id field - - var - student: TStudent - person: TPerson - student = (name: "Peter", age: 89, id: 3) - person = (name: "Mary", age: 17) - assert(student is TStudent) # is true - person = student # this is an error; person has no storage for id. - - -Set type -~~~~~~~~ -The `set type`:idx: models the mathematical notion of a set. The set's -basetype can only be an ordinal type. The reason is that sets are implemented -as bit vectors. Sets are designed for high performance computing. - -Note: The sets module can be used for sets of other types. - -Sets can be constructed via the set constructor: ``{}`` is the empty set. The -empty set is type compatible with any special set type. The constructor -can also be used to include elements (and ranges of elements) in the set: - -.. 
code-block:: nimrod - - {'a'..'z', '0'..'9'} # This constructs a set that contains the - # letters from 'a' to 'z' and the digits - # from '0' to '9' - -These operations are supported by sets: - -================== ======================================================== -operation meaning -================== ======================================================== -``A + B`` union of two sets -``A * B`` intersection of two sets -``A - B`` difference of two sets (A without B's elements) -``A == B`` set equality -``A <= B`` subset relation (A is subset of B or equal to B) -``A < B`` strong subset relation (A is a real subset of B) -``e in A`` set membership (A contains element e) -``A -+- B`` symmetric set difference (= (A - B) + (B - A)) -``card(A)`` the cardinality of A (number of elements in A) -``incl(A, elem)`` same as A = A + {elem}, but may be faster -``excl(A, elem)`` same as A = A - {elem}, but may be faster -================== ======================================================== - -Reference type -~~~~~~~~~~~~~~ -References (similar to `pointers`:idx: in other programming languages) are a -way to introduce many-to-one relationships. This means different references can -point to and modify the same location in memory. References should be used -sparingly in a program. They are only needed for constructing graphs. - -Nimrod distinguishes between `traced`:idx: and `untraced`:idx: references. -Untraced references are also called *pointers*. The difference between them is -that traced references are garbage collected, untraced are not. Thus untraced -references are *unsafe*. However, for certain low-level operations (accessing -the hardware) untraced references are unavoidable. - -Traced references are declared with the **ref** keyword, untraced references -are declared with the **ptr** keyword. - -The ``^`` operator can be used to dereference a reference, the ``addr`` procedure -returns the address of an item. An address is always an untraced reference.
-Thus the usage of ``addr`` is an *unsafe* feature. - -The ``.`` (access a record field operator) and ``[]`` (array/string/sequence -index operator) operators perform implicit dereferencing operations for -reference types: - -.. code-block:: nimrod - - type - PNode = ref TNode - TNode = record - le, ri: PNode - data: int - - var - n: PNode - new(n) - n.data = 9 # no need to write n^.data - -To allocate a new traced object, the built-in procedure ``new`` has to be used. -To deal with untraced memory, the procedures ``alloc``, ``dealloc`` and -``realloc`` can be used. The documentation of the system module contains -further information. - -Special care has to be taken if an untraced object contains traced objects like -traced references, strings or sequences: In order to free everything properly, -the built-in procedure ``finalize`` has to be called before freeing the -untraced memory manually! - -.. XXX finalizers for traced objects - -Procedural type -~~~~~~~~~~~~~~~ -A `procedural type`:idx: is internally a pointer to a procedure. ``nil`` is -an allowed value for variables of a procedural type. Nimrod uses procedural -types to achieve `functional`:idx: programming techniques. Dynamic dispatch -for OOP constructs can also be implemented with procedural types. - -Example: - -.. code-block:: nimrod - - type - TCallback = proc (x: int) {.cdecl.} - - proc printItem(x: Int) = ... - - proc forEach(c: TCallback) = - ... - - forEach(printItem) # this will NOT work because calling conventions differ - -A subtle issue with procedural types is that the calling convention of the -procedure influences the type compatibility: Procedural types are only compatible -if they have the same calling convention. - -Nimrod supports these `calling conventions`:idx:, which are all incompatible with -each other: - -`stdcall`:idx: - This is the stdcall convention as specified by Microsoft. The generated C - procedure is declared with the ``__stdcall`` keyword.
- -`cdecl`:idx: - The cdecl convention means that a procedure shall use the same convention - as the C compiler. Under Windows the generated C procedure is declared with - the ``__cdecl`` keyword. - -`safecall`:idx: - This is the safecall convention as specified by Microsoft. The generated C - procedure is declared with the ``__safecall`` keyword. The word *safe* - refers to the fact that all hardware registers shall be pushed to the - hardware stack. - -`inline`:idx: - The inline convention means that the caller should not call the procedure, - but inline its code directly. Note that Nimrod does not inline, but leaves - this to the C compiler. Thus it generates ``__inline`` procedures. This is - only a hint for the compiler: It may completely ignore it and - it may inline procedures that are not marked as ``inline``. - -`fastcall`:idx: - Fastcall means different things to different C compilers. One gets whatever - the C ``__fastcall`` means. - -`nimcall`:idx: - Nimcall is the default convention used for Nimrod procedures. It is the - same as ``fastcall``, but only for C compilers that support ``fastcall``. - -`closure`:idx: - indicates that the procedure expects a context, a closure that needs - to be passed to the procedure. The implementation is the - same as ``cdecl``, but with a hidden pointer parameter (the - *closure*). The hidden parameter is always the last one. - -`syscall`:idx: - The syscall convention is the same as ``__syscall`` in C. It is used for - interrupts. - -`noconv`:idx: - The generated C code will not have any explicit calling convention and thus - use the C compiler's default calling convention. This is needed because - Nimrod's default calling convention for procedures is ``fastcall`` to - improve speed. This is unlikely to be needed by the user. - -Most calling conventions exist only for the Windows 32-bit platform.
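Since procedural types are only compatible if their calling conventions agree, one way to make the ``forEach(printItem)`` call from the procedural type example acceptable is to give the procedure the same convention as the callback type. The following is a sketch under that assumption; the bodies of ``printItem`` and ``forEach`` are invented for the illustration.

.. code-block:: nimrod

  type
    TCallback = proc (x: int) {.cdecl.}

  proc printItem(x: int) {.cdecl.} =  # same convention as TCallback
    echo($x)

  proc forEach(c: TCallback) =
    c(1)                              # hypothetical body: call the callback once

  forEach(printItem)                  # conventions now agree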
- - - -Statements ----------- -Nimrod uses the common statement/expression paradigm: `Statements`:idx: do not -produce a value in contrast to expressions. Call expressions are statements. -If the called procedure returns a value, it is not a valid statement -as statements do not produce values. To evaluate an expression for -side-effects and throw its value away, one can use the ``discard`` -statement. - -Statements are separated into `simple statements`:idx: and -`complex statements`:idx:. -Simple statements are statements that cannot contain other statements, like -assignments, calls or the ``return`` statement; complex statements can -contain other statements. To avoid the `dangling else problem`:idx:, complex -statements always have to be indented:: - - simpleStmt ::= returnStmt - | yieldStmt - | discardStmt - | raiseStmt - | breakStmt - | continueStmt - | pragma - | importStmt - | fromStmt - | includeStmt - | exprStmt - complexStmt ::= ifStmt | whileStmt | caseStmt | tryStmt | forStmt - | blockStmt | asmStmt - | procDecl | iteratorDecl | macroDecl | templateDecl - | constDecl | typeDecl | whenStmt | varStmt - - - -Discard statement -~~~~~~~~~~~~~~~~~ - -Syntax:: - - discardStmt ::= DISCARD expr - -Example: - -.. code-block:: nimrod - - discard proc_call("arg1", "arg2") # discard the return value of `proc_call` - -The `discard`:idx: statement evaluates its expression for side-effects and -throws the expression's resulting value away. If the expression has no -side-effects, this generates a static error. Ignoring the return value of a -procedure without using a discard statement is not allowed. - - -Var statement -~~~~~~~~~~~~~ - -Syntax:: - - colonOrEquals ::= COLON typeDesc [EQUALS expr] | EQUALS expr - varPart ::= (symbol ["*" | "-"] [pragma] optComma)+ colonOrEquals [COMMENT] - varStmt ::= VAR (varPart | indPush varPart (SAD varPart)* DED) - -`Var`:idx: statements declare new local and global variables and -initialize them.
A comma-separated list of variables can be used to specify -variables of the same type: - -.. code-block:: nimrod - - var - a: int = 0 - x, y, z: int - -If an initializer is given the type can be omitted: The variable is of the -same type as the initializing expression. Variables are always initialized -with a default value if there is no initializing expression. The default -value depends on the type and is always a zero in binary. - -============================ ============================================== -Type default value -============================ ============================================== -any integer type 0 -any float 0.0 -char '\0' -bool false -ref or pointer type nil -procedural type nil -sequence nil -string nil (**not** "") -tuple[A, B, ...] (default(A), default(B), ...) - (analogous for objects and records) -array[0..., T] [default(T), ...] -range[T] default(T); this may be out of the valid range -T = enum cast[T](0); this may be an invalid value -============================ ============================================== - - -Const section -~~~~~~~~~~~~~ - -Syntax:: - - colonAndEquals ::= [COLON typeDesc] EQUALS expr - constDecl ::= CONST - indPush - symbol ["*"] [pragma] colonAndEquals - (SAD symbol ["*"] [pragma] colonAndEquals)* - DED - -Example: - -.. code-block:: nimrod - - const - MyFilename = "/home/my/file.txt" - debugMode: bool = false - -The `const`:idx: section declares symbolic constants. A symbolic constant is -a name for a constant expression. Symbolic constants only allow read-access. - - -If statement -~~~~~~~~~~~~ - -Syntax:: - - ifStmt ::= IF expr COLON stmt (ELIF expr COLON stmt)* [ELSE COLON stmt] - -Example: - -.. 
code-block:: nimrod - - var name = readLine(stdin) - - if name == "Andreas": - echo("What a nice name!") - elif name == "": - echo("Don't you have a name?") - else: - echo("Boring name...") - -The `if`:idx: statement is a simple way to make a branch in the control flow: -The expression after the keyword ``if`` is evaluated, if it is true -the corresponding statements after the ``:`` are executed. Otherwise -the expression after the ``elif`` is evaluated (if there is an -``elif`` branch), if it is true the corresponding statements after -the ``:`` are executed. This goes on until the last ``elif``. If all -conditions fail, the ``else`` part is executed. If there is no ``else`` -part, execution continues with the statement after the ``if`` statement. - - -Case statement -~~~~~~~~~~~~~~ - -Syntax:: - - caseStmt ::= CASE expr (OF sliceList COLON stmt)* - (ELIF expr COLON stmt)* - [ELSE COLON stmt] - -Example: - -.. code-block:: nimrod - - case readline(stdin) - of "delete-everything", "restart-computer": - echo("permission denied") - of "go-for-a-walk": echo("please yourself") - else: echo("unknown command") - -The `case`:idx: statement is similar to the if statement, but it represents -a multi-branch selection. The expression after the keyword ``case`` is -evaluated and if its value is in a *vallist* the corresponding statements -(after the ``of`` keyword) are executed. If the value is in no given *vallist* -the ``else`` part is executed. If there is no ``else`` part and not all -possible values that ``expr`` can hold occur in a ``vallist``, a static -error is given. This holds only for expressions of ordinal types. -If the expression is not of an ordinal type, and no ``else`` part is -given, control just passes after the ``case`` statement. - -To suppress the static error in the ordinal case the programmer needs -to write an ``else`` part with a ``nil`` statement.
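The last rule can be sketched with the ``TDirection`` enumeration from the section on enumeration types. This is an illustration only; the variable ``d`` and the messages are made up. Without the ``else`` part the compiler would report that ``east`` and ``west`` are not handled.

.. code-block:: nimrod

  type
    TDirection = enum
      north, east, south, west

  var d = south
  case d
  of north: echo("heading north")
  of south: echo("heading south")
  else: nil   # east and west are deliberately ignored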
- - -When statement -~~~~~~~~~~~~~~ - -Syntax:: - - whenStmt ::= WHEN expr COLON stmt (ELIF expr COLON stmt)* [ELSE COLON stmt] - -Example: - -.. code-block:: nimrod - - when sizeof(int) == 2: - echo("running on a 16 bit system!") - elif sizeof(int) == 4: - echo("running on a 32 bit system!") - elif sizeof(int) == 8: - echo("running on a 64 bit system!") - else: - echo("cannot happen!") - -The `when`:idx: statement is almost identical to the ``if`` statement with some -exceptions: - -* Each ``expr`` has to be a constant expression (of type ``bool``). -* The statements do not open a new scope if they introduce new identifiers. -* The statements that belong to the expression that evaluated to true are - translated by the compiler, the other statements are not checked for - syntax or semantics at all! This holds also for any ``expr`` coming - after the expression that evaluated to true. - -The ``when`` statement enables conditional compilation techniques. As -a special syntactic extension, the ``when`` construct is also available -within ``record`` or ``object`` definitions. - - -Raise statement -~~~~~~~~~~~~~~~ - -Syntax:: - - raiseStmt ::= RAISE [expr] - -Example: - -.. code-block:: nimrod - raise EOS("operating system failed") - -Apart from built-in operations like array indexing, memory allocation, etc., -the ``raise`` statement is the only way to raise an exception. The -identifier has to be the name of a previously declared exception. A -comma followed by an expression may follow; the expression must be of type -``string`` or ``cstring``; this is an error message that can be extracted -with the `getCurrentExceptionMsg`:idx: procedure in the module ``system``. - -If no exception name is given, the current exception is `re-raised`:idx:. The -`ENoExceptionToReraise`:idx: exception is raised if there is no exception to -re-raise. It follows that the ``raise`` statement *always* raises an -exception.
- - -Try statement -~~~~~~~~~~~~~ - -Syntax:: - - exceptList ::= (qualifiedIdent optComma)* - tryStmt ::= TRY COLON stmt - (EXCEPT exceptList COLON stmt)* - [FINALLY COLON stmt] - -Example: - -.. code-block:: nimrod - # read the first two lines of a text file that should contain numbers - # and try to add them - var - f: TFile - if openFile(f, "numbers.txt"): - try: - var a = readLine(f) - var b = readLine(f) - echo("sum: " & $(parseInt(a) + parseInt(b))) - except EOverflow: - echo("overflow!") - except EValue: - echo("could not convert string to integer") - except EIO: - echo("IO error!") - finally: - closeFile(f) - -The statements after the `try`:idx: are executed in sequential order unless -an exception ``e`` is raised. If the exception type of ``e`` matches any -of the list ``exceptlist`` the corresponding statements are executed. -The statements following the ``except`` clauses are called -`exception handlers`:idx:. - -The empty `except`:idx: clause is executed if there is an exception that is -in no list. It is similar to an ``else`` clause in ``if`` statements. - -If there is a `finally`:idx: clause, it is always executed after the -exception handlers. - -The exception is *consumed* in an exception handler. However, an -exception handler may raise another exception. If the exception is not -handled, it is propagated through the call stack. This means that often -the rest of the procedure - that is not within a ``finally`` clause - -is not executed (if an exception occurs). - - -Return statement -~~~~~~~~~~~~~~~~ - -Syntax:: - - returnStmt ::= RETURN [expr] - -Example: - -.. code-block:: nimrod - return 40+2 - -The `return`:idx: statement ends the execution of the current procedure. -It is only allowed in procedures. If there is an ``expr``, this is syntactic -sugar for: - -.. code-block:: nimrod - result = expr - return - -The `result`:idx: variable is always the return value of the procedure. It is -automatically declared by the compiler.
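The interplay of ``result`` and ``return`` can be sketched as follows. The procedure ``findFirst`` is a made-up example and not part of any standard module; it only uses string indexing and ``len`` as described in the section on strings.

.. code-block:: nimrod

  proc findFirst(s: string, c: char): int =
    # returns the index of the first occurrence of ``c`` in ``s`` or -1
    result = -1               # value returned if the loop finds nothing
    for i in 0..len(s)-1:
      if s[i] == c:
        return i              # sugar for: result = i; return

If the loop ends without executing ``return``, the procedure returns whatever ``result`` holds at that point, here ``-1``.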
- - -Yield statement -~~~~~~~~~~~~~~~ - -Syntax:: - - yieldStmt ::= YIELD expr - -Example: - -.. code-block:: nimrod - yield (1, 2, 3) - -The `yield`:idx: statement is used instead of the ``return`` statement in -iterators. It is only valid in iterators. Execution is returned to the body -of the for loop that called the iterator. Yield does not end the iteration -process, but execution is passed back to the iterator if the next iteration -starts. See the section about iterators (`Iterators and the for statement`_) -for further information. - - -Block statement -~~~~~~~~~~~~~~~ - -Syntax:: - - blockStmt ::= BLOCK [symbol] COLON stmt - -Example: - -.. code-block:: nimrod - var found = false - block myblock: - for i in 0..3: - for j in 0..3: - if a[j][i] == 7: - found = true - break myblock # leave the block, in this case both for-loops - echo(found) - -The block statement is a means to group statements to a (named) `block`:idx:. -Inside the block, the ``break`` statement is allowed to leave the block -immediately. A ``break`` statement can contain a name of a surrounding -block to specify which block is to be left. - - -Break statement -~~~~~~~~~~~~~~~ - -Syntax:: - - breakStmt ::= BREAK [symbol] - -Example: - -.. code-block:: nimrod - break - -The `break`:idx: statement is used to leave a block immediately. If ``symbol`` -is given, it is the name of the enclosing block that is to be left. If it is -absent, the innermost block is left. - - -While statement -~~~~~~~~~~~~~~~ - -Syntax:: - - whileStmt ::= WHILE expr COLON stmt - -Example: - -.. code-block:: nimrod - echo("Please tell me your password: \n") - var pw = readLine(stdin) - while pw != "12345": - echo("Wrong password! Next try: \n") - pw = readLine(stdin) - - -The `while`:idx: statement is executed until the ``expr`` evaluates to false. -Endless loops are no error. ``while`` statements open an `implicit block`, -so that they can be left with a ``break`` statement.
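Because a ``while`` statement opens an implicit block, an unnamed ``break`` inside the loop body leaves the loop. A minimal sketch (the counter ``i`` and the limit are chosen only for the illustration):

.. code-block:: nimrod

  var i = 0
  while true:       # an endless loop is not an error
    if i >= 3:
      break         # leaves the implicit block of the while statement
    echo($i)
    inc(i)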
- - -Continue statement -~~~~~~~~~~~~~~~~~~ - -Syntax:: - - continueStmt ::= CONTINUE - -A `continue`:idx: statement leads to the immediate next iteration of the -surrounding loop construct. It is only allowed within a loop. A continue -statement is syntactic sugar for a nested block: - -.. code-block:: nimrod - while expr1: - stmt1 - continue - stmt2 - - # is equivalent to: - while expr1: - block myBlockName: - stmt1 - break myBlockName - stmt2 - - -Assembler statement -~~~~~~~~~~~~~~~~~~~ -Syntax:: - - asmStmt ::= ASM [pragma] (STR_LIT | RSTR_LIT | TRIPLESTR_LIT) - -The direct embedding of `assembler`:idx: code into Nimrod code is supported -by the unsafe ``asm`` statement. Identifiers in the assembler code that refer to -Nimrod identifiers shall be enclosed in a special character which can be -specified in the statement's pragmas. The default special character is ``'`'``. - - -Procedures -~~~~~~~~~~ -What most programming languages call `methods`:idx: or `functions`:idx: are -called `procedures`:idx: in Nimrod (which is the correct terminology). A -procedure declaration defines an identifier and associates it with a block -of code. A procedure may call itself recursively. The syntax is:: - - paramList ::= [PAR_LE ((symbol optComma)+ COLON typeDesc optComma)* PAR_RI] - [COLON typeDesc] - genericParams ::= BRACKET_LE (symbol [EQUALS typeDesc] )* BRACKET_RI - - procDecl ::= PROC symbol ["*"] [genericParams] paramList [pragma] - [EQUALS stmt] - -If the ``EQUALS stmt`` part is missing, it is a `forward`:idx: declaration. If -the proc returns a value, the procedure body can access an implicitly declared -variable named `result`:idx: that represents the return value. Procs can be -overloaded. The overloading resolution algorithm tries to find the proc that is -the best match for the arguments. A parameter may be given a default value that -is used if the caller does not provide a value for this parameter. Example: - -.. 
code-block:: nimrod - - proc toLower(c: Char): Char = # toLower for characters - if c in {'A'..'Z'}: - result = chr(ord(c) + (ord('a') - ord('A'))) - else: - result = c - - proc toLower(s: string): string = # toLower for strings - result = newString(len(s)) - for i in 0..len(s) - 1: - result[i] = toLower(s[i]) # calls toLower for characters; no recursion! - -`Operators`:idx: are procedures with a special operator symbol as identifier: - -.. code-block:: nimrod - proc `$` (x: int): string = # converts an integer to a string; - # since it has one parameter this is a prefix - # operator. With two parameters it would be - # an infix operator. - return intToStr(x) - -Calling a procedure can be done in many different ways: - -.. code-block:: nimrod - proc callme(x, y: int, s: string = "", c: char, b: bool = false) = ... - - # call with positional arguments # parameter bindings: - callme(0, 1, "abc", '\t', true) # (x=0, y=1, s="abc", c='\t', b=true) - # call with named and positional arguments: - callme(y=1, x=0, "abd", '\t') # (x=0, y=1, s="abd", c='\t', b=false) - # call with named arguments (order is not relevant): - callme(c='\t', y=1, x=0) # (x=0, y=1, s="", c='\t', b=false) - # call as a command statement: no () or , needed: - callme 0 1 "abc" '\t' - - -Iterators and the for statement -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -Syntax:: - - forStmt ::= FOR (symbol optComma)+ IN expr [DOTDOT expr] COLON stmt - - paramList ::= [PAR_LE ((symbol optComma)+ COLON typeDesc optComma)* PAR_RI] - [COLON typeDesc] - genericParams ::= BRACKET_LE (symbol [EQUALS typeDesc] )* BRACKET_RI - - iteratorDecl ::= ITERATOR symbol ["*"] [genericParams] paramList [pragma] - [EQUALS stmt] - -The `for`:idx: statement is an abstract mechanism to iterate over the elements -of a container. It relies on an `iterator`:idx: to do so. Like ``while`` -statements, ``for`` statements open an `implicit block`:idx:, so that they -can be left with a ``break`` statement.
The ``for`` loop declares -iteration variables (``x`` in the example) - their scope reaches until the -end of the loop body. The iteration variables' types are inferred by the -return type of the iterator. - -An iterator is similar to a procedure, except that it is always called in the -context of a ``for`` loop. Iterators provide a way to specify the iteration over -an abstract type. The ``yield`` statement in the called iterator plays a key -role in the execution of a ``for`` loop. Whenever a ``yield`` statement is -reached the data is bound to the ``for`` loop variables and control continues -in the body of the ``for`` loop. The iterator's local variables and execution -state are automatically saved between calls. Example: - -.. code-block:: nimrod - # this definition exists in the system module - iterator items*(a: string): char {.inline.} = - var i = 0 - while i < len(a): - yield a[i] - inc(i) - - for ch in items("hello world"): # `ch` is an iteration variable - echo(ch) - -The compiler generates code as if the programmer had written this: - -.. code-block:: nimrod - var i = 0 - while i < len(a): - var ch = a[i] - echo(ch) - inc(i) - -The current implementation always inlines the iterator code leading to zero -overhead for the abstraction. But this may increase the code size. Later -versions of the compiler will only inline iterators which have the calling -convention ``inline``. - -If the iterator yields a tuple, there have to be as many iteration variables -as there are components in the tuple. The i'th iteration variable's type is -the one of the i'th component. - - -Type sections -~~~~~~~~~~~~~ - -Syntax:: - - typeDef ::= typeDesc | recordDef | objectDef | enumDef - genericParams ::= BRACKET_LE (symbol [EQUALS typeDesc] )* BRACKET_RI - - typeDecl ::= TYPE - indPush - symbol ["*"] [genericParams] [EQUALS typeDef] - (SAD symbol ["*"] [genericParams] [EQUALS typeDef])* - DED - -Example: - -.. 
code-block:: nimrod - type # example demonstrates mutually recursive types - PNode = ref TNode # a traced pointer to a TNode - TNode = record - le, ri: PNode # left and right subtrees - sym: ref TSym # leaves contain a reference to a TSym - - TSym = record # a symbol - name: string # the symbol's name - line: int # the line the symbol was declared in - code: PNode # the symbol's abstract syntax tree - -A `type`:idx: section begins with the ``type`` keyword. It contains multiple -type definitions. A type definition binds a type to a name. Type definitions -can be recursive or even mutually recursive. Mutually recursive types are only -possible within a single ``type`` section. - - -Generics -~~~~~~~~ - -Example: - -.. code-block:: nimrod - type - TBinaryTree[T] = record # TBinaryTree is a generic type - # with generic param ``T`` - le, ri: ref TBinaryTree[T] # left and right subtrees; may be nil - data: T # the data stored in a node - PBinaryTree[T] = ref TBinaryTree[T] # a shorthand for notational convenience - - proc newNode[T](data: T): PBinaryTree[T] = # constructor for a node - new(result) - result.data = data - - proc add[T](root: var PBinaryTree[T], n: PBinaryTree[T]) = - if root == nil: - root = n - else: - var it = root - while it != nil: - var c = cmp(it.data, n.data) # compare the data items; uses - # the generic ``cmp`` proc that works for - # any type that has a ``==`` and ``<`` - # operator - if c < 0: - if it.le == nil: - it.le = n - return - it = it.le - else: - if it.ri == nil: - it.ri = n - return - it = it.ri - - iterator inorder[T](root: PBinaryTree[T]): T = - # inorder traversal of a binary tree - # recursive iterators are not yet implemented, so this does not work in - # the current compiler!
- if root.le != nil: - yield inorder(root.le) - yield root.data - if root.ri != nil: - yield inorder(root.ri) - - var - root: PBinaryTree[string] # instantiate a PBinaryTree with the type string - add(root, newNode("hallo")) # instantiates generic procs ``newNode`` and - add(root, newNode("world")) # ``add`` - for str in inorder(root): - writeln(stdout, str) - -`Generics`:idx: are Nimrod's means to parametrize procs, iterators or types with -`type parameters`:idx:. Depending on context, the brackets are used either to -introduce type parameters or to instantiate a generic proc, iterator or type. - - -Templates -~~~~~~~~~ - -A `template`:idx: is a simple form of a macro. It operates on parse trees and is -processed in the semantic pass of the compiler. So they integrate well with the -rest of the language and share none of C's preprocessor macros flaws. However, -they may lead to code that is harder to understand and maintain. So one ought -to use them sparingly. The usage of ordinary procs, iterators or generics is -preferred to the usage of templates. - -Example: - -.. code-block:: nimrod - template `!=` (a, b: expr): expr = - # this definition exists in the System module - not (a == b) - - writeln(5 != 6) # the compiler rewrites that to: writeln(not (5 == 6)) - - -Macros -~~~~~~ - -`Macros`:idx: are the most powerful feature of Nimrod. They should be used -only to implement `domain specific languages`:idx:. They may lead to code -that is harder to understand and maintain. So one ought to use them sparingly. -The usage of ordinary procs, iterators or generics is preferred to the usage of -macros. - - -Modules -------- -Nimrod supports splitting a program into pieces by a `module`:idx: concept. -Modules make separate compilation possible. Each module needs to be in its -own file. Modules enable `information hiding`:idx: and -`separate compilation`:idx:. A module may gain access to symbols of another -module by the `import`:idx: statement. 
`Recursive module dependencies`:idx: are -allowed, but slightly subtle. - -The algorithm for compiling modules is: - -- Compile the whole module as usual, following import statements recursively -- If we have a cycle, only import the already parsed symbols (that are - exported); if an unknown identifier occurs then abort - -This is best illustrated by an example: - -.. code-block:: nimrod - # Module A - type - T1* = int - import B # the compiler starts parsing B - - proc main() = - var i = p(3) # works because B has been parsed completely here - - main() - - - # Module B - import A # A is not parsed here! Only the already known symbols - # of A are imported here. - - proc p*(x: A.T1): A.T1 # this works because the compiler has already - # added T1 to A's interface symbol table - - proc p(x: A.T1): A.T1 = return x + 1 - - -Scope rules ------------ -Identifiers are valid from the point of their declaration until the end of -the block in which the declaration occurred. The range where the identifier -is known is the `scope`:idx: of the identifier. The exact scope of an -identifier depends on the way it was declared. - -Block scope -~~~~~~~~~~~ -The *scope* of a variable declared in the declaration part of a block -is valid from the point of declaration until the end of the block. If a -block contains a second block, in which the identifier is redeclared, -then inside this block, the second declaration will be valid. Upon -leaving the inner block, the first declaration is valid again. An -identifier cannot be redefined in the same block, except if valid for -procedure or iterator overloading purposes. - - -Record or object scope -~~~~~~~~~~~~~~~~~~~~~~ -The field identifiers inside a record or object definition are valid in the -following places: - -* To the end of the record definition -* Field designators of a variable of the given record type. -* In all descendent types of the object type. 
- -Module scope -~~~~~~~~~~~~ -All identifiers in the interface part of a module are valid from the point of -declaration, until the end of the module. Furthermore, the identifiers are -known in other modules that import the module. Identifiers from indirectly -dependent modules are *not* available. The `system`:idx: module is automatically -imported in all other modules. - -If a module imports an identifier from two different modules, -each occurrence of the identifier has to be qualified, unless it is an -overloaded procedure or iterator in which case the overloading -resolution takes place: - -.. code-block:: nimrod - # Module A - var x*: string - - # Module B - var x*: int - - # Module C - import A, B - write(stdout, x) # error: x is ambiguous - write(stdout, A.x) # no error: qualifier used - - var x = 4 - write(stdout, x) # not ambiguous: uses the module C's x - - -Messages -======== - -The Nimrod compiler emits different kinds of messages: `hint`:idx:, -`warning`:idx:, and `error`:idx: messages. An *error* message is emitted if -the compiler encounters any static error. - -Pragmas -======= - -Syntax:: - - pragma ::= CURLYDOT_LE (expr [COLON expr] optComma)+ (CURLYDOT_RI | CURLY_RI) - -Pragmas are Nimrod's method to give the compiler additional information/ -commands without introducing a massive number of new keywords. Pragmas are -processed on the fly during parsing. Pragmas are always enclosed in the -special ``{.`` and ``.}`` curly brackets. - - -define pragma -------------- -The `define`:idx: pragma defines a conditional symbol. This symbol may only be -used in other pragmas and in the ``defined`` expression and not in ordinary -Nimrod source code. The conditional symbols go into a special symbol table. -The compiler defines the target processor and the target operating -system as conditional symbols. - - -undef pragma ------------- -The `undef`:idx: pragma is the counterpart to the define pragma. It undefines a -conditional symbol. 
- - -error pragma ------------- -The `error`:idx: pragma is used to make the compiler output an error message -with the given content. Compilation currently aborts after an error, but this -may be changed in later versions. - - -fatal pragma ------------- -The `fatal`:idx: pragma is used to make the compiler output an error message -with the given content. In contrast to the ``error`` pragma, compilation -is guaranteed to be aborted by this pragma. - -warning pragma --------------- -The `warning`:idx: pragma is used to make the compiler output a warning message -with the given content. Compilation continues after the warning. - -hint pragma ------------ -The `hint`:idx: pragma is used to make the compiler output a hint message with -the given content. Compilation continues after the hint. - - -compilation option pragmas --------------------------- -The listed pragmas here can be used to override the code generation options -for a section of code. -:: - - "{." pragma: val {pragma: val} ".}" - - -The implementation currently provides the following possible options (later -various others may be added). - -=============== =============== ============================================ -pragma allowed values description -=============== =============== ============================================ -checks on|off Turns the code generation for all runtime - checks on or off. -bound_checks on|off Turns the code generation for array bound - checks on or off. -overflow_checks on|off Turns the code generation for over- or - underflow checks on or off. -nil_checks on|off Turns the code generation for nil pointer - checks on or off. -assertions on|off Turns the code generation for assertions - on or off. -warnings on|off Turns the warning messages of the compiler - on or off. -hints on|off Turns the hint messages of the compiler - on or off. -optimization none|speed|size Optimize the code for speed or size, or - disable optimization. 
For non-optimizing - compilers this option has no effect. - Nevertheless, they must parse it properly. -callconv cdecl|... Specifies the default calling convention for - all procedures (and procedure types) that - follow. -=============== =============== ============================================ - -Example: - -.. code-block:: nimrod - {.checks: off, optimization: speed.} - # compile without runtime checks and optimize for speed - - -push and pop pragmas --------------------- -The `push/pop`:idx: pragmas are very similar to the option directive, -but are used to override the settings temporarily. Example: - -.. code-block:: nimrod - {.push checks: off.} - # compile this section without runtime checks as it is - # speed critical - # ... some code ... - {.pop.} # restore old settings + Nimrod contains a sophisticated + compile-time evaluator, so procedures declared with the ``{.noSideEffect.}`` + pragma can be used in constant expressions: + + .. code-block:: nimrod + + from strutils import findSubStr + const + x = findSubStr('a', "hallo") # x is 1; this is computed at compile time! + + +Types +----- + +All expressions have a `type`:idx: which is known at compile time. Thus Nimrod +is statically typed. One can declare new types, which is in +essence defining an identifier that can be used to denote this custom type. + +These are the major type classes: + +* ordinal types (consist of integer, bool, character, enumeration + (and subranges thereof) types) +* floating point types +* string type +* structured types +* reference (pointer) type +* procedural type +* generic type + + +Ordinal types +~~~~~~~~~~~~~ +`Ordinal types`:idx: have the following characteristics: + +- Ordinal types are countable and ordered. This property allows + the operation of functions such as ``Inc``, ``Ord``, ``Dec`` on ordinal types to + be defined. +- Ordinal values have a smallest possible value. Trying to count farther + down than the smallest value gives a checked runtime or static error. 
+- Ordinal values have a largest possible value. Trying to count farther + than the largest value gives a checked runtime or static error. + +Integers, bool, characters and enumeration types (and subrange of these +types) belong to ordinal types. + + +Pre-defined numerical types +~~~~~~~~~~~~~~~~~~~~~~~~~~~ +These integer types are pre-defined: + +``int`` + the generic signed integer type; its size is platform dependant + (the compiler chooses the processor's fastest integer type) + this type should be used in general. An integer literal that has no type + suffix is of this type. + +intXX + additional signed integer types of XX bits use this naming scheme + (example: int16 is a 16 bit wide integer). + The current implementation supports ``int8``, ``int16``, ``int32``, ``int64``. + Literals of these types have the suffix 'iXX. + + +There are no `unsigned integer`:idx: types, only `unsigned operations`:idx: +that treat their arguments as unsigned. Unsigned operations all wrap around; +they may not lead to over- or underflow errors. 
Unsigned operations use the +``%`` postfix as convention: + +====================== ====================================================== +operation meaning +====================== ====================================================== +``a +% b`` unsigned integer addition +``a -% b`` unsigned integer subtraction +``a *% b`` unsigned integer multiplication +``a /% b`` unsigned integer division +``a %% b`` unsigned integer modulo operation +``a <% b`` treat ``a`` and ``b`` as unsigned and compare +``a <=% b`` treat ``a`` and ``b`` as unsigned and compare +``ze(a)`` extends the bits of ``a`` with zeros until it has the + width of the ``int`` type +``toU8(a)`` treats ``a`` as unsigned and converts it to an + unsigned integer of 8 bits (but still the + ``int8`` type) +``toU16(a)`` treats ``a`` as unsigned and converts it to an + unsigned integer of 16 bits (but still the + ``int16`` type) +``toU32(a)`` treats ``a`` as unsigned and converts it to an + unsigned integer of 32 bits (but still the + ``int32`` type) +====================== ====================================================== + +The following floating point types are pre-defined: + +``float`` + the generic floating point type; its size is platform dependant + (the compiler chooses the processor's fastest floating point type) + this type should be used in general + +floatXX + an implementation may define additional floating point types of XX bits using + this naming scheme (example: float64 is a 64 bit wide float). The current + implementation supports ``float32`` and ``float64``. Literals of these types + have the suffix 'fXX. + +`Automatic type conversion`:idx: in expressions where different kinds +of integer types are used is performed. However, if the type conversion +loses information, the `EInvalidValue`:idx: exception is raised. Certain cases +of the convert error are detected at compile time. 
+ +Automatic type conversion in expressions with different kinds +of floating point types is performed: The smaller type is +converted to the larger. Arithmetic performed on floating point types +follows the IEEE standard. Only the ``int`` type is converted to a floating +point type automatically, other integer types are not. + + +Boolean type +~~~~~~~~~~~~ +The `boolean`:idx: type is named ``bool`` in Nimrod and can be one of the two +pre-defined values ``true`` and ``false``. Conditions in while, +if, elif, when statements need to be of type bool. + +This condition holds:: + + ord(false) == 0 and ord(true) == 1 + +The operators ``not, and, or, xor, implies, <, <=, >, >=, !=, ==`` are defined +for the bool type. The ``and`` and ``or`` operators perform short-cut +evaluation. Example: + +.. code-block:: nimrod + + while p != nil and p.name != "xyz": + # p.name is not evaluated if p == nil + p = p.next + + +The size of the bool type is one byte. + + +Character type +~~~~~~~~~~~~~~ +The `character type`:idx: is named ``char`` in Nimrod. Its size is one byte. +Thus it cannot represent a UTF-8 character, but a part of it. +The reason for this is efficiency: For the overwhelming majority of use-cases, +the resulting programs will still handle UTF-8 properly as UTF-8 was specially +designed for this. +Another reason is that Nimrod can support ``array[char, int]`` or +``set[char]`` efficiently as many algorithms rely on this feature. The +``TUniChar`` type is used for Unicode characters; it can represent any Unicode +character. ``TUniChar`` is declared in the ``unicode`` standard module. + + + +Enumeration types +~~~~~~~~~~~~~~~~~ +`Enumeration`:idx: types define a new type whose values consist only of the ones +specified. +The values are ordered by the order in enum's declaration. Example: + +.. 
code-block:: nimrod + + type + TDirection = enum + north, east, south, west + + +Now the following holds:: + + ord(north) == 0 + ord(east) == 1 + ord(south) == 2 + ord(west) == 3 + +Thus, north < east < south < west. The comparison operators can be used +with enumeration types. + +For better interfacing to other programming languages, the fields of enum +types can be assigned an explicit ordinal value. However, the ordinal values +have to be in ascending order. A field whose ordinal value is not +explicitly given is assigned the value of the previous field + 1. + +An explicit ordered enum can have *holes*: + +.. code-block:: nimrod + type + TTokenType = enum + a = 2, b = 4, c = 89 # holes are valid + +However, it is then not an ordinal anymore, so it is not possible to use these +enums as an index type for arrays. The procedures ``inc``, ``dec``, ``succ`` +and ``pred`` are not available for them either. + + +Subrange types +~~~~~~~~~~~~~~ +A `subrange`:idx: type is a range of values from an ordinal type (the host +type). To define a subrange type, one must specify its limiting values: the +highest and lowest value of the type: + +.. code-block:: nimrod + type + TSubrange = range[0..5] + + +``TSubrange`` is a subrange of an integer which can only hold the values 0 +to 5. Assigning any other value to a variable of type ``TSubrange`` is a +checked runtime error (or static error if it can be statically +determined). Assignments from the base type to one of its subrange types +(and vice versa) are allowed. + +A subrange type has the same size as its base type (``int`` in the example). + + +String type +~~~~~~~~~~~ +All string literals are of the type `string`:idx:. A string in Nimrod is very +similar to a sequence of characters. However, strings in Nimrod both are +zero-terminated and have a length field. One can retrieve the length with the +builtin ``len`` procedure; the length never counts the terminating zero. 
+The assignment operator for strings always copies the string. + +Strings are compared by their lexicographical order. All comparison operators +are available. Strings can be indexed like arrays (lower bound is 0). Unlike +arrays, they can be used in case statements: + +.. code-block:: nimrod + + case paramStr(i) + of "-v": incl(options, optVerbose) + of "-h", "-?": incl(options, optHelp) + else: write(stdout, "invalid command line option!\n") + +Per convention, all strings are UTF-8 strings, but this is not enforced. For +example, when reading strings from binary files, they are merely a sequence of +bytes. The index operation ``s[i]`` means the i-th *char* of ``s``, not the +i-th *unichar*. The iterator ``unichars`` from the ``unicode`` standard +module can be used for iteration over all unicode characters. + + +Structured types +~~~~~~~~~~~~~~~~ +A variable of a `structured type`:idx: can hold multiple values at the same +time. Structured types can be nested to unlimited levels. Arrays, sequences, +tuples, objects and sets belong to the structured types. + +Array and sequence types +~~~~~~~~~~~~~~~~~~~~~~~~ +`Arrays`:idx: are a homogeneous type, meaning that each element in the array +has the same type. Arrays always have a fixed length which is specified at +compile time (except for open arrays). They can be indexed by any ordinal type. +A parameter ``A`` may be an *open array*, in which case it is indexed by +integers from 0 to ``len(A)-1``. + +`Sequences`:idx: are similar to arrays but of dynamic length which may change +during runtime (like strings). A sequence ``S`` is always indexed by integers +from 0 to ``len(S)-1`` and its bounds are checked. Sequences can also be +constructed by the array constructor ``[]``. + +A sequence may be passed to a parameter that is of type *open array*, but +not to a multi-dimensional open array, because it is impossible to do so in an +efficient manner. + +An array expression may be constructed by the array constructor ``[]``. 
+A constructed array is assignment compatible to a sequence. + +Example: + +.. code-block:: nimrod + + type + TIntArray = array[0..5, int] # an array that is indexed with 0..5 + TIntSeq = seq[int] # a sequence of integers + var + x: TIntArray + y: TIntSeq + x = [1, 2, 3, 4, 5, 6] # [] this is the array constructor that is compatible + # with arrays, open arrays and + y = [1, 2, 3, 4, 5, 6] # sequences + +The lower bound of an array may be obtained by the built-in proc +``low()``, the higher bound by ``high()``. The length may be +obtained by ``len()``. + +Arrays are always bounds checked (at compile-time or at runtime). These +checks can be disabled via pragmas or invoking the compiler with the +``--bound_checks:off`` command line switch. + + +Tuples and object types +~~~~~~~~~~~~~~~~~~~~~~~ +A variable of a `tuple`:idx: or `object`:idx: type is a heterogeneous storage +container. +A tuple or object defines various named *fields* of a type. A tuple defines an +*order* of the fields additionally. Tuples are meant for heterogeneous storage +types with no overhead and few abstraction possibilities. The constructor ``()`` +can be used to construct tuples. The order of the fields in the constructor +must match the order of the tuple's definition. Different tuple-types are +*equivalent* if they specify the same fields of the same type in the same +order. + +The assignment operator for tuples copies each component. +The default assignment operator for objects is not defined. The programmer may +provide one, however. + +.. code-block:: nimrod + + type + TPerson = tuple[name: string, age: int] # type representing a person + # a person consists of a name + # and an age + var + person: TPerson + person = (name: "Peter", age: 30) + # the same, but less readable: + person = ("Peter", 30) + +The implementation aligns the fields for best access performance. The alignment +is done in a way that is compatible with the way the C compiler does it. 
+ +Objects provide many features that tuples do not. Objects provide inheritance +and information hiding. Objects have access to their type at runtime, so that +the ``is`` operator can be used to determine the object's type. + +.. code-block:: nimrod + + type + TPerson = object + name*: string # the * means that `name` is accessible from the outside + age: int # no * means that the field is hidden + + TStudent = object of TPerson # a student is a person + id: int # with an id field + + var + student: TStudent + person: TPerson + assert(student is TStudent) # is true + +Object fields that should be visible outside the defining module have to +be marked by ``*``. In contrast to tuples, different object types are +never *equivalent*. + + +Set type +~~~~~~~~ +The `set type`:idx: models the mathematical notion of a set. The set's +basetype can only be an ordinal type. The reason is that sets are implemented +as bit vectors. Sets are designed for high performance computing. + +Note: The sets module can be used for sets of other types. + +Sets can be constructed via the set constructor: ``{}`` is the empty set. The +empty set is type compatible with any special set type. The constructor +can also be used to include elements (and ranges of elements) in the set: + +.. 
code-block:: nimrod + + {'a'..'z', '0'..'9'} # This constructs a set that contains the + # letters from 'a' to 'z' and the digits + # from '0' to '9' + +These operations are supported by sets: + +================== ======================================================== +operation meaning +================== ======================================================== +``A + B`` union of two sets +``A * B`` intersection of two sets +``A - B`` difference of two sets (A without B's elements) +``A == B`` set equality +``A <= B`` subset relation (A is subset of B or equal to B) +``A < B`` strong subset relation (A is a real subset of B) +``e in A`` set membership (A contains element e) +``A -+- B`` symmetric set difference (= (A - B) + (B - A)) +``card(A)`` the cardinality of A (number of elements in A) +``incl(A, elem)`` same as A = A + {elem}, but may be faster +``excl(A, elem)`` same as A = A - {elem}, but may be faster +================== ======================================================== + +Reference type +~~~~~~~~~~~~~~ +References (similar to `pointers`:idx: in other programming languages) are a +way to introduce many-to-one relationships. This means different references can +point to and modify the same location in memory. References should be used +sparingly in a program. They are only needed for constructing graphs. + +Nimrod distinguishes between `traced`:idx: and `untraced`:idx: references. +Untraced references are also called *pointers*. The difference between them is +that traced references are garbage collected, untraced are not. Thus untraced +references are *unsafe*. However for certain low-level operations (accessing +the hardware) untraced references are unavoidable. + +Traced references are declared with the **ref** keyword, untraced references +are declared with the **ptr** keyword. + +The ``^`` operator can be used to dereference a reference, the ``addr`` procedure +returns the address of an item. An address is always an untraced reference. 
+Thus the usage of ``addr`` is an *unsafe* feature. + +The ``.`` (access a tuple/object field operator) +and ``[]`` (array/string/sequence index operator) operators perform implicit +dereferencing operations for reference types: + +.. code-block:: nimrod + + type + PNode = ref TNode + TNode = object + le, ri: PNode + data: int + + var + n: PNode + new(n) + n.data = 9 # no need to write n^.data + +To allocate a new traced object, the built-in procedure ``new`` has to be used. +To deal with untraced memory, the procedures ``alloc``, ``dealloc`` and +``realloc`` can be used. The documentation of the system module contains +further information. + +Special care has to be taken if an untraced object contains traced objects like +traced references, strings or sequences: In order to free everything properly, +the built-in procedure ``finalize`` has to be called before freeing the +untraced memory manually! + +.. XXX finalizers for traced objects + +Procedural type +~~~~~~~~~~~~~~~ +A `procedural type`:idx: is internally a pointer to procedure. ``nil`` is +an allowed value for variables of a procedural type. Nimrod uses procedural +types to achieve `functional`:idx: programming techniques. Dynamic dispatch +for OOP constructs can also be implemented with procedural types. + +Example: + +.. code-block:: nimrod + + type + TCallback = proc (x: int) {.cdecl.} + + proc printItem(x: int) = ... + + proc forEach(c: TCallback) = + ... + + forEach(printItem) # this will NOT work because calling conventions differ + +A subtle issue with procedural types is that the calling convention of the +procedure influences the type compatibility: Procedural types are only compatible +if they have the same calling convention. + +Nimrod supports these `calling conventions`:idx:, which are all incompatible to +each other: + +`stdcall`:idx: + This is the stdcall convention as specified by Microsoft. The generated C + procedure is declared with the ``__stdcall`` keyword. 
+ +`cdecl`:idx: + The cdecl convention means that a procedure shall use the same convention + as the C compiler. Under Windows the generated C procedure is declared with + the ``__cdecl`` keyword. + +`safecall`:idx: + This is the safecall convention as specified by Microsoft. The generated C + procedure is declared with the ``__safecall`` keyword. The word *safe* + refers to the fact that all hardware registers shall be pushed to the + hardware stack. + +`inline`:idx: + The inline convention means that the caller should not call the procedure, + but inline its code directly. Note that Nimrod does not inline, but leaves + this to the C compiler. Thus it generates ``__inline`` procedures. This is + only a hint for the compiler: It may completely ignore it and + it may inline procedures that are not marked as ``inline``. + +`fastcall`:idx: + Fastcall means different things to different C compilers. One gets whatever + the C ``__fastcall`` means. + +`nimcall`:idx: + Nimcall is the default convention used for Nimrod procedures. It is the + same as ``fastcall``, but only for C compilers that support ``fastcall``. + +`closure`:idx: + indicates that the procedure expects a context, a closure that needs + to be passed to the procedure. The implementation is the + same as ``cdecl``, but with a hidden pointer parameter (the + *closure*). The hidden parameter is always the last one. + +`syscall`:idx: + The syscall convention is the same as ``__syscall`` in C. It is used for + interrupts. + +`noconv`:idx: + The generated C code will not have any explicit calling convention and thus + use the C compiler's default calling convention. This is needed because + Nimrod's default calling convention for procedures is ``fastcall`` to + improve speed. This is unlikely to be needed by the user. + +Most calling conventions exist only for the Windows 32-bit platform. 
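The ``forEach(printItem)`` example above is rejected only because ``printItem`` uses the default calling convention while ``TCallback`` demands ``cdecl``. The following is a minimal sketch of a matching variant. The body of ``forEach`` and the sample values are illustrative assumptions, not part of the original example:

.. code-block:: nimrod

  type
    TCallback = proc (x: int) {.cdecl.}

  proc printItem(x: int) {.cdecl.} =   # same convention as TCallback
    echo($x)

  proc forEach(c: TCallback) =
    # hypothetical body: invoke the callback for a few sample values
    for i in countup(0, 2):
      c(i)

  forEach(printItem)   # accepted: both sides use the cdecl convention

Annotating the procedure rather than dropping ``{.cdecl.}`` from the type keeps ``TCallback`` usable for callbacks that are handed to C code.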
+ + + +Statements +---------- +Nimrod uses the common statement/expression paradigm: `Statements`:idx: do not +produce a value in contrast to expressions. Call expressions are statements. +If the called procedure returns a value, it is not a valid statement +as statements do not produce values. To evaluate an expression for +side-effects and throwing its value away, one can use the ``discard`` +statement. + +Statements are separated into `simple statements`:idx: and +`complex statements`:idx:. +Simple statements are statements that cannot contain other statements, like +assignments, calls or the ``return`` statement; complex statements can +contain other statements. To avoid the `dangling else problem`:idx:, complex +statements always have to be indented:: + + simpleStmt ::= returnStmt + | yieldStmt + | discardStmt + | raiseStmt + | breakStmt + | continueStmt + | pragma + | importStmt + | fromStmt + | includeStmt + | exprStmt + complexStmt ::= ifStmt | whileStmt | caseStmt | tryStmt | forStmt + | blockStmt | asmStmt + | procDecl | iteratorDecl | macroDecl | templateDecl + | constDecl | typeDecl | whenStmt | varStmt + + + +Discard statement +~~~~~~~~~~~~~~~~~ + +Syntax:: + + discardStmt ::= DISCARD expr + +Example: + +.. code-block:: nimrod + + discard proc_call("arg1", "arg2") # discard the return value of `proc_call` + +The `discard`:idx: statement evaluates its expression for side-effects and +throws the expression's resulting value away. If the expression has no +side-effects, this generates a static error. Ignoring the return value of a +procedure without using a discard statement is not allowed. + + +Var statement +~~~~~~~~~~~~~ + +Syntax:: + + colonOrEquals ::= COLON typeDesc [EQUALS expr] | EQUALS expr + varPart ::= (symbol ["*" | "-"] [pragma] optComma)+ colonOrEquals [COMMENT] + varStmt ::= VAR (varPart | indPush varPart (SAD varPart)* DED) + +`Var`:idx: statements declare new local and global variables and +initialize them. 
A comma separated list of variables can be used to specify +variables of the same type: + +.. code-block:: nimrod + + var + a: int = 0 + x, y, z: int + +If an initializer is given the type can be omitted: The variable is of the +same type as the initializing expression. Variables are always initialized +with a default value if there is no initializing expression. The default +value depends on the type and is always a zero in binary. + +============================ ============================================== +Type default value +============================ ============================================== +any integer type 0 +any float 0.0 +char '\0' +bool false +ref or pointer type nil +procedural type nil +sequence nil +string nil (**not** "") +tuple[x: A, y: B, ...] (default(A), default(B), ...) + (analogous for objects) +array[0..., T] [default(T), ...] +range[T] default(T); this may be out of the valid range +T = enum cast[T](0); this may be an invalid value +============================ ============================================== + + +Const section +~~~~~~~~~~~~~ + +Syntax:: + + colonAndEquals ::= [COLON typeDesc] EQUALS expr + constDecl ::= CONST + indPush + symbol ["*"] [pragma] colonAndEquals + (SAD symbol ["*"] [pragma] colonAndEquals)* + DED + +Example: + +.. code-block:: nimrod + + const + MyFilename = "/home/my/file.txt" + debugMode: bool = false + +The `const`:idx: section declares symbolic constants. A symbolic constant is +a name for a constant expression. Symbolic constants only allow read-access. + + +If statement +~~~~~~~~~~~~ + +Syntax:: + + ifStmt ::= IF expr COLON stmt (ELIF expr COLON stmt)* [ELSE COLON stmt] + +Example: + +.. 
code-block:: nimrod + + var name = readLine(stdin) + + if name == "Andreas": + echo("What a nice name!") + elif name == "": + echo("Don't you have a name?") + else: + echo("Boring name...") + +The `if`:idx: statement is a simple way to make a branch in the control flow: +The expression after the keyword ``if`` is evaluated, if it is true +the corresponding statements after the ``:`` are executed. Otherwise +the expression after the ``elif`` is evaluated (if there is an +``elif`` branch), if it is true the corresponding statements after +the ``:`` are executed. This goes on until the last ``elif``. If all +conditions fail, the ``else`` part is executed. If there is no ``else`` +part, execution continues with the statement after the ``if`` statement. + + +Case statement +~~~~~~~~~~~~~~ + +Syntax:: + + caseStmt ::= CASE expr (OF sliceList COLON stmt)* + (ELIF expr COLON stmt)* + [ELSE COLON stmt] + +Example: + +.. code-block:: nimrod + + case readline(stdin) + of "delete-everything", "restart-computer": + echo("permission denied") + of "go-for-a-walk": echo("please yourself") + else: echo("unknown command") + +The `case`:idx: statement is similar to the if statement, but it represents +a multi-branch selection. The expression after the keyword ``case`` is +evaluated and if its value is in a *vallist* the corresponding statements +(after the ``of`` keyword) are executed. If the value is in no given *vallist* +the ``else`` part is executed. If there is no ``else`` part and not all +possible values that ``expr`` can hold occur in a ``vallist``, a static +error is given. This holds only for expressions of ordinal types. +If the expression is not of an ordinal type, and no ``else`` part is +given, control just passes after the ``case`` statement. + +To suppress the static error in the ordinal case the programmer needs +to write an ``else`` part with a ``nil`` statement. 
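As a sketch of the rule above, the following ``case`` statement over an enumeration handles only two of the four possible values and uses an ``else`` part with a ``nil`` statement to silence the static error. ``TDirection`` is the type from the enumeration section; the variable ``d`` and the messages are made up for illustration:

.. code-block:: nimrod

  type
    TDirection = enum
      north, east, south, west

  var d = south
  case d
  of north: echo("heading north")
  of south: echo("heading south")
  else: nil   # east and west are deliberately ignored

Without the ``else`` part the compiler would report that ``east`` and ``west`` are not covered.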
+ + +When statement +~~~~~~~~~~~~~~ + +Syntax:: + + whenStmt ::= WHEN expr COLON stmt (ELIF expr COLON stmt)* [ELSE COLON stmt] + +Example: + +.. code-block:: nimrod + + when sizeof(int) == 2: + echo("running on a 16 bit system!") + elif sizeof(int) == 4: + echo("running on a 32 bit system!") + elif sizeof(int) == 8: + echo("running on a 64 bit system!") + else: + echo("cannot happen!") + +The `when`:idx: statement is almost identical to the ``if`` statement with some +exceptions: + +* Each ``expr`` has to be a constant expression (of type ``bool``). +* The statements do not open a new scope if they introduce new identifiers. +* The statements that belong to the expression that evaluated to true are + translated by the compiler, the other statements are not checked for + syntax or semantics at all! This also holds for any ``expr`` coming + after the expression that evaluated to true. + +The ``when`` statement enables conditional compilation techniques. As +a special syntactic extension, the ``when`` construct is also available +within ``object`` definitions. + + +Raise statement +~~~~~~~~~~~~~~~ + +Syntax:: + + raiseStmt ::= RAISE [expr] + +Example: + +.. code-block:: nimrod + raise EOS("operating system failed") + +Apart from built-in operations like array indexing, memory allocation, etc. +the ``raise`` statement is the only way to raise an exception. The +identifier has to be the name of a previously declared exception. A +comma followed by an expression may follow; the expression must be of type +``string`` or ``cstring``; this is an error message that can be extracted +with the `getCurrentExceptionMsg`:idx: procedure in the module ``system``. + +If no exception name is given, the current exception is `re-raised`:idx:. The +`ENoExceptionToReraise`:idx: exception is raised if there is no exception to +re-raise. It follows that the ``raise`` statement *always* raises an +exception. 
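Re-raising can be sketched as follows. The example reuses the ``EOS``
exception and the ``getCurrentExceptionMsg`` procedure named above; it is an
unverified illustration in the manual's dialect, and the logging text is made
up for the example.

.. code-block:: nimrod
  try:
    raise EOS("operating system failed")
  except EOS:
    # inspect the message, then hand the same exception on to the caller
    echo("logging: " & getCurrentExceptionMsg())
    raise   # no expression: the current exception is re-raised

Outside of an exception handler the bare ``raise`` would have nothing to
re-raise, which is the situation in which ``ENoExceptionToReraise`` is raised.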
+ + +Try statement +~~~~~~~~~~~~~ + +Syntax:: + + exceptList ::= (qualifiedIdent optComma)* + tryStmt ::= TRY COLON stmt + (EXCEPT exceptList COLON stmt)* + [FINALLY COLON stmt] + +Example: + +.. code-block:: nimrod + # read the first two lines of a text file that should contain numbers + # and try to add them + var + f: TFile + if openFile(f, "numbers.txt"): + try: + var a = readLine(f) + var b = readLine(f) + echo("sum: " & $(parseInt(a) + parseInt(b))) + except EOverflow: + echo("overflow!") + except EInvalidValue: + echo("could not convert string to integer") + except EIO: + echo("IO error!") + finally: + closeFile(f) + +The statements after the `try`:idx: are executed in sequential order unless +an exception ``e`` is raised. If the exception type of ``e`` matches any +of the list ``exceptlist`` the corresponding statements are executed. +The statements following the ``except`` clauses are called +`exception handlers`:idx:. + +The empty `except`:idx: clause is executed if there is an exception that is +in no list. It is similar to an ``else`` clause in ``if`` statements. + +If there is a `finally`:idx: clause, it is always executed after the +exception handlers. + +The exception is *consumed* in an exception handler. However, an +exception handler may raise another exception. If the exception is not +handled, it is propagated through the call stack. This means that often +the rest of the procedure - that is not within a ``finally`` clause - +is not executed (if an exception occurs). + + +Return statement +~~~~~~~~~~~~~~~~ + +Syntax:: + + returnStmt ::= RETURN [expr] + +Example: + +.. code-block:: nimrod + return 40+2 + +The `return`:idx: statement ends the execution of the current procedure. +It is only allowed in procedures. If there is an ``expr``, this is syntactic +sugar for: + +.. code-block:: nimrod + result = expr + return + +The `result`:idx: variable is always the return value of the procedure. It is +automatically declared by the compiler. 
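The relationship between ``return`` and ``result`` can be sketched with two
small procedures. Both are hypothetical examples written in the manual's
dialect and have not been run through a compiler.

.. code-block:: nimrod
  proc sumUpTo(n: int): int =
    # build the return value in the implicitly declared ``result``
    result = 0
    for i in 1..n:
      result = result + i

  proc firstPositive(a, b: int): int =
    if a > 0:
      return a      # shorthand for: result = a; return
    result = b      # reached only if ``a`` is not positive

The first procedure never uses ``return``; whatever ``result`` holds when the
body ends is the value handed back to the caller.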
+ + +Yield statement +~~~~~~~~~~~~~~~ + +Syntax:: + + yieldStmt ::= YIELD expr + +Example: + +.. code-block:: nimrod + yield (1, 2, 3) + +The `yield`:idx: statement is used instead of the ``return`` statement in +iterators. It is only valid in iterators. Execution is returned to the body +of the for loop that called the iterator. Yield does not end the iteration +process, but execution is passed back to the iterator if the next iteration +starts. See the section about iterators (`Iterators and the for statement`_) +for further information. + + +Block statement +~~~~~~~~~~~~~~~ + +Syntax:: + + blockStmt ::= BLOCK [symbol] COLON stmt + +Example: + +.. code-block:: nimrod + var found = false + block myblock: + for i in 0..3: + for j in 0..3: + if a[j][i] == 7: + found = true + break myblock # leave the block, in this case both for-loops + echo(found) + +The block statement is a means to group statements to a (named) `block`:idx:. +Inside the block, the ``break`` statement is allowed to leave the block +immediately. A ``break`` statement can contain a name of a surrounding +block to specify which block to leave. + + +Break statement +~~~~~~~~~~~~~~~ + +Syntax:: + + breakStmt ::= BREAK [symbol] + +Example: + +.. code-block:: nimrod + break + +The `break`:idx: statement is used to leave a block immediately. If ``symbol`` +is given, it is the name of the enclosing block that is to be left. If it is +absent, the innermost block is left. + + +While statement +~~~~~~~~~~~~~~~ + +Syntax:: + + whileStmt ::= WHILE expr COLON stmt + +Example: + +.. code-block:: nimrod + echo("Please tell me your password: \n") + var pw = readLine(stdin) + while pw != "12345": + echo("Wrong password! Next try: \n") + pw = readLine(stdin) + + +The `while`:idx: statement is executed until the ``expr`` evaluates to false. +Endless loops are not an error. ``while`` statements open an `implicit block`, +so that they can be left with a ``break`` statement. 
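The implicit block of a ``while`` loop can be sketched by extending the
password example with a limit on the number of attempts. The counter ``tries``
and the limit of three are assumptions made for this illustration, which has
not been checked against a compiler.

.. code-block:: nimrod
  var tries = 0
  var pw = readLine(stdin)
  while pw != "12345":
    inc(tries)
    if tries == 3:
      echo("Too many attempts")
      break        # leaves the implicit block opened by ``while``
    echo("Wrong password! Next try: \n")
    pw = readLine(stdin)

No ``block`` statement is needed here, because the unnamed ``break`` refers to
the innermost block, which is the one the loop opens.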
+ + +Continue statement +~~~~~~~~~~~~~~~~~~ + +Syntax:: + + continueStmt ::= CONTINUE + +A `continue`:idx: statement leads to the immediate next iteration of the +surrounding loop construct. It is only allowed within a loop. A continue +statement is syntactic sugar for a nested block: + +.. code-block:: nimrod + while expr1: + stmt1 + continue + stmt2 + + # is equivalent to: + while expr1: + block myBlockName: + stmt1 + break myBlockName + stmt2 + + +Assembler statement +~~~~~~~~~~~~~~~~~~~ +Syntax:: + + asmStmt ::= ASM [pragma] (STR_LIT | RSTR_LIT | TRIPLESTR_LIT) + +The direct embedding of `assembler`:idx: code into Nimrod code is supported +by the unsafe ``asm`` statement. Identifiers in the assembler code that refer to +Nimrod identifiers shall be enclosed in a special character which can be +specified in the statement's pragmas. The default special character is ``'`'``. + + +Procedures +~~~~~~~~~~ +What most programming languages call `methods`:idx: or `functions`:idx: are +called `procedures`:idx: in Nimrod (which is the correct terminology). A +procedure declaration defines an identifier and associates it with a block +of code. A procedure may call itself recursively. The syntax is:: + + paramList ::= [PAR_LE ((symbol optComma)+ COLON typeDesc optComma)* PAR_RI] + [COLON typeDesc] + genericParams ::= BRACKET_LE (symbol [EQUALS typeDesc] )* BRACKET_RI + + procDecl ::= PROC symbol ["*"] [genericParams] paramList [pragma] + [EQUALS stmt] + +If the ``EQUALS stmt`` part is missing, it is a `forward`:idx: declaration. If +the proc returns a value, the procedure body can access an implicitly declared +variable named `result`:idx: that represents the return value. Procs can be +overloaded. The overloading resolution algorithm tries to find the proc that is +the best match for the arguments. A parameter may be given a default value that +is used if the caller does not provide a value for this parameter. Example: + +.. 
code-block:: nimrod + + proc toLower(c: Char): Char = # toLower for characters + if c in {'A'..'Z'}: + result = chr(ord(c) + (ord('a') - ord('A'))) + else: + result = c + + proc toLower(s: string): string = # toLower for strings + result = newString(len(s)) + for i in 0..len(s) - 1: + result[i] = toLower(s[i]) # calls toLower for characters; no recursion! + +`Operators`:idx: are procedures with a special operator symbol as identifier: + +.. code-block:: nimrod + proc `$` (x: int): string = # converts an integer to a string; + # since it has one parameter this is a prefix + # operator. With two parameters it would be + # an infix operator. + return intToStr(x) + +Calling a procedure can be done in many different ways: + +.. code-block:: nimrod + proc callme(x, y: int, s: string = "", c: char, b: bool = false) = ... + + # call with positional arguments # parameter bindings: + callme(0, 1, "abc", '\t', true) # (x=0, y=1, s="abc", c='\t', b=true) + # call with named and positional arguments: + callme(y=1, x=0, "abd", '\t') # (x=0, y=1, s="abd", c='\t', b=false) + # call with named arguments (order is not relevant): + callme(c='\t', y=1, x=0) # (x=0, y=1, s="", c='\t', b=false) + # call as a command statement: no () or , needed: + callme 0 1 "abc" '\t' + + +Iterators and the for statement +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Syntax:: + + forStmt ::= FOR (symbol optComma)+ IN expr [DOTDOT expr] COLON stmt + + paramList ::= [PAR_LE ((symbol optComma)+ COLON typeDesc optComma)* PAR_RI] + [COLON typeDesc] + genericParams ::= BRACKET_LE (symbol [EQUALS typeDesc] )* BRACKET_RI + + iteratorDecl ::= ITERATOR symbol ["*"] [genericParams] paramList [pragma] + [EQUALS stmt] + +The `for`:idx: statement is an abstract mechanism to iterate over the elements +of a container. It relies on an `iterator`:idx: to do so. Like ``while`` +statements, ``for`` statements open an `implicit block`:idx:, so that they +can be left with a ``break`` statement. 
The ``for`` loop declares +iteration variables (``ch`` in the example) - their scope reaches until the +end of the loop body. The iteration variables' types are inferred from the +return type of the iterator. + +An iterator is similar to a procedure, except that it is always called in the +context of a ``for`` loop. Iterators provide a way to specify the iteration over +an abstract type. The ``yield`` statement in the called iterator plays a key +role in the execution of a ``for`` loop. Whenever a ``yield`` statement is +reached the data is bound to the ``for`` loop variables and control continues +in the body of the ``for`` loop. The iterator's local variables and execution +state are automatically saved between calls. Example: + +.. code-block:: nimrod + # this definition exists in the system module + iterator items*(a: string): char {.inline.} = + var i = 0 + while i < len(a): + yield a[i] + inc(i) + + for ch in items("hello world"): # `ch` is an iteration variable + echo(ch) + +The compiler generates code as if the programmer had written this: + +.. code-block:: nimrod + var i = 0 + while i < len(a): + var ch = a[i] + echo(ch) + inc(i) + +The current implementation always inlines the iterator code leading to zero +overhead for the abstraction. But this may increase the code size. Later +versions of the compiler will only inline iterators which have the calling +convention ``inline``. + +If the iterator yields a tuple, there have to be as many iteration variables +as there are components in the tuple. The i'th iteration variable's type is +the one of the i'th component. + + +Type sections +~~~~~~~~~~~~~ + +Syntax:: + + typeDef ::= typeDesc | objectDef | enumDef + genericParams ::= BRACKET_LE (symbol [EQUALS typeDesc] )* BRACKET_RI + + typeDecl ::= TYPE + indPush + symbol ["*"] [genericParams] [EQUALS typeDef] + (SAD symbol ["*"] [genericParams] [EQUALS typeDef])* + DED + +Example: + +.. 
code-block:: nimrod + type # example demonstrates mutually recursive types + PNode = ref TNode # a traced pointer to a TNode + TNode = object + le, ri: PNode # left and right subtrees + sym: ref TSym # leaves contain a reference to a TSym + + TSym = object # a symbol + name: string # the symbol's name + line: int # the line the symbol was declared in + code: PNode # the symbol's abstract syntax tree + +A `type`:idx: section begins with the ``type`` keyword. It contains multiple +type definitions. A type definition binds a type to a name. Type definitions +can be recursive or even mutually recursive. Mutually recursive types are only +possible within a single ``type`` section. + + +Generics +~~~~~~~~ + +Example: + +.. code-block:: nimrod + type + TBinaryTree[T] = object # TBinaryTree is a generic type with + # generic param ``T`` + le, ri: ref TBinaryTree[T] # left and right subtrees; may be nil + data: T # the data stored in a node + PBinaryTree[T] = ref TBinaryTree[T] # a shorthand for notational convenience + + proc newNode[T](data: T): PBinaryTree[T] = # constructor for a node + new(result) + result.data = data + + proc add[T](root: var PBinaryTree[T], n: PBinaryTree[T]) = + if root == nil: + root = n + else: + var it = root + while it != nil: + var c = cmp(it.data, n.data) # compare the data items; uses + # the generic ``cmp`` proc that works for + # any type that has a ``==`` and ``<`` + # operator + if c < 0: + if it.le == nil: + it.le = n + return + it = it.le + else: + if it.ri == nil: + it.ri = n + return + it = it.ri + + iterator inorder[T](root: PBinaryTree[T]): T = + # inorder traversal of a binary tree + # recursive iterators are not yet implemented, so this does not work in + # the current compiler! 
+ if root.le != nil: + yield inorder(root.le) + yield root.data + if root.ri != nil: + yield inorder(root.ri) + + var + root: PBinaryTree[string] # instantiate a PBinaryTree with the type string + add(root, newNode("hallo")) # instantiates generic procs ``newNode`` and + add(root, newNode("world")) # ``add`` + for str in inorder(root): + writeln(stdout, str) + +`Generics`:idx: are Nimrod's means to parametrize procs, iterators or types with +`type parameters`:idx:. Depending on context, the brackets are used either to +introduce type parameters or to instantiate a generic proc, iterator or type. + + +Templates +~~~~~~~~~ + +A `template`:idx: is a simple form of a macro. It operates on parse trees and is +processed in the semantic pass of the compiler. Templates therefore integrate +well with the rest of the language and share none of the flaws of C's +preprocessor macros. However, +they may lead to code that is harder to understand and maintain. So one ought +to use them sparingly. The usage of ordinary procs, iterators or generics is +preferred to the usage of templates. + +Example: + +.. code-block:: nimrod + template `!=` (a, b: expr): expr = + # this definition exists in the System module + not (a == b) + + writeln(5 != 6) # the compiler rewrites that to: writeln(not (5 == 6)) + + +Macros +~~~~~~ + +`Macros`:idx: are the most powerful feature of Nimrod. They should be used +only to implement `domain specific languages`:idx:. They may lead to code +that is harder to understand and maintain. So one ought to use them sparingly. +The usage of ordinary procs, iterators or generics is preferred to the usage of +macros. + + +Modules +------- +Nimrod supports splitting a program into pieces by a `module`:idx: concept. +Modules make separate compilation possible. Each module needs to be in its +own file. Modules enable `information hiding`:idx: and +`separate compilation`:idx:. A module may gain access to symbols of another +module by the `import`:idx: statement. 
`Recursive module dependencies`:idx: are +allowed, but slightly subtle. Only top-level symbols that are marked with an +asterisk (``*``) are exported. + +The algorithm for compiling modules is: + +- Compile the whole module as usual, following import statements recursively +- if we have a cycle only import the already parsed symbols (that are + exported); if an unknown identifier occurs then abort + +This is best illustrated by an example: + +.. code-block:: nimrod + # Module A + type + T1* = int + import B # the compiler starts parsing B + + proc main() = + var i = p(3) # works because B has been parsed completely here + + main() + + + # Module B + import A # A is not parsed here! Only the already known symbols + # of A are imported here. + + proc p*(x: A.T1): A.T1 # this works because the compiler has already + # added T1 to A's interface symbol table + + proc p(x: A.T1): A.T1 = return x + 1 + + +Scope rules +----------- +Identifiers are valid from the point of their declaration until the end of +the block in which the declaration occurred. The range where the identifier +is known is the `scope`:idx: of the identifier. The exact scope of an +identifier depends on the way it was declared. + +Block scope +~~~~~~~~~~~ +The *scope* of a variable declared in the declaration part of a block +is valid from the point of declaration until the end of the block. If a +block contains a second block, in which the identifier is redeclared, +then inside this block, the second declaration will be valid. Upon +leaving the inner block, the first declaration is valid again. An +identifier cannot be redefined in the same block, except if valid for +procedure or iterator overloading purposes. + + +Tuple or object scope +~~~~~~~~~~~~~~~~~~~~~~ +The field identifiers inside a tuple or object definition are valid in the +following places: + +* To the end of the tuple/object definition +* Field designators of a variable of the given tuple/object type. 
+* In all descendant types of the object type. + +Module scope +~~~~~~~~~~~~ +All identifiers in the interface part of a module are valid from the point of +declaration, until the end of the module. Furthermore, the identifiers are +known in other modules that import the module. Identifiers from indirectly +dependent modules are *not* available. The `system`:idx: module is automatically +imported in all other modules. + +If a module imports an identifier by two different modules, +each occurrence of the identifier has to be qualified, unless it is an +overloaded procedure or iterator in which case the overloading +resolution takes place: + +.. code-block:: nimrod + # Module A + var x*: string + + # Module B + var x*: int + + # Module C + import A, B + write(stdout, x) # error: x is ambiguous + write(stdout, A.x) # no error: qualifier used + + var x = 4 + write(stdout, x) # not ambiguous: uses the module C's x + + +Messages +======== + +The Nimrod compiler emits different kinds of messages: `hint`:idx:, +`warning`:idx:, and `error`:idx: messages. An *error* message is emitted if +the compiler encounters any static error. + +Pragmas +======= + +Syntax:: + + pragma ::= CURLYDOT_LE (expr [COLON expr] optComma)+ (CURLYDOT_RI | CURLY_RI) + +Pragmas are Nimrod's method to give the compiler additional information/ +commands without introducing a massive number of new keywords. Pragmas are +processed on the fly during parsing. Pragmas are always enclosed in the +special ``{.`` and ``.}`` curly brackets. + + +define pragma +------------- +The `define`:idx: pragma defines a conditional symbol. This symbol may only be +used in other pragmas and in the ``defined`` expression and not in ordinary +Nimrod source code. The conditional symbols go into a special symbol table. +The compiler defines the target processor and the target operating +system as conditional symbols. + + +undef pragma +------------ +The `undef`:idx: pragma is the counterpart to the define pragma. 
It undefines a +conditional symbol. + + +error pragma +------------ +The `error`:idx: pragma is used to make the compiler output an error message +with the given content. Compilation currently aborts after an error, but this +may be changed in later versions. + + +fatal pragma +------------ +The `fatal`:idx: pragma is used to make the compiler output an error message +with the given content. In contrast to the ``error`` pragma, compilation +is guaranteed to be aborted by this pragma. + +warning pragma +-------------- +The `warning`:idx: pragma is used to make the compiler output a warning message +with the given content. Compilation continues after the warning. + +hint pragma +----------- +The `hint`:idx: pragma is used to make the compiler output a hint message with +the given content. Compilation continues after the hint. + + +compilation option pragmas +-------------------------- +The pragmas listed here can be used to override the code generation options +for a section of code. +:: + + "{." pragma: val {pragma: val} ".}" + + +The implementation currently provides the following possible options (later +various others may be added). + +=============== =============== ============================================ +pragma allowed values description +=============== =============== ============================================ +checks on|off Turns the code generation for all runtime + checks on or off. +bound_checks on|off Turns the code generation for array bound + checks on or off. +overflow_checks on|off Turns the code generation for over- or + underflow checks on or off. +nil_checks on|off Turns the code generation for nil pointer + checks on or off. +assertions on|off Turns the code generation for assertions + on or off. +warnings on|off Turns the warning messages of the compiler + on or off. +hints on|off Turns the hint messages of the compiler + on or off. +optimization none|speed|size Optimize the code for speed or size, or + disable optimization. 
For non-optimizing + compilers this option has no effect. + Nevertheless, they must parse it properly. +callconv cdecl|... Specifies the default calling convention for + all procedures (and procedure types) that + follow. +=============== =============== ============================================ + +Example: + +.. code-block:: nimrod + {.checks: off, optimization: speed.} + # compile without runtime checks and optimize for speed + + +push and pop pragmas +-------------------- +The `push/pop`:idx: pragmas are very similar to the option directive, +but are used to override the settings temporarily. Example: + +.. code-block:: nimrod + {.push checks: off.} + # compile this section without runtime checks as it is + # speed critical + # ... some code ... + {.pop.} # restore old settings
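A slightly more concrete use of ``push`` and ``pop`` might look as follows.
This is a sketch only: the procedure ``sumAll`` and its array parameter are
hypothetical, the option name ``bound_checks`` is taken from the table above,
and the code has not been checked against a compiler.

.. code-block:: nimrod
  {.push bound_checks: off.}
  proc sumAll(a: array[0..3, int]): int =
    # the indices 0..3 are known to be valid, so the array bound
    # checks are switched off for this procedure only
    result = 0
    for i in 0..3:
      result = result + a[i]
  {.pop.}   # code below is compiled with the previous settings again

Compared to setting the option without ``push``, the override ends at the
``pop`` and does not leak into the code that follows.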