diff options
Diffstat (limited to 'doc/manual.txt')
-rwxr-xr-x | doc/manual.txt | 2358 |
1 files changed, 0 insertions, 2358 deletions
diff --git a/doc/manual.txt b/doc/manual.txt deleted file mode 100755 index 3c24e4b1a..000000000 --- a/doc/manual.txt +++ /dev/null @@ -1,2358 +0,0 @@ -============= -Nimrod Manual -============= - -:Author: Andreas Rumpf -:Version: |nimrodversion| - -.. contents:: - - - "Complexity" seems to be a lot like "energy": you can transfer it from the end - user to one/some of the other players, but the total amount seems to remain - pretty much constant for a given task. -- Ran - -About this document -=================== - -**Note**: This document is a draft! Several of Nimrod's features need more -precise wording. This manual will evolve into a proper specification some -day. - -This document describes the lexis, the syntax, and the semantics of Nimrod. - -The language constructs are explained using an extended BNF, in -which ``(a)*`` means 0 or more ``a``'s, ``a+`` means 1 or more ``a``'s, and -``(a)?`` means an optional *a*; an alternative spelling for optional parts is -``[a]``. The ``|`` symbol is used to mark alternatives -and has the lowest precedence. Parentheses may be used to group elements. -Non-terminals start with a lowercase letter, abstract terminal symbols are in -UPPERCASE. Verbatim terminal symbols (including keywords) are quoted -with ``'``. An example:: - - ifStmt ::= 'if' expr ':' stmts ('elif' expr ':' stmts)* ['else' stmts] - -Other parts of Nimrod - like scoping rules or runtime semantics are only -described in an informal manner. The reason is that formal semantics are -difficult to write and understand. However, there is only one Nimrod -implementation, so one may consider it as the formal specification; -especially since the compiler's code is pretty clean (well, some parts of it). - - -Definitions -=========== - -A Nimrod program specifies a computation that acts on a memory consisting of -components called `locations`:idx:. A variable is basically a name for a -location. Each variable and location is of a certain `type`:idx:. The -variable's type is called `static type`:idx:, the location's type is called -`dynamic type`:idx:. If the static type is not the same as the dynamic type, -it is a supertype of the dynamic type. - -An `identifier`:idx: is a symbol declared as a name for a variable, type, -procedure, etc. The region of the program over which a declaration applies is -called the `scope`:idx: of the declaration. Scopes can be nested. The meaning -of an identifier is determined by the smallest enclosing scope in which the -identifier is declared. - -An expression specifies a computation that produces a value or location. -Expressions that produce locations are called `l-values`:idx:. An l-value -can denote either a location or the value the location contains, depending on -the context. Expressions whose values can be determined statically are called -`constant expressions`:idx:; they are never l-values. - -A `static error`:idx: is an error that the implementation detects before -program execution. Unless explicitly classified, an error is a static error. - -A `checked runtime error`:idx: is an error that the implementation detects -and reports at runtime. The method for reporting such errors is via *raising -exceptions*. However, the implementation provides a means to disable these -runtime checks. See the section pragmas_ for details. - -An `unchecked runtime error`:idx: is an error that is not guaranteed to be -detected, and can cause the subsequent behavior of the computation to -be arbitrary. Unchecked runtime errors cannot occur if only `safe`:idx: -language features are used. - - -Lexical Analysis -================ - -Encoding --------- - -All Nimrod source files are in the UTF-8 encoding (or its ASCII subset). Other -encodings are not supported. Any of the standard platform line termination -sequences can be used - the Unix form using ASCII LF (linefeed), the Windows -form using the ASCII sequence CR LF (return followed by linefeed), or the old -Macintosh form using the ASCII CR (return) character. All of these forms can be -used equally, regardless of platform. - - -Indentation ------------ - -Nimrod's standard grammar describes an `indentation sensitive`:idx: language. -This means that all the control structures are recognized by indentation. -Indentation consists only of spaces; tabulators are not allowed. - -The terminals ``IND`` (indentation), ``DED`` (dedentation) and ``SAD`` -(same indentation) are generated by the scanner, denoting an indentation. - -These terminals are only generated for lines that are not empty. - -The parser and the scanner communicate over a stack which indentation terminal -should be generated: The stack consists of integers counting the spaces. The -stack is initialized with a zero on its top. The scanner reads from the stack: -If the current indentation token consists of more spaces than the entry at the -top of the stack, a ``IND`` token is generated, else if it consists of the same -number of spaces, a ``SAD`` token is generated. If it consists of fewer spaces, -a ``DED`` token is generated for any item on the stack that is greater than the -current. These items are later popped from the stack by the parser. At the end -of the file, a ``DED`` token is generated for each number remaining on the -stack that is larger than zero. - -Because the grammar contains some optional ``IND`` tokens, the scanner cannot -push new indentation levels. This has to be done by the parser. The symbol -``indPush`` indicates that an ``IND`` token is expected; the current number of -leading spaces is pushed onto the stack by the parser. The symbol ``indPop`` -denotes that the parser pops an item from the indentation stack. No token is -consumed by ``indPop``. - - -Comments --------- - -`Comments`:idx: start anywhere outside a string or character literal with the -hash character ``#``. -Comments consist of a concatenation of `comment pieces`:idx:. A comment piece -starts with ``#`` and runs until the end of the line. The end of line characters -belong to the piece. If the next line only consists of a comment piece which is -aligned to the preceding one, it does not start a new comment: - -.. code-block:: nimrod - - i = 0 # This is a single comment over multiple lines belonging to the - # assignment statement. The scanner merges these two pieces. - # This is a new comment belonging to the current block, but to no particular - # statement. - i = i + 1 # This a new comment that is NOT - echo(i) # continued here, because this comment refers to the echo statement - -Comments are tokens; they are only allowed at certain places in the input file -as they belong to the syntax tree! This feature enables perfect source-to-source -transformations (such as pretty-printing) and superior documentation generators. -A nice side-effect is that the human reader of the code always knows exactly -which code snippet the comment refers to. - - -Identifiers & Keywords ----------------------- - -`Identifiers`:idx: in Nimrod can be any string of letters, digits -and underscores, beginning with a letter. Two immediate following -underscores ``__`` are not allowed:: - - letter ::= 'A'..'Z' | 'a'..'z' | '\x80'..'\xff' - digit ::= '0'..'9' - IDENTIFIER ::= letter ( ['_'] letter | digit )* - -The following `keywords`:idx: are reserved and cannot be used as identifiers: - -.. code-block:: nimrod - :file: ../data/keywords.txt - -Some keywords are unused; they are reserved for future developments of the -language. - -Nimrod is a `style-insensitive`:idx: language. This means that it is not -case-sensitive and even underscores are ignored: -**type** is a reserved word, and so is **TYPE** or **T_Y_P_E**. The idea behind -this is that this allows programmers to use their own prefered spelling style -and libraries written by different programmers cannot use incompatible -conventions. A Nimrod-aware editor or IDE can show the identifiers as -preferred. Another advantage is that it frees the programmer from remembering -the exact spelling of an identifier. - - -String literals ---------------- - -`String literals`:idx: can be delimited by matching double quotes, and can -contain the following `escape sequences`:idx:\ : - -================== =================================================== - Escape sequence Meaning -================== =================================================== - ``\n`` `newline`:idx: - ``\r``, ``\c`` `carriage return`:idx: - ``\l`` `line feed`:idx: - ``\f`` `form feed`:idx: - ``\t`` `tabulator`:idx: - ``\v`` `vertical tabulator`:idx: - ``\\`` `backslash`:idx: - ``\"`` `quotation mark`:idx: - ``\'`` `apostrophe`:idx: - ``\d+`` `character with decimal value d`:idx:; - all decimal digits directly - following are used for the character - ``\a`` `alert`:idx: - ``\b`` `backspace`:idx: - ``\e`` `escape`:idx: `[ESC]`:idx: - ``\xHH`` `character with hex value HH`:idx:; - exactly two hex digits are allowed -================== =================================================== - - -Strings in Nimrod may contain any 8-bit value, except embedded zeros. - - -Triple quoted string literals ------------------------------ - -String literals can also be delimited by three double quotes -``"""`` ... ``"""``. -Literals in this form may run for several lines, may contain ``"`` and do not -interpret any escape sequences. -For convenience, when the opening ``"""`` is immediately followed by a newline, -the newline is not included in the string. - - -Raw string literals -------------------- - -There are also `raw string literals` that are preceded with the letter ``r`` -(or ``R``) and are delimited by matching double quotes (just like ordinary -string literals) and do not interpret the escape sequences. This is especially -convenient for regular expressions or Windows paths: - -.. code-block:: nimrod - - var f = openFile(r"C:\texts\text.txt") # a raw string, so ``\t`` is no tab - - -Generalized raw string literals -------------------------------- - -The construct ``identifier"string literal"`` (without whitespace between the -identifier and the opening quotation mark) is a -`generalized raw string literal`:idx:. It is a shortcut for the construct -``identifier(r"string literal")``, so it denotes a procedure call with a -raw string literal as its only argument. Generalized raw string literals -are especially convenient for embedding mini languages directly into Nimrod -(for example regular expressions). - -The construct ``identifier"""string literal"""`` exists too. It is a shortcut -for ``identifier("""string literal""")``. - - -Character literals ------------------- - -Character literals are enclosed in single quotes ``''`` and can contain the -same escape sequences as strings - with one exception: ``\n`` is not allowed -as it may be wider than one character (often it is the pair CR/LF for example). -A character is not an Unicode character but a single byte. The reason for this -is efficiency: For the overwhelming majority of use-cases, the resulting -programs will still handle UTF-8 properly as UTF-8 was specially designed for -this. -Another reason is that Nimrod can thus support ``array[char, int]`` or -``set[char]`` efficiently as many algorithms rely on this feature. - - -Numerical constants -------------------- - -`Numerical constants`:idx: are of a single type and have the form:: - - hexdigit ::= digit | 'A'..'F' | 'a'..'f' - octdigit ::= '0'..'7' - bindigit ::= '0'..'1' - INT_LIT ::= digit ( ['_'] digit )* - | '0' ('x' | 'X' ) hexdigit ( ['_'] hexdigit )* - | '0o' octdigit ( ['_'] octdigit )* - | '0' ('b' | 'B' ) bindigit ( ['_'] bindigit )* - - INT8_LIT ::= INT_LIT '\'' ('i' | 'I' ) '8' - INT16_LIT ::= INT_LIT '\'' ('i' | 'I' ) '16' - INT32_LIT ::= INT_LIT '\'' ('i' | 'I' ) '32' - INT64_LIT ::= INT_LIT '\'' ('i' | 'I' ) '64' - - exponent ::= ('e' | 'E' ) ['+' | '-'] digit ( ['_'] digit )* - FLOAT_LIT ::= digit (['_'] digit)* ('.' (['_'] digit)* [exponent] |exponent) - FLOAT32_LIT ::= ( FLOAT_LIT | INT_LIT ) '\'' ('f' | 'F') '32' - FLOAT64_LIT ::= ( FLOAT_LIT | INT_LIT ) '\'' ('f' | 'F') '64' - - -As can be seen in the productions, numerical constants can contain unterscores -for readability. Integer and floating point literals may be given in decimal (no -prefix), binary (prefix ``0b``), octal (prefix ``0o``) and hexadecimal -(prefix ``0x``) notation. - -There exists a literal for each numerical type that is -defined. The suffix starting with an apostophe ('\'') is called a -`type suffix`:idx:. Literals without a type prefix are of the type ``int``, -unless the literal contains a dot or an ``E`` in which case it is of -type ``float``. - -The type suffixes are: - -================= ========================= - Type Suffix Resulting type of literal -================= ========================= - ``'i8`` int8 - ``'i16`` int16 - ``'i32`` int32 - ``'i64`` int64 - ``'f32`` float32 - ``'f64`` float64 -================= ========================= - -Floating point literals may also be in binary, octal or hexadecimal -notation: -``0B0_10001110100_0000101001000111101011101111111011000101001101001001'f64`` -is approximately 1.72826e35 according to the IEEE floating point standard. - - - -Other tokens ------------- - -The following strings denote other tokens:: - - ( ) { } [ ] , ; [. .] {. .} (. .) - : = ^ .. ` - -`..`:tok: takes precedence over other tokens that contain a dot: `{..}`:tok: are -the three tokens `{`:tok:, `..`:tok:, `}`:tok: and not the two tokens -`{.`:tok:, `.}`:tok:. - -In Nimrod one can define his own operators. An `operator`:idx: is any -combination of the following characters that is not listed above:: - - + - * / < > - = @ $ ~ & % - ! ? ^ . | \ - -These keywords are also operators: -``and or not xor shl shr div mod in notin is isnot``. - - -Syntax -====== - -This section lists Nimrod's standard syntax in ENBF. How the parser receives -indentation tokens is already described in the Lexical Analysis section. - -Nimrod allows user-definable operators. -Binary operators have 8 different levels of precedence. For user-defined -operators, the precedence depends on the first character the operator consists -of. All binary operators are left-associative. - -================ ============================================== ================== =============== -Precedence level Operators First characters Terminal symbol -================ ============================================== ================== =============== - 7 (highest) ``$`` OP7 - 6 ``* / div mod shl shr %`` ``* % \ /`` OP6 - 5 ``+ -`` ``+ ~ |`` OP5 - 4 ``&`` ``&`` OP4 - 3 ``== <= < >= > != in not_in is isnot`` ``= < > !`` OP3 - 2 ``and`` OP2 - 1 ``or xor`` OP1 - 0 (lowest) ``? @ ^ ` : .`` OP0 -================ ============================================== ================== =============== - - -The grammar's start symbol is ``module``. The grammar is LL(1) and therefore -not ambiguous. - -.. include:: grammar.txt - :literal: - - - -Semantics -========= - -Constants ---------- - -`Constants`:idx: are symbols which are bound to a value. The constant's value -cannot change. The compiler must be able to evaluate the expression in a -constant declaration at compile time. - -Nimrod contains a sophisticated compile-time evaluator, so procedures which -have no side-effect can be used in constant expressions too: - -.. code-block:: nimrod - import strutils - const - constEval = contains("abc", 'b') # computed at compile time! - - -Types ------ - -All expressions have a `type`:idx: which is known at compile time. Nimrod -is statically typed. One can declare new types, which is in essence defining -an identifier that can be used to denote this custom type. - -These are the major type classes: - -* ordinal types (consist of integer, bool, character, enumeration - (and subranges thereof) types) -* floating point types -* string type -* structured types -* reference (pointer) type -* procedural type -* generic type - - -Ordinal types -~~~~~~~~~~~~~ -`Ordinal types`:idx: have the following characteristics: - -- Ordinal types are countable and ordered. This property allows - the operation of functions as ``Inc``, ``Ord``, ``Dec`` on ordinal types to - be defined. -- Ordinal values have a smallest possible value. Trying to count further - down than the smallest value gives a checked runtime or static error. -- Ordinal values have a largest possible value. Trying to count further - than the largest value gives a checked runtime or static error. - -Integers, bool, characters and enumeration types (and subrange of these -types) belong to ordinal types. - - -Pre-defined numerical types -~~~~~~~~~~~~~~~~~~~~~~~~~~~ -These integer types are pre-defined: - -``int`` - the generic signed integer type; its size is platform dependant - (the compiler chooses the processor's fastest integer type) - this type should be used in general. An integer literal that has no type - suffix is of this type. - -intXX - additional signed integer types of XX bits use this naming scheme - (example: int16 is a 16 bit wide integer). - The current implementation supports ``int8``, ``int16``, ``int32``, ``int64``. - Literals of these types have the suffix 'iXX. - - -There are no `unsigned integer`:idx: types, only `unsigned operations`:idx: -that treat their arguments as unsigned. Unsigned operations all wrap around; -they cannot lead to over- or underflow errors. Unsigned operations use the -``%`` suffix as convention: - -====================== ====================================================== -operation meaning -====================== ====================================================== -``a +% b`` unsigned integer addition -``a -% b`` unsigned integer substraction -``a *% b`` unsigned integer multiplication -``a /% b`` unsigned integer division -``a %% b`` unsigned integer modulo operation -``a <% b`` treat ``a`` and ``b`` as unsigned and compare -``a <=% b`` treat ``a`` and ``b`` as unsigned and compare -``ze(a)`` extends the bits of ``a`` with zeros until it has the - width of the ``int`` type -``toU8(a)`` treats ``a`` as unsigned and converts it to an - unsigned integer of 8 bits (but still the - ``int8`` type) -``toU16(a)`` treats ``a`` as unsigned and converts it to an - unsigned integer of 16 bits (but still the - ``int16`` type) -``toU32(a)`` treats ``a`` as unsigned and converts it to an - unsigned integer of 32 bits (but still the - ``int32`` type) -====================== ====================================================== - -The following floating point types are pre-defined: - -``float`` - the generic floating point type; its size is platform dependant - (the compiler chooses the processor's fastest floating point type) - this type should be used in general - -floatXX - an implementation may define additional floating point types of XX bits using - this naming scheme (example: float64 is a 64 bit wide float). The current - implementation supports ``float32`` and ``float64``. Literals of these types - have the suffix 'fXX. - -`Automatic type conversion`:idx: is performed in expressions where different -kinds of integer types are used. However, if the type conversion -loses information, the `EOutOfRange`:idx: exception is raised (if the error -cannot be detected at compile time). - -Automatic type conversion in expressions with different kinds -of floating point types is performed: The smaller type is -converted to the larger. Arithmetic performed on floating point types -follows the IEEE standard. Integer types are not converted to floating point -types automatically and vice versa. - - -Boolean type -~~~~~~~~~~~~ -The `boolean`:idx: type is named ``bool`` in Nimrod and can be one of the two -pre-defined values ``true`` and ``false``. Conditions in while, -if, elif, when statements need to be of type bool. - -This condition holds:: - - ord(false) == 0 and ord(true) == 1 - -The operators ``not, and, or, xor, <, <=, >, >=, !=, ==`` are defined -for the bool type. The ``and`` and ``or`` operators perform short-cut -evaluation. Example: - -.. code-block:: nimrod - - while p != nil and p.name != "xyz": - # p.name is not evaluated if p == nil - p = p.next - - -The size of the bool type is one byte. - - -Character type -~~~~~~~~~~~~~~ -The `character type`:idx: is named ``char`` in Nimrod. Its size is one byte. -Thus it cannot represent an UTF-8 character, but a part of it. -The reason for this is efficiency: For the overwhelming majority of use-cases, -the resulting programs will still handle UTF-8 properly as UTF-8 was specially -designed for this. -Another reason is that Nimrod can support ``array[char, int]`` or -``set[char]`` efficiently as many algorithms rely on this feature. The -`TRune` type is used for Unicode characters, it can represent any Unicode -character. ``TRune`` is declared in the ``unicode`` module. - - - -Enumeration types -~~~~~~~~~~~~~~~~~ -`Enumeration`:idx: types define a new type whose values consist of the ones -specified. The values are ordered. Example: - -.. code-block:: nimrod - - type - TDirection = enum - north, east, south, west - - -Now the following holds:: - - ord(north) == 0 - ord(east) == 1 - ord(south) == 2 - ord(west) == 3 - -Thus, north < east < south < west. The comparison operators can be used -with enumeration types. - -For better interfacing to other programming languages, the fields of enum -types can be assigned an explicit ordinal value. However, the ordinal values -have to be in ascending order. A field whose ordinal value is not -explicitly given is assigned the value of the previous field + 1. - -An explicit ordered enum can have *wholes*: - -.. code-block:: nimrod - type - TTokenType = enum - a = 2, b = 4, c = 89 # wholes are valid - -However, it is then not an ordinal anymore, so it is not possible to use these -enums as an index type for arrays. The procedures ``inc``, ``dec``, ``succ`` -and ``pred`` are not available for them either. - - -Subrange types -~~~~~~~~~~~~~~ -A `subrange`:idx: type is a range of values from an ordinal type (the base -type). To define a subrange type, one must specify it's limiting values: the -lowest and highest value of the type: - -.. code-block:: nimrod - type - TSubrange = range[0..5] - - -``TSubrange`` is a subrange of an integer which can only hold the values 0 -to 5. Assigning any other value to a variable of type ``TSubrange`` is a -checked runtime error (or static error if it can be statically -determined). Assignments from the base type to one of its subrange types -(and vice versa) are allowed. - -A subrange type has the same size as its base type (``int`` in the example). - - -String type -~~~~~~~~~~~ -All string literals are of the type `string`:idx:. A string in Nimrod is very -similar to a sequence of characters. However, strings in Nimrod are both -zero-terminated and have a length field. One can retrieve the length with the -builtin ``len`` procedure; the length never counts the terminating zero. -The assignment operator for strings always copies the string. - -Strings are compared by their lexicographical order. All comparison operators -are available. Strings can be indexed like arrays (lower bound is 0). Unlike -arrays, they can be used in case statements: - -.. code-block:: nimrod - - case paramStr(i) - of "-v": incl(options, optVerbose) - of "-h", "-?": incl(options, optHelp) - else: write(stdout, "invalid command line option!\n") - -Per convention, all strings are UTF-8 strings, but this is not enforced. For -example, when reading strings from binary files, they are merely a sequence of -bytes. The index operation ``s[i]`` means the i-th *char* of ``s``, not the -i-th *unichar*. The iterator ``runes`` from the ``unicode`` -module can be used for iteration over all unicode characters. - - -Structured types -~~~~~~~~~~~~~~~~ -A variable of a `structured type`:idx: can hold multiple values at the same -time. Stuctured types can be nested to unlimited levels. Arrays, sequences, -tuples, objects and sets belong to the structured types. - -Array and sequence types -~~~~~~~~~~~~~~~~~~~~~~~~ -`Arrays`:idx: are a homogenous type, meaning that each element in the array -has the same type. Arrays always have a fixed length which is specified at -compile time (except for open arrays). They can be indexed by any ordinal type. -A parameter ``A`` may be an *open array*, in which case it is indexed by -integers from 0 to ``len(A)-1``. An array expression may be constructed by the -array constructor ``[]``. - -`Sequences`:idx: are similar to arrays but of dynamic length which may change -during runtime (like strings). A sequence ``S`` is always indexed by integers -from 0 to ``len(S)-1`` and its bounds are checked. Sequences can be -constructed by the array constructor ``[]`` in conjunction with the array to -sequence operator ``@``. Another way to allocate space for a sequence is to -call the built-in ``newSeq`` procedure. - -A sequence may be passed to a parameter that is of type *open array*. - -Example: - -.. code-block:: nimrod - - type - TIntArray = array[0..5, int] # an array that is indexed with 0..5 - TIntSeq = seq[int] # a sequence of integers - var - x: TIntArray - y: TIntSeq - x = [1, 2, 3, 4, 5, 6] # [] this is the array constructor - y = @[1, 2, 3, 4, 5, 6] # the @ turns the array into a sequence - -The lower bound of an array or sequence may be received by the built-in proc -``low()``, the higher bound by ``high()``. The length may be -received by ``len()``. ``low()`` for a sequence or an open array always returns -0, as this is the first valid index. - -The notation ``x[i]`` can be used to access the i-th element of ``x``. - -Arrays are always bounds checked (at compile-time or at runtime). These -checks can be disabled via pragmas or invoking the compiler with the -``--bound_checks:off`` command line switch. - -An open array is also a means to implement passing a variable number of -arguments to a procedure. The compiler converts the list of arguments -to an array automatically: - -.. code-block:: nimrod - proc myWriteln(f: TFile, a: openarray[string]) = - for s in items(a): - write(f, s) - write(f, "\n") - - myWriteln(stdout, "abc", "def", "xyz") - # is transformed by the compiler to: - myWriteln(stdout, ["abc", "def", "xyz"]) - -This transformation is only done if the openarray parameter is the -last parameter in the procedure header. The current implementation does not -support nested open arrays. - - -Tuples and object types -~~~~~~~~~~~~~~~~~~~~~~~ -A variable of a `tuple`:idx: or `object`:idx: type is a heterogenous storage -container. -A tuple or object defines various named *fields* of a type. A tuple also -defines an *order* of the fields. Tuples are meant for heterogenous storage -types with no overhead and few abstraction possibilities. The constructor ``()`` -can be used to construct tuples. The order of the fields in the constructor -must match the order of the tuple's definition. Different tuple-types are -*equivalent* if they specify the same fields of the same type in the same -order. - -The assignment operator for tuples copies each component. -The default assignment operator for objects copies each component. Overloading -of the assignment operator for objects is not possible, but this may change in -future versions of the compiler. - -.. code-block:: nimrod - - type - TPerson = tuple[name: string, age: int] # type representing a person - # a person consists of a name - # and an age - var - person: TPerson - person = (name: "Peter", age: 30) - # the same, but less readable: - person = ("Peter", 30) - -The implementation aligns the fields for best access performance. The alignment -is compatible with the way the C compiler does it. - -Objects provide many features that tuples do not. Object provide inheritance -and information hiding. Objects have access to their type at runtime, so that -the ``is`` operator can be used to determine the object's type. - -.. code-block:: nimrod - - type - TPerson = object - name*: string # the * means that `name` is accessible from other modules - age: int # no * means that the field is hidden - - TStudent = object of TPerson # a student is a person - id: int # with an id field - - var - student: TStudent - person: TPerson - assert(student is TStudent) # is true - -Object fields that should be visible from outside the defining module, have to -marked by ``*``. In contrast to tuples, different object types are -never *equivalent*. - - -Object variants -~~~~~~~~~~~~~~~ -Often an object hierarchy is overkill in certain situations where simple -`variant`:idx: types are needed. - -An example: - -.. code-block:: nimrod - - # This is an example how an abstract syntax tree could be modelled in Nimrod - type - TNodeKind = enum # the different node types - nkInt, # a leaf with an integer value - nkFloat, # a leaf with a float value - nkString, # a leaf with a string value - nkAdd, # an addition - nkSub, # a subtraction - nkIf # an if statement - PNode = ref TNode - TNode = object - case kind: TNodeKind # the ``kind`` field is the discriminator - of nkInt: intVal: int - of nkFloat: floavVal: float - of nkString: strVal: string - of nkAdd, nkSub: - leftOp, rightOp: PNode - of nkIf: - condition, thenPart, elsePart: PNode - - var - n: PNode - new(n) # creates a new node - n.kind = nkFloat - n.floatVal = 0.0 # valid, because ``n.kind==nkFloat``, so that it fits - - # the following statement raises an `EInvalidField` exception, because - # n.kind's value does not fit: - n.strVal = "" - -As can been seen from the example, an advantage to an object hierarchy is that -no casting between different object types is needed. Yet, access to invalid -object fields raises an exception. - - -Set type -~~~~~~~~ -The `set type`:idx: models the mathematical notion of a set. The set's -basetype can only be an ordinal type. The reason is that sets are implemented -as high performance bit vectors. - -Sets can be constructed via the set constructor: ``{}`` is the empty set. The -empty set is type compatible with any special set type. The constructor -can also be used to include elements (and ranges of elements) in the set: - -.. code-block:: nimrod - - {'a'..'z', '0'..'9'} # This constructs a set that conains the - # letters from 'a' to 'z' and the digits - # from '0' to '9' - -These operations are supported by sets: - -================== ======================================================== -operation meaning -================== ======================================================== -``A + B`` union of two sets -``A * B`` intersection of two sets -``A - B`` difference of two sets (A without B's elements) -``A == B`` set equality -``A <= B`` subset relation (A is subset of B or equal to B) -``A < B`` strong subset relation (A is a real subset of B) -``e in A`` set membership (A contains element e) -``A -+- B`` symmetric set difference (= (A - B) + (B - A)) -``card(A)`` the cardinality of A (number of elements in A) -``incl(A, elem)`` same as A = A + {elem} -``excl(A, elem)`` same as A = A - {elem} -================== ======================================================== - - -Reference and pointer types -~~~~~~~~~~~~~~~~~~~~~~~~~~~ -References (similiar to `pointers`:idx: in other programming languages) are a -way to introduce many-to-one relationships. This means different references can -point to and modify the same location in memory. - -Nimrod distinguishes between `traced`:idx: and `untraced`:idx: references. -Untraced references are also called *pointers*. Traced references point to -objects of a garbage collected heap, untraced references point to -manually allocated objects or to objects somewhere else in memory. Thus -untraced references are *unsafe*. However for certain low-level operations -(accessing the hardware) untraced references are unavoidable. - -Traced references are declared with the **ref** keyword, untraced references -are declared with the **ptr** keyword. - -The ``^`` operator can be used to derefer a reference, the ``addr`` procedure -returns the address of an item. An address is always an untraced reference. -Thus the usage of ``addr`` is an *unsafe* feature. - -The ``.`` (access a tuple/object field operator) -and ``[]`` (array/string/sequence index operator) operators perform implicit -dereferencing operations for reference types: - -.. code-block:: nimrod - - type - PNode = ref TNode - TNode = object - le, ri: PNode - data: int - - var - n: PNode - new(n) - n.data = 9 # no need to write n^ .data - -To allocate a new traced object, the built-in procedure ``new`` has to be used. -To deal with untraced memory, the procedures ``alloc``, ``dealloc`` and -``realloc`` can be used. The documentation of the system module contains -further information. - -If a reference points to *nothing*, it has the value ``nil``. - -Special care has to be taken if an untraced object contains traced objects like -traced references, strings or sequences: In order to free everything properly, -the built-in procedure ``GCunref`` has to be called before freeing the -untraced memory manually! - -.. XXX finalizers for traced objects - -Procedural type -~~~~~~~~~~~~~~~ -A `procedural type`:idx: is internally a pointer to a procedure. ``nil`` is -an allowed value for variables of a procedural type. Nimrod uses procedural -types to achieve `functional`:idx: programming techniques. Dynamic dispatch -for OOP constructs can also be implemented with procedural types. - -Example: - -.. code-block:: nimrod - - type - TCallback = proc (x: int) {.cdecl.} - - proc printItem(x: Int) = ... - - proc forEach(c: TCallback) = - ... - - forEach(printItem) # this will NOT work because calling conventions differ - -A subtle issue with procedural types is that the calling convention of the -procedure influences the type compability: Procedural types are only compatible -if they have the same calling convention. - -Nimrod supports these `calling conventions`:idx:, which are all incompatible to -each other: - -`stdcall`:idx: - This the stdcall convention as specified by Microsoft. The generated C - procedure is declared with the ``__stdcall`` keyword. - -`cdecl`:idx: - The cdecl convention means that a procedure shall use the same convention - as the C compiler. Under windows the generated C procedure is declared with - the ``__cdecl`` keyword. - -`safecall`:idx: - This is the safecall convention as specified by Microsoft. The generated C - procedure is declared with the ``__safecall`` keyword. The word *safe* - refers to the fact that all hardware registers shall be pushed to the - hardware stack. - -`inline`:idx: - The inline convention means the the caller should not call the procedure, - but inline its code directly. Note that Nimrod does not inline, but leaves - this to the C compiler. Thus it generates ``__inline`` procedures. This is - only a hint for the compiler: It may completely ignore it and - it may inline procedures that are not marked as ``inline``. - -`fastcall`:idx: - Fastcall means different things to different C compilers. One gets whatever - the C ``__fastcall`` means. - -`nimcall`:idx: - Nimcall is the default convention used for Nimrod procedures. It is the - same as ``fastcall``, but only for C compilers that support ``fastcall``. - -`closure`:idx: - indicates that the procedure expects a context, a closure that needs - to be passed to the procedure. The calling convention ``nimcall`` is - compatible to ``closure``. - -`syscall`:idx: - The syscall convention is the same as ``__syscall`` in C. It is used for - interrupts. - -`noconv`:idx: - The generated C code will not have any explicit calling convention and thus - use the C compiler's default calling convention. This is needed because - Nimrod's default calling convention for procedures is ``fastcall`` to - improve speed. - -Most calling conventions exist only for the Windows 32-bit platform. - - -Distinct type -~~~~~~~~~~~~~ - -A distinct type is new type derived from a `base type`:idx: that is -incompatible with its base type. In particular, it is an essential property -of a distinct type that it **does not** imply a subtype relation between it -and its base type. Explict type conversions from a distinct type to its -base type and vice versa are allowed. - -A distinct type can be used to model different physical `units`:idx: with a -numerical base type, for example. The following example models currencies. - -Different currencies should not be mixed in monetary calculations. Distinct -types are a perfect tool to model different currencies: - -.. code-block:: nimrod - type - TDollar = distinct int - TEuro = distinct int - - var - d: TDollar - e: TEuro - - echo d + 12 - # Error: cannot add a number with no unit and a ``TDollar`` - -Unfortunetaly, ``d + 12.TDollar`` is not allowed either, -because ``+`` is defined for ``int`` (among others), not for ``TDollar``. So -a ``+`` for dollars needs to be defined: - -.. code-block:: - proc `+` (x, y: TDollar): TDollar = - result = TDollar(int(x) + int(y)) - -It does not make sense to multiply a dollar with a dollar, but with a -number without unit; and the same holds for division: - -.. code-block:: - proc `*` (x: TDollar, y: int): TDollar = - result = TDollar(int(x) * y) - - proc `*` (x: int, y: TDollar): TDollar = - result = TDollar(x * int(y)) - - proc `div` ... - -This quickly gets tedious. The implementations are trivial and the compiler -should not generate all this code only to optimize it away later - after all -``+`` for dollars should produce the same binary code as ``+`` for ints. -The pragma ``borrow`` has been designed to solve this problem; in principle -it generates the above trivial implementations: - -.. code-block:: nimrod - proc `*` (x: TDollar, y: int): TDollar {.borrow.} - proc `*` (x: int, y: TDollar): TDollar {.borrow.} - proc `div` (x: TDollar, y: int): TDollar {.borrow.} - -The ``borrow`` pragma makes the compiler use the same implementation as -the proc that deals with the distinct type's base type, so no code is -generated. - -But it seems all this boilerplate code needs to be repeated for the ``TEuro`` -currency. This can be solved with templates_. - -.. code-block:: nimrod - template Additive(typ: typeDesc): stmt = - proc `+` *(x, y: typ): typ {.borrow.} - proc `-` *(x, y: typ): typ {.borrow.} - - # unary operators: - proc `+` *(x: typ): typ {.borrow.} - proc `-` *(x: typ): typ {.borrow.} - - template Multiplicative(typ, base: typeDesc): stmt = - proc `*` *(x: typ, y: base): typ {.borrow.} - proc `*` *(x: base, y: typ): typ {.borrow.} - proc `div` *(x: typ, y: base): typ {.borrow.} - proc `mod` *(x: typ, y: base): typ {.borrow.} - - template Comparable(typ: typeDesc): stmt = - proc `<` * (x, y: typ): bool {.borrow.} - proc `<=` * (x, y: typ): bool {.borrow.} - proc `==` * (x, y: typ): bool {.borrow.} - - template DefineCurrency(typ, base: expr): stmt = - type - typ* = distinct base - Additive(typ) - Multiplicative(typ, base) - Comparable(typ) - - DefineCurrency(TDollar, int) - DefineCurrency(TEuro, int) - - - -Type relations --------------- - -The following section defines several relations on types that are needed to -describe the type checking done by the compiler. - - -Type equality -~~~~~~~~~~~~~ -Nimrod uses structural type equivalence for most types. Only for objects, -enumerations and abstract types name equivalence is used. The following -algorithm determines type equality: - -.. code-block:: nimrod - proc typeEqualsAux(a, b: PType, - s: var set[tuple[PType, PType]]): bool = - if (a,b) in s: return true - incl(s, (a,b)) - if a.kind == b.kind: - case a.kind - of int, intXX, float, floatXX, char, string, cstring, pointer, bool, nil: - # leaf type: kinds identical; nothing more to check - result = true - of ref, ptr, var, set, seq, openarray: - result = typeEqualsAux(a.baseType, b.baseType, s) - of range: - result = typeEqualsAux(a.baseType, b.baseType, s) and - (a.rangeA == b.rangeA) and (a.rangeB == b.rangeB) - of array: - result = typeEqualsAux(a.baseType, b.baseType, s) and - typeEqualsAux(a.indexType, b.indexType, s) - of tuple: - if a.tupleLen == b.tupleLen: - for i in 0..a.tupleLen-1: - if not typeEqualsAux(a[i], b[i], s): return false - result = true - of object, enum, distinct: - result = a == b - of proc: - result = typeEqualsAux(a.parameterTuple, b.parameterTuple, s) and - typeEqualsAux(a.resultType, b.resultType, s) and - a.callingConvention == b.callingConvention - - proc typeEquals(a, b: PType): bool = - var s: set[tuple[PType, PType]] = {} - result = typeEqualsAux(a, b, s) - -Since types are graphs which can have cycles, the above algorithm needs an -auxiliary set ``s`` to detect this case. - - -Subtype relation -~~~~~~~~~~~~~~~~ -If object ``a`` inherits from ``b``, ``a`` is a subtype of ``b``. This subtype -relation is extended to the types ``var``, ``ref``, ``ptr``: - -.. code-block:: nimrod - proc isSubtype(a, b: PType): bool = - if a.kind == b.kind: - case a.kind - of object: - var aa = a.baseType - while aa != nil and aa != b: aa = aa.baseType - result = aa == b - of var, ref, ptr: - result = isSubtype(a.baseType, b.baseType) - -.. XXX nil is a special value! - - -Convertible relation -~~~~~~~~~~~~~~~~~~~~ -A type ``a`` is **implicitely** convertible to type ``b`` iff the following -algorithm returns true: - -.. code-block:: nimrod - # XXX range types? - proc isImplicitelyConvertible(a, b: PType): bool = - case a.kind - of proc: - if b.kind == proc: - var x = a.parameterTuple - var y = b.parameterTuple - if x.tupleLen == y.tupleLen: - for i in 0.. x.tupleLen-1: - if not isSubtype(x[i], y[i]): return false - result = isSubType(b.resultType, a.resultType) - of int8: result = b.kind in {int16, int32, int64, int} - of int16: result = b.kind in {int32, int64, int} - of int32: result = b.kind in {int64, int} - of float: result = b.kind in {float32, float64} - of float32: result = b.kind in {float64, float} - of float64: result = b.kind in {float32, float} - of seq: - result = b.kind == openArray and typeEquals(a.baseType, b.baseType) - of array: - result = b.kind == openArray and typeEquals(a.baseType, b.baseType) - if a.baseType == char and a.indexType.rangeA == 0: - result = b.kind = cstring - of cstring, ptr: - result = b.kind == pointer - of string: - result = b.kind == cstring - -A type ``a`` is **explicitely** convertible to type ``b`` iff the following -algorithm returns true: - -.. code-block:: nimrod - proc isIntegralType(t: PType): bool = - result = isOrdinal(t) or t.kind in {float, float32, float64} - - proc isExplicitelyConvertible(a, b: PType): bool = - if isImplicitelyConvertible(a, b): return true - if isIntegralType(a) and isIntegralType(b): return true - if isSubtype(a, b) or isSubtype(b, a): return true - if a.kind == distinct and typeEquals(a.baseType, b): return true - if b.kind == distinct and typeEquals(b.baseType, a): return true - return false - - -Assignment compability -~~~~~~~~~~~~~~~~~~~~~~ - -An expression ``b`` can be assigned to an expression ``a`` iff ``a`` is an -`l-value` and ``isImplicitelyConvertible(b.typ, a.typ)`` holds. - - -Overloading resolution -~~~~~~~~~~~~~~~~~~~~~~ - -To be written. - - -Statements and expressions --------------------------- -Nimrod uses the common statement/expression paradigm: `Statements`:idx: do not -produce a value in contrast to expressions. Call expressions are statements. -If the called procedure returns a value, it is not a valid statement -as statements do not produce values. To evaluate an expression for -side-effects and throw its value away, one can use the ``discard`` statement. - -Statements are separated into `simple statements`:idx: and -`complex statements`:idx:. -Simple statements are statements that cannot contain other statements like -assignments, calls or the ``return`` statement; complex statements can -contain other statements. To avoid the `dangling else problem`:idx:, complex -statements always have to be intended:: - - simpleStmt ::= returnStmt - | yieldStmt - | discardStmt - | raiseStmt - | breakStmt - | continueStmt - | pragma - | importStmt - | fromStmt - | includeStmt - | exprStmt - complexStmt ::= ifStmt | whileStmt | caseStmt | tryStmt | forStmt - | blockStmt | asmStmt - | procDecl | iteratorDecl | macroDecl | templateDecl - | constSection | typeSection | whenStmt | varSection - - - -Discard statement -~~~~~~~~~~~~~~~~~ - -Syntax:: - - discardStmt ::= 'discard' expr - -Example: - -.. code-block:: nimrod - - discard proc_call("arg1", "arg2") # discard the return value of `proc_call` - -The `discard`:idx: statement evaluates its expression for side-effects and -throws the expression's resulting value away. If the expression has no -side-effects, this generates a static error. Ignoring the return value of a -procedure without using a discard statement is not allowed. - - -Var statement -~~~~~~~~~~~~~ - -Syntax:: - - colonOrEquals ::= ':' typeDesc ['=' expr] | '=' expr - varField ::= symbol ['*'] [pragma] - varPart ::= symbol (comma symbol)* [comma] colonOrEquals [COMMENT | IND COMMENT] - varSection ::= 'var' (varPart - | indPush (COMMENT|varPart) - (SAD (COMMENT|varPart))* DED indPop) - - -`Var`:idx: statements declare new local and global variables and -initialize them. A comma seperated list of variables can be used to specify -variables of the same type: - -.. code-block:: nimrod - - var - a: int = 0 - x, y, z: int - -If an initializer is given the type can be omitted: The variable is of the -same type as the initializing expression. Variables are always initialized -with a default value if there is no initializing expression. The default -value depends on the type and is always a zero in binary. - -============================ ============================================== -Type default value -============================ ============================================== -any integer type 0 -any float 0.0 -char '\\0' -bool false -ref or pointer type nil -procedural type nil -sequence nil (*not* ``@[]``) -string nil (*not* "") -tuple[x: A, y: B, ...] (default(A), default(B), ...) - (analogous for objects) -array[0..., T] [default(T), ...] -range[T] default(T); this may be out of the valid range -T = enum cast[T](0); this may be an invalid value -============================ ============================================== - - -Const section -~~~~~~~~~~~~~ - -Syntax:: - - colonAndEquals ::= [':' typeDesc] '=' expr - - constDecl ::= symbol ['*'] [pragma] colonAndEquals [COMMENT | IND COMMENT] - | COMMENT - constSection ::= 'const' indPush constDecl (SAD constDecl)* DED indPop - - -Example: - -.. code-block:: nimrod - - const - MyFilename = "/home/my/file.txt" - debugMode: bool = false - -The `const`:idx: section declares symbolic constants. A symbolic constant is -a name for a constant expression. Symbolic constants only allow read-access. - - -If statement -~~~~~~~~~~~~ - -Syntax:: - - ifStmt ::= 'if' expr ':' stmt ('elif' expr ':' stmt)* ['else' ':' stmt] - -Example: - -.. code-block:: nimrod - - var name = readLine(stdin) - - if name == "Andreas": - echo("What a nice name!") - elif name == "": - echo("Don't you have a name?") - else: - echo("Boring name...") - -The `if`:idx: statement is a simple way to make a branch in the control flow: -The expression after the keyword ``if`` is evaluated, if it is true -the corresponding statements after the ``:`` are executed. Otherwise -the expression after the ``elif`` is evaluated (if there is an -``elif`` branch), if it is true the corresponding statements after -the ``:`` are executed. This goes on until the last ``elif``. If all -conditions fail, the ``else`` part is executed. If there is no ``else`` -part, execution continues with the statement after the ``if`` statement. - - -Case statement -~~~~~~~~~~~~~~ - -Syntax:: - - caseStmt ::= 'case' expr [':'] ('of' sliceExprList ':' stmt)* - ('elif' expr ':' stmt)* - ['else' ':' stmt] - -Example: - -.. code-block:: nimrod - - case readline(stdin) - of "delete-everything", "restart-computer": - echo("permission denied") - of "go-for-a-walk": echo("please yourself") - else: echo("unknown command") - -The `case`:idx: statement is similar to the if statement, but it represents -a multi-branch selection. The expression after the keyword ``case`` is -evaluated and if its value is in a *vallist* the corresponding statements -(after the ``of`` keyword) are executed. If the value is not in any -given *slicelist* the ``else`` part is executed. If there is no ``else`` -part and not all possible values that ``expr`` can hold occur in a ``vallist``, -a static error is given. This holds only for expressions of ordinal types. -If the expression is not of an ordinal type, and no ``else`` part is -given, control passes after the ``case`` statement. - -To suppress the static error in the ordinal case an ``else`` part with a ``nil`` -statement can be used. - - -When statement -~~~~~~~~~~~~~~ - -Syntax:: - - whenStmt ::= 'when' expr ':' stmt ('elif' expr ':' stmt)* ['else' ':' stmt] - -Example: - -.. code-block:: nimrod - - when sizeof(int) == 2: - echo("running on a 16 bit system!") - elif sizeof(int) == 4: - echo("running on a 32 bit system!") - elif sizeof(int) == 8: - echo("running on a 64 bit system!") - else: - echo("cannot happen!") - -The `when`:idx: statement is almost identical to the ``if`` statement with some -exceptions: - -* Each ``expr`` has to be a constant expression (of type ``bool``). -* The statements do not open a new scope if they introduce new identifiers. -* The statements that belong to the expression that evaluated to true are - translated by the compiler, the other statements are not checked for - semantics! However, each ``expr`` is checked for semantics. - -The ``when`` statement enables conditional compilation techniques. As -a special syntatic extension, the ``when`` construct is also available -within ``object`` definitions. - - -Raise statement -~~~~~~~~~~~~~~~ - -Syntax:: - - raiseStmt ::= 'raise' [expr] - -Example: - -.. code-block:: nimrod - raise newEOS("operating system failed") - -Apart from built-in operations like array indexing, memory allocation, etc. -the ``raise`` statement is the only way to raise an exception. - -.. XXX document this better! - -If no exception name is given, the current exception is `re-raised`:idx:. The -`ENoExceptionToReraise`:idx: exception is raised if there is no exception to -re-raise. It follows that the ``raise`` statement *always* raises an -exception. - - -Try statement -~~~~~~~~~~~~~ - -Syntax:: - - qualifiedIdent ::= symbol ['.' symbol] - exceptList ::= [qualifiedIdent (comma qualifiedIdent)* [comma]] - tryStmt ::= 'try' ':' stmt - ('except' exceptList ':' stmt)* - ['finally' ':' stmt] - -Example: - -.. code-block:: nimrod - # read the first two lines of a text file that should contain numbers - # and tries to add them - var - f: TFile - if open(f, "numbers.txt"): - try: - var a = readLine(f) - var b = readLine(f) - echo("sum: " & $(parseInt(a) + parseInt(b))) - except EOverflow: - echo("overflow!") - except EInvalidValue: - echo("could not convert string to integer") - except EIO: - echo("IO error!") - except: - echo("Unknown exception!") - finally: - close(f) - -The statements after the `try`:idx: are executed in sequential order unless -an exception ``e`` is raised. If the exception type of ``e`` matches any -of the list ``exceptlist`` the corresponding statements are executed. -The statements following the ``except`` clauses are called -`exception handlers`:idx:. - -The empty `except`:idx: clause is executed if there is an exception that is -in no list. It is similiar to an ``else`` clause in ``if`` statements. - -If there is a `finally`:idx: clause, it is always executed after the -exception handlers. - -The exception is *consumed* in an exception handler. However, an -exception handler may raise another exception. If the exception is not -handled, it is propagated through the call stack. This means that often -the rest of the procedure - that is not within a ``finally`` clause - -is not executed (if an exception occurs). - - -Return statement -~~~~~~~~~~~~~~~~ - -Syntax:: - - returnStmt ::= 'return' [expr] - -Example: - -.. code-block:: nimrod - return 40+2 - -The `return`:idx: statement ends the execution of the current procedure. -It is only allowed in procedures. If there is an ``expr``, this is syntactic -sugar for: - -.. code-block:: nimrod - result = expr - return result - -``return`` without an expression is a short notation for ``return result`` if -the proc has a return type. The `result`:idx: variable is always the return -value of the procedure. It is automatically declared by the compiler. As all -variables, ``result`` is initialized to (binary) zero: - -.. code-block:: nimrod - proc returnZero(): int = - # implicitely returns 0 - - -Yield statement -~~~~~~~~~~~~~~~ - -Syntax:: - - yieldStmt ::= 'yield' expr - -Example: - -.. code-block:: nimrod - yield (1, 2, 3) - -The `yield`:idx: statement is used instead of the ``return`` statement in -iterators. It is only valid in iterators. Execution is returned to the body -of the for loop that called the iterator. Yield does not end the iteration -process, but execution is passed back to the iterator if the next iteration -starts. See the section about iterators (`Iterators and the for statement`_) -for further information. - - -Block statement -~~~~~~~~~~~~~~~ - -Syntax:: - - blockStmt ::= 'block' [symbol] ':' stmt - -Example: - -.. code-block:: nimrod - var found = false - block myblock: - for i in 0..3: - for j in 0..3: - if a[j][i] == 7: - found = true - break myblock # leave the block, in this case both for-loops - echo(found) - -The block statement is a means to group statements to a (named) `block`:idx:. -Inside the block, the ``break`` statement is allowed to leave the block -immediately. A ``break`` statement can contain a name of a surrounding -block to specify which block is to leave. - - -Break statement -~~~~~~~~~~~~~~~ - -Syntax:: - - breakStmt ::= 'break' [symbol] - -Example: - -.. code-block:: nimrod - break - -The `break`:idx: statement is used to leave a block immediately. If ``symbol`` -is given, it is the name of the enclosing block that is to leave. If it is -absent, the innermost block is left. - - -While statement -~~~~~~~~~~~~~~~ - -Syntax:: - - whileStmt ::= 'while' expr ':' stmt - -Example: - -.. code-block:: nimrod - echo("Please tell me your password: \n") - var pw = readLine(stdin) - while pw != "12345": - echo("Wrong password! Next try: \n") - pw = readLine(stdin) - - -The `while`:idx: statement is executed until the ``expr`` evaluates to false. -Endless loops are no error. ``while`` statements open an `implicit block`, -so that they can be left with a ``break`` statement. - - -Continue statement -~~~~~~~~~~~~~~~~~~ - -Syntax:: - - continueStmt ::= 'continue' - -A `continue`:idx: statement leads to the immediate next iteration of the -surrounding loop construct. It is only allowed within a loop. A continue -statement is syntactic sugar for a nested block: - -.. code-block:: nimrod - while expr1: - stmt1 - continue - stmt2 - -Is equivalent to: - -.. code-block:: nimrod - while expr1: - block myBlockName: - stmt1 - break myBlockName - stmt2 - - -Assembler statement -~~~~~~~~~~~~~~~~~~~ -Syntax:: - - asmStmt ::= 'asm' [pragma] (STR_LIT | RSTR_LIT | TRIPLESTR_LIT) - -The direct embedding of `assembler`:idx: code into Nimrod code is supported -by the unsafe ``asm`` statement. Identifiers in the assembler code that refer to -Nimrod identifiers shall be enclosed in a special character which can be -specified in the statement's pragmas. The default special character is ``'`'``. - - -If expression -~~~~~~~~~~~~~ - -An `if expression` is almost like an if statement, but it is an expression. -Example: - -.. code-block:: nimrod - p(if x > 8: 9 else: 10) - -An if expression always results in a value, so the ``else`` part is -required. ``Elif`` parts are also allowed (but unlikely to be good -style). - - -Type convertions -~~~~~~~~~~~~~~~~ -Syntactically a `type conversion` is like a procedure call, but a -type name replaces the procedure name. A type conversion is always -safe in the sense that a failure to convert a type to another -results in an exception (if it cannot be determined statically). - - -Type casts -~~~~~~~~~~ -Example: - -.. code-block:: nimrod - cast[int](x) - -Type casts are a crude mechanism to interpret the bit pattern of -an expression as if it would be of another type. Type casts are -only needed for low-level programming and are inherently unsafe. - - -The addr operator -~~~~~~~~~~~~~~~~~ -The `addr` operator returns the address of an l-value. If the -type of the location is ``T``, the `addr` operator result is -of the type ``ptr T``. Taking the address of an object that resides -on the stack is **unsafe**, as the pointer may live longer than the -object on the stack and can thus reference a non-existing object. - - -Procedures -~~~~~~~~~~ -What most programming languages call `methods`:idx: or `functions`:idx: are -called `procedures`:idx: in Nimrod (which is the correct terminology). A -procedure declaration defines an identifier and associates it with a block -of code. A procedure may call itself recursively. The syntax is:: - - param ::= symbol (comma symbol)* [comma] ':' typeDesc - paramList ::= ['(' [param (comma param)* [comma]] ')'] [':' typeDesc] - - genericParam ::= symbol [':' typeDesc] - genericParams ::= '[' genericParam (comma genericParam)* [comma] ']' - - procDecl ::= 'proc' symbol ['*'] [genericParams] paramList [pragma] - ['=' stmt] - -If the ``= stmt`` part is missing, it is a `forward`:idx: declaration. If -the proc returns a value, the procedure body can access an implicit declared -variable named `result`:idx: that represents the return value. Procs can be -overloaded. The overloading resolution algorithm tries to find the proc that is -the best match for the arguments. A parameter may be given a default value that -is used if the caller does not provide a value for this parameter. Example: - -.. code-block:: nimrod - - proc toLower(c: Char): Char = # toLower for characters - if c in {'A'..'Z'}: - result = chr(ord(c) + (ord('a') - ord('A'))) - else: - result = c - - proc toLower(s: string): string = # toLower for strings - result = newString(len(s)) - for i in 0..len(s) - 1: - result[i] = toLower(s[i]) # calls toLower for characters; no recursion! - -Calling a procedure can be done in many different ways: - -.. code-block:: nimrod - proc callme(x, y: int, s: string = "", c: char, b: bool = false) = ... - - # call with positional arguments # parameter bindings: - callme(0, 1, "abc", '\t', true) # (x=0, y=1, s="abc", c='\t', b=true) - # call with named and positional arguments: - callme(y=1, x=0, "abd", '\t') # (x=0, y=1, s="abd", c='\t', b=false) - # call with named arguments (order is not relevant): - callme(c='\t', y=1, x=0) # (x=0, y=1, s="", c='\t', b=false) - # call as a command statement: no () needed: - callme 0, 1, "abc", '\t' - - -A procedure cannot modify its parameters (unless the parameters have the -type `var`). - -`Operators`:idx: are procedures with a special operator symbol as identifier: - -.. code-block:: nimrod - proc `$` (x: int): string = - # converts an integer to a string; this is a prefix operator. - return intToStr(x) - -Operators with one parameter are prefix operators, operators with two -parameters are infix operators. (However, the parser distinguishes these from -the operators position within an expression.) There is no way to declare -postfix operators: All postfix operators are built-in and handled by the -grammar explicitely. - -Any operator can be called like an ordinary proc with the '`opr`' -notation. (Thus an operator can have more than two parameters): - -.. code-block:: nimrod - proc `*+` (a, b, c: int): int = - # Multiply and add - return a * b + c - - assert `*+`(3, 4, 6) == `*`(a, `+`(b, c)) - - - -Var parameters -~~~~~~~~~~~~~~ -The type of a parameter may be prefixed with the ``var`` keyword: - -.. code-block:: nimrod - proc divmod(a, b: int, res, remainder: var int) = - res = a div b - remainder = a mod b - - var - x, y: int - - divmod(8, 5, x, y) # modifies x and y - assert x == 1 - assert y == 3 - -In the example, ``res`` and ``remainder`` are `var parameters`. -Var parameters can be modified by the procedure and the changes are -visible to the caller. The argument passed to a var parameter has to be -an l-value. Var parameters are implemented as hidden pointers. The -above example is equivalent to: - -.. code-block:: nimrod - proc divmod(a, b: int, res, remainder: ptr int) = - res^ = a div b - remainder^ = a mod b - - var - x, y: int - divmod(8, 5, addr(x), addr(y)) - assert x == 1 - assert y == 3 - -In the examples, var parameters or pointers are used to provide two -return values. This can be done in a cleaner way by returning a tuple: - -.. code-block:: nimrod - proc divmod(a, b: int): tuple[res, remainder: int] = - return (a div b, a mod b) - - var t = divmod(8, 5) - assert t.res == 1 - assert t.remainder = 3 - -Even more elegant is to use `tuple unpacking`:idx: to access the tuple's fields: - -.. code-block:: nimrod - var (x, y) = divmod(8, 5) # tuple unpacking - assert x == 1 - assert y == 3 - - - -Iterators and the for statement -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -Syntax:: - - forStmt ::= 'for' symbol (comma symbol)* [comma] 'in' expr ['..' expr] ':' stmt - - param ::= symbol (comma symbol)* [comma] ':' typeDesc - paramList ::= ['(' [param (comma param)* [comma]] ')'] [':' typeDesc] - - genericParam ::= symbol [':' typeDesc] - genericParams ::= '[' genericParam (comma genericParam)* [comma] ']' - - iteratorDecl ::= 'iterator' symbol ['*'] [genericParams] paramList [pragma] - ['=' stmt] - -The `for`:idx: statement is an abstract mechanism to iterate over the elements -of a container. It relies on an `iterator`:idx: to do so. Like ``while`` -statements, ``for`` statements open an `implicit block`:idx:, so that they -can be left with a ``break`` statement. The ``for`` loop declares -iteration variables (``x`` in the example) - their scope reaches until the -end of the loop body. The iteration variables' types are inferred by the -return type of the iterator. - -An iterator is similar to a procedure, except that it is always called in the -context of a ``for`` loop. Iterators provide a way to specify the iteration over -an abstract type. A key role in the execution of a ``for`` loop plays the -``yield`` statement in the called iterator. Whenever a ``yield`` statement is -reached the data is bound to the ``for`` loop variables and control continues -in the body of the ``for`` loop. The iterator's local variables and execution -state are automatically saved between calls. Example: - -.. code-block:: nimrod - # this definition exists in the system module - iterator items*(a: string): char {.inline.} = - var i = 0 - while i < len(a): - yield a[i] - inc(i) - - for ch in items("hello world"): # `ch` is an iteration variable - echo(ch) - -The compiler generates code as if the programmer would have written this: - -.. code-block:: nimrod - var i = 0 - while i < len(a): - var ch = a[i] - echo(ch) - inc(i) - -The current implementation always inlines the iterator code leading to zero -overhead for the abstraction. But this may increase the code size. Later -versions of the compiler will only inline iterators which have the calling -convention ``inline``. - -If the iterator yields a tuple, there have to be as many iteration variables -as there are components in the tuple. The i'th iteration variable's type is -the one of the i'th component. - - -Type sections -~~~~~~~~~~~~~ - -Syntax:: - - typeDef ::= typeDesc | objectDef | enumDef - - genericParam ::= symbol [':' typeDesc] - genericParams ::= '[' genericParam (comma genericParam)* [comma] ']' - - typeDecl ::= COMMENT - | symbol ['*'] [genericParams] ['=' typeDef] [COMMENT|IND COMMENT] - - typeSection ::= 'type' indPush typeDecl (SAD typeDecl)* DED indPop - - -Example: - -.. code-block:: nimrod - type # example demonstrates mutually recursive types - PNode = ref TNode # a traced pointer to a TNode - TNode = object - le, ri: PNode # left and right subtrees - sym: ref TSym # leaves contain a reference to a TSym - - TSym = object # a symbol - name: string # the symbol's name - line: int # the line the symbol was declared in - code: PNode # the symbol's abstract syntax tree - -A `type`:idx: section begins with the ``type`` keyword. It contains multiple -type definitions. A type definition binds a type to a name. Type definitions -can be recursive or even mutually recursive. Mutually recursive types are only -possible within a single ``type`` section. - - -Generics -~~~~~~~~ - -Example: - -.. code-block:: nimrod - type - TBinaryTree[T] = object # TBinaryTree is a generic type with - # with generic param ``T`` - le, ri: ref TBinaryTree[T] # left and right subtrees; may be nil - data: T # the data stored in a node - PBinaryTree[T] = ref TBinaryTree[T] # a shorthand for notational convenience - - proc newNode[T](data: T): PBinaryTree[T] = # constructor for a node - new(result) - result.dat = data - - proc add[T](root: var PBinaryTree[T], n: PBinaryTree[T]) = - if root == nil: - root = n - else: - var it = root - while it != nil: - var c = cmp(it.data, n.data) # compare the data items; uses - # the generic ``cmd`` proc that works for - # any type that has a ``==`` and ``<`` - # operator - if c < 0: - if it.le == nil: - it.le = n - return - it = it.le - else: - if it.ri == nil: - it.ri = n - return - it = it.ri - - iterator inorder[T](root: PBinaryTree[T]): T = - # inorder traversal of a binary tree - # recursive iterators are not yet implemented, so this does not work in - # the current compiler! - if root.le != nil: yield inorder(root.le) - yield root.data - if root.ri != nil: yield inorder(root.ri) - - var - root: PBinaryTree[string] # instantiate a PBinaryTree with the type string - add(root, newNode("hallo")) # instantiates generic procs ``newNode`` and - add(root, newNode("world")) # ``add`` - for str in inorder(root): - writeln(stdout, str) - -`Generics`:idx: are Nimrod's means to parametrize procs, iterators or types with -`type parameters`:idx:. Depending on context, the brackets are used either to -introduce type parameters or to instantiate a generic proc, iterator or type. - - -Templates -~~~~~~~~~ - -A `template`:idx: is a simple form of a macro: It is a simple substitution -mechanism that operates on Nimrod's abstract syntax trees. It is processed in -the semantic pass of the compiler. - -The syntax to *invoke* a template is the same as calling a procedure. - -Example: - -.. code-block:: nimrod - template `!=` (a, b: expr): expr = - # this definition exists in the System module - not (a == b) - - assert(5 != 6) # the compiler rewrites that to: assert(not (5 == 6)) - -The ``!=``, ``>``, ``>=``, ``in``, ``notin``, ``isnot`` operators are in fact -templates: - -| ``a > b`` is transformed into ``b < a``. -| ``a in b`` is transformed into ``contains(b, a)``. -| ``notin`` and ``isnot`` have the obvious meanings. - -The "types" of templates can be the symbols ``expr`` (stands for *expression*), -``stmt`` (stands for *statement*) or ``typedesc`` (stands for *type -description*). These are no real types, they just help the compiler parsing. -Real types can be used too; this implies that expressions are expected. -However, for parameter type checking the arguments are semantically checked -before being passed to the template. Other arguments are not semantically -checked before being passed to the template. - -The template body does not open a new scope. To open a new scope a ``block`` -statement can be used: - -.. code-block:: nimrod - template declareInScope(x: expr, t: typeDesc): stmt = - var x: t - - template declareInNewScope(x: expr, t: typeDesc): stmt = - # open a new scope: - block: - var x: t - - declareInScope(a, int) - a = 42 # works, `a` is known here - - declareInNewScope(b, int) - b = 42 # does not work, `b` is unknown - - -If there is a ``stmt`` parameter it should be the last in the template -declaration, because statements are passed to a template via a -special ``:`` syntax: - -.. code-block:: nimrod - - template withFile(f, fn, mode: expr, actions: stmt): stmt = - block: - var f: TFile - if open(f, fn, mode): - try: - actions - finally: - close(f) - else: - quit("cannot open: " & fn) - - withFile(txt, "ttempl3.txt", fmWrite): - txt.writeln("line 1") - txt.writeln("line 2") - -In the example the two ``writeln`` statements are bound to the ``actions`` -parameter. - - -**Style note**: For code readability, it is the best idea to use the least -powerful programming construct that still suffices. So the "check list" is: - -(1) Use an ordinary proc/iterator, if possible. -(2) Else: Use a generic proc/iterator, if possible. -(3) Else: Use a template, if possible. -(4) Else: Use a macro. - - -Macros ------- - -`Macros`:idx: are the most powerful feature of Nimrod. They can be used -to implement `domain specific languages`:idx:. - -While macros enable advanced compile-time code tranformations, they -cannot change Nimrod's syntax. However, this is no real restriction because -Nimrod's syntax is flexible enough anyway. - -To write macros, one needs to know how the Nimrod concrete syntax is converted -to an abstract syntax tree. - -There are two ways to invoke a macro: -(1) invoking a macro like a procedure call (`expression macros`) -(2) invoking a macro with the special ``macrostmt`` syntax (`statement macros`) - - -Expression Macros -~~~~~~~~~~~~~~~~~ - -The following example implements a powerful ``debug`` command that accepts a -variable number of arguments: - -.. code-block:: nimrod - # to work with Nimrod syntax trees, we need an API that is defined in the - # ``macros`` module: - import macros - - macro debug(n: expr): stmt = - # `n` is a Nimrod AST that contains the whole macro invokation - # this macro returns a list of statements: - result = newNimNode(nnkStmtList, n) - # iterate over any argument that is passed to this macro: - for i in 1..n.len-1: - # add a call to the statement list that writes the expression; - # `toStrLit` converts an AST to its string representation: - add(result, newCall("write", newIdentNode("stdout"), toStrLit(n[i]))) - # add a call to the statement list that writes ": " - add(result, newCall("write", newIdentNode("stdout"), newStrLitNode(": "))) - # add a call to the statement list that writes the expressions value: - add(result, newCall("writeln", newIdentNode("stdout"), n[i])) - - var - a: array [0..10, int] - x = "some string" - a[0] = 42 - a[1] = 45 - - debug(a[0], a[1], x) - -The macro call expands to: - -.. code-block:: nimrod - write(stdout, "a[0]") - write(stdout, ": ") - writeln(stdout, a[0]) - - write(stdout, "a[1]") - write(stdout, ": ") - writeln(stdout, a[1]) - - write(stdout, "x") - write(stdout, ": ") - writeln(stdout, x) - - -Statement Macros -~~~~~~~~~~~~~~~~ - -Statement macros are defined just as expression macros. However, they are -invoked by an expression following a colon:: - - exprStmt ::= lowestExpr ['=' expr | [expr (comma expr)* [comma]] [macroStmt]] - macroStmt ::= ':' [stmt] ('of' [sliceExprList] ':' stmt - | 'elif' expr ':' stmt - | 'except' exceptList ':' stmt )* - ['else' ':' stmt] - -The following example outlines a macro that generates a lexical analyser from -regular expressions: - -.. code-block:: nimrod - import macros - - macro case_token(n: stmt): stmt = - # creates a lexical analyser from regular expressions - # ... (implementation is an exercise for the reader :-) - nil - - case_token: # this colon tells the parser it is a macro statement - of r"[A-Za-z_]+[A-Za-z_0-9]*": - return tkIdentifier - of r"0-9+": - return tkInteger - of r"[\+\-\*\?]+": - return tkOperator - else: - return tkUnknown - - - -Modules -------- -Nimrod supports splitting a program into pieces by a `module`:idx: concept. -Each module needs to be in its own file. Modules enable -`information hiding`:idx: and `separate compilation`:idx:. A module may gain -access to symbols of another module by the `import`:idx: statement. -`Recursive module dependancies`:idx: are allowed, but slightly subtle. Only -top-level symbols that are marked with an asterisk (``*``) are exported. - -The algorithm for compiling modules is: - -- Compile the whole module as usual, following import statements recursively -- if there is a cycle only import the already parsed symbols (that are - exported); if an unknown identifier occurs then abort - -This is best illustrated by an example: - -.. code-block:: nimrod - # Module A - type - T1* = int # Module A exports the type ``T1`` - import B # the compiler starts parsing B - - proc main() = - var i = p(3) # works because B has been parsed completely here - - main() - - -.. code-block:: nimrod - # Module B - import A # A is not parsed here! Only the already known symbols - # of A are imported. - - proc p*(x: A.T1): A.T1 = - # this works because the compiler has already - # added T1 to A's interface symbol table - return x + 1 - - -Scope rules ------------ -Identifiers are valid from the point of their declaration until the end of -the block in which the declaration occurred. The range where the identifier -is known is the `scope`:idx: of the identifier. The exact scope of an -identifier depends on the way it was declared. - -Block scope -~~~~~~~~~~~ -The *scope* of a variable declared in the declaration part of a block -is valid from the point of declaration until the end of the block. If a -block contains a second block, in which the identifier is redeclared, -then inside this block, the second declaration will be valid. Upon -leaving the inner block, the first declaration is valid again. An -identifier cannot be redefined in the same block, except if valid for -procedure or iterator overloading purposes. - - -Tuple or object scope -~~~~~~~~~~~~~~~~~~~~~ -The field identifiers inside a tuple or object definition are valid in the -following places: - -* To the end of the tuple/object definition. -* Field designators of a variable of the given tuple/object type. -* In all descendent types of the object type. - -Module scope -~~~~~~~~~~~~ -All identifiers of a module are valid from the point of declaration until -the end of the module. Identifiers from indirectly dependent modules are *not* -available. The `system`:idx: module is automatically imported in every other -module. - -If a module imports an identifier by two different modules, each occurance of -the identifier has to be qualified, unless it is an overloaded procedure or -iterator in which case the overloading resolution takes place: - -.. code-block:: nimrod - # Module A - var x*: string - -.. code-block:: nimrod - # Module B - var x*: int - -.. code-block:: nimrod - # Module C - import A, B - write(stdout, x) # error: x is ambiguous - write(stdout, A.x) # no error: qualifier used - - var x = 4 - write(stdout, x) # not ambiguous: uses the module C's x - - -Messages -======== - -The Nimrod compiler emits different kinds of messages: `hint`:idx:, -`warning`:idx:, and `error`:idx: messages. An *error* message is emitted if -the compiler encounters any static error. - -Pragmas -======= - -Syntax:: - - colonExpr ::= expr [':' expr] - colonExprList ::= [colonExpr (comma colonExpr)* [comma]] - - pragma ::= '{.' optInd (colonExpr [comma])* [SAD] ('.}' | '}') - -Pragmas are Nimrod's method to give the compiler additional information/ -commands without introducing a massive number of new keywords. Pragmas are -processed on the fly during semantic checking. Pragmas are enclosed in the -special ``{.`` and ``.}`` curly brackets. - - -noSideEffect pragma -------------------- -The `noSideEffect`:idx: pragma is used to mark a proc/iterator to have no side -effects. This means that the proc/iterator only changes locations that are -reachable from its parameters and the return value only depends on the -arguments. If none of its parameters have the type ``var T`` -or ``ref T`` or ``ptr T`` this means no locations are modified. It is a static -error to mark a proc/iterator to have no side effect if the compiler cannot -verify this. - - -compileTime pragma ------------------- -The `compileTime`:idx: pragma is used to mark a proc to be used at compile -time only. No code will be generated for it. Compile time procs are useful -as helpers for macros. - - -error pragma ------------- -The `error`:idx: pragma is used to make the compiler output an error message -with the given content. Compilation currently aborts after an error, but this -may be changed in later versions. - - -fatal pragma ------------- -The `fatal`:idx: pragma is used to make the compiler output an error message -with the given content. In contrast to the ``error`` pragma, compilation -is guaranteed to be aborted by this pragma. - -warning pragma --------------- -The `warning`:idx: pragma is used to make the compiler output a warning message -with the given content. Compilation continues after the warning. - -hint pragma ------------ -The `hint`:idx: pragma is used to make the compiler output a hint message with -the given content. Compilation continues after the hint. - - -compilation option pragmas --------------------------- -The listed pragmas here can be used to override the code generation options -for a section of code. - -The implementation currently provides the following possible options (later -various others may be added). - -=============== =============== ============================================ -pragma allowed values description -=============== =============== ============================================ -checks on|off Turns the code generation for all runtime - checks on or off. -bound_checks on|off Turns the code generation for array bound - checks on or off. -overflow_checks on|off Turns the code generation for over- or - underflow checks on or off. -nil_checks on|off Turns the code generation for nil pointer - checks on or off. -assertions on|off Turns the code generation for assertions - on or off. -warnings on|off Turns the warning messages of the compiler - on or off. -hints on|off Turns the hint messages of the compiler - on or off. -optimization none|speed|size Optimize the code for speed or size, or - disable optimization. -callconv cdecl|... Specifies the default calling convention for - all procedures (and procedure types) that - follow. -=============== =============== ============================================ - -Example: - -.. code-block:: nimrod - {.checks: off, optimization: speed.} - # compile without runtime checks and optimize for speed - - -push and pop pragmas --------------------- -The `push/pop`:idx: pragmas are very similar to the option directive, -but are used to override the settings temporarily. Example: - -.. code-block:: nimrod - {.push checks: off.} - # compile this section without runtime checks as it is - # speed critical - # ... some code ... - {.pop.} # restore old settings |