Diffstat (limited to 'doc/manual.txt')
-rw-r--r-- | doc/manual.txt | 565 |
1 files changed, 399 insertions, 166 deletions
diff --git a/doc/manual.txt b/doc/manual.txt index cd982302f..04b8bb97b 100644 --- a/doc/manual.txt +++ b/doc/manual.txt @@ -11,6 +11,10 @@ Nimrod Manual About this document =================== +**Note**: This document is a draft! Several of Nimrod's features need more +precise wording. This manual will evolve into a proper specification some +day. + This document describes the lexis, the syntax, and the semantics of Nimrod. The language constructs are explained using an extended BNF, in @@ -18,10 +22,11 @@ which ``(a)*`` means 0 or more ``a``'s, ``a+`` means 1 or more ``a``'s, and ``(a)?`` means an optional *a*; an alternative spelling for optional parts is ``[a]``. The ``|`` symbol is used to mark alternatives and has the lowest precedence. Parentheses may be used to group elements. -Non-terminals are in lowercase, terminal symbols (including keywords) are in -UPPERCASE. An example:: +Non-terminals start with a lowercase letter, abstract terminal symbols are in +UPPERCASE. Verbatim terminal symbols (including keywords) are quoted +with ``'``. An example:: - if_stmt ::= IF expr COLON stmts (ELIF expr COLON stmts)* [ELSE stmts] + ifStmt ::= 'if' expr ':' stmts ('elif' expr ':' stmts)* ['else' stmts] Other parts of Nimrod - like scoping rules or runtime semantics are only described in an informal manner. The reason is that formal semantics are @@ -90,8 +95,7 @@ Indentation consists only of spaces; tabulators are not allowed. The terminals ``IND`` (indentation), ``DED`` (dedentation) and ``SAD`` (same indentation) are generated by the scanner, denoting an indentation. -These terminals are only generated for lines that are not empty or contain -only whitespace and comments. +These terminals are only generated for lines that are not empty. The parser and the scanner communicate over a stack which indentation terminal should be generated: The stack consists of integers counting the spaces. 
The @@ -100,14 +104,17 @@ If the current indentation token consists of more spaces than the entry at the top of the stack, a ``IND`` token is generated, else if it consists of the same number of spaces, a ``SAD`` token is generated. If it consists of fewer spaces, a ``DED`` token is generated for any item on the stack that is greater than the -current. These items are then popped from the stack by the scanner. At the end +current. These items are later popped from the stack by the parser. At the end of the file, a ``DED`` token is generated for each number remaining on the stack that is larger than zero. Because the grammar contains some optional ``IND`` tokens, the scanner cannot push new indentation levels. This has to be done by the parser. The symbol ``indPush`` indicates that an ``IND`` token is expected; the current number of -leading spaces is pushed onto the stack by the parser. +leading spaces is pushed onto the stack by the parser. The symbol ``indPop`` +denotes that the parser pops an item from the indentation stack. No token is +consumed by ``indPop``. + Comments -------- @@ -131,8 +138,8 @@ aligned to the preceding one, it does not start a new comment: Comments are tokens; they are only allowed at certain places in the input file as they belong to the syntax tree! This feature enables perfect source-to-source transformations (such as pretty-printing) and superior documentation generators. -A side-effect is that the human reader of the code always knows exactly which -code snippet the comment refers to. +A nice side-effect is that the human reader of the code always knows exactly +which code snippet the comment refers to. Identifiers & Keywords @@ -159,9 +166,9 @@ case-sensitive and even underscores are ignored: **type** is a reserved word, and so is **TYPE** or **T_Y_P_E**. The idea behind this is that this allows programmers to use their own prefered spelling style and libraries written by different programmers cannot use incompatible -conventions. 
The editors or IDE can show the identifiers as preferred. Another -advantage is that it frees the programmer from remembering the exact spelling -of an identifier. +conventions. A Nimrod-aware editor or IDE can show the identifiers as +preferred. Another advantage is that it frees the programmer from remembering +the exact spelling of an identifier. Literal strings @@ -174,7 +181,7 @@ contain the following `escape sequences`:idx:\ : Escape sequence Meaning ================== =================================================== ``\n`` `newline`:idx: - ``\r`` `carriage return`:idx: + ``\r``, ``\c`` `carriage return`:idx: ``\l`` `line feed`:idx: ``\f`` `form feed`:idx: ``\t`` `tabulator`:idx: @@ -184,8 +191,7 @@ contain the following `escape sequences`:idx:\ : ``\'`` `apostrophe`:idx: ``\d+`` `character with decimal value d`:idx:; all decimal digits directly - following are used for the - character + following are used for the character ``\a`` `alert`:idx: ``\b`` `backspace`:idx: ``\e`` `escape`:idx: `[ESC]`:idx: @@ -194,15 +200,14 @@ contain the following `escape sequences`:idx:\ : ================== =================================================== -Strings in Nimrod may contain any 8-bit value, except embedded zeros -which are not allowed for compability with `C`:idx:. +Strings in Nimrod may contain any 8-bit value, except embedded zeros. Literal strings can also be delimited by three double squotes ``"""`` ... ``"""``. Literals in this form may run for several lines, may contain ``"`` and do not interpret any escape sequences. -For convenience, when the opening ``"""`` is immediately -followed by a newline, the newline is not included in the string. +For convenience, when the opening ``"""`` is immediately followed by a newline, +the newline is not included in the string. 
There are also `raw string literals` that are preceded with the letter ``r`` (or ``R``) and are delimited by matching double quotes (just like ordinary string literals) and do not interpret the escape sequences. This is especially @@ -253,8 +258,8 @@ Numerical constants As can be seen in the productions, numerical constants can contain unterscores for readability. Integer and floating point literals may be given in decimal (no -prefix), binary (prefix ``0b``), octal (prefix ``0o``) and -hexadecimal (prefix ``0x``) notation. +prefix), binary (prefix ``0b``), octal (prefix ``0o``) and hexadecimal +(prefix ``0x``) notation. There exists a literal for each numerical type that is defined. The suffix starting with an apostophe ('\'') is called a @@ -262,7 +267,7 @@ defined. The suffix starting with an apostophe ('\'') is called a unless the literal contains a dot or an ``E`` in which case it is of type ``float``. -The following table specifies type suffixes: +The type suffixes are: ================= ========================= Type Suffix Resulting type of literal @@ -295,11 +300,11 @@ the three tokens `{`:tok:, `..`:tok:, `}`:tok: and not the two tokens `{.`:tok:, `.}`:tok:. In Nimrod one can define his own operators. An `operator`:idx: is any -combination of the following characters that are not listed above:: +combination of the following characters that is not listed above:: + - * / < > = @ $ ~ & % - ! ? ^ . | + ! ? ^ . | \ These keywords are also operators: ``and or not xor shl shr div mod in notin is isnot``. @@ -348,16 +353,13 @@ Constants cannot change. The compiler must be able to evaluate the expression in a constant declaration at compile time. -.. - Nimrod contains a sophisticated - compile-time evaluator, so procedures declared with the ``{.noSideEffect.}`` - pragma can be used in constant expressions: - - .. 
code-block:: nimrod +Nimrod contains a sophisticated compile-time evaluator, so procedures which +have no side-effect can be used in constant expressions too: - from strutils import findSubStr - const - x = findSubStr('a', "hallo") # x is 1; this is computed at compile time! +.. code-block:: nimrod + import strutils + const + constEval = contains("abc", 'b') # computed at compile time! Types @@ -414,8 +416,8 @@ intXX There are no `unsigned integer`:idx: types, only `unsigned operations`:idx: that treat their arguments as unsigned. Unsigned operations all wrap around; -they may not lead to over- or underflow errors. Unsigned operations use the -``%`` postfix as convention: +they cannot lead to over- or underflow errors. Unsigned operations use the +``%`` suffix as convention: ====================== ====================================================== operation meaning @@ -453,7 +455,7 @@ floatXX implementation supports ``float32`` and ``float64``. Literals of these types have the suffix 'fXX. -`Automatic type conversion`:idx: is performed in expressions where different +`Automatic type conversion`:idx: is performed in expressions where different kinds of integer types are used. However, if the type conversion loses information, the `EOutOfRange`:idx: exception is raised (if the error cannot be detected at compile time). @@ -498,16 +500,15 @@ the resulting programs will still handle UTF-8 properly as UTF-8 was specially designed for this. Another reason is that Nimrod can support ``array[char, int]`` or ``set[char]`` efficiently as many algorithms rely on this feature. The -`TUniChar` type is used for Unicode characters, it can represent any Unicode -character. ``TUniChar`` is declared the ``unicode`` standard module. +`TRune` type is used for Unicode characters, it can represent any Unicode +character. ``TRune`` is declared the ``unicode`` module. 
Enumeration types ~~~~~~~~~~~~~~~~~ -`Enumeration`:idx: types define a new type whose values consist only of the ones -specified. -The values are ordered by the order in enum's declaration. Example: +`Enumeration`:idx: types define a new type whose values consist of the ones +specified. The values are ordered. Example: .. code-block:: nimrod @@ -528,8 +529,8 @@ with enumeration types. For better interfacing to other programming languages, the fields of enum types can be assigned an explicit ordinal value. However, the ordinal values -have to be in ascending order. A field whose ordinal value that is not -explicitly given, is assigned the value of the previous field + 1. +have to be in ascending order. A field whose ordinal value is not +explicitly given is assigned the value of the previous field + 1. An explicit ordered enum can have *wholes*: @@ -545,7 +546,7 @@ and ``pred`` are not available for them either. Subrange types ~~~~~~~~~~~~~~ -A `subrange`:idx: type is a range of values from an ordinal type (the host +A `subrange`:idx: type is a range of values from an ordinal type (the base type). To define a subrange type, one must specify it's limiting values: the highest and lowest value of the type: @@ -566,7 +567,7 @@ A subrange type has the same size as its base type (``int`` in the example). String type ~~~~~~~~~~~ All string literals are of the type `string`:idx:. A string in Nimrod is very -similar to a sequence of characters. However, strings in Nimrod both are +similar to a sequence of characters. However, strings in Nimrod are both zero-terminated and have a length field. One can retrieve the length with the builtin ``len`` procedure; the length never counts the terminating zero. The assignment operator for strings always copies the string. @@ -585,7 +586,7 @@ arrays, they can be used in case statements: Per convention, all strings are UTF-8 strings, but this is not enforced. 
For example, when reading strings from binary files, they are merely a sequence of bytes. The index operation ``s[i]`` means the i-th *char* of ``s``, not the -i-th *unichar*. The iterator ``unichars`` from the ``unicode`` standard +i-th *unichar*. The iterator ``runes`` from the ``unicode`` module can be used for iteration over all unicode characters. @@ -611,9 +612,7 @@ constructed by the array constructor ``[]`` in conjunction with the array to sequence operator ``@``. Another way to allocate space for a sequence is to call the built-in ``newSeq`` procedure. -A sequence may be passed to a parameter that is of type *open array*, but -not to a multi-dimensional open array, because it is impossible to do so in an -efficient manner. +A sequence may be passed to a parameter that is of type *open array*. Example: @@ -633,18 +632,36 @@ The lower bound of an array or sequence may be received by the built-in proc received by ``len()``. ``low()`` for a sequence or an open array always returns 0, as this is the first valid index. -The notation ``x[i]`` can be used to access the i-th element of ``x``. +The notation ``x[i]`` can be used to access the i-th element of ``x``. Arrays are always bounds checked (at compile-time or at runtime). These checks can be disabled via pragmas or invoking the compiler with the ``--bound_checks:off`` command line switch. +An open array is also a means to implement passing a variable number of +arguments to a procedure. The compiler converts the list of arguments +to an array automatically: + +.. code-block:: nimrod + proc myWriteln(f: TFile, a: openarray[string]) = + for s in items(a): + write(f, s) + write(f, "\n") + + myWriteln(stdout, "abc", "def", "xyz") + # is transformed by the compiler to: + myWriteln(stdout, ["abc", "def", "xyz"]) + +This transformation is only done if the openarray parameter is the +last parameter in the procedure header. The current implementation does not +support nested open arrays. 
+ Tuples and object types ~~~~~~~~~~~~~~~~~~~~~~~ A variable of a `tuple`:idx: or `object`:idx: type is a heterogenous storage container. -A tuple or object defines various named *fields* of a type. A tuple also +A tuple or object defines various named *fields* of a type. A tuple also defines an *order* of the fields. Tuples are meant for heterogenous storage types with no overhead and few abstraction possibilities. The constructor ``()`` can be used to construct tuples. The order of the fields in the constructor @@ -691,7 +708,7 @@ the ``is`` operator can be used to determine the object's type. person: TPerson assert(student is TStudent) # is true -Object fields that should be visible outside from the defining module, have to +Object fields that should be visible from outside the defining module, have to marked by ``*``. In contrast to tuples, different object types are never *equivalent*. @@ -730,7 +747,7 @@ An example: new(n) # creates a new node n.kind = nkFloat n.floatVal = 0.0 # valid, because ``n.kind==nkFloat``, so that it fits - + # the following statement raises an `EInvalidField` exception, because # n.kind's value does not fit: n.strVal = "" @@ -783,8 +800,8 @@ point to and modify the same location in memory. Nimrod distinguishes between `traced`:idx: and `untraced`:idx: references. Untraced references are also called *pointers*. Traced references point to -objects of a garbage collected heap, untraced references point to -manually allocated objects or to objects somewhere else in memory. Thus +objects of a garbage collected heap, untraced references point to +manually allocated objects or to objects somewhere else in memory. Thus untraced references are *unsafe*. However for certain low-level operations (accessing the hardware) untraced references are unavoidable. @@ -817,7 +834,7 @@ To deal with untraced memory, the procedures ``alloc``, ``dealloc`` and ``realloc`` can be used. The documentation of the system module contains further information. 
-If a reference points to *nothing*, it has the value ``nil``. +If a reference points to *nothing*, it has the value ``nil``. Special care has to be taken if an untraced object contains traced objects like traced references, strings or sequences: In order to free everything properly, @@ -904,9 +921,9 @@ Most calling conventions exist only for the Windows 32-bit platform. -Statements ----------- -Nimrod uses the common statement/expression paradigma: `Statements`:idx: do not +Statements and expressions +-------------------------- +Nimrod uses the common statement/expression paradigm: `Statements`:idx: do not produce a value in contrast to expressions. Call expressions are statements. If the called procedure returns a value, it is not a valid statement as statements do not produce values. To evaluate an expression for @@ -943,7 +960,7 @@ Discard statement Syntax:: - discardStmt ::= DISCARD expr + discardStmt ::= 'discard' expr Example: @@ -962,11 +979,13 @@ Var statement Syntax:: - colonOrEquals ::= COLON typeDesc [EQUALS expr] | EQUALS expr - varField ::= symbol ["*"] [pragma] + colonOrEquals ::= ':' typeDesc ['=' expr] | '=' expr + varField ::= symbol ['*'] [pragma] varPart ::= symbol (comma symbol)* [comma] colonOrEquals [COMMENT | IND COMMENT] - varSection ::= VAR (varPart - | indPush (COMMENT|varPart) (SAD (COMMENT|varPart))* DED) + varSection ::= 'var' (varPart + | indPush (COMMENT|varPart) + (SAD (COMMENT|varPart))* DED indPop) + `Var`:idx: statements declare new local and global variables and initialize them. A comma seperated list of variables can be used to specify @@ -992,7 +1011,7 @@ char '\0' bool false ref or pointer type nil procedural type nil -sequence nil +sequence nil (**not** ``@[]``) string nil (**not** "") tuple[x: A, y: B, ...] (default(A), default(B), ...) 
(analogous for objects) @@ -1007,12 +1026,12 @@ Const section Syntax:: - colonAndEquals ::= [COLON typeDesc] EQUALS expr - constDecl ::= CONST - indPush - symbol ["*"] [pragma] colonAndEquals - (SAD symbol ["*"] [pragma] colonAndEquals)* - DED + colonAndEquals ::= [':' typeDesc] '=' expr + + constDecl ::= symbol ['*'] [pragma] colonAndEquals [COMMENT | IND COMMENT] + | COMMENT + constSection ::= 'const' indPush constDecl (SAD constDecl)* DED indPop + Example: @@ -1031,7 +1050,7 @@ If statement Syntax:: - ifStmt ::= IF expr COLON stmt (ELIF expr COLON stmt)* [ELSE COLON stmt] + ifStmt ::= 'if' expr ':' stmt ('elif' expr ':' stmt)* ['else' ':' stmt] Example: @@ -1061,9 +1080,9 @@ Case statement Syntax:: - caseStmt ::= CASE expr (OF sliceList COLON stmt)* - (ELIF expr COLON stmt)* - [ELSE COLON stmt] + caseStmt ::= 'case' expr ('of' sliceExprList ':' stmt)* + ('elif' expr ':' stmt)* + ['else' ':' stmt] Example: @@ -1078,9 +1097,9 @@ Example: The `case`:idx: statement is similar to the if statement, but it represents a multi-branch selection. The expression after the keyword ``case`` is evaluated and if its value is in a *vallist* the corresponding statements -(after the ``of`` keyword) are executed. If the value is not in any -given *slicelist* the ``else`` part is executed. If there is no ``else`` -part and not all possible values that ``expr`` can hold occur in a ``vallist``, +(after the ``of`` keyword) are executed. If the value is not in any +given *slicelist* the ``else`` part is executed. If there is no ``else`` +part and not all possible values that ``expr`` can hold occur in a ``vallist``, a static error is given. This holds only for expressions of ordinal types. If the expression is not of an ordinal type, and no ``else`` part is given, control just passes after the ``case`` statement. 
@@ -1094,7 +1113,7 @@ When statement Syntax:: - whenStmt ::= WHEN expr COLON stmt (ELIF expr COLON stmt)* [ELSE COLON stmt] + whenStmt ::= 'when' expr ':' stmt ('elif' expr ':' stmt)* ['else' ':' stmt] Example: @@ -1116,8 +1135,7 @@ exceptions: * The statements do not open a new scope if they introduce new identifiers. * The statements that belong to the expression that evaluated to true are translated by the compiler, the other statements are not checked for - syntax or semantics at all! This holds also for any ``expr`` coming - after the expression that evaluated to true. + semantics! However, each ``expr`` is checked for semantics. The ``when`` statement enables conditional compilation techniques. As a special syntatic extension, the ``when`` construct is also available @@ -1129,7 +1147,7 @@ Raise statement Syntax:: - raiseStmt ::= RAISE [expr] + raiseStmt ::= 'raise' [expr] Example: @@ -1137,7 +1155,7 @@ Example: raise newEOS("operating system failed") Apart from built-in operations like array indexing, memory allocation, etc. -the ``raise`` statement is the only way to raise an exception. +the ``raise`` statement is the only way to raise an exception. .. XXX document this better! @@ -1152,11 +1170,11 @@ Try statement Syntax:: + qualifiedIdent ::= symbol ['.' symbol] exceptList ::= [qualifiedIdent (comma qualifiedIdent)* [comma]] - tryStmt ::= TRY COLON stmt - (EXCEPT exceptList COLON stmt)* - [FINALLY COLON stmt] - + tryStmt ::= 'try' ':' stmt + ('except' exceptList ':' stmt)* + ['finally' ':' stmt] Example: @@ -1176,6 +1194,8 @@ Example: echo("could not convert string to integer") except EIO: echo("IO error!") + except: + echo("Unknown exception!") finally: closeFile(f) @@ -1203,7 +1223,7 @@ Return statement Syntax:: - returnStmt ::= RETURN [expr] + returnStmt ::= 'return' [expr] Example: @@ -1219,12 +1239,13 @@ sugar for: return result ``return`` without an expression is a short notation for ``return result`` if -the proc has a return type. 
The `result`:idx: variable is always the return +the proc has a return type. The `result`:idx: variable is always the return value of the procedure. It is automatically declared by the compiler. As all variables, ``result`` is initialized to (binary) zero:: .. code-block:: nimrod - proc returnZero(): int = nil # implicitely returns 0 + proc returnZero(): int = + # implicitely returns 0 Yield statement @@ -1232,7 +1253,7 @@ Yield statement Syntax:: - yieldStmt ::= YIELD expr + yieldStmt ::= 'yield' expr Example: @@ -1252,7 +1273,7 @@ Block statement Syntax:: - blockStmt ::= BLOCK [symbol] COLON stmt + blockStmt ::= 'block' [symbol] ':' stmt Example: @@ -1277,7 +1298,7 @@ Break statement Syntax:: - breakStmt ::= BREAK [symbol] + breakStmt ::= 'break' [symbol] Example: @@ -1294,7 +1315,7 @@ While statement Syntax:: - whileStmt ::= WHILE expr COLON stmt + whileStmt ::= 'while' expr ':' stmt Example: @@ -1316,7 +1337,7 @@ Continue statement Syntax:: - continueStmt ::= CONTINUE + continueStmt ::= 'continue' A `continue`:idx: statement leads to the immediate next iteration of the surrounding loop construct. It is only allowed within a loop. A continue @@ -1340,7 +1361,7 @@ Assembler statement ~~~~~~~~~~~~~~~~~~~ Syntax:: - asmStmt ::= ASM [pragma] (STR_LIT | RSTR_LIT | TRIPLESTR_LIT) + asmStmt ::= 'asm' [pragma] (STR_LIT | RSTR_LIT | TRIPLESTR_LIT) The direct embedding of `assembler`:idx: code into Nimrod code is supported by the unsafe ``asm`` statement. Identifiers in the assembler code that refer to @@ -1348,6 +1369,49 @@ Nimrod identifiers shall be enclosed in a special character which can be specified in the statement's pragmas. The default special character is ``'`'``. +If expression +~~~~~~~~~~~~~ + +An `if expression` is almost like an if statement, but it is an expression. +Example: + +.. code-block:: nimrod + p(if x > 8: 9 else: 10) + +An if expression always results in a value, so the ``else`` part is +required. 
``Elif`` parts are also allowed (but unlikely to be good +style). + + +Type convertions +~~~~~~~~~~~~~~~~ +Syntactically a `type conversion` is like a procedure call, but a +type name replaces the procedure name. A type conversion is always +safe in the sense that a failure to convert a type to another +results in an exception (if it cannot be determined statically). + + +Type casts +~~~~~~~~~~ +Example: + +.. code-block:: nimrod + cast[int](x) + +Type casts are a crude mechanism to interpret the bit pattern of +an expression as if it would be of another type. Type casts are +only needed for low-level programming and are inherently unsafe. + + +The addr operator +~~~~~~~~~~~~~~~~~ +The `addr` operator returns the address of an l-value. If the +type of the location is ``T``, the `addr` operator result is +of the type ``ptr T``. Taking the address of an object that resides +on the stack is **unsafe**, as the pointer may live longer than the +object on the stack and can thus reference a non-existing object. + + Procedures ~~~~~~~~~~ What most programming languages call `methods`:idx: or `funtions`:idx: are @@ -1355,16 +1419,16 @@ called `procedures`:idx: in Nimrod (which is the correct terminology). A procedure declaration defines an identifier and associates it with a block of code. A procedure may call itself recursively. The syntax is:: - param ::= symbol (comma symbol)* [comma] COLON typeDesc - paramList ::= [PAR_LE [param (comma param)* [comma]] PAR_RI] [COLON typeDesc] - - genericParams ::= BRACKET_LE (symbol [EQUALS typeDesc] )* BRACKET_RI - - procDecl ::= PROC symbol ["*"] [genericParams] - paramList [pragma] - [EQUALS stmt] - -If the ``EQUALS stmt`` part is missing, it is a `forward`:idx: declaration. 
If + param ::= symbol (comma symbol)* [comma] ':' typeDesc + paramList ::= ['(' [param (comma param)* [comma]] ')'] [':' typeDesc] + + genericParam ::= symbol [':' typeDesc] + genericParams ::= '[' genericParam (comma genericParam)* [comma] ']' + + procDecl ::= 'proc' symbol ['*'] [genericParams] paramList [pragma] + ['=' stmt] + +If the ``= stmt`` part is missing, it is a `forward`:idx: declaration. If the proc returns a value, the procedure body can access an implicit declared variable named `result`:idx: that represents the return value. Procs can be overloaded. The overloading resolution algorithm tries to find the proc that is @@ -1384,6 +1448,24 @@ is used if the caller does not provide a value for this parameter. Example: for i in 0..len(s) - 1: result[i] = toLower(s[i]) # calls toLower for characters; no recursion! +Calling a procedure can be done in many different ways: + +.. code-block:: nimrod + proc callme(x, y: int, s: string = "", c: char, b: bool = false) = ... + + # call with positional arguments # parameter bindings: + callme(0, 1, "abc", '\t', true) # (x=0, y=1, s="abc", c='\t', b=true) + # call with named and positional arguments: + callme(y=1, x=0, "abd", '\t') # (x=0, y=1, s="abd", c='\t', b=false) + # call with named arguments (order is not relevant): + callme(c='\t', y=1, x=0) # (x=0, y=1, s="", c='\t', b=false) + # call as a command statement: no () needed: + callme 0, 1, "abc", '\t' + + +A procedure cannot modify its parameters (unless the parameters have the +type `var`). + `Operators`:idx: are procedures with a special operator symbol as identifier: .. code-block:: nimrod @@ -1391,19 +1473,79 @@ is used if the caller does not provide a value for this parameter. Example: # converts an integer to a string; this is a prefix operator. return intToStr(x) -Calling a procedure can be done in many different ways: +Operators with one parameter are prefix operators, operators with two +parameters are infix operators. 
There is no way to declare postfix +operators: All postfix operators are built-in and handled by the +grammar explicitely. + +Any operator can be called like an ordinary proc with the '`opr`' +notation. (Thus an operator can have more than two parameters): .. code-block:: nimrod - proc callme(x, y: int, s: string = "", c: char, b: bool = false) = ... + proc `*+` (a, b, c: int): int = + # Multiply and add + return a * b + c + + assert `*+`(3, 4, 6) == `*`(a, `+`(b, c)) + + + +Var parameters +~~~~~~~~~~~~~~ +The type of a parameter may be prefixed with the ``var`` keyword: + +.. code-block:: nimrod + proc divmod(a, b: int, res, remainder: var int) = + res = a div b + remainder = a mod b + + var + x, y: int + + divmod(8, 5, x, y) # modifies x and y + assert x == 1 + assert y == 3 + +In the example, ``res`` and ``remainder`` are `var parameters`. +Var parameters can be modified by the procedure and the changes are +visible to the caller. The argument passed to a var parameter has to be +an l-value. Var parameters are implemented as hidden pointers. The +above example is equivalent to: + +.. code-block:: nimrod + proc divmod(a, b: int, res, remainder: ptr int) = + res = a div b + remainder = a mod b + + var + x, y: int + divmod(8, 5, addr(x), addr(y)) + assert x == 1 + assert y == 3 + +In the examples, var parameters or pointers are used to provide two +return values. This can be done in a cleaner way by returning a tuple: + +.. code-block:: nimrod + proc divmod(a, b: int): tuple[res, remainder: int] = + return (a div b, a mod b) + + var t = divmod(8, 5) + assert t.res == 1 + assert t.remainder = 3 + +Even more elegant is to use `tuple unpacking` to access the tuple's fields: + +.. code-block:: nimrod + var (x, y) = divmod(8, 5) # tuple unpacking + assert x == 1 + assert y == 3 + +Unfortunately, this form of tuple unpacking is not yet implemented. + +.. 
+ XXX remove this as soon as tuple unpacking is implemented - # call with positional arguments# parameter bindings: - callme(0, 1, "abc", '\t', true) # (x=0, y=1, s="abc", c='\t', b=true) - # call with named and positional arguments: - callme(y=1, x=0, "abd", '\t') # (x=0, y=1, s="abd", c='\t', b=false) - # call with named arguments (order is not relevant): - callme(c='\t', y=1, x=0) # (x=0, y=1, s="", c='\t', b=false) - # call as a command statement: no () needed: - callme 0, 1, "abc", '\t' Iterators and the for statement @@ -1411,15 +1553,16 @@ Iterators and the for statement Syntax:: - forStmt ::= FOR symbol (comma symbol)* [comma] IN expr [DOTDOT expr] COLON stmt + forStmt ::= 'for' symbol (comma symbol)* [comma] 'in' expr ['..' expr] ':' stmt - param ::= symbol (comma symbol)* [comma] COLON typeDesc - paramList ::= [PAR_LE [param (comma param)* [comma]] PAR_RI] [COLON typeDesc] - - genericParams ::= BRACKET_LE (symbol [EQUALS typeDesc] )* BRACKET_RI - - iteratorDecl ::= ITERATOR symbol ["*"] [genericParams] paramList [pragma] - [EQUALS stmt] + param ::= symbol (comma symbol)* [comma] ':' typeDesc + paramList ::= ['(' [param (comma param)* [comma]] ')'] [':' typeDesc] + + genericParam ::= symbol [':' typeDesc] + genericParams ::= '[' genericParam (comma genericParam)* [comma] ']' + + iteratorDecl ::= 'iterator' symbol ['*'] [genericParams] paramList [pragma] + ['=' stmt] The `for`:idx: statement is an abstract mechanism to iterate over the elements of a container. It relies on an `iterator`:idx: to do so. 
Like ``while`` @@ -1473,13 +1616,15 @@ Type sections Syntax:: typeDef ::= typeDesc | objectDef | enumDef - genericParams ::= BRACKET_LE (symbol [EQUALS typeDesc] )* BRACKET_RI - typeDecl ::= TYPE - indPush - symbol ["*"] [genericParams] [EQUALS typeDef] - (SAD symbol ["*"] [genericParams] [EQUALS typeDef])* - DED + genericParam ::= symbol [':' typeDesc] + genericParams ::= '[' genericParam (comma genericParam)* [comma] ']' + + typeDecl ::= COMMENT + | symbol ['*'] [genericParams] ['=' typeDef] [COMMENT|IND COMMENT] + + typeSection ::= 'type' indPush typeDecl (SAD typeDecl)* DED indPop + Example: @@ -1504,6 +1649,8 @@ possible within a single ``type`` section. Generics ~~~~~~~~ +`Version 0.7.4: Complex generic types like in the example do not work.`:red: + Example: .. code-block:: nimrod @@ -1578,26 +1725,119 @@ Example: # this definition exists in the System module not (a == b) - writeln(5 != 6) # the compiler rewrites that to: writeln(not (5 == 6)) + assert(5 != 6) # the compiler rewrites that to: assert(not (5 == 6)) Macros -~~~~~~ +------ -`Macros`:idx: are the most powerful feature of Nimrod. They should be used -only to implement `domain specific languages`:idx:. They may lead to code +`Macros`:idx: are the most powerful feature of Nimrod. They can be used +to implement `domain specific languages`:idx:. But they may lead to code that is harder to understand and maintain. So one ought to use them sparingly. -The usage of ordinary procs, iterators or generics is preferred to the usage of -macros. + +While macros enable advanced compile-time code tranformations, they +cannot change Nimrod's syntax. However, this is no real restriction because +Nimrod's syntax is flexible enough anyway. + +To write macros, one needs to know how the Nimrod concrete syntax is converted +to an abstract syntax tree. (Unfortunately the AST is not yet documented.) 
+
+There are two ways to invoke a macro:
+(1) invoking a macro like a procedure call (`expression macros`)
+(2) invoking a macro with the special ``macrostmt`` syntax (`statement macros`)
+
+
+Expression Macros
+~~~~~~~~~~~~~~~~~
+
+The following example implements a powerful ``debug`` command that accepts a
+variable number of arguments:
+
+.. code-block:: nimrod
+  # to work with Nimrod syntax trees, we need an API that is defined in the
+  # ``macros`` module:
+  import macros
+
+  macro debug(n: expr): stmt =
+    # `n` is a Nimrod AST that contains the whole macro expression
+    # this macro returns a list of statements:
+    result = newNimNode(nnkStmtList, n)
+    # iterate over any argument that is passed to this macro:
+    for i in 1..n.len-1:
+      # add a call to the statement list that writes the expression;
+      # `toStrLit` converts an AST to its string representation:
+      add(result, newCall("write", newIdentNode("stdout"), toStrLit(n[i])))
+      # add a call to the statement list that writes ": "
+      add(result, newCall("write", newIdentNode("stdout"), newStrLitNode(": ")))
+      # add a call to the statement list that writes the expressions value:
+      add(result, newCall("writeln", newIdentNode("stdout"), n[i]))
+
+  var
+    a: array [0..10, int]
+    x = "some string"
+  a[0] = 42
+  a[1] = 45
+
+  debug(a[0], a[1], x)
+
+The macro call expands to:
+
+.. code-block:: nimrod
+  write(stdout, "a[0]")
+  write(stdout, ": ")
+  writeln(stdout, a[0])
+
+  write(stdout, "a[1]")
+  write(stdout, ": ")
+  writeln(stdout, a[1])
+
+  write(stdout, "x")
+  write(stdout, ": ")
+  writeln(stdout, x)
+
+
+Statement Macros
+~~~~~~~~~~~~~~~~
+
+Statement macros are defined just as expression macros.
However, they are
+invoked by an expression following a colon::
+
+  exprStmt ::= lowestExpr ['=' expr | [expr (comma expr)* [comma]] [macroStmt]]
+  macroStmt ::= ':' [stmt] ('of' [sliceExprList] ':' stmt
+                         | 'elif' expr ':' stmt
+                         | 'except' exceptList ':' stmt )*
+                         ['else' ':' stmt]
+
+The following example outlines a macro that generates a lexical analyser from
+regular expressions:
+
+.. code-block:: nimrod
+  import macros
+
+  macro case_token(n: stmt): stmt =
+    # creates a lexical analyser from regular expressions
+    # ... (implementation is an exercise for the reader :-)
+    nil
+
+  case_token: # this colon tells the parser it is a macro statement
+  of r"[A-Za-z_]+[A-Za-z_0-9]*":
+    return tkIdentifier
+  of r"0-9+":
+    return tkInteger
+  of r"[\+\-\*\?]+":
+    return tkOperator
+  else:
+    return tkUnknown
+
 
 Modules
 -------
 
 Nimrod supports splitting a program into pieces by a `module`:idx: concept.
-Each module needs to be in its own file. Modules enable
+Each module needs to be in its own file. Modules enable
 `information hiding`:idx: and `separate compilation`:idx:. A module may gain
-access to symbols of another module by the `import`:idx: statement.
-`Recursive module dependancies`:idx: are allowed, but slightly subtle. Only
+access to symbols of another module by the `import`:idx: statement.
+`Recursive module dependancies`:idx: are allowed, but slightly subtle. Only
 top-level symbols that are marked with an asterisk (``*``) are exported.
 
 The algorithm for compiling modules is:
@@ -1622,12 +1862,12 @@ This is best illustrated by an example:
 
   # Module B
   import A  # A is not parsed here! Only the already known symbols
-            # of A are imported here.
+            # of A are imported.
 
-  proc p*(x: A.T1): A.T1 # this works because the compiler has already
-                         # added T1 to A's interface symbol table
-
-  proc p(x: A.T1): A.T1 = return x + 1
+  proc p*(x: A.T1): A.T1 =
+    # this works because the compiler has already
+    # added T1 to A's interface symbol table
+    return x + 1
 
 
 Scope rules
@@ -1649,7 +1889,7 @@ procedure or iterator overloading purposes.
 
 
 Tuple or object scope
-~~~~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~~
 
 The field identifiers inside a tuple or object definition are valid in the
 following places:
@@ -1659,16 +1899,14 @@ following places:
 Module scope
 ~~~~~~~~~~~~
 
-All identifiers in the interface part of a module are valid from the point of
-declaration, until the end of the module. Furthermore, the identifiers are
-known in other modules that import the module. Identifiers from indirectly
-dependent modules are *not* available. The `system`:idx: module is automatically
-imported in all other modules.
+All identifiers of a module are valid from the point of declaration until
+the end of the module. Identifiers from indirectly dependent modules are *not*
+available. The `system`:idx: module is automatically imported in every other
+module.
 
-If a module imports an identifier by two different modules,
-each occurance of the identifier has to be qualified, unless it is an
-overloaded procedure or iterator in which case the overloading
-resolution takes place:
+If a module imports an identifier by two different modules, each occurance of
+the identifier has to be qualified, unless it is an overloaded procedure or
+iterator in which case the overloading resolution takes place:
 
 ..
code-block:: nimrod
   # Module A
@@ -1680,7 +1918,7 @@ resolution takes place:
   # Module C
   import A, B
   write(stdout, x) # error: x is ambigious
-  write(sdtout, A.x) # no error: qualifier used
+  write(stdout, A.x) # no error: qualifier used
 
   var x = 4
   write(stdout, x) # not ambigious: uses the module C's x
@@ -1698,14 +1936,14 @@ Pragmas
 
 Syntax::
 
-  colonExpr ::= expr [COLON expr]
-  colonExprList ::= [ colonExpr (comma colonExpr)* [comma] ]
+  colonExpr ::= expr [':' expr]
+  colonExprList ::= [colonExpr (comma colonExpr)* [comma]]
 
-  pragma ::= CURLYDOT_LE colonExprList (CURLYDOT_RI | CURLY_RI)
+  pragma ::= '{.' optInd (colonExpr [comma])* [SAD] ('.}' | '}')
 
 Pragmas are Nimrod's method to give the compiler additional information/
 commands without introducing a massive number of new keywords. Pragmas are
-processed on the fly during parsing. Pragmas are always enclosed in the
+processed on the fly during semantic checking. Pragmas are enclosed in the
 special ``{.`` and ``.}`` curly brackets.
 
 
@@ -1718,7 +1956,7 @@ The compiler defines the target processor and the target operating system as
 conditional symbols.
 
 Warning: The ``define`` pragma is deprecated as it conflicts with separate
-compilation! One should use boolean constants as a replacement - this is
+compilation! One should use boolean constants as a replacement - this is
 cleaner anyway.
 
 
@@ -1759,10 +1997,6 @@ compilation option pragmas
 --------------------------
 The listed pragmas here can be used to override the code generation options
 for a section of code.
-::
-
-  "{." pragma: val {pragma: val} ".}"
-
 The implementation currently provides the following possible options (later
 various others may be added).
 
@@ -1785,8 +2019,7 @@ warnings on|off Turns the warning messages of the compiler
 hints            on|off           Turns the hint messages of the compiler
                                   on or off.
 optimization     none|speed|size  Optimize the code for speed or size, or
-                                  disable optimization. For non-optimizing
-                                  compilers this option has no effect.
+                                  disable optimization.
 callconv         cdecl|...        Specifies the default calling convention for
                                   all procedures (and procedure types) that
                                   follow.