diff options
Diffstat (limited to 'doc/tut2.txt')
-rw-r--r-- | doc/tut2.txt | 718 |
1 files changed, 718 insertions, 0 deletions
diff --git a/doc/tut2.txt b/doc/tut2.txt new file mode 100644 index 000000000..dc14aabc0 --- /dev/null +++ b/doc/tut2.txt @@ -0,0 +1,718 @@ +============================= +The Nimrod Tutorial (Part II) +============================= + +:Author: Andreas Rumpf +:Version: |nimrodversion| + +.. contents:: + + +Introduction +============ + + "With great power comes great responsibility." -- Spider-man + +This document is a tutorial for the advanced constructs of the *Nimrod* +programming language. + + +Pragmas +======= +Pragmas are Nimrod's method to give the compiler additional information/ +commands without introducing a massive number of new keywords. Pragmas are +processed during semantic checking. Pragmas are enclosed in the +special ``{.`` and ``.}`` curly dot brackets. This tutorial does not cover +pragmas. See the `manual <manual.html>`_ or `user guide <nimrodc.html>`_ for +a description of the available pragmas. + + +Object Oriented Programming +=========================== + +While Nimrod's support for object oriented programming (OOP) is minimalistic, +powerful OOP technics can be used. OOP is seen as *one* way to design a +program, not *the only* way. Often a procedural approach leads to simpler +and more efficient code. + + +Objects +------- + +Like tuples, objects are a means to pack different values together in a +structured way. However, objects provide many features that tuples do not: +They provide inheritance and information hiding. Because objects encapsulate +data, the ``()`` tuple constructor cannot be used to construct objects. So +the order of the object's fields is not as important as it is for tuples. The +programmer should provide a proc to initialize the object (this is called +a *constructor*). + +Objects have access to their type at runtime. There is an +``is`` operator that can be used to check the object's type: + +.. code-block:: nimrod + + type + TPerson = object of TObject + name*: string # the * means that `name` is accessible from other modules + age: int # no * means that the field is hidden from other modules + + TStudent = object of TPerson # TStudent inherits from TPerson + id: int # with an id field + + var + student: TStudent + person: TPerson + assert(student is TStudent) # is true + +Object fields that should be visible from outside the defining module, have to +be marked by ``*``. In contrast to tuples, different object types are +never *equivalent*. New object types can only be defined within a type +section. + +Inheritance is done with the ``object of`` syntax. Multiple inheritance is +currently not supported. If an object type has no suitable ancestor, ``TObject`` +should be used as its ancestor, but this is only a convention. + +Note that aggregation (*has-a* relation) is often preferable to inheritance +(*is-a* relation) for simple code reuse. Since objects are value types in +Nimrod, aggregation is as efficient as inheritance. + + +Mutually recursive types +------------------------ + +Objects, tuples and references can model quite complex data structures which +depend on each other. This is called *mutually recursive types*. In Nimrod +these types need to be declared within a single type section. Anything else +would require arbitrary symbol lookahead which slows down compilation. + +Example: + +.. code-block:: nimrod + type + PNode = ref TNode # a traced reference to a TNode + TNode = object + le, ri: PNode # left and right subtrees + sym: ref TSym # leaves contain a reference to a TSym + + TSym = object # a symbol + name: string # the symbol's name + line: int # the line the symbol was declared in + code: PNode # the symbol's abstract syntax tree + + +Type conversions +---------------- +Nimrod distinguishes between `type casts`:idx: and `type conversions`:idx:. +Casts are done with the ``cast`` operator and force the compiler to +interpret a bit pattern to be of another type. + +Type conversions are a much more polite way to convert a type into another: +They preserve the abstract *value*, not necessarily the *bit-pattern*. If a +type conversion is not possible, the compiler complains or an exception is +raised. + +The syntax for type conversions is ``destination_type(expression_to_convert)`` +(like an ordinary call): + +.. code-block:: nimrod + proc getID(x: TPerson): int = + return TStudent(x).id + +The ``EInvalidObjectConversion`` exception is raised if ``x`` is not a +``TStudent``. + + +Object variants +--------------- +Often an object hierarchy is overkill in certain situations where simple +`variant`:idx: types are needed. + +An example: + +.. code-block:: nimrod + + # This is an example how an abstract syntax tree could be modelled in Nimrod + type + TNodeKind = enum # the different node types + nkInt, # a leaf with an integer value + nkFloat, # a leaf with a float value + nkString, # a leaf with a string value + nkAdd, # an addition + nkSub, # a subtraction + nkIf # an if statement + PNode = ref TNode + TNode = object + case kind: TNodeKind # the ``kind`` field is the discriminator + of nkInt: intVal: int + of nkFloat: floavVal: float + of nkString: strVal: string + of nkAdd, nkSub: + leftOp, rightOp: PNode + of nkIf: + condition, thenPart, elsePart: PNode + + var + n: PNode + new(n) # creates a new node + n.kind = nkFloat + n.floatVal = 0.0 # valid, because ``n.kind==nkFloat`` + + # the following statement raises an `EInvalidField` exception, because + # n.kind's value does not fit: + n.strVal = "" + +As can been seen from the example, an advantage to an object hierarchy is that +no conversion between different object types is needed. Yet, access to invalid +object fields raises an exception. + + +Methods +------- +In ordinary object oriented languages, procedures (also called *methods*) are +bound to a class. This has disadvantages: + +* Adding a method to a class the programmer has no control over is + impossible or needs ugly workarounds. +* Often it is unclear where the procedure should belong to: Is + ``join`` a string method or an array method? Should the complex + ``vertexCover`` algorithm really be a method of the ``graph`` class? + +Nimrod avoids these problems by not distinguishing between methods and +procedures. Methods are just ordinary procedures. However, there is a special +syntactic sugar for calling procedures: The syntax ``obj.method(args)`` can be +used instead of ``method(obj, args)``. If there are no remaining arguments, the +parentheses can be omitted: ``obj.len`` (instead of ``len(obj)``). + +This `method call syntax`:idx: is not restricted to objects, it can be used +for any type: + +.. code-block:: nimrod + + echo("abc".len) # is the same as echo(len("abc")) + echo("abc".toUpper()) + echo({'a', 'b', 'c'}.card) + stdout.writeln("Hallo") # the same as write(stdout, "Hallo") + +If it gives you warm fuzzy feelings, you can even write ``1.`+`(2)`` instead of +``1 + 2`` and claim that Nimrod is a pure object oriented language. (That +would not even be lying: *pure OO* has no meaning anyway. :-) + + +Properties +---------- +As the above example shows, Nimrod has no need for *get-properties*: +Ordinary get-procedures that are called with the *method call syntax* achieve +the same. But setting a value is different; for this a special setter syntax +is needed: + +.. code-block:: nimrod + + type + TSocket* = object of TObject + FHost: int # cannot be accessed from the outside of the module + # the `F` prefix is a convention to avoid clashes since + # the accessors are named `host` + + proc `host=`*(s: var TSocket, value: int) {.inline.} = + ## setter of hostAddr + s.FHost = value + + proc host*(s: TSocket): int {.inline.} = + ## getter of hostAddr + return s.FHost + + var + s: TSocket + s.host = 34 # same as `host=`(s, 34) + +(The example also shows ``inline`` procedures.) + + +The ``[]`` array access operator can be overloaded to provide +`array properties`:idx:\ : + +.. code-block:: nimrod + type + TVector* = object + x, y, z: float + + proc `[]=`* (v: var TVector, i: int, value: float) = + # setter + case i + of 0: v.x = value + of 1: v.y = value + of 2: v.z = value + else: assert(false) + + proc `[]`* (v: TVector, i: int): float = + # getter + case i + of 0: result = v.x + of 1: result = v.y + of 2: result = v.z + else: assert(false) + +The example is silly, since a vector is better modelled by a tuple which +already provides ``v[]`` access. + + +Dynamic binding +--------------- +In Nimrod procedural types are used to implement dynamic binding. The following +example also shows some more conventions: The ``self`` or ``this`` object +is named ``my`` (because it is shorter than the alternatives), each class +provides a constructor, etc. + +.. code-block:: nimrod + type + TFigure = object of TObject # abstract base class: + draw: proc (my: var TFigure) # concrete classes implement this proc + + proc init(f: var TFigure) = + f.draw = nil + + type + TCircle = object of TFigure + radius: int + + proc drawCircle(my: var TCircle) = echo("o " & $my.radius) + + proc init(my: var TCircle) = + init(TFigure(my)) # call base constructor + my.radius = 5 + my.draw = drawCircle + + type + TRectangle = object of TFigure + width, height: int + + proc drawRectangle(my: var TRectangle) = echo("[]") + + proc init(my: var TRectangle) = + init(TFigure(my)) # call base constructor + my.width = 5 + my.height = 10 + my.draw = drawRectangle + + # now use these classes: + var + r: TRectangle + c: TCircle + init(r) + init(c) + r.draw(r) + c.draw(c) + +The last line shows the syntactical difference between static and dynamic +binding: The ``r.draw(r)`` dynamic call refers to ``r`` twice. This difference +is not necessarily bad. But if you want to eliminate the somewhat redundant +``r``, it can be done by using *closures*: + +.. code-block:: nimrod + type + TFigure = object of TObject # abstract base class: + draw: proc () {.closure.} # concrete classes implement this proc + + proc init(f: var TFigure) = + f.draw = nil + + type + TCircle = object of TFigure + radius: int + + proc init(me: var TCircle) = + init(TFigure(me)) # call base constructor + me.radius = 5 + me.draw = lambda () = + echo("o " & $me.radius) + + type + TRectangle = object of TFigure + width, height: int + + proc init(me: var TRectangle) = + init(TFigure(me)) # call base constructor + me.width = 5 + me.height = 10 + me.draw = lambda () = + echo("[]") + + # now use these classes: + var + r: TRectangle + c: TCircle + init(r) + init(c) + r.draw() + c.draw() + +The example also introduces `lambda`:idx: expressions: A ``lambda`` expression +defines a new proc with the ``closure`` calling convention on the fly. + +`Version 0.7.4: Closures and lambda expressions are not implemented.`:red: + + +Exceptions +========== + +In Nimrod `exceptions`:idx: are objects. By convention, exception types are +prefixed with an 'E', not 'T'. The ``system`` module defines an exception +hierarchy that you should stick to. Reusing an existing exception type is +often better than defining a new exception type: It avoids a proliferation of +types. + +Exceptions should be allocated on the heap because their lifetime is unknown. + +A convention is that exceptions should be raised in *exceptional* cases: +For example, if a file cannot be opened, this should not raise an exception +since this is quite common (the file may have been deleted). + + +Raise statement +--------------- +Raising an exception is done with the ``raise`` statement: + +.. code-block:: nimrod + var + e: ref EOS + new(e) + e.msg = "the request to the OS failed" + raise e + +If the ``raise`` keyword is not followed by an expression, the last exception +is *re-raised*. + + +Try statement +------------- + +The `try`:idx: statement handles exceptions: + +.. code-block:: nimrod + # read the first two lines of a text file that should contain numbers + # and tries to add them + var + f: TFile + if openFile(f, "numbers.txt"): + try: + var a = readLine(f) + var b = readLine(f) + echo("sum: " & $(parseInt(a) + parseInt(b))) + except EOverflow: + echo("overflow!") + except EInvalidValue: + echo("could not convert string to integer") + except EIO: + echo("IO error!") + except: + echo("Unknown exception!") + # reraise the unknown exception: + raise + finally: + closeFile(f) + +The statements after the ``try`` are executed unless an exception is +raised. Then the appropriate ``except`` part is executed. + +The empty ``except`` part is executed if there is an exception that is +not explicitely listed. It is similiar to an ``else`` part in ``if`` +statements. + +If there is a ``finally`` part, it is always executed after the +exception handlers. + +The exception is *consumed* in an ``except`` part. If an exception is not +handled, it is propagated through the call stack. This means that often +the rest of the procedure - that is not within a ``finally`` clause - +is not executed (if an exception occurs). + + +Generics +======== + +`Version 0.7.4: Complex generic types like in the example do not work.`:red: + +`Generics`:idx: are Nimrod's means to parametrize procs, iterators or types +with `type parameters`:idx:. They are most useful for efficient type safe +containers: + +.. code-block:: nimrod + type + TBinaryTree[T] = object # TBinaryTree is a generic type with + # with generic param ``T`` + le, ri: ref TBinaryTree[T] # left and right subtrees; may be nil + data: T # the data stored in a node + PBinaryTree*[T] = ref TBinaryTree[T] # type that is exported + + proc newNode*[T](data: T): PBinaryTree[T] = + # constructor for a node + new(result) + result.dat = data + + proc add*[T](root: var PBinaryTree[T], n: PBinaryTree[T]) = + # insert a node into the tree + if root == nil: + root = n + else: + var it = root + while it != nil: + # compare the data items; uses the generic ``cmd`` proc that works for + # any type that has a ``==`` and ``<`` operator + var c = cmp(it.data, n.data) + if c < 0: + if it.le == nil: + it.le = n + return + it = it.le + else: + if it.ri == nil: + it.ri = n + return + it = it.ri + + proc add*[T](root: var PBinaryTree[T], data: T) = + # convenience proc: + add(root, newNode(data)) + + iterator preorder*[T](root: PBinaryTree[T]): T = + # Preorder traversal of a binary tree. + # Since recursive iterators are not yet implemented, + # this uses an explicit stack (which is more efficient anyway): + var stack: seq[PBinaryTree[T]] = @[root] + while stack.len > 0: + var n = stack[stack.len-1] + setLen(stack, stack.len-1) # pop `n` of the stack + while n != nil: + yield n + add(stack, n.ri) # push right subtree onto the stack + n = n.le # and follow the left pointer + + var + root: PBinaryTree[string] # instantiate a PBinaryTree with ``string`` + add(root, newNode("hallo")) # instantiates generic procs ``newNode`` and ``add`` + add(root, "world") # instantiates the second ``add`` proc + for str in preorder(root): + stdout.writeln(str) + +The example shows a generic binary tree. Depending on context, the brackets are +used either to introduce type parameters or to instantiate a generic proc, +iterator or type. As the example shows, generics work with overloading: The +best match of ``add`` is used. The built-in ``add`` procedure for sequences +is not hidden and used in the ``preorder`` iterator. + + +Templates +========= + +Templates are a simple substitution mechanism that operates on Nimrod's +abstract syntax trees. Templates are processed in the semantic pass of the +compiler. They integrate well with the rest of the language and share none +of C's preprocessor macros flaws. However, they may lead to code that is harder +to understand and maintain. So one should use them sparingly. + +To *invoke* a template, call it like a procedure. + +Example: + +.. code-block:: nimrod + template `!=` (a, b: expr): expr = + # this definition exists in the System module + not (a == b) + + assert(5 != 6) # the compiler rewrites that to: assert(not (5 == 6)) + +The ``!=``, ``>``, ``>=``, ``in``, ``notin``, ``isnot`` operators are in fact +templates: This has the benefit that if you overload the ``==`` operator, +the ``!=`` operator is available automatically and does the right thing. + +``a > b`` is transformed into ``b < a``. +``a in b`` is transformed into ``contains(b, a)``. +``notin`` and ``isnot`` have the obvious meanings. + +Templates are especially useful for lazy evaluation purposes. Consider a +simple proc for logging: + +.. code-block:: nimrod + const + debug = True + + proc log(msg: string) {.inline.} = + if debug: + stdout.writeln(msg) + + var + x = 4 + log("x has the value: " & $x) + +This code has a shortcoming: If ``debug`` is set to false someday, the quite +expensive ``$`` and ``&`` operations are still performed! (The argument +evaluation for procedures is said to be *eager*). + +Turning the ``log`` proc into a template solves this problem in an elegant way: + +.. code-block:: nimrod + const + debug = True + + template log(msg: expr): stmt = + if debug: + stdout.writeln(msg) + + var + x = 4 + log("x has the value: " & $x) + +The "types" of templates can be the symbols ``expr`` (stands for *expression*), +``stmt`` (stands for *statement*) or ``typedesc`` (stands for *type +description*). These are no real types, they just help the compiler parsing. + +The template body does not open a new scope. To open a new scope +use a ``block`` statement: + +.. code-block:: nimrod + template declareInScope(x: expr, t: typeDesc): stmt = + var x: t + + template declareInNewScope(x: expr, t: typeDesc): stmt = + # open a new scope: + block: + var x: t + + declareInScope(a, int) + a = 42 # works, `a` is known here + + declareInNewScope(b, int) + b = 42 # does not work, `b` is unknown + + +Macros +====== + +If the template mechanism scares you, you will be pleased to hear that +templates are not really necessary: Macros can do anything that templates can +do and much more. Macros are harder to write than templates and even harder +to get right :-). Now that you have been warned, lets see what a macro *is*. + +Macros enable advanced compile-time code tranformations, but they +cannot change Nimrod's syntax. However, this is no real restriction because +Nimrod's syntax is flexible enough anyway. + +`Macros`:idx: can be used to implement `domain specific languages`:idx:. + +To write macros, one needs to know how the Nimrod concrete syntax is converted +to an abstract syntax tree (AST). (Unfortunately the AST is not documented yet.) + +There are two ways to invoke a macro: +(1) invoking a macro like a procedure call (`expression macros`:idx:) +(2) invoking a macro with the special ``macrostmt`` syntax (`statement macros`:idx:) + + +Expression Macros +----------------- + +The following example implements a powerful ``debug`` command that accepts a +variable number of arguments (this cannot be done with templates): + +.. code-block:: nimrod + # to work with Nimrod syntax trees, we need an API that is defined in the + # ``macros`` module: + import macros + + macro debug(n: expr): stmt = + # `n` is a Nimrod AST that contains the whole macro expression + # this macro returns a list of statements: + result = newNimNode(nnkStmtList, n) + # iterate over any argument that is passed to this macro: + for i in 1..n.len-1: + # add a call to the statement list that writes the expression; + # `toStrLit` converts an AST to its string representation: + result.add(newCall("write", newIdentNode("stdout"), toStrLit(n[i]))) + # add a call to the statement list that writes ": " + result.add(newCall("write", newIdentNode("stdout"), newStrLitNode(": "))) + # add a call to the statement list that writes the expressions value: + result.add(newCall("writeln", newIdentNode("stdout"), n[i])) + + var + a: array[0..10, int] + x = "some string" + a[0] = 42 + a[1] = 45 + + debug(a[0], a[1], x) + +The macro call expands to: + +.. code-block:: nimrod + write(stdout, "a[0]") + write(stdout, ": ") + writeln(stdout, a[0]) + + write(stdout, "a[1]") + write(stdout, ": ") + writeln(stdout, a[1]) + + write(stdout, "x") + write(stdout, ": ") + writeln(stdout, x) + + +Lets return to the dynamic binding ``r.draw(r)`` notational "problem". Apart +from closures, there is another "solution": Define an infix ``!`` macro +operator which hides it: + +.. code-block:: + + macro `!` (n: expr): expr = + result = newNimNode(nnkCall, n) + var dot = newNimNode(nnkDotExpr, n) + dot.add(n[1]) # obj + if n[2].kind == nnkCall: + # transforms ``obj!method(arg1, arg2, ...)`` to + # ``(obj.method)(obj, arg1, arg2, ...)`` + dot.add(n[2][0]) # method + result.add(dot) + result.add(n[1]) # obj + for i in 1..n[2].len-1: + result.add(n[2][i]) + else: + # transforms ``obj!method`` to + # ``(obj.method)(obj)`` + dot.add(n[2]) # method + result.add(dot) + result.add(n[1]) # obj + + r!draw(a, b, c) # will be transfomed into ``r.draw(r, a, b, c)`` + +Great! 20 lines of complex code to safe a few keystrokes! Obviously, this is +exactly you should not do! (But it makes a cool example.) + + +Statement Macros +---------------- + +Statement macros are defined just as expression macros. However, they are +invoked by an expression following a colon. + +The following example outlines a macro that generates a lexical analyser from +regular expressions: + +.. code-block:: nimrod + + macro case_token(n: stmt): stmt = + # creates a lexical analyser from regular expressions + # ... (implementation is an exercise for the reader :-) + nil + + case_token: # this colon tells the parser it is a macro statement + of r"[A-Za-z_]+[A-Za-z_0-9]*": + return tkIdentifier + of r"0-9+": + return tkInteger + of r"[\+\-\*\?]+": + return tkOperator + else: + return tkUnknown + + |