field | value | date |
---|---|---|
author | Andreas Rumpf <rumpf_a@web.de> | 2009-01-07 17:03:25 +0100 |
committer | Andreas Rumpf <rumpf_a@web.de> | 2009-01-07 17:03:25 +0100 |
commit | 439aa2d04d5528b5aed288f70895515d1da2dc3d | |
tree | cda2d0bc4d4f2bab189c4a0567cae3c1428c5ed0 /doc/intern.txt | |
parent | 1c8ddca7e08af9075a930edaca6c522d5e6fd8b5 | |
version 0.7.4
Diffstat (limited to 'doc/intern.txt')
-rw-r--r-- | doc/intern.txt | 187 |
1 files changed, 71 insertions, 116 deletions
diff --git a/doc/intern.txt b/doc/intern.txt
index 4d65c1e55..6496e0e29 100644
--- a/doc/intern.txt
+++ b/doc/intern.txt
@@ -34,7 +34,6 @@ Path Purpose
 on it!
 ``web`` website of Nimrod; generated by ``koch.py`` from the ``*.txt`` and ``*.tmpl`` files
-``koch`` the Koch Build System (written for Nimrod)
 ``obj`` generated ``*.obj`` files go into here
 ============ ==============================================
@@ -45,44 +44,76 @@ Bootstrapping the compiler
 The compiler is written in a subset of Pascal with special annotations so that it can be translated to Nimrod code automatically. This conversion is done by Nimrod itself via the undocumented ``boot`` command. Thus both Nimrod
-and Free Pascal can compile the Nimrod compiler.
+and Free Pascal can compile the Nimrod compiler. However, the Pascal version
+has no garbage collector and leaks memory like crazy! So the Pascal version
+should only be used for bootstrapping.
 Requirements for bootstrapping:
-- Free Pascal (I used version 2.2) [optional]
-- Python (should work with version 1.5 or higher)
+- Python (should work with version 1.5 or higher) (optional)
+- supported C compiler
-- C compiler
-- one of:
+Compiling the compiler is a simple matter of running::
- * win32-lcc (currently broken)
- * Borland C++ (tested with 5.5; currently broken)
- * Microsoft C++
- * Digital Mars C++
- * Watcom C++ (currently broken)
- * GCC
- * Intel C++
- * Pelles C (currently broken)
- * llvm-gcc
+ koch.py boot
-| Compiling the compiler is a simple matter of running:
-| ``koch.py boot``
-| Or you can compile by hand, this is not difficult.
+For a release version use::
-If you want to debug the compiler, use the command::
+ koch.py boot -d:release
- koch.py boot --debugger:on
+The ``koch.py`` script is Nimrod's maintainance script. It is a replacement for
+make and shell scripting with the advantage that it is much more portable.
-The ``koch.py`` script is Nimrod's maintainance script: Everything that has
-been automated is accessible with it. It is a replacement for make and shell
-scripting with the advantage that it is more portable.
+If you don't have Python, there is a ``boot`` Nimrod program which does roughly
+the same::
+ nimrod cc boot.nim
+ ./boot [-d:release]
-Coding standards
-================
-The compiler is written in a subset of Pascal with special annotations so
-that it can be translated to Nimrod code automatically. As a general rule,
-Pascal code that does not translate to Nimrod automatically is forbidden.
+Pascal annotations
+==================
+There are some annotations that the Pascal sources use so that they can
+be converted to Nimrod automatically:
+
+``{@discard} <expr>``
+ Tells the compiler that a ``discard`` statement is needed for Nimrod
+ here.
+
+``{@cast}typ(expr)``
+ Tells the compiler that the Pascal conversion is a ``cast`` in Nimrod.
+
+``{@emit <code>}``
+ Emits ``<code>``. The code fragment needs to be in Pascal syntax.
+
+``{@ignore} <codeA> {@emit <codeB>}``
+ Ignores ``<codeA>`` and instead emits ``<codeB>`` which needs to be in
+ Pascal syntax. An empty ``{@emit}`` is possible too (it then only closes
+ the ``<codeA>`` part).
+
+``record {@tuple}``
+ Is used to tell the compiler that the record type should be transformed
+ to a Nimrod tuple type.
+
+``^ {@ptr}``
+ Is used to tell the compiler that the pointer type should be transformed
+ to a Nimrod ``ptr`` type. The default is a ``ref`` type.
+
+``'a' + ''``
+ The idiom ``+''`` is used to tell the compiler that it is a string
+ literal and not a character literal. (Pascal does not distinguish between
+ character literals and string literals of length 1.)
+
+``+{&}``
+ This tells the compiler that Pascal's ``+`` here is a string concatenation
+ and thus should be converted to ``&``. Note that this is not needed if
+ any of the operands is a string literal because the compiler then can
+ figure this out by itself.
+
+``{@set}['a', 'b', 'c']``
+ Tells the compiler that Pascal's ``[]`` constructor is a set and not an
+ array. This is only needed if the compiler cannot figure this out for
+ itself.
 Porting to new platforms
@@ -99,7 +130,7 @@ check that the OS, System modules work and recompile Nimrod.
 The only case where things aren't as easy is when the garbage collector needs some assembler tweaking to work. The standard version of the GC uses C's ``setjmp`` function to store all registers
-on the hardware stack. It may be that the new platform needs to
+on the hardware stack. It may be necessary that the new platform needs to
 replace this generic code by some assembler code.
@@ -132,11 +163,11 @@ The Garbage Collector
 Introduction
 ------------
-We use the term *cell* here to refer to everything that is traced
+I use the term *cell* here to refer to everything that is traced
 (sequences, refs, strings). This section describes how the new GC works.
-The basic algorithm is *Deferrent reference counting* with cycle detection.
+The basic algorithm is *Deferrent Reference Counting* with cycle detection.
 References in the stack are not counted for better performance and easier C code generation.
@@ -170,7 +201,7 @@ modifying a ``TCellSet`` during traversation leads to undefined behaviour.
 iterator elements(s: TCellSet): (elem: PCell)
-All the operations have to be perform efficiently. Because a Cellset can
+All the operations have to perform efficiently. Because a Cellset can
 become huge a hash table alone is not suitable for this.
 We use a mixture of bitset and hash table for this. The hash table maps *pages*
@@ -246,16 +277,10 @@ This syntax tree is the interface between the parser and the code generator.
 It is essential to understand most of the compiler's code.
 In order to compile Nimrod correctly, type-checking has to be seperated from
-parsing. Otherwise generics would not work. Code generation is done for a
-whole module only after it has been checked for semantics.
+parsing. Otherwise generics would not work.
 .. include:: filelist.txt
-The first command line argument selects the backend. Thus the backend is
-responsible for calling the parser and semantic checker. However, when
-compiling ``import`` or ``include`` statements, the semantic checker needs to
-call the backend, this is done by embedding a PBackend into a TContext.
-
 The syntax tree
 ---------------
@@ -265,7 +290,7 @@ may contain cycles.
 The AST changes its shape after semantic checking. This is needed to make life easier for the code generators. See the "ast" module for the type definitions.
-We use the notation ``nodeKind(fields, [sons])`` for describing
+I use the notation ``nodeKind(fields, [sons])`` for describing
 nodes. ``nodeKind[sons]`` is a short-cut for ``nodeKind([sons])``.
 XXX: Description of the language's syntax and the corresponding trees.
@@ -273,12 +298,16 @@ XXX: Description of the language's syntax and the corresponding trees.
 How the RTL is compiled
 =======================
-The system module contains the part of the RTL which needs support by
+The ``system`` module contains the part of the RTL which needs support by
 compiler magic (and the stuff that needs to be in it because the spec says so). The C code generator generates the C code for it just like any other module. However, calls to some procedures like ``addInt`` are inserted by
-the CCG. Therefore the module ``magicsys`` contains a table
-(``compilerprocs``) with all symbols that are marked as ``compilerproc``.
+the CCG. Therefore the module ``magicsys`` contains a table (``compilerprocs``)
+with all symbols that are marked as ``compilerproc``. ``compilerprocs`` are
+needed by the code generator. A ``magic`` proc is not the same as a
+``compilerproc``: A ``magic`` is a proc that needs compiler magic for its
+semantic checking, a ``compilerproc`` is a proc that is used by the code
+generator.
@@ -290,77 +319,3 @@ underlying C compiler already does all the hard work for us.
 The problem is the common runtime library, especially the memory manager. Note that Borland's Delphi had exactly the same problem. The workaround is to not link the GC with the Dll and provide an extra runtime dll that needs to be initialized.
-
-
-
-How to implement closures
-=========================
-
-A closure is a record of a proc pointer and a context ref. The context ref
-points to a garbage collected record that contains the needed variables.
-An example:
-
-.. code-block:: Nimrod
-
- type
- TListRec = record
- data: string
- next: ref TListRec
-
- proc forEach(head: ref TListRec, visitor: proc (s: string) {.closure.}) =
- var it = head
- while it != nil:
- visit(it.data)
- it = it.next
-
- proc sayHello() =
- var L = new List(["hallo", "Andreas"])
- var temp = "jup\xff"
- forEach(L, lambda(s: string) =
- io.write(temp)
- io.write(s)
- )
-
-
-This should become the following in C:
-
-.. code-block:: C
- typedef struct ... /* List type */
-
- typedef struct closure {
- void (*PrcPart)(string, void*);
- void* ClPart;
- }
-
- typedef struct Tcl_data {
- string temp; // all accessed variables are put in here!
- }
-
- void forEach(TListRec* head, const closure visitor) {
- TListRec* it = head;
- while (it != NIM_NULL) {
- visitor.prc(it->data, visitor->cl_data);
- it = it->next;
- }
- }
-
- void printStr(string s, void* cl_data) {
- Tcl_data* x = (Tcl_data*) cl_data;
- io_write(x->temp);
- io_write(s);
- }
-
- void sayhello() {
- Tcl_data* data = new(...);
- asgnRef(&data->temp, "jup\xff");
- ...
-
- closure cl;
- cl.prc = printStr;
- cl.cl_data = data;
- foreach(L, cl);
- }
-
-
-What about nested closure?
- There's not much difference: Just put all used
-variables in the data record.
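This commit removes the "How to implement closures" section. That section describes a closure as a pair of a proc pointer and a context pointer to a record holding the captured variables. Its C translation does not compile as written: it declares `PrcPart`/`ClPart` but uses `prc`/`cl_data`, and it applies `visitor->cl_data` to a struct passed by value. Below is a minimal compilable sketch of the same layout. The names `Closure`, `ListNode` and `HelloEnv` are hypothetical. The output buffer that replaces `io.write` is an assumption made so the result can be inspected. The environment is owned by the caller here, whereas the removed text makes it a garbage collected record.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Closure layout from the removed section: a proc pointer plus an opaque
   pointer to the record that holds the captured variables. */
typedef struct Closure {
  void (*prc)(const char *s, void *env); /* proc part */
  void *env;                              /* context part */
} Closure;

/* Singly linked list standing in for TListRec (hypothetical name). */
typedef struct ListNode {
  const char *data;
  struct ListNode *next;
} ListNode;

/* Environment record for the lambda in sayHello: the captured variable
   `temp`, plus an output buffer (assumption) that replaces io.write. */
typedef struct HelloEnv {
  const char *temp;
  char out[128];
  size_t len;
} HelloEnv;

/* forEach knows nothing about the environment's layout; it only hands the
   context pointer back to the proc part. */
static void forEach(const ListNode *head, Closure visitor) {
  for (const ListNode *it = head; it != NULL; it = it->next)
    visitor.prc(it->data, visitor.env);
}

/* The lifted lambda body: casts the context pointer back to its record. */
static void printStr(const char *s, void *env) {
  HelloEnv *e = (HelloEnv *)env;
  size_t room = sizeof e->out - e->len;
  int n = snprintf(e->out + e->len, room, "%s%s ", e->temp, s);
  assert(n >= 0 && (size_t)n < room); /* the sketch does not handle overflow */
  e->len += (size_t)n;
}

/* Builds the list and the closure; the caller owns the environment. */
static void sayHello(HelloEnv *env) {
  ListNode second = {"Andreas", NULL};
  ListNode first = {"hallo", &second};
  memset(env, 0, sizeof *env);
  env->temp = "jup ";
  Closure cl = {printStr, env};
  forEach(&first, cl);
}
```

After `HelloEnv e; sayHello(&e);`, `e.out` holds `"jup hallo jup Andreas "`. Because `forEach` only ever sees a `void *`, lambdas with different sets of captured variables all fit the same `Closure` type. As the removed text notes for nested closures, more captures only mean more fields in the environment record.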
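The GC section of the diff says a ``TCellSet`` combines a bitset with a hash table that maps *pages*, but the shown context ends there. What follows is a sketch under stated assumptions, not the collector's actual code. The table maps a page number to a page descriptor with one bit per possible cell address in that page. The 4 KiB page size, the 8-byte cell alignment and the fixed-size chained table are assumed values. All names (`CellSet`, `PageDesc`, `cellSetIncl` and so on) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumed parameters: 4 KiB pages, cells aligned to 8 bytes. */
enum {
  PAGE_SHIFT = 12,
  PAGE_SIZE = 1 << PAGE_SHIFT,
  CELL_ALIGN = 8,
  BITS_PER_PAGE = PAGE_SIZE / CELL_ALIGN,
  WORDS_PER_PAGE = BITS_PER_PAGE / 64,
  TABLE_SIZE = 256 /* fixed for the sketch; a real table would grow */
};

/* Page descriptor: one bit for every cell-aligned address in the page. */
typedef struct PageDesc {
  uintptr_t key;                  /* page number: address >> PAGE_SHIFT */
  uint64_t bits[WORDS_PER_PAGE];
  struct PageDesc *next;          /* collision chain */
} PageDesc;

typedef struct CellSet {
  PageDesc *buckets[TABLE_SIZE];  /* hash table: page number -> descriptor */
} CellSet;

static void cellSetInit(CellSet *s) { memset(s, 0, sizeof *s); }

static PageDesc *findPage(CellSet *s, uintptr_t key, int create) {
  size_t h = (size_t)(key % TABLE_SIZE);
  for (PageDesc *p = s->buckets[h]; p != NULL; p = p->next)
    if (p->key == key) return p;
  if (!create) return NULL;
  PageDesc *p = calloc(1, sizeof *p);
  assert(p != NULL);
  p->key = key;
  p->next = s->buckets[h];
  s->buckets[h] = p;
  return p;
}

/* Position of a cell's bit inside its page descriptor. */
static size_t bitIndex(uintptr_t a) {
  assert(a % CELL_ALIGN == 0);
  return (size_t)((a & (PAGE_SIZE - 1)) / CELL_ALIGN);
}

static void cellSetIncl(CellSet *s, const void *cell) {
  uintptr_t a = (uintptr_t)cell;
  size_t b = bitIndex(a);
  PageDesc *p = findPage(s, a >> PAGE_SHIFT, 1);
  p->bits[b / 64] |= (uint64_t)1 << (b % 64);
}

static void cellSetExcl(CellSet *s, const void *cell) {
  uintptr_t a = (uintptr_t)cell;
  size_t b = bitIndex(a);
  PageDesc *p = findPage(s, a >> PAGE_SHIFT, 0);
  if (p != NULL) p->bits[b / 64] &= ~((uint64_t)1 << (b % 64));
}

static int cellSetContains(CellSet *s, const void *cell) {
  uintptr_t a = (uintptr_t)cell;
  size_t b = bitIndex(a);
  PageDesc *p = findPage(s, a >> PAGE_SHIFT, 0);
  return p != NULL && ((p->bits[b / 64] >> (b % 64)) & 1u);
}

/* Counterpart of `iterator elements`: rebuilds every address from its page
   number and bit index. `f` must not modify the set. */
static void cellSetForEach(CellSet *s, void (*f)(void *cell, void *ctx),
                           void *ctx) {
  for (size_t h = 0; h < TABLE_SIZE; h++)
    for (PageDesc *p = s->buckets[h]; p != NULL; p = p->next)
      for (size_t b = 0; b < BITS_PER_PAGE; b++)
        if ((p->bits[b / 64] >> (b % 64)) & 1u)
          f((void *)((p->key << PAGE_SHIFT) + b * CELL_ALIGN), ctx);
}

static void cellSetFree(CellSet *s) {
  for (size_t h = 0; h < TABLE_SIZE; h++) {
    PageDesc *p = s->buckets[h];
    while (p != NULL) {
      PageDesc *next = p->next;
      free(p);
      p = next;
    }
    s->buckets[h] = NULL;
  }
}
```

Each membership test touches one hash bucket and one bit. Cells on the same page share a single descriptor, so the set stays much smaller than a table that hashes every cell individually. As the text warns for ``TCellSet``, the set must not be modified while `cellSetForEach` walks it.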
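The GC introduction names deferred reference counting, with references from the stack left uncounted. One common way to make that safe is a zero count table (ZCT); this mechanism is an assumption and is not taken from the diff. A cell whose heap count drops to zero is only queued. At collection time it is freed if no stack root points to it. The sketch borrows the name `asgnRef` from the removed C snippet, but its body is assumed. The single `child` field and the explicit root array standing in for a stack scan are also simplifications. Cycle detection is not covered.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* A traced cell. `rc` counts references from other heap cells only;
   references held in stack variables are deliberately not counted. */
typedef struct Cell {
  long rc;
  int inZct;             /* already queued in the zero count table? */
  struct Cell *child;    /* one traced field keeps the sketch small */
} Cell;

enum { ZCT_CAP = 1024 };
static Cell *zct[ZCT_CAP];   /* cells whose heap count is zero */
static size_t zctLen = 0;
static size_t freedCells = 0;

static void addZct(Cell *c) {
  if (!c->inZct) {
    assert(zctLen < ZCT_CAP);
    c->inZct = 1;
    zct[zctLen++] = c;
  }
}

/* A fresh cell has rc == 0: at most the stack refers to it. */
static Cell *newCell(void) {
  Cell *c = calloc(1, sizeof *c);
  assert(c != NULL);
  addZct(c);
  return c;
}

/* A zero count does not mean garbage yet: the stack may still point to
   the cell, so it is only queued. */
static void decRef(Cell *c) {
  if (c != NULL && --c->rc == 0) addZct(c);
}

/* Write barrier for stores into heap cells (hypothetical body); plain
   stack variables are assigned without it. Incrementing first keeps
   self-assignment safe. */
static void asgnRef(Cell **dest, Cell *src) {
  if (src != NULL) src->rc++;
  decRef(*dest);
  *dest = src;
}

static int onStack(const Cell *c, Cell *const *roots, size_t nroots) {
  for (size_t i = 0; i < nroots; i++)
    if (roots[i] == c) return 1;
  return 0;
}

/* Frees queued cells that are neither heap-referenced nor among the given
   stack roots; a real collector would find the roots by scanning the stack. */
static void collectZct(Cell *const *roots, size_t nroots) {
  size_t i = 0;
  while (i < zctLen) {
    Cell *c = zct[i];
    if (c->rc > 0) {                 /* referenced from the heap again */
      c->inZct = 0;
      zct[i] = zct[--zctLen];
    } else if (!onStack(c, roots, nroots)) {
      zct[i] = zct[--zctLen];
      decRef(c->child);              /* may queue the child */
      free(c);
      freedCells++;
    } else {
      i++;                           /* still live via the stack */
    }
  }
}
```

Suppose `a` is on the stack and `a->child = b` was stored through `asgnRef`. A first collection frees nothing: `a` is a root and `b` has a heap count of 1. Once `a` is no longer among the roots, a collection frees `a`. That drops `b`'s count to zero, and `b` is freed in the same pass.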