author Andreas Rumpf <rumpf_a@web.de> 2009-01-07 17:03:25 +0100
committer Andreas Rumpf <rumpf_a@web.de> 2009-01-07 17:03:25 +0100
commit 439aa2d04d5528b5aed288f70895515d1da2dc3d
tree cda2d0bc4d4f2bab189c4a0567cae3c1428c5ed0 /doc/intern.txt
parent 1c8ddca7e08af9075a930edaca6c522d5e6fd8b5
version 0.7.4
Diffstat (limited to 'doc/intern.txt')
-rw-r--r-- doc/intern.txt 187
1 files changed, 71 insertions, 116 deletions
diff --git a/doc/intern.txt b/doc/intern.txt
index 4d65c1e55..6496e0e29 100644
--- a/doc/intern.txt
+++ b/doc/intern.txt
@@ -34,7 +34,6 @@ Path           Purpose
                on it!
 ``web``        website of Nimrod; generated by ``koch.py``
                from the ``*.txt`` and ``*.tmpl`` files
-``koch``       the Koch Build System (written for Nimrod)
 ``obj``        generated ``*.obj`` files go into here
 ============   ==============================================
 
@@ -45,44 +44,76 @@ Bootstrapping the compiler
 The compiler is written in a subset of Pascal with special annotations so
 that it can be translated to Nimrod code automatically. This conversion is
 done by Nimrod itself via the undocumented ``boot`` command. Thus both Nimrod
-and Free Pascal can compile the Nimrod compiler.
+and Free Pascal can compile the Nimrod compiler. However, the Pascal version
+has no garbage collector and leaks memory like crazy! So the Pascal version
+should only be used for bootstrapping.
 
 Requirements for bootstrapping:
 
-- Free Pascal (I used version 2.2) [optional]
-- Python (should work with version 1.5 or higher)
+- Python (should work with version 1.5 or higher) (optional)
+- supported C compiler
 
-- C compiler -- one of:
+Compiling the compiler is a simple matter of running::
 
-  * win32-lcc (currently broken)
-  * Borland C++ (tested with 5.5; currently broken)
-  * Microsoft C++
-  * Digital Mars C++
-  * Watcom C++ (currently broken)
-  * GCC
-  * Intel C++
-  * Pelles C (currently broken)
-  * llvm-gcc
+  koch.py boot
 
-| Compiling the compiler is a simple matter of running:
-| ``koch.py boot``
-| Or you can compile by hand, this is not difficult.
+For a release version use::
 
-If you want to debug the compiler, use the command::
+  koch.py boot -d:release
 
-  koch.py boot --debugger:on
+The ``koch.py`` script is Nimrod's maintenance script. It is a replacement for
+make and shell scripting with the advantage that it is much more portable.
 
-The ``koch.py`` script is Nimrod's maintainance script: Everything that has 
-been automated is accessible with it. It is a replacement for make and shell
-scripting with the advantage that it is more portable.
+If you don't have Python, there is a ``boot`` Nimrod program which does roughly
+the same::
 
+  nimrod cc boot.nim
+  ./boot [-d:release]
 
-Coding standards
-================
 
-The compiler is written in a subset of Pascal with special annotations so
-that it can be translated to Nimrod code automatically. As a general rule,
-Pascal code that does not translate to Nimrod automatically is forbidden.
+Pascal annotations
+==================
+There are some annotations that the Pascal sources use so that they can
+be converted to Nimrod automatically:
+
+``{@discard} <expr>``
+    Tells the compiler that a ``discard`` statement is needed for Nimrod
+    here.
+
+``{@cast}typ(expr)``
+    Tells the compiler that the Pascal conversion is a ``cast`` in Nimrod.
+
+``{@emit <code>}``
+    Emits ``<code>``. The code fragment needs to be in Pascal syntax.
+
+``{@ignore} <codeA> {@emit <codeB>}``
+    Ignores ``<codeA>`` and instead emits ``<codeB>`` which needs to be in
+    Pascal syntax. An empty ``{@emit}`` is possible too (it then only closes
+    the ``<codeA>`` part).
+
+``record {@tuple}``
+    Is used to tell the compiler that the record type should be transformed
+    to a Nimrod tuple type.
+
+``^ {@ptr}``
+    Is used to tell the compiler that the pointer type should be transformed
+    to a Nimrod ``ptr`` type. The default is a ``ref`` type.
+
+``'a' + ''``
+    The idiom ``+''`` is used to tell the compiler that it is a string
+    literal and not a character literal. (Pascal does not distinguish between
+    character literals and string literals of length 1.)
+
+``+{&}``
+    This tells the compiler that Pascal's ``+`` here is a string concatenation
+    and thus should be converted to ``&``. Note that this is not needed if
+    any of the operands is a string literal because the compiler then can
+    figure this out by itself.
+
+``{@set}['a', 'b', 'c']``
+    Tells the compiler that Pascal's ``[]`` constructor is a set and not an
+    array. This is only needed if the compiler cannot figure this out for
+    itself.
 
 
 Porting to new platforms
@@ -99,7 +130,7 @@ check that the OS, System modules work and recompile Nimrod.
 The only case where things aren't as easy is when the garbage
 collector needs some assembler tweaking to work. The standard
 version of the GC uses C's ``setjmp`` function to store all registers
-on the hardware stack. It may be that the new platform needs to
+on the hardware stack. On a new platform it may be necessary to
 replace this generic code by some assembler code.
 
 
@@ -132,11 +163,11 @@ The Garbage Collector
 Introduction
 ------------
 
-We use the term *cell* here to refer to everything that is traced
+I use the term *cell* here to refer to everything that is traced
 (sequences, refs, strings).
 This section describes how the new GC works.
 
-The basic algorithm is *Deferrent reference counting* with cycle detection.
+The basic algorithm is *Deferred Reference Counting* with cycle detection.
 References in the stack are not counted for better performance and easier C
 code generation.
 
@@ -170,7 +201,7 @@ modifying a ``TCellSet`` during traversation leads to undefined behaviour.
   iterator elements(s: TCellSet): (elem: PCell)
 
 
-All the operations have to be perform efficiently. Because a Cellset can
+All the operations have to perform efficiently. Because a Cellset can
 become huge a hash table alone is not suitable for this.
 
 We use a mixture of bitset and hash table for this. The hash table maps *pages*
@@ -246,16 +277,10 @@ This syntax tree is the interface between the parser and the code generator.
 It is essential to understand most of the compiler's code.
 
 In order to compile Nimrod correctly, type-checking has to be seperated from
-parsing. Otherwise generics would not work. Code generation is done for a 
-whole module only after it has been checked for semantics.
+parsing. Otherwise generics would not work.
 
 .. include:: filelist.txt
 
-The first command line argument selects the backend. Thus the backend is
-responsible for calling the parser and semantic checker. However, when 
-compiling ``import`` or ``include`` statements, the semantic checker needs to
-call the backend, this is done by embedding a PBackend into a TContext.
-
 
 The syntax tree
 ---------------
@@ -265,7 +290,7 @@ may contain cycles. The AST changes its shape after semantic checking. This
 is needed to make life easier for the code generators. See the "ast" module
 for the type definitions.
 
-We use the notation ``nodeKind(fields, [sons])`` for describing
+I use the notation ``nodeKind(fields, [sons])`` for describing
 nodes. ``nodeKind[sons]`` is a short-cut for ``nodeKind([sons])``.
 XXX: Description of the language's syntax and the corresponding trees.
 
@@ -273,12 +298,16 @@ XXX: Description of the language's syntax and the corresponding trees.
 How the RTL is compiled
 =======================
 
-The system module contains the part of the RTL which needs support by
+The ``system`` module contains the part of the RTL which needs support by
 compiler magic (and the stuff that needs to be in it because the spec
 says so). The C code generator generates the C code for it just like any other
 module. However, calls to some procedures like ``addInt`` are inserted by
-the CCG. Therefore the module ``magicsys`` contains a table
-(``compilerprocs``) with all symbols that are marked as ``compilerproc``.
+the CCG. Therefore the module ``magicsys`` contains a table (``compilerprocs``)
+with all symbols that are marked as ``compilerproc``. ``compilerprocs`` are
+needed by the code generator. A ``magic`` proc is not the same as a
+``compilerproc``: A ``magic`` is a proc that needs compiler magic for its
+semantic checking; a ``compilerproc`` is a proc that is used by the code
+generator.
 
 
 
@@ -290,77 +319,3 @@ underlying C compiler already does all the hard work for us. The problem is the
 common runtime library, especially the memory manager. Note that Borland's
 Delphi had exactly the same problem. The workaround is to not link the GC with
 the Dll and provide an extra runtime dll that needs to be initialized.
-
-
-
-How to implement closures
-=========================
-
-A closure is a record of a proc pointer and a context ref. The context ref
-points to a garbage collected record that contains the needed variables.
-An example:
-
-.. code-block:: Nimrod
-
-  type
-    TListRec = record
-      data: string
-      next: ref TListRec
-
-  proc forEach(head: ref TListRec, visitor: proc (s: string) {.closure.}) =
-    var it = head
-    while it != nil:
-      visit(it.data)
-      it = it.next
-
-  proc sayHello() =
-    var L = new List(["hallo", "Andreas"])
-    var temp = "jup\xff"
-    forEach(L, lambda(s: string) =
-                 io.write(temp)
-                 io.write(s)
-           )
-
-
-This should become the following in C:
-
-.. code-block:: C
-  typedef struct ... /* List type */
-
-  typedef struct closure {
-    void (*PrcPart)(string, void*);
-    void* ClPart;
-  }
-
-  typedef struct Tcl_data {
-    string temp; // all accessed variables are put in here!
-  }
-
-  void forEach(TListRec* head, const closure visitor) {
-    TListRec* it = head;
-    while (it != NIM_NULL) {
-      visitor.prc(it->data, visitor->cl_data);
-      it = it->next;
-    }
-  }
-
-  void printStr(string s, void* cl_data) {
-    Tcl_data* x = (Tcl_data*) cl_data;
-    io_write(x->temp);
-    io_write(s);
-  }
-
-  void sayhello() {
-    Tcl_data* data = new(...);
-    asgnRef(&data->temp, "jup\xff");
-    ...
-
-    closure cl;
-    cl.prc = printStr;
-    cl.cl_data = data;
-    foreach(L, cl);
-  }
-
-
-What about nested closure? - There's not much difference: Just put all used
-variables in the data record.
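
The closure section removed by this diff describes a closure as a pair of a proc pointer and a context ref, but its C snippet does not compile as written (the struct fields are named ``PrcPart`` and ``ClPart`` while the uses refer to ``prc`` and ``cl_data``). The following is a minimal compilable sketch of the same idea, not the compiler's actual output. All names (``Closure``, ``Env``, ``printStr``, the ``out`` buffer) are hypothetical, ``calloc`` stands in for the garbage-collected allocation, and the captured text is appended to a buffer instead of being written with ``io.write`` so that the result can be inspected.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names throughout; an illustration only. */
typedef struct ListRec {
  const char *data;
  struct ListRec *next;
} ListRec;

/* A closure is a pair: code pointer plus environment pointer. */
typedef struct Closure {
  void (*prc)(const char *s, void *env);
  void *env;
} Closure;

/* Environment record: every captured variable is stored here. */
typedef struct Env {
  const char *temp; /* the captured local variable */
  char out[64];     /* assumption: collects output instead of io.write */
} Env;

/* Calls the closure for every list element, passing its environment. */
static void forEach(const ListRec *head, Closure visitor) {
  assert(visitor.prc != NULL);
  for (const ListRec *it = head; it != NULL; it = it->next)
    visitor.prc(it->data, visitor.env);
}

/* Lifted body of the anonymous proc: captured variables are read
   through the environment pointer. */
static void printStr(const char *s, void *env) {
  Env *e = (Env *)env;
  strncat(e->out, e->temp, sizeof e->out - strlen(e->out) - 1);
  strncat(e->out, s, sizeof e->out - strlen(e->out) - 1);
}

/* Builds the environment, packs it with the code pointer and runs
   the traversal. The caller owns the returned record. */
static Env *sayHello(void) {
  static ListRec second = {"Andreas", NULL};
  static ListRec first = {"hallo", &second};
  Env *env = calloc(1, sizeof *env); /* stands in for a GC allocation */
  if (env == NULL) return NULL;
  env->temp = "jup ";
  Closure cl = {printStr, env};
  forEach(&first, cl);
  return env;
}
```

Calling ``sayHello()`` returns the environment record; its ``out`` field then holds the captured ``temp`` string followed by each list element in turn. A nested closure would, as the removed text says, simply add its variables to the same record.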