author    | Andreas Rumpf <andreas@andi> | 2008-06-22 16:14:11 +0200
committer | Andreas Rumpf <andreas@andi> | 2008-06-22 16:14:11 +0200
commit    | 405b86068e6a3d39970b9129ceec0a9108464b28 (patch)
tree      | c0449946f54baae6ea88baf453157ddd7faa8f86 /doc/intern.txt
download  | Nim-405b86068e6a3d39970b9129ceec0a9108464b28.tar.gz
Initial import
Diffstat (limited to 'doc/intern.txt')

-rwxr-xr-x | doc/intern.txt | 575
1 file changed, 575 insertions, 0 deletions
diff --git a/doc/intern.txt b/doc/intern.txt
new file mode 100755
index 000000000..4f1a7a15f
--- /dev/null
+++ b/doc/intern.txt
@@ -0,0 +1,575 @@

=========================================
    Internals of the Nimrod Compiler
=========================================


:Author: Andreas Rumpf
:Version: |nimrodversion|

.. contents::


Directory structure
===================

The Nimrod project's directory structure is:

============ ==============================================
Path         Purpose
============ ==============================================
``bin``      binary files go here
``nim``      Pascal sources of the Nimrod compiler; this
             version should be modified, not the Nimrod
             version in ``rod``!
``rod``      Nimrod sources of the Nimrod compiler;
             automatically generated from the Pascal
             version
``data``     data files that are used for generating
             source code go here
``doc``      the documentation lives here; it is a bunch
             of reStructuredText files
``dist``     download packages as zip archives go here
``config``   configuration files for Nimrod go here
``lib``      the Nimrod library lives here; ``rod``
             depends on it!
``web``      website of Nimrod; generated by ``genweb.py``
             from the ``*.txt`` and ``*.tmpl`` files
``koch``     the Koch Build System (written for Nimrod)
``obj``      generated ``*.obj`` files go here
============ ==============================================


Bootstrapping the compiler
==========================

The compiler is written in a subset of Pascal with special annotations so
that it can be translated to Nimrod code automatically. This conversion is
done by Nimrod itself via the undocumented ``boot`` command. Thus both Nimrod
and Free Pascal can compile the Nimrod compiler.

Requirements for bootstrapping:

- Free Pascal (I used version 2.2); it may not be needed
- Python (should work with 2.4 or higher) and the code generator *cog*
  (included in this distribution!)
- a C compiler -- one of:

  * win32-lcc
  * Borland C++ (tested with 5.5)
  * Microsoft C++
  * Digital Mars C++
  * Watcom C++ (currently broken; a fix is welcome!)
  * GCC
  * Intel C++
  * Pelles C
  * llvm-gcc

Compiling the compiler is a simple matter of running::

  koch.py boot

Or you can compile by hand; this is not difficult.

If you want to debug the compiler, use the command::

  koch.py boot --debugger:on

The ``koch.py`` script is Nimrod's maintenance script: everything that has
been automated is accessible through it. It is a replacement for make and
shell scripting with the advantage that it is more portable and easier to
read.


Coding standards
================

The compiler is written in a subset of Pascal with special annotations so
that it can be translated to Nimrod code automatically. As a general rule,
Pascal code that does not translate to Nimrod automatically is forbidden.


Porting to new platforms
========================

Porting Nimrod to a new architecture is pretty easy: since C is the most
portable programming language (within certain limits) and Nimrod generates
C code, porting the code generator is not necessary.

POSIX-compliant systems on conventional hardware are usually pretty easy to
port: add the platform to ``platform`` (if it is not already listed there),
check that the OS and System modules work and recompile Nimrod.

The only case where things aren't as easy is when the garbage
collector needs some assembler tweaking to work. The standard
version of the GC uses C's ``setjmp`` function to store all registers
on the hardware stack. It may be that the new platform needs to
replace this generic code by some assembler code.
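For illustration only, here is a minimal sketch of the ``setjmp`` trick for
conservative stack scanning. The names ``gcMark`` and ``stackBottom`` are
assumptions made up for this example; they are not taken from the runtime
sources:

.. code-block:: C

  #include <setjmp.h>

  /* hypothetical marker called for every word that might be a pointer */
  extern void gcMark(void* p);
  /* hypothetical stack bottom recorded at program startup */
  extern void* stackBottom;

  static void markStackAndRegisters(void) {
    jmp_buf registers;       /* setjmp dumps the register contents in here   */
    setjmp(registers);       /* now the registers live on the hardware stack */
    void* top = &registers;  /* approximate current stack top                */
    /* conservatively scan every word between stack top and stack bottom;
       this sketch assumes a downwards-growing stack */
    for (void** p = (void**) top; p < (void**) stackBottom; ++p)
      gcMark(*p);
  }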
Runtime type information
========================

*Runtime type information* (RTTI) is needed for several aspects of the Nimrod
programming language:

Garbage collection
  The most important reason for RTTI. Generating
  traversal procedures produces bigger code and is likely to be slower on
  modern hardware as dynamic procedure binding is hard to predict.

Complex assignments
  Sequences and strings are implemented as
  pointers to resizeable buffers, but Nimrod requires copying for
  assignments. Apart from RTTI the compiler could generate copy procedures
  for any type that needs one. However, this would make the code bigger and
  the RTTI is likely already there for the GC.

The compiler already knows the type information as a graph. Thus this graph
needs to be serialized as RTTI for C code generation. Look at the files
``lib/typeinfo.nim`` and ``lib/hti.nim`` for more information.

However, generating type information proved to be difficult and the format
wastes memory. Variant records cause problems too. We use a mix of iterator
procedures and constant data structures:

.. code-block:: Nimrod

  type
    TNimTypeSlot {.export.} = record
      offset: int
      typ: int
      name: CString
    TSlotIterator = proc (obj: pointer, field: int): ptr TNimTypeSlot
    TNimType {.export.} = record
      Kind: TNimTypeKind
      baseType, indexType: int
      size, len: int
      slots: TSlotIterator # instead of: ptr array [0..10_000, TNimTypeSlot]

This is not easy to understand either. It is best to use just the ``rodgen``
module and store type information as string constants.

After some thought I came to the conclusion that this is again premature
optimization. We should just construct the type graph at runtime. In the init
section new types should be constructed and registered:

.. code-block:: Nimrod

  type
    TSlotTriple = record
      offset: int
      typ: PRTL_Type
      name: Cstring
    PSlots = ptr TSlots
    TSlots = record
      case kind
      of linear:
        fields: array [TSlotTriple]
      of nested:
        discriminant: TSlotTriple
        otherSlots: array [discriminant, PSlots]

    TTypeKind = enum ...
    RTL_Type = record
      size: int
      base: PRTL_Type
      case kind
      of tyArray, tySequence:
        elemSize: int
      of tyRecord, tyObject, tyEnum:
        slots: PSlots


The Garbage Collector
=====================

Introduction
------------

We use the term *cell* here to refer to everything that is traced
(sequences, refs, strings).
This section describes how the new GC works. The old algorithms all had the
same problem: they were too complex to get right. This one tries to find the
right compromise.

The basic algorithm is *deferred reference counting* with cycle detection.
References on the stack are not counted, for better performance and easier C
code generation. The GC starts by traversing the hardware stack and
incrementing the reference count (RC) of every cell that it encounters. After
the GC has done its work the stack is traversed again and the RC of every
cell that it encounters is decremented again. Thus no marking bits in the RC
are needed. Between these stack traversals the GC has a completely accurate
view of the RCs.

Each cell has a header consisting of an RC and a pointer to its type
descriptor. However, the program does not know about these, so they are
placed at negative offsets. In the GC code the type ``PCell`` denotes a
pointer decremented by the right offset, so that the header can be accessed
easily. It is extremely important that ``pointer`` is not confused with a
``PCell``, as this would lead to memory corruption.
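As an illustration (a sketch only, not the actual runtime source), the
negative offset means that converting between the user-visible pointer and
the cell pointer is simple pointer arithmetic; the names below are made up
for this example:

.. code-block:: C

  #include <stddef.h>

  typedef struct TNimType TNimType;

  /* hypothetical cell header: reference count plus type descriptor */
  typedef struct Cell {
    size_t refcount;
    TNimType* typ;
  } Cell;

  /* the program only sees pointers to the data that follows the header,
     so the GC converts back and forth by one sizeof(Cell) offset */
  static Cell* usrToCell(void* usr)  { return (Cell*)((char*)usr - sizeof(Cell)); }
  static void* cellToUsr(Cell* cell) { return (char*)cell + sizeof(Cell); }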
When to trigger a collection
----------------------------

Since there are really two different garbage collectors (reference counting
and mark and sweep) we use two different heuristics for when to run their
passes. The RC-GC pass is fairly cheap: thus we use an additive increase
(7 pages) for the RC_Threshold and a multiplicative increase for the
CycleThreshold.


The AT and ZCT sets
-------------------

The GC maintains two sets throughout the lifetime of the program (plus two
temporary ones). The AT (*any table*) simply contains every cell. The ZCT
(*zero count table*) contains every cell whose RC is zero. This is used to
reclaim most cells quickly.

The ZCT contains redundant information -- the AT alone would suffice.
However, traversing the AT and checking whether the RC is zero would touch
every living cell in the heap! That is why the ZCT is updated whenever an RC
drops to zero. The ZCT is not updated when an RC is incremented from zero to
one, as this would be too costly.


The CellSet data structure
--------------------------

The AT and ZCT depend on an extremely efficient data structure for storing a
set of pointers -- this is called a ``PCellSet`` in the source code.
Inserting, deleting and searching are done in constant time. However,
modifying a ``PCellSet`` during traversal leads to undefined behaviour.

.. code-block:: Nimrod

  type
    PCellSet # hidden

  proc allocCellSet: PCellSet          # make a new set
  proc deallocCellSet(s: PCellSet)     # empty the set and free its memory
  proc incl(s: PCellSet, elem: PCell)  # include an element
  proc excl(s: PCellSet, elem: PCell)  # exclude an element

  proc `in`(elem: PCell, s: PCellSet): bool

  iterator elements(s: PCellSet): (elem: PCell)

All these operations have to be performed efficiently. Because a CellSet can
become huge (the AT contains every allocated cell!) a hash table is not
suitable for this.

We use a mixture of bitset and Patricia tree for this. One node in the
Patricia tree contains a bitset that describes a page of the operating system
(not always, but that doesn't matter).
So including a cell is done as follows:

- Find the page descriptor for the page the cell belongs to.
- Set the appropriate bit in the page descriptor, indicating that the
  cell points to the start of a memory block.

Removing a cell is analogous -- the bit has to be set to zero.
Single page descriptors are never deleted from the tree. Typically a page
descriptor is only 19 words big, so not deleting it does not waste much
memory. Apart from that, the AT and ZCT are rebuilt frequently, so removing a
single page descriptor from the tree is never necessary.

Complete traversal is done like this::

  for each page descriptor d:
    for each bit in d:
      if bit == 1:
        traverse the pointer belonging to this bit
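A rough sketch of the idea follows (the names, page size and alignment are
assumptions for this example, not the actual implementation): each page
descriptor holds one bit per possible cell start within its page, so
inclusion is a single bit operation once the descriptor has been found in
the tree.

.. code-block:: C

  #include <stdint.h>
  #include <stddef.h>

  #define PAGE_SHIFT    12u                     /* assume 4 KiB OS pages     */
  #define ALIGNMENT     8u                      /* assume 8-byte cell starts */
  #define BITS_PER_PAGE ((1u << PAGE_SHIFT) / ALIGNMENT)

  typedef struct PageDesc {
    uintptr_t key;                              /* page address >> PAGE_SHIFT */
    uint32_t  bits[BITS_PER_PAGE / 32];         /* one bit per possible cell  */
    /* ... links into the Patricia tree would go here ... */
  } PageDesc;

  /* hypothetical lookup that finds or creates the descriptor in the tree */
  extern PageDesc* pageDescFor(uintptr_t key);

  static void cellSetIncl(void* cell) {
    uintptr_t a = (uintptr_t) cell;
    PageDesc* d = pageDescFor(a >> PAGE_SHIFT);
    size_t bit = (a & ((1u << PAGE_SHIFT) - 1)) / ALIGNMENT;
    d->bits[bit / 32] |= 1u << (bit % 32);      /* mark the cell's start bit */
  }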
Further complications
---------------------

In Nimrod the compiler cannot always know whether a reference is stored on
the stack or not. This is caused by var parameters.
Consider this example:

.. code-block:: Nimrod

  proc setRef(r: var ref TNode) =
    new(r)

  proc usage =
    var
      r: ref TNode
    setRef(r)      # here we should not update the reference counts, because
                   # r is on the stack
    setRef(r.left) # here we should update the refcounts!

Though it would be possible to produce code that updates the refcounts (if
necessary) before and after the call to ``setRef``, it is a complex task to
do so in the code generator. So we don't, and instead decide at runtime
whether the reference is on the stack or not. The generated code looks
roughly like this:

.. code-block:: C

  void setRef(TNode** ref) {
    unsureAsgnRef(ref, newObj(TNode_TI, sizeof(TNode)));
  }
  void usage(void) {
    TNode* r;
    setRef(&r);
    setRef(&r->left);
  }

Note that for systems with a continuous stack (which most systems have)
the check whether the ref is on the stack is very cheap (only two
comparisons). Another advantage of this scheme is that the code produced is
a tiny bit smaller.
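The stack check can be sketched roughly like this (an illustration assuming a
single, contiguous, downwards-growing stack; the helper names are made up for
this example):

.. code-block:: C

  #include <stdint.h>

  extern void incRef(void* p);
  extern void decRef(void* p);
  /* hypothetical stack bottom (highest address) recorded at startup */
  extern void* stackBottom;

  static int isOnStack(void* p) {
    int localVar;                 /* address approximates the current stack top */
    uintptr_t top = (uintptr_t) &localVar;
    uintptr_t a   = (uintptr_t) p;
    /* two comparisons: the address lies between stack top and stack bottom */
    return a >= top && a <= (uintptr_t) stackBottom;
  }

  /* only update reference counts when the destination is not a stack slot */
  void unsureAsgnRef(void** dest, void* src) {
    if (!isOnStack((void*) dest)) {
      if (src != 0) incRef(src);
      if (*dest != 0) decRef(*dest);
    }
    *dest = src;
  }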
The algorithm in pseudo-code
----------------------------
Now we come to the nitty-gritty. The algorithm works in several phases.

Phase 1 - Consider references from stack
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
::

  for each pointer p in the stack: incRef(p)

This is necessary because references in the hardware stack are not traced,
for better performance. After Phase 1 the RCs are accurate.


Phase 2 - Free the ZCT
~~~~~~~~~~~~~~~~~~~~~~
This is how things used to (not) work::

  for p in elements(ZCT):
    if RC(p) == 0:
      call finalizer of p
      for c in children(p): decRef(c) # free its children recursively
        # if necessary; the children's RC >= 1, BUT they may still be in the ZCT!
      free(p)
    else:
      remove p from the ZCT

Instead we do it this way. Note that the recursion is gone too!
::

  newZCT = nil
  for p in elements(ZCT):
    if RC(p) == 0:
      call finalizer of p
      for c in children(p):
        assert(RC(c) > 0)
        dec(RC(c))
        if RC(c) == 0:
          if newZCT == nil: newZCT = allocCellSet()
          incl(newZCT, c)
      free(p)
    else:
      # nothing to do! We will use the newZCT

  deallocCellSet(ZCT)
  ZCT = newZCT

This phase is repeated until enough memory is available or the ZCT is nil.
If there is still not enough memory available, the cycle detector gets its
chance to do something.


Phase 3 - Cycle detection
~~~~~~~~~~~~~~~~~~~~~~~~~
Cycle detection works by subtracting internal reference counts::

  newAT = allocCellSet()

  for y in elements(AT):
    # pretend that y is dead:
    for c in children(y):
      dec(RC(c))
    # note that this should not be done recursively as we have all needed
    # pointers in the AT! This makes it more efficient too!

  proc restore(y: PCell) =
    # unfortunately, the recursion here cannot be eliminated easily
    if y not_in newAT:
      incl(newAT, y)
      for c in children(y):
        inc(RC(c)) # restore proper reference counts!
        restore(c)

  for y in elements(AT) with rc > 0:
    restore(y)

  for y in elements(AT) with rc == 0:
    free(y) # if pretending worked, it was part of a cycle

  AT = newAT


Phase 4 - Ignore references from stack again
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
::

  for each pointer p in the stack:
    dec(RC(p))
    if RC(p) == 0: incl(ZCT, p)

Now the RCs correctly discard any references from the stack. One can also see
this as a temporary marking operation: things that are referenced from the
stack are marked during the GC's operation and now have to be unmarked.



The compiler's architecture
===========================

Nimrod uses the classic compiler architecture: a scanner feeds tokens to a
parser. The parser builds a syntax tree that is used by the code generator.
This syntax tree is the interface between the parser and the code generator.
Understanding it is essential for understanding most of the compiler's code.

In order to compile Nimrod correctly, type-checking has to be separated from
parsing. Otherwise generics would not work. Code generation is done for a
whole module only after it has been checked semantically.

.. include:: filelist.txt

The first command line argument selects the backend. Thus the backend is
responsible for calling the parser and semantic checker. However, when
compiling ``import`` or ``include`` statements, the semantic checker needs to
call the backend; this is done by embedding a PBackend into a TContext.


The syntax tree
---------------
The syntax tree consists of nodes which may have an arbitrary number of
children. Types and symbols are represented by other nodes, because they
may contain cycles. The AST changes its shape after semantic checking. This
is needed to make life easier for the code generators. See the ``ast`` module
for the type definitions.

We use the notation ``nodeKind(fields, [sons])`` for describing
nodes. ``nodeKind[sons]`` is a short-cut for ``nodeKind([sons])``.
XXX: Description of the language's syntax and the corresponding trees.


How the RTL is compiled
=======================

The ``system`` module contains the part of the RTL which needs support by
compiler magic (and the stuff that needs to be in it because the spec
says so). The C code generator generates the C code for it just as for any
other module. However, calls to some procedures like ``addInt`` are inserted
by the CCG. Therefore the module ``magicsys`` contains a table
(``compilerprocs``) with all symbols that are marked as ``compilerproc``.


How separate compilation will work
==================================

Compiling every needed module from scratch will soon become too slow as
programs grow. For easier cleanup all generated files are placed in the
directory ``$base/rod_gen``. This cannot be changed. The generated C files
get the names of the modules they result from. A compiled Nimrod module has
the extension ``.rod`` and is a binary file. The format may change from
release to release. The rod file is mostly a binary representation of the
parse trees.

Nimrod currently compiles every module into its own C file. Some things like
type information, common string literals and common constant sets need to be
shared though. We deal with this problem by writing the shared data into the
main C file. Only "headers" are generated in the other modules. However, each
precompiled Nimrod module lists the shared data it depends on. The same holds
for procedures that have to be generated from generics.

A big problem is that the data must get the same name each time it is
compiled.

The C compiler is only called for the C files that changed after the last
compilation (or whose object file does not exist anymore). To make this work
reliably, the following things are listed in the header comment of the C
file, so that the C compiler is called again should any of them change:

* Nimrod's version
* the target CC
* the target OS
* the target CPU

The version is questionable: if the resulting C file is the same, it does not
matter that Nimrod's version has increased. We do it anyway to be on the safe
side.
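For illustration, such a header comment might look roughly like this; the
exact layout is an assumption made for this example, not taken from the
compiler:

.. code-block:: C

  /* Generated by the Nimrod compiler.
     Version:    0.x.x            -- hypothetical values
     Target CC:  gcc
     Target OS:  linux
     Target CPU: i386
     If any of these lines differ from the previous run, the C compiler is
     invoked again even if the C code itself is unchanged. */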
Generation of dynamic link libraries
====================================

Generating dynamic link libraries or shared libraries is not difficult; the
underlying C compiler already does all the hard work for us. The problem is
the common runtime library, especially the memory manager. Note that
Borland's Delphi had exactly the same problem. The workaround is to not link
the GC into the DLL but to provide an extra runtime DLL that needs to be
initialized.



How to implement closures
=========================

A closure is a record of a proc pointer and a context ref. The context ref
points to a garbage collected record that contains the needed variables.
An example:

.. code-block:: Nimrod

  type
    TListRec = record
      data: string
      next: ref TListRec

  proc forEach(head: ref TListRec, visitor: proc (s: string) {.closure.}) =
    var it = head
    while it != nil:
      visitor(it.data)
      it = it.next

  proc sayHello() =
    var L = new List(["hallo", "Andreas"])
    var temp = "jup\xff"
    forEach(L, lambda(s: string) =
      io.write(temp)
      io.write(s)
    )


This should become the following in C:

.. code-block:: C

  typedef struct ... /* List type */

  typedef struct closure {
    void (*PrcPart)(string, void*);
    void* ClPart;
  } closure;

  typedef struct Tcl_data {
    string temp; /* all accessed variables are put in here! */
  } Tcl_data;

  void forEach(TListRec* head, const closure visitor) {
    TListRec* it = head;
    while (it != NIM_NULL) {
      visitor.PrcPart(it->data, visitor.ClPart);
      it = it->next;
    }
  }

  void printStr(string s, void* cl_data) {
    Tcl_data* x = (Tcl_data*) cl_data;
    io_write(x->temp);
    io_write(s);
  }

  void sayHello(void) {
    Tcl_data* data = new(...);
    asgnRef(&data->temp, "jup\xff");
    ...

    closure cl;
    cl.PrcPart = printStr;
    cl.ClPart = data;
    forEach(L, cl);
  }


What about nested closures? There is not much difference: just put all the
used variables in the data record.
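As a sketch of that remark (an assumption about how the lowering could look,
not actual compiler output), a lambda defined inside another lambda would
simply receive the same data record:

.. code-block:: C

  /* hypothetical: the data record carries everything any nested lambda uses */
  typedef struct Touter_data {
    string temp;     /* used by the outer lambda */
    int    counter;  /* used by the inner lambda */
  } Touter_data;

  void innerProc(string s, void* cl_data) {
    Touter_data* x = (Touter_data*) cl_data;
    x->counter += 1;            /* the inner lambda sees the same record */
    io_write(s);
  }

  void outerProc(string s, void* cl_data) {
    Touter_data* x = (Touter_data*) cl_data;
    closure inner;
    inner.PrcPart = innerProc;
    inner.ClPart  = cl_data;    /* just pass the shared record on */
    io_write(x->temp);
    inner.PrcPart(s, inner.ClPart);
  }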