summary refs log tree commit diff stats
path: root/doc/intern.txt
diff options
context:
space:
mode:
Diffstat (limited to 'doc/intern.txt')
-rwxr-xr-xdoc/intern.txt417
1 files changed, 0 insertions, 417 deletions
diff --git a/doc/intern.txt b/doc/intern.txt
deleted file mode 100755
index c45765b23..000000000
--- a/doc/intern.txt
+++ /dev/null
@@ -1,417 +0,0 @@
-=========================================
-    Internals of the Nimrod Compiler
-=========================================
-
-
-:Author: Andreas Rumpf
-:Version: |nimrodversion|
-
-.. contents::
-
-  Abstraction is layering ignorance on top of reality. -- unknown
-
-
-Directory structure
-===================
-
-The Nimrod project's directory structure is:
-
-============   ==============================================
-Path           Purpose
-============   ==============================================
-``bin``        generated binary files
-``build``      generated C code for the installation
-``nim``        Pascal sources of the Nimrod compiler; this
-               should be modified, not the Nimrod version in
-               ``rod``!
-``rod``        Nimrod sources of the Nimrod compiler;
-               automatically generated from the Pascal
-               version
-``data``       data files that are used for generating source
-               code
-``doc``        the documentation lives here; it is a bunch of
-               reStructuredText files
-``dist``       additional packages for the distribution
-``config``     configuration files for Nimrod
-``lib``        the Nimrod library lives here; ``rod`` depends
-               on it!
-``web``        website of Nimrod; generated by ``koch.py``
-               from the ``*.txt`` and ``*.tmpl`` files
-``obj``        generated ``*.obj`` files
-============   ==============================================
-
-
-Bootstrapping the compiler
-==========================
-
-The compiler is written in a subset of Pascal with special annotations so
-that it can be translated to Nimrod code automatically. This conversion is
-done by Nimrod itself via the undocumented ``boot`` command. Thus both Nimrod
-and Free Pascal can compile the Nimrod compiler. However, the Pascal version
-has no garbage collector and leaks memory! So the Pascal version should only 
-be used for bootstrapping.
-
-Requirements for bootstrapping:
-
-- Python (should work with version 1.5 or higher) (optional)
-- supported C compiler
-
-Compiling the compiler is a simple matter of running::
-
-  koch.py boot
-
-For a release version use::
-
-  koch.py boot -d:release
-
-The ``koch.py`` script is Nimrod's maintainance script. It is a replacement for
-make and shell scripting with the advantage that it is much more portable.
-
-If you don't have Python, there is a ``boot`` Nimrod program which does roughly
-the same::
-
-  nimrod cc boot.nim
-  ./boot [-d:release]
-
-
-Pascal annotations
-==================
-There are some annotations that the Pascal sources use so that they can
-be converted to Nimrod automatically:
-
-``{@discard} <expr>``
-    Tells the compiler that a ``discard`` statement is needed for Nimrod
-    here.
-
-``{@cast}typ(expr)``
-    Tells the compiler that the Pascal conversion is a ``cast`` in Nimrod.
-
-``{@emit <code>}``
-    Emits ``<code>``. The code fragment needs to be in Pascal syntax.
-
-``{@ignore} <codeA> {@emit <codeB>}``
-    Ignores ``<codeA>`` and instead emits ``<codeB>`` which needs to be in
-    Pascal syntax. An empty ``{@emit}`` is possible too (it then only closes
-    the ``<codeA>`` part).
-
-``record {@tuple}``
-    Is used to tell the compiler that the record type should be transformed
-    to a Nimrod tuple type.
-
-``^ {@ptr}``
-    Is used to tell the compiler that the pointer type should be transformed
-    to a Nimrod ``ptr`` type. The default is a ``ref`` type.
-
-``'a' + ''``
-    The idiom ``+''`` is used to tell the compiler that it is a string
-    literal and not a character literal. (Pascal does not distinguish between
-    character literals and string literals of length 1.)
-
-``+{&}``
-    This tells the compiler that Pascal's ``+`` here is a string concatenation
-    and thus should be converted to ``&``. Note that this is not needed if
-    any of the operands is a string literal because the compiler then can
-    figure this out by itself.
-
-``{@set}['a', 'b', 'c']``
-    Tells the compiler that Pascal's ``[]`` constructor is a set and not an
-    array. This is only needed if the compiler cannot figure this out for
-    itself.
-
-
-Porting to new platforms
-========================
-
-Porting Nimrod to a new architecture is pretty easy, since C is the most
-portable programming language (within certain limits) and Nimrod generates
-C code, porting the code generator is not necessary.
-
-POSIX-compliant systems on conventional hardware are usually pretty easy to
-port: Add the platform to ``platform`` (if it is not already listed there),
-check that the OS, System modules work and recompile Nimrod.
-
-The only case where things aren't as easy is when the garbage
-collector needs some assembler tweaking to work. The standard
-version of the GC uses C's ``setjmp`` function to store all registers
-on the hardware stack. It may be necessary that the new platform needs to
-replace this generic code by some assembler code.
-
-
-Runtime type information
-========================
-
-*Runtime type information* (RTTI) is needed for several aspects of the Nimrod
-programming language:
-
-Garbage collection
-  The most important reason for RTTI. Generating
-  traversal procedures produces bigger code and is likely to be slower on
-  modern hardware as dynamic procedure binding is hard to predict.
-
-Complex assignments
-  Sequences and strings are implemented as
-  pointers to resizeable buffers, but Nimrod requires copying for
-  assignments. Apart from RTTI the compiler could generate copy procedures
-  for any type that needs one. However, this would make the code bigger and
-  the RTTI is likely already there for the GC.
-
-We already know the type information as a graph in the compiler.
-Thus we need to serialize this graph as RTTI for C code generation.
-Look at the file ``lib/system/hti.nim`` for more information.
-
-
-The Garbage Collector
-=====================
-
-Introduction
-------------
-
-I use the term *cell* here to refer to everything that is traced
-(sequences, refs, strings).
-This section describes how the new GC works.
-
-The basic algorithm is *Deferrent Reference Counting* with cycle detection.
-References in the stack are not counted for better performance and easier C
-code generation.
-
-Each cell has a header consisting of a RC and a pointer to its type
-descriptor. However the program does not know about these, so they are placed at
-negative offsets. In the GC code the type ``PCell`` denotes a pointer
-decremented by the right offset, so that the header can be accessed easily. It
-is extremely important that ``pointer`` is not confused with a ``PCell``
-as this would lead to a memory corruption.
-
-
-The CellSet data structure
---------------------------
-
-The GC depends on an extremely efficient datastructure for storing a
-set of pointers - this is called a ``TCellSet`` in the source code.
-Inserting, deleting and searching are done in constant time. However,
-modifying a ``TCellSet`` during traversation leads to undefined behaviour.
-
-.. code-block:: Nimrod
-  type
-    TCellSet # hidden
-
-  proc CellSetInit(s: var TCellSet) # initialize a new set
-  proc CellSetDeinit(s: var TCellSet) # empty the set and free its memory
-  proc incl(s: var TCellSet, elem: PCell) # include an element
-  proc excl(s: var TCellSet, elem: PCell) # exclude an element
-
-  proc `in`(elem: PCell, s: TCellSet): bool # tests membership
-
-  iterator elements(s: TCellSet): (elem: PCell)
-
-
-All the operations have to perform efficiently. Because a Cellset can
-become huge a hash table alone is not suitable for this.
-
-We use a mixture of bitset and hash table for this. The hash table maps *pages*
-to a page descriptor. The page descriptor contains a bit for any possible cell
-address within this page. So including a cell is done as follows:
-
-- Find the page descriptor for the page the cell belongs to.
-- Set the appropriate bit in the page descriptor indicating that the
-  cell points to the start of a memory block.
-
-Removing a cell is analogous - the bit has to be set to zero.
-Single page descriptors are never deleted from the hash table. This is not
-needed as the data structures needs to be rebuilt periodically anyway.
-
-Complete traversal is done in this way::
-
-  for each page decriptor d:
-    for each bit in d:
-      if bit == 1:
-        traverse the pointer belonging to this bit
-
-
-Further complications
----------------------
-
-In Nimrod the compiler cannot always know if a reference
-is stored on the stack or not. This is caused by var parameters.
-Consider this example:
-
-.. code-block:: Nimrod
-  proc setRef(r: var ref TNode) =
-    new(r)
-
-  proc usage =
-    var
-      r: ref TNode
-    setRef(r) # here we should not update the reference counts, because
-              # r is on the stack
-    setRef(r.left) # here we should update the refcounts!
-
-We have to decide at runtime whether the reference is on the stack or not. 
-The generated code looks roughly like this:
-
-.. code-block:: C
-  void setref(TNode** ref) {
-    unsureAsgnRef(ref, newObj(TNode_TI, sizeof(TNode)))
-  }
-  void usage(void) {
-    setRef(&r)
-    setRef(&r->left)
-  }
-
-Note that for systems with a continous stack (which most systems have)
-the check whether the ref is on the stack is very cheap (only two
-comparisons).
-
-
-The compiler's architecture
-===========================
-
-Nimrod uses the classic compiler architecture: A scanner feds tokens to a
-parser. The parser builds a syntax tree that is used by the code generator.
-This syntax tree is the interface between the parser and the code generator.
-It is essential to understand most of the compiler's code.
-
-In order to compile Nimrod correctly, type-checking has to be seperated from
-parsing. Otherwise generics would not work.
-
-.. include:: filelist.txt
-
-
-The syntax tree
----------------
-The synax tree consists of nodes which may have an arbitrary number of
-children. Types and symbols are represented by other nodes, because they
-may contain cycles. The AST changes its shape after semantic checking. This
-is needed to make life easier for the code generators. See the "ast" module
-for the type definitions. The `macros <macros.html>`_ module contains many
-examples how the AST represents each syntactic structure. 
-
-
-How the RTL is compiled
-=======================
-
-The ``system`` module contains the part of the RTL which needs support by
-compiler magic (and the stuff that needs to be in it because the spec
-says so). The C code generator generates the C code for it just like any other
-module. However, calls to some procedures like ``addInt`` are inserted by
-the CCG. Therefore the module ``magicsys`` contains a table (``compilerprocs``)
-with all symbols that are marked as ``compilerproc``. ``compilerprocs`` are
-needed by the code generator. A ``magic`` proc is not the same as a
-``compilerproc``: A ``magic`` is a proc that needs compiler magic for its
-semantic checking, a ``compilerproc`` is a proc that is used by the code
-generator.
-
-
-Debugging Nimrod's memory management
-====================================
-
-The following paragraphs are mostly a reminder for myself. Things to keep
-in mind:
-
-* Segmentation faults can have multiple reasons: One that is frequently
-  forgotten is that *stack overflow* can trigger one!
-* If an assertion in Nimrod's memory manager or GC fails, the stack trace
-  keeps allocating memory! Thus a stack overflow may happen, hiding the
-  real issue. 
-* What seem to be C code generation problems is often a bug resulting from
-  not producing prototypes, so that some types default to ``cint``. Testing
-  without the ``-w`` option helps!
-
-
-Generation of dynamic link libraries
-====================================
-
-Generation of dynamic link libraries or shared libraries is not difficult; the
-underlying C compiler already does all the hard work for us. The problem is the
-common runtime library, especially the memory manager. Note that Borland's
-Delphi had exactly the same problem. The workaround is to not link the GC with
-the Dll and provide an extra runtime dll that needs to be initialized.
-
-
-Code generation for closures
-============================
-
-Example code:
-
-.. code-block:: nimrod
-  proc add(x: int): proc (y: int): int {.closure.} = 
-    return lambda (y: int): int = 
-      return x + y
-
-  var add2 = add(2)
-  echo add2(5) #OUT 7
-
-This should produce roughly this code:
-
-.. code-block:: nimrod
-  type
-    PClosure = ref object 
-      fn: proc (x: int, c: PClosure): int
-      x: int # data
-  
-  proc wasLambda(y: int, c: PClosure): int = 
-    return y + c.x
-  
-  proc add(x: int): PClosure = 
-    var c: PClosure
-    new(c)
-    c.x = x
-    c.fn = wasLambda
-  
-  var add2 = add(2)
-  echo add2.fn(5, add2)
-
-
-Beware of nesting:
-
-.. code-block:: nimrod
-  proc add(x: int): proc (y: int): proc (z: int): int {.closure.} {.closure.} = 
-    return lamba (y: int): proc (z: int): int {.closure.} = 
-      return lambda (z: int): int = 
-        return x + y + z
-
-  var add24 = add(2)(4)
-  echo add24(5) #OUT 11
-
-This should produce roughly this code:
-
-.. code-block:: nimrod
-  type
-    PClosure1 = ref object 
-      fn: proc (x: int, c: PClosure1): int
-      x: int # data
-      
-    PClosure2 = ref object 
-      fn: proc (x: int, c: PClosure2): int
-      y: int
-      c1: PClosure1
-      
-  
-  proc innerLambda(z: int, c2: PClosure2): int = 
-    return c2.c1.x + c2.y + z
-  
-  proc outerLambda1(y: int, c1: PClosure1): PClosure2 = 
-    new(result)
-    result.c1 = c1
-    result.y = y
-    result.fn = innerLambda
-  
-  proc add(x: int): PClosure1 = 
-    new(result)
-    result.x = x
-    result.fn = outerLambda
-  
-  var tmp = add(2)
-  var tmp2 = tmp.fn(4, tmp)
-  var add24 = tmp2.fn(4, tmp2)
-  echo add24(5)
-
-
-Accumulator
------------
-
-.. code-block:: nimrod
-  proc GetAccumulator(start: int): proc (): int {.closure} = 
-    var i = start
-    return lambda: int = 
-      inc i
-      return i