summary refs log tree commit diff stats
path: root/doc/intern.txt
diff options
context:
space:
mode:
Diffstat (limited to 'doc/intern.txt')
-rw-r--r--doc/intern.txt568
1 files changed, 0 insertions, 568 deletions
diff --git a/doc/intern.txt b/doc/intern.txt
deleted file mode 100644
index d0aaa283a..000000000
--- a/doc/intern.txt
+++ /dev/null
@@ -1,568 +0,0 @@
-=========================================
-    Internals of the Nim Compiler
-=========================================
-
-
-:Author: Andreas Rumpf
-:Version: |nimversion|
-
-.. contents::
-
-  "Abstraction is layering ignorance on top of reality." -- unknown
-
-
-Directory structure
-===================
-
-The Nim project's directory structure is:
-
-============   ==============================================
-Path           Purpose
-============   ==============================================
-``bin``        generated binary files
-``build``      generated C code for the installation
-``compiler``   the Nim compiler itself; note that this
-               code has been translated from a bootstrapping
-               version written in Pascal, so the code is **not**
-               a poster child of good Nim code
-``config``     configuration files for Nim
-``dist``       additional packages for the distribution
-``doc``        the documentation; it is a bunch of
-               reStructuredText files
-``lib``        the Nim library
-``web``        website of Nim; generated by ``nimweb``
-               from the ``*.txt`` and ``*.tmpl`` files
-============   ==============================================
-
-
-Bootstrapping the compiler
-==========================
-
-As of version 0.8.5 the compiler is maintained in Nim. (The first versions
-have been implemented in Object Pascal.) The Python-based build system has
-been rewritten in Nim too.
-
-Compiling the compiler is a simple matter of running::
-
-  nim c koch.nim
-  ./koch boot
-
-For a release version use::
-
-  nim c koch.nim
-  ./koch boot -d:release
-
-And for a debug version compatible with GDB::
-
-  nim c koch.nim
-  ./koch boot --debuginfo --linedir:on
-
-The ``koch`` program is Nim's maintenance script. It is a replacement for
-make and shell scripting with the advantage that it is much more portable.
-More information about its options can be found in the `koch <koch.html>`_
-documentation.
-
-
-Coding Guidelines
-=================
-
-* Use CamelCase, not underscored_identifiers.
-* Indent with two spaces.
-* Max line length is 80 characters.
-* Provide spaces around binary operators if that enhances readability.
-* Use a space after a colon, but not before it.
-* Start types with a capital ``T``, unless they are pointers/references which
-  start with ``P``.
-
-See also the `API naming design <apis.html>`_ document.
-
-
-Porting to new platforms
-========================
-
-Porting Nim to a new architecture is pretty easy, since C is the most
-portable programming language (within certain limits) and Nim generates
-C code, porting the code generator is not necessary.
-
-POSIX-compliant systems on conventional hardware are usually pretty easy to
-port: Add the platform to ``platform`` (if it is not already listed there),
-check that the OS, System modules work and recompile Nim.
-
-The only case where things aren't as easy is when the garbage
-collector needs some assembler tweaking to work. The standard
-version of the GC uses C's ``setjmp`` function to store all registers
-on the hardware stack. It may be necessary that the new platform needs to
-replace this generic code by some assembler code.
-
-
-Runtime type information
-========================
-
-*Runtime type information* (RTTI) is needed for several aspects of the Nim
-programming language:
-
-Garbage collection
-  The most important reason for RTTI. Generating
-  traversal procedures produces bigger code and is likely to be slower on
-  modern hardware as dynamic procedure binding is hard to predict.
-
-Complex assignments
-  Sequences and strings are implemented as
-  pointers to resizeable buffers, but Nim requires copying for
-  assignments. Apart from RTTI the compiler could generate copy procedures
-  for any type that needs one. However, this would make the code bigger and
-  the RTTI is likely already there for the GC.
-
-We already know the type information as a graph in the compiler.
-Thus we need to serialize this graph as RTTI for C code generation.
-Look at the file ``lib/system/hti.nim`` for more information.
-
-
-Debugging the compiler
-======================
-
-You can of course use GDB or Visual Studio to debug the
-compiler (via ``--debuginfo --lineDir:on``). However, there
-are also lots of procs that aid in debugging:
-
-
-.. code-block:: nim
-  # pretty prints the Nim AST
-  echo renderTree(someNode)
-  # outputs some JSON representation
-  debug(someNode)
-  # pretty prints some type
-  echo typeToString(someType)
-  debug(someType)
-  echo symbol.name.s
-  debug(symbol)
-  # pretty prints the Nim ast, but annotates symbol IDs:
-  echo renderTree(someNode, {renderIds})
-  if n.info ?? "temp.nim":
-    # only output when it comes from "temp.nim"
-    echo renderTree(n)
-  if n.info ?? "temp.nim":
-    # why does it process temp.nim here?
-    writeStackTrace()
-
-To create a new compiler for each run, use ``koch temp``::
-
-  ./koch temp c /tmp/test.nim
-
-``koch temp`` creates a debug build of the compiler, which is useful
-to create stacktraces for compiler debugging.
-
-``koch temp`` returns 125 as the exit code in case the compiler
-compilation fails. This exit code tells ``git bisect`` to skip the
-current commit.::
-
-  git bisect start bad-commit good-commit
-  git bisect ./koch -r c test-source.nim
-
-The compiler's architecture
-===========================
-
-Nim uses the classic compiler architecture: A lexer/scanner feds tokens to a
-parser. The parser builds a syntax tree that is used by the code generator.
-This syntax tree is the interface between the parser and the code generator.
-It is essential to understand most of the compiler's code.
-
-In order to compile Nim correctly, type-checking has to be separated from
-parsing. Otherwise generics cannot work.
-
-.. include:: filelist.txt
-
-
-The syntax tree
----------------
-The syntax tree consists of nodes which may have an arbitrary number of
-children. Types and symbols are represented by other nodes, because they
-may contain cycles. The AST changes its shape after semantic checking. This
-is needed to make life easier for the code generators. See the "ast" module
-for the type definitions. The `macros <macros.html>`_ module contains many
-examples how the AST represents each syntactic structure.
-
-
-How the RTL is compiled
-=======================
-
-The ``system`` module contains the part of the RTL which needs support by
-compiler magic (and the stuff that needs to be in it because the spec
-says so). The C code generator generates the C code for it just like any other
-module. However, calls to some procedures like ``addInt`` are inserted by
-the CCG. Therefore the module ``magicsys`` contains a table (``compilerprocs``)
-with all symbols that are marked as ``compilerproc``. ``compilerprocs`` are
-needed by the code generator. A ``magic`` proc is not the same as a
-``compilerproc``: A ``magic`` is a proc that needs compiler magic for its
-semantic checking, a ``compilerproc`` is a proc that is used by the code
-generator.
-
-
-Compilation cache
-=================
-
-The implementation of the compilation cache is tricky: There are lots
-of issues to be solved for the front- and backend. In the following
-sections *global* means *shared between modules* or *property of the whole
-program*.
-
-
-Frontend issues
----------------
-
-Methods and type converters
-~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-Nim contains language features that are *global*. The best example for that
-are multi methods: Introducing a new method with the same name and some
-compatible object parameter means that the method's dispatcher needs to take
-the new method into account. So the dispatching logic is only completely known
-after the whole program has been translated!
-
-Other features that are *implicitly* triggered cause problems for modularity
-too. Type converters fall into this category:
-
-.. code-block:: nim
-  # module A
-  converter toBool(x: int): bool =
-    result = x != 0
-
-.. code-block:: nim
-  # module B
-  import A
-
-  if 1:
-    echo "ugly, but should work"
-
-If in the above example module ``B`` is re-compiled, but ``A`` is not then
-``B`` needs to be aware of ``toBool`` even though  ``toBool`` is not referenced
-in ``B`` *explicitly*.
-
-Both the multi method and the type converter problems are solved by storing
-them in special sections in the ROD file that are loaded *unconditionally*
-when the ROD file is read.
-
-Generics
-~~~~~~~~
-
-If we generate an instance of a generic, we'd like to re-use that
-instance if possible across module boundaries. However, this is not
-possible if the compilation cache is enabled. So we give up then and use
-the caching of generics only per module, not per project. This means that
-``--symbolFiles:on`` hurts a bit for efficiency. A better solution would
-be to persist the instantiations in a global cache per project. This might be
-implemented in later versions.
-
-
-Backend issues
---------------
-
-- Init procs must not be "forgotten" to be called.
-- Files must not be "forgotten" to be linked.
-- Anything that is contained in ``nim__dat.c`` is shared between modules
-  implicitly.
-- Method dispatchers are global.
-- DLL loading via ``dlsym`` is global.
-- Emulated thread vars are global.
-
-
-However the biggest problem is that dead code elimination breaks modularity!
-To see why, consider this scenario: The module ``G`` (for example the huge
-Gtk2 module...) is compiled with dead code elimination turned on. So none
-of ``G``'s procs is generated at all.
-
-Then module ``B`` is compiled that requires ``G.P1``. Ok, no problem,
-``G.P1`` is loaded from the symbol file and ``G.c`` now contains ``G.P1``.
-
-Then module ``A`` (that depends onto ``B`` and ``G``) is compiled and ``B``
-and ``G`` are left unchanged. ``A`` requires ``G.P2``.
-
-So now ``G.c`` MUST contain both ``P1`` and ``P2``, but we haven't even
-loaded ``P1`` from the symbol file, nor do we want to because we then quickly
-would restore large parts of the whole program. But we also don't want to
-store ``P1`` in ``B.c`` because that would mean to store every symbol where
-it is referred from which ultimately means the main module and putting
-everything in a single C file.
-
-There is however another solution: The old file ``G.c`` containing ``P1`` is
-**merged** with the new file ``G.c`` containing ``P2``. This is the solution
-that is implemented in the C code generator (have a look at the ``ccgmerge``
-module). The merging may lead to *cruft* (aka dead code) in generated C code
-which can only be removed by recompiling a project with the compilation cache
-turned off. Nevertheless the merge solution is way superior to the
-cheap solution "turn off dead code elimination if the compilation cache is
-turned on".
-
-
-
-Debugging Nim's memory management
-=================================
-
-The following paragraphs are mostly a reminder for myself. Things to keep
-in mind:
-
-* If an assertion in Nim's memory manager or GC fails, the stack trace
-  keeps allocating memory! Thus a stack overflow may happen, hiding the
-  real issue.
-* What seem to be C code generation problems is often a bug resulting from
-  not producing prototypes, so that some types default to ``cint``. Testing
-  without the ``-w`` option helps!
-
-
-The Garbage Collector
-=====================
-
-Introduction
-------------
-
-I use the term *cell* here to refer to everything that is traced
-(sequences, refs, strings).
-This section describes how the new GC works.
-
-The basic algorithm is *Deferrent Reference Counting* with cycle detection.
-References on the stack are not counted for better performance and easier C
-code generation.
-
-Each cell has a header consisting of a RC and a pointer to its type
-descriptor. However the program does not know about these, so they are placed at
-negative offsets. In the GC code the type ``PCell`` denotes a pointer
-decremented by the right offset, so that the header can be accessed easily. It
-is extremely important that ``pointer`` is not confused with a ``PCell``
-as this would lead to a memory corruption.
-
-
-The CellSet data structure
---------------------------
-
-The GC depends on an extremely efficient datastructure for storing a
-set of pointers - this is called a ``TCellSet`` in the source code.
-Inserting, deleting and searching are done in constant time. However,
-modifying a ``TCellSet`` during traversation leads to undefined behaviour.
-
-.. code-block:: Nim
-  type
-    TCellSet # hidden
-
-  proc cellSetInit(s: var TCellSet) # initialize a new set
-  proc cellSetDeinit(s: var TCellSet) # empty the set and free its memory
-  proc incl(s: var TCellSet, elem: PCell) # include an element
-  proc excl(s: var TCellSet, elem: PCell) # exclude an element
-
-  proc `in`(elem: PCell, s: TCellSet): bool # tests membership
-
-  iterator elements(s: TCellSet): (elem: PCell)
-
-
-All the operations have to perform efficiently. Because a Cellset can
-become huge a hash table alone is not suitable for this.
-
-We use a mixture of bitset and hash table for this. The hash table maps *pages*
-to a page descriptor. The page descriptor contains a bit for any possible cell
-address within this page. So including a cell is done as follows:
-
-- Find the page descriptor for the page the cell belongs to.
-- Set the appropriate bit in the page descriptor indicating that the
-  cell points to the start of a memory block.
-
-Removing a cell is analogous - the bit has to be set to zero.
-Single page descriptors are never deleted from the hash table. This is not
-needed as the data structures needs to be rebuilt periodically anyway.
-
-Complete traversal is done in this way::
-
-  for each page descriptor d:
-    for each bit in d:
-      if bit == 1:
-        traverse the pointer belonging to this bit
-
-
-Further complications
----------------------
-
-In Nim the compiler cannot always know if a reference
-is stored on the stack or not. This is caused by var parameters.
-Consider this example:
-
-.. code-block:: Nim
-  proc setRef(r: var ref TNode) =
-    new(r)
-
-  proc usage =
-    var
-      r: ref TNode
-    setRef(r) # here we should not update the reference counts, because
-              # r is on the stack
-    setRef(r.left) # here we should update the refcounts!
-
-We have to decide at runtime whether the reference is on the stack or not.
-The generated code looks roughly like this:
-
-.. code-block:: C
-  void setref(TNode** ref) {
-    unsureAsgnRef(ref, newObj(TNode_TI, sizeof(TNode)))
-  }
-  void usage(void) {
-    setRef(&r)
-    setRef(&r->left)
-  }
-
-Note that for systems with a continuous stack (which most systems have)
-the check whether the ref is on the stack is very cheap (only two
-comparisons).
-
-
-Code generation for closures
-============================
-
-Code generation for closures is implemented by `lambda lifting`:idx:.
-
-Design
-------
-
-A ``closure`` proc var can call ordinary procs of the default Nim calling
-convention. But not the other way round! A closure is implemented as a
-``tuple[prc, env]``. ``env`` can be nil implying a call without a closure.
-This means that a call through a closure generates an ``if`` but the
-interoperability is worth the cost of the ``if``. Thunk generation would be
-possible too, but it's slightly more effort to implement.
-
-Tests with GCC on Amd64 showed that it's really beneficical if the
-'environment' pointer is passed as the last argument, not as the first argument.
-
-Proper thunk generation is harder because the proc that is to wrap
-could stem from a complex expression:
-
-.. code-block:: nim
-  receivesClosure(returnsDefaultCC[i])
-
-A thunk would need to call 'returnsDefaultCC[i]' somehow and that would require
-an *additional* closure generation... Ok, not really, but it requires to pass
-the function to call. So we'd end up with 2 indirect calls instead of one.
-Another much more severe problem which this solution is that it's not GC-safe
-to pass a proc pointer around via a generic ``ref`` type.
-
-
-Example code:
-
-.. code-block:: nim
-  proc add(x: int): proc (y: int): int {.closure.} =
-    return proc (y: int): int =
-      return x + y
-
-  var add2 = add(2)
-  echo add2(5) #OUT 7
-
-This should produce roughly this code:
-
-.. code-block:: nim
-  type
-    PEnv = ref object
-      x: int # data
-
-  proc anon(y: int, c: PEnv): int =
-    return y + c.x
-
-  proc add(x: int): tuple[prc, data] =
-    var env: PEnv
-    new env
-    env.x = x
-    result = (anon, env)
-
-  var add2 = add(2)
-  let tmp = if add2.data == nil: add2.prc(5) else: add2.prc(5, add2.data)
-  echo tmp
-
-
-Beware of nesting:
-
-.. code-block:: nim
-  proc add(x: int): proc (y: int): proc (z: int): int {.closure.} {.closure.} =
-    return lamba (y: int): proc (z: int): int {.closure.} =
-      return lambda (z: int): int =
-        return x + y + z
-
-  var add24 = add(2)(4)
-  echo add24(5) #OUT 11
-
-This should produce roughly this code:
-
-.. code-block:: nim
-  type
-    PEnvX = ref object
-      x: int # data
-
-    PEnvY = ref object
-      y: int
-      ex: PEnvX
-
-  proc lambdaZ(z: int, ey: PEnvY): int =
-    return ey.ex.x + ey.y + z
-
-  proc lambdaY(y: int, ex: PEnvX): tuple[prc, data: PEnvY] =
-    var ey: PEnvY
-    new ey
-    ey.y = y
-    ey.ex = ex
-    result = (lambdaZ, ey)
-
-  proc add(x: int): tuple[prc, data: PEnvX] =
-    var ex: PEnvX
-    ex.x = x
-    result = (labmdaY, ex)
-
-  var tmp = add(2)
-  var tmp2 = tmp.fn(4, tmp.data)
-  var add24 = tmp2.fn(4, tmp2.data)
-  echo add24(5)
-
-
-We could get rid of nesting environments by always inlining inner anon procs.
-More useful is escape analysis and stack allocation of the environment,
-however.
-
-
-Alternative
------------
-
-Process the closure of all inner procs in one pass and accumulate the
-environments. This is however not always possible.
-
-
-Accumulator
------------
-
-.. code-block:: nim
-  proc getAccumulator(start: int): proc (): int {.closure} =
-    var i = start
-    return lambda: int =
-      inc i
-      return i
-
-  proc p =
-    var delta = 7
-    proc accumulator(start: int): proc(): int =
-      var x = start-1
-      result = proc (): int =
-        x = x + delta
-        inc delta
-        return x
-
-    var a = accumulator(3)
-    var b = accumulator(4)
-    echo a() + b()
-
-
-Internals
----------
-
-Lambda lifting is implemented as part of the ``transf`` pass. The ``transf``
-pass generates code to setup the environment and to pass it around. However,
-this pass does not change the types! So we have some kind of mismatch here; on
-the one hand the proc expression becomes an explicit tuple, on the other hand
-the tyProc(ccClosure) type is not changed. For C code generation it's also
-important the hidden formal param is ``void*`` and not something more
-specialized. However the more specialized env type needs to passed to the
-backend somehow. We deal with this by modifying ``s.ast[paramPos]`` to contain
-the formal hidden parameter, but not ``s.typ``!
-
-