summary refs log tree commit diff stats
path: root/doc/intern.txt
diff options
context:
space:
mode:
Diffstat (limited to 'doc/intern.txt')
-rw-r--r--doc/intern.txt130
1 files changed, 93 insertions, 37 deletions
diff --git a/doc/intern.txt b/doc/intern.txt
index dadb0eb05..a4545583e 100644
--- a/doc/intern.txt
+++ b/doc/intern.txt
@@ -38,10 +38,6 @@ Path           Purpose
 Bootstrapping the compiler
 ==========================
 
-As of version 0.8.5 the compiler is maintained in Nim. (The first versions
-have been implemented in Object Pascal.) The Python-based build system has
-been rewritten in Nim too.
-
 Compiling the compiler is a simple matter of running::
 
   nim c koch.nim
@@ -202,16 +198,86 @@ Compilation cache
 =================
 
 The implementation of the compilation cache is tricky: There are lots
-of issues to be solved for the front- and backend. In the following
-sections *global* means *shared between modules* or *property of the whole
-program*.
+of issues to be solved for the front- and backend.
+
+
+General approach: AST replay
+----------------------------
+
+We store a module's AST of a successful semantic check in a SQLite
+database. There are plenty of features that require a sub sequence
+to be re-applied, for example:
+
+.. code-block:: nim
+  {.compile: "foo.c".} # even if the module is loaded from the DB,
+                       # "foo.c" needs to be compiled/linked.
+
+The solution is to **re-play** the module's top level statements.
+This solves the problem without having to special case the logic
+that fills the internal seqs which are affected by the pragmas.
+
+In fact, this decribes how the AST should be stored in the database,
+as a "shallow" tree. Let's assume we compile module ``m`` with the
+following contents:
+
+.. code-block:: nim
+  import strutils
+
+  var x*: int = 90
+  {.compile: "foo.c".}
+  proc p = echo "p"
+  proc q = echo "q"
+  static:
+    echo "static"
+
+Conceptually this is the AST we store for the module:
+
+.. code-block:: nim
+  import strutils
+
+  var x*
+  {.compile: "foo.c".}
+  proc p
+  proc q
+  static:
+    echo "static"
+
+The symbol's ``ast`` field is loaded lazily, on demand. This is where most
+savings come from, only the shallow outer AST is reconstructed immediately.
+
+It is also important that the replay involves the ``import`` statement so
+that the dependencies are resolved properly.
+
+
+Shared global compiletime state
+-------------------------------
+
+Nim allows ``.global, compiletime`` variables that can be filled by macro
+invokations across different modules. This feature breaks modularity in a
+severe way. Plenty of different solutions have been proposed:
+
+- Restrict the types of global compiletime variables to ``Set[T]`` or
+  similar unordered, only-growable collections so that we can track
+  the module's write effects to these variables and reapply the changes
+  in a different order.
+- In every module compilation, reset the variable to its default value.
+- Provide a restrictive API that can load/save the compiletime state to
+  a file.
+
+(These solutions are not mutually exclusive.)
+
+Since we adopt the "replay the top level statements" idea, the natural
+solution to this problem is to emit pseudo top level statements that
+reflect the mutations done to the global variable.
 
 
-Frontend issues
----------------
 
 Methods and type converters
-~~~~~~~~~~~~~~~~~~~~~~~~~~~
+---------------------------
+
+In the following
+sections *global* means *shared between modules* or *property of the whole
+program*.
 
 Nim contains language features that are *global*. The best example for that
 are multi methods: Introducing a new method with the same name and some
@@ -238,20 +304,17 @@ If in the above example module ``B`` is re-compiled, but ``A`` is not then
 ``B`` needs to be aware of ``toBool`` even though  ``toBool`` is not referenced
 in ``B`` *explicitly*.
 
-Both the multi method and the type converter problems are solved by storing
-them in special sections in the ROD file that are loaded *unconditionally*
-when the ROD file is read.
+Both the multi method and the type converter problems are solved by the
+AST replay implementation.
+
 
 Generics
 ~~~~~~~~
 
-If we generate an instance of a generic, we'd like to re-use that
-instance if possible across module boundaries. However, this is not
-possible if the compilation cache is enabled. So we give up then and use
-the caching of generics only per module, not per project. This means that
-``--symbolFiles:on`` hurts a bit for efficiency. A better solution would
-be to persist the instantiations in a global cache per project. This might be
-implemented in later versions.
+We cache generic instantiations and need to ensure this caching works
+well with the incremental compilation feature. Since the cache is
+attached to the ``PSym`` datastructure, it should work without any
+special logic.
 
 
 Backend issues
@@ -259,13 +322,10 @@ Backend issues
 
 - Init procs must not be "forgotten" to be called.
 - Files must not be "forgotten" to be linked.
-- Anything that is contained in ``nim__dat.c`` is shared between modules
-  implicitly.
 - Method dispatchers are global.
 - DLL loading via ``dlsym`` is global.
 - Emulated thread vars are global.
 
-
 However the biggest problem is that dead code elimination breaks modularity!
 To see why, consider this scenario: The module ``G`` (for example the huge
 Gtk2 module...) is compiled with dead code elimination turned on. So none
@@ -274,25 +334,21 @@ of ``G``'s procs is generated at all.
 Then module ``B`` is compiled that requires ``G.P1``. Ok, no problem,
 ``G.P1`` is loaded from the symbol file and ``G.c`` now contains ``G.P1``.
 
-Then module ``A`` (that depends onto ``B`` and ``G``) is compiled and ``B``
+Then module ``A`` (that depends on ``B`` and ``G``) is compiled and ``B``
 and ``G`` are left unchanged. ``A`` requires ``G.P2``.
 
 So now ``G.c`` MUST contain both ``P1`` and ``P2``, but we haven't even
 loaded ``P1`` from the symbol file, nor do we want to because we then quickly
-would restore large parts of the whole program. But we also don't want to
-store ``P1`` in ``B.c`` because that would mean to store every symbol where
-it is referred from which ultimately means the main module and putting
-everything in a single C file.
+would restore large parts of the whole program.
 
-There is however another solution: The old file ``G.c`` containing ``P1`` is
-**merged** with the new file ``G.c`` containing ``P2``. This is the solution
-that is implemented in the C code generator (have a look at the ``ccgmerge``
-module). The merging may lead to *cruft* (aka dead code) in generated C code
-which can only be removed by recompiling a project with the compilation cache
-turned off. Nevertheless the merge solution is way superior to the
-cheap solution "turn off dead code elimination if the compilation cache is
-turned on".
+Solution
+~~~~~~~~ 
 
+The backend must have some logic so that if the currently processed module
+is from the compilation cache, the ``ast`` field is not accessed. Instead
+the generated C(++) for the symbol's body needs to be cached too and
+inserted back into the produced C file. This approach seems to deal with
+all the outlined problems above.
 
 
 Debugging Nim's memory management
@@ -317,7 +373,7 @@ Introduction
 
 I use the term *cell* here to refer to everything that is traced
 (sequences, refs, strings).
-This section describes how the new GC works.
+This section describes how the GC works.
 
 The basic algorithm is *Deferrent Reference Counting* with cycle detection.
 References on the stack are not counted for better performance and easier C