Diffstat (limited to 'doc/intern.txt')
-rw-r--r-- | doc/intern.txt | 130 |
1 files changed, 93 insertions, 37 deletions
diff --git a/doc/intern.txt b/doc/intern.txt index dadb0eb05..a4545583e 100644 --- a/doc/intern.txt +++ b/doc/intern.txt @@ -38,10 +38,6 @@ Path Purpose Bootstrapping the compiler ========================== -As of version 0.8.5 the compiler is maintained in Nim. (The first versions -have been implemented in Object Pascal.) The Python-based build system has -been rewritten in Nim too. - Compiling the compiler is a simple matter of running:: nim c koch.nim @@ -202,16 +198,86 @@ Compilation cache ================= The implementation of the compilation cache is tricky: There are lots -of issues to be solved for the front- and backend. In the following -sections *global* means *shared between modules* or *property of the whole -program*. +of issues to be solved for the front- and backend. + + +General approach: AST replay +---------------------------- + +We store a module's AST of a successful semantic check in a SQLite +database. There are plenty of features that require a sub sequence +to be re-applied, for example: + +.. code-block:: nim + {.compile: "foo.c".} # even if the module is loaded from the DB, + # "foo.c" needs to be compiled/linked. + +The solution is to **re-play** the module's top level statements. +This solves the problem without having to special case the logic +that fills the internal seqs which are affected by the pragmas. + +In fact, this decribes how the AST should be stored in the database, +as a "shallow" tree. Let's assume we compile module ``m`` with the +following contents: + +.. code-block:: nim + import strutils + + var x*: int = 90 + {.compile: "foo.c".} + proc p = echo "p" + proc q = echo "q" + static: + echo "static" + +Conceptually this is the AST we store for the module: + +.. code-block:: nim + import strutils + + var x* + {.compile: "foo.c".} + proc p + proc q + static: + echo "static" + +The symbol's ``ast`` field is loaded lazily, on demand. 
This is where most +savings come from, only the shallow outer AST is reconstructed immediately. + +It is also important that the replay involves the ``import`` statement so +that the dependencies are resolved properly. + + +Shared global compiletime state +------------------------------- + +Nim allows ``.global, compiletime`` variables that can be filled by macro +invokations across different modules. This feature breaks modularity in a +severe way. Plenty of different solutions have been proposed: + +- Restrict the types of global compiletime variables to ``Set[T]`` or + similar unordered, only-growable collections so that we can track + the module's write effects to these variables and reapply the changes + in a different order. +- In every module compilation, reset the variable to its default value. +- Provide a restrictive API that can load/save the compiletime state to + a file. + +(These solutions are not mutually exclusive.) + +Since we adopt the "replay the top level statements" idea, the natural +solution to this problem is to emit pseudo top level statements that +reflect the mutations done to the global variable. -Frontend issues ---------------- Methods and type converters -~~~~~~~~~~~~~~~~~~~~~~~~~~~ +--------------------------- + +In the following +sections *global* means *shared between modules* or *property of the whole +program*. Nim contains language features that are *global*. The best example for that are multi methods: Introducing a new method with the same name and some @@ -238,20 +304,17 @@ If in the above example module ``B`` is re-compiled, but ``A`` is not then ``B`` needs to be aware of ``toBool`` even though ``toBool`` is not referenced in ``B`` *explicitly*. -Both the multi method and the type converter problems are solved by storing -them in special sections in the ROD file that are loaded *unconditionally* -when the ROD file is read. +Both the multi method and the type converter problems are solved by the +AST replay implementation. 
+ Generics ~~~~~~~~ -If we generate an instance of a generic, we'd like to re-use that -instance if possible across module boundaries. However, this is not -possible if the compilation cache is enabled. So we give up then and use -the caching of generics only per module, not per project. This means that -``--symbolFiles:on`` hurts a bit for efficiency. A better solution would -be to persist the instantiations in a global cache per project. This might be -implemented in later versions. +We cache generic instantiations and need to ensure this caching works +well with the incremental compilation feature. Since the cache is +attached to the ``PSym`` datastructure, it should work without any +special logic. Backend issues @@ -259,13 +322,10 @@ Backend issues - Init procs must not be "forgotten" to be called. - Files must not be "forgotten" to be linked. -- Anything that is contained in ``nim__dat.c`` is shared between modules - implicitly. - Method dispatchers are global. - DLL loading via ``dlsym`` is global. - Emulated thread vars are global. - However the biggest problem is that dead code elimination breaks modularity! To see why, consider this scenario: The module ``G`` (for example the huge Gtk2 module...) is compiled with dead code elimination turned on. So none @@ -274,25 +334,21 @@ of ``G``'s procs is generated at all. Then module ``B`` is compiled that requires ``G.P1``. Ok, no problem, ``G.P1`` is loaded from the symbol file and ``G.c`` now contains ``G.P1``. -Then module ``A`` (that depends onto ``B`` and ``G``) is compiled and ``B`` +Then module ``A`` (that depends on ``B`` and ``G``) is compiled and ``B`` and ``G`` are left unchanged. ``A`` requires ``G.P2``. So now ``G.c`` MUST contain both ``P1`` and ``P2``, but we haven't even loaded ``P1`` from the symbol file, nor do we want to because we then quickly -would restore large parts of the whole program. 
But we also don't want to -store ``P1`` in ``B.c`` because that would mean to store every symbol where -it is referred from which ultimately means the main module and putting -everything in a single C file. +would restore large parts of the whole program. -There is however another solution: The old file ``G.c`` containing ``P1`` is -**merged** with the new file ``G.c`` containing ``P2``. This is the solution -that is implemented in the C code generator (have a look at the ``ccgmerge`` -module). The merging may lead to *cruft* (aka dead code) in generated C code -which can only be removed by recompiling a project with the compilation cache -turned off. Nevertheless the merge solution is way superior to the -cheap solution "turn off dead code elimination if the compilation cache is -turned on". +Solution +~~~~~~~~ +The backend must have some logic so that if the currently processed module +is from the compilation cache, the ``ast`` field is not accessed. Instead +the generated C(++) for the symbol's body needs to be cached too and +inserted back into the produced C file. This approach seems to deal with +all the outlined problems above. Debugging Nim's memory management @@ -317,7 +373,7 @@ Introduction I use the term *cell* here to refer to everything that is traced (sequences, refs, strings). -This section describes how the new GC works. +This section describes how the GC works. The basic algorithm is *Deferrent Reference Counting* with cycle detection. References on the stack are not counted for better performance and easier C |