author | Andreas Rumpf <rumpf_a@web.de> | 2018-06-01 22:11:32 +0200 |
---|---|---|
committer | Andreas Rumpf <rumpf_a@web.de> | 2018-06-01 22:11:32 +0200 |
commit | cae19738562f14fbb76004748bed8d2f337d6f0b (patch) | |
tree | a2d965f68a1e0d2d5617b74166c7798bc77d69f0 | |
parent | 61fb83ecbb4c691c03d500f6c71499e59a67cef2 (diff) | |
download | Nim-cae19738562f14fbb76004748bed8d2f337d6f0b.tar.gz |
document how the incremental compilation scheme could work
-rw-r--r-- | compiler/lineinfos.nim | 6 |
-rw-r--r-- | compiler/modulegraphs.nim | 2 |
-rw-r--r-- | compiler/options.nim | 23 |
-rw-r--r-- | doc/intern.txt | 130 |
4 files changed, 110 insertions, 51 deletions
diff --git a/compiler/lineinfos.nim b/compiler/lineinfos.nim
index 0384fda26..cad1fe6aa 100644
--- a/compiler/lineinfos.nim
+++ b/compiler/lineinfos.nim
@@ -246,10 +246,10 @@ const trackPosInvalidFileIdx* = FileIndex(-2) # special marker so that no sugges
                                               # are produced within comments and string literals

 type
-  MsgConfig* = object
+  MsgConfig* = object ## does not need to be stored in the incremental cache
     trackPos*: TLineInfo
-    trackPosAttached*: bool ## whether the tracking position was attached to some
-                            ## close token.
+    trackPosAttached*: bool ## whether the tracking position was attached to
+                            ## some close token.

     errorOutputs*: TErrorOutputs
     msgContext*: seq[TLineInfo]
diff --git a/compiler/modulegraphs.nim b/compiler/modulegraphs.nim
index 7c9837f54..02307ca9f 100644
--- a/compiler/modulegraphs.nim
+++ b/compiler/modulegraphs.nim
@@ -47,7 +47,7 @@ type
     doStopCompile*: proc(): bool {.closure.}
     usageSym*: PSym # for nimsuggest
     owners*: seq[PSym]
-    methods*: seq[tuple[methods: TSymSeq, dispatcher: PSym]]
+    methods*: seq[tuple[methods: TSymSeq, dispatcher: PSym]] # needs serialization!
     systemModule*: PSym
     sysTypes*: array[TTypeKind, PType]
     compilerprocs*: TStrTable
diff --git a/compiler/options.nim b/compiler/options.nim
index cb4f1e885..044461b55 100644
--- a/compiler/options.nim
+++ b/compiler/options.nim
@@ -156,24 +156,27 @@ type
     version*: int
   Suggestions* = seq[Suggest]

-  ConfigRef* = ref object ## eventually all global configuration should be moved here
-    target*: Target
+  ConfigRef* = ref object ## every global configuration
+                          ## fields marked with '*' are subject to
+                          ## the incremental compilation mechanisms
+                          ## (+) means "part of the dependency"
+    target*: Target # (+)
     linesCompiled*: int # all lines that have been compiled
-    options*: TOptions
-    globalOptions*: TGlobalOptions
+    options*: TOptions # (+)
+    globalOptions*: TGlobalOptions # (+)
     m*: MsgConfig
     evalTemplateCounter*: int
     evalMacroCounter*: int
     exitcode*: int8
     cmd*: TCommands # the command
-    selectedGC*: TGCMode # the selected GC
+    selectedGC*: TGCMode # the selected GC (+)
     verbosity*: int # how verbose the compiler is
     numberOfProcessors*: int # number of processors
     evalExpr*: string # expression for idetools --eval
     lastCmdTime*: float # when caas is enabled, we measure each command
     symbolFiles*: SymbolFilesOption

-    cppDefines*: HashSet[string]
+    cppDefines*: HashSet[string] # (*)
     headerFile*: string
     features*: set[Feature]
     arguments*: string ## the arguments to be passed to the program that
@@ -220,13 +223,13 @@ type
     cLinkedLibs*: seq[string] # libraries to link

     externalToLink*: seq[string] # files to link in addition to the file
-                                 # we compiled
+                                 # we compiled (*)
     linkOptionsCmd*: string
     compileOptionsCmd*: seq[string]
-    linkOptions*: string
-    compileOptions*: string
+    linkOptions*: string # (*)
+    compileOptions*: string # (*)
     ccompilerpath*: string
-    toCompile*: CfileList
+    toCompile*: CfileList # (*)
     suggestionResultHook*: proc (result: Suggest) {.closure.}
     suggestVersion*: int
     suggestMaxResults*: int
diff --git a/doc/intern.txt b/doc/intern.txt
index dadb0eb05..a4545583e 100644
--- a/doc/intern.txt
+++ b/doc/intern.txt
@@ -38,10 +38,6 @@ Path Purpose
 Bootstrapping the compiler
 ==========================

-As of version 0.8.5 the compiler is maintained in Nim. (The first versions
-have been implemented in Object Pascal.) The Python-based build system has
-been rewritten in Nim too.
-
 Compiling the compiler is a simple matter of running::

   nim c koch.nim
@@ -202,16 +198,86 @@ Compilation cache
 =================

 The implementation of the compilation cache is tricky: There are lots
-of issues to be solved for the front- and backend. In the following
-sections *global* means *shared between modules* or *property of the whole
-program*.
+of issues to be solved for the front- and backend.
+
+
+General approach: AST replay
+----------------------------
+
+We store a module's AST of a successful semantic check in a SQLite
+database. There are plenty of features that require a sub sequence
+to be re-applied, for example:
+
+.. code-block:: nim
+  {.compile: "foo.c".} # even if the module is loaded from the DB,
+                       # "foo.c" needs to be compiled/linked.
+
+The solution is to **re-play** the module's top level statements.
+This solves the problem without having to special case the logic
+that fills the internal seqs which are affected by the pragmas.
+
+In fact, this decribes how the AST should be stored in the database,
+as a "shallow" tree. Let's assume we compile module ``m`` with the
+following contents:
+
+.. code-block:: nim
+  import strutils
+
+  var x*: int = 90
+  {.compile: "foo.c".}
+  proc p = echo "p"
+  proc q = echo "q"
+  static:
+    echo "static"
+
+Conceptually this is the AST we store for the module:
+
+.. code-block:: nim
+  import strutils
+
+  var x*
+  {.compile: "foo.c".}
+  proc p
+  proc q
+  static:
+    echo "static"
+
+The symbol's ``ast`` field is loaded lazily, on demand. This is where most
+savings come from, only the shallow outer AST is reconstructed immediately.
+
+It is also important that the replay involves the ``import`` statement so
+that the dependencies are resolved properly.
+
+
+Shared global compiletime state
+-------------------------------
+
+Nim allows ``.global, compiletime`` variables that can be filled by macro
+invokations across different modules. This feature breaks modularity in a
+severe way. Plenty of different solutions have been proposed:
+
+- Restrict the types of global compiletime variables to ``Set[T]`` or
+  similar unordered, only-growable collections so that we can track
+  the module's write effects to these variables and reapply the changes
+  in a different order.
+- In every module compilation, reset the variable to its default value.
+- Provide a restrictive API that can load/save the compiletime state to
+  a file.
+
+(These solutions are not mutually exclusive.)
+
+Since we adopt the "replay the top level statements" idea, the natural
+solution to this problem is to emit pseudo top level statements that
+reflect the mutations done to the global variable.


-Frontend issues
----------------

 Methods and type converters
-~~~~~~~~~~~~~~~~~~~~~~~~~~~
+---------------------------
+
+In the following
+sections *global* means *shared between modules* or *property of the whole
+program*.

 Nim contains language features that are *global*. The best example for that
 are multi methods: Introducing a new method with the same name and some
@@ -238,20 +304,17 @@ If in the above example module ``B`` is re-compiled, but ``A`` is not then
 ``B`` needs to be aware of ``toBool`` even though ``toBool`` is not referenced
 in ``B`` *explicitly*.

-Both the multi method and the type converter problems are solved by storing
-them in special sections in the ROD file that are loaded *unconditionally*
-when the ROD file is read.
+Both the multi method and the type converter problems are solved by the
+AST replay implementation.
+

 Generics
 ~~~~~~~~

-If we generate an instance of a generic, we'd like to re-use that
-instance if possible across module boundaries. However, this is not
-possible if the compilation cache is enabled. So we give up then and use
-the caching of generics only per module, not per project. This means that
-``--symbolFiles:on`` hurts a bit for efficiency. A better solution would
-be to persist the instantiations in a global cache per project. This might be
-implemented in later versions.
+We cache generic instantiations and need to ensure this caching works
+well with the incremental compilation feature. Since the cache is
+attached to the ``PSym`` datastructure, it should work without any
+special logic.


 Backend issues
@@ -259,13 +322,10 @@ Backend issues

 - Init procs must not be "forgotten" to be called.
 - Files must not be "forgotten" to be linked.
-- Anything that is contained in ``nim__dat.c`` is shared between modules
-  implicitly.
 - Method dispatchers are global.
 - DLL loading via ``dlsym`` is global.
 - Emulated thread vars are global.

-
 However the biggest problem is that dead code elimination breaks modularity!
 To see why, consider this scenario: The module ``G`` (for example the huge
 Gtk2 module...) is compiled with dead code elimination turned on. So none
@@ -274,25 +334,21 @@ of ``G``'s procs is generated at all.
 Then module ``B`` is compiled that requires ``G.P1``. Ok, no problem,
 ``G.P1`` is loaded from the symbol file and ``G.c`` now contains ``G.P1``.

-Then module ``A`` (that depends onto ``B`` and ``G``) is compiled and ``B``
+Then module ``A`` (that depends on ``B`` and ``G``) is compiled and ``B``
 and ``G`` are left unchanged. ``A`` requires ``G.P2``.

 So now ``G.c`` MUST contain both ``P1`` and ``P2``, but we haven't even
 loaded ``P1`` from the symbol file, nor do we want to because we then quickly
-would restore large parts of the whole program. But we also don't want to
-store ``P1`` in ``B.c`` because that would mean to store every symbol where
-it is referred from which ultimately means the main module and putting
-everything in a single C file.
+would restore large parts of the whole program.

-There is however another solution: The old file ``G.c`` containing ``P1`` is
-**merged** with the new file ``G.c`` containing ``P2``. This is the solution
-that is implemented in the C code generator (have a look at the ``ccgmerge``
-module). The merging may lead to *cruft* (aka dead code) in generated C code
-which can only be removed by recompiling a project with the compilation cache
-turned off. Nevertheless the merge solution is way superior to the
-cheap solution "turn off dead code elimination if the compilation cache is
-turned on".
+Solution
+~~~~~~~~

+The backend must have some logic so that if the currently processed module
+is from the compilation cache, the ``ast`` field is not accessed. Instead
+the generated C(++) for the symbol's body needs to be cached too and
+inserted back into the produced C file. This approach seems to deal with
+all the outlined problems above.


 Debugging Nim's memory management
@@ -317,7 +373,7 @@ Introduction
 I use the term *cell* here to refer to everything that is traced
 (sequences, refs, strings).

-This section describes how the new GC works.
+This section describes how the GC works.

 The basic algorithm is *Deferrent Reference Counting* with cycle detection.
 References on the stack are not counted for better performance and easier C