summary refs log tree commit diff stats
path: root/lib
diff options
context:
space:
mode:
authorAndreas Rumpf <rumpf_a@web.de>2021-03-26 16:27:55 +0100
committerGitHub <noreply@github.com>2021-03-26 16:27:55 +0100
commit3e03f6733548fecb328214e0c8632fa4916992ed (patch)
tree1605cc1a34f2f94abd396a6f6d5d7cbd9e0d370a /lib
parentadd771b051128b5eef09b4d77015fbeacf994700 (diff)
downloadNim-3e03f6733548fecb328214e0c8632fa4916992ed.tar.gz
cleaned up the internal documentation (#17524)
Diffstat (limited to 'lib')
-rw-r--r--lib/system/cellsets.nim35
-rw-r--r--lib/system/gc.nim47
2 files changed, 81 insertions, 1 deletions
diff --git a/lib/system/cellsets.nim b/lib/system/cellsets.nim
index 779f1a91f..a0f1fabf9 100644
--- a/lib/system/cellsets.nim
+++ b/lib/system/cellsets.nim
@@ -7,7 +7,40 @@
 #    distribution, for details about the copyright.
 #
 
-# Efficient set of pointers for the GC (and repr)
+
+#[
+
+Efficient set of pointers for the GC (and repr)
+-----------------------------------------------
+
+The GC depends on an extremely efficient datastructure for storing a
+set of pointers - this is called a `CellSet` in the source code.
+Inserting, deleting and searching are done in constant time. However,
+modifying a `CellSet` during traversal leads to undefined behaviour.
+
+All operations on a CellSet have to perform efficiently. Because a Cellset can
+become huge a hash table alone is not suitable for this.
+
+We use a mixture of bitset and hash table for this. The hash table maps *pages*
+to a page descriptor. The page descriptor contains a bit for any possible cell
+address within this page. So including a cell is done as follows:
+
+- Find the page descriptor for the page the cell belongs to.
+- Set the appropriate bit in the page descriptor indicating that the
+  cell points to the start of a memory block.
+
+Removing a cell is analogous - the bit has to be set to zero.
+Single page descriptors are never deleted from the hash table. This is not
+needed as the data structures needs to be rebuilt periodically anyway.
+
+Complete traversal is done in this way::
+
+  for each page descriptor d:
+    for each bit in d:
+      if bit == 1:
+        traverse the pointer belonging to this bit
+
+]#
 
 when defined(gcOrc) or defined(gcArc):
   type
diff --git a/lib/system/gc.nim b/lib/system/gc.nim
index 1f7164266..ae29f3466 100644
--- a/lib/system/gc.nim
+++ b/lib/system/gc.nim
@@ -12,6 +12,53 @@
 # Refcounting + Mark&Sweep. Complex algorithms avoided.
 # Been there, done that, didn't work.
 
+#[
+
+A *cell* is anything that is traced by the GC
+(sequences, refs, strings, closures).
+
+The basic algorithm is *Deferrent Reference Counting* with cycle detection.
+References on the stack are not counted for better performance and easier C
+code generation.
+
+Each cell has a header consisting of a RC and a pointer to its type
+descriptor. However the program does not know about these, so they are placed at
+negative offsets. In the GC code the type `PCell` denotes a pointer
+decremented by the right offset, so that the header can be accessed easily. It
+is extremely important that `pointer` is not confused with a `PCell`.
+
+In Nim the compiler cannot always know if a reference
+is stored on the stack or not. This is caused by var parameters.
+Consider this example:
+
+.. code-block:: Nim
+  proc setRef(r: var ref TNode) =
+    new(r)
+
+  proc usage =
+    var
+      r: ref TNode
+    setRef(r) # here we should not update the reference counts, because
+              # r is on the stack
+    setRef(r.left) # here we should update the refcounts!
+
+We have to decide at runtime whether the reference is on the stack or not.
+The generated code looks roughly like this:
+
+.. code-block:: C
+  void setref(TNode** ref) {
+    unsureAsgnRef(ref, newObj(TNode_TI, sizeof(TNode)))
+  }
+  void usage(void) {
+    setRef(&r)
+    setRef(&r->left)
+  }
+
+Note that for systems with a continuous stack (which most systems have)
+the check whether the ref is on the stack is very cheap (only two
+comparisons).
+]#
+
 {.push profiler:off.}
 
 const