From 3cb645ab500487b62d82c9d02c0e7f21b2cf1413 Mon Sep 17 00:00:00 2001
From: Andreas Rumpf
Date: Wed, 27 Mar 2019 14:40:47 +0100
Subject: move more stuff into manual_experimental

---
 doc/manual.rst | 386 +--------------------------------------------------------
 1 file changed, 4 insertions(+), 382 deletions(-)

diff --git a/doc/manual.rst b/doc/manual.rst
index 59bec9c90..8771f8fe9 100644
--- a/doc/manual.rst
+++ b/doc/manual.rst
@@ -6842,15 +6842,14 @@ To enable thread support the ``--threads:on`` command line switch needs to
 be used. The ``system`` module then contains several threading primitives.
 See the `threads `_ and `channels `_ modules
 for the low level thread API. There are also high level parallelism constructs
-available. See `spawn <#parallel-amp-spawn>`_ for further details.
+available. See `spawn `_ for
+further details.
 
 Nim's memory model for threads is quite different than that of other common
 programming languages (C, Pascal, Java): Each thread has its own (garbage
 collected) heap and sharing of memory is restricted to global variables. This
 helps to prevent race conditions. GC efficiency is improved quite a lot,
 because the GC never has to stop other threads and see what they reference.
-Memory allocation requires no lock at all! This design easily scales to massive
-multicore processors that are becoming the norm.
 
 
 Thread pragma
@@ -6876,9 +6875,8 @@ directly or indirectly through a call to a GC unsafe proc.
 The `gcsafe`:idx: annotation can be used to mark a proc to be gcsafe,
 otherwise this property is inferred by the compiler. Note that ``noSideEffect``
 implies ``gcsafe``. The only way to create a thread is via ``spawn`` or
-``createThread``. ``spawn`` is usually the preferable method. Either way
-the invoked proc must not use ``var`` parameters nor must any of its parameters
-contain a ``ref`` or ``closure`` type. This enforces
+``createThread``. The invoked proc must not use ``var`` parameters nor must
+any of its parameters contain a ``ref`` or ``closure`` type. This enforces
 the *no heap sharing restriction*.
 
 Routines that are imported from C are always assumed to be ``gcsafe``.
@@ -6928,379 +6926,3 @@ in one thread cannot affect any other thread. However, an *unhandled* exception
 in one thread terminates the whole *process*!
 
 
-
-Parallel & Spawn
-================
-
-Nim has two flavors of parallelism:
-1) `Structured`:idx: parallelism via the ``parallel`` statement.
-2) `Unstructured`:idx: parallelism via the standalone ``spawn`` statement.
-
-Nim has a builtin thread pool that can be used for CPU intensive tasks. For
-IO intensive tasks the ``async`` and ``await`` features should be
-used instead. Both parallel and spawn need the `threadpool `_
-module to work.
-
-Somewhat confusingly, ``spawn`` is also used in the ``parallel`` statement
-with slightly different semantics. ``spawn`` always takes a call expression of
-the form ``f(a, ...)``. Let ``T`` be ``f``'s return type. If ``T`` is ``void``
-then ``spawn``'s return type is also ``void`` otherwise it is ``FlowVar[T]``.
-
-Within a ``parallel`` section sometimes the ``FlowVar[T]`` is eliminated
-to ``T``. This happens when ``T`` does not contain any GC'ed memory.
-The compiler can ensure the location in ``location = spawn f(...)`` is not
-read prematurely within a ``parallel`` section and so there is no need for
-the overhead of an indirection via ``FlowVar[T]`` to ensure correctness.
-
-**Note**: Currently exceptions are not propagated between ``spawn``'ed tasks!
-
-
-Spawn statement
----------------
-
-`spawn`:idx: can be used to pass a task to the thread pool:
-
-.. code-block:: nim
-  import threadpool
-
-  proc processLine(line: string) =
-    discard "do some heavy lifting here"
-
-  for x in lines("myinput.txt"):
-    spawn processLine(x)
-  sync()
-
-For reasons of type safety and implementation simplicity the expression
-that ``spawn`` takes is restricted:
-
-* It must be a call expression ``f(a, ...)``.
-* ``f`` must be ``gcsafe``.
-* ``f`` must not have the calling convention ``closure``.
-* ``f``'s parameters may not be of type ``var``.
-  This means one has to use raw ``ptr``'s for data passing reminding the
-  programmer to be careful.
-* ``ref`` parameters are deeply copied which is a subtle semantic change and
-  can cause performance problems but ensures memory safety. This deep copy
-  is performed via ``system.deepCopy`` and so can be overridden.
-* For *safe* data exchange between ``f`` and the caller a global ``TChannel``
-  needs to be used. However, since spawn can return a result, often no further
-  communication is required.
-
-
-``spawn`` executes the passed expression on the thread pool and returns
-a `data flow variable`:idx: ``FlowVar[T]`` that can be read from. The reading
-with the ``^`` operator is **blocking**. However, one can use ``blockUntilAny`` to
-wait on multiple flow variables at the same time:
-
-.. code-block:: nim
-  import threadpool, ...
-
-  # wait until 2 out of 3 servers received the update:
-  proc main =
-    var responses = newSeq[FlowVarBase](3)
-    for i in 0..2:
-      responses[i] = spawn tellServer(Update, "key", "value")
-    var index = blockUntilAny(responses)
-    assert index >= 0
-    responses.del(index)
-    discard blockUntilAny(responses)
-
-Data flow variables ensure that no data races
-are possible. Due to technical limitations not every type ``T`` is possible in
-a data flow variable: ``T`` has to be of the type ``ref``, ``string``, ``seq``
-or of a type that doesn't contain a type that is garbage collected. This
-restriction is not hard to work-around in practice.
-
-
-
-Parallel statement
-------------------
-
-Example:
-
-.. code-block:: nim
-    :test: "nim c --threads:on $1"
-
-  # Compute PI in an inefficient way
-  import strutils, math, threadpool
-  {.experimental: "parallel".}
-
-  proc term(k: float): float = 4 * math.pow(-1, k) / (2*k + 1)
-
-  proc pi(n: int): float =
-    var ch = newSeq[float](n+1)
-    parallel:
-      for k in 0..ch.high:
-        ch[k] = spawn term(float(k))
-    for k in 0..ch.high:
-      result += ch[k]
-
-  echo formatFloat(pi(5000))
-
-
-The parallel statement is the preferred mechanism to introduce parallelism in a
-Nim program. A subset of the Nim language is valid within a ``parallel``
-section. This subset is checked during semantic analysis to be free of data
-races. A sophisticated `disjoint checker`:idx: ensures that no data races are
-possible even though shared memory is extensively supported!
-
-The subset is in fact the full language with the following
-restrictions / changes:
-
-* ``spawn`` within a ``parallel`` section has special semantics.
-* Every location of the form ``a[i]`` and ``a[i..j]`` and ``dest`` where
-  ``dest`` is part of the pattern ``dest = spawn f(...)`` has to be
-  provably disjoint. This is called the *disjoint check*.
-* Every other complex location ``loc`` that is used in a spawned
-  proc (``spawn f(loc)``) has to be immutable for the duration of
-  the ``parallel`` section. This is called the *immutability check*. Currently
-  it is not specified what exactly "complex location" means. We need to make
-  this an optimization!
-* Every array access has to be provably within bounds. This is called
-  the *bounds check*.
-* Slices are optimized so that no copy is performed. This optimization is not
-  yet performed for ordinary slices outside of a ``parallel`` section.
-
-
-Guards and locks
-================
-
-Apart from ``spawn`` and ``parallel`` Nim also provides all the common low level
-concurrency mechanisms like locks, atomic intrinsics or condition variables.
-
-Nim significantly improves on the safety of these features via additional
-pragmas:
-
-1) A `guard`:idx: annotation is introduced to prevent data races.
-2) Every access of a guarded memory location needs to happen in an
-   appropriate `locks`:idx: statement.
-3) Locks and routines can be annotated with `lock levels`:idx: to allow
-   potential deadlocks to be detected during semantic analysis.
-
-
-Guards and the locks section
-----------------------------
-
-Protecting global variables
-~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-Object fields and global variables can be annotated via a ``guard`` pragma:
-
-.. code-block:: nim
-  var glock: TLock
-  var gdata {.guard: glock.}: int
-
-The compiler then ensures that every access of ``gdata`` is within a ``locks``
-section:
-
-.. code-block:: nim
-  proc invalid =
-    # invalid: unguarded access:
-    echo gdata
-
-  proc valid =
-    # valid access:
-    {.locks: [glock].}:
-      echo gdata
-
-Top level accesses to ``gdata`` are always allowed so that it can be initialized
-conveniently. It is *assumed* (but not enforced) that every top level statement
-is executed before any concurrent action happens.
-
-The ``locks`` section deliberately looks ugly because it has no runtime
-semantics and should not be used directly! It should only be used in templates
-that also implement some form of locking at runtime:
-
-.. code-block:: nim
-  template lock(a: TLock; body: untyped) =
-    pthread_mutex_lock(a)
-    {.locks: [a].}:
-      try:
-        body
-      finally:
-        pthread_mutex_unlock(a)
-
-
-The guard does not need to be of any particular type. It is flexible enough to
-model low level lockfree mechanisms:
-
-.. code-block:: nim
-  var dummyLock {.compileTime.}: int
-  var atomicCounter {.guard: dummyLock.}: int
-
-  template atomicRead(x): untyped =
-    {.locks: [dummyLock].}:
-      memoryReadBarrier()
-      x
-
-  echo atomicRead(atomicCounter)
-
-
-The ``locks`` pragma takes a list of lock expressions ``locks: [a, b, ...]``
-in order to support *multi lock* statements. Why these are essential is
-explained in the `lock levels <#guards-and-locks-lock-levels>`_ section.
-
-
-Protecting general locations
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-The ``guard`` annotation can also be used to protect fields within an object.
-The guard then needs to be another field within the same object or a
-global variable.
-
-Since objects can reside on the heap or on the stack this greatly enhances the
-expressivity of the language:
-
-.. code-block:: nim
-  type
-    ProtectedCounter = object
-      v {.guard: L.}: int
-      L: TLock
-
-  proc incCounters(counters: var openArray[ProtectedCounter]) =
-    for i in 0..counters.high:
-      lock counters[i].L:
-        inc counters[i].v
-
-The access to field ``x.v`` is allowed since its guard ``x.L`` is active.
-After template expansion, this amounts to:
-
-.. code-block:: nim
-  proc incCounters(counters: var openArray[ProtectedCounter]) =
-    for i in 0..counters.high:
-      pthread_mutex_lock(counters[i].L)
-      {.locks: [counters[i].L].}:
-        try:
-          inc counters[i].v
-        finally:
-          pthread_mutex_unlock(counters[i].L)
-
-There is an analysis that checks that ``counters[i].L`` is the lock that
-corresponds to the protected location ``counters[i].v``. This analysis is called
-`path analysis`:idx: because it deals with paths to locations
-like ``obj.field[i].fieldB[j]``.
-
-The path analysis is **currently unsound**, but that doesn't make it useless.
-Two paths are considered equivalent if they are syntactically the same.
-
-This means the following compiles (for now) even though it really should not:
-
-.. code-block:: nim
-  {.locks: [a[i].L].}:
-    inc i
-    access a[i].v
-
-
-
-Lock levels
------------
-
-Lock levels are used to enforce a global locking order in order to detect
-potential deadlocks during semantic analysis. A lock level is an constant
-integer in the range 0..1_000. Lock level 0 means that no lock is acquired at
-all.
-
-If a section of code holds a lock of level ``M`` than it can also acquire any
-lock of level ``N < M``. Another lock of level ``M`` cannot be acquired. Locks
-of the same level can only be acquired *at the same time* within a
-single ``locks`` section:
-
-.. code-block:: nim
-  var a, b: TLock[2]
-  var x: TLock[1]
-  # invalid locking order: TLock[1] cannot be acquired before TLock[2]:
-  {.locks: [x].}:
-    {.locks: [a].}:
-      ...
-  # valid locking order: TLock[2] acquired before TLock[1]:
-  {.locks: [a].}:
-    {.locks: [x].}:
-      ...
-
-  # invalid locking order: TLock[2] acquired before TLock[2]:
-  {.locks: [a].}:
-    {.locks: [b].}:
-      ...
-
-  # valid locking order, locks of the same level acquired at the same time:
-  {.locks: [a, b].}:
-    ...
-
-
-Here is how a typical multilock statement can be implemented in Nim. Note how
-the runtime check is required to ensure a global ordering for two locks ``a``
-and ``b`` of the same lock level:
-
-.. code-block:: nim
-  template multilock(a, b: ptr TLock; body: untyped) =
-    if cast[ByteAddress](a) < cast[ByteAddress](b):
-      pthread_mutex_lock(a)
-      pthread_mutex_lock(b)
-    else:
-      pthread_mutex_lock(b)
-      pthread_mutex_lock(a)
-    {.locks: [a, b].}:
-      try:
-        body
-      finally:
-        pthread_mutex_unlock(a)
-        pthread_mutex_unlock(b)
-
-
-Whole routines can also be annotated with a ``locks`` pragma that takes a lock
-level. This then means that the routine may acquire locks of up to this level.
-This is essential so that procs can be called within a ``locks`` section:
-
-.. code-block:: nim
-  proc p() {.locks: 3.} = discard
-
-  var a: TLock[4]
-  {.locks: [a].}:
-    # p's locklevel (3) is strictly less than a's (4) so the call is allowed:
-    p()
-
-
-As usual ``locks`` is an inferred effect and there is a subtype
-relation: ``proc () {.locks: N.}`` is a subtype of ``proc () {.locks: M.}``
-iff (M <= N).
-
-The ``locks`` pragma can also take the special value ``"unknown"``. This
-is useful in the context of dynamic method dispatching. In the following
-example, the compiler can infer a lock level of 0 for the ``base`` case.
-However, one of the overloaded methods calls a procvar which is
-potentially locking. Thus, the lock level of calling ``g.testMethod``
-cannot be inferred statically, leading to compiler warnings. By using
-``{.locks: "unknown".}``, the base method can be marked explicitly as
-having unknown lock level as well:
-
-.. code-block:: nim
-  type SomeBase* = ref object of RootObj
-  type SomeDerived* = ref object of SomeBase
-    memberProc*: proc ()
-
-  method testMethod(g: SomeBase) {.base, locks: "unknown".} = discard
-  method testMethod(g: SomeDerived) =
-    if g.memberProc != nil:
-      g.memberProc()
-
-
-Taint mode
-==========
-
-The Nim compiler and most parts of the standard library support
-a taint mode. Input strings are declared with the `TaintedString`:idx:
-string type declared in the ``system`` module.
-
-If the taint mode is turned on (via the ``--taintMode:on`` command line
-option) it is a distinct string type which helps to detect input
-validation errors:
-
-.. code-block:: nim
-  echo "your name: "
-  var name: TaintedString = stdin.readline
-  # it is safe here to output the name without any input validation, so
-  # we simply convert `name` to string to make the compiler happy:
-  echo "hi, ", name.string
-
-If the taint mode is turned off, ``TaintedString`` is simply an alias for
-``string``.