1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
|
Threads
=======
To enable thread support the ``--threads:on`` command line switch needs to
be used. The ``system`` module then contains several threading primitives.
See the `threads <threads.html>`_ and `channels <channels.html>`_ modules
for the low level thread API. There are also high level parallelism constructs
available. See `spawn`_ for further details.
Nim's memory model for threads is quite different than that of other common
programming languages (C, Pascal, Java): Each thread has its own (garbage
collected) heap and sharing of memory is restricted to global variables. This
helps to prevent race conditions. GC efficiency is improved quite a lot,
because the GC never has to stop other threads and see what they reference.
Memory allocation requires no lock at all! This design easily scales to massive
multicore processors that are becoming the norm.
Thread pragma
-------------
A proc that is executed as a new thread of execution should be marked by the
``thread`` pragma for reasons of readability. The compiler checks for
violations of the `no heap sharing restriction`:idx:\: This restriction implies
that it is invalid to construct a data structure that consists of memory
allocated from different (thread local) heaps.
A thread proc is passed to ``createThread`` or ``spawn`` and invoked
indirectly; so the ``thread`` pragma implies ``procvar``.
GC safety
---------
We call a proc ``p`` `GC safe`:idx: when it doesn't access any global variable
that contains GC'ed memory (``string``, ``seq``, ``ref`` or a closure) either
directly or indirectly through a call to a GC unsafe proc.
The `gcsafe`:idx: annotation can be used to mark a proc to be gcsafe,
otherwise this property is inferred by the compiler. Note that ``noSideEfect``
implies ``gcsafe``. The only way to create a thread is via ``spawn`` or
``createThead``. ``spawn`` is usually the preferable method. Either way
the invoked proc must not use ``var`` parameters nor must any of its parameters
contain a ``ref`` or ``closure`` type. This enforces
the *no heap sharing restriction*.
Routines that are imported from C are always assumed to be ``gcsafe``.
To enable the GC-safety checking the ``--threadAnalysis:on`` command line
switch must be used. This is a temporary workaround to ease the porting effort
from old code to the new threading model. In the future the thread analysis
will always be performed.
Future directions:
- A shared GC'ed heap might be provided.
Threadvar pragma
----------------
A global variable can be marked with the ``threadvar`` pragma; it is
a `thread-local`:idx: variable then:
.. code-block:: nim
var checkpoints* {.threadvar.}: seq[string]
Due to implementation restrictions thread local variables cannot be
initialized within the ``var`` section. (Every thread local variable needs to
be replicated at thread creation.)
Threads and exceptions
----------------------
The interaction between threads and exceptions is simple: A *handled* exception
in one thread cannot affect any other thread. However, an *unhandled* exception
in one thread terminates the whole *process*!
Parallel & Spawn
================
Nim has two flavors of parallelism:
1) `Structured`:idx: parallelism via the ``parallel`` statement.
2) `Unstructured`:idx: parallelism via the standalone ``spawn`` statement.
Nim has a builtin thread pool that can be used for CPU intensive tasks. For
IO intensive tasks the ``async`` and ``await`` features should be
used instead. Both parallel and spawn need the `threadpool <threadpool.html>`_
module to work.
Somewhat confusingly, ``spawn`` is also used in the ``parallel`` statement
with slightly different semantics. ``spawn`` always takes a call expression of
the form ``f(a, ...)``. Let ``T`` be ``f``'s return type. If ``T`` is ``void``
then ``spawn``'s return type is also ``void`` otherwise it is ``FlowVar[T]``.
Within a ``parallel`` section sometimes the ``FlowVar[T]`` is eliminated
to ``T``. This happens when ``T`` does not contain any GC'ed memory.
The compiler can ensure the location in ``location = spawn f(...)`` is not
read prematurely within a ``parallel`` section and so there is no need for
the overhead of an indirection via ``FlowVar[T]`` to ensure correctness.
**Note**: Currently exceptions are not propagated between ``spawn``'ed tasks!
Spawn statement
---------------
`spawn`:idx: can be used to pass a task to the thread pool:
.. code-block:: nim
import threadpool
proc processLine(line: string) =
discard "do some heavy lifting here"
for x in lines("myinput.txt"):
spawn processLine(x)
sync()
For reasons of type safety and implementation simplicity the expression
that ``spawn`` takes is restricted:
* It must be a call expression ``f(a, ...)``.
* ``f`` must be ``gcsafe``.
* ``f`` must not have the calling convention ``closure``.
* ``f``'s parameters may not be of type ``var``.
This means one has to use raw ``ptr``'s for data passing reminding the
programmer to be careful.
* ``ref`` parameters are deeply copied which is a subtle semantic change and
can cause performance problems but ensures memory safety. This deep copy
is performed via ``system.deepCopy`` and so can be overriden.
* For *safe* data exchange between ``f`` and the caller a global ``TChannel``
needs to be used. However, since spawn can return a result, often no further
communication is required.
``spawn`` executes the passed expression on the thread pool and returns
a `data flow variable`:idx: ``FlowVar[T]`` that can be read from. The reading
with the ``^`` operator is **blocking**. However, one can use ``awaitAny`` to
wait on multiple flow variables at the same time:
.. code-block:: nim
import threadpool, ...
# wait until 2 out of 3 servers received the update:
proc main =
var responses = newSeq[RawFlowVar](3)
for i in 0..2:
responses[i] = spawn tellServer(Update, "key", "value")
var index = awaitAny(responses)
assert index >= 0
responses.del(index)
discard awaitAny(responses)
Data flow variables ensure that no data races
are possible. Due to technical limitations not every type ``T`` is possible in
a data flow variable: ``T`` has to be of the type ``ref``, ``string``, ``seq``
or of a type that doesn't contain a type that is garbage collected. This
restriction is not hard to work-around in practice.
Parallel statement
------------------
Example:
.. code-block:: nim
# Compute PI in an inefficient way
import strutils, math, threadpool
proc term(k: float): float = 4 * math.pow(-1, k) / (2*k + 1)
proc pi(n: int): float =
var ch = newSeq[float](n+1)
parallel:
for k in 0..ch.high:
ch[k] = spawn term(float(k))
for k in 0..ch.high:
result += ch[k]
echo formatFloat(pi(5000))
The parallel statement is the preferred mechanism to introduce parallelism
in a Nim program. A subset of the Nim language is valid within a
``parallel`` section. This subset is checked to be free of data races at
compile time. A sophisticated `disjoint checker`:idx: ensures that no data
races are possible even though shared memory is extensively supported!
The subset is in fact the full language with the following
restrictions / changes:
* ``spawn`` within a ``parallel`` section has special semantics.
* Every location of the form ``a[i]`` and ``a[i..j]`` and ``dest`` where
``dest`` is part of the pattern ``dest = spawn f(...)`` has to be
provably disjoint. This is called the *disjoint check*.
* Every other complex location ``loc`` that is used in a spawned
proc (``spawn f(loc)``) has to be immutable for the duration of
the ``parallel`` section. This is called the *immutability check*. Currently
it is not specified what exactly "complex location" means. We need to make
this an optimization!
* Every array access has to be provably within bounds. This is called
the *bounds check*.
* Slices are optimized so that no copy is performed. This optimization is not
yet performed for ordinary slices outside of a ``parallel`` section. Slices
are also special in that they currently do not support negative indexes!
|