# Rayon FAQ

This file is for general questions that don't fit into the README or crate docs.

## How many threads will Rayon spawn?

By default, Rayon uses the same number of threads as the number of CPUs available. Note that on systems with hyperthreading enabled, this equals the number of logical cores, not the physical ones.

If you want to alter the number of threads spawned, you can set the environment variable `RAYON_NUM_THREADS` to the desired number of threads or use the [`ThreadPoolBuilder::build_global`](https://docs.rs/rayon/*/rayon/struct.ThreadPoolBuilder.html#method.build_global) method.

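For example, a minimal sketch of the builder route might look like this (the choice of four threads is arbitrary and just for illustration):

```rust
use rayon::ThreadPoolBuilder;

fn main() {
    // The global pool can only be configured once, and only before it is
    // first used, so `build_global` returns an error if a pool already exists.
    ThreadPoolBuilder::new()
        .num_threads(4)
        .build_global()
        .expect("failed to build the global thread pool");

    // Any Rayon work from here on runs on those four threads.
}
```
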
## How does Rayon balance work between threads?

Behind the scenes, Rayon uses a technique called **work stealing** to try and dynamically ascertain how much parallelism is available and exploit it. The idea is very simple: we always have a pool of worker threads available, waiting for some work to do. When you call `join` the first time, we shift over into that pool of threads. But if you call `join(a, b)` from a worker thread W, then W will place `b` into its work queue, advertising that this is work that other worker threads might help out with. W will then start executing `a`.

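As a rough illustration (a sketch, not Rayon's own example), a typical divide-and-conquer use of `join` looks like this; the `1024` cutoff is an arbitrary choice:

```rust
// Sum a slice by splitting it in half: the closure for the right half is
// queued where idle workers can steal it, while this thread sums the left.
fn parallel_sum(slice: &[i64]) -> i64 {
    if slice.len() <= 1024 {
        return slice.iter().sum();
    }
    let (left, right) = slice.split_at(slice.len() / 2);
    let (a, b) = rayon::join(|| parallel_sum(left), || parallel_sum(right));
    a + b
}
```
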
While W is busy with `a`, other threads might come along and take `b` from its queue. That is called *stealing* `b`. Once `a` is done, W checks whether `b` was stolen by another thread and, if not, executes `b` itself. If W runs out of jobs in its own queue, it will look through the other threads' queues and try to steal work from them.

This technique is not new. It was first introduced by the [Cilk project][cilk], done at MIT in the late nineties. The name Rayon is an homage to that work.

[cilk]: http://supertech.csail.mit.edu/cilk/

## What should I do if I use `Rc`, `Cell`, `RefCell` or other non-Send-and-Sync types?

There are a number of non-threadsafe types in the Rust standard library, and if your code is using them, you will not be able to combine it with Rayon. Similarly, even if you don't have such types, if you try to have multiple closures mutating the same state, you will get compilation errors; for example, this function won't work, because both closures access `slice`:

```rust
/// Increment all values in slice.
fn increment_all(slice: &mut [i32]) {
    ...
}
```

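Concretely, the kind of call that produces this error, and one way to resolve it for slices, might look like the following sketch (hypothetical code, not the crate's own example):

```rust
// Does not compile: both closures need unique (`&mut`) access to `slice`.
fn increment_all_broken(slice: &mut [i32]) {
    rayon::join(
        || slice.iter_mut().for_each(|p| *p += 1),
        || slice.iter_mut().for_each(|p| *p += 1),
    );
}

// One fix for slices specifically: give each closure a disjoint half.
fn increment_all_split(slice: &mut [i32]) {
    let mid = slice.len() / 2;
    let (left, right) = slice.split_at_mut(mid);
    rayon::join(
        || left.iter_mut().for_each(|p| *p += 1),
        || right.iter_mut().for_each(|p| *p += 1),
    );
}
```

(In practice, a parallel iterator such as `slice.par_iter_mut()` is the more idiomatic way to touch every element; the point here is just the borrow conflict.)
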
The correct way to resolve such errors will depend on the case. Some cases are easy: for example, uses of [`Rc`] can typically be replaced with [`Arc`], which is basically equivalent, but thread-safe.

Code that uses `Cell` or `RefCell`, however, can be somewhat more complicated. If you can refactor your code to avoid those types, that is often the best way forward, but otherwise, you can try to replace those types with their threadsafe equivalents:

- `Cell` -- replacement: `AtomicUsize`, `AtomicBool`, etc.
- `RefCell` -- replacement: `RwLock`, or perhaps `Mutex`

However, you have to be wary! The parallel versions of these types have different atomicity guarantees. For example, with a `Cell`, you can increment a counter like so:

```rust
let value = counter.get();
counter.set(value + 1);
```

But when you use the equivalent `AtomicUsize` methods, you are actually introducing a potential race condition (not a data race, technically, but it can be an awfully fine distinction):

```rust
let value = tscounter.load(Ordering::SeqCst);
tscounter.store(value + 1, Ordering::SeqCst);
```

You can already see that the `AtomicUsize` API is a bit more complex, as it requires you to specify an [ordering](https://doc.rust-lang.org/std/sync/atomic/enum.Ordering.html). (I won't go into the details on ordering here, but suffice it to say that if you don't know what an ordering is, and probably even if you do, you should use `Ordering::SeqCst`.) The danger in this parallel version of the counter is that other threads might be running at the same time and they could cause our counter to get out of sync. For example, if we have two threads, then they might both execute the "load" before either has a chance to execute the "store":

```
Thread 1                                            Thread 2
let value = tscounter.load(Ordering::SeqCst);
// value = X
                                                    let value = tscounter.load(Ordering::SeqCst);
                                                    // value = X
tscounter.store(value+1);                           tscounter.store(value+1);
// tscounter = X+1                                  // tscounter = X+1
```

Now even though we've had two increments, we'll only increase the counter by one! Even though we've got no data race, this is still probably not the result we wanted. The problem here is that the `Cell` API doesn't make clear the scope of a "transaction" -- that is, the set of reads/writes that should occur atomically. In this case, we probably wanted the get/set to occur together.

In fact, when using the `Atomic` types, you very rarely want a plain `load` or plain `store`. You probably want the more complex operations. A counter, for example, would use `fetch_add` to atomically load and increment the value in one step. Compare-and-swap is another popular building block.

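As a small sketch of what that looks like, reusing the `tscounter` from the snippets above (this is illustrative, not the FAQ's original code):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

let tscounter = AtomicUsize::new(0);

// `fetch_add` loads and increments in one atomic step, so two concurrent
// increments always raise the counter by two.
tscounter.fetch_add(1, Ordering::SeqCst);

// Compare-and-swap is the general building block: compute a new value from
// the old one, and retry if another thread changed it in the meantime.
let mut current = tscounter.load(Ordering::SeqCst);
loop {
    let new = current * 2; // any function of the old value
    match tscounter.compare_exchange(current, new, Ordering::SeqCst, Ordering::SeqCst) {
        Ok(_) => break,
        Err(actual) => current = actual,
    }
}
```
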
A similar problem can arise when converting `RefCell` to `RwLock`, but it is somewhat less likely, because the `RefCell` API does in fact have a notion of a transaction: the scope of the handle returned by `borrow` or `borrow_mut`. So if you convert each call to `borrow` to `read` (and `borrow_mut` to `write`), things will mostly work fine in a parallel setting, but there can still be changes in behavior. Consider using a `handle: RefCell<Vec<i32>>` like:

```rust
let len = handle.borrow().len();
for i in 0 .. len {
    let data = handle.borrow()[i];
    println!("{}", data);
}
```

In sequential code, we know that this loop is safe. But if we convert this to parallel code with an `RwLock`, we do not: this is because another thread could come along and do `handle.write().unwrap().pop()`, and thus change the length of the vector. In fact, even in *sequential* code, using very small borrow sections like this is an anti-pattern: you ought to be enclosing the entire transaction together, like so:

```rust
let vec = handle.borrow();
for data in vec.iter() {
    println!("{}", data);
}
```

There are several reasons to prefer one borrow over many. The most obvious is that it is more efficient, since each borrow has to perform some safety checks. But it's also more reliable: suppose we modified the loop above to not just print things out, but also call into a helper function:

```rust
let vec = handle.borrow();
for data in vec.iter() {
    println!("{}", data);
    helper(...);
}
```

And now suppose, independently, this helper fn evolved and had to pop something off of the vector:

```rust
fn helper(...) {
    handle.borrow_mut().pop();
}
```

Under the old model, where we did lots of small borrows, this would yield precisely the same error that we saw in parallel land using an `RwLock`: the length would be out of sync and our indexing would fail (note that in neither case would there be an actual *data race* and hence there would never be undefined behavior). But now that we use a single borrow, we'll see a borrow error instead, which is much easier to diagnose, since it occurs at the point of the `borrow_mut`, rather than downstream. Similarly, if we move to an `RwLock`, we'll find that the code either deadlocks (if the write is on the same thread as the read) or, if the write is on another thread, works just fine. Both of these are preferable to random failures in my experience.

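To make that concrete, here is a minimal sketch of the `RwLock` version with a single read guard (hypothetical code mirroring the `handle` example above):

```rust
use std::sync::RwLock;

fn print_all(handle: &RwLock<Vec<i32>>) {
    // One read guard for the whole transaction -- the analogue of the single
    // `borrow()` above. A writer on another thread simply waits its turn;
    // a `write()` from *this* thread inside the loop would deadlock.
    let vec = handle.read().unwrap();
    for data in vec.iter() {
        println!("{}", data);
    }
} // read guard dropped here
```
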
## But wait, isn't Rust supposed to free me from this kind of thinking?

You might think that Rust is supposed to mean that you don't have to think about atomicity at all. In fact, if you avoid interior mutability (`Cell` and `RefCell` in a sequential setting, or `AtomicUsize`, `RwLock`, `Mutex`, et al. in parallel code), then this is true: the type system will basically guarantee that you don't have to think about atomicity at all. But often there are times when you WANT threads to interleave in the ways I showed above.

Consider, for example, when you are conducting a search in parallel, say to find the shortest route. To avoid fruitless search, you might want to keep a cell with the shortest route you've found thus far. This way, when you are searching down some path that's already longer than this shortest route, you can just stop and avoid wasted effort. In sequential land, you might model this "best result" as a shared value like `Rc<Cell<usize>>` (here the `usize` represents the length of the best path found so far); in parallel land, you'd use an `Arc<AtomicUsize>`.

```rust
fn search(path: &Path, cost_so_far: usize, best_cost: &AtomicUsize) {
    ...
}
```

Now in this case, we really WANT to see results from other threads interjected into our execution!

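The body is elided above; a hypothetical sketch of how `search` might use the shared value (the structure and the `fetch_min` choice are illustrative assumptions, not necessarily the original code) could look like:

```rust
use std::path::Path;
use std::sync::atomic::{AtomicUsize, Ordering};

fn search(path: &Path, cost_so_far: usize, best_cost: &AtomicUsize) {
    // Another thread may lower `best_cost` at any moment -- that is exactly
    // the interleaving we want, because it lets us prune sooner.
    if cost_so_far >= best_cost.load(Ordering::SeqCst) {
        return;
    }

    // ... explore extensions of `path`, recursing with a larger `cost_so_far` ...

    // On finding a complete route, publish its cost if it is an improvement;
    // `fetch_min` does the compare-and-update in one atomic step.
    best_cost.fetch_min(cost_so_far, Ordering::SeqCst);
}
```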