Commit f56b38b
committed

Re-wrap the FAQ to 80 characters width

1 parent 77cfb14 commit f56b38b

1 file changed: FAQ.md (+104 −117 lines)
@@ -1,51 +1,47 @@
# Rayon FAQ

This file is for general questions that don't fit into the README or crate docs.

## How many threads will Rayon spawn?

By default, Rayon uses the same number of threads as the number of CPUs
available. Note that on systems with hyperthreading enabled this equals the
number of logical cores and not the physical ones.

If you want to alter the number of threads spawned, you can set the
environmental variable `RAYON_NUM_THREADS` to the desired number of threads or
use the
[`ThreadPoolBuilder::build_global` function](https://docs.rs/rayon/*/rayon/struct.ThreadPoolBuilder.html#method.build_global)
method.
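
A std-only sketch of what "the number of CPUs available" means here:
`std::thread::available_parallelism` reports the logical-core count that
Rayon's default pool size matches. This is an illustration, not Rayon's actual
implementation, and it ignores the `RAYON_NUM_THREADS` override:

```rust
use std::thread;

// Logical CPU count as std reports it; Rayon's default pool size is the
// same figure (unless RAYON_NUM_THREADS overrides it).
fn default_thread_count() -> usize {
    thread::available_parallelism().map(|n| n.get()).unwrap_or(1)
}

fn main() {
    println!("Rayon would spawn {} threads by default", default_thread_count());
}
```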

## How does Rayon balance work between threads?

Behind the scenes, Rayon uses a technique called **work stealing** to try and
dynamically ascertain how much parallelism is available and exploit it. The idea
is very simple: we always have a pool of worker threads available, waiting for
some work to do. When you call `join` the first time, we shift over into that
pool of threads. But if you call `join(a, b)` from a worker thread W, then W
will place `b` into its work queue, advertising that this is work that other
worker threads might help out with. W will then start executing `a`.

While W is busy with `a`, other threads might come along and take `b` from its
queue. That is called *stealing* `b`. Once `a` is done, W checks whether `b` was
stolen by another thread and, if not, executes `b` itself. If W runs out of jobs
in its own queue, it will look through the other threads' queues and try to
steal work from them.

This technique is not new. It was first introduced by the [Cilk project][cilk],
done at MIT in the late nineties. The name Rayon is an homage to that work.

[cilk]: http://supertech.csail.mit.edu/cilk/
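
The `join(a, b)` contract described above can be sketched with plain scoped
threads from `std`. This is a hypothetical stand-in for illustration only: real
Rayon does not spawn a thread per call, it pushes `b` onto the worker's deque so
an idle thread can steal it.

```rust
use std::thread;

// A sketch of join(a, b)'s *semantics*: the current thread runs `a`
// while `b` is made available to run concurrently; both results are
// returned once both closures finish.
fn join<A, B, RA, RB>(a: A, b: B) -> (RA, RB)
where
    A: FnOnce() -> RA + Send,
    B: FnOnce() -> RB + Send,
    RA: Send,
    RB: Send,
{
    thread::scope(|s| {
        let hb = s.spawn(b); // "advertise" b; here another thread runs it
        let ra = a();        // the current thread executes a immediately
        let rb = hb.join().unwrap();
        (ra, rb)
    })
}

fn main() {
    let (x, y) = join(|| 1 + 1, || 2 + 2);
    println!("{} {}", x, y);
}
```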
## What should I do if I use `Rc`, `Cell`, `RefCell` or other non-Send-and-Sync types?

There are a number of non-threadsafe types in the Rust standard library, and if
your code is using them, you will not be able to combine it with Rayon.
Similarly, even if you don't have such types, but you try to have multiple
closures mutating the same state, you will get compilation errors; for example,
this function won't work, because both closures access `slice`:

```rust
/// Increment all values in slice.
@@ -54,46 +50,45 @@ fn increment_all(slice: &mut [i32]) {
}
```

The correct way to resolve such errors will depend on the case. Some cases are
easy: for example, uses of [`Rc`] can typically be replaced with [`Arc`], which
is basically equivalent, but thread-safe.

Code that uses `Cell` or `RefCell`, however, can be somewhat more complicated.
If you can refactor your code to avoid those types, that is often the best way
forward, but otherwise, you can try to replace those types with their threadsafe
equivalents:

- `Cell` -- replacement: `AtomicUsize`, `AtomicBool`, etc
- `RefCell` -- replacement: `RwLock`, or perhaps `Mutex`

However, you have to be wary! The parallel versions of these types have
different atomicity guarantees. For example, with a `Cell`, you can increment a
counter like so:

```rust
let value = counter.get();
counter.set(value + 1);
```

But when you use the equivalent `AtomicUsize` methods, you are actually
introducing a potential race condition (not a data race, technically, but it can
be an awfully fine distinction):

```rust
let value = tscounter.load(Ordering::SeqCst);
tscounter.store(value + 1, Ordering::SeqCst);
```

You can already see that the `AtomicUsize` API is a bit more complex, as it
requires you to specify an
[ordering](https://doc.rust-lang.org/std/sync/atomic/enum.Ordering.html). (I
won't go into the details on ordering here, but suffice to say that if you don't
know what an ordering is, and probably even if you do, you should use
`Ordering::SeqCst`.) The danger in this parallel version of the counter is that
other threads might be running at the same time and they could cause our counter
to get out of sync. For example, if we have two threads, then they might both
execute the "load" before either has a chance to execute the "store":

```
Thread 1                    Thread 2
@@ -104,26 +99,23 @@ tscounter.store(value+1); tscounter.store(value+1);
// tscounter = X+1          // tscounter = X+1
```

Now even though we've had two increments, we'll only increase the counter by
one! Even though we've got no data race, this is still probably not the result
we wanted. The problem here is that the `Cell` API doesn't make clear the scope
of a "transaction" -- that is, the set of reads/writes that should occur
atomically. In this case, we probably wanted the get/set to occur together.

In fact, when using the `Atomic` types, you very rarely want a plain `load` or
plain `store`. You probably want the more complex operations. A counter, for
example, would use `fetch_add` to atomically load and increment the value in one
step. Compare-and-swap is another popular building block.
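
The fix can be sketched with `std` atomics alone (no Rayon needed; the helper
name is hypothetical): `fetch_add` makes each increment a single atomic step, so
the total is exact no matter how the threads interleave.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

// Each thread calls fetch_add, which loads and increments in one atomic
// step, so no increments are lost even when threads interleave.
fn count_with_fetch_add(threads: usize, per_thread: usize) -> usize {
    let counter = Arc::new(AtomicUsize::new(0));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let c = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    c.fetch_add(1, Ordering::SeqCst);
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    counter.load(Ordering::SeqCst)
}

fn main() {
    // With the load/store pair this total could come up short;
    // with fetch_add it is always exact.
    assert_eq!(count_with_fetch_add(4, 10_000), 40_000);
}
```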

A similar problem can arise when converting `RefCell` to `RwLock`, but it is
somewhat less likely, because the `RefCell` API does in fact have a notion of a
transaction: the scope of the handle returned by `borrow` or `borrow_mut`. So if
you convert each call to `borrow` to `read` (and `borrow_mut` to `write`),
things will mostly work fine in a parallel setting, but there can still be
changes in behavior. Consider using a `handle: RefCell<Vec<i32>>` like:

```rust
let len = handle.borrow().len();
@@ -133,13 +125,12 @@ for i in 0 .. len {
}
```

In sequential code, we know that this loop is safe. But if we convert this to
parallel code with an `RwLock`, we do not: this is because another thread could
come along and do `handle.write().unwrap().pop()`, and thus change the length of
the vector. In fact, even in *sequential* code, using very small borrow sections
like this is an anti-pattern: you ought to be enclosing the entire transaction
together, like so:

```rust
let vec = handle.borrow();
@@ -159,11 +150,10 @@ for data in vec {
}
```
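
As a runnable sketch of the same pattern with the threadsafe replacement
(hypothetical helper name), holding one `RwLock` read guard for the whole loop
keeps the length and contents consistent for the entire transaction:

```rust
use std::sync::RwLock;

// One read guard held for the whole traversal: the Vec cannot change
// length mid-loop, because writers block until the guard is dropped.
fn sum_all(handle: &RwLock<Vec<i32>>) -> i32 {
    let vec = handle.read().unwrap(); // a single "transaction"
    let mut total = 0;
    for data in vec.iter() {
        total += *data;
    }
    total
}

fn main() {
    let handle = RwLock::new(vec![1, 2, 3]);
    println!("{}", sum_all(&handle));
}
```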

There are several reasons to prefer one borrow over many. The most obvious is
that it is more efficient, since each borrow has to perform some safety checks.
But it's also more reliable: suppose we modified the loop above to not just
print things out, but also call into a helper function:

```rust
let vec = handle.borrow();
@@ -172,45 +162,42 @@ for data in vec {
}
```

And now suppose, independently, this helper fn evolved and had to pop something
off of the vector:

```rust
fn helper(...) {
    handle.borrow_mut().pop();
}
```

Under the old model, where we did lots of small borrows, this would yield
precisely the same error that we saw in parallel land using an `RwLock`: the
length would be out of sync and our indexing would fail (note that in neither
case would there be an actual *data race* and hence there would never be
undefined behavior). But now that we use a single borrow, we'll see a borrow
error instead, which is much easier to diagnose, since it occurs at the point of
the `borrow_mut`, rather than downstream. Similarly, if we move to an `RwLock`,
we'll find that the code either deadlocks (if the write is on the same thread as
the read) or, if the write is on another thread, works just fine. Both of these
are preferable to random failures in my experience.
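
That diagnosis point can be demonstrated with `RefCell::try_borrow_mut`
(hypothetical helper; a plain `borrow_mut` would panic at the same point rather
than return an error):

```rust
use std::cell::RefCell;

// While the single enclosing borrow is alive, any attempt to borrow
// mutably (as the evolved `helper` would) fails right here, at the
// point of the conflict, instead of corrupting a length read earlier.
fn helper_would_fail(handle: &RefCell<Vec<i32>>) -> bool {
    let _vec = handle.borrow(); // the enclosing transaction
    handle.try_borrow_mut().is_err() // helper's borrow_mut would panic here
}

fn main() {
    let handle = RefCell::new(vec![1, 2, 3]);
    assert!(helper_would_fail(&handle));
}
```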

## But wait, isn't Rust supposed to free me from this kind of thinking?

You might think that Rust is supposed to mean that you don't have to think about
atomicity at all. In fact, if you avoid interior mutability (`Cell` and
`RefCell` in a sequential setting, or `AtomicUsize`, `RwLock`, `Mutex`, et al.
in parallel code), then this is true: the type system will basically guarantee
that you don't have to think about atomicity at all. But often there are times
when you WANT threads to interleave in the ways I showed above.

Consider for example when you are conducting a search in parallel, say to find
the shortest route. To avoid fruitless search, you might want to keep a cell
with the shortest route you've found thus far. This way, when you are searching
down some path that's already longer than this shortest route, you can just stop
and avoid wasted effort. In sequential land, you might model this "best result"
as a shared value like `Rc<Cell<usize>>` (here the `usize` represents the length
of best path found so far); in parallel land, you'd use a `Arc<AtomicUsize>`.

```rust
fn search(path: &Path, cost_so_far: usize, best_cost: &AtomicUsize) {
@@ -222,5 +209,5 @@ fn search(path: &Path, cost_so_far: usize, best_cost: &AtomicUsize) {
}
```

Now in this case, we really WANT to see results from other threads interjected
into our execution!
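
A runnable sketch of that "best result" pattern using only `std` (hypothetical
`parallel_best` standing in for the FAQ's `search`; `fetch_min` atomically keeps
the smallest cost seen so far):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

// Each thread proposes candidate costs; fetch_min atomically keeps the
// smallest, so updates from other threads are welcome interleavings,
// not bugs.
fn parallel_best(candidates: Vec<Vec<usize>>) -> usize {
    let best = Arc::new(AtomicUsize::new(usize::MAX));
    let handles: Vec<_> = candidates
        .into_iter()
        .map(|chunk| {
            let best = Arc::clone(&best);
            thread::spawn(move || {
                for cost in chunk {
                    // Prune: skip work already worse than the global best.
                    if cost < best.load(Ordering::SeqCst) {
                        best.fetch_min(cost, Ordering::SeqCst);
                    }
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    best.load(Ordering::SeqCst)
}

fn main() {
    let best = parallel_best(vec![vec![9, 4, 7], vec![8, 3, 6]]);
    assert_eq!(best, 3);
}
```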
