| # Rayon FAQ |
| |
| This file is for general questions that don't fit into the README or |
| crate docs. |
| |
| ## How many threads will Rayon spawn? |
| |
| By default, Rayon uses the same number of threads as the number of |
| CPUs available. Note that on systems with hyperthreading enabled this |
| equals the number of logical cores and not the physical ones. |
| |
| If you want to alter the number of threads spawned, you can set the |
| environmental variable `RAYON_NUM_THREADS` to the desired number of |
| threads or use the |
| [`ThreadPoolBuilder::build_global` function](https://docs.rs/rayon/*/rayon/struct.ThreadPoolBuilder.html#method.build_global) |
| method. |
| |
| ## How does Rayon balance work between threads? |
| |
| Behind the scenes, Rayon uses a technique called **work stealing** to |
| try and dynamically ascertain how much parallelism is available and |
| exploit it. The idea is very simple: we always have a pool of worker |
| threads available, waiting for some work to do. When you call `join` |
| the first time, we shift over into that pool of threads. But if you |
| call `join(a, b)` from a worker thread W, then W will place `b` into |
| its work queue, advertising that this is work that other worker |
| threads might help out with. W will then start executing `a`. |
| |
| While W is busy with `a`, other threads might come along and take `b` |
| from its queue. That is called *stealing* `b`. Once `a` is done, W |
| checks whether `b` was stolen by another thread and, if not, executes |
| `b` itself. If W runs out of jobs in its own queue, it will look |
| through the other threads' queues and try to steal work from them. |
| |
| This technique is not new. It was first introduced by the |
| [Cilk project][cilk], done at MIT in the late nineties. The name Rayon |
| is an homage to that work. |
| |
| [cilk]: http://supertech.csail.mit.edu/cilk/ |
| |
| ## What should I do if I use `Rc`, `Cell`, `RefCell` or other non-Send-and-Sync types? |
| |
| There are a number of non-threadsafe types in the Rust standard library, |
| and if your code is using them, you will not be able to combine it |
| with Rayon. Similarly, even if you don't have such types, but you try |
| to have multiple closures mutating the same state, you will get |
| compilation errors; for example, this function won't work, because |
| both closures access `slice`: |
| |
| ```rust |
| /// Increment all values in slice. |
| fn increment_all(slice: &mut [i32]) { |
| rayon::join(|| process(slice), || process(slice)); |
| } |
| ``` |
| |
| The correct way to resolve such errors will depend on the case. Some |
| cases are easy: for example, uses of [`Rc`] can typically be replaced |
| with [`Arc`], which is basically equivalent, but thread-safe. |
| |
| Code that uses `Cell` or `RefCell`, however, can be somewhat more complicated. |
| If you can refactor your code to avoid those types, that is often the best way |
| forward, but otherwise, you can try to replace those types with their threadsafe |
| equivalents: |
| |
| - `Cell` -- replacement: `AtomicUsize`, `AtomicBool`, etc |
| - `RefCell` -- replacement: `RwLock`, or perhaps `Mutex` |
| |
| However, you have to be wary! The parallel versions of these types |
| have different atomicity guarantees. For example, with a `Cell`, you |
| can increment a counter like so: |
| |
| ```rust |
| let value = counter.get(); |
| counter.set(value + 1); |
| ``` |
| |
| But when you use the equivalent `AtomicUsize` methods, you are |
| actually introducing a potential race condition (not a data race, |
| technically, but it can be an awfully fine distinction): |
| |
| ```rust |
| let value = tscounter.load(Ordering::SeqCst); |
| tscounter.store(value + 1, Ordering::SeqCst); |
| ``` |
| |
| You can already see that the `AtomicUsize` API is a bit more complex, |
| as it requires you to specify an |
| [ordering](https://doc.rust-lang.org/std/sync/atomic/enum.Ordering.html). (I |
| won't go into the details on ordering here, but suffice to say that if |
| you don't know what an ordering is, and probably even if you do, you |
| should use `Ordering::SeqCst`.) The danger in this parallel version of |
| the counter is that other threads might be running at the same time |
| and they could cause our counter to get out of sync. For example, if |
| we have two threads, then they might both execute the "load" before |
| either has a chance to execute the "store": |
| |
| ``` |
| Thread 1 Thread 2 |
| let value = tscounter.load(Ordering::SeqCst); |
| // value = X let value = tscounter.load(Ordering::SeqCst); |
| // value = X |
| tscounter.store(value+1); tscounter.store(value+1); |
| // tscounter = X+1 // tscounter = X+1 |
| ``` |
| |
| Now even though we've had two increments, we'll only increase the |
| counter by one! Even though we've got no data race, this is still |
| probably not the result we wanted. The problem here is that the `Cell` |
| API doesn't make clear the scope of a "transaction" -- that is, the |
| set of reads/writes that should occur atomically. In this case, we |
| probably wanted the get/set to occur together. |
| |
| In fact, when using the `Atomic` types, you very rarely want a plain |
| `load` or plain `store`. You probably want the more complex |
| operations. A counter, for example, would use `fetch_add` to |
| atomically load and increment the value in one step. Compare-and-swap |
| is another popular building block. |
| |
| A similar problem can arise when converting `RefCell` to `RwLock`, but |
| it is somewhat less likely, because the `RefCell` API does in fact |
| have a notion of a transaction: the scope of the handle returned by |
| `borrow` or `borrow_mut`. So if you convert each call to `borrow` to |
| `read` (and `borrow_mut` to `write`), things will mostly work fine in |
| a parallel setting, but there can still be changes in behavior. |
| Consider using a `handle: RefCell<Vec<i32>>` like: |
| |
| ```rust |
| let len = handle.borrow().len(); |
| for i in 0 .. len { |
| let data = handle.borrow()[i]; |
| println!("{}", data); |
| } |
| ``` |
| |
| In sequential code, we know that this loop is safe. But if we convert |
| this to parallel code with an `RwLock`, we do not: this is because |
| another thread could come along and do |
| `handle.write().unwrap().pop()`, and thus change the length of the |
| vector. In fact, even in *sequential* code, using very small borrow |
| sections like this is an anti-pattern: you ought to be enclosing the |
| entire transaction together, like so: |
| |
| ```rust |
| let vec = handle.borrow(); |
| let len = vec.len(); |
| for i in 0 .. len { |
| let data = vec[i]; |
| println!("{}", data); |
| } |
| ``` |
| |
| Or, even better, using an iterator instead of indexing: |
| |
| ```rust |
| let vec = handle.borrow(); |
| for data in vec { |
| println!("{}", data); |
| } |
| ``` |
| |
| There are several reasons to prefer one borrow over many. The most |
| obvious is that it is more efficient, since each borrow has to perform |
| some safety checks. But it's also more reliable: suppose we modified |
| the loop above to not just print things out, but also call into a |
| helper function: |
| |
| ```rust |
| let vec = handle.borrow(); |
| for data in vec { |
| helper(...); |
| } |
| ``` |
| |
| And now suppose, independently, this helper fn evolved and had to pop |
| something off of the vector: |
| |
| ```rust |
| fn helper(...) { |
| handle.borrow_mut().pop(); |
| } |
| ``` |
| |
| Under the old model, where we did lots of small borrows, this would |
| yield precisely the same error that we saw in parallel land using an |
| `RwLock`: the length would be out of sync and our indexing would fail |
| (note that in neither case would there be an actual *data race* and |
| hence there would never be undefined behavior). But now that we use a |
| single borrow, we'll see a borrow error instead, which is much easier |
| to diagnose, since it occurs at the point of the `borrow_mut`, rather |
| than downstream. Similarly, if we move to an `RwLock`, we'll find that |
| the code either deadlocks (if the write is on the same thread as the |
| read) or, if the write is on another thread, works just fine. Both of |
| these are preferable to random failures in my experience. |
| |
| ## But wait, isn't Rust supposed to free me from this kind of thinking? |
| |
| You might think that Rust is supposed to mean that you don't have to |
| think about atomicity at all. In fact, if you avoid interior |
| mutability (`Cell` and `RefCell` in a sequential setting, or |
| `AtomicUsize`, `RwLock`, `Mutex`, et al. in parallel code), then this |
| is true: the type system will basically guarantee that you don't have |
| to think about atomicity at all. But often there are times when you |
| WANT threads to interleave in the ways I showed above. |
| |
| Consider for example when you are conducting a search in parallel, say |
| to find the shortest route. To avoid fruitless search, you might want |
| to keep a cell with the shortest route you've found thus far. This |
| way, when you are searching down some path that's already longer than |
| this shortest route, you can just stop and avoid wasted effort. In |
| sequential land, you might model this "best result" as a shared value |
| like `Rc<Cell<usize>>` (here the `usize` represents the length of best |
| path found so far); in parallel land, you'd use a `Arc<AtomicUsize>`. |
| Now we can make our search function look like: |
| |
| ```rust |
| fn search(path: &Path, cost_so_far: usize, best_cost: &Arc<AtomicUsize>) { |
| if cost_so_far >= best_cost.load(Ordering::SeqCst) { |
| return; |
| } |
| ... |
| best_cost.store(...); |
| } |
| ``` |
| |
| Now in this case, we really WANT to see results from other threads |
| interjected into our execution! |