# Combiner Explanation
## Talk by ctiller, notes by vjpai

The typical way of doing a critical section is

```
mu.lock()
do_stuff()
mu.unlock()
```

An alternative way of doing it is

```
class combiner {
  run(f) {
    mu.lock()
    f()
    mu.unlock()
  }
  mutex mu;
}

combiner.run(do_stuff)
```

If two threads call into the combiner at the same time, there will be
some kind of queuing in place: the second caller waits on `mu` until the
first is done. It's called a `combiner` because you can pass in more
than one `do_stuff` at once, and they will all run under a common `mu`.
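
For concreteness, here is a minimal C++ sketch of the lock-based combiner
above (the names are illustrative, not gRPC code): two threads submit
closures, and the closures run one at a time because both calls serialize on
the same mutex.

```
#include <functional>
#include <iostream>
#include <mutex>
#include <thread>

class Combiner {
 public:
  void Run(std::function<void()> f) {
    std::lock_guard<std::mutex> lock(mu_);
    f();  // runs on the caller's thread, serialized against other callers
  }

 private:
  std::mutex mu_;
};

int main() {
  Combiner combiner;
  std::thread t1([&] { combiner.Run([] { std::cout << "work from t1\n"; }); });
  std::thread t2([&] { combiner.Run([] { std::cout << "work from t2\n"; }); });
  t1.join();
  t2.join();
  return 0;
}
```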

The implementation described above has the problem that it blocks a thread
for a period of time, and this is considered harmful because the thread
being blocked is an application thread.

Instead, we want a combiner with a new set of properties:
* Keep things running in serial execution
* Don't ever sleep the thread
* But maybe allow things to end up running on a different thread from where they were started
* This means that `do_stuff` doesn't necessarily run to completion by the time `combiner.run` returns

```
class combiner {
  mpscq q; // multi-producer single-consumer queue can be made non-blocking
  state s; // is it empty or executing

  run(f) {
    if (q.push(f)) {
      // q.push returns true if it's the first thing
      while (q.pop(&f)) { // modulo some extra work to avoid races
        f();
      }
    }
  }
}
```

The basic idea is that the first one to push onto the combiner
executes the work and then keeps executing functions from the queue
until the combiner is drained.
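
The "extra work to avoid races" is mostly about deciding who owns the drain
loop. Here is a minimal C++ sketch of that idea (my own simplification, not
the gRPC implementation: a mutex-protected queue plus a `draining_` flag
stands in for the lock-free MPSC queue): the first pusher becomes the
drainer, later pushers return immediately, and closures still execute
strictly one at a time.

```
#include <functional>
#include <mutex>
#include <queue>

class Combiner {
 public:
  void Run(std::function<void()> f) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      q_.push(std::move(f));
      if (draining_) return;  // another thread already owns the drain loop
      draining_ = true;       // the first pusher becomes the drainer
    }
    std::function<void()> item;
    for (;;) {
      {
        std::lock_guard<std::mutex> lock(mu_);
        if (q_.empty()) {
          draining_ = false;  // give up ownership only when truly empty
          return;
        }
        item = std::move(q_.front());
        q_.pop();
      }
      item();  // executed outside the lock, but still strictly serialized
    }
  }

 private:
  std::mutex mu_;
  std::queue<std::function<void()>> q_;
  bool draining_ = false;
};
```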

Our combiner does some additional work, with the motivation of write-batching.

We have a second tier of `run` called `run_finally`. Anything queued
onto `run_finally` runs after we have drained the queue. That means
that there is essentially a finally-queue. This is not guaranteed to
be final, but it's best-effort. In the process of running the finally
item, we might put something onto the main combiner queue and so we'll
need to re-enter.

`chttp2` runs all ops through the regular `run`, except that when it sees a
write it schedules that write with `run_finally`. That way, anything else that
gets put into the combiner while it is draining can add to that write (see the
usage sketch after the code below).

```
class combiner {
  mpscq q; // multi-producer single-consumer queue can be made non-blocking
  state s; // is it empty or executing
  queue finally; // you can only do run_finally when you are already running something from the combiner

  run(f) {
    if (q.push(f)) {
      // q.push returns true if it's the first thing
      loop:
      while (q.pop(&f)) { // modulo some extra work to avoid races
        f();
      }
      while (finally.pop(&f)) {
        f();
      }
      if (!q.empty() || !finally.empty()) goto loop; // a finally item may have queued more work
    }
  }
}
```
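
To make the write-batching use concrete, here is a hedged C++ usage sketch
(the `Combiner`/`RunFinally` class and the buffering scheme are my own
illustrative stand-ins, not chttp2 code): each op scheduled through `Run`
appends to a shared write buffer, and a single flush is deferred with
`RunFinally`, so any op that lands in the combiner before the drain finishes
joins the same write.

```
#include <functional>
#include <iostream>
#include <mutex>
#include <queue>
#include <string>

class Combiner {
 public:
  void Run(std::function<void()> f) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      q_.push(std::move(f));
      if (draining_) return;  // someone else already owns the drain loop
      draining_ = true;
    }
    Drain();
  }

  // Only valid while the combiner is already running one of your closures.
  void RunFinally(std::function<void()> f) {
    std::lock_guard<std::mutex> lock(mu_);
    finally_.push(std::move(f));
  }

 private:
  void Drain() {
    std::function<void()> item;
    for (;;) {
      {
        std::lock_guard<std::mutex> lock(mu_);
        if (q_.empty() && finally_.empty()) {
          draining_ = false;  // nothing left; give up ownership
          return;
        }
        if (!q_.empty()) {  // main queue first ...
          item = std::move(q_.front());
          q_.pop();
        } else {            // ... finally items only when it is empty
          item = std::move(finally_.front());
          finally_.pop();
        }
      }
      item();  // run outside the lock
    }
  }

  std::mutex mu_;
  std::queue<std::function<void()>> q_;
  std::queue<std::function<void()>> finally_;
  bool draining_ = false;
};

int main() {
  Combiner combiner;
  std::string write_buffer;
  bool flush_scheduled = false;

  auto schedule_op = [&](const std::string& data) {
    combiner.Run([&, data] {
      write_buffer += data;    // every op just appends to the pending write
      if (!flush_scheduled) {  // the first op schedules one deferred flush
        flush_scheduled = true;
        combiner.RunFinally([&] {
          std::cout << "flush: " << write_buffer << "\n";  // one batched write
          write_buffer.clear();
          flush_scheduled = false;
        });
      }
    });
  };

  schedule_op("op1 ");
  schedule_op("op2 ");  // if this landed while the combiner was still draining
                        // (e.g. from another thread), it would join the same
                        // pending write instead of triggering a second flush
  return 0;
}
```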

So that explains how combiners work in general. In gRPC, there is
`start_batch(..., tag)` and then work only gets activated by somebody
calling `cq::next` which returns a tag. This gives an API-level
guarantee that there will be a thread doing polling to actually make
work happen. However, some operations are not covered by a poller
thread, such as a cancellation that doesn't have a completion. Other
callbacks that don't have a completion are the internal pieces of work
that get done before the batch gets completed. We need a condition
called `covered_by_poller` that means that the item will definitely
need some thread at some point to call `cq::next`. This includes those
callbacks that directly cause a completion, but also those that are
indirectly required before getting a completion. If we can't tell for
sure for a specific path, we have to assume it is not covered by a
poller.
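
A hedged sketch of how such a condition could be tracked (the two-argument
`Run`, the counter, and the helper name are assumptions for illustration, not
the actual gRPC API): callers pass a `covered_by_poller` hint when scheduling
work, and the combiner keeps a count of still-queued covered items that the
drain loop can consult.

```
#include <atomic>
#include <functional>
#include <mutex>
#include <queue>
#include <utility>

class Combiner {
 public:
  // covered_by_poller: the caller promises that some thread will call
  // cq::next for this work, either directly or for a completion it leads to.
  void Run(std::function<void()> f, bool covered_by_poller) {
    if (covered_by_poller) covered_.fetch_add(1);
    {
      std::lock_guard<std::mutex> lock(mu_);
      q_.push({std::move(f), covered_by_poller});
      if (draining_) return;
      draining_ = true;
    }
    Drain();
  }

 private:
  void Drain() {
    std::pair<std::function<void()>, bool> item;
    for (;;) {
      {
        std::lock_guard<std::mutex> lock(mu_);
        if (q_.empty()) {
          draining_ = false;
          return;
        }
        item = std::move(q_.front());
        q_.pop();
      }
      if (item.second) covered_.fetch_sub(1);
      item.first();
      // The refinement described next would check
      // SomeQueuedItemIsCoveredByPoller() here to decide whether the rest of
      // the queue can be handed off to a thread that is polling anyway.
    }
  }

  // True if at least one still-queued closure is covered by a poller.
  bool SomeQueuedItemIsCoveredByPoller() const { return covered_.load() > 0; }

  std::atomic<int> covered_{0};
  std::mutex mu_;
  std::queue<std::pair<std::function<void()>, bool>> q_;
  bool draining_ = false;
};
```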

The above combiner has the problem that it keeps draining for a
potentially infinite amount of time and that can lead to a huge tail
latency for some operations. So we can tweak it by returning to the application
if we know that it is valid to do so:

```
while (q.pop(&f)) {
  f();
  if (control_can_be_returned && some_still_queued_thing_is_covered_by_poller) {
    offload_combiner_work_to_some_other_thread();
  }
}
```

`offload` is more than `break`; it does `break`, but it also causes some
other thread that is currently waiting on a poll to break out of its
poll. This is done by setting up a per-polling-island work-queue
(a distributor) with a wakeup FD. The work-queue is the converse of the
combiner: it tries to spray events onto as many threads as possible to get
as much concurrency as possible.

So `offload` really does:

```
workqueue.run(continue_from_while_loop);
break;
```

This requires adding another class variable for a `workqueue`
(which is really conceptually a distributor).

```
workqueue::run(f) {
  q.push(f)
  eventfd.wakeup()
}

workqueue::readable() {
  eventfd.consume();
  q.pop(&f);
  f();
  if (!q.empty()) {
    eventfd.wakeup(); // spray across as many threads as are waiting on this workqueue
  }
}
```
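
For concreteness, here is a minimal Linux-only sketch of that wakeup
mechanism (my own simplification, not the gRPC implementation: an `eventfd`
plus a mutex-protected queue standing in for the lock-free one). A poller
thread that sees `wakeup_fd()` become readable calls `Readable()`.

```
#include <sys/eventfd.h>
#include <unistd.h>

#include <cstdint>
#include <functional>
#include <mutex>
#include <queue>

class Workqueue {
 public:
  Workqueue() : wakeup_fd_(eventfd(0, 0)) {}
  ~Workqueue() { close(wakeup_fd_); }

  // Producer side: enqueue work and make wakeup_fd_ readable so that a
  // thread waiting in poll breaks out of its poll.
  void Run(std::function<void()> f) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      q_.push(std::move(f));
    }
    Wakeup();
  }

  // Called by a poller thread when wakeup_fd_ becomes readable.
  void Readable() {
    uint64_t n;
    ssize_t r = read(wakeup_fd_, &n, sizeof(n));  // consume the wakeup
    (void)r;
    std::function<void()> f;
    bool more = false;
    {
      std::lock_guard<std::mutex> lock(mu_);
      if (q_.empty()) return;
      f = std::move(q_.front());
      q_.pop();
      more = !q_.empty();
    }
    if (more) Wakeup();  // spray remaining items across other waiting threads
    f();
  }

  int wakeup_fd() const { return wakeup_fd_; }  // registered with the poller

 private:
  void Wakeup() {
    uint64_t one = 1;
    ssize_t r = write(wakeup_fd_, &one, sizeof(one));
    (void)r;
  }

  int wakeup_fd_;
  std::mutex mu_;
  std::queue<std::function<void()>> q_;
};
```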

In principle, `run_finally` could get starved, but this hasn't
happened in practice. If we were concerned about this, we could put a
limit on how many things come off the regular `q` before the `finally`
queue gets processed.
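
A hedged sketch of that mitigation (the free function, the queue types, and
the constant are hypothetical stand-ins, not real gRPC code): bound how many
main-queue closures run before the finally queue gets a turn.

```
#include <functional>
#include <queue>

// kMaxMainItemsBeforeFinally is an illustrative constant, not a real tunable.
constexpr int kMaxMainItemsBeforeFinally = 32;

void DrainWithFinallyFairness(std::queue<std::function<void()>>& main_q,
                              std::queue<std::function<void()>>& finally_q) {
  while (!main_q.empty() || !finally_q.empty()) {
    int ran = 0;
    // Bound how much of the main queue runs before `finally` gets a turn.
    while (ran < kMaxMainItemsBeforeFinally && !main_q.empty()) {
      auto f = std::move(main_q.front());
      main_q.pop();
      f();  // may push more work onto either queue
      ++ran;
    }
    while (!finally_q.empty()) {
      auto f = std::move(finally_q.front());
      finally_q.pop();
      f();
    }
  }
}
```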