# Combiner Explanation
## Talk by ctiller, notes by vjpai

A typical way of guarding a critical section:

```
mu.lock()
do_stuff()
mu.unlock()
```

An alternative way of doing it is:

```
class combiner {
  run(f) {
    mu.lock()
    f()
    mu.unlock()
  }
  mutex mu;
}

combiner.run(do_stuff)
```

If two threads call into the combiner at once, there is some kind of
queuing in place. It's called a `combiner` because you can pass in more
than one `do_stuff` at once and they will all run under a common `mu`.

The implementation described above has the problem that it blocks a thread
for a period of time, and this is considered harmful because it is an application thread that is being blocked.

Instead, we want a combiner with a new set of properties:
* Keep things running in serial execution
* Don't ever sleep the thread
* But maybe allow things to end up running on a different thread from where they were started
* This means that `do_stuff` doesn't necessarily run to completion when `combiner.run` is invoked

```
class combiner {
  mpscq q; // multi-producer single-consumer queue can be made non-blocking
  state s; // is it empty or executing

  run(f) {
    if (q.push(f)) {
      // q.push returns true if it's the first thing
      while (q.pop(&f)) { // modulo some extra work to avoid races
        f();
      }
    }
  }
}
```

The basic idea is that the first thread to push onto the combiner
executes the work, and then keeps executing functions from the queue
until the combiner is drained.
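
The first-pusher-drains loop can be sketched in runnable form. This is not gRPC's implementation and the names are hypothetical: as a stand-in for a lock-free mpscq, this sketch uses a mutex-protected deque plus an atomic counter that elects the drainer; the lock guards only push/pop and is never held while a callback runs.

```cpp
#include <atomic>
#include <cassert>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

class Combiner {
 public:
  void Run(std::function<void()> f) {
    {
      std::lock_guard<std::mutex> l(mu_);
      q_.push_back(std::move(f));
    }
    // Whoever moves the count from 0 to 1 becomes the sole drainer.
    if (count_.fetch_add(1, std::memory_order_acq_rel) != 0) return;
    for (;;) {
      std::function<void()> g;
      {
        std::lock_guard<std::mutex> l(mu_);
        g = std::move(q_.front());  // count_ > 0 guarantees a queued item
        q_.pop_front();
      }
      g();  // serial: there is only ever one drainer at a time
      // Drop one unit of completed work; stop once the queue is drained.
      if (count_.fetch_sub(1, std::memory_order_acq_rel) == 1) break;
    }
  }

 private:
  std::mutex mu_;
  std::deque<std::function<void()>> q_;
  std::atomic<int> count_{0};
};
```

Because whichever thread takes the count from 0 to 1 drains everything queued behind it, a closure submitted by one thread may execute on another — exactly the "maybe on a different thread" property above.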

Our combiner does some additional work, with the motivation of write-batching.

We have a second tier of `run` called `run_finally`. Anything queued
via `run_finally` runs after we have drained the main queue, so there
is essentially a finally-queue. This is not guaranteed to be final,
but it is best-effort: in the process of running a finally item, we
might put something onto the main combiner queue, and so we will need
to re-enter the main loop.

`chttp2` runs all ops in the run state, except that when it sees a write it puts that write into a finally. That way anything else that gets put into the combiner can add to that write.

```
class combiner {
  mpscq q; // multi-producer single-consumer queue can be made non-blocking
  state s; // is it empty or executing
  queue finally; // you can only run_finally when you are already running something from the combiner

  run(f) {
    if (q.push(f)) {
      // q.push returns true if it's the first thing
      loop:
      while (q.pop(&f)) { // modulo some extra work to avoid races
        f();
      }
      while (finally.pop(&f)) {
        f();
      }
      if (!q.empty()) goto loop; // a finally item may have queued more main work
    }
  }
}
```
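
The two-tier drain can also be shown with a deliberately simplified, single-threaded sketch (hypothetical names; the real combiner is lock-free and multi-threaded). The point it demonstrates: `run_finally` items run only once the main queue is empty, and they may themselves queue more main work, forcing a re-entry into the loop.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <vector>

class Combiner {
 public:
  void Run(std::function<void()> f) {
    q_.push_back(std::move(f));
    if (running_) return;  // someone is already draining; just enqueue
    running_ = true;
    for (;;) {
      while (!q_.empty()) {
        auto g = std::move(q_.front());
        q_.pop_front();
        g();
      }
      while (!finally_.empty()) {
        auto g = std::move(finally_.front());
        finally_.pop_front();
        g();
      }
      if (q_.empty()) break;  // a finally item may have queued more main work
    }
    running_ = false;
  }
  // Only valid while the combiner is already executing something.
  void RunFinally(std::function<void()> f) { finally_.push_back(std::move(f)); }

 private:
  bool running_ = false;
  std::deque<std::function<void()>> q_;
  std::deque<std::function<void()>> finally_;
};
```

From inside a callback, a finally item queued before a regular item still runs after it, because the whole main queue drains first.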

So that explains how combiners work in general. In gRPC, there is
`start_batch(..., tag)`, and work only gets activated by somebody
calling `cq::next`, which returns a tag. This gives an API-level
guarantee that there will be a thread doing polling to actually make
work happen. However, some operations are not covered by a poller
thread, such as a cancellation that doesn't have a completion. Other
callbacks that don't have a completion are the internal work that gets
done before the batch gets completed. We need a condition called
`covered_by_poller` that means that the item will definitely need some
thread at some point to call `cq::next`. This includes those
callbacks that directly cause a completion, but also those that are
indirectly required before getting a completion. If we can't tell for
sure for a specific path, we have to assume it is not covered by a
poller.

The above combiner has the problem that it keeps draining for a
potentially unbounded amount of time, and that can lead to a huge tail
latency for some operations. So we can tweak it by returning to the application
if we know that it is valid to do so:

```
while (q.pop(&f)) {
  f();
  if (control_can_be_returned && some_still_queued_thing_is_covered_by_poller) {
    offload_combiner_work_to_some_other_thread();
  }
}
```

`offload` is more than `break`: it does `break`, but also causes some
other thread that is currently waiting on a poll to break out of its
poll. This is done by setting up a per-polling-island work-queue
(distributor) wakeup FD. The work-queue is the converse of the combiner: it
tries to spray events onto as many threads as possible to get as much concurrency as possible.

So `offload` really does:

```
workqueue.run(continue_from_while_loop);
break;
```

This requires us to add another class variable for a `workqueue`
(which is really conceptually a distributor).

```
workqueue::run(f) {
  q.push(f)
  eventfd.wakeup()
}

workqueue::readable() {
  eventfd.consume();
  q.pop(&f);
  f();
  if (!q.empty()) {
    eventfd.wakeup(); // spray across as many threads as are waiting on this workqueue
  }
}
```
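
The `workqueue` pseudocode can be sketched with portable primitives. This is an illustrative stand-in, not gRPC's code: a `condition_variable` plays the role of the eventfd, `Worker` plays the role of a polling thread noticing the FD is readable, and the re-notify after a pop reproduces the "spray across waiting threads" behavior.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

class WorkQueue {
 public:
  void Run(std::function<void()> f) {
    {
      std::lock_guard<std::mutex> l(mu_);
      q_.push_back(std::move(f));
    }
    cv_.notify_one();  // the eventfd.wakeup() analogue
  }
  // Each worker thread loops here, like a poller seeing the FD readable.
  void Worker() {
    for (;;) {
      std::function<void()> f;
      {
        std::unique_lock<std::mutex> l(mu_);
        cv_.wait(l, [&] { return stopped_ || !q_.empty(); });
        if (q_.empty()) return;  // only reachable once stopped
        f = std::move(q_.front());
        q_.pop_front();
        if (!q_.empty()) cv_.notify_one();  // spray work to another waiter
      }
      f();  // run outside the lock, concurrently with other workers
    }
  }
  void Shutdown() {
    {
      std::lock_guard<std::mutex> l(mu_);
      stopped_ = true;
    }
    cv_.notify_all();
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::deque<std::function<void()>> q_;
  bool stopped_ = false;
};
```

Note the contrast with the combiner sketch: the combiner funnels all closures through one drainer, while this distributor hands each closure to whichever waiting thread wakes first.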

In principle, `run_finally` could get starved, but this hasn't
happened in practice. If we were concerned about this, we could put a
limit on how many things come off the regular `q` before the `finally`
queue gets processed.
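
A minimal sketch of that guard, with an invented `kMainBatch` constant: after at most `kMainBatch` main-queue items, the finally queue gets a turn. This is an illustration of the suggested limit, not anything gRPC actually implements.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <vector>

// Cap on main-queue items per cycle; the value here is arbitrary.
constexpr int kMainBatch = 2;

void Drain(std::deque<std::function<void()>>& q,
           std::deque<std::function<void()>>& finally_q) {
  while (!q.empty() || !finally_q.empty()) {
    // Run at most kMainBatch main-queue items...
    for (int i = 0; i < kMainBatch && !q.empty(); ++i) {
      auto f = std::move(q.front());
      q.pop_front();
      f();
    }
    // ...then give the finally queue a turn, so it cannot be starved.
    while (!finally_q.empty()) {
      auto f = std::move(finally_q.front());
      finally_q.pop_front();
      f();
    }
  }
}
```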