# Combiner Explanation
## Talk by ctiller, notes by vjpai

A typical way of guarding a critical section:

```
mu.lock()
do_stuff()
mu.unlock()
```

An alternative way of doing it is:

```
class combiner {
  run(f) {
    mu.lock()
    f()
    mu.unlock()
  }
  mutex mu;
}

combiner.run(do_stuff)
```

If two threads call into the combiner at once, there is some kind of
queuing in place. It's called a `combiner` because you can pass in more
than one `do_stuff` at once and they will all run under a common `mu`.

The implementation described above has the problem that it blocks a thread
for a period of time, and this is considered harmful because it is an application thread that is being blocked.

Instead, we want a combiner with a new set of properties:
* Keep things running in serial execution
* Don't ever sleep the thread
* But maybe allow things to end up running on a different thread from where they were started
* This means that `do_stuff` doesn't necessarily run to completion when `combiner.run` is invoked

```
class combiner {
  mpscq q; // multi-producer single-consumer queue can be made non-blocking
  state s; // is it empty or executing

  run(f) {
    if (q.push(f)) {
      // q.push returns true if it's the first thing
      while (q.pop(&f)) { // modulo some extra work to avoid races
        f();
      }
    }
  }
}
```

The basic idea is that the first thread to push onto the combiner
executes the work, and then keeps executing functions from the queue
until the combiner is drained.
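
The first-pusher-drains loop can be sketched in runnable form. This is not gRPC's implementation and the names are hypothetical: as a stand-in for a lock-free mpscq, this sketch uses a mutex-protected deque plus an atomic counter that elects the drainer; the lock guards only push/pop and is never held while a callback runs.

```cpp
#include <atomic>
#include <cassert>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

class Combiner {
 public:
  void Run(std::function<void()> f) {
    {
      std::lock_guard<std::mutex> l(mu_);
      q_.push_back(std::move(f));
    }
    // Whoever moves the count from 0 to 1 becomes the sole drainer.
    if (count_.fetch_add(1, std::memory_order_acq_rel) != 0) return;
    for (;;) {
      std::function<void()> g;
      {
        std::lock_guard<std::mutex> l(mu_);
        g = std::move(q_.front());  // count_ > 0 guarantees a queued item
        q_.pop_front();
      }
      g();  // serial: there is only ever one drainer at a time
      // Drop one unit of completed work; stop once the queue is drained.
      if (count_.fetch_sub(1, std::memory_order_acq_rel) == 1) break;
    }
  }

 private:
  std::mutex mu_;
  std::deque<std::function<void()>> q_;
  std::atomic<int> count_{0};
};
```

Because whichever thread takes the count from 0 to 1 drains everything queued behind it, a closure submitted by one thread may execute on another — exactly the "maybe on a different thread" property above.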

Our combiner does some additional work, with the motivation of write-batching.

We have a second tier of `run` called `run_finally`. Anything queued
via `run_finally` runs after we have drained the main queue, so there
is essentially a finally-queue. This is not guaranteed to be final,
but it is best-effort: in the process of running a finally item, we
might put something onto the main combiner queue, and so we will need
to re-enter the main loop.

`chttp2` runs all ops in the run state, except that when it sees a write it puts that write into a finally. That way anything else that gets put into the combiner can add to that write.

```
class combiner {
  mpscq q; // multi-producer single-consumer queue can be made non-blocking
  state s; // is it empty or executing
  queue finally; // you can only run_finally when you are already running something from the combiner

  run(f) {
    if (q.push(f)) {
      // q.push returns true if it's the first thing
      loop:
      while (q.pop(&f)) { // modulo some extra work to avoid races
        f();
      }
      while (finally.pop(&f)) {
        f();
      }
      if (!q.empty()) goto loop; // a finally item may have queued more main work
    }
  }
}
```
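
The two-tier drain can also be shown with a deliberately simplified, single-threaded sketch (hypothetical names; the real combiner is lock-free and multi-threaded). The point it demonstrates: `run_finally` items run only once the main queue is empty, and they may themselves queue more main work, forcing a re-entry into the loop.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <vector>

class Combiner {
 public:
  void Run(std::function<void()> f) {
    q_.push_back(std::move(f));
    if (running_) return;  // someone is already draining; just enqueue
    running_ = true;
    for (;;) {
      while (!q_.empty()) {
        auto g = std::move(q_.front());
        q_.pop_front();
        g();
      }
      while (!finally_.empty()) {
        auto g = std::move(finally_.front());
        finally_.pop_front();
        g();
      }
      if (q_.empty()) break;  // a finally item may have queued more main work
    }
    running_ = false;
  }
  // Only valid while the combiner is already executing something.
  void RunFinally(std::function<void()> f) { finally_.push_back(std::move(f)); }

 private:
  bool running_ = false;
  std::deque<std::function<void()>> q_;
  std::deque<std::function<void()>> finally_;
};
```

From inside a callback, a finally item queued before a regular item still runs after it, because the whole main queue drains first.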

So that explains how combiners work in general. In gRPC, there is
`start_batch(..., tag)`, and work only gets activated by somebody
calling `cq::next`, which returns a tag. This gives an API-level
guarantee that there will be a thread doing polling to actually make
work happen. However, some operations are not covered by a poller
thread, such as a cancellation that doesn't have a completion. Other
callbacks that don't have a completion are the internal work that gets
done before the batch gets completed. We need a condition called
`covered_by_poller` that means that the item will definitely need some
thread at some point to call `cq::next`. This includes those
callbacks that directly cause a completion, but also those that are
indirectly required before getting a completion. If we can't tell for
sure for a specific path, we have to assume it is not covered by a
poller.

The above combiner has the problem that it keeps draining for a
potentially unbounded amount of time, and that can lead to a huge tail
latency for some operations. So we can tweak it by returning to the application
if we know that it is valid to do so:

```
while (q.pop(&f)) {
  f();
  if (control_can_be_returned && some_still_queued_thing_is_covered_by_poller) {
    offload_combiner_work_to_some_other_thread();
  }
}
```

`offload` is more than `break`: it does `break`, but also causes some
other thread that is currently waiting on a poll to break out of its
poll. This is done by setting up a per-polling-island work-queue
(distributor) wakeup FD. The work-queue is the converse of the combiner: it
tries to spray events onto as many threads as possible to get as much concurrency as possible.

So `offload` really does:

```
workqueue.run(continue_from_while_loop);
break;
```

This requires us to add another class variable for a `workqueue`
(which is really conceptually a distributor).

```
workqueue::run(f) {
  q.push(f)
  eventfd.wakeup()
}

workqueue::readable() {
  eventfd.consume();
  q.pop(&f);
  f();
  if (!q.empty()) {
    eventfd.wakeup(); // spray across as many threads as are waiting on this workqueue
  }
}
```
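
The `workqueue` pseudocode can be sketched with portable primitives. This is an illustrative stand-in, not gRPC's code: a `condition_variable` plays the role of the eventfd, `Worker` plays the role of a polling thread noticing the FD is readable, and the re-notify after a pop reproduces the "spray across waiting threads" behavior.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

class WorkQueue {
 public:
  void Run(std::function<void()> f) {
    {
      std::lock_guard<std::mutex> l(mu_);
      q_.push_back(std::move(f));
    }
    cv_.notify_one();  // the eventfd.wakeup() analogue
  }
  // Each worker thread loops here, like a poller seeing the FD readable.
  void Worker() {
    for (;;) {
      std::function<void()> f;
      {
        std::unique_lock<std::mutex> l(mu_);
        cv_.wait(l, [&] { return stopped_ || !q_.empty(); });
        if (q_.empty()) return;  // only reachable once stopped
        f = std::move(q_.front());
        q_.pop_front();
        if (!q_.empty()) cv_.notify_one();  // spray work to another waiter
      }
      f();  // run outside the lock, concurrently with other workers
    }
  }
  void Shutdown() {
    {
      std::lock_guard<std::mutex> l(mu_);
      stopped_ = true;
    }
    cv_.notify_all();
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::deque<std::function<void()>> q_;
  bool stopped_ = false;
};
```

Note the contrast with the combiner sketch: the combiner funnels all closures through one drainer, while this distributor hands each closure to whichever waiting thread wakes first.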

In principle, `run_finally` could get starved, but this hasn't
happened in practice. If we were concerned about this, we could put a
limit on how many things come off the regular `q` before the `finally`
queue gets processed.
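
A minimal sketch of that guard, with an invented `kMainBatch` constant: after at most `kMainBatch` main-queue items, the finally queue gets a turn. This is an illustration of the suggested limit, not anything gRPC actually implements.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <vector>

// Cap on main-queue items per cycle; the value here is arbitrary.
constexpr int kMainBatch = 2;

void Drain(std::deque<std::function<void()>>& q,
           std::deque<std::function<void()>>& finally_q) {
  while (!q.empty() || !finally_q.empty()) {
    // Run at most kMainBatch main-queue items...
    for (int i = 0; i < kMainBatch && !q.empty(); ++i) {
      auto f = std::move(q.front());
      q.pop_front();
      f();
    }
    // ...then give the finally queue a turn, so it cannot be starved.
    while (!finally_q.empty()) {
      auto f = std::move(finally_q.front());
      finally_q.pop_front();
      f();
    }
  }
}
```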