| completions - wait for completion handling | 
 | ========================================== | 
 |  | 
 | This document was originally written based on 3.18.0 (linux-next) | 
 |  | 
 | Introduction: | 
 | ------------- | 
 |  | 
 | If you have one or more threads of execution that must wait for some process | 
 | to have reached a point or a specific state, completions can provide a | 
 | race-free solution to this problem. Semantically they are somewhat like a | 
 | pthread_barrier and have similar use-cases. | 
 |  | 
 | Completions are a code synchronization mechanism which is preferable to any | 
 | misuse of locks. Any time you think of using yield() or some quirky | 
 | msleep(1) loop to allow something else to proceed, you probably want to | 
 | look into using one of the wait_for_completion*() calls instead. Completions | 
 | make the intent of the code clear, and they are also more efficient because | 
 | both threads can continue until the result is actually needed. | 
 |  | 
 | Completions are built on top of the wait queue infrastructure in Linux, | 
 | with the event reduced to a simple flag (appropriately called "done") in | 
 | struct completion that tells the waiting threads of execution if they | 
 | can continue safely. | 
 |  | 
 | As completions are scheduling related, the code is found in | 
 | kernel/sched/completion.c - for details on completion design and | 
 | implementation see completions-design.txt | 
 |  | 
 |  | 
 | Usage: | 
 | ------ | 
 |  | 
 | There are three parts to using completions: the initialization of the | 
 | struct completion, the waiting part through a call to one of the variants of | 
 | wait_for_completion(), and the signaling side through a call to complete() | 
 | or complete_all(). In addition, there are some helper functions for checking | 
 | the state of completions. | 
 |  | 
 | To use completions one needs to include <linux/completion.h> and | 
 | create a variable of type struct completion. The structure used for | 
 | handling of completions is: | 
 |  | 
 | 	struct completion { | 
 | 		unsigned int done; | 
 | 		wait_queue_head_t wait; | 
 | 	}; | 
 |  | 
 | providing the wait queue to place tasks on for waiting and the flag for | 
 | indicating the state of affairs. | 
 |  | 
 | Completions should be named to convey the intent of the waiter. A good | 
 | example is: | 
 |  | 
 | 	wait_for_completion(&early_console_added); | 
 |  | 
 | 	complete(&early_console_added); | 
 |  | 
 | Good naming (as always) helps code readability. | 
 |  | 
 |  | 
 | Initializing completions: | 
 | ------------------------- | 
 |  | 
 | Initialization of dynamically allocated completions, often embedded in | 
 | other structures, is done with: | 
 |  | 
 | 	init_completion(&done); | 
 |  | 
 | Initialization is accomplished by initializing the wait queue and setting | 
 | the default state to "not available", that is, "done" is set to 0. | 
 |  | 
 | The re-initialization function, reinit_completion(), simply resets the | 
 | done element to "not available", thus again to 0, without touching the | 
 | wait queue. Calling init_completion() twice on the same completion object is | 
 | most likely a bug as it re-initializes the queue to an empty queue and | 
 | enqueued tasks could get "lost" - use reinit_completion() in that case. | 
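
As a rough userspace illustration of the difference (a model only, not kernel
code: struct completion_model, model_init() and model_reinit() are
hypothetical names, and the wait queue is reduced to a plain count of
enqueued tasks), the following sketch shows why repeating the init step
loses waiters while the reinit step does not:

```c
/* Userspace stand-in for struct completion; NOT the kernel type. */
struct completion_model {
	unsigned int done;		/* 0 means "not available" */
	unsigned int nr_waiters;	/* stand-in for the wait queue */
};

/* Models init_completion(): start with an empty queue and done = 0. */
void model_init(struct completion_model *c)
{
	c->nr_waiters = 0;	/* any task counted here is forgotten */
	c->done = 0;
}

/* Models reinit_completion(): only done is reset, the queue is untouched. */
void model_reinit(struct completion_model *c)
{
	c->done = 0;
}
```

With two tasks counted in nr_waiters, model_reinit() leaves the count at 2,
whereas a second model_init() resets it to 0, which corresponds to the
"lost" tasks described above.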
 |  | 
 | For static declaration and initialization, macros are available. These are: | 
 |  | 
 | 	static DECLARE_COMPLETION(setup_done); | 
 |  | 
 | used for static declarations in file scope. Within functions the static | 
 | initialization should always use: | 
 |  | 
 | 	DECLARE_COMPLETION_ONSTACK(setup_done); | 
 |  | 
 | which is suitable for automatic/local variables on the stack and keeps | 
 | lockdep happy. Note also that one needs to make *sure* the completion passed | 
 | to work threads remains in scope, and that no references to on-stack data | 
 | remain when the initiating function returns. | 
 |  | 
 | Using on-stack completions for code that calls any of the _timeout or | 
 | _interruptible/_killable variants is not advisable as they will require | 
 | additional synchronization to prevent the on-stack completion object in | 
 | the timeout/signal cases from going out of scope. Consider using dynamically | 
 | allocated completions when intending to use the _interruptible/_killable | 
 | or _timeout variants of wait_for_completion(). | 
 |  | 
 |  | 
 | Waiting for completions: | 
 | ------------------------ | 
 |  | 
 | For a thread of execution to wait for some concurrent work to finish, it | 
 | calls wait_for_completion() on the initialized completion structure. | 
 | A typical usage scenario is: | 
 |  | 
 | 	struct completion setup_done; | 
 | 	init_completion(&setup_done); | 
 | 	initialize_work(...,&setup_done,...); | 
 |  | 
 | 	/* run non-dependent code */              /* do setup */ | 
 |  | 
 | 	wait_for_completion(&setup_done);         complete(&setup_done); | 
 |  | 
 | This does not imply any temporal order between wait_for_completion() and | 
 | the call to complete(). If the call to complete() happened before the call | 
 | to wait_for_completion(), the waiting side simply continues immediately, | 
 | as all dependencies are satisfied. If not, it blocks until completion is | 
 | signaled by complete(). | 
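
The order independence can be tried out in userspace with a small model
built on POSIX threads. This is only a sketch of the semantics, not the
kernel implementation: struct completion_model, the model_*() functions,
setup_worker() and run_setup() are all hypothetical names, and a mutex plus
condition variable stand in for the wait queue.

```c
#include <assert.h>
#include <pthread.h>

/* Userspace stand-in for struct completion; NOT the kernel type. */
struct completion_model {
	unsigned int done;	/* posted, not yet consumed completions */
	pthread_mutex_t lock;	/* plays the role of the wait queue lock */
	pthread_cond_t wait;	/* plays the role of the wait queue */
};

void model_init(struct completion_model *c)
{
	c->done = 0;
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->wait, NULL);
}

void model_complete(struct completion_model *c)
{
	pthread_mutex_lock(&c->lock);
	c->done++;			/* post one completion */
	pthread_cond_signal(&c->wait);	/* wake one waiter, if there is one */
	pthread_mutex_unlock(&c->lock);
}

void model_wait(struct completion_model *c)
{
	pthread_mutex_lock(&c->lock);
	while (c->done == 0)		/* block only if nothing was posted */
		pthread_cond_wait(&c->wait, &c->lock);
	assert(c->done > 0);
	c->done--;			/* consume the posted completion */
	pthread_mutex_unlock(&c->lock);
}

/* Worker side: "do setup", then signal through the pointer it was given. */
void *setup_worker(void *arg)
{
	model_complete(arg);
	return NULL;
}

/*
 * Waiter side. With join_first != 0 the worker has finished (and called
 * model_complete()) before the wait starts; with join_first == 0 the two
 * race and either order may occur. Returns the number of unconsumed
 * completions left afterwards (0 is expected), or -1 if no thread could
 * be created.
 */
int run_setup(int join_first)
{
	struct completion_model setup_done;
	pthread_t tid;
	int left;

	model_init(&setup_done);
	if (pthread_create(&tid, NULL, setup_worker, &setup_done))
		return -1;
	if (join_first)
		pthread_join(tid, NULL);

	/* run non-dependent code */

	model_wait(&setup_done);
	if (!join_first)
		pthread_join(tid, NULL);  /* worker must be gone before the
					     on-stack object goes away */
	left = (int)setup_done.done;
	pthread_cond_destroy(&setup_done.wait);
	pthread_mutex_destroy(&setup_done.lock);
	return left;
}
```

Both run_setup(1) and run_setup(0) return without hanging: in the first case
the wait finds done already set and does not block, in the second it may
block until the worker signals.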
 |  | 
 | Note that wait_for_completion() is calling spin_lock_irq()/spin_unlock_irq(), | 
 | so it can only be called safely when you know that interrupts are enabled. | 
 | Calling it from hard-irq or irqs-off atomic contexts will result in | 
 | hard-to-detect spurious enabling of interrupts. | 
 |  | 
 | wait_for_completion(): | 
 |  | 
 | 	void wait_for_completion(struct completion *done) | 
 |  | 
 | The default behavior is to wait without a timeout and to mark the task as | 
 | uninterruptible. wait_for_completion() and its variants are only safe | 
 | in process context (as they can sleep) but not in atomic context, | 
 | interrupt context, with irqs disabled, or with preemption disabled - see also | 
 | try_wait_for_completion() below for handling completion in atomic/interrupt | 
 | context. | 
 |  | 
 | As all variants of wait_for_completion() can (obviously) block for a long | 
 | time, you probably don't want to call this with held mutexes. | 
 |  | 
 |  | 
 | Variants available: | 
 | ------------------- | 
 |  | 
 | The below variants all return status and this status should be checked in | 
 | most(/all) cases - in cases where the status is deliberately not checked you | 
 | probably want to make a note explaining this (e.g. see | 
 | arch/arm/kernel/smp.c:__cpu_up()). | 
 |  | 
 | A common problem is storing the return value in a variable of the wrong | 
 | type, so take care to assign return values to variables of the proper type | 
 | (the variants below return int, long or unsigned long). Checks on the | 
 | meaning of return values have also been found to be quite inaccurate, e.g. | 
 | constructs like if (!wait_for_completion_interruptible_timeout(...)) would | 
 | execute the same code path for successful completion and for the interrupted | 
 | case - which is probably not what you want. | 
 |  | 
 | 	int wait_for_completion_interruptible(struct completion *done) | 
 |  | 
 | This function marks the task TASK_INTERRUPTIBLE. If a signal was received | 
 | while waiting it will return -ERESTARTSYS; 0 otherwise. | 
 |  | 
 | 	unsigned long wait_for_completion_timeout(struct completion *done, | 
 | 		unsigned long timeout) | 
 |  | 
 | The task is marked as TASK_UNINTERRUPTIBLE and will wait at most 'timeout' | 
 | (in jiffies). If the timeout expires it returns 0, otherwise it returns the | 
 | remaining time in jiffies (but at least 1). Timeouts are preferably | 
 | calculated with msecs_to_jiffies() or usecs_to_jiffies(). If the returned | 
 | timeout value is deliberately ignored, a comment should probably explain why | 
 | (e.g. see drivers/mfd/wm8350-core.c wm8350_read_auxadc()). | 
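
To make the jiffies arithmetic concrete, here is a small userspace model of
the conversion and of the return value check. It is a sketch under stated
assumptions, not the kernel's msecs_to_jiffies(): MODEL_HZ is a hypothetical
tick rate (the real HZ is a build-time configuration), and
model_msecs_to_jiffies() and model_timed_out() are illustrative names. The
one property it relies on is that the conversion rounds up, so a non-zero
timeout never becomes zero ticks.

```c
/* Hypothetical tick rate for this sketch only. */
#define MODEL_HZ 250ULL

/* Convert milliseconds to ticks, rounding up to a whole tick. */
unsigned long model_msecs_to_jiffies(unsigned int ms)
{
	return (unsigned long)((ms * MODEL_HZ + 999) / 1000);
}

/*
 * Interpret the value returned by a _timeout wait: 0 means the timeout
 * expired, anything else is the time that was left (at least 1).
 */
int model_timed_out(unsigned long remaining)
{
	return remaining == 0;
}
```

With MODEL_HZ at 250 one tick is 4 ms, so both 1 ms and 4 ms map to one tick
while 5 ms maps to two.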
 |  | 
 | 	long wait_for_completion_interruptible_timeout( | 
 | 		struct completion *done, unsigned long timeout) | 
 |  | 
 | This function takes a timeout in jiffies and marks the task as | 
 | TASK_INTERRUPTIBLE. If a signal was received it will return -ERESTARTSYS; | 
 | otherwise it returns 0 if the completion timed out or the remaining time in | 
 | jiffies (but at least 1) if completion occurred. | 
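
The three outcomes can be told apart with a check along the following lines.
This is a userspace sketch that operates only on the returned number:
enum wait_outcome, classify_wait(), not_timed_out() and
positive_when_unsigned() are hypothetical helpers, and ERESTARTSYS is
defined locally because it is a kernel-internal error code that userspace
headers do not provide.

```c
/* Kernel-internal value, defined here only so the sketch is self-contained. */
#define ERESTARTSYS 512

enum wait_outcome {
	WAIT_TIMED_OUT,
	WAIT_INTERRUPTED,
	WAIT_COMPLETED,
};

/* Map the long return value to one of the three outcomes. */
enum wait_outcome classify_wait(long ret)
{
	if (ret < 0)
		return WAIT_INTERRUPTED;	/* -ERESTARTSYS */
	if (ret == 0)
		return WAIT_TIMED_OUT;
	return WAIT_COMPLETED;			/* remaining jiffies, >= 1 */
}

/* The shortcut test: every non-zero value takes the same path. */
int not_timed_out(long ret)
{
	return ret != 0;
}

/* What happens if the result is stored in an unsigned long instead. */
int positive_when_unsigned(long ret)
{
	unsigned long stored = (unsigned long)ret;

	return stored > 0;	/* true even for -ERESTARTSYS */
}
```

not_timed_out() cannot distinguish -ERESTARTSYS from a successful wait, and
positive_when_unsigned() shows that an unsigned variable makes the error
look like a large amount of remaining time. Both are reasons to keep the
result in a long and test all three cases.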
 |  | 
 | Further variants include _killable which uses TASK_KILLABLE as the | 
 | designated task state and will return -ERESTARTSYS if it is interrupted, or | 
 | 0 if completion was achieved. There is a _timeout variant as well: | 
 |  | 
 | 	int wait_for_completion_killable(struct completion *done) | 
 | 	long wait_for_completion_killable_timeout(struct completion *done, | 
 | 		unsigned long timeout) | 
 |  | 
 | The _io variants, e.g. wait_for_completion_io(), behave the same as the | 
 | non-_io variants, except for accounting waiting time as waiting on IO, which | 
 | has an impact on how the task is accounted in scheduling stats. | 
 |  | 
 | 	void wait_for_completion_io(struct completion *done) | 
 | 	unsigned long wait_for_completion_io_timeout(struct completion *done, | 
 | 		unsigned long timeout) | 
 |  | 
 |  | 
 | Signaling completions: | 
 | ---------------------- | 
 |  | 
 | A thread that wants to signal that the conditions for continuation have been | 
 | achieved calls complete() to signal exactly one of the waiters that it can | 
 | continue. | 
 |  | 
 | 	void complete(struct completion *done) | 
 |  | 
 | or calls complete_all() to signal all current and future waiters. | 
 |  | 
 | 	void complete_all(struct completion *done) | 
 |  | 
 | The signaling will work as expected even if completions are signaled before | 
 | a thread starts waiting. This is achieved by the waiter "consuming" | 
 | (decrementing) the done element of struct completion. Waiting threads are | 
 | woken up in the same order in which they were enqueued (FIFO order). | 
 |  | 
 | If complete() is called multiple times then this will allow for that number | 
 | of waiters to continue - each call to complete() will simply increment the | 
 | done element. Calling complete_all() multiple times is a bug though. Both | 
 | complete() and complete_all() can be called in hard-irq/atomic context safely. | 
 |  | 
 | There can only be one thread calling complete() or complete_all() on a | 
 | particular struct completion at any time - serialized through the wait | 
 | queue spinlock. Any such concurrent calls to complete() or complete_all() | 
 | are probably a design bug. | 
 |  | 
 | Signaling completion from hard-irq context is fine as it will appropriately | 
 | lock with spin_lock_irqsave/spin_unlock_irqrestore and it will never sleep. | 
 |  | 
 |  | 
 | try_wait_for_completion()/completion_done(): | 
 | -------------------------------------------- | 
 |  | 
 | The try_wait_for_completion() function will not put the thread on the wait | 
 | queue but rather returns false if it would need to enqueue (block) the thread, | 
 | else it consumes one posted completion and returns true. | 
 |  | 
 | 	bool try_wait_for_completion(struct completion *done) | 
 |  | 
 | Finally, to check the state of a completion without changing it in any way, | 
 | call completion_done(), which returns false if there are no posted | 
 | completions that were not yet consumed by waiters (implying that there are | 
 | waiters) and true otherwise: | 
 |  | 
 | 	bool completion_done(struct completion *done) | 
 |  | 
 | Both try_wait_for_completion() and completion_done() are safe to be called in | 
 | hard-irq or atomic context. |
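
The counting behavior of the done element and the two non-blocking helpers
can be modeled in a few lines of userspace C. This is a single-threaded
sketch of the semantics described in this document, not the kernel code:
struct completion_model and the model_*() functions are hypothetical, there
is no locking, and representing "complete_all() was called" by saturating
the counter at UINT_MAX is an assumption of the sketch.

```c
#include <limits.h>
#include <stdbool.h>

/* Userspace stand-in that keeps only the done element. */
struct completion_model {
	unsigned int done;
};

/* Models complete(): each call posts one completion. */
void model_complete(struct completion_model *c)
{
	if (c->done != UINT_MAX)
		c->done++;
}

/* Models complete_all(): release all current and future waiters. */
void model_complete_all(struct completion_model *c)
{
	c->done = UINT_MAX;	/* sketch assumption: saturated counter */
}

/* Models try_wait_for_completion(): consume one completion or fail. */
bool model_try_wait(struct completion_model *c)
{
	if (c->done == 0)
		return false;	/* a real wait would have to block here */
	if (c->done != UINT_MAX)
		c->done--;	/* consume one posted completion */
	return true;
}

/* Models completion_done(): inspect the state without changing it. */
bool model_completion_done(const struct completion_model *c)
{
	return c->done != 0;
}
```

Two calls to model_complete() allow exactly two successful calls to
model_try_wait() and the third fails, while after model_complete_all() every
later call succeeds.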