| # Android async and non-blocking API guidelines |
| |
| go/android-api-guidelines-async |
| |
| <!--* |
| # Document freshness: For more information, see go/fresh-source. |
| freshness: { owner: 'adamp' reviewed: '2024-02-02' } |
| *--> |
| |
| [TOC] |
| |
| Non-blocking APIs request work to happen and then yield control back to the |
| calling thread so that it can perform other work before the completion of the |
| requested operation. They are useful for cases where the requested work might be |
| long-running or may require waiting for I/O, IPC, highly contended system |
| resources to become available, or even user input before work can proceed. |
| Especially well-behaved APIs will provide a way to *cancel* the operation in |
| progress and stop work from being performed on the original caller's behalf, |
| preserving system health and battery life when the operation is no longer |
| needed. |
| |
| Asynchronous APIs are one way of achieving non-blocking behavior. Async APIs |
| accept some form of continuation or callback that will be notified when the |
| operation is complete, or of other events during the operation's progress. |
| |
| There are two primary motivations for writing an asynchronous API: |
| |
| 1. Executing multiple operations concurrently, where an Nth operation must be |
| initiated before the N-1th operation completes |
| 2. Avoiding blocking a calling thread until an operation is complete |
| |
| Kotlin strongly promotes |
| *[structured concurrency](https://vorpus.org/blog/notes-on-structured-concurrency-or-go-statement-considered-harmful/)*, |
| a series of principles and APIs built on suspend functions that decouple |
| synchronous/asynchronous execution of code from thread-blocking behavior. |
| Suspend functions are **non-blocking** and **synchronous**. |
| |
| Suspend functions: |
| |
| * Do not block their calling thread and instead yield their execution thread |
| under the hood while awaiting the results of operations executing elsewhere |
| * Execute synchronously and do not require the caller of a non-blocking API to |
| continue executing concurrently with non-blocking work initiated by the API |
| call. |
| |
| This document details a minimum baseline of expectations developers may safely |
| hold when working with non-blocking and asynchronous APIs, followed by a series |
| of recipes for authoring APIs that meet these expectations in Kotlin or Java, |
| for the Android platform or Jetpack libraries. When in doubt, consider the |
| developer expectations as requirements for any new API surface. |
| |
| ## Developer expectations for async APIs |
| |
| The following expectations are written from the standpoint of non-`suspend` APIs |
| unless otherwise noted. |
| |
| ### APIs that accept callbacks are usually asynchronous |
| |
| If an API accepts a callback that is not documented to only ever be called |
| *in-place* (that is, called only by the calling thread before the API call |
| itself returns), the API is assumed to be asynchronous and should meet all |
| other expectations documented below. |
| |
| An example of a callback that is only ever called in-place is a higher-order map |
| or filter function that invokes a mapper or predicate on each item in a |
| collection before returning. |
| |
| ### Asynchronous APIs should return as quickly as possible |
| |
| Developers expect async APIs to be *non-blocking* and return quickly after |
| initiating the request for the operation. It should always be safe to call an |
| async API at any time, and calling an async API should never result in janky |
| frames or an ANR. |
| |
| Many operations and lifecycle signals can be triggered by the platform or |
| libraries on-demand, and expecting a developer to hold global knowledge of all |
| potential call sites for their code is unsustainable. For example, a `Fragment` |
| can be added to the `FragmentManager` in a synchronous transaction in response |
| to `View` measurement and layout when app content must be populated to fill |
| available space (for example, in a `RecyclerView`). A `LifecycleObserver` |
| responding to this fragment's `onStart` lifecycle callback may reasonably |
| perform one-time startup operations here, and this may be on a critical code |
| path for producing a frame of animation free of jank. A developer should always |
| feel confident that calling **any** async API in response to these kinds of |
| lifecycle callbacks will not be the cause of a janky frame. |
| |
| This implies that the work performed by an async API before returning must be |
| very lightweight: at most, creating a record of the request and associated |
| callback and registering it with the execution engine that will perform the |
| work. If registering for an async operation requires IPC, the API's |
| implementation should take whatever measures are necessary to meet this |
| developer expectation. This may include one or more of: |
| |
| * Implementing an underlying IPC as a oneway binder call |
| * Making a two-way binder call into the system server where completing the |
| registration does not require taking a highly contended lock |
| * Posting the request to a worker thread in the app process to perform a |
| blocking registration over IPC |
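
The last option above can be sketched in plain Java. This is a minimal illustration, not a platform implementation: `AsyncRegistrar`, its `register` method, and the `blockingRegister` function (a stand-in for a blocking registration over IPC) are all hypothetical names introduced here. The calling thread only captures the request and enqueues it; the blocking step runs on the supplied worker.

```java
import java.util.concurrent.Executor;
import java.util.function.Consumer;
import java.util.function.Function;

/** Hypothetical async API; the names are illustrative, not a real platform class. */
final class AsyncRegistrar {
    private final Executor worker;
    // Assumption: stands in for a blocking registration over IPC.
    private final Function<String, String> blockingRegister;

    AsyncRegistrar(Executor worker, Function<String, String> blockingRegister) {
        this.worker = worker;
        this.blockingRegister = blockingRegister;
    }

    /** Returns immediately; the callback is invoked later on callbackExecutor. */
    void register(String name, Executor callbackExecutor, Consumer<String> callback) {
        // The only work done on the calling thread: capture the request and enqueue it.
        worker.execute(() -> {
            // May block, but never on the thread that called register().
            String token = blockingRegister.apply(name);
            callbackExecutor.execute(() -> callback.accept(token));
        });
    }
}
```

Because the worker is injected as an `Executor`, a caller (or a test) can decide where the blocking step runs, and `register` itself does nothing that could delay a frame.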
| |
| ### Asynchronous APIs should return void and only throw for invalid arguments |
| |
| Async APIs should report all results of the requested operation to the provided |
| callback. This allows the developer to implement a single code path for success |
| and error handling. |
| |
| Async APIs *may* check arguments for null and throw `NullPointerException`, or |
| check that provided arguments are within a valid range and throw |
| `IllegalArgumentException`. For example, a function that accepts a `float` in |
| the range `0f` to `1f` may check that the parameter is within this range and |
| throw `IllegalArgumentException` if it is not, and a short `String` may be |
| checked for conformance to a valid format such as alphanumerics-only. (Remember |
| that the system server should never trust the app process! Any system service |
| should duplicate these checks in the system service itself.) |
| |
| **All other errors should be reported to the provided callback.** This includes, |
| but is not limited to: |
| |
| * Terminal failure of the requested operation |
| * Security exceptions for missing authorization/permissions required to |
| complete the operation |
| * Exceeded quota for performing the operation |
| * App process is not sufficiently "foreground" to perform the operation |
| * Required hardware has been disconnected |
| * Network failures |
| * Timeouts |
| * Binder death/unavailable remote process |
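
The contract in this section can be sketched as follows. `VolumeCallback`, `VolumeController`, and the `hasPermission` flag are hypothetical and exist only for illustration; the flag stands in for a real authorization check. Only argument validation throws to the caller, and every other outcome, including the `SecurityException`, is delivered to the callback.

```java
import java.util.Objects;
import java.util.concurrent.Executor;

/** Hypothetical callback type: every outcome of the operation arrives here. */
interface VolumeCallback {
    void onResult(float appliedVolume);
    void onError(Exception error);
}

/** Hypothetical async API used only to illustrate the error-reporting contract. */
final class VolumeController {
    private final Executor worker;
    private final boolean hasPermission; // assumption: stand-in for a real permission check

    VolumeController(Executor worker, boolean hasPermission) {
        this.worker = worker;
        this.hasPermission = hasPermission;
    }

    void setVolumeAsync(float volume, Executor callbackExecutor, VolumeCallback callback) {
        // Argument validation is the only thing that may throw to the caller.
        Objects.requireNonNull(callbackExecutor, "callbackExecutor");
        Objects.requireNonNull(callback, "callback");
        if (!(volume >= 0f && volume <= 1f)) { // written this way so NaN is rejected too
            throw new IllegalArgumentException("volume must be in [0, 1]: " + volume);
        }
        worker.execute(() -> {
            final float applied;
            try {
                applied = apply(volume);
            } catch (Exception e) {
                // Permission failures and other errors go to the callback, not the caller.
                callbackExecutor.execute(() -> callback.onError(e));
                return;
            }
            callbackExecutor.execute(() -> callback.onResult(applied));
        });
    }

    private float apply(float volume) {
        if (!hasPermission) {
            throw new SecurityException("missing permission");
        }
        return volume;
    }
}
```

The method returns `void`, so a caller has exactly one place to handle success and failure: the callback.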
| |
| ### Asynchronous APIs should provide a cancellation mechanism |
| |
| Async APIs should provide a way to indicate to a running operation that the |
| caller no longer cares about the result. This cancel operation should signal two |
| things: |
| |
| #### Hard references to callbacks provided by the caller should be released |
| |
| Callbacks provided to async APIs may contain hard references to large object |
| graphs, and ongoing work holding a hard reference to that callback can keep |
| those object graphs from being garbage collected. By releasing these callback |
| references on cancellation, these object graphs may become eligible for garbage |
| collection much sooner than if the work were permitted to run to completion. |
| |
| #### The execution engine performing work for the caller may stop that work |
| |
| Work initiated by async API calls may carry a high cost in power consumption or |
| other system resources. APIs that allow callers to signal when this work is no |
| longer needed permit stopping that work before it can consume further system |
| resources. |
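
Both effects of cancellation can be shown in a small plain-Java sketch. `CancelSignal` and `ChunkSummer` are hypothetical names; `CancelSignal` is a minimal stand-in for a cancellation signal and is not a platform class. Cancelling clears the reference to the caller's callback right away, and the worker checks the signal between units of work so that it can stop early.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.IntConsumer;

/** Minimal stand-in for a cancellation signal; hypothetical, not a platform class. */
final class CancelSignal {
    private boolean cancelled;
    private Runnable onCancel;

    synchronized boolean isCancelled() {
        return cancelled;
    }

    void setOnCancel(Runnable listener) {
        synchronized (this) {
            if (!cancelled) {
                onCancel = listener;
                return;
            }
        }
        listener.run(); // already cancelled: notify right away
    }

    void cancel() {
        Runnable listener;
        synchronized (this) {
            if (cancelled) return;
            cancelled = true;
            listener = onCancel;
            onCancel = null;
        }
        if (listener != null) listener.run();
    }
}

/** Hypothetical async API that honors cancellation. */
final class ChunkSummer {
    private final Executor worker;

    ChunkSummer(Executor worker) {
        this.worker = worker;
    }

    void sumAsync(int[] chunks, CancelSignal signal, IntConsumer callback) {
        // Hold the callback through a reference that cancellation can clear.
        AtomicReference<IntConsumer> callbackRef = new AtomicReference<>(callback);
        signal.setOnCancel(() -> callbackRef.set(null)); // release the caller's object graph
        worker.execute(() -> {
            int total = 0;
            for (int chunk : chunks) {
                if (signal.isCancelled()) return; // stop work nobody is waiting for
                total += chunk;
            }
            IntConsumer target = callbackRef.getAndSet(null);
            if (target != null) target.accept(total);
        });
    }
}
```

Passing the signal as a parameter keeps the method returning `void`, which matches the expectation stated earlier in this document.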
| |
| ### Special considerations for Cached or Frozen apps |
| |
| When designing asynchronous APIs where callbacks originate in a system process |
| and are delivered to apps, consider the following: |
| |
| 1. [Processes and app lifecycle](https://developer.android.com/guide/components/activities/process-lifecycle): |
| the recipient app process may be in the cached state. |
| 2. [Cached Apps Freezer](https://source.android.com/docs/core/perf/cached-apps-freezer): |
| the recipient app process may be frozen. |
| |
| When an app process enters the cached state, this means that it's not currently |
| hosting any user-visible components such as Activities and Services. The app is |
| kept in memory in case it becomes user-visible again, but in the meantime should |
| not be doing work. In most cases, you should pause dispatching app callbacks |
| when that app enters the cached state and resume when the app exits the cached |
| state, so as to not induce work in cached app processes. |
| |
| A cached app may also be frozen. When an app is frozen, it receives zero CPU |
| time and is not able to do any work at all. Any calls to that app's registered |
| callbacks will be buffered and delivered when the app is unfrozen. |
| |
| Buffered transactions to app callbacks may be stale by the time that the app is |
| unfrozen and processes them. The buffer is finite, and overflowing it causes |
| the recipient app to crash. To avoid overwhelming apps with stale events or |
| overflowing their buffers, don't dispatch app callbacks while their process is |
| frozen. |
| |
| In review: |
| |
| * You should *consider* pausing dispatching app callbacks while the app's |
| process is cached. |
| * You *MUST* pause dispatching app callbacks while the app's process is |
| frozen. |
| |
| #### Registering for all states |
| |
| To track when apps enter or exit the cached state: |
| |
| ```java |
| mActivityManager.addOnUidImportanceListener( |
| new UidImportanceListener() { ... }, |
| IMPORTANCE_CACHED); |
| ``` |
| |
| For example, see |
| [ag/20754479 Defer sending display events to cached apps](https://googleplex-android-review.git.corp.google.com/c/platform/frameworks/base/+/20754479). |
| |
| To track when apps are frozen or unfrozen: |
| |
| ```java |
| ActivityManager.registerUidFrozenStateChangedCallback(executor, callback); |
| ``` |
| |
| <!-- TODO(shayba): add an example change once such a change exists. --> |
| <!-- TODO(shayba): replace with per-pid APIs after we've added them. --> |
| |
| #### Strategies for resuming dispatching app callbacks |
| |
| Whether you pause dispatching app callbacks when the app enters the cached state |
| or the frozen state, you should resume dispatching the app's registered |
| callbacks once the app exits the respective state, and continue until the app |
| has unregistered its callback or the app process dies. |
| |
| Apps often save updates they received via callbacks as a snapshot of the latest |
| state. Consider a hypothetical API for apps to monitor the remaining battery |
| percentage: |
| |
| ```java |
| interface BatteryListener { |
| void onBatteryPercentageChanged(int newPercentage); |
| } |
| ``` |
| |
| Apps may cache the last value seen as the current battery percentage remaining. |
| For this reason, resuming dispatching is not enough; you should also immediately |
| notify the app of the current remaining battery percentage so that it can "catch |
| up." |
| |
| In some cases, you may track the last value delivered to the app so the app |
| doesn't need to be notified of the same value once it is unfrozen. |
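
A minimal sketch of this catch-up behavior, assuming a hypothetical per-app `BatteryDispatcher` on the sending side (the class and its `pause`/`resume` methods are illustrative names; `BatteryListener` is redeclared so the block is self-contained). The dispatcher remembers the last value it delivered, so resuming sends the current value only when it differs from what the app already saw.

```java
/** The listener from the example above, redeclared so this block stands alone. */
interface BatteryListener {
    void onBatteryPercentageChanged(int newPercentage);
}

/** Hypothetical sender-side dispatcher for one registered app callback. */
final class BatteryDispatcher {
    private final BatteryListener listener;
    private int current;
    private Integer lastDelivered; // null until the app has been told a value
    private boolean paused;

    BatteryDispatcher(BatteryListener listener) {
        this.listener = listener;
    }

    synchronized void onBatteryChanged(int percentage) {
        current = percentage;
        if (!paused) deliver();
    }

    /** Call when the app becomes cached or frozen. */
    synchronized void pause() {
        paused = true;
    }

    /** Call when the app leaves that state; lets it catch up on the latest value. */
    synchronized void resume() {
        paused = false;
        deliver();
    }

    private void deliver() {
        // Skip the notification if the app already holds this value.
        if (lastDelivered != null && lastDelivered == current) return;
        lastDelivered = current;
        listener.onBatteryPercentageChanged(current);
    }
}
```

For brevity the sketch invokes the listener while holding the lock; a real dispatcher would more likely hand the call to an executor or a binder transaction.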
| |
| State may be expressed as more complex data. Consider a hypothetical API for |
| apps to be notified of network interfaces: |
| |
| ```java |
| interface NetworkListener { |
| void onAvailable(Network network); |
| void onLost(Network network); |
| void onChanged(Network network); |
| } |
| ``` |
| |
| When pausing notifications to an app, you should remember the set of networks |
| and states that the app had last seen. Upon resuming, it's recommended to notify |
| the app of old networks that were lost, of new networks that became available, |
| and of existing networks whose state had changed, in this order. |
| |
| Do not notify the app of networks that were made available and then lost while |
| callbacks were paused. Apps should not receive a full account of events that |
| happened while they were frozen, and API documentation should not promise to |
| deliver event streams uninterrupted outside of explicit lifecycle states. In |
| this example, if the app needs to continuously monitor network availability then |
| it must remain in a lifecycle state that keeps it from becoming cached or |
| frozen. |
| |
| In review, you should coalesce the events that happened between pausing and |
| resuming notifications, and deliver only the latest state to the registered app |
| callbacks. |
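
The coalescing step for the network example can be sketched as a diff between two snapshots. `NetworkCatchUp`, the string network ids, the string states, and the string event labels are all assumptions made for illustration. A network that appeared and disappeared while callbacks were paused is in neither snapshot, so it produces no event.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper that computes catch-up events when dispatching resumes. */
final class NetworkCatchUp {
    private NetworkCatchUp() {}

    /**
     * Both maps go from a network id to a non-null state description.
     * Returns events in the recommended order: lost, then available, then changed.
     */
    static List<String> diff(Map<String, String> lastSeen, Map<String, String> current) {
        List<String> events = new ArrayList<>();
        for (String id : lastSeen.keySet()) {
            if (!current.containsKey(id)) events.add("onLost:" + id);
        }
        for (String id : current.keySet()) {
            if (!lastSeen.containsKey(id)) events.add("onAvailable:" + id);
        }
        for (Map.Entry<String, String> entry : current.entrySet()) {
            String previous = lastSeen.get(entry.getKey());
            if (previous != null && !previous.equals(entry.getValue())) {
                events.add("onChanged:" + entry.getKey());
            }
        }
        return events;
    }
}
```

The sender would then replay the returned events to the app's callback and store `current` as the new last-seen snapshot.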
| |
| #### Considerations for developer documentation |
| |
| Delivery of async events may be delayed, either because the sender paused |
| delivery for a period of time as shown above or because the recipient app did |
| not receive enough device resources to process the event in a timely way. |
| |
| Discourage developers from making assumptions about the time between when |
| their app is notified of an event and when the event actually happened. |
| |
| ## Developer expectations for suspending APIs |
| |
| Developers familiar with Kotlin's structured concurrency expect the following |
| behaviors from any suspending API: |
| |
| ### Suspend functions should complete all associated work before returning or throwing |
| |
| Results of non-blocking operations are returned as normal function return |
| values, and errors are reported by throwing exceptions. (This often means that |
| callback parameters are unnecessary.) |
| |
| ### Suspend functions should only invoke callback parameters in-place |
| |
| Since suspend functions should always complete all associated work before |
| returning, they should never invoke a provided callback or other function |
| parameter or retain a reference to it after the suspend function has returned. |
| |
| ### Suspend functions that accept callback parameters should be context-preserving unless otherwise documented |
| |
| Calling a function in a suspend function causes it to run in the |
| CoroutineContext of the caller. As suspend functions should complete all |
| associated work before returning or throwing, and should only invoke callback |
| parameters in-place, the default expectation is that any such callbacks are |
| *also* run on the calling CoroutineContext using its associated dispatcher. If |
| the API's purpose is to run a callback outside of the calling CoroutineContext, |
| this behavior should be clearly documented. |
| |
| ### Suspend functions should support kotlinx.coroutines Job cancellation |
| |
| Any suspend function offered should cooperate with job cancellation as defined |
| by `kotlinx.coroutines`. If the calling `Job` of an operation in progress is |
| cancelled, the function should resume with a `CancellationException` as soon as |
| possible so that the caller can clean up and continue as soon as possible. This |
| is handled automatically by `suspendCancellableCoroutine` and other suspending |
| APIs offered by `kotlinx.coroutines`. Library implementations generally should |
| not use `suspendCoroutine` directly, as it does not support this cancellation |
| behavior by default. |
| |
| ### Suspend functions that perform blocking work on a background (non-main, non-UI) thread must provide a way to configure the dispatcher used |
| |
| It is **not recommended** to make a *blocking* function suspend *entirely* to |
| switch threads. For more information see |
| [Android API guidelines](http://go/androidx-api-guidelines#kotlin-2). |
| |
| Calling a suspend function should not result in the creation of additional |
| threads without permitting the developer to supply their own thread or thread |
| pool to perform that work. For example, a constructor may accept a |
| `CoroutineContext` that will be used to perform background work for the class's |
| methods. |
| |
| Suspend functions that would accept an optional `CoroutineContext` or |
| `Dispatcher` parameter only to switch to that dispatcher to perform blocking |
| work should instead expose the underlying blocking function and recommend that |
| calling developers use their own call to `withContext` to direct the work to a |
| desired dispatcher. |
| |
| ## Classes launching coroutines |
| |
| Classes that launch coroutines must have a `CoroutineScope` to perform those |
| launch operations. Respecting structured concurrency principles implies |
| the following structural patterns for obtaining and managing that scope. |
| |
| Before writing a class that launches concurrent tasks into another scope, |
| consider alternative patterns: |
| |
| ```kotlin |
| class MyClass { |
| private val requests = Channel<MyRequest>(Channel.UNLIMITED) |
| |
| suspend fun handleRequests() { |
| coroutineScope { |
| for (request in requests) { |
| // Allow requests to be processed concurrently; |
| // alternatively, omit the [launch] and outer [coroutineScope] |
| // to process requests serially |
| launch { |
| processRequest(request) |
| } |
| } |
| } |
| } |
| |
| fun submitRequest(request: MyRequest) { |
| requests.trySend(request).getOrThrow() |
| } |
| } |
| ``` |
| |
| Exposing a `suspend fun` to perform concurrent work allows the caller to invoke |
| the operation in their own context, removing the need to have `MyClass` manage |
| a `CoroutineScope`. Serializing the processing of requests becomes simpler |
| and state can often exist as local variables of `handleRequests` instead of as |
| class properties that would otherwise require additional synchronization. |
| |
| ### Classes that manage coroutines should expose a `close()` and/or `cancel()` method |
| |
| Classes that launch coroutines as implementation details must offer a way to |
| cleanly shut down those ongoing concurrent tasks so that they do not leak |
| uncontrolled concurrent work into a parent scope. Typically this takes the form |
| of creating a child `Job` of a provided `CoroutineContext`: |
| |
| ```kotlin |
| private val myJob = Job(parent = coroutineContext[Job]) |
| private val myScope = CoroutineScope(coroutineContext + myJob) |
| |
| fun cancel() { |
| myJob.cancel() |
| } |
| ``` |
| |
| A `join()` method may also be provided to allow user code to await the |
| completion of any outstanding concurrent work being performed by the object. |
| (This may include cleanup work performed by cancelling an operation.) |
| |
| ```kotlin |
| suspend fun join() { |
| myJob.join() |
| } |
| ``` |
| |
| #### Naming terminal operations |
| |
| The name used for methods that cleanly shut down concurrent tasks owned by |
| an object that are still in progress should reflect the behavioral contract |
| of how shutdown will occur: |
| |
| Use `close()` when operations in progress will be allowed to complete but no new |
| operations may begin after the call to `close()` returns. |
| |
| Use `cancel()` when operations in progress may be cancelled before completing. |
| No new operations may begin after the call to `cancel()` returns. |
| |
| ### Class constructors accept `CoroutineContext`, not `CoroutineScope` |
| |
| When objects are forbidden from launching directly into a provided parent scope, |
| the suitability of `CoroutineScope` as a constructor parameter breaks down: |
| |
| ```kotlin |
| // Don't do this |
| class MyClass(scope: CoroutineScope) { |
| private val myJob = Job(parent = scope.coroutineContext[Job]) |
| private val myScope = CoroutineScope(scope.coroutineContext + myJob) |
| |
| // ... the [scope] constructor parameter is never used again |
| } |
| ``` |
| |
| The `CoroutineScope` becomes an unnecessary and misleading wrapper that in some |
| use cases may be constructed solely to pass as a constructor parameter, only |
| to be discarded: |
| |
| ```kotlin |
| // Don't do this; just pass the context |
| val myObject = MyClass(CoroutineScope(parentScope.coroutineContext + Dispatchers.IO)) |
| ``` |
| |
| ### `CoroutineContext` parameters default to `EmptyCoroutineContext` |
| |
| When an optional `CoroutineContext` parameter appears in an API surface the |
| default value must be the `EmptyCoroutineContext` sentinel. This allows for |
| better composition of API behaviors, as an `EmptyCoroutineContext` value from |
| a caller is treated in the same way as accepting the default: |
| |
| ```kotlin |
| class MyOuterClass( |
| coroutineContext: CoroutineContext = EmptyCoroutineContext |
| ) { |
| private val innerObject = MyInnerClass(coroutineContext) |
| |
| // ... |
| } |
| |
| class MyInnerClass( |
| coroutineContext: CoroutineContext = EmptyCoroutineContext |
| ) { |
| private val job = Job(parent = coroutineContext[Job]) |
| private val scope = CoroutineScope(coroutineContext + job) |
| |
| // ... |
| } |
| ``` |