# Android async and non-blocking API guidelines
go/android-api-guidelines-async
<!--*
# Document freshness: For more information, see go/fresh-source.
freshness: { owner: 'adamp' reviewed: '2024-02-02' }
*-->
[TOC]
Non-blocking APIs request work to happen and then yield control back to the
calling thread so that it can perform other work before the completion of the
requested operation. They are useful for cases where the requested work might be
long-running or may require waiting for I/O, IPC, highly contended system
resources to become available, or even user input before work can proceed.
Especially well-behaved APIs will provide a way to *cancel* the operation in
progress and stop work from being performed on the original caller's behalf,
preserving system health and battery life when the operation is no longer
needed.
Asynchronous APIs are one way of achieving non-blocking behavior. Async APIs
accept some form of continuation or callback that will be notified when the
operation is complete, or of other events during the operation's progress.
There are two primary motivations for writing an asynchronous API:
1. Executing multiple operations concurrently, where an Nth operation must be
initiated before the N-1th operation completes
2. Avoiding blocking a calling thread until an operation is complete
Kotlin strongly promotes
*[structured concurrency](https://vorpus.org/blog/notes-on-structured-concurrency-or-go-statement-considered-harmful/)*,
a series of principles and APIs built on suspend functions that decouple
synchronous/asynchronous execution of code from thread-blocking behavior.
Suspend functions are **non-blocking** and **synchronous**.
Suspend functions:
* Do not block their calling thread and instead yield their execution thread
under the hood while awaiting the results of operations executing elsewhere
* Execute synchronously and do not require the caller of a non-blocking API to
continue executing concurrently with non-blocking work initiated by the API
call.
This document details a minimum baseline of expectations developers may safely
hold when working with non-blocking and asynchronous APIs, followed by a series
of recipes for authoring APIs that meet these expectations in Kotlin or Java,
in the Android platform or Jetpack libraries. When in doubt,
consider the developer expectations as requirements for any new API surface.
## Developer expectations for async APIs
The following expectations are written from the standpoint of non-`suspend` APIs
unless otherwise noted.
### APIs that accept callbacks are usually asynchronous
If an API accepts a callback that is not documented to only ever be called
*in-place* (that is, called only by the calling thread before the API call
itself returns), the API is assumed to be asynchronous and should meet
all other expectations documented below.
An example of a callback that is only ever called in-place is a higher-order map
or filter function that invokes a mapper or predicate on each item in a
collection before returning.
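As a minimal sketch of what "in-place" means, the hypothetical `InPlaceCallbacks.map` helper below invokes its `mapper` only on the calling thread and only before it returns, and keeps no reference to it afterwards. The class and method names are illustrative assumptions, not platform APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical helper used only to illustrate an in-place callback. */
final class InPlaceCallbacks {
    private InPlaceCallbacks() {}

    /**
     * Invokes {@code mapper} once per item on the calling thread, and never
     * again after this method returns. No reference to it is retained.
     */
    static <T, R> List<R> map(List<T> items, Function<? super T, ? extends R> mapper) {
        List<R> out = new ArrayList<>(items.size());
        for (T item : items) {
            out.add(mapper.apply(item));
        }
        return out;
    }

    public static void main(String[] args) {
        Thread caller = Thread.currentThread();
        List<Integer> lengths = map(List.of("a", "bb", "ccc"), s -> {
            // The callback runs on the caller's thread, before map() returns.
            if (Thread.currentThread() != caller) throw new AssertionError();
            return s.length();
        });
        System.out.println(lengths); // prints [1, 2, 3]
    }
}
```

Because the callback can never outlive the call, none of the async expectations below apply to an API shaped like this.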
### Asynchronous APIs should return as quickly as possible
Developers expect async APIs to be *non-blocking* and return quickly after
initiating the request for the operation. It should always be safe to call an
async API at any time, and calling an async API should never result in janky
frames or ANR.
Many operations and lifecycle signals can be triggered by the platform or
libraries on-demand, and expecting a developer to hold global knowledge of all
potential call sites for their code is unsustainable. For example, a `Fragment`
can be added to the `FragmentManager` in a synchronous transaction in response
to `View` measurement and layout when app content must be populated to fill
available space (for example, in a `RecyclerView`). A `LifecycleObserver` responding to this
fragment's `onStart` lifecycle callback may reasonably perform one-time startup
operations here, and this may be on a critical code path for producing a frame
of animation free of jank. A developer should always feel confident that calling
**any** async API in response to these kinds of lifecycle callbacks will not be
the cause of a janky frame.
This implies that the work performed by an async API before returning must be
very lightweight: at most, creating a record of the request and associated
callback and registering it with the execution engine that will perform the work. If
registering for an async operation requires IPC, the API's implementation should
take whatever measures are necessary to meet this developer expectation. This
may include one or more of:
* Implementing an underlying IPC as a oneway binder call
* Making a two-way binder call into the system server where completing the
registration does not require taking a highly contended lock
* Posting the request to a worker thread in the app process to perform a
blocking registration over IPC
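The last option above, handing the blocking step to a worker thread, can be sketched in plain Java as follows. This is a minimal sketch under stated assumptions, not platform code: `ThumbnailLoader`, `ThumbnailCallback`, and `blockingLoad` are hypothetical names, and the `Thread.sleep` stands in for a blocking IPC.

```java
import java.util.Objects;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical callback type used only for this sketch. */
interface ThumbnailCallback {
    void onResult(String thumbnail);
}

/** Hypothetical async API: the call records the request and returns at once. */
final class ThumbnailLoader {
    private final Executor worker;

    ThumbnailLoader(Executor worker) {
        this.worker = Objects.requireNonNull(worker, "worker");
    }

    /** Only enqueues the request; the blocking work happens on the worker. */
    void loadThumbnail(String id, Executor callbackExecutor, ThumbnailCallback callback) {
        Objects.requireNonNull(id, "id");
        Objects.requireNonNull(callbackExecutor, "callbackExecutor");
        Objects.requireNonNull(callback, "callback");
        worker.execute(() -> {
            String result = blockingLoad(id);
            callbackExecutor.execute(() -> callback.onResult(result));
        });
    }

    /** Stand-in for a blocking call such as a two-way IPC (assumption). */
    private static String blockingLoad(String id) {
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "thumbnail:" + id;
    }
}

final class ThumbnailLoaderDemo {
    public static void main(String[] args) throws InterruptedException {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        ThumbnailLoader loader = new ThumbnailLoader(worker);
        CountDownLatch done = new CountDownLatch(1);
        loader.loadThumbnail("cat", Runnable::run, thumbnail -> {
            System.out.println("received " + thumbnail);
            done.countDown();
        });
        System.out.println("loadThumbnail returned");
        done.await(5, TimeUnit.SECONDS);
        worker.shutdown();
    }
}
```

The cost paid on the calling thread is limited to argument checks and one enqueue, which is what keeps the call safe on a frame-critical path.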
### Asynchronous APIs should return void and only throw for invalid arguments
Async APIs should report all results of the requested operation to the provided
callback. This allows the developer to implement a single code path for success
and error handling.
Async APIs *may* check arguments for null and throw `NullPointerException`, or
check that provided arguments are within a valid range and throw
`IllegalArgumentException`. For example, a function that accepts a `float` in
the range `0f` to `1f` may check that the parameter is within this
range and throw `IllegalArgumentException` if it is not, and a short
`String` may be checked for conformance to a valid format such as
alphanumerics-only. (Remember that the system server should never trust the app
process! Any system service should duplicate these checks in the system service
itself.)
**All other errors should be reported to the provided callback.** This includes,
but is not limited to:
* Terminal failure of the requested operation
* Security exceptions for missing authorization/permissions required to
complete the operation
* Exceeded quota for performing the operation
* App process is not sufficiently "foreground" to perform the operation
* Required hardware has been disconnected
* Network failures
* Timeouts
* Binder death/unavailable remote process
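The contract in this section can be sketched in plain Java as follows. `ResultCallback`, `VolumeController`, and `setVolume` are hypothetical names chosen for illustration, and the "hardware disconnected" flag is an assumed stand-in for any failure discovered after the call has been accepted.

```java
import java.util.Objects;
import java.util.concurrent.Executor;

/** Hypothetical callback; success and failure arrive on the same path. */
interface ResultCallback<T> {
    void onResult(T result);
    void onError(Exception error);
}

/** Hypothetical async API illustrating the error-reporting contract. */
final class VolumeController {
    private final Executor worker;
    private volatile boolean hardwareConnected = true;

    VolumeController(Executor worker) {
        this.worker = Objects.requireNonNull(worker, "worker");
    }

    /** Test hook for this sketch: simulates losing the audio hardware. */
    void setHardwareConnected(boolean connected) {
        hardwareConnected = connected;
    }

    /**
     * Returns void. Throws only for invalid arguments; every other failure
     * is delivered to the callback.
     */
    void setVolume(float level, Executor callbackExecutor, ResultCallback<Float> callback) {
        Objects.requireNonNull(callbackExecutor, "callbackExecutor");
        Objects.requireNonNull(callback, "callback");
        if (!(level >= 0f && level <= 1f)) { // this form also rejects NaN
            throw new IllegalArgumentException("level must be in [0, 1], was " + level);
        }
        worker.execute(() -> {
            if (!hardwareConnected) {
                // Not an argument error, so it is reported, not thrown.
                callbackExecutor.execute(() ->
                        callback.onError(new IllegalStateException("audio hardware disconnected")));
                return;
            }
            callbackExecutor.execute(() -> callback.onResult(level));
        });
    }
}
```

A caller of this sketch needs a `try`/`catch` only if it passes unchecked input; all runtime outcomes are handled in one place, the callback.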
### Asynchronous APIs should provide a cancellation mechanism
Async APIs should provide a way to indicate to a running operation that the
caller no longer cares about the result. This cancel operation should signal two
things:
#### Hard references to callbacks provided by the caller should be released
Callbacks provided to async APIs may contain hard references to large object
graphs, and ongoing work holding a hard reference to that callback can keep
those object graphs from being garbage collected. By releasing these callback
references on cancellation, these object graphs may become eligible for garbage
collection much sooner than if the work were permitted to run to completion.
#### The execution engine performing work for the caller may stop that work
Work initiated by async API calls may carry a high cost in power consumption or
other system resources. APIs that allow callers to signal when this work is no
longer needed permit stopping that work before it can consume further system
resources.
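Both effects of cancellation can be sketched in plain Java as follows. `CancelSignal` is a hypothetical stand-in for a cancellation token (on Android, `android.os.CancellationSignal` plays a similar role), and `ArraySummer` is an illustrative operation, not a real API.

```java
import java.util.Objects;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

/** Hypothetical plain-Java cancellation token used only for this sketch. */
final class CancelSignal {
    private Runnable onCancel; // guarded by this
    private boolean canceled;  // guarded by this

    void cancel() {
        Runnable listener;
        synchronized (this) {
            if (canceled) return;
            canceled = true;
            listener = onCancel;
            onCancel = null;
        }
        if (listener != null) listener.run();
    }

    synchronized boolean isCanceled() {
        return canceled;
    }

    void setOnCancelListener(Runnable listener) {
        synchronized (this) {
            if (!canceled) {
                onCancel = listener;
                return;
            }
        }
        listener.run(); // already canceled: notify immediately
    }
}

/** Hypothetical async operation that honors a CancelSignal. */
final class ArraySummer {
    private final Executor worker;

    ArraySummer(Executor worker) {
        this.worker = Objects.requireNonNull(worker, "worker");
    }

    void sum(int[] values, CancelSignal signal, Consumer<Long> callback) {
        Objects.requireNonNull(values, "values");
        Objects.requireNonNull(signal, "signal");
        Objects.requireNonNull(callback, "callback");
        // The queued work reaches the callback only through this reference.
        AtomicReference<Consumer<Long>> callbackRef = new AtomicReference<>(callback);
        // 1. On cancel, release the hard reference to the caller's callback.
        signal.setOnCancelListener(() -> callbackRef.set(null));
        worker.execute(() -> {
            long total = 0;
            for (int v : values) {
                // 2. Stop working once cancellation is observed.
                if (signal.isCanceled()) return;
                total += v;
            }
            Consumer<Long> cb = callbackRef.getAndSet(null);
            if (cb != null) cb.accept(total);
        });
    }
}
```

In this sketch a cancel that races with completion may still see the result delivered; an API with stricter guarantees would need to document and enforce them.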
### Special considerations for Cached or Frozen apps
When designing asynchronous APIs where callbacks originate in a system process
and are delivered to apps, consider the following:
1. [Processes and app lifecycle](https://developer.android.com/guide/components/activities/process-lifecycle):
the recipient app process may be in the cached state.
2. [Cached Apps Freezer](https://source.android.com/docs/core/perf/cached-apps-freezer):
the recipient app process may be frozen.
When an app process enters the cached state, this means that it's not currently
hosting any user-visible components such as Activities and Services. The app is
kept in memory in case it becomes user-visible again, but in the meantime should
not be doing work. In most cases, you should pause dispatching app callbacks
when that app enters the cached state and resume when the app exits the cached
state, so as to not induce work in cached app processes.
A cached app may also be frozen. When an app is frozen, it receives zero CPU
time and is not able to do any work at all. Any calls to that app's registered
callbacks will be buffered and delivered when the app is unfrozen.
Buffered transactions to app callbacks may be stale by the time that the app is
unfrozen and processes them. The buffer is finite, and overflowing it causes
the recipient app to crash. To avoid overwhelming apps with stale events or
overflowing their buffers, don't dispatch app callbacks while their process is
frozen.
In review:
* You should *consider* pausing dispatching app callbacks while the app's
process is cached.
* You *MUST* pause dispatching app callbacks while the app's process is
frozen.
#### Registering for all states
To track when apps enter or exit the cached state:
```java
mActivityManager.addOnUidImportanceListener(
        new UidImportanceListener() { ... },
        IMPORTANCE_CACHED);
```
For example, see
[ag/20754479 Defer sending display events to cached apps](https://googleplex-android-review.git.corp.google.com/c/platform/frameworks/base/+/20754479).
To track when apps are frozen or unfrozen:
```java
ActivityManager.registerUidFrozenStateChangedCallback(executor, callback);
```
<!-- TODO(shayba): add an example change once such a change exists. -->
<!-- TODO(shayba): replace with per-pid APIs after we've added them. -->
#### Strategies for resuming dispatching app callbacks
Whether you pause dispatching app callbacks when the app enters the cached state
or the frozen state, you should resume dispatching the app's registered
callbacks once the app exits the respective state, and continue until the app
has unregistered its callback or the app process dies.
Apps often save updates they received via callbacks as a snapshot of the latest
state. Consider a hypothetical API for apps to monitor the remaining battery
percentage:
```java
interface BatteryListener {
    void onBatteryPercentageChanged(int newPercentage);
}
```
Apps may cache the last value seen as the current battery percentage remaining.
For this reason, resuming dispatching is not enough; you should also immediately
notify the app of the current remaining battery percentage so that it can "catch
up."
In some cases, you may track the last value delivered to the app so the app
doesn't need to be notified of the same value once it is unfrozen.
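The pause, resume, and "catch up" behavior for the hypothetical `BatteryListener` above can be sketched as follows. `BatteryDispatcher` and its method names are assumptions for illustration; a real service would hook `pause()` and `resume()` to the cached or frozen state signals and would deliver over IPC rather than by a direct call.

```java
/** Mirrors the hypothetical listener shown above. */
interface BatteryListener {
    void onBatteryPercentageChanged(int newPercentage);
}

/** Hypothetical sender-side bookkeeping for one registered listener. */
final class BatteryDispatcher {
    private final BatteryListener listener;
    private Integer current;       // latest known value, null until first reading
    private Integer lastDelivered; // last value the app was told, null if none
    private boolean paused;

    BatteryDispatcher(BatteryListener listener) {
        this.listener = listener;
    }

    /** Called by the sender whenever the battery percentage changes. */
    synchronized void onBatteryChanged(int percentage) {
        current = percentage;
        if (!paused) deliverIfStale();
    }

    /** Called when the app becomes cached or frozen. */
    synchronized void pause() {
        paused = true;
    }

    /** Called when the app leaves that state; lets the app catch up. */
    synchronized void resume() {
        paused = false;
        deliverIfStale();
    }

    /** Delivers the current value unless the app has already seen it. */
    private void deliverIfStale() {
        if (current == null || current.equals(lastDelivered)) return;
        lastDelivered = current;
        listener.onBatteryPercentageChanged(current);
    }
}
```

If the percentage goes from 80 to 78 and back to 80 while paused, the app receives nothing on resume; if it ends at 70, the app receives a single call with 70.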
State may be expressed as more complex data. Consider a hypothetical API for
apps to be notified of network interfaces:
```java
interface NetworkListener {
    void onAvailable(Network network);
    void onLost(Network network);
    void onChanged(Network network);
}
```
When pausing notifications to an app, you should remember the set of networks
and states that the app had last seen. Upon resuming, it's recommended to notify
the app of old networks that were lost, of new networks that became available,
and of existing networks whose state has changed, in this order.
Do not notify the app of networks that were made available and then lost while
callbacks were paused. Apps should not receive a full account of events that
happened while they were frozen, and API documentation should not promise to
deliver event streams uninterrupted outside of explicit lifecycle states. In
this example, if the app needs to continuously monitor network availability then
it must remain in a lifecycle state that keeps it from becoming cached or
frozen.
In review, you should coalesce events that happened after pausing and before
resuming notifications, and deliver the latest state to the registered app
callbacks succinctly.
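One way to coalesce is to snapshot what the app had seen at pause time and deliver only the net difference on resume. The sketch below assumes a simplified version of the `NetworkListener` above in which a `String` id stands in for `Network` and a `String` describes its state; `NetworkDispatcher` and its methods are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Simplified form of the listener above; a String id stands in for Network. */
interface NetworkListener {
    void onAvailable(String networkId);
    void onLost(String networkId);
    void onChanged(String networkId);
}

/** Hypothetical sender-side bookkeeping for one registered listener. */
final class NetworkDispatcher {
    private final NetworkListener listener;
    // Current truth: network id -> opaque, non-null state description.
    private final Map<String, String> current = new LinkedHashMap<>();
    // What the app had last seen when dispatching was paused.
    private Map<String, String> lastSeen = new LinkedHashMap<>();
    private boolean paused;

    NetworkDispatcher(NetworkListener listener) {
        this.listener = listener;
    }

    synchronized void pause() {
        if (paused) return;
        paused = true;
        lastSeen = new LinkedHashMap<>(current);
    }

    /** Adds a network or updates its state. */
    synchronized void update(String id, String state) {
        String old = current.put(id, state);
        if (paused) return;
        if (old == null) {
            listener.onAvailable(id);
        } else if (!old.equals(state)) {
            listener.onChanged(id);
        }
    }

    synchronized void remove(String id) {
        if (current.remove(id) != null && !paused) {
            listener.onLost(id);
        }
    }

    /** Delivers only the net difference: lost, then available, then changed. */
    synchronized void resume() {
        if (!paused) return;
        paused = false;
        for (String id : lastSeen.keySet()) {
            if (!current.containsKey(id)) listener.onLost(id);
        }
        for (String id : current.keySet()) {
            if (!lastSeen.containsKey(id)) listener.onAvailable(id);
        }
        for (Map.Entry<String, String> entry : current.entrySet()) {
            String old = lastSeen.get(entry.getKey());
            if (old != null && !old.equals(entry.getValue())) {
                listener.onChanged(entry.getKey());
            }
        }
        lastSeen = new LinkedHashMap<>();
    }
}
```

A network that appears and disappears entirely within the paused window is in neither map at resume time, so the app never hears about it.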
#### Considerations for developer documentation
Delivery of async events may be delayed, either because the sender paused
delivery for a period of time as shown above or because the recipient app did
not receive enough device resources to process the event in a timely way.
Discourage developers from making assumptions about the time between when their
app is notified of an event and when the event actually happened.
## Developer expectations for suspending APIs
Developers familiar with Kotlin's structured concurrency expect the following
behaviors from any suspending API:
### Suspend functions should complete all associated work before returning or throwing
Results of non-blocking operations are returned as normal function return
values, and errors are reported by throwing exceptions. (This often means that
callback parameters are unnecessary.)
### Suspend functions should only invoke callback parameters in-place
Since suspend functions should always complete all associated work before
returning, they should never invoke a provided callback or other function
parameter or retain a reference to it after the suspend function has returned.
### Suspend functions that accept callback parameters should be context-preserving unless otherwise documented
Calling a function in a suspend function causes it to run in the
`CoroutineContext` of the caller. As suspend functions should complete all
associated work before returning or throwing, and should only invoke callback
parameters in-place, the default expectation is that any such callbacks are
*also* run on the calling `CoroutineContext` using its associated dispatcher. If
the API's purpose is to run a callback outside of the calling `CoroutineContext`,
this behavior should be clearly documented.
### Suspend functions should support kotlinx.coroutines Job cancellation
Any suspend function offered should cooperate with job cancellation as defined
by kotlinx.coroutines. If the calling `Job` of an operation in progress is
cancelled, the function should resume with a `CancellationException` as soon as
possible so that the caller can clean up and continue. This
is handled automatically by `suspendCancellableCoroutine` and other suspending
APIs offered by kotlinx.coroutines. Library implementations generally should not
use `suspendCoroutine` directly, as it does not support this cancellation
behavior by default.
### Suspend functions that perform blocking work on a background (non-main, non-UI) thread must provide a way to configure the dispatcher used
It is **not recommended** to make a *blocking* function suspend *entirely* to
switch threads. For more information see
[Android API guidelines](http://go/androidx-api-guidelines#kotlin-2).
Calling a suspend function should not result in the creation of additional
threads without permitting the developer to supply their own thread or thread
pool to perform that work. For example, a constructor may accept a
`CoroutineContext` that will be used to perform background work for the class's
methods.
Suspend functions that would accept an optional `CoroutineContext` or Dispatcher
parameter only to switch to that dispatcher to perform blocking work should
instead expose the underlying blocking function and recommend that calling
developers use their own call to `withContext` to direct the work to a desired
dispatcher.
## Classes launching coroutines
Classes that launch coroutines must have a `CoroutineScope` to perform those
launch operations. Respecting structured concurrency principles implies
the following structural patterns for obtaining and managing that scope.
Before writing a class that launches concurrent tasks into another scope,
consider alternative patterns:
```kotlin
class MyClass {
    private val requests = Channel<MyRequest>(Channel.UNLIMITED)

    suspend fun handleRequests() {
        coroutineScope {
            for (request in requests) {
                // Allow requests to be processed concurrently;
                // alternatively, omit the [launch] and outer [coroutineScope]
                // to process requests serially
                launch {
                    processRequest(request)
                }
            }
        }
    }

    fun submitRequest(request: MyRequest) {
        requests.trySend(request).getOrThrow()
    }
}
```
Exposing a `suspend fun` to perform concurrent work allows the caller to invoke
the operation in their own context, removing the need to have `MyClass` manage
a `CoroutineScope`. Serializing the processing of requests becomes simpler
and state can often exist as local variables of `handleRequests` instead of as
class properties that would otherwise require additional synchronization.
### Classes that manage coroutines should expose a `close()` and/or `cancel()` method
Classes that launch coroutines as implementation details must offer a way to
cleanly shut down those ongoing concurrent tasks so that they do not leak
uncontrolled concurrent work into a parent scope. Typically this takes the form
of creating a child `Job` of a provided `CoroutineContext`:
```kotlin
private val myJob = Job(parent = coroutineContext[Job])
private val myScope = CoroutineScope(coroutineContext + myJob)

fun cancel() {
    myJob.cancel()
}
```
A `join()` method may also be provided to allow user code to await the
completion of any outstanding concurrent work being performed by the object.
(This may include cleanup work performed by cancelling an operation.)
```kotlin
suspend fun join() {
    myJob.join()
}
```
#### Naming terminal operations
The name of a method that cleanly shuts down an object's in-progress concurrent
tasks should reflect the behavioral contract of how shutdown will occur:
Use `close()` when operations in progress will be allowed to complete but no new
operations may begin after the call to `close()` returns.
Use `cancel()` when operations in progress may be cancelled before completing.
No new operations may begin after the call to `cancel()` returns.
### Class constructors accept `CoroutineContext`, not `CoroutineScope`
When objects are forbidden from launching directly into a provided parent scope,
the suitability of `CoroutineScope` as a constructor parameter breaks down:
```kotlin
// Don't do this
class MyClass(scope: CoroutineScope) {
    private val myJob = Job(parent = scope.coroutineContext[Job])
    private val myScope = CoroutineScope(scope.coroutineContext + myJob)
    // ... the [scope] constructor parameter is never used again
}
```
The `CoroutineScope` becomes an unnecessary and misleading wrapper that in some
use cases may be constructed solely to pass as a constructor parameter, only
to be discarded:
```kotlin
// Don't do this; just pass the context
val myObject = MyClass(CoroutineScope(parentScope.coroutineContext + Dispatchers.IO))
```
### `CoroutineContext` parameters default to `EmptyCoroutineContext`
When an optional `CoroutineContext` parameter appears in an API surface the
default value must be the `EmptyCoroutineContext` sentinel. This allows for
better composition of API behaviors, as an `EmptyCoroutineContext` value from
a caller is treated in the same way as accepting the default:
```kotlin
class MyOuterClass(
    coroutineContext: CoroutineContext = EmptyCoroutineContext
) {
    private val innerObject = MyInnerClass(coroutineContext)
    // ...
}

class MyInnerClass(
    coroutineContext: CoroutineContext = EmptyCoroutineContext
) {
    private val job = Job(parent = coroutineContext[Job])
    private val scope = CoroutineScope(coroutineContext + job)
    // ...
}
```