Android async and non-blocking API guidelines

go/android-api-guidelines-async

Non-blocking APIs request work to happen and then yield control back to the calling thread so that it can perform other work before the completion of the requested operation. They are useful for cases where the requested work might be long-running or may require waiting for I/O, IPC, highly contended system resources to become available, or even user input before work can proceed. Especially well-behaved APIs will provide a way to cancel the operation in progress and stop work from being performed on the original caller's behalf, preserving system health and battery life when the operation is no longer needed.

Asynchronous APIs are one way of achieving non-blocking behavior. Async APIs accept some form of continuation or callback that will be notified when the operation is complete, or of other events during the operation's progress.

There are two primary motivations for writing an asynchronous API:

Executing multiple operations concurrently, where an Nth operation must be initiated before the N-1th operation completes
Avoiding blocking a calling thread until an operation is complete

Kotlin strongly promotes structured concurrency, a series of principles and APIs built on suspend functions that decouple synchronous/asynchronous execution of code from thread-blocking behavior. Suspend functions are non-blocking and synchronous.

Suspend functions:

Do not block their calling thread and instead yield their execution thread under the hood while awaiting the results of operations executing elsewhere.
Execute synchronously and do not require the caller of a non-blocking API to continue executing concurrently with non-blocking work initiated by the API call.

This document details a minimum baseline of expectations developers may safely hold when working with non-blocking and asynchronous APIs, followed by a series of recipes for authoring APIs that meet these expectations in Kotlin or Java, in the Android platform or Jetpack libraries. When in doubt, treat the developer expectations as requirements for any new API surface.

Developer expectations for async APIs

The following expectations are written from the standpoint of non-suspend APIs unless otherwise noted.

APIs that accept callbacks are usually asynchronous

If an API accepts a callback that is not documented to only ever be called in place (that is, called only by the calling thread before the API call itself returns), the API is assumed to be asynchronous and should meet all the other expectations documented below.

An example of a callback that is only ever called in-place is a higher-order map or filter function that invokes a mapper or predicate on each item in a collection before returning.
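The distinction can be sketched in Kotlin with two hypothetical functions. `countMatching` only ever calls its predicate in place, so it is synchronous. `loadAsync` hands its callback to an `Executor` that may run it after `loadAsync` has returned, so callers should assume it is asynchronous, and it should meet the expectations below.

```kotlin
import java.util.concurrent.Executor

// Hypothetical synchronous API: `predicate` is only ever invoked in place,
// on the calling thread, before countMatching returns.
fun <T> countMatching(items: List<T>, predicate: (T) -> Boolean): Int {
    var count = 0
    for (item in items) {
        if (predicate(item)) count++
    }
    return count
}

// Hypothetical asynchronous API: `callback` is handed to an Executor and may
// run later, possibly on another thread, after loadAsync has returned.
fun loadAsync(key: String, executor: Executor, callback: (String) -> Unit) {
    executor.execute { callback("value for $key") }
}
```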

Asynchronous APIs should return as quickly as possible

Developers expect async APIs to be non-blocking and to return quickly after initiating the request for the operation. It should always be safe to call an async API at any time, and calling an async API should never result in janky frames or ANRs.

Many operations and lifecycle signals can be triggered by the platform or libraries on demand, and expecting a developer to hold global knowledge of all potential call sites for their code is unsustainable. For example, a Fragment can be added to the FragmentManager in a synchronous transaction in response to View measurement and layout when app content must be populated to fill available space, as in a RecyclerView. A LifecycleObserver responding to this fragment's onStart lifecycle callback may reasonably perform one-time startup operations there, and this may be on a critical code path for producing a jank-free frame of animation. A developer should always feel confident that calling any async API in response to these kinds of lifecycle callbacks will not be the cause of a janky frame.

This implies that the work performed by an async API before returning must be very lightweight: at most, creating a record of the request and its associated callback and registering it with the execution engine that will perform the work. If registering for an async operation requires IPC, the API's implementation should take whatever measures are necessary to meet this developer expectation. These may include one or more of:

Implementing an underlying IPC as a oneway binder call
Making a two-way binder call into the system server where completing the registration does not require taking a highly contended lock
Posting the request to a worker thread in the app process to perform a blocking registration over IPC
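The third option might look like the following minimal sketch. `SensorClient` and `blockingRegisterOverIpc` are hypothetical; the latter only simulates a slow two-way binder call with `Thread.sleep`. The public method does nothing on the calling thread except enqueue the request, and the result is delivered on an `Executor` the caller supplies.

```kotlin
import java.util.concurrent.Executor
import java.util.concurrent.ExecutorService
import java.util.concurrent.Executors

// Hypothetical client for a remote service whose registration call may block.
class SensorClient(
    // Worker thread owned by the app process. Single-threaded, so registrations
    // are performed in submission order.
    private val worker: ExecutorService = Executors.newSingleThreadExecutor(),
) {
    // Hypothetical stand-in for a blocking two-way IPC registration.
    private fun blockingRegisterOverIpc(sensorId: Int): Boolean {
        Thread.sleep(200) // simulate a slow, contended remote call
        return sensorId >= 0 // pretend the remote side rejects negative IDs
    }

    // Returns immediately. The only work done on the calling thread is
    // enqueueing the request; the blocking IPC happens on the worker thread.
    fun registerAsync(sensorId: Int, executor: Executor, callback: (Boolean) -> Unit) {
        worker.execute {
            val registered = blockingRegisterOverIpc(sensorId)
            // Deliver the result on the executor the caller supplied.
            executor.execute { callback(registered) }
        }
    }

    fun shutdown() {
        worker.shutdown()
    }
}
```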

Asynchronous APIs should return void and only throw for invalid arguments

Async APIs should report all results of the requested operation to the provided callback. This allows the developer to implement a single code path for success and error handling.

Async APIs may check arguments for null and throw NullPointerException, or check that provided arguments are within a valid range and throw IllegalArgumentException. For example, a function that accepts a float in the range 0f to 1f may check that the parameter is within this range and throw IllegalArgumentException if it is not, and a short String parameter may be checked for conformance to a valid format such as alphanumerics-only. (Remember that the system server should never trust the app process! Any system service should duplicate these checks in the system service itself.)

All other errors should be reported to the provided callback. This includes, but is not limited to:

Terminal failure of the requested operation
Security exceptions for missing authorization/permissions required to complete the operation
Exceeded quota for performing the operation
App process is not sufficiently “foreground” to perform the operation
Required hardware has been disconnected
Network failures
Timeouts
Binder death/unavailable remote process
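A minimal sketch of this split between thrown and reported errors, assuming a hypothetical `BrightnessController` and a hypothetical `ResultCallback` interface (on Android, `android.os.OutcomeReceiver`, available from API level 31, has a similar success/error shape). Only the range check throws; a missing permission is delivered to the callback as a `SecurityException`.

```kotlin
import java.util.concurrent.Executor

// Hypothetical callback with one path for success and one for failure.
interface ResultCallback<R> {
    fun onResult(result: R)
    fun onError(error: Exception)
}

// Hypothetical API. `hasPermission` simulates whether the caller is authorized.
class BrightnessController(private val hasPermission: Boolean) {
    fun setBrightnessAsync(level: Float, executor: Executor, callback: ResultCallback<Float>) {
        // Invalid arguments are programmer errors: throw synchronously.
        require(level in 0f..1f) { "level must be in [0, 1], was $level" }

        // For brevity, the work and the callback share the caller's executor.
        executor.execute {
            if (!hasPermission) {
                // Any other failure goes to the callback instead of being thrown.
                callback.onError(SecurityException("caller lacks permission"))
            } else {
                callback.onResult(level)
            }
        }
    }
}
```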

Asynchronous APIs should provide a cancellation mechanism

Async APIs should provide a way to indicate to a running operation that the caller no longer cares about the result. This cancel operation should signal two things:

Hard references to callbacks provided by the caller should be released

Callbacks provided to async APIs may contain hard references to large object graphs, and ongoing work holding a hard reference to that callback can keep those object graphs from being garbage collected. By releasing these callback references on cancellation, these object graphs may become eligible for garbage collection much sooner than if the work were permitted to run to completion.

The execution engine performing work for the caller may stop that work

Work initiated by async API calls may carry a high cost in power consumption or other system resources. APIs that allow callers to signal when this work is no longer needed permit stopping that work before it can consume further system resources.
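The following sketch shows both cancellation effects. It assumes a hypothetical `CancelSignal` token, loosely modeled on `android.os.CancellationSignal`, and a hypothetical `Downloader`. Cancelling clears the only reference the running work holds to the callback. It then cancels the underlying `Future`, interrupting the worker thread if the work has already started.

```kotlin
import java.util.concurrent.ExecutorService
import java.util.concurrent.Executors
import java.util.concurrent.Future
import java.util.concurrent.atomic.AtomicReference

// Hypothetical cancellation token, loosely modeled on android.os.CancellationSignal.
class CancelSignal {
    @Volatile
    var isCanceled = false
        private set
    private var listener: (() -> Unit)? = null

    // Runs `onCancel` immediately if the signal was already cancelled.
    fun setOnCancelListener(onCancel: () -> Unit) {
        val alreadyCanceled = synchronized(this) {
            if (!isCanceled) listener = onCancel
            isCanceled
        }
        if (alreadyCanceled) onCancel()
    }

    fun cancel() {
        val toRun = synchronized(this) {
            if (isCanceled) return
            isCanceled = true
            listener.also { listener = null }
        }
        toRun?.invoke()
    }
}

// Hypothetical API that accepts the signal alongside its callback.
class Downloader(private val worker: ExecutorService = Executors.newSingleThreadExecutor()) {
    fun downloadAsync(url: String, signal: CancelSignal, callback: (ByteArray) -> Unit) {
        // The running work reaches the callback only through this reference,
        // so clearing it releases the caller's object graph.
        val callbackRef = AtomicReference<((ByteArray) -> Unit)?>(callback)
        val future: Future<*> = worker.submit(Runnable {
            val bytes = fetchBlocking(url)
            // For brevity, the callback runs on the worker thread.
            callbackRef.getAndSet(null)?.invoke(bytes)
        })
        signal.setOnCancelListener {
            callbackRef.set(null) // 1. release the hard reference to the callback
            future.cancel(true)   // 2. stop the work (interrupts the worker)
        }
    }

    // Hypothetical long-running work; Thread.sleep responds to interruption.
    private fun fetchBlocking(url: String): ByteArray {
        Thread.sleep(100)
        return url.toByteArray()
    }

    fun shutdown() {
        worker.shutdown()
    }
}
```

Passing the signal in as a parameter keeps `downloadAsync` returning `Unit`, consistent with the guideline above that async APIs should return void.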

Developer expectations for suspending APIs

Developers familiar with Kotlin's structured concurrency expect the following behaviors from any suspending API:

Suspend functions should complete all associated work before returning or throwing

Results of non-blocking operations are returned as normal function return values, and errors are reported by throwing exceptions. (This often means that callback parameters are unnecessary.)

Suspend functions should only invoke callback parameters in-place

Since suspend functions should always complete all associated work before returning, they should never invoke a provided callback or other function parameter or retain a reference to it after the suspend function has returned.

Suspend functions that accept callback parameters should be context-preserving unless otherwise documented

Calling a function in a suspend function causes it to run in the CoroutineContext of the caller. As suspend functions should complete all associated work before returning or throwing, and should only invoke callback parameters in-place, the default expectation is that any such callbacks are also run on the calling CoroutineContext using its associated dispatcher. If the API's purpose is to run a callback outside of the calling CoroutineContext, this behavior should be clearly documented.
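A minimal sketch of a context-preserving suspend function, using a hypothetical `retrying` helper. Because `block` is called directly rather than inside `withContext`, it runs on the caller's dispatcher and sees the caller's context elements, such as a `CoroutineName`. It is also only ever invoked before `retrying` returns or throws.

```kotlin
import kotlinx.coroutines.CancellationException

// Hypothetical helper that retries `block` up to `attempts` times. The block is
// invoked in place, in the caller's CoroutineContext: there is no withContext,
// so it runs on the caller's dispatcher and sees the caller's context elements.
suspend fun <T> retrying(attempts: Int, block: suspend (attempt: Int) -> T): T {
    require(attempts > 0) { "attempts must be positive, was $attempts" }
    var lastError: Exception? = null
    for (attempt in 1..attempts) {
        try {
            return block(attempt)
        } catch (e: CancellationException) {
            throw e // never swallow cancellation of the calling Job
        } catch (e: Exception) {
            lastError = e
        }
    }
    throw checkNotNull(lastError)
}
```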

Suspend functions should support kotlinx.coroutines Job cancellation

Any suspend function offered should cooperate with job cancellation as defined by kotlinx.coroutines. If the calling Job of an operation in progress is cancelled, the function should resume with a CancellationException as soon as possible so that the caller can clean up and continue promptly. This is handled automatically by suspendCancellableCoroutine and other suspending APIs offered by kotlinx.coroutines. Library implementations generally should not use suspendCoroutine directly, as it does not support this cancellation behavior by default.
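As a sketch, here is how a hypothetical callback-based `LookupService` could be wrapped in a cancellable suspend function. The `LookupCallback` and `Cancellable` types and the handle returned by `lookupAsync` are assumptions of this example, not an existing API. `invokeOnCancellation` forwards Job cancellation to the underlying work, and the callback's result and error paths become a normal return value and a thrown exception.

```kotlin
import kotlinx.coroutines.suspendCancellableCoroutine
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit
import java.util.concurrent.atomic.AtomicInteger
import kotlin.coroutines.resume
import kotlin.coroutines.resumeWithException

// Hypothetical callback-based API to be wrapped.
interface LookupCallback {
    fun onResult(value: String)
    fun onError(error: Exception)
}

fun interface Cancellable {
    fun cancel()
}

class LookupService {
    private val scheduler = Executors.newSingleThreadScheduledExecutor()

    // Exposed only so that cancellation can be observed in this example.
    val cancellations = AtomicInteger()

    // Simulates a lookup that completes after 100 ms.
    fun lookupAsync(key: String, callback: LookupCallback): Cancellable {
        val future = scheduler.schedule(
            Runnable { callback.onResult("value:$key") }, 100, TimeUnit.MILLISECONDS,
        )
        return Cancellable {
            cancellations.incrementAndGet()
            future.cancel(false)
        }
    }

    fun shutdown() {
        scheduler.shutdown()
    }
}

// Suspending wrapper that cooperates with Job cancellation.
suspend fun LookupService.lookup(key: String): String =
    suspendCancellableCoroutine { continuation ->
        val handle = lookupAsync(key, object : LookupCallback {
            override fun onResult(value: String) = continuation.resume(value)
            override fun onError(error: Exception) = continuation.resumeWithException(error)
        })
        // If the calling Job is cancelled, the continuation resumes with
        // CancellationException right away and the underlying work is stopped.
        continuation.invokeOnCancellation { handle.cancel() }
    }
```

With `suspendCoroutine` instead, a cancelled caller would stay suspended until the lookup finished on its own.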

Suspend functions that perform blocking work on a background thread (not the main or UI thread) must provide a way to configure the dispatcher used

Avoid making a blocking function a suspend function solely so that it can switch threads. For more information, see the Android API guidelines.

Calling a suspend function should not result in the creation of additional threads without permitting the developer to supply their own thread or thread pool to perform that work. For example, a constructor may accept a CoroutineContext that will be used to perform background work for the class's methods.

Suspend functions that would accept an optional CoroutineContext or Dispatcher parameter only to switch to that dispatcher to perform blocking work should instead expose the underlying blocking function and recommend that calling developers use their own call to withContext to direct the work to a desired dispatcher.
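A minimal sketch of the recommended shape, with a hypothetical `ConfigStore`: the class exposes a plain blocking function, and the calling code decides where to run it with its own `withContext`.

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.withContext
import java.io.File

// Hypothetical class. Instead of a suspend wrapper whose only job is to switch
// dispatchers, it exposes the blocking call directly and documents that it blocks.
class ConfigStore(private val file: File) {
    /** Reads the config from disk. Blocks the calling thread; do not call on the main thread. */
    fun readConfig(): String = file.readText()
}

// Calling code chooses the dispatcher that fits its own threading policy.
suspend fun loadConfig(store: ConfigStore): String =
    withContext(Dispatchers.IO) { store.readConfig() }
```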

Classes launching coroutines

Classes that launch coroutines must have a CoroutineScope to perform those launch operations. Respecting structured concurrency principles implies the following structural patterns for obtaining and managing that scope.

Before writing a class that launches concurrent tasks into another scope, consider alternative patterns:

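import kotlinx.coroutines.channels.Channel
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.launch

// MyRequest and processRequest() are application-defined.
class MyClass {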
    private val requests = Channel<MyRequest>(Channel.UNLIMITED)

    suspend fun handleRequests() {
        coroutineScope {
            for (request in requests) {
                // Allow requests to be processed concurrently;
                // alternatively, omit the [launch] and outer [coroutineScope]
                // to process requests serially
                launch {
                    processRequest(request)
                }
            }
        }
    }

    fun submitRequest(request: MyRequest) {
        requests.trySend(request).getOrThrow()
    }
}

Exposing a suspend fun to perform concurrent work allows the caller to invoke the operation in their own context, removing the need to have MyClass manage a CoroutineScope. Serializing the processing of requests becomes simpler and state can often exist as local variables of handleRequests instead of as class properties that would otherwise require additional synchronization.
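To illustrate the point about local state, here is a small sketch in the same style. `TotalTracker` is hypothetical, and its `close()` method is an addition so that `handleRequests` can finish once queued requests drain. Because requests are processed one at a time inside `handleRequests`, the running total is an ordinary local variable.

```kotlin
import kotlinx.coroutines.channels.Channel

// Hypothetical variant of the pattern above. Requests are processed serially
// in the caller's coroutine, so the running total is a plain local variable.
class TotalTracker {
    private val requests = Channel<Int>(Channel.UNLIMITED)

    // Runs in whatever scope and dispatcher the caller chooses, until the
    // caller's Job is cancelled or close() is called and the queue drains.
    suspend fun handleRequests(onTotal: (Int) -> Unit) {
        var total = 0 // confined to this coroutine: no locks or atomics needed
        for (amount in requests) {
            total += amount
            onTotal(total) // invoked in place
        }
    }

    fun submitRequest(amount: Int) {
        requests.trySend(amount).getOrThrow()
    }

    // Requests already submitted are still processed; new ones are rejected.
    fun close() {
        requests.close()
    }
}
```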

Classes that manage coroutines should expose a `close()` and/or `cancel()` method

Classes that launch coroutines as implementation details must offer a way to cleanly shut down those ongoing concurrent tasks so that they do not leak uncontrolled concurrent work into a parent scope. Typically this takes the form of creating a child Job of a provided CoroutineContext:

class MyClass(coroutineContext: CoroutineContext) {
    private val myJob = Job(parent = coroutineContext[Job])
    private val myScope = CoroutineScope(coroutineContext + myJob)

    fun cancel() {
        myJob.cancel()
    }
}

A join() method may also be provided to allow user code to await the completion of any outstanding concurrent work being performed by the object. (This may include cleanup work performed by cancelling an operation.)

suspend fun join() {
    myJob.join()
}
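Putting these pieces together gives a minimal sketch of a hypothetical `Poller` class. Its Job is a child of the caller's Job, so cancelling the caller also stops the polling, while `cancel()` stops only the poller. `join()` waits for the cleanup in the `finally` block to run.

```kotlin
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.Job
import kotlinx.coroutines.delay
import kotlinx.coroutines.launch
import kotlin.coroutines.CoroutineContext
import kotlin.coroutines.EmptyCoroutineContext

// Hypothetical class that polls in the background as an implementation detail.
class Poller(coroutineContext: CoroutineContext = EmptyCoroutineContext) {
    // Child of the caller's Job, if any: cancelling the caller cancels the
    // polling, while cancel() below stops only this object's work.
    private val myJob = Job(parent = coroutineContext[Job])
    private val myScope = CoroutineScope(coroutineContext + myJob)

    @Volatile
    var cleanedUp = false
        private set

    fun startPolling(poll: suspend () -> Unit) {
        myScope.launch {
            try {
                while (true) {
                    poll()
                    delay(10)
                }
            } finally {
                cleanedUp = true // cleanup that join() will wait for
            }
        }
    }

    fun cancel() {
        myJob.cancel()
    }

    suspend fun join() {
        myJob.join()
    }
}
```

Note that a `Job()` created this way never completes on its own. Without `cancel()` or an equivalent, an object constructed with a parent Job would keep that parent from completing.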

Naming terminal operations

The name of a method that cleanly shuts down an object's in-progress concurrent tasks should reflect the behavioral contract of how shutdown occurs:

Use close() when operations in progress will be allowed to complete but no new operations may begin after the call to close() returns.

Use cancel() when operations in progress may be cancelled before completing. No new operations may begin after the call to cancel() returns.
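One way to implement both contracts is sketched here with a hypothetical `Uploader`. `close()` calls `complete()` on the object's Job, which lets running children finish before the Job completes, while `cancel()` cancels them. In both cases a flag rejects new work, since this sketch does not rely on the Job to do so.

```kotlin
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.Job
import kotlinx.coroutines.launch
import kotlin.coroutines.CoroutineContext
import kotlin.coroutines.EmptyCoroutineContext

// Hypothetical class illustrating the two terminal-operation contracts.
class Uploader(coroutineContext: CoroutineContext = EmptyCoroutineContext) {
    private val myJob = Job(parent = coroutineContext[Job])
    private val myScope = CoroutineScope(coroutineContext + myJob)
    private val lock = Any()
    private var acceptingWork = true // guarded by lock

    fun upload(work: suspend () -> Unit) {
        synchronized(lock) {
            check(acceptingWork) { "Uploader has been closed or cancelled" }
            myScope.launch { work() }
        }
    }

    /** Uploads in progress may finish; no new uploads may begin after this returns. */
    fun close() {
        synchronized(lock) { acceptingWork = false }
        myJob.complete() // the Job completes once all child coroutines have finished
    }

    /** Uploads in progress may be cancelled; no new uploads may begin after this returns. */
    fun cancel() {
        synchronized(lock) { acceptingWork = false }
        myJob.cancel()
    }

    suspend fun join() {
        myJob.join()
    }
}
```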

Class constructors accept `CoroutineContext`, not `CoroutineScope`

When objects are forbidden from launching directly into a provided parent scope, CoroutineScope is no longer a suitable constructor parameter:

// Don't do this
class MyClass(scope: CoroutineScope) {
    private val myJob = Job(parent = scope.coroutineContext[Job])
    private val myScope = CoroutineScope(scope.coroutineContext + myJob)

    // ... the [scope] constructor parameter is never used again
}

The CoroutineScope becomes an unnecessary and misleading wrapper that in some use cases may be constructed solely to pass as a constructor parameter, only to be discarded:

// Don't do this; just pass the context
val myObject = MyClass(CoroutineScope(parentScope.coroutineContext + Dispatchers.IO))

`CoroutineContext` parameters default to `EmptyCoroutineContext`

When an optional CoroutineContext parameter appears in an API surface, the default value must be the EmptyCoroutineContext sentinel. This allows for better composition of API behaviors, as an EmptyCoroutineContext value from a caller is treated the same way as accepting the default:

class MyOuterClass(
    coroutineContext: CoroutineContext = EmptyCoroutineContext
) {
    private val innerObject = MyInnerClass(coroutineContext)

    // ...
}

class MyInnerClass(
    coroutineContext: CoroutineContext = EmptyCoroutineContext
) {
    private val job = Job(parent = coroutineContext[Job])
    private val scope = CoroutineScope(coroutineContext + job)

    // ...
}
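The composition benefit can be seen in a stdlib-only sketch, with hypothetical `RequestTag`, `Outer`, and `Inner` classes. `EmptyCoroutineContext` is the identity for `+`, so `Outer` can forward its parameter unconditionally and `Inner` behaves as if it had been given no argument at all. With a non-empty default, such as a specific dispatcher, `Outer` would have to know and repeat `Inner`'s default to get the same result.

```kotlin
import kotlin.coroutines.AbstractCoroutineContextElement
import kotlin.coroutines.CoroutineContext
import kotlin.coroutines.EmptyCoroutineContext

// Hypothetical context element, used only to make composition visible.
class RequestTag(val tag: String) : AbstractCoroutineContextElement(RequestTag) {
    companion object Key : CoroutineContext.Key<RequestTag>
}

class Outer(coroutineContext: CoroutineContext = EmptyCoroutineContext) {
    // Forwarding unconditionally is safe: a caller that passes nothing gives
    // Inner exactly the same value as Inner's own default.
    val inner = Inner(coroutineContext)
}

class Inner(coroutineContext: CoroutineContext = EmptyCoroutineContext) {
    // Elements supplied by the caller (on the right of +) replace the defaults.
    val context: CoroutineContext = RequestTag("default") + coroutineContext
}
```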