Merge "Add API guidelines for caching" into main
diff --git a/api-guidelines/caching.md b/api-guidelines/caching.md
new file mode 100644
index 0000000..b7a3261
--- /dev/null
+++ b/api-guidelines/caching.md
@@ -0,0 +1,350 @@
+# Android API client-side caching guidelines
+
+go/android-api-caching
+
+<!--*
+# Document freshness: For more information, see go/fresh-source.
+freshness: { owner: 'shayba' reviewed: '2024-07-04' }
+*-->
+
+[TOC]
+
+## Motivation
+
+Android API calls typically involve non-negligible latency and computation per
+invocation. Client-side caching is therefore an important consideration in
+designing APIs that are helpful, correct, and performant.
+
+APIs exposed to app developers via the Android SDK are often implemented as
+client code in the Android Framework that makes a Binder IPC call to a system
+service in a platform process, whose job it is to perform some computation and
+return a result to the client. The latency of this operation is typically
+dominated by three factors:
+
+1. IPC overhead: a simple IPC call is typically 10,000x the latency of a simple
+ in-process method call.
+2. Server-side contention: the work done in the system service in response to
+ the client's request may not start immediately, for instance if a server
+ thread is busy handling other requests that arrived earlier.
+3. Server-side computation: handling the request in the server may itself
+ require non-trivial work.
+
+You can eliminate all three of these latency factors by implementing a cache on
+the client side, provided that the cache is:
+
+* Correct: the client-side cache never returns a result that differs from what
+ the server would have returned.
+* Effective: client requests are often served from the cache, i.e. the cache
+ has a high hit rate.
+* Efficient: the client-side cache makes efficient use of client-side
+ resources, such as by representing cached data in a compact way and by not
+ storing too many cached results or stale data in the client's memory.
+
+## Consider caching server results in the client
+
+If clients often make the exact same request multiple times, and the value
+returned doesn't change over time, then you should implement a cache in the
+client library keyed by the request parameters.
+
+Consider using `IpcDataCache` in your implementation:
+
+```java {.good .no-copy}
+public class BirthdayManager {
+ private final IpcDataCache.QueryHandler<User, Birthday> mBirthdayQuery =
+ new IpcDataCache.QueryHandler<User, Birthday>() {
+ @Override
+ public Birthday apply(User user) {
+ return mService.getBirthday(user);
+ }
+ };
+ private static final int BDAY_CACHE_MAX = 8; // Maximum birthdays to cache
+ private static final String BDAY_API = "getUserBirthday";
+ private final IpcDataCache<User, Birthday> mCache =
+ new IpcDataCache<User, Birthday>(
+ BDAY_CACHE_MAX, MODULE_SYSTEM, BDAY_API, BDAY_API, mBirthdayQuery);
+
+ /** @hide **/
+ @VisibleForTesting
+ public static void clearCache() {
+ IpcDataCache.invalidateCache(MODULE_SYSTEM, BDAY_API);
+ }
+
+ public Birthday getBirthday(User user) {
+ return mCache.query(user);
+ }
+}
+```
+
+For a complete example, see `android.app.admin.DevicePolicyManager`.
+
+`IpcDataCache` is available to all system code, including mainline modules.
+There is also `PropertyInvalidatedCache` which is nearly identical, but is only
+visible to the framework. Prefer `IpcDataCache` when possible.
+
+### Invalidate caches on server-side changes
+
+If the value returned from the server can change over time, have the server
+expose a callback for observing changes, and register that callback in the
+client so that it can invalidate the client-side cache whenever the value
+changes.
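+
+The following is a minimal sketch of this pattern in plain Java, not a
+description of how `IpcDataCache` works internally. The class
+`InvalidatingCache`, its `onServerDataChanged` method, and the `queryServer`
+function standing in for the Binder call are all hypothetical names introduced
+for illustration. The sketch also ignores the race between an in-flight query
+and a concurrent invalidation, which a production cache has to handle.
+
+```java
+import java.util.Map;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.function.Function;
+
+/** Hypothetical client-side cache that is cleared when the server reports a change. */
+final class InvalidatingCache<K, V> {
+    private final Map<K, V> mEntries = new ConcurrentHashMap<>();
+    // Stands in for the IPC call to the server (an assumption of this sketch).
+    private final Function<K, V> mQueryServer;
+
+    InvalidatingCache(Function<K, V> queryServer) {
+        mQueryServer = queryServer;
+    }
+
+    /** Returns the cached value, querying the server only on a cache miss. */
+    V query(K key) {
+        return mEntries.computeIfAbsent(key, mQueryServer);
+    }
+
+    /** Intended to be called from the change callback registered with the server. */
+    void onServerDataChanged() {
+        mEntries.clear();
+    }
+}
+```
+
+Clearing the whole cache on any change is the simplest policy. If the server
+can report which key changed, removing only that entry would preserve more
+cache hits.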
+
+### Invalidate caches between unit test cases
+
+In a unit test suite, you might test the client code against a test double
+rather than the real server. If so, then be sure to clear any client-side caches
+between test cases. This is to keep test cases mutually hermetic, and prevent
+one test case from interfering with another.
+
+```java {.good .no-copy}
+@RunWith(AndroidJUnit4.class)
+public class BirthdayManagerTest {
+
+ @Before
+ public void setUp() {
+ BirthdayManager.clearCache();
+ }
+
+ @After
+ public void tearDown() {
+ BirthdayManager.clearCache();
+ }
+
+ ...
+}
+```
+
+The cache is an implementation detail of the API client and is not exposed
+through the API surface. CTS tests that exercise an API client that uses
+caching internally should therefore not require any special knowledge of the
+caching used in client code.
+
+### Study cache hits and misses
+
+`IpcDataCache` and `PropertyInvalidatedCache` can print live statistics:
+
+```shell {.no-copy}
+adb shell dumpsys cacheinfo
+ ...
+ Cache Name: cache_key.is_compat_change_enabled
+ Property: cache_key.is_compat_change_enabled
+ Hits: 1301458, Misses: 21387, Skips: 0, Clears: 39
+ Skip-corked: 0, Skip-unset: 0, Skip-bypass: 0, Skip-other: 0
+ Nonce: 0x856e911694198091, Invalidates: 72, CorkedInvalidates: 0
+ Current Size: 1254, Max Size: 2048, HW Mark: 2049, Overflows: 310
+ Enabled: true
+ ...
+```
+
+The same stats can also be found in a bugreport.
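+
+To turn these counters into a hit rate, one reasonable definition is hits
+divided by total lookups. The helper below is a hypothetical sketch that
+assumes lookups are `Hits + Misses` and ignores skipped queries; `CacheStats`
+is not part of the platform.
+
+```java
+/** Hypothetical helper for interpreting the counters printed by cacheinfo. */
+final class CacheStats {
+    /** Fraction of lookups served from the cache, or 0.0 if there were none. */
+    static double hitRate(long hits, long misses) {
+        long lookups = hits + misses;
+        return lookups == 0 ? 0.0 : (double) hits / lookups;
+    }
+}
+```
+
+With the numbers in the sample output above, `CacheStats.hitRate(1301458,
+21387)` is roughly 0.984, that is, about 98% of lookups were served from the
+cache.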
+
+### Tune the size of the cache
+
+Caches have a maximum size. When the maximum cache size is exceeded, entries are
+evicted in LRU order.
+
+* Caching too few entries could negatively affect the cache hit rate.
+* Caching too many entries increases the cache's memory usage.
+
+Find the right balance for your use case.
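+
+To make the eviction behavior concrete, here is a minimal sketch of a
+size-bounded LRU map built on `java.util.LinkedHashMap`. It illustrates the
+general technique only; `BoundedLruMap` is a hypothetical class and is not how
+`IpcDataCache` is implemented.
+
+```java
+import java.util.LinkedHashMap;
+import java.util.Map;
+
+/** Hypothetical size-bounded map that evicts the least recently used entry. */
+final class BoundedLruMap<K, V> extends LinkedHashMap<K, V> {
+    private final int mMaxSize;
+
+    BoundedLruMap(int maxSize) {
+        // accessOrder=true makes get() move an entry to the most-recent position.
+        super(16, 0.75f, /* accessOrder= */ true);
+        mMaxSize = maxSize;
+    }
+
+    @Override
+    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
+        // Called after each insertion; returning true evicts the eldest entry.
+        return size() > mMaxSize;
+    }
+}
+```
+
+With a maximum size of 2, inserting `a` and `b`, reading `a`, and then
+inserting `c` evicts `b`, because `b` is the least recently used entry at that
+point.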
+
+## Eliminate redundant client calls
+
+If client code makes the same query to the server multiple times in a short span
+of time, consider reusing the results from previous calls.
+
+```java {.bad .no-copy}
+public void executeAll(List<Operation> operations) throws SecurityException {
+ for (Operation op : operations) {
+ for (Permission permission : op.requiredPermissions()) {
+ if (!permissionChecker.checkPermission(permission, ...)) {
+ throw new SecurityException("Missing permission " + permission);
+ }
+ }
+ op.execute();
+ }
+}
+```
+
+```java {.good .no-copy}
+public void executeAll(List<Operation> operations) throws SecurityException {
+ Set<Permission> permissionsChecked = new HashSet<>();
+ for (Operation op : operations) {
+ for (Permission permission : op.requiredPermissions()) {
+ // Set#add returns true only the first time a permission is seen.
+ if (permissionsChecked.add(permission)) {
+ if (!permissionChecker.checkPermission(permission, ...)) {
+ throw new SecurityException(
+ "Missing permission " + permission);
+ }
+ }
+ }
+ op.execute();
+ }
+}
+```
+
+Example:
+[Caching permission check results in ContentProvider#applyBatch](https://googleplex-android-review.git.corp.google.com/c/platform/frameworks/base/+/28059458)
+
+## Consider client-side memoization of recent server responses
+
+Client apps may query the API at a faster rate than the API's server can produce
+meaningfully new responses. In this case, an effective approach is to memoize
+the last seen server response at the client side along with a timestamp, and to
+return the memoized result without querying the server if the memoized result is
+recent enough. The API client author can determine the memoization duration.
+
+For instance, an app may display network traffic statistics to the user by
+querying for the stats in every frame drawn:
+
+```java {.no-copy}
+@UiThread
+private void setStats() {
+ mobileRxBytesTextView.setText(
+ Long.toString(TrafficStats.getMobileRxBytes()));
+ mobileRxPacketsTextView.setText(
+ Long.toString(TrafficStats.getMobileRxPackets()));
+ mobileTxBytesTextView.setText(
+ Long.toString(TrafficStats.getMobileTxBytes()));
+ mobileTxPacketsTextView.setText(
+ Long.toString(TrafficStats.getMobileTxPackets()));
+}
+```
+
+The app may draw frames at 60Hz. But hypothetically, the client code in
+`TrafficStats` may choose to query the server for stats at most once per second,
+and if queried within a second of a previous query, return the last seen value.
+This is allowed since the API documentation doesn't make any guarantees about
+the freshness of the results returned.
+
+```sequence-diagram
+participant App code as app
+participant Client library as clib
+participant Server as server
+
+app->clib: request @ T=100ms
+clib->server: request
+server->clib: response 1
+clib->app: response 1
+
+app->clib: request @ T=200ms
+clib->app: response 1
+
+app->clib: request @ T=300ms
+clib->app: response 1
+
+app->clib: request @ T=2000ms
+clib->server: request
+server->clib: response 2
+clib->app: response 2
+```
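+
+The sequence above can be sketched in plain Java as follows. This is a
+hypothetical illustration, not the `TrafficStats` implementation:
+`TimedMemoizer`, the `queryServer` supplier standing in for the IPC call, and
+the injectable clock are assumptions made so that the example is
+self-contained and testable.
+
+```java
+import java.util.function.LongSupplier;
+import java.util.function.Supplier;
+
+/** Hypothetical memoizer that reuses the last server response while it is recent. */
+final class TimedMemoizer<T> {
+    private final Supplier<T> mQueryServer;   // stands in for the IPC call
+    private final LongSupplier mClockMillis;  // injectable so tests can control time
+    private final long mMaxAgeMillis;
+
+    private T mLastValue;
+    private long mLastQueryMillis;
+    private boolean mHasValue;
+
+    TimedMemoizer(Supplier<T> queryServer, LongSupplier clockMillis, long maxAgeMillis) {
+        mQueryServer = queryServer;
+        mClockMillis = clockMillis;
+        mMaxAgeMillis = maxAgeMillis;
+    }
+
+    /** Returns the memoized value if it is recent enough, else queries the server. */
+    synchronized T get() {
+        long now = mClockMillis.getAsLong();
+        if (!mHasValue || now - mLastQueryMillis >= mMaxAgeMillis) {
+            mLastValue = mQueryServer.get();
+            mLastQueryMillis = now;
+            mHasValue = true;
+        }
+        return mLastValue;
+    }
+}
+```
+
+With a maximum age of 1000 ms, requests at T=100ms, 200ms, 300ms, and 2000ms
+result in two server queries, matching the diagram. On a device, a monotonic
+clock would be a more robust time source than wall-clock time, since
+wall-clock time can jump.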
+
+## Consider client-side codegen instead of server queries
+
+If the query results are knowable to the server at build time, then consider if
+they are knowable to the client at build time as well, and consider whether the
+API could be implemented entirely in the client side.
+
+Consider the following app code that checks whether the device is a watch (that
+is, whether the device is running Wear OS):
+
+```java {.no-copy}
+public boolean isWatch(Context ctx) {
+ PackageManager pm = ctx.getPackageManager();
+ return pm.hasSystemFeature(PackageManager.FEATURE_WATCH);
+}
+```
+
+This property of the device is known at build time, specifically at the time
+that the Framework was built for this device's boot image. The client-side code
+for `hasSystemFeature` could return a known result immediately, rather than
+querying the remote `PackageManager` system service.
+
+See:
+[Android Platform: Build-time Optimizations for System Features](https://docs.google.com/document/d/1000XMQwhfv6goHcZI3SBwWju4mnHXoYSBX-Vs1mQzXQ/edit?usp=sharing)
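+
+As a rough sketch of the idea, the client could consult a table produced at
+build time and fall back to the server only for features that the table does
+not cover. Everything below is hypothetical: `BuildTimeFeatures` stands in for
+a class that a build step might generate for one device image, and
+`FeatureClient` and its `queryServer` predicate stand in for the client
+library and its IPC call. The value hard-coded for the watch feature is an
+arbitrary example for a non-watch device.
+
+```java
+import java.util.function.Predicate;
+
+/** Hypothetical stand-in for a class generated at build time for one device image. */
+final class BuildTimeFeatures {
+    /** Returns TRUE or FALSE if known at build time, or null if only the server knows. */
+    static Boolean maybeHasFeature(String name) {
+        switch (name) {
+            case "android.hardware.type.watch": // value of PackageManager.FEATURE_WATCH
+                return Boolean.FALSE;           // example: this image is not a watch
+            default:
+                return null;
+        }
+    }
+}
+
+/** Hypothetical client that answers from build-time data when it can. */
+final class FeatureClient {
+    private final Predicate<String> mQueryServer; // stands in for the IPC call
+
+    FeatureClient(Predicate<String> queryServer) {
+        mQueryServer = queryServer;
+    }
+
+    boolean hasSystemFeature(String name) {
+        Boolean known = BuildTimeFeatures.maybeHasFeature(name);
+        return known != null ? known : mQueryServer.test(name);
+    }
+}
+```
+
+In this sketch a query for the watch feature never reaches the server, while a
+query for any other feature still does.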
+
+## Deduplicate server callbacks in the client
+
+Lastly, the API client may register callbacks with the API server to be notified
+of events.
+
+It's typical for apps to register multiple callbacks for the same underlying
+information. Rather than have the server notify the client once per registered
+callback via IPC, the client library should have one registered callback via IPC
+with the server, and then notify each registered callback in the app.
+
+```dot
+digraph d_front_back {
+ rankdir=RL;
+ node [style=filled, shape="rectangle", fontcolor="white" fontname="Roboto"]
+ server->clib
+ clib->c1;
+ clib->c2;
+ clib->c3;
+
+ subgraph cluster_client {
+ graph [style="dashed", label="Client app process"];
+ c1 [label="my.app.FirstCallback" color="#4285F4"];
+ c2 [label="my.app.SecondCallback" color="#4285F4"];
+ c3 [label="my.app.ThirdCallback" color="#4285F4"];
+ clib [label="android.app.FooManager" color="#F4B400"];
+ }
+
+ subgraph cluster_server {
+ graph [style="dashed", label="Server process"];
+ server [label="com.android.server.FooManagerService" color="#0F9D58"];
+ }
+}
+```
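+
+The arrangement in the diagram can be sketched in plain Java as follows. The
+names are hypothetical: `CallbackMultiplexer` plays the role of the client
+library (`FooManager` in the diagram), and the nested `Server` interface
+stands in for the IPC registration with the server.
+
+```java
+import java.util.List;
+import java.util.concurrent.CopyOnWriteArrayList;
+import java.util.function.Consumer;
+
+/** Hypothetical client-side fan-out: one server registration, many app callbacks. */
+final class CallbackMultiplexer<E> {
+    /** Stands in for the server; each call here would be an IPC. */
+    interface Server<E> {
+        void register(Consumer<E> callback);
+        void unregister(Consumer<E> callback);
+    }
+
+    private final Server<E> mServer;
+    private final List<Consumer<E>> mAppCallbacks = new CopyOnWriteArrayList<>();
+    // The single callback that is registered with the server.
+    private final Consumer<E> mServerCallback = this::dispatch;
+
+    CallbackMultiplexer(Server<E> server) {
+        mServer = server;
+    }
+
+    synchronized void addCallback(Consumer<E> callback) {
+        mAppCallbacks.add(callback);
+        if (mAppCallbacks.size() == 1) {
+            mServer.register(mServerCallback); // first app callback: register once
+        }
+    }
+
+    synchronized void removeCallback(Consumer<E> callback) {
+        if (mAppCallbacks.remove(callback) && mAppCallbacks.isEmpty()) {
+            mServer.unregister(mServerCallback); // last app callback removed
+        }
+    }
+
+    /** Runs in the client process; no further IPC is needed per app callback. */
+    private void dispatch(E event) {
+        for (Consumer<E> callback : mAppCallbacks) {
+            callback.accept(event);
+        }
+    }
+}
+```
+
+With three app callbacks registered, the server sees a single registration and
+sends one IPC per event, and the client library delivers that event to all
+three callbacks in-process.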
+
+## Review binder telemetry
+
+Data from the droidfood and beta populations can help you identify caching
+opportunities, or opportunities to improve the efficacy of existing caches
+through tuning.
+
+### Binder call counts and latencies
+
+Look in
+[Pitot Binder metrics](https://pitot-autopush.corp.google.com/u/0/metric_index?release=dogfood-group:Trunk%20Releases&comparison=target&device_view=standard_next&duration=28&feature=SAM_TABLES)
+for trace-based metrics on binder call counts and binder call latencies. Binder
+interfaces with a high volume of calls or a high latency per call are good
+candidates for caching, provided that they meet the other properties listed
+above.
+
+You will find this page particularly helpful:
+[distribution of binder calls count on battery](https://pitot-autopush.corp.google.com/u/0/metric_detail/901?release=dogfood-group:Trunk%20Releases&comparison=target&device_view=standard_next&duration=28&timeline.hideSessionThreshold=0&feature=SAM_TABLES&appLaunches=2,4,3,5,6,7)
+
+Additionally, you can use the [go/trace-binder-spam] dashboard to find examples
+of binder spam from field traces. Once you have found a spammed interface, you can
+open associated traces to find relevant anecdotes that you can study in detail.
+
+![Binder interfaces ordered by call count](pitot_binders_high_calls_count.png)
+
+In the screenshot above we see a list of interfaces ordered by call count,
+highest first. You will notice that there are several `IPackageManager`
+methods. Some, like `setComponentEnabledSetting`, are not candidates for
+caching because they are mutative. Others, like `hasSystemFeature`, are
+excellent candidates for caching, since they read a property with a clear
+lifetime definition.
+
+### Cache sizes
+
+The
+[Java heap classes](https://pitot-autopush.corp.google.com/u/0/memory/java_heap_classes?release=dogfood-group:Trunk%20Releases&comparison=target&device_view=standard_next&duration=28&feature=SAM_TABLES&process=system_server)
+dashboard can be used to find examples where a cache retains more memory than
+expected, which suggests tuning opportunities.
+
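+![Heap dumps with memory retained by PropertyInvalidatedCache](pitot_heap_dumps_propertyinvalidatedcache.png)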
+
+In the screenshot above we see heap dumps with high amounts of memory retained
+by an inner class of `PropertyInvalidatedCache`. Click on a heap dump to see a
+dominator tree.
+
+![Dominator tree for a heap dump](heap_dominator_tree.png)
+
+In the screenshot above we see the dominator tree for a particular heap dump.
+Notice that you can see a large cache in `ChangeIdStateCache`, and another in
+`PermissionManager`. Are these sizes expected, or too large?
+
+Recall that you can also inspect the cache hit rate at runtime using `adb shell
+dumpsys cacheinfo`. A large cache doesn't necessarily mean that the hit rate is
+high.
diff --git a/api-guidelines/heap_dominator_tree.png b/api-guidelines/heap_dominator_tree.png
new file mode 100644
index 0000000..0e3bc2d
--- /dev/null
+++ b/api-guidelines/heap_dominator_tree.png
Binary files differ
diff --git a/api-guidelines/pitot_binders_high_calls_count.png b/api-guidelines/pitot_binders_high_calls_count.png
new file mode 100644
index 0000000..37583b8
--- /dev/null
+++ b/api-guidelines/pitot_binders_high_calls_count.png
Binary files differ
diff --git a/api-guidelines/pitot_heap_dumps_propertyinvalidatedcache.png b/api-guidelines/pitot_heap_dumps_propertyinvalidatedcache.png
new file mode 100644
index 0000000..aed0836
--- /dev/null
+++ b/api-guidelines/pitot_heap_dumps_propertyinvalidatedcache.png
Binary files differ