Merge "Add API guidelines for caching" into main

commit: 03dcaa2618a6e933dd2edfbb503e52208b678fcb
author: Treehugger Robot <[email protected]> Tue Jul 09 19:52:30 2024 +0000
committer: Gerrit Code Review <[email protected]> Tue Jul 09 19:52:30 2024 +0000
tree: 19437f175b4b126beef4e05a99d23c27229eb356
parent: b9b82d6909fdf626198083e66c8fab2258feae9f
parent: 87ecd88140b37319201d664d838397740cd4008a
diff --git a/api-guidelines/caching.md b/api-guidelines/caching.md
new file mode 100644
index 0000000..b7a3261
--- /dev/null
+++ b/api-guidelines/caching.md

@@ -0,0 +1,350 @@
+# Android API client-side caching guidelines
+
+go/android-api-caching
+
+<!--*
+# Document freshness: For more information, see go/fresh-source.
+freshness: { owner: 'shayba' reviewed: '2024-07-04' }
+*-->
+
+[TOC]
+
+## Motivation
+
+Android API calls typically involve non-negligible latency and computation per
+invocation. Client-side caching is therefore an important consideration in
+designing APIs that are helpful, correct, and performant.
+
+APIs exposed to app developers via the Android SDK are often implemented as
+client code in the Android Framework that makes a Binder IPC call to a system
+service in a platform process, whose job it is to perform some computation and
+return a result to the client. The latency of this operation is typically
+dominated by three factors:
+
+1.  IPC overhead: a simple IPC call is typically 10,000x the latency of a simple
+    in-process method call.
+2.  Server-side contention: the work done in the system service in response to
+    the client's request may not start immediately, for instance if a server
+    thread is busy handling other requests that arrived earlier.
+3.  Server-side computation: handling the request in the server may itself
+    require non-trivial work.
+
+You can eliminate all three of these latency factors by implementing a cache on
+the client side, provided that the cache is:
+
+*   Correct: the client-side cache never returns results that differ from what
+    the server would have returned.
+*   Effective: client requests are often served from the cache, i.e. the cache
+    has a high hit rate.
+*   Efficient: the client-side cache makes efficient use of client-side
+    resources, such as by representing cached data in a compact way and by not
+    storing too many cached results or stale data in the client's memory.
+
+## Consider caching server results in the client
+
+If clients often make the exact same request multiple times, and the value
+returned doesn't change over time, then you should implement a cache in the
+client library keyed by the request parameters.
+
+Consider using `IpcDataCache` in your implementation:
+
+```java {.good .no-copy}
+public class BirthdayManager {
+    private final IpcDataCache.QueryHandler<User, Birthday> mBirthdayQuery =
+            new IpcDataCache.QueryHandler<User, Birthday>() {
+                @Override
+                public Birthday apply(User user) {
+                    return mService.getBirthday(user);
+                }
+            };
+    private static final int BDAY_CACHE_MAX = 8;  // Maximum birthdays to cache
+    private static final String BDAY_API = "getUserBirthday";
+    private final IpcDataCache<User, Birthday> mCache =
+            new IpcDataCache<User, Birthday>(
+                BDAY_CACHE_MAX, MODULE_SYSTEM, BDAY_API, BDAY_API, mBirthdayQuery);
+
+    /** @hide **/
+    @VisibleForTesting
+    public static void clearCache() {
+        IpcDataCache.invalidateCache(MODULE_SYSTEM, BDAY_API);
+    }
+
+    public Birthday getBirthday(User user) {
+        return mCache.query(user);
+    }
+}
+```
+
+For a complete example, see `android.app.admin.DevicePolicyManager`.
+
+`IpcDataCache` is available to all system code, including mainline modules.
+There is also `PropertyInvalidatedCache` which is nearly identical, but is only
+visible to the framework. Prefer `IpcDataCache` when possible.
+
+### Invalidate caches on server-side changes
+
+If the value returned from the server can change over time, provide a callback
+for observing changes, and have the client register it so that it can
+invalidate the client-side cache when the value changes.
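+
+As a rough illustration of this pattern, the sketch below uses plain Java with
+no Android dependencies. `NicknameServer` and `NicknameClient` are hypothetical
+names invented for this example, not Framework APIs, and the hand-rolled map
+only stands in for a real cache such as `IpcDataCache`.
+
+```java
+import java.util.HashMap;
+import java.util.Map;
+
+/** Hypothetical server interface: answers queries and reports changes. */
+interface NicknameServer {
+    String getNickname(int userId);
+    void registerChangeListener(Runnable onChange);
+}
+
+/** Hypothetical client that drops cached results when the server changes. */
+final class NicknameClient {
+    private final NicknameServer mServer;
+    private final Map<Integer, String> mCache = new HashMap<>();
+
+    NicknameClient(NicknameServer server) {
+        mServer = server;
+        // Any server-side change may make cached values stale, so drop them.
+        mServer.registerChangeListener(this::invalidate);
+    }
+
+    synchronized String getNickname(int userId) {
+        // Only queries the server when the key is not cached yet.
+        return mCache.computeIfAbsent(userId, mServer::getNickname);
+    }
+
+    private synchronized void invalidate() {
+        mCache.clear();
+    }
+}
+```
+
+This sketch clears the whole cache on any change. A finer-grained callback that
+names the affected key could invalidate a single entry instead.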
+
+### Invalidate caches between unit test cases
+
+In a unit test suite, you might test the client code against a test double
+rather than the real server. If so, then be sure to clear any client-side caches
+between test cases. This is to keep test cases mutually hermetic, and prevent
+one test case from interfering with another.
+
+```java {.good .no-copy}
+@RunWith(AndroidJUnit4.class)
+public class BirthdayManagerTest {
+
+    @Before
+    public void setUp() {
+        BirthdayManager.clearCache();
+    }
+
+    @After
+    public void tearDown() {
+        BirthdayManager.clearCache();
+    }
+
+    ...
+}
+```
+
+When writing CTS tests that exercise an API client that uses caching internally,
+treat the cache as an implementation detail that is not exposed to API callers.
+CTS tests should therefore not require any special knowledge of the caching used
+in client code.
+
+### Study cache hits and misses
+
+`IpcDataCache` and `PropertyInvalidatedCache` can print live statistics:
+
+```shell {.no-copy}
+adb shell dumpsys cacheinfo
+  ...
+  Cache Name: cache_key.is_compat_change_enabled
+    Property: cache_key.is_compat_change_enabled
+    Hits: 1301458, Misses: 21387, Skips: 0, Clears: 39
+    Skip-corked: 0, Skip-unset: 0, Skip-bypass: 0, Skip-other: 0
+    Nonce: 0x856e911694198091, Invalidates: 72, CorkedInvalidates: 0
+    Current Size: 1254, Max Size: 2048, HW Mark: 2049, Overflows: 310
+    Enabled: true
+  ...
+```
+
+The same stats can also be found in a bugreport.
+
+### Tune the size of the cache
+
+Caches have a maximum size. When the maximum cache size is exceeded, entries are
+evicted in LRU order.
+
+*   Caching too few entries could negatively affect the cache hit rate.
+*   Caching too many entries increases the cache's memory usage.
+
+Find the right balance for your use case.
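+
+To make the eviction behavior concrete, here is a minimal sketch of an
+LRU-bounded map built on `java.util.LinkedHashMap`. `LruSketch` is a
+hypothetical class for illustration only; it is not how `IpcDataCache` is
+implemented.
+
+```java
+import java.util.LinkedHashMap;
+import java.util.Map;
+
+/** Hypothetical bounded cache that evicts the least recently used entry. */
+final class LruSketch<K, V> extends LinkedHashMap<K, V> {
+    private final int mMaxSize;
+
+    LruSketch(int maxSize) {
+        // accessOrder=true orders entries from least to most recently used.
+        super(16, 0.75f, true);
+        mMaxSize = maxSize;
+    }
+
+    @Override
+    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
+        // Called after each insertion; returning true evicts the eldest entry.
+        return size() > mMaxSize;
+    }
+}
+```
+
+With a maximum size of 2, inserting `a` and `b`, reading `a`, and then inserting
+`c` evicts `b`, because `a` was used more recently.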
+
+## Eliminate redundant client calls
+
+If client code makes the same query to the server multiple times in a short span
+of time, consider reusing the results from previous calls.
+
+```java {.bad .no-copy}
+public void executeAll(List<Operation> operations) throws SecurityException {
+    for (Operation op : operations) {
+        for (Permission permission : op.requiredPermissions()) {
+            if (!permissionChecker.checkPermission(permission, ...)) {
+                throw new SecurityException("Missing permission " + permission);
+            }
+        }
+        op.execute();
+    }
+}
+```
+
+```java {.good .no-copy}
+public void executeAll(List<Operation> operations) throws SecurityException {
+    Set<Permission> permissionsChecked = new HashSet<>();
+    for (Operation op : operations) {
+        for (Permission permission : op.requiredPermissions()) {
+            // add() returns true only the first time a permission is seen.
+            if (permissionsChecked.add(permission)) {
+                if (!permissionChecker.checkPermission(permission, ...)) {
+                    throw new SecurityException(
+                            "Missing permission " + permission);
+                }
+            }
+        }
+        op.execute();
+    }
+}
+```
+
+Example:
+[Caching permission check results in ContentProvider#applyBatch](https://googleplex-android-review.git.corp.google.com/c/platform/frameworks/base/+/28059458)
+
+## Consider client-side memoization of recent server responses
+
+Client apps may query the API at a faster rate than the API's server can produce
+meaningfully new responses. In this case, an effective approach is to memoize
+the last seen server response at the client side along with a timestamp, and to
+return the memoized result without querying the server if the memoized result is
+recent enough. The API client author can determine the memoization duration.
+
+For instance, an app may display network traffic statistics to the user by
+querying for the stats in every frame drawn:
+
+```java {.no-copy}
+@UiThread
+private void setStats() {
+    mobileRxBytesTextView.setText(
+        Long.toString(TrafficStats.getMobileRxBytes()));
+    mobileRxPacketsTextView.setText(
+        Long.toString(TrafficStats.getMobileRxPackets()));
+    mobileTxBytesTextView.setText(
+        Long.toString(TrafficStats.getMobileTxBytes()));
+    mobileTxPacketsTextView.setText(
+        Long.toString(TrafficStats.getMobileTxPackets()));
+}
+```
+
+The app may draw frames at 60Hz. But hypothetically, the client code in
+`TrafficStats` may choose to query the server for stats at most once per second,
+and if queried within a second of a previous query, return the last seen value.
+This is allowed since the API documentation doesn't make any guarantees about
+the freshness of the results returned.
+
+```sequence-diagram
+participant App code as app
+participant Client library as clib
+participant Server as server
+
+app->clib: request @ T=100ms
+clib->server: request
+server->clib: response 1
+clib->app: response 1
+
+app->clib: request @ T=200ms
+clib->app: response 1
+
+app->clib: request @ T=300ms
+clib->app: response 1
+
+app->clib: request @ T=2000ms
+clib->server: request
+server->clib: response 2
+clib->app: response 2
+```
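+
+The behavior in the diagram above could be sketched as follows. `TimedMemo` is
+a hypothetical helper, not an existing Framework class; the `Supplier` stands in
+for the IPC to the server, and the clock is injected so that the timing can be
+controlled in tests.
+
+```java
+import java.util.function.LongSupplier;
+import java.util.function.Supplier;
+
+/** Hypothetical helper that reuses the last result while it is recent. */
+final class TimedMemo<T> {
+    private final Supplier<T> mQuery;     // stands in for the server query
+    private final LongSupplier mClockMs;  // injected clock, in milliseconds
+    private final long mMaxAgeMs;
+    private boolean mHasValue;
+    private long mFetchedAtMs;
+    private T mValue;
+
+    TimedMemo(Supplier<T> query, LongSupplier clockMs, long maxAgeMs) {
+        mQuery = query;
+        mClockMs = clockMs;
+        mMaxAgeMs = maxAgeMs;
+    }
+
+    synchronized T get() {
+        long now = mClockMs.getAsLong();
+        // Query the server only if nothing is memoized or the value is too old.
+        if (!mHasValue || now - mFetchedAtMs >= mMaxAgeMs) {
+            mValue = mQuery.get();
+            mFetchedAtMs = now;
+            mHasValue = true;
+        }
+        return mValue;
+    }
+}
+```
+
+With a maximum age of 1000 ms, requests at T=100ms, 200ms, and 300ms share one
+server query, and the request at T=2000ms triggers a second one. On Android, a
+monotonic clock such as `SystemClock.elapsedRealtime()` would be a natural
+choice for the injected clock.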
+
+## Consider client-side codegen instead of server queries
+
+If the query results are knowable to the server at build time, then consider if
+they are knowable to the client at build time as well, and consider whether the
+API could be implemented entirely in the client side.
+
+Consider the following app code that checks if the device is a watch (that is,
+whether the device is running Wear OS):
+
+```java {.no-copy}
+public boolean isWatch(Context ctx) {
+    PackageManager pm = ctx.getPackageManager();
+    return pm.hasSystemFeature(PackageManager.FEATURE_WATCH);
+}
+```
+
+This property of the device is known at build time, specifically at the time
+that the Framework was built for this device's boot image. The client-side code
+for `hasSystemFeature` could return a known result immediately, rather than
+querying the remote `PackageManager` system service.
+
+See:
+[Android Platform: Build-time Optimizations for System Features](https://docs.google.com/document/d/1000XMQwhfv6goHcZI3SBwWju4mnHXoYSBX-Vs1mQzXQ/edit?usp=sharing)
+
+## Deduplicate server callbacks in the client
+
+Lastly, the API client may register callbacks with the API server to be notified
+of events.
+
+It's typical for apps to register multiple callbacks for the same underlying
+information. Rather than have the server notify the client once per registered
+callback via IPC, the client library should have one registered callback via IPC
+with the server, and then notify each registered callback in the app.
+
+```dot
+digraph d_front_back {
+  rankdir=RL;
+  node [style=filled, shape="rectangle", fontcolor="white" fontname="Roboto"]
+  server->clib
+  clib->c1;
+  clib->c2;
+  clib->c3;
+
+  subgraph cluster_client {
+    graph [style="dashed", label="Client app process"];
+    c1 [label="my.app.FirstCallback" color="#4285F4"];
+    c2 [label="my.app.SecondCallback" color="#4285F4"];
+    c3 [label="my.app.ThirdCallback" color="#4285F4"];
+    clib [label="android.app.FooManager" color="#F4B400"];
+  }
+
+  subgraph cluster_server {
+    graph [style="dashed", label="Server process"];
+    server [label="com.android.server.FooManagerService" color="#0F9D58"];
+  }
+}
+```
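+
+A minimal sketch of this fan-out, in plain Java, might look like the following.
+`FooServer` and `FooClient` are hypothetical stand-ins for the server interface
+and the client library, and events are modeled as plain strings.
+
+```java
+import java.util.List;
+import java.util.concurrent.CopyOnWriteArrayList;
+import java.util.function.Consumer;
+
+/** Hypothetical server interface that delivers events over IPC. */
+interface FooServer {
+    void registerCallback(Consumer<String> callback);
+    void unregisterCallback(Consumer<String> callback);
+}
+
+/** Hypothetical client library holding at most one server registration. */
+final class FooClient {
+    private final FooServer mServer;
+    private final List<Consumer<String>> mAppCallbacks =
+            new CopyOnWriteArrayList<>();
+    // The single callback registered with the server; fans events out locally.
+    private final Consumer<String> mServerCallback = this::dispatch;
+
+    FooClient(FooServer server) {
+        mServer = server;
+    }
+
+    synchronized void registerCallback(Consumer<String> callback) {
+        mAppCallbacks.add(callback);
+        if (mAppCallbacks.size() == 1) {
+            // First app callback: register with the server once.
+            mServer.registerCallback(mServerCallback);
+        }
+    }
+
+    synchronized void unregisterCallback(Consumer<String> callback) {
+        if (mAppCallbacks.remove(callback) && mAppCallbacks.isEmpty()) {
+            // Last app callback removed: drop the server registration.
+            mServer.unregisterCallback(mServerCallback);
+        }
+    }
+
+    private void dispatch(String event) {
+        for (Consumer<String> callback : mAppCallbacks) {
+            callback.accept(event);
+        }
+    }
+}
+```
+
+However many callbacks the app registers, the server sees a single
+registration and sends a single IPC per event.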
+
+## Review binder telemetry
+
+Data from droidfood and beta population can help you identify caching
+opportunities, or opportunities to improve the efficacy of existing caches
+through tuning.
+
+### Binder call counts and latencies
+
+Look in
+[Pitot Binder metrics](https://pitot-autopush.corp.google.com/u/0/metric_index?release=dogfood-group:Trunk%20Releases&comparison=target&device_view=standard_next&duration=28&feature=SAM_TABLES)
+for trace-based metrics on binder call counts and binder call latencies. Binder
+interfaces with a high volume of calls or a high latency per call are good
+candidates for caching, provided that they meet the other properties listed
+above.
+
+You will find this page particularly helpful:
+[distribution of binder calls count on battery](https://pitot-autopush.corp.google.com/u/0/metric_detail/901?release=dogfood-group:Trunk%20Releases&comparison=target&device_view=standard_next&duration=28&timeline.hideSessionThreshold=0&feature=SAM_TABLES&appLaunches=2,4,3,5,6,7)
+
+Additionally, you can use the [go/trace-binder-spam] dashboard to find examples
+of binder spam from field traces. Once you have found a spammed interface, you
+can open associated traces to find relevant anecdotes that you can study in
+detail.
+
+![Pitot binder calls count dashboard](pitot_binders_high_calls_count.png)
+
+In the screenshot above we see a list of interfaces ordered by call count,
+highest first. You will notice that there are several `IPackageManager`
+methods. Some, like `setComponentEnabledSetting`, are not candidates for
+caching because they are mutative. Others, like `hasSystemFeature`, are
+excellent candidates for caching, since they read a property with a clear
+lifetime definition.
+
+### Cache sizes
+
+The
+[Java heap classes](https://pitot-autopush.corp.google.com/u/0/memory/java_heap_classes?release=dogfood-group:Trunk%20Releases&comparison=target&device_view=standard_next&duration=28&feature=SAM_TABLES&process=system_server)
+dashboard can be used to find examples where a cache retains more memory than
+expected, which suggests tuning opportunities.
+
+![Java heap classes dashboard](pitot_heap_dumps_propertyinvalidatedcache.png)
+
+In the screenshot above we see heap dumps with high amounts of memory retained
+by an inner class of `PropertyInvalidatedCache`. Click on a heap dump to see a
+dominator tree.
+
+![Heap dominator tree](heap_dominator_tree.png)
+
+In the screenshot above we see the dominator tree for a particular heap dump.
+Notice that you can see a large cache in `ChangeIdStateCache`, and another in
+`PermissionManager`. Are these sizes expected, or too large?
+
+Recall that you can also inspect the cache hit rate at runtime using `adb shell
+dumpsys cacheinfo`. A large cache doesn't necessarily mean that the hit rate is
+high.

diff --git a/api-guidelines/heap_dominator_tree.png b/api-guidelines/heap_dominator_tree.png
new file mode 100644
index 0000000..0e3bc2d
--- /dev/null
+++ b/api-guidelines/heap_dominator_tree.png
Binary files differ

diff --git a/api-guidelines/pitot_binders_high_calls_count.png b/api-guidelines/pitot_binders_high_calls_count.png
new file mode 100644
index 0000000..37583b8
--- /dev/null
+++ b/api-guidelines/pitot_binders_high_calls_count.png
Binary files differ

diff --git a/api-guidelines/pitot_heap_dumps_propertyinvalidatedcache.png b/api-guidelines/pitot_heap_dumps_propertyinvalidatedcache.png
new file mode 100644
index 0000000..aed0836
--- /dev/null
+++ b/api-guidelines/pitot_heap_dumps_propertyinvalidatedcache.png
Binary files differ