This runbook is intended to give developers the skills to identify and fix key performance issues in their apps independently.
Generally, the recommended workflow to identify and remedy performance issues is as follows:
Manually debugging individual test runs is critical for understanding and fixing these performance issues; the steps above cannot be replaced by analyzing aggregated data. However, setting up metrics collection in automated testing and in the field is also important for understanding what users are actually seeing and for identifying when regressions occur:
Frame timing metrics can be collected with dumpsys gfxinfo commands that bracket the user journey in question. This is a reasonable way to understand variation in jank over a specific user journey. The RenderTime metrics, which highlight how long frames take to draw, are more important than the count of janky frames for identifying regressions or improvements.

Proper setup is essential for getting accurate, repeatable, actionable benchmarks from an application. In general, you want to test on a system that is as close to production as possible, while suppressing sources of noise. Below are a number of APK- and system-specific steps you can take to prepare a test setup, some of which are use-case specific.
Applications can instrument their code with the androidx.tracing.Trace class. It is strongly recommended to instrument key workloads in your application to increase the utility of traces, both for local profiling and for inspecting results from CI. In Kotlin, the androidx.tracing:tracing-ktx module makes this very simple:
import androidx.tracing.trace

fun loadItemData(configuration: Config) = trace("loadItemData") {
    // perform the expensive operation here ...
}
While traces are being captured, each traced section does add a small overhead (roughly 5us), so don't wrap every method. Tracing just the larger chunks of work (>0.1ms) can give significant insight into bottlenecks.
Do not measure performance on a debug build.
Debug variants can be helpful for troubleshooting and symbolizing stack samples, but they have severe, non-linear impacts on performance. Devices running Q+ can mark the app as profileable from shell in their manifest (<profileable android:shell="true"/>) to enable profiling of release builds.
Use your production-grade ProGuard configuration. Depending on the resources your application uses, this can have a substantial impact on performance. Note that some ProGuard configurations strip tracepoints; consider removing those rules for the configuration you run tests on.
Compile your application on-device to a known state (generally speed or speed-profile). Background JIT activity can have a significant performance overhead, and you will hit it often if you are reinstalling the APK between test runs. The command to do this is:
adb shell cmd package compile -m speed -f com.google.packagename
The 'speed' compilation mode will compile the app completely; the 'speed-profile' mode will compile the app according to a profile of the utilized code paths that is collected during app usage. It can be a bit tricky to collect profiles consistently and correctly, so if you decide to use them, you'll probably want to confirm they are what you expect. They're located here:
/data/misc/profiles/ref/[package-name]/primary.prof
Note that Macrobenchmark allows you to directly specify the compilation mode (see the example after these setup steps).
For low-level/high fidelity measurements, calibrate your devices. Try to run A/B comparisons across the same device and same OS version. There can be significant variations in performance, even across the same device type.
On rooted devices, consider using a lockClocks script for microbenchmarks. Among other things, these scripts place CPUs at a fixed frequency, disable the little cores, configure the GPU, and disable thermal throttling. This is not recommended for user-experience-focused tests (e.g., app launch, DoU testing, jank testing), but it can be essential for cutting down noise in microbenchmark tests.
On rooted devices, consider killing the application and dropping file caches ("echo 3 > /proc/sys/vm/drop_caches") between iterations. This will more accurately model cold start behavior.
When possible, consider using a testing framework like Macrobenchmark, which can reduce noise in your measurements and prevent measurement inaccuracy.
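As noted above, Macrobenchmark also lets you set the compilation mode directly. Below is a minimal sketch of a cold-startup macrobenchmark, assuming a separate benchmark module with the androidx.benchmark:benchmark-macro-junit4 dependency; the package name is a placeholder, and CompilationMode.Partial() requests profile-guided compilation similar to speed-profile:

import androidx.benchmark.macro.CompilationMode
import androidx.benchmark.macro.StartupMode
import androidx.benchmark.macro.StartupTimingMetric
import androidx.benchmark.macro.junit4.MacrobenchmarkRule
import androidx.test.ext.junit.runners.AndroidJUnit4
import org.junit.Rule
import org.junit.Test
import org.junit.runner.RunWith

@RunWith(AndroidJUnit4::class)
class StartupBenchmark {
    @get:Rule
    val benchmarkRule = MacrobenchmarkRule()

    @Test
    fun coldStartup() = benchmarkRule.measureRepeated(
        packageName = "com.example.myapp",           // placeholder package name
        metrics = listOf(StartupTimingMetric()),
        compilationMode = CompilationMode.Partial(), // profile-guided, akin to speed-profile
        startupMode = StartupMode.COLD,
        iterations = 5
    ) {
        // Measured block: launch the default activity from the home screen.
        pressHome()
        startActivityAndWait()
    }
}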
A trampoline activity can extend app startup time unnecessarily, and it's important to be aware of whether your app is doing it. As you can see in the example trace below, one activityStart is immediately followed by another activityStart, without the first activity drawing any frames.
This can happen both in a notification entrypoint and in a regular app startup entrypoint, and it can often be addressed by refactoring: can the check you are launching a separate activity for be moved into a module shared between activities?
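A hedged sketch of that kind of refactor (class, layout, and preference names here are hypothetical): make the real destination activity the launcher and run the shared check inside it, so no intermediate activity has to start and finish before the first frame.

import android.content.Context
import android.content.Intent
import android.os.Bundle
import androidx.appcompat.app.AppCompatActivity

// Hypothetical shared module: the check that used to live in a trampoline activity.
object LoginChecker {
    fun isLoggedIn(context: Context): Boolean =
        context.getSharedPreferences("auth", Context.MODE_PRIVATE).contains("token")
}

class LoginActivity : AppCompatActivity()   // hypothetical sign-in screen

class MainActivity : AppCompatActivity() {
    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        if (!LoginChecker.isLoggedIn(this)) {
            // Only the uncommon logged-out path redirects; the common path draws immediately.
            startActivity(Intent(this, LoginActivity::class.java))
            finish()
            return
        }
        setContentView(R.layout.activity_main)   // hypothetical layout
    }
}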
You may note that GCs are happening more frequently than you expect in a systrace.
In this case, a GC every 10 seconds during a long-running operation is an indicator that we might be allocating unnecessarily, but consistently, over time:
Or, when using the Memory Profiler, you may notice that a specific callstack is making the vast majority of the allocations. You don't need to eliminate all allocations aggressively, as this can make code harder to maintain; start instead by working on allocation hotspots, as in the sketch below.
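As a hypothetical sketch of what fixing such a hotspot can look like (names invented for illustration): reuse a preallocated buffer and builder across loop iterations instead of allocating new ones on every pass of a long-running operation.

import java.io.InputStream

// Hypothetical parser that avoids per-iteration allocations by reusing one
// byte buffer and one StringBuilder for the lifetime of the long-running read.
class LineReader(bufferSize: Int = 8 * 1024) {
    private val buffer = ByteArray(bufferSize)   // allocated once, reused
    private val lineBuilder = StringBuilder()

    fun read(input: InputStream, onLine: (String) -> Unit) {
        while (true) {
            val read = input.read(buffer)        // no new ByteArray per iteration
            if (read == -1) break
            for (i in 0 until read) {
                val c = buffer[i].toInt().toChar()
                if (c == '\n') {
                    onLine(lineBuilder.toString())
                    lineBuilder.setLength(0)     // reuse instead of allocating a new builder
                } else {
                    lineBuilder.append(c)
                }
            }
        }
    }
}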
The graphics pipeline is relatively complicated, and there can be some nuance involved in determining whether a user ultimately may have seen a dropped frame - in some cases, the platform can “rescue” a frame using buffering. However, you can ignore most of that nuance to easily identify problematic frames from your app's perspective.
When frames are being drawn with little work required from the app, the Choreographer#doFrame tracepoints occur on a 16.7ms cadence (assuming a 60 FPS device):
If you zoom out and navigate through the trace, you'll sometimes see frames take a little longer to complete, but that's still okay because they're not taking more than their allotted 16.7ms time:
But when you actually see a disruption to that regular cadence, that will be a janky frame:
With a little practice, you'll be able to see them everywhere!
In some cases, you'll just need to zoom into that tracepoint for more information about which views are being inflated or what RecyclerView is doing. In other cases, you may have to inspect further.
For more information about identifying janky frames and debugging their causes, see the Slow Rendering Vitals documentation.
In case you like videos, here's a talk summarizing basic systrace usage.
The Android developer documentation on App startup time provides a good overview of the application startup process.
Generally the stages of app startup are: process launch, initialization of the Application object, creation and initialization of the Activity, layout inflation, and drawing the first frame.
Startup types (cold, warm, and hot) can be disambiguated by which of these stages they include: a cold start goes through all of them, while warm and hot starts skip the earlier stages because the process and/or activity already exists.
We recommend capturing systraces using the on-device system tracing app available in Developer Options. If you'd like to use command-line tools, Perfetto is available on Android Q+, while devices on earlier versions should rely on systrace.
Note that “first frame” is a bit of a misnomer: applications can vary significantly in how they handle startup after creating the initial activity. Some applications will continue inflation for several frames, and others will even immediately launch into a secondary activity.
When possible, we recommend that app developers include a reportFullyDrawn (available Q+) call when startup is completed from the application’s perspective. RFD-defined start times can be extracted through the Perfetto trace processor, and a user-visible trace event will be emitted.
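A minimal sketch of where that call might go, assuming a hypothetical activity that shows a skeleton UI first and then loads its real content with coroutines (lifecycle-runtime-ktx and kotlinx-coroutines are assumed dependencies):

import android.os.Bundle
import androidx.appcompat.app.AppCompatActivity
import androidx.lifecycle.lifecycleScope
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.launch
import kotlinx.coroutines.withContext

// Hypothetical activity: the first frame shows a skeleton, and reportFullyDrawn()
// is called only once the real content has been loaded and bound.
class FeedActivity : AppCompatActivity() {
    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        setContentView(R.layout.activity_feed)          // hypothetical skeleton layout

        lifecycleScope.launch {
            val items = withContext(Dispatchers.IO) { loadFeed() }  // stand-in for real I/O
            bindFeed(items)
            reportFullyDrawn()                          // startup done from the app's perspective
        }
    }

    private fun loadFeed(): List<String> = listOf("item 1", "item 2")
    private fun bindFeed(items: List<String>) { /* populate views */ }
}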
Some things that you should look for include:
The Android Studio Memory Profiler is a powerful tool for reducing memory pressure caused by memory leaks or bad usage patterns, since it provides a live view of object allocations.
To fix memory problems in your app, you can use the Memory Profiler to track why and how often garbage collections happen.
Profiling app memory can be broken down into the following steps:
Start recording a memory profiling session of the user journey you care about, then look for an increasing object count, which will eventually lead to garbage collections.
Once you have identified a user journey that is adding memory pressure, start analyzing for the root causes of that pressure.
Select a range in the timeline to visualize both Allocations and Shallow Size. There are multiple ways to sort this data. Here are some examples of how each view can help you analyze problems.
Sorting by class is useful when you want to find classes that generate objects that should otherwise be cached or reused from a memory pool.
For example: imagine you see an app creating 2,000 objects of a class called “Vertex” every second. This would increase the Allocations count by 2,000 every second, and you would see it when sorting by class. Should we be reusing such objects to avoid generating that garbage? If the answer is yes, then implementing a memory pool will likely be needed, as sketched below.
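A minimal sketch of such a pool for the hypothetical Vertex class (not thread-safe; androidx.core.util.Pools offers similar ready-made helpers):

// Hypothetical Vertex class plus a minimal (not thread-safe) object pool, so the
// hot path reuses instances instead of allocating thousands of new ones per second.
data class Vertex(var x: Float = 0f, var y: Float = 0f, var z: Float = 0f)

class VertexPool(private val maxPooled: Int = 64) {
    private val pool = ArrayDeque<Vertex>(maxPooled)

    fun obtain(): Vertex = pool.removeLastOrNull() ?: Vertex()

    fun recycle(v: Vertex) {
        v.x = 0f; v.y = 0f; v.z = 0f            // reset state before reuse
        if (pool.size < maxPooled) pool.addLast(v)
    }
}

// Usage in the hot path: call obtain() instead of Vertex(), and recycle() when done.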
Sorting by callstack is useful when there is a hot path where memory is being allocated, such as inside a for loop or a specific function doing a lot of allocation work; you will be able to find it here.
Shallow Size only tracks the memory of the object itself, so it is useful for tracking simple classes composed mostly of primitive values.
Retained Size shows the total memory due to the object plus the objects that are solely referenced by it, so it is useful for tracking memory pressure due to complex objects. To get this value, take a full memory dump, and Retained Size will be added as a column.
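As a hypothetical illustration of the difference: for the class below, shallow size covers only the Photo object itself (its header, an Int, and one reference), while retained size also includes the roughly 4 MB pixel buffer, provided nothing else holds a reference to it.

// Hypothetical class: shallow size counts only Photo's own fields, while retained
// size also includes the large ByteArray that only this Photo references.
class Photo(val id: Int) {
    val pixels = ByteArray(4 * 1024 * 1024)     // ~4 MB buffer owned solely by this object
}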
The most evident and easy-to-measure impact of memory optimizations is on GCs. When an optimization reduces memory pressure, you should see fewer GCs.
To measure this, look at the time between GCs in the profiler timeline; after the optimization, the gaps between GCs should be longer.
The ultimate impact of memory improvements like this is fewer and shorter GC pauses, which helps jank and other performance metrics, and a lower likelihood of the app being killed in low-memory situations.