This tool automates the process of running local A/B performance micro benchmarks by comparing two Git revisions (branches, commits, tags, etc.). It is designed to help developers quickly measure the performance impact of their changes before code submission.
The script performs the following actions:
1. Checks out the first revision under test (e.g., main).
2. Builds and runs the specified benchmark test on that revision, repeating the run the requested number of times.
3. Checks out the second revision (e.g., HEAD~1) and repeats the same benchmark runs.
4. Statistically compares the two sets of timings and reports the results.
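To make that flow concrete, here is a minimal Python sketch of the checkout-and-measure loop. It is illustrative only, not the tool's actual code: run_benchmark_once is a made-up placeholder for the real step, which builds the module and runs the instrumented benchmark on the connected device.

```python
# Minimal, illustrative sketch of the A/B loop; NOT the tool's real code.
import random
import subprocess

def run_benchmark_once() -> float:
    # Placeholder: the real tool builds and runs the benchmark on the
    # device and parses the reported timing. Here we fabricate a value.
    return random.gauss(165_000.0, 8_000.0)  # fabricated timing in ns

def collect_samples(rev: str, run_count: int) -> list[float]:
    """Check out `rev`, then gather one timing sample per benchmark run."""
    subprocess.run(["git", "checkout", rev], check=True)
    return [run_benchmark_once() for _ in range(run_count)]

samples_a = collect_samples("main", run_count=5)
samples_b = collect_samples("my-perf-fix", run_count=5)
# samples_a and samples_b then feed the statistical comparison shown below.
```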
Before running the tool, please ensure the following conditions are met:

- An Android device or emulator is connected and visible to adb. If more than one device is connected, pass the --serial flag to specify which one to use; the script will exit with an error if multiple devices are detected without a specified ID.

To get more stable and reliable benchmark results, it is important to minimize environmental noise. Here are some recommendations:
- Disable JIT compilation on the device: ./benchmark/gradle-plugin/src/main/resources/scripts/disableJit.sh
- Lock the device clocks: ./benchmark/gradle-plugin/src/main/resources/scripts/lockClocks.sh
- Use a userdebug build of AOSP for more control over device performance. AOSP builds do not include GMS services, which reduces background interference.

The script is executed via Gradle from the development/ab-benchmarking directory.
./gradlew run --args="<rev_a> <rev_b> <module> <benchmarkTest> [options]"
To find the serial ID of all connected devices, run the following ADB command in your terminal:
adb devices
The output will list your connected devices. The string in the first column is the serial ID.
List of devices attached
emulator-5554	device
123456789ABCDEF	device
The script accepts the following positional arguments in order:
- rev_a (String): The first Git revision to test (e.g., a branch name, commit hash, tag, or HEAD).
- rev_b (String): The second Git revision to test.
- module (String): The Gradle module path containing the benchmark test, e.g., compose:ui:ui-benchmark.
- benchmark_test (String): The fully qualified class name of the benchmark test to run, e.g., androidx.compose.ui.benchmark.accessibility.AccessibilityBenchmark. To run a single method, use the ClassName#methodName form, e.g., androidx.compose.ui.benchmark.ModifiersBenchmark#full[clickable_1x].

The following optional flags are also supported:

- --run_count (Int): The number of times the entire test suite is run on each revision to gather a sample set. For example, a run_count of 10 results in 10 test executions on rev_a and 10 on rev_b. Defaults to 1.
- --iteration_count (Int): The number of internal iterations the benchmark framework performs in a single test run. This is passed directly to the androidx.benchmark.iterations argument. Defaults to 50.
- --serial (String): The serial ID of the target Android device to use for benchmarking. Required if more than one device is connected; use the adb devices command to find the ID.
- --output_path (String): The path where temporary and final result files are stored, including intermediate CSV files, the final metadata.json file, and a histogram plot. Defaults to ~/androidx-main/frameworks/support/development/ab-benchmarking/app/build/benchmark-results/.

Here is an example that compares the main branch against a feature branch named my-perf-fix.
./gradlew run --args="main my-perf-fix compose:ui:ui-benchmark androidx.compose.ui.benchmark.accessibility.AccessibilityBenchmark --run_count 5 --iteration_count 1000 --serial emulator-5554"
To measure the impact of the very last commit on the current branch, you can compare HEAD with its parent, HEAD~1.
./gradlew run --args="HEAD~1 HEAD compose:ui:ui-benchmark androidx.compose.ui.benchmark.accessibility.AccessibilityBenchmark --run_count 3 --iteration_count 1500 --serial emulator-5554"
To isolate the performance of a specific method within a benchmark class, use the # separator.
./gradlew run --args="main my-perf-fix compose:ui:ui-benchmark androidx.compose.ui.benchmark.accessibility.AccessibilityBenchmark#mySpecificTest --run_count 5 --iteration_count 500 --serial emulator-5554"
The tool produces four forms of output: a human-readable summary, a machine-readable CSV line, a metadata JSON file, and a histogram plot.
The summary provides descriptive statistics for the benchmark timings (in nanoseconds) from both datasets (revisions) and an analysis of their difference.
--- Comparison for: withTrailingLambdas_compose ---
Dataset 1 (Branch A) | Dataset 2 (Branch B)
----------------------------------------------------------------
Count | 100 | 100
Min (ns) | 160768.30 | 160433.01
Mean (ns) | 167300.93 | 167229.12
Median (ns) | 164604.77 | 164904.08
Std. Dev. (ns) | 8627.72 | 7774.46
Min Difference: | -335.28 ns (-0.21%)
Mean Difference: | -71.81 ns (-0.04%)
Median Difference: | 299.31 ns (0.18%)
95% CI of Diff: | [-678.86, 1390.19] ns ([-0.41%, 0.84%])
The confidence interval contains zero, suggesting no statistically significant difference between the medians.
--- MannWhitneyUTest Results (Branch B vs. Branch A) ---
P-value: 0.5675
Result: No statistically significant difference.
-------------------------------------------------------
Count, Min, Mean, Median, and Std. Dev. provide a basic overview of the timing distributions for each revision. A p-value below 0.05 indicates a statistically significant difference between the two revisions; a p-value of 0.05 or above suggests no statistically significant difference was detected.
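If you want to sanity-check this decision rule outside the tool, the following Python sketch applies SciPy's Mann-Whitney U test to two fabricated timing samples. SciPy is an assumed dependency here, and the tool itself may compute the test differently.

```python
# Illustrative check of the p-value decision rule using SciPy.
from scipy.stats import mannwhitneyu

# Fabricated timing samples in nanoseconds, one list per revision.
timings_a = [164_604.8, 165_120.3, 163_988.1, 166_201.4, 164_755.9]
timings_b = [164_904.1, 165_310.7, 164_102.6, 165_998.2, 164_870.3]

stat, p_value = mannwhitneyu(timings_a, timings_b, alternative="two-sided")
if p_value < 0.05:
    print(f"p={p_value:.4f}: statistically significant difference.")
else:
    print(f"p={p_value:.4f}: no statistically significant difference detected.")
```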
A single CSV line is printed for easy parsing by other scripts or for logging results.

--- Machine-Readable CSV ---
benchmarkName,count,min1,min2,min_diff,min_diff_%,mean1,mean2,mean_diff_%,median1,median2,p-value,median_diff_%,median_diff,median_diff_ci_lower,median_diff_ci_upper,median_diff_ci_lower_%,median_diff_ci_upper_%
withTrailingLambdas_compose,100,160768.30,160433.01,-335.28,-0.21%,167300.93,167229.12,-0.04%,164604.77,164904.08,0.5675,0.18%,299.31,-678.86,1390.19,-0.41%,0.84%
- benchmarkName: The name of the benchmark test method.
- count: The number of measurements taken for each revision.
- min1: The minimum timing value for revision A.
- min2: The minimum timing value for revision B.
- min_diff: The absolute difference in minimums (min2 - min1).
- min_diff_%: The percentage difference in minimums.
- mean1: The mean (average) timing for revision A.
- mean2: The mean (average) timing for revision B.
- mean_diff_%: The percentage difference in means.
- median1: The median timing of the baseline revision (A).
- median2: The median timing of the comparison revision (B).
- p-value: The p-value from the Mann-Whitney U test.
- median_diff_%: The percentage difference in medians ((median2 - median1) / median1).
- median_diff: The absolute difference in medians (median2 - median1).
- median_diff_ci_lower: The lower bound of the 95% confidence interval for the median difference.
- median_diff_ci_upper: The upper bound of the 95% confidence interval for the median difference.
- median_diff_ci_lower_%: The lower bound of the confidence interval as a percentage.
- median_diff_ci_upper_%: The upper bound of the confidence interval as a percentage.
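Because the line is plain CSV, downstream scripts can consume it directly. Here is a small, hedged Python sketch that parses the example row above into a dictionary; the raw string is copied from the sample output.

```python
# Parse the tool's machine-readable CSV line into a dict (illustrative).
import csv
import io

# Header and data row exactly as printed in the example above.
raw = (
    "benchmarkName,count,min1,min2,min_diff,min_diff_%,mean1,mean2,mean_diff_%,"
    "median1,median2,p-value,median_diff_%,median_diff,median_diff_ci_lower,"
    "median_diff_ci_upper,median_diff_ci_lower_%,median_diff_ci_upper_%\n"
    "withTrailingLambdas_compose,100,160768.30,160433.01,-335.28,-0.21%,"
    "167300.93,167229.12,-0.04%,164604.77,164904.08,0.5675,0.18%,299.31,"
    "-678.86,1390.19,-0.41%,0.84%\n"
)

row = next(csv.DictReader(io.StringIO(raw)))
p_value = float(row["p-value"])
median_diff_pct = float(row["median_diff_%"].rstrip("%"))
print(f"{row['benchmarkName']}: p={p_value}, median diff {median_diff_pct}%")
```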
A JSON file named metadata.json is also created in the output directory, recording information about the benchmark run.

A PNG image file named <benchmark_name>_histogram.png is created in the output directory, where benchmark_name is the name of the benchmark test method. This plot visualizes the distribution of the benchmark timings for both revisions, making it easier to spot differences in performance. Note: <path_to_output_dir> is the value passed to the --output_path parameter; if this parameter is not specified, it defaults to ~/androidx-main/frameworks/support/development/ab-benchmarking/app/build/benchmark-results/.
--- Graphical Plot ---
Saved histogram to: file://<path_to_output_dir>/<benchmark_name>_histogram.png
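As an aside, an overlaid histogram like the one the tool saves can be approximated from two timing samples with matplotlib. This is an illustrative sketch with fabricated data, not the tool's actual plotting code; matplotlib is an assumed dependency.

```python
# Illustrative overlaid histogram of two timing distributions.
import random

import matplotlib.pyplot as plt

random.seed(42)
# Fabricated samples roughly matching the example summary statistics (ns).
timings_a = [random.gauss(167_300, 8_600) for _ in range(100)]
timings_b = [random.gauss(167_230, 7_800) for _ in range(100)]

plt.hist(timings_a, bins=20, alpha=0.5, label="Branch A")
plt.hist(timings_b, bins=20, alpha=0.5, label="Branch B")
plt.xlabel("Timing (ns)")
plt.ylabel("Frequency")
plt.title("withTrailingLambdas_compose")
plt.legend()
plt.savefig("withTrailingLambdas_compose_histogram.png")
```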