| # Build Performance |
| |
| ## Debugging Build Performance |
| |
| ### Tracing |
| |
| soong_ui has tracing built in, so that every build execution's trace can be |
| viewed. Just open `$OUT_DIR/build.trace.gz` in Chrome's <chrome://tracing>, or |
| with [catapult's trace viewer][catapult trace_viewer]. The last few traces are |
| stored in `build.trace.#.gz` (larger numbers are older). The associated logs |
| are stored in `soong.#.log` and `verbose.#.log.gz`. |
| |
|  |
| |
| ### Critical path |
| |
| soong_ui logs the wall time of the longest dependency chain compared to the |
| elapsed wall time in `$OUT_DIR/soong.log`. For example: |
| ``` |
| critical path took 3m10s |
| elapsed time 5m16s |
| perfect parallelism ratio 60% |
| critical path: |
| 0:00 build out/target/product/generic_arm64/obj/FAKE/sepolicy_neverallows_intermediates/policy_2.conf |
| 0:04 build out/target/product/generic_arm64/obj/FAKE/sepolicy_neverallows_intermediates/sepolicy_neverallows |
| 0:13 build out/target/product/generic_arm64/obj/ETC/plat_sepolicy.cil_intermediates/plat_sepolicy.cil |
| 0:01 build out/target/product/generic_arm64/obj/ETC/plat_pub_versioned.cil_intermediates/plat_pub_versioned.cil |
| 0:02 build out/target/product/generic_arm64/obj/ETC/vendor_sepolicy.cil_intermediates/vendor_sepolicy.cil |
| 0:16 build out/target/product/generic_arm64/obj/ETC/sepolicy_intermediates/sepolicy |
| 0:00 build out/target/product/generic_arm64/obj/ETC/plat_seapp_contexts_intermediates/plat_seapp_contexts |
| 0:00 Install: out/target/product/generic_arm64/system/etc/selinux/plat_seapp_contexts |
| 0:02 build out/target/product/generic_arm64/obj/NOTICE.txt |
| 0:00 build out/target/product/generic_arm64/obj/NOTICE.xml.gz |
| 0:00 build out/target/product/generic_arm64/system/etc/NOTICE.xml.gz |
| 0:01 Installed file list: out/target/product/generic_arm64/installed-files.txt |
| 1:00 Target system fs image: out/target/product/generic_arm64/obj/PACKAGING/systemimage_intermediates/system.img |
| 0:01 Install system fs image: out/target/product/generic_arm64/system.img |
| 0:01 Target vbmeta image: out/target/product/generic_arm64/vbmeta.img |
| 1:26 Package target files: out/target/product/generic_arm64/obj/PACKAGING/target_files_intermediates/aosp_arm64-target_files-6663974.zip |
| 0:01 Package: out/target/product/generic_arm64/aosp_arm64-img-6663974.zip |
| 0:01 Dist: /buildbot/dist_dirs/aosp-master-linux-aosp_arm64-userdebug/6663974/aosp_arm64-img-6663974.zip |
| ``` |
| |
| If the elapsed time is much longer than the critical path then additional |
| parallelism on the build machine will improve total build times. If there are |
| long individual times listed in the critical path then improving build times |
| for those steps or adjusting dependencies so that those steps can run earlier |
| in the build graph will improve total build times. |
| |
| ### Soong |
| |
| Soong can be traced and profiled using the standard Go tools. It understands |
| the `-cpuprofile`, `-trace`, and `-memprofile` command line arguments, but we |
| don't currently have an easy way to enable them in the context of a full build. |
| |
| ### Kati |
| |
| In general, the slow path of reading Android.mk files isn't particularly |
| performance sensitive, since it doesn't need to happen on every build. It is |
| important for the fast-path (detecting whether it needs to regenerate the ninja |
| file) to be fast however. And it shouldn't hit the slow path too often -- so |
| don't rely on output of a `$(shell)` command that includes the current timestamp, |
| or read a file that's going to change on every build. |
| |
| #### Regen check is slow |
| |
| In most cases, we've found that the fast-path is slow because all of the |
| `$(shell)` commands need to be re-executed to determine if their output changed. |
| The `$OUT_DIR/verbose.log.gz` contains statistics from the regen check: |
| |
| ``` |
| verbose: *kati*: regen check time: 0.754030 |
| verbose: *kati*: glob time (regen): 0.545859 / 43840 |
| verbose: *kati*: shell time (regen): 0.278095 / 66 (59 unique) |
| verbose: *kati*: 0.012 / 1 mkdir -p out/target/product/generic && echo Android/aosp_arm/generic:R/AOSP.MASTER/$(date -d @$(cat out/build_date.txt) +%m%d%H%M):eng/test-keys >out/target/product/generic/build_fingerprint.txt && grep " " out/target/product/generic/build_fingerprint.txt |
| verbose: *kati*: 0.010 / 1 echo 'com.android.launcher3.config.FlagOverrideSampleTest com.android.launcher3.logging.FileLogTest com.android.launcher3.model.AddWorkspaceItemsTaskTest com.android.launcher3.model.CacheDataUpdatedTaskTest com.android.launcher3.model.DbDowngradeHelperTest com.android.launcher3.model.GridBackupTableTest com.android.launcher3.model.GridSizeMigrationTaskTest com.android.launcher3.model.PackageInstallStateChangedTaskTest com.android.launcher3.popup.PopupPopulatorTest com.android.launcher3.util.GridOccupancyTest com.android.launcher3.util.IntSetTest' | tr ' ' '\n' | cat |
| verbose: *kati*: 0.010 / 1 cd cts/tests/framework/base/windowmanager ; find -L * -name "Components.java" -and -not -name ".*" |
| verbose: *kati*: 0.010 / 1 git -C test/framework/build log -s -n 1 --format="%cd" --date=format:"%Y%m%d_%H%M%S" 2>/dev/null |
| verbose: *kati*: 0.009 / 2 cd development/samples/ShortcutDemo/publisher ; find -L ../common/src -name "*.java" -and -not -name ".*" |
| verbose: *kati*: 0.009 / 2 cd development/samples/ShortcutDemo/launcher ; find -L ../common/src -name "*.java" -and -not -name ".*" |
| verbose: *kati*: 0.009 / 1 if ! cmp -s out/target/product/generic/obj/CONFIG/kati_packaging/dist.mk.tmp out/target/product/generic/obj/CONFIG/kati_packaging/dist.mk; then mv out/target/product/generic/obj/CONFIG/kati_packaging/dist.mk.tmp out/target/product/generic/obj/CONFIG/kati_packaging/dist.mk; else rm out/target/product/generic/obj/CONFIG/kati_packaging/dist.mk.tmp; fi |
| verbose: *kati*: 0.008 / 1 mkdir -p out/target/product/generic && echo R/AOSP.MASTER/$(cat out/build_number.txt):eng/test-keys >out/target/product/generic/build_thumbprint.txt && grep " " out/target/product/generic/build_thumbprint.txt |
| verbose: *kati*: 0.007 / 1 echo 'com.android.customization.model.clock.BaseClockManagerTest com.android.customization.model.clock.ClockManagerTest com.android.customization.model.grid.GridOptionsManagerTest com.android.customization.model.theme.ThemeManagerTest' | tr ' ' '\n' | cat |
| verbose: *kati*: 0.007 / 1 uname -sm |
| verbose: *kati*: stat time (regen): 0.361907 / 1241 |
| ``` |
| |
| In this case, the total time spent checking was 0.75 seconds, even though the |
| other "(regen)" numbers add up to more than that (some parts are parallelized |
| where possible). Often times, the biggest contributor is the `$(shell)` times |
| -- in this case, 66 calls took 0.27s. The top 10 longest shell functions are |
| printed. |
| |
| All the longest commands in this case are all variants of a call to `find`, but |
| this is where using pure make functions instead of calling out to the shell can |
| make a performance impact -- many calls to check if `26 > 20` can add up. We've |
| added some basic math functions in `math.mk` to help some common use cases that |
| used to be rather expensive when they were used too often. |
| |
| There are some optimizations in place for find commands -- if Kati can |
| understand the find command, the built-in find emulator can turn some of them |
| into glob or stat checks (falling back to calling `find` if one of those imply |
| that the output may change). Many of the common macros produce find commands |
| that Kati can understand, but if you're writing your own, you may want to |
| experiment with other options if they're showing up in this list. For example, |
| if this was significantly more expensive (either in runtime, or was called |
| often): |
| |
| ``` |
| .../kati.go:127: *kati*: 0.015 cd libcore && (find luni/src/test/java -name "*.java" 2> /dev/null) | grep -v -f java_tests_blacklist |
| ``` |
| |
| It may be more efficient to move the grep into make, so that the `find` portion |
| can be rewritten and cached: |
| |
| ``` |
| $(filter-out $(file <$(LOCAL_PATH)/java_tests_blacklist),$(call all-java-files-under,luni/src/test/java)) |
| ``` |
| |
| Others can be simplified by just switching to an equivalent find command that |
| Kati understands: |
| |
| ``` |
| .../kati.go:127: *kati*: 0.217 find device vendor -type f -name \*.pk8 -o -name verifiedboot\* -o -name \*.x509.pem -o -name oem\*.prop | sort |
| ``` |
| |
| By adding the implicit `-a` and moving the `| sort` to Make, this can now be |
| cached by Kati: |
| |
| ``` |
| $(sort $(shell find device vendor -type -f -a -name \*.pk8 -o -name verifiedboot\* -o -name \*.x509.pem -o -name oem\*.prop)) |
| ``` |
| |
| Kati has now learned about the implicit `-a`, so this particular change is no |
| longer necessary, but the basic concept holds. |
| |
| #### Kati regens too often |
| |
| Kati prints out what triggered the slow path to be taken -- this can be a |
| changed file, a changed environment variable, or different output from a |
| `$(shell)` command: |
| |
| ``` |
| out/soong/Android-aosp_arm.mk was modified, regenerating... |
| ``` |
| |
| The state is stored in `$OUT_DIR/.kati_stamp*` files, and can be (partially) |
| read with the `ckati_stamp_dump` tool in prebuilts/build-tools. More debugging |
| is available when ckati is run with `--regen_debug`, but that can be a lot of |
| data to understand. |
| |
| #### Debugging the slow path |
| |
| Kati will now dump out information about which Makefiles took the most time to |
| execute. This is also in the `verbose.log.gz` file: |
| |
| ``` |
| verbose: *kati*: included makefiles: 73.640833 / 232810 (1066 unique) |
| verbose: *kati*: 18.389 / 1 out/soong/Android-aosp_arm.mk |
| verbose: *kati*: 13.137 / 20144 build/make/core/soong_cc_prebuilt.mk |
| verbose: *kati*: 11.743 / 27666 build/make/core/base_rules.mk |
| verbose: *kati*: 2.289 / 1 art/Android.mk |
| verbose: *kati*: 2.054 / 1 art/build/Android.cpplint.mk |
| verbose: *kati*: 1.955 / 28269 build/make/core/clear_vars.mk |
| verbose: *kati*: 1.795 / 283 build/make/core/package.mk |
| verbose: *kati*: 1.790 / 283 build/make/core/package_internal.mk |
| verbose: *kati*: 1.757 / 17382 build/make/core/link_type.mk |
| verbose: *kati*: 1.078 / 297 build/make/core/aapt2.mk |
| ``` |
| |
| This shows that soong_cc_prebuilt.mk was included 20144 times, for a total time |
| spent of 13.137 secounds. While Android-aosp_arm.mk was only included once, and |
| took 18.389 seconds. In this case, Android-aosp_arm.mk is the only file that |
| includes soong_cc_prebuilt.mk, so we can safely assume that 13 of the 18 seconds |
| in Android-aosp_arm.mk was actually spent within soong_cc_prebuilt.mk (or |
| something that it included, like base_rules.mk). |
| |
| By default this only includes the top 10 entries, but you can ask for the stats |
| for any makefile to be printed with `$(KATI_profile_makefile)`: |
| |
| ``` |
| $(KATI_profile_makefile build/make/core/product.mk) |
| ``` |
| |
| With these primitives, it's possible to get the timing information for small |
| chunks, or even single lines, of a makefile. Just move the lines you want to |
| measure into a new makefile, and replace their use with an `include` of the |
| new makefile. It's possible to analyze where the time is being spent by doing |
| a binary search using this method, but you do need to be careful not to split |
| conditionals across two files (the ifeq/else/endif must all be in the same file). |
| |
| ### Ninja |
| |
| #### Understanding why something rebuilt |
| |
| Add `NINJA_ARGS="-d explain"` to your environment before a build, this will cause |
| ninja to print out explanations on why actions were taken. Start reading from the |
| beginning, as this much data can be hard to read: |
| |
| ``` |
| $ cd art |
| $ mma |
| $ touch runtime/jit/profile_compilation_info.h |
| $ NINJA_ARGS="-d explain" mma |
| ... |
| ninja explain: output out/soong/.intermediates/art/tools/cpp-define-generator/cpp-define-generator-data/linux_glibc_x86_64/obj/art/tools/cpp-define-generator/main.o older than most recent input art/runtime/jit/profile_compilation_info.h ( |
| 1516683538 vs 1516685188) |
| ninja explain: out/soong/.intermediates/art/tools/cpp-define-generator/cpp-define-generator-data/linux_glibc_x86_64/obj/art/tools/cpp-define-generator/main.o is dirty |
| ninja explain: out/soong/.intermediates/art/tools/cpp-define-generator/cpp-define-generator-data/linux_glibc_x86_64/cpp-define-generator-data is dirty |
| ninja explain: out/soong/host/linux-x86/bin/cpp-define-generator-data is dirty |
| ninja explain: out/soong/.intermediates/art/tools/cpp-define-generator/cpp-define-generator-asm-support/gen/asm_support_gen.h is dirty |
| ninja explain: out/soong/.intermediates/art/cmdline/art_cmdline_tests/android_arm_armv7-a_core_cmdline_parser_test/obj/art/cmdline/cmdline_parser_test.o is dirty |
| ... |
| ``` |
| |
| In this case, art/cmdline/cmdline_parser_test.o was rebuilt because it uses |
| asm_support_gen.h, which was generated by cpp-define-generator-data, which uses |
| profile_compilation_info.h. |
| |
| You'll likely need to cross-reference this data against the build graph in the |
| various .ninja files. The files are (mostly) human-readable, but a (slow) web |
| interface can be used by running `NINJA_ARGS="-t browse <target>" m`. |
| |
| #### Builds take a long time |
| |
| If the long part in the trace view of a build is a relatively solid block, then |
| the performance is probably more related to how much time the actual build |
| commands are taking than having extra dependencies, or slowdowns in |
| soong/kati/ninja themselves. |
| |
| Beyond looking at visible outliers in the trace view, we don't have any tooling |
| to help in this area yet. It's possible to aggregate some of the raw data |
| together, but since our builds are heavily parallelized, it's particularly easy |
| for build commands to impact unrelated build commands. This is an area we'd |
| like to improve -- we expect keeping track of user/system time per-action would |
| provide more reliable data, but tracking some full-system data (memory/swap |
| use, disk bandwidth, etc) may also be necessary. |
| |
| ## Known Issues |
| |
| ### Common |
| |
| ### <= Android 10 (Q): mm |
| |
| Soong always loads the entire module graph, so as modules convert from Make to |
| Soong, `mm` is becoming closer to `mma`. This produces more correct builds, but |
| does slow down builds, as we need to verify/produce/load a larger build graph. |
| |
| As of Android Q, loading large build graphs is fast, and in Android R, `mm` is |
| now an alias of `mma`. |
| |
| ### Android 8.1 (Oreo MR1) |
| |
| In some cases, a tree would get into a state where Soong would be run twice on |
| every incremental build, even if there was nothing to do. This was fixed in |
| master with [these changes][blueprint_microfactory], but they were too |
| significant to backport at the time. And while they fix this particular issue, |
| they appear to cause ninja to spend more time during every build loading the |
| `.ninja_log` / `.ninja_deps` files, especially as they become larger. |
| |
| A workaround to get out of this state is to remove the build.ninja entry from |
| `$OUT_DIR/.ninja_log`: |
| |
| ``` |
| sed -i "/\/build.ninja/d" $(get_build_var OUT_DIR)/.ninja_log |
| ``` |
| |
| [catapult trace_viewer]: https://github.com/catapult-project/catapult/blob/master/tracing/README.md |
| [ninja parse optimization]: https://android-review.googlesource.com/c/platform/external/ninja/+/461005 |
| [blueprint_microfactory]: https://android-review.googlesource.com/q/topic:%22blueprint_microfactory%22+status:merged |