libatrace_rust benchmarks

Benchmarks to compare the performance of Rust ATrace bindings with directly calling the libcutils methods from C++.

Benchmarks

ATrace wrapper benchmarks

There are two binaries implementing the same benchmarks:

  • libatrace_rust_benchmark (atrace_benchmark.rs) for Rust.
  • libatrace_rust_benchmark_cc (atrace_benchmark.cc) for C++.

The benchmarks emit ATrace events with tracing off and tracing on. atrace_begin is measured with short and long event names to check if the string length affects timings. For example, tracing_on_begin/1000 measures atrace_begin with a 1000-character name and tracing enabled.

ATrace tracing subscriber benchmark

There is a benchmark for the tracing crate subscriber - libatrace_tracing_subscriber_benchmark. We use it to check overhead over the base libatrace_rust.

Similarly to the wrapper benchmarks, the subscriber is measured with tracing off and on. There are cases with and without extra fields to measure the cost of formatting. Cases that start with filtered_ measure the subscriber in filtering mode with tracing disabled.

Running the benchmarks

To run the benchmarks, push the binaries to the device with adb and launch them via adb shell. You may need to push dynamic libraries they depend on as well if they're not present on device and run with LD_LIBRARY_PATH.

Do not enable ATrace collectors. The benchmarks effectively emit events in a loop and will spam any trace and distort performance results.

The benchmarks will override system properties to enable or disable events, specifically ATrace App event collection in debug.atrace.app_number and debug.atrace.app_0. After a successful execution the events will be disabled.

Results

The timings are not representative of actual cost of fully enabling tracing, only of emitting events via API, since there's nothing receiving the events.

The tests were done on a aosp_cf_x86_64_phone-userdebug Cuttlefish VM. Execution times on real device may be different but we expect similar relative performance between Rust wrappers and C.

If you notice that measurements with tracing off and tracing on have similar times, it might mean that enabling ATrace events failed and you need to debug the benchmark.

ATrace wrapper

Rust results from libatrace_rust_benchmark 2>&1 | grep time:

tracing_off_begin/10    time:   [6.0211 ns 6.0382 ns 6.0607 ns]
tracing_off_begin/1000  time:   [6.0119 ns 6.0418 ns 6.0823 ns]
tracing_off_end         time:   [6.5417 ns 6.6801 ns 6.8131 ns]
tracing_on_begin/10     time:   [1.2847 µs 1.2929 µs 1.3044 µs]
tracing_on_begin/1000   time:   [1.5395 µs 1.5476 µs 1.5580 µs]
tracing_on_end          time:   [1.1153 µs 1.1208 µs 1.1276 µs]

C++ results from libatrace_rust_benchmark_cc:

------------------------------------------------------------------------
Benchmark                              Time             CPU   Iterations
------------------------------------------------------------------------
BM_TracingOffAtraceBegin/10         4.00 ns         3.96 ns    175953732
BM_TracingOffAtraceBegin/1000       4.05 ns         4.02 ns    176298494
BM_TracingOffAtraceEnd              4.08 ns         4.05 ns    176422059
BM_TracingOnAtraceBegin/10          1119 ns         1110 ns       640816
BM_TracingOnAtraceBegin/1000        1151 ns         1142 ns       615781
BM_TracingOnAtraceEnd               1076 ns         1069 ns       653646

ATrace tracing subscriber

The tracing subscriber time consists of the underlying libatrace_rust call plus the time spent in the subscriber itself.

Results from libatrace_tracing_subscriber_benchmark 2>&1 | grep time:

tracing_off_event       time:   [47.444 ns 47.945 ns 48.585 ns]
filtered_event          time:   [26.852 ns 26.942 ns 27.040 ns]
tracing_off_event_args  time:   [80.597 ns 80.997 ns 81.475 ns]
filtered_event_args     time:   [26.680 ns 26.782 ns 26.887 ns]
tracing_off_span        time:   [316.48 ns 317.72 ns 319.12 ns]
filtered_span           time:   [27.900 ns 27.959 ns 28.018 ns]
tracing_off_span_args   time:   [364.92 ns 367.57 ns 370.95 ns]
filtered_span_args      time:   [27.625 ns 27.919 ns 28.207 ns]
tracing_on_event        time:   [1.4639 µs 1.4805 µs 1.4954 µs]
tracing_on_event_args   time:   [2.0088 µs 2.0197 µs 2.0314 µs]
tracing_on_span         time:   [2.7907 µs 2.7996 µs 2.8103 µs]
tracing_on_span_args    time:   [3.6846 µs 3.6992 µs 3.7168 µs]