<html devsite>
<head>
<title>Using Profile-Guided Optimization (PGO)</title>
<meta name="project_path" value="/_project.yaml">
<meta name="book_path" value="/_book.yaml">
</head>
<body>
<!--
Copyright 2018 The Android Open Source Project
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<p>The Android build system supports using Clang's <a href=
"https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization">profile-guided
optimization (PGO)</a> on native Android modules that have <a href=
"https://android.googlesource.com/platform/build/soong/">blueprint</a> build
rules. This page describes Clang PGO, how to continually generate and update
the profiles it uses, and how to integrate PGO with the build system. It ends
with a case study.</p>
<h2 id="about-clang-pgo">About Clang PGO</h2>
<p>Clang can perform profile-guided optimization using two types of
profiles:</p>
<ul>
<li><strong>Instrumentation-based profiles</strong> are generated from an
instrumented target program. These profiles are detailed and impose a high
runtime overhead.</li>
<li><strong>Sampling-based profiles</strong> are typically produced by
sampling hardware counters. They impose a low runtime overhead, and can be
collected without any instrumentation or modification to the binary. They
are less detailed than instrumentation-based profiles.</li>
</ul>
<p>All profiles should be generated from a representative workload that
exercises the typical behavior of the application. While Clang supports both
AST-based (<code>-fprofile-instr-generate</code>) and LLVM IR-based
(<code>-fprofile-generate</code>) instrumentation, Android supports only the
LLVM IR-based variant for instrumentation-based PGO.</p>
<p>The following flags are needed to build for profile collection:</p>
<ul>
<li><code>-fprofile-generate</code> for IR-based instrumentation. With this
option, the backend uses a weighted minimal spanning tree approach to
reduce the number of instrumentation points and to place them on
low-weight edges. Use this option for the link step as well: the Clang
driver then automatically passes the profiling runtime
(<code>libclang_rt.profile-<em>arch</em>-android.a</code>) to the linker.
This library contains routines to write the profiles to disk upon program
exit.</li>
<li><code>-gline-tables-only</code> for sampling-based profile collection
to generate minimal debug information.</li>
</ul>
<p>To use a profile for PGO, pass
<code>-fprofile-instr-use=<em>pathname</em></code> for instrumentation-based
profiles or <code>-fprofile-sample-use=<em>pathname</em></code> for
sampling-based profiles.</p>
<p><strong>Note:</strong> As the code changes, Clang may no longer be able to
use the profile data. When that happens, it generates a
<code>-Wprofile-instr-out-of-date</code> warning.</p>
<h2 id="using-pgo">Using PGO</h2>
<p>Using PGO involves the following steps:</p>
<ol>
<li>Build the library/executable with instrumentation by passing
<code>-fprofile-generate</code> to the compiler and linker.</li>
<li>Collect profiles by running a representative workload on the
instrumented binary.</li>
<li>Post-process the profiles using the <code>llvm-profdata</code> utility
(for details, see <a href="#handling-llvm-profile-files">Handling LLVM
profile files</a>).</li>
<li>Use the profiles to apply PGO by passing
<code>-fprofile-use=&lt;&gt;.profdata</code> to the compiler and
linker.</li>
</ol>
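<p>The four steps above can be sketched outside the Android build system as
a small shell function that drives Clang directly. This is only an
illustration under stated assumptions: the function name
<code>pgo_build</code>, the source file, and the output names are
hypothetical, and <code>clang</code> and <code>llvm-profdata</code> are
assumed to be on the <code>PATH</code> of a host machine. In an Android tree,
the build system performs the compile and link steps for you.</p>
<pre class="prettyprint"># Sketch only: pgo_build and all file names are hypothetical.
pgo_build() {
  src="$1"        # C source file to build
  out="$2"        # base name for the generated binaries
  profdir="$3"    # directory that receives the raw profiles

  # 1. Build with IR-based instrumentation (compile and link in one step).
  clang -O2 -fprofile-generate="$profdir" "$src" -o "$out.instrumented" || return 1

  # 2. Run a representative workload; .profraw files are written on exit.
  "./$out.instrumented" || return 1

  # 3. Merge the raw profiles into a single .profdata file.
  llvm-profdata merge -output="$out.profdata" "$profdir"/*.profraw || return 1

  # 4. Rebuild, letting Clang optimize with the merged profile.
  clang -O2 -fprofile-use="$out.profdata" "$src" -o "$out"
}

# Usage: pgo_build example.c example ./profiles</pre>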
<p>For PGO in Android, profiles should be collected offline and checked in
alongside the code to ensure reproducible builds. The profiles can be used as
code evolves, but must be regenerated periodically (or whenever Clang warns
that the profiles are stale).</p>
<h3 id="collecting-profiles">Collecting profiles</h3>
<p>Clang can use profiles collected by running benchmarks using an
instrumented build of the library or by sampling hardware counters when the
benchmark is run. At this time, Android does not support using sampling-based
profile collection, so you must collect profiles using an instrumented
build:</p>
<ol>
<li>Identify a benchmark and the set of libraries collectively exercised by
that benchmark.</li>
<li>Add <code>pgo</code> properties to the benchmark and libraries (details
below).</li>
<li>Produce an Android build with an instrumented copy of these libraries
using:
<pre class="prettyprint">make ANDROID_PGO_INSTRUMENT=benchmark</pre>
</li>
</ol>
<p><code><em>benchmark</em></code> is a placeholder that identifies the
collection of libraries instrumented during build. The actual representative
inputs (and possibly another executable that links against a library being
benchmarked) are not specific to PGO and are beyond the scope of this
document.</p>
<ol start="4">
<li>Flash or sync the instrumented build on a device.</li>
<li>Run the benchmark to collect profiles.</li>
<li>Use the <code>llvm-profdata</code> tool (discussed below) to
post-process the profiles and make them ready to be checked into the source
tree.</li>
</ol>
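<p>The on-device part of this procedure might look like the following
sketch. It assumes <code>adb</code> is available, that the device runs a
build on which <code>adb root</code> works, and that the profiling runtime
writes to its default location, <code>/data/local/tmp</code>. The function
name <code>collect_profiles</code> and the benchmark path are hypothetical.</p>
<pre class="prettyprint"># Sketch only: collect_profiles and the benchmark path are hypothetical.
collect_profiles() {
  bench="$1"     # on-device path of the benchmark executable
  outdir="$2"    # host directory that receives the raw profiles

  adb root || return 1
  # Remove stale profiles so earlier runs are not merged into the new data.
  adb shell 'rm -f /data/local/tmp/*.profraw' || return 1
  # Run the benchmark; each instrumented module writes a profile on exit.
  adb shell "$bench" || return 1
  # Copy every raw profile to the host for post-processing.
  mkdir -p "$outdir"
  for f in $(adb shell 'ls /data/local/tmp/*.profraw' | tr -d '\r'); do
    adb pull "$f" "$outdir/" || return 1
  done
}

# Usage: collect_profiles /data/local/tmp/mybench ./raw-profiles</pre>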
<h3 id="using-profiles-during-build">Using profiles during build</h3>
<p>Check the profiles into <code>toolchain/pgo-profiles</code> in an Android
tree. The name should match what is specified in the
<code>profile_file</code> sub-property of the <code>pgo</code> property for
the library. The build system automatically passes the profile file to Clang
when building the library. To temporarily disable PGO and measure its
performance benefit, set the <code>ANDROID_PGO_NO_PROFILE_USE</code>
environment variable to <strong><code>true</code></strong>.</p>
<p>To specify additional product-specific profile directories, append them to
the <code>PGO_ADDITIONAL_PROFILE_DIRECTORIES</code> make variable in a
<code>BoardConfig.mk</code>. If additional paths are specified, profiles in
these paths override those in <code>toolchain/pgo-profiles</code>.</p>
<p>When generating a release image using the <code>dist</code> target of
<code>make</code>, the build system writes the names of missing profile files
to <code>$DIST_DIR/pgo_profile_file_missing.txt</code>. Check this
file to see which profile files were accidentally dropped (a missing profile
silently disables PGO).</p>
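<p>A release script could turn this file into a hard failure. The following
is a minimal sketch; the function name <code>check_missing_profiles</code> is
hypothetical, and the check treats an absent or empty file as "nothing
missing", since the exact behavior of the build in that case is an
assumption.</p>
<pre class="prettyprint"># Sketch only: check_missing_profiles is a hypothetical helper.
check_missing_profiles() {
  missing="$1/pgo_profile_file_missing.txt"
  # -s is true only when the file exists and is not empty.
  if [ -s "$missing" ]; then
    echo "PGO profiles missing from this build:"
    cat "$missing"
    return 1
  fi
  echo "No missing PGO profiles reported."
}

# Usage: check_missing_profiles "$DIST_DIR"</pre>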
<h2 id="enabling-pgo-in-android-bp-files">Enabling PGO in Android.bp
files</h2>
<p>To enable PGO in <code>Android.bp</code> files for native modules, simply
specify the <code>pgo</code> property. This property has the following
sub-properties:</p>
<table>
<tr>
<th><strong>Property</strong>
</th>
<th><strong>Description</strong>
</th>
</tr>
<tr>
<td><code>instrumentation</code>
</td>
<td>Set to <code>true</code> for PGO using instrumentation. Default is
<code>false</code>.</td>
</tr>
<tr>
<td><code>sampling</code>
</td>
<td><strong>Currently unsupported.</strong> Set to <code>true</code> for
PGO using sampling. Default is <code>false</code>.</td>
</tr>
<tr>
<td><code>benchmarks</code>
</td>
<td>List of strings. This module is built for profiling if any benchmark
in the list is specified in the <code>ANDROID_PGO_INSTRUMENT</code> build
option.</td>
</tr>
<tr>
<td><code>profile_file</code>
</td>
<td>Profile file (relative to <code>toolchain/pgo-profiles</code>) to use
with PGO. If this file doesn't exist, the build reports it by adding its
name to <code>$DIST_DIR/pgo_profile_file_missing.txt</code>
<em>unless</em> the <code>enable_profile_use</code> property is set to
<code>false</code> <strong>OR</strong> the
<code>ANDROID_PGO_NO_PROFILE_USE</code> build variable is set to
<code>true</code>.</td>
</tr>
<tr>
<td><code>enable_profile_use</code>
</td>
<td>Set to <code>false</code> if profiles should not be used during
build. Can be used during bootstrap to enable profile collection or to
temporarily disable PGO. Default is <code>true</code>.</td>
</tr>
<tr>
<td><code>cflags</code>
</td>
<td>List of additional flags to use during an instrumented build.</td>
</tr>
</table>
<p>Example of a module with PGO:</p>
<pre class="prettyprint">cc_library {
    name: "libexample",
    srcs: [
        "src1.cpp",
        "src2.cpp",
    ],
    static_libs: [
        "libstatic1",
        "libstatic2",
    ],
    shared_libs: [
        "libshared1",
    ],
    pgo: {
        instrumentation: true,
        benchmarks: [
            "benchmark1",
            "benchmark2",
        ],
        profile_file: "example.profdata",
    },
}
</pre>
<p>If the benchmarks <code>benchmark1</code> and <code>benchmark2</code>
exercise representative behavior for libraries <code>libstatic1</code>,
<code>libstatic2</code>, or <code>libshared1</code>, the <code>pgo</code>
property of these libraries can also include the benchmarks. The
<code>defaults</code> module in <code>Android.bp</code> can include a common
<code>pgo</code> specification for a set of libraries to avoid repeating the
same build rules for several modules.</p>
<p>To select different profile files or selectively disable PGO for an
architecture, specify the <code>profile_file</code>,
<code>enable_profile_use</code>, and <code>cflags</code> properties per
architecture. Example (with architecture target in
<strong>bold</strong>):</p>
<pre class="prettyprint">cc_library {
    name: "libexample",
    srcs: [
        "src1.cpp",
        "src2.cpp",
    ],
    static_libs: [
        "libstatic1",
        "libstatic2",
    ],
    shared_libs: [
        "libshared1",
    ],
    pgo: {
        instrumentation: true,
        benchmarks: [
            "benchmark1",
            "benchmark2",
        ],
    },
    <strong>target: {
        android_arm: {
            pgo: {
                profile_file: "example_arm.profdata",
            },
        },
        android_arm64: {
            pgo: {
                profile_file: "example_arm64.profdata",
            },
        },
    },</strong>
}
</pre>
<p>To resolve references to the profiling runtime library during
instrumentation-based profiling, pass the build flag
<code>-fprofile-generate</code> to the linker. If a static library is
instrumented for PGO, all shared libraries and binaries that directly depend
on it must also be instrumented for PGO. However, such shared
libraries or executables don't need to use PGO profiles, and their
<code>enable_profile_use</code> property can be set to <code>false</code>.
Outside of this restriction, you can apply PGO to any static library, shared
library, or executable.</p>
<h2 id="handling-llvm-profile-files">Handling LLVM profile files</h2>
<p>Executing an instrumented library or executable produces a profile file
named <code>default_<em>unique_id</em>_0.profraw</code> in
<code>/data/local/tmp</code> (where <code><em>unique_id</em></code> is a
numeric hash that is unique to this library). If this file already exists,
the profiling runtime merges the new profile with the old one while writing
the profiles. To change the location of the profile file, set the
<code>LLVM_PROFILE_FILE</code> environment variable at runtime.</p>
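<p>As a sketch of redirecting the output, the helper below runs a benchmark
over <code>adb</code> with <code>LLVM_PROFILE_FILE</code> set. The function
name <code>run_with_profile_dir</code>, the benchmark path, and the
<code>bench_</code> file prefix are hypothetical. The <code>%m</code>
specifier is expanded by the LLVM profiling runtime to a per-binary
signature, which keeps one file per instrumented module.</p>
<pre class="prettyprint"># Sketch only: run_with_profile_dir and both paths are hypothetical.
run_with_profile_dir() {
  bench="$1"      # on-device path of the instrumented benchmark
  profdir="$2"    # on-device directory that should receive the profiles

  adb shell "mkdir -p $profdir" || return 1
  # %m expands to a per-binary signature, so each instrumented module
  # still writes to (and merges into) its own file.
  adb shell "LLVM_PROFILE_FILE=$profdir/bench_%m.profraw $bench"
}

# Usage: run_with_profile_dir /data/local/tmp/mybench /data/local/tmp/profiles</pre>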
<p>The <code><a href=
"https://llvm.org/docs/CommandGuide/llvm-profdata.html">llvm-profdata</a></code>
utility is then used to convert the <code>.profraw</code> file (and possibly
merge multiple <code>.profraw</code> files) to a <code>.profdata</code>
file:</p>
<pre class="prettyprint">
llvm-profdata merge -output=profile.profdata &lt;.profraw and/or .profdata files&gt;</pre>
<p><code><em>profile.profdata</em></code> can then be checked into the source
tree for use during build.</p>
<p>If multiple instrumented binaries/libraries are loaded during a benchmark,
each library generates a separate <code>.profraw</code> file with a separate
unique ID. Typically, all of these files can be merged into a single
<code>.profdata</code> file and used for the PGO build. If a library
is also exercised by another benchmark, that library must be optimized using
profiles from both benchmarks. In this situation, the <code>show</code>
option of <code>llvm-profdata</code> is useful:</p>
<pre class="prettyprint">
llvm-profdata merge -output=default_unique_id.profdata default_unique_id_0.profraw
llvm-profdata show -all-functions default_unique_id.profdata</pre>
<p>To map <em>unique_id</em>s to individual libraries, search the
<code>show</code> output for each <em>unique_id</em> for a function name that
is unique to the library.</p>
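<p>This search can be automated. The sketch below converts each raw profile
and prints the names of those whose function list contains a given marker.
The helper name <code>find_profile_for</code> is hypothetical, and the marker
must be a function name that you know is defined in exactly one library.</p>
<pre class="prettyprint"># Sketch only: find_profile_for is a hypothetical helper.
find_profile_for() {
  marker="$1"     # function name unique to the library of interest
  shift           # remaining arguments: candidate .profraw files
  for raw in "$@"; do
    indexed="${raw%.profraw}.profdata"
    llvm-profdata merge -output="$indexed" "$raw" || return 1
    # Print the raw profile name if the marker appears in its function list.
    if llvm-profdata show -all-functions "$indexed" | grep -q -F -- "$marker"; then
      echo "$raw"
    fi
  done
}

# Usage: find_profile_for some_unique_function /path/to/profiles/*.profraw</pre>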
<h2 id="case-study-pgo-for-art">Case Study: PGO for ART</h2>
<p><em>The case study presents ART as a relatable example; however, it is not
an accurate description of the actual set of libraries profiled for ART or
their interdependencies.</em>
</p>
<p>The <code>dex2oat</code> ahead-of-time compiler in ART depends on
<code>libart-compiler.so</code>, which in turn depends on
<code>libart.so</code>. The ART runtime is implemented mainly in
<code>libart.so</code>. Benchmarks for the compiler and the runtime will be
different:</p>
<table>
<tr>
<th><strong>Benchmark</strong>
</th>
<th><strong>Profiled libraries</strong>
</th>
</tr>
<tr>
<td><code>dex2oat</code>
</td>
<td><code>dex2oat</code> (executable), <code>libart-compiler.so</code>,
<code>libart.so</code></td>
</tr>
<tr>
<td><code>art_runtime</code>
</td>
<td><code>libart.so</code>
</td>
</tr>
</table>
<ol>
<li>Add the following <code>pgo</code> property to <code>dex2oat</code> and
<code>libart-compiler.so</code>:
<pre class="prettyprint">    pgo: {
        instrumentation: true,
        benchmarks: ["dex2oat",],
        profile_file: "dex2oat.profdata",
    }</pre>
</li>
<li>Add the following <code>pgo</code> property to <code>libart.so</code>:
<pre class="prettyprint">    pgo: {
        instrumentation: true,
        benchmarks: ["art_runtime", "dex2oat",],
        profile_file: "libart.profdata",
    }</pre>
</li>
<li>Create instrumented builds for the <code>dex2oat</code> and
<code>art_runtime</code> benchmarks using:
<pre class="prettyprint">make ANDROID_PGO_INSTRUMENT=dex2oat
make ANDROID_PGO_INSTRUMENT=art_runtime</pre>
<p>Alternatively, create a single instrumented build with all libraries
instrumented using:</p>
<pre class="prettyprint">make ANDROID_PGO_INSTRUMENT=dex2oat,art_runtime
# or
make ANDROID_PGO_INSTRUMENT=ALL</pre>
<p>The second command builds <strong>all</strong> PGO-enabled modules for
profiling.</p>
</li>
<li>Run the benchmarks exercising <code>dex2oat</code> and
<code>art_runtime</code>, then convert the resulting <code>.profraw</code>
files to obtain:
<ul>
<li>Three profiles from <code>dex2oat</code>
(<code>dex2oat_exe.profdata</code>,
<code>dex2oat_libart-compiler.profdata</code>, and
<code>dex2oat_libart.profdata</code>), identified using the method
described in <a href="#handling-llvm-profile-files">Handling LLVM profile
files</a>.</li>
<li>A single <code>art_runtime_libart.profdata</code>.</li>
</ul>
</li>
<li>Produce a common profdata file for the <code>dex2oat</code> executable and
<code>libart-compiler.so</code> using:
<pre class="prettyprint">llvm-profdata merge -output=dex2oat.profdata \
dex2oat_exe.profdata dex2oat_libart-compiler.profdata</pre>
</li>
<li>Obtain the profile for <code>libart.so</code> by merging the profiles
from the two benchmarks:
<pre class="prettyprint">llvm-profdata merge -output=libart.profdata \
dex2oat_libart.profdata art_runtime_libart.profdata</pre>
<p>The raw counts for <code>libart.so</code> from the two profiles might be
disparate because the benchmarks differ in the number of test cases and the
duration for which they run. In this case, you can use a weighted merge:</p>
<pre class="prettyprint">llvm-profdata merge -output=libart.profdata \
-weighted-input=2,dex2oat_libart.profdata \
-weighted-input=1,art_runtime_libart.profdata</pre>
<p>The above command assigns twice the weight to the profile from
<code>dex2oat</code>. The actual weight should be determined based on domain
knowledge or experimentation.</p>
</li>
<li>Check the profile files <code>dex2oat.profdata</code> and
<code>libart.profdata</code> into <code>toolchain/pgo-profiles</code> for
use during build.</li>
</ol>
</body>
</html>