| <html devsite> |
| <head> |
| <title>Using Profile-Guided Optimization (PGO)</title> |
| <meta name="project_path" value="/_project.yaml"> |
| <meta name="book_path" value="/_book.yaml"> |
| </head> |
| |
| <body> |
| <!-- |
| Copyright 2018 The Android Open Source Project |
| |
| Licensed under the Apache License, Version 2.0 (the "License"); |
| you may not use this file except in compliance with the License. |
| You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| --> |
| |
| |
| <p>The Android build system supports using Clang's <a href= |
| "https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization">profile-guided |
| optimization (PGO)</a> on native Android modules that have <a href= |
| "https://android.googlesource.com/platform/build/soong/">blueprint</a> build |
| rules. This page describes Clang PGO, how to continually generate and update |
| profiles used for PGO, and how to integrate PGO with the build system, |
| including a case study.</p> |
| |
| |
| <h2 id="about-clang-pgo">About Clang PGO</h2> |
| |
| |
| <p>Clang can perform profile-guided optimization using two types of |
| profiles:</p> |
| |
| |
| <ul> |
| <li><strong>Instrumentation-based profiles</strong> are generated from an |
| instrumented target program. These profiles are detailed and impose a high |
| runtime overhead.</li> |
| |
| |
| <li><strong>Sampling-based profiles</strong> are typically produced by |
| sampling hardware counters. They impose a low runtime overhead, and can be |
| collected without any instrumentation or modification to the binary. They |
| are less detailed than instrumentation-based profiles.</li> |
| </ul> |
| |
| |
| <p>All profiles should be generated from a representative workload that |
| exercises the typical behavior of the application. While Clang supports both |
| AST-based (<code>-fprofile-instr-generate</code>) and LLVM IR-based |
| (<code>-fprofile-generate</code>) instrumentation, Android supports only the |
| LLVM IR-based variant for instrumentation-based PGO.</p> |
| |
| |
| <p>The following flags are needed to build for profile collection:</p> |
| |
| |
| <ul> |
| <li><code>-fprofile-generate</code> for IR-based instrumentation. With this |
| option, the backend uses a weighted minimal spanning tree approach to |
| reduce the number of instrumentation points and to place them on |
| low-weight edges. Use this option for the link step as well: the Clang |
| driver then automatically passes the profiling runtime |
| (<code>libclang_rt.profile-<em>arch</em>-android.a</code>) to the linker. |
| This library contains routines to write the profiles to disk upon program |
| exit.</li> |
| |
| |
| <li><code>-gline-tables-only</code> for sampling-based profile collection |
| to generate minimal debug information.</li> |
| </ul> |
| |
| |
| <p>A profile can be used for PGO using |
| <code>-fprofile-instr-use=<em>pathname</em></code> or |
| <code>-fprofile-sample-use=<em>pathname</em></code> for instrumentation-based |
| and sampling-based profiles respectively.</p> |
| |
| |
| <p><strong>Note:</strong> As changes are made to the code, if Clang can no |
| longer use the profile data, it generates a |
| <code>-Wprofile-instr-out-of-date</code> warning.</p> |
| |
| |
| <h2 id="using-pgo">Using PGO</h2> |
| |
| |
| <p>Using PGO involves the following steps:</p> |
| |
| |
| <ol> |
| <li>Build the library/executable with instrumentation by passing |
| <code>-fprofile-generate</code> to the compiler and linker.</li> |
| |
| |
| <li>Collect profiles by running a representative workload on the |
| instrumented binary.</li> |
| |
| |
| <li>Post-process the profiles using the <code>llvm-profdata</code> utility |
| (for details, see <a href="#handling-llvm-profile-files">Handling LLVM |
| profile files</a>).</li> |
| |
| |
| <li>Use the profiles to apply PGO by passing |
| <code>-fprofile-use=<em>profile</em>.profdata</code> to the compiler and |
| linker.</li> |
| </ol> |
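| |
| <p>Outside the Android build system, the four steps above can be sketched as |
| a small shell function. This is only an illustration under stated |
| assumptions: the file names (<code>main.cpp</code>, <code>main_instr</code>, |
| <code>main_opt</code>, <code>pgo-raw</code>, <code>main.profdata</code>) and |
| the <code>run</code> and <code>pgo_cycle</code> helpers are hypothetical, and |
| <code>clang++</code> and <code>llvm-profdata</code> are assumed to be on the |
| <code>PATH</code>.</p> |

```shell
#!/bin/sh
# Sketch of one PGO cycle on a host machine. All file names are hypothetical.

# Print each command; run it unless DRY_RUN is set to a non-empty value.
run() {
    echo "+ $*"
    if [ -z "${DRY_RUN:-}" ]; then
        "$@"
    fi
}

pgo_cycle() {
    src="$1"
    # 1. Build with IR-based instrumentation; raw profiles go to ./pgo-raw.
    run clang++ -O2 -fprofile-generate=pgo-raw "$src" -o main_instr || return 1
    # 2. Run a representative workload to produce .profraw files.
    run ./main_instr || return 1
    # 3. Post-process the raw profiles into a single .profdata file.
    run llvm-profdata merge -output=main.profdata pgo-raw/*.profraw || return 1
    # 4. Rebuild using the profile.
    run clang++ -O2 -fprofile-use=main.profdata "$src" -o main_opt || return 1
}
```

| <p>Calling <code>pgo_cycle main.cpp</code> with <code>DRY_RUN=1</code> set |
| only prints the four commands, which is a cheap way to review them before |
| running a real build.</p> |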
| |
| |
| <p>For PGO in Android, profiles should be collected offline and checked in |
| alongside the code to ensure reproducible builds. The profiles can be used as |
| code evolves, but must be regenerated periodically (or whenever Clang warns |
| that the profiles are stale).</p> |
| |
| |
| <h3 id="collecting-profiles">Collecting profiles</h3> |
| |
| |
| <p>Clang can use profiles collected by running benchmarks using an |
| instrumented build of the library or by sampling hardware counters when the |
| benchmark is run. At this time, Android does not support using sampling-based |
| profile collection, so you must collect profiles using an instrumented |
| build:</p> |
| |
| |
| <ol> |
| <li>Identify a benchmark and the set of libraries collectively exercised by |
| that benchmark.</li> |
| |
| |
| <li>Add <code>pgo</code> properties to the benchmark and libraries (details |
| below).</li> |
| |
| |
| <li>Produce an Android build with an instrumented copy of these libraries |
| using: |
| |
| <pre class="prettyprint">make ANDROID_PGO_INSTRUMENT=benchmark</pre> |
| </li> |
| </ol> |
| |
| |
| <p><code><em>benchmark</em></code> is a placeholder that identifies the |
| collection of libraries instrumented during build. The actual representative |
| inputs (and possibly another executable that links against a library being |
| benchmarked) are not specific to PGO and are beyond the scope of this |
| document.</p> |
| |
| |
| <ol start="4"> |
| <li>Flash or sync the instrumented build on a device.</li> |
| |
| |
| <li>Run the benchmark to collect profiles.</li> |
| |
| |
| <li>Use the <code>llvm-profdata</code> tool (discussed below) to |
| post-process the profiles and make them ready to be checked into the source |
| tree.</li> |
| </ol> |
| |
| |
| <h3 id="using-profiles-during-build">Using profiles during build</h3> |
| |
| |
| <p>Check the profiles into <code>toolchain/pgo-profiles</code> in an Android |
| tree. The name should match what is specified in the |
| <code>profile_file</code> sub-property of the <code>pgo</code> property for |
| the library. The build system automatically passes the profile file to Clang |
| when building the library. The <code>ANDROID_PGO_NO_PROFILE_USE</code> |
| environment variable can be set to <strong><code>true</code></strong> to |
| temporarily disable PGO and measure its performance benefit.</p> |
| |
| |
| <p>To specify additional product-specific profile directories, append them to |
| the <code>PGO_ADDITIONAL_PROFILE_DIRECTORIES</code> make variable in a |
| <code>BoardConfig.mk</code>. If additional paths are specified, profiles in |
| these paths override those in <code>toolchain/pgo-profiles</code>.</p> |
| |
| |
| <p>When generating a release image using the <code>dist</code> target to |
| <code>make</code>, the build system writes the names of missing profile files |
| to <code>$DIST_DIR/pgo_profile_file_missing.txt</code>. You can check this |
| file to see what profile files were accidentally dropped (which silently |
| disables PGO).</p> |
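| |
| <p>A release script could turn this check into a hard failure. The sketch |
| below assumes only what is described above, namely that the file lists the |
| missing profiles; the <code>check_missing_profiles</code> helper name and |
| the wording of its messages are hypothetical.</p> |

```shell
#!/bin/sh
# Report profile files that the build could not find. Returns 0 when
# nothing is listed and 1 when at least one profile is reported missing.
check_missing_profiles() {
    dist_dir="$1"
    missing="$dist_dir/pgo_profile_file_missing.txt"
    # -s is true only if the file exists and is not empty.
    if [ -s "$missing" ]; then
        echo "PGO profiles missing (PGO is disabled for these modules):" >&2
        cat "$missing" >&2
        return 1
    fi
    echo "No missing PGO profiles reported in $dist_dir"
}
```

| <p>For example, <code>check_missing_profiles "$DIST_DIR" || exit 1</code> |
| after the <code>dist</code> build stops the script when a profile was |
| dropped.</p> |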
| |
| |
| <h2 id="enabling-pgo-in-android-bp-files">Enabling PGO in Android.bp |
| files</h2> |
| |
| |
| <p>To enable PGO in <code>Android.bp</code> files for native modules, simply |
| specify the <code>pgo</code> property. This property has the following |
| sub-properties:</p> |
| |
| |
| <table> |
| <tr> |
| <th><strong>Property</strong> |
| </th> |
| |
| <th><strong>Description</strong> |
| </th> |
| </tr> |
| |
| |
| <tr> |
| <td><code>instrumentation</code> |
| </td> |
| |
| <td>Set to <code>true</code> for PGO using instrumentation. Default is |
| <code>false</code>.</td> |
| </tr> |
| |
| |
| <tr> |
| <td><code>sampling</code> |
| </td> |
| |
| <td><strong>Currently unsupported.</strong> Set to <code>true</code> for |
| PGO using sampling. Default is <code>false</code>.</td> |
| </tr> |
| |
| |
| <tr> |
| <td><code>benchmarks</code> |
| </td> |
| |
| <td>List of strings. This module is built for profiling if any benchmark |
| in the list is specified in the <code>ANDROID_PGO_INSTRUMENT</code> build |
| option.</td> |
| </tr> |
| |
| |
| <tr> |
| <td><code>profile_file</code> |
| </td> |
| |
| <td>Profile file (relative to <code>toolchain/pgo-profiles</code>) to use |
| with PGO. If this file doesn't exist, the build warns by adding its name |
| to <code>$DIST_DIR/pgo_profile_file_missing.txt</code> |
| <em>unless</em> the <code>enable_profile_use</code> property is set to |
| <code>false</code> <strong>OR</strong> the |
| <code>ANDROID_PGO_NO_PROFILE_USE</code> build variable is set to |
| <code>true</code>.</td> |
| </tr> |
| |
| |
| <tr> |
| <td><code>enable_profile_use</code> |
| </td> |
| |
| <td>Set to <code>false</code> if profiles should not be used during |
| build. Can be used during bootstrap to enable profile collection or to |
| temporarily disable PGO. Default is <code>true</code>.</td> |
| </tr> |
| |
| |
| <tr> |
| <td><code>cflags</code> |
| </td> |
| |
| <td>List of additional flags to use during an instrumented build.</td> |
| </tr> |
| </table> |
| |
| |
| <p>Example of a module with PGO:</p> |
| |
| <pre class="prettyprint">cc_library { |
| name: "libexample", |
| srcs: [ |
| "src1.cpp", |
| "src2.cpp", |
| ], |
| static_libs: [ |
| "libstatic1", |
| "libstatic2", |
| ], |
| shared_libs: [ |
| "libshared1", |
| ], |
| pgo: { |
| instrumentation: true, |
| benchmarks: [ |
| "benchmark1", |
| "benchmark2", |
| ], |
| profile_file: "example.profdata", |
| } |
| } |
| </pre> |
| |
| |
| |
| <p>If the benchmarks <code>benchmark1</code> and <code>benchmark2</code> |
| exercise representative behavior for libraries <code>libstatic1</code>, |
| <code>libstatic2</code>, or <code>libshared1</code>, the <code>pgo</code> |
| property of these libraries can also include the benchmarks. The |
| <code>defaults</code> module in <code>Android.bp</code> can include a common |
| <code>pgo</code> specification for a set of libraries to avoid repeating the |
| same build rules for several modules.</p> |
| |
| |
| <p>To select different profile files or selectively disable PGO for an |
| architecture, specify the <code>profile_file</code>, |
| <code>enable_profile_use</code>, and <code>cflags</code> properties per |
| architecture. Example (with architecture target in |
| <strong>bold</strong>):</p> |
| |
| <pre class="prettyprint">cc_library { |
| name: "libexample", |
| srcs: [ |
| "src1.cpp", |
| "src2.cpp", |
| ], |
| static_libs: [ |
| "libstatic1", |
| "libstatic2", |
| ], |
| shared_libs: [ |
| "libshared1", |
| ], |
| pgo: { |
| instrumentation: true, |
| benchmarks: [ |
| "benchmark1", |
| "benchmark2", |
| ], |
| }, |
| |
| <strong>target: { |
| android_arm: { |
| pgo: { |
| profile_file: "example_arm.profdata", |
| } |
| }, |
| android_arm64: { |
| pgo: { |
| profile_file: "example_arm64.profdata", |
| } |
| } |
| } |
| }</strong> |
| </pre> |
| |
| |
| <p>To resolve references to the profiling runtime library during |
| instrumentation-based profiling, pass the build flag |
| <code>-fprofile-generate</code> to the linker. If a static library is |
| instrumented for PGO, every shared library and executable that directly |
| depends on that static library must also be instrumented for PGO. However, |
| such shared libraries or executables don't need to use PGO profiles, and their |
| <code>enable_profile_use</code> property can be set to <code>false</code>. |
| Outside of this restriction, you can apply PGO to any static library, shared |
| library, or executable.</p> |
| |
| |
| <h2 id="handling-llvm-profile-files">Handling LLVM profile files</h2> |
| |
| |
| <p>Executing an instrumented library or executable produces a profile file |
| named <code>default_<em>unique_id</em>_0.profraw</code> in |
| <code>/data/local/tmp</code> (where <code><em>unique_id</em></code> is a |
| numeric hash that is unique to this library). If this file already exists, |
| the profiling runtime merges the new profile with the old one while writing |
| the profiles. To change the location of the profile file, set the |
| <code>LLVM_PROFILE_FILE</code> environment variable at runtime.</p> |
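| |
| <p>As a rough sketch of how this might look from a host machine, the function |
| below runs an instrumented benchmark over <code>adb</code> with |
| <code>LLVM_PROFILE_FILE</code> pointing at a separate directory and then pulls |
| that directory. The device directory, the benchmark path, and the |
| <code>run</code> and <code>collect_profiles</code> helpers are hypothetical. |
| The <code>%m</code> pattern is used on the assumption that it expands to a |
| per-binary ID, so that each instrumented library writes its own file.</p> |

```shell
#!/bin/sh
# Run an instrumented benchmark on a device and pull the raw profiles.
# The benchmark path and the device directory are hypothetical.

# Print each command; run it unless DRY_RUN is set to a non-empty value.
run() {
    echo "+ $*"
    if [ -z "${DRY_RUN:-}" ]; then
        "$@"
    fi
}

collect_profiles() {
    benchmark="$1"                       # path of the benchmark on the device
    host_dir="$2"                        # where to store the pulled profiles
    device_dir=/data/local/tmp/pgo-raw   # hypothetical output directory

    run adb shell "mkdir -p $device_dir" || return 1
    # Redirect the profiles of this run away from the default location.
    run adb shell "LLVM_PROFILE_FILE=$device_dir/default_%m.profraw $benchmark" || return 1
    # Copy the whole directory of .profraw files to the host.
    run adb pull "$device_dir" "$host_dir" || return 1
}
```

| <p>For example, <code>collect_profiles /data/local/tmp/my_benchmark |
| ./profraw</code> leaves the raw profiles in <code>./profraw</code>, ready for |
| <code>llvm-profdata</code>.</p> |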
| |
| |
| <p>The <code><a href= |
| "https://llvm.org/docs/CommandGuide/llvm-profdata.html">llvm-profdata</a></code> |
| utility is then used to convert the <code>.profraw</code> file (and possibly |
| merge multiple <code>.profraw</code> files) to a <code>.profdata</code> |
| file:</p> |
| |
| <pre class="prettyprint"> |
| llvm-profdata merge -output=profile.profdata <.profraw and/or .profdata files></pre> |
| |
| <p><code><em>profile.profdata</em></code> can then be checked into the source |
| tree for use during build.</p> |
| |
| |
| <p>If multiple instrumented binaries/libraries are loaded during a benchmark, |
| each library generates a separate <code>.profraw</code> file with a separate |
| unique ID. Typically, all of these files can be merged into a single |
| <code>.profdata</code> file and used for a PGO build. In cases where a library |
| is exercised by another benchmark, that library must be optimized using |
| profiles from both benchmarks. In this situation, the <code>show</code> |
| option of <code>llvm-profdata</code> is useful:</p> |
| |
| <pre class="prettyprint"> |
| llvm-profdata merge -output=default_unique_id.profdata default_unique_id_0.profraw |
| llvm-profdata show -all-functions default_unique_id.profdata</pre> |
| |
| <p>To map <em>unique_id</em>s to individual libraries, search the |
| <code>show</code> output for each <em>unique_id</em> for a function name that |
| is unique to the library.</p> |
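| |
| <p>The search can be automated along the following lines. This is a sketch, |
| not part of the Android tooling: the <code>find_profiles_with_function</code> |
| helper and the <code>LLVM_PROFDATA</code> override variable are hypothetical, |
| and the function name you search for may need to be given in its mangled |
| form for C++ code.</p> |

```shell
#!/bin/sh
# Print the .profraw files in a directory whose profile mentions a given
# function name. LLVM_PROFDATA may point at a specific llvm-profdata binary.
find_profiles_with_function() {
    dir="$1"
    func="$2"
    tool="${LLVM_PROFDATA:-llvm-profdata}"
    for raw in "$dir"/*.profraw; do
        # Skip the unexpanded pattern when the directory has no .profraw files.
        [ -e "$raw" ] || continue
        indexed="${raw%.profraw}.profdata"
        # Convert the raw profile, as in the merge command above.
        "$tool" merge -output="$indexed" "$raw" || return 1
        # -F matches the function name as a fixed string, not a regex.
        if "$tool" show -all-functions "$indexed" | grep -qF -- "$func"; then
            echo "$raw"
        fi
    done
}
```

| <p>Running <code>find_profiles_with_function ./profraw |
| <em>function_name</em></code> prints every raw profile that contains the |
| function, which identifies the library that produced it when the name is |
| unique to that library.</p> |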
| |
| |
| <h2 id="case-study-pgo-for-art">Case Study: PGO for ART</h2> |
| |
| |
| <p><em>The case study presents ART as a relatable example; however, it is not |
| an accurate description of the actual set of libraries profiled for ART or |
| their interdependencies.</em> |
| </p> |
| |
| |
| <p>The <code>dex2oat</code> ahead-of-time compiler in ART depends on |
| <code>libart-compiler.so</code>, which in turn depends on |
| <code>libart.so</code>. The ART runtime is implemented mainly in |
| <code>libart.so</code>. Benchmarks for the compiler and the runtime will be |
| different:</p> |
| |
| |
| <table> |
| <tr> |
| <th><strong>Benchmark</strong> |
| </th> |
| |
| <th><strong>Profiled libraries</strong> |
| </th> |
| </tr> |
| |
| |
| <tr> |
| <td><code>dex2oat</code> |
| </td> |
| |
| <td><code>dex2oat</code> (executable), <code>libart-compiler.so</code>, |
| <code>libart.so</code></td> |
| </tr> |
| |
| |
| <tr> |
| <td><code>art_runtime</code> |
| </td> |
| |
| <td><code>libart.so</code> |
| </td> |
| </tr> |
| </table> |
| |
| |
| <ol> |
| <li>Add the following <code>pgo</code> property to <code>dex2oat</code> and |
| <code>libart-compiler.so</code>: |
| |
| <pre class="prettyprint"> pgo: { |
| instrumentation: true, |
| benchmarks: ["dex2oat",], |
| profile_file: "dex2oat.profdata", |
| }</pre> |
| </li> |
| |
| <li>Add the following <code>pgo</code> property to <code>libart.so</code>: |
| |
| <pre class="prettyprint"> pgo: { |
| instrumentation: true, |
| benchmarks: ["art_runtime", "dex2oat",], |
| profile_file: "libart.profdata", |
| }</pre> |
| </li> |
| |
| <li>Create instrumented builds for the <code>dex2oat</code> and |
| <code>art_runtime</code> benchmarks using: |
| |
| <pre class="prettyprint"> make ANDROID_PGO_INSTRUMENT=dex2oat |
| make ANDROID_PGO_INSTRUMENT=art_runtime</pre> |
| |
| |
| <p>Alternatively, create a single instrumented build with all libraries |
| instrumented using:</p> |
| |
| <pre class="prettyprint"> make ANDROID_PGO_INSTRUMENT=dex2oat,art_runtime |
| (or) |
| make ANDROID_PGO_INSTRUMENT=ALL</pre> |
| |
| <p>The second command builds <strong>all</strong> PGO-enabled modules for |
| profiling.</p> |
| </li> |
| |
| <li>Run the benchmarks exercising <code>dex2oat</code> and |
| <code>art_runtime</code> to obtain: |
| |
| <ul> |
| |
| <li>Three profiles from <code>dex2oat</code> |
| (<code>dex2oat_exe.profdata</code>, |
| <code>dex2oat_libart-compiler.profdata</code>, and |
| <code>dex2oat_libart.profdata</code>), each converted from its |
| <code>.profraw</code> file and identified using the method |
| described in <a href="#handling-llvm-profile-files">Handling LLVM profile |
| files</a>.</li> |
| |
| <li>A single <code>art_runtime_libart.profdata</code>.</li> |
| </ul> |
| </li> |
| |
| <li>Produce a common profdata file for <code>dex2oat</code> executable and |
| <code>libart-compiler.so</code> using: |
| |
| <pre class="prettyprint">llvm-profdata merge -output=dex2oat.profdata \ |
| dex2oat_exe.profdata dex2oat_libart-compiler.profdata</pre> |
| </li> |
| |
| <li>Obtain the profile for <code>libart.so</code> by merging the profiles |
| from the two benchmarks: |
| |
| <pre class="prettyprint">llvm-profdata merge -output=libart.profdata \ |
| dex2oat_libart.profdata art_runtime_libart.profdata</pre> |
| |
| <p>The raw counts for <code>libart.so</code> from the two profiles might be |
| disparate because the benchmarks differ in the number of test cases and the |
| duration for which they run. In this case, you can use a weighted merge:</p> |
| |
| <pre class="prettyprint">llvm-profdata merge -output=libart.profdata \ |
| -weighted-input=2,dex2oat_libart.profdata \ |
| -weighted-input=1,art_runtime_libart.profdata</pre> |
| |
| <p>The above command assigns twice the weight to the profile from |
| <code>dex2oat</code>. The actual weight should be determined based on domain |
| knowledge or experimentation.</p> |
| </li> |
| |
| <li>Check the profile files <code>dex2oat.profdata</code> and |
| <code>libart.profdata</code> into <code>toolchain/pgo-profiles</code> for |
| use during build.</li> |
| </ol> |
| </body> |
| </html> |