| <html devsite> |
| <head> |
| <title>Android 8.0 ART Improvements</title> |
| <meta name="project_path" value="/_project.yaml" /> |
| <meta name="book_path" value="/_book.yaml" /> |
| </head> |
| <body> |
| <!-- |
| Copyright 2017 The Android Open Source Project |
| |
| Licensed under the Apache License, Version 2.0 (the "License"); |
| you may not use this file except in compliance with the License. |
| You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| --> |
| |
| <p> |
| The Android runtime (ART) has been improved significantly in the Android 8.0 |
| release. The list below summarizes enhancements device manufacturers can expect |
| in ART. |
| </p> |
| |
| <h2 id="concurrent-compacting-gc">Concurrent compacting garbage collector</h2> |
| |
| <p> |
| As announced at Google I/O, ART features a new concurrent compacting garbage |
| collector (GC) in Android 8.0. This collector compacts the heap every time GC |
| runs and while the app is running, with only one short pause for processing |
| thread roots. Here are its benefits: |
| </p> |
| |
| <ul> |
| <li> |
| GC always compacts the heap: 32% smaller heap sizes on average compared to |
| Android 7.0. |
| </li> |
| <li> |
Compaction enables thread-local bump-pointer object allocation: Allocations
| are 70% faster than in Android 7.0. |
| </li> |
| <li> |
| Offers 85% smaller pause times for the H2 benchmark compared to the Android |
| 7.0 GC. |
| </li> |
| <li> |
| Pause times no longer scale with heap size; apps should be able to use large |
| heaps without worrying about jank. |
| </li> |
| <li>GC implementation detail - Read barriers: |
| <ul> |
| <li> |
| Read barriers are a small amount of work done for each object field read. |
| </li> |
| <li> |
| These are optimized in the compiler, but might slow down some use cases. |
| </li> |
</ul>
</li>
| </ul> |
| |
| <h2 id="loop-optimizations">Loop optimizations</h2> |
| |
| <p> |
| A wide variety of loop optimizations are employed by ART in the Android 8.0 |
| release: |
| </p> |
| |
| <ul> |
| <li>Bounds check eliminations |
| <ul> |
| <li>Static: ranges are proven to be within bounds at compile-time</li> |
| <li> |
| Dynamic: run-time tests ensure loops stay within bounds (deopt otherwise) |
| </li> |
| </ul> |
| </li> |
| <li>Induction variable eliminations |
| <ul> |
| <li>Remove dead induction</li> |
| <li> |
| Replace induction that is used only after the loop by closed-form |
| expressions |
| </li> |
| </ul> |
| </li> |
| <li> |
| Dead code elimination inside the loop-body, removal of whole loops that |
| become dead |
| </li> |
| <li>Strength reduction</li> |
| <li> |
| Loop transformations: reversal, interchanging, splitting, unrolling, |
| unimodular, etc. |
| </li> |
| <li>SIMDization (also called vectorization)</li> |
| </ul> |
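
<p>
For example, in a simple reduction loop like the hypothetical sketch below
(illustrative only, not taken from the ART compiler), the bounds check on the
array access can be proven redundant statically, and the loop body is a
candidate for SIMDization:
</p>

<pre class="prettyprint">
class LoopExample {
  static int sum(int[] data) {
    int total = 0;
    // The index i provably stays within [0, data.length), so the implicit
    // bounds check on data[i] can be eliminated at compile time, and this
    // reduction loop can be vectorized.
    for (int i = 0; i &lt; data.length; i++) {
      total += data[i];
    }
    return total;
  }
}
</pre>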
| |
<p>
The loop optimizer resides in its own optimization pass in the ART compiler.
Most loop optimizations are similar to optimizations and simplifications done
elsewhere in the compiler. Challenges arise with the optimizations that rewrite
the CFG in a more elaborate way than usual, because most CFG utilities (see
nodes.h) focus on building a CFG, not rewriting one.
</p>
| |
| <h2 id="class-hierarchy-analysis">Class hierarchy analysis</h2> |
| |
<p>
ART in Android 8.0 uses Class Hierarchy Analysis (CHA), a compiler optimization
that devirtualizes virtual calls into direct calls based on information
generated by analyzing class hierarchies. Virtual calls are expensive because
they are implemented around a vtable lookup and take a couple of dependent
loads. Virtual calls also cannot be inlined.
</p>
| |
| <p>Here is a summary of related enhancements:</p> |
| |
| <ul> |
<li>
Dynamic single-implementation method status updating - At the end of class
linking time, when the vtable has been populated, ART conducts an
entry-by-entry comparison against the vtable of the superclass.
</li>
<li>Compiler optimization - The compiler takes advantage of the
single-implementation info of a method. If method A.foo has its
single-implementation flag set, the compiler devirtualizes the virtual call
into a direct call and, as a result, further tries to inline the direct call.
</li>
<li>
Compiled code invalidation - Also at the end of class linking time, when
single-implementation info is updated, if the single-implementation status of
a method A.foo is invalidated, all compiled code that depends on the
assumption that A.foo has a single implementation needs to be invalidated.
</li>
<li>
Deoptimization - For invalidated compiled code that is live on the stack,
deoptimization is initiated to force it into interpreter mode to guarantee
correctness. A new deoptimization mechanism that is a hybrid of synchronous
and asynchronous deoptimization is used.
</li>
| </ul> |
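
<p>
As an illustration, consider the hypothetical classes below (not code from ART
itself). If class hierarchy analysis shows that no loaded subclass overrides
A.foo, a call through the base type can be devirtualized and inlined:
</p>

<pre class="prettyprint">
class A {
  int foo() { return 1; }
}

class B extends A {
  // B does not override foo(), so A.foo keeps its single-implementation flag.
}

class Caller {
  static int call(A obj) {
    // With CHA, this virtual call can be devirtualized into a direct call to
    // A.foo and then inlined. If a class that overrides foo() is loaded later,
    // the compiled code is invalidated and, if live on the stack, deoptimized.
    return obj.foo();
  }
}
</pre>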
| |
| <h2 id="inline-caches-in-oat-files">Inline caches in .oat files</h2> |
| |
<p>
ART now employs inline caches and optimizes call sites for which enough data
exists. The inline caches feature records additional runtime information in
profiles and uses it to add dynamic optimizations to ahead-of-time compilation.
</p>
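
<p>
As a hypothetical illustration (the names below are not from ART), the call in
the following sketch is virtual, but if the recorded profile shows that one
receiver type dominates at this call site, the ahead-of-time compiler can
speculatively devirtualize and inline that target behind a type check:
</p>

<pre class="prettyprint">
interface Shape {
  double area();
}

class Circle implements Shape {
  double radius;
  Circle(double radius) { this.radius = radius; }
  public double area() { return Math.PI * radius * radius; }
}

class Measurer {
  static double measure(Shape s) {
    // The inline cache in the profile records which concrete implementations
    // were observed here; if mostly Circle, the compiler can inline
    // Circle.area() behind a class check and fall back otherwise.
    return s.area();
  }
}
</pre>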
| |
| <h2 id="dexlayout">Dexlayout</h2> |
| |
| <p> |
| Dexlayout is a library introduced in Android 8.0 to analyze dex files and |
| reorder them according to a profile. Dexlayout aims to use runtime profiling |
| information to reorder sections of the dex file during idle maintenance |
| compilation on device. By grouping together parts of the dex file that are |
| often accessed together, programs can have better memory access patterns from |
improved locality, saving RAM and shortening startup time.
| </p> |
| |
| <p> |
| Since profile information is currently available only after apps have been run, |
| dexlayout is integrated in dex2oat's on-device compilation during idle |
| maintenance. |
| </p> |
| |
| <h2 id="dex-cache-removal">Dex cache removal</h2> |
| |
| <p> |
| Up to Android 7.0, the DexCache object owned four large arrays, proportional to |
| the number of certain elements in the DexFile, namely: |
| </p> |
| |
| <ul> |
| <li> |
| strings (one reference per DexFile::StringId), |
| </li> |
| <li> |
| types (one reference per DexFile::TypeId), |
| </li> |
| <li> |
| methods (one native pointer per DexFile::MethodId), |
| </li> |
| <li> |
| fields (one native pointer per DexFile::FieldId). |
| </li> |
| </ul> |
| |
<p>
These arrays were used for fast retrieval of previously resolved objects. In
Android 8.0, all arrays have been removed except the methods array.
</p>
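
<p>
Conceptually, the pre-Android 8.0 layout resembled the sketch below. This is
illustrative only; the real DexCache lives in the runtime's native and mirror
code, and the field names here are made up:
</p>

<pre class="prettyprint">
class DexCacheSketch {
  Object[] strings;  // one resolved String reference per DexFile::StringId
  Object[] types;    // one resolved Class reference per DexFile::TypeId
  long[] methods;    // one native pointer per DexFile::MethodId
  long[] fields;     // one native pointer per DexFile::FieldId
}
</pre>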
| |
| <h2 id="interpreter-performance">Interpreter performance</h2> |
| |
<p>
Interpreter performance significantly improved in the Android 7.0 release with
the introduction of "mterp" - an interpreter featuring a core
fetch/decode/interpret mechanism written in assembly language. Mterp is
modeled after the fast Dalvik interpreter, and supports arm, arm64, x86,
x86_64, mips and mips64. For computational code, ART's mterp is roughly
comparable to Dalvik's fast interpreter. However, in some situations it can be
significantly - and even dramatically - slower:
</p>
| |
| <ol> |
| <li>Invoke performance.</li> |
| <li> |
| String manipulation, and other heavy users of methods recognized as |
| intrinsics in Dalvik. |
| </li> |
| <li>Higher stack memory usage.</li> |
| </ol> |
| |
| <p>Android 8.0 addresses these issues.</p> |
| |
| <h2 id="more-inlining">More inlining</h2> |
| |
<p>
Since the Android 6.0 release, ART has been able to inline any call within the
same dex file, but could inline only leaf methods from different dex files.
There were two reasons for this limitation:
</p>
| |
| <ol> |
<li>
Inlining from another dex file requires using the dex cache of that other dex
file, unlike same-dex-file inlining, which can simply reuse the dex cache of
the caller. The dex cache is needed in compiled code for a couple of
instructions such as static calls, string loads, or class loads.
</li>
<li>
Stack maps only encode a method index within the current dex file.
</li>
| </ol> |
| |
| <p>To address these limitations, Android 8.0:</p> |
| |
| <ol> |
<li>Removes dex cache access from compiled code (also see the "Dex cache
removal" section).</li>
| <li>Extends stack map encoding.</li> |
| </ol> |
| |
| <h2 id="synchronization-improvements">Synchronization improvements</h2> |
| |
<p>
The ART team tuned the MonitorEnter/MonitorExit code paths and reduced ART's
reliance on traditional memory barriers on ARMv8, replacing them with newer
(acquire/release) instructions where possible.
</p>
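
<p>
No source changes are needed to benefit; ordinary synchronized code such as
the sketch below exercises the tuned MonitorEnter/MonitorExit paths, which on
ARMv8 now favor acquire/release instructions over full memory barriers where
possible:
</p>

<pre class="prettyprint">
class Counter {
  private final Object lock = new Object();
  private int value;

  int increment() {
    synchronized (lock) {  // monitor-enter
      return ++value;
    }                      // monitor-exit
  }
}
</pre>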
| |
| <h2 id="faster-native-methods">Faster native methods</h2> |
| |
| <p> |
| Faster native calls to the Java Native Interface (JNI) are available using |
| the <a class="external" |
| href="https://android.googlesource.com/platform/libcore/+/master/dalvik/src/main/java/dalvik/annotation/optimization/FastNative.java" |
| ><code>@FastNative</code></a> and <a class="external" |
| href="https://android.googlesource.com/platform/libcore/+/master/dalvik/src/main/java/dalvik/annotation/optimization/CriticalNative.java" |
| ><code>@CriticalNative</code></a> annotations. These built-in ART runtime |
| optimizations speed up JNI transitions and replace the now deprecated |
| <em>!bang JNI</em> notation. The annotations have no effect on non-native |
| methods and are only available to platform Java Language code on the |
| <code>bootclasspath</code> (no Play Store updates). |
| </p> |
| |
| <p> |
| The <code>@FastNative</code> annotation supports non-static methods. Use this |
| if a method accesses a <code>jobject</code> as a parameter or return value. |
| </p> |
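
<p>
For example, a platform class on the <code>bootclasspath</code> might declare
a fast native method as in the hypothetical sketch below (the class and method
names are illustrative):
</p>

<pre class="prettyprint">
import dalvik.annotation.optimization.FastNative;

public final class NativeText {
  // Non-static methods and jobject parameters/return values (such as String)
  // are allowed with @FastNative.
  @FastNative
  public native String normalize(String input);
}
</pre>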
| |
| <p> |
| The <code>@CriticalNative</code> annotation provides an even faster way to run |
| native methods, with the following restrictions: |
| </p> |
| |
| <ul> |
| <li> |
| Methods must be static—no objects for parameters, return values, or an |
| implicit <code>this</code>. |
| </li> |
| <li>Only primitive types are passed to the native method.</li> |
| <li> |
| The native method does not use the <code>JNIEnv</code> and |
| <code>jclass</code> parameters in its function definition. |
| </li> |
| <li> |
| The method must be registered with <code>RegisterNatives</code> instead of |
| relying on dynamic JNI linking. |
| </li> |
| </ul> |
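
<p>
A <code>@CriticalNative</code> declaration that satisfies these restrictions
might look like the hypothetical sketch below (the names are illustrative; the
corresponding native function receives only the primitive arguments, without
<code>JNIEnv</code> or <code>jclass</code>, and is registered with
<code>RegisterNatives</code>):
</p>

<pre class="prettyprint">
import dalvik.annotation.optimization.CriticalNative;

public final class FastMath {
  // Static, primitives only, no implicit this.
  @CriticalNative
  public static native int clamp(int value, int min, int max);
}
</pre>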
| |
| <aside class="caution"> |
| <p> |
The <code>@FastNative</code> and <code>@CriticalNative</code> annotations
disable garbage collection while executing a native method. Do not use them
with long-running methods, including usually fast but generally unbounded
methods.
| </p> |
| <p> |
Pauses to the garbage collection may cause deadlock. Do not acquire locks
during a fast native call unless the locks are released locally (that is,
before returning to managed code). This does not apply to regular JNI calls,
since ART considers the executing native code as suspended.
| </p> |
| </aside> |
| |
| <p> |
| <code>@FastNative</code> can improve native method performance up to 3x, and |
| <code>@CriticalNative</code> up to 5x. For example, a JNI transition measured |
| on a Nexus 6P device: |
| </p> |
| |
| <table> |
| <tr> |
| <th>Java Native Interface (JNI) invocation</th> |
| <th>Execution time (in nanoseconds)</th> |
| </tr> |
| <tr> |
| <td>Regular JNI</td> |
| <td>115</td> |
| </tr> |
| <tr> |
| <td><em>!bang JNI</em></td> |
| <td>60</td> |
| </tr> |
| <tr> |
| <td><code>@FastNative</code></td> |
| <td>35</td> |
| </tr> |
| <tr> |
| <td><code>@CriticalNative</code></td> |
| <td>25</td> |
| </tr> |
| </table> |
| |
| </body> |
| </html> |