<html devsite>
<head>
<title>Android 8.0 ART Improvements</title>
<meta name="project_path" value="/_project.yaml" />
<meta name="book_path" value="/_book.yaml" />
</head>
<body>
<!--
Copyright 2017 The Android Open Source Project
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<p>
The Android runtime (ART) has been improved significantly in the Android 8.0
release. The list below summarizes enhancements device manufacturers can expect
in ART.
</p>
<h2 id="concurrent-compacting-gc">Concurrent compacting garbage collector</h2>
<p>
As announced at Google I/O, ART features a new concurrent compacting garbage
collector (GC) in Android 8.0. This collector compacts the heap every time GC
runs and while the app is running, with only one short pause for processing
thread roots. Here are its benefits:
</p>
<ul>
<li>
GC always compacts the heap: 32% smaller heap sizes on average compared to
Android 7.0.
</li>
<li>
Compaction enables thread local bump pointer object allocation: Allocations
are 70% faster than in Android 7.0.
</li>
<li>
Offers 85% smaller pause times for the H2 benchmark compared to the Android
7.0 GC.
</li>
<li>
Pause times no longer scale with heap size; apps should be able to use large
heaps without worrying about jank.
</li>
<li>GC implementation detail - Read barriers:
<ul>
<li>
Read barriers are a small amount of work done for each object field read.
</li>
<li>
These are optimized in the compiler, but might slow down some use cases.
</li>
</ul>
</ul>
<h2 id="loop-optimizations">Loop optimizations</h2>
<p>
A wide variety of loop optimizations are employed by ART in the Android 8.0
release:
</p>
<ul>
<li>Bounds check eliminations
<ul>
<li>Static: ranges are proven to be within bounds at compile-time</li>
<li>
Dynamic: run-time tests ensure loops stay within bounds (deopt otherwise)
</li>
</ul>
</li>
<li>Induction variable eliminations
<ul>
<li>Remove dead induction</li>
<li>
Replace induction that is used only after the loop by closed-form
expressions
</li>
</ul>
</li>
<li>
Dead code elimination inside the loop-body, removal of whole loops that
become dead
</li>
<li>Strength reduction</li>
<li>
Loop transformations: reversal, interchanging, splitting, unrolling,
unimodular, etc.
</li>
<li>SIMDization (also called vectorization)</li>
</ul>
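<p>
As a hedged illustration (the method and variable names are hypothetical, not
from ART's sources), the following loops are typical candidates for the
optimizations above: in <code>sum</code> the index is provably in range, so
static bounds check elimination applies and the simple reduction is a
SIMDization candidate; in <code>sumPrefix</code> the compiler can instead emit
one run-time test before the loop and deoptimize if it fails.
</p>

```java
public class LoopDemo {
    // Static bounds check elimination: i is provably in [0, a.length),
    // so a[i] needs no per-iteration range check. The reduction is also
    // a candidate for vectorization (SIMDization).
    static int sum(int[] a) {
        int s = 0;
        for (int i = 0; i < a.length; i++) {
            s += a[i];
        }
        return s;
    }

    // Dynamic bounds check elimination: a single run-time test that
    // n <= a.length can guard the whole loop, with deoptimization as
    // the fallback if the test fails.
    static int sumPrefix(int[] a, int n) {
        int s = 0;
        for (int i = 0; i < n; i++) {
            s += a[i];
        }
        return s;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4, 5};
        System.out.println(sum(data));
        System.out.println(sumPrefix(data, 3));
    }
}
```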
<p>
The loop optimizer resides in its own optimization pass in the ART compiler.
Most loop optimizations are similar to optimizations and simplifications
performed elsewhere in the compiler. Challenges arise with optimizations that
rewrite the CFG in unusually elaborate ways, because most CFG utilities (see
nodes.h) focus on building a CFG rather than rewriting one.
</p>
<h2 id="class-hierarchy-analysis">Class hierarchy analysis</h2>
<p>
ART in Android 8.0 uses Class Hierarchy Analysis (CHA), a compiler optimization
that devirtualizes virtual calls into direct calls based on information
generated by analyzing class hierarchies. Virtual calls are expensive: they are
implemented around a vtable lookup and take a couple of dependent loads.
Virtual calls also cannot be inlined.
</p>
<p>Here is a summary of related enhancements:</p>
<ul>
<li>
Dynamic single-implementation method status updating - At the end of class
linking, when the vtable has been populated, ART conducts an entry-by-entry
comparison with the vtable of the superclass.
</li>
<li>
Compiler optimization - The compiler takes advantage of the
single-implementation information of a method. If method A.foo has the
single-implementation flag set, the compiler devirtualizes the virtual call
into a direct call, and further tries to inline the direct call.
</li>
<li>
Compiled code invalidation - Also at the end of class linking, when
single-implementation information is updated, if method A.foo previously had
single-implementation status but that status is now invalidated, all compiled
code that depends on the assumption that A.foo has a single implementation
must be invalidated.
</li>
<li>
Deoptimization - For live compiled code on the stack, deoptimization is
initiated to force the invalidated compiled code into interpreter mode to
guarantee correctness. A new deoptimization mechanism, a hybrid of synchronous
and asynchronous deoptimization, is used.
</li>
</ul>
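<p>
The class hierarchy below is a hypothetical sketch of the shape of code CHA
benefits from: while <code>Circle</code> is the only loaded subclass of
<code>Shape</code>, <code>Shape.area()</code> is marked single-implementation,
so the virtual call in <code>totalArea()</code> can be devirtualized to
<code>Circle.area()</code> and inlined. Loading another subclass later
invalidates that compiled code and triggers deoptimization.
</p>

```java
abstract class Shape {
    abstract double area();
}

// The only implementation loaded so far: Shape.area() has
// single-implementation status, enabling devirtualization.
class Circle extends Shape {
    final double r;
    Circle(double r) { this.r = r; }
    @Override double area() { return Math.PI * r * r; }
}

public class ChaDemo {
    static double totalArea(Shape[] shapes) {
        double total = 0;
        for (Shape s : shapes) {
            // Virtual call: devirtualized (and inlinable) while the
            // single-implementation assumption holds.
            total += s.area();
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(totalArea(new Shape[] { new Circle(1), new Circle(2) }));
    }
}
```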
<h2 id="inline-caches-in-oat-files">Inline caches in .oat files</h2>
<p>
ART now employs inline caches and optimizes call sites for which enough data
exists. The inline caches feature records additional runtime information in
profiles and uses it to add dynamic optimizations to ahead-of-time
compilation.
</p>
<h2 id="dexlayout">Dexlayout</h2>
<p>
Dexlayout is a library introduced in Android 8.0 to analyze dex files and
reorder them according to a profile. Dexlayout aims to use runtime profiling
information to reorder sections of the dex file during idle maintenance
compilation on device. By grouping together parts of the dex file that are
often accessed together, programs can have better memory access patterns from
improved locality, saving RAM and shortening start up time.
</p>
<p>
Since profile information is currently available only after apps have been run,
dexlayout is integrated into dex2oat's on-device compilation during idle
maintenance.
</p>
<h2 id="dex-cache-removal">Dex cache removal</h2>
<p>
Up to Android 7.0, the DexCache object owned four large arrays, proportional to
the number of certain elements in the DexFile, namely:
</p>
<ul>
<li>
strings (one reference per DexFile::StringId),
</li>
<li>
types (one reference per DexFile::TypeId),
</li>
<li>
methods (one native pointer per DexFile::MethodId),
</li>
<li>
fields (one native pointer per DexFile::FieldId).
</li>
</ul>
<p>
These arrays were used for fast retrieval of objects that were previously
resolved. In Android 8.0, all arrays except the methods array have been
removed.
</p>
<h2 id="interpreter-performance">Interpreter performance</h2>
<p>
Interpreter performance significantly improved in the Android 7.0 release with
the introduction of "mterp" - an interpreter featuring a core
fetch/decode/interpret mechanism written in assembly language. Mterp is
modelled after the fast Dalvik interpreter and supports arm, arm64, x86,
x86_64, mips, and mips64. For computational code, ART's mterp is roughly
comparable to Dalvik's fast interpreter. However, in some areas it fared
significantly - and even dramatically - worse:
</p>
<ol>
<li>Invoke performance.</li>
<li>
String manipulation and other heavy users of methods recognized as intrinsics
in Dalvik.
</li>
<li>Higher stack memory usage.</li>
</ol>
<p>Android 8.0 addresses these issues.</p>
<h2 id="more-inlining">More inlining</h2>
<p>
Since Android 6.0, ART has been able to inline any call within the same dex
file, but could inline only leaf methods from different dex files. There were
two reasons for this limitation:
</p>
<ol>
<li>
Inlining from another dex file requires using the dex cache of that other dex
file, unlike same-dex-file inlining, which can simply reuse the dex cache of
the caller. The dex cache is needed in compiled code for a couple of
instructions, such as static calls, string loads, and class loads.
</li>
<li>
The stack maps only encode a method index within the current dex file.
</li>
</ol>
<p>To address these limitations, Android 8.0:</p>
<ol>
<li>Removes dex cache access from compiled code (also see section "Dex cache
removal")</li>
<li>Extends stack map encoding.</li>
</ol>
<h2 id="synchronization-improvements">Synchronization improvements</h2>
<p>
The ART team tuned the MonitorEnter/MonitorExit code paths and reduced reliance
on traditional memory barriers on ARMv8, replacing them with newer
acquire/release instructions where possible.
</p>
<h2 id="faster-native-methods">Faster native methods</h2>
<p>
Faster native calls to the Java Native Interface (JNI) are available using
the <a class="external"
href="https://android.googlesource.com/platform/libcore/+/master/dalvik/src/main/java/dalvik/annotation/optimization/FastNative.java"
><code>@FastNative</code></a> and <a class="external"
href="https://android.googlesource.com/platform/libcore/+/master/dalvik/src/main/java/dalvik/annotation/optimization/CriticalNative.java"
><code>@CriticalNative</code></a> annotations. These built-in ART runtime
optimizations speed up JNI transitions and replace the now deprecated
<em>!bang&nbsp;JNI</em> notation. The annotations have no effect on non-native
methods and are only available to platform Java Language code on the
<code>bootclasspath</code> (no Play Store updates).
</p>
<p>
The <code>@FastNative</code> annotation supports non-static methods. Use this
if a method accesses a <code>jobject</code> as a parameter or return value.
</p>
<p>
The <code>@CriticalNative</code> annotation provides an even faster way to run
native methods, with the following restrictions:
</p>
<ul>
<li>
Methods must be static—no objects for parameters, return values, or an
implicit <code>this</code>.
</li>
<li>Only primitive types are passed to the native method.</li>
<li>
The native method does not use the <code>JNIEnv</code> and
<code>jclass</code> parameters in its function definition.
</li>
<li>
The method must be registered with <code>RegisterNatives</code> instead of
relying on dynamic JNI linking.
</li>
</ul>
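<p>
As a hedged sketch of how these annotations are applied, consider the
hypothetical class below. The annotations live in
<code>dalvik.annotation.optimization</code> and are available only to
bootclasspath code, so this does not compile against the public SDK; the class,
method, and library names are invented for illustration.
</p>

```java
import dalvik.annotation.optimization.CriticalNative;
import dalvik.annotation.optimization.FastNative;

public final class NativeMath {
    static {
        System.loadLibrary("nativemath"); // hypothetical library name
        // The @CriticalNative method must be registered from the native side
        // with RegisterNatives rather than relying on dynamic JNI linking.
    }

    // @FastNative: may be non-static and may take or return objects
    // (jobject on the native side).
    @FastNative
    public native String normalize(String input);

    // @CriticalNative: static, primitives only, and the native function
    // receives no JNIEnv* or jclass parameters.
    @CriticalNative
    public static native int add(int a, int b);
}
```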
<aside class="caution">
<p>
The <code>@FastNative</code> and <code>@CriticalNative</code> annotations
disable garbage collection while a native method is executing. Do not use them
with long-running methods, including methods that are usually fast but are
unbounded in general.
</p>
<p>
Because garbage collection is paused, deadlock may occur. Do not acquire locks
during a fast native call unless the locks are released before returning to
managed code. This does not apply to regular JNI calls, since ART considers the
executing native code suspended.
</p>
</aside>
<p>
<code>@FastNative</code> can improve native method performance by up to 3x, and
<code>@CriticalNative</code> by up to 5x. For example, JNI transitions measured
on a Nexus 6P device:
</p>
<table>
<tr>
<th>Java Native Interface (JNI) invocation</th>
<th>Execution time (in nanoseconds)</th>
</tr>
<tr>
<td>Regular JNI</td>
<td>115</td>
</tr>
<tr>
<td><em>!bang JNI</em></td>
<td>60</td>
</tr>
<tr>
<td><code>@FastNative</code></td>
<td>35</td>
</tr>
<tr>
<td><code>@CriticalNative</code></td>
<td>25</td>
</tr>
</table>
</body>
</html>