| <html devsite> |
| <head> |
| <title>Android 8.0 ART Improvements</title> |
| <meta name="project_path" value="/_project.yaml" /> |
| <meta name="book_path" value="/_book.yaml" /> |
| </head> |
| <body> |
| <!-- |
| Copyright 2017 The Android Open Source Project |
| |
| Licensed under the Apache License, Version 2.0 (the "License"); |
| you may not use this file except in compliance with the License. |
| You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| --> |
| |
| <p> |
| The Android runtime (ART) has been improved significantly in the Android 8.0 |
| release. The list below summarizes enhancements device manufacturers can expect |
| in ART. |
| </p> |
| |
| <h2 id="concurrent-compacting-gc">Concurrent compacting garbage collector</h2> |
| |
| <p> |
| As announced at Google I/O, ART features a new concurrent compacting garbage |
| collector (GC) in Android 8.0. This collector compacts the heap every time GC |
| runs and while the app is running, with only one short pause for processing |
| thread roots. Here are its benefits: |
| </p> |
| |
| <ul> |
| <li> |
| GC always compacts the heap: 32% smaller heap sizes on average compared to |
| Android 7.0. |
| </li> |
| <li> |
Compaction enables thread-local bump-pointer object allocation: Allocations
| are 70% faster than in Android 7.0. |
| </li> |
| <li> |
| Offers 85% smaller pause times for the H2 benchmark compared to the Android |
| 7.0 GC. |
| </li> |
| <li> |
| Pause times no longer scale with heap size; apps should be able to use large |
| heaps without worrying about jank. |
| </li> |
| <li>GC implementation detail - Read barriers: |
| <ul> |
| <li> |
| Read barriers are a small amount of work done for each object field read. |
| </li> |
| <li> |
| These are optimized in the compiler, but might slow down some use cases. |
| </li> |
</ul>
</li>
| </ul> |
| |
| <h2 id="loop-optimizations">Loop optimizations</h2> |
| |
| <p> |
| A wide variety of loop optimizations are employed by ART in the Android 8.0 |
| release: |
| </p> |
| |
| <ul> |
| <li>Bounds check eliminations |
| <ul> |
| <li>Static: ranges are proven to be within bounds at compile-time</li> |
| <li> |
| Dynamic: run-time tests ensure loops stay within bounds (deopt otherwise) |
| </li> |
| </ul> |
| </li> |
| <li>Induction variable eliminations |
| <ul> |
| <li>Remove dead induction</li> |
| <li> |
| Replace induction that is used only after the loop by closed-form |
| expressions |
| </li> |
| </ul> |
| </li> |
| <li> |
| Dead code elimination inside the loop-body, removal of whole loops that |
| become dead |
| </li> |
| <li>Strength reduction</li> |
| <li> |
| Loop transformations: reversal, interchanging, splitting, unrolling, |
| unimodular, etc. |
| </li> |
| <li>SIMDization (also called vectorization)</li> |
| </ul> |
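
<p>
For example, in a simple reduction loop like the hypothetical sketch below
(illustrative only, not taken from the ART compiler), the bounds check on the
array access can be proven redundant statically, and the loop body is a
candidate for SIMDization:
</p>

<pre class="prettyprint">
class LoopExample {
  static int sum(int[] data) {
    int total = 0;
    // The index i provably stays within [0, data.length), so the implicit
    // bounds check on data[i] can be eliminated at compile time, and this
    // reduction loop can be vectorized.
    for (int i = 0; i &lt; data.length; i++) {
      total += data[i];
    }
    return total;
  }
}
</pre>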
| |
<p>
The loop optimizer resides in its own optimization pass in the ART compiler.
Most loop optimizations are similar to optimizations and simplifications done
elsewhere in the compiler. Challenges arise with the optimizations that rewrite
the CFG in a more elaborate way than usual, because most CFG utilities (see
nodes.h) focus on building a CFG, not rewriting one.
</p>
| |
| <h2 id="class-hierarchy-analysis">Class hierarchy analysis</h2> |
| |
<p>
ART in Android 8.0 uses Class Hierarchy Analysis (CHA), a compiler optimization
that devirtualizes virtual calls into direct calls based on information
generated by analyzing class hierarchies. Virtual calls are expensive because
they are implemented around a vtable lookup and take a couple of dependent
loads. Virtual calls also cannot be inlined.
</p>
| |
| <p>Here is a summary of related enhancements:</p> |
| |
| <ul> |
<li>
Dynamic single-implementation method status updating - At the end of class
linking time, when the vtable has been populated, ART conducts an
entry-by-entry comparison against the vtable of the superclass.
</li>
<li>Compiler optimization - The compiler takes advantage of the
single-implementation info of a method. If method A.foo has its
single-implementation flag set, the compiler devirtualizes the virtual call
into a direct call and, as a result, further tries to inline the direct call.
</li>
<li>
Compiled code invalidation - Also at the end of class linking time, when
single-implementation info is updated, if the single-implementation status of
a method A.foo is invalidated, all compiled code that depends on the
assumption that A.foo has a single implementation needs to be invalidated.
</li>
<li>
Deoptimization - For invalidated compiled code that is live on the stack,
deoptimization is initiated to force it into interpreter mode to guarantee
correctness. A new deoptimization mechanism that is a hybrid of synchronous
and asynchronous deoptimization is used.
</li>
| </ul> |
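
<p>
As an illustration, consider the hypothetical classes below (not code from ART
itself). If class hierarchy analysis shows that no loaded subclass overrides
A.foo, a call through the base type can be devirtualized and inlined:
</p>

<pre class="prettyprint">
class A {
  int foo() { return 1; }
}

class B extends A {
  // B does not override foo(), so A.foo keeps its single-implementation flag.
}

class Caller {
  static int call(A obj) {
    // With CHA, this virtual call can be devirtualized into a direct call to
    // A.foo and then inlined. If a class that overrides foo() is loaded later,
    // the compiled code is invalidated and, if live on the stack, deoptimized.
    return obj.foo();
  }
}
</pre>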
| |
| <h2 id="inline-caches-in-oat-files">Inline caches in .oat files</h2> |
| |
<p>
ART now employs inline caches and optimizes call sites for which enough data
exists. The inline caches feature records additional runtime information in
profiles and uses it to add dynamic optimizations to ahead-of-time compilation.
</p>
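
<p>
As a hypothetical illustration (the names below are not from ART), the call in
the following sketch is virtual, but if the recorded profile shows that one
receiver type dominates at this call site, the ahead-of-time compiler can
speculatively devirtualize and inline that target behind a type check:
</p>

<pre class="prettyprint">
interface Shape {
  double area();
}

class Circle implements Shape {
  double radius;
  Circle(double radius) { this.radius = radius; }
  public double area() { return Math.PI * radius * radius; }
}

class Measurer {
  static double measure(Shape s) {
    // The inline cache in the profile records which concrete implementations
    // were observed here; if mostly Circle, the compiler can inline
    // Circle.area() behind a class check and fall back otherwise.
    return s.area();
  }
}
</pre>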
| |
| <h2 id="dexlayout">Dexlayout</h2> |
| |
| <p> |
| Dexlayout is a library introduced in Android 8.0 to analyze dex files and |
| reorder them according to a profile. Dexlayout aims to use runtime profiling |
| information to reorder sections of the dex file during idle maintenance |
| compilation on device. By grouping together parts of the dex file that are |
| often accessed together, programs can have better memory access patterns from |
improved locality, saving RAM and shortening startup time.
| </p> |
| |
| <p> |
| Since profile information is currently available only after apps have been run, |
| dexlayout is integrated in dex2oat's on-device compilation during idle |
| maintenance. |
| </p> |
| |
| <h2 id="dex-cache-removal">Dex cache removal</h2> |
| |
| <p> |
| Up to Android 7.0, the DexCache object owned four large arrays, proportional to |
| the number of certain elements in the DexFile, namely: |
| </p> |
| |
| <ul> |
| <li> |
| strings (one reference per DexFile::StringId), |
| </li> |
| <li> |
| types (one reference per DexFile::TypeId), |
| </li> |
| <li> |
| methods (one native pointer per DexFile::MethodId), |
| </li> |
| <li> |
| fields (one native pointer per DexFile::FieldId). |
| </li> |
| </ul> |
| |
<p>
These arrays were used for fast retrieval of previously resolved objects. In
Android 8.0, all arrays have been removed except the methods array.
</p>
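
<p>
Conceptually, the pre-Android 8.0 layout resembled the sketch below. This is
illustrative only; the real DexCache lives in the runtime's native and mirror
code, and the field names here are made up:
</p>

<pre class="prettyprint">
class DexCacheSketch {
  Object[] strings;  // one resolved String reference per DexFile::StringId
  Object[] types;    // one resolved Class reference per DexFile::TypeId
  long[] methods;    // one native pointer per DexFile::MethodId
  long[] fields;     // one native pointer per DexFile::FieldId
}
</pre>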
| |
| <h2 id="interpreter-performance">Interpreter performance</h2> |
| |
<p>
Interpreter performance significantly improved in the Android 7.0 release with
the introduction of "mterp" - an interpreter featuring a core
fetch/decode/interpret mechanism written in assembly language. Mterp is
modeled after the fast Dalvik interpreter, and supports arm, arm64, x86,
x86_64, mips and mips64. For computational code, ART's mterp is roughly
comparable to Dalvik's fast interpreter. However, in some situations it can be
significantly - and even dramatically - slower:
</p>
| |
| <ol> |
| <li>Invoke performance.</li> |
| <li> |
| String manipulation, and other heavy users of methods recognized as |
| intrinsics in Dalvik. |
| </li> |
| <li>Higher stack memory usage.</li> |
| </ol> |
| |
| <p>Android 8.0 addresses these issues.</p> |
| |
| <h2 id="more-inlining">More inlining</h2> |
| |
<p>
Since the Android 6.0 release, ART has been able to inline any call within the
same dex file, but could inline only leaf methods from different dex files.
There were two reasons for this limitation:
</p>
| |
| <ol> |
<li>
Inlining from another dex file requires using the dex cache of that other dex
file, unlike same-dex-file inlining, which can simply reuse the dex cache of
the caller. The dex cache is needed in compiled code for a couple of
instructions such as static calls, string loads, or class loads.
</li>
<li>
Stack maps only encode a method index within the current dex file.
</li>
| </ol> |
| |
| <p>To address these limitations, Android 8.0:</p> |
| |
| <ol> |
<li>Removes dex cache access from compiled code (also see the "Dex cache
removal" section).</li>
| <li>Extends stack map encoding.</li> |
| </ol> |
| |
| <h2 id="synchronization-improvements">Synchronization improvements</h2> |
| |
<p>
The ART team tuned the MonitorEnter/MonitorExit code paths and reduced ART's
reliance on traditional memory barriers on ARMv8, replacing them with newer
(acquire/release) instructions where possible.
</p>
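
<p>
No source changes are needed to benefit; ordinary synchronized code such as
the sketch below exercises the tuned MonitorEnter/MonitorExit paths, which on
ARMv8 now favor acquire/release instructions over full memory barriers where
possible:
</p>

<pre class="prettyprint">
class Counter {
  private final Object lock = new Object();
  private int value;

  int increment() {
    synchronized (lock) {  // monitor-enter
      return ++value;
    }                      // monitor-exit
  }
}
</pre>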
| |
| <h2 id="faster-native-methods">Faster native methods</h2> |
| |
| <p> |
| Faster native calls to the Java Native Interface (JNI) are available using |
| the <a class="external" |
| href="https://android.googlesource.com/platform/libcore/+/master/dalvik/src/main/java/dalvik/annotation/optimization/FastNative.java" |
| ><code>@FastNative</code></a> and <a class="external" |
| href="https://android.googlesource.com/platform/libcore/+/master/dalvik/src/main/java/dalvik/annotation/optimization/CriticalNative.java" |
| ><code>@CriticalNative</code></a> annotations. These built-in ART runtime |
| optimizations speed up JNI transitions and replace the now deprecated |
| <em>!bang JNI</em> notation. The annotations have no effect on non-native |
| methods and are only available to platform Java Language code on the |
| <code>bootclasspath</code> (no Play Store updates). |
| </p> |
| |
| <p> |
| The <code>@FastNative</code> annotation supports non-static methods. Use this |
| if a method accesses a <code>jobject</code> as a parameter or return value. |
| </p> |
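
<p>
For example, a platform class on the <code>bootclasspath</code> might declare
a fast native method as in the hypothetical sketch below (the class and method
names are illustrative):
</p>

<pre class="prettyprint">
import dalvik.annotation.optimization.FastNative;

public final class NativeText {
  // Non-static methods and jobject parameters/return values (such as String)
  // are allowed with @FastNative.
  @FastNative
  public native String normalize(String input);
}
</pre>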
| |
| <p> |
| The <code>@CriticalNative</code> annotation provides an even faster way to run |
| native methods, with the following restrictions: |
| </p> |
| |
| <ul> |
| <li> |
| Methods must be static—no objects for parameters, return values, or an |
| implicit <code>this</code>. |
| </li> |
| <li>Only primitive types are passed to the native method.</li> |
| <li> |
| The native method does not use the <code>JNIEnv</code> and |
| <code>jclass</code> parameters in its function definition. |
| </li> |
| <li> |
| The method must be registered with <code>RegisterNatives</code> instead of |
| relying on dynamic JNI linking. |
| </li> |
| </ul> |
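
<p>
A <code>@CriticalNative</code> declaration that satisfies these restrictions
might look like the hypothetical sketch below (the names are illustrative; the
corresponding native function receives only the primitive arguments, without
<code>JNIEnv</code> or <code>jclass</code>, and is registered with
<code>RegisterNatives</code>):
</p>

<pre class="prettyprint">
import dalvik.annotation.optimization.CriticalNative;

public final class FastMath {
  // Static, primitives only, no implicit this.
  @CriticalNative
  public static native int clamp(int value, int min, int max);
}
</pre>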
| |
| <aside class="caution"> |
| <p> |
The <code>@FastNative</code> and <code>@CriticalNative</code> annotations
disable garbage collection while executing a native method. Do not use them
with long-running methods, including usually fast but generally unbounded
methods.
| </p> |
| <p> |
Pauses to the garbage collection may cause deadlock. Do not acquire locks
during a fast native call unless the locks are released locally (that is,
before returning to managed code). This does not apply to regular JNI calls,
since ART considers the executing native code as suspended.
| </p> |
| </aside> |
| |
| <p> |
| <code>@FastNative</code> can improve native method performance up to 3x, and |
| <code>@CriticalNative</code> up to 5x. For example, a JNI transition measured |
| on a Nexus 6P device: |
| </p> |
| |
| <table> |
| <tr> |
| <th>Java Native Interface (JNI) invocation</th> |
| <th>Execution time (in nanoseconds)</th> |
| </tr> |
| <tr> |
| <td>Regular JNI</td> |
| <td>115</td> |
| </tr> |
| <tr> |
| <td><em>!bang JNI</em></td> |
| <td>60</td> |
| </tr> |
| <tr> |
| <td><code>@FastNative</code></td> |
| <td>35</td> |
| </tr> |
| <tr> |
| <td><code>@CriticalNative</code></td> |
| <td>25</td> |
| </tr> |
| </table> |
| |
| </body> |
| </html> |