| <html devsite> |
| <head> |
| <title>A/B (Seamless) System Updates</title> |
| <meta name="project_path" value="/_project.yaml" /> |
| <meta name="book_path" value="/_book.yaml" /> |
| </head> |
| <body> |
| <!-- |
| Copyright 2018 The Android Open Source Project |
| |
| Licensed under the Apache License, Version 2.0 (the "License"); |
| you may not use this file except in compliance with the License. |
| You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| --> |
| |
| <p>A/B system updates, also known as seamless updates, ensure a workable |
| booting system remains on the disk during an <a href="/devices/tech/ota/index.html"> |
| over-the-air (OTA) update</a>. This approach reduces the likelihood of |
| an inactive device after an update, which means fewer device |
| replacements and device reflashes at repair and warranty centers. Other |
| commercial-grade operating systems such as |
| <a href="https://www.chromium.org/chromium-os">ChromeOS</a> also use A/B |
| updates successfully. |
| </p> |
| |
| <p>A/B system updates provide the following benefits:</p> |
| |
| <ul> |
| <li> |
| OTA updates can occur while the system is running, without |
| interrupting the user. Users can continue to use their devices during |
| an OTA—the only downtime during an update is when the device |
| reboots into the updated disk partition. |
| </li> |
| <li> |
| After an update, rebooting takes no longer than a regular reboot. |
| </li> |
| <li> |
| If an OTA fails to apply (for example, because of a bad flash), the |
| user will not be affected. The user will continue to run the old OS, |
| and the client is free to re-attempt the update. |
| </li> |
| <li> |
| If an OTA update is applied but fails to boot, the device will reboot |
| back into the old partition and remain usable. The client is free to |
| re-attempt the update. |
| </li> |
| <li> |
| Any errors (such as I/O errors) affect only the <strong>unused</strong> |
| partition set and can be retried. Such errors also become less likely |
| because the I/O load is deliberately low to avoid degrading the user |
| experience. |
| </li> |
| <li> |
| Updates can be streamed to A/B devices, removing the need to download |
| the package before installing it. Streaming means it's not necessary |
| for the user to have enough free space to store the update package on |
| <code>/data</code> or <code>/cache</code>. |
| </li> |
| <li> |
| The cache partition is no longer used to store OTA update packages, so |
| there is no need to ensure that the cache partition is large enough for |
| future updates. |
| </li> |
| <li> |
| <a href="/security/verifiedboot/dm-verity.html">dm-verity</a> |
| guarantees a device will boot an uncorrupted image. If a device |
| doesn't boot due to a bad OTA or dm-verity issue, the device can |
| reboot into an old image. (Android <a href="/security/verifiedboot/"> |
| Verified Boot</a> does not require A/B updates.) |
| </li> |
| </ul> |
| |
| <h2 id="overview">About A/B system updates</h2> |
| |
| <p> |
| A/B updates require changes to both the client and the system. The OTA |
| package server, however, should not require changes: update packages |
| are still served over HTTPS. For devices using Google's OTA |
| infrastructure, the system changes are all in AOSP, and the client code |
| is provided by Google Play services. OEMs not using Google's OTA |
| infrastructure will be able to reuse the AOSP system code but will |
| need to supply their own client. |
| </p> |
| |
| <p> |
| For OEMs supplying their own client, the client needs to: |
| </p> |
| |
| <ul> |
| <li> |
| Decide when to take an update. Because A/B updates happen in the |
| background, they are no longer user-initiated. To avoid disrupting |
| users, it is recommended that updates are scheduled when the device |
| is in idle maintenance mode, such as overnight, and on Wi-Fi. |
| However, your client can use any heuristics you want. |
| </li> |
| <li> |
| Check in with your OTA package servers and determine whether an |
| update is available. This should be mostly the same as your existing |
| client code, except that you will want to signal that the device |
| supports A/B. (Google's client also includes a |
| <strong>Check now</strong> button for users to check for the latest |
| update.) |
| </li> |
| <li> |
| Call <code>update_engine</code> with the HTTPS URL for your update |
| package, assuming one is available. <code>update_engine</code> will |
| update the raw blocks on the currently unused partition as it streams |
| the update package. |
| </li> |
| <li> |
| Report installation successes or failures to your servers, based on |
| the <code>update_engine</code> result code. If the update is applied |
| successfully, <code>update_engine</code> will tell the bootloader to |
| boot into the new OS on the next reboot. The bootloader will fall back |
| to the old OS if the new OS fails to boot, so no work is required |
| from the client. If the update fails, the client needs to decide when |
| (and whether) to try again, based on the detailed error code. For |
| example, a good client could recognize that a partial ("diff") OTA |
| package fails and try a full OTA package instead. |
| </li> |
| </ul> |
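| <p> |
| The retry decision described in the last item can be sketched as a |
| small policy function. This is a minimal illustration only: the |
| <code>Outcome</code> categories, the <code>Package</code> type, the |
| action names, and the example URLs are all hypothetical and do not |
| correspond to real <code>update_engine</code> error codes or APIs. |
| </p> |

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional, Tuple


class Outcome(Enum):
    """Hypothetical outcome categories; real update_engine error codes differ."""
    SUCCESS = auto()
    PAYLOAD_MISMATCH = auto()  # e.g. a diff package that does not fit the source build
    TRANSIENT = auto()         # e.g. the connection dropped while streaming
    FATAL = auto()


@dataclass(frozen=True)
class Package:
    url: str
    is_full: bool


def next_step(outcome: Outcome, attempted: Package,
              full: Optional[Package]) -> Tuple[str, Optional[Package]]:
    """Decide what the client does after update_engine reports a result."""
    if outcome is Outcome.SUCCESS:
        return ("report_success", None)
    if outcome is Outcome.TRANSIENT:
        # Same package, tried again at the next maintenance window.
        return ("retry_later", attempted)
    if (outcome is Outcome.PAYLOAD_MISMATCH
            and not attempted.is_full and full is not None):
        # A partial ("diff") package failed: fall back to the full package.
        return ("retry_now", full)
    return ("report_failure", None)


diff_pkg = Package("https://ota.example.com/diff.zip", is_full=False)
full_pkg = Package("https://ota.example.com/full.zip", is_full=True)
print(next_step(Outcome.PAYLOAD_MISMATCH, diff_pkg, full_pkg)[0])
```

| <p> |
| Keeping the policy in a pure function like this makes it easy to unit |
| test separately from the code that actually talks to |
| <code>update_engine</code> and to your servers. |
| </p> |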
| |
| <p>Optionally, the client can:</p> |
| |
| <ul> |
| <li> |
| Show a notification asking the user to reboot. If you want to |
| implement a policy where the user is encouraged to routinely update, |
| then this notification can be added to your client. If the client |
| does not prompt users, then users will get the update next time they |
| reboot anyway. (Google's client has a per-update configurable delay.) |
| </li> |
| <li> |
| Show a notification telling users whether they booted into a new |
| OS version or whether they were expected to do so but fell back to |
| the old OS version. (Google's client typically does neither.) |
| </li> |
| </ul> |
| |
| <p>On the system side, A/B system updates affect the following:</p> |
| |
| <ul> |
| <li> |
| Partition selection (slots), the <code>update_engine</code> daemon, |
| and bootloader interactions (described below) |
| </li> |
| <li> |
| Build process and OTA update package generation (described in |
| <a href="/devices/tech/ota/ab/ab_implement.html">Implementing A/B |
| Updates</a>) |
| </li> |
| </ul> |
| |
| <aside class="note"> |
| <strong>Note:</strong> A/B system updates implemented through OTA are |
| recommended for new devices only. |
| </aside> |
| |
| <h3 id="slots">Partition selection (slots)</h3> |
| |
| <p> |
| A/B system updates use two sets of partitions referred to as |
| <em>slots</em> (normally slot A and slot B). The system runs from |
| the <em>current</em> slot while the partitions in the <em>unused</em> |
| slot are not accessed by the running system during normal operation. |
| This approach makes updates fault resistant by keeping the unused |
| slot as a fallback: If an error occurs during or immediately after |
| an update, the system can roll back to the old slot and continue to |
| have a working system. To achieve this goal, no partition used by |
| the <em>current</em> slot should be updated as part of the OTA |
| update (including partitions for which there is only one copy). |
| </p> |
| |
| <p> |
| Each slot has a <em>bootable</em> attribute that states whether the |
| slot contains a correct system from which the device can boot. The |
| current slot is bootable when the system is running, but the other |
| slot may have an old (still correct) version of the system, a newer |
| version, or invalid data. Regardless of what the <em>current</em> |
| slot is, exactly one slot is the <em>active</em> slot (the one the |
| bootloader will boot from on the next boot), also called the |
| <em>preferred</em> slot. |
| </p> |
| |
| <p> |
| Each slot also has a <em>successful</em> attribute set by the user |
| space, which is relevant only if the slot is also bootable. A |
| successful slot should be able to boot, run, and update itself. A |
| bootable slot that was not marked as successful (after several |
| attempts were made to boot from it) should be marked as unbootable |
| by the bootloader, including changing the active slot to another |
| bootable slot (normally to the slot running immediately before the |
| attempt to boot into the new, active one). The specific details of |
| the interface are defined in |
| <code><a href="https://android.googlesource.com/platform/hardware/libhardware/+/master/include/hardware/boot_control.h" class="external-link"> |
| boot_control.h</a></code>. |
| </p> |
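| <p> |
| The fallback behavior described above can be modeled in a few lines. |
| The sketch below is not the <code>boot_control</code> interface: the |
| <code>Slot</code> type, the <code>tries_remaining</code> counter, and |
| the <code>choose_boot_slot()</code> function are assumptions made for |
| illustration, and real bootloaders store and update this state in |
| their own way. |
| </p> |

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Slot:
    bootable: bool = True
    successful: bool = False
    tries_remaining: int = 3  # hypothetical retry budget for unproven slots


def choose_boot_slot(slots: Dict[str, Slot], active: str) -> Optional[str]:
    """Return the slot a bootloader might boot, updating slot attributes."""
    slot = slots[active]
    if slot.bootable and not slot.successful:
        if slot.tries_remaining > 0:
            slot.tries_remaining -= 1   # spend one boot attempt
            return active
        slot.bootable = False           # attempts exhausted: give up on this slot
    if slot.bootable:
        return active                   # bootable and already marked successful
    for name, other in slots.items():   # fall back to any other bootable slot
        if other.bootable:
            return name
    return None                         # nothing left to boot


slots = {"a": Slot(tries_remaining=2), "b": Slot(successful=True)}
# Slot "a" was just updated and is active, but user space never marks it successful.
attempts = [choose_boot_slot(slots, "a") for _ in range(3)]
print(attempts)  # ['a', 'a', 'b']
```

| <p> |
| In this model the third call exhausts the retry budget, marks slot |
| <code>a</code> as unbootable, and returns slot <code>b</code>, which |
| is the slot the device was running before the update. |
| </p> |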
| |
| <h3 id="update-engine">Update engine daemon</h3> |
| |
| <p> |
| A/B system updates use a background daemon called |
| <code>update_engine</code> to prepare the system to boot into a new, |
| updated version. This daemon can perform the following actions: |
| </p> |
| |
| <ul> |
| <li> |
| Read from the current slot A/B partitions and write any data to |
| the unused slot A/B partitions as instructed by the OTA package. |
| </li> |
| <li> |
| Call the <code>boot_control</code> interface in a pre-defined |
| workflow. |
| </li> |
| <li> |
| Run a <em>post-install</em> program from the <em>new</em> |
| partition after writing all the unused slot partitions, as |
| instructed by the OTA package. (For details, see |
| <a href="#post-installation">Post-installation</a>.) |
| </li> |
| </ul> |
| |
| <p> |
| As the <code>update_engine</code> daemon is not involved in the boot |
| process itself, it is limited in what it can do during an update by |
| the <a href="/security/selinux/">SELinux</a> policies and features |
| in the <em>current</em> slot (such policies and features can't be |
| updated until the system boots into a new version). To maintain a |
| robust system, the update process <strong>should not</strong> modify |
| the partition table, the contents of partitions in the current slot, |
| or the contents of non-A/B partitions that can't be wiped with a |
| factory reset. |
| </p> |
| |
| <h4 id="update_engine_source">Update engine source</h4> |
| |
| <p> |
| The <code>update_engine</code> source is located in |
| <code><a href="https://android.googlesource.com/platform/system/update_engine/" class="external">system/update_engine</a></code>. |
| The A/B OTA dexopt files are split between <code>installd</code> and |
| a package manager: |
| </p> |
| |
| <ul> |
| <li> |
| <code><a href="https://android.googlesource.com/platform/frameworks/native/+/master/cmds/installd/" class="external-link">frameworks/native/cmds/installd/</a></code>ota* |
| includes the postinstall script, the binary for chroot, the |
| installd clone that calls dex2oat, the post-OTA move-artifacts |
| script, and the rc file for the move script. |
| </li> |
| <li> |
| <code><a href="https://android.googlesource.com/platform/frameworks/base/+/master/services/core/java/com/android/server/pm/OtaDexoptService.java" class="external-link">frameworks/base/services/core/java/com/android/server/pm/OtaDexoptService.java</a></code> |
| (plus <code><a href="https://android.googlesource.com/platform/frameworks/base/+/master/services/core/java/com/android/server/pm/OtaDexoptShellCommand.java" class="external-link">OtaDexoptShellCommand</a></code>) |
| is the package manager that prepares dex2oat commands for |
| applications. |
| </li> |
| </ul> |
| |
| <p> |
| For a working example, refer to <code><a href="https://android.googlesource.com/device/google/marlin/+/nougat-dr1-release/device-common.mk" class="external-link">/device/google/marlin/device-common.mk</a></code>. |
| </p> |
| |
| <h4 id="update_engine_logs">Update engine logs</h4> |
| |
| <p> |
| For Android 8.x releases and earlier, the <code>update_engine</code> |
| logs can be found in <code>logcat</code> and in the bug report. To |
| make the <code>update_engine</code> logs available in the file system, |
| patch the following changes into your build: |
| </p> |
| |
| <ul> |
| <li><a |
| href="https://android-review.googlesource.com/c/platform/system/update_engine/+/486618"> |
| Change 486618</a></li> |
| <li><a |
| href="https://android-review.googlesource.com/c/platform/system/core/+/529080"> |
| Change 529080</a></li> |
| <li><a |
| href="https://android-review.googlesource.com/c/platform/system/update_engine/+/529081"> |
| Change 529081</a></li> |
| <li><a |
| href="https://android-review.googlesource.com/c/platform/system/sepolicy/+/534660"> |
| Change 534660</a></li> |
| </ul> |
| |
| <p>These changes save a copy of the most recent |
| <code>update_engine</code> log to |
| <code>/data/misc/update_engine_log/update_engine.log</code>. Users |
| with the <strong>log</strong> group ID can access the file system |
| logs.</p> |
| |
| <h3 id="bootloader-interactions">Bootloader interactions</h3> |
| |
| <p> |
| The <code>boot_control</code> HAL is used by |
| <code>update_engine</code> (and possibly other daemons) to instruct |
| the bootloader what to boot from. Common example scenarios and their |
| associated states include the following: |
| </p> |
| |
| <ul> |
| <li> |
| <strong>Normal case</strong>: The system is running from its |
| current slot, either slot A or B. No updates have been applied so |
| far. The system's current slot is bootable, successful, and the |
| active slot. |
| </li> |
| <li> |
| <strong>Update in progress</strong>: The system is running from |
| slot B, so slot B is the bootable, successful, and active slot. |
| Slot A was marked as unbootable since the contents of slot A are |
| being updated but not yet completed. A reboot in this state should |
| continue booting from slot B. |
| </li> |
| <li> |
| <strong>Update applied, reboot pending</strong>: The system is |
| running from slot B, slot B is bootable and successful, but slot A |
| was marked as active (and therefore is marked as bootable). Slot A |
| is not yet marked as successful and some number of attempts to |
| boot from slot A should be made by the bootloader. |
| </li> |
| <li> |
| <strong>System rebooted into new update</strong>: The system is |
| running from slot A for the first time, slot B is still bootable |
| and successful while slot A is only bootable, and still active but |
| not successful. A user space daemon, <code>update_verifier</code>, |
| should mark slot A as successful after some checks are made. |
| </li> |
| </ul> |
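| <p> |
| The four scenarios above can be traced with a small in-memory model. |
| The class below is an illustrative assumption, not the |
| <code>boot_control</code> HAL: its method names only loosely mirror |
| HAL-style calls, and <code>reboot()</code> simply assumes the |
| bootloader boots the active slot. |
| </p> |

```python
class BootControlModel:
    """Toy model of per-slot state; not the real boot_control HAL."""

    def __init__(self, current: str):
        self.current = current
        self.active = current
        self.bootable = {"A": True, "B": True}
        self.successful = {"A": False, "B": False}
        self.successful[current] = True

    def set_slot_as_unbootable(self, slot: str) -> None:
        self.bootable[slot] = False
        self.successful[slot] = False

    def set_active_boot_slot(self, slot: str) -> None:
        self.active = slot
        self.bootable[slot] = True      # marking a slot active makes it bootable
        self.successful[slot] = False   # it still has to prove itself

    def mark_boot_successful(self) -> None:
        self.successful[self.current] = True

    def reboot(self) -> None:
        self.current = self.active      # assumes the bootloader boots the active slot

    def state(self, slot: str):
        """Return (bootable, successful, is_active) for a slot."""
        return (self.bootable[slot], self.successful[slot], self.active == slot)


bc = BootControlModel(current="B")
print("normal:", bc.state("B"))             # (True, True, True)
bc.set_slot_as_unbootable("A")
print("in progress:", bc.state("A"))        # (False, False, False)
bc.set_active_boot_slot("A")
print("reboot pending:", bc.state("A"))     # (True, False, True)
bc.reboot()
bc.mark_boot_successful()                   # the role played by update_verifier
print("after verification:", bc.state("A")) # (True, True, True)
```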
| |
| <h3 id="streaming-updates">Streaming update support</h3> |
| |
| <p> |
| User devices don't always have enough space on <code>/data</code> to |
| download the update package. As neither OEMs nor users want to waste |
| space on a <code>/cache</code> partition, some users go without |
| updates because the device has nowhere to store the update package. |
| To address this issue, Android 8.0 added support for streaming A/B |
| updates that write blocks directly to the partitions in the unused |
| slot as they are downloaded, without having to store the blocks on |
| <code>/data</code>. |
| Streaming A/B updates need almost no temporary storage and require |
| just enough storage for roughly 100 KiB of metadata. |
| </p> |
| |
| <p>To enable streaming updates in Android 7.1, cherry-pick the following |
| patches:</p> |
| |
| <ul> |
| <li> |
| <a href="https://android-review.googlesource.com/333624" class="external"> |
| Allow to cancel a proxy resolution request</a> |
| </li> |
| <li> |
| <a href="https://android-review.googlesource.com/333625" class="external"> |
| Fix terminating a transfer while resolving proxies</a> |
| </li> |
| <li> |
| <a href="https://android-review.googlesource.com/333626" class="external"> |
| Add unit test for TerminateTransfer between ranges</a> |
| </li> |
| <li> |
| <a href="https://android-review.googlesource.com/333627" class="external"> |
| Cleanup the RetryTimeoutCallback()</a> |
| </li> |
| </ul> |
| |
| <p> |
| These patches are required to support streaming A/B updates in |
| Android 7.1 and later whether using |
| <a href="https://www.android.com/gms/">Google Mobile Services |
| (GMS)</a> or any other update client. |
| </p> |
| |
| <h2 id="life-of-an-a-b-update">Life of an A/B update</h2> |
| |
| <p> |
| The update process starts when an OTA package (referred to in code as a |
| <em>payload</em>) is available for downloading. Policies in the device |
| may defer the payload download and application based on battery level, |
| user activity, charging status, or other policies. In addition, |
| because the update runs in the background, users might not know an |
| update is in progress. All of this means the update process might be |
| interrupted at any point due to policies, unexpected reboots, or user |
| actions. |
| </p> |
| |
| <p> |
| Optionally, metadata in the OTA package itself indicates the update |
| can be streamed; the same package can also be used for non-streaming |
| installation. The server may use the metadata to tell the client it's |
| streaming so the client will hand off the OTA to |
| <code>update_engine</code> correctly. Device manufacturers with their |
| own server and client can enable streaming updates by ensuring the |
| server identifies the update is streaming (or assumes all updates are |
| streaming) and the client makes the correct call to |
| <code>update_engine</code> for streaming. Manufacturers can use the |
| fact that the package is of the streaming variant to send a flag to |
| the client to trigger hand off to the framework side as streaming. |
| </p> |
| |
| <p>After a payload is available, the update process is as follows:</p> |
| |
| <table> |
| <tr> |
| <th>Step</th> |
| <th>Activities</th> |
| </tr> |
| <tr> |
| <td>1</td> |
| <td>The current slot (or "source slot") is marked as successful (if |
| not already marked) with <code>markBootSuccessful()</code>.</td> |
| </tr> |
| <tr> |
| <td>2</td> |
| <td> |
| The unused slot (or "target slot") is marked as unbootable by |
| calling the function <code>setSlotAsUnbootable()</code>. The |
| current slot is always marked as successful at the beginning of |
| the update to prevent the bootloader from falling back to the |
| unused slot, which will soon have invalid data. If the system has |
| reached the point where it can start applying an update, the |
| current slot is marked as successful even if other major |
| components are broken (such as the UI in a crash loop) as it is |
| possible to push new software to fix these problems. |
| <br /><br /> |
| The update payload is an opaque blob with the instructions to |
| update to the new version. The update payload consists of the |
| following: |
| <ul> |
| <li> |
| <em>Metadata</em>. A relatively small portion of the update |
| payload, the metadata contains a list of operations to produce |
| and verify the new version on the target slot. For example, an |
| operation could decompress a certain blob and write it to |
| specific blocks in a target partition, or read from a source |
| partition, apply a binary patch, and write to certain blocks |
| in a target partition. |
| </li> |
| <li> |
| <em>Extra data</em>. As the bulk of the update payload, the |
| extra data associated with the operations consists of the |
| compressed blob or binary patch in these examples. |
| </li> |
| </ul> |
| </td> |
| </tr> |
| <tr> |
| <td>3</td> |
| <td>The payload metadata is downloaded.</td> |
| </tr> |
| <tr> |
| <td>4</td> |
| <td> |
| For each operation defined in the metadata, in order, the |
| associated data (if any) is downloaded to memory, the operation is |
| applied, and the associated memory is discarded. |
| </td> |
| </tr> |
| <tr> |
| <td>5</td> |
| <td> |
| The whole partitions are re-read and verified against the expected |
| hash. |
| </td> |
| </tr> |
| <tr> |
| <td>6</td> |
| <td> |
| The post-install step (if any) is run. In the case of an error |
| during the execution of any step, the update fails and is |
| re-attempted, possibly with a different payload. If all the steps |
| so far have succeeded, the update succeeds and the last step is |
| executed. |
| </td> |
| </tr> |
| <tr> |
| <td>7</td> |
| <td> |
| The <em>unused slot</em> is marked as active by calling |
| <code>setActiveBootSlot()</code>. Marking the unused slot as |
| active doesn't mean it will finish booting. The bootloader (or |
| system itself) can switch the active slot back if it doesn't read |
| a successful state. |
| </td> |
| </tr> |
| <tr> |
| <td>8</td> |
| <td> |
| Post-installation (described below) involves running a program |
| from the "new update" version while still running in the old |
| version. If defined in the OTA package, this step is |
| <strong>mandatory</strong> and the program must return with exit |
| code <code>0</code>; otherwise, the update fails. |
| </td> |
| </tr> |
| <tr> |
| <td>9</td> |
| <td> |
| After the system successfully boots far enough into the new slot |
| and finishes the post-reboot checks, the now current slot |
| (formerly the "target slot") is marked as successful by calling |
| <code>markBootSuccessful()</code>. |
| </td> |
| </tr> |
| </table> |
| |
| <aside class="note"> |
| <strong>Note:</strong> Steps 3 and 4 take most of the update time as |
| they involve downloading and writing large amounts of data, and are |
| likely to be interrupted for reasons of policy or reboot. |
| </aside> |
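| <p> |
| Steps 4 and 5 can be illustrated with a simplified sketch: each |
| operation's data is fetched, applied to the target, and dropped before |
| the next operation starts, and the finished target is then hashed as a |
| whole. The operation kinds (<code>COPY</code>, <code>REPLACE</code>), |
| the <code>Operation</code> fields, and the <code>fetch</code> callback |
| are hypothetical stand-ins; the real payload format and operation set |
| used by <code>update_engine</code> are more involved. |
| </p> |

```python
import hashlib
import zlib
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Operation:
    kind: str             # "COPY" from the source slot, or "REPLACE" with a blob
    dst_offset: int
    length: int
    src_offset: int = 0   # used by COPY
    data_offset: int = 0  # used by REPLACE: where the blob sits in the payload
    data_length: int = 0


def apply_payload(ops: Iterable[Operation],
                  fetch: Callable[[int, int], bytes],
                  source: bytes, target: bytearray,
                  expected_sha256: str) -> None:
    for op in ops:
        if op.kind == "REPLACE":
            # Only this operation's blob is held in memory at a time.
            chunk = zlib.decompress(fetch(op.data_offset, op.data_length))
        elif op.kind == "COPY":
            chunk = source[op.src_offset:op.src_offset + op.length]
        else:
            raise ValueError(f"unknown operation kind: {op.kind}")
        if len(chunk) != op.length:
            raise ValueError("operation produced unexpected length")
        target[op.dst_offset:op.dst_offset + op.length] = chunk
    # Re-read the whole target and compare against the expected hash.
    if hashlib.sha256(target).hexdigest() != expected_sha256:
        raise ValueError("target hash mismatch")


source = b"AAAABBBB"
wanted = b"AAAAcccc"
blob = zlib.compress(b"cccc")          # stands in for the payload's extra data
ops = [Operation("COPY", dst_offset=0, length=4, src_offset=0),
       Operation("REPLACE", dst_offset=4, length=4, data_length=len(blob))]
target = bytearray(len(wanted))
apply_payload(ops, lambda off, n: blob[off:off + n], source, target,
              hashlib.sha256(wanted).hexdigest())
print(bytes(target))  # b'AAAAcccc'
```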
| |
| <h3 id="post-installation">Post-installation</h3> |
| |
| <p> |
| For every partition where a post-install step is defined, |
| <code>update_engine</code> mounts the new partition into a specific |
| location and executes the program specified in the OTA relative to |
| the mounted partition. For example, if the post-install program is |
| defined as <code>usr/bin/postinstall</code> in the system partition, |
| this partition from the unused slot will be mounted in a fixed |
| location (such as <code>/postinstall_mount</code>) and the |
| <code>/postinstall_mount/usr/bin/postinstall</code> command is |
| executed. |
| </p> |
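| <p> |
| The path resolution in this example amounts to joining the mount |
| point and the relative program path. The helper below is a minimal |
| sketch with a hypothetical name; rejecting absolute paths and |
| <code>..</code> escapes is an assumption about sensible behavior, not |
| a description of the <code>update_engine</code> implementation. |
| </p> |

```python
import posixpath


def postinstall_command(mount_point: str, program: str) -> str:
    """Resolve a post-install program path relative to the mounted partition."""
    if posixpath.isabs(program):
        raise ValueError("post-install program must be a relative path")
    root = posixpath.normpath(mount_point)
    path = posixpath.normpath(posixpath.join(root, program))
    # Assumption: refuse paths that would escape the mounted partition.
    if not path.startswith(root + "/"):
        raise ValueError("post-install program escapes the mount point")
    return path


print(postinstall_command("/postinstall_mount", "usr/bin/postinstall"))
# /postinstall_mount/usr/bin/postinstall
```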
| |
| <p> |
| For post-installation to succeed, the old kernel must be able to: |
| </p> |
| |
| <ul> |
| <li> |
| <strong>Mount the new filesystem format</strong>. The filesystem |
| type cannot change unless there's support for it in the old |
| kernel, including details such as the compression algorithm used |
| if using a compressed filesystem (such as SquashFS). |
| </li> |
| <li> |
| <strong>Understand the new partition's post-install program format</strong>. |
| If using an Executable and Linkable Format (ELF) binary, it should |
| be compatible with the old kernel (for example, a new 64-bit program |
| can't run on an old 32-bit kernel if the architecture switched from |
| 32- to 64-bit builds). Unless the loader (<code>ld</code>) is |
| instructed to use other paths or the program is built as a static |
| binary, libraries are loaded from the old system image and not the |
| new one. |
| </li> |
| </ul> |
| |
| <p> |
| For example, you could use a shell script as a post-install program |
| (interpreted by the old system's shell binary with a <code>#!</code> |
| marker at the top), then set up library paths from the new |
| environment for executing a more complex binary post-install |
| program. Alternatively, you could run the post-install step from a |
| dedicated smaller partition to enable the filesystem format in the |
| main system partition to be updated without incurring backward |
| compatibility issues or stepping-stone updates; this would allow |
| users to update directly to the latest version from a factory image. |
| </p> |
| |
| <p> |
| The new post-install program is limited by the SELinux policies |
| defined in the old system. As such, the post-install step is |
| suitable for performing tasks required by design on a given device |
| or other best-effort tasks (such as updating the A/B-capable firmware |
| or bootloader or preparing copies of databases for the new version). |
| The post-install step is <strong>not suitable</strong> for |
| one-off bug fixes before reboot that require unforeseen permissions. |
| </p> |
| |
| <p> |
| The selected post-install program runs in the |
| <code>postinstall</code> SELinux context. All the files in the new |
| mounted partition will be tagged with <code>postinstall_file</code>, |
| regardless of what their attributes are after rebooting into that |
| new system. Changes to the SELinux attributes in the new system |
| won't impact the post-install step. If the post-install program |
| needs extra permissions, those must be added to the post-install |
| context. |
| </p> |
| |
| <h3 id="after_reboot">After reboot</h3> |
| |
| <p> |
| After rebooting, <code>update_verifier</code> triggers the integrity |
| check using dm-verity. This check starts before zygote to avoid Java |
| services making any irreversible changes that would prevent a safe |
| rollback. During this process, the bootloader and kernel may also |
| trigger a reboot if verified boot or dm-verity detects any |
| corruption. After the check completes, <code>update_verifier</code> |
| marks the boot successful. |
| </p> |
| |
| <p> |
| <code>update_verifier</code> will read only the blocks listed in |
| <code>/data/ota_package/care_map.txt</code>, which is included in an |
| A/B OTA package when using the AOSP code. The Java system update |
| client, such as GmsCore, extracts <code>care_map.txt</code>, sets up |
| the access permission before rebooting the device, and deletes the |
| extracted file after the system successfully boots into the new |
| version. |
| </p> |
| |
| </body> |
| </html> |