| page.title=Contributors to Audio Latency |
| @jd:body |
| |
| <!-- |
| Copyright 2013 The Android Open Source Project |
| |
| Licensed under the Apache License, Version 2.0 (the "License"); |
| you may not use this file except in compliance with the License. |
| You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| --> |
| <div id="qv-wrapper"> |
| <div id="qv"> |
| <h2>In this document</h2> |
| <ol id="auto-toc"> |
| </ol> |
| </div> |
| </div> |
| |
| <p> |
| This page focuses on the contributors to output latency, |
| but a similar discussion applies to input latency. |
| </p> |
| <p> |
| Assuming the analog circuitry does not contribute significantly, the major |
| surface-level contributors to audio latency are the following: |
| </p> |
| |
| <ul> |
| <li>Application</li> |
| <li>Total number of buffers in pipeline</li> |
| <li>Size of each buffer, in frames</li> |
| <li>Additional latency after the app processor, such as from a DSP</li> |
| </ul> |
| |
| <p> |
| As accurate as the above list of contributors may be, it is also misleading. |
| The reason is that buffer count and buffer size are more of an |
| <em>effect</em> than a <em>cause</em>. What usually happens is that |
| a given buffer scheme is implemented and tested, but during testing, an audio |
| underrun or overrun is heard as a "click" or "pop." To compensate, the |
| system designer then increases buffer sizes or buffer counts. |
| This has the desired result of eliminating the underruns or overruns, but it also |
| has the undesired side effect of increasing latency. |
| For more information about buffer sizes, see the video |
| <a href="https://youtu.be/PnDK17zP9BI">Audio latency: buffer sizes</a>. |
| |
| </p> |
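| |
| <p> |
| As a rough illustration of why more or larger buffers raise latency, the |
| buffering delay can be estimated as buffer count times buffer size in frames, |
| divided by the sample rate. The sketch below assumes every buffer in the |
| pipeline has the same size and is full. The function name and the example |
| numbers (48 kHz, 240-frame buffers) are hypothetical and not taken from any |
| particular device. |
| </p> |

```python
def buffer_latency_ms(buffer_count: int, frames_per_buffer: int,
                      sample_rate_hz: int) -> float:
    """Estimate the delay added by a chain of equally sized, full buffers."""
    if buffer_count < 0 or frames_per_buffer < 0 or sample_rate_hz <= 0:
        raise ValueError("counts must be non-negative and the rate positive")
    total_frames = buffer_count * frames_per_buffer
    # frames / (frames per second) gives seconds; scale to milliseconds.
    return 1000.0 * total_frames / sample_rate_hz


# Hypothetical numbers: doubling the buffer count doubles the estimated delay.
two_buffers_ms = buffer_latency_ms(2, 240, 48000)   # 480 frames at 48 kHz
four_buffers_ms = buffer_latency_ms(4, 240, 48000)  # 960 frames at 48 kHz
```

| <p> |
| The estimate covers only the buffering itself. It excludes the application |
| and any latency added after the app processor. |
| </p> |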
| |
| <p> |
| A better approach is to understand the causes of the |
| underruns and overruns, and then correct those. This eliminates the |
| audible artifacts and may permit even smaller or fewer buffers |
| and thus reduce latency. |
| </p> |
| |
| <p> |
| In our experience, the most common causes of underruns and overruns include: |
| </p> |
| <ul> |
| <li>Linux CFS (Completely Fair Scheduler)</li> |
| <li>high-priority threads with SCHED_FIFO scheduling</li> |
| <li>priority inversion</li> |
| <li>long scheduling latency</li> |
| <li>long-running interrupt handlers</li> |
| <li>long interrupt disable time</li> |
| <li>power management</li> |
| <li>security kernels</li> |
| </ul> |
| |
| <h3 id="linuxCfs">Linux CFS and SCHED_FIFO scheduling</h3> |
| <p> |
| The Linux CFS is designed to be fair to competing workloads sharing a common CPU |
| resource. This fairness is represented by a per-thread <em>nice</em> parameter. |
| The nice value ranges from -20 (least nice, or most CPU time allocated) |
| to 19 (nicest, or least CPU time allocated). In general, all threads with a given |
| nice value receive approximately equal CPU time and threads with a |
| numerically lower nice value should expect to |
| receive more CPU time. However, CFS is "fair" only over relatively long |
| periods of observation. Over short-term observation windows, |
| CFS may allocate the CPU resource in unexpected ways. For example, it |
| may take the CPU away from a thread with numerically low niceness |
| and give it to a thread with numerically high niceness. In the case of audio, |
| this can result in an underrun or overrun. |
| </p> |
| |
| <p> |
| The obvious solution is to avoid CFS for high-performance audio |
| threads. Beginning with Android 4.1, such threads now use the |
| <code>SCHED_FIFO</code> scheduling policy rather than the <code>SCHED_NORMAL</code> (also called |
| <code>SCHED_OTHER</code>) scheduling policy implemented by CFS. |
| </p> |
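| |
| <p> |
| For readers who want to see the policy switch itself, the following minimal |
| sketch asks a generic Linux system to move the calling thread from CFS to |
| <code>SCHED_FIFO</code> using Python's <code>os</code> scheduling calls. It |
| only illustrates the mechanism and is not how Android configures its audio |
| threads. The helper names are hypothetical, and the request normally fails |
| without a privilege such as <code>CAP_SYS_NICE</code>, in which case the |
| helper simply reports failure. |
| </p> |

```python
import os


def request_sched_fifo(priority: int = 1) -> bool:
    """Try to switch the caller to SCHED_FIFO; return True on success."""
    if not hasattr(os, "sched_setscheduler"):
        return False  # platform without the Linux scheduling calls
    lowest = os.sched_get_priority_min(os.SCHED_FIFO)
    highest = os.sched_get_priority_max(os.SCHED_FIFO)
    if not lowest <= priority <= highest:
        raise ValueError(f"priority must be in [{lowest}, {highest}]")
    try:
        # A pid of 0 refers to the caller.
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(priority))
    except OSError:
        # Typically a permission error: real-time policies are privileged.
        return False
    return True


def current_policy_name() -> str:
    """Name the caller's scheduling policy, or 'unknown' if unavailable."""
    if not hasattr(os, "sched_getscheduler"):
        return "unknown"
    names = {os.SCHED_OTHER: "SCHED_OTHER", os.SCHED_FIFO: "SCHED_FIFO"}
    return names.get(os.sched_getscheduler(0), "unknown")
```

| <p> |
| Returning <code>False</code> instead of raising lets a caller fall back to |
| the default policy when the privilege is missing. |
| </p> |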
| |
| <h3 id="schedFifo">SCHED_FIFO priorities</h3> |
| <p> |
| Though the high-performance audio threads use <code>SCHED_FIFO</code>, they |
| are still susceptible to preemption by other higher-priority <code>SCHED_FIFO</code> threads. |
| These are typically kernel worker threads, but there may also be a few |
| non-audio user threads with policy <code>SCHED_FIFO</code>. The available <code>SCHED_FIFO</code> |
| priorities range from 1 to 99. The audio threads run at priority |
| 2 or 3. This leaves priority 1 available for lower priority threads, |
| and priorities 4 to 99 for higher priority threads. We recommend |
| you use priority 1 whenever possible, and reserve priorities 4 to 99 for |
| those threads that are guaranteed to complete within a bounded amount |
| of time, execute with a period shorter than the period of audio threads, |
| and are known to not interfere with scheduling of audio threads. |
| </p> |
| |
| <h3 id="rms">Rate-monotonic scheduling</h3> |
| <p> |
| For more information on the theory of assignment of fixed priorities, |
| see the Wikipedia article |
| <a href="http://en.wikipedia.org/wiki/Rate-monotonic_scheduling">Rate-monotonic scheduling</a> (RMS). |
| A key point is that fixed priorities should be allocated strictly based on period, |
| with higher priorities assigned to threads of shorter periods, not based on perceived "importance." |
| Non-periodic threads may be modeled as periodic threads, using the maximum frequency of execution |
| and maximum computation per execution. If a non-periodic thread cannot be modeled as |
| a periodic thread (for example, it could execute with unbounded frequency or unbounded computation |
| per execution), then it should not be assigned a fixed priority as that would be incompatible |
| with the scheduling of true periodic threads. |
| </p> |
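| |
| <p> |
| The period-based assignment can be sketched as follows. The code orders |
| threads by period, gives the shortest period the numerically highest |
| priority, and applies the Liu and Layland utilization bound described in the |
| Rate-monotonic scheduling article. That bound is a sufficient test only, and |
| it assumes independent periodic threads on a single CPU with deadlines equal |
| to their periods. The thread names, periods, and computation times are |
| hypothetical. |
| </p> |

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeriodicTask:
    name: str
    period_ms: float      # time between activations
    worst_case_ms: float  # maximum computation per activation


def assign_rate_monotonic(tasks, lowest_priority=1):
    """Return {name: priority}; a shorter period gets a higher number."""
    longest_first = sorted(tasks, key=lambda t: t.period_ms, reverse=True)
    return {t.name: lowest_priority + i for i, t in enumerate(longest_first)}


def utilization(tasks):
    """Fraction of one CPU needed if every task runs its worst case."""
    return sum(t.worst_case_ms / t.period_ms for t in tasks)


def passes_utilization_bound(tasks):
    """Sufficient (not necessary) schedulability test: U <= n(2^(1/n) - 1)."""
    n = len(tasks)
    if n == 0:
        return True
    return utilization(tasks) <= n * (2 ** (1.0 / n) - 1)


# Hypothetical task set, used only to exercise the functions above.
tasks = [
    PeriodicTask("logger", period_ms=100.0, worst_case_ms=10.0),
    PeriodicTask("audio", period_ms=5.0, worst_case_ms=1.0),
    PeriodicTask("sensor", period_ms=2.0, worst_case_ms=0.2),
]
priorities = assign_rate_monotonic(tasks)
schedulable = passes_utilization_bound(tasks)
```

| <p> |
| Note that the ordering depends only on the period: the hypothetical |
| "sensor" thread outranks "audio" because it runs more often, not because it |
| is considered more important. |
| </p> |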
| |
| <h3 id="priorityInversion">Priority inversion</h3> |
| <p> |
| <a href="http://en.wikipedia.org/wiki/Priority_inversion">Priority inversion</a> |
| is a classic failure mode of real-time systems, |
| where a higher-priority task is blocked for an unbounded time waiting |
| for a lower-priority task to release a resource, such as shared |
| state protected by a |
| <a href="http://en.wikipedia.org/wiki/Mutual_exclusion">mutex</a>. |
| See the article "<a href="avoiding_pi.html">Avoiding priority inversion</a>" for techniques to |
| mitigate it. |
| </p> |
| |
| <h3 id="schedLatency">Scheduling latency</h3> |
| <p> |
| Scheduling latency is the time between when a thread becomes |
| ready to run and when the resulting context switch completes so that the |
| thread actually runs on a CPU. The shorter the latency the better, and |
| anything over two milliseconds causes problems for audio. Long scheduling |
| latency is most likely to occur during mode transitions, such as |
| bringing up or shutting down a CPU, switching between a security kernel |
| and the normal kernel, switching from full power to low-power mode, |
| or adjusting the CPU clock frequency and voltage. |
| </p> |
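| |
| <p> |
| One coarse way to get a feel for wake-up delays from user space is to sleep |
| for a short interval and record how late each wake-up is. The sketch below |
| does this and counts wake-ups that are more than two milliseconds late. The |
| measured overshoot also includes timer resolution and interpreter overhead, |
| so treat it as an upper estimate of scheduling latency rather than a precise |
| measurement. The function names and default values are hypothetical. |
| </p> |

```python
import time


def wakeup_overshoots_ms(iterations: int = 50, sleep_ms: float = 1.0) -> list:
    """Sleep repeatedly and record how late each wake-up is, in milliseconds."""
    overshoots = []
    for _ in range(iterations):
        start = time.monotonic()
        time.sleep(sleep_ms / 1000.0)
        elapsed_ms = (time.monotonic() - start) * 1000.0
        # Clamp at zero to absorb rounding in the clock readings.
        overshoots.append(max(0.0, elapsed_ms - sleep_ms))
    return overshoots


def count_over_threshold(overshoots, threshold_ms: float = 2.0) -> int:
    """Count wake-ups that were later than the threshold."""
    return sum(1 for value in overshoots if value > threshold_ms)


if __name__ == "__main__":
    samples = wakeup_overshoots_ms()
    print("worst overshoot (ms):", max(samples))
    print("wake-ups later than 2 ms:", count_over_threshold(samples))
```

| <p> |
| Running the measurement while the system changes CPU frequency or brings |
| cores up and down is more likely to expose the long delays described above |
| than running it on an idle, steady system. |
| </p> |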
| |
| <h3 id="interrupts">Interrupts</h3> |
| <p> |
| In many designs, CPU 0 services all external interrupts. So a |
| long-running interrupt handler may delay other interrupts, in particular |
| audio direct memory access (DMA) completion interrupts. Design interrupt handlers |
| to finish quickly and defer lengthy work to a thread (preferably |
| a CFS thread or <code>SCHED_FIFO</code> thread of priority 1). |
| </p> |
| |
| <p> |
| Similarly, disabling interrupts on CPU 0 for a long period |
| also delays the servicing of audio interrupts. |
| Long interrupt disable times typically happen while waiting for a kernel |
| <i>spin lock</i>. Review these spin locks to ensure they are bounded. |
| </p> |
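| |
| <p> |
| To check how interrupts are distributed across CPUs on a Linux system, the |
| per-CPU counters in <code>/proc/interrupts</code> can be summed. The sketch |
| below is a minimal parser, assuming the usual layout of a header row of CPU |
| names followed by one row per interrupt source. The totals include internal |
| per-CPU interrupts as well as external ones, so they give only a coarse |
| picture. The sample text and interrupt names are made up for illustration. |
| </p> |

```python
def interrupts_per_cpu(proc_interrupts_text: str) -> list:
    """Sum the per-CPU counters from /proc/interrupts-style text."""
    lines = proc_interrupts_text.splitlines()
    if not lines:
        return []
    # The header row names the online CPUs: CPU0, CPU1, ...
    cpu_count = sum(1 for token in lines[0].split() if token.startswith("CPU"))
    totals = [0] * cpu_count
    for line in lines[1:]:
        fields = line.split()
        counts = fields[1:1 + cpu_count]  # skip the "NN:" label
        # Rows such as "ERR:" carry a single global counter; skip them.
        if len(counts) < cpu_count or not all(c.isdigit() for c in counts):
            continue
        for cpu, count in enumerate(counts):
            totals[cpu] += int(count)
    return totals


def read_interrupts_per_cpu(path: str = "/proc/interrupts") -> list:
    """Read the live counters; the path exists only on Linux."""
    with open(path) as handle:
        return interrupts_per_cpu(handle.read())


# Made-up sample in which CPU 0 services nearly every interrupt.
SAMPLE = """\
           CPU0       CPU1
 16:       9000         12   GIC  audio-dma
 17:       4000          0   GIC  touchscreen
ERR:          0
"""
sample_totals = interrupts_per_cpu(SAMPLE)
```

| <p> |
| A total that is heavily skewed toward CPU 0, as in the sample, suggests that |
| a slow handler for any source can delay audio interrupts serviced on the |
| same CPU. |
| </p> |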
| |
| <h3 id="power">Power, performance, and thermal management</h3> |
| <p> |
| <a href="http://en.wikipedia.org/wiki/Power_management">Power management</a> |
| is a broad term that encompasses efforts to monitor |
| and reduce power consumption while optimizing performance. |
| <a href="http://en.wikipedia.org/wiki/Thermal_management_of_electronic_devices_and_systems">Thermal management</a> |
| and <a href="http://en.wikipedia.org/wiki/Computer_cooling">computer cooling</a> |
| are similar but seek to measure and control heat to avoid damage due to excess heat. |
| In the Linux kernel, the CPU |
| <a href="http://en.wikipedia.org/wiki/Governor_%28device%29">governor</a> |
| is responsible for low-level policy, while user mode configures high-level policy. |
| Techniques used include: |
| </p> |
| |
| <ul> |
| <li>dynamic voltage scaling</li> |
| <li>dynamic frequency scaling</li> |
| <li>dynamic core enabling</li> |
| <li>cluster switching</li> |
| <li>power gating</li> |
| <li>hotplug (hotswap)</li> |
| <li>various sleep modes (halt, stop, idle, suspend, etc.)</li> |
| <li>process migration</li> |
| <li><a href="http://en.wikipedia.org/wiki/Processor_affinity">processor affinity</a></li> |
| </ul> |
| |
| <p> |
| Some management operations can result in "work stoppages" or |
| times during which there is no useful work performed by the application processor. |
| These work stoppages can interfere with audio, so such management should be designed |
| for an acceptable worst-case work stoppage while audio is active. |
| Of course, when thermal runaway is imminent, avoiding permanent damage |
| is more important than audio! |
| </p> |
| |
| <h3 id="security">Security kernels</h3> |
| <p> |
| A <a href="http://en.wikipedia.org/wiki/Security_kernel">security kernel</a> for |
| <a href="http://en.wikipedia.org/wiki/Digital_rights_management">Digital rights management</a> |
| (DRM) may run on the same application processor core(s) as those used |
| for the main operating system kernel and application code. Any time |
| during which a security kernel operation is active on a core is effectively a |
| stoppage of ordinary work that would normally run on that core. |
| In particular, this may include audio work. By its nature, the internal |
| behavior of a security kernel is inscrutable from higher-level layers, and thus |
| any performance anomalies caused by a security kernel are especially |
| pernicious. For example, security kernel operations do not typically appear in |
| context switch traces. We call this "dark time" — time that elapses |
| yet cannot be observed. Security kernels should be designed for an |
| acceptable worst-case work stoppage while audio is active. |
| </p> |