# virtio-queue

The `virtio-queue` crate provides a virtio device implementation for a virtio
queue, a virtio descriptor, and a chain of such descriptors.
Two formats of virtio queues are defined in the specification: split virtqueues
and packed virtqueues. The `virtio-queue` crate supports only the
[split virtqueues](https://docs.oasis-open.org/virtio/virtio/v1.1/csprd01/virtio-v1.1-csprd01.html#x1-240006)
format.
The virtio-queue API is meant to be consumed by virtio device
implementations (such as the block device or vsock device).
The main abstraction is the `Queue`. The crate also defines a state object
for the queue, i.e. `QueueState`.

## Usage

Let’s take a concrete example of how a device works with a queue, using
the MMIO bus.

First, it is important to mention that the mandatory parts of the virtio
interface are the following:

- the device status field → provides an indication of
  [the completed steps](https://docs.oasis-open.org/virtio/virtio/v1.1/csprd01/virtio-v1.1-csprd01.html#x1-100001)
  of the device initialization routine,
- the feature bits →
  [the features](https://docs.oasis-open.org/virtio/virtio/v1.1/csprd01/virtio-v1.1-csprd01.html#x1-100001)
  the driver and device understand,
- [notifications](https://docs.oasis-open.org/virtio/virtio/v1.1/csprd01/virtio-v1.1-csprd01.html#x1-170003),
- one or more
  [virtqueues](https://docs.oasis-open.org/virtio/virtio/v1.1/csprd01/virtio-v1.1-csprd01.html#x1-230005)
  → the mechanism for data transport between the driver and device.

Each virtqueue consists of three parts:

- Descriptor Table,
- Available Ring,
- Used Ring.
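
To make the three parts concrete, here is a standalone Rust mirror of the split-virtqueue structures from the specification (`struct virtq_desc` and `struct virtq_used_elem`) and of the part sizes they imply. It is an illustrative sketch: `RawDesc`, `RawUsedElem` and `split_queue_part_sizes` are hypothetical names, not part of the `virtio-queue` API, and the fields are assumed to be stored little-endian in guest memory as the specification requires.

```rust
use std::mem::size_of;

/// Standalone mirror of `struct virtq_desc` (one Descriptor Table entry).
#[repr(C)]
#[derive(Clone, Copy, Debug, Default, PartialEq)]
pub struct RawDesc {
    pub addr: u64,  // guest-physical address of the buffer
    pub len: u32,   // buffer length in bytes
    pub flags: u16, // e.g. VIRTQ_DESC_F_NEXT, VIRTQ_DESC_F_WRITE
    pub next: u16,  // next descriptor index, if VIRTQ_DESC_F_NEXT is set
}

/// Standalone mirror of `struct virtq_used_elem` (one Used Ring entry).
#[repr(C)]
#[derive(Clone, Copy, Debug, Default, PartialEq)]
pub struct RawUsedElem {
    pub id: u32,  // head index of the used descriptor chain
    pub len: u32, // bytes written into the chain's buffers
}

/// Byte sizes of (Descriptor Table, Available Ring, Used Ring) for a split
/// virtqueue with `queue_size` entries, counting the trailing
/// `used_event` / `avail_event` fields.
pub fn split_queue_part_sizes(queue_size: u16) -> (usize, usize, usize) {
    let n = queue_size as usize;
    let desc_table = size_of::<RawDesc>() * n;
    // flags: u16, idx: u16, ring: [u16; n], used_event: u16
    let avail_ring = 2 + 2 + 2 * n + 2;
    // flags: u16, idx: u16, ring: [RawUsedElem; n], avail_event: u16
    let used_ring = 2 + 2 + size_of::<RawUsedElem>() * n + 2;
    (desc_table, avail_ring, used_ring)
}
```

For a 256-entry queue this gives 4096, 518 and 2054 bytes, matching the `16 * size`, `6 + 2 * size` and `6 + 8 * size` formulas in the specification.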

Before booting the virtual machine (VM), the VMM does the following setup:

1. initializes an array of `Queue` objects using the `Queue` constructor.
2. registers the device to the MMIO bus, so that the driver can later send
   read/write requests from/to the MMIO space; some of those requests also set
   up the queues’ state.
3. performs other pre-boot configuration, such as registering a file descriptor
   (fd) for the interrupt assigned to the device; the device later uses this fd
   to inform the driver that it has information to communicate.

After the VM boots, the driver starts sending read/write requests to
configure things like:

* the supported features;
* the queue parameters. The following setters are used for the queue setup:
  * `set_size` → sets the size of the queue.
  * `set_ready` → puts the queue in the `ready for processing` state.
  * `set_desc_table_address`, `set_avail_ring_address`,
    `set_used_ring_address` → configure the guest addresses of the constituent
    parts of the queue.
  * `set_event_idx` → called as part of feature negotiation in the
    `virtio-device` crate; enables or disables the VIRTIO_F_EVENT_IDX feature.
* the device activation. As part of this activation, the device can also create
  a queue handler, which can later be used to process the queue.
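
These setters are typically reached from the VMM's MMIO write handler. The sketch below assumes the virtio-mmio (version 2) register offsets from the specification (QueueSel at `0x030`, QueueNum at `0x038`, QueueReady at `0x044`, and the 64-bit queue addresses split into low/high halves starting at `0x080`). `ToyQueueConfig` and `mmio_write` are hypothetical names, and the plain field assignments stand in for calls to the crate's `set_size`, `set_ready` and `set_*_address` setters.

```rust
/// Hypothetical per-queue configuration; it stands in for the crate's `Queue`.
#[derive(Debug, Default)]
pub struct ToyQueueConfig {
    pub size: u16,
    pub ready: bool,
    pub desc_table: u64,
    pub avail_ring: u64,
    pub used_ring: u64,
}

/// Replaces the low or the high 32 bits of a 64-bit guest address.
fn set_half(addr: &mut u64, value: u32, high: bool) {
    if high {
        *addr = (*addr & 0x0000_0000_ffff_ffff) | (u64::from(value) << 32);
    } else {
        *addr = (*addr & 0xffff_ffff_0000_0000) | u64::from(value);
    }
}

/// Handles a 32-bit driver write to the virtio-mmio register at `offset`.
/// `queue_sel` holds the last value written to the QueueSel register.
pub fn mmio_write(queues: &mut [ToyQueueConfig], queue_sel: &mut u32, offset: u64, value: u32) {
    if offset == 0x030 {
        // QueueSel: subsequent queue register writes target this queue.
        *queue_sel = value;
        return;
    }
    let q = match queues.get_mut(*queue_sel as usize) {
        Some(q) => q,
        None => return, // ignore writes for a queue that does not exist
    };
    match offset {
        // QueueNum; a real device would also check it against QueueNumMax.
        0x038 => q.size = value as u16,
        0x044 => q.ready = value == 1,                      // QueueReady
        0x080 => set_half(&mut q.desc_table, value, false), // QueueDescLow
        0x084 => set_half(&mut q.desc_table, value, true),  // QueueDescHigh
        0x090 => set_half(&mut q.avail_ring, value, false), // QueueDriverLow
        0x094 => set_half(&mut q.avail_ring, value, true),  // QueueDriverHigh
        0x0a0 => set_half(&mut q.used_ring, value, false),  // QueueDeviceLow
        0x0a4 => set_half(&mut q.used_ring, value, true),   // QueueDeviceHigh
        _ => {} // registers not modelled in this sketch
    }
}
```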

Once the queues are ready, the device can be used.

In the steady-state operation of a virtio device, the driver produces
descriptor chains which are consumed by the device, and both parties need to be
notified when new elements have been placed on the associated ring, to avoid
busy polling. The precise notification mechanism is left up to the VMM that
incorporates the devices and queues (it usually involves things like MMIO VM
exits and interrupt injection into the guest). The queue implementation is
agnostic to the notification mechanism in use; it exposes methods and
functionality (such as iterators) that are called from the outside in response
to a notification event.

### Data transmission using virtqueues

The basic principle of how the queues are used by the device and driver is the
following, as also shown in the diagram below:

1. when the guest driver has a new request (buffer), it allocates free
   descriptor(s) for the buffer in the descriptor table, chaining as necessary.
2. the driver adds a new entry to the available ring, containing the head index
   of the descriptor chain describing the request.
3. the driver increments the available ring's `idx` by the number of new
   entries; the diagram shows the simple case of a single new entry.
4. the driver sends an available buffer notification to the device if such
   notifications are not suppressed.
5. the device will at some point consume that request, by first reading the
   `idx` field from the available ring. This can be done directly with
   `Queue::avail_idx`, but we do not recommend that consumers of the crate call
   it themselves, because it is already called behind the scenes by the
   iterator over all available descriptor chain heads.
6. the device gets the head index of each descriptor chain made available up
   to the `idx` value it read.
7. the device reads the corresponding descriptor(s) from the descriptor table.
8. the device adds a new entry to the used ring by using `Queue::add_used`; the
   entry is defined in the spec as `virtq_used_elem`, and in `virtio-queue` as
   `VirtqUsedElem`. This structure holds both the head index of the descriptor
   chain and the number of bytes that were written to memory while serving the
   request.
9. the device increments the used ring's `idx`; this is done as part of
   `Queue::add_used`, mentioned above.
10. the device sends a used buffer notification to the driver if such
    notifications are not suppressed.

*[Diagram: data transmission between the driver and the device through a split virtqueue]*
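
The ring-index bookkeeping in steps 2-3, 5-6 and 8-9 can be modelled with a small in-memory queue. This is a hypothetical sketch (`ToySplitQueue` and its methods are not the crate's API): real rings live in guest memory, the driver and device run concurrently, and the memory barriers the specification requires between writing a ring entry and publishing `idx` are omitted. What it shows is that `idx` is a free-running 16-bit counter and the slot is `idx % queue_size`, which stays consistent across wrap-around because the queue size is a power of two.

```rust
/// Hypothetical in-memory split virtqueue used only to illustrate the ring
/// index arithmetic; the crate's `Queue` works on rings in guest memory.
pub struct ToySplitQueue {
    size: u16,
    avail_ring: Vec<u16>,       // available ring: descriptor chain heads
    avail_idx: u16,             // available ring `idx`, written by the driver
    used_ring: Vec<(u32, u32)>, // used ring: (chain head, bytes written)
    used_idx: u16,              // used ring `idx`, written by the device
    next_avail: u16,            // device-private: next available entry to consume
}

impl ToySplitQueue {
    pub fn new(size: u16) -> Self {
        assert!(size.is_power_of_two(), "split queue sizes are powers of 2");
        ToySplitQueue {
            size,
            avail_ring: vec![0; size as usize],
            avail_idx: 0,
            used_ring: vec![(0, 0); size as usize],
            used_idx: 0,
            next_avail: 0,
        }
    }

    /// Driver, steps 2-3: write the chain head into the next slot, then bump `idx`.
    pub fn driver_make_available(&mut self, head: u16) {
        let slot = (self.avail_idx % self.size) as usize;
        self.avail_ring[slot] = head;
        self.avail_idx = self.avail_idx.wrapping_add(1);
    }

    /// Device, steps 5-6: compare against `idx`, then read the next chain head.
    pub fn device_pop_avail(&mut self) -> Option<u16> {
        if self.next_avail == self.avail_idx {
            return None; // no new entries since the last one consumed
        }
        let head = self.avail_ring[(self.next_avail % self.size) as usize];
        self.next_avail = self.next_avail.wrapping_add(1);
        Some(head)
    }

    /// Device, steps 8-9: conceptually what `Queue::add_used` does.
    pub fn device_add_used(&mut self, head: u16, len: u32) {
        let slot = (self.used_idx % self.size) as usize;
        self.used_ring[slot] = (u32::from(head), len);
        self.used_idx = self.used_idx.wrapping_add(1);
    }

    pub fn used_idx(&self) -> u16 {
        self.used_idx
    }

    pub fn used_entry(&self, slot: usize) -> (u32, u32) {
        self.used_ring[slot]
    }
}
```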

A descriptor stores four fields. The first two, `addr` and `len`, point to the
data in memory to which the descriptor refers, as shown in the diagram below.
The `flags` field indicates, for example, whether the buffer is device-readable
or device-writable, or whether another descriptor is chained after this one
(VIRTQ_DESC_F_NEXT flag set). The `next` field stores the index of the next
descriptor if VIRTQ_DESC_F_NEXT is set.

*[Diagram: a descriptor's `addr` and `len` fields pointing to a buffer in memory]*
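
Following `next` while VIRTQ_DESC_F_NEXT is set, and sorting buffers by VIRTQ_DESC_F_WRITE, is roughly what iterating over a descriptor chain amounts to. The hypothetical `walk_chain` below sketches it over an in-memory descriptor table (the flag values 1 and 2 are taken from the specification). It bounds the walk by the table size, so that a looping chain cannot hang the device, and it does not handle indirect descriptors.

```rust
pub const VIRTQ_DESC_F_NEXT: u16 = 1; // the `next` field is valid
pub const VIRTQ_DESC_F_WRITE: u16 = 2; // device-writable (otherwise device-readable)

#[derive(Clone, Copy, Debug)]
pub struct Desc {
    pub addr: u64,
    pub len: u32,
    pub flags: u16,
    pub next: u16,
}

/// (addr, len) pairs describing guest buffers.
pub type Buffers = Vec<(u64, u32)>;

/// Walks the chain starting at `head` and splits it into device-readable and
/// device-writable buffers. Returns `None` if an index is out of range or the
/// chain is longer than the table (i.e. it must contain a loop).
pub fn walk_chain(table: &[Desc], head: u16) -> Option<(Buffers, Buffers)> {
    let mut readable = Vec::new();
    let mut writable = Vec::new();
    let mut index = head;
    for _ in 0..table.len() {
        let desc = table.get(usize::from(index))?;
        if desc.flags & VIRTQ_DESC_F_WRITE != 0 {
            writable.push((desc.addr, desc.len));
        } else {
            readable.push((desc.addr, desc.len));
        }
        if desc.flags & VIRTQ_DESC_F_NEXT == 0 {
            return Some((readable, writable)); // end of chain
        }
        index = desc.next;
    }
    None // visited more descriptors than the table holds
}
```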

**Requirements for device implementation**

* Abstractions from virtio-queue such as `DescriptorChain` can be used to parse
  descriptors provided by the driver, which represent input or output memory
  areas for device I/O. A descriptor is essentially an (address, length) pair,
  which the device model subsequently uses to perform the operation. The crate
  does not check the validity of the descriptors, and instead expects any
  validation to happen when the device implementation accesses the
  corresponding areas. Early checks can add non-negligible costs, and relying
  exclusively on them may lead to time-of-check-to-time-of-use race conditions.
* Before reading from or writing to a buffer, the device should validate that
  the buffer is device-readable or device-writable, respectively.
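
In line with these requirements, a device can check a descriptor's direction and the buffer bounds at the moment it touches guest memory, instead of in a separate validation pass. The sketch below is hypothetical: `GuestMem` is a flat byte vector standing in for a real guest memory abstraction, `AccessError` is an illustrative error type, and the `(addr, len, flags)` triple is assumed to come straight from a descriptor.

```rust
pub const VIRTQ_DESC_F_WRITE: u16 = 2;

#[derive(Debug, PartialEq)]
pub enum AccessError {
    NotReadable,
    NotWritable,
    OutOfBounds,
}

/// Hypothetical flat guest memory.
pub struct GuestMem {
    bytes: Vec<u8>,
}

impl GuestMem {
    pub fn new(size: usize) -> Self {
        GuestMem { bytes: vec![0; size] }
    }

    /// Bounds check performed at access time, with overflow-safe arithmetic.
    fn range(&self, addr: u64, len: u32) -> Result<std::ops::Range<usize>, AccessError> {
        let end = addr.checked_add(u64::from(len)).ok_or(AccessError::OutOfBounds)?;
        if end > self.bytes.len() as u64 {
            return Err(AccessError::OutOfBounds);
        }
        Ok((addr as usize)..(end as usize))
    }

    /// Reads a buffer only if its descriptor marks it device-readable.
    pub fn read_buffer(&self, addr: u64, len: u32, flags: u16) -> Result<&[u8], AccessError> {
        if flags & VIRTQ_DESC_F_WRITE != 0 {
            return Err(AccessError::NotReadable);
        }
        let range = self.range(addr, len)?;
        Ok(&self.bytes[range])
    }

    /// Writes into a buffer only if its descriptor marks it device-writable;
    /// returns the number of bytes written (at most `len`).
    pub fn write_buffer(&mut self, addr: u64, len: u32, flags: u16, data: &[u8]) -> Result<u32, AccessError> {
        if flags & VIRTQ_DESC_F_WRITE == 0 {
            return Err(AccessError::NotWritable);
        }
        let range = self.range(addr, len)?;
        let n = data.len().min(range.len());
        self.bytes[range.start..range.start + n].copy_from_slice(&data[..n]);
        Ok(n as u32)
    }
}
```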

## Design

`QueueT` is a trait that allows different implementations of a `Queue`
object for single-threaded and multi-threaded contexts. The
implementations provided in `virtio-queue` are:

1. `Queue` → used in single-threaded contexts.
2. `QueueSync` → used in multi-threaded contexts; it is simply
   a wrapper over an `Arc<Mutex<Queue>>`.
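
The split between `Queue` and `QueueSync` follows a common Rust pattern: one trait, a plain implementation, and a shared implementation behind `Arc<Mutex<_>>`. The sketch below illustrates the pattern with hypothetical `ToyQueueT`, `ToyQueue` and `ToyQueueSync` types carrying just two settings; the real `QueueT` trait covers far more operations.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical, trimmed-down stand-in for `QueueT`.
pub trait ToyQueueT {
    fn set_size(&mut self, size: u16);
    fn set_ready(&mut self, ready: bool);
    fn size(&self) -> u16;
    fn ready(&self) -> bool;
}

/// Single-threaded flavour (plays the role of `Queue`).
#[derive(Debug, Default)]
pub struct ToyQueue {
    size: u16,
    ready: bool,
}

impl ToyQueueT for ToyQueue {
    fn set_size(&mut self, size: u16) { self.size = size; }
    fn set_ready(&mut self, ready: bool) { self.ready = ready; }
    fn size(&self) -> u16 { self.size }
    fn ready(&self) -> bool { self.ready }
}

/// Multi-threaded flavour (plays the role of `QueueSync`): all clones share
/// one `ToyQueue`, and each call holds the mutex only for its own duration.
#[derive(Clone, Debug, Default)]
pub struct ToyQueueSync {
    inner: Arc<Mutex<ToyQueue>>,
}

impl ToyQueueT for ToyQueueSync {
    fn set_size(&mut self, size: u16) { self.inner.lock().unwrap().set_size(size); }
    fn set_ready(&mut self, ready: bool) { self.inner.lock().unwrap().set_ready(ready); }
    fn size(&self) -> u16 { self.inner.lock().unwrap().size() }
    fn ready(&self) -> bool { self.inner.lock().unwrap().ready() }
}

/// Device-side code can be written once against the trait.
pub fn configure<Q: ToyQueueT>(q: &mut Q, size: u16) {
    q.set_size(size);
    q.set_ready(true);
}
```

Code written against the trait, like `configure`, works unchanged with either flavour; the price of the shared flavour is one lock acquisition per call.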

Besides the above abstractions, the `virtio-queue` crate also provides the
following ones:

* `Descriptor` → mostly offers accessors for its members.
* `DescriptorChain` → provides accessors for its members and an `Iterator`
  implementation for iterating over the descriptors in the chain; there is also
  an abstraction for iterating over just the device-readable or just the
  device-writable descriptors (`DescriptorChainRwIter`).
* `AvailIter` → a consuming iterator over all available descriptor chain
  heads in the queue.
| ## Save/Restore Queue |
| |
| The `Queue` allows saving the state through the `state` function which returns |
| a `QueueState`. `Queue` objects can be created from a previously saved state by |
| using `QueueState::try_from`. The VMM should check for errors when restoring |
| a `Queue` from a previously saved state. |
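
A save/restore round trip can be sketched as a plain-data snapshot plus a fallible conversion back into a queue, which is where restore-time error checking belongs. The types below (`ToyQueueState`, `ToyQueue`, `RestoreError`) are hypothetical stand-ins for `QueueState` and `Queue`. The checks shown (a power-of-two size within `max_size`, and the 16/2/4-byte alignment the specification requires for the descriptor table, available ring and used ring) are illustrative, not the exact set the crate performs.

```rust
use std::convert::TryFrom;

/// Hypothetical plain-data snapshot of a queue.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct ToyQueueState {
    pub max_size: u16,
    pub size: u16,
    pub ready: bool,
    pub next_avail: u16,
    pub next_used: u16,
    pub desc_table: u64,
    pub avail_ring: u64,
    pub used_ring: u64,
}

#[derive(Debug, PartialEq)]
pub struct ToyQueue {
    state: ToyQueueState,
}

#[derive(Debug, PartialEq)]
pub enum RestoreError {
    InvalidSize,
    MisalignedAddress,
}

impl ToyQueue {
    /// Save: copy out the plain-data state.
    pub fn state(&self) -> ToyQueueState {
        self.state
    }
}

impl TryFrom<ToyQueueState> for ToyQueue {
    type Error = RestoreError;

    /// Restore: validate the (untrusted) snapshot before building a queue.
    fn try_from(s: ToyQueueState) -> Result<Self, Self::Error> {
        // `is_power_of_two` also rejects a size of 0.
        if !s.size.is_power_of_two() || s.size > s.max_size {
            return Err(RestoreError::InvalidSize);
        }
        if s.desc_table % 16 != 0 || s.avail_ring % 2 != 0 || s.used_ring % 4 != 0 {
            return Err(RestoreError::MisalignedAddress);
        }
        Ok(ToyQueue { state: s })
    }
}
```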

## Notification suppression

A big part of the `virtio-queue` crate consists of notification suppression
support. As already mentioned, the driver can send an available buffer
notification to the device when there are new entries in the available ring,
and the device can send a used buffer notification to the driver when there are
new entries in the used ring. Sending a notification every time this happens is
not always efficient; for example, while the driver is processing the used
ring, it does not need to receive another used buffer notification. The
mechanism for suppressing notifications is detailed in the following sections
of the specification:

- [Used Buffer Notification Suppression](https://docs.oasis-open.org/virtio/virtio/v1.1/csprd01/virtio-v1.1-csprd01.html#x1-400007),
- [Available Buffer Notification Suppression](https://docs.oasis-open.org/virtio/virtio/v1.1/csprd01/virtio-v1.1-csprd01.html#x1-4800010).

The `Queue` abstraction proposes the following sequence of steps for
processing new available ring entries:

1. the device first disables notifications by using
   `Queue::disable_notification`, to make the driver aware that it is
   processing the available ring and does not want interruptions. If
   VIRTIO_F_EVENT_IDX is not negotiated, notifications are disabled by setting
   VIRTQ_USED_F_NO_NOTIFY in the `flags` field of the used ring. If
   VIRTIO_F_EVENT_IDX is negotiated, they are disabled by not updating
   `avail_event`, i.e. it remains set to the latest `idx` value of the
   available ring for which the driver has already sent a notification.
2. the device processes the new entries by using the `AvailIter` iterator.
3. the device can now enable notifications by using
   `Queue::enable_notification`. If VIRTIO_F_EVENT_IDX is not negotiated,
   notifications are enabled by writing 0 to the `flags` field of the used
   ring. If VIRTIO_F_EVENT_IDX is negotiated, they are enabled by setting
   `avail_event` to the smallest `idx` value of the available ring for which
   the driver has not yet sent a notification. This way the device makes sure
   that it won’t miss any notification.

The above steps should be done in a loop to also handle the less likely case
where the driver added new entries just before notifications were re-enabled.
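
The disable/process/enable loop could look like the following sketch. `ToyQueue` is hypothetical: its `pending` list stands in for unconsumed available-ring entries, and `late_arrivals` simulates the driver publishing entries just before notifications are re-enabled. The sketch assumes `enable_notification` reports whether entries are already waiting; as far as we know, the crate's `Queue::enable_notification` returns a similar boolean wrapped in a `Result`, but check the crate documentation before relying on it.

```rust
use std::collections::VecDeque;

/// Hypothetical queue model, not the crate's `Queue`.
pub struct ToyQueue {
    pending: VecDeque<u16>,  // published but not yet consumed chain heads
    late_arrivals: Vec<u16>, // entries the driver adds right before re-enabling
    notifications_enabled: bool,
}

impl ToyQueue {
    pub fn new(pending: Vec<u16>, late_arrivals: Vec<u16>) -> Self {
        ToyQueue { pending: pending.into(), late_arrivals, notifications_enabled: true }
    }

    pub fn disable_notification(&mut self) {
        self.notifications_enabled = false;
    }

    /// Re-enables notifications and reports whether entries are already
    /// waiting, for which the driver may not have sent a notification.
    pub fn enable_notification(&mut self) -> bool {
        // Simulate the race: the driver publishes entries while notifications are off.
        self.pending.extend(self.late_arrivals.drain(..));
        self.notifications_enabled = true;
        !self.pending.is_empty()
    }

    pub fn pop_avail(&mut self) -> Option<u16> {
        self.pending.pop_front()
    }

    pub fn notifications_enabled(&self) -> bool {
        self.notifications_enabled
    }
}

/// Disable, drain, re-enable, and go around again if entries slipped in.
pub fn process_queue(q: &mut ToyQueue, mut handle: impl FnMut(u16)) {
    loop {
        q.disable_notification();
        while let Some(head) = q.pop_avail() {
            handle(head);
        }
        if !q.enable_notification() {
            break;
        }
    }
}
```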

For used buffer notifications (from the device to the driver), the `Queue`
provides the `needs_notification` method, which the device should call each
time it adds new entries to the used ring. Depending on the `used_event` value
and on the last used index that was signalled (`signalled_used`),
`needs_notification` returns true to let the device know it should send a
notification to the guest.
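
With VIRTIO_F_EVENT_IDX negotiated, the decision comes down to the `vring_need_event` check from the specification: notify if `used_event` falls among the used-ring entries written since the last notification, using 16-bit wrapping arithmetic. The `UsedNotifier` wrapper that tracks `signalled_used` is a hypothetical sketch of the idea, not the crate's implementation, and it ignores the case where VIRTIO_F_EVENT_IDX is not negotiated.

```rust
/// `vring_need_event` from the virtio specification, with explicit 16-bit
/// wrapping: true if `event_idx` lies in [old_idx, new_idx) modulo 2^16.
pub fn vring_need_event(event_idx: u16, new_idx: u16, old_idx: u16) -> bool {
    new_idx.wrapping_sub(event_idx).wrapping_sub(1) < new_idx.wrapping_sub(old_idx)
}

/// Hypothetical tracker for the last used index the device signalled.
#[derive(Debug, Default)]
pub struct UsedNotifier {
    signalled_used: Option<u16>,
}

impl UsedNotifier {
    /// Call after publishing used entries; `next_used` is the used ring's new
    /// `idx` and `used_event` is the value the driver wrote.
    pub fn needs_notification(&mut self, used_event: u16, next_used: u16) -> bool {
        match self.signalled_used.replace(next_used) {
            None => true, // nothing signalled yet: notify to be safe
            Some(old) => vring_need_event(used_event, next_used, old),
        }
    }
}
```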

## Assumptions

We assume the users of the `Queue` implementation won’t attempt to use the
queue before checking that the `ready` bit is set. This can be verified by
calling `Queue::is_valid`, which also checks that the three queue parts are
valid memory regions.
We assume consumers will use `AvailIter::go_to_previous_position` only in
single-threaded contexts.
We assume the users will consume the entries from the available ring in the
way recommended by this documentation, i.e. when the device starts processing
the available ring, it disables notifications, processes the entries, and then
re-enables notifications.
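
A check in the spirit of `Queue::is_valid` might look like the sketch below. `ToyQueue` and `region_ok` are hypothetical, guest memory is modelled as the single range `[0, mem_size)`, and the region sizes and alignments (16, 2 and 4 bytes) come from the split-virtqueue layout in the specification; the crate's own checks may differ in detail.

```rust
/// Hypothetical queue configuration, not the crate's `Queue`.
pub struct ToyQueue {
    pub ready: bool,
    pub size: u16,
    pub desc_table: u64,
    pub avail_ring: u64,
    pub used_ring: u64,
}

/// True if `addr` is aligned and `[addr, addr + len)` lies inside `[0, mem_size)`.
fn region_ok(addr: u64, len: u64, align: u64, mem_size: u64) -> bool {
    addr % align == 0 && addr.checked_add(len).map_or(false, |end| end <= mem_size)
}

impl ToyQueue {
    pub fn is_valid(&self, mem_size: u64) -> bool {
        let n = u64::from(self.size);
        self.ready
            && self.size.is_power_of_two() // also rejects 0
            && region_ok(self.desc_table, 16 * n, 16, mem_size)
            && region_ok(self.avail_ring, 6 + 2 * n, 2, mem_size)
            && region_ok(self.used_ring, 6 + 8 * n, 4, mem_size)
    }
}
```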

## License

This project is licensed under either of

- [Apache License](http://www.apache.org/licenses/LICENSE-2.0), Version 2.0
- [BSD-3-Clause License](https://opensource.org/licenses/BSD-3-Clause)