| Linux kernel driver for Elastic Network Adapter (ENA) family: | 
 | ============================================================= | 
 |  | 
 | Overview: | 
 | ========= | 
 | ENA is a networking interface designed to make good use of modern CPU | 
 | features and system architectures. | 
 |  | 
 | The ENA device exposes a lightweight management interface with a | 
 | minimal set of memory mapped registers and extendable command set | 
 | through an Admin Queue. | 
 |  | 
 | The driver supports a range of ENA devices, is link-speed independent | 
 | (i.e., the same driver is used for 10GbE, 25GbE, 40GbE, etc.), and has | 
 | a negotiated and extendable feature set. | 
 |  | 
 | Some ENA devices support SR-IOV. This driver is used for both the | 
 | SR-IOV Physical Function (PF) and Virtual Function (VF) devices. | 
 |  | 
 | ENA devices enable high speed and low overhead network traffic | 
 | processing by providing multiple Tx/Rx queue pairs (the maximum number | 
 | is advertised by the device via the Admin Queue), a dedicated MSI-X | 
 | interrupt vector per Tx/Rx queue pair, adaptive interrupt moderation, | 
 | and CPU cacheline optimized data placement. | 
 |  | 
 | The ENA driver supports industry standard TCP/IP offload features such | 
 | as checksum offload and TCP transmit segmentation offload (TSO). | 
 | Receive-side scaling (RSS) is supported for multi-core scaling. | 
 |  | 
 | The ENA driver and its corresponding devices implement health | 
 | monitoring mechanisms such as watchdog, enabling the device and driver | 
 | to recover in a manner transparent to the application, as well as | 
 | debug logs. | 
 |  | 
 | Some of the ENA devices support a working mode called Low-latency | 
 | Queue (LLQ), which reduces latency by several microseconds. | 
 |  | 
 | Supported PCI vendor ID/device IDs: | 
 | =================================== | 
 | 1d0f:0ec2 - ENA PF | 
 | 1d0f:1ec2 - ENA PF with LLQ support | 
 | 1d0f:ec20 - ENA VF | 
 | 1d0f:ec21 - ENA VF with LLQ support | 
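
As an illustration of how the ID list above can be used, the following
is a minimal user-space sketch of a lookup over the four vendor/device
pairs. The struct and function names are hypothetical and are not taken
from ena_pci_id_tbl.h; the real driver relies on the kernel's PCI ID
matching instead.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry; not the driver's own type. */
struct ena_id_entry {
	uint16_t vendor;
	uint16_t device;
	bool llq;		/* this ID is listed "with LLQ support" */
	const char *name;
};

/* The four vendor/device pairs listed in this document. */
static const struct ena_id_entry ena_ids[] = {
	{ 0x1d0f, 0x0ec2, false, "ENA PF" },
	{ 0x1d0f, 0x1ec2, true,  "ENA PF with LLQ support" },
	{ 0x1d0f, 0xec20, false, "ENA VF" },
	{ 0x1d0f, 0xec21, true,  "ENA VF with LLQ support" },
};

/* Return the matching entry, or NULL if the pair is not an ENA device. */
const struct ena_id_entry *ena_lookup_id(uint16_t vendor, uint16_t device)
{
	size_t i;

	for (i = 0; i < sizeof(ena_ids) / sizeof(ena_ids[0]); i++) {
		if (ena_ids[i].vendor == vendor && ena_ids[i].device == device)
			return &ena_ids[i];
	}
	return NULL;
}
```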
 |  | 
 | ENA Source Code Directory Structure: | 
 | ==================================== | 
 | ena_com.[ch]      - Management communication layer. This layer is | 
 |                     responsible for handling all the management | 
 |                     (admin) communication between the device and the | 
 |                     driver. | 
 | ena_eth_com.[ch]  - Tx/Rx data path. | 
 | ena_admin_defs.h  - Definition of ENA management interface. | 
 | ena_eth_io_defs.h - Definition of ENA data path interface. | 
 | ena_common_defs.h - Common definitions for ena_com layer. | 
 | ena_regs_defs.h   - Definition of ENA PCI memory-mapped (MMIO) registers. | 
 | ena_netdev.[ch]   - Main Linux kernel driver. | 
 | ena_sysfs.[ch]    - Sysfs files. | 
 | ena_ethtool.c     - ethtool callbacks. | 
 | ena_pci_id_tbl.h  - Supported device IDs. | 
 |  | 
 | Management Interface: | 
 | ===================== | 
 | The ENA management interface is exposed by means of: | 
 | - PCIe Configuration Space | 
 | - Device Registers | 
 | - Admin Queue (AQ) and Admin Completion Queue (ACQ) | 
 | - Asynchronous Event Notification Queue (AENQ) | 
 |  | 
 | ENA device MMIO Registers are accessed only during driver | 
 | initialization and are not involved in further normal device | 
 | operation. | 
 |  | 
 | AQ is used for submitting management commands, and the | 
 | results/responses are reported asynchronously through ACQ. | 
 |  | 
 | ENA introduces a very small set of management commands with room for | 
 | vendor-specific extensions. Most of the management operations are | 
 | framed in a generic Get/Set feature command. | 
 |  | 
 | The following admin queue commands are supported: | 
 | - Create I/O submission queue | 
 | - Create I/O completion queue | 
 | - Destroy I/O submission queue | 
 | - Destroy I/O completion queue | 
 | - Get feature | 
 | - Set feature | 
 | - Configure AENQ | 
 | - Get statistics | 
 |  | 
 | Refer to ena_admin_defs.h for the list of supported Get/Set Feature | 
 | properties. | 
 |  | 
 | The Asynchronous Event Notification Queue (AENQ) is a uni-directional | 
 | queue used by the ENA device to send to the driver events that cannot | 
 | be reported using ACQ. AENQ events are subdivided into groups. Each | 
 | group may have multiple syndromes, as shown below. | 
 |  | 
 | The events are: | 
 | 	Group			Syndrome | 
 | 	Link state change	- X - | 
 | 	Fatal error		- X - | 
 | 	Notification		Suspend traffic | 
 | 	Notification		Resume traffic | 
 | 	Keep-Alive		- X - | 
 |  | 
 | ACQ and AENQ share the same MSI-X vector. | 
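
The group/syndrome split in the table above can be pictured as a
two-level dispatch. The sketch below is only an illustration: the enum
values, struct layout and function name are assumptions made for this
example, and the authoritative definitions are in ena_admin_defs.h.

```c
#include <stdint.h>

/* Illustrative numbering only; not the values from ena_admin_defs.h. */
enum aenq_group {
	AENQ_GROUP_LINK_CHANGE,
	AENQ_GROUP_FATAL_ERROR,
	AENQ_GROUP_NOTIFICATION,
	AENQ_GROUP_KEEP_ALIVE,
};

enum aenq_notification_syndrome {
	AENQ_SYNDROME_SUSPEND,
	AENQ_SYNDROME_RESUME,
};

/* Hypothetical event header: group first, then syndrome. */
struct aenq_event {
	uint16_t group;
	uint16_t syndrome;
};

/* Hypothetical driver-side bookkeeping updated by the handler. */
struct aenq_state {
	unsigned int link_changes;
	unsigned int fatal_errors;
	unsigned int keep_alives;
	int traffic_suspended;
};

/* Dispatch on group, then on syndrome. Returns 0 if handled, -1 if unknown. */
int aenq_handle_event(struct aenq_state *st, const struct aenq_event *ev)
{
	switch (ev->group) {
	case AENQ_GROUP_LINK_CHANGE:
		st->link_changes++;
		return 0;
	case AENQ_GROUP_FATAL_ERROR:
		st->fatal_errors++;
		return 0;
	case AENQ_GROUP_KEEP_ALIVE:
		st->keep_alives++;
		return 0;
	case AENQ_GROUP_NOTIFICATION:
		if (ev->syndrome == AENQ_SYNDROME_SUSPEND)
			st->traffic_suspended = 1;
		else if (ev->syndrome == AENQ_SYNDROME_RESUME)
			st->traffic_suspended = 0;
		else
			return -1;
		return 0;
	default:
		return -1;
	}
}
```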
 |  | 
 | Keep-Alive is a special mechanism that allows monitoring of the | 
 | device's health. The driver maintains a watchdog (WD) handler which, | 
 | if fired, logs the current state and statistics then resets and | 
 | restarts the ENA device and driver. A Keep-Alive event is delivered by | 
 | the device every second. The driver re-arms the WD upon reception of a | 
 | Keep-Alive event. A missed Keep-Alive event causes the WD handler to | 
 | fire. | 
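
The re-arm/expire logic described above can be sketched in user-space C
as follows. This is a simplified model, not the driver's code: the
3000 ms timeout, the struct and the function names are assumptions made
for the example (the text only states that Keep-Alive events arrive
every second).

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed timeout for this sketch; the driver's real value may differ. */
#define WD_TIMEOUT_MS 3000u

struct ena_wd {
	uint64_t last_keep_alive_ms;	/* time of the last Keep-Alive event */
	bool reset_requested;		/* set once the watchdog has fired */
};

/* Called on every Keep-Alive AENQ event: re-arm the watchdog. */
void wd_keep_alive(struct ena_wd *wd, uint64_t now_ms)
{
	wd->last_keep_alive_ms = now_ms;
}

/*
 * Called periodically. Fires (requests a reset) if no Keep-Alive has
 * been seen within the timeout. Returns true if a reset is pending.
 */
bool wd_check(struct ena_wd *wd, uint64_t now_ms)
{
	if (now_ms - wd->last_keep_alive_ms > WD_TIMEOUT_MS)
		wd->reset_requested = true;
	return wd->reset_requested;
}
```

In this model the caller would log state and statistics and then reset
the device once wd_check() returns true.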
 |  | 
 | Data Path Interface: | 
 | ==================== | 
 | I/O operations are based on Tx and Rx Submission Queues (Tx SQ and Rx | 
 | SQ respectively). Each SQ has a completion queue (CQ) associated | 
 | with it. | 
 |  | 
 | The SQs and CQs are implemented as descriptor rings in contiguous | 
 | physical memory. | 
 |  | 
 | The ENA driver supports two Queue Operation modes for Tx SQs: | 
 | - Regular mode | 
 |   * In this mode the Tx SQs reside in the host's memory. The ENA | 
 |     device fetches the ENA Tx descriptors and packet data from host | 
 |     memory. | 
 | - Low Latency Queue (LLQ) mode or "push-mode". | 
 |   * In this mode the driver pushes the transmit descriptors and the | 
 |     first 128 bytes of the packet directly to the ENA device memory | 
 |     space. The rest of the packet payload is fetched by the | 
 |     device. For this operation mode, the driver uses a dedicated PCI | 
 |     device memory BAR, which is mapped with write-combine capability. | 
 |  | 
 | The Rx SQs support only the regular mode. | 
 |  | 
 | Note: Not all ENA devices support LLQ, and this feature is negotiated | 
 |       with the device upon initialization. If the ENA device does not | 
 |       support LLQ mode, the driver falls back to the regular mode. | 
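
To make the LLQ split concrete, the sketch below copies at most the
first 128 bytes of a packet into a buffer that stands in for the device
memory BAR and reports how many bytes were pushed; the remainder is
what the device would fetch from host memory. The function name and
the plain memcpy() are simplifications assumed for this example; the
real driver writes through a write-combine mapping.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Size of the pushed packet prefix, as stated in this document. */
#define LLQ_PUSH_HDR_MAX 128u

/*
 * Copy the leading part of the packet (up to 128 bytes) into dev_mem,
 * which must have room for LLQ_PUSH_HDR_MAX bytes. Returns the number
 * of bytes pushed; pkt_len minus that value is left for the device to
 * fetch.
 */
size_t llq_push_header(uint8_t *dev_mem, const uint8_t *pkt, size_t pkt_len)
{
	size_t push_len = pkt_len < LLQ_PUSH_HDR_MAX ? pkt_len : LLQ_PUSH_HDR_MAX;

	memcpy(dev_mem, pkt, push_len);
	return push_len;
}
```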
 |  | 
 | The driver supports multi-queue for both Tx and Rx. This has various | 
 | benefits: | 
 | - Reduced CPU/thread/process contention on a given Ethernet interface. | 
 | - Cache miss rate on completion is reduced, particularly for data | 
 |   cache lines that hold the sk_buff structures. | 
 | - Increased process-level parallelism when handling received packets. | 
 | - Increased data cache hit rate, by steering kernel processing of | 
 |   packets to the CPU, where the application thread consuming the | 
 |   packet is running. | 
 | - In hardware interrupt re-direction. | 
 |  | 
 | Interrupt Modes: | 
 | ================ | 
 | The driver assigns a single MSI-X vector per queue pair (for both Tx | 
 | and Rx directions). The driver assigns an additional dedicated MSI-X vector | 
 | for management (for ACQ and AENQ). | 
 |  | 
 | Management interrupt registration is performed when the Linux kernel | 
 | probes the adapter, and it is de-registered when the adapter is | 
 | removed. I/O queue interrupt registration is performed when the Linux | 
 | interface of the adapter is opened, and it is de-registered when the | 
 | interface is closed. | 
 |  | 
 | The management interrupt is named: | 
 |    ena-mgmnt@pci:<PCI domain:bus:slot.function> | 
 | and for each queue pair, an interrupt is named: | 
 |    <interface name>-Tx-Rx-<queue index> | 
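
The two naming patterns above can be reproduced with snprintf(), which
helps when matching entries in /proc/interrupts. The helper names are
hypothetical and exist only for this sketch.

```c
#include <stddef.h>
#include <stdio.h>

/* Format the management interrupt name for a PCI address string. */
int ena_mgmnt_irq_name(char *buf, size_t len, const char *pci_name)
{
	return snprintf(buf, len, "ena-mgmnt@pci:%s", pci_name);
}

/* Format the per-queue-pair interrupt name for an interface and queue. */
int ena_io_irq_name(char *buf, size_t len, const char *ifname,
		    unsigned int qid)
{
	return snprintf(buf, len, "%s-Tx-Rx-%u", ifname, qid);
}
```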
 |  | 
 | The ENA device operates in auto-mask and auto-clear interrupt | 
 | modes. That is, once an MSI-X interrupt is delivered to the host, its | 
 | Cause bit is automatically cleared and the interrupt is masked. The | 
 | interrupt is unmasked by the driver after NAPI processing is complete. | 
 |  | 
 | Interrupt Moderation: | 
 | ===================== | 
 | The ENA driver and device can operate in conventional or adaptive | 
 | interrupt moderation mode. | 
 |  | 
 | In conventional mode the driver instructs the device to postpone | 
 | interrupt posting according to a static interrupt delay value. The | 
 | interrupt delay value can be configured through ethtool(8). The | 
 | following ethtool parameters are supported by the driver: tx-usecs, | 
 | rx-usecs. | 
 |  | 
 | In adaptive interrupt moderation mode the interrupt delay value is | 
 | updated by the driver dynamically and adjusted every NAPI cycle | 
 | according to the traffic nature. | 
 |  | 
 | By default, the ENA driver applies adaptive coalescing to Rx traffic | 
 | and conventional coalescing to Tx traffic. | 
 |  | 
 | Adaptive coalescing can be switched on or off through the ethtool(8) | 
 | adaptive-rx on|off parameter. | 
 |  | 
 | The driver chooses the interrupt delay value according to the number | 
 | of bytes and packets received between interrupt unmasking and | 
 | interrupt posting. The driver uses an interrupt delay table that | 
 | subdivides the range of received bytes/packets into 5 levels and | 
 | assigns an interrupt delay value to each level. | 
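
A five-level delay table of this kind can be modelled as a simple
threshold lookup. In the sketch below every threshold and delay value
is a placeholder invented for illustration, and the selection rule
(first level whose packet and byte limits both hold) is an assumption;
the driver's actual table and algorithm may differ.

```c
#include <stddef.h>
#include <stdint.h>

#define INTR_LEVELS 5

struct intr_level {
	uint32_t max_pkts;	/* upper packet bound for this level */
	uint32_t max_bytes;	/* upper byte bound for this level */
	uint32_t delay_usecs;	/* interrupt delay assigned to this level */
};

/* Placeholder values; not the driver's defaults. */
static const struct intr_level intr_table[INTR_LEVELS] = {
	{          4,       1500,   0 },
	{         16,      12000,  32 },
	{         64,      48000,  64 },
	{        128,      96000,  96 },
	{ UINT32_MAX, UINT32_MAX, 128 },
};

/* Pick the delay of the first level that covers both counters. */
uint32_t intr_pick_delay(uint32_t pkts, uint32_t bytes)
{
	size_t i;

	for (i = 0; i < INTR_LEVELS; i++) {
		if (pkts <= intr_table[i].max_pkts &&
		    bytes <= intr_table[i].max_bytes)
			return intr_table[i].delay_usecs;
	}
	return intr_table[INTR_LEVELS - 1].delay_usecs;
}
```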
 |  | 
 | The user can enable/disable adaptive moderation, modify the interrupt | 
 | delay table and restore its default values through sysfs. | 
 |  | 
 | The rx_copybreak is initialized by default to ENA_DEFAULT_RX_COPYBREAK | 
 | and can be configured by the ETHTOOL_STUNABLE command of the | 
 | SIOCETHTOOL ioctl. | 
 |  | 
 | SKB: | 
 | ==== | 
 | The driver allocates the SKB for each received frame in NAPI context | 
 | during Rx handling. The allocation method depends on the size of the | 
 | packet. If the frame length is larger than rx_copybreak, | 
 | napi_get_frags() is used. Otherwise netdev_alloc_skb_ip_align() is | 
 | used, the buffer content is copied (by the CPU) to the SKB, and the | 
 | buffer is recycled. | 
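
The copybreak decision reduces to a single comparison. The sketch below
follows the "larger than rx_copybreak" wording of this section, so a
frame whose length equals the threshold takes the copy path. The enum
and function names are hypothetical.

```c
/* Which allocation path the Rx handler takes for a frame. */
enum rx_skb_path {
	RX_PATH_COPY,	/* netdev_alloc_skb_ip_align() + copy, buffer recycled */
	RX_PATH_FRAGS,	/* napi_get_frags(), Rx buffer attached as a frag */
};

/* Frames strictly larger than rx_copybreak use the frags path. */
enum rx_skb_path rx_choose_path(unsigned int frame_len,
				unsigned int rx_copybreak)
{
	return frame_len > rx_copybreak ? RX_PATH_FRAGS : RX_PATH_COPY;
}
```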
 |  | 
 | Statistics: | 
 | =========== | 
 | The user can obtain ENA device and driver statistics using ethtool. | 
 | The driver can collect regular or extended statistics (including | 
 | per-queue stats) from the device. | 
 |  | 
 | In addition the driver logs the stats to syslog upon device reset. | 
 |  | 
 | MTU: | 
 | ==== | 
 | The driver supports an arbitrarily large MTU with a maximum that is | 
 | negotiated with the device. The driver configures MTU using the | 
 | SetFeature command (ENA_ADMIN_MTU property). The user can change MTU | 
 | via ip(8) and similar legacy tools. | 
 |  | 
 | Stateless Offloads: | 
 | =================== | 
 | The ENA driver supports: | 
 | - TSO over IPv4/IPv6 | 
 | - TSO with ECN | 
 | - IPv4 header checksum offload | 
 | - TCP/UDP over IPv4/IPv6 checksum offloads | 
 |  | 
 | RSS: | 
 | ==== | 
 | - The ENA device supports RSS that allows flexible Rx traffic | 
 |   steering. | 
 | - Toeplitz and CRC32 hash functions are supported. | 
 | - Different combinations of L2/L3/L4 fields can be configured as | 
 |   inputs for hash functions. | 
 | - The driver configures RSS settings using the AQ SetFeature command | 
 |   (ENA_ADMIN_RSS_HASH_FUNCTION, ENA_ADMIN_RSS_HASH_INPUT and | 
 |   ENA_ADMIN_RSS_REDIRECTION_TABLE_CONFIG properties). | 
 | - If the NETIF_F_RXHASH flag is set, the 32-bit result of the hash | 
 |   function delivered in the Rx CQ descriptor is set in the received | 
 |   SKB. | 
 | - The user can provide a hash key, hash function, and configure the | 
 |   indirection table through ethtool(8). | 
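
For readers unfamiliar with the Toeplitz function, the following is a
generic textbook implementation in portable C: for every set bit of the
input, the current 32-bit window of the key is XORed into the result,
and the window slides by one key bit per input bit. It is a sketch
only; the byte and bit ordering used by the ENA device and the layout
of the hash input are not described here and are not assumed to match.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Generic Toeplitz hash. key_len must be at least 4 bytes; key bits
 * beyond the end of the key are treated as zero.
 */
uint32_t toeplitz_hash(const uint8_t *key, size_t key_len,
		       const uint8_t *data, size_t data_len)
{
	/* The window starts as the first 32 bits of the key. */
	uint32_t window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
			  ((uint32_t)key[2] << 8) | (uint32_t)key[3];
	size_t next_key_bit = 32;	/* next key bit to shift in */
	uint32_t hash = 0;
	size_t i;
	int bit;

	for (i = 0; i < data_len; i++) {
		for (bit = 7; bit >= 0; bit--) {
			/* XOR in the window for every set input bit. */
			if (data[i] & (1u << bit))
				hash ^= window;

			/* Slide the window left by one key bit. */
			window <<= 1;
			if (next_key_bit < key_len * 8 &&
			    (key[next_key_bit / 8] & (0x80u >> (next_key_bit % 8))))
				window |= 1u;
			next_key_bit++;
		}
	}
	return hash;
}
```

The resulting 32-bit value is what would typically be used to index
the indirection table, for example by taking its low-order bits.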
 |  | 
 | DATA PATH: | 
 | ========== | 
 | Tx: | 
 | --- | 
 | ena_start_xmit() is called by the stack. This function does the following: | 
 | - Maps data buffers (skb->data and frags). | 
 | - Populates ena_buf for the push buffer (if the driver and device are | 
 |   in push mode). | 
 | - Prepares ena_bufs for the remaining frags. | 
 | - Allocates a new request ID from the empty req_id ring. The request | 
 |   ID is the index of the packet in the Tx info. This is used for | 
 |   out-of-order Tx completions. | 
 | - Adds the packet to the proper place in the Tx ring. | 
 | - Calls ena_com_prepare_tx(), an ENA communication layer function that | 
 |   converts the ena_bufs to ENA descriptors (and adds meta ENA | 
 |   descriptors as needed). | 
 |   * This function also copies the ENA descriptors and the push buffer | 
 |     to the device memory space (if in push mode). | 
 | - Writes doorbell to the ENA device. | 
 | - When the ENA device finishes sending the packet, a completion | 
 |   interrupt is raised. | 
 | - The interrupt handler schedules NAPI. | 
 | - The ena_clean_tx_irq() function is called. This function handles the | 
 |   completion descriptors generated by the ENA, with a single | 
 |   completion descriptor per completed packet. | 
 |   * req_id is retrieved from the completion descriptor. The tx_info of | 
 |     the packet is retrieved via the req_id. The data buffers are | 
 |     unmapped and req_id is returned to the empty req_id ring. | 
 |   * The function stops when all pending completion descriptors have | 
 |     been processed or the budget is reached. | 
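
The req_id bookkeeping in the Tx flow above can be sketched as a ring
of free IDs: IDs are taken from one end when a packet is queued and
returned at the other end when its completion arrives, in whatever
order completions come back. The ring size, field names and function
names below are assumptions made for this example and are not the
driver's own.

```c
#include <stdint.h>

#define TX_RING_SIZE 8u		/* assumed; must be a power of two */

struct tx_id_ring {
	uint16_t free_ids[TX_RING_SIZE];
	uint16_t next_to_use;	/* slot of the next ID to hand out */
	uint16_t next_to_clean;	/* slot where a completed ID is returned */
	uint16_t in_flight;	/* IDs currently owned by the device */
};

void tx_id_ring_init(struct tx_id_ring *r)
{
	uint16_t i;

	for (i = 0; i < TX_RING_SIZE; i++)
		r->free_ids[i] = i;
	r->next_to_use = 0;
	r->next_to_clean = 0;
	r->in_flight = 0;
}

/* Take a free req_id for a new packet. Returns -1 if the ring is full. */
int tx_alloc_req_id(struct tx_id_ring *r)
{
	int id;

	if (r->in_flight == TX_RING_SIZE)
		return -1;
	id = r->free_ids[r->next_to_use];
	r->next_to_use = (r->next_to_use + 1) & (TX_RING_SIZE - 1);
	r->in_flight++;
	return id;
}

/* Return a completed req_id to the ring. Returns -1 if nothing is in flight. */
int tx_complete_req_id(struct tx_id_ring *r, uint16_t req_id)
{
	if (r->in_flight == 0)
		return -1;
	r->free_ids[r->next_to_clean] = req_id;
	r->next_to_clean = (r->next_to_clean + 1) & (TX_RING_SIZE - 1);
	r->in_flight--;
	return 0;
}
```

Because the ring stores IDs rather than assuming they return in order,
a completion for a later packet can be processed before an earlier one.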
 |  | 
 | Rx: | 
 | --- | 
 | - When a packet is received, the ENA device raises an interrupt. | 
 | - The interrupt handler schedules NAPI. | 
 | - The ena_clean_rx_irq() function is called. This function calls | 
 |   ena_com_rx_pkt(), an ENA communication layer function, which returns | 
 |   the number of descriptors used for a new unhandled packet, and zero | 
 |   if no new packet is found. | 
 | - Then it calls the ena_rx_skb() function. | 
 | - ena_rx_skb() checks packet length: | 
 |   * If the packet is small (len <= rx_copybreak), the driver allocates | 
 |     an SKB for the new packet, and copies the packet payload into the | 
 |     SKB data buffer. | 
 |     - In this way the original data buffer is not passed to the stack | 
 |       and is reused for future Rx packets. | 
 |   * Otherwise the function unmaps the Rx buffer, then allocates the | 
 |     new SKB structure and hooks the Rx buffer to the SKB frags. | 
 | - The new SKB is updated with the necessary information (protocol, | 
 |   checksum hw verify result, etc.), and then passed to the network | 
 |   stack, using the NAPI interface function napi_gro_receive(). |