DOCA SDK Documentation

DOCA Virtio-net Service Guide

This guide provides instructions on how to use the DOCA virtio-net service container on top of the NVIDIA® BlueField®-3 networking platform.

Introduction

NVIDIA® BlueField® virtio-net enables users to create virtio-net PCIe devices in the system where the BlueField is connected. In a traditional virtualization environment, virtio-net devices are either emulated by QEMU on the hypervisor or have part of the work (e.g., the data plane) offloaded to the NIC (e.g., vDPA). Compared to those solutions, virtio-net PCIe devices offload both the data plane and the control plane to the BlueField networking device. The PCIe virtio-net devices exposed to the hypervisor do not depend on QEMU, other software emulators, or vendor drivers in the guest OS.

The solution is based on BlueField family technology on top of virtual switch and OVS, so that virtio-net devices can benefit from the full SDN and hardware offload methodologies.

Virtio-net Controller SystemD Service

Virtio-net-controller is a systemd service which runs on the BlueField, with a command-line interface (CLI) frontend to communicate with the service running in the background. The controller systemd service is enabled by default and runs automatically after certain firmware configurations are deployed.

The processes virtio_net_emu and virtio_net_ha are created to manage live update and high availability.

Virtio-net Deployment

Updating OS Image on BlueField

To install the BFB bundle on the NVIDIA® BlueField®, run the following command from the Linux hypervisor:

[host]# sudo bfb-install --rshim <rshimN> --bfb <image_path.bfb>

For more information, refer to section "Deploying BlueField Software Using BFB from Host" in the NVIDIA BlueField DPU BSP documentation.

Updating NIC Firmware

From the BlueField networking platform, run:

[dpu]# sudo /opt/mellanox/mlnx-fw-updater/mlnx_fw_updater.pl --force-fw-update

For more information, refer to section "Upgrading Firmware" in the NVIDIA DOCA Installation Guide for Linux.

Configuring NIC Firmware

By default, the DPU should be configured in DPU mode. A simple way to confirm that the DPU is running in DPU mode is to log into the BlueField Arm system and check that both p0 and pf0hpf exist by running the command below.

[dpu]# ip link show

Virtio-net full emulation only works in DPU mode. For more information about DPU mode configuration, please refer to BlueField Modes of Operation.

Before enabling the virtio-net service, the firmware must be configured using the mlxconfig tool. The following table describes the relevant mlxconfig entries, and the subsections after it give examples of typical configurations.

For mlxconfig configuration changes to take effect, perform a BlueField system-level reset.

Mlxconfig Entries

Description

VIRTIO_NET_EMULATION_ENABLE

Must be set to TRUE, for virtio-net to be enabled

VIRTIO_NET_EMULATION_NUM_PF

Total number of PCIe physical functions (PFs) exposed by the device for virtio-net emulation. These functions persist across host/BlueField power cycles.

VIRTIO_NET_EMULATION_NUM_VF

The max number of virtual functions (VFs) that can be supported for each virtio-net PF

VIRTIO_NET_EMULATION_NUM_MSIX

Number of MSI-X vectors assigned for each PF of the virtio-net emulation device. The minimum is 4.

VIRTIO_NET_EMULATION_NUM_VF_MSIX

Number of MSI-X vectors assigned for each VF of the virtio-net emulation device. The minimum is 4. Relevant for BlueField-3 devices only.

PCI_SWITCH_EMULATION_ENABLE

When TRUE, the device exposes a PCIe switch. All PF configurations are applied on the switch downstream ports. In such case, each PF gets a different PCIe device on the emulated switch. This configuration allows exposing extra network PFs toward the host which can be enabled for virtio-net hot-plug devices.

PCI_SWITCH_EMULATION_NUM_PORT

The maximum number of emulated switch ports. Each port can hold a single PCIe device (emulated or not). This determines the supported maximum number of hot-plug virtio-net devices. The maximum number depends on hypervisor PCIe resource and cannot exceed 31.

Check system PCIe resource. Changing this entry to a big number may result in the host not booting up, which would necessitate disabling the BlueField device and clearing the host NVRAM.

PER_PF_NUM_SF

When TRUE, the SFs configuration is defined by TOTAL_SF and SF_BAR_SIZE for each PF individually. If they are not defined for a PF, device defaults are used.

PF_TOTAL_SF

The total number of scalable function (SF) partitions that can be supported for the current PF. Valid only when PER_PF_NUM_SF is set to TRUE. This number should be greater than the total number of virtio-net PFs (both static and hotplug) and VFs.

This entry differs between the BlueField-side and host-side mlxconfig. It is also a system-wide value, which is shared by virtio-net and other users. The DPU normally creates 1 SF per port by default. Take this default SF into account when reserving PF_TOTAL_SF.

PF_SF_BAR_SIZE

Log (base 2) of the BAR size of a single SF, given in KB. Valid only when PF_TOTAL_SF is non-zero and PER_PF_NUM_SF is set to TRUE.

PF_BAR2_ENABLE

When TRUE, BAR2 is exposed on all external host PFs (but not on the embedded Arm PFs/ECPFs). The BAR2 size is defined by the log_pf_bar2_size.

SRIOV_EN

Enable single-root I/O virtualization (SR-IOV) for virtio-net and native PFs

EXP_ROM_VIRTIO_NET_PXE_ENABLE

Enable expansion ROM option for PXE for virtio-net functions

All virtio EXP_ROM options should be configured from the host side rather than the BlueField side. Only static PFs are supported.

EXP_ROM_VIRTIO_NET_UEFI_ARM_ENABLE

Enable expansion ROM option for UEFI for Arm based host for virtio-net functions

EXP_ROM_VIRTIO_NET_UEFI_x86_ENABLE

Enable expansion ROM option for UEFI for x86 based host for virtio-net functions

The maximum number of supported devices is listed below. These limits do not apply when hot-plug PFs and VFs are created at the same time.

| Static PF | Hot-plug PF | VF |
|---|---|---|
| 31 | 31 | 1008 |

The maximum supported number of hotplug PFs depends on the host PCI resources. Specific systems may support fewer hotplug PFs or none at all. Refer to the host BIOS specification.

Static PF

Static PF is defined as virtio-net PFs which are persistent even after DPU or host power cycle. It also supports creating SR-IOV VFs.

The following is an example of enabling the system with 4 static PFs (VIRTIO_NET_EMULATION_NUM_PF) only:

PF_TOTAL_SF is set above the number of virtio-net devices to leave SFs for other applications that use them.

[dpu]# mlxconfig -d 03:00.0 s \
VIRTIO_NET_EMULATION_ENABLE=1 \
VIRTIO_NET_EMULATION_NUM_PF=4 \
VIRTIO_NET_EMULATION_NUM_VF=0 \
VIRTIO_NET_EMULATION_NUM_MSIX=64 \
PCI_SWITCH_EMULATION_ENABLE=0 \
PCI_SWITCH_EMULATION_NUM_PORT=0 \
PER_PF_NUM_SF=1 \
PF_TOTAL_SF=64 \
PF_BAR2_ENABLE=0 \
PF_SF_BAR_SIZE=8 \
SRIOV_EN=0

Hotplug PF

Hotplug PF is defined as virtio-net PFs which can be hotplugged or unplugged dynamically after the system comes up. 

Hotplug PF does not support creating SR-IOV VFs.

The following is an example for enabling 16 hotplug PFs (PCI_SWITCH_EMULATION_NUM_PORT):

[dpu]# mlxconfig -d 03:00.0 s \
VIRTIO_NET_EMULATION_ENABLE=1 \
VIRTIO_NET_EMULATION_NUM_PF=0 \
VIRTIO_NET_EMULATION_NUM_VF=0 \
VIRTIO_NET_EMULATION_NUM_MSIX=64 \
PCI_SWITCH_EMULATION_ENABLE=1 \
PCI_SWITCH_EMULATION_NUM_PORT=16 \
PER_PF_NUM_SF=1 \
PF_TOTAL_SF=64 \
PF_BAR2_ENABLE=0 \
PF_SF_BAR_SIZE=8 \
SRIOV_EN=0 

SR-IOV VF

SR-IOV VF is defined as virtio-net VFs created on top of PFs. Each VF gets an individual virtio-net PCIe device.

VFs cannot be dynamically created or destroyed; the VF count can only change from X to 0 or from 0 to X.

VFs are destroyed when the host is rebooted or when the PF is unbound from the virtio-net kernel driver.

The following is an example for enabling 126 VFs per static PF—504 (4 PF x 126) VFs in total:

[dpu]# mlxconfig -d 03:00.0 s \
VIRTIO_NET_EMULATION_ENABLE=1 \
VIRTIO_NET_EMULATION_NUM_PF=4 \
VIRTIO_NET_EMULATION_NUM_VF=126 \
VIRTIO_NET_EMULATION_NUM_MSIX=64 \
VIRTIO_NET_EMULATION_NUM_VF_MSIX=6 \
PCI_SWITCH_EMULATION_ENABLE=0 \
PCI_SWITCH_EMULATION_NUM_PORT=0 \
PER_PF_NUM_SF=1 \
PF_TOTAL_SF=512 \
PF_BAR2_ENABLE=0 \
PF_SF_BAR_SIZE=8 \
NUM_VF_MSIX=0 \
SRIOV_EN=1

PF/VF Combinations

Creating static/hotplug PFs and VFs at the same time is supported.

The total sum of PCIe functions to the external host must not exceed 1008. For example:

  • If there are 2 PFs with no VFs (NUM_OF_VFS=0) and there is 1 RShim, then the remaining static functions is 1005 (1008-3).

  • If 1 virtio-net PF is configured (VIRTIO_NET_EMULATION_NUM_PF=1), then up to 1004 virtio-net VFs can be configured (VIRTIO_NET_EMULATION_NUM_VF=1004)

  • If 2 virtio-net PF (VIRTIO_NET_EMULATION_NUM_PF=2), then up to 502 virtio-net VFs can be configured (VIRTIO_NET_EMULATION_NUM_VF=502)

The following is an example for enabling 15 hotplug PFs, 2 static PFs, and 200 VFs (2 PFs x 100):

[dpu]# mlxconfig -d 03:00.0 s \
VIRTIO_NET_EMULATION_ENABLE=1 \
VIRTIO_NET_EMULATION_NUM_PF=2 \
VIRTIO_NET_EMULATION_NUM_VF=100 \
VIRTIO_NET_EMULATION_NUM_MSIX=10 \
VIRTIO_NET_EMULATION_NUM_VF_MSIX=6 \
PCI_SWITCH_EMULATION_ENABLE=1 \
PCI_SWITCH_EMULATION_NUM_PORT=15 \
PER_PF_NUM_SF=1 \
PF_TOTAL_SF=256 \
PF_BAR2_ENABLE=0 \
PF_SF_BAR_SIZE=8 \
NUM_VF_MSIX=0 \
SRIOV_EN=1

In setups that combine hotplug virtio-net PFs with virtio-net SR-IOV VFs, at most 15 hotplug devices are supported.
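The limits described in this section can be checked before writing values with mlxconfig. The following is a minimal Python sketch, not part of the DOCA tooling: the names VirtioNetPlan and check_plan are hypothetical, the default of 3 other host functions (2 native PFs plus 1 RShim) is taken from the example above, and counting hotplug PFs toward the 1008-function budget is an assumption.

```python
from __future__ import annotations
from dataclasses import dataclass

MAX_HOTPLUG_PORTS = 31       # upper bound for PCI_SWITCH_EMULATION_NUM_PORT
MAX_HOTPLUG_WITH_VFS = 15    # hotplug limit when SR-IOV VFs are also configured
MIN_MSIX = 4                 # minimum MSI-X vectors per PF or VF
MAX_HOST_FUNCTIONS = 1008    # total PCIe functions exposed to the external host


@dataclass
class VirtioNetPlan:
    static_pfs: int                # VIRTIO_NET_EMULATION_NUM_PF
    vfs_per_pf: int                # VIRTIO_NET_EMULATION_NUM_VF
    hotplug_ports: int             # PCI_SWITCH_EMULATION_NUM_PORT
    pf_total_sf: int               # PF_TOTAL_SF
    pf_msix: int = 64              # VIRTIO_NET_EMULATION_NUM_MSIX
    vf_msix: int = 4               # VIRTIO_NET_EMULATION_NUM_VF_MSIX
    other_host_functions: int = 3  # assumption: 2 native PFs + 1 RShim


def check_plan(plan: VirtioNetPlan) -> list[str]:
    """Return a list of violated rules; an empty list means none were found."""
    problems = []
    total_vfs = plan.static_pfs * plan.vfs_per_pf
    devices = plan.static_pfs + plan.hotplug_ports + total_vfs

    if plan.hotplug_ports > MAX_HOTPLUG_PORTS:
        problems.append("more than 31 emulated switch ports")
    if total_vfs > 0 and plan.hotplug_ports > MAX_HOTPLUG_WITH_VFS:
        problems.append("more than 15 hotplug devices combined with VFs")
    if plan.pf_msix < MIN_MSIX or (total_vfs > 0 and plan.vf_msix < MIN_MSIX):
        problems.append("fewer than 4 MSI-X vectors")
    # PF_TOTAL_SF should be greater than the number of virtio-net PFs and VFs.
    if plan.pf_total_sf <= devices:
        problems.append("PF_TOTAL_SF does not exceed the virtio-net device count")
    if devices + plan.other_host_functions > MAX_HOST_FUNCTIONS:
        problems.append("more than 1008 PCIe functions exposed to the host")
    return problems


# The combined example above: 2 static PFs, 100 VFs each, 15 hotplug ports.
print(check_plan(VirtioNetPlan(2, 100, 15, 256, pf_msix=10, vf_msix=6)))
```

The check is deliberately conservative and does not replace the firmware's own validation. Host PCIe resources can still limit the usable number of hotplug devices.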

System Configuration

Host System Configuration

For hotplug device configuration, it is recommended to modify the hypervisor OS kernel boot parameters and add the options below:

pci=realloc

For SR-IOV configuration, first enable SR-IOV from the host. 

Refer to MLNX_OFED documentation under Features Overview and Configuration > Virtualization > Single Root IO Virtualization (SR-IOV) > Setting Up SR-IOV for instructions on how to do that.

Make sure to add the following options to the Linux boot parameters.

intel_iommu=on iommu=pt

Add pci=assign-busses to the boot command line when creating more than 127 VFs. Without this option, the host may report the following errors, and the virtio driver does not probe those devices.

pci 0000:84:00.0: [1af4:1041] type 7f class 0xffffff
pci 0000:84:00.0: unknown header type 7f, ignoring device

Because the controller from the BlueField side provides hardware resources and acknowledges (ACKs) the request from the host's virtio-net driver, it is mandatory to reboot the host OS (or unload the virtio-net driver) first and the BlueField afterwards. This also applies to reconfiguring a controller from the BlueField platform (e.g., reconfiguring LAG). Unloading the virtio-net driver from host OS side is recommended.

BlueField System Configuration

Virtio-net full emulation is based on ASAP^2. For each virtio-net device created on the host side, an SF representor is created to represent the device on the BlueField side. The SF representor must be in the same OVS bridge as the uplink representor.

The SF representor name follows a fixed pattern that maps to the different types of devices.


| | Static PF | Hotplug PF | SR-IOV VF |
|---|---|---|---|
| SF Range | 1000-1999 | 2000-2999 | 3000 and above |

For example, the first static PF gets the SF representor of en3f0pf0sf1000 and the second hotplug PF gets the SF representor of en3f0pf0sf2001. It is recommended to verify the name of the SF representor from the sf_rep_net_device field in the output of virtnet list.

[dpu]# virtnet list
{
  ...
  "devices": [
    {
      "pf_id": 0,
      "function_type": "static PF",
      "transitional": 0,
      "vuid": "MT2151X03152VNETS0D0F2",
      "pci_bdf": "14:00.2",
      "pci_vhca_id": "0x2",
      "pci_max_vfs": "0",
      "enabled_vfs": "0",
      "msix_num_pool_size": 0,
      "min_msix_num": 0,
      "max_msix_num": 32,
      "min_num_of_qp": 0,
      "max_num_of_qp": 15,
      "qp_pool_size": 0,
      "num_msix": "64",
      "num_queues": "8",
      "enabled_queues": "7",
      "max_queue_size": "256",
      "msix_config_vector": "0x0",
      "mac": "D6:67:E7:09:47:D5",
      "link_status": "1",
      "max_queue_pairs": "3",
      "mtu": "1500",
      "speed": "25000",
      "rss_max_key_size": "0",
      "supported_hash_types": "0x0",
      "ctrl_mac": "D6:67:E7:09:47:D5",
      "ctrl_mq": "3",
      "sf_num": 1000,
      "sf_parent_device": "mlx5_0",
      "sf_parent_device_pci_addr": "0000:03:00.0",
      "sf_rep_net_device": "en3f0pf0sf1000",
      "sf_rep_net_ifindex": 15,
      "sf_rdma_device": "mlx5_4",
      "sf_cross_mkey": "0x18A42",
      "sf_vhca_id": "0x8C",
      "sf_rqt_num": "0x0",
      "aarfs": "disabled",
      "dim": "disabled"
    }
  ]
 }

Once the SF representor name is located, add it to the same OVS bridge as the corresponding uplink representor and make sure the SF representor is up:

[dpu]# ovs-vsctl show
f2c431e5-f8df-4f37-95ce-aa0c7da738e0
    Bridge ovsbr1
        Port ovsbr1
            Interface ovsbr1
                type: internal
        Port en3f0pf0sf0
            Interface en3f0pf0sf0
        Port p0
            Interface p0 
[dpu]# ovs-vsctl add-port ovsbr1 en3f0pf0sf1000
[dpu]# ovs-vsctl show
f2c431e5-f8df-4f37-95ce-aa0c7da738e0
    Bridge ovsbr1
        Port ovsbr1
            Interface ovsbr1
                type: internal
        Port en3f0pf0sf0
            Interface en3f0pf0sf0
        Port en3f0pf0sf1000
            Interface en3f0pf0sf1000
        Port p0
            Interface p0
[dpu]# ip link set dev en3f0pf0sf1000 up
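The mapping between SF numbers and device types can also be applied programmatically to the JSON printed by virtnet list. The following Python sketch is illustrative only: the function names are hypothetical, the sample input is a trimmed version of the output shown above, and only the sf_num and sf_rep_net_device fields are assumed to be present.

```python
from __future__ import annotations
import json
import re

# SF number ranges from the table above.
SF_RANGES = [("static PF", 1000, 1999), ("hotplug PF", 2000, 2999)]


def sf_device_type(sf_num: int) -> str:
    """Map an SF number to the virtio-net device type it represents."""
    for name, low, high in SF_RANGES:
        if low <= sf_num <= high:
            return name
    if sf_num >= 3000:
        return "SR-IOV VF"
    return "not a virtio-net SF"


def sf_num_from_rep(rep_name: str) -> int:
    """Extract the trailing SF number from a representor name."""
    match = re.search(r"sf(\d+)$", rep_name)
    if match is None:
        raise ValueError(f"not an SF representor name: {rep_name}")
    return int(match.group(1))


def representors(virtnet_list_json: str) -> list[tuple[str, str]]:
    """Return (representor name, device type) for every listed device."""
    data = json.loads(virtnet_list_json)
    return [
        (dev["sf_rep_net_device"], sf_device_type(dev["sf_num"]))
        for dev in data.get("devices", [])
    ]


# Trimmed sample of the virtnet list output shown above.
SAMPLE = '{"devices": [{"sf_num": 1000, "sf_rep_net_device": "en3f0pf0sf1000"}]}'
print(representors(SAMPLE))
```

The returned names are the ones to pass to ovs-vsctl add-port and ip link set, as in the commands above.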

Usage

After the firmware/system configuration is complete and the system has been power cycled, the virtio-net devices should be ready to deploy.

First, make sure that the mlxconfig options took effect by running the command below. The output lists three value columns: default configuration, current configuration, and next-boot configuration. Verify that the values in the second column match the expected configuration.

[dpu]# mlxconfig -d 03:00.0 -e q | grep -i \*
*        PER_PF_NUM_SF                               False(0)        True(1)         True(1)
*        NUM_OF_VFS                                  16              0               0
*        PF_BAR2_ENABLE                              True(1)         False(0)        False(0)
*        PCI_SWITCH_EMULATION_NUM_PORT               0               8               8
*        PCI_SWITCH_EMULATION_ENABLE                 False(0)        True(1)         True(1)
*        VIRTIO_NET_EMULATION_ENABLE                 False(0)        True(1)         True(1)
*        VIRTIO_NET_EMULATION_NUM_VF                 0               126             126
*        VIRTIO_NET_EMULATION_NUM_PF                 0               1               1
*        VIRTIO_NET_EMULATION_NUM_MSIX               2               64              64
*        VIRTIO_NET_EMULATION_NUM_VF_MSIX            0               64              64
*        PF_TOTAL_SF                                 0               508             508
*        PF_SF_BAR_SIZE                              0               8               8
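Comparing the second column against the intended values can be scripted. The following Python sketch assumes the output format shown above (a leading asterisk, the entry name, then three value columns) and uses hypothetical helper names. Entries whose current value equals the default are not marked with an asterisk, so they do not appear in the parsed result.

```python
from __future__ import annotations
import re


def parse_mlxconfig_query(text: str) -> dict[str, dict[str, str]]:
    """Parse starred lines into {name: {default, current, next_boot}}."""
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        # Changed entries start with "*" followed by name and three values.
        if len(fields) == 5 and fields[0] == "*":
            _, name, default, current, next_boot = fields
            rows[name] = {
                "default": default,
                "current": current,
                "next_boot": next_boot,
            }
    return rows


def numeric(value: str) -> int:
    """Turn 'True(1)' or '126' into an integer."""
    match = re.search(r"\((\d+)\)$", value)
    return int(match.group(1)) if match else int(value)


def mismatches(rows: dict, expected: dict[str, int]) -> dict[str, tuple]:
    """Return {name: (expected, current)} for entries that differ."""
    bad = {}
    for name, want in expected.items():
        got = numeric(rows[name]["current"]) if name in rows else None
        if got != want:
            bad[name] = (want, got)
    return bad


# Three lines copied from the sample output above.
SAMPLE = """\
*        VIRTIO_NET_EMULATION_ENABLE                 False(0)        True(1)         True(1)
*        VIRTIO_NET_EMULATION_NUM_VF                 0               126             126
*        PF_TOTAL_SF                                 0               508             508
"""
rows = parse_mlxconfig_query(SAMPLE)
print(mismatches(rows, {"VIRTIO_NET_EMULATION_ENABLE": 1, "PF_TOTAL_SF": 508}))
```

An empty result means every checked entry already holds the expected current value.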

If the system is configured correctly, virtio-net-controller service should be up and running. If the service does not appear as active, double check the firmware/system configurations above.

[dpu]# systemctl status virtio-net-controller.service
● virtio-net-controller.service - Nvidia VirtIO Net Controller Daemon
   Loaded: loaded (/etc/systemd/system/virtio-net-controller.service; enabled; vendor preset: disabled)
   Active: active (running)
     Docs: file:/opt/mellanox/mlnx_virtnet/README.md
 Main PID: 30715 (virtio_net_cont)
    Tasks: 55
   Memory: 11.7M
   CGroup: /system.slice/virtio-net-controller.service
           ├─30715 /usr/sbin/virtio_net_controller
           ├─30859 virtio_net_emu
           └─30860 virtio_net_ha 

To reload or restart the service, run:

[dpu]# systemctl restart virtio-net-controller.service

When force-killing the virtio-net-controller service (i.e., with kill -9 or kill -SIGKILL), use kill -9 -<PID of the virtio_net_controller process> (30715 in the previous example). Note the dash "-" before the PID: a negative PID makes kill signal the whole process group rather than only the single process.

Hotplug PF Devices

Creating PF Devices
  1. To create a hotplug virtio-net device, run:

    [dpu]# virtnet hotplug -i mlx5_0 -f 0x0 -m 0C:C4:7A:FF:22:93 -t 1500 -n 3 -s 1024
    
    


    Refer to "Virtnet CLI Commands" for full usage.

    This command creates one hotplug virtio-net device with MAC address 0C:C4:7A:FF:22:93, MTU 1500, and 3 virtio queues with a depth of 1024 entries. The device is created on the physical port of mlx5_0. The device is uniquely identified by its index. This index is used to query and update device attributes. If the device is created successfully, an output similar to the following appears:

    {
      "bdf": "15:00.0",
      "vuid": "MT2151X03152VNETS1D0F0",
      "id": 0,
      "transitional": 0,
      "sf_rep_net_device": "en3f0pf0sf2000",
      "mac": "0C:C4:7A:FF:22:93",
      "errno": 0,
      "errstr": "Success"
    }
    
    
  2. Add the representor port of the device to the OVS bridge and bring it up. Run:

    [dpu]# ovs-vsctl add-port <bridge> en3f0pf0sf2000
    [dpu]# ip link set dev en3f0pf0sf2000 up
    
    

    Once steps 1-2 are completed, the virtio-net PCIe device should be available from hypervisor OS with the same PCIe BDF.

    [host]# lspci | grep -i virtio
    15:00.0 Ethernet controller: Red Hat, Inc. Virtio network device (rev 01)
    
    
  3. Probe the virtio-net driver (e.g., the kernel driver):

    [host]# modprobe -v virtio-pci && modprobe -v virtio-net
    
    


  4. The virtio-net device should be created. There are two ways to locate the net device:

    • Check the dmesg from the host side for the corresponding PCIe BDF:

      [host]# dmesg | tail -20 | grep 15:00.0 -A 10 | grep virtio_net
      [3908051.494493] virtio_net virtio2 ens2f0: renamed from eth0
      
      


    • Check all net devices and find the corresponding MAC address:

      [host]# ip link show | grep -i "0c:c4:7a:ff:22:93" -B 1
      31: ens2f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
          link/ether 0c:c4:7a:ff:22:93 brd ff:ff:ff:ff:ff:ff
      
      


  5. Check that the probed driver and its BDF match the output of the hotplug device:

    [host]# ethtool -i ens2f0
    driver: virtio_net
    version: 1.0.0
    firmware-version:
    expansion-rom-version:
    bus-info: 0000:15:00.0
    supports-statistics: yes
    supports-test: no
    supports-eeprom-access: no
    supports-register-dump: no
    supports-priv-flags: no
    
    


Now the hotplug virtio-net device is ready to use as a common network device.
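The MAC-based lookup from step 4 can also be done without parsing ip link output, by reading the address file that Linux exposes for each interface under /sys/class/net. The following Python sketch uses a hypothetical function name. The sysfs_net parameter exists only so that the function can be exercised against a test directory.

```python
from __future__ import annotations
from pathlib import Path
from typing import Optional


def find_netdev_by_mac(
    mac: str, sysfs_net: Path = Path("/sys/class/net")
) -> Optional[str]:
    """Return the name of the interface whose MAC matches, or None."""
    wanted = mac.strip().lower()
    for iface in sorted(sysfs_net.iterdir()):
        try:
            address = (iface / "address").read_text().strip().lower()
        except OSError:
            # Entries without a readable address file are skipped.
            continue
        if address == wanted:
            return iface.name
    return None


if __name__ == "__main__":
    # MAC address used in the hotplug example above.
    print(find_netdev_by_mac("0C:C4:7A:FF:22:93"))
```

On the host from the example above, this would be expected to return ens2f0 once the driver has probed the device.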

Destroying PF Devices

To hot-unplug a virtio-net device, run:

[dpu]# virtnet unplug -p 0
{'id': '0x1'}
{
  "errno": 0,
  "errstr": "Success"
}

The hotplug device and its representor are destroyed.

Host-Aware Attention Button Mode (AB Mode)

The Host-Aware Attention Button (AB) mode provides a coordinated mechanism for device removal utilizing the PCIe Attention Button. This mode ensures the host operating system has the opportunity to gracefully shut down the device driver prior to physical or logical removal.

When utilizing AB mode (-w 3), the hotplug and unplug lifecycle operates as follows:

  1. Hotplug initiation: The device is created with AB mode awareness, a state that persists for the lifetime of the device.

  2. Unplug notification: Upon an unplug request, the controller sends an Attention Button notification to the host, signaling a pending device removal.

  3. Host acknowledgment: The host OS receives the notification and gracefully shuts down the associated driver.

  4. Clean removal: Once the host acknowledges the shutdown, the device is cleanly removed from the system.

Usage Commands

Creating a device with AB mode:

[dpu]# virtnet hotplug -i mlx5_0 -f 0x0 -m 0C:C4:7A:FF:22:93 -t 1500 -n 3 -s 1024 -w 3

Removing a device with AB mode:

[dpu]# virtnet unplug -p 0 -w 3

Mode Matching

Devices hotplugged using AB mode (-w 3) strictly require the same AB mode flag (-w 3) during the unplug operation. Attempting to unplug an AB-mode device without specifying -w 3 will fail and return a mode mismatch error.

Verification and Querying

Checking AB Mode Support

Run the virtnet list command to verify if the underlying controller supports AB mode. Look for the hp_host_aware_ab_supported key in the JSON output.

{
    "controller": {
        "hp_host_awareness_supported": "1",
        "hp_host_aware_ab_supported": "1"
    }
}

Querying Device HP Mode

To verify which hotplug mode a specific device was created with, use the virtnet query command. An awareness mode of 3 indicates AB mode.

{
    "hp_host_awareness_mode": 3
}

Default AB Mode Configuration

To automate AB mode for all hotplug and unplug operations without manually specifying the -w 3 flag on the CLI, configure the virtnet.conf file:

force_ab_hotplug_default=1

Behavior when enabled:

  • Hotplug operations automatically default to AB mode (-w 3).

  • Unplug operations automatically default to AB mode (-w 3).

  • Explicit -w CLI options will always override this configuration.

  • If the underlying Host Channel Adapter (HCA) does not support AB mode, the controller will safely fall back to mode 0.
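The precedence rules above can be summarized as a small decision function. The following Python sketch models the described behavior and is not the controller's implementation: the function names are hypothetical, and using mode 0 when neither the -w option nor force_ab_hotplug_default is set is an assumption.

```python
from typing import Optional

AB_MODE = 3        # value passed with -w for Host-Aware Attention Button mode
FALLBACK_MODE = 0  # mode used when AB mode is not applied


def effective_mode(
    cli_mode: Optional[int], force_ab_default: bool, ab_supported: bool
) -> int:
    """Resolve the hotplug/unplug mode from the CLI, config, and support."""
    if cli_mode is not None:
        # An explicit -w option always overrides the configuration file.
        return cli_mode
    if force_ab_default and ab_supported:
        return AB_MODE
    # Without AB support, the configured default falls back to mode 0.
    # Assumption: mode 0 is also used when the option is not configured.
    return FALLBACK_MODE


def unplug_allowed(device_mode: int, unplug_mode: int) -> bool:
    """An AB-mode device must be unplugged with AB mode as well."""
    return not (device_mode == AB_MODE and unplug_mode != AB_MODE)


print(effective_mode(None, force_ab_default=True, ab_supported=True))
print(unplug_allowed(device_mode=AB_MODE, unplug_mode=0))
```

The second call models the mode mismatch case, in which the unplug request is rejected.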

SR-IOV VF Devices

Creating SR-IOV VF Devices

After configuring the firmware and BlueField/host system with correct configuration, users can create SR-IOV VFs.

The following procedure provides an example of creating one VF on top of one static PF:

  1. Locate the virtio-net PFs exposed to the host side:

    [host]# lspci | grep -i virtio
    14:00.2 Network controller: Red Hat, Inc. Virtio network device
    
    
  2. Verify that the PCIe BDF matches the backend device from the BlueField side:

    [dpu]# virtnet list
    {
      ...
       "devices": [
        {
          "pf_id": 0,
          "function_type": "static PF",
          "transitional": 0,
          "vuid": "MT2151X03152VNETS0D0F2",
          "pci_bdf": "14:00.2",
          "pci_vhca_id": "0x2",
          "pci_max_vfs": "0",
          "enabled_vfs": "0",
          "msix_num_pool_size": 0,
          "min_msix_num": 0,
          "max_msix_num": 32,
          "min_num_of_qp": 0,
          "max_num_of_qp": 15,
          "qp_pool_size": 0,
          "num_msix": "64",
          "num_queues": "8",
          "enabled_queues": "7",
          "max_queue_size": "256",
          "msix_config_vector": "0x0",
          "mac": "D6:67:E7:09:47:D5",
          "link_status": "1",
          "max_queue_pairs": "3",
          "mtu": "1500",
          "speed": "25000",
          "rss_max_key_size": "0",
          "supported_hash_types": "0x0",
          "ctrl_mac": "D6:67:E7:09:47:D5",
          "ctrl_mq": "3",
          "sf_num": 1000,
          "sf_parent_device": "mlx5_0",
          "sf_parent_device_pci_addr": "0000:03:00.0",
          "sf_rep_net_device": "en3f0pf0sf1000",
          "sf_rep_net_ifindex": 15,
          "sf_rdma_device": "mlx5_4",
          "sf_cross_mkey": "0x18A42",
          "sf_vhca_id": "0x8C",
          "sf_rqt_num": "0x0",
          "aarfs": "disabled",
          "dim": "disabled"
        }
      ]
     }
    
    
  3. Probe virtio_pci and virtio_net modules from the host:

    [host]# modprobe -v virtio-pci && modprobe -v virtio-net 
    
    

    The PF net device should be created.

    [host]# ip link show | grep -i "4A:82:E3:2E:96:AB" -B 1
    21: ens2f2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
        link/ether 4a:82:e3:2e:96:ab brd ff:ff:ff:ff:ff:ff
    
    

    The MAC address and PCIe BDF should match between the BlueField side (virtnet list) and host side (ethtool).

    [host]# ethtool -i ens2f2
    driver: virtio_net
    version: 1.0.0
    firmware-version:
    expansion-rom-version:
    bus-info: 0000:14:00.2
    supports-statistics: yes
    supports-test: no
    supports-eeprom-access: no
    supports-register-dump: no
    supports-priv-flags: no
    
    
  4. To create SR-IOV VF devices on the host, run the following command with the PF PCIe BDF (0000:14:00.2 in this example):

    [host]# echo 1 > /sys/bus/pci/drivers/virtio-pci/0000\:14\:00.2/sriov_numvfs
    
    

    One additional virtio-net device is created on the host:

    [host]# lspci | grep -i virtio
    14:00.2 Ethernet controller: Red Hat, Inc. Virtio network device (rev 01)
    14:00.4 Ethernet controller: Red Hat, Inc. Virtio network device (rev 01)
    
    

    The BlueField side shows the VF information from virtnet list as well:

    [dpu]# virtnet list
        ...
        {
          "vf_id": 0,
          "parent_pf_id": 0,
          "function_type": "VF",
          "transitional": 0,
          "vuid": "MT2151X03152VNETS0D0F2VF1",
          "pci_bdf": "14:00.4",
          "pci_vhca_id": "0xD",
          "pci_max_vfs": "0",
          "enabled_vfs": "0",
          "num_msix": "12",
          "num_queues": "8",
          "enabled_queues": "7",
          "max_queue_size": "256",
          "msix_config_vector": "0x0",
          "mac": "16:FF:A2:6E:6D:A9",
          "link_status": "1",
          "max_queue_pairs": "3",
          "mtu": "1500",
          "speed": "25000",
          "rss_max_key_size": "0",
          "supported_hash_types": "0x0",
          "ctrl_mac": "16:FF:A2:6E:6D:A9",
          "ctrl_mq": "3",
          "sf_num": 3000,
          "sf_parent_device": "mlx5_0",
          "sf_parent_device_pci_addr": "0000:03:00.0",
          "sf_rep_net_device": "en3f0pf0sf3000",
          "sf_rep_net_ifindex": 18,
          "sf_rdma_device": "mlx5_5",
          "sf_cross_mkey": "0x58A42",
          "sf_vhca_id": "0x8D",
          "sf_rqt_num": "0x0",
          "aarfs": "disabled",
          "dim": "disabled"
         }
    
    
  5. Add the corresponding SF representor to the same OVS bridge as the virtio-net PF and bring it up. Run:

    [dpu]# ovs-vsctl add-port <bridge> en3f0pf0sf3000
    [dpu]# ip link set dev en3f0pf0sf3000 up
    
    

Now the VF is functional. 

SR-IOV enablement from the host side takes a few minutes. For example, it may take 5 minutes to create 504 VFs.

It is recommended to disable VF autoprobe before creating VFs.

[host]# echo 0 > /sys/bus/pci/drivers/virtio-pci/<virtio_pf_bdf>/sriov_drivers_autoprobe
[host]# echo <num_vfs> > /sys/bus/pci/drivers/virtio-pci/<virtio_pf_bdf>/sriov_numvfs

Users can pass through the VFs directly to the VM after finishing. If using the VFs inside the hypervisor OS is required, bind the VF PCIe BDF:

[host]# echo <virtio_vf_bdf> > /sys/bus/pci/drivers/virtio-pci/bind

Remember to re-enable autoprobe for other use cases:

[host]# echo 1 > /sys/bus/pci/drivers/virtio-pci/<virtio_pf_bdf>/sriov_drivers_autoprobe

MAC addresses are randomly generated for the new virtual functions (VFs).

Creating VFs for the same PF on different threads may cause the hypervisor OS to hang.
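The recommended sequence (disable autoprobe, then set the VF count) can be wrapped in a small helper. The following Python sketch writes to the same sysfs attributes as the echo commands above. The function names are hypothetical, and the refusal to change a non-zero VF count reflects the rule that VFs can only go from 0 to X or from X to 0.

```python
from pathlib import Path

VIRTIO_PCI_DRIVER = Path("/sys/bus/pci/drivers/virtio-pci")


def create_vfs(pf_dir: Path, num_vfs: int) -> None:
    """Disable VF autoprobe on the PF, then create num_vfs VFs."""
    numvfs = pf_dir / "sriov_numvfs"
    current = int(numvfs.read_text().strip())
    if current != 0:
        # The VF count cannot be changed from X to Y directly.
        raise RuntimeError(f"{current} VFs already exist; destroy them first")
    (pf_dir / "sriov_drivers_autoprobe").write_text("0\n")
    numvfs.write_text(f"{num_vfs}\n")


def restore_autoprobe(pf_dir: Path) -> None:
    """Re-enable VF autoprobe for other use cases."""
    (pf_dir / "sriov_drivers_autoprobe").write_text("1\n")


if __name__ == "__main__":
    # PF BDF from the example above; requires root on the host.
    create_vfs(VIRTIO_PCI_DRIVER / "0000:14:00.2", 1)
```

Calls for the same PF should be issued from a single thread, in line with the note above.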

Destroying SR-IOV VF Devices

To destroy SR-IOV VF devices on the host, run:

[host]# echo 0 > /sys/bus/pci/drivers/virtio-pci/<virtio_pf_bdf>/sriov_numvfs

When the echo command returns from the host OS, it does not necessarily mean the BlueField side has finished its operations. To verify that the BlueField is done, and it is safe to recreate the VFs, either:

  • Check controller log from the BlueField and make sure you see a log entry similar to the following:

    [dpu]# journalctl -u virtio-net-controller.service -n 3 -f
    virtio-net-controller[5602]: [INFO] virtnet.c:675:virtnet_device_vfs_unload: static PF[0], Unload (1) VFs finished
    
    
  • Query the last VF from the BlueField side:

    [dpu]# virtnet query -p 0 -v 0 -b
    {'all': '0x0', 'vf': '0x0', 'pf': '0x0', 'dbg_stats': '0x0', 'brief': '0x1', 'latency_stats': '0x0', 'stats_clear': '0x0'}
    {
      "Error": "Device doesn't exist"
    }
    
    

Once VFs are destroyed, SFs created for virtio-net from the BlueField side are not destroyed but are saved into the SF pool for reuse later.

Restarting virtio-net-controller service while performing device create/destroy for either hotplug or VF is unsupported.

Assigning Virtio-net Device to VM

All virtio-net devices (static/hotplug PF and VF) support PCIe passthrough to a VM. PCIe passthrough allows the device to get better performance in the VM.

Assigning a virtio-net device to a VM can be done via virt-manager or virsh command.

Locating Virtio-net Devices

All virtio-net devices can be scanned by the PCIe subsystem in the hypervisor OS and are displayed as standard PCIe devices. Run the following command to locate the virtio-net devices with their PCIe BDFs.

[host]# lspci | grep 'Virtio network'
00:09.1 Ethernet controller: Red Hat, Inc. Virtio network device (rev 01)

Using virt-manager

To start virt-manager, run the following command:

[host]# virt-manager

Make sure your system has xterm enabled to show the virt-manager GUI.

Double-click the virtual machine and open its Properties. Navigate to Details → Add hardware → PCIe host device.

Choose the virtio-net device according to its PCIe BDF (e.g., 00:09.1), then reboot or start the VM.

Using virsh Command

  1. Run the following command to get the VM list and select the target VM by Name field:

    [host]# virsh list --all
     Id   Name                           State
    ----------------------------------------------
     1    host-101-CentOS-8.5           running
    
    
  2. Edit the VM's XML file. Run:

    [host]# virsh edit <VM_NAME>
    
    
  3. Assign the target virtio-net device PCIe BDF to the VM using vfio as the driver. Replace BUS/SLOT/FUNCTION/BUS_IN_VM/SLOT_IN_VM/FUNCTION_IN_VM with the corresponding settings.

    <hostdev mode='subsystem' type='pci' managed='no'>
      <driver name='vfio'/>
        <source>
          <address domain='0x0000' bus='<#BUS>' slot='<#SLOT>' function='<#FUNCTION>'/>
        </source>
      <address type='pci' domain='0x0000' bus='<#BUS_IN_VM>' slot='<#SLOT_IN_VM>' function='<#FUNCTION_IN_VM>'/>
    </hostdev>
    
    

    For example, to assign target device 00:09.1 to the VM so that its PCIe BDF within the VM is 01:00.0:

    <hostdev mode='subsystem' type='pci' managed='no'>
      <driver name='vfio'/>
        <source>
          <address domain='0x0000' bus='0x00' slot='0x09' function='0x1'/>
        </source>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </hostdev>
    
    
  4. Destroy the VM if it is already started:

    [host]# virsh destroy <VM_NAME>
    
    
  5. Start the VM with new XML configuration:

    [host]# virsh start <VM_NAME>
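The hostdev element from step 3 can be generated from the two BDFs instead of being edited by hand. The following Python sketch uses only the standard library. The function name hostdev_xml is hypothetical, and the element layout simply mirrors the XML example above.

```python
import re
import xml.etree.ElementTree as ET

BDF_PATTERN = re.compile(
    r"(?:([0-9a-fA-F]{4}):)?([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])"
)


def _address_attrs(bdf: str) -> dict:
    """Split a BDF such as 00:09.1 into libvirt address attributes."""
    match = BDF_PATTERN.fullmatch(bdf)
    if match is None:
        raise ValueError(f"invalid PCIe BDF: {bdf}")
    domain, bus, slot, function = match.groups()
    return {
        "domain": f"0x{domain or '0000'}",
        "bus": f"0x{bus}",
        "slot": f"0x{slot}",
        "function": f"0x{function}",
    }


def hostdev_xml(host_bdf: str, guest_bdf: str) -> str:
    """Build the hostdev element for passing host_bdf through to a VM."""
    hostdev = ET.Element("hostdev", mode="subsystem", type="pci", managed="no")
    ET.SubElement(hostdev, "driver", name="vfio")
    source = ET.SubElement(hostdev, "source")
    ET.SubElement(source, "address", _address_attrs(host_bdf))
    ET.SubElement(hostdev, "address", {"type": "pci", **_address_attrs(guest_bdf)})
    return ET.tostring(hostdev, encoding="unicode")


# Device 00:09.1 on the host, seen as 01:00.0 inside the VM.
print(hostdev_xml("00:09.1", "01:00.0"))
```

The printed element can be pasted into the devices section while running virsh edit.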
    
    

Configuration File

Configuration File Options

The controller service has an optional JSON format configuration file which allows users to customize several parameters. The configuration file should be defined on the DPU at /opt/mellanox/mlnx_virtnet/virtnet.conf. This file is read every time the controller starts. 

The controller systemd service must be restarted whenever the configuration file changes. Dynamic changes to virtnet.conf are not supported.

Starting with DOCA 3.3.0, the mrg_rxbuf and packed_vq parameters are no longer supported in the virtnet.conf configuration file. The mrg_rxbuf and packed_vq features must instead be configured using the CLI modify device option.

Parameter

Default Value

Type

Description

ib_dev_p0

mlx5_0

String

RDMA device (e.g., mlx5_0) used to create SF on port 0. This port is the EMU manager when is_lag is 0.

ib_dev_p1

mlx5_1

String

RDMA device (e.g., mlx5_1) used to create SF on port 1

ib_dev_for_static_pf

mlx5_0

String

The RDMA device (e.g., mlx5_0) which the static virtio PF is created on

ib_dev_lag

Null

String

RDMA LAG device (e.g., mlx5_bond_0) used to create SF on LAG. Default value is mlx5_bond_0. This port is EMU manager when is_lag is 1. ib_dev_lag and ib_dev_p0/ib_dev_p1 cannot be configured simultaneously.

static_pf

N/A

List

The following sub-parameters can be used to configure the static PF:

Sub-parameter

Default Value

Type

Description

mac_base

Null

String

Base MAC address for static PFs. MACs are automatically assigned with the following pattern: mac_base → pf_0, mac_base + 1 → pf_1, etc.

Controller does not validate the MAC address (other than its length). The user must ensure the MAC is valid and unique.

The virtio driver on the host OS must be unloaded when restarting the controller if the mac_base setting is enabled for the first time or is modified.

features

Auto

Number

Virtio spec-defined feature bits for static PFs.

If unsure, leave features out of the JSON file and a default value is automatically assigned. The default value is determined dynamically when controller starts. Refer to section "DOCA Virtio-net Service Guide | Feature Bits" for more information.

mtu

1500

Number

Maximum transmission unit (MTU) for the PF. Must be ≤ 9216.


is_lag

0

Number

Specifies whether LAG is used

If LAG is used, make sure to use the correct IB dev for static PF

single_port

0

Number

Specifies whether the DPU is a single port device. It is mutually exclusive with is_lag.

recovery

1

Number

Specifies whether recovery is enabled. If unspecified, recovery is enabled by default. To disable it, set recovery to 0. Refer to section "DOCA Virtio-net Service Guide | Recovery" for the items which are recovered and more information.

sf_pool_percent

0

Number

Determines the initial SF pool size as the percentage of PF_TOTAL_SF of mlxconfig. Valid range: [0, 100]. For instance, if the value is 5, an SF pool with 5% of PF_TOTAL_SF is created. 0 indicates that no SF pool is reserved beforehand (default).

PF_TOTAL_SF is shared by all applications. The user must ensure that the requested percentage is actually available. Otherwise, the controller cannot reserve the requested SFs, resulting in failure.

sf_pool_force_destroy

0

Number

Specifies whether to destroy the SF pool. When set to 1, the controller destroys the SF pool when stopped/restarted (and the SF pool is recreated if sf_pool_percent is not 0 when starting). Otherwise, it does not. Default value is 0.

dpa_core_start

0

Number

Specifies the start DPA core for the virtnet application. Valid only for NVIDIA® BlueField®-3 and up. Value must be greater than or equal to 0 and less than 11. Together with dpa_core_end, dpa_core_start defines how many DPA cores are used for the virtio-net data plane.

This is an advanced option for cases where multiple DPA applications run at the same time. Regular users should keep the default value.

The number of cores/EUs impacts the maximum number of VQs that can be created.

dpa_core_end

10

Number

Specifies the end DPA core for virtnet application. Valid only for BlueField-3 and up. Value must be greater than dpa_core_start and less than 11.

vf

N/A

List

The following sub-parameters can be used to configure the VF:

Sub-parameter

Default Value

Type

Description

mac_base

Null

String

Base MAC address for VFs. MACs are automatically assigned with the following pattern: mac_base → vf_0, mac_base + 1 → vf_1, etc.

Controller does not validate the MAC address (other than its length). The user must ensure the MAC is valid and unique.

The virtio driver on the guest OS must be unloaded when restarting the controller if the mac_base setting is enabled for the first time or is modified.

features

Auto

Number

Virtio spec-defined feature bits for VFs.

If unsure, leave features out of the JSON file and a default value is automatically assigned. The default value is determined dynamically when controller starts. Refer to section "DOCA Virtio-net Service Guide | Feature Bits" for more information.

vfs_per_pf

0

Number

The number of VFs to create on each PF. For example: if vfs_per_pf is 100, then vf_0 on pf_1 will use mac_base + 100 as its MAC.

vfs_per_pf ≤ VIRTIO_NET_EMULATION_NUM_VF in mlxconfig.

The user is responsible for ensuring that the number of VFs created on each static PF is ≤ vfs_per_pf.

This parameter is mandatory if mac_base is specified.

max_queue_pairs

Auto

Number

Number of queue pairs to use. If not specified, default queue pair number is inherited from the parent PF.

max_queue_size

Auto

Number

Virtqueue size (i.e., vq depth) to use. If not specified, default vq size is inherited from the parent PF.

mtu

1500

Number

Maximum transmission unit (MTU) for the VF. Must be ≤ 9216.

virtio_spec_admin_legacy

0

Number

Enable (1) or disable (0) virtio spec legacy interface commands.

virtio_spec_admin_lm

0

Number

Enable (1) or disable (0) virtio spec live migration commands.

dpa_partition

N/A

String

DPA partition configuration file full path. Refer to section "DOCA Virtio-net Service Guide | DPA Configuration (SPRD)" for more information.

The DPA partition conf file is generated by the DOCA dpa-resource-mgmt tool. Refer to DOCA DPA Tools

Configuration requirements:

  • App name is virtio-net

  • The minimum number of EUs is 32

  • EU groups are not supported

Example of config.yaml input to dpa-resource-mgmt:

---
version: 25.04
---
DPA_APPS:
  virtio-net:
    - partition: ROOT
      affinity_core:
        - core: 1
          num_EUs: 16
        - core: 3
          num_EUs: 16
        - core: 5
          num_EUs: 16

Operational Constraint: ECPF Partitioning

For virtio-net deployments on the DPU, the DPA application resides on the ECPF device. Consequently, ECPF interactions are restricted to the default ROOT partition. You must utilize dpa-resource-mgmt to initialize the ROOT partition on the active device (e.g., mlx5_0, ib_dev_p0, or ib_dev_lag).

Parameter Precedence

Specifying a dpa_partition overrides individual core allocations, causing the dpa_core_start and dpa_core_end parameters to be ignored.

The number of cores/EUs impacts the maximum number of VQs that can be created.

eth_vq_workers

0

Number

Specifies the number of worker threads allocated for Ethernet Virtual Queue (VQ) lifecycle operations, such as queue creation and destruction.

  • Performance tuning: Because these operations are strictly I/O-bound and subject to firmware latency, configuring more threads than available CPU cores yields better performance. The underlying firmware supports 8-way parallelism, making an allocation of 8 to 24 workers optimal.

  • Auto-calculation: If left undefined (or set to 0), the system automatically provisions workers using the formula num_cpu*4, bounded by a minimum of 8 and a maximum of 24.

admin_cmd_workers

0

Number

Specifies the number of worker threads allocated for administrative command operations.

  • Performance tuning: Similar to ETH VQ operations, administrative commands are I/O-bound. Provisioning more threads than available CPU cores enhances responsiveness. This parameter is designed to provide a conservative baseline for administrative command parallelism.

  • Auto-calculation: If left undefined (or set to 0), the system automatically provisions workers using the formula num_cpu*2, bounded by a minimum of 4 and a maximum of 16.

force_ab_hotplug_default

0

Number

Enables attention button (AB) mode by default for all hotplug and unplug operations, so that the -w 3 flag does not need to be specified manually on the CLI.

event_publisher

N/A

JSON object

Configuration for virtio-net event notifications.

Sub-parameter

Default Value

Type

Description

enabled

false

Boolean

Master enable/disable switch

broker_url

""

String

Must be nats://127.0.0.1:4222

subject_prefix

"virtio.vf"

String

Prefix for NATS subjects

connect_timeout_ms

2000

Integer

Broker connect attempt timeout in milliseconds. Range: 100–30000. 0 uses the default.

reconnect_backoff_ms

1000

Integer

Backoff between reconnect attempts in milliseconds. Range: 100–60000. 0 uses the default.

max_queue_depth

4096

Integer

Bounded publish queue depth. Range: 16–65536. 0 uses the default.

If the event_publisher section is entirely missing from virtnet.conf, event publishing is disabled by default, which means the controller starts normally with zero overhead.
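The auto-calculation rules given in the table above for eth_vq_workers and admin_cmd_workers amount to a clamped multiple of the CPU count. The following Python sketch illustrates only that arithmetic. The function names are hypothetical, and using os.cpu_count() for num_cpu is an assumption about how the controller counts CPUs.

```python
import os


def clamp(value, low, high):
    """Bound value to the inclusive range [low, high]."""
    return max(low, min(high, value))


def auto_eth_vq_workers(num_cpu):
    # Documented rule: num_cpu*4, bounded by a minimum of 8 and a maximum of 24.
    return clamp(num_cpu * 4, 8, 24)


def auto_admin_cmd_workers(num_cpu):
    # Documented rule: num_cpu*2, bounded by a minimum of 4 and a maximum of 16.
    return clamp(num_cpu * 2, 4, 16)


if __name__ == "__main__":
    # Assumption: num_cpu is the number of CPUs visible to the OS.
    cpus = os.cpu_count() or 1
    print(f"num_cpu={cpus}: eth_vq_workers={auto_eth_vq_workers(cpus)}, "
          f"admin_cmd_workers={auto_admin_cmd_workers(cpus)}")
```

Under these rules, a system with 16 CPUs would get 24 ETH VQ workers and 16 administrative command workers, since both products exceed their upper bounds.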

Configuration File Examples

Validate the JSON format of the configuration file before restarting the controller, especially the syntax and symbols. Otherwise, the controller may fail to start.
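One possible way to perform this validation is a short Python script run before restarting the controller. The sketch below checks the JSON syntax and a few of the constraints listed in the options table (mutual exclusivity of is_lag and single_port, the sf_pool_percent range, the MTU limit, and the vfs_per_pf requirement). The function name check_virtnet_conf is hypothetical, the checks are not exhaustive, and the controller itself may apply different or additional validation.

```python
import json

MAX_MTU = 9216  # documented upper bound for the PF and VF mtu options


def check_virtnet_conf(text):
    """Return a list of problems found in a virtnet.conf JSON string."""
    try:
        conf = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"invalid JSON: {err}"]
    if not isinstance(conf, dict):
        return ["top-level JSON value must be an object"]

    problems = []
    # is_lag and single_port are documented as mutually exclusive.
    if conf.get("is_lag", 0) and conf.get("single_port", 0):
        problems.append("is_lag and single_port are mutually exclusive")
    # ib_dev_lag cannot be configured together with ib_dev_p0/ib_dev_p1.
    if "ib_dev_lag" in conf and ("ib_dev_p0" in conf or "ib_dev_p1" in conf):
        problems.append("ib_dev_lag cannot be combined with ib_dev_p0/ib_dev_p1")
    # sf_pool_percent has a documented valid range of [0, 100].
    if not 0 <= conf.get("sf_pool_percent", 0) <= 100:
        problems.append("sf_pool_percent must be in [0, 100]")
    # The mtu sub-parameter of static_pf and vf must not exceed 9216.
    for name in ("static_pf", "vf"):
        section = conf.get(name, {})
        if isinstance(section, dict) and section.get("mtu", 1500) > MAX_MTU:
            problems.append(f"{name}.mtu must be <= {MAX_MTU}")
    # vfs_per_pf is mandatory when a VF mac_base is specified.
    vf = conf.get("vf", {})
    if isinstance(vf, dict) and "mac_base" in vf and "vfs_per_pf" not in vf:
        problems.append("vf.vfs_per_pf is mandatory when vf.mac_base is set")
    return problems


if __name__ == "__main__":
    # Inline sample for illustration; in practice, read the file contents.
    sample = '{"is_lag": 0, "vf": {"mac_base": "06:11:22:33:44:55"}}'
    for problem in check_virtnet_conf(sample):
        print(problem)
```

An empty list means that none of the checked rules were violated. It does not guarantee that the controller accepts the file.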

Configuring LAG on Dual Port BlueField

Refer to "Link Aggregation" documentation for information on configuring BlueField in LAG mode.

Refer to the "Link Aggregation" page for information on configuring virtio-net in LAG mode.

Configuring Static PF on Dual Port BlueField

The following configures all static PFs to use mlx5_0 (port 0) as the data path device in a non-LAG configuration, and sets the default MAC and features for the PF:

{
  "ib_dev_p0": "mlx5_0",
  "ib_dev_p1": "mlx5_1",
  "ib_dev_for_static_pf": "mlx5_0",
  "is_lag": 0,
  "static_pf": {
    "mac_base": "08:11:22:33:44:55",
    "features": "0x230047082b"
  }
}

Configuring VF Specific Options

The following configures VFs with default parameters. With this configuration, each PF assigns MACs based on mac_base to up to 126 VFs. Each VF creates 4 queue pairs, with each queue having a depth of 256.

If vfs_per_pf is less than the VIRTIO_NET_EMULATION_NUM_VF in mlxconfig, and more VFs are created, duplicated MACs would be assigned to different VFs.

{
  "vf": {
    "mac_base": "06:11:22:33:44:55",
    "features": "0x230047082b",
    "vfs_per_pf": 126,
    "max_queue_pairs": 4,
    "max_queue_size": 256
  }
}
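The MAC assignment described above can be sketched in Python. The sketch assumes that VF MACs are laid out contiguously per PF, i.e., that the offset from mac_base is pf_index * vfs_per_pf + vf_index. This is consistent with the documented example (with vfs_per_pf set to 100, vf_0 on pf_1 uses mac_base + 100) but is not confirmed as the controller's exact implementation. The function names are hypothetical.

```python
def mac_add(mac_base, offset):
    """Return mac_base + offset as a colon-separated MAC string."""
    value = int(mac_base.replace(":", ""), 16) + offset
    if value >= 1 << 48:
        raise ValueError("MAC address overflow")
    raw = f"{value:012X}"
    return ":".join(raw[i:i + 2] for i in range(0, 12, 2))


def vf_mac(mac_base, vfs_per_pf, pf_index, vf_index):
    # Assumption: each PF owns a contiguous block of vfs_per_pf addresses.
    return mac_add(mac_base, pf_index * vfs_per_pf + vf_index)


if __name__ == "__main__":
    base = "06:11:22:33:44:55"
    print(vf_mac(base, 126, 0, 0))  # first VF on pf_0
    print(vf_mac(base, 126, 1, 0))  # first VF on pf_1
    # With vfs_per_pf too small, a third VF on pf_0 collides with vf_0 on pf_1.
    print(vf_mac(base, 2, 0, 2) == vf_mac(base, 2, 1, 0))
```

The last line illustrates the duplicated-MAC case: if more VFs are created on a PF than vfs_per_pf allows for, their offsets run into the block of the next PF.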

Virtio Live Migration Settings

The following table provides an example of configurations for the new options introduced for VirtIO Live Migration:

virtio_spec_admin_legacy

virtio_spec_admin_lm

Expected Result

1

1

Enables both legacy interface and VFIO kernel Live Migration commands

1

0

Enables legacy interface commands only

0

1

Enables VFIO kernel Live Migration commands only

0

0

Disables both legacy interface and VFIO kernel Live Migration commands

-

-

Supports vDPA live migration solutions

DPA Configuration (SPRD)

The Single Point of Resource Distribution (SPRD) is a centralized orchestration system that manages Data Path Accelerator (DPA) Execution Unit (EU) allocation across applications and Virtual HCAs (VHCAs). By centralizing this process, SPRD prevents ad-hoc per-application sizing and eliminates resource over-commitment.

Administrators define EU assignments to partitions and applications within a single YAML configuration file. The SPRD system validates this configuration, programs the hardware partitions and EU groups accordingly, and generates per-partition configuration files for the applications to consume.

Control and Integration

  • CLI utility: The system is managed using the dpa-resource-mgmt command-line tool.

  • Virtio-net integration: The virtio-net controller consumes the SPRD-generated output file via the dpa_partition configuration option to inherit its managed EU affinity.

SPRD Configuration Workflow for Virtio-net

Inspect Available Resources

Query the DPU to verify the ROOT partition and determine the set of free EUs before assigning them to virtio-net:

dpa-resource-mgmt query -t resources -d mlx5_0

Create an Input SPRD YAML

Create an input file (e.g., input_vnet.yaml) that explicitly maps the virtio-net application to the ROOT partition and defines the required EUs per core.

Example for a virtio-net "solo app" deployment:

version: 26.01
DPA_APPS:
   "virtio-net":
       - partition: ROOT
         affinity_core:
             - core: 0
               num_EUs: 16
             - core: 1
               num_EUs: 16
             - core: 2
               num_EUs: 16
             - core: 3
               num_EUs: 16
             - core: 4
               num_EUs: 16
             - core: 5
               num_EUs: 16
             - core: 6
               num_EUs: 16
             - core: 7
               num_EUs: 16
             - core: 8
               num_EUs: 16
             - core: 9
               num_EUs: 16
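Before running the tool, it can be useful to confirm that the requested EU total meets the documented minimum of 32 EUs for the virtio-net application. The following Python sketch mirrors the affinity_core list of the example above as plain data. The function names are hypothetical, and the duplicate-core check is an assumption rather than a documented rule of dpa-resource-mgmt.

```python
MIN_EUS = 32  # documented minimum number of EUs for the virtio-net app


def total_eus(affinity_core):
    """Sum num_EUs over all affinity_core entries."""
    return sum(entry["num_EUs"] for entry in affinity_core)


def check_affinity(affinity_core):
    """Return a list of problems with an affinity_core list."""
    problems = []
    cores = [entry["core"] for entry in affinity_core]
    # Assumption: listing the same core twice is treated as a mistake.
    if len(cores) != len(set(cores)):
        problems.append("a core is listed more than once")
    if total_eus(affinity_core) < MIN_EUS:
        problems.append(f"fewer than {MIN_EUS} EUs requested")
    return problems


# Same layout as the example above: cores 0-9 with 16 EUs each.
example = [{"core": core, "num_EUs": 16} for core in range(10)]

if __name__ == "__main__":
    print(total_eus(example), check_affinity(example))
```

With ten cores of 16 EUs each, the total is 160 EUs, which matches the number_of_affinity_EUs value in the example ROOT.yaml output.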

Generate the SPRD Output YAML

Run the configuration command to generate the output file.

dpa-resource-mgmt config -d mlx5_0 -f input_vnet.yaml -v

Naming Conventions

When utilizing the ROOT partition on a DPU, the generated output file automatically inherits the partition name (e.g., ROOT.yaml). This is the exact file you must reference in the virtio-net controller using the dpa_partition configuration option.

Example SPRD Output & Validation

The generation command produces a parsed YAML file utilized by the application. The affinity_EUs list contains the absolute EU IDs reserved for virtio-net. During execution, SPRD converts these absolute IDs into partition-relative indices.

Example ROOT.yaml Output:

version: 26.01
DPA_APPS:
-   name: virtio-net
    number_of_affinity_EUs: 160
    affinity_EUs: [0, 1, 2, 4, 5, 6, 8, 9, 10, 11, 12, 13, 15, 16, 17, 19, 21, 22,
        23, 24, 26, 27, 28, 29, 30, 31, 33, 34, 36, 37, 39, 40, 41, 42, 43, 44, 45,
        48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 60, 61, 62, 63, 65, 66, 67, 69,
        70, 71, 72, 73, 74, 75, 76, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 89, 90,
        92, 93, 94, 95, 96, 97, 98, 99, 101, 102, 103, 104, 105, 106, 107, 108, 109,
        110, 111, 112, 113, 114, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125,
        126, 127, 129, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 143,
        144, 145, 147, 148, 149, 150, 151, 152, 154, 155, 156, 157, 158, 159, 160,
        161, 162, 163, 165, 166, 167, 168, 169, 172, 173, 174, 175, 177, 178, 179,
        181, 182, 183, 184, 185, 187, 188, 189]

Performance Validation (BlueField-3 DPU)

This specific SPRD configuration was validated in a "solo app" scenario (where virtio-net was the exclusive DPA application assigned EUs) to ensure all data-plane threads ran optimally across the 160 managed EUs.

Hardware and test profile:

  • Devices: 32 full emulation virtio-net VFs (16 TX, 16 RX)

  • Queue Configuration: Virtqueue (VQ) depth of 1024; 31 Queue Pairs (QPs) per device.

  • CQE Moderation: Count = 32, Period = 32

  • Traffic Profile: testpmd UDP, 16 streams, 64-byte message size.

Measured aggregate performance:

  • RX-Only: 72.6 Mpps

  • TX-Only: 81.05 Mpps

  • Bidirectional (RX + TX): 100.36 Mpps total

Virtnet CLI Commands

User Front End CLI

To communicate with the virtio-net-controller backend service, a user frontend program, virtnet, is installed on the BlueField. It is based on a remote procedure call (RPC) protocol and produces JSON-format output.

Hotplug

This command hotplugs a virtio-net PCIe PF device exposed to the host side.

Syntax

virtnet hotplug -i IB_DEVICE -m MAC -t MTU -n MAX_QUEUES -s MAX_QUEUE_SIZE [-h] [-u SF_NUM] [-f FEATURES] [-l] [-w HP_HOST_AWARENESS]

Option

Abbr

Argument Type

Required

Description

--help

-h

N/A

No

Show the help message and exit

--ib_device

-i

String

Yes

RDMA device (e.g., mlx5_0) of the physical port on top of which the hotplug device is created.

Options:

  • mlx5_0 – port 1

  • mlx5_1 – port 2

  • mlx5_bond_0 – LAG

--features

-f

Hex Number

No

Feature bits to be enabled in hex format. Refer to the "Virtio-net Feature Bits" page.

Note that some features are enabled by default. Query the device to show the supported bits.

--mac

-m

Number

Yes

MAC address of the virtio-net device.

Controller does not validate the MAC address (other than its length). The user must ensure MAC is valid and unique.

--mtu

-t

Number

Yes

Maximum transmission unit (MTU) size of the virtio-net device. It must be less than the uplink rep MTU size.

--num_queues

-n

Number

Yes

Mutually exclusive with max_queue_pairs.

Maximum number of virtqueues that can be created for the virtio-net device. TX, RX, and ctrl queues are counted separately (e.g., 3 has 1 TX VQ, 1 RX VQ, 1 Ctrl VQ).

This option will be deprecated in the future.

--max_queue_pairs

-qp

Number

Yes

Mutually exclusive with num_queues.

Number of data VQ pairs. One VQ pair has one TX queue and one RX queue. It does not count control or admin VQ. From the host side, it appears as Pre-set maximums->Combined in ethtool -l <virtio-dev>.

--max_queue_size

-s

Number

Yes

Maximum number of buffers in the virt queue, between 0x4 and 0x8000. Must be a power of 2.

--sf_num

-u

Number

No

SF number to be used for this hotplug device. Must be between 2000 and 2999.

--legacy

-l

N/A

No

Create legacy (transitional) hotplug device

Relevant for BlueField-2 only.

--hp_host_awareness

-w

Number

No

This setting determines how the device interacts with the host during hot plug operations. The following modes are available:

Value

Mode

Description

0

Device default (Default)

The device operates in its default mode. For virtio-net, the default is host-aware.

1

Host-aware

The device monitors the host’s state and proceeds only after the host has completed initialization (e.g., after the PCIe bus has been fully scanned) and is ready for new PCIe connections

2

Host-unaware

The hot plug operation can proceed regardless of the host’s state. This means the operation may complete even if the host has not finished initialization or cannot respond to events. 

Using this mode may result in undefined behavior on systems with older BIOS or OS versions. It is not recommended for such environments.

3

Host-aware attention button

The device is attached/detached gracefully, using the PCIe attention button sequence.

For optimal stability and compatibility, it is recommended to use either the device default mode (0) or host-aware mode (1).

Always ensure that your system's BIOS and OS are up to date to avoid compatibility issues with hot plug features.

The virtnet list CLI command indicates whether the controller supports hot plug host awareness:

#virtnet list

{
        "controller":   {
                …
                "emulation_manager":    "mlx5_0",
                "max_hotplug_devices":  "15",
                "hp_host_awareness_supported":  "1",        // <===== new indication
                "max_virt_net_devices": "15",
                "max_virt_queues":      "256",
                ...

Output

Entry

Type

Description

bdf

String

The PCIe BDF (bus:device:function) number enumerated by host. The user should see this PCIe device from host side.

vuid

String

Unique device SN. It can be used as an index to query/modify/unplug this device.

id

Num

Unique device ID. It can be used as an index to query/modify/unplug this device.

transitional

Num

Indicates whether the current device is a transitional hotplug device:

  • 0 – modern device

  • 1 – transitional device

sf_rep_net_device

String

Name of the SF representor that represents the virtio-net device. It should be added to the OVS bridge.

mac

String

The hotplug virtio-net device MAC address

errno

Num

Error number if hotplug failed.

  • 0 – success

  • non-0 – failed

errstr

String

Explanation of the error number

Example

The following example hotplugs one device with MAC address 0C:C4:7A:FF:22:93, MTU 1500, and 1 virtual queue pair (QP) with a depth of 1024 entries. The device is created on the physical port of mlx5_0.

# virtnet hotplug -i mlx5_0 -m 0C:C4:7A:FF:22:93 -t 1500 -qp 1 -s 1024
{
  "bdf": "15:00.0",
  "vuid": "MT2151X03152VNETS1D0F0",
  "id": 0,
  "transitional": 0,
  "sf_rep_net_device": "en3f0pf0sf2000",
  "mac": "0C:C4:7A:FF:22:93",
  "errno": 0,
  "errstr": "Success"
}
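Several hotplug arguments have numeric constraints that can be checked before invoking the CLI. The following Python sketch covers the documented rules for --max_queue_size (between 0x4 and 0x8000, power of 2) and --sf_num (between 2000 and 2999), and relates --max_queue_pairs to --num_queues. The function names are hypothetical, and the single control VQ in the conversion is an assumption based on the example in the option table (3 queues = 1 TX VQ, 1 RX VQ, 1 Ctrl VQ).

```python
def is_valid_queue_size(size):
    # Documented: between 0x4 and 0x8000, and a power of 2.
    return 0x4 <= size <= 0x8000 and (size & (size - 1)) == 0


def is_valid_sf_num(sf_num):
    # Documented: must be between 2000 and 2999.
    return 2000 <= sf_num <= 2999


def num_queues_for_pairs(pairs, ctrl_vqs=1):
    # Each pair has one TX and one RX queue.
    # Assumption: one control VQ is counted on top of the data queues.
    return 2 * pairs + ctrl_vqs


if __name__ == "__main__":
    print(is_valid_queue_size(1024))   # depth used in the example above
    print(is_valid_sf_num(2000))
    print(num_queues_for_pairs(1))
```

Under this assumption, 127 queue pairs correspond to 255 queues, which agrees with the max_queue_pairs and num_queues values shown in the virtnet list example in this guide.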

Unplug

This command unplugs a virtio-net PCIe PF device.

Syntax

virtnet unplug [-h] [-p PF | -u VUID] [-w HP_HOST_AWARENESS] [-T HOTPLUG_POWER_OFF_TIMEOUT]

Only one of --pf and --vuid is needed to unplug the device.

Option

Abbr

Argument Type

Required

Description

--help

-h

N/A

No

Show the help message and exit

--pf

-p

Number

Yes

Unique device ID returned when doing hotplug. Can be retrieved by using virtnet list.

--vuid

-u

String

Yes

Unique device SN returned when doing hotplug. Can be retrieved by using virtnet list.

--hp_host_awareness

-w

Number

No

This setting determines how the device interacts with the host during hot unplug operations. The following modes are available:

Value

Mode

Description

0

Device default (Default)

The device operates in its default mode. For virtio-net, the default is host-aware.

1

Host-aware

The device monitors the host’s state and proceeds only after the host has completed initialization (e.g., after the PCIe bus has been fully scanned) and is ready for new PCIe connections

2

Host-unaware

The hot unplug operation can proceed regardless of the host’s state. This means the operation may complete even if the host has not finished initialization or cannot respond to events. 

Using this mode may result in undefined behavior on systems with older BIOS or OS versions. It is not recommended for such environments.

For optimal stability and compatibility, it is recommended to use either the device default mode (0) or host-aware mode (1).

Always ensure that your system's BIOS and OS are up to date to avoid compatibility issues with hot unplug features.

The virtnet list CLI command indicates whether the controller supports hot unplug host awareness:

#virtnet list

{
        "controller":   {
                …
                "emulation_manager":    "mlx5_0",
                "max_hotplug_devices":  "15",
                "hp_host_awareness_supported":  "1",        // <===== new indication
                "max_virt_net_devices": "15",
                "max_virt_queues":      "256",
                ...

--hotplug_poweroff_timeout

-T

Number

No

Specifies the duration (in seconds) the controller waits for the host OS to power off during an unplug operation before executing a forced unplug.

  • Valid range: 1 to 900 seconds.

  • Default value: 10 seconds.

  • Important constraint: This timeout parameter is not applicable and will be ignored if the device is operating in Host-Aware Attention Button (AB) mode.

Output

Entry

Type

Description

errno

Num

Error number if operation failed

  • 0 – success

  • non-0 – failed

errstr

String

Explanation of the error number

Example

Unplug a hotplug device using the PF ID:

# virtnet unplug -p 0
{'id': '0x1'}
{
  "errno": 0,
  "errstr": "Success"
}

List

This command lists all existing virtio-net devices, with global information and individual information for each device.

Syntax

virtnet list [-h]

Option

Abbr

Argument Type

Required

Description

--help

-h

N/A

No

Show the help message and exit

Output

The output has two main sections. The first section, wrapped by controller, contains global configuration and capabilities.

Entry

Type

Description

controller

String

Entries under this section are global information for the controller

emulation_manager

String

The RDMA device manager used to manage internal resources. Should be mlx5_0 by default.

max_hotplug_devices

String

Maximum number of devices that can be hotplugged

max_virt_net_devices

String

Total number of emulated devices managed by the device emulation manager

max_virt_queues

String

Maximum number of virt queues supported per device

max_tunnel_descriptors

String

Maximum number of descriptors the device can send in a single tunnel request

supported_features

String

Total list of features supported by device

supported_virt_queue_types

String

Currently supported virt queue types: Packed and Split

supported_event_modes

String

Currently supported event modes: no_msix_mode, qp_mode, msix_mode

hp_host_awareness_supported

String

Indicates whether hot plug host awareness is supported

Each device has its own section under devices.

Entry

Type

Description

devices

String

Entries under this section are per-device information

pf_id

Number

Physical function ID

function_type

String

Function type: Static PF, hotplug PF, VF

transitional

Number

Indicates whether the current device is a transitional hotplug device:

  • 0 – modern device

  • 1 – transitional device

vuid

String

Unique device SN, it can be used as an index to query/modify/unplug a device

pci_bdf

String

Bus:device:function to describe the virtio-net PCIe device

pci_vhca_id

Number

Virtual HCA identifier for the general virtio-net device. For debug purposes only.

pci_max_vfs

Number

Maximum number of virtio-net VFs that can be created for this PF. Valid only for PFs.

enabled_vfs

Number

Currently enabled number of virtio-net VFs for this PF

msix_num_pool_size

Number

Number of free dynamic MSIX available for the VFs on this PF

min_msix_num

Number

The minimum number of dynamic MSI-Xs that can be set for a virtio-net VF

max_msix_num

Number

The maximum number of dynamic MSI-Xs that can be set for a virtio-net VF

min_num_of_qp

Number

The minimum number of dynamic data VQ pairs (i.e., each pair has one TX and one RX queue) that can be set for a virtio-net VF

max_num_of_qp

Number

The maximum number of dynamic data VQ pairs (i.e., each pair has one TX and one RX queue) that can be set for a virtio-net VF

qp_pool_size

Number

Number of free dynamic data VQ pairs (i.e., each pair has one TX and 1 RX queue) available for the VFs on this PF

num_msix

Number

Maximum number of MSI-X available for this device

num_queues

Number

Maximum number of virtual queues that can be created for this device. The driver can choose to create fewer.

enabled_queues

Number

Currently enabled number of virtual queues by the driver

max_queue_size

Number

Maximum virtual queue depth, in bytes, that can be created for each VQ. The driver can use less.

msix_config_vector

String

MSIX vector number used by the driver for the virtio config space. 0xFFFF means that no vector is requested.

mac

String

Permanent MAC address of the virtio-net device. It can only be changed from the controller side using the modify command.

link_status

Number

Link status of the virtio-net device on the driver side

  • 0 – down

  • 1 – up

max_queue_pairs

Number

Number of data VQ pairs. One VQ pair has one TX queue and one RX queue. Control or admin VQ are not counted. From the host side, it appears as Pre-set maximums->Combined in ethtool -l <virtio-dev>.

mtu

Number

The virtio-net device MTU. Default is 1500.

speed

Number

The virtio-net device link speed in Mb/s

rss_max_key_size

Number

The maximum supported length of the RSS key. Only applicable when VIRTIO_NET_F_RSS or VIRTIO_NET_F_HASH_REPORT is enabled.

supported_hash_types

Number

Supported hash types for this device in hex. Only applicable when VIRTIO_NET_F_HASH_REPORT is enabled:

  • VIRTIO_NET_HASH_TYPE_IPv4 (bit 0)

  • VIRTIO_NET_HASH_TYPE_TCPv4 (bit 1)

  • VIRTIO_NET_HASH_TYPE_UDPv4 (bit 2)

  • VIRTIO_NET_HASH_TYPE_IPv6 (bit 3)

  • VIRTIO_NET_HASH_TYPE_TCPv6 (bit 4)

  • VIRTIO_NET_HASH_TYPE_UDPv6 (bit 5)

ctrl_mac

String

Admin MAC address configured by the driver. Not persistent across driver reload or host reboot.

ctrl_mq

Number

Number of queue pairs/channels configured by the driver. From the host side, it appears as Current hardware settings->Combined in ethtool -l <virtio-dev>.

sf_num

Number

Scalable function number used for this virtio-net device

sf_parent_device

String

The RDMA device to use to create the SF

sf_parent_device_pci_addr

String

The PCIe device address (bus:device:function) to use to create the SF

sf_rep_net_device

String

Represents the virtio-net device

sf_rep_net_ifindex

Number

The SF representor network interface index

sf_rdma_device

String

The SF RDMA device interface name

sf_cross_mkey

Number

The cross-device MKEY created for the SF. For debug purposes only.

sf_vhca_id

Number

Virtual HCA identifier for the SF. For debug purposes only.

rqt_num

Number

The RQ table ID used for this virtio-net device. For debug purposes only.

aarfs

String

Whether Accelerated Receive Flow Steering configuration is enabled or disabled

dim

String

Whether dynamic interrupt moderation (DIM) is enabled or disabled

Example

The following is an example of a list with 1 static PF created:

# virtnet list
{
  "controller": {
    "emulation_manager": "mlx5_0",
    "max_hotplug_devices": "0",
    "max_virt_net_devices": "1",
    "max_virt_queues": "256",
    "max_tunnel_descriptors": "6",
    "supported_features": {
      "value": "0x8b00037700ef982f",
      "    0": "VIRTIO_NET_F_CSUM",
      "    1": "VIRTIO_NET_F_GUEST_CSUM",
      "    2": "VIRTIO_NET_F_CTRL_GUEST_OFFLOADS",
      "    3": "VIRTIO_NET_F_MTU",
      "    5": "VIRTIO_NET_F_MAC",
      "   11": "VIRTIO_NET_F_HOST_TSO4",
      "   12": "VIRTIO_NET_F_HOST_TSO6",
      "   15": "VIRTIO_NET_F_MRG_RXBUF",
      "   16": "VIRTIO_NET_F_STATUS",
      "   17": "VIRTIO_NET_F_CTRL_VQ",
      "   18": "VIRTIO_NET_F_CTRL_RX",
      "   19": "VIRTIO_NET_F_CTRL_VLAN",
      "   21": "VIRTIO_NET_F_GUEST_ANNOUNCE",
      "   22": "VIRTIO_NET_F_MQ",
      "   23": "VIRTIO_NET_F_CTRL_MAC_ADDR",
      "   32": "VIRTIO_F_VERSION_1",
      "   33": "VIRTIO_F_IOMMU_PLATFORM",
      "   34": "VIRTIO_F_RING_PACKED",
      "   36": "VIRTIO_F_ORDER_PLATFORM",
      "   37": "VIRTIO_F_SR_IOV",
      "   38": "VIRTIO_F_NOTIFICATION_DATA",
      "   40": "VIRTIO_F_RING_RESET",
      "   41": "VIRTIO_F_ADMIN_VQ",
      "   56": "VIRTIO_NET_F_HOST_USO",
      "   57": "VIRTIO_NET_F_HASH_REPORT",
      "   59": "VIRTIO_NET_F_GUEST_HDRLEN",
      "   63": "VIRTIO_NET_F_SPEED_DUPLEX"
    },
    "supported_virt_queue_types": {
      "value": "0x1",
      "    0": "SPLIT"
    },
    "supported_event_modes": {
      "value": "0x5",
      "    0": "NO_MSIX_MODE",
      "    2": "MSIX_MODE"
    }
  },
  "devices": [
    {
      "pf_id": 0,
      "function_type": "static PF",
      "transitional": 0,
      "vuid": "MT2306XZ00BNVNETS0D0F2",
      "pci_bdf": "e2:00.2",
      "pci_vhca_id": "0x2",
      "pci_max_vfs": "0",
      "enabled_vfs": "0",
      "msix_num_pool_size": 0,
      "min_msix_num": 0,
      "max_msix_num": 256,
      "min_num_of_qp": 0,
      "max_num_of_qp": 127,
      "qp_pool_size": 0,
      "num_msix": "256",
      "num_queues": "255",
      "enabled_queues": "0",
      "max_queue_size": "256",
      "msix_config_vector": "0xFFFF",
      "mac": "16:B0:E0:41:B8:0D",
      "link_status": "1",
      "max_queue_pairs": "127",
      "mtu": "1500",
      "speed": "100000",
      "rss_max_key_size": "0",
      "supported_hash_types": "0x0",
      "ctrl_mac": "00:00:00:00:00:00",
      "ctrl_mq": "0",
      "sf_num": 1000,
      "sf_parent_device": "mlx5_0",
      "sf_parent_device_pci_addr": "0000:03:00.0",
      "sf_rep_net_device": "en3f0pf0sf1000",
      "sf_rep_net_ifindex": 10,
      "sf_rdma_device": "mlx5_3",
      "sf_cross_mkey": "0x12642",
      "sf_vhca_id": "0x124",
      "sf_rqt_num": "0x0",
      "aarfs": "disabled",
      "dim": "disabled"
    }
  ]
}
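Bitmask fields such as supported_features, supported_event_modes, and supported_hash_types can be decoded by listing the bit positions that are set in the hex value. The following Python sketch does this for any hex string and maps supported_hash_types bits to the names listed in the table above. The function names are hypothetical and are not part of the virtnet CLI.

```python
# Bit positions for supported_hash_types, as listed in the table above.
HASH_TYPES = {
    0: "VIRTIO_NET_HASH_TYPE_IPv4",
    1: "VIRTIO_NET_HASH_TYPE_TCPv4",
    2: "VIRTIO_NET_HASH_TYPE_UDPv4",
    3: "VIRTIO_NET_HASH_TYPE_IPv6",
    4: "VIRTIO_NET_HASH_TYPE_TCPv6",
    5: "VIRTIO_NET_HASH_TYPE_UDPv6",
}


def set_bits(hex_value):
    """Return the positions of all set bits in a hex string such as '0x5'."""
    value = int(hex_value, 16)
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


def decode_hash_types(hex_value):
    """Map set bits to hash type names; unknown bits are reported by number."""
    return [HASH_TYPES.get(bit, f"bit {bit}") for bit in set_bits(hex_value)]


if __name__ == "__main__":
    # Values taken from the virtnet list example above.
    print(set_bits("0x8b00037700ef982f"))  # supported_features
    print(set_bits("0x5"))                 # supported_event_modes
    print(decode_hash_types("0x0"))        # supported_hash_types
```

For the supported_features value in the example, the decoded positions are the same bit numbers that the controller prints next to each feature name.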

Query

This command queries detailed information for a given device, including all VQ information if created.

Syntax

virtnet query [-h] {[-a] | [-p PF] [-v VF] | [-u VUID]} [--dbg_stats] [-b] [--latency_stats] [-q QUEUE_ID] [--stats_clear] [rx_drops [status|--drops-only]]

The options --pf, --vf, --vuid, and --all are mutually exclusive (except --pf and --vf which can be used together), but one of them must be applied.

Option

Abbr

Argument Type

Required

Description

--help

-h

N/A

No

Show the help message and exit

--all

-a

N/A

No

Query all the detailed information for all available devices. It can be time consuming if a large number of devices is available.

--pf

-p

Number

No

Unique device ID for the PF. Can be retrieved by using virtnet list.

--vf

-v

Number

No

Unique device ID for the VF. Can be retrieved by using virtnet list.

--vuid

-u

String

No

Unique device SN for the device (PF/VF). Can be retrieved by using virtnet list.

--queue_id

-q

Number

No

Queue index of the device VQs

--brief

-b

N/A

No

Query brief information of the device (does not print VQ information)

--dbg_stats

N/A

N/A

No

Print debug counters and information

This option will be deprecated in the future.

--stats_clear

N/A

N/A

No

Clear all the debug counter stats

This option will be deprecated in the future.

rx_drops

N/A

N/A

No

Query RX drop counters. Shows total drops across all devices with a list of devices that have non-zero drops. Works in both sync and async modes.

When used with -p and/or -v, rx_drops shows the total drop count for that specific device only.

rx_drops status

N/A

N/A

No

Show current async polling mode configuration.

rx_drops --drops-only

N/A

N/A

No

Show devices with drops, including per-RQ breakdown for each device.


Output

Output has two main sections.

  • The first section, wrapped by devices, contains the device-level configuration and capabilities, most of which match the output of the list command. This section covers only the differences between the two.

    Entry

    Type

    Description

    devices

    String

    Entries under this section are per-device information

    pci_dev_id

    String

    Virtio-net PCIe device ID. Default: 0x1041. 

    This option will be deprecated in the future.

    pci_vendor_id

    String

    Virtio-net PCIe vendor ID. Default: 0x1af4.

    This option will be deprecated in the future.

    pci_class_code

    String

    Virtio-net PCIe device class code. Default: 0x20000. 

    This option will be deprecated in the future.

    pci_subsys_id

    String

    Virtio-net PCIe subsystem ID. Default: 0x1041.

    This option will be deprecated in the future.

    pci_subsys_vendor_id

    String

    Virtio-net PCIe subsystem vendor ID. Default: 0x1af4. 

    This option will be deprecated in the future.

    pci_revision_id

    String

    Virtio-net PCIe revision ID. Default: 1.

    This option will be deprecated in the future.

    device_feature

    String

    Enabled device feature bits according to the virtio spec. Refer to section "DOCA Virtio-net Service Guide | Feature Bits".

    driver_feature

    String

    Enabled driver feature bits according to the virtio spec. Valid only when the driver probes the device. Refer to section "DOCA Virtio-net Service Guide | Feature Bits".

    status

    String

    Device status field bit masks according to the virtio spec:

    • ACKNOWLEDGE (bit 0)

    • DRIVER (bit 1)

    • DRIVER_OK (bit 2)

    • FEATURES_OK (bit 3)

    • DEVICE_NEEDS_RESET (bit 6)

    • FAILED (bit 7)

    reset

    Number

    Shows if the current virtio-net device is undergoing reset:

    • 0 – not undergoing reset

    • 1 – undergoing reset

    enabled

    Number

    Shows if the current virtio-net device is enabled:

    • 0 – disabled, likely FLR has occurred

    • 1 – enabled


  • The second section, wrapped by enabled-queues-info, provides per-VQ information:

    Entry

    Type

    Description

    index

    Number

    VQ index starting from 0 to enabled_queues

    size

    Number

    Driver VQ depth (number of descriptors). It is bounded by the device max_queue_size.

    msix_vector

    Number

    The MSI-X vector number used for this VQ

    enable

    Number

    Whether the current VQ is enabled:

    • 0 – disabled

    • 1 – enabled

    notify_offset

    Number

    Driver reads this to calculate the offset from start of notification structure at which this virtqueue is located

    descriptor_address

    Number

    The physical address of the descriptor area

    driver_address

    Number

    The physical address of the driver area

    device_address

    Number

    The physical address of the device area

    received_desc

    Number

    Total number of received descriptors by the device on this VQ

    This option will be deprecated in the future.

    completed_desc

    Number

    Total number of completed descriptors by the device on this VQ

    This option will be deprecated in the future.

    bad_desc_errors

    Number

    Total number of bad descriptors received on this VQ

    This option will be deprecated in the future.

    error_cqes

    Number

    Total number of error CQ entries on this VQ

    This option will be deprecated in the future.

    exceed_max_chain

    Number

    Total number of chained descriptors received that exceed the maximum allowed chain by device

    This option will be deprecated in the future.

    invalid_buffer

    Number

    Total number of times the device tried to read or write buffer that is not registered to the device

    This option will be deprecated in the future.

    batch_number

    Number

    The number of RX descriptors for the last received packet. Relevant for BlueField-3 only.

    This option will be deprecated in the future.

    dma_q_used_number

    Number

    The DMA q index used for this VQ. Relevant for BlueField-3 only.

    This option will be deprecated in the future.

    handler_schd_number

    Number

    Scheduler number for this VQ. Relevant for BlueField-3 only.

    This option will be deprecated in the future.

    aux_handler_schd_number

    Number

    Aux scheduler number for this VQ. Relevant for BlueField-3 only.

    This option will be deprecated in the future.

    max_post_desc_number

    Number

    Maximum number of posted descriptors on this VQ. Relevant for DPA.

    This option will be deprecated in the future.

    total_bytes

    Number

    Total number of bytes handled by this VQ. Relevant for BlueField-3 only.

    This option will be deprecated in the future.

    rq_cq_max_count

    Number

    Event generation moderation counter of the queue. Relevant for RQ.

    This option will be deprecated in the future.

    rq_cq_period

    Number

    Event generation moderation timer for the queue in 1µsec granularity. Relevant for RQ.

    This option will be deprecated in the future.

    rq_cq_period_mode

    Number

    Current period mode for RQ

    • 0x0 – default_mode – use device best defaults

    • 0x1 – upon_event – queue_period timer restarts upon event generation

    • 0x2 – upon_cqe – queue_period timer restarts upon completion generation

    This option will be deprecated in the future.
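
The status entry in the first section is a bitmask, so its hex value can be expanded into the flag names listed in that table. The following is a minimal sketch based only on the bit positions documented above; `decode_status` is a hypothetical helper, not part of the virtnet CLI.

```python
from typing import List

# Bit positions as listed for the "status" entry above.
STATUS_BITS = {
    0: "ACKNOWLEDGE",
    1: "DRIVER",
    2: "DRIVER_OK",
    3: "FEATURES_OK",
    6: "DEVICE_NEEDS_RESET",
    7: "FAILED",
}


def decode_status(value: str) -> List[str]:
    """Expand a hex status string such as "0xf" into flag names."""
    mask = int(value, 16)
    return [name for bit, name in STATUS_BITS.items() if mask & (1 << bit)]


print(decode_status("0xf"))
# ['ACKNOWLEDGE', 'DRIVER', 'DRIVER_OK', 'FEATURES_OK']
```

A value of 0xf therefore corresponds to a device whose driver has completed initialization and feature negotiation.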

Example

The following is an example of querying the information of the first PF:

# virtnet query -p 0
{
  "devices": [
    {
      "pf_id": 0,
      "function_type": "static PF",
      "transitional": 0,
      "vuid": "MT2349X00018VNETS0D0F1",
      "pci_bdf": "23:00.1",
      "pci_vhca_id": "0x1",
      "pci_max_vfs": "0",
      "enabled_vfs": "0",
      "pci_dev_id": "0x1041",
      "pci_vendor_id": "0x1af4",
      "pci_class_code": "0x20000",
      "pci_subsys_id": "0x1041",
      "pci_subsys_vendor_id": "0x1af4",
      "pci_revision_id": "1",
       "device_feature": {
        "value": "0x8930032300ef182f",
        "    0": "VIRTIO_NET_F_CSUM",
        "    1": "VIRTIO_NET_F_GUEST_CSUM",
        "    2": "VIRTIO_NET_F_CTRL_GUEST_OFFLOADS",
        "    3": "VIRTIO_NET_F_MTU",
        "    5": "VIRTIO_NET_F_MAC",
        "   11": "VIRTIO_NET_F_HOST_TSO4",
        "   12": "VIRTIO_NET_F_HOST_TSO6",
        "   16": "VIRTIO_NET_F_STATUS",
        "   17": "VIRTIO_NET_F_CTRL_VQ",
        "   18": "VIRTIO_NET_F_CTRL_RX",
        "   19": "VIRTIO_NET_F_CTRL_VLAN",
        "   21": "VIRTIO_NET_F_GUEST_ANNOUNCE",
        "   22": "VIRTIO_NET_F_MQ",
        "   23": "VIRTIO_NET_F_CTRL_MAC_ADDR",
        "   32": "VIRTIO_F_VERSION_1",
        "   33": "VIRTIO_F_IOMMU_PLATFORM",
        "   37": "VIRTIO_F_SR_IOV",
        "   40": "VIRTIO_F_RING_RESET",
        "   41": "VIRTIO_F_ADMIN_VQ",
        "   52": "VIRTIO_NET_F_VQ_NOTF_COAL",
        "   53": "VIRTIO_NET_F_NOTF_COAL",
        "   56": "VIRTIO_NET_F_HOST_USO",
        "   59": "VIRTIO_NET_F_GUEST_HDRLEN",
        "   63": "VIRTIO_NET_F_SPEED_DUPLEX"
      },
      "driver_feature": {
        "value": "0x8000002300ef182f",
        "    0": "VIRTIO_NET_F_CSUM",
        "    1": "VIRTIO_NET_F_GUEST_CSUM",
        "    2": "VIRTIO_NET_F_CTRL_GUEST_OFFLOADS",
        "    3": "VIRTIO_NET_F_MTU",
        "    5": "VIRTIO_NET_F_MAC",
        "   11": "VIRTIO_NET_F_HOST_TSO4",
        "   12": "VIRTIO_NET_F_HOST_TSO6",
        "   16": "VIRTIO_NET_F_STATUS",
        "   17": "VIRTIO_NET_F_CTRL_VQ",
        "   18": "VIRTIO_NET_F_CTRL_RX",
        "   19": "VIRTIO_NET_F_CTRL_VLAN",
        "   21": "VIRTIO_NET_F_GUEST_ANNOUNCE",
        "   22": "VIRTIO_NET_F_MQ",
        "   23": "VIRTIO_NET_F_CTRL_MAC_ADDR",
        "   32": "VIRTIO_F_VERSION_1",
        "   33": "VIRTIO_F_IOMMU_PLATFORM",
        "   37": "VIRTIO_F_SR_IOV",
        "   63": "VIRTIO_NET_F_SPEED_DUPLEX"
      },
      "status": {
        "value": "0xf",
        "    0": "ACK",
        "    1": "DRIVER",
        "    2": "DRIVER_OK",
        "    3": "FEATURES_OK"
      },
      "reset": "0",
      "enabled": "1",
      "num_msix": "64",
      "num_queues": "63",
      "enabled_queues": "63",
      "max_queue_size": "256",
      "msix_config_vector": "0x0",
      "mac": "4E:6A:E1:41:D8:BE",
      "link_status": "1",
      "max_queue_pairs": "31",
      "mtu": "1500",
      "speed": "200000",
      "rss_max_key_size": "0",
      "supported_hash_types": "0x0",
      "ctrl_mac": "4E:6A:E1:41:D8:BE",
      "ctrl_mq": "31",
      "sf_num": 1000,
      "sf_parent_device": "mlx5_0",
      "sf_parent_device_pci_addr": "0000:03:00.0",
      "sf_rep_net_device": "en3f0pf0sf1000",
      "sf_rep_net_ifindex": 12,
      "sf_rdma_device": "mlx5_2",
      "sf_cross_mkey": "0xC042",
      "sf_vhca_id": "0x7E8",
      "sf_rqt_num": "0x0",
      "aarfs": "disabled",
      "dim": "disabled",
      "enabled-queues-info": [
        {
          "index": "0",
          "size": "256",
          "msix_vector": "0x1",
          "enable": "1",
          "notify_offset": "0",
          "descriptor_address": "0x10cece000",
          "driver_address": "0x10cecf000",
          "device_address": "0x10cecf240",
          "received_desc": "256",
          "completed_desc": "0",
          "bad_desc_errors": "0",
          "error_cqes": "0",
          "exceed_max_chain": "0",
          "invalid_buffer": "0",
          "batch_number": "64",
          "dma_q_used_number": "6",
          "handler_schd_number": "4",
          "aux_handler_schd_number": "3",
          "max_post_desc_number": "0",
          "total_bytes": "0",
          "rq_cq_max_count": "0",
          "rq_cq_period": "0",
          "rq_cq_period_mode": "1"
        },
        ......
        }
      ]
    }
  ]
}
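
The two feature values in the output above can be compared to see which offered features the driver did not accept. This is a sketch under the assumption that the JSON body of the query output has been captured as text; `set_bits` and `unnegotiated_bits` are hypothetical helper names, and the sample is a trimmed copy of the example.

```python
import json

# Trimmed copy of the query example above (only the two feature values).
query_output = """
{
  "devices": [
    {
      "pf_id": 0,
      "device_feature": {"value": "0x8930032300ef182f"},
      "driver_feature": {"value": "0x8000002300ef182f"}
    }
  ]
}
"""


def set_bits(mask):
    """Return the positions of all set bits in a 64-bit mask."""
    return [bit for bit in range(64) if mask & (1 << bit)]


def unnegotiated_bits(device):
    """Bits offered by the device but not accepted by the driver."""
    offered = int(device["device_feature"]["value"], 16)
    accepted = int(device["driver_feature"]["value"], 16)
    return set_bits(offered & ~accepted)


device = json.loads(query_output)["devices"][0]
print(unnegotiated_bits(device))  # [40, 41, 52, 53, 56, 59]
```

The resulting bit numbers match the entries that appear under device_feature but not under driver_feature in the example (for instance, bit 40 is VIRTIO_F_RING_RESET).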

Stats

This command is recommended for obtaining all packet counter information. Packet counter information is currently also available through the virtnet list and virtnet query commands, but it will be deprecated there in the future.

This command retrieves the packet counters for a specified device, including detailed information for all Rx and Tx virtqueues (VQs).

To enable/disable byte-wise packet counters for each Rx queue, use the following command:

virtnet modify {[-p PF] [-v VF]} device -pkt_cnt {enable,disable}

  • When enabled, byte-wise packet counters are initialized to zero.

  • When disabled, the previous values are retained for debugging purposes. The command will still return these old, disabled counter values.

Packet counters are attached to an RQ. Thus, RQ must be created first. This means that the virtio-net device should be probed by the driver on the host OS before running the commands above.

Syntax

virtnet stats [-h] {[-p PF] [-v VF] | [-u VUID]} [-q QUEUE_ID]

The options --pf, --vf, and --vuid are mutually exclusive (except --pf and --vf, which can be used together), and one of them must be specified.

Option

Abbr

Argument Type

Required

Description

--help

-h

N/A

No

Show the help message and exit

--pf

-p

Number

No

Unique device ID for the PF. Can be retrieved by using virtnet list.

--vf

-v

Number

No

Unique device ID for the VF. Can be retrieved by using virtnet list.

--vuid

-u

String

No

Unique device SN for the device (PF/VF). Can be retrieved by using virtnet list.

--queue_id

-q

Number

No

Queue index of the device RQs or SQs

Output

The output has two sections.

  • The first section, wrapped by device, contains device details along with the enable state of the packet counter statistics.

    Entry

    Type

    Description

    device

    String

    Entries under this section are per-device information

    pf_id

    Number

    Physical function ID

    packet_counters

    String

    Indicates whether the packet counters feature is enabled or disabled

  • The second section, wrapped by queues-stats, contains information for each receive VQ.

    Entry

    Type

    Description

    VQ Index

    Number

    The VQ index starts at 0 (the first RQ) and continues up to the last SQ

    rx_64_or_less_octet_packets

    Number

    The number of packets received with a size of 0 to 64 bytes. Relevant for BlueField-3 RQ.

    rx_65_to_127_octet_packets

    Number

    The number of packets received with a size of 65 to 127 bytes. Relevant for BlueField-3 RQ.

    rx_128_to_255_octet_packets

    Number

    The number of packets received with a size of 128 to 255 bytes. Relevant for BlueField-3 RQ.

    rx_256_to_511_octet_packets

    Number

    The number of packets received with a size of 256 to 511 bytes. Relevant for BlueField-3 RQ.

    rx_512_to_1023_octet_packets

    Number

    The number of packets received with a size of 512 to 1023 bytes. Relevant for BlueField-3 RQ.

    rx_1024_to_1522_octet_packets

    Number

    The number of packets received with a size of 1024 to 1522 bytes. Relevant for BlueField-3 RQ.

    rx_1523_to_2047_octet_packets

    Number

    The number of packets received with a size of 1523 to 2047 bytes. Relevant for BlueField-3 RQ.

    rx_2048_to_4095_octet_packets

    Number

    The number of packets received with a size of 2048 to 4095 bytes. Relevant for BlueField-3 RQ.

    rx_4096_to_8191_octet_packets

    Number

    The number of packets received with a size of 4096 to 8191 bytes. Relevant for BlueField-3 RQ.

    rx_8192_to_9022_octet_packets

    Number

    The number of packets received with a size of 8192 to 9022 bytes. Relevant for BlueField-3 RQ.

    received_desc

    Number

    Total number of received descriptors by the device on this VQ

    completed_desc

    Number

    Total number of completed descriptors by the device on this VQ

    bad_desc_errors

    Number

    Total number of bad descriptors received on this VQ

    error_cqes

    Number

    Total number of error CQ entries on this VQ

    exceed_max_chain

    Number

    Total number of chained descriptors received that exceed the max allowed chain by device

    invalid_buffer

    Number

    Total number of times the device tried to read or write a buffer which is not registered to the device

    batch_number

    Number

    The number of RX descriptors for the last received packet. Relevant for BlueField-3.

    dma_q_used_number

    Number

    The DMA q index used for this VQ. Relevant for BlueField-3.

    handler_schd_number

    Number

    Scheduler number for this VQ. Relevant for BlueField-3.

    aux_handler_schd_number

    Number

    Aux scheduler number for this VQ. Relevant for BlueField-3.

    max_post_desc_number

    Number

    Maximum number of posted descriptors on this VQ. Relevant for DPA.

    total_bytes

    Number

    Total number of bytes handled by this VQ. Relevant for BlueField-3.

    rq_cq_max_count

    Number

    Event generation moderation counter of the queue. Relevant for RQ.

    rq_cq_period

    Number

    Event generation moderation timer for the queue in 1µsec granularity. Relevant for RQ.

    rq_cq_period_mode

    Number

    Current period mode for RQ

    • 0x0 – default_mode – use device best defaults

    • 0x1 – upon_event – queue_period timer restarts upon event generation

    • 0x2 – upon_cqe – queue_period timer restarts upon completion generation

Example

The following is an example of querying the packet statistics information of PF 0 and VQ 0 (i.e., RQ):

# virtnet stats -p 0 -q 0
{'pf': '0x0', 'queue_id': '0x0'}
{
  "device": {
    "pf_id": 0,
    "packet_counters": "Enabled",
    "queues-stats": [
      {
        "VQ Index": 0,
        "rx_64_or_less_octet_packets": 0,
        "rx_65_to_127_octet_packets": 259,
        "rx_128_to_255_octet_packets": 0,
        "rx_256_to_511_octet_packets": 0,
        "rx_512_to_1023_octet_packets": 0,
        "rx_1024_to_1522_octet_packets": 0,
        "rx_1523_to_2047_octet_packets": 0,
        "rx_2048_to_4095_octet_packets": 199,
        "rx_4096_to_8191_octet_packets": 0,
        "rx_8192_to_9022_octet_packets": 0,
        "received_desc": "4096",
        "completed_desc": "0",
        "bad_desc_errors": "0",
        "error_cqes": "0",
        "exceed_max_chain": "0",
        "invalid_buffer": "0",
        "batch_number": "64",
        "dma_q_used_number": "0",
        "handler_schd_number": "44",
        "aux_handler_schd_number": "43",
        "max_post_desc_number": "0",
        "total_bytes": "0",
        "err_handler_schd_num": "0",
        "rq_cq_max_count": "0",
        "rq_cq_period": "0",
        "rq_cq_period_mode": "1"
      }
    ]
  }
}
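
The size-bucket counters in this output can be summed to get a total receive packet count per VQ. The following is a rough sketch that assumes the JSON body has been captured without the leading echo line ({'pf': ...}); `rx_packet_total` is a hypothetical helper, and the sample is a trimmed copy of the example above.

```python
import json

# Trimmed copy of the stats example above.
stats_output = """
{
  "device": {
    "pf_id": 0,
    "packet_counters": "Enabled",
    "queues-stats": [
      {
        "VQ Index": 0,
        "rx_64_or_less_octet_packets": 0,
        "rx_65_to_127_octet_packets": 259,
        "rx_2048_to_4095_octet_packets": 199,
        "received_desc": "4096"
      }
    ]
  }
}
"""


def rx_packet_total(queue):
    """Sum every rx_*_octet_packets bucket of one queues-stats entry."""
    return sum(
        count
        for name, count in queue.items()
        if name.startswith("rx_") and name.endswith("_octet_packets")
    )


queues = json.loads(stats_output)["device"]["queues-stats"]
totals = {queue["VQ Index"]: rx_packet_total(queue) for queue in queues}
print(totals)  # {0: 458}
```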

Modify Device

This command modifies the attributes of a given device.

When dynamic MSI-X mode is enabled, the user should provision the VF from the DPU side before attaching a VF to the VM.

When dynamic MSI-X mode is disabled, the default number of MSI-X vectors is determined by the VIRTIO_NET_EMULATION_NUM_VF_MSIX value.

Syntax

The modify command supports three subcommands: device, queue, and global.

virtnet modify [-h] [-p PF] [-v VF] [-u VUID] [-a] {device,queue,global} ...

The options --pf, --vf, --vuid, and --all are mutually exclusive (except --pf and --vf, which can be used together), and one of them must be specified.

Option

Abbr

Argument Type

Required

Description

--help

-h

N/A

No

Show the help message and exit

--all

-a

N/A

No

Modify all available device attributes depending on the selection of device or queue

--pf

-p

Number

No

Unique device ID for the PF. May be retrieved using virtnet list.

--vf

-v

Number

No

Unique device ID for the VF. May be retrieved using virtnet list.

--vuid

-u

String

No

Unique device SN for the device (PF/VF). May be retrieved by using virtnet list.

device

N/A

N/A

No

Modify device specific options

queue

N/A

N/A

No

Modify queue specific options

global

N/A

N/A

No

Modify global controller settings

Device Options
virtnet modify device [-h] [-m MAC] [-t MTU] [-e SPEED] [-l LINK]
                           [-s STATE] [-f FEATURES]
                           [-o SUPPORTED_HASH_TYPES] [-k RSS_MAX_KEY_SIZE]
                           [-r RX_MODE] [-n MSIX_NUM] [-q MAX_QUEUE_SIZE]
                           [-b RX_DMA_Q_NUM] [-dc {enable,disable}]
                           [-pkt_cnt {enable,disable}] [-aarfs {enable,disable}]
                           [-qp MAX_QUEUE_PAIRS] [-dim {enable,disable}]

Option

Abbr

Argument Type

Required

Description

--help

-h

N/A

No

Show the help message and exit

--mac

-m

String

No

The virtio-net device MAC address

--mtu

-t

Number

No

The virtio-net device MTU

--speed

-e

Number

No

The virtio-net device link speed in Mb/s

--link

-l

Number

No

The virtio-net device link status

  • 0 – down

  • 1 – up

--state

-s

Number

No

The virtio-net device status field bit masks according to the virtio spec:

  • ACKNOWLEDGE (bit 0)

  • DRIVER (bit 1)

  • DRIVER_OK (bit 2)

  • FEATURES_OK (bit 3)

  • DEVICE_NEEDS_RESET (bit 6)

  • FAILED (bit 7)

--features

-f

Hex Number / Feature Name / Pattern

No

Configures the virtio-net device feature bits in accordance with the virtio specification. Administrators can explicitly set the base feature mask, or dynamically enable and disable specific features using hexadecimal bitmasks or predefined feature names.

Supported syntax and patterns:

  • Enable features (+): Prefix a bitmask or feature name with a plus sign to enable it. Examples: +0x8000, +VIRTIO_NET_F_MRG_RXBUF, or +MRG_RXBUF

  • Disable features (-): Prefix a bitmask or feature name with a minus sign to disable it. Examples: -0x400000000, -VIRTIO_NET_F_MRG_RXBUF, or -MRG_RXBUF

  • Set exact features: Provide a raw bitmask to explicitly overwrite and set the feature bits. Example: 0x8100000300e7182f

  • Combined operations: Set a base mask and sequentially append + or - operations to modify specific bits in a single command. Format: <bitmask>?([+-]<bitmask|name>)* Example: 0x8100000300e7182f+0x400000000-MRG_RXBUF

--supported_hash_types

-o

Hex Number

No

Supported hash types for this device in hex. Only applicable when VIRTIO_NET_F_HASH_REPORT is enabled.

  • VIRTIO_NET_HASH_TYPE_IPv4 (bit 0)

  • VIRTIO_NET_HASH_TYPE_TCPv4 (bit 1)

  • VIRTIO_NET_HASH_TYPE_UDPv4 (bit 2)

  • VIRTIO_NET_HASH_TYPE_IPv6 (bit 3)

  • VIRTIO_NET_HASH_TYPE_TCPv6 (bit 4)

  • VIRTIO_NET_HASH_TYPE_UDPv6 (bit 5)

--rss_max_key_size

-k

Number

No

The maximum supported length of RSS key. Only applicable when VIRTIO_NET_F_RSS or VIRTIO_NET_F_HASH_REPORT is enabled.

--rx_mode

-r

Hex Number

No

The RX mode exposed to the driver:

  • 0 – promisc

  • 1 – all-multi

  • 2 – all-uni

  • 3 – no-multi

  • 4 – no-uni

  • 5 – no-broadcast

--msix_num

-n

Number

No

Maximum number of VQs (both data and ctrl/admin VQ). It is bound by the cap of max_virt_queues at the controller level (virtnet list).

--max_queue_size

-q

Number

No

Maximum number of buffers in the VQ. The queue size value is always a power of 2. The maximum queue size value is 32768.

--max_queue_pairs

-qp

Number

No

Number of data VQ pairs. One VQ pair has one TX queue and one RX queue. Control or admin VQs are not counted. From the host side, it appears as Pre-set maximums->Combined in ethtool -l <virtio-dev>.

--rx_dma_q_num

-b

Number

No

Modify max RX DMA queue number

--drop_counter

-dc

String

No

Enable/disable virtio-net drop counter

--packet_counter

-pkt_cnt

String

No

Enable/disable virtio-net device packet counter stats 

--aarfs_config

-aarfs

String

No

Enable/disable auto-AARFS. Only applicable for PF devices (static PF and hotplug PF).

--dim_config

-dim

String

No

Enable/disable dynamic interrupt moderation (DIM)

The following modify options require unbinding the virtio device from virtio-net driver in the guest OS:

  • mac

  • mtu

  • features

  • msix_num

  • max_queue_size

  • max_queue_pairs

For example:

  1. On the guest OS: 

    [host]# echo "bdf of virtio-dev" > /sys/bus/pci/drivers/virtio-pci/unbind
    
    
  2. On the DPU side:

    1. Modify the max queue size of device:

      [dpu]# virtnet modify -p 0 -v 0 device -q 2048
      
      
    2. Modify the MSI-X number of VF device:

      [dpu]# virtnet modify -p 0 -v 0 device -n 8
      
      
    3. Modify the MAC address of virtio physical device ID 0 (or with its "VUID string", which can be obtained through virtnet list/query):

      [dpu]# virtnet modify -p 0 device -m 0C:C4:7A:FF:22:93
      
      
    4. Modify the maximum number of queue pairs of VF device:

      [dpu]# virtnet modify -p 0 -v 0 device -qp 2
      
      
  3. On the guest OS: 

    [host]# echo "bdf of virtio-dev" > /sys/bus/pci/drivers/virtio-pci/bind
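
Before running a -q modification like the one in step 2, the documented constraints on --max_queue_size (a power of 2, at most 32768) can be checked up front. This is a minimal sketch of those two constraints only; `is_valid_max_queue_size` is a hypothetical helper, and any minimum enforced by the controller is not covered because it is not stated here.

```python
MAX_QUEUE_SIZE_LIMIT = 32768  # upper bound stated for --max_queue_size


def is_valid_max_queue_size(size: int) -> bool:
    """True if size is a positive power of two no larger than the limit."""
    is_power_of_two = size > 0 and (size & (size - 1)) == 0
    return is_power_of_two and size <= MAX_QUEUE_SIZE_LIMIT


print(is_valid_max_queue_size(2048))   # True
print(is_valid_max_queue_size(3000))   # False (not a power of 2)
print(is_valid_max_queue_size(65536))  # False (above 32768)
```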
    
    
Enabling and Disabling Virtio-net Features

Configuring virtio-net features involves verifying what the DPU supports, modifying the device configuration, and confirming that the host driver successfully negotiated the new features.

Verify Supported Features

Check the full list of features supported by the underlying controller using the virtnet list command. 

virtnet list

Example JSON output:

"supported_features": {
    "value":        "0x8b00037700ef982f",
    "    0":        "VIRTIO_NET_F_CSUM",
    "    1":        "VIRTIO_NET_F_GUEST_CSUM",
    "    2":        "VIRTIO_NET_F_CTRL_GUEST_OFFLOADS",
    "    3":        "VIRTIO_NET_F_MTU",
    "    5":        "VIRTIO_NET_F_MAC",
    "   11":        "VIRTIO_NET_F_HOST_TSO4",
    "   12":        "VIRTIO_NET_F_HOST_TSO6",
    "   15":        "VIRTIO_NET_F_MRG_RXBUF",
    "   34":        "VIRTIO_F_RING_PACKED",
    "   63":        "VIRTIO_NET_F_SPEED_DUPLEX"
}

Important Constraints
  • Feature names and bit numbers strictly follow the virtio-net specification.

  • The specific list of supported features will vary depending on the device type, driver capabilities, and current application version.

Check Currently Enabled Device Features

Verify which features are currently enabled on your target device (e.g., Physical Function 0) by inspecting the device_feature section of the query output. 

virtnet query -p 0 -b

Example JSON output:

"pci_bdf": "0f:00.2",
"device_feature": {
    "value":        "0x8100032300ef982f",
    "    0":        "VIRTIO_NET_F_CSUM",
    "   32":        "VIRTIO_F_VERSION_1",
    "   33":        "VIRTIO_F_IOMMU_PLATFORM",
    "   63":        "VIRTIO_NET_F_SPEED_DUPLEX"
}

Modify Device Features
Driver Unbinding Required

You cannot modify device features while the device driver is actively bound. You must unbind the device driver before executing any virtnet modify device commands. Refer to the "Modify Device" section for exact unbinding steps.

Once unbound, use the -f flag to enable, disable, or explicitly set the feature bitmask.

  • To enable a feature (+): 

    # Syntax: virtnet modify -p <pf> device -f +<feature_name>
    virtnet modify -p 0 device -f +VIRTIO_F_RING_PACKED
    
    
  • To disable a feature (-): 

    # Syntax: virtnet modify -p <pf> device -f -<feature_name>
    virtnet modify -p 0 device -f -VIRTIO_F_RING_PACKED
    

  • To set an explicit feature vector (bitmask) – You can overwrite the entire feature vector using a hex mask. For example, to add VIRTIO_F_RING_PACKED (bit 34) to a base vector of 0x8100032300EF182F, you calculate the logical OR (0x8100032300EF182F | 0x400000000 = 0x8100032700EF182F) and apply it:

    # Syntax: virtnet modify -p <pf> device -f <features_bitmask>
    virtnet modify -p 0 device -f 0x8100032700EF182F
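
To preview what a -f argument would do to a feature vector, the documented pattern <bitmask>?([+-]<bitmask|name>)* can be evaluated offline. The sketch below is one possible interpretation, not the controller's parser: it assumes left-to-right evaluation, knows only the two feature names whose bit numbers appear in this section, and accepts short names by dropping the prefix up to "_F_". `apply_pattern` and `to_mask` are hypothetical names.

```python
import re

# Only the bit numbers stated in this section; extend as needed.
FEATURE_BITS = {
    "VIRTIO_NET_F_MRG_RXBUF": 15,
    "VIRTIO_F_RING_PACKED": 34,
}

TOKEN = re.compile(r"([+-]?)(0[xX][0-9a-fA-F]+|[A-Za-z_][A-Za-z0-9_]*)")


def to_mask(operand):
    """Convert a hex bitmask or a known feature name to an integer mask."""
    if operand.lower().startswith("0x"):
        return int(operand, 16)
    for full_name, bit in FEATURE_BITS.items():
        if operand == full_name or full_name.endswith("_F_" + operand):
            return 1 << bit
    raise ValueError(f"unknown feature name: {operand}")


def apply_pattern(current, pattern):
    """Apply a -f style pattern to the current feature vector."""
    result, pos = current, 0
    while pos < len(pattern):
        match = TOKEN.match(pattern, pos)
        if match is None:
            raise ValueError(f"cannot parse pattern at offset {pos}")
        sign, operand = match.groups()
        if sign == "+":
            result |= to_mask(operand)        # enable bits
        elif sign == "-":
            result &= ~to_mask(operand)       # disable bits
        elif pos == 0 and operand.lower().startswith("0x"):
            result = to_mask(operand)         # leading raw mask overwrites
        else:
            raise ValueError(f"unexpected operand: {operand}")
        pos = match.end()
    return result


print(hex(apply_pattern(0x8100032300EF182F, "+VIRTIO_F_RING_PACKED")))
# 0x8100032700ef182f
```

The printed value matches the logical OR worked out by hand in the last bullet above.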
    

Verify Driver Negotiation

After modifying the features and rebinding the driver, you must verify that the host operating system successfully negotiated the new features.

A feature is only fully active if it appears in both the device_feature and driver_feature lists. 

virtnet query -p 0 -b

Example output confirming VIRTIO_F_RING_PACKED (bit 34) was successfully negotiated:

"pci_bdf": "0f:00.2",

"device_feature": {
    "   34":        "VIRTIO_F_RING_PACKED"
},
"driver_feature": {
    "   34":        "VIRTIO_F_RING_PACKED"
}
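
The "appears in both lists" rule can be checked programmatically against the query output. This is a small sketch that assumes the device entry has already been parsed from JSON; `is_negotiated` is a hypothetical helper, and the sample data is a trimmed copy of the fragment above.

```python
import json

# Trimmed copy of the example fragment above, wrapped as one JSON object.
query_fragment = """
{
  "pci_bdf": "0f:00.2",
  "device_feature": {"   34": "VIRTIO_F_RING_PACKED"},
  "driver_feature": {"   34": "VIRTIO_F_RING_PACKED"}
}
"""


def is_negotiated(device, feature_name):
    """True only if the feature is listed by both device and driver."""
    offered = feature_name in device.get("device_feature", {}).values()
    accepted = feature_name in device.get("driver_feature", {}).values()
    return offered and accepted


device = json.loads(query_fragment)
print(is_negotiated(device, "VIRTIO_F_RING_PACKED"))  # True
```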

Queue Options
virtnet modify queue [-h] -e {event,cqe} -n PERIOD -c MAX_COUNT

Option

Abbr

Argument Type

Required

Description

--help

-h

N/A

No

Show the help message and exit

--period_mode

-e

String

No

RQ period mode: event or cqe. Default is selected by device for the best result.

--period

-n

Number

No

The event generation moderation timer for the queue in 1µsec granularity

--max_count

-c

Number

No

The max event generation moderation counter of the queue

Global Options
virtnet modify global dc -mode <async|sync> --poll_freq <seconds>

Option

Abbr

Argument Type

Required

Description

--help

-h

N/A

No

Show the help message and exit

--mode=<async|sync>

N/A

String

Yes

Set the drop counter query mode. async enables background polling with cached queries. sync (default) uses real-time hardware queries.

--poll_freq

N/A

Number

No

Polling interval in seconds when using mode=async. Valid range: 1–600. Default: 5.

Global output:

Entry

Type

Description

dc_mode

String

Current drop counter query mode: async or sync

poll_freq_sec

Number

Polling interval in seconds (only shown when dc mode is async)

Global examples:

  • Enable async mode with default polling interval:

    [dpu]# virtnet modify global dc -mode=async
    {
        "dc_mode": "async",
        "poll_freq_sec": 5
    }
    

  • Enable async mode with custom polling interval:

    [dpu]# virtnet modify global dc -mode=async -poll_freq 10
    {
        "dc_mode": "async",
        "poll_freq_sec": 10
    }
    

  • Disable async mode (return to sync):

    [dpu]# virtnet modify global dc -mode=sync
    {
        "dc_mode": "sync"
    }
    
    

    The async mode setting is persisted across controller restarts. No manual action is needed to restore it after a restart.

Output

Entry

Type

Description

errno

Number

Error number:

  • 0 – success

  • Non-0 – failed

errstr

String

Explanation of the error number

Example

To set the link status of the first PF to down:

# virtnet modify -p 0 device -l 0
{'pf': '0x0', 'all': '0x0', 'subcmd': '0x0', 'link': '0x0'}
{
  "errno": 0,
  "errstr": "Success"
}

Log

This command manages the log level of virtio-net-controller.

Syntax

virtnet log [-h] -l {info,err,debug}

Option

Abbr

Argument Type

Required

Description

--help

-h

N/A

No

Show the help message and exit

--level

-l

String

Yes

Change the log level of virtio_net_controller from the journal. Default is DEBUG.

Output

Entry

Type

Description

Stdout

String

Success or failed with message

Example

To change the log level to info:

# virtnet log -l info
{'level': 'info'}
"Success"

To monitor current log output of the controller service with the latest 100 lines printed out:

$ journalctl -u virtio-net-controller -f -n 100

Validate

This command validates configurations of virtio-net-controller.

Syntax

virtnet validate [-h] -f PATH_TO_FILE

Option

Abbr

Argument Type

Required

Description

--help

-h

N/A

No

Show the help message and exit

--file

-f

String

Yes

Validate the JSON format of the virtnet.conf file of the virtio_net_controller

Output

Entry

Type

Description

Stdout

String

Success or failed with message

Example

To check if virtnet.conf is a valid JSON file:

# virtnet validate -f /opt/mellanox/mlnx_virtnet/virtnet.conf
/opt/mellanox/mlnx_virtnet/virtnet.conf is valid
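
A comparable well-formedness check can be run on a configuration draft before copying it to the BlueField. The sketch below only tests whether text parses as JSON using Python's standard json module; it is not the controller's validation logic, `check_json_text` is a hypothetical helper, and the sample key is a placeholder rather than a real virtnet.conf option.

```python
import json


def check_json_text(text):
    """Return (ok, message) describing whether text is well-formed JSON."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return False, f"line {err.lineno}, column {err.colno}: {err.msg}"
    return True, "valid"


# "example_key" is a placeholder, not a documented configuration option.
print(check_json_text('{"example_key": 1}'))
print(check_json_text('{"example_key": 1,}'))  # trailing comma is rejected
```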

Version

This command prints the current version of virtio-net-controller and the version it would be updated to.

Syntax

virtnet version [-h]

Option

Abbr

Argument Type

Required

Description

--help

-h

N/A

No

Show the help message and exit

Output

Entry

Type

Description

Original Controller

String

The original controller version

Destination Controller

String

The controller version to update to

Example

Check current and next available controller version:

# virtnet version
[
  {
    "Original Controller": "v24.10.17"
  },
  {
    "Destination Controller": "v24.10.19"
  }
]
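
A script can use this output to decide whether a live update is worth starting. The following sketch assumes version strings always follow the vMAJOR.MINOR.PATCH shape seen in the example, which is an assumption rather than a documented guarantee; `parse_version` and `update_available` are hypothetical helpers.

```python
import json

# Copy of the version example above.
version_output = """
[
  {"Original Controller": "v24.10.17"},
  {"Destination Controller": "v24.10.19"}
]
"""


def parse_version(text):
    """Turn 'v24.10.17' into the comparable tuple (24, 10, 17)."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


def update_available(output):
    """True if the destination version is newer than the original one."""
    merged = {}
    for entry in json.loads(output):
        merged.update(entry)
    original = parse_version(merged["Original Controller"])
    destination = parse_version(merged["Destination Controller"])
    return destination > original


print(update_available(version_output))  # True
```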

Update

This command performs a live update to another version installed on the OS. Instead of a complete shutdown and recreating all existing devices, this procedure updates to the new version with minimal down time.

Syntax

virtnet update [-h] [-s | -t]

Option

Abbr

Argument Type

Required

Description

--help

-h

N/A

No

Show the help message and exit

--start

-s

N/A

No

Start a live update of virtio-net-controller

--status

-t

N/A

No

Check live update status

Output

Entry

Type

Description

stdout

String

Whether the update started successfully

Example

To start the live update process, run:

# virtnet update -s
{'start': '0x1'}
"Update started, use 'virtnet update -t' or check logs for status"

To check the update status during the update process:

# virtnet update -t
{'status': '0x1'}
{
  "status": "inactive",
  "last live update status": "success",
  "time_used (s)": 0.604152
}
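
When automating a live update, the status output can be interpreted as shown in this sketch. It relies only on the two values visible in the example ("inactive" and "success"); other possible values are not documented here, and the leading echo line ({'status': '0x1'}) is assumed to have been stripped. `live_update_succeeded` is a hypothetical helper.

```python
import json

# JSON body of the status example above, without the echo line.
status_output = """
{
  "status": "inactive",
  "last live update status": "success",
  "time_used (s)": 0.604152
}
"""


def live_update_succeeded(output):
    """True if no update is running and the last one ended in success."""
    status = json.loads(output)
    return (
        status.get("status") == "inactive"
        and status.get("last live update status") == "success"
    )


print(live_update_succeeded(status_output))  # True
```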

Restart

This command performs a fast restart of the virtio-net-controller service. Compared to a regular restart (using systemctl restart virtio-net-controller), this command has a shorter downtime per device.

Syntax

virtnet restart [-h]

Option

Abbr

Argument Type

Required

Description

--help

-h

N/A

No

Show the help message and exit

Output

Entry

Type

Description

stdout

String

If the fast restart finishes successfully

  • SUCCESS

  • Failed to fast restart

Example

To perform a fast restart of the controller service, run:

# virtnet restart
SUCCESS

Health

This command shows health information for given devices.

The virtio-net driver must be loaded for this command to show valid information.

Syntax

virtnet health [-h] {[-a] | [-p PF] [-v VF] | [-u VUID]} [show]

The options --pf, --vf, --vuid, and --all are mutually exclusive (except --pf and --vf, which can be used together), and one of them must be specified.

Option

Abbr

Argument Type

Required

Description

--help

-h

N/A

No

Show the help message and exit

--all

-a

N/A

No

Query all the detailed information for all available devices. It can be time consuming if a large number of devices is available.

--pf

-p

Number

No

Unique device ID for the PF. Can be retrieved by using virtnet list.

--vf

-v

Number

No

Unique device ID for the VF. Can be retrieved by using virtnet list.

--vuid

-u

String

No

Unique device SN for the device (PF/VF). Can be retrieved by using virtnet list.

Sub-command

Required

Description

show

Yes

Show health information for given devices

Output

Entry

Type

Description

pf_id

Number

Physical function ID

type

String

Function type: Static PF, hotplug PF, VF

vuid

String

Unique device SN. It can be used as an index to query/modify/unplug a device.

dev_status

String

Device status field bit masks according to the virtio spec:

  • ACKNOWLEDGE (bit 0)

  • DRIVER (bit 1)

  • DRIVER_OK (bit 2)

  • FEATURES_OK (bit 3)

  • DEVICE_NEEDS_RESET (bit 6)

  • FAILED (bit 7)

health_status

String

  • Good

  • Fatal

health_recover_counter

Number

The number of recoveries that have been performed

dev_health_details

Dictionary

Two types of health information are included: control_plane_errors and data_plane_errors,

where control_plane_errors reports the following specific errors, each with a value of either 0 or 1:

  • sf_rqt_update_err

  • sf_drop_create_err

  • sf_tir_create_err

  • steer_rx_domain_err

  • steer_rx_table_err

  • sf_flows_apply_err

  • aarfs_flow_init_err

  • vlan_flow_init_err

  • drop_cnt_config_err

and data_plane_errors reports the following specific errors, each with a value of either 0 or 1:

  • sq_stall

  • dma_q_stall

  • spurious_db_invoke

  • aux_not_invoked

  • dma_q_errors

  • host_read_errors

Detailed descriptions of each error can be found in Health Statistics.
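
The dev_status value is a bit mask, so it can be decoded with a simple lookup over the bit positions listed above. The following Python sketch is illustrative only; the function name is hypothetical. Note that the CLI itself abbreviates ACKNOWLEDGE as ACK in its output.

```python
# Bit positions as listed in the dev_status entry above.
DEV_STATUS_BITS = {
    0: "ACKNOWLEDGE",
    1: "DRIVER",
    2: "DRIVER_OK",
    3: "FEATURES_OK",
    6: "DEVICE_NEEDS_RESET",
    7: "FAILED",
}


def decode_dev_status(value):
    """Return the names of the status bits set in value (int or hex string)."""
    if isinstance(value, str):
        value = int(value, 16)
    return [name for bit, name in DEV_STATUS_BITS.items() if value & (1 << bit)]


print(decode_dev_status("0xf"))
```

A value of 0xf therefore corresponds to a device whose driver has acknowledged it, negotiated features, and finished initialization.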

Example

The following is an example of showing the information of the first PF:

# virtnet health -p 0 show
{'pf': '0x0', 'all': '0x0', 'subcmd': '0x0'}
{
  "pf_id": 0,
  "type": "static PF",
  "vuid": "MT2306XZ00BPVNETS0D0F1",
  "dev_status": {
    "value": "0xf",
    "    0": "ACK",
    "    1": "DRIVER",
    "    2": "DRIVER_OK",
    "    3": "FEATURES_OK"
  },
  "health_status": "Good",
  "health_recover_counter": 0,
  "dev_health_details": {
    "control_plane_errors": {
      "sf_rqt_update_err": 0,
      "sf_drop_create_err": 0,
      "sf_tir_create_err": 0,
      "steer_rx_domain_err": 0,
      "steer_rx_table_err": 0,
      "sf_flows_apply_err": 0,
      "aarfs_flow_init_err": 0,
      "vlan_flow_init_err": 0,
      "drop_cnt_config_err": 0
    },
    "data_plane_errors": {
      "sq_stall": 0,
      "dma_q_stall": 0,
      "spurious_db_invoke": 0,
      "aux_not_invoked": 0,
      "dma_q_errors": 0,
      "host_read_errors": 0
    }
  }
}
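
For monitoring, it is often enough to extract only the error counters that are non-zero. The following Python sketch shows one way to do that on the parsed JSON payload. The function name is hypothetical, and the sample data is made up (one counter is set to 1 purely for illustration).

```python
def nonzero_health_errors(health):
    """Return {"plane.counter": value} for every non-zero error counter."""
    details = health.get("dev_health_details", {})
    found = {}
    for plane in ("control_plane_errors", "data_plane_errors"):
        for name, value in details.get(plane, {}).items():
            if value:
                found[f"{plane}.{name}"] = value
    return found


# Made-up sample: sf_tir_create_err is set to 1 only to show a hit.
sample = {
    "health_status": "Good",
    "dev_health_details": {
        "control_plane_errors": {"sf_rqt_update_err": 0, "sf_tir_create_err": 1},
        "data_plane_errors": {"sq_stall": 0, "dma_q_errors": 0},
    },
}
print(nonzero_health_errors(sample))
```

An empty result means no control-plane or data-plane error is currently reported for the device.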

Error Code

CLI commands return a non-zero error code upon failure. All error numbers are negative. An error reported in the log may also include an error number.

If the error number is greater than -1000, it is a standard Linux error. Refer to the Linux errno documentation for its meaning.

If the error number is less than or equal to -1000, refer to the table below for an explanation.

Errno

Error Name

Error Description

-1000

VIRTNET_ERR_DEV_FEATURE_VALIDATE

Failed to validate device feature

-1001

VIRTNET_ERR_DEV_NOT_FOUND

Failed to find device

-1002

VIRTNET_ERR_DEV_NOT_PLUGGED

Failed - Device is not hotplugged

-1003

VIRTNET_ERR_DEV_NOT_STARTED

Failed - Device did not start

-1004

VIRTNET_ERR_DRIVER_PROBED

Failed - Virtio driver should not be loaded

-1005

VIRTNET_ERR_EPOLL_ADD

Failed to add epoll

-1006

VIRTNET_ERR_ID_OUT_OF_RANGE

Failed - ID input exceeds the max range

-1007

VIRTNET_ERR_VUID_INVALID

Failed - VUID is invalid

-1008

VIRTNET_ERR_MAC_INVALID

Failed - MAC is invalid

-1009

VIRTNET_ERR_MSIX_INVALID

Failed - MSIX is invalid

-1010

VIRTNET_ERR_MTU_INVALID

Failed - MTU is invalid

-1011

VIRTNET_ERR_PORT_CONTEXT_NOT_FOUND

Failed to find port context

-1012

VIRTNET_ERR_REC_CONFIG_LOAD

Failed to load config from recovery file

-1013

VIRTNET_ERR_REC_CONFIG_SAVE

Failed to save config into recovery file

-1014

VIRTNET_ERR_REC_FILE_CREATE

Failed to create recovery file

-1015

VIRTNET_ERR_REC_MAC_DEL

Failed to delete MAC in recovery file

-1016

VIRTNET_ERR_REC_MAC_LOAD

Failed to load MAC from recovery file

-1017

VIRTNET_ERR_REC_MAC_SAVE

Failed to save MAC into recovery file

-1018

VIRTNET_ERR_REC_MQ_SAVE

Failed to save MQ into recovery file

-1019

VIRTNET_ERR_REC_PFNUM_LOAD

Failed to load PF number from recovery file

-1020

VIRTNET_ERR_REC_RX_MODE_SAVE

Failed to save RX mode into recovery file

-1021

VIRTNET_ERR_REC_SF_SAVE

Failed to save PF and SF number into recovery file

-1022

VIRTNET_ERR_REC_SFNUM_LOAD

Failed to load SF number from recovery file

-1023

VIRTNET_ERR_SF_MAC_FLOW_APPLY

Failed to apply MAC flow by SF 

-1024

VIRTNET_ERR_SF_MQ_UPDATE

Failed to update MQ by SF

-1025

VIRTNET_ERR_SF_RX_MODE_SET

Failed to set RX mode by SF

-1026

VIRTNET_ERR_SNAP_NET_CTRL_OPEN

Failed to open SNAP device control

-1027

VIRTNET_ERR_SNAP_CROSS_MKEY_CREATE

Failed to create SNAP cross mkey

-1028

VIRTNET_ERR_SNAP_DMA_Q_CREATE

Failed to create SNAP DMA Q

-1029

VIRTNET_ERR_SNAP_NET_DEV_QUERY

Failed to query SNAP device

-1030

VIRTNET_ERR_SNAP_NET_DEV_MODIFY

Failed to modify SNAP device

-1031

VIRTNET_ERR_SNAP_PF_HOTPLUG

Failed to hotplug SNAP PF

-1032

VIRTNET_ERR_VQ_PERIOD_UPDATE

Failed to update VQ period

-1033

VIRTNET_ERR_QUEUE_SIZE_INVALID

Failed - Queue size is invalid

-1034

VIRTNET_ERR_SF_PORT_ADD

Failed to add SF port

-1035

VIRTNET_ERR_WQ_WORKQUEUE_ALLOC

Failed to alloc workqueue

-1036

VIRTNET_ERR_ETH_VQS_OPERATION_ALLOC

Failed to alloc eth VQS operation

-1037

VIRTNET_ERR_ETH_VQS_OPERATION_COMP

Failed to complete eth VQS operation

-1038

VIRTNET_ERR_JSON_OBJ_NOT_EXIST

Failed - JSON obj does not exist

-1039

VIRTNET_ERR_DEV_LOAD_PREP

Failed to prepare device load

-1040

VIRTNET_ERR_DEV_SW_MIGRATION

Failed to sw migrate a device

-1041

VIRTNET_ERR_DEV_IS_SW_MIGRATING

Failed - Device is migrating

-1042

VIRTNET_ERR_MAX_QUEUE_SIZE

Error - queue size must be greater than 2 and is power of 2

-1043

VIRTNET_ERR_MSIX_LESS_EQUAL_THREE

Warning - this device won't function, don't try to probe with virtio driver

-1044

VIRTNET_ERR_SF_POOL_CREATING

SF pool is creating try again later

-1046

VIRTNET_ERR_INVALID_OPTION

Option is not supported

-1047

VIRTNET_ERR_SF_CREATE

Failed to create SF

-1048

VIRTNET_ERR_DEV_SF_NUM_OUT_OF_RANGE

SF number for hotplug device should be between 2000 and 2999

-1049

VIRTNET_ERR_DEV_SF_NUM_USED

SF number is already used

-1050

VIRTNET_ERR_QUEUE_NUMBER_INVALID

Queue index is invalid

-1051

VIRTNET_ERR_SPEED_INVALID

Invalid speed please check help menu for supported link speeds

-1052

VIRTNET_ERR_SUPPORTED_HASH_TYPES_INVALID

Invalid hash types please check help menu for supported hash types

-1053

VIRTNET_ERR_RSS_MAX_KEY_SIZE_INVALID

Invalid rss max key size supported key size is 40

-1054

VIRTNET_ERR_REC_OFFLOADS_SAVE

Failed to save OFFLOADS into recovery file

-1055

VIRTNET_ERR_SF_OFFLOADS_UPDATE

Failed to update OFFLOADS by SF

-1056

VIRTNET_ERR_READ_LINK

Failed to readlink

-1057

VIRTNET_ERR_PATH_FORMAT

Error - Path format is invalid

-1058

VIRTNET_ERR_Q_COUNTER_ALLOC

Failed to alloc q counter

-1059

VIRTNET_ERR_REC_DIRTY_LOG_SAVE

Failed to save dirty log

-1060

VIRTNET_ERR_REC_DIRTY_LOG_DEL

Failed to delete dirty log

-1061

VIRTNET_ERR_REC_LM_STATUS_SAVE

Failed to save LM status

-1062

VIRTNET_ERR_REC_LM_STATUS_REC

Failed to found LM status record

-1063

VIRTNET_ERR_REC_DEV_MODE_SAVE

Failed to save dev mode

-1064

VIRTNET_ERR_REC_DEV_MODE_REC

Failed to found dev mode record

-1065

VIRTNET_ERR_UNPLUG_NOT_READY

Error - Device is not ready to be unplugged please check host and retry

-1066

VIRTNET_ERR_REC_MAC_TABLE_DEL

Failed to delete MAC table in recovery file

-1067

VIRTNET_ERR_REC_MAC_TABLE_LOAD

Failed to load MAC table from recovery file

-1068

VIRTNET_ERR_REC_MAC_TABLE_SAVE

Failed to save MAC table into recovery file

-1069

VIRTNET_ERR_REC_HASH_CFG_DEL

Failed to delete hash cfg in recovery file

-1070

VIRTNET_ERR_REC_HASH_CFG_LOAD

Failed to load hash cfg from recovery file

-1071

VIRTNET_ERR_REC_HASH_CFG_SAVE

Failed to save hash cfg into recovery file

-1072

VIRTNET_ERR_DEV_VF_GET

Failed to get VF device

-1073

VIRTNET_ERR_MAX_QUEUES_INVALID

Failed - QUEUES is invalid

-1074

VIRTNET_ERR_DEBUGFS_SAVE

Failed to save into debugfs file

-1075

VIRTNET_ERR_DEBUGFS_DEL

Failed to delete from debugfs file
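
The split between standard Linux errors and virtnet-specific errors can be sketched in Python as follows. This is an illustrative helper, not part of the CLI: the function name is hypothetical, and the dictionary holds only a small excerpt of the table above.

```python
import os

# A small excerpt of the table above; extend it as needed.
VIRTNET_ERRORS = {
    -1001: "VIRTNET_ERR_DEV_NOT_FOUND",
    -1006: "VIRTNET_ERR_ID_OUT_OF_RANGE",
    -1007: "VIRTNET_ERR_VUID_INVALID",
    -1044: "VIRTNET_ERR_SF_POOL_CREATING",
}


def describe_error(code):
    """Classify a return code as success, a Linux errno, or a virtnet error."""
    if code == 0:
        return ("success", "Success")
    if code > -1000:
        # Standard Linux errno values are reported negated.
        return ("errno", os.strerror(-code))
    return ("virtnet", VIRTNET_ERRORS.get(code, "unknown virtnet error"))


print(describe_error(-1001))
print(describe_error(-2)[0])
```

The boundary follows the rule stated above: -1000 itself belongs to the virtnet range, while -999 is treated as a standard errno.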

Debug

The virtnet_cli tool provides debug commands for the event publisher.

  • To view publisher configuration: 

    virtnet debug vnet_event config

  • To view publisher counters:

    virtnet debug vnet_event stats
    Counter

    Meaning

    enqueued

    Total events successfully enqueued to the publish ring buffer.

    dropped_queue_full

    Events dropped because the queue was at max_queue_depth.

    json_encode_fail

    Events that failed JSON serialization (should always be 0).

    transport_publish_fail

    Events that the worker thread failed to publish to NATS.

    reconnect_attempts

    Number of times the worker attempted to reconnect to the broker.

    last_error

    Last negative errno from a failed operation (0 = no error).

  • To enable verbose debug logging (runtime):

    virtnet debug vnet_event --log_level 1
    

    This enables per-event syslog traces showing the NATS subject, JSON preview, and publish outcome. Set back to 0 to disable.

Feature Guidance

Counters

Packet Statistics

To query the packet counters, use the stats command.

[dpu]# virtnet stats [-h] {[-p PF] [-v VF] | [-u VUID]} [-q QUEUE_ID]

The options --pf, --vf, and --vuid are mutually exclusive (except --pf and --vf, which can be used together), but one of them must be applied.

Option

Abbr

Argument Type

Required

Description

--help

-h

N/A

No

Show the help message and exit

--pf

-p

Number

No

Unique device ID for the PF. Can be retrieved by using virtnet list.

--vf

-v

Number

No

Unique device ID for the VF. Can be retrieved by using virtnet list.

--vuid

-u

String

No

Unique device SN for the device (PF/VF). Can be retrieved by using virtnet list.

--queue_id

-q

Number

No

Queue index of the device RQs or SQs

This command is recommended for obtaining all packet counter information. The existing packet counter information available through the virtnet list and virtnet query commands will be deprecated in the future.

The following command queries PF 0 and VQ 0 (i.e., RQ):

[dpu]# virtnet stats -p 0 -q 0

Output:

# virtnet stats -p 0 -q 0
{'pf': '0x0', 'queue_id': '0x0'}
{
  "device": {
    "pf_id": 0,
    "packet_counters": "Enabled",
    "queues-stats": [
      {
        "VQ Index": 0,
        "rx_64_or_less_octet_packets": 0,
        "rx_65_to_127_octet_packets": 259,
        "rx_128_to_255_octet_packets": 0,
        "rx_256_to_511_octet_packets": 0,
        "rx_512_to_1023_octet_packets": 0,
        "rx_1024_to_1522_octet_packets": 0,
        "rx_1523_to_2047_octet_packets": 0,
        "rx_2048_to_4095_octet_packets": 199,
        "rx_4096_to_8191_octet_packets": 0,
        "rx_8192_to_9022_octet_packets": 0,
        "received_desc": "4096",
        "completed_desc": "0",
        "bad_desc_errors": "0",
        "error_cqes": "0",
        "exceed_max_chain": "0",
        "invalid_buffer": "0",
        "batch_number": "64",
        "dma_q_used_number": "0",
        "handler_schd_number": "44",
        "aux_handler_schd_number": "43",
        "max_post_desc_number": "0",
        "total_bytes": "0",
        "err_handler_schd_num": "0",
        "rq_cq_max_count": "0",
        "rq_cq_period": "0",
        "rq_cq_period_mode": "1"
      }
    ]
  }
}
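
The per-size Rx counters in a queues-stats entry can be folded into a histogram and a total packet count. The following Python sketch assumes the key naming shown in the output above (rx_<range>_octet_packets); the function name is hypothetical, and the sample is a shortened copy of the example values.

```python
def rx_size_histogram(vq_stats):
    """Return ({size_bucket: count}, total) from one queues-stats entry."""
    prefix, suffix = "rx_", "_octet_packets"
    buckets = {
        key[len(prefix):-len(suffix)]: int(value)
        for key, value in vq_stats.items()
        if key.startswith(prefix) and key.endswith(suffix)
    }
    return buckets, sum(buckets.values())


# Shortened copy of the example entry above.
vq0 = {
    "VQ Index": 0,
    "rx_64_or_less_octet_packets": 0,
    "rx_65_to_127_octet_packets": 259,
    "rx_2048_to_4095_octet_packets": 199,
    "received_desc": "4096",
}
buckets, total = rx_size_histogram(vq0)
print(buckets)
print(total)
```

Counters that do not follow the rx_<range>_octet_packets pattern, such as received_desc, are ignored by this helper.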

The output has two sections.

  • The first section, wrapped by device, contains device details along with the enable state of the packet counter statistics.

    Entry

    Type

    Description

    device

    String

    Entries under this section are per-device information

    pf_id

    Number

    Physical function ID

    packet_counters

    String

    packet counters feature: enabled/disabled

  • The second section, wrapped by queues-stats, contains information for each receive VQ.

    Entry

    Type

    Description

    VQ Index

    Number

    The VQ index starts at 0 (the first RQ) and continues up to the last SQ

    rx_64_or_less_octet_packets

    Number

    The number of packets received with a size of 0 to 64 bytes. Relevant for BlueField-3 RQ when packet counter is enabled.

    rx_65_to_127_octet_packets

    Number

    The number of packets received with a size of 65 to 127 bytes. Relevant for BlueField-3 RQ when packet counter is enabled.

    rx_128_to_255_octet_packets

    Number

    The number of packets received with a size of 128 to 255 bytes. Relevant for BlueField-3 RQ when packet counter is enabled.

    rx_256_to_511_octet_packets

    Number

    The number of packets received with a size of 256 to 511 bytes. Relevant for BlueField-3 RQ when packet counter is enabled.

    rx_512_to_1023_octet_packets

    Number

    The number of packets received with a size of 512 to 1023 bytes. Relevant for BlueField-3 RQ when packet counter is enabled.

    rx_1024_to_1522_octet_packets

    Number

    The number of packets received with a size of 1024 to 1522 bytes. Relevant for BlueField-3 RQ when packet counter is enabled.

    rx_1523_to_2047_octet_packets

    Number

    The number of packets received with a size of 1523 to 2047 bytes. Relevant for BlueField-3 RQ when packet counter is enabled.

    rx_2048_to_4095_octet_packets

    Number

    The number of packets received with a size of 2048 to 4095 bytes. Relevant for BlueField-3 RQ when packet counter is enabled.

    rx_4096_to_8191_octet_packets

    Number

    The number of packets received with a size of 4096 to 8191 bytes. Relevant for BlueField-3 RQ when packet counter is enabled.

    rx_8192_to_9022_octet_packets

    Number

    The number of packets received with a size of 8192 to 9022 bytes. Relevant for BlueField-3 RQ when packet counter is enabled.

    received_desc

    Number

    Total number of received descriptors by the device on this VQ

    completed_desc

    Number

    Total number of completed descriptors by the device on this VQ

    bad_desc_errors

    Number

    Total number of bad descriptors received on this VQ

    error_cqes

    Number

    Total number of error CQ entries on this VQ

    exceed_max_chain

    Number

    Total number of chained descriptors received that exceed the max allowed chain by the device

    invalid_buffer

    Number

    Total number of times device tried to read or write buffer that is not registered to the device

    batch_number

    Number

    The number of RX descriptors for the last received packet. Relevant for BlueField-3.

    dma_q_used_number

    Number

    The DMA q index used for this VQ. Relevant for BlueField-3.

    handler_schd_number

    Number

    Scheduler number for this VQ. Relevant for BlueField-3.

    aux_handler_schd_number

    Number

    Aux scheduler number for this VQ. Relevant for BlueField-3.

    max_post_desc_number

    Number

    Maximum number of posted descriptors on this VQ. Relevant for DPA.

    total_bytes

    Number

    Total number of bytes handled by this VQ. Relevant for BlueField-3.

    rq_cq_max_count

    Number

    Event generation moderation counter of the queue. Relevant for RQ.

    rq_cq_period

    Number

    Event generation moderation timer for the queue in 1µsec granularity. Relevant for RQ.

    rq_cq_period_mode

    Number

    Current period mode for RQ

    • 0x0 – default_mode – use device best defaults

    • 0x1 – upon_event – queue_period timer restarts upon event generation

    • 0x2 – upon_cqe – queue_period timer restarts upon completion generation

VQ Statistics

To query VQ statistics, use the corresponding VQ index. For example, if 3 queues are configured, the Rx VQ uses queue 0, the Tx VQ uses queue 1, and the Ctrl VQ uses queue 2.

The following is the command to query PF 0, VF 0, and VQ 0 (i.e., Rx).

[dpu]# virtnet query -p 0 -v 0 -q 0

Output:

"enabled-queues-info": [
  {
    "index": "0",
    "size": "256",
    "msix_vector": "0x1",
    "enable": "1",
    "notify_offset": "0",
    "descriptor_address": "0xffffe000",
    "driver_address": "0xfffff000",
    "device_address": "0xfffff240",
    "received_desc": "256",
    "completed_desc": "19",
    "bad_desc_errors": "0",
    "error_cqes": "0",
    "exceed_max_chain": "0",
    "invalid_buffer": "0",
    "batch_number": "64",
    "dma_q_used_number": "0",
    "handler_schd_number": "4",
    "aux_handler_schd_number": "3",
    "max_post_desc_number": "0",
    "total_bytes": "6460",
    "rq_cq_max_count": "0",
    "rq_cq_period": "0",
    "rq_cq_period_mode": "1"
  }

The following are some of the important VQ counters:

Counter Name

Description

total_bytes

Number of bytes received 

received_desc

Number of available descriptors received by device

completed_desc

Number of available descriptors completed by the device

error_cqes

Number of error CQEs received on the queue

bad_desc_errors

Number of bad descriptors received

exceed_max_chain

Number of chained descriptors received that exceed the max allowed chain by device

invalid_buffer

Number of times device tried to read or write buffer that is  not registered to the device
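
Because the VQ counters are reported as strings, they need to be converted before they can be compared. The following Python sketch converts the error-related counters from the table above and keeps only the non-zero ones. The function name is hypothetical; the first sample reuses values from the output above, and the second is made up.

```python
ERROR_COUNTERS = ("error_cqes", "bad_desc_errors", "exceed_max_chain", "invalid_buffer")


def vq_error_summary(vq_info):
    """Convert the string counters to integers and keep the non-zero error ones."""
    return {
        name: int(vq_info[name])
        for name in ERROR_COUNTERS
        if name in vq_info and int(vq_info[name]) != 0
    }


# Values taken from the enabled-queues-info example above.
healthy = {"received_desc": "256", "completed_desc": "19", "error_cqes": "0",
           "bad_desc_errors": "0", "exceed_max_chain": "0", "invalid_buffer": "0"}
# Made-up values, only to show a non-empty result.
faulty = {"error_cqes": "3", "bad_desc_errors": "0", "invalid_buffer": "1"}

print(vq_error_summary(healthy))
print(vq_error_summary(faulty))
```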

RQ Drop Counter

When DPA is the data path provider, each RQ has its corresponding drop counter, which counts the number of packets dropped inside the DPA virtio RQs. 

The drop could also happen from the uplink or SF.

The drop counter only increments (initial value being 0), and its value gets reset to 0 when disabled.

Enabling/Disabling Drop Counters

RQ drop counter can be enabled and disabled per device as follows (using VF 0 on PF 0): 

[dpu]# virtnet modify -p 0 -v 0 device -dc enable
[dpu]# virtnet modify -p 0 -v 0 device -dc disable

The drop counter is attached to an RQ, so the RQ must be created first. This means that the virtio-net device should be probed by the driver on the host OS before running the commands above.

Querying Drop Counters

Per-device Query

To query the drop counter value(s) for a specific device, run: 

[dpu]# virtnet query -p 0 -v 0 | grep num_desc_drop_pkts

If there is more than one RQ for a device, the drop count is the sum of all RQs' values.

Global Drop Counter Summary

To query the total drop count across all devices with a single command: 

virtnet query rx_drops

Output:

Entry

Type

Description

packets

Number

Total RX drop count across all devices

devices_with_drops

Array

List of devices with non-zero drop counts. Each entry contains pf, type (PF or VF), packets, and optionally vf.

Examples:

  • Query total drops across all devices: 

    [dpu]# virtnet query rx_drops
    
    JSON output:
    {
        "packets": 98777777,
        "devices_with_drops": [
            { "pf": 0, "type": "PF", "packets": 12345 },
            { "pf": 0, "vf": 3, "type": "VF", "packets": 45678 },
            { "pf": 0, "vf": 17, "type": "VF", "packets": 23456 }
        ]
    }
    

  • Query with no drops present:

    [dpu]# virtnet query rx_drops
    
    JSON output:
    {
        "packets": 0,
        "devices_with_drops": []
    }
    

Per-device Total Query

To query the total drop count for a single PF or VF (sum of all its RQs):

virtnet query -p <PF> [-v <VF>] rx_drops

Output:

Entry

Type

Description

pf

Number

PF index

vf

Number

VF index (only present for VF devices)

type

String

PF or VF

packets

Number

Total RX drop count for this device (sum of all RQs)

Examples:

  • Query total drops for a specific VF:

    [dpu]# virtnet query -p 0 -v 3 rx_drops
    
    JSON output:
    {
        "pf": 0,
        "vf": 3,
        "type": "VF",
        "packets": 45678
    }
    

  • Query total drops for a PF (includes the PF's own drops only; does not include VFs):

    [dpu]# virtnet query -p 0 rx_drops
    
    JSON output:
    {
        "pf": 0,
        "type": "PF",
        "packets": 12345
    }
    

    Querying a PF returns only that PF's own drop counters. It does not include drops from its VFs.
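
Since a per-PF query excludes its VFs, a combined figure has to be computed from the global summary instead. The following Python sketch sums every devices_with_drops entry that belongs to a given PF. The function name is hypothetical, and the sample reuses the device entries from the global summary example above.

```python
def pf_total_with_vfs(summary, pf):
    """Sum drops of a PF and all of its VFs from a 'virtnet query rx_drops' reply."""
    return sum(
        dev["packets"]
        for dev in summary.get("devices_with_drops", [])
        if dev["pf"] == pf
    )


# Device entries copied from the global summary example above.
summary = {
    "devices_with_drops": [
        {"pf": 0, "type": "PF", "packets": 12345},
        {"pf": 0, "vf": 3, "type": "VF", "packets": 45678},
        {"pf": 0, "vf": 17, "type": "VF", "packets": 23456},
    ]
}
print(pf_total_with_vfs(summary, 0))
```

Devices without drops are not listed in devices_with_drops, so they contribute nothing to the sum, which is the intended result.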

Devices with Drops and Per-queue Detail

To list only devices that have drops, with per-RQ breakdown: 

virtnet query rx_drops --drops-only

Example:

[dpu]# virtnet query rx_drops --drops-only

JSON output:

{
    "packets": 69134,
    "devices_with_drops": [
        {
            "pf": 0, "vf": 3, "type": "VF", "packets": 45678,
            "queues": [
                { "index": 0, "packets": 12345 },
                { "index": 2, "packets": 23456 },
                { "index": 4, "packets": 9877 },
                { "index": 6, "packets": 0 }
            ]
        },
        {
            "pf": 0, "vf": 17, "type": "VF", "packets": 23456,
            "queues": [
                { "index": 0, "packets": 23456 },
                { "index": 2, "packets": 0 }
            ]
        }
    ]
}
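
The per-queue breakdown makes it possible to rank individual RQs by drop count. The following Python sketch flattens the report into (packets, pf, vf, queue index) tuples and returns the largest ones. The function name and the limit parameter are hypothetical; the sample is copied from the output above.

```python
def hottest_queues(report, limit=3):
    """Return up to `limit` (packets, pf, vf, queue_index) tuples, largest first."""
    rows = [
        (queue["packets"], dev["pf"], dev.get("vf"), queue["index"])
        for dev in report.get("devices_with_drops", [])
        for queue in dev.get("queues", [])
        if queue["packets"] > 0
    ]
    # Stable sort: queues with equal counts keep their order in the report.
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows[:limit]


# Copied from the --drops-only example above.
report = {
    "packets": 69134,
    "devices_with_drops": [
        {"pf": 0, "vf": 3, "type": "VF", "packets": 45678,
         "queues": [{"index": 0, "packets": 12345}, {"index": 2, "packets": 23456},
                    {"index": 4, "packets": 9877}, {"index": 6, "packets": 0}]},
        {"pf": 0, "vf": 17, "type": "VF", "packets": 23456,
         "queues": [{"index": 0, "packets": 23456}, {"index": 2, "packets": 0}]},
    ],
}
for row in hottest_queues(report):
    print(row)
```

In this example, the per-queue values of each device add up to that device's packets field, and the device totals add up to the top-level packets field.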

Check Async Polling Status

virtnet query rx_drops status

Output:

Entry

Type

Description

dc_mode

String

Current drop counter query mode: async or sync

poll_freq_sec

Number

Polling interval in seconds (only shown when dc_mode is async)

last_poll_time_ms

Number

Timestamp of last completed poll cycle in milliseconds (only shown when dc_mode is async and at least one poll has completed)

Example: 

[dpu]# virtnet query rx_drops status

JSON output:

{
    "dc_mode": "async",
    "poll_freq_sec": 5,
    "last_poll_time_ms": 1740394532123
}

Async Drop Counter Mode

By default, drop counter queries are performed synchronously — each query reads the counter value directly from hardware firmware. This is accurate but can be slow in large-scale deployments with hundreds of VFs.

Async mode starts a background polling thread that periodically queries hardware and caches the results. When async mode is enabled, all drop counter queries (including virtnet query and virtnet query rx_drops) return cached values instantly.

Enable Async Mode

virtnet modify global dc --mode=<async|sync> --poll_freq <seconds>

Parameters:

Option

Argument Type

Required

Description

--mode=<async|sync>

String

Yes

Drop counter query mode: async or sync

--poll_freq

Number

No

Polling interval in seconds when using mode=async. Valid range: 1–600. Default: 5.
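
When the command is generated by automation, the documented constraints can be checked before the CLI is called. The following Python sketch builds the argument list for the command above and validates the 1–600 range. The function name is hypothetical, and rejecting --poll_freq in sync mode is a design choice of this sketch, not documented CLI behavior.

```python
def dc_mode_args(mode, poll_freq=None):
    """Build the argument list for 'virtnet modify global dc' with basic checks."""
    if mode not in ("async", "sync"):
        raise ValueError("mode must be 'async' or 'sync'")
    args = ["virtnet", "modify", "global", "dc", f"--mode={mode}"]
    if poll_freq is not None:
        if mode != "async":
            # Sketch-level choice: the interval only has meaning in async mode.
            raise ValueError("poll_freq applies to async mode only")
        if not 1 <= poll_freq <= 600:
            raise ValueError("poll_freq must be within 1-600 seconds")
        args += ["--poll_freq", str(poll_freq)]
    return args


print(" ".join(dc_mode_args("async", 10)))
```

The returned list can be passed to a process runner such as subprocess.run on the BlueField; omitting poll_freq leaves the default interval to the controller.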

Output:

Entry

Type

Description

mode

String

Current drop counter query mode: async or sync

poll_freq_sec

Number

Polling interval in seconds (only shown when dc_mode is async)

Examples:

  • Enable with default polling interval:

    [dpu]# virtnet modify global dc --mode=async
    
    JSON output:
    {
        "mode": "async",
        "poll_freq_sec": 5
    }

  • Enable with custom polling interval:

    [dpu]# virtnet modify global dc --mode=async --poll_freq 10
    
    JSON output:
    {
        "mode": "async",
        "poll_freq_sec": 10
    }

    Interval

    Trade-off

    1–5 seconds

    Near real-time, slightly higher CPU

    5–30 seconds

    Good balance for most deployments

    30–600 seconds

    Minimal CPU, suitable for infrequent monitoring

Disable Async Mode (Return to Sync)

[dpu]# virtnet modify global dc --mode=sync

JSON output: 

{
    "dc_mode": "sync"
}

Check Current Mode

[dpu]# virtnet query rx_drops status

Output when async:

{
    "dc_mode": "async",
    "poll_freq_sec": 5,
    "last_poll_time_ms": 1740394532123
}

Output when sync:

{
    "dc_mode": "sync"
}
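
In async mode the returned values are cached, so it can be useful to check how old the cache is. The following Python sketch compares last_poll_time_ms with the current time. It assumes that last_poll_time_ms is expressed in milliseconds since the Unix epoch and that the reader's clock matches the BlueField clock; neither is stated in this guide. The function names and the slack factor are hypothetical.

```python
import time


def cache_age_seconds(status, now_ms=None):
    """Return the age of the cached counters in seconds, or None in sync mode."""
    if status.get("dc_mode") != "async" or "last_poll_time_ms" not in status:
        return None
    if now_ms is None:
        now_ms = time.time() * 1000  # assumption: epoch milliseconds
    return (now_ms - status["last_poll_time_ms"]) / 1000.0


def cache_is_stale(status, now_ms=None, slack=2.0):
    """Flag the cache as stale when it is older than slack polling intervals."""
    age = cache_age_seconds(status, now_ms)
    if age is None:
        return False
    return age > slack * status["poll_freq_sec"]


# Status copied from the async example above.
status = {"dc_mode": "async", "poll_freq_sec": 5, "last_poll_time_ms": 1740394532123}
print(cache_age_seconds(status, now_ms=1740394535123))
print(cache_is_stale(status, now_ms=1740394552123))
```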

The async mode setting is persisted across controller restarts via the recovery file at /opt/mellanox/mlnx_virtnet/recovery/global_config. No manual action is needed to restore the setting after a restart.

Large-scale Deployment Example

For deployments with many VFs (e.g., 576), use async mode to avoid performance bottlenecks: 

# Step 1: Enable drop counters on all VFs
for vf in $(seq 0 575); do
    virtnet modify -p 0 -v $vf device -dc enable
done

# Step 2: Enable async polling
virtnet modify global dc --mode=async --poll_freq 5

# Step 3: Monitor drops — single command replaces 576 individual queries
virtnet query rx_drops

# Step 4: Drill down to a specific VF if needed
virtnet query -p 0 -v 3 rx_drops

Packet Counter

Relevant for BlueField-3 only.

The packet counter feature helps the user query the byte-wise packet counters for each Rx queue.

By default, byte-wise packet counters are disabled because they negatively impact performance. Enable the packet counter feature only when it is needed for debugging.

Packet counter can be enabled and disabled as follows (using VF 0 on PF 0):

[dpu]# virtnet modify -p 0 -v 0 device -pkt_cnt enable
[dpu]# virtnet modify -p 0 -v 0 device -pkt_cnt disable

  • When enabled, byte-wise packet counters are initialized to zero.

  • When disabled, the previous values are retained for debugging purposes. The command will still return these old, disabled counter values.

Packet counters are attached to an RQ. Thus, RQ must be created first. This means that the virtio-net device should be probed by the driver on the host OS before running the commands above.

Health Statistics

Relevant for BlueField-3 only.

The health statistics are for displaying real-time health information of a specific device.

Output example (using VF 0 on PF 0):

[dpu]# virtnet health -p 0 -v 0 show
{
  "pf_id": 0,
  "vf_id": 0,
  "type": "VF",
  "vuid": "MT2306XZ00BPVNETS0D0F2",
  "dev_status": {
    "value": "0xf",
    "    0": "ACK",
    "    1": "DRIVER",
    "    2": "DRIVER_OK",
    "    3": "FEATURES_OK"
  },
  "health_status": "Good",
  "health_recover_counter": 0,
  "dev_health_details": {
    "control_plane_errors": {
      "sf_rqt_update_err": 0,
      "sf_drop_create_err": 0,
      "sf_tir_create_err": 0,
      "steer_rx_domain_err": 0,
      "steer_rx_table_err": 0,
      "sf_flows_apply_err": 0,
      "aarfs_flow_init_err": 0,
      "vlan_flow_init_err": 0,
      "drop_cnt_config_err": 0
    },
    "data_plane_errors": {
      "sq_stall": 0,
      "dma_q_stall": 0,
      "spurious_db_invoke": 0,
      "aux_not_invoked": 0,
      "dma_q_errors": 0,
      "host_read_errors": 0
    }
  }
}

Where:

  • health_status represents the overall status of the device (Good or Fatal)

  • dev_health_details has two sections, control_plane_errors and data_plane_errors, as explained in the following table:

    Counter Name

    Description

    Control Plane Errors

    sf_rqt_update_err

    Counter tallying receive queue table update failures

    sf_drop_create_err

    Counter tallying drop RQ creation failures

    sf_tir_create_err

    Counter tallying TIR create failures

    steer_rx_domain_err

    Counter tallying RX steering rule creation failures

    steer_rx_table_err

    Counter tallying RX table creation failures

    sf_flows_apply_err

    Counter tallying packet flow rule creation failures

    aarfs_flow_init_err

    Counter tallying packet flow initialization failures

    vlan_flow_init_err

    Counter tallying VLAN flow rule initialization failures

    drop_cnt_config_err

    Counter tallying drop counter configuration failures

    Data Plane Errors

    sq_stall

    One or more network send queues stalled without getting completions. This leads to traffic stalling for packets flowing over this VQ.

    dma_q_stall

    QP which is paired to itself issues a read request from the DPA to the host to read either available index or descriptor table. This request does not result in a completion and hangs in a loop waiting for a response.

    spurious_db_invoke

    Doorbell handler is repeatedly invoked but DPA finds no new data to be read and posted. This could be due to a faulty driver or issue on the DPA side.

    aux_not_invoked

    To speed up descriptor processing, an auxiliary execution unit (EU) is used if available. The primary thread invokes this EU and waits for the expected thread to run on it. If this EU is not invoked, the primary thread hangs.

    dma_q_errors

    QP which is paired to itself issues a read request from the DPA to the host to read either an available index or the descriptor table. This request results in an error and the QP becomes unavailable. An internal mechanism detects this errored QP and recycles it for use at a later stage.

Dynamic Interrupt Moderation

Dynamic Interrupt Moderation (DIM) adjusts the interrupt moderation settings to optimize packet processing. For guest OS kernels older than version 6.8, DIM offloads this function to the DPU, reducing the interrupt rate from the guest OS.

By lowering the interrupt rate in high-bandwidth traffic scenarios, DIM improves CPU efficiency for both the hypervisor and guest VMs, while maintaining nearly the same bandwidth.

DIM is only supported on BlueField-3.

For example, the following table shows the benefit of using DIM:


Tx Interrupt Rate (K irq/s)

Rx Interrupt Rate (K irq/s)

Tx Throughput (Gb/s)

Rx Throughput (Gb/s)

DIM Enabled

7.3

7.5

171

181

DIM Disabled

7.5

23.7

175

181

The test used the following parameters:

  • Guest OS kernel version – 5.11.0

  • Number of virtio-net devices – 1

  • Number of QPs – 31

  • Queue depth – 1024

  • MTU – 1500

  • Benchmark – iPerf with 31 streams
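
The effect of DIM in the table above can be expressed as relative changes. The following Python sketch computes them from the table values (DIM disabled as the baseline); the function name is hypothetical.

```python
def percent_change(before, after):
    """Relative change from before to after, in percent (negative = reduction)."""
    return (after - before) / before * 100.0


# Values from the table above: (DIM disabled, DIM enabled).
rx_irq = percent_change(23.7, 7.5)    # Rx interrupt rate, K irq/s
tx_irq = percent_change(7.5, 7.3)     # Tx interrupt rate, K irq/s
tx_tput = percent_change(175, 171)    # Tx throughput, Gb/s
rx_tput = percent_change(181, 181)    # Rx throughput, Gb/s

for name, value in (("rx_irq", rx_irq), ("tx_irq", tx_irq),
                    ("tx_tput", tx_tput), ("rx_tput", rx_tput)):
    print(f"{name}: {value:+.1f}%")
```

With these numbers, the Rx interrupt rate drops by roughly two thirds while Tx throughput changes by only a few percent and Rx throughput is unchanged.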

Configuring DIM

DIM is a per-device configuration. To enable or disable it, use this command:

[dpu]# virtnet modify -p <pf> [-v <vf>] device -dim {enable | disable}

Configuration example:

  1. Unload drivers from the guest-OS side:

    [host]# modprobe -rv virtio_net && modprobe -rv virtio_pci
    
    
  2. Enable DIM:

    [dpu]# virtnet modify -p 0 device -dim enable
    {'pf': '0x0', 'all': '0x0', 'subcmd': '0x0', 'dim_config': 'enable'}
    {
      "errno": 0,
      "errstr": "Success"
    }
    
    

    Using disable disables DIM.

  3. Load the drivers:

    [host]# modprobe -v virtio_pci && modprobe -v virtio_net
    
    
  4. Query the device to verify that DIM is enabled:

    [dpu]# virtnet query -p 0 -b | grep -i dim
          "dim": "enabled"
    
    

High Availability

High availability (HA) is essential in network infrastructure to ensure continuous performance with minimal downtime, even during failures.

To support HA, the virtio-net-controller process creates the auxiliary processes virtio-net-emu and virtio-net-ha. The virtio-net-emu process handles primary controller functions, while virtio-net-ha manages HA. virtio-net-ha saves and oversees critical resources from virtio-net-emu and restores it to a working state if a failure occurs. The two processes communicate through IPC messages.

ha-diagram.png

High availability is only supported on BlueField-3 and later.

The following table provides possible expected behaviors:

Scenarios

Behavior

Downtime Per Device (sec)

Fallback Action

Virtio-net-emu process crashes (e.g., Segfault)

The virtio-net-ha process tries to automatically recover all devices

< 1

The virtnet restart command if recovery failed

Device/VQ/SF create/destroy failures

HA makes sure the existing device is not affected

N/A

Retry or restart service

DPA command timeout

No action from HA; DPA is likely stuck

N/A

The virtnet restart command

Jumbo MTU

Jumbo MTU is critical for increasing the efficiency of Ethernet and network processing by reducing the protocol overhead (ratio of headers and payload size).
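
As a rough illustration of the overhead reduction, the following Python sketch estimates the share of wire bytes that carry application payload for a given MTU. The header sizes are assumptions of this sketch, not values from this guide: 40 bytes for IPv4 plus TCP headers without options, and 38 bytes of Ethernet per-frame overhead (14-byte header, 4-byte FCS, 8-byte preamble, 12-byte inter-frame gap).

```python
def payload_efficiency(mtu, l3_l4_headers=40, l2_overhead=38):
    """Fraction of wire bytes carrying payload for full-sized TCP/IPv4 frames.

    Assumptions: 20-byte IPv4 + 20-byte TCP headers, and 38 bytes of
    Ethernet overhead per frame (header, FCS, preamble, inter-frame gap).
    """
    return (mtu - l3_l4_headers) / (mtu + l2_overhead)


for mtu in (1500, 9216):
    print(f"MTU {mtu}: {payload_efficiency(mtu):.4f}")
```

Under these assumptions, the payload share rises from about 95% at an MTU of 1500 to about 99% at an MTU of 9216, and the number of packets needed for the same amount of data drops by roughly a factor of six.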

To enable support for jumbo MTU, run the following virtnet command:

[dpu]# virtnet modify -p 0 -v 0 device -t 9216

The example sets the MTU to 9216 for VF 0 on PF 0.

Jumbo MTU is only supported starting from the following version:


Release

Upstream

VM kernel: 4.18.0-193.el8.x86_64 (VM Linux version supports big MTU after 4.11)

Ubuntu

DOCA_2.5.0_BSP_4.5.0_Ubuntu_22.04

Virtnet controller

v1.7 or v1.6.26

To configure jumbo MTU (e.g., using VF 0 on PF 0):

  1. Change the MTU of the uplink and SF representor from the BlueField:

    [dpu]# ifconfig p0 mtu 9216
    [dpu]# ifconfig en3f0pf0sf3000 mtu 9216
    
    

    If a bond is configured, change the MTU of the bond rather than p0:

    [dpu]# ifconfig bond0 mtu 9216
    [dpu]# ifconfig en3f0pf0sf3000 mtu 9216
    
    
  2. Restart the virtio-net-controller from the BlueField:

    [dpu]# systemctl restart virtio-net-controller
    
    
  3. Unload the virtio driver from the host OS:

    [host]# modprobe -rv virtio-net
    
    
  4. Change the corresponding device MTU on the BlueField:

    [dpu]# virtnet modify -p 0 -v 0 device -t 9216
    
    
  5. Reload virtio driver from the host OS:

    [host]# modprobe -v virtio-net
    
    
  6. Check virtqueue MTU configuration is correct on the BlueField:

    [dpu]# virtnet query -p 0 -v 0 --dbg_stats | grep jumbo_mtu
        "jumbo_mtu": 1
        "jumbo_mtu": 1
    
    
  7. Change the MTU of virtio-net interface from the host OS:

    [host]# ifconfig <vnet> mtu 9216
    
    


Link Aggregation

It is common to use link aggregation (LAG) or bond interfaces to increase the reliability, availability, or bandwidth of networking devices. Virtio-net devices support this mode via DPU-side LAG configurations.

Configuring the virtio-net-controller in LAG mode must follow a specific procedure because of its dependency on the mlx5 RDMA device:

  1. Stop the virtio-net-controller to avoid resource leakage (which would be caused by LAG destroying the existing mlx5 RDMA device and creating a new bond RDMA device).

    [dpu]# systemctl stop virtio-net-controller.service
    
    
  2. Configure the LAG interface for two uplink interfaces from the DPU side. Refer to the "Link Aggregation" page for detailed steps.

    The virtio-net-controller service starts by default. If the DPU is rebooted during LAG configuration, it is necessary to stop the controller before creating bond interfaces from the DPU side.

  3. Update the controller configuration file to use bond interface.

    [dpu]# cat /opt/mellanox/mlnx_virtnet/virtnet.conf
    {
      "ib_dev_lag": "mlx5_bond_0",
      "ib_dev_for_static_pf": "mlx5_bond_0",
      "is_lag": 1
    }
    
    
  4. Start the controller for the new configuration to take effect.

    [dpu]# systemctl start virtio-net-controller.service
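
The configuration change in step 3 can also be scripted. The following is a minimal sketch, assuming the configuration file is plain JSON; it merges the three LAG keys shown above into existing content while keeping any other keys. The helper name and the `some_existing_option` key are hypothetical and used only for illustration.

```python
import json

# Keys and values are the ones shown in step 3; everything else is illustrative.
LAG_SETTINGS = {
    "ib_dev_lag": "mlx5_bond_0",
    "ib_dev_for_static_pf": "mlx5_bond_0",
    "is_lag": 1,
}


def with_lag_settings(conf_text: str) -> str:
    """Return configuration text with the LAG keys merged in, keeping other keys."""
    conf = json.loads(conf_text) if conf_text.strip() else {}
    conf.update(LAG_SETTINGS)
    return json.dumps(conf, indent=2) + "\n"


# Example with a hypothetical pre-existing key:
print(with_lag_settings('{"some_existing_option": 1}'))
```

The function only transforms text, so it can be tried on a copy of the file before the controller is started again.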
    
    

Live Migration

Live Migration Using vHost Acceleration Software Stack

Virtio VF PCIe devices can be attached to the guest VM using the vhost acceleration software stack. This enables performing live migration of guest VMs.

virtio-vf-pcie-devices-for-vhost-acceleration.png

This section provides the steps to enable VM live migration using virtio VF PCIe devices along with vhost acceleration software.

vdpa-over-virtio-full-emulation-design.png

Prerequisites
  • Minimum hypervisor kernel version – Linux kernel 5.15 (for VFIO SR-IOV support)

  • To use high-availability (the additional vfe-vhostd-ha service which can persist datapath when vfe-vhostd crashes), this kernel patch must be applied.

Install vHost Acceleration Software Stack

The vhost acceleration software stack is built using open-source, BSD-licensed DPDK.

  • To install vhost acceleration software:

    1. Clone the software source code:

      [host]# git clone https://github.com/Mellanox/dpdk-vhost-vfe
      
      

      The latest release tag is vfe-24.10.0-rc2.

    2. Build software: 

      [host]# apt-get install libev-dev -y
      [host]# apt-get install libev-libevent-dev  -y
      [host]# apt-get install uuid-dev  -y
      [host]# apt-get install libnuma-dev -y
      [host]# meson build --debug -Denable_drivers=vdpa/virtio,common/virtio,common/virtio_mi,common/virtio_ha  
      [host]# ninja -C build install
      
      
  • To install QEMU:

    Either upstream QEMU later than 8.1 or the following NVIDIA QEMU can be used.

    1. Clone NVIDIA QEMU sources. 

      [host]# git clone git@github.com:Mellanox/qemu.git -b stable-8.1-presetup
      [host]# cd qemu
      [host]# git checkout 24aaba9255
      
      

      Latest stable commit is 24aaba9255.

    2. Build NVIDIA QEMU. 

      [host]# mkdir bin 
      [host]# cd bin 
      [host]# ../configure --target-list=x86_64-softmmu --enable-kvm 
      [host]# make -j24
      
      
Configure vHost on Hypervisor
    1. Configure 1G huge pages:

      [host]# mkdir /dev/hugepages1G
      [host]# mount -t hugetlbfs -o pagesize=1G none /dev/hugepages1G
      [host]# echo 16 > /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages
      [host]# echo 16 > /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages
      
      
    2. Enable qemu:commandline in VM XML by adding the xmlns:qemu option:

      <domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
      
      
    3. Assign a memory amount and use 1GB page size for huge pages in VM XML:

       <memory unit='GiB'>4</memory>
       <currentMemory unit='GiB'>4</currentMemory>
       <memoryBacking>
          <hugepages>
            <page size='1' unit='GiB'/>
          </hugepages>
       </memoryBacking>
      
      
    4. Set the memory access for the CPUs to be shared:

      <cpu mode='custom' match='exact' check='partial'>
        <model fallback='allow'>Skylake-Server-IBRS</model>
        <numa>
          <cell id='0' cpus='0-1' memory='4' unit='GiB' memAccess='shared'/>
        </numa>
      </cpu>
      
      
    5. Add a virtio-net interface in VM XML:

      <qemu:commandline>
        <qemu:arg value='-chardev'/>
        <qemu:arg value='socket,id=char0,path=/tmp/vhost-net0,server=on'/>
        <qemu:arg value='-netdev'/>
        <qemu:arg value='type=vhost-user,id=vhost1,chardev=char0,queues=4'/>
        <qemu:arg value='-device'/>
        <qemu:arg value='virtio-net-pci,netdev=vhost1,mac=00:00:00:00:33:00,vectors=10,page-per-vq=on,rx_queue_size=1024,tx_queue_size=1024,mq=on,disable-legacy=on,disable-modern=off'/>
      </qemu:commandline>
      
      
Run vHost Acceleration Service
  1. Bind the virtio PF devices to the vfio-pci driver:

    [host]# modprobe -a vfio vfio_pci
    [host]# echo 1 > /sys/module/vfio_pci/parameters/enable_sriov   
    [host]# echo 0x1af4 0x1041 > /sys/bus/pci/drivers/vfio-pci/new_id 
    [host]# echo 0x1af4 0x1042 > /sys/bus/pci/drivers/vfio-pci/new_id
    [host]# echo <pf_bdf> > /sys/bus/pci/drivers/virtio-pci/unbind
    [host]# echo <vf_bdf> > /sys/bus/pci/drivers/virtio-pci/unbind
    [host]# echo <pf_bdf> > /sys/bus/pci/drivers/vfio-pci/bind 
    [host]# echo <vf_bdf> > /sys/bus/pci/drivers/vfio-pci/bind   
    [host]# lspci -vvv -s <pf_bdf> | grep "Kernel driver"
    Kernel driver in use: vfio-pci 
    [host]# lspci -vvv -s <vf_bdf> | grep "Kernel driver"
    Kernel driver in use: vfio-pci
    
    

    Example of <pf_bdf> or <vf_bdf> format: 0000:af:00.3

  2. Run the vhost acceleration software service by starting the vfe-vhostd service:

    [host]# systemctl start vfe-vhostd
    
    

    A log of the service can be viewed by running the following:

    [host]# journalctl -u vfe-vhostd
    
    
  3. Provision the virtio-net PF:

    [host]# /usr/local/bin/vfe-vhost-cli mgmtpf -a <pf_bdf>
    
    

    Wait for the virtio-net-controller to finish handling PF FLR.

  4. Enable SR-IOV and create a VF (or more):

    [host]# echo 1 > /sys/bus/pci/devices/<pf_bdf>/sriov_numvfs 
    [host]# lspci | grep Virtio
    0000:af:00.1 Ethernet controller: Red Hat, Inc. Virtio network device 
    0000:af:00.3 Ethernet controller: Red Hat, Inc. Virtio network device
    
    
  5. Add a VF representor to the OVS bridge on the BlueField: 

    [dpu]# virtnet query -p 0 -v 0 | grep sf_rep_net_device
    "sf_rep_net_device": "en3f0pf0sf3000", 
    [dpu]# ovs-vsctl add-port ovsbr1 en3f0pf0sf3000
    
    
  6. Provision the virtio-net VF:

    1. On BlueField, change VF MAC address or other device options:

      [dpu]# virtnet modify -p 0 -v 0 device -m 00:00:00:00:33:00

    2. Add VF into vfe-dpdk:

      [host]# /usr/local/bin/vfe-vhost-cli vf -a <vf_bdf> -v /tmp/vhost-net0

    If the SR-IOV is disabled and reenabled, the user must re-provision the VFs.

    00:00:00:00:33:00 is a virtual MAC address used in VM XML.

Start the VM
[host]# virsh start <vm_name>

HA Service

Running the vfe-vhostd-ha service allows the datapath to persist should vfe-vhostd crash:

[host]# systemctl start vfe-vhostd-ha

Simple Live Migration
  1. Prepare two identical hosts and perform the provisioning of the virtio device to DPDK on both.

  2. Boot the VM on one server, then live-migrate it to the other server:

    [host]# virsh migrate --verbose --live --persistent <vm_name> qemu+ssh://<dest_node_ip_addr>/system --unsafe
    
    


Remove Device

When finished with the virtio devices, use the following commands to remove them from DPDK:

[host]# /usr/local/bin/vfe-vhost-cli vf -r <vf_bdf>
[host]# /usr/local/bin/vfe-vhost-cli mgmtpf -r <pf_bdf>

During live migration, the device state may change temporarily. As a result, Linux NetworkManager may reset the associated network interface properties (e.g., IP address).

To prevent NetworkManager from managing a specific interface, run: 

nmcli device set {device-interface} managed no

Live Migration Using VFIO With Full Emulation

Virtio VF PCIe devices can be attached to the guest VM using the virtio-vfio-pci driver. This enables performing live migration of guest VMs.

This section demonstrates how to perform basic live migration of a QEMU VM with a virtio VF assigned to it. It does not explain how to create VMs using libvirt or directly via QEMU.

image-2025-4-16_19-10-46.png

Prerequisites
  1. Minimum Hypervisor kernel version - Linux kernel 6.13-rc2 with virtio_vfio_pci and IOMMU dirty page tracking

  2. Minimum qemu version - 9.1

  3. Minimum libvirt version - 9.2

DPU Configuration
  1. Install new virtio-net-controller (version 25.04 or newer) on source and destination systems.

  2. Add the following flags on the source and destination systems.

    [dpu]# vim /opt/mellanox/mlnx_virtnet/virtnet.conf
    {
      ...
      "virtio_spec_admin_legacy": 1,
      "virtio_spec_admin_lm": 1
    }
    

  3. Restart the controller

  4. Provision device attributes (after loading the virtio_vfio_pci driver and before starting the VM).

  5. Get the MAC of the source device

    [dpu]# virtnet query -p $pf_id -v $vf_id | grep "\"mac"
    

  6. Set the MAC of the destination device

    [dpu]# virtnet modify -p $dst_pf_id -v $dst_vf_id device -m $mac
    

Kernel Configuration

The kernel must be compiled with the virtio_vfio_pci driver enabled (i.e., CONFIG_VIRTIO_VFIO_PCI).

To load the driver, run:

[host]# modprobe virtio_vfio_pci

QEMU Configuration
  1. QEMU must be compiled with VFIO_PCI enabled (this is enabled by default).

  2. Add the following to qemu.conf:

    user = "root"
    group = "root"
    cgroup_device_acl = [
        "/dev/null", "/dev/full", "/dev/zero",
        "/dev/random", "/dev/urandom",
        "/dev/ptmx", "/dev/kvm",
        "/dev/iommu", "/dev/vfio/devices/vfio0",
        "/dev/vfio/devices/vfio1"
    ]
    

  3. Restart libvirt

Host Preparation

As stated earlier, creating the VMs is beyond the scope of this guide, and we assume that they have already been created. However, the VM configuration should be a migratable configuration, similarly to how it is done without virtio VFs.

The steps below should be done before running the VMs.

  1. Create the VFs that will be assigned to the VMs.

    [host]# echo "<NUM_OF_VFS>" > /sys/bus/pci/devices/<PF_BDF>/sriov_numvfs
    

  2. Unbind the VFs from virtio-pci, run:

    [host]# echo '<VF_BDF>' > /sys/bus/pci/drivers/virtio-pci/unbind
    

  3. Assign the VFs to the VMs:

    1. Edit the VMs XML file, run:

      [host]# virsh edit <VM_NAME>

    2. Enable qemu:commandline in VM XML by adding the xmlns:qemu option:

      <domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>

    3. Assign the VFs to the VM by adding the following under the device tag:

      <hostdev mode='subsystem' type='pci' managed='no'>
        <driver name='vfio'/>
        <source>
          <address domain='0x0000' bus='0xb1' slot='0x00' function='0x4'/>
        </source>
        <alias name='hostdev0'/>
        <address type='pci' domain='0x0000' bus='0x01' slot='0x01' function='0x0'/>
      </hostdev>

      The domain, bus, slot, and function values above are dummy values; replace them with your VFs values.

  4. Set the source VM:

    1. Edit the source VM XML file, run:

      [host]# virsh edit <VM_NAME>

    2. Set up the source VM by adding the following under domain tag:

      <qemu:commandline>
        <qemu:arg value='-object'/>
        <qemu:arg value='iommufd,id=iommufd0'/>
        <qemu:arg value='-snapshot'/>
      </qemu:commandline>
      <qemu:override>
        <qemu:device alias='hostdev0'>
          <qemu:frontend>
            <qemu:property name='enable-migration' type='string' value='on'/>
            <qemu:property name='iommufd' type='string' value='iommufd0'/>
          </qemu:frontend>
        </qemu:device>
      </qemu:override>

      To save the file, the above "xmlns:qemu" attribute of the "domain" tag must also be added.

  5. Set the destination VM in incoming mode:

    1. Edit the destination VM XML file, run:

      [host]# virsh edit <VM_NAME>

    2. Set the destination VM in migration incoming mode by adding the following under the domain tag:

      <domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
        [...]
        <qemu:commandline>
          <qemu:arg value='--incoming'/>
          <qemu:arg value='tcp:<DEST_IP>:<DEST_PORT>'/>
          <qemu:arg value='-object'/>
          <qemu:arg value='iommufd,id=iommufd0'/>
        </qemu:commandline>
        <qemu:override>
          <qemu:device alias='hostdev0'>
            <qemu:frontend>
              <qemu:property name='enable-migration' type='string' value='on'/>
              <qemu:property name='iommufd' type='string' value='iommufd0'/>
            </qemu:frontend>
          </qemu:device>
        </qemu:override>
      </domain>

      To save the file, the above "xmlns:qemu" attribute of the "domain" tag must also be added.

  6. Bind the VFs to virtio_vfio_pci driver:

    1. Detach the VFs from libvirt management, run:

      [host]# virsh nodedev-detach pci_<VF_BDF>

    2. Unbind the VFs from vfio-pci driver (the VFs are automatically bound to it after running "virsh nodedev-detach"), run:

      [host]# echo '<VF_BDF>' > /sys/bus/pci/drivers/vfio-pci/unbind

    3. Set driver override, run:

      [host]# echo 'virtio_vfio_pci' > /sys/bus/pci/devices/<VF_BDF>/driver_override

    4. Bind the VFs to virtio_vfio_pci driver, run:

      [host]# echo '<VF_BDF>' > /sys/bus/pci/drivers/virtio_vfio_pci/bind

Running the Migration
  1. Start the VMs in source and in destination, run:

    [host]# virsh start <VM_NAME>
    

  2. Enable switchover-ack QEMU migration capability. Run the following commands both in the source and the destination:

    [host]# virsh qemu-monitor-command <VM_NAME> --hmp "migrate_set_capability return-path on"
    [host]# virsh qemu-monitor-command <VM_NAME> --hmp "migrate_set_capability switchover-ack on"
    

  3. [Optional] Configure the migration bandwidth and downtime limit on the source side:

    [host]# virsh qemu-monitor-command <VM_NAME> --hmp "migrate_set_parameter max-bandwidth <VALUE>"
    [host]# virsh qemu-monitor-command <VM_NAME> --hmp "migrate_set_parameter downtime-limit <VALUE>"
    

  4. Start migration by running the migration command on the source side:

    [host]# virsh qemu-monitor-command <VM_NAME> --hmp "migrate -d tcp:<DEST_IP>:<DEST_PORT>"
    

  5. Check the migration status by running the info command on the source side:

[host]# virsh qemu-monitor-command <VM_NAME> --hmp "info migrate"

When the migration status is "completed", the migration has finished successfully.

During live migration, the device state may change temporarily. As a result, Linux NetworkManager may reset the associated network interface properties (e.g., IP address).

To prevent NetworkManager from managing a specific interface, run: 

nmcli device set {device-interface} managed no

Live Update

Live update minimizes network interface downtime by performing online upgrade of the virtio-net controller without necessitating a full restart.

Requirements

To perform a live update, the user must install a newer version of the controller either using the rpm or deb package (depending on the OS distro used). Run:

For Ubuntu/Debian

[dpu]# dpkg --force-all -i virtio-net-controller-x.y.z-1.mlnx.aarch64.deb

For CentOS/RedHat

[dpu]# rpm -Uvh virtio-net-controller-x.y.z-1.mlnx.aarch64.rpm --force

Check Versions

Before starting live update, the following command can be used to check the version of the original and destination controllers:

[dpu]# virtnet version
  {
    "Original Controller": "v24.10.13"
  },
  {
    "Destination Controller": "v24.10.16"
  }
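
A script that automates the upgrade may want to confirm that the destination controller is newer than the original one before starting. The sketch below compares two version strings in the format shown above; the function names are hypothetical and the parsing assumes a dotted numeric version with an optional leading "v".

```python
import re


def parse_version(text: str) -> tuple:
    """Turn a string such as 'v24.10.13' into a comparable tuple (24, 10, 13)."""
    match = re.fullmatch(r"v?(\d+(?:\.\d+)*)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def is_upgrade(original: str, destination: str) -> bool:
    """True when the destination controller is strictly newer than the original."""
    return parse_version(destination) > parse_version(original)


print(is_upgrade("v24.10.13", "v24.10.16"))  # True
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as treating "24.9" as newer than "24.10".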

Start Updating

If no errors occur, issue the following command to start the live update process:

[dpu]# virtnet update -s

If an error indicates that the update command is unsupported, this means the controller version you are attempting to install is outdated. Reinstalling the correct version resolves the issue.

Check Status

During the update process, the following command may be used to check the update status:

[dpu]# virtnet update -t

Example output:

{
  "status": "inactive",                       # updating status, whether live update is finished or ongoing
  "last live update status": "success",       # last live update status
  "time_used (s)": 1.655439                   # time cost for last live update
}

During the update, it is recommended to not issue any virtnet CLI command.

When the update process completes successfully, the virtnet update -t command reflects the status accordingly.

If a device is actively migrating, virtnet commands report "migrating" for that specific device so that the user can retry later.
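
The status output can be checked programmatically once the annotations shown in the example are left out. The following is a minimal sketch, assuming the command prints plain JSON with the three keys shown above and that "inactive" means no update is currently running; other status values are not listed in this guide, so the sketch does not try to interpret them.

```python
import json


def live_update_finished(status_json: str) -> bool:
    """Assumption: "inactive" means no live update is running, as in the example output."""
    return json.loads(status_json)["status"] == "inactive"


def last_update_succeeded(status_json: str) -> bool:
    """True when the last recorded live update ended with "success"."""
    return json.loads(status_json)["last live update status"] == "success"


sample = """
{
  "status": "inactive",
  "last live update status": "success",
  "time_used (s)": 1.655439
}
"""
print(live_update_finished(sample), last_update_succeeded(sample))  # True True
```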

Limitation

When live update is in progress, hotplug/unplug and VF creation/deletion are not supported.

Mergeable Rx Buffer

The Mergeable Rx Buffer is a receive-side-only performance enhancement. When successfully negotiated with the driver, this feature allows the device to utilize multiple descriptors to accommodate a single jumbo-sized packet received from the network. It significantly improves memory utilization and throughput in environments configured for large Maximum Transmission Units (MTUs), such as 9K jumbo frames.

Configuration

Administrators control this feature using the VIRTIO_NET_F_MRG_RXBUF (bit 15) feature flag.

  • Default State: Disabled.

  • Scope: Can be enabled on a per-device basis. 

Refer to the "DOCA Virtio-net Service Guide | Enabling and Disabling Virtio net Features" section for the exact virtnet modify command syntax.

Limitations

Before enabling the Mergeable Rx Buffer, carefully review the following environmental constraints:

| Limitation | Description |
| --- | --- |
| Strict MTU ceiling | The absolute maximum supported MTU when utilizing this feature is 9000. The feature will fail to operate if the MTU is set to 9216. |
| Performance degradation (standard MTU) | Because the number of descriptors per Work Queue Entry (WQE) depends on the MTU size, enabling this feature with a default MTU (1500) is not recommended and will negatively impact performance. |
| Performance degradation (small packets) | The system will experience a performance drop when processing high rates of small-sized packets (e.g., 64 bytes) from the wire. Reserve this feature exclusively for heavy jumbo-frame traffic. |
| Feature incompatibility | The Mergeable Rx Buffer feature is strictly incompatible with the Packed Virtqueue feature (VIRTIO_F_RING_PACKED). You cannot enable both simultaneously on the same device. |
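
The two hard constraints above (MTU ceiling and packed-virtqueue incompatibility) can be expressed as a pre-flight check on a planned feature mask. This is a sketch only: bit 15 comes from this section, bit 34 for VIRTIO_F_RING_PACKED comes from the virtio specification, and the function name and message strings are hypothetical.

```python
VIRTIO_NET_F_MRG_RXBUF = 15   # bit number given in this section
VIRTIO_F_RING_PACKED = 34     # bit number from the virtio 1.1 specification
MRG_RXBUF_MAX_MTU = 9000      # ceiling stated in the limitations table


def mrg_rxbuf_problems(features: int, mtu: int) -> list:
    """Return reasons why mergeable Rx buffers would not work with this setup."""
    problems = []
    if not features & (1 << VIRTIO_NET_F_MRG_RXBUF):
        return problems  # feature not requested, nothing to check
    if features & (1 << VIRTIO_F_RING_PACKED):
        problems.append("VIRTIO_F_RING_PACKED is also set")
    if mtu > MRG_RXBUF_MAX_MTU:
        problems.append(f"MTU {mtu} exceeds {MRG_RXBUF_MAX_MTU}")
    return problems


features = (1 << VIRTIO_NET_F_MRG_RXBUF) | (1 << VIRTIO_F_RING_PACKED)
print(mrg_rxbuf_problems(features, 9216))
```

An empty list means neither hard constraint is violated; the performance caveats for standard MTU and small packets still need to be weighed separately.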

Performance Tuning

Number of Queues and MSIX

Driver Configuration

The virtio-net driver can configure the number of combined channels via ethtool. This determines how many virtqueues (VQs) can be used for the netdev. Normally, more VQs result in better overall throughput when multi-threaded (e.g., iPerf with multiple streams).

[host]# ethtool -l eth0
Channel parameters for eth0:
Pre-set maximums:
RX:             n/a
TX:             n/a
Other:          n/a
Combined:       31
Current hardware settings:
RX:             n/a
TX:             n/a
Other:          n/a
Combined:       15

Therefore, it is common to pick a larger number of channels (up to the pre-set maximum) using the following command.

Normally, configuring the combined number of channels to be the same as number of CPUs available on the guest OS will yield good performance.

[host]# ethtool -L eth0 combined 31
[host]# ethtool -l eth0
Channel parameters for eth0:
Pre-set maximums:
RX:             n/a
TX:             n/a
Other:          n/a
Combined:       31
Current hardware settings:
RX:             n/a
TX:             n/a
Other:          n/a
Combined:       31
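
The channel selection described above can be sketched as a small helper that reads the ethtool -l text and suggests a combined channel count. The parsing assumes the output layout shown in this section (two "Combined:" lines, maximum first); the function names are hypothetical.

```python
import re


def parse_combined(ethtool_output: str) -> tuple:
    """Return (pre-set maximum, current) combined channel counts from `ethtool -l` text."""
    values = [int(v) for v in re.findall(r"^Combined:\s+(\d+)\s*$", ethtool_output, re.M)]
    if len(values) != 2:
        raise ValueError("expected exactly two 'Combined:' lines")
    return values[0], values[1]


def suggested_channels(max_combined: int, guest_cpus: int) -> int:
    """Match the guest CPU count without exceeding the pre-set maximum."""
    return max(1, min(max_combined, guest_cpus))


sample = """Channel parameters for eth0:
Pre-set maximums:
RX:             n/a
TX:             n/a
Other:          n/a
Combined:       31
Current hardware settings:
RX:             n/a
TX:             n/a
Other:          n/a
Combined:       15
"""
maximum, current = parse_combined(sample)
print(maximum, current, suggested_channels(maximum, guest_cpus=16))  # 31 15 16
```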

Device Configuration

To reach the best performance, it is required to make sure each tx/rx queue has an assigned MSIX. Check the information of a particular device and make sure num_queues is less than num_msix.

[dpu]# virtnet query -p 0 -b | grep -i num_
      "num_msix": "64",
      "num_queues": "8",

If num_queues is greater than num_msix, change the mlxconfig settings to reserve more MSIX than queues. The number of MSIX is determined by VIRTIO_NET_EMULATION_NUM_VF_MSIX and VIRTIO_NET_EMULATION_NUM_MSIX. Refer to the "DOCA Virtio-net Service Guide | Virtio net Deployment" page for more information.
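
The comparison of num_queues against num_msix can be automated on the filtered query output. The sketch below assumes the two fields appear as quoted numbers, as in the example above; the function name is hypothetical.

```python
import re


def msix_covers_queues(query_output: str) -> bool:
    """True when the device reports fewer queues than MSIX vectors."""
    fields = dict(re.findall(r'"(num_msix|num_queues)":\s*"(\d+)"', query_output))
    return int(fields["num_queues"]) < int(fields["num_msix"])


sample = '''
      "num_msix": "64",
      "num_queues": "8",
'''
print(msix_covers_queues(sample))  # True
```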

Queue Depth

By default, queue depth is set to 256. It is common to use a larger queue depth (e.g., 1024). This cannot be requested from the driver side but must be done from the device side.

Refer to the "DOCA Virtio-net Service Guide | Virtnet CLI Commands" page to learn how to modify device max_queue_size.

MTU

To improve performance, the user can use jumbo MTU. Refer to "DOCA Virtio-net Service Guide | Jumbo MTU" page for information regarding MTU configuration.

Device State Recovery

The recovery process is critical for restoring both control plane and data plane statuses during disruptive events, such as a controller restart, a live update, or a live migration.

The system relies on persistent JSON files stored in the /opt/mellanox/mlnx_virtnet/recovery directory. Each physical function (PF) or virtual function (VF) device maintains a corresponding recovery file, uniquely named after the device's VUID.

Recovery File Structure

The controller saves the following configuration states to the recovery file and automatically restores them when necessary:

| Entry Name | Type | Description |
| --- | --- | --- |
| port_ib_dev | String | The RDMA device name on which the virtio-net device is created. |
| pf_id | Number | The ID of the Physical Function (PF). |
| vf_id | Number | The ID of the Virtual Function (VF). (Valid for VFs only). |
| function_type | String | Identifies the function as either a pf or vf. |
| bdf_raw | Number | The virtio-net device Bus:Device:Function (BDF) represented as a uint16 type. |
| device_type | String | Specifies if the device is static or hotplug. (Valid for PFs only). |
| mac | String | The MAC address of the device. |
| pf_num | Number | The PCIe function number. |
| sf_num | Number | The Sub-Function (SF) number utilized for this virtio-net device. |
| mq | Number | The number of multi-queues created for this virtio-net device. |
| rx_mode | Number | A 32-bit value representing reception modes. Bits 0–5 correspond to: 0: Promiscuous; 1: All multicast; 2: All unicast; 3: No multicast; 4: No unicast; 5: No broadcast. |
| vlan_tags | Array | An array storing VLAN IDs (0–4095) configured for VLAN filtering. |
| mac_table | Array | Contains MAC addresses for filtering, alongside metadata (e.g., entry count, first multicast index, and overflow flags). |
| offloads | Number | Stores the active state of hardware offload features (e.g., VIRTIO_NET_F_GUEST_CSUM). |
| sf_parent_device | String | The Sub-Function (SF) parent device identifier. |
| rx_mode_cmd | Number | The RX mode command state. |
| mac_cmd | Number | The MAC command state. |
| vlan_table_cmd | Number | The VLAN table command state. |
| announce_cmd | Number | The Announce command state. |
| announce | Number | The Announce flag status. |
| lm_status | Number | The current Live Migration status. |
| dev_mode | Number | The operational device mode. |
| dirty_log | String | Dirty log tracking information (vital for live migration). |
| transitional | Number | Transitional mode flags. |
| sys_path | String | The system path required for SF restoration. |
| sys_name | String | The system name required for SF restoration. |
| hash_cfg | Object | The active Hash configuration. |
| pkt_cnt | Number | Packet count mode flags. |
| net_dim | Number | Net-DIM (Dynamic Interrupt Moderation) mode flags. |
| aarfs | Number | aRFS (Accelerated Receive Flow Steering) mode flags. |
| hp_host_awareness_mode | Number | Hotplug host awareness mode configuration (e.g., AB mode status). |
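
To make the rx_mode bit layout concrete, the sketch below decodes an rx_mode value into the mode names listed for bits 0 to 5. It is meant for reading a recovery file during troubleshooting, not for editing it; the function name is hypothetical.

```python
# Bit positions as listed in the rx_mode description.
RX_MODE_BITS = {
    0: "Promiscuous",
    1: "All multicast",
    2: "All unicast",
    3: "No multicast",
    4: "No unicast",
    5: "No broadcast",
}


def decode_rx_mode(rx_mode: int) -> list:
    """Return the names of the reception modes whose bits are set."""
    return [name for bit, name in RX_MODE_BITS.items() if rx_mode & (1 << bit)]


print(decode_rx_mode(0b100001))  # ['Promiscuous', 'No broadcast']
```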

Example Recovery File

The following JSON payload illustrates a standard recovery file for a hotplugged PF device:

{
  "port_ib_dev": "mlx5_0",
  "pf_id": 0,
  "function_type": "pf",
  "bdf_raw": 57611,
  "device_type": "hotplug",
  "mac": "0c:c4:7a:ff:22:93",
  "pf_num": 0,
  "sf_num": 2000,
  "mq": 3
}
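
The bdf_raw value can be turned back into the familiar bus:device.function form. The sketch below assumes the conventional PCI packing of a 16-bit BDF (8-bit bus, 5-bit device, 3-bit function); this guide states only that the value is a uint16, so the layout is an assumption.

```python
def decode_bdf(bdf_raw: int) -> str:
    """Decode a 16-bit BDF assuming the layout bus[15:8], device[7:3], function[2:0]."""
    if not 0 <= bdf_raw <= 0xFFFF:
        raise ValueError("bdf_raw must fit in 16 bits")
    bus = bdf_raw >> 8
    device = (bdf_raw >> 3) & 0x1F
    function = bdf_raw & 0x7
    return f"{bus:02x}:{device:02x}.{function:x}"


print(decode_bdf(57611))  # e1:01.3 under the assumed layout
```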

Use Cases

Depending on the actions of the BlueField or host, recovery may or may not be performed. Please refer to the following table for individual scenarios:


| Device | Restart Controller | Live Update | Hot Unplug | Destroy VFs | Unload Driver | Power Cycle Host & DPU | Warm Reboot | Live Migration |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Static PF | Recover | Recover | N/A | N/A | Recover | No recover | Recover | Recover |
| Hotplug PF | Recover | Recover | No recover | N/A | Recover | No recover | Recover | Recover |
| VF | Recover | Recover | N/A | Recovery file deleted | No recover | No recover | No recover | Recover |

The columns cover both DPU actions and host actions.

These recovery files are internal to the controller and should not be modified.

Controller recovery is enabled by default and does not need user configuration or intervention. When the mlxconfig settings used by the controller take effect, the newly started controller service automatically deletes all recovery files.

Recovery File Validation and Corruption Handling

During startup, the controller strictly validates the integrity of all stored JSON recovery files before applying any state restorations. If the controller detects corrupted, malformed, or invalid data in any single recovery file during the startup sequence, it will automatically purge all recovery files in the directory and perform a fresh restart.
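
The all-or-nothing policy described above can be illustrated with a short sketch. This is not the controller's implementation: it checks only that each file parses as a JSON object, whereas the checks the controller actually applies are not documented in this guide. The function name and the file names `vuid_a` and `vuid_b` are hypothetical, and the example runs in a temporary directory rather than the real recovery directory.

```python
import json
import tempfile
from pathlib import Path


def load_or_purge(recovery_dir: Path) -> dict:
    """Parse every file in the directory; if any one is invalid, delete them all."""
    files = sorted(p for p in recovery_dir.iterdir() if p.is_file())
    states = {}
    try:
        for path in files:
            state = json.loads(path.read_text())
            if not isinstance(state, dict):
                raise ValueError(f"{path.name}: top level is not a JSON object")
            states[path.name] = state
    except ValueError:  # also covers json.JSONDecodeError and bad encodings
        for path in files:
            path.unlink()
        return {}
    return states


with tempfile.TemporaryDirectory() as tmp:
    directory = Path(tmp)
    (directory / "vuid_a").write_text('{"pf_id": 0, "function_type": "pf"}')
    (directory / "vuid_b").write_text('{"pf_id": 0, "vf_id":')  # truncated on purpose
    print(load_or_purge(directory))   # {}
    print(list(directory.iterdir()))  # []
```

Purging everything on a single bad file trades recovery of the healthy devices for a guaranteed consistent starting state.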

Transitional Device

A transitional device is a virtio device which supports drivers conforming to virtio specification 1.x and legacy drivers operating under virtio specification 0.95 (i.e., legacy mode) so servers with old Linux kernels can still utilize virtio-based technology. 
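
The PCI IDs that appear in this guide (1af4:1041 and 1af4:1000) reflect this distinction. According to the virtio specification, virtio PCI devices use vendor ID 0x1AF4, transitional devices use device IDs 0x1000 to 0x103F, and modern devices use 0x1040 plus the virtio device type. The sketch below classifies a vendor:device string as printed by lspci -n; the function name is hypothetical.

```python
VIRTIO_VENDOR_ID = 0x1AF4


def classify_virtio_pci_id(vendor_device: str) -> str:
    """Classify an 'vvvv:dddd' PCI ID using the ranges from the virtio specification."""
    vendor, device = (int(part, 16) for part in vendor_device.split(":"))
    if vendor != VIRTIO_VENDOR_ID:
        return "not virtio"
    if 0x1000 <= device <= 0x103F:
        return "transitional"
    if 0x1040 <= device <= 0x107F:
        return "modern"
    return "not virtio"


print(classify_virtio_pci_id("1af4:1000"))  # transitional
print(classify_virtio_pci_id("1af4:1041"))  # modern
```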

Currently, only transitional VF devices are supported.

Host kernel version must be newer than v6.9.

When using this feature, vfe-vdpa-dpdk solutions cannot be used anymore, including vfe-vdpa-dpdk live migration.

Libvirt does not support the virtio_vfio_pci kernel driver. Use the QEMU command line to start the VM instead.

Transitional Virtio-net VF Device

  1. Configure virtio-net SR-IOV. Refer to "DOCA Virtio-net Service Guide | Virtio net Deployment" for details.

  2. Modify the configuration file to add the "virtio_spec_admin_legacy": 1 option.

    [dpu]# cat /opt/mellanox/mlnx_virtnet/virtnet.conf
    {
    ...
    "virtio_spec_admin_legacy": 1,
    ...
    }
    
    


  3. Restart the virtio-net controller for the configuration to take effect:

    [dpu]# systemctl restart virtio-net-controller.service
    
    


  4. Create virtio-net VF devices on the host:

    [host]# modprobe -r virtio_net
    [host]# modprobe -r virtio_pci
    [host]# modprobe virtio_net
    [host]# modprobe virtio_pci
    [host]# echo <vf_num> > /sys/bus/pci/devices/<pf_bdf>/sriov_numvfs
    
    


  5. Bind the VF devices with the virtio_vfio_pci kernel driver:

    [host]# echo <vf_bdf> > /sys/bus/pci/devices/<vf_bdf>/driver/unbind
    [host]# modprobe -v virtio_vfio_pci
    [host]# echo 0x1af4 0x1041 > /sys/bus/pci/drivers/virtio_vfio_pci/new_id
    [host]# lspci -s <vf_bdf> -vvv | grep -i virtio_vfio_pci
    Kernel driver in use: virtio_vfio_pci
    
    


  6. Add the following option into the QEMU command line to passthrough the VF device into the VM:

    -device vfio-pci,host=<vf_bdf>,id=hostdev0,bus=pci.<#BUS_IN_VM>,addr=<#FUNC_IN_VM>
    
    


  7. Load the virtio-net driver in legacy mode inside the VM:

    [vm]# modprobe -r virtio_net
    [vm]# modprobe -r virtio_pci
    [vm]# modprobe virtio_pci force_legacy=1
    [vm]# modprobe virtio_net
    [vm]# lspci -s <vf_bdf_in_vm> -n
    00:0a.0 0200: 1af4:1000
    
    


  8. Verify that the VF is a transitional device:

    [dpu]# virtnet query -p <pf_id> -v <vf_id> | grep transitional
          "transitional": 1,
    
    


VF Dynamic MSIX

In virtio-net controller, each VF gets the same number of MSIX and virtqueues (VQs) so that each data VQ has an MSIX assigned. This means that changing the number of MSIX updates the number of VQs.

By default, each VF is assigned the same number of MSIX. The default number is determined by the minimum of NUM_VF_MSIX and VIRTIO_NET_EMULATION_NUM_MSIX.

Using dynamic VF MSIX, a VF can be assigned with more MSIX/queues than its default. MSIX hardware resources of all VF devices are managed by PF via a shared MSIX pool. The user can reduce the MSIX of one VF, thus releasing its MSIX resources to the shared pool. On the other hand, another VF can be assigned with more MSIX than its default to gain more performance.

image-2024-4-18_15-17-0.png

Firmware Configuration

The emulation VF device uses VIRTIO_NET_EMULATION_NUM_VF_MSIX to set the MSIX number. For emulation VF devices, this new configuration is used instead of the old NUM_VF_MSIX configuration.

  • If VIRTIO_NET_EMULATION_NUM_VF_MSIX!=0, VIRTIO_NET_EMULATION_NUM_MSIX is used for the PF only, and VF uses VIRTIO_NET_EMULATION_NUM_VF_MSIX.

    For example, to configure the default MSIX number for a VF to 32:

    [dpu]# mlxconfig -y -d 03:00.0 s VIRTIO_NET_EMULATION_NUM_MSIX=32 VIRTIO_NET_EMULATION_NUM_VF_MSIX=32
    
    


  • If VIRTIO_NET_EMULATION_NUM_VF_MSIX==0, VIRTIO_NET_EMULATION_NUM_MSIX is used for the PF and VF.

The default number of MSIX for each VF is determined by minimum(NUM_VF_MSIX, VIRTIO_NET_EMULATION_NUM_MSIX). For example, to configure the default MSIX number for a VF to 32:

[dpu]# mlxconfig -y -d 03:00.0 s VIRTIO_NET_EMULATION_NUM_MSIX=32 NUM_VF_MSIX=32

Power cycle the BlueField and host for the mlxconfig settings to take effect.
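
One way to read the firmware rules in this section is as a single selection function. The sketch below is an interpretation of the text, not firmware behavior confirmed elsewhere: when VIRTIO_NET_EMULATION_NUM_VF_MSIX is non-zero it is used for VFs, otherwise the default falls back to the minimum of NUM_VF_MSIX and VIRTIO_NET_EMULATION_NUM_MSIX. The function and parameter names are hypothetical.

```python
def default_vf_msix(emu_num_vf_msix: int, emu_num_msix: int, num_vf_msix: int) -> int:
    """Default MSIX count per emulated VF, following one reading of the rules above."""
    if emu_num_vf_msix != 0:
        # VIRTIO_NET_EMULATION_NUM_VF_MSIX takes precedence for VFs when set.
        return emu_num_vf_msix
    # Fallback: minimum(NUM_VF_MSIX, VIRTIO_NET_EMULATION_NUM_MSIX).
    return min(num_vf_msix, emu_num_msix)


print(default_vf_msix(emu_num_vf_msix=32, emu_num_msix=32, num_vf_msix=11))  # 32
print(default_vf_msix(emu_num_vf_msix=0, emu_num_msix=32, num_vf_msix=32))   # 32
```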

MSIX

MSIX Capability

The MSIX pool for VFs is managed by their PF. To check the shared pool size, run the following command (using PF 0 as example):

[dpu]# virtnet list | grep -i '"pf_id": 0' -A 8 | grep -i msix_num_pool_size

By default, the shared pool size is empty (0), since all MSIX resources have already been allocated to VFs evenly. Upon reducing the MSIX of one or more VFs, the reduced MSIX is released back to the pool.

However, the number of MSIX that can be assigned to a given VF is also bounded by capability. To check those caps, run the following commands:

[dpu]# virtnet list | grep -i '"pf_id": 0' -A 10 | grep -i max_msix_num
[dpu]# virtnet list | grep -i '"pf_id": 0' -A 10 | grep -i min_msix_num

To check the currently assigned number of MSIX, run the following command:

[dpu]# virtnet query -p 0 -v 0 | grep num_msix

If num_msix is less than max_msix_num cap, more MSIX can be assigned to the VF.

Reallocating VF MSIX

To allocate more MSIX to one VF, MSIX must be available in the pool. This is done by reducing the MSIX of one or more other VFs.

The following example shows the steps to reallocate MSIX from VF1 to VF0, assuming that each VF has 32 MSIX available as default:

  1. Unbind both VF devices from host driver.

    [host]# echo <vf0_bdf> > /sys/bus/pci/drivers/virtio-pci/unbind
    [host]# echo <vf1_bdf> > /sys/bus/pci/drivers/virtio-pci/unbind
    
    
  2. Reduce the MSIX of VF1.

    [dpu]# virtnet modify -p 0 -v 1 device -n 4
    
    
  3. Check pool size of PF0.

    [dpu]# virtnet list | grep -i '"pf_id": 0' -A 8 | grep -i msix_num_pool_size
    
    

    Confirm that the released MSIX have been added to the shared pool.

  4. Increase the MSIX of VF0.

    [dpu]# virtnet modify -p 0 -v 0 device -n 48
    
    
  5. Check the MSIX of VF0.

    [dpu]# virtnet query -p 0 -v 0 | grep -i num_msix
    
    
  6. Bind both VF devices to host driver.

    [host]# echo <vf0_bdf> > /sys/bus/pci/drivers/virtio-pci/bind
    [host]# echo <vf1_bdf> > /sys/bus/pci/drivers/virtio-pci/bind
    
    

    The number of MSIX must be an even number no smaller than 4.
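The pool accounting behind these steps can be illustrated with a toy model. This is a sketch under stated assumptions, not the controller's implementation: the MsixPool class and its modify method are hypothetical names, and the model ignores the max_msix_num and min_msix_num caps.

```python
class MsixPool:
    """Toy model of a PF's shared MSIX pool (not the controller's real logic)."""

    def __init__(self, vf_msix: dict, pool: int = 0):
        self.vf_msix = dict(vf_msix)  # VF index -> currently assigned MSIX
        self.pool = pool              # unassigned MSIX (msix_num_pool_size)

    def modify(self, vf: int, new_count: int) -> None:
        """Mimic `virtnet modify -p <pf> -v <vf> device -n <new_count>`."""
        delta = new_count - self.vf_msix[vf]
        if delta > self.pool:
            raise ValueError(f"need {delta} MSIX but only {self.pool} in pool")
        self.pool -= delta  # a negative delta releases MSIX back to the pool
        self.vf_msix[vf] = new_count


if __name__ == "__main__":
    pf0 = MsixPool({0: 32, 1: 32})  # each VF starts with 32, pool is empty
    pf0.modify(1, 4)                # step 2: VF1 32 -> 4 releases 28
    print(pf0.pool)                 # 28
    pf0.modify(0, 48)               # step 4: VF0 32 -> 48 consumes 16
    print(pf0.pool, pf0.vf_msix)    # 12 {0: 48, 1: 4}
```

In this model, trying to grow VF0 before shrinking VF1 raises an error because the pool starts empty, which is why the reduction comes first in the procedure.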

MSIX Limitations

  • MSIX and QP configurations are mutually exclusive (i.e., only one of them can be configured at a time). For example, the following modify command should result in failure:

    [dpu]# virtnet modify -p 0 -v 1 device -qp 2 -n 6
    
    
  • To use a VF, make sure to assign a valid MSIX number:

    [dpu]# virtnet modify -p 0 -v 1 device -n 10
    
    

    The minimum number of MSIX resources required for the VF to load the host driver is 4 if VIRTIO_NET_F_CTRL_VQ is negotiated, or 2 if it is not.

  • The MSIX resources of a VF can be reduced to 0, but doing so prevents the VF from functioning.

    [dpu]# virtnet modify -p 0 -v 1 device -n 0
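The limitations above can be collected into a small pre-check that runs before a modify command is issued. This is an illustrative sketch; the function names are hypothetical and the checks only restate the rules listed in this section.

```python
from typing import Optional


def min_msix_for_driver(ctrl_vq_negotiated: bool) -> int:
    """Minimum MSIX for the host driver to load: 4 with VIRTIO_NET_F_CTRL_VQ, else 2."""
    return 4 if ctrl_vq_negotiated else 2


def vf_can_load_driver(num_msix: int, ctrl_vq_negotiated: bool) -> bool:
    """True if the VF has enough MSIX for the host driver to load."""
    return num_msix >= min_msix_for_driver(ctrl_vq_negotiated)


def check_modify_args(msix: Optional[int] = None, qp: Optional[int] = None) -> None:
    """Reject a request that sets both -n (MSIX) and -qp in a single command."""
    if msix is not None and qp is not None:
        raise ValueError("-n (MSIX) and -qp cannot be combined in one modify command")
```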
    
    

Queue Pairs

The number of queue pairs (QPs) is the number of data virtio queue (VQ) pairs. Each VQ pair has one transmit (TX) queue and one receive (RX) queue. These pairs are dedicated to handling data traffic and do not include control or admin VQs.
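As a quick illustration of the QP-to-VQ relationship, the sketch below counts the data VQs for a given number of QPs. The function names are hypothetical, and the index layout (receive queue at an even index, transmit queue at the following odd index) is taken from the virtio-net queue numbering in the VIRTIO specification rather than from this guide.

```python
def data_vq_count(queue_pairs: int) -> int:
    """Each QP contributes one RX and one TX virtqueue; control/admin VQs are excluded."""
    if queue_pairs < 0:
        raise ValueError("queue_pairs must be non-negative")
    return 2 * queue_pairs


def data_vq_layout(queue_pairs: int) -> list:
    """Return (rx_index, tx_index) for each pair: receiveqN at 2(N-1), transmitqN at 2(N-1)+1."""
    return [(2 * n, 2 * n + 1) for n in range(queue_pairs)]
```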

QP Capability

The QP pool for VFs is managed by their PF.

To check the shared pool size, run the following command (using PF 0 as example):

[dpu]# virtnet list | grep -i '"pf_id": 0' -A 13 | grep -i qp_pool_size

By default, the shared pool size is empty (0), since all QP resources have already been allocated to VFs evenly. Upon reducing the QP of one or more VFs, the reduced QP is released back into the pool.

However, the number of QPs assignable to a VF depends on its supported capabilities. To verify these capabilities, run the following command:

[dpu]# virtnet list | grep -i '"pf_id": 0' -A 12 | grep -i max_num_of_qp
[dpu]# virtnet list | grep -i '"pf_id": 0' -A 12 | grep -i min_num_of_qp

To check the currently assigned number of QPs, run the following command:

[dpu]# virtnet query -p 0 -v 0 | grep max_queue_pairs

If max_queue_pairs is less than the max_num_of_qp cap, then more QPs can be assigned to the VF.

Reallocating VF QPs

To allocate more QPs to one VF, there should be QPs available from the pool as explained in the previous section.

The following example illustrates the process of reallocating QPs from VF1 to VF0, assuming that each VF initially has 32 QPs available by default:

  1. Unbind both VF devices from the host driver:

    [host]# echo <vf0_bdf> > /sys/bus/pci/drivers/virtio-pci/unbind
    [host]# echo <vf1_bdf> > /sys/bus/pci/drivers/virtio-pci/unbind
    
    
  2. Reduce the number of QPs VF1 has:

    [dpu]# virtnet modify -p 0 -v 1 device -qp 1
    
    
  3. Check the pool size of PF0 and confirm that the reduced number of QPs are added to the shared pool:

    [dpu]# virtnet list | grep -i '"pf_id": 0' -A 13 | grep -i qp_pool_size
    
    
  4. Increase the number of QPs VF0 has:

    [dpu]# virtnet modify -p 0 -v 0 device -qp 23
    
    
  5. Check the number of QPs VF0 has:

    [dpu]# virtnet query -p 0 -v 0 | grep -i max_queue_pairs
    
    
  6. Bind both VF devices to the host driver:

    [host]# echo <vf0_bdf> > /sys/bus/pci/drivers/virtio-pci/bind
    [host]# echo <vf1_bdf> > /sys/bus/pci/drivers/virtio-pci/bind
    
    

    The number of QPs must be greater than 0.

QP Limitations

  • QP and MSIX configurations are mutually exclusive (i.e., only one of them can be configured at a time). For example, the following modify command should result in failure:

    [dpu]# virtnet modify -p 0 -v 1 device -qp 2 -n 6
    
    
  • To use a VF, assign it a valid QP number:

    [dpu]# virtnet modify -p 0 -v 1 device -qp 4
    
    

    The minimum number of QP resources which allows the VF to load the host driver is 1.

  • The QP resources of a VF can be reduced to 0. However, the VF would not be functional in this case.

    [dpu]# virtnet modify -p 0 -v 1 device -qp 0
    
    

Virt Queue Types

Virt queues (VQs) are the mechanism for bulk data transport on virtio devices. Each device can have zero or more VQs.

VQs can be in one of the following modes:

  • Split

  • Packed

When changing the supported VQ types, make sure to unload the guest driver first so the device can modify the supported feature bits.

Split VQ

Split VQ is currently the default VQ type. The split VQ format is the only format supported by version 1.0 of the virtio spec.

In split VQ mode, each VQ is separated into three parts:

  • Descriptor table – occupies the descriptor area

  • Available ring – occupies the driver area

  • Used ring – occupies the device area

Each of these parts is physically-contiguous in guest memory. Split VQ has a very simple design, but its sparse memory usage puts pressure on CPU cache utilization and requires several PCIe transactions for each descriptor.

Configuration

The following shows the relevant output of the virtnet list command when only split VQ mode is enabled:

 "supported_virt_queue_types": {
      "value": "0x1",
      "    0": "SPLIT"
    },
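The value field above is a bitmask. A minimal sketch of decoding it follows; bit 0 = SPLIT is taken from the output above, while bit 1 = PACKED is an assumption made for illustration and should be checked against the actual virtnet list output.

```python
# Bit 0 comes from the sample output above; bit 1 is an assumption.
VQ_TYPE_BITS = {0: "SPLIT", 1: "PACKED"}


def decode_vq_types(value: str) -> list:
    """Decode a hex string such as "0x1" into the list of VQ type names."""
    mask = int(value, 16)
    return [name for bit, name in sorted(VQ_TYPE_BITS.items()) if mask & (1 << bit)]


if __name__ == "__main__":
    print(decode_vq_types("0x1"))  # ['SPLIT']
```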

Packed VQ

Packed Virtqueue addresses the inherent limitations of the legacy Split VQ design by merging the three separate descriptor rings into a single, contiguous location within the virtual environment's guest memory.

This streamlined memory layout significantly reduces the number of PCIe transactions required and improves CPU cache utilization per descriptor access, leading to better overall network performance.

Prerequisites

Packed VQ is supported from kernel 5.0 onwards, specifically requiring the virtio-support-packed-ring commit within the guest operating system.

Configuration

Administrators control this feature using the VIRTIO_F_RING_PACKED (bit 34) feature flag.

  • Default State: Disabled.

  • Scope: Can be enabled on a per-device basis.

Refer to the "DOCA Virtio-net Service Guide | Enabling and Disabling Virtio net Features" section for the exact virtnet modify command syntax.

Limitations

The following features are not currently supported when packed VQ is enabled:

  • Mergeable Rx buffer

  • Jumbo MTU

  • UDP segmentation offload (USO)

  • RSS hash report
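A feature mask can be checked against these limitations before packed VQ is enabled. The sketch below is illustrative: the bit numbers come from the VIRTIO specification, the mapping of "USO" to VIRTIO_NET_F_HOST_USO and of "RSS hash report" to VIRTIO_NET_F_HASH_REPORT is an assumption, and jumbo MTU is not covered because it is not expressed as a feature bit.

```python
VIRTIO_F_RING_PACKED = 34

# Features listed above as unsupported together with packed VQ (bit numbers per the VIRTIO spec).
PACKED_UNSUPPORTED = {
    "VIRTIO_NET_F_MRG_RXBUF": 15,
    "VIRTIO_NET_F_HOST_USO": 56,
    "VIRTIO_NET_F_HASH_REPORT": 57,
}


def packed_vq_conflicts(features: int) -> list:
    """Return the names of enabled features that conflict with packed VQ, if packed VQ is set."""
    if not features & (1 << VIRTIO_F_RING_PACKED):
        return []
    return [name for name, bit in PACKED_UNSUPPORTED.items() if features & (1 << bit)]
```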

Virtio-net Feature Bits

Per the virtio spec, the virtio device negotiates the supported features with the virtio driver when the driver probes the device. The final negotiated features are a subset of the features supported by the device.

From the controller's perspective, all feature bits that a device can support are reported by virtnet list. Each individual virtio-net device can choose which of those feature bits it supports.

The following is a list of the feature bits currently supported by the controller:

  • VIRTIO_NET_F_CSUM

  • VIRTIO_NET_F_GUEST_CSUM

  • VIRTIO_NET_F_CTRL_GUEST_OFFLOADS

  • VIRTIO_NET_F_MTU

  • VIRTIO_NET_F_MAC

  • VIRTIO_NET_F_HOST_TSO4

  • VIRTIO_NET_F_HOST_TSO6

  • VIRTIO_NET_F_MRG_RXBUF

  • VIRTIO_NET_F_STATUS

  • VIRTIO_NET_F_CTRL_VQ

  • VIRTIO_NET_F_CTRL_RX

  • VIRTIO_NET_F_CTRL_VLAN

  • VIRTIO_NET_F_GUEST_ANNOUNCE

  • VIRTIO_NET_F_MQ

  • VIRTIO_NET_F_CTRL_MAC_ADDR

  • VIRTIO_F_VERSION_1

  • VIRTIO_F_IOMMU_PLATFORM

  • VIRTIO_F_RING_PACKED

  • VIRTIO_F_ORDER_PLATFORM

  • VIRTIO_F_SR_IOV

  • VIRTIO_F_NOTIFICATION_DATA

  • VIRTIO_F_RING_RESET

  • VIRTIO_F_ADMIN_VQ

  • VIRTIO_NET_F_HOST_USO

  • VIRTIO_NET_F_HASH_REPORT

  • VIRTIO_NET_F_GUEST_HDRLEN

  • VIRTIO_NET_F_SPEED_DUPLEX

For more information on these bits, refer to the VIRTIO Version 1.2 Specifications.
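The subset property of negotiation can be shown with plain bit arithmetic. The following sketch models negotiation as the intersection of the device's and the driver's feature masks; the helper names are hypothetical, only a handful of bits are included, and the bit numbers are those defined in the VIRTIO specification.

```python
# A few feature bit numbers from the VIRTIO specification.
FEATURE_BITS = {
    "VIRTIO_NET_F_CSUM": 0,
    "VIRTIO_NET_F_MAC": 5,
    "VIRTIO_NET_F_CTRL_VQ": 17,
    "VIRTIO_NET_F_MQ": 22,
    "VIRTIO_F_VERSION_1": 32,
    "VIRTIO_F_RING_PACKED": 34,
}


def feature_mask(*names: str) -> int:
    """Build a 64-bit feature mask from feature names."""
    mask = 0
    for name in names:
        mask |= 1 << FEATURE_BITS[name]
    return mask


def negotiate(device_features: int, driver_features: int) -> int:
    """Negotiated features are those offered by the device and accepted by the driver."""
    return device_features & driver_features


def feature_names(features: int) -> list:
    """List the known feature names set in a mask, ordered by bit number."""
    ordered = sorted(FEATURE_BITS.items(), key=lambda kv: kv[1])
    return [name for name, bit in ordered if features & (1 << bit)]
```

For example, a driver that does not accept VIRTIO_F_RING_PACKED ends up with a split-VQ device even if the device offers bit 34.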

Virtio-net Event Notifications

Virtio-net Event Notifications provide real-time, asynchronous notifications of VF lifecycle and state changes from the virtio-net-controller on the DPU to external consumers, such as orchestrators, monitoring systems, and management agents.

Events are published as JSON messages over a NATS message broker. The design is best-effort with bounded queues, meaning the controller's critical data-path and Live Migration (LM) paths are never blocked by event delivery.
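The best-effort, bounded-queue idea can be sketched as follows. This is not the controller's code: the BestEffortQueue class is hypothetical, and the drop-newest policy is inferred from the "queue full: dropping newest" syslog message described under Troubleshooting. The key point is that the producer never blocks; when the queue is full, the new event is counted as dropped.

```python
import queue


class BestEffortQueue:
    """Bounded queue whose producer side never blocks (drops the newest event when full)."""

    def __init__(self, max_depth: int):
        self._q = queue.Queue(maxsize=max_depth)
        self.enqueued = 0
        self.dropped_queue_full = 0

    def publish(self, event) -> bool:
        """Enqueue without blocking; return False if the event was dropped."""
        try:
            self._q.put_nowait(event)
        except queue.Full:
            self.dropped_queue_full += 1
            return False
        self.enqueued += 1
        return True

    def drain(self) -> list:
        """Remove and return all queued events (what a publisher worker would send)."""
        items = []
        while True:
            try:
                items.append(self._q.get_nowait())
            except queue.Empty:
                return items
```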

Supported Event Types

| Event Type | When Emitted |
|---|---|
| VF_CREATED | VF device successfully opened |
| VF_DESTROYED | VF device closed/torn down |
| VF_SUSPENDED | VF suspended (LM quiesce) |
| VF_RESUMED | VF resumed (LM un-quiesce) |
| VF_DRIVER_STATE_CHANGED | VF driver state transition (de-duplicated) |
| VF_LM_STATE_CHANGED | LM state transition |

Prerequisites

  • The virtio-net-controller RPM package (provided by NVIDIA). NATS support is built-in by default; no additional build-time setup is required.

  • A running NATS broker on the same DPU (localhost), listening on 127.0.0.1:4222. The nats-server binary is not included in the BFB image and must be installed separately by the user.

  • Tested versions:

    | Component | Version | Notes |
    |---|---|---|
    | nats-server | 2.12.4 | Broker binary (user-installed) |
    | nats.c (C client) | 3.12.0 | Build-time dependency (pre-installed in BFB) |

    The feature relies only on basic NATS publish/subscribe functionality. Newer compatible versions of nats-server are expected to work but have not been validated. 

    Localhost Only

    The NATS broker must run on the same DPU as the virtio-net-controller. Remote broker connections are not supported at this time. The event channel does not currently implement TLS encryption or authentication, so NVIDIA does not take responsibility for securing remote connections. Binding the broker to 127.0.0.1 ensures that event traffic stays local to the DPU.

Setting Up the NATS Broker

The NATS broker (nats-server) is a lightweight, standalone binary. It is not included in the BFB image and must be installed by the user. It must run on the DPU itself, bound to 127.0.0.1 (localhost only).

  1. Install nats-server:

    Option A: Package manager (recommended for production)

    # Ubuntu / Debian
    sudo apt-get install nats-server

    # RHEL / Rocky / CentOS
    sudo dnf install nats-server

    Option B: Download prebuilt binary

    NATS_VER=2.12.4
    ARCH=linux-arm64   # or linux-amd64
    curl -fL -o nats-server.tar.gz \
      "https://github.com/nats-io/nats-server/releases/download/v${NATS_VER}/nats-server-v${NATS_VER}-${ARCH}.tar.gz"
    tar -xzf nats-server.tar.gz
    sudo cp nats-server-v${NATS_VER}-${ARCH}/nats-server /usr/local/bin/

  2. Start the broker: 

    nats-server -a 127.0.0.1 -p 4222 &

  3. Verify the broker is running: 

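    # Quick check -- if the broker was started with monitoring enabled (-m 8222),
    # NATS exposes a monitoring HTTP endpoint:
    curl http://localhost:8222/varz 2>/dev/null | head -5

    # Or simply confirm that the binary is installed
    # (this does not check that the broker is running):
    nats-server --help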

  4. For production, run nats-server as a systemd service: 

    # /etc/systemd/system/nats-server.service
    [Unit]
    Description=NATS messaging server
    After=network.target

    [Service]
    ExecStart=/usr/local/bin/nats-server -a 127.0.0.1 -p 4222
    Restart=always
    RestartSec=5

    [Install]
    WantedBy=multi-user.target

    Then reload systemd, enable the service, and check its status:

    sudo systemctl daemon-reload
    sudo systemctl enable --now nats-server
    sudo systemctl status nats-server
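Another way to confirm that the broker is accepting client connections is to open a TCP connection to the client port and read the INFO line that a NATS server sends immediately after connect. The sketch below uses only the Python standard library; the function names are hypothetical, and the host and port are the localhost defaults used in this guide.

```python
import json
import socket


def parse_info_line(line: bytes) -> dict:
    """Parse the `INFO {...}` protocol line sent by a NATS server on connect."""
    text = line.decode().strip()
    verb, _, payload = text.partition(" ")
    if verb.upper() != "INFO":
        raise ValueError(f"unexpected greeting: {text[:40]!r}")
    return json.loads(payload)


def probe_broker(host: str = "127.0.0.1", port: int = 4222, timeout: float = 2.0) -> dict:
    """Connect to the broker and return its INFO fields (raises OSError if unreachable)."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with sock.makefile("rb") as stream:
            return parse_info_line(stream.readline())
```

Calling probe_broker() on the DPU returns a dictionary that includes the broker's version field when the broker is up, and raises an OSError (for example, connection refused) when it is not.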

NATS Subject Scheme

Events are published to NATS subjects with the following structure:

<subject_prefix>.<pf_index>.<vf_index>.<category>.<name>

Subject Mapping

| Event Type | Subject Suffix |
|---|---|
| VF_CREATED | lifecycle.created |
| VF_DESTROYED | lifecycle.destroyed |
| VF_SUSPENDED | lm.suspended |
| VF_RESUMED | lm.resumed |
| VF_DRIVER_STATE_CHANGED | driverstate.changed |
| VF_LM_STATE_CHANGED | lm.statechanged |

Example: 

virtio.vf.0.1.lifecycle.created       # PF 0, VF 1 created
virtio.vf.0.3.lm.suspended            # PF 0, VF 3 suspended for LM
virtio.vf.1.0.driverstate.changed     # PF 1, VF 0 driver state change

Subscribing with Wildcards

NATS wildcard subjects allow flexible filtering:

virtio.vf.>                  # All events, all VFs
virtio.vf.0.>                # All events for PF 0
virtio.vf.0.2.lifecycle.*    # All lifecycle events for PF 0, VF 2
virtio.vf.*.*.lm.*           # All LM events across all PFs/VFs
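To check a filter offline before deploying a subscriber, the subject scheme and the standard NATS wildcard rules (* matches exactly one token, > matches one or more trailing tokens) can be reproduced in a few lines. This is an illustrative sketch with hypothetical helper names; the broker performs the real matching.

```python
def make_subject(prefix: str, pf: int, vf: int, category: str, name: str) -> str:
    """Build <subject_prefix>.<pf_index>.<vf_index>.<category>.<name>."""
    return f"{prefix}.{pf}.{vf}.{category}.{name}"


def subject_matches(pattern: str, subject: str) -> bool:
    """Return True if a NATS-style pattern matches a dot-separated subject."""
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            # '>' must be the last token and needs at least one remaining subject token.
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False
        if p != "*" and p != s_tokens[i]:
            return False
    return len(p_tokens) == len(s_tokens)
```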

JSON Event Schema (v1)

Every event is published as a single JSON object. Schema version is 1.

  • Example: VF created

    {
      "schema_version": 1,
      "type": "VF_CREATED",
      "timestamp_ns": "123456789012345",
      "vuid": "MT2333ABCDEF0123",
      "pf_index": 0,
      "vf_index": 1,
      "driver_state": "UNKNOWN"
    }

  • Example: VF suspended (LM)

    {
      "schema_version": 1,
      "type": "VF_SUSPENDED",
      "timestamp_ns": "223456789012345",
      "vuid": "MT2333ABCDEF0123",
      "pf_index": 0,
      "vf_index": 1,
      "lm_state": "SUSPENDED",
      "driver_state": "DRIVER_OK"
    }

  • Example: Driver state changed

    {
      "schema_version": 1,
      "type": "VF_DRIVER_STATE_CHANGED",
      "timestamp_ns": "323456789012345",
      "vuid": "MT2333ABCDEF0123",
      "pf_index": 0,
      "vf_index": 1,
      "driver_state": "DRIVER_OK"
    }

Field reference:

| Field | Type | Always Present | Description |
|---|---|---|---|
| schema_version | number | Yes | Always 1 for v1 |
| type | string | Yes | Event type name |
| timestamp_ns | string | Yes | CLOCK_MONOTONIC nanoseconds (string to preserve uint64 precision) |
| vuid | string | Yes | VF unique identifier |
| pf_index | number | Yes | Physical function index |
| vf_index | number | Yes | Virtual function index |
| lm_state | string | No | Present only when relevant. Values: SUSPENDED, RUNNING |
| driver_state | string | Yes | Values: UNKNOWN, RESET, ACKNOWLEDGE, DRIVER, FEATURES_OK, DRIVER_OK, DEVICE_NEEDS_RESET, FAILED |
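A consumer may want to validate incoming messages against the field reference before acting on them. The following sketch does that with the standard json module; the parse_event name and the choice to raise ValueError are illustrative assumptions. It also converts timestamp_ns from its string form to an integer, which Python handles without precision loss.

```python
import json

DRIVER_STATES = {"UNKNOWN", "RESET", "ACKNOWLEDGE", "DRIVER", "FEATURES_OK",
                 "DRIVER_OK", "DEVICE_NEEDS_RESET", "FAILED"}
LM_STATES = {"SUSPENDED", "RUNNING"}

# Fields marked "Always Present" in the field reference, with their JSON types.
REQUIRED = {"schema_version": int, "type": str, "timestamp_ns": str, "vuid": str,
            "pf_index": int, "vf_index": int, "driver_state": str}


def parse_event(raw) -> dict:
    """Decode and validate one v1 event; return it with timestamp_ns as an int."""
    ev = json.loads(raw)
    for field, expected in REQUIRED.items():
        if not isinstance(ev.get(field), expected):
            raise ValueError(f"missing or mistyped field: {field}")
    if ev["schema_version"] != 1:
        raise ValueError(f"unsupported schema_version: {ev['schema_version']}")
    if ev["driver_state"] not in DRIVER_STATES:
        raise ValueError(f"unknown driver_state: {ev['driver_state']}")
    if "lm_state" in ev and ev["lm_state"] not in LM_STATES:
        raise ValueError(f"unknown lm_state: {ev['lm_state']}")
    ev["timestamp_ns"] = int(ev["timestamp_ns"])
    return ev
```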

Consuming Events

There are two ways to consume VF events from the NATS broker:

  • Native NATS client – use any NATS client library (Go, Python, C, Java, etc.) to subscribe directly to the broker. This is the recommended approach for most integrations.

  • vnet-event subscriber API (libvnet_event) – a C library provided with the virtio-net-controller that handles NATS transport, JSON decoding, bounded queuing, and delivers parsed struct vnet_event to a callback. Useful for C/C++ consumers that want structured event access.

Option A: Native NATS Client (any language)

Any standard NATS client library can subscribe to the event subjects. The consumer receives raw JSON and parses it according to the schema.

Python example (using the nats-py package):

import asyncio
import json
import nats

async def main():
    nc = await nats.connect("nats://127.0.0.1:4222")

    async def on_event(msg):
        event = json.loads(msg.data.decode())
        print(f"[{msg.subject}] type={event['type']} "
              f"vuid={event['vuid']} pf={event['pf_index']} vf={event['vf_index']} "
              f"driver_state={event.get('driver_state', 'N/A')} "
              f"lm_state={event.get('lm_state', 'N/A')}")

    # Subscribe to all VF events:
    await nc.subscribe("virtio.vf.>", cb=on_event)

    # Or subscribe to specific events:
    # await nc.subscribe("virtio.vf.0.*.lm.*", cb=on_event)

    # Run until interrupted (Ctrl+C cancels the main task)
    try:
        await asyncio.Event().wait()
    except asyncio.CancelledError:
        pass
    finally:
        await nc.drain()

try:
    asyncio.run(main())
except KeyboardInterrupt:
    pass

Go example (using the nats.go package): 

package main

import (
    "encoding/json"
    "fmt"
    "log"
    "os"
    "os/signal"

    "github.com/nats-io/nats.go"
)

type VNetEvent struct {
    SchemaVersion int    `json:"schema_version"`
    Type          string `json:"type"`
    TimestampNs   string `json:"timestamp_ns"`
    VUID          string `json:"vuid"`
    PFIndex       int    `json:"pf_index"`
    VFIndex       int    `json:"vf_index"`
    LMState       string `json:"lm_state,omitempty"`
    DriverState   string `json:"driver_state"`
}

func main() {
    nc, err := nats.Connect("nats://127.0.0.1:4222")
    if err != nil {
        log.Fatal(err)
    }
    defer nc.Drain()

    nc.Subscribe("virtio.vf.>", func(msg *nats.Msg) {
        var ev VNetEvent
        if err := json.Unmarshal(msg.Data, &ev); err != nil {
            log.Printf("decode error: %v", err)
            return
        }
        fmt.Printf("[%s] type=%s vuid=%s pf=%d vf=%d driver_state=%s\n",
            msg.Subject, ev.Type, ev.VUID, ev.PFIndex, ev.VFIndex, ev.DriverState)
    })

    sig := make(chan os.Signal, 1)
    signal.Notify(sig, os.Interrupt)
    <-sig
}

Option B: Vnet-event Subscriber API (C Library)

The libvnet_event library provides a C subscriber API that handles NATS transport internally and delivers events via a callback. The library manages connection retry, bounded queuing, and optional JSON-to-struct parsing.

Installed paths:

  • Library: /usr/lib/libvnet_event.a (or /usr/lib64/)

  • Header: /usr/include/vnet_event.h

  • Reference subscriber tool: /usr/sbin/vnet_event_subscriber

API lifecycle:

vnet_event_sub_create()   -- allocate handle, configure broker/filter/queue
        |
vnet_event_sub_start()    -- start worker + transport threads, register callback
        |
   (callback invoked for each received event)
        |
vnet_event_sub_destroy()  -- stop threads, free resources

Key types:

#include "vnet_event.h"

/* Opaque subscriber handle. */
typedef struct vnet_event_sub *vnet_event_sub_t;

/* Subscriber configuration. */
struct vnet_event_sub_cfg {
    char     broker_url[256];       /* NATS broker URL */
    char     subject_filter[128];   /* NATS subject filter */
    uint32_t connect_timeout_ms;    /* 0 => 2000; range 100..30000 */
    uint32_t reconnect_backoff_ms;  /* 0 => 1000; range 100..60000 */
    uint32_t max_queue_depth;       /* 0 => 4096; range 16..65536 */
    bool     deliver_parsed;        /* true: parse JSON into struct */
};

/* Callback signature. */
typedef void (*vnet_event_cb)(const struct vnet_event *ev,   /* parsed (or NULL) */
                              const char *json,              /* raw JSON */
                              size_t json_len,               /* JSON length */
                              void *cb_arg);                 /* user context */
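The comments in struct vnet_event_sub_cfg describe a "0 means default" convention with a valid range per field. The sketch below restates that convention in Python for clarity. The defaults and ranges are copied from the comments above; rejecting out-of-range values with an error is an assumption, since the library's actual behavior for such values is not stated here.

```python
# field -> (default used when the value is 0, minimum, maximum), from the struct comments.
CFG_RULES = {
    "connect_timeout_ms": (2000, 100, 30000),
    "reconnect_backoff_ms": (1000, 100, 60000),
    "max_queue_depth": (4096, 16, 65536),
}


def resolve(value: int, default: int, lo: int, hi: int) -> int:
    """Return the effective value: the default for 0, otherwise the value if in range."""
    if value == 0:
        return default
    if not lo <= value <= hi:
        # Assumption: out-of-range values are treated as errors.
        raise ValueError(f"value {value} outside range {lo}..{hi}")
    return value


def effective_cfg(cfg: dict) -> dict:
    """Apply the defaults and range checks to every numeric configuration field."""
    return {name: resolve(cfg.get(name, 0), *rule) for name, rule in CFG_RULES.items()}
```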

C example – subscribe and print events: 

#include <stdio.h>
#include <string.h>
#include <signal.h>
#include <unistd.h>

#include "vnet_event.h"

static volatile sig_atomic_t g_stop;

static void on_signal(int sig) { (void)sig; g_stop = 1; }

static void on_event(const struct vnet_event *ev,
                     const char *json, size_t json_len, void *arg)
{
    (void)arg;

    if (ev) {
        printf("type=%-24s vuid=%-20s pf=%u vf=%u driver_state=%u\n",
               ev->type == 1 ? "VF_CREATED" :
               ev->type == 2 ? "VF_DESTROYED" :
               ev->type == 3 ? "VF_SUSPENDED" :
               ev->type == 4 ? "VF_RESUMED" : "OTHER",
               ev->vuid, ev->pf_index, ev->vf_index, ev->driver_state);
    }

    /* Raw JSON is always available: */
    printf("  json(%zu): %.*s\n", json_len, (int)json_len, json);
}

int main(void)
{
    struct vnet_event_sub_cfg cfg = {0};
    vnet_event_sub_t sub = NULL;
    int ret;

    signal(SIGINT, on_signal);
    signal(SIGTERM, on_signal);

    /* Configure the subscriber. */
    snprintf(cfg.broker_url, sizeof(cfg.broker_url),
             "nats://127.0.0.1:4222");
    snprintf(cfg.subject_filter, sizeof(cfg.subject_filter),
             "virtio.vf.>");
    cfg.max_queue_depth = 1024;
    cfg.deliver_parsed  = true;

    ret = vnet_event_sub_create(&cfg, &sub);
    if (ret) {
        fprintf(stderr, "sub_create failed: %d\n", ret);
        return 1;
    }

    ret = vnet_event_sub_start(sub, on_event, NULL);
    if (ret) {
        fprintf(stderr, "sub_start failed: %d\n", ret);
        vnet_event_sub_destroy(sub);
        return 1;
    }

    printf("Listening on %s filter='%s' ... Ctrl+C to stop\n",
           cfg.broker_url, cfg.subject_filter);

    while (!g_stop)
        sleep(1);

    /* Query subscriber health before shutdown. */
    {
        struct vnet_event_sub_stats st = {0};

        if (vnet_event_sub_stats_get(sub, &st) == 0) {
            printf("stats: enq=%lu drop=%lu decode_fail=%lu"
                   " conn_fail=%lu sub_fail=%lu"
                   " nextmsg_fail=%lu reconnect=%lu"
                   " last_err=%d depth=%u queued=%u\n",
                   st.enqueued, st.dropped_queue_full,
                   st.decode_fail, st.connect_fail,
                   st.subscribe_fail, st.next_msg_fail,
                   st.reconnect_attempts, st.last_error,
                   st.max_queue_depth,
                   st.current_queue_count);
        }
    }

    vnet_event_sub_destroy(sub);
    return 0;
}

Subscriber Stats API

The subscriber exposes runtime health counters via vnet_event_sub_stats_get(). This function is thread-safe while the subscription is active.

| Counter | What to Look For |
|---|---|
| dropped_queue_full increasing | Consumer callback is too slow, or queue depth is too small. |
| connect_fail increasing | Broker is unreachable. Check broker_url and broker status. |
| subscribe_fail > 0 | Subject filter may be invalid, or broker rejected the subscription. |
| next_msg_fail > 0 | Connection was lost after a successful connect. |
| reconnect_attempts growing | Transport is cycling through connect/backoff retries. |
| current_queue_count near max_queue_depth | Consumer is falling behind; drops are imminent. |
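These checks can be automated by comparing two successive snapshots of the counters. The sketch below assumes the counters have already been copied into plain dictionaries keyed by the field names above; the diagnose function is hypothetical, and the 90% threshold used for "near max_queue_depth" is an arbitrary illustrative choice.

```python
def diagnose(prev: dict, curr: dict) -> list:
    """Compare two stats snapshots and return human-readable findings."""
    findings = []

    def grew(name: str) -> bool:
        return curr.get(name, 0) > prev.get(name, 0)

    if grew("dropped_queue_full"):
        findings.append("dropped_queue_full increasing: callback too slow or queue too small")
    if grew("connect_fail"):
        findings.append("connect_fail increasing: broker unreachable")
    if curr.get("subscribe_fail", 0) > 0:
        findings.append("subscribe_fail > 0: filter invalid or subscription rejected")
    if curr.get("next_msg_fail", 0) > 0:
        findings.append("next_msg_fail > 0: connection lost after connect")
    if grew("reconnect_attempts"):
        findings.append("reconnect_attempts growing: transport is cycling through retries")
    depth = curr.get("max_queue_depth", 0)
    # Assumption: "near" means at least 90% of the configured depth.
    if depth and curr.get("current_queue_count", 0) >= 0.9 * depth:
        findings.append("queue nearly full: drops are imminent")
    return findings
```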

Reference Subscriber Tool

The package includes a ready-to-use subscriber at /usr/sbin/vnet_event_subscriber:

vnet_event_subscriber [options]

Options:
  --broker-url URL         (default: nats://127.0.0.1:4222)
  --subject-filter FILTER  (default: virtio.vf.>)
  --parsed                 Print parsed event fields (default)
  --raw                    Print raw JSON
  --count N                Exit after N events (0 = run forever)
  --timeout-sec SEC        Exit after SEC seconds (0 = no timeout)

Examples:

# Watch all events with parsed output + raw JSON:
vnet_event_subscriber --parsed --raw

# Watch only LM events for PF 0:
vnet_event_subscriber --subject-filter 'virtio.vf.0.*.lm.*'

# Capture 10 events then exit:
vnet_event_subscriber --parsed --count 10

On exit, the tool prints subscriber stats (enqueued, drops, errors) to stderr.

Troubleshooting

Syslog Messages 

All event subsystem messages are prefixed with vnet_event: in syslog.

| Level | Message Pattern | Meaning |
|---|---|---|
| INFO | publisher worker started | Worker thread is running. |
| INFO | publisher worker stopped | Worker thread exited cleanly during shutdown. |
| WARNING | queue full: dropping newest | Queue overflow; consider increasing max_queue_depth. |
| WARNING | transport publish failed | NATS publish failed (broker unreachable). Worker will retry. |
| WARNING | json encode failed | Internal serialization error (should not happen). |
| DEBUG | published subject=... json=... | Per-event trace (only when log_level >= 1). |

Common Scenarios

  • dropped_queue_full increasing: The controller is generating events faster than the worker can publish. Possible causes include a slow or unreachable NATS broker (check reconnect_attempts), max_queue_depth being too small for the workload, or a standard burst of VF operations during mass hotplug.

  • transport_publish_fail increasing: The NATS broker is unreachable or rejecting messages. Check if nats-server is running (systemctl status nats-server or pgrep nats-server), verify the broker_url is correct and reachable (ping <broker-host>), and review firewall rules on port 4222.

  • reconnect_attempts growing steadily: The worker is repeatedly failing to connect. The backoff interval is reconnect_backoff_ms (default 1000ms). Verify broker availability and network path.

  • No events received by subscriber:

    • Confirm enabled is true in virtnet.conf and the controller was restarted.

    • Check virtnet debug vnet_event stats to see if enqueued is incrementing.

    • Verify the subscriber's subject_filter matches the publisher's subject_prefix.

    • Confirm subscriber is connected to the same NATS broker as the publisher.
