DOCA SDK Documentation

Virtio-net Service Guide Release Notes

The following subsections provide information on new features, interoperability, known issues, and bug fixes for the virtio-net service.

Changes and New Features in This Release

Title

Description

Virtio-net Event Notifications (NATS)

Added real-time VF lifecycle and state-change notifications over a local NATS message broker. Events (create, destroy, suspend, resume, driver state, LM state) are published as JSON and can be consumed by any NATS client or via the bundled libvnet_event C subscriber API. NATS support is now built-in by default; users must install nats-server separately on the DPU and enable the feature in virtnet.conf.

Async Drop Counter Polling Mode

Introduced an asynchronous polling mode for RX drop counters, accessible using virtnet modify global dc -mode async. This feature utilizes a background thread to periodically poll and cache hardware drop counters, enabling instant queries at scale (virtnet query rx_drops) and drastically reducing monitoring time from minutes to milliseconds for high-density VF deployments.
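
The general pattern behind an asynchronous polling mode can be sketched as follows. This is a minimal illustration, not the controller's implementation: the class name AsyncCounterCache, the read_all callback (standing in for a slow hardware counter read), and the polling interval are all assumptions made for the example.

```python
import threading
from typing import Callable, Dict


class AsyncCounterCache:
    """Poll a slow counter source in the background and serve cached reads."""

    def __init__(self, read_all: Callable[[], Dict[str, int]], interval_s: float) -> None:
        self._read_all = read_all      # hypothetical slow read of all drop counters
        self._interval_s = interval_s  # hypothetical polling period in seconds
        self._lock = threading.Lock()
        self._cache: Dict[str, int] = {}
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self.refresh()                 # prime the cache before serving queries
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()

    def refresh(self) -> None:
        snapshot = self._read_all()    # the slow call happens outside the lock
        with self._lock:
            self._cache = snapshot

    def query(self, name: str) -> int:
        with self._lock:               # fast path: no hardware access
            return self._cache.get(name, 0)

    def _run(self) -> None:
        # Event.wait() returns False on timeout and True once stop() is called.
        while not self._stop.wait(self._interval_s):
            self.refresh()
```

Queries return the most recent snapshot, so a result can be stale by up to one polling interval. That staleness is the trade-off for answering without touching the hardware on every query.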

Bug Fixes

Ref #

Issue Details

4919614

Description: Restarting the virtio-net-controller with a corrupted recovery file (invalid SF number) causes a reference count leak. This prevents the static PF from closing and leaves the controller hanging indefinitely until killed by systemd.

Detected in version: 26.01

4594583

Description: A link status race condition during hotplug PF initialization can fail MSI-X vector allocation. The controller aborts setup to prevent a crash, resulting in the host detecting a NIC with an all-zero MAC address.

Detected in version: 26.01

4986992

Description: Disabling the multi-queue feature (VIRTIO_NET_F_MQ) leaves active queue counters un-reset. This inconsistency causes resource creation failures (error -22) during RSS initialization on the next driver reload or controller restart, leaving the device in an error state.

Detected in version: 26.01

4622439

Description: The live migration process hangs at 99.99% if the virtio-net-controller service is restarted during an ongoing kernel-based migration.

Detected in version: 26.01

4810680

Description: On BlueField-3 systems lacking the SPRD EU management feature, heavy concurrent stress testing combined with dense queue configurations can cause the virtio-net-controller to fail with a failed to allocate DUAR error when attempting to create DPA thread objects.

Detected in version: 26.01

4882914

Description: On virtio hotplug PFs, the VIRTIO_F_SR_IOV feature flag (bit 37) is incorrectly exposed in the device_feature list and can be modified. The presence and modifiability of this flag are misleading and have no functional effect.

Detected in version: 26.01

4890884

Description: Creating virtio-net hotplug devices beyond the configured firmware limit (PCI_SWITCH_EMULATION_NUM_PORT) results in a generic -1031 Failed to hotplug SNAP PF error, rather than explicitly stating that the maximum port capacity has been reached.

Detected in version: 26.01

4891120

Description: Attempting to create a legacy hotplug device on a BlueField-3 DPU using the -l (legacy) flag results in a generic Invalid argument error. The legacy hotplug feature is strictly supported only on BlueField-2 devices, but the current error message fails to clearly state this hardware limitation.

Detected in version: 26.01

4897316

Description: The drop_cnt (drop counter) feature fails to persist and reverts to a "disabled" state after a virtio-net-controller restart.

Detected in version: 26.01

4902251

Description: Restarting the virtio-net-controller after enabling Dynamic Interrupt Moderation (DIM) triggers DPA resource leak warnings during the shutdown sequence.

Detected in version: 26.01

4915060

Description: A host reboot or cold boot (PERST# assertion) can cause BlueField-3 DPUs to hang if doca_devemu_virtio_offload_engine_start() is called without subsequently creating VQs.

Detected in version: 26.01

4926603

Description: Processing 32-packet DMA batches on large queues (e.g., 1024-depth) can overflow the TX SQ. This blocks packet completions, resulting in a host-side TX queue hang that cascades into a DPA crash without generating a diagnostic core dump.

Detected in version: 26.01

4944820

Description: During a live update, the virtio-net emulator (virtio_net_emu) may encounter a segmentation fault (SIGSEGV) and crash.

Detected in version: 26.01

4915382

Description: LSO headers are not strictly validated, allowing non-spec gso_type values and gso_size = 0 to be programmed into the WQE MSS and hang TX.

Detected in version: 26.01
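
The kind of strict check that issue 4915382 refers to can be sketched as follows. The gso_type constants are the values defined for virtio_net_hdr in the virtio specification. The function name and the exact acceptance policy (for example, rejecting the ECN bit without a TCP GSO type) are assumptions for illustration and are not taken from the controller's actual fix.

```python
# gso_type values defined for virtio_net_hdr in the virtio specification.
VIRTIO_NET_HDR_GSO_NONE = 0
VIRTIO_NET_HDR_GSO_TCPV4 = 1
VIRTIO_NET_HDR_GSO_UDP = 3
VIRTIO_NET_HDR_GSO_TCPV6 = 4
VIRTIO_NET_HDR_GSO_UDP_L4 = 5
VIRTIO_NET_HDR_GSO_ECN = 0x80  # flag bit combined with a TCP GSO type

_SEGMENTING_TYPES = {
    VIRTIO_NET_HDR_GSO_TCPV4,
    VIRTIO_NET_HDR_GSO_UDP,
    VIRTIO_NET_HDR_GSO_TCPV6,
    VIRTIO_NET_HDR_GSO_UDP_L4,
}
_TCP_TYPES = {VIRTIO_NET_HDR_GSO_TCPV4, VIRTIO_NET_HDR_GSO_TCPV6}


def lso_header_is_valid(gso_type: int, gso_size: int) -> bool:
    """Return True if the header is safe to turn into a segmentation request."""
    has_ecn = bool(gso_type & VIRTIO_NET_HDR_GSO_ECN)
    base = gso_type & ~VIRTIO_NET_HDR_GSO_ECN

    if base == VIRTIO_NET_HDR_GSO_NONE:
        return not has_ecn             # no segmentation requested; ECN alone is bogus
    if base not in _SEGMENTING_TYPES:
        return False                   # non-spec gso_type value
    if has_ecn and base not in _TCP_TYPES:
        return False                   # assumed policy: ECN only makes sense for TCP
    return gso_size > 0                # gso_size = 0 must never become an MSS of 0
```

For example, lso_header_is_valid(VIRTIO_NET_HDR_GSO_TCPV4, 0) returns False, which is the gso_size = 0 case named in the description.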

4760849

Description: During a live update, the virtio-net-controller may crash if the high availability process inadvertently closes a newly assigned VF kernel resource, causing the VF to fail to load.

Detected in version: 26.01

4958540

Description: Concurrently reinstalling multiple VMs may generate cosmetic vfe-vhostd error logs (e.g., failed restore state ret:5).

Detected in version: 26.01

Known Issues

The following are known limitations of this NVIDIA® BlueField® virtio-net software version.

Ref #

Issue Details

5013051

Description: Following a NATS broker restart, the virtio-net-controller relies on lazy reconnection. Consequently, the first event generated while disconnected is lost. However, this dropped event triggers a successful reconnection, allowing all subsequent events to be delivered normally.

Workaround: N/A

Keyword: LM; vnet_event

Reported in version: 26.04
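
The lazy-reconnection behavior described in issue 5013051 can be modeled with a small simulation. This is a sketch for reasoning about the event loss only. The LazyPublisher class and its methods are hypothetical and do not represent the controller or any NATS client API.

```python
from typing import List


class LazyPublisher:
    """Toy model of a publisher that only notices a dead broker when it sends."""

    def __init__(self) -> None:
        self.connected = True
        self.delivered: List[str] = []

    def broker_restart(self) -> None:
        # The publisher is not notified; the stale connection is found later.
        self.connected = False

    def publish(self, event: str) -> bool:
        if not self.connected:
            # The failed send is what reveals the broken connection. The event
            # is dropped, and the failure triggers the reconnection.
            self.connected = True
            return False
        self.delivered.append(event)
        return True


publisher = LazyPublisher()
publisher.publish("create")
publisher.broker_restart()
publisher.publish("suspend")   # lost: first event after the broker restart
publisher.publish("resume")    # delivered: the connection was re-established
```

In this model, a consumer that must not miss a state change would need to re-query the current device state after a broker restart rather than rely on the event stream alone.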

4898379

Description: Manually issuing inactive and active state commands (e.g., via devlink) on an SF is not supported while the virtio-net controller is active. These state changes abruptly tear down the underlying firmware and hardware resources without notifying the controller. This causes an immediate loss of traffic for all connections associated with the SF, and the system cannot automatically recover from this state.

Workaround: Avoid manually toggling SF states while the virtio-net controller is running. If an SF is inadvertently toggled and traffic drops, you must perform a full manual reinitialization of both the affected SF and the controller to restore network connectivity.

Keyword: Scalable function; recovery

Reported in version: 26.04

4961952

Description: If the virtio-net controller is restarted during an active vDPA live migration, transient RQT modify errors (e.g., "Remote I/O error") may appear in the system logs on the migration source VF. These error messages are strictly cosmetic and have no functional impact.

Workaround: N/A

Keyword: Live migration; vDPA

Reported in version: 26.04

4914672

Description: During rapid, repeated virtual machine stress testing (e.g., executing virsh destroy followed by virsh start), Windows VMs may appear unresponsive or fail to answer network pings.

Workaround: Increase the boot wait time in your automation scripts to a minimum of 200 seconds to provide the Windows VM sufficient time to fully complete its crash recovery, finish the boot process, and initialize its virtio-net interfaces before network connectivity is verified.

Keyword: Windows VM; reboot

Reported in version: 26.04
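
One way to apply the 200-second boot wait from issue 4914672 (Windows VM; reboot) in an automation script is to poll for readiness with a deadline instead of sleeping for a fixed time. The following is a sketch under assumptions: the function name, the probe callback (for example, a wrapper around a ping), and the 5-second polling interval are all hypothetical.

```python
import time
from typing import Callable

MIN_BOOT_WAIT_S = 200.0  # minimum wait recommended by the workaround


def wait_for_vm(
    probe: Callable[[], bool],
    timeout_s: float = MIN_BOOT_WAIT_S,
    interval_s: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll probe() until it succeeds or timeout_s elapses."""
    if timeout_s < MIN_BOOT_WAIT_S:
        raise ValueError(f"timeout_s must be at least {MIN_BOOT_WAIT_S} seconds")

    deadline = clock() + timeout_s
    while True:
        if probe():
            return True                # VM answered; stop waiting early
        if clock() >= deadline:
            return False               # still unresponsive after the full wait
        sleep(interval_s)
```

The clock and sleep parameters exist only so that the loop can be tested without real delays. A script would normally pass just the probe.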

4914672

Description: CentOS 7 virtual machines running older kernels (specifically kernel 3.10) may experience a transient soft lockup in virtnet_send_command when the virtio-net controller undergoes rapid, successive restarts. During this event, the system may report that the CPU is stuck for approximately 22 seconds. This behavior is isolated to older kernel versions; VMs running newer kernels, such as modern Ubuntu or Windows releases, are unaffected.

Workaround: N/A

Keyword: Kernel 3.10.0; CentOS 7; lockup

Reported in version: 26.04

4797496

Description: Virtio-net does not support Packed Virtqueues ("packed_vq": 1) in the following scenarios:

  • On any VF configured for live migration.

  • On any PF that has the Admin Queue (AQ) enabled.

Workaround: Explicitly disable Packed VQs by setting "packed_vq": 0 in the virtnet.conf configuration file for any PFs using AQs or VFs intended for live migration. Use the default Split VQ (split_vq) mode instead.

Keyword: VQ; live migration

Reported in version: 25.10
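
A deployment script could check for the unsupported combination in issue 4797496 before starting the controller. The sketch below assumes that virtnet.conf is JSON and that "packed_vq" appears as a top-level key. Whether a PF uses the AQ or a VF is intended for live migration is passed in by the caller, because the configuration keys for those settings are not given here. The function name is hypothetical.

```python
import json


def packed_vq_conflicts(conf_text: str, pf_uses_aq: bool, vfs_live_migrate: bool) -> bool:
    """Return True if packed VQs are enabled in an unsupported scenario."""
    conf = json.loads(conf_text)
    # Assumption: a missing key means the default split VQ mode is in use.
    packed_enabled = int(conf.get("packed_vq", 0)) == 1
    return packed_enabled and (pf_uses_aq or vfs_live_migrate)
```

For example, packed_vq_conflicts('{"packed_vq": 1}', pf_uses_aq=True, vfs_live_migrate=False) returns True, which signals that "packed_vq" should be set to 0.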

4498529

Description: A Windows VM may take a long time to load when VLAN tagging is enabled.

Workaround: Disable VLAN tagging.

Keyword: Windows; VLAN

Reported in version: 25.10

4534273

Description: After installing a new version, virtnet -v or --version displays the version of the updated CLI, not the source or target upgrade versions.

Workaround: Run virtnet version to view both the original and destination versions.

Keyword: CLI; version; update

Reported in version: 25.07

3879093

Description: When creating a large number of virtio-net VFs, the SF representor may not be renamed.

Workaround: Use the ip command to rename the representor manually.

Keyword: Representor

Reported in version: 24.10

3943905

Description: Host OS kernels older than 3.19 do not support 31 hotplug devices.

Workaround: Avoid hotplugging more than 20 devices if the host OS kernel is older than 3.19, or upgrade the kernel to 3.19 or later.

Keyword: Host OS; kernel; hotplug

Reported in version: 24.07
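
The kernel check behind the workaround for issue 3943905 can be automated on the host. The following sketch parses a kernel release string and returns the device cap from the workaround. The function and constant names are assumptions, and only the major and minor version numbers are compared.

```python
import re
from typing import Optional

LEGACY_KERNEL_HOTPLUG_CAP = 20  # cap stated in the workaround for kernels < 3.19


def hotplug_cap_for_kernel(release: str) -> Optional[int]:
    """Return the hotplug device cap for this kernel, or None if no cap applies."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    version = (int(match.group(1)), int(match.group(2)))
    # Tuples compare element by element, so (3, 10) < (3, 19) < (5, 14).
    return LEGACY_KERNEL_HOTPLUG_CAP if version < (3, 19) else None
```

On a Linux host, the release string can be obtained with platform.release() from the Python standard library, which returns the same value as uname -r.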

4022160

Description: Feature bit VIRTIO_NET_F_CTRL_VLAN is not supported. Enabling it from the hotplug device may result in anomalous behavior.

Workaround: Disable VIRTIO_NET_F_CTRL_VLAN.

Keyword: Feature bit

Reported in version: 24.07

4001261

Description: Values in the virtnet.conf file are not validated, so invalid values such as negative numbers or 0 are not rejected.

Workaround: N/A

Keyword: Virtnet; config; invalid value

Reported in version: 24.07
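
Because the controller does not validate virtnet.conf (issue 4001261), a script can screen the file before the controller reads it. The sketch below assumes the file is JSON. The caller supplies the list of keys that must hold positive numbers, since that list is not given here, and the function name is hypothetical. Note that 0 is a legitimate value for some options (for example, "packed_vq": 0), so the check is applied only to the keys the caller names.

```python
import json
from typing import Iterable, List


def non_positive_fields(conf_text: str, positive_keys: Iterable[str]) -> List[str]:
    """Return the listed keys whose values are present but not positive numbers."""
    conf = json.loads(conf_text)
    bad: List[str] = []
    for key in positive_keys:
        if key not in conf:
            continue                   # absent keys fall back to defaults
        value = conf[key]
        # bool is a subclass of int in Python, so exclude it explicitly.
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or value <= 0:
            bad.append(key)
    return bad
```

An empty result means that none of the named keys holds a negative, zero, or non-numeric value.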

3965598

Description: Admin-VQ-based transitional VFs show a vf_get error when the controller is restarted. However, VF functionality is not affected.

Workaround: N/A

Keyword: Admin VQ; transitional device

Reported in version: 24.07

3961951

Description: An out-of-memory call trace occurs when creating many (>300) VFs on a BlueField running OpenEuler or CentOS 7.6.

Workaround: Update the kernel to support shared RQ.

Keyword: OOM; OpenEuler; CentOS 7.6; virtual function

Reported in version: 24.07

3862683

Description: Creating VFs and hotplug PFs in parallel can lead to a controller crash.

Workaround: Create VFs and hotplug PFs sequentially (VFs first and then hotplug PFs, or vice versa).

Keyword: Virtio-net emulation

Reported in version: 1.9.0

3665070

Description: Virtio-net controller fails to load if DPA_AUTHENTICATION is enabled.

Workaround: N/A

Keyword: Virtio-net; DPA

Reported in version: DOCA 2.5.0

3538486

Description: When removing the LAG configuration from the BlueField, a kernel warning for uverbs_destroy_ufile_hw is observed if the virtio-net-controller is still running.

Workaround: Stop the virtio-net-controller service before cleaning up the bond configuration.

Keyword: Virtio-net; LAG

Reported in version: DOCA 2.2.0

3683801

Description: Starting from kernel 5.14, the virtio-net TX path contains logic that may trigger an infinite loop when a VQ is broken (e.g., the device is removed) under heavy traffic.

Workaround: N/A

Keyword: Virtio-net

Reported in version: DOCA 1.8.0

3714522

Description: When creating and destroying VFs back to back, verify on the virtio-net controller side (i.e., by running virtnet query) that no VF is still alive before recreating the VFs from the guest OS.

Workaround: N/A

Keyword: Virtio-net; VFs

Reported in version: DOCA 1.8.0

3694402

Description: When the virtio-net-controller is restarted from the DPU while the guest OS is booting, the guest OS may report a kernel call trace while the controller is preparing the device. The guest OS recovers once the controller starts.

Workaround: N/A

Keyword: Virtio-net; hotplug; restart

Reported in version: DOCA 1.8.0

3633453

Description: Jumbo MTU is only supported on a guest OS with kernel 4.11 and above.

Workaround: N/A

Keyword: Virtio-net; jumbo MTU

Reported in version: DOCA 1.7.0

3021967

Description: When rebooting a DPU with a large number of VFs created on the host, VF recovery may fail due to a timeout.

Workaround: Restart the driver on the host after the DPU is up.

Keyword: Reboot; VFs

Reported in version: DOCA 1.7.0

3232444

Description: After live migration of virtio-net devices using the VFE driver, the max_queues_size value reported by virtnet list may be wrong. This does not affect the actual value.

Workaround: N/A

Keyword: Virtio-net; live migration

Reported in version: DOCA 1.4.0

2801780

Description: When running the virtio-net-controller with a host kernel older than 3.10.0-1160.el7, the host virtio driver may log an error (Unexpected TXQ (13) queue failure: -28) in dmesg during traffic stress tests.

Workaround: N/A

Keyword: Virtio-net; error

Reported in version: DOCA 1.2.0

2870213

Description: Servers do not recover after configuring PCI_SWITCH_EMULATION_NUM_PORT to 32 followed by a power cycle.

Workaround: Clear NVRAM and reset mlxconfig to default.

Keyword: Virtio-net; power cycle

Reported in version: DOCA 1.2.0

2685191

Description: Once virtio-net is enabled, the mlx5 Windows VF becomes unavailable.

Workaround: N/A

Keyword: Virtio-net; virtual function; WinOF-2

Reported in version: DOCA 1.2.0

2702395

Description: When a device is hot-plugged from the virtio-net controller, the host OS may hang when a warm reboot is performed on the host and Arm at the same time.

Workaround: Reboot the host OS first and only then reboot the DPU.

Keyword: Virtio-net controller; hot-plug; reboot

Reported in version: DOCA 1.2.0
