DOCA SDK Documentation

Bug Fixes in This Version

DOCA Framework Bug Fixes

Ref #

Issue

4186679

Description: Fixed an issue that could cause OVS to crash when sFlow was enabled with OVN.

Keyword: OVS

Detected in version: 3.0.0

4242133

Description: Fixed an issue where setting hw-offload to false while ports were configured could trigger errors in the OVS logs.

Keyword: OVS-DOCA

Detected in version: 3.0.0

4313727

Description: Disabled the "mark" action in switch mode; it remains supported in VNF mode.

Keyword: DPDK

Detected in version: 3.0.0

4264397

Description: Fixed an issue where OVS did not punt IPv6 Neighbor Advertisements with unicast MACs to the CPU, preventing MAC learning for completely silent IPv6 endpoints. This caused traffic to be software forwarded until the endpoint initiated communication.

Keyword: OVS

Detected in version: 3.0.0

4452977

Description: IPSec HW offload fails when more than one decrypt rule is configured per pipe, leading to anti-replay (AR) failure or syndrome errors.

Keyword: IPSec

Detected in version: 3.0.0

4287011

Description: Disabling OVS CT (using ovs-vsctl set o . other_config:hw-offload-ct-size=0) and attempting to offload CT rules is not supported and could lead to OVS crashes.

Keyword: OVS

Detected in version: 2.10.0

4304103

Description: Fixed an issue where, in flows rewriting both inner and outer destination (dst) and/or source (src) MAC addresses to the same value, the outer MAC rewrite was skipped, resulting in an outer MAC address of 00:00:00:00:00:00.

Keyword: MAC addresses

Detected in version: 3.0.0

4340654

Description: Fixed an issue that prevented LLDP traffic from VFs or BF host PFs from passing through the representor kernel interfaces.

Keyword: LLDP

Detected in version: 3.0.0

DOCA-Host and DOCA Drivers Bug Fixes

Ref #

Issue

4385184

Description: Fixed an issue where buffer initialization became a performance bottleneck during the allocation of large buffers, typically when using a high number of QPs with large message sizes. The root cause was the inefficient use of rand(). This has been resolved by replacing it with a faster pseudo-random algorithm.

Keyword: Buffer initialization, performance

Detected in version: 3.0.0

4390560

Description: Fixed a potential deadlock that could occur during the handling of peer memory registration failures.

Keyword: Deadlock, peer memory registration

Detected in version: 3.0.0

4405229

Description: Increased the size of the slow FDB table to prevent the following errors when switching to SwitchDev mode:

mlx5_core 0000:03:00.0: mlx5_cmd_out_err:835:(pid 24362): CREATE_FLOW_GROUP(0x933) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x4065f0), err(-22)

mlx5_core 0000:03:00.0: E-Switch: Failed to create peer miss flow group err(-22)

Keyword: Slow FDB table

Detected in version: 3.0.0

4296889

Description: Fixed a sysfs issue that occurred when accessing hardware counters from within a namespace.

Keyword: sysfs

Detected in version: 3.0.0

4320810

Description: Fixed an issue where ibstat would fail and crash when encountering a non-RoCE/IB device, preventing it from displaying information for the remaining valid RoCE/IB devices.

Keyword: ibstat

Detected in version: 3.0.0

4358857

Description: Increased the poll batch size as the number of QPs scales up. This prevents bandwidth degradation with a high number of QPs, where polling only 16 CQEs per iteration may not be sufficient to process all completions in time.

Keyword: ib_write_bw performance

Detected in version: 3.0.0
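As a rough illustration of the change described for 4358857, the sketch below grows a CQ poll batch with the QP count instead of keeping it fixed at 16 CQEs. The scaling rule, the function name poll_batch_size, and the qps_per_step and max_batch parameters are all hypothetical; the release note does not state the formula that was actually adopted.

```python
BASE_BATCH = 16  # CQEs polled per iteration before the fix, per the note above


def poll_batch_size(num_qps: int, qps_per_step: int = 64, max_batch: int = 256) -> int:
    """Return a poll batch size that grows with the number of QPs.

    Hypothetical rule: add BASE_BATCH CQEs for every `qps_per_step` QPs,
    capped at `max_batch`. Both parameters are illustrative assumptions.
    """
    steps = max(1, -(-num_qps // qps_per_step))  # ceiling division, at least 1
    return min(max_batch, BASE_BATCH * steps)


for qps in (1, 64, 65, 1024):
    print(qps, poll_batch_size(qps))
```

The cap keeps a single poll iteration bounded, so one very busy CQ cannot starve other work in the same loop.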

3868222

Description: Fixed a race condition between the firmware syndrome report and driver initialization during boot.

Keyword: Race condition, firmware syndrome report, driver initialization

Detected in version: 3.0.0

4125295

Description: Fixed an issue where the driver failed to load when a firmware syndrome was detected during boot.

Keyword: Driver load

Detected in version: 3.0.0

4369312

Description: Fixed an issue where the mlnx_tune -l command did not list several operating systems that were in fact supported.

Keyword: mlnx_tune, OSes

Detected in version: 3.0.0

4172481

Description: The kernel does not define TCA_TUNNEL_KEY_ENC_SRC_PORT. To align offload behavior with non-offload, the OVS community introduced a commit [1] that causes offload to fail if this tunnel attribute is used. Now, any rule with a tunnel set action that includes a tunnel source port can no longer be offloaded.

[1] netdev-offload-tc: Fix offload of tunnel key tp_src.

Keyword: Tunnel source port offload

Detected in version: 3.0.0

4375188

Description: Upstream kernel 6.11 introduced support for encapsulation control flags, which was also added in OVS 3.5.0. However, current hardware does not support matching on these flags, such as "don't fragment" and "checksum." Since these flags can be safely ignored, we reverted upstream commit [1] as a workaround to restore tunnel offload functionality.

[1] net/mlx5e: flower: validate encapsulation control flags

Keyword: Encapsulation control flags, tunnel offload

Detected in version: 3.0.0

4340654

Description: Fixed an issue where LLDP traffic from VFs or BF host PFs was not reaching the representor kernel interfaces.

Keyword: LLDP packets

Detected in version: 2.8.0, 2.9.0

4304103

Description: Fixed an issue where, in flows that rewrote both the inner and outer destination (dst) and/or source (src) MAC addresses to the same value, the outer MAC address rewrite was ignored, leading to an outer MAC address of 00:00:00:00:00:00.

Keyword: MAC addresses

Detected in version: 2.4.0

4264397

Description: OVS does not forward IPv6 Neighbor Advertisements with unicast destination MAC addresses to the CPU. This means the endpoint MAC address may not be learned on the VTEP if the endpoint is silent, causing traffic to be software forwarded. After the endpoint initiates traffic, it will be hardware forwarded. The issue persists only if the endpoint never initiates any traffic, only responding to IPv6 Neighbor Solicitations (rare). This issue has been fixed.

Keyword: Neighbor Advertisements, Neighbor Solicitations, OVS

Detected in version: 2.10.0

4186679

Description: Fixed an issue where enabling sFlow with OVN caused OVS to crash.

Keyword: sFlow

Detected in version: 2.9.2

4150662

Description: Fixed an issue where OVS crashed unexpectedly after DPUs repeatedly broadcast the error message “packet with own source address.”

Keyword: OVS, DPUs

Detected in version: 2.7

4242133

Description: Fixed an issue where changing the hw-offload setting from true to false while ports are configured could lead to errors reported in the OVS log.

Keyword: OVS

Detected in version: 2.10

BSP Bug Fixes

Ref #

Details

4693948

Description: When attempting to install the Ubuntu 24.04 (64k kernel) BFB image directly to an eMMC device, the installation may fail with a kernel panic. The system logs indicate an inability to mount the root filesystem, specifically returning the error: Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0).

Keywords: VFS; kernel panic; eMMC installation

Detected in version: 4.14.0

4923234

Description: The libdoca-sdk-sta-dev package is missing from the BFB and DOCA base images for Ubuntu 24.04. This package is required for target offload functionality.

Keywords: Target offload; BFB image

Detected in version: 4.14.0

4948318

4945554

Description: If a secondary BMC task (such as a log dump) is started after the BMC firmware update has been initiated, but before the installer's monitoring logic has attached to it, the installer may mistakenly track the secondary task. This tracking error causes the installer to misjudge the update's completion, which can cause the subsequent BMC reboot to fail and leave the new firmware in a pending, unactivated state.

Keywords: BFB installer; Redfish API; task monitoring

Detected in version: 4.14.0

4863927

Description: When attempting to install development packages via dnf on Rocky Linux 9.2, users may encounter a package repository inconsistency. This version mismatch results in a dependency resolution failure, preventing the installation of the packages.

Keywords: gcc; gcc-c++; dnf

Detected in version: 4.14.0

4988092

Description: Following an out-of-the-box installation and subsequent reboot on Ubuntu 24.04 (64k kernel), the NetworkManager-wait-online.service fails to start. The system logs indicate that the service times out while waiting for network connectivity, causing the service state to be marked as failed (status=1/FAILURE).

Keywords: Network Manager; systemd; timeout

Detected in version: 4.14.0

4893340

Description: A strict 128KB maximum size limit for the bf.cfg configuration file caused deployment failures in environments requiring more extensive configurations, such as those that pass large Ignition configuration payloads via the Data Processing Framework (DPF).

Keywords: BFB installation; file size limit; bf.cfg

Detected in version: 4.14.0
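A pre-flight check along these lines could flag a bf.cfg that would have hit the former limit described for 4893340. It is a minimal sketch that assumes 128KB means 128 x 1024 bytes; the helper names are hypothetical and are not part of any DOCA or BFB tool.

```python
import os

# Assumption: the former "128KB" limit corresponds to 128 * 1024 = 131072 bytes.
OLD_BF_CFG_LIMIT_BYTES = 128 * 1024


def exceeds_old_limit(size_bytes: int) -> bool:
    """True if a file of this size would have exceeded the former limit."""
    return size_bytes > OLD_BF_CFG_LIMIT_BYTES


def bf_cfg_exceeds_old_limit(path: str) -> bool:
    """Check an on-disk bf.cfg (hypothetical helper, path chosen by the caller)."""
    return exceeds_old_limit(os.path.getsize(path))
```

Running such a check before deployment helps identify configurations, for example those with large embedded Ignition payloads, that would fail on releases still affected by this issue.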

4849953

Description: The dpa-ps and dpa-statistics diagnostic tools are missing from the BlueField-3 BFB image for Oracle Linux 8.

Keywords: DPA; missing package

Detected in version: 4.14.0

4871396

Description: The dpa-ps and dpa-statistics diagnostic tools are missing from the BlueField-3 BFB image for Oracle Linux 9.

Keywords: DPA; missing package

Detected in version: 4.14.0

4907434

Description: When a BMC firmware update requires activation (and is the only component pending), the doca-installer tool incorrectly advises the user to perform a Level 3 firmware reset (mlxfwreset -l 3). Executing this recommended reset will not activate the pending BMC firmware. This is a messaging error in the tool's final status summary.

Keywords: BMC firmware; pending activation

Detected in version: 4.14.0

4907646

Description: When using the doca-installer --compare command with the --psid flag to target specific devices, the tool correctly identifies and skips unsupported BlueField-2 devices but still incorrectly halts to ask, "Continue with the available devices?". This unexpected interactive prompt interrupts automated scripts that rely on the tool executing without manual intervention.

Keywords: Automation; interactive prompt

Detected in version: 4.14.0

4879150

Description: Running the mlnx-sf utility with the -e or --enable-eswitch flag causes the command to fail, returning an Unknown option "eswitch" error. This occurs because the eswitch configuration action is no longer supported by the underlying mlxdevm and devlink utilities.

Keywords: Scalable functions; mlnx-sf; eswitch; mlxdevm

Detected in version: 4.14.0

4776492

Description: Occasionally, upgrading PLDM BFB from DOCA v3.2.0 to v3.2.1 may lead to an assert 0x7 in dmesg.

Keywords: PLDM

Detected in version: 4.14.0

4949639

Description: During a BFB installation, the CEC firmware updates successfully, but the completion confirmation message is missing from the RShim logs. The log displays "Updating CEC firmware" but omits the final success status before moving on to the next installation step (such as updating certificates).

Keywords: RShim logs; CEC firmware

Detected in version: 4.14.0

4836088

Description: Executing bfcfg -d produces corrupt output and displays incorrect boot options. This occurs due to a parsing error when the tool attempts to convert the BOOTx_DEVPATH variables from their binary configuration format back into ASCII text.

Keywords: Secure boot; ASCII conversion; BOOTx_DEVPATH

Detected in version: 4.14.0

4839828

Description: Host tmfifo_net interfaces may intermittently acquire random MAC addresses instead of the expected 00:1a:ca:ff:ff:XX pattern. This can cause failures in environments that enforce strict MAC address validation.

Keywords: MAC address; tmfifo_net; rshim

Detected in version: 4.14.0
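For environments that validate MAC addresses, the expected pattern from 4839828 can be checked with a short script. The sketch below assumes that XX stands for any two hexadecimal digits and that case is not significant; the function name is hypothetical.

```python
import re

# Assumption: "XX" in 00:1a:ca:ff:ff:XX is any two hex digits, case-insensitive.
TMFIFO_MAC_RE = re.compile(r"00:1a:ca:ff:ff:[0-9a-f]{2}", re.IGNORECASE)


def is_expected_tmfifo_mac(mac: str) -> bool:
    """Return True if `mac` follows the expected tmfifo_net pattern."""
    return TMFIFO_MAC_RE.fullmatch(mac.strip()) is not None


print(is_expected_tmfifo_mac("00:1a:ca:ff:ff:01"))
print(is_expected_tmfifo_mac("3e:11:22:33:44:55"))
```

A randomly assigned address, such as the second example, does not match and would be rejected by strict validation.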

4924237

Description: The RShim USB device may intermittently disappear from the DPU BMC, causing operations that rely on it to fail with a "Failed to enable BMC rshim" error.

Keywords: RShim USB; out-of-band update

Detected in version: 4.14.0

4658222

Description: During the DPU boot-up sequence, an intermittent call trace containing the warning WARN_ON(!host->claimed) may appear in the system logs.

Keywords: Call trace; kernel boot up

Detected in version: 4.14.0

4604090

Description: If a corrupted or unauthenticated BFB image is transferred from the BMC to the DPU, the system halts the installation process as part of a built-in security mechanism. Once triggered, the recovery path remains locked to prevent potential compromise.

Keywords: Corrupt; BFB

Detected in version: 4.14.0

4848119

Description: A BlueField-2 UEFI boot-time regression added approximately 20 seconds to system startup.

Keywords: BlueField-2; boot time; UEFI

Detected in version: 4.14.0

4904043

Description: An intermittent firmware assert error (synd 0x7: irisc not responding) may appear in system logs (dmesg) following a PLDM firmware upgrade while the device is in NIC mode.

Keywords: PLDM

Detected in version: 4.14.0

BMC Bug Fixes

No bug fixes in this release.

BlueField-3 Firmware Bug Fixes

Internal Ref.

Issue

4241238

Description: Fixed a TX timeout issue related to the esw_scheduling QoS feature.

Keywords: esw_scheduling QoS

Discovered in Version:

32.44.1036

Fixed in Release:

32.45.1020

4392587

Description: Adjusted the temperature sensors array size to match the number of sensors defined in the INI file.

Keywords: Temperature sensors

Discovered in Version:

32.44.1036

Fixed in Release:

32.45.1020

4318537

Description: Fixed an issue where the AI and HAI parameters of the ZTR_RTTCC algorithm, when configured by users, were automatically overwritten upon link speed changes. With this fix, if AI/HAI values were tuned for link speeds other than 100Gb/s, users should now divide those values by (link_speed / 100) to maintain consistent congestion control algorithm behavior.

Keywords: Congestion control, ZTR_RTTCC

Discovered in Version:

32.44.1036

Fixed in Release:

32.45.1020
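The adjustment described for 4318537 is simple arithmetic, shown here as a worked example. The sketch assumes link_speed is expressed in Gb/s; the function name and the sample AI/HAI values are illustrative only and are not recommended settings.

```python
def rescale_for_link_speed(value: float, link_speed_gbps: float) -> float:
    """Divide a previously tuned AI or HAI value by (link_speed / 100)."""
    if link_speed_gbps <= 0:
        raise ValueError("link speed must be positive")
    return value / (link_speed_gbps / 100.0)


# Hypothetical values that were tuned on a 400 Gb/s link.
ai_tuned, hai_tuned = 8.0, 40.0
ai_new = rescale_for_link_speed(ai_tuned, 400)    # 8 / (400 / 100) = 2.0
hai_new = rescale_for_link_speed(hai_tuned, 400)  # 40 / (400 / 100) = 10.0
print(ai_new, hai_new)
```

Values tuned at 100Gb/s are unchanged, since the divisor is 1 in that case.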

4368450

Description: Fixed an issue where PCC_CNP_COUNT could not be reset using the pcc_counter.sh script in the DOCA tools.

Keywords: PCC

Discovered in Version:

32.44.1036

Fixed in Release:

32.45.1020

4360191

Description: Fixed an issue where the CMDIF MNVDA could cause the NV Config mechanism to become stuck when the BMC enables Self Recovery mode.

Keywords: NV Config

Discovered in Version:

32.44.1036

Fixed in Release:

32.45.1020

4346657

Description: Fixed a firmware issue to ensure that typical PPCC access register failures in DOCA PCC are no longer silently ignored. Users will now receive a syndrome notification when executing the command.

Keywords: DOCA PCC

Discovered in Version:

32.44.1036

Fixed in Release:

32.45.1020

4257863

Description: Fixed an issue that could cause the DESTROY_MKEY command to take an excessively long time to execute, with the host driver displaying a "No done completion" message for this command.

Keywords: MKey

Discovered in Version:

32.44.1036

Fixed in Release:

32.45.1020

4366438

Description: Fixed a TX timeout issue that occurred when eSwitch scheduling is enabled and a rate limit is applied.

Keywords: TX timeout

Discovered in Version:

32.44.1036

Fixed in Release:

32.45.1020

4370796

Description: Fixed an issue where the firmware could send an incorrect object_id in the device emulation object change event, causing the virtio-net controller to fail to respond to operations on the virtio device on the host. This issue commonly occurred after a software live upgrade when numerous events needed to be reported simultaneously (e.g., unbinding drivers on VFs in parallel).

Keywords: virtio-net controller

Discovered in Version:

32.44.1036

Fixed in Release:

32.45.1020

4345431

Description: Fixed high latency observed in IB_READ_LATENCY when eSwitch scheduling is enabled and a rate limit is set.

Keywords: Data latency

Discovered in Version:

32.43.1014

Fixed in Release:

32.45.1020

3878086

Description: Congestion Control counters, such as ECN and CNP, now report the sum of both ports when in LAG mode.

Keywords: Congestion Control counters

Discovered in Version:

32.42.1000

Fixed in Release:

32.45.1020
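The aggregation described for 3878086 can be pictured as a per-counter sum over the two ports. The sketch below is only a model of that behavior; the counter keys "ecn" and "cnp" are placeholder labels, not actual firmware or driver counter names.

```python
from collections import Counter


def lag_counters(port0: dict, port1: dict) -> dict:
    """Sum per-port congestion control counters key by key."""
    total = Counter(port0)
    total.update(port1)  # Counter.update adds counts instead of replacing them
    return dict(total)


# Placeholder counter labels and values, for illustration only.
print(lag_counters({"ecn": 3, "cnp": 1}, {"ecn": 4, "cnp": 2}))
```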

4199274

Description: Fixed an issue where RTT packets with any destination MAC address were incorrectly treated as having a valid destination MAC. The new firmware now discards RTT packets if their destination MAC does not match the port's MAC.

Keywords: RTT, destination MAC

Discovered in Version:

32.44.1036

Fixed in Release:

32.45.1020
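The filtering rule described for 4199274 amounts to a destination MAC comparison. The following sketch models that rule in software for illustration; it is not firmware code, and the function names are hypothetical.

```python
def _normalize(mac: str) -> str:
    """Lower-case the address and use ':' as the separator."""
    return mac.strip().lower().replace("-", ":")


def accept_rtt_packet(dst_mac: str, port_mac: str) -> bool:
    """Accept an RTT packet only if its destination MAC matches the port's MAC."""
    return _normalize(dst_mac) == _normalize(port_mac)


print(accept_rtt_packet("0C:42:A1:00:00:01", "0c:42:a1:00:00:01"))
print(accept_rtt_packet("0c:42:a1:00:00:02", "0c:42:a1:00:00:01"))
```

Before the fix, the second packet would also have been treated as valid; with the new firmware it is discarded.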

BlueField-2 Firmware Bug Fixes

Internal Ref.

Issue

2899026 / 2853408

Description: Some pre-OS environments may fail when sensing a hot plug operation during their boot stage.

Keywords: BIOS; Hot plug; Virtio-net

Discovered in Version: 

24.33.1048

Fixed in Release:

24.45.1020

3296463

Description: fwreset is currently supported on PCI Gen 4 devices only.

Keywords: fwreset, PCI Gen4

Discovered in Version:

24.37.1300

Fixed in Release:

24.45.1020

3457472

Description: Disabling the Relaxed Ordering (RO) capability (relaxed_ordering_read_pci_enabled=0) using the vhca_resource_manager is currently not functional.

Keywords: Relaxed Ordering

Discovered in Version:

24.37.1300

Fixed in Release:

24.45.1020

2169950

Description: When a packet is decapsulated, the FCS indication is not calculated correctly.

Keywords: FCS

Discovered in Version:

24.42.1000

Fixed in Release:

24.45.1020

3638554

Description: Fixed an issue where, if the summary queue size on initiators exceeds the SRQ size on the NVMe-oF target, RNR NACKs are triggered. The Congestion Control (CC) mechanism significantly reduces the rate in response to the presence of RNR, leading to a substantial drop in bandwidth during NVMe WRITE operations and mixed tests.

Keywords: NVMe-oF target, RNR NACKs, Congestion Control (CC)

Discovered in Version:

24.43.2402

Fixed in Release:

24.45.1020

4262272

Description: Fixed an issue where the query_hca_cap execution time could increase on certain BlueField-2 systems.

Keywords: query_hca_cap timing

Discovered in Version:

24.43.2402

Fixed in Release:

24.45.1020

