DOCA SDK Documentation

Bug Fixes in This Version

DOCA Framework Bug Fixes

Ref #

Issue

4469496

Description: In some environments, when an application deletes all flow rules from a template table and then attempts to re-add flow rules to the same table, a "failed to create rte flow" error is raised.

Keyword: Flow rules

Detected in version: 2.9.2

4391384

Description: The UCX package is built without GDR copy support due to an unintended change in the build system that excluded CUDA. As a result, applications relying on GDR functionality in UCX are unable to use it.

Keyword: GDR support; CUDA missing; build environment

Detected in version: 2.9.2

4403063

Description: When the mlnx-fw-updater package is installed, it runs its firmware loading script. The script automatically attempts to start MFT kernel support as part of its hardware scan loop, which may cause issues on some devices.

Keyword: MFT; firmware

Detected in version: 2.9.2

4259675

Description: In rare cases, systems using shared receive queues (shared_rxq) may experience incorrect packet handling during high-throughput traffic.

Keyword: Shared RXQ; packet corruption; routing error

Detected in version: 2.9.2

4410028

Description: On SLES 15 SP5 with kernel version 5.14.21-150500.55.68-default or later, installation of the mlnx-ofa_kernel drivers fails to use weak-modules, causing the system to fall back to the inbox OFED modules. This occurs because the kernel used to build the drivers (5.14.21-150500.53-default) did not include the mana_ib driver, while newer kernels do. The mismatch triggers a weak-modules sanity check failure due to the missing replacement.

Keyword: Weak modules; Kernel version mismatch; inbox driver conflict

Detected in version: 2.9.0

BSP Bug Fixes

Ref #

Issue Description

4403055

Description: Repeated power cycles cause corruption in the EXT4 file system.

Keywords: Power cycle; FS corruption

Reported in version: 

BMC Bug Fixes

Ref #

Issue Details

4944048

Description: When upgrading or downgrading between the 25.10-LTSU2 and 26.04 releases, repeated BMC reboots may, in rare cases, cause the profile-manager service to fail due to a malformed JSON file. This failure triggers a core dump of the service, populates core dump logs, and causes the golden-image service to become unresponsive.

Workaround: Perform a factory reset on the BMC.

Keyword: BMC reboot; core dump; factory reset

Reported in version: 25.10-LTSU2

4917779

Description: Initiating an Arm GracefulReset can cause the BMC's Redfish UpdateService to incorrectly report its state as UnavailableOffline.

Reported in version: 26.01

4948318 / 4945554

Description: If a secondary BMC task (such as a log dump) is started after the BMC firmware update has been initiated, but before the installer's monitoring logic has attached to it, the installer may mistakenly track the secondary task. This tracking error causes the installer to misjudge the update's completion, which can cause the subsequent BMC reboot to fail and leave the new firmware in a pending, unactivated state.

Reported in version: 26.01

4401488

Description: The BMC kernel enforces CONFIG_STATIC_USERMODEHELPER, which routes all usermode helper calls to /sbin/usermode-helper. Because this executable does not exist on the system, these calls fail and may cause undefined system behavior.

Reported in version: 26.01

4905017

Description: When operating in NIC mode, a host power cycle may intermittently prevent UEFI from retrieving the BMC Redfish credentials. This results in a "DPU-BMC RF credentials not found" error and introduces a timeout delay during the boot sequence.

Reported in version: 26.01

4969243

Description: When the ENABLE_BMC_WAIT flag is active on BlueField-3 DPUs, the BMC SEL may intermittently fail to record the complete UEFI boot progress following a host power cycle. This is a cosmetic issue and does not affect the functionality of the DPU.

Reported in version: 26.01

4995032

Description: Redfish queries via OobUpdate --show_all_version may intermittently return empty strings for IPMB-backed properties (such as BOARD and BSP).

Reported in version: 26.01

4867786

Description: During BFB installation, the Golden Arm image update via Redfish may intermittently hang and fail, logging a "golden_image_arm firmware update timed out" error.

Reported in version: 26.01

4914053

Description: The BFB installer defaults to DHCP for the VLAN4040 interface. If no DHCP server is present, the request silently fails after a 300-second timeout, bypassing the static IP fallback and skipping all BMC-related firmware updates.

Reported in version: 26.01

4924426

Description: Following a DPU reset, the BaseMAC, BaseGUID, and Description fields may incorrectly be returned empty in the redfish/v1/Systems/Bluefield/Oem/Nvidia schema response.

Reported in version: 26.01

4987307

Description: During BFB installations via Redfish, the task state may change to "Exception" before the specific error message is appended to the HTTP response payload. This results in incomplete error logs on the initial poll following a failure.

Reported in version: 26.01

4980118

Description: The set_emu_params.sh script attempts to load the mlxbf_ptm (DPU Power Telemetry) driver, leading to continuous module load failures and log spam because mlxbf_ptm is a non-upstreamed debug driver that is unsupported on several operating systems.

Reported in version: 26.01

4799519

Description: Accessing the /redfish/v1/Managers/Bluefield_BMC Redfish endpoint may time out, accompanied by repeated ipmb-host timeout errors in the console and significant delays (multiple minutes) when opening the UEFI "System Configuration" page.

Reported in version: 26.01

4932328

Description: Excessive Common Platform Error Record (CPER) files in /var/cper can exhaust the BMC root partition space. This crashes the entity-manager service and causes IPMI commands like ipmitool sdr to fail.

Reported in version: 26.01

4957197

Description: When external monitoring tools or scripts repeatedly query the BMC's Redfish interface using Basic authentication over extended periods, internal session resources fail to release properly. This memory leak eventually causes the BMC to lose network connectivity, even while the DPU management interface remains online.

Reported in version: 26.01

4966472

Description: The BMC generates a warning log for PLDM_Sensor_1_100 when the NIC temperature reaches the official 91°C upper non-critical threshold. This is an expected hardware alert for elevated temperatures, not a software defect.

Reported in version: 26.01

BlueField-3 Firmware Bug Fixes

Internal Ref.

Issue

4422979

Description: Fixed a rare issue that caused a PCIe failure after a power cycle.

Keywords: PCIe

Discovered in Version:

32.43.2566

Fixed in Release:

32.43.3608

4388371

Description: Fixed an issue where an uninitialized pport in the SLRG command, when using the SMP interface, caused an assertion failure.

Keywords: SLRG, SMP interface, pport

Discovered in Version:

32.43.2566

Fixed in Release:

32.43.3608

4486422

Description: Fixed an issue where PCIe errors from the endpoint were incorrectly reported to RAS even when they were not reported to the host. The behavior now complies with section 6.2.5 of the PCIe specification (Sequence of Device Error Signaling and Logging Operations).

Keywords: PCIe

Discovered in Version:

32.43.2566

Fixed in Release:

32.43.3608

4364539

Description: Fixed a race condition in which issuing a reset command to the NIC while the flash is in suspend mode caused the NIC to reboot without recognizing that the flash was still suspended.

Keywords: PRS

Discovered in Version:

32.43.2566

Fixed in Release:

32.43.3608

4444987

Description: Removed the incorrect INI configuration that skipped receiver detection from the relevant PRS.

Keywords: PRS

Discovered in Version:

32.43.2566

Fixed in Release:

32.43.3608

4427796

Description: Enabled MCTP communication with the DPU BMC on SKUs: 900-9D3C6-00SV-DA0 and 900-9D3C6-B9SV-DA0.

Keywords: MCTP communication, DPU BMC

Discovered in Version:

32.43.2566

Fixed in Release:

32.43.3608

4438736

Description: Fixed a race condition between firmware and hardware flows during QP closure, as well as a potential endless loop.

Keywords: Race; endless loop

Discovered in Version:

32.43.2566

Fixed in Release:

32.43.3608

4470567

Description: Modified the VQoS parameter configuration to improve latency for large messages.

Keywords: VQoS, latency improvement

Discovered in Version:

32.43.2566

Fixed in Release:

32.43.3608

4443919 / 4395036

Description: Fixed a race condition between firmware and hardware flows during QP closure.

Keywords: Race condition

Discovered in Version:

32.43.2566

Fixed in Release:

32.43.3608

4355566

Description: Fixed high latency observed in IB_READ_LATENCY when E-Switch scheduling is enabled and a rate limit is set.

Keywords: Data latency

Discovered in Version:

32.43.2566

Fixed in Release:

32.43.3608

4444874

Description: Fixed an issue where the firmware failed to de-assert the PERST signal of the DSP on pcore1. The fix involved correctly checking the output of the default GPIO mapping against 0xFFF (NO_GPIO_FUNCTION) instead of 0xFF (INVALID_READ).

Keywords: PERST signal

Discovered in Version:

32.43.2566

Fixed in Release:

32.43.3608

4234972

Description: Fixed an issue where the isr_distributer, responsible for distributing tokens to SQs, was not being triggered reliably every 100 µs. Its priority has been elevated to HIGH, and it is now marked as 'busy' upon completion to ensure consistent and timely execution.

Keywords: VQoS

Discovered in Version:

32.43.2566

Fixed in Release:

32.43.3608

4384412

Description: Fixed an issue where the firmware could send an incorrect object_id in the device emulation object change event, causing the virtio-net controller to fail when handling operations on the host's virtio device. This typically occurred after a software live upgrade when many events were triggered simultaneously (for example, when unbinding drivers on VFs in parallel), and could result in a host hang.

Keywords: Device emulation object change event

Discovered in Version:

32.43.2566

Fixed in Release:

32.43.3608

4344710

Description: The MSB of pkg_id, which was enabled by default, has been removed from the strap. pkg_id now correctly supports values in the range 0 to 3.

Keywords: NC-SI package ID

Discovered in Version:

32.43.2566

Fixed in Release:

32.43.3608

4330201

Description: Fixed an issue in UEFI PCI enumeration that prevented the OS from booting.

Keywords: Booting

Discovered in Version:

32.43.2566

Fixed in Release:

32.43.3608

4283167

Description: Fixed an issue in the VQoS algorithm related to learning when an element is active and when it begins sending traffic.

Keywords: VQoS algorithm

Discovered in Version:

32.43.2566

Fixed in Release:

32.43.3608

4283168

Description: Resolved a high-latency issue that occurred when the VF group rate limiter (ESW scheduling) was enabled.

Keywords: Rate limiter

Discovered in Version:

32.43.2566

Fixed in Release:

32.43.3608

4361277

Description: Fixed an issue in the ZTR_RTTCC algorithm when using SOURCE_QP (ROCE_CC_SHAPER_COALESCE in mlxconfig) in LAG mode, which caused low bandwidth in many-to-one traffic scenarios.

Keywords: LAG, PCC, ZTR_RTTCC

Discovered in Version:

32.43.2566

Fixed in Release:

32.43.3608

4403151

Description: Fixed an issue that caused reduced bandwidth during the initial traffic phase when the lossy ADP retransmission feature was enabled alongside the DCQCN congestion control algorithm, due to a low ACK timeout making ADP retransmissions overly aggressive.

Keywords: Lossy ADP retransmission, Congestion Control

Discovered in Version:

32.43.2566

Fixed in Release:

32.43.3608

4444306

Description: Fixed an issue where transitioning a QP attached to an XRQ to the error state using the 2ERR command could lead to request conflicts. The firmware now properly waits for all in-flight requests to complete before issuing a new event, ensuring the software can safely proceed with initializing a new QP.

Keywords: NVMe-oF Target Offload

Discovered in Version:

32.43.2566

Fixed in Release:

32.43.3608

4336970

Description: Reduced the bandwidth fluctuation induced by VQoS rate limiting in systems with fewer than 350 QPs. This change is enabled by default.

Keywords: VQoS

Discovered in Version:

32.43.2566

Fixed in Release:

32.43.3608

4336965

Description: Adjusted the RX lossless buffer default parameters to delay transmission of Pause/PFC frames when the NIC is congested. The RX lossless buffer parameters are now enabled by default.

Keywords: RX lossless buffer size

Discovered in Version:

32.43.2566

Fixed in Release:

32.43.3608

4361179

Description: Fixed an issue that caused bandwidth to drop when unbinding multiple VFs with VQoS enabled.

Keywords: VQoS

Discovered in Version:

32.43.2566

Fixed in Release:

32.43.3608

BlueField-2 Firmware Bug Fixes

Internal Ref.

Issue

4342749

Description: Fixed an issue where, if the combined queue size on the initiators exceeded the SRQ size on the NVMe-oF target, RNR NACKs were triggered. The Congestion Control (CC) mechanism significantly reduced the rate in response to the RNR NACKs, causing a substantial bandwidth drop during NVMe WRITE operations and mixed tests.

Keywords: NVMe-oF target, RNR NACKs, Congestion Control (CC)

Discovered in Version:

24.43.2566

Fixed in Release:

24.43.3608

4358188

Description: Fixed an issue where enabling DIM could lead to high IRQ/s in certain scenarios.

Keywords: vDPA, DIM

Discovered in Version:

24.43.2566

Fixed in Release:

24.43.3608

4355566

Description: Fixed high latency observed in IB_READ_LATENCY when E-Switch scheduling is enabled and a rate limit is set.

Keywords: Data latency

Discovered in Version:

24.43.2566

Fixed in Release:

24.43.3608
