DOCA SDK Documentation

Bug Fixes in This Version

DOCA Framework Bug Fixes

Ref #

Issue

4426511

Description: Orchestrated reset mode (MLXConfig) will be released as a Beta feature. There's a known race condition between server reboot and the reset flow running in parallel, which can cause the reset to go out of sync. 

Keyword: Orchestrated reset mode

Detected in version: 3.0.0

2657392

Description: OFED installation caused CIFS to break in RHEL 8.4 and above. A dummy module was added so that CIFS will be disabled after OFED installation in RHEL 8.4 and above.

Keyword: Installation; CIFS

Detected in version: 3.0.0

4374396

Description: Ingress mirroring rules configured on OVS-DOCA are not offloaded to hardware when using remote GRE tunnels.

Keyword: Mirroring

Detected in version: 3.0.0

4464648

Description: When the server crashes, the client’s Comch Producer may call the send error callback twice for the same task, potentially leading to buffer reference count errors.

Keyword: DOCA Comch; duplicate callback; buffer refcount error

Detected in version: 3.0.0

4454054

Description: DMS pod requires "Linux is up" from bfup.service twice to proceed, forcing repeated bfup runs.

Keyword: Installation

Detected in version: 3.0.0

4255270

Description: Packets are encapsulated with standard VXLAN headers even when VXLAN-GBP is configured, resulting in the Group Policy ID (GBP) not being applied.

Keyword: VXLAN; packet encapsulation

Detected in version: 2.10.0

4263035

Description: In L3 EVPN scenarios with 16k overlay and 4k underlay routes, OVS may get stuck or abnormally terminate. 

Keyword: HBN

Detected in version: 2.10.0

3851200

Description: Once PPS is enabled, there is no way to disable it via FireFly commands.

Keyword: PTP

Detected in version: 2.7.0

DOCA-Host and DOCA Drivers Bug Fixes

Ref #

Issue

4404290

Description: Fixed a crash triggered by handling multiple CMA net events in rapid succession on the same CMA ID.

Keyword: CMA

Detected in version: 3.0.0

4500815

Description: Fixed an issue that caused packet loss when enabling or disabling promiscuous mode on a network interface.

Keyword: Promiscuous mode

Detected in version: 3.0.0

4514994

Description: Fixed performance degradation on older kernel versions using RX cache, particularly on slower ARM CPUs with larger RX buffers. The issue was caused by the driver attempting to allocate new RX pages too quickly, leading to head-of-line blocking in the RX cache. The fix improves RX cache usage by triggering page allocation for a bulk of at least 2 WQEs, allowing the application more time to process packets and return buffers to the RX cache, thereby reducing blocking and enhancing performance.

Keyword: Performance, kernel, Rx cache, page allocation

Detected in version: 3.0.0

4504899

Description: Fixed behavior to align with /sys/class/net/<interface-name>/device/sriov_numvfs: silently ignore attempts to set the same number of VFs, and prevent changing the number of VFs until existing VFs are removed.

Keyword: VFs

Detected in version: 3.0.0

3680538

Description: When using strongSwan or OVS-IPsec as explained in the NVIDIA BlueField DPU BSP, the IPSec Rx datapath is not offloaded to hardware and occurs in software running on the Arm cores. As a result, bandwidth performance is substantially low.

Keyword: IPsec

Detected in version: 3.0.0

4448262

Description: Fixed an issue where a kernel crash could occur if a device event arrives during the event subscription process.

Keyword: devx; event_fd

Detected in version: 3.0.0

4449477

Description: On BlueField-3 devices running linux-bluefield kernel versions 5.15.0-1050 or 5.15.0-1060, a kernel crash may occur due to a NULL pointer dereference in the cls_api network scheduler.

Keyword: Kernel crash

Detected in version: 3.0.0

BSP Bug Fixes

Ref #

Details

4693948

Description: When attempting to install the Ubuntu 24.04 (64k kernel) BFB image directly to an EMMC device, the installation may fail with a kernel panic. The system logs indicate an inability to mount the root filesystem, specifically returning the error: Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0).

Keywords: VFS; kernel panic; EMMC installation

Detected in version: 4.14.0

4923234

Description: The libdoca-sdk-sta-dev package is missing from the BFB and DOCA base images for Ubuntu 24.04. This package is required for target offload functionality.

Keywords: Target offload; BFB image

Detected in version: 4.14.0

4948318

4945554

Description: If a secondary BMC task (such as a log dump) is started after the BMC firmware update has been initiated, but before the installer's monitoring logic has attached to it, the installer may mistakenly track the secondary task. This tracking error causes the installer to misjudge the update's completion, which can cause the subsequent BMC reboot to fail and leave the new firmware in a pending, unactivated state.

Keywords: BFB installer; Redfish API; task monitoring

Detected in version: 4.14.0

4863927

Description: When attempting to install development packages via dnf on Rocky Linux 9.2, users may encounter a package repository inconsistency. This version mismatch results in a dependency resolution failure, preventing the installation of the packages.

Keywords: gcc; gcc-c++; dnf

Detected in version: 4.14.0

4988092

Description: Following an out-of-the-box installation and subsequent reboot on Ubuntu 24.04 (64k kernel), the NetworkManager-wait-online.service fails to start. The system logs indicate that the service times out while waiting for network connectivity, causing the service state to be marked as failed (status=1/FAILURE).

Keywords: Network Manager; systemd; timeout

Detected in version: 4.14.0

4893340

Description: A strict 128KB maximum size limit for the bf.cfg configuration file caused deployment failures in environments requiring more extensive configurations, such as those that pass large Ignition configuration payloads via the Data Processing Framework (DPF).

Keywords: BFB installation; file size limit; bf.cfg

Detected in version: 4.14.0

4849953

Description: The dpa-ps and dpa-statistics diagnostic tools are missing from the BlueField-3 BFB image for Oracle Linux 8.

Keywords: DPA; missing package

Detected in version: 4.14.0

4871396

Description: The dpa-ps and dpa-statistics diagnostic tools are missing from the BlueField-3 BFB image for Oracle Linux 9.

Keywords: DPA; missing package

Detected in version: 4.14.0

4907434

Description: When a BMC firmware update requires activation (and is the only component pending), the doca-installer tool incorrectly advises the user to perform a Level 3 firmware reset (mlxfwreset -l 3). Executing this recommended reset will not activate the pending BMC firmware. This is a messaging error in the tool's final status summary.

Keywords: BMC firmware; pending activation

Detected in version: 4.14.0

4907646

Description: When using the doca-installer --compare command with the --psid flag to target specific devices, the tool correctly identifies and skips unsupported BlueField-2 devices but still incorrectly halts to ask, "Continue with the available devices?". This unexpected interactive prompt interrupts automated scripts that rely on the tool executing without manual intervention.

Keywords: Automation; interactive prompt

Detected in version: 4.14.0

4879150

Description: Running the mlnx-sf utility with the -e or --enable-eswitch flag causes the command to fail, returning an Unknown option "eswitch" error. This occurs because the eswitch configuration action is no longer supported by the underlying mlxdevm and devlink utilities.

Keywords: Scalable functions; mlnx-sf; eswitch; mlxdevm

Detected in version: 4.14.0

4776492

Description: Occasionally, upgrading PLDM BFB from DOCA v3.2.0 to v3.2.1 may lead to an assert 0x7 in dmesg.

Keywords: PLDM

Detected in version: 4.14.0

4949639

Description: During a BFB installation, the CEC firmware updates successfully, but the completion confirmation message is missing from the RSHIM logs. The log displays "Updating CEC firmware" but omits the final success status before moving on to the next installation step (such as updating certificates).

Keywords: RShim logs; CEC firmware

Detected in version: 4.14.0

4836088

Description: Executing bfcfg -d produces corrupt output and displays incorrect boot options. This occurs due to a parsing error when the tool attempts to convert the BOOTx_DEVPATH variables from their binary configuration format back into ASCII text.

Keywords: Secure boot; ASCII conversion; BOOTx_DEVPATH

Detected in version: 4.14.0

4839828

Description: Host tmfifo_net interfaces may intermittently acquire random MAC addresses instead of the expected 00:1a:ca:ff:ff:XX pattern. This can cause failures in environments that enforce strict MAC address validation.

Keywords: MAC address; tmfifo_net; rshim

Detected in version: 4.14.0

4924237

Description: The RShim USB device may intermittently disappear from the DPU BMC, causing operations that rely on it to fail with a "Failed to enable BMC rshim" error.

Keywords: RShim USB; out-of-band update

Detected in version: 4.14.0

4658222

Description: During the DPU boot-up sequence, an intermittent call trace containing the warning WARN_ON(!host->claimed) may appear in the system logs.

Keywords: Call trace; kernel boot up

Detected in version: 4.14.0

4604090

Description: If a corrupted or unauthenticated BFB image is transferred from the BMC to the DPU, the system halts the installation process as part of a built-in security mechanism. Once triggered, the recovery path remains locked to prevent potential compromise.

Keywords: Corrupt; BFB

Detected in version: 4.14.0

4848119

Description: A BlueField-2 UEFI boot-time regression added approximately 20 seconds to system startup.

Keywords: BlueField-2; boot time; UEFI

Detected in version: 4.14.0

4904043

Description: An intermittent firmware assert error (synd 0x7: irisc not responding) may appear in system logs (dmesg) following a PLDM firmware upgrade while the device is in NIC mode.

Keywords: PLDM

Detected in version: 4.14.0

BMC Bug Fixes

Ref #

Issue Details

4944048

Description: When upgrading or downgrading between the 25.10-LTSU2 and 26.04 releases, repeated BMC reboots may, in rare cases, cause the profile-manager service to fail due to a malformed JSON file. This failure triggers a core dump of the service, populates core dump logs, and causes the golden-image service to become unresponsive.

Workaround: Perform a factory reset on the BMC.

Keyword: BMC reboot; core dump; factory reset

Reported in version: 25.10-LTSU2

4917779

Description: Initiating an Arm GracefulReset can cause the BMC's Redfish UpdateService to incorrectly report its state as UnavailableOffline.

Reported in version: 26.01

4948318

4945554

Description: If a secondary BMC task (such as a log dump) is started after the BMC firmware update has been initiated, but before the installer's monitoring logic has attached to it, the installer may mistakenly track the secondary task. This tracking error causes the installer to misjudge the update's completion, which can cause the subsequent BMC reboot to fail and leave the new firmware in a pending, unactivated state.

Reported in version: 26.01

4401488

Description: The BMC kernel enforces CONFIG_STATIC_USERMODEHELPER, which routes all usermode helper calls to /sbin/usermode-helper. Because this executable does not exist on the system, these calls fail and may cause undefined system behavior.

Reported in version: 26.01

4905017

Description: When operating in NIC mode, a host power cycle may intermittently cause the UEFI to fail to retrieve BMC Redfish credentials. This results in a DPU-BMC RF credentials not found error and introduces a timeout delay during the boot sequence.

Reported in version: 26.01

 4969243

Description: When the ENABLE_BMC_WAIT flag is active on BlueField-3 DPUs, the BMC SEL may intermittently fail to record the complete UEFI boot progress following a host power cycle. This is a cosmetic issue and does not affect the functionality of the DPU.

Reported in version: 26.01

4995032

Description: Redfish queries via OobUpdate --show_all_version intermittently returned empty strings for IPMB-backed properties (such as BOARD and BSP).

Reported in version: 26.01

 4867786

Description: During BFB installation, the Golden ARM image update may intermittently hang and fail via Redfish, logging a golden_image_arm firmware update timed out error.

Reported in version: 26.01

 4914053

Description: The BFB installer defaults to DHCP for the VLAN4040 interface. If no DHCP server is present, the request silently fails after a 300-second timeout, bypassing the static IP fallback and skipping all BMC-related firmware updates.

Reported in version: 26.01

4924426

Description: Following a DPU reset, the BaseMACBaseGUID, and Description fields may incorrectly return as empty within the redfish/v1/Systems/Bluefield/Oem/Nvidia schema response.

Reported in version: 26.01

 4987307

Description: During BFB installations via Redfish, the task state may change to "Exception" before the specific error message is appended to the HTTP response payload. This results in incomplete error logs on the initial poll following a failure.

Reported in version: 26.01

 4980118

Description: The set_emu_params.sh script attempts to load the mlxbf_ptm (DPU Power Telemetry) driver, leading to continuous module load failures and log spam because mlxbf_ptm is a non-upstreamed debug driver that is unsupported on several operating systems.

Reported in version: 26.01

 4799519

Description: Accessing the /redfish/v1/Managers/Bluefield_BMC Redfish endpoint may time out, accompanied by repeated ipmb-host timeout errors in the console and significant delays (multiple minutes) when opening the UEFI "System Configuration" page.

Reported in version: 26.01

4932328

Description: Excessive Common Platform Error Record (CPER) files in /var/cper can exhaust the BMC root partition space. This crashes the entity-manager service and causes IPMI commands like ipmitool sdr to fail.

Reported in version: 26.01

4957197

Description: When external monitoring tools or scripts repeatedly query the BMC's Redfish interface using Basic authentication over extended periods, internal session resources fail to release properly. This memory leak eventually causes the BMC to lose network connectivity, even while the DPU management interface remains online.

Reported in version: 26.01

4966472

Description: The BMC generates a warning log for PLDM_Sensor_1_100 when the NIC temperature reaches the official 91°C upper non-critical threshold. This is an expected hardware alert for elevated temperatures, not a software defect.

Reported in version: 26.01

BlueField-3 Firmware Bug Fixes

Internal Ref.

Issue

4501157 / 4257750

Description: Fixed a critical issue with a live firmware patch.

Keywords: Live firmware patch

Discovered in Version:

32.45.1020

Fixed in Release:

32.46.1006

4377816

Description: Fixed an issue where firmware did not de-assert the PERST of the DSP on pcore1. The fix updates the check to correctly interpret the default GPIO mapping value as 0xFFF (NO_GPIO_FUNCTION) instead of 0xFF (INVALID_READ).

Keywords: mlxconfig

Discovered in Version:

32.45.1020

Fixed in Release:

32.46.1006

4286902

Description: Fixed a race condition in DPA process termination during the exception flow, where a failed process could be missed and not reported to the user.

Keywords: DPA

Discovered in Version:

32.45.1020

Fixed in Release:

32.46.1006

4420567

Description: Removed an unnecessary and partially incorrect firmware check that blocked valid action list permutations allowed by the PRM. Validation of these permutations remains the responsibility of the software.

Keywords: Header actions

Discovered in Version:

32.45.1020

Fixed in Release:

32.46.1006

4498670

Description: Fixed a race condition where destroying two emulation objects with the same VHCA ID could result in one destroy command failing with syndrome 0xF3F880.

Keywords: VirtIO

Discovered in Version:

32.45.1020

Fixed in Release:

32.46.1006

4475307

Description: Fixed an issue where PCC DCQCN used incorrect parameter values when link speed was 400Gbps or higher.

Keywords: PCC DCQCN, congestion control.

Discovered in Version:

32.45.1020

Fixed in Release:

32.46.1006

4443601

Description: Fixed a firmware issue where PXE failed to boot when both LAG ports were up.

Keywords: PXE, LAG

Discovered in Version:

32.45.1020

Fixed in Release:

32.46.1006

4480427

Description: Fixed incorrect calculation of start address and mode for the CQE buffer in DPA CQ, which could cause CQEs to be written to the wrong address when the buffer is not 4K-aligned and spans a second page boundary.

Keywords: CQ, CQE Buffer, DPA

Discovered in Version:

32.45.1020

Fixed in Release:

32.46.1006

4388371

Description: Fixed an issue where an uninitialized pport in the SLRG command, when using the SMP interface, caused an assertion failure.

Keywords: SLRG, SMP interface, pport

Discovered in Version:

32.45.1020

Fixed in Release:

32.46.1006

4395036

Description: Fixed a race condition between firmware and hardware flows during QP closure.

Keywords: Race condition

Discovered in Version:

32.45.1020

Fixed in Release:

32.46.1006

4428580

Description: Fixed a rare issue where triggering mstdump via core_dump in Windows drivers could cause a PCI link down condition.

Keywords: mstdump, windows

Discovered in Version:

32.45.1020

Fixed in Release:

32.46.1006

4428580

Description: Fixed an issue with vQoS parameter configuration to improve latency handling for large messages.

Keywords: vQoS, latency

Discovered in Version:

32.45.1020

Fixed in Release:

32.46.1006

4436922

Description: DC InfiniBand is not functional in this firmware version.

Keywords: DC, DDP traffic

Discovered in Version:

32.45.1020

Fixed in Release:

32.46.1006

4366117

Description: Configuring a small MTU leads to fragmentation of packets critical for the PXE boot process. As a result, the PXE boot filters mistakenly discard these packets, causing the PXE boot to fail. 

Keywords: PXE boot filters

Discovered in Version:

32.45.1020

Fixed in Release:

32.46.1006

4475307

Description: Fixed an issue where PCC DCQCN used incorrect parameter values when link speed was 400Gbps or higher.

Keywords: PCC DCQCN, congestion control.

Detected in version:

32.45.1020

Fixed in Release: 

32.46.1006

4486431

Description: Fixed an issue where issuing multiple parallel queries of DPA_THREAD objects with the same object ID could fail.

Keywords: DPA

Discovered in Version:

32.45.1020

Fixed in Release:

32.46.1006

4470053

Description: Fixed an issue with vQoS parameter configuration to improve latency handling for large messages.

Keywords: vQoS, latency

Discovered in Version:

32.45.1020

Fixed in Release:

32.46.1006

BlueField-2 Firmware Bug Fixes

Internal Ref.

Issue

4366117

Description: Configuring a small MTU leads to fragmentation of packets critical for the PXE boot process. As a result, the PXE boot filters mistakenly discard these packets, causing the PXE boot to fail. 

Keywords: PXE boot filters

Discovered in Version: 

24.45.1020

Fixed in Release:

24.46.1006


Last updated: