DOCA SDK Documentation

Bug Fixes in This Version


DOCA Framework Bug Fixes

Ref #

Issue

4220089

Description: Using a dpdkvhostuser interface with OVS-DOCA causes OVS to crash.

Keyword: OVS

Detected in version: 2.9.0

4155959

Description: With uplinks in the br-sfc bridge, IPv6 traffic in the uplink-to-uplink direction causes OVS to crash, resulting in a complete traffic drop.

Keyword: OVS

Detected in version: 2.4.0

4268399

Description: The RX queue may exhaust its mbuf pool, leading to incorrect CQE polling that mistakenly accesses application-owned mbufs, potentially causing crashes.

Keyword: CQEs; polling; mbuf overwrite

Detected in version: 2.9.0

4224295

Description: Traffic between non-hostnetwork workloads stops after 5 minutes due to the DPU VTEP MAC address aging out in OVS.

Keyword: OVS; aging; VTEP MAC

Detected in version: 2.9.0

4200690

Description: The fTPM trusted application is signed for testing purposes only (i.e., not securely) with a development key.

Keyword: fTPM over OP-TEE

Detected in version: 2.9.1

3962272

Description: rte_eth_dev_start() performs unnecessary recreation of mlx5 control flow rules, resulting in increased delay of rte_eth_dev_start().

Keyword: Simple forward

Detected in version: 2.9.0

4130438

Description: Firefly is not compliant with "SyncE to 1pps Class B/C Transient response" when using an NVIDIA® ConnectX®-7 FHHL adapter card.

Keyword: Firefly

Detected in version: 2.9.0

DOCA-Host and DOCA Drivers Bug Fixes

Ref #

Issue

4019161

Description: Increased the default TX queue length in IPoIB to enhance qdisc queueing and reduce CPU spikes.

Keyword: TX queue; qdisc; CPU

Detected in version: 2.10

4181675

Description: Fixed incorrect SA switching when multiple active TX SAs are created on an SC, caused by failing to respect the SA configured by encoding_sa.

Keyword: TX SAs

Detected in version: 2.10

4037307

Description: Fixed the receive queue cache size calculation to account for the host page size.

Keyword: Memory allocation

Detected in version: 2.10

4125071

Description: The --defltprio flag has been deprecated and removed from the mlnx_qos tool.

Keyword: mlnx_qos tool

Detected in version: 2.10

BSP Bug Fixes

Ref #

Details

4693948

Description: When attempting to install the Ubuntu 24.04 (64k kernel) BFB image directly to an eMMC device, the installation may fail with a kernel panic. The system logs indicate an inability to mount the root filesystem, specifically returning the error: Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0).

Keywords: VFS; kernel panic; eMMC installation

Detected in version: 4.14.0

4923234

Description: The libdoca-sdk-sta-dev package is missing from the BFB and DOCA base images for Ubuntu 24.04. This package is required for target offload functionality.

Keywords: Target offload; BFB image

Detected in version: 4.14.0

4948318

4945554

Description: If a secondary BMC task (such as a log dump) is started after the BMC firmware update has been initiated, but before the installer's monitoring logic has attached to it, the installer may mistakenly track the secondary task. This tracking error causes the installer to misjudge the update's completion, which can cause the subsequent BMC reboot to fail and leave the new firmware in a pending, unactivated state.

Keywords: BFB installer; Redfish API; task monitoring

Detected in version: 4.14.0
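
The release note does not say how the installer chooses which BMC task to monitor. Purely as an illustration of how this kind of race can arise, the sketch below contrasts two selection strategies over a hypothetical task list: following whichever task started most recently, versus following the task by the identifier returned when the update was requested. The dictionary fields (`id`, `kind`, `started`) and both function names are assumptions made for this example, not part of any NVIDIA tool or Redfish schema.

```python
from typing import Dict, List, Optional

Task = Dict[str, object]


def latest_task(tasks: List[Task]) -> Task:
    """Fragile strategy: follow whichever task started most recently."""
    return max(tasks, key=lambda t: t["started"])


def task_by_id(tasks: List[Task], task_id: str) -> Optional[Task]:
    """More robust strategy: follow only the task with the expected ID."""
    for task in tasks:
        if task["id"] == task_id:
            return task
    return None


# Hypothetical snapshot: a log dump started just after the firmware update.
tasks: List[Task] = [
    {"id": "update-1", "kind": "firmware-update", "started": 100},
    {"id": "dump-2", "kind": "log-dump", "started": 105},
]

print(latest_task(tasks)["kind"])             # follows the secondary task
print(task_by_id(tasks, "update-1")["kind"])  # follows the update itself
```

In this hypothetical snapshot, the "most recent" strategy ends up following the log dump, while the ID-based lookup stays attached to the firmware update.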

4863927

Description: When attempting to install development packages via dnf on Rocky Linux 9.2, users may encounter a package repository inconsistency. This version mismatch results in a dependency resolution failure, preventing the installation of the packages.

Keywords: gcc; gcc-c++; dnf

Detected in version: 4.14.0

4988092

Description: Following an out-of-the-box installation and subsequent reboot on Ubuntu 24.04 (64k kernel), the NetworkManager-wait-online.service fails to start. The system logs indicate that the service times out while waiting for network connectivity, causing the service state to be marked as failed (status=1/FAILURE).

Keywords: Network Manager; systemd; timeout

Detected in version: 4.14.0

4893340

Description: A strict 128KB maximum size limit for the bf.cfg configuration file caused deployment failures in environments requiring more extensive configurations, such as those that pass large Ignition configuration payloads via the Data Processing Framework (DPF).

Keywords: BFB installation; file size limit; bf.cfg

Detected in version: 4.14.0
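
On affected versions, a quick pre-flight check of the bf.cfg size can flag a configuration that would hit the limit. The following is a minimal sketch that assumes "128KB" means 128 * 1024 bytes and that a file exactly at the limit is accepted; neither detail is stated in the release note, and the function names are hypothetical.

```python
import os

# Assumption: "128KB" in the release note means 128 * 1024 bytes.
LEGACY_BF_CFG_LIMIT = 128 * 1024


def exceeds_legacy_limit(size_bytes: int, limit: int = LEGACY_BF_CFG_LIMIT) -> bool:
    """Return True if a file of this size is larger than the assumed limit."""
    return size_bytes > limit


def bf_cfg_exceeds_legacy_limit(path: str) -> bool:
    """Check an on-disk bf.cfg file against the assumed legacy limit."""
    return exceeds_legacy_limit(os.path.getsize(path))
```

For example, `bf_cfg_exceeds_legacy_limit("bf.cfg")` could be called before starting a deployment that embeds a large Ignition payload.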

4849953

Description: The dpa-ps and dpa-statistics diagnostic tools are missing from the BlueField-3 BFB image for Oracle Linux 8.

Keywords: DPA; missing package

Detected in version: 4.14.0

4871396

Description: The dpa-ps and dpa-statistics diagnostic tools are missing from the BlueField-3 BFB image for Oracle Linux 9.

Keywords: DPA; missing package

Detected in version: 4.14.0

4907434

Description: When a BMC firmware update requires activation (and is the only component pending), the doca-installer tool incorrectly advises the user to perform a Level 3 firmware reset (mlxfwreset -l 3). Executing this recommended reset will not activate the pending BMC firmware. This is a messaging error in the tool's final status summary.

Keywords: BMC firmware; pending activation

Detected in version: 4.14.0

4907646

Description: When using the doca-installer --compare command with the --psid flag to target specific devices, the tool correctly identifies and skips unsupported BlueField-2 devices but still incorrectly halts to ask, "Continue with the available devices?". This unexpected interactive prompt interrupts automated scripts that rely on the tool executing without manual intervention.

Keywords: Automation; interactive prompt

Detected in version: 4.14.0

4879150

Description: Running the mlnx-sf utility with the -e or --enable-eswitch flag causes the command to fail, returning an Unknown option "eswitch" error. This occurs because the eswitch configuration action is no longer supported by the underlying mlxdevm and devlink utilities.

Keywords: Scalable functions; mlnx-sf; eswitch; mlxdevm

Detected in version: 4.14.0

4776492

Description: Occasionally, upgrading PLDM BFB from DOCA v3.2.0 to v3.2.1 may lead to an assert 0x7 in dmesg.

Keywords: PLDM

Detected in version: 4.14.0

4949639

Description: During a BFB installation, the CEC firmware updates successfully, but the completion confirmation message is missing from the RShim logs. The log displays "Updating CEC firmware" but omits the final success status before moving on to the next installation step (such as updating certificates).

Keywords: RShim logs; CEC firmware

Detected in version: 4.14.0

4836088

Description: Executing bfcfg -d produces corrupt output and displays incorrect boot options. This occurs due to a parsing error when the tool attempts to convert the BOOTx_DEVPATH variables from their binary configuration format back into ASCII text.

Keywords: Secure boot; ASCII conversion; BOOTx_DEVPATH

Detected in version: 4.14.0

4839828

Description: Host tmfifo_net interfaces may intermittently acquire random MAC addresses instead of the expected 00:1a:ca:ff:ff:XX pattern. This can cause failures in environments that enforce strict MAC address validation.

Keywords: MAC address; tmfifo_net; rshim

Detected in version: 4.14.0
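
Environments that validate MAC addresses can check an interface address against the expected pattern. This is a minimal sketch that assumes "XX" in 00:1a:ca:ff:ff:XX stands for any two hexadecimal digits and that comparison is case-insensitive; the function name is hypothetical.

```python
import re

# Assumption: "XX" stands for any two hex digits; letter case is ignored.
TMFIFO_MAC_RE = re.compile(r"00:1a:ca:ff:ff:[0-9a-f]{2}", re.IGNORECASE)


def is_expected_tmfifo_mac(mac: str) -> bool:
    """Return True if the address matches the 00:1a:ca:ff:ff:XX pattern."""
    return TMFIFO_MAC_RE.fullmatch(mac.strip()) is not None
```

An address such as `00:1a:ca:ff:ff:01` passes the check, while a randomly assigned address does not.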

4924237

Description: The RShim USB device may intermittently disappear from the DPU BMC, causing operations that rely on it to fail with a "Failed to enable BMC rshim" error.

Keywords: RShim USB; out-of-band update

Detected in version: 4.14.0

4658222

Description: During the DPU boot-up sequence, an intermittent call trace containing the warning WARN_ON(!host->claimed) may appear in the system logs.

Keywords: Call trace; kernel boot up

Detected in version: 4.14.0

4604090

Description: If a corrupted or unauthenticated BFB image is transferred from the BMC to the DPU, the system halts the installation process as part of a built-in security mechanism. Once triggered, the recovery path remains locked to prevent potential compromise.

Keywords: Corrupt; BFB

Detected in version: 4.14.0

4848119

Description: A BlueField-2 UEFI boot-time regression added approximately 20 seconds to system startup.

Keywords: BlueField-2; boot time; UEFI

Detected in version: 4.14.0

4904043

Description: An intermittent firmware assert error (synd 0x7: irisc not responding) may appear in system logs (dmesg) following a PLDM firmware upgrade while the device is in NIC mode.

Keywords: PLDM

Detected in version: 4.14.0

BMC Bug Fixes

Ref #

Issue Details

4944048

Description: When upgrading or downgrading between the 25.10-LTSU2 and 26.04 releases, repeated BMC reboots may, in rare cases, cause the profile-manager service to fail due to a malformed JSON file. This failure triggers a core dump of the service, populates core dump logs, and causes the golden-image service to become unresponsive.

Workaround: Perform a factory reset on the BMC.

Keyword: BMC reboot; core dump; factory reset

Reported in version: 25.10-LTSU2

4917779

Description: Initiating an Arm GracefulReset can cause the BMC's Redfish UpdateService to incorrectly report its state as UnavailableOffline.

Reported in version: 26.01

4948318

4945554

Description: If a secondary BMC task (such as a log dump) is started after the BMC firmware update has been initiated, but before the installer's monitoring logic has attached to it, the installer may mistakenly track the secondary task. This tracking error causes the installer to misjudge the update's completion, which can cause the subsequent BMC reboot to fail and leave the new firmware in a pending, unactivated state.

Reported in version: 26.01

4401488

Description: The BMC kernel enforces CONFIG_STATIC_USERMODEHELPER, which routes all usermode helper calls to /sbin/usermode-helper. Because this executable does not exist on the system, these calls fail and may cause undefined system behavior.

Reported in version: 26.01

4905017

Description: When operating in NIC mode, a host power cycle may intermittently cause the UEFI to fail to retrieve BMC Redfish credentials. This results in a DPU-BMC RF credentials not found error and introduces a timeout delay during the boot sequence.

Reported in version: 26.01

4969243

Description: When the ENABLE_BMC_WAIT flag is active on BlueField-3 DPUs, the BMC SEL may intermittently fail to record the complete UEFI boot progress following a host power cycle. This is a cosmetic issue and does not affect the functionality of the DPU.

Reported in version: 26.01

4995032

Description: Redfish queries via OobUpdate --show_all_version intermittently returned empty strings for IPMB-backed properties (such as BOARD and BSP).

Reported in version: 26.01

4867786

Description: During BFB installation, the Golden ARM image update may intermittently hang and fail via Redfish, logging a golden_image_arm firmware update timed out error.

Reported in version: 26.01

4914053

Description: The BFB installer defaults to DHCP for the VLAN4040 interface. If no DHCP server is present, the request silently fails after a 300-second timeout, bypassing the static IP fallback and skipping all BMC-related firmware updates.

Reported in version: 26.01

4924426

Description: Following a DPU reset, the BaseMAC, BaseGUID, and Description fields may incorrectly return as empty within the redfish/v1/Systems/Bluefield/Oem/Nvidia schema response.

Reported in version: 26.01

4987307

Description: During BFB installations via Redfish, the task state may change to "Exception" before the specific error message is appended to the HTTP response payload. This results in incomplete error logs on the initial poll following a failure.

Reported in version: 26.01
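
On affected versions, a client script can tolerate this ordering by polling again when a task reports "Exception" with no messages attached yet. The sketch below assumes the task is fetched as a dictionary carrying the standard Redfish Task properties TaskState and Messages; the `fetch_task` callable, the retry count, and the delay are hypothetical choices for this example.

```python
import time
from typing import Any, Callable, Dict


def wait_for_error_messages(
    fetch_task: Callable[[], Dict[str, Any]],
    retries: int = 3,
    delay_s: float = 1.0,
) -> Dict[str, Any]:
    """Re-poll a failed task until its Messages list is populated.

    Returns the last task payload fetched, whether or not messages arrived.
    """
    task = fetch_task()
    for _ in range(retries):
        # Stop as soon as the task is not failed, or the error text is present.
        if task.get("TaskState") != "Exception" or task.get("Messages"):
            break
        time.sleep(delay_s)
        task = fetch_task()
    return task
```

The caller supplies `fetch_task`, for example a small wrapper that performs the HTTP GET on the task URI and returns the decoded JSON body.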

4980118

Description: The set_emu_params.sh script attempts to load the mlxbf_ptm (DPU Power Telemetry) driver, leading to continuous module load failures and log spam because mlxbf_ptm is a non-upstreamed debug driver that is unsupported on several operating systems.

Reported in version: 26.01

4799519

Description: Accessing the /redfish/v1/Managers/Bluefield_BMC Redfish endpoint may time out, accompanied by repeated ipmb-host timeout errors in the console and significant delays (multiple minutes) when opening the UEFI "System Configuration" page.

Reported in version: 26.01

4932328

Description: Excessive Common Platform Error Record (CPER) files in /var/cper can exhaust the BMC root partition space. This crashes the entity-manager service and causes IPMI commands like ipmitool sdr to fail.

Reported in version: 26.01
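
To judge whether a directory such as /var/cper is contributing to low free space, the number and total size of its files can be compared with the free space on the partition. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, and whether Python is available on a given BMC image is not stated in the release note.

```python
import os
import shutil
from typing import Tuple


def directory_usage(path: str) -> Tuple[int, int]:
    """Return (file count, total bytes) for regular files directly in path."""
    count = 0
    total = 0
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_file(follow_symlinks=False):
                count += 1
                total += entry.stat(follow_symlinks=False).st_size
    return count, total


def free_fraction(path: str) -> float:
    """Return the fraction of free space on the filesystem holding path."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total
```

For example, `directory_usage("/var/cper")` together with `free_fraction("/")` gives a rough picture of how much of the root partition the CPER files occupy.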

4957197

Description: When external monitoring tools or scripts repeatedly query the BMC's Redfish interface using Basic authentication over extended periods, internal session resources fail to release properly. This memory leak eventually causes the BMC to lose network connectivity, even while the DPU management interface remains online.

Reported in version: 26.01

4966472

Description: The BMC generates a warning log for PLDM_Sensor_1_100 when the NIC temperature reaches the official 91°C upper non-critical threshold. This is an expected hardware alert for elevated temperatures, not a software defect.

Reported in version: 26.01

BlueField-3 Firmware Bug Fixes

Internal Ref.

Issue

4087432

Description: Increased the RX lossless buffer size to delay the transmission of Pause/PFC frames during NIC congestion.

Keywords: RX lossless buffer size

Discovered in Version: 32.43.2402

Fixed in Release: 32.44.1036

4184904 / 4183908

Description: Fixed an issue where the VDPA feature bits GUEST_TSO4 and GUEST_TSO6 were unexpectedly set by default, leading to traffic interruptions.

Keywords: VDPA, feature cap, GUEST_TSO4, GUEST_TSO6

Discovered in Version: 32.43.2402

Fixed in Release: 32.44.1036

4184910

Description: Fixed an issue where enabling PCC NP and setting the link type of one port to IB and the other to Ethernet could cause an assert to appear in dmesg with ext_synd 0x8309.

Keywords: PCC NP, port type

Discovered in Version: 32.43.2402

Fixed in Release: 32.44.1036

4206142

Description: Fixed an issue related to the warning assert 0x8a88, which occurred due to a non-harmful read of the mkey during CREATE_XRQ with RNDV type.

Keywords: Warning assert 0x8a88

Discovered in Version: 32.43.2402

Fixed in Release: 32.44.1036

4073037

Description: An incorrect GPIO identification led to a false assumption of an overcurrent event. Fixing the GPIO definition resolved the issue.

Keywords: GPIO identification

Discovered in Version: 32.43.2402

Fixed in Release: 32.44.1036

4220460

Description: The default-enabled MSB in pkg_id has been removed from the strap. pkg_id now supports values from 0 to 3.

Keywords: NC-SI package ID

Discovered in Version: 32.43.2402

Fixed in Release: 32.44.1036

3672595

Description: When using multiple links on the same PCORE, if one link goes down (e.g., due to a disconnected cable), the PCIe tree below the active link is only partially visible, with only the outer switch USP being enumerated.

Keywords: PCIe Tree

Discovered in Version: 32.43.2402

Fixed in Release: 32.44.1036

3920614

Description: When a QP attached to XRQ is moved to an error state via the 2ERR command, the firmware waits for requests in the device to complete before sending a new event. The software must wait for this event before proceeding with the new QP, preventing conflicting requests between the new and old QPs.

Keywords: NVMe-oF Target Offload

Discovered in Version: 32.43.2402

Fixed in Release: 32.44.1036

3956166

Description: Fixed an issue in the ZTR_RTTCC algorithm when using SOURCE_QP (ROCE_CC_SHAPER_COALESCE in mlxconfig) in LAG mode, which caused low bandwidth in many-to-one traffic scenarios.

Keywords: LAG, PCC, ZTR_RTTCC

Discovered in Version: 32.43.2402

Fixed in Release: 32.44.1036

4258064

Description: When "Support separate priority configuration for RTT packets for DOCA PCC" is enabled and a QP is created after DOCA PCC starts, a fwassert will appear in dmesg, along with basic debug output, when the QP is destroyed. In addition to the fwassert, the allocated steering rules for the QP are not deallocated, leading to a resource leak.

Keywords: DOCA PCC

Discovered in Version: 32.43.2402

Fixed in Release: 32.44.1036

4265811

Description: Fixed an issue that caused HCA initialization to fail due to a random memory violation.

Keywords: HCA initialization

Discovered in Version: 32.43.2402

Fixed in Release: 32.44.1036

3661179

Description: Added a new mechanism for allocation and deallocation flows to enhance parallelism.

Keywords: Allocation, deallocation flows

Discovered in Version: 32.39.2048

Fixed in Release: 32.44.1036

4178900

Description: The following nvconfig settings for the Communication DPU (BF3-COM-DPU) are now set by default:

  • INTERNAL_CPU_MODEL=1

  • LAG_RESOURCE_ALLOCATION=1

  • NUM_OF_PF=0

  • CQE_COMPRESSION=1

  • NUM_OF_VFS=0

  • HAIRPIN_DATA_BUFFER_LOCK=1

  • MEMIC_SIZE_LIMIT=0

  • PCI_WR_ORDERING=1

Keywords: Communication DPU (BF3-COM-DPU), default nvconfig settings

Discovered in Version: 32.42.1000

Fixed in Release: 32.44.1036

3837255

Description: Fixed an issue where shutting down the Arm from the host OS resulted in the "-E- Failed to send Register MRSI" message.

Keywords: Host OS; reboot; error

Discovered in Version: 32.42.1000

Fixed in Release: 32.44.1036

BlueField-2 Firmware Bug Fixes

Internal Ref.

Issue

4206142

Description: Fixed an issue related to the warning assert 0x8a88, which occurred due to a non-harmful read of the mkey during CREATE_XRQ with RNDV type.

Keywords: Warning assert 0x8a88

Discovered in Version: 24.43.2402

Fixed in Release: 24.44.1036

4154495

Description: Fixed a rare issue that caused traffic to halt and prevented recovery when the emulation doorbell malfunctioned.

Keywords: Doorbell

Discovered in Version: 24.43.2402

Fixed in Release: 24.44.1036

