DOCA SDK Documentation

Bug Fixes in This Version

DOCA Framework Bug Fixes

Ref #

Issue Details

4888526

Description: In multiport configurations deployed with two concurrent tunnels (such as a vxlan_vxlan topology), TCP traffic and occasionally ICMP (ping) traffic consistently fails on the DPU.

Keyword: OVS-DOCA; multiport

Detected in version: 3.3.0

4946525

Description: When configuring the PPCC register via mlxreg/mlxconfig on ConnectX-8 and ConnectX-9 adapters, setting the counter_group_id back to 0 on an active slot (such as algo_slot 0) fails to take effect.

Keyword: Port Programmable Congestion Control

Detected in version: 3.3.0

4922195

Description: High CPU utilization is observed in doca_spcx_cc caused by the FlexIO dtrace subsystem busy-polling in the host.

Keyword: High CPU Utilization; Congestion Control Daemon

Detected in version: 3.3.0

4834014

Description: In the PCC NP switch telemetry module, the doca_pcc_dev_np_user_packet_handler() function fails to properly validate fields in malformed probe packets.

Keyword: Programmable Congestion Control; Network Processor

Detected in version: 3.3.0

4838536

Description: During a DOCA Flow hot upgrade, the initialization of the new application may crash with a DOCA_ERROR_DRIVER due to DevX object creation failures.

Keyword: Hot upgrade; PSP master key; master key aliasing

Detected in version: 3.3.0

4891592

Description: Configuring DOCA Flow Connection Tracking via gRPC without specifying the nbArmSessions parameter in the ctCfg JSON payload causes the doca_flow_grpc server to crash.

Keyword: CT; doca_flow_grpc

Detected in version: 3.3.0

4984457

Description: Performance degradation in maximum packets per second occurs on ConnectX-8 and ConnectX-9 adapters when processing Connection Tracking rules.

Keyword: MAX PPS; CT

Detected in version: 3.3.0

4669057

Description: Processing a Connection Tracking entry that simultaneously modifies the IPv6 source, IPv6 destination, and TCP destination port triggers a segmentation fault.

Keyword: CT; IPv6 modification; TCP modification

Detected in version: 3.3.0

4919797

Description: When executed in debug mode, the doca_flow_hash_flooding_pipe sample fails to start.

Keyword: Sanity check; ENGINE_FWD_PIPE; ENGINE_FWD_HASH_PIPE

Detected in version: 3.3.0

4994255

Description: When configuring a hash pipe in DOCA Flow, providing multiple action descriptions (actionDescs) that modify distinct fields within the VXLAN header (specifically vxlan.vni and vxlan.rsvd1) causes the last action description in the array to be dropped and not applied to the packet.

Keyword: doca_flow; PIPE_HASH; VXLAN header

Detected in version: 3.3.0

4966667

Description:

Keyword:

Detected in version: 3.3.0

4847948

Description: When executing the doca_caps tool, checking for DOCA Telemetry PCC library support fails and returns a DOCA_ERROR_DRIVER error. This interrupts the process, preventing the tool from dumping the relevant telemetry and device information.

Keyword: DOCA Telemetry PCC; doca_caps

Detected in version: 3.3.0

4996071

Description: When utilizing the doca_gpu_dev_eth_rxq_recv_thread API, the returned output parameter out_first_pkt_idx may report an incorrect packet index. Furthermore, the function may fail to process the maximum number of packets requested by the max_rx_pkts input parameter.

Keyword: DOCA GPUNetIO API

Detected in version: 3.3.0

4923491

Description: When compiling doca-poc-services-3.2.2023 with datapath tracing enabled (-Dforce_trace_on=true), the build fails.

Keyword: HL PCC; PCC_DATAPATH_TRACE; macro redefinition; meson compilation error; DPA

Detected in version: 3.3.0

4918199

Description: When executing the doca_devemu_pci_device_stateful_region_dpu sample on the DPU, a single write operation from the host triggers three sequential events instead of one.

Keyword: Stateful region; PCIe emulation; write event

Detected in version: 3.3.0

4985536

Description: When reading counters via doca_telemetry_diag_query_counters, the API returns a truncated data batch whenever sample_id reaches its 20-bit wrap-around boundary ( ). Because the library fails to apply the proper bitmask across this boundary, queries terminate early, causing periodic data loss and subsequent DOCA_ERROR_SKIPPED errors due to firmware ring buffer overwrites.

Keyword:

Detected in version: 3.3.0

4906807

Description: When using the NIC Configuration Operator to update BlueField firmware, the process fails if any target network interfaces are already manually unbound from the mlx5_core driver.

Keyword: Firmware upgrade; unbind error

Detected in version: 3.3.0

4913600

Description: When a host contains a mix of BlueField DPUs and standard ConnectX network adapters, attempting to provision a selected DPU via the DMS fails. DMS incorrectly classifies the unselected ConnectX adapters as "excluded DPUs." During BFB activation, DMS attempts to convert all excluded devices to rshim interfaces; because ConnectX devices do not possess an rshim architecture, the tool encounters an internal error (can't find rshim address on device) and aborts the entire provisioning process.

Keyword: DOCA Management Service; BFB activation; RShim

Detected in version: 3.3.0

4874895

Description: When using the DMS provisioning/image/diff action to compare firmware versions, finding a mismatch between the currently running image and the provided offline image causes the server to incorrectly throw an RPC "Internal error". Because of this error handling failure, the API does not return a proper Image Diff response, and the requested JSON output file (via --json-output-dir) is not generated.

Keyword: DOCA Management Service; JSON; imagediff

Detected in version: 3.3.0

4835953

Description: When attempting to enable VFs on InfiniBand adapters using the doca_mgmt_data_direct tool, the operation fails.

Keyword: doca_mgmt_data_direct; doca_data_direct

Detected in version: 3.3.0

4929635

Description: A segmentation fault may occur when running the doca_flow_perf tool with a JSON configuration that attempts to simultaneously modify both the IPv4 source address and the UDP destination port (e.g., using the increase mode for both fields in the actions block).

Keyword: Segmentation fault; IPv4 source; UDP destination

Detected in version: 3.3.0

4955283

Description: When launching the doca_psp_gateway application, a segmentation fault (SIGSEGV) may occur, causing the application to crash during initialization.

Keyword: Segmentation fault; PSP

Detected in version: 3.3.0

DOCA-Host and DOCA Drivers Bug Fixes

Ref #

Issue Details

4850615 / 4369227 / 5000492 / NVBud 5503449

Description: Fixed an issue where Debian/Ubuntu systems installed an old version of the openibd init script instead of the latest one. The package now installs the correct, up-to-date openibd script, fixing boot behavior and ensuring proper module reload and IB interface creation.

Keyword: Debian/Ubuntu, openibd init

Detected in version: 3.3.0

4848424 / 4417358 / 5017386 / 5017387 / 5018630 / 5028826 / 5030867

Description: Fixed an issue where a BlueField-3 host SF could remain stuck in the inactive+attached state and fail to reactivate after host restart flows.

Keyword: inactive+attached state

Detected in version: 3.3.0

5001782 / 5004557 / 5004558 / 5004559 / 5011047 / NVbug 6100467

Description: Fixed a NULL pointer dereference in the mlx5_core driver that could cause a kernel panic and report PCIe completion-timeout errors on ConnectX-8 bare-metal systems.

Keyword: mlx5_core driver, kernel panic, PCIe completion-timeout errors

Detected in version: 3.3.0

4913956 / 4921870 / 4921892 / 4921893

Description: Fixed an issue that occurred when CONFIG_UBSAN=y and the user provided an invalid value, resulting in the following UBSAN error:
UBSAN: shift-out-of-bounds in drivers/net/ethernet/mellanox/mlx5/core/diag/diag_cnt.c:190:11 shift exponent 4294967293 is too large for 32-bit type 'int'

Keyword: CONFIG_UBSAN

Detected in version: 3.3.0

4956297 / 4960598 / 4962096 / 4962098 / 4962099 / 4962100

Description: Fixed a race in the ib_core kernel module during concurrent address resolution that could produce a zero destination MAC (DMAC), causing silent packet loss and RDMA retry-exceeded errors.

Keyword: ib_core kernel

Detected in version: 3.3.0

4958267

Description: Fixed an issue where upgrading doca-ofed did not always upgrade doca-extra (which includes the module sources and doca-kernel-support build scripts). In some cases, this caused doca-kernel-support to build against sources from an older release.

Keyword: doca-ofed upgrade

Detected in version: 3.4.0

4932058 / 4913818 / 4931212 / 4932056 / 4932243

Description: Fixed an issue where accessing timestamping mode could fail on kernel 7.0 and later.

Keyword: timestamping mode

Detected in version: 3.4.0

BSP Bug Fixes

Ref #

Details

4693948

Description: When attempting to install the Ubuntu 24.04 (64k kernel) BFB image directly to an EMMC device, the installation may fail with a kernel panic. The system logs indicate an inability to mount the root filesystem, specifically returning the error: Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0).

Keywords: VFS; kernel panic; EMMC installation

Detected in version: 4.14.0

4923234

Description: The libdoca-sdk-sta-dev package is missing from the BFB and DOCA base images for Ubuntu 24.04. This package is required for target offload functionality.

Keywords: Target offload; BFB image

Detected in version: 4.14.0

4948318

4945554

Description: If a secondary BMC task (such as a log dump) is started after the BMC firmware update has been initiated, but before the installer's monitoring logic has attached to it, the installer may mistakenly track the secondary task. This tracking error causes the installer to misjudge the update's completion, which can cause the subsequent BMC reboot to fail and leave the new firmware in a pending, unactivated state.

Keywords: BFB installer; Redfish API; task monitoring

Detected in version: 4.14.0

4863927

Description: When attempting to install development packages via dnf on Rocky Linux 9.2, users may encounter a package repository inconsistency. This version mismatch results in a dependency resolution failure, preventing the installation of the packages.

Keywords: gcc; gcc-c++; dnf

Detected in version: 4.14.0

4988092

Description: Following an out-of-the-box installation and subsequent reboot on Ubuntu 24.04 (64k kernel), the NetworkManager-wait-online.service fails to start. The system logs indicate that the service times out while waiting for network connectivity, causing the service state to be marked as failed (status=1/FAILURE).

Keywords: Network Manager; systemd; timeout

Detected in version: 4.14.0

4893340

Description: A strict 128KB maximum size limit for the bf.cfg configuration file caused deployment failures in environments requiring more extensive configurations, such as those that pass large Ignition configuration payloads via the Data Processing Framework (DPF).

Keywords: BFB installation; file size limit; bf.cfg

Detected in version: 4.14.0

4849953

Description: The dpa-ps and dpa-statistics diagnostic tools are missing from the BlueField-3 BFB image for Oracle Linux 8.

Keywords: DPA; missing package

Detected in version: 4.14.0

4871396

Description: The dpa-ps and dpa-statistics diagnostic tools are missing from the BlueField-3 BFB image for Oracle Linux 9.

Keywords: DPA; missing package

Detected in version: 4.14.0

4907434

Description: When a BMC firmware update requires activation (and is the only component pending), the doca-installer tool incorrectly advises the user to perform a Level 3 firmware reset (mlxfwreset -l 3). Executing this recommended reset will not activate the pending BMC firmware. This is a messaging error in the tool's final status summary.

Keywords: BMC firmware; pending activation

Detected in version: 4.14.0

4907646

Description: When using the doca-installer --compare command with the --psid flag to target specific devices, the tool correctly identifies and skips unsupported BlueField-2 devices but still incorrectly halts to ask, "Continue with the available devices?". This unexpected interactive prompt interrupts automated scripts that rely on the tool executing without manual intervention.

Keywords: Automation; interactive prompt

Detected in version: 4.14.0

4879150

Description: Running the mlnx-sf utility with the -e or --enable-eswitch flag causes the command to fail, returning an Unknown option "eswitch" error. This occurs because the eswitch configuration action is no longer supported by the underlying mlxdevm and devlink utilities.

Keywords: Scalable functions; mlnx-sf; eswitch; mlxdevm

Detected in version: 4.14.0

4776492

Description: Occasionally, upgrading PLDM BFB from DOCA v3.2.0 to v3.2.1 may lead to an assert 0x7 in dmesg.

Keywords: PLDM

Detected in version: 4.14.0

4949639

Description: During a BFB installation, the CEC firmware updates successfully, but the completion confirmation message is missing from the RSHIM logs. The log displays "Updating CEC firmware" but omits the final success status before moving on to the next installation step (such as updating certificates).

Keywords: RShim logs; CEC firmware

Detected in version: 4.14.0

4836088

Description: Executing bfcfg -d produces corrupt output and displays incorrect boot options. This occurs due to a parsing error when the tool attempts to convert the BOOTx_DEVPATH variables from their binary configuration format back into ASCII text.

Keywords: Secure boot; ASCII conversion; BOOTx_DEVPATH

Detected in version: 4.14.0

4839828

Description: Host tmfifo_net interfaces may intermittently acquire random MAC addresses instead of the expected 00:1a:ca:ff:ff:XX pattern. This can cause failures in environments that enforce strict MAC address validation.

Keywords: MAC address; tmfifo_net; rshim

Detected in version: 4.14.0

4924237

Description: The RShim USB device may intermittently disappear from the DPU BMC, causing operations that rely on it to fail with a "Failed to enable BMC rshim" error.

Keywords: RShim USB; out-of-band update

Detected in version: 4.14.0

4658222

Description: During the DPU boot-up sequence, an intermittent call trace containing the warning WARN_ON(!host->claimed) may appear in the system logs.

Keywords: Call trace; kernel boot up

Detected in version: 4.14.0

4604090

Description: If a corrupted or unauthenticated BFB image is transferred from the BMC to the DPU, the system halts the installation process as part of a built-in security mechanism. Once triggered, the recovery path remains locked to prevent potential compromise.

Keywords: Corrupt; BFB

Detected in version: 4.14.0

4848119

Description: A BlueField-2 UEFI boot-time regression added approximately 20 seconds to system startup.

Keywords: BlueField-2; boot time; UEFI

Detected in version: 4.14.0

4904043

Description: An intermittent firmware assert error (synd 0x7: irisc not responding) may appear in system logs (dmesg) following a PLDM firmware upgrade while the device is in NIC mode.

Keywords: PLDM

Detected in version: 4.14.0

BMC Bug Fixes

Ref #

Issue Details

4944048

Description: When upgrading or downgrading between the 25.10-LTSU2 and 26.04 releases, repeated BMC reboots may, in rare cases, cause the profile-manager service to fail due to a malformed JSON file. This failure triggers a core dump of the service, populates core dump logs, and causes the golden-image service to become unresponsive.

Workaround: Perform a factory reset on the BMC.

Keyword: BMC reboot; core dump; factory reset

Reported in version: 25.10-LTSU2

4917779

Description: Initiating an Arm GracefulReset can cause the BMC's Redfish UpdateService to incorrectly report its state as UnavailableOffline.

Reported in version: 26.01

4948318

4945554

Description: If a secondary BMC task (such as a log dump) is started after the BMC firmware update has been initiated, but before the installer's monitoring logic has attached to it, the installer may mistakenly track the secondary task. This tracking error causes the installer to misjudge the update's completion, which can cause the subsequent BMC reboot to fail and leave the new firmware in a pending, unactivated state.

Reported in version: 26.01

4401488

Description: The BMC kernel enforces CONFIG_STATIC_USERMODEHELPER, which routes all usermode helper calls to /sbin/usermode-helper. Because this executable does not exist on the system, these calls fail and may cause undefined system behavior.

Reported in version: 26.01

4905017

Description: When operating in NIC mode, a host power cycle may intermittently cause the UEFI to fail to retrieve BMC Redfish credentials. This results in a DPU-BMC RF credentials not found error and introduces a timeout delay during the boot sequence.

Reported in version: 26.01

 4969243

Description: When the ENABLE_BMC_WAIT flag is active on BlueField-3 DPUs, the BMC SEL may intermittently fail to record the complete UEFI boot progress following a host power cycle. This is a cosmetic issue and does not affect the functionality of the DPU.

Reported in version: 26.01

4995032

Description: Redfish queries via OobUpdate --show_all_version intermittently returned empty strings for IPMB-backed properties (such as BOARD and BSP).

Reported in version: 26.01

 4867786

Description: During BFB installation, the Golden ARM image update may intermittently hang and fail via Redfish, logging a golden_image_arm firmware update timed out error.

Reported in version: 26.01

 4914053

Description: The BFB installer defaults to DHCP for the VLAN4040 interface. If no DHCP server is present, the request silently fails after a 300-second timeout, bypassing the static IP fallback and skipping all BMC-related firmware updates.

Reported in version: 26.01

4924426

Description: Following a DPU reset, the BaseMACBaseGUID, and Description fields may incorrectly return as empty within the redfish/v1/Systems/Bluefield/Oem/Nvidia schema response.

Reported in version: 26.01

 4987307

Description: During BFB installations via Redfish, the task state may change to "Exception" before the specific error message is appended to the HTTP response payload. This results in incomplete error logs on the initial poll following a failure.

Reported in version: 26.01

 4980118

Description: The set_emu_params.sh script attempts to load the mlxbf_ptm (DPU Power Telemetry) driver, leading to continuous module load failures and log spam because mlxbf_ptm is a non-upstreamed debug driver that is unsupported on several operating systems.

Reported in version: 26.01

 4799519

Description: Accessing the /redfish/v1/Managers/Bluefield_BMC Redfish endpoint may time out, accompanied by repeated ipmb-host timeout errors in the console and significant delays (multiple minutes) when opening the UEFI "System Configuration" page.

Reported in version: 26.01

4932328

Description: Excessive Common Platform Error Record (CPER) files in /var/cper can exhaust the BMC root partition space. This crashes the entity-manager service and causes IPMI commands like ipmitool sdr to fail.

Reported in version: 26.01

4957197

Description: When external monitoring tools or scripts repeatedly query the BMC's Redfish interface using Basic authentication over extended periods, internal session resources fail to release properly. This memory leak eventually causes the BMC to lose network connectivity, even while the DPU management interface remains online.

Reported in version: 26.01

4966472

Description: The BMC generates a warning log for PLDM_Sensor_1_100 when the NIC temperature reaches the official 91°C upper non-critical threshold. This is an expected hardware alert for elevated temperatures, not a software defect.

Reported in version: 26.01

BlueField-3 Firmware Bug Fixes

Internal Ref.

Issue

4881757 / 4859649 / NVbug 5887804

Description: Resolved a firmware corner case causing the Flash Gateway to return incorrect data.

Keywords: Flash Gateway

Discovered in Version: 32.48.1000

Fixed in Release: 32.49.1012

4964566 / 4957757

Description: Fixed a DEAD IRISC assert that could occur during TLV NV_DATA flash access by suspending the watchdog while waiting for flash IPC (until timeout), preventing the assert on TLV access.

Keywords: DEAD IRISC assert

Discovered in Version: 32.48.1000

Fixed in Release: 32.49.1012

4657767 / 4658776 / 4874764 / 4874765

Description: Fixed an issue where repeatedly writing NVCONFIG TLVs could cause excessive NV_DATA partition swaps during garbage collection. This rapid cycling could accelerate flash wear (end-of-life at 100,000 erases) and potentially render the device inoperable.

Firmware now avoids unnecessary physical writes by returning OK when the requested configuration already exists in flash, and increases the maximum supported NV_DATA partition swaps from 100,000 to 200,000.

Keywords: NVCONFIG TLVs

Discovered in Version: 32.48.1000

Fixed in Release: 32.49.1012

4871267 / 4871254

Description: Fixed an issue where ICM_RES_HW_DMFS_ENCAP_H_FW was allocated per GVMI, preventing some RTTs from using it.

Keywords: GVMI, RTT

Discovered in Version: 32.48.1000

Fixed in Release: 32.49.1012

4860860

Description: Fixed an issue where queue pairs (QPs) created during a PCC process transition could miss congestion-control (CC) information, preventing them from being fully managed.

Keywords: DOCA, PCC, QP, Congestion Control

Discovered in Version: 32.48.1000

Fixed in Release: 32.49.1012

4796182

Description: Fixed an issue where the live migration target did not receive a port state change event on the resume VHCA command. The target now generates this event so software that depends on port state is notified of any changes.

Keywords: Live migration

Discovered in Version: 32.47.1026

Fixed in Release: 32.49.1012

4683339 / 4780301 / 4895260

Description: Fixed an issue where QPs established before loading DOCA PCC could exhibit inconsistent algorithm-selected behavior between ports in LAG mode after DOCA PCC is loaded.

Keywords: Congestion Control, DOCA PCC

Discovered in Version: 32.48.1000

Fixed in Release: 32.49.1012

4873353

Description: Added post-FMT initialization checks to verify static PFs and port PFs and ensure the expected assumptions hold.

Keywords: Post-FMT initialization checks

Discovered in Version: 32.48.1000

Fixed in Release: 32.49.1012

4804415 / 4821502

Description: Fixed an issue where a host warm reboot with emulated NVMe devices exposed could cause firmware assertion 0x8494, preventing SNAP from operating on the existing emulated NVMe devices.

Keywords: NVMe

Discovered in Version: 32.48.1000

Fixed in Release: 32.49.1012

4931651

Description: Fixed an issue where the APT unit diagnostic counter used an incorrect internal address, causing CR space timeouts and returning incorrect values.

Keywords: Diagnostic counters

Discovered in Version: 32.48.1000

Fixed in Release: 32.49.1012

4946554 / 4954401 / 4954405 / 4954410 / 4957021

Description: Fixed an issue where configuring more than 16 PFs on the external host and then triggering an FLR on the ECPF could cause an iRISC hang on BlueField. The external host must be configured with 16 PFs or fewer.

Keywords: Bluefield, ECPF FLR

Discovered in Version: 32.48.1000

Fixed in Release: 32.49.1012

4534767

Description: Fixed an issue where, in multi-probe mode, only one slot for IFA1 or IFA2 was allowed, even though IFA1 and IFA2 can operate together.

Keywords: PCC, IFA1, IFA2

Detected in version: 32.46.1006

Fixed in Release: 32.49.1012

4876645 / 4879909 / 4879910

Description: Fixed an issue where an FLR on the emulation device could cause corrupted data to be sent to the destination via the emulation device’s backend QP.

Keywords: NVMe and Virtio emulation

Detected in version: 32.47.1026

Fixed in Release: 32.49.1012

BlueField-2 Firmware Bug Fixes

Internal Ref.

Issue

4796182

Description: Fixed an issue where the live migration target did not receive a port state change event on the resume VHCA command. The target now generates this event so software that depends on port state is notified of any changes.

Keywords: Live migration

Discovered in Version: 24.47.1026

Fixed in Release: 24.49.1012

4924450 / 4942300 / 4942302 / 4942304

Description: Fixed an issue in PCC where RDMA traffic could stall at large scale for certain IP and UDP source-port combinations when a PCC user algorithm was active and no CC algorithm was configured in slot 0.

Keywords: PCC, RDMA

Discovered in Version: 24.48.1000

Fixed in Release: 24.49.1012

4946554 / 4954401 / 4954405 / 4954410 / 4957021

Description: Fixed an issue where configuring more than 16 PFs on the external host and then triggering an FLR on the ECPF could cause an iRISC hang on BlueField. The external host must be configured with 16 PFs or fewer.

Keywords: Bluefield, ECPF FLR

Discovered in Version: 24.48.1000

Fixed in Release: 24.49.1012

Last updated: