DOCA Framework Bug Fixes
|
Ref # |
Issue |
|---|---|
|
4426511 |
Description: Orchestrated reset mode (MLXConfig) will be released as a Beta feature. There's a known race condition between server reboot and the reset flow running in parallel, which can cause the reset to go out of sync. |
|
Keyword: Orchestrated reset mode |
|
|
Detected in version: 3.0.0 |
|
|
2657392 |
Description: OFED installation caused CIFS to break in RHEL 8.4 and above. A dummy module was added so that CIFS will be disabled after OFED installation in RHEL 8.4 and above. |
|
Keyword: Installation; CIFS |
|
|
Detected in version: 3.0.0 |
|
|
4374396 |
Description: Ingress mirroring rules configured on OVS-DOCA are not offloaded to hardware when using remote GRE tunnels. |
|
Keyword: Mirroring |
|
|
Detected in version: 3.0.0 |
|
|
4464648 |
Description: When the server crashes, the client’s Comch Producer may call the send error callback twice for the same task, potentially leading to buffer reference count errors. |
|
Keyword: DOCA Comch; duplicate callback; buffer refcount error |
|
|
Detected in version: 3.0.0 |
|
|
4454054 |
Description: DMS pod requires "Linux is up" from |
|
Keyword: Installation |
|
|
Detected in version: 3.0.0 |
|
|
4255270 |
Description: Packets are encapsulated with standard VXLAN headers even when VXLAN-GBP is configured, resulting in the Group Policy ID (GBP) not being applied. |
|
Keyword: VXLAN; packet encapsulation |
|
|
Detected in version: 2.10.0 |
|
|
4263035 |
Description: In L3 EVPN scenarios with 16k overlay and 4k underlay routes, OVS may get stuck or abnormally terminate. |
|
Keyword: HBN |
|
|
Detected in version: 2.10.0 |
|
|
3851200 |
Description: Once PPS is enabled, there is no way to disable it via FireFly commands. |
|
Keyword: PTP |
|
|
Detected in version: 2.7.0 |
DOCA-Host and DOCA Drivers Bug Fixes
|
Ref # |
Issue |
|---|---|
|
4404290 |
Description: Fixed a crash triggered by handling multiple CMA net events in rapid succession on the same CMA ID. |
|
Keyword: CMA |
|
|
Detected in version: 3.0.0 |
|
|
4500815 |
Description: Fixed an issue that caused packet loss when enabling or disabling promiscuous mode on a network interface. |
|
Keyword: Promiscuous mode |
|
|
Detected in version: 3.0.0 |
|
|
4514994 |
Description: Fixed performance degradation on older kernel versions using RX cache, particularly on slower ARM CPUs with larger RX buffers. The issue was caused by the driver attempting to allocate new RX pages too quickly, leading to head-of-line blocking in the RX cache. The fix improves RX cache usage by triggering page allocation for a bulk of at least 2 WQEs, allowing the application more time to process packets and return buffers to the RX cache, thereby reducing blocking and enhancing performance. |
|
Keyword: Performance, kernel, Rx cache, page allocation |
|
|
Detected in version: 3.0.0 |
|
|
4504899 |
Description: Fixed behavior to align with |
|
Keyword: VFs |
|
|
Detected in version: 3.0.0 |
|
|
3680538 |
Description: When using strongSwan or OVS-IPsec as explained in the NVIDIA BlueField DPU BSP, the IPSec Rx datapath is not offloaded to hardware and occurs in software running on the Arm cores. As a result, bandwidth performance is substantially low. |
|
Keyword: IPsec |
|
|
Detected in version: 3.0.0 |
|
|
4448262 |
Description: Fixed an issue where a kernel crash could occur if a device event arrives during the event subscription process. |
|
Keyword: devx; event_fd |
|
|
Detected in version: 3.0.0 |
|
|
4449477 |
Description: On BlueField-3 devices running linux-bluefield kernel versions 5.15.0-1050 or 5.15.0-1060, a kernel crash may occur due to a NULL pointer dereference in the cls_api network scheduler. |
|
Keyword: Kernel crash |
|
|
Detected in version: 3.0.0 |
BSP Bug Fixes
|
Ref # |
Details |
|---|---|
|
4693948 |
Description: When attempting to install the Ubuntu 24.04 (64k kernel) BFB image directly to an EMMC device, the installation may fail with a kernel panic. The system logs indicate an inability to mount the root filesystem, specifically returning the error: |
|
Keywords: VFS; kernel panic; EMMC installation |
|
|
Detected in version: 4.14.0 |
|
|
4923234 |
Description: The |
|
Keywords: Target offload; BFB image |
|
|
Detected in version: 4.14.0 |
|
|
4948318 4945554 |
Description: If a secondary BMC task (such as a log dump) is started after the BMC firmware update has been initiated, but before the installer's monitoring logic has attached to it, the installer may mistakenly track the secondary task. This tracking error causes the installer to misjudge the update's completion, which can cause the subsequent BMC reboot to fail and leave the new firmware in a pending, unactivated state. |
|
Keywords: BFB installer; Redfish API; task monitoring |
|
|
Detected in version: 4.14.0 |
|
|
4863927 |
Description: When attempting to install development packages via |
|
Keywords: gcc; gcc-c++; dnf |
|
|
Detected in version: 4.14.0 |
|
|
4988092 |
Description: Following an out-of-the-box installation and subsequent reboot on Ubuntu 24.04 (64k kernel), the |
|
Keywords: Network Manager; systemd; timeout |
|
|
Detected in version: 4.14.0 |
|
|
4893340 |
Description: A strict 128KB maximum size limit for the |
|
Keywords: BFB installation; file size limit; bf.cfg |
|
|
Detected in version: 4.14.0 |
|
|
4849953 |
Description: The |
|
Keywords: DPA; missing package |
|
|
Detected in version: 4.14.0 |
|
|
4871396 |
Description: The |
|
Keywords: DPA; missing package |
|
|
Detected in version: 4.14.0 |
|
|
4907434 |
Description: When a BMC firmware update requires activation (and is the only component pending), the |
|
Keywords: BMC firmware; pending activation |
|
|
Detected in version: 4.14.0 |
|
|
4907646 |
Description: When using the |
|
Keywords: Automation; interactive prompt |
|
|
Detected in version: 4.14.0 |
|
|
4879150 |
Description: Running the |
|
Keywords: Scalable functions; mlnx-sf; eswitch; mlxdevm |
|
|
Detected in version: 4.14.0 |
|
|
4776492 |
Description: Occasionally, upgrading PLDM BFB from DOCA v3.2.0 to v3.2.1 may lead to an assert 0x7 in dmesg. |
|
Keywords: PLDM |
|
|
Detected in version: 4.14.0 |
|
|
4949639 |
Description: During a BFB installation, the CEC firmware updates successfully, but the completion confirmation message is missing from the RSHIM logs. The log displays "Updating CEC firmware" but omits the final success status before moving on to the next installation step (such as updating certificates). |
|
Keywords: RShim logs; CEC firmware |
|
|
Detected in version: 4.14.0 |
|
|
4836088 |
Description: Executing |
|
Keywords: Secure boot; ASCII conversion; BOOTx_DEVPATH |
|
|
Detected in version: 4.14.0 |
|
|
4839828 |
Description: Host |
|
Keywords: MAC address; tmfifo_net; rshim |
|
|
Detected in version: 4.14.0 |
|
|
4924237 |
Description: The RShim USB device may intermittently disappear from the DPU BMC, causing operations that rely on it to fail with a "Failed to enable BMC rshim" error. |
|
Keywords: RShim USB; out-of-band update |
|
|
Detected in version: 4.14.0 |
|
|
4658222 |
Description: During the DPU boot-up sequence, an intermittent call trace containing the warning |
|
Keywords: Call trace; kernel boot up |
|
|
Detected in version: 4.14.0 |
|
|
4604090 |
Description: If a corrupted or unauthenticated BFB image is transferred from the BMC to the DPU, the system halts the installation process as part of a built-in security mechanism. Once triggered, the recovery path remains locked to prevent potential compromise. |
|
Keywords: Corrupt; BFB |
|
|
Detected in version: 4.14.0 |
|
|
4848119 |
Description: A BlueField-2 UEFI boot-time regression added approximately 20 seconds to system startup. |
|
Keywords: BlueField-2; boot time; UEFI |
|
|
Detected in version: 4.14.0 |
|
|
4904043 |
Description: An intermittent firmware assert error ( |
|
Keywords: PLDM |
|
|
Detected in version: 4.14.0 |
BMC Bug Fixes
|
Ref # |
Issue Details |
|---|---|
|
4944048 |
Description: When upgrading or downgrading between the 25.10-LTSU2 and 26.04 releases, repeated BMC reboots may, in rare cases, cause the |
|
Workaround: Perform a factory reset on the BMC. |
|
|
Keyword: BMC reboot; core dump; factory reset |
|
|
Reported in version: 25.10-LTSU2 |
|
|
4917779 |
Description: Initiating an Arm |
|
Reported in version: 26.01 |
|
|
4948318 4945554 |
Description: If a secondary BMC task (such as a log dump) is started after the BMC firmware update has been initiated, but before the installer's monitoring logic has attached to it, the installer may mistakenly track the secondary task. This tracking error causes the installer to misjudge the update's completion, which can cause the subsequent BMC reboot to fail and leave the new firmware in a pending, unactivated state. |
|
Reported in version: 26.01 |
|
|
4401488 |
Description: The BMC kernel enforces |
|
Reported in version: 26.01 |
|
|
4905017 |
Description: When operating in NIC mode, a host power cycle may intermittently cause the UEFI to fail to retrieve BMC Redfish credentials. This results in a |
|
Reported in version: 26.01 |
|
|
4969243 |
Description: When the |
|
Reported in version: 26.01 |
|
|
4995032 |
Description: Redfish queries via |
|
Reported in version: 26.01 |
|
|
4867786 |
Description: During BFB installation, the Golden ARM image update may intermittently hang and fail via Redfish, logging a |
|
Reported in version: 26.01 |
|
|
4914053 |
Description: The BFB installer defaults to DHCP for the VLAN4040 interface. If no DHCP server is present, the request silently fails after a 300-second timeout, bypassing the static IP fallback and skipping all BMC-related firmware updates. |
|
Reported in version: 26.01 |
|
|
4924426 |
Description: Following a DPU reset, the |
|
Reported in version: 26.01 |
|
|
4987307 |
Description: During BFB installations via Redfish, the task state may change to "Exception" before the specific error message is appended to the HTTP response payload. This results in incomplete error logs on the initial poll following a failure. |
|
Reported in version: 26.01 |
|
|
4980118 |
Description: The |
|
Reported in version: 26.01 |
|
|
4799519 |
Description: Accessing the |
|
Reported in version: 26.01 |
|
|
4932328 |
Description: Excessive Common Platform Error Record (CPER) files in |
|
Reported in version: 26.01 |
|
|
4957197 |
Description: When external monitoring tools or scripts repeatedly query the BMC's Redfish interface using Basic authentication over extended periods, internal session resources fail to release properly. This memory leak eventually causes the BMC to lose network connectivity, even while the DPU management interface remains online. |
|
Reported in version: 26.01 |
|
|
4966472 |
Description: The BMC generates a warning log for PLDM_Sensor_1_100 when the NIC temperature reaches the official 91°C upper non-critical threshold. This is an expected hardware alert for elevated temperatures, not a software defect. |
|
Reported in version: 26.01 |
BlueField-3 Firmware Bug Fixes
|
Internal Ref. |
Issue |
|---|---|
|
4501157 / 4257750 |
Description: Fixed a critical issue with a live firmware patch. |
|
Keywords: Live firmware patch
|
|
|
Discovered in Version: 32.45.1020 |
|
|
Fixed in Release: 32.46.1006 |
|
|
4377816 |
Description: Fixed an issue where firmware did not de-assert the |
|
Keywords: mlxconfig |
|
|
Discovered in Version: 32.45.1020 |
|
|
Fixed in Release: 32.46.1006 |
|
|
4286902 |
Description: Fixed a race condition in DPA process termination during the exception flow, where a failed process could be missed and not reported to the user. |
|
Keywords: DPA |
|
|
Discovered in Version: 32.45.1020 |
|
|
Fixed in Release: 32.46.1006 |
|
|
4420567 |
Description: Removed an unnecessary and partially incorrect firmware check that blocked valid action list permutations allowed by the PRM. Validation of these permutations remains the responsibility of the software. |
|
Keywords: Header actions |
|
|
Discovered in Version: 32.45.1020 |
|
|
Fixed in Release: 32.46.1006 |
|
|
4498670 |
Description: Fixed a race condition where destroying two emulation objects with the same VHCA ID could result in one destroy command failing with syndrome 0xF3F880. |
|
Keywords: VirtIO |
|
|
Discovered in Version: 32.45.1020 |
|
|
Fixed in Release: 32.46.1006 |
|
|
4475307 |
Description: Fixed an issue where PCC DCQCN used incorrect parameter values when link speed was 400Gbps or higher. |
|
Keywords: PCC DCQCN, congestion control. |
|
|
Discovered in Version: 32.45.1020 |
|
|
Fixed in Release: 32.46.1006 |
|
|
4443601 |
Description: Fixed a firmware issue where PXE failed to boot when both LAG ports were up. |
|
Keywords: PXE, LAG
|
|
|
Discovered in Version: 32.45.1020 |
|
|
Fixed in Release: 32.46.1006 |
|
|
4480427 |
Description: Fixed incorrect calculation of start address and mode for the CQE buffer in DPA CQ, which could cause CQEs to be written to the wrong address when the buffer is not 4K-aligned and spans a second page boundary. |
|
Keywords: CQ, CQE Buffer, DPA
|
|
|
Discovered in Version: 32.45.1020 |
|
|
Fixed in Release: 32.46.1006 |
|
|
4388371 |
Description: Fixed an issue where an uninitialized pport in the SLRG command, when using the SMP interface, caused an assertion failure. |
|
Keywords: SLRG, SMP interface, pport
|
|
|
Discovered in Version: 32.45.1020 |
|
|
Fixed in Release: 32.46.1006 |
|
|
4395036 |
Description: Fixed a race condition between firmware and hardware flows during QP closure. |
|
Keywords: Race condition
|
|
|
Discovered in Version: 32.45.1020 |
|
|
Fixed in Release: 32.46.1006 |
|
|
4428580 |
Description: Fixed a rare issue where triggering mstdump via core_dump in Windows drivers could cause a PCI link down condition. |
|
Keywords: mstdump, windows
|
|
|
Discovered in Version: 32.45.1020 |
|
|
Fixed in Release: 32.46.1006 |
|
|
4428580 |
Description: Fixed an issue with vQoS parameter configuration to improve latency handling for large messages. |
|
Keywords: vQoS, latency
|
|
|
Discovered in Version: 32.45.1020 |
|
|
Fixed in Release: 32.46.1006 |
|
|
4436922 |
Description: DC InfiniBand is not functional in this firmware version. |
|
Keywords: DC, DDP traffic |
|
|
Discovered in Version: 32.45.1020 |
|
|
Fixed in Release: 32.46.1006 |
|
|
4366117 |
Description: Configuring a small MTU leads to fragmentation of packets critical for the PXE boot process. As a result, the PXE boot filters mistakenly discard these packets, causing the PXE boot to fail. |
|
Keywords: PXE boot filters |
|
|
Discovered in Version: 32.45.1020 |
|
|
Fixed in Release: 32.46.1006 |
|
|
4475307 |
Description: Fixed an issue where PCC DCQCN used incorrect parameter values when link speed was 400Gbps or higher. |
|
Keywords: PCC DCQCN, congestion control. |
|
|
Detected in version: 32.45.1020 |
|
|
Fixed in Release: 32.46.1006 |
|
|
4486431 |
Description: Fixed an issue where issuing multiple parallel queries of DPA_THREAD objects with the same object ID could fail. |
|
Keywords: DPA |
|
|
Discovered in Version: 32.45.1020 |
|
|
Fixed in Release: 32.46.1006 |
|
|
4470053 |
Description: Fixed an issue with vQoS parameter configuration to improve latency handling for large messages. |
|
Keywords: vQoS, latency
|
|
|
Discovered in Version: 32.45.1020 |
|
|
Fixed in Release: 32.46.1006 |
BlueField-2 Firmware Bug Fixes
|
Internal Ref. |
Issue |
|---|---|
|
4366117 |
Description: Configuring a small MTU leads to fragmentation of packets critical for the PXE boot process. As a result, the PXE boot filters mistakenly discard these packets, causing the PXE boot to fail. |
|
Keywords: PXE boot filters |
|
|
Discovered in Version: 24.45.1020 |
|
|
Fixed in Release: 24.46.1006 |
Last updated: