ConnectX-8 Firmware Release Notes

Bug Fixes in this Firmware Version

Internal Ref.

Issue

4812446 / 4917369

Description: Fixed an issue where CREATE_PARSE_GRAPH_NODE incorrectly applied validation intended for WA mode, in which header_length_field_offset is adjusted by firmware. Since ConnectX-8 and newer adapters use normal mode only, these checks were redundant and could reject valid input.

Validation is now mode-specific: normal-mode rules are applied on ConnectX-8 and newer adapters, while WA-mode rules are applied on older adapters.

Keywords: Parse graph node

Detected in version:

40.48.1000

Fixed in Release: 

40.49.1014

4812446

Description: Fixed an issue where firmware did not enforce the device-wide limit on the number of flow match samples used by parse graph nodes. As a result, parse graph node creation could succeed even when the total number of samples exceeded the hardware-supported limit.

Firmware now enforces this limit before allocating samples. Creating a parse graph node that would exceed the maximum number of hardware flow match samples now fails with a “no resources” error.
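The check-before-allocate behavior can be illustrated with a minimal Python sketch. It assumes a single device-wide pool of samples. The names FlowMatchSamplePool, create_parse_graph_node, destroy_parse_graph_node, max_samples and NoResourcesError are hypothetical and do not correspond to a real firmware or driver API. The actual limit is adapter-specific.

```python
class NoResourcesError(Exception):
    """Hypothetical stand-in for the firmware "no resources" error."""


class FlowMatchSamplePool:
    """Hypothetical device-wide pool of hardware flow match samples.

    max_samples stands in for the adapter-specific hardware limit.
    """

    def __init__(self, max_samples: int) -> None:
        self.max_samples = max_samples
        self.in_use = 0  # samples held by all parse graph nodes on the device

    def create_parse_graph_node(self, num_samples: int) -> None:
        # Enforce the device-wide total *before* allocating, so a rejected
        # request leaves the pool unchanged.
        free = self.max_samples - self.in_use
        if num_samples > free:
            raise NoResourcesError(f"requested {num_samples} samples, {free} free")
        self.in_use += num_samples

    def destroy_parse_graph_node(self, num_samples: int) -> None:
        # Return the node's samples to the shared device-wide pool.
        self.in_use -= num_samples
```

For example, with max_samples=8, creating a 5-sample node succeeds. A following 4-sample request then raises NoResourcesError and leaves in_use at 5.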

Keywords: Parse graph node

Detected in version:

40.48.1000

Fixed in Release: 

40.49.1014

4864823 / 4858750

Description: Fixed an issue where, when Flex Parser overwrite native arc was enabled, destroying or closing a parse graph node on a native protocol arc did not restore the native protocol parser. As a result, the native parser remained unavailable on the device until a reset or another recovery action was performed.

Native protocol parsing is now restored as expected when the parse graph node is closed or destroyed. Native parsers are disabled only while the Flex Parser graph owns the arc.
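The ownership rule in the preceding paragraph can be modeled as a small state machine. In this minimal Python sketch, the native parser for an arc is enabled exactly when no parse graph node owns the arc, and both close and destroy release ownership. NativeArc, attach, release and the arc name are hypothetical illustrations, not the firmware's actual implementation.

```python
from typing import Optional


class NativeArc:
    """Hypothetical model of one native protocol arc on the device.

    With Flex Parser overwrite native arc enabled, a parse graph node can
    take ownership of the arc; the native parser is disabled only while
    that ownership lasts.
    """

    def __init__(self, name: str) -> None:
        self.name = name
        self.owner: Optional[int] = None  # id of the owning parse graph node

    @property
    def native_parser_enabled(self) -> bool:
        return self.owner is None

    def attach(self, node_id: int) -> None:
        if self.owner is not None:
            raise RuntimeError(f"arc {self.name} already owned by node {self.owner}")
        self.owner = node_id  # native parser now disabled for this arc

    def release(self, node_id: int) -> None:
        # Runs on both close and destroy. In this model, the reported issue
        # corresponds to this step not restoring the native parser.
        if self.owner == node_id:
            self.owner = None  # native parser restored
```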

Keywords: Flex Parser

Detected in version:

40.48.1000

Fixed in Release: 

40.49.1014

4980585 / NVBug 6084453

Description: Fixed an issue where NSM Get-Port-Network-Addresses returned a response in an invalid format on B200/B300 HMC. The response now follows the documented non-compact format.

Keywords: NSM Get-Port-Network-Addresses

Detected in version:

40.48.1000

Fixed in Release: 

40.49.1014

4928056 / 4999011 / 4999400

Description: Fixed a corner-case race condition, triggered by DSP resets, that could cause the link to hang.

Keywords: DSP reset

Detected in version:

40.48.1000

Fixed in Release: 

40.49.1014

4964566 / 4957757

Description: Fixed a DEAD IRISC assert that could occur during TLV access to NV_DATA flash. Firmware now suspends the watchdog while waiting for the flash IPC (up to the timeout), preventing the assert on TLV access.

Keywords: DEAD IRISC assert

Detected in version:

40.48.1000

Fixed in Release: 

40.49.1014

4835832

Description: Fixed an issue where, in rare cases, certain module types could experience link-up failures.

Keywords: Cables

Detected in version:

40.48.1000

Fixed in Release: 

40.49.1014

4859294

Description: Fixed an issue where, after toggling all ports, one port could become stuck in the Receiver Detect state.

Keywords: Port toggling

Detected in version:

40.48.1000

Fixed in Release: 

40.49.1014

4657767 / 4658776 / 4874764 / 4874765

Description: Fixed an issue where repeatedly writing NVCONFIG TLVs could cause excessive NV_DATA partition swaps during garbage collection. This rapid cycling could accelerate flash wear (end-of-life at 100,000 erases) and potentially render the device inoperable.

Firmware now avoids unnecessary physical writes by returning OK when the requested configuration already exists in flash, and increases the maximum number of supported NV_DATA partition swaps from 100,000 to 200,000.
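The "return OK if already in flash" behavior is a compare-before-write pattern. This minimal Python sketch shows it using a hypothetical NvDataStore class and write_tlv method. Neither is a real firmware or mlxconfig interface. The sketch counts physical writes only and does not model garbage collection or partition swaps.

```python
from typing import Dict


class NvDataStore:
    """Hypothetical model of NVCONFIG TLV writes to the NV_DATA partition.

    Counts physical flash writes only; garbage collection and partition
    swaps are firmware-internal and not modeled here.
    """

    def __init__(self) -> None:
        self.tlvs: Dict[int, bytes] = {}  # TLV type -> value currently in flash
        self.physical_writes = 0

    def write_tlv(self, tlv_type: int, value: bytes) -> str:
        # Compare-before-write: if flash already holds this exact value,
        # report success without issuing a physical write.
        if self.tlvs.get(tlv_type) == value:
            return "OK"
        self.tlvs[tlv_type] = value
        self.physical_writes += 1
        return "OK"
```

In this model, writing the same TLV value 1,000 times produces a single physical write. Repeated identical configuration writes therefore no longer add to flash wear.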

Keywords: NVCONFIG TLVs

Detected in version:

40.48.1000

Fixed in Release:

40.49.1014

4983638 / NVBug 6090735

Description: Fixed an issue where NSM did not function reliably in multihost configurations when a DMA physical function (PF) was present.

Keywords: NSM, multihost configurations, DMA physical function (PF)

Detected in version:

40.48.1000

Fixed in Release: 

40.49.1014

4871267 / 4871254

Description: Fixed an issue where ICM_RES_HW_DMFS_ENCAP_H_FW was allocated per GVMI, preventing some RTTs from using it.

Keywords: GVMI, RTT

Detected in version:

40.48.1000

Fixed in Release: 

40.49.1014

4860860

Description: Fixed an issue where queue pairs (QPs) created during a PCC process transition could miss congestion-control (CC) information, preventing them from being fully managed.

Keywords: DOCA, PCC, QP, Congestion Control

Detected in version:

40.48.1000

Fixed in Release: 

40.49.1014

4774410 / 4773595

Description: Fixed an issue where the LAG layer steering table had only one strict-SQ entry, causing strict no-port-select traffic to be routed to a single default port. With LOAD_BALANCE_MODE_P1=3, this could lead to link flapping because LACP packets were transmitted only from that default port. The LAG layer steering table has been expanded with additional strict-port entries so that strict traffic is distributed correctly.

Keywords: Link Flapping

Detected in version:

40.48.1000

Fixed in Release: 

40.49.1014

4789601 / 4850200 / NVBug 5736447

Description: Fixed an issue where RDMA traffic could stall in large-scale deployments for certain source IP and UDP source-port combinations when DOCA PCC was active and no congestion-control algorithm was configured in algorithm slot 0.

Keywords: RDMA, DOCA PCC, Congestion Control Algorithm

Detected in version:

40.48.1000

Fixed in Release: 

40.49.1014

4796182

Description: Fixed an issue where the live migration target did not receive a port state change event on the resume VHCA command. The target now generates this event so software that depends on port state is notified of any changes.

Keywords: Live migration

Detected in version:

40.47.1026

Fixed in Release:

40.49.1014

4859742 / 4847702

Description: Fixed an issue during hitless upgrade where, after the SACK generation/handler fence, firmware could mark the old port configuration ID as invalid. If SACK causes were still active, a race could cause SACK ISRs to stop unexpectedly. Firmware now always returns BUSY while the handover is active.

Keywords: Hitless upgrade

Detected in version:

40.48.1000

Fixed in Release:

40.49.1014

4873533 / 4947034

Description: Fixed an issue where, when using multiplane ZTRCC congestion control with multiple flows, the RTT timeout counter in the PPCC register could increase even when the network was not congested.

Keywords: Congestion control, multiplane, SPCX CC, RTT, PPCC

Detected in version:

40.48.1000

Fixed in Release: 

40.49.1014

4876790 / 4822829

Description: Fixed a firmware race between packet receive and QP cleanup that could move a QP to the error state when the same QP was reopened and the same MSN was sent. This could occur when traffic was interrupted (for example, with Ctrl+C) and restarted over many iterations until the same QP/MSN combination was reused.

Keywords: Firmware race

Detected in version:

40.48.1000

Fixed in Release:

40.49.1014

4905774

Description: Fixed an issue where the FMT Static PF verifier did not account for the Tools PF when verifying the DMA PF. 

Keywords: FMT Static PF verifier

Detected in version:

40.48.1000

Fixed in Release:

40.49.1014

4683339 / 4780301 / 4895260

Description: Fixed an issue where QPs established before DOCA PCC was loaded could exhibit inconsistent algorithm-selection behavior between ports in LAG mode after DOCA PCC was loaded.

Keywords: Congestion Control, DOCA PCC

Detected in version:

40.45.1020

Fixed in Release:

40.49.1014

4843342 / 4970550

Description: Fixed an issue where one-to-one RoCE traffic using a single QP might not achieve line rate on some platforms when using the default ROCE_CC_COMPATIBILITY_MODE setting in mlxconfig.

Keywords: Congestion control, PCC, ROCE_CC_COMPATIBILITY_MODE

Detected in version:

40.47.1026

Fixed in Release: 

40.49.1014

4892822 / 4638811

Description: Fixed an issue where, in some configurations with ConnectX-8 connected to an MP3 switch, the link might fail to come up on the switch side after toggling ports. In these cases, ConnectX-8 reported opcode 14, indicating that remote faults were detected and that the link partner was not bringing the link up.

Keywords: Link down, opcode 14, AC/DC cycles

Detected in version:

40.47.1026

Fixed in Release:

40.49.1014

4783261 / 4658799

Description: Fixed an issue where ACS errors could occur due to a race condition when performing LDE/SBR while MCTP traffic to the device was active.

Keywords: ACS, MCTP

Detected in version:

40.47.1026

Fixed in Release:

40.49.1014

4881141 / 4882127 / 4882128

Description: Fixed a potential routing error when IOVAs assigned to the NIC overlap with the PCIe address space. In certain configurations, an IOVA could fall into an “unclaimed address” range, within a PCIe switch’s Upstream Port (USP) window but outside any Downstream Port (DSP) aperture.
 
Note: Because this is a kernel issue, users must enable the ACS Unclaimed Request Redirect bit in the Access Control Services (ACS) capability of the PCIe bridges, via the kernel, to ensure packets are routed correctly in these scenarios.

Keywords: ACS Unclaimed Request Redirect, IOVA

Detected in version:

40.47.1026

Fixed in Release:

40.49.1014

4952219 / NVBug 6026050

Description: Fixed an issue where partially enabling ACS enhanced capability could cause unexpected driver and system behavior. ACS enhanced capability remains disabled and will not be enabled.

Keywords: ACS

Detected in version:

40.48.1000

Fixed in Release: 

40.49.1014

Last updated: