ConnectX-8 Firmware Release Notes

Bug Fixes History

This section lists the bug fixes from the last three major releases. For the bug fix history of older releases, refer to the Release Notes of the relevant firmware version.

Internal Ref.

Issue

4823907 / NVbug 5742181

Description: Fixed an issue where, in certain configurations with the ConnectX-8 PCIe switch enabled, downstream devices (including GPUs) might not be detected and could drop from the PCI bus, with GPU sensors/properties reporting nan. This was caused by the device not receiving the required PERST# assertion during initialization, and was seen only when PCIe settings were manually modified via mlxconfig (e.g., restricting link speed/width or ASPM on specific PCI buses).

Note: On legacy firmware, additional configuration steps may still be required, as detailed below.

If you cannot update the firmware immediately, you can restore device detection using one of the following options:

  • Option 1: Reset configuration

    • Reset the device configuration to defaults:

      mlxconfig -d <device> -y reset

  • Option 2: Manual PERST configuration (B300)

    • Manually configure the PERST parameters:

      mlxconfig -d <device> set PCI_BUS10_CONTROL_EN=1
      mlxconfig -d <device> set PCI_BUS10_PERST_SOURCE=2
      mlxconfig -d <device> set PCI_BUS10_PERST_GPIO=8

Keywords: ConnectX-8 PCIe, GPU, PERST# assertion

Detected in version:

40.47.1026

Fixed in Release: 

40.48.1000
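For hosts that must stay on the older firmware, the two workaround options for 4823907 can be scripted. The following is a minimal dry-run sketch: the parameter names and values are copied from the commands in that entry, while the helper names (build_perst_commands, build_reset_command) are hypothetical and only assemble the command lines without executing them.

```python
from typing import List

# Parameter names and values are taken from the Option 2 workaround in the
# release note (B300 case). They are not validated here.
PERST_SETTINGS = [
    ("PCI_BUS10_CONTROL_EN", 1),
    ("PCI_BUS10_PERST_SOURCE", 2),
    ("PCI_BUS10_PERST_GPIO", 8),
]


def build_perst_commands(device: str) -> List[List[str]]:
    """Return one mlxconfig argv list per setting, mirroring Option 2."""
    return [
        ["mlxconfig", "-d", device, "set", f"{name}={value}"]
        for name, value in PERST_SETTINGS
    ]


def build_reset_command(device: str) -> List[str]:
    """Return the Option 1 command that resets the configuration to defaults."""
    return ["mlxconfig", "-d", device, "-y", "reset"]


if __name__ == "__main__":
    # Dry run only: print the commands instead of running them.
    # "<device>" is a placeholder, as in the release note.
    for argv in build_perst_commands("<device>"):
        print(" ".join(argv))
```

Printing the commands first makes it possible to review them per host before applying anything.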

4786813

Description: Fixed an issue where the DPA kernel used unsafe ICM access during process creation/modification, which could cause the DPA kernel to hang during FLR.

Keywords: DPA kernel, FLR

Detected in version:

40.47.1026

Fixed in Release:

40.48.1000

4884739

Description: Fixed an issue where, in rare cases, link failures could be observed at PAM4 speeds over optical interfaces.

Keywords: PAM4 speeds, optical interfaces

Detected in version:

40.47.1026

Fixed in Release:

40.48.1000

4804664 / 4806969

Description: Fixed an issue where the User Debugger “query caps” returned only the number of capabilities instead of the capability bitmap.

Keywords: User Debugger “query caps”

Detected in version:

40.47.1026

Fixed in Release:

40.48.1000

4813862 / 4146077

Description: Fixed an issue where CR dumps could time out when accessing xpl_top addresses across all three pcores.

Keywords: CR dump

Detected in version:

40.47.1026

Fixed in Release:

40.48.1000

4833440

Description: Fixed an issue where the Virtio and NVMe EMU_MNG settings were exposed incorrectly, which could cause confusion when using mlxconfig.

Keywords: Virtio and NVMe emulation, mlxconfig

Detected in version:

40.47.1026

Fixed in Release:

40.48.1000

4756451

Description: Fixed an issue where the PHY LED could show green during the initializing state when active speed was set to full speed. In IB mode, the initializing-state LED should be amber only.

Keywords: PHY LED

Detected in version:

40.47.1026

Fixed in Release:

40.48.1000

4768546

Description: Fixed an issue where, on multi-PF-per-port systems, a PF FLR could impact the traffic bandwidth of other PFs on the same port.

Keywords: PF FLR

Detected in version:

40.47.1026

Fixed in Release: 

40.48.1000

4705918

Description: Fixed an issue where PTP could converge to an incorrect time/offset and report an inaccurate path delay.

Keywords: PTP

Detected in version:

40.47.1026

Fixed in Release: 

40.48.1000
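For context on the PTP fix (4705918), the offset and path delay are derived from four timestamps. The sketch below assumes the standard IEEE 1588 delay request-response mechanism and a symmetric path. It illustrates the arithmetic only and does not describe the firmware implementation; the function name is hypothetical.

```python
from typing import Tuple


def ptp_offset_and_delay(t1: int, t2: int, t3: int, t4: int) -> Tuple[float, float]:
    """Compute (offset_from_master, mean_path_delay) in nanoseconds.

    t1: master clock time when Sync is sent
    t2: slave clock time when Sync is received
    t3: slave clock time when Delay_Req is sent
    t4: master clock time when Delay_Req is received
    """
    master_to_slave = t2 - t1  # path delay plus clock offset
    slave_to_master = t4 - t3  # path delay minus clock offset
    mean_path_delay = (master_to_slave + slave_to_master) / 2
    offset_from_master = (master_to_slave - slave_to_master) / 2
    return offset_from_master, mean_path_delay


if __name__ == "__main__":
    # Slave clock runs 50 ns ahead of the master over a 100 ns path.
    print(ptp_offset_and_delay(t1=1000, t2=1150, t3=2000, t4=2050))
```

Because both values come from the same timestamps, an error in one timestamp skews the offset and the path delay together.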

4657792 / NVbug 5567725

Description: Fixed an issue where, in Flit Mode, the device could become unresponsive when receiving malformed or invalid traffic from a link partner.

Keywords: Flit Mode

Detected in version:

40.47.1026

Fixed in Release:

40.48.1000

4484662

Description: Fixed an issue where mlxlink reported 0 values for SNR (media and host) due to incorrect local port mapping in firmware and an incorrect page number used by MFT.

Keywords: mlxlink

Detected in version:

40.47.1026

Fixed in Release:

40.48.1000

4621747 / 4792742 / 4794290 / NVbug 5502241

Description: Fixed an issue where parallel accesses to the MCIA register could return incorrect data. On some hosts that run ethtool -m <interface> repeatedly (e.g., once per second), this could intermittently report Identifier: 0x00 (unknown/no module), causing health checks to fail.

Keywords: MCIA register

Detected in version:

40.47.1026

Fixed in Release: 

40.48.1000
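A health check of the kind affected by the MCIA issue (4621747) typically parses the Identifier field from the ethtool -m output. The sketch below assumes the field is printed as "Identifier : 0xNN (description)" on its own line; the function names and the sample text in the usage are illustrative assumptions, not captured output.

```python
import re
from typing import Optional

# Matches a line such as "Identifier : 0x11 (QSFP28)". The pattern is
# case-sensitive, so a line starting with "Extended identifier" is skipped.
_IDENTIFIER_RE = re.compile(r"^\s*Identifier\s*:\s*(0x[0-9a-fA-F]+)", re.MULTILINE)


def parse_module_identifier(ethtool_output: str) -> Optional[int]:
    """Return the module identifier byte, or None if the field is missing."""
    match = _IDENTIFIER_RE.search(ethtool_output)
    return int(match.group(1), 16) if match else None


def module_looks_present(ethtool_output: str) -> bool:
    """Treat a missing field or identifier 0x00 as 'no module'."""
    identifier = parse_module_identifier(ethtool_output)
    return identifier is not None and identifier != 0x00


if __name__ == "__main__":
    sample = "Identifier : 0x00 (unknown/no module)\n"  # illustrative text
    print(module_looks_present(sample))
```

On affected firmware, a check like this could fail intermittently even with a module installed, so retrying before raising an alarm is a reasonable mitigation until the firmware is updated.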

4686284 / NVbug 5607036

Description: Implemented IB extended port telemetry counters via the NSM Type 1 Get Port Telemetry Counters command, adding counters 19 and 20: NSM_LINK_ERROR_RECOVERY_COUNTER_CNTR_ID and NSM_LINK_DOWNED_COUNTER_CNTR_ID.

Keywords: IB extended port telemetry counters, NSM

Detected in version:

40.47.1026

Fixed in Release:

40.48.1000

4758174 / NVbug 5698200

Description: Fixed a rare attestation certificate signature formatting issue by removing an unnecessary leading zero byte in the “r” or “s” value.

Keywords: Attestation certificate signature format

Detected in version:

40.47.1026

Fixed in Release:

40.48.1000
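The leading-zero rule behind the attestation fix (4758174) can be illustrated as follows. This sketch assumes the signature is ECDSA with "r" and "s" encoded as ASN.1 DER INTEGERs, which the release note does not state explicitly. Under DER, a leading 0x00 byte is allowed only when the next byte has its top bit set. The function name is hypothetical.

```python
def minimal_der_integer(value: bytes) -> bytes:
    """Return minimal DER INTEGER content bytes for a non-negative value.

    `value` is a big-endian byte string that may carry extra leading zeros.
    """
    # Drop all leading zero bytes, but keep one byte for the value zero.
    stripped = value.lstrip(b"\x00") or b"\x00"
    # A single 0x00 pad is required only when the top bit is set; otherwise
    # the integer would be read as negative.
    if stripped[0] & 0x80:
        stripped = b"\x00" + stripped
    return stripped


if __name__ == "__main__":
    print(minimal_der_integer(b"\x00\x7f\x01").hex())  # pad is unnecessary
    print(minimal_der_integer(b"\x00\x80\x01").hex())  # pad is required
```

Strict DER parsers reject the non-minimal form, which would explain why such a signature fails only occasionally: it depends on the leading byte of "r" or "s".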

4532684 / 4635872 / 4794865 / 4794866 / 4794867 / NVbug 5385446

Description: Fixed an issue by improving the ADP-RETX algorithm to avoid re-arming without performing a retransmission.

Keywords: ADP-RETX algorithm

Detected in version:

40.47.1026

Fixed in Release:

40.48.1000

4542516 / 4554220

Description: Fixed an issue where, in certain Gen6 setups, RDMA READ bidirectional traffic required at least 5 QPs to reach full wire speed.

Keywords: RDMA READ bidirectional traffic

Detected in version:

40.47.1026

Fixed in Release: 

40.48.1000

4554763 / 4808657

Description: Fixed an issue affecting single-process, unidirectional RDMA READ to GPU memory (4 QPs, 128KB messages) by enabling ZERO_TOUCH_TUNING_ENABLE via mlxconfig.

Keywords: Zero Touch Tuning, mlxconfig

Detected in version:

40.47.1026

Fixed in Release:

40.48.1000

4608214

Description: Fixed an issue where probe packets might not be sent under heavy traffic.

Keywords: PCC, ZTR_RTTCC, probe

Detected in version:

40.47.1026

Fixed in Release:

40.48.1000

4450570 / 4780432 / 4780433

Description: Fixed an issue where the root complex sent MCTP-over-PCI messages before a BDF was assigned, causing responses to be sent with BDF 0. The fix ensures that MCTP messages routed by ID are ignored until a valid BDF is assigned.

Keywords: MCTP-over-PCI, BDF, MCTP messages

Detected in version:

40.47.1026

Fixed in Release:

40.48.1000
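The behavior described for the MCTP-over-PCI fix (4450570) can be sketched as a simple guard. The 16-bit BDF layout (8-bit bus, 5-bit device, 3-bit function) is the standard PCIe encoding. The function names and the use of None to model "no BDF assigned yet" are assumptions made for illustration, not the firmware's actual logic.

```python
from typing import Optional


def pack_bdf(bus: int, device: int, function: int) -> int:
    """Pack bus/device/function into the 16-bit PCIe ID format."""
    if not (0 <= bus <= 0xFF and 0 <= device <= 0x1F and 0 <= function <= 0x7):
        raise ValueError("BDF field out of range")
    return (bus << 8) | (device << 3) | function


def format_bdf(bdf: int) -> str:
    """Render a packed BDF as the usual bb:dd.f string."""
    return f"{bdf >> 8:02x}:{(bdf >> 3) & 0x1F:02x}.{bdf & 0x7:x}"


def should_handle_route_by_id(assigned_bdf: Optional[int]) -> bool:
    """Ignore route-by-ID MCTP messages until a BDF has been assigned."""
    return assigned_bdf is not None


if __name__ == "__main__":
    bdf = pack_bdf(0x17, 0, 1)
    print(format_bdf(bdf), should_handle_route_by_id(bdf))
    print(should_handle_route_by_id(None))
```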

4809134 / 4824635

Description: Fixed an issue where the steering tables were not updated after enabling partial Spectrum-X capabilities (BTH.AR) via LLDP.

Keywords: Steering tables, LLDP

Detected in version:

40.47.1026

Fixed in Release:

40.48.1000

4797308 / NVbug 5706024

Description: Fixed an issue where an INTx message was sent with a Requester ID of 0, causing an ACS violation at the root port. The fix uses the correct BDF as the Requester ID instead of 0.

Keywords: INTx message

Detected in version:

40.47.1026

Fixed in Release:

40.48.1000

Internal Ref.

Issue

4608544

Description: Fixed an issue where, in rare live migration scenarios, a delayed doorbell triggered a false timeout alarm.

Keywords: Live migration, doorbell, timeout alarm

Detected in version:

40.46.1006

Fixed in Release:

40.47.1088

4648642

Description: Fixed a rare issue in which destroying PCC NP configuration objects could result in assert 0x8175 being logged in dmesg.

Keywords: Assert 0x8175, PCC NP

Detected in version:

40.47.1026

Fixed in Release: 

40.47.1088

4655971

Description: Fixed the PCIe counters to correctly report event values in nanoseconds.

Keywords: DOCA Telemetry Diagnostics

Detected in version:

40.47.1026

Fixed in Release: 

40.47.1088

4690503

Description: Fixed an issue where creating a DPA process that uses 128 MB of data caused the dynamic library to fail with syndrome 0xdc30ac. The BSS section of the DPA application is now limited to 64 MB.

Keywords: DPA process, BSS

Detected in version:

40.47.1026

Fixed in Release: 

40.47.1088

Internal Ref.

Issue

4570205

Description: Fixed a firmware issue where the ZTR_RTTCC algorithm parameters AI and HAI did not support a sufficient range.

Keywords: PCC, ZTR_RTTCC

Detected in version:

40.46.1006

Fixed in Release: 

40.47.1026

4629077

Description: Fixed an issue where coalescing regular SX events with SX RTT events under ZTR_RTTCC could keep improper event fields, which could impact congestion control behavior.

Keywords: PCC, ZTR_RTTCC

Detected in version:

40.46.1006

Fixed in Release:

40.47.1026

4683328

Description: Fixed an issue in the ZTR_RTTCC algorithm where probe-abortion handling could behave improperly under high-stress network conditions, ensuring proper congestion control and stable traffic performance.

Keywords: PCC, ZTR_RTTCC

Detected in version:

40.46.1006

Fixed in Release:

40.47.1026

4501554

Description: Fixed an assertion failure that could occur with the E-Switch uplink in specific configurations where the E-Switch was disabled and Path Migration was active or GVMIs were using SRQ loopback in SQs. The issue occurred because the firmware attempted to perform cleanup operations when the uplink configuration lacked sufficient capacity.
Now, when the E-Switch is disabled and no actions are available in the uplink STE, the firmware connects to the uplink STE instead of copying it.

Keywords: Path migration, steering

Detected in version:

40.46.1006

Fixed in Release: 

40.47.1026

4506854

Description: Added Scaling Factor "read" field. To obtain correct values in mlxlink, MFT version 4.33.0 or later is required.

Keywords: Scaling Factor, mlxlink, MFT

Detected in version:

40.46.1006

Fixed in Release: 

40.47.1026

4468319

Description: Fixed an issue where the ConnectX-8 downstream port failed to send a NACK when rejecting an L1 entry request from the upstream port.

Keywords: NACK, downstream port

Detected in version:

40.46.1006

Fixed in Release:

40.47.1026

4550782

Description: Fixed an issue on GB200 systems with two symmetrical ConnectX-8 SuperNICs, which caused DPN numbering differences on the HCA upstream port. Legacy drivers accessed with dpn=0,0,0, which could result in attempts to access the wrong DPN node in socket-direct systems.
The firmware now automatically determines the correct pcie_index based on the accessed link in direct-NIC systems.

Keywords: DPN numbering

Detected in version:

40.45.1020

Fixed in Release: 

40.47.1026

4571079

Description: Fixed an issue where invoking the resourcedump tool with segment type DPA_PROCESS_LST returned invalid data when the parameter n1 == 1 and no processes existed on the current vhca_id.
The fix adds a proper check, and the resourcedump tool now reports the correct error in this scenario.

Keywords: DPA PROCESS, RESOURCE DUMP

Detected in version:

40.45.1020

Fixed in Release: 

40.47.1026

4529293

Description: Fixed an issue where, during failover or restart, the SM sending a PortInfo MAD to the HCA firmware triggered reinitialization of port buffers, momentarily halting ingress traffic and causing packet drops.
The firmware now avoids reconfiguring port buffers when the new configuration matches the current one.

Keywords: OpenSM

Detected in version:

40.45.1020

Fixed in Release: 

40.47.1026

4641215

Description: Fixed a rare issue where MFRL operations could fail due to a timeout.

Keywords: MFRL

Detected in version:

40.46.1006

Fixed in Release: 

40.47.1026

4683346

Description: Fixed an issue where, under the ZTR_RTTCC algorithm, a flow that reached its minimum rate due to heavy congestion would not recover its rate once the congestion cleared.

Keywords: PCC, ZTR_RTTCC

Detected in version:

40.46.1006

Fixed in Release: 

40.47.1026

4575692

Description: Fixed an issue where a missing interrupt from the module IO (Expander) could prevent the module from being raised.

Keywords: Module IO (Expander)

Detected in version:

40.46.1006

Fixed in Release: 

40.47.1026

4620765

Description: Fixed an issue where reading debug registers could cause link BER (Bit Error Rate) degradation over time.

Keywords: BER

Detected in version:

40.46.1006

Fixed in Release: 

40.47.1026

4658434

Description: Fixed an issue where ports connected via 4 or 8 lanes and configured for 200G_2x (using only 2 lanes) would fail to link when using a mix of new firmware (with “Non Tx-Squelch” support) and older firmware versions.

Note: On both sides of the link, the switch (local device) and the switch/NIC (peer device), make sure to:

  • Deploy the new firmware release versions as a matched bundle on both Switch and NIC devices.

  • Configure the port to use 2 lanes (instead of 4 or 8 lanes) while keeping the 200G_2x speed setting.

Keywords: Port speed

Detected in version:

40.46.1006

Fixed in Release:

40.47.1026

4401684

Description: Fixed an issue in Arch diagnostic data counters where the pcie_link_outbound_data_bytes counter was incorrectly returning only zero values.

Keywords: Arch diagnostic data counters

Detected in version:

40.45.1020

Fixed in Release:

40.47.1026

4575696

Description: Fixed an issue where multiple long-running process registers could cause aborted accesses and timeouts. The internal state is now handled properly.

Keywords: ibdiagnet2

Detected in version:

40.46.1006

Fixed in Release: 

40.47.1026

4583940

Description: Fixed an issue where enabling the CCMAD custom header on one PCC probe slot caused other slots to malfunction when multiple slots were configured.

Note: If using firmware versions older than the 40.47.10xx GA release, disable the CCMAD custom header when multiple probe slots are enabled.

Keywords: PCC CCMAD custom header

Detected in version:

40.46.1006

Fixed in Release: 

40.47.1026

4208960

Description: Fixed an issue where a packet could be parsed incorrectly if a driver used the header_length_field_mask when creating a PARSE_GRAPH_NODE object and the mask value was not composed of contiguous bits or did not start at the least significant bit.

Keywords: PARSE GRAPH NODE, Flex Parser

Detected in version:

40.44.0208

Fixed in Release:

40.47.1026
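On firmware older than the fix for 4208960, a driver can avoid the parsing problem by checking the header_length_field_mask before creating the PARSE_GRAPH_NODE object. The sketch below is a minimal check, assuming the mask is available as a plain integer; the function name is hypothetical and treating a zero mask as invalid is an assumption.

```python
def is_contiguous_from_lsb(mask: int) -> bool:
    """Return True if mask is a run of 1 bits starting at bit 0.

    Such a mask has the form 2**n - 1, so adding 1 yields a single set bit
    that shares no bits with the mask.
    """
    return mask > 0 and (mask & (mask + 1)) == 0


if __name__ == "__main__":
    for candidate in (0x0F, 0x3C, 0x0D):
        print(hex(candidate), is_contiguous_from_lsb(candidate))
```

Here 0x0F passes, 0x3C fails because it does not start at the least significant bit, and 0x0D fails because its bits are not contiguous.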

4610740

Description: Fixed a firmware issue where a CQE error with vendor_syndrome RDE_MAL_WQE (0xd6) could cause traffic disruption on the affected QP.

Keywords: RDMA, transport

Detected in version:

40.45.1020

Fixed in Release:

40.47.1026

Internal Ref.

Issue

4603774

Description: Fixed an issue where the adapter card could drop NC-SI over MCTP commands when padding bytes were present after the NC-SI checksum.

Keywords: NC-SI

Detected in version:

40.46.1006

Fixed in Release:

40.46.3048

Internal Ref.

Issue

4286902

Description: Fixed a race condition in DPA process termination during the exception flow, where a failed process could be missed and not reported to the user.

Keywords: DPA

Detected in version:

40.45.1020

Fixed in Release:

40.46.1006

4401109

Description: Fixed an issue where RTTs on IFA1 were not sent when IFA1 and IFA2 were configured in cumulative slots.

Keywords: PCC, multi probe, IFA

Detected in version:

40.45.1020

Fixed in Release: 

40.46.1006

4486431

Description: Fixed an issue where issuing multiple parallel queries of DPA_THREAD objects with the same object ID could fail.

Keywords: DPA

Detected in version:

40.45.1020

Fixed in Release:

40.46.1006

 

4443601

Description: Fixed a firmware issue where PXE failed to boot when both LAG ports were up.

Keywords: PXE, LAG

Detected in version:

40.45.1020

Fixed in Release: 

40.46.1006

4475307

Description: Fixed an issue where PCC DCQCN used incorrect parameter values when link speed was 400Gbps or higher.

Keywords: PCC DCQCN, congestion control

Detected in version:

40.45.1020

Fixed in Release:

40.46.1006

4480427

Description: Fixed incorrect calculation of start address and mode for the CQE buffer in DPA CQ, which could cause CQEs to be written to the wrong address when the buffer is not 4K-aligned and spans a second page boundary.

Keywords: CQ, CQE Buffer, DPA

Detected in version:

40.45.1020

Fixed in Release: 

40.46.1006
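To see which CQE buffer layouts match the condition in 4480427, it helps to count how many 4K pages a buffer touches. The sketch below assumes a 4096-byte page and a buffer described by a start address and a length in bytes; the function name is hypothetical and the code does not reflect the firmware's internal calculation.

```python
PAGE_SIZE = 4096  # 4K page, as referenced in the release note


def pages_spanned(start_addr: int, length: int) -> int:
    """Return the number of 4K pages touched by [start_addr, start_addr + length)."""
    if length <= 0:
        raise ValueError("length must be positive")
    first_page = start_addr // PAGE_SIZE
    last_page = (start_addr + length - 1) // PAGE_SIZE
    return last_page - first_page + 1


if __name__ == "__main__":
    # An aligned 8 KB buffer touches two pages; the same buffer starting at
    # an unaligned address crosses a second page boundary and touches three.
    print(pages_spanned(0x2000, 0x2000))
    print(pages_spanned(0x1F00, 0x2000))
```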

4402022

Description: Fixed an issue where Wake-on-LAN (WoL) might not function correctly on certain multihost configurations.

Keywords: Wake-on-LAN (WoL)

Detected in version:

40.45.1020

Fixed in Release:

40.46.1006

4445479

Description: Added a fixed estimated power value for Hvdd in the INI configuration.

Keywords: Hvdd

Detected in version:

40.45.1020

Fixed in Release: 

40.46.1006

4426779

Description: Updated the handling of the PLDM Type-5 'Activate Firmware' command to ensure the update flow does not fail when 'self-contained activation' is requested. Although the 'Self Contained Activation Is Not Supported' completion code will still be returned, the component will now be successfully marked as pending.

Keywords: PLDM

Detected in version:

40.44.1036

Fixed in Release: 

40.46.1006

4318063

Description: Fixed an issue where, when running the PTP4L application, the path delay showed inconsistent values after each fwreset and rerun, resulting in a non-constant PPS offset that failed to meet Class B/C requirements.

Keywords: PTP

Detected in version:

40.45.1020

Fixed in Release: 

40.46.1006

4163634

Description: Fixed an issue where the link was not established when connecting a Quantum-3 switch system (with a port split into 8 ports) to a ConnectX-8 single-port SuperNIC.

Workaround: Configure the Quantum-3 switch system port to be split into 2 or 4 ports, or set the ConnectX-8 to operate in multiplane mode.

Keywords: Port split, Quantum-3

Detected in version:

40.44.0212

Fixed in Release:

40.46.1006

4366117

Description: Fixed an issue where configuring a small MTU fragmented packets that are critical to the PXE boot process. The PXE boot filters then mistakenly discarded these packets, causing the PXE boot to fail.

Keywords: PXE boot filters

Detected in version:

40.45.1020

Fixed in Release: 

40.46.1006

