ConnectX-7 Firmware Release Notes

Bug Fixes History


This section covers the bug fix history of the last 3 major releases. For the history of older releases, please refer to the release notes of the relevant firmware versions.
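
Each entry below lists a "Discovered in Version" and a "Fixed in Release". As a rough aid for reading the tables, the following sketch checks whether an installed firmware version falls between the two. It assumes that versions compare numerically field by field and that an issue affects every version from the one it was discovered in up to, but not including, the release that fixed it. That assumption is not stated in these notes, and the function names are illustrative only.

```python
def parse_version(text):
    """Split a dotted firmware version such as '28.40.1000' into integers."""
    return tuple(int(part) for part in text.strip().split("."))


def is_affected(installed, discovered_in, fixed_in):
    """Assumption: affected when discovered_in <= installed < fixed_in."""
    current = parse_version(installed)
    return parse_version(discovered_in) <= current < parse_version(fixed_in)


# Entry 3312483 below: discovered in 28.36.1010, fixed in 28.40.1000.
print(is_affected("28.39.2048", "28.36.1010", "28.40.1000"))  # True
print(is_affected("28.40.1000", "28.36.1010", "28.40.1000"))  # False
```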


Internal Ref.

Issue

3712016

Description: Fixed an issue that prevented Congestion Control from behaving properly when GRH is used in traffic of an IB cluster.

Keywords: IB Congestion Control, CNP, SL

Discovered in Version:

28.39.1002

Fixed in Release:

28.40.1000

3174038

Description: SPDM requests received while the CPLD burn flow is in progress may be answered with incorrect responses.

Keywords: SPDM

Discovered in Version:

28.34.1002

Fixed in Release:

28.40.1000

3110297

Description: When a ConnectX-7 adapter card is configured to use the Auto-Negotiation mode, a 400G_8x link cannot be raised.

Keywords: 400G_8x, linkup

Discovered in Version:

28.34.4000

Fixed in Release:

28.40.1000

3339818

Description: When performing stress toggling on a ConnectX-7 adapter card that is connected to the MMA1Z00-NS400 cable with the speed set to 100G_1x with interleaved FEC, a long linkup time of up to 5 minutes may occur.

Keywords: Toggling, MMA1Z00-NS400

Discovered in Version:

28.36.1010

Fixed in Release:

28.40.1000

3339919

Description: 

  • When raising a link using 200G optical cables between two ConnectX-7 adapter cards, raising a link with a width lower than the maximum provided by the cable at a lane speed of 25G is not supported.

  • When raising a link using 400G optical cables between two ConnectX-7 adapter cards, raising a link with a width lower than the maximum provided by the cable at a lane speed of 50G or 25G is not supported.

Keywords: Link up speed

Discovered in Version:

28.36.1010

Fixed in Release:

28.40.1000

3312483

Description: WoL packets may not work properly if sent to a Unicast destination MAC.

Keywords: WoL packets, Unicast destination MAC

Discovered in Version:

28.36.1010

Fixed in Release:

28.40.1000

3275394

Description: When performing PCIe link secondary-bus-reset, disable/enable or mlxfwreset on AMD-based Genoa systems, the device takes longer than expected to link up, due to a PCIe receiver termination misconfiguration.

Keywords: PCIe

Discovered in Version:

28.37.1014

Fixed in Release:

28.40.1000

3457472

Description: Disabling the Relaxed Ordering (RO) capability (relaxed_ordering_read_pci_enabled=0) using the vhca_resource_manager is currently not functional.

Keywords: Relaxed Ordered

Discovered in Version:

28.37.1014

Fixed in Release:

28.40.1000

3606136

Description: In rare cases, linkup of NDR and NDR200 with MMA4Z00-NS400 may take longer than 60 seconds.

Keywords: Cables, NDR, NDR200, linkup time

Discovered in Version:

28.39.1002

Fixed in Release:

28.40.1000

3683068

Description: Added back the Digital Feedforward Equalizer (DFFE) hardware component to improve the signal integrity of the link.

Keywords: Digital Feedforward Equalizer (DFFE)

Discovered in Version:

28.38.1002

Fixed in Release:

28.40.1000

3708035

Description: Fixed an issue with the Selective-Repeat configuration which occasionally caused retransmission to wait for a timeout instead of being triggered by an out-of-sequence NACK.

Keywords: RoCE, SR

Discovered in Version:

28.38.1002

Fixed in Release:

28.40.1000

3695219

Description: Enabled the lowest minimum rate for SW DCQCN so that congestion control can hold a larger number of QPs without pauses or drops.

Keywords: Congestion control, PCC, DCQCN

Discovered in Version: 

28.38.1002

Fixed in Release: 

28.40.1000

3637429

Description: Fixed an issue that caused the secondary ASIC run module init to fail due to a missing condition.

Keywords: Secondary device, EEPROM

Discovered in Version: 

28.38.1002

Fixed in Release: 

28.40.1000

3693945

Description: Fixed an issue that kept the adapter cards' quad ports UP when using breakout cables / QSFP-split-4. Now when a 4 alignment loss is noticed, the link in 25G/lane Ethernet is dropped.

Keywords: Quad ports, link up, breakout cables / QSFP-split-4

Discovered in Version: 

28.38.1002

Fixed in Release: 

28.40.1000

3607329

Description: Modified PCIe switch downstream port EQLZ.PH1 timing to 3ms.

Keywords: PCIe switch downstream port

Discovered in Version: 

28.38.1002

Fixed in Release: 

28.40.1000

3617606

Description: Fixed a rare race condition in NODNIC teardown that caused commands to hang on a regular PF.

Keywords: NODNIC teardown

Discovered in Version: 

28.36.1010

Fixed in Release: 

28.40.1000



Internal Ref.

Issue

3652874

Description: Fixed firmware measurements calculation.

Keywords: Firmware measurements calculation

Discovered in Version: 

28.38.1002

Fixed in Release: 

28.39.2048

3664415




Description: Fixed an issue that caused Live Migration to hang during the "save" stage.

Keywords: Live migration

Discovered in Version: 

28.38.1002

Fixed in Release: 

28.39.2048

3629353




Description: Fixed the cr_space in the port configuration to prevent wrong timestamps of CQEs.

Keywords: Hardware timestamp

Discovered in Version: 

28.38.1002

Fixed in Release: 

28.39.2048

3582559




Description: Added support for LED scheme #2 to MCX750500B-0D0K / MCX750500B-0D00 adapter cards. 

Keywords: LED

Discovered in Version: 

28.38.1002

Fixed in Release: 

28.39.2048

3669258




Description: Fixed a rare issue that prevented changes in mlxconfig from taking effect upon warm reboot.

Keywords: mlxconfig

Discovered in Version: 

28.38.1002

Fixed in Release: 

28.39.2048

3670719 / 3676590




Description: Added a small delay after the power up process to fix an issue that occasionally caused the module to be unstable after the power up.

Keywords: Link up

Discovered in Version: 

28.38.1002

Fixed in Release: 

28.39.2048

3629562




Description: Fixed a code mismatch in handling the cause of the link being down when remote faults were received.

Keywords: Link down

Discovered in Version: 

28.38.1002

Fixed in Release: 

28.39.2048

3532508




Description: Fixed a wrong parameter in the cable info MAD that resulted in unnecessary messages in the log. 

Keywords: Cable info MAD

Discovered in Version: 

28.38.1002

Fixed in Release: 

28.39.2048

3634350



Description: Disabled PCI power event messages on OCP 3.0 adapter cards according to the spec requirements.

Keywords: PCI, OCP 3.0

Discovered in Version: 

28.38.1002

Fixed in Release: 

28.39.2048

3636714




Description: Fixed an issue that caused the PLDM firmware update buffer holding pending NIC requests not to be properly locked in the case of PLDM-over-NC-SI, and consequently to be corrupted by other flows.

Keywords: PLDM, buffer

Discovered in Version: 

28.38.1002

Fixed in Release: 

28.39.2048

3592276




Description: Fixed an issue that prevented MSI interrupts from being advertised correctly, resulting in the wrong MSI being sent.

Keywords: MSI

Discovered in Version: 

28.38.1002

Fixed in Release: 

28.39.2048

3605363



Description: "Get Temperature" OEM command now always returns a unified temperature.

Keywords: Temperature

Discovered in Version: 

28.38.1002

Fixed in Release: 

28.39.2048

3531972



Description: Changed the BAR configuration algorithm so that the last update to the BAR address is the one that takes effect when the host configures the same BAR address for two different PFs.

Keywords: Network Interface

Discovered in Version: 

28.38.1002

Fixed in Release: 

28.39.2048
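
The "last update takes effect" rule in the entry above can be pictured with a small model. This is only a sketch of the described behavior, not the firmware algorithm: the `BarTable` class, the PF names, and the address value are all hypothetical.

```python
class BarTable:
    """Toy model: tracks which PF currently owns each BAR address."""

    def __init__(self):
        self._owner_by_address = {}

    def configure(self, pf, address):
        # Drop any older address held by this PF before claiming the new one.
        for addr, owner in list(self._owner_by_address.items()):
            if owner == pf:
                del self._owner_by_address[addr]
        # Last update wins: a later write for the same address replaces
        # the PF that claimed it earlier.
        self._owner_by_address[address] = pf

    def owner(self, address):
        return self._owner_by_address.get(address)


table = BarTable()
table.configure("PF0", 0xF0000000)
table.configure("PF1", 0xF0000000)  # same address, configured later
print(table.owner(0xF0000000))  # PF1
```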

3626872



Description: Fixed an issue that caused the firmware to miscalculate the value of the maximum current temperature measured from all the diodes (found in the Internal_sensor_curr_temp field). 

Keywords: Sensor, temperature

Discovered in Version: 

28.38.1002

Fixed in Release: 

28.39.2048

3544340 / 3537706 / 3639178



Description: Improved SPDM v1.0 compatibility and applied additional fixes to the SPDM measurements signature.

Keywords: SPDM

Discovered in Version: 

28.38.1002

Fixed in Release: 

28.39.2048

3587821




Description: Fixed a HW bug that resulted in transaction loss when a cache replacement transaction occurred in parallel to code transcoding.

Keywords: HW bug, transaction loss

Discovered in Version: 

28.38.1002

Fixed in Release: 

28.39.2048

3610861




Description: The EEPROM module got stuck in polling in 20% of the cases after reset. To resolve the issue, a delay was added after configuring the module to high power.

Keywords: Polling, module, reset

Discovered in Version: 

28.38.1002

Fixed in Release: 

28.39.2048

3507928



Description: Fixed a linkup failure issue that occurred when connecting to a 25GbE transceiver by clearing the PSI Aging before trying to open Tx power.

Keywords: Cables, PSI Aging, 25GbE transceiver

Discovered in Version: 

28.38.1002

Fixed in Release: 

28.39.2048

3602379




Description: The "Bad Signal Integrity" message seen after a power cycle can be safely ignored. The user should monitor the BER number.

Keywords: Bad Signal Integrity, BER

Discovered in Version: 

28.38.1002

Fixed in Release: 

28.39.2048

3605686



Description: Fixed a statics issue that caused the i2c access to the module to lock and the switch to get stuck.

Keywords: i2c, switch

Discovered in Version: 

28.38.1900

Fixed in Release: 

28.39.2048

3482251



Description: Added support for a hairpin drop counter in the QUERY_VNIC_ENV command.

Keywords: Hairpin

Discovered in Version: 

28.38.1002

Fixed in Release: 

28.39.2048

3539437



Description: Fixed an issue that prevented the get_func_num_from_pci_func_num function from returning the value "-1" for an undefined function type.

Keywords: get_func_num_from_pci_func_num

Discovered in Version: 

28.38.1002

Fixed in Release: 

28.39.2048

3570478



Description: Fixed Signal-to-Noise Ratio (SNR) value calculation for correct readings from the MMA4Z00 optical cable module.

Keywords: SNR

Discovered in Version: 

28.38.1002

Fixed in Release: 

28.39.2048

3602169



Description: Added a locking mechanism to protect the firmware from a race condition between insertion and deletion of the same rule in parallel. Such behavior occasionally resulted in the firmware accessing memory that had already been released, thus causing an IOMMU / translation error.

Note: This fix will not impact insertion rate for tables owned by SW steering.

Keywords: Firmware steering

Discovered in Version: 

28.38.1002

Fixed in Release: 

28.39.2048
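
The general idea behind the locking fix in the entry above, serializing insertion and deletion of the same rule, can be sketched as follows. This is a generic illustration with hypothetical names (`RuleTable`, rule ID 7, the "drop" action); the notes do not describe the actual firmware data structures.

```python
import threading


class RuleTable:
    """Toy steering table whose insert and delete share one lock."""

    def __init__(self):
        self._rules = {}
        self._lock = threading.Lock()

    def insert(self, rule_id, action):
        with self._lock:  # no delete can run while the rule is being written
            self._rules[rule_id] = action

    def delete(self, rule_id):
        with self._lock:  # no insert can run while the rule is being released
            return self._rules.pop(rule_id, None)

    def lookup(self, rule_id):
        with self._lock:
            return self._rules.get(rule_id)


table = RuleTable()
workers = []
for _ in range(100):
    workers.append(threading.Thread(target=table.insert, args=(7, "drop")))
    workers.append(threading.Thread(target=table.delete, args=(7,)))
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()

# Whichever operation ran last, the rule is either fully present or fully gone.
print(table.lookup(7) in ("drop", None))  # True
```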

3588515 / 3409806




Description: Fixed a race condition that led to a firmware assert upon driver removal, or when changing the ETH flow control scheme under stress of ingress packets larger than the MTU.

Keywords: Race condition, firmware assert

Discovered in Version:

28.38.1002

Fixed in Release:

28.39.2048

3610169



Description: Fixed QoS Shaper handling behavior for non-transmitting applications.

Keywords: QoS Shaper

Discovered in Version:

28.38.1002

Fixed in Release:

28.39.2048


Internal Ref.

Issue

3537571

Description: Fixed SPDM measurements signature.

Keywords: SPDM

Discovered in Version: 

28.37.1014

Fixed in Release: 

28.38.1002

3439757



Description: Fixed an issue that prevented the system from detecting the PCIe device during slot DC power cycle tests.

Keywords: PCIe device, DC power cycle tests

Discovered in Version: 

28.37.1014

Fixed in Release: 

28.38.1002

3534473



Description: Added a new field/slot ID to PRS  pcie_cfg_data.pci_cfg_space.pciex.pcie_switch_ini_defined_base_slot_id = 3 to define a specific slot number for GPU bridge DSP.

Keywords: Slot ID

Discovered in Version: 

28.37.1014

Fixed in Release: 

28.38.1002

3331179



Description: Improved token calculation.

Keywords: Token calculation

Discovered in Version: 

28.37.1014

Fixed in Release: 

28.38.1002

3299420



Description: Upgrading from firmware v28.38.1014 and below to v28.38.1002 no longer requires an upgrade to an intermediate version.

Keywords: Firmware upgrade

Discovered in Version: 

28.37.1014

Fixed in Release: 

28.38.1002

3394841



Description: Updated the plug in/out events' reporting method to report only when the last recorded event is the opposite of the current event.

Keywords: Port events

Discovered in Version: 

28.37.1014

Fixed in Release: 

28.38.1002
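
The reporting rule in the entry above (report only when the last recorded event is the opposite of the current one) amounts to suppressing repeated events of the same kind. A minimal sketch, assuming that the very first event is always reported and using hypothetical event names:

```python
class PlugEventReporter:
    """Reports a plug event only if it differs from the last recorded one."""

    VALID_EVENTS = ("plug_in", "plug_out")

    def __init__(self):
        self._last_event = None  # nothing recorded yet
        self.reported = []

    def on_event(self, event):
        if event not in self.VALID_EVENTS:
            raise ValueError(f"unknown event: {event}")
        # Assumption: with no recorded event, the first one is reported.
        if event != self._last_event:
            self.reported.append(event)
        self._last_event = event


reporter = PlugEventReporter()
for event in ["plug_in", "plug_in", "plug_out", "plug_out", "plug_in"]:
    reporter.on_event(event)
print(reporter.reported)  # ['plug_in', 'plug_out', 'plug_in']
```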

3469311



Description: Fixed the SPDM operations order according to specification v1.1.0.

Keywords: SPDM operations

Discovered in Version: 

28.37.1014

Fixed in Release: 

28.38.1002

3527987



Description: Added support for NC-SI channel on both ports.

Keywords: NC-SI channel

Discovered in Version: 

28.37.1014

Fixed in Release: 

28.38.1002

3459317



Description: Changed the protection mechanism for BAR configuration.

Keywords: BAR configuration

Discovered in Version: 

28.37.1014

Fixed in Release: 

28.38.1002

3345150



Description: Fixed an issue that caused a packet with invalid/bad padcount to be silently dropped instead of sending a bad nack error.

Keywords: Packet drop

Discovered in Version: 

28.37.1014

Fixed in Release: 

28.38.1002

3418627



Description: Fixed wrong credits configuration that occurred when MAX_ACC_OUT_READ was configured.

Keywords: Performance

Discovered in Version: 

28.37.1014

Fixed in Release: 

28.38.1002

3466088



Description: Updated the SX root to work with driverless mode in vport0 gvmi teardown.

Keywords: Driverless mode

Discovered in Version: 

28.37.1014

Fixed in Release: 

28.38.1002

3487313



Description: Fixed a rare deadlock case between 2 DC packets on the RX side.

Keywords: Firmware deadlock

Discovered in Version: 

28.37.1014

Fixed in Release: 

28.38.1002

3495889



Description: Fixed a QoS host port rate limit shaper inaccuracy that occurred when the shaper was configured via the QSHR access register.

Keywords: Port rate limit shaper

Discovered in Version: 

28.37.1014

Fixed in Release: 

28.38.1002

3449451



Description: When using a ConnectX-7 adapter card as InfiniBand, the port must be configured to use the Auto-Negotiation mode.

Keywords: Auto-Negotiation, InfiniBand

Discovered in Version: 

28.37.1014

Fixed in Release: 

28.38.1002

Internal Ref.

Issue

3272599






Description: Removed the option to clear "Tx disable cap" for all non-baseT SFP modules.

Keywords: Tx disable cap

Discovered in Version:

28.36.1010

Fixed in Release: 

28.37.1014

3339087




Description: Added a split mask verification process to check whether or not a module is split in HCA. 

Keywords: Cables, split module

Discovered in Version:

28.36.1010

Fixed in Release: 

28.37.1014

3411270




Description: Fixed an issue that resulted in a firmware crash when setting large payload length values (more than ~1500) in the NC-SI command header.

Keywords: NC-SI

Discovered in Version:

28.36.1010

Fixed in Release: 

28.37.1014
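
A crash triggered by an oversized length field, as in the entry above, is the kind of problem a bounds check on the declared length guards against. The sketch below shows such a check in general terms. The limit of 1500 is taken loosely from the "~1500" figure in the description, and the function name and parameters are hypothetical; the actual firmware fix is not described in these notes.

```python
MAX_PAYLOAD_LEN = 1500  # hypothetical limit, based on the "~1500" in the entry


def accept_command(declared_len, payload):
    """Return True only if the declared payload length is safe to use."""
    # Reject lengths outside the range the receiver is prepared to handle.
    if declared_len < 0 or declared_len > MAX_PAYLOAD_LEN:
        return False
    # Reject lengths that claim more bytes than were actually received.
    if declared_len > len(payload):
        return False
    return True


print(accept_command(64, bytes(64)))    # True
print(accept_command(4000, bytes(64)))  # False
```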

3405790




Description: Fixed an issue that resulted in the interface type being shown as "unsupported" in CMIS modules.

Keywords: CMIS

Discovered in Version:

28.36.1010

Fixed in Release: 

28.37.1014

3418889




Description: Updated the NEGOTIATE_ALGORITHMS response according to the SPDM specification.

Keywords: SPDM

Discovered in Version:

28.36.1010

Fixed in Release: 

28.37.1014

3409686




Description: Added the option to clear the DPC registers after warm reboot.

Keywords: DPC

Discovered in Version:

28.36.1010

Fixed in Release: 

28.37.1014

3411116




Description: Fixed the configuration of the TS1s sent by the DownStream port (DSP) when moving to EQLZ.ph2.

Keywords: DSP

Discovered in Version:

28.36.1010

Fixed in Release: 

28.37.1014

3138665




Description: Changed the initial Tx preset configuration for the DownStream port (DSP).

Keywords: Tx, DSP

Discovered in Version:

28.36.1010

Fixed in Release: 

28.37.1014

3138665



Description: The PLDM firmware update process fails if a chunk size of 1304 bytes is chosen.

Keywords: PLDM firmware update

Discovered in Version:

28.34.4000

Fixed in Release: 

28.37.1014

3336619



Description: Fixed an issue that occurred during secure firmware update when decrypting and authenticating each chunk of data using its authentication tag. The issue appeared when the main code chunk was split between the user chunks and any GCM operation (e.g., flash read with decryption). This GCM operation broke the GCM context for main chunk authentication and therefore failed.

Keywords: Secure firmware update, GCM, code chunk

Discovered in Version: 

28.36.1010

Fixed in Release: 

28.37.1014

3327847



Description: The CNP received, handled, and ignored counters in the hardware counters do not work after moving to Programmable Congestion Control mode.

Keywords: CNP, Programmable Congestion Control

Discovered in Version: 

28.36.1010

Fixed in Release: 

28.37.1014

3336610




Description: Fixed a rare issue that prevented the hardware from handling an error flow that occurred when accessing the DPA cluster L2 cache from the firmware processor. In this case the firmware processor hardware requested a VA=>PA translation from the internal mmio, and the address translation was broken by the mmio on the 4K page boundary.

Keywords: Error handling, mmio, firmware processor

Discovered in Version: 

28.36.1010

Fixed in Release: 

28.37.1014

3073517



Description: When connecting a ConnectX-7 adapter card to a ConnectX-5 or an NVIDIA Spectrum switch, raising 10G/40G over a 100G optical cable is not supported.

Keywords: Optical cables, ConnectX-5, NVIDIA Spectrum

Discovered in Version: 

28.33.4030

Fixed in Release: 

28.37.1014

3358994



Description: Fixed an issue that prevented the hardware from consuming Port-VL and credits, which consequently blocked traffic from being transmitted due to a race condition between the firmware and the hardware when accessing the chip memory (CR space).

Keywords: Firmware race, CR space, Port-VL

Discovered in Version: 

28.36.1010

Fixed in Release: 

28.37.1014


