NVIDIA ConnectX-4 Lx Ethernet Adapter Cards for OCP 2.0 User Manual

Troubleshooting

General Troubleshooting

Server unable to find the adapter

  • Ensure that the adapter is placed correctly

  • Make sure the adapter slot and the adapter are compatible

  • Install the adapter in a different PCI Express slot

  • Use the drivers that came with the adapter, or download the latest drivers

  • Make sure your motherboard has the latest BIOS

  • Try to reboot the server

The adapter no longer works

  • Reseat the adapter in its slot or a different slot, if necessary

  • Try using another cable

  • Reinstall the drivers, as the network driver files may be damaged or deleted

  • Reboot the server

Adapters stopped working after installing another adapter

  • Try removing and re-installing all adapters

  • Check that cables are connected properly

  • Make sure your motherboard has the latest BIOS

Link indicator light is off

  • Try another port on the switch

  • Make sure the cable is securely attached

  • Check that you are using the proper cables and that they do not exceed the recommended lengths

  • Verify that your switch and adapter port are compatible

Link light is on, but no communication is established

  • Check that the latest driver is loaded

  • Check that both the adapter and its link partner (for example, the switch port) are set to the same speed and duplex settings
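
On a Linux host, the speed and duplex check above can be scripted. The following is a minimal sketch, assuming that `ethtool <interface>` prints `Speed:` and `Duplex:` lines; `link_settings` is a hypothetical helper name and is not part of ethtool or any NVIDIA tool.

```shell
# Read `ethtool <interface>` output on stdin and print only the negotiated
# speed and duplex as KEY=VALUE lines.
# link_settings is a hypothetical helper name used for illustration.
link_settings() {
    awk '
        /^[ \t]*(Speed|Duplex):/ {
            sub(/^[ \t]+/, "")     # drop the leading indentation
            sub(/:[ \t]*/, "=")    # turn "Speed: X" into "Speed=X"
            print
        }'
}
```

Run it as `ethtool <interface> | link_settings` and compare the result with the configuration of the switch port at the other end of the link.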

Event message received of insufficient power

  • When [ adapter's current power consumption ] > [ PCIe slot advertised power limit ], a warning message appears in the server's system event logs (e.g., dmesg: "Detected insufficient power on the PCIe slot")

  • It is recommended to use a PCIe slot that can supply enough power

  • If the message remains, consider switching from Active Optical Cable (AOC) or transceiver to Direct Attached Copper (DAC) connectivity
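
To confirm whether the warning described above is present on a Linux host, the kernel log can be searched for it. This is a minimal sketch, assuming the driver message contains the phrase "insufficient power on the PCIe slot"; `has_power_warning` is a hypothetical helper name.

```shell
# Exit with status 0 if the kernel log text on stdin contains the
# insufficient-power warning, and with a non-zero status otherwise.
# has_power_warning is a hypothetical helper name used for illustration.
has_power_warning() {
    grep -qi 'insufficient power on the PCIe slot'
}
```

For example, `dmesg | has_power_warning && echo "power warning found"` prints a notice only when the message is in the current kernel log. Re-running the check after moving the adapter or changing the cable type shows whether the condition has cleared.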


Linux Troubleshooting

Environment Information

cat /etc/issue
uname -a
cat /proc/cpuinfo | grep 'model name' | uniq
ofed_info -s
ifconfig -a
ip link show
ethtool <interface>
ethtool -i <interface_of_Mellanox_port_num>
ibdev2netdev
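
The environment commands above can be gathered in one pass, which is convenient when attaching the output to a support request. The sketch below is one possible wrapper, not an NVIDIA-provided script; `run_if_present` and `collect_env` are hypothetical names. It skips any tool that is not installed (for example, `ifconfig` on newer distributions) instead of stopping.

```shell
# Print a header for the command, then run it if the tool is installed.
# run_if_present and collect_env are hypothetical helper names.
run_if_present() {
    printf '### %s\n' "$*"
    if command -v "$1" >/dev/null 2>&1; then
        # Keep going even if the command itself fails; record its status.
        "$@" 2>&1 || printf '(exit status %s)\n' "$?"
    else
        printf '(skipped: %s not found)\n' "$1"
    fi
}

# Gather the environment details for one network interface ($1).
collect_env() {
    iface=$1
    run_if_present cat /etc/issue
    run_if_present uname -a
    # First matching line only; roughly what piping through uniq gives
    # on a host where all CPUs are the same model.
    run_if_present grep -m1 'model name' /proc/cpuinfo
    run_if_present ofed_info -s
    run_if_present ifconfig -a
    run_if_present ip link show
    run_if_present ethtool "$iface"
    run_if_present ethtool -i "$iface"
    run_if_present ibdev2netdev
}
```

A typical call would be `collect_env <interface> > env_info.txt`, where the output file name is only an example.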

Card Detection

lspci | grep -i Mellanox
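
Matching on the vendor name depends on the PCI ID database installed on the host. As an alternative sketch, the filter below matches the Mellanox PCI vendor ID (15b3) in `lspci -nn` output; `mellanox_devices` is a hypothetical helper name.

```shell
# Read `lspci -nn` output on stdin and keep only the devices whose
# [vendor:device] pair uses the Mellanox PCI vendor ID 15b3.
# mellanox_devices is a hypothetical helper name used for illustration.
mellanox_devices() {
    grep -i '\[15b3:[0-9a-f][0-9a-f][0-9a-f][0-9a-f]\]'
}
```

Run it as `lspci -nn | mellanox_devices`. The built-in filter `lspci -d 15b3:` selects the same vendor without a helper. No output from either command means the server does not see the adapter on the PCI bus.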

Mellanox Firmware Tool (MFT)

Download and install MFT: MFT Documentation
Refer to the User Manual for installation instructions.
Once installed, run:
mst start
mst status
flint -d <mst_device> q
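
The query output is long, and the firmware version and PSID are usually the two values needed when looking for a firmware update. The sketch below extracts them, assuming the `flint ... q` output labels these fields `FW Version:` and `PSID:`; verify the labels against the output of your MFT version. `fw_summary` is a hypothetical helper name.

```shell
# Read `flint -d <mst_device> q` output on stdin and print the firmware
# version and PSID as KEY=VALUE lines.
# Assumes the fields are labelled "FW Version:" and "PSID:".
# fw_summary is a hypothetical helper name used for illustration.
fw_summary() {
    awk -F':[ \t]*' '$1 == "FW Version" || $1 == "PSID" { print $1 "=" $2 }'
}
```

Run it as `flint -d <mst_device> q | fw_summary`.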

Ports Information

ibstat
ibv_devinfo

Firmware Version Upgrade

To download the latest firmware version, refer to the NVIDIA Update and Query Utility.

Collect Log File

cat /var/log/messages
dmesg >> system.log
journalctl (applicable on new operating systems)
cat /var/log/syslog
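
Which of these log sources exists depends on the distribution, so a collection script has to probe for each one. The following is a minimal sketch under that assumption; `collect_logs` is a hypothetical helper name and `system.log` is only the default output file.

```shell
# Append every available log source to one file ($1, default system.log).
# collect_logs is a hypothetical helper name used for illustration.
collect_logs() {
    out=${1:-system.log}
    : > "$out" || return 1              # create or truncate the output file

    # Classic syslog files; only one of the two usually exists.
    for f in /var/log/messages /var/log/syslog; do
        if [ -r "$f" ]; then
            printf '### %s\n' "$f" >> "$out"
            cat "$f" >> "$out"
        fi
    done

    # Kernel ring buffer (may need root on some systems).
    if command -v dmesg >/dev/null 2>&1; then
        printf '### dmesg\n' >> "$out"
        dmesg >> "$out" 2>&1 || true
    fi

    # systemd journal: kernel messages from the current boot.
    if command -v journalctl >/dev/null 2>&1; then
        printf '### journalctl -k -b\n' >> "$out"
        journalctl -k -b --no-pager >> "$out" 2>&1 || true
    fi
    return 0
}
```

Run `collect_logs system.log` as a user with permission to read the logs, then attach the resulting file to the support request.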

Windows Troubleshooting

Environment Information

From the Windows desktop, choose the Start menu and run:
msinfo32
To export system information to a text file, choose the Export option from the File menu. Assign a file name and save.

Mellanox Firmware Tool (MFT)

Download and install MFT: MFT Documentation
Refer to the User Manual for installation instructions.
Once installed, open a CMD window and run:
WinMFT
mst start
mst status
flint -d <mst_device> q

Ports Information

vstat

Firmware Version Upgrade

Download the latest firmware version using the PSID/board ID, then burn it:
flint -d <mst_device> -i <firmware_bin_file> b

Collect Log File

  • Event log viewer

  • MST device logs:
    mst start
    mst status

  • flint -d <mst_device> dc > dump_configuration.log

  • mstdump <mst_device> dc > mstdump.log
