Preface
This page provides information to troubleshoot problems related to installing and operating a BlueField networking platform (DPU/SuperNIC).
Command Cheat Sheet
|
Command |
Description |
|---|---|
|
|
Create |
|
|
Obtain the software versions by executing the |
|
|
Gather the target device information. Execute |
|
|
Inspect the content of the boot image, identify the Root-of-Trust Public Key (ROTPK) and verify Chain-of-Trust (CoT) certificates.
|
|
|
Display RShim log from BlueField Arm's Linux. If a string is provided, it is be added into the RShim log. |
|
|
Gather the target device information. Execute |
|
|
Gather the firmware version by executing |
|
|
Can display the content of the BFB. It also helps checking the integrity of a BFB file. |
Logging and Counters
Unable to Load BL2, BL2R, or PSC Image
The following errors appear in console if images are corrupted or not signed properly:
|
Device |
Error |
|---|---|
|
BlueField |
|
|
BlueField-2 |
|
|
BlueField-3 |
|
.bfb File Cannot Recognize the BlueField Board Type
The BlueField reverts to low core operation and prints the following messages:
***System type can't be determined***
***Booting as a minimal system**
Debug Info Package
Ubuntu Kernel Debug
To install kernel debug symbols, run the following:
# sudo add-apt-repository ppa:canonical-kernel-bluefield/release
# echo "deb https://ppa.launchpadcontent.net/canonical-kernel-bluefield/release/ubuntu/ jammy main/debug" | sudo tee -a /etc/apt/sources.list.d/canonical-kernel-bluefield-ubuntu-release-jammy.list
# sudo apt update
# sudoapt install linux-image-$(uname -r)-dbgsym
E.g.:
# sudo apt-cache policy linux-image-unsigned-5.15.0-1043-bluefield-dbgsym
# sudo apt install linux-image-unsigned-5.15.0-1043-bluefield-dbgsym
Scenarios
Host is Not Available and BlueField is Stuck in Boot
In situations where there is no OOB to the BlueField or the BlueField is stuck in boot, a reset can be sent from the BMC to the BlueField.
BlueField Target is Stuck in UEFI Menu
Upgrade to the latest stable boot partition images, see "How to upgrade the boot partition (ATF & UEFI) without re-installation".
Unable to Burn FW from Host Server
Please verify that you are not in running in isolated mode. Run:
$ sudo mlxprivhost -d /dev/mst/mt41686_pciconf0 q
Current device configurations:
------------------------------
level : PRIVILEGED
...
By default, Bluefield operates in privileged mode. Please refer to "NVIDIA BlueField Modes of Operation" for more information.
Server Unable to Find the BlueField
-
Ensure that the BlueField is placed correctly
-
Make sure the BlueField slot and the BlueField are compatible
-
Install the BlueField in a different PCIe slot
-
Use the drivers that came with the BlueField or download the latest
-
Make sure your motherboard has the latest BIOS
-
Perform a graceful shutdown then power cycle the server
BlueField No Longer Works
-
Reseat the BlueField in its slot or a different slot, if necessary
-
Try using another cable
-
Reinstall the drivers for the network driver files may be damaged or deleted
-
Perform a graceful shutdown then power cycle the server
BlueField stopped working after installing another BFB
-
Try removing and reinstalling all BlueField devices
-
Check that cables are connected properly
-
Make sure your motherboard has the latest BIOS
Link Indicator Light is Off
-
Try another port on the switch
-
Make sure the cable is securely attached
-
Check you are using the proper cables that do not exceed the recommended lengths
-
Verify that your switch and BlueField port are compatible
Link Light is On but No Communication is Established
-
Check that the latest driver is loaded
-
Check that both the BlueField and its link are set to the same speed and duplex settings
DPU is Stuck at Runtime
If the BlueField DPU appears unresponsive or stuck during runtime, follow these steps to collect debug information and trigger diagnostics.
-
Check logs for runtime exceptions.Review the RShim log on the host: sudo journalctl -u rshim Inspect the UART console log for crash traces, kernel panics, or exceptions emitted by the DPU.
-
Use SysRq to trigger diagnostic actions. If the DPU appears frozen, you can use the SysRq interface to trigger diagnostic outputs or a controlled crash for debugging.Enable SysRq functionality. Ensure the host enables SysRq support: echo 1 > /proc/sys/kernel/sysrq (Optional) Enable console output: echo 7 > /proc/sys/kernel/printk You may also use a specific bitmask value in /proc/sys/kernel/sysrq to enable selective SysRq actions.You can send SysRq commands to the DPU via the RShim console device. Example for triggering a system crash (or crashdump, if configured): printf "\x0fc" > /dev/rshim0/console Replace c with other SysRq characters as needed (e.g., t for thread dump, m for memory info). The format is always: printf "\x0f<char>" > /dev/rshimX/console
-
Dump DPU
dmesgoutput via RShim (for BlueField-3 in DPU mode; Kernel<6.0). If the DPU is running Linux and supportsrshimdump mode, you can extract thedmesgbuffer using:This retrieves a debug log from the DPU throughrshim -c -i 0 --bfdump
rshim0. -
RShim command-line options. Use
rshim -c --helpfor full command-line functionality:$ rshim -c --help Usage: rshim [options] OPTIONS: -c, --cmdmode Run in command line mode -g, --get-debug Get debug code -m, --bfdump Dump BlueField dmesg log -r, --reg <addr.[32|64] [value]> Read/write register -s, --set-debug <0 | 1> Enable or disable debug -i, --index <N> Use /dev/rshim<N>/ -h, --help Show help information
Last updated: