BlueField Troubleshooting Guide

Software Installation and Upgrade

Preface

This section describes how to troubleshoot issues that occur when installing software on BlueField.

Command Cheat Sheet

Command | Description
cat <filename>.bfb > /dev/rshim1/boot | Load software via RShim
echo 'DISPLAY_LEVEL 2' > /dev/rshim0/misc; cat /dev/rshim0/misc | Dump RShim log
bfsbdump | Check lifecycle of the BFB
bfsbverify | Check the signature of the BFB file
mlx-mkbfb -d | Dump the BFB content. If the command returns errors or displays missing files, redownload the BFB file or request a new BFB file from NVIDIA.
echo 'SW_RESET 1' > /dev/rshim0/misc | Reset the BlueField

Logging and Counters

N/A

Debug Info Package

N/A

Scenarios

Errors During BlueField Software Install Using BFB

cat: write error: Connection Timed Out

This error appears when the BFB installation is interrupted or incomplete. It indicates that an unexpected boot event caused the BlueField to halt.

# cat bf-bundle-2.7.0-40_24.04_ubuntu-22.04_prod.bfb > /dev/rshim1/boot 
cat: write error: Connection timed out

To identify what went wrong during the BFB boot, dump the RShim log and look for error messages under the Log Messages section.

# echo 'DISPLAY_LEVEL 2' > /dev/rshim0/misc 
# cat /dev/rshim0/misc

"ERR[BL1]: PSC error -60" in RShim log

The message ERR[BL1]: PSC error -60 indicates that the BlueField PSC ROM failed to boot the PSC firmware, and the boot for both the BlueField Arm and BlueField PSC is halted.

# cat /dev/rshim1/misc
DISPLAY_LEVEL   2 (0:basic, 1:advanced, 2:log)
BOOT_MODE       0 (0:rshim, 1:emmc, 2:emmc-boot-swap)
BOOT_TIMEOUT    150 (seconds)
DROP_MODE       0 (0:normal, 1:drop)
SW_RESET        0 (1: reset)
DEV_NAME        pcie-0000:65:00.1
DEV_INFO        BlueField-3(Rev 1)
OPN_STR         N/A
---------------------------------------
             Log Messages
---------------------------------------
 ERR[BL1]: PSC error -60
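
Filtering the Log Messages section for error entries can be sketched as follows. This is a minimal sketch, not part of the RShim driver: the function name rshim_log_errors is hypothetical, and the ERR/WARN/PANIC prefixes it matches are assumptions based on the sample logs in this guide.

```shell
# Hypothetical helper: reads an RShim misc dump on stdin and prints only the
# ERR/WARN/PANIC entries that appear after the "Log Messages" banner.
rshim_log_errors() {
    awk '
        /Log Messages/ { in_log = 1; next }
        in_log && /(ERR|WARN)\[[A-Za-z0-9]+\]:|PANIC\([A-Za-z0-9]+\):/ {
            sub(/^[ \t]+/, "")   # drop the leading indentation
            print
        }
    '
}
```

A possible invocation on the host is cat /dev/rshim0/misc | rshim_log_errors, after DISPLAY_LEVEL 2 has been set.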

To investigate further:

  1. Connect to the BlueField Arm console (refer to SoC Management Interface - Logging and Counters).

  2. Return to the original terminal and rerun the cat command (or bfb-install) while monitoring the console output in parallel.

"PSC BR_EXIT timeout" Printed Out to the Console

The error message PSC BR_EXIT timeout, printed out to the console, is likely the result of PSC ROM failing to load and authenticate PSC BL1.

Nvidia BlueField-3 rev1 BL1 V1.0
PSC BR_EXIT timeout

  1. Reset the chip and verify its lifecycle. Run echo 'SW_RESET 1' > /dev/rshim0/misc to reset the BlueField, then dump the RShim log with echo 'DISPLAY_LEVEL 2' > /dev/rshim0/misc; cat /dev/rshim0/misc.

  2. Look for the log entry INFO[BL31]: lifecycle GA Secured. The lifecycle reported may be either GA Secured or Secured (development).

  3. If the log is not present, wait until BlueField boots up and is ready, then connect to the BlueField Arm console (refer to SoC Management Interface to learn how to connect to the BlueField Arm console).

  4. From BlueField Arm console, run:

    # bfsbdump
     
     BlueField3
    ----------------------
    NV Production        : 1
    Arm Life Cycle       : Secure
    Secure Boot          : Enabled
    Secure Boot Key      : Production
    ...
    


  5. If the Arm Life Cycle is Secure, Secure Boot is Enabled, and the Secure Boot Key is Production, then the chip lifecycle is equivalent to GA Secured. Install a BFB file signed with a production key.

  6. If the Arm Life Cycle is Secure, Secure Boot is Enabled, and the Secure Boot Key is Development, then the chip lifecycle is equivalent to Secured (development). Install a BFB file signed with a development key.

  7. Check the signature of the BFB file using the command bfsbverify and make sure that the Root-of-Trust Public Key matches your BlueField Secure Boot Key.

    # bfsbverify --bfb default.bfb --version 2
     
    Verify BFB for BlueField-3 platform
    -----------------------------------
     
    Verify Root-of-Trust Public Key:
      NVIDIA official ROT key (production)
     
    Verify Chain-of-Trust certificates:
      BL2 Content Certificate...Verified OK
      DDR Content Certificate...Verified OK
      Trusted Key Certificate...Verified OK
      BL31 Key Certificate...Verified OK
      Bl31 Content Certificate...Verified OK
      BL32 Key Certificate...Not Found
      BL33 Key Certificate...Verified OK
      Bl33 Content Certificate...Verified OK
     
    Done.
    


    1. If they do not match, request or download the correct BFB file to install on your BlueField.

    2. If the BFB RoT Public Key matches the BlueField Secure Boot Key, contact NVIDIA Enterprise Support.
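
The lifecycle checks in the steps above can be sketched as two small shell helpers. Both function names are hypothetical, and the field labels and values they match (Arm Life Cycle, Secure Boot, Secure Boot Key, Production, Development) are taken from the sample output and wording in this guide; other firmware versions may print them differently.

```shell
# Hypothetical helper: prints the lifecycle reported in an RShim log (stdin).
rshim_lifecycle() {
    sed -n 's/.*INFO\[BL31\]: lifecycle //p' | sed 's/[[:space:]]*$//' | tail -n 1
}

# Hypothetical helper: maps bfsbdump output (stdin) to the equivalent lifecycle.
bfsb_lifecycle() {
    awk -F ': *' '
        {
            key = $1; val = $2
            gsub(/^[ \t]+|[ \t]+$/, "", key)   # trim the field label
            gsub(/[ \t\r]+$/, "", val)         # trim the field value
        }
        key == "Arm Life Cycle"  { life = val }
        key == "Secure Boot"     { boot = val }
        key == "Secure Boot Key" { sbkey = val }
        END {
            if (life == "Secure" && boot == "Enabled" && sbkey == "Production") {
                print "GA Secured"
            } else if (life == "Secure" && boot == "Enabled" && sbkey == "Development") {
                print "Secured (development)"
            } else {
                print "unknown"
            }
        }
    '
}
```

For example, bfsbdump | bfsb_lifecycle on the BlueField Arm console prints the lifecycle that determines which signing key the BFB file needs.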

Other PSC Boot Errors Printed Out to Console

These errors are likely the result of a corrupted BFB file. Check the integrity of the BFB file by calculating its md5sum and comparing it with the md5sum of the BFB file received from NVIDIA.

It is also possible to dump the BFB content using the command mlx-mkbfb -d. If the command returns errors or displays missing files, redownload the BFB file or request a new BFB file from NVIDIA.

Nvidia BlueField-3 rev1 BL1 V1.0
PSC VERIFY_BCT timeout


Nvidia BlueField-3 rev1 BL1 V1.0
Failed to load PSC-BL1


Nvidia BlueField-3 rev1 BL1 V1.0
PSC-BL1 BOOT_MODE_COLD timeout


Nvidia BlueField-3 rev1 BL1 V1.0
Failed to load PSC-FW


Nvidia BlueField-3 rev1 BL1 V1.0
PSC-BL1 MB1_CB_EXIT timeout

Bad Magic Number Error Printed Out to Console

This error is likely the result of a corrupted BFB file.

Nvidia BlueField-3 rev1 BL1 V1.0
ERROR:   BlueField boot: bad magic number 0x7475612f

Try one of the following solutions:

  • Check the integrity of the BFB file by calculating its md5sum and comparing it with the md5sum of the BFB file received from NVIDIA.

  • Dump the BFB content using the command mlx-mkbfb -d. If the command returns errors or displays missing files, then:

    • Redownload the BFB file; or

    • Request a new BFB file from NVIDIA
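
The md5sum comparison can be sketched as a small shell function. The function name bfb_md5_matches is hypothetical, and the sketch assumes that the reference checksum for the BFB file is already known (for example, recorded when the file was first received).

```shell
# Hypothetical helper: succeeds only if the MD5 checksum of FILE equals EXPECTED.
# Returns 2 if FILE cannot be read.
bfb_md5_matches() {
    local file=$1 expected=$2 actual
    [ -r "$file" ] || return 2
    actual=$(md5sum "$file" | awk '{ print $1 }')
    # Compare case-insensitively by lowercasing the reference checksum.
    expected=$(printf '%s' "$expected" | tr 'A-F' 'a-f')
    [ "$actual" = "$expected" ]
}
```

A call such as bfb_md5_matches default.bfb "$reference_md5" || echo "BFB file is corrupted" flags a mismatch before the install is retried; reference_md5 is a placeholder variable.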

"PANIC(BL2): PC" Error in RShim Console

This error is likely caused by a failure in DDR training implemented by the Arm first stage bootloader.

# cat /dev/rshim0/misc 
DISPLAY_LEVEL   2 (0:basic, 1:advanced, 2:log)
BOOT_MODE       1 (0:rshim, 1:emmc, 2:emmc-boot-swap)
BOOT_TIMEOUT    150 (seconds)
DROP_MODE       0 (0:normal, 1:drop)
SW_RESET        0 (1: reset)
DEV_NAME        pcie-lf-0000:b3:00.0
DEV_INFO        BlueField-3(Rev 1)
OPN_STR         N/A
UP_TIME         350(s)
SECURE_NIC_MODE 0 (0:no, 1:yes)
---------------------------------------
             Log Messages
---------------------------------------
 INFO[PSC]: PSC BL1 START
 INFO[BL2]: start
 INFO[BL2]: boot mode (rshim)
 INFO[BL2]: Configuring clocks for Livefish mode
 INFO[BL2]: VDDQ: 1118 mV
 PANIC(BL2): PC = 0x40c7cc
   elr_el1         0x0
   esr_el1         0x0
   far_el1         0x0


The value 0x40c7cc is only an example; the PC value reported in the log may differ.

To resolve the issue:

  1. Verify whether the BlueField NIC is in LiveFish mode by checking the RShim log:

    • If the message INFO[BL2]: Configuring clocks for Livefish mode appears, then LiveFish mode is enabled. This message should follow INFO[BL2]: boot mode (rshim).

    • If the message is not present in the log, then the BlueField is in functional mode.

  2. If the device is in LiveFish mode, then install the BlueField firmware prior to BFB installation.

  3. If the device is not in LiveFish mode, then check that you have installed the correct BlueField firmware matching your configuration (please refer to Software Installation and Upgrade to learn how to install BlueField NIC firmware).

  4. If the device is not in LiveFish mode and the BlueField firmware is matching the BlueField SKU, then contact NVIDIA Enterprise Support.
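
The LiveFish check in step 1 can be sketched as a one-line test on the RShim log. The function name rshim_boot_state is hypothetical; the message it searches for is the one quoted in this section.

```shell
# Hypothetical helper: reads an RShim log on stdin and prints "livefish" if the
# BL2 LiveFish clock message is present, otherwise "functional".
rshim_boot_state() {
    if grep -q 'INFO\[BL2\]: Configuring clocks for Livefish mode'; then
        echo "livefish"
    else
        echo "functional"
    fi
}
```

For example, cat /dev/rshim0/misc | rshim_boot_state indicates whether firmware must be installed before the BFB file.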

"INFO[UEFI]: Var reclaim" in RShim Console

If the variable reclaim operation is performed repeatedly, this could indicate that the UEFI Persistent Variable Store (UPVS) is running out of space.

# cat /dev/rshim0/misc 
DISPLAY_LEVEL   2 (0:basic, 1:advanced, 2:log)
BOOT_MODE       1 (0:rshim, 1:emmc, 2:emmc-boot-swap)
BOOT_TIMEOUT    150 (seconds)
DROP_MODE       0 (0:normal, 1:drop)
SW_RESET        0 (1: reset)
DEV_NAME        pcie-0000:65:00.1
DEV_INFO        BlueField-3(Rev 1)
OPN_STR         N/A
 ---------------------------------------
             Log Messages
---------------------------------------
 INFO[PSC]: PSC BL1 START
 INFO[BL2]: start
 INFO[BL2]: boot mode (rshim)
 INFO[BL2]: VDDQ: 1118 mV
 INFO[BL2]: DDR POST passed
 INFO[BL2]: UEFI loaded
 INFO[BL31]: start
 INFO[BL31]: lifecycle Secured (development)
 INFO[BL31]: VDD: 751 mV
 INFO[BL31]: runtime
 INFO[BL31]: MB ping success
 INFO[UEFI]: eMMC init
 INFO[UEFI]: eMMC probed
 INFO[UEFI]: UPVS valid
 WARN[UEFI]: UPVS full
 INFO[UEFI]: Var reclaim
 INFO[UEFI]: Var reclaim done
 INFO[UEFI]: Var reclaim
 INFO[UEFI]: Var reclaim done
 INFO[UEFI]: Var reclaim
 INFO[UEFI]: Var reclaim done
 INFO[UEFI]: Var reclaim
 INFO[UEFI]: Var reclaim done


Expect the DPU boot to be extremely slow in this scenario.

  1. Reset the BlueField:

    echo 'SW_RESET 1' > /dev/rshim0/misc
    


  2. Log into the BlueField Arm console.

  3. Wait until you reach the Linux prompt or the UEFI menu:

    • If the boot stops at the UEFI menu, clean up the EFI variable store from Device Manager > System Configuration.

    • If the system reaches the Linux prompt, clean up the EFI variables under /sys/firmware/efi/efivars. To do so, run chattr -i /sys/firmware/efi/efivars/* before running rm -f against any file in /sys/firmware/efi/efivars. It is harmless to delete dump-* variables or any other user variables. However, if BootXXXX variables must be deleted, use the efibootmgr command line. Delete any other variable at your own risk.
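
As a rough sketch, the two helpers below count how often the variable reclaim ran in an RShim log and list the dump-* variables that are candidates for deletion. Both function names are hypothetical, and the listing is deliberately read-only so that the candidates can be reviewed before chattr -i and rm -f are run by hand.

```shell
# Hypothetical helper: counts "Var reclaim" operations in an RShim log (stdin),
# ignoring the matching "Var reclaim done" lines.
count_var_reclaim() {
    awk '/INFO\[UEFI\]: Var reclaim[ \t\r]*$/ { n++ } END { print n + 0 }'
}

# Hypothetical helper: lists dump-* EFI variables without deleting anything.
# The directory defaults to the efivars path named in this guide.
list_dump_vars() {
    local dir=${1:-/sys/firmware/efi/efivars} f
    for f in "$dir"/dump-*; do
        [ -e "$f" ] && printf '%s\n' "$f"
    done
    return 0
}
```

A count greater than one in a single boot log matches the repeated-reclaim pattern shown in the sample above.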

Boot Stops at UEFI Menu

The RShim log does not contain any specific error, but the UEFI menu screen is displayed on the BlueField Arm console.

 ---------------------------------------
             Log Messages
---------------------------------------
 INFO[PSC]: PSC BL1 START
 INFO[BL2]: start
 INFO[BL2]: boot mode (rshim)
 INFO[BL2]: VDDQ: 1118 mV
 INFO[BL2]: DDR POST passed
 INFO[BL2]: UEFI loaded
 INFO[BL31]: start
 INFO[BL31]: lifecycle Secured (production)
 INFO[BL31]: VDD: 751 mV
 INFO[BL31]: runtime
 INFO[BL31]: MB ping success
 INFO[UEFI]: eMMC init
 INFO[UEFI]: eMMC probed
 INFO[UEFI]: UPVS valid
 INFO[UEFI]: PCIe enum start
 INFO[UEFI]: PCIe enum end
 INFO[UEFI]: UEFI Secure Boot (enabled)
 INFO[UEFI]: Redfish enabled


This indicates that the kernel image inside the BFB file failed to boot.

To troubleshoot this issue, check the status of UEFI secure boot:

  • If UEFI secure boot is enabled (i.e., the message INFO[UEFI]: UEFI Secure Boot (enabled) is present in the RShim log), then check the signature of the kernel image inside the BFB file:

    $ mlx-mkbfb -x bf-bundle-2.7.0-40_24.04_ubuntu-22.04_prod.bfb
    $ sbverify -l dump-image-v0 
    signature 1
    image signature issuers:
     - /C=GB/ST=Isle of Man/L=Douglas/O=Canonical Ltd./CN=Canonical Ltd. Master Certificate Authority
    image signature certificates:
     - subject: /C=GB/ST=Isle of Man/O=Canonical Ltd./OU=Secure Boot/CN=Canonical Ltd. Secure Boot Signing (Ubuntu Advantage 2021 v1)
       issuer:  /C=GB/ST=Isle of Man/L=Douglas/O=Canonical Ltd./CN=Canonical Ltd. Master Certificate Authority
    
    • If the signature is present:

      1. Reset the BlueField: echo 'SW_RESET 1' > /dev/rshim0/misc

      2. Check the list of certificates enrolled in the BlueField Arm UEFI db by running mokutil --db from the BlueField Arm console:

        • If the certificate is not displayed, then enroll the certificate before installing the BFB file. Refer to UEFI Secure Boot for details on how to enroll a db certificate using Redfish and/or the UEFI menu.

        • If the certificate is displayed, then contact NVIDIA Enterprise Support.

    • If the signature is not present, contact NVIDIA Enterprise Support

      It is possible to disable UEFI secure boot and install the BFB file if you do not require UEFI secure boot.


  • If UEFI secure boot is disabled (i.e., the message INFO[UEFI]: UEFI Secure Boot (disabled) is present in the RShim log), then dump the content of the BFB file and check whether Boot image (version 0) is present:

    • If Boot image (version 0) is not present, then you may be using a reduced BFB such as preboot-install.bfb. Download and install a fw-bundle BFB file.

    • If Boot image (version 0) is present, contact NVIDIA Enterprise Support.

      $ mlx-mkbfb -d bf-bundle-2.7.0-40_24.04_ubuntu-22.04_prod.bfb
      ...
        25377280 Boot image (version 0)
       520665088 In-memory filesystem (version 0)
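
The two checks that drive this decision tree can be sketched as shell helpers. The function names are hypothetical; the strings they match are the RShim log message and the mlx-mkbfb -d output line quoted in this section.

```shell
# Hypothetical helper: reads an RShim log on stdin and prints the UEFI secure
# boot state as "enabled", "disabled", or "unknown" if no message is found.
uefi_secure_boot_state() {
    awk '
        /INFO\[UEFI\]: UEFI Secure Boot \(enabled\)/  { state = "enabled" }
        /INFO\[UEFI\]: UEFI Secure Boot \(disabled\)/ { state = "disabled" }
        END { print (state == "" ? "unknown" : state) }
    '
}

# Hypothetical helper: succeeds if "mlx-mkbfb -d" output (stdin) lists the
# boot image entry.
bfb_has_boot_image() {
    grep -F -q 'Boot image (version 0)'
}
```

For example, mlx-mkbfb -d <file>.bfb | bfb_has_boot_image || echo "no boot image" flags a reduced BFB file.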
      


UEFI Does Not Boot the BFB Kernel Image

The RShim log does not contain a specific error but the login prompt appears on the BlueField Arm console:

---------------------------------------
             Log Messages
---------------------------------------
 INFO[PSC]: PSC BL1 START
 INFO[BL2]: start
 INFO[BL2]: boot mode (rshim)
 INFO[BL2]: VDDQ: 1118 mV
 INFO[BL2]: DDR POST passed
 INFO[BL2]: UEFI loaded
 INFO[BL31]: start
 INFO[BL31]: lifecycle Secured (production)
 INFO[BL31]: VDD: 751 mV
 INFO[BL31]: runtime
 INFO[BL31]: MB ping success
 INFO[UEFI]: eMMC init
 INFO[UEFI]: eMMC probed
 INFO[UEFI]: UPVS valid
 INFO[UEFI]: PCIe enum start
 INFO[UEFI]: PCIe enum end
 INFO[UEFI]: UEFI Secure Boot (enabled)
 INFO[UEFI]: Redfish enabled
 INFO[UEFI]: DPU-BMC RF credentials found
 INFO[UEFI]: exit Boot Service
 INFO[MISC]: Linux up
 INFO[MISC]: DPU is ready

This indicates that the kernel image inside the BFB file failed to boot, so UEFI fell back to the first valid boot option.

To troubleshoot this issue:

  1. Check the content of the BFB file and verify that Boot image (version 0) is present.

  2. Check whether UEFI secure boot is enabled. If it is, verify the certificates enrolled in the UEFI db and the certificate used to sign the kernel image, as explained in Boot Stops at UEFI Menu.

Network Boot (PXE, HTTP boot)

PXE/HTTP Boot Logging

When booting PXE or HTTP manually from the UEFI menu, helpful log messages can be lost because the UEFI menu clears the screen. To make sure no messages are missed, either dump the console log into a file and read it from there, or retrieve the BlueField console log dump from the BlueField BMC. For more information about retrieving BlueField console logs from the BMC, refer to the BMC and BlueField Logs page in the NVIDIA BlueField BMC Software User Manual. Alternatively, change the boot order so that PXE/HTTP boot is attempted automatically before flash boot. Because the UEFI menu is then skipped, error logs are visible on the console in real time.

It is often helpful to troubleshoot and verify PXE boot before moving to HTTP boot, because PXE setup is slightly easier and UEFI generally provides more logging for PXE boot issues than for HTTP boot issues.

The following subsections show example logs for several common scenarios.

DHCP Server is Not Running

[16:23:46]>>Start PXE over IPv4.
[16:24:45]  PXE-E18: Server response timeout.

TFTP Server is Not Running

[16:35:36]>>Start PXE over IPv4.
[16:35:39]  Station IP address is 192.168.100.2
[16:35:39]
[16:35:39]  Server IP address is 192.168.100.1
[16:35:39]  NBP filename is /shimaa64.efi
[16:35:39]  NBP filesize is 0 Bytes
[16:35:39]  PXE-E99: Unexpected network error.

PXE Boot File Does Not Exist

[16:28:32]>>Start PXE over IPv4.
[16:28:36]  Station IP address is 192.168.100.2
[16:28:36]
[16:28:36]  Server IP address is 192.168.100.1
[16:28:36]  NBP filename is /PXE-TEST.efi
[16:28:36]  NBP filesize is 0 Bytes
[16:28:36]  PXE-E23: Client received TFTP error from server.

Shim Does Not Boot

[18:07:22]>>Start PXE over IPv4.
[18:07:26]  Station IP address is 192.168.100.2
[18:07:26]
[18:07:26]  Server IP address is 192.168.100.1
[18:07:26]  NBP filename is /shimaa64.efi
[18:07:26]  NBP filesize is 980057 Bytes
[18:07:26] Downloading NBP file...
[18:07:27]
[18:07:27]  NBP file downloaded successfully.

In this case, the log ends after the NBP file is downloaded. This often happens because of authentication issues caused by unsupported signatures or SBAT restrictions. UEFI must support the shim being booted, and the shim must support the version of GRUB being booted.

GRUB Does Not Boot

[17:26:05]>>Start PXE over IPv4.
[17:26:09]  Station IP address is 192.168.100.2
[17:26:09]
[17:26:09]  Server IP address is 192.168.100.1
[17:26:09]  NBP filename is /shimaa64.efi
[17:26:09]  NBP filesize is 980056 Bytes
[17:26:09] Downloading NBP file...
[17:26:09]
[17:26:09]  NBP file downloaded successfully.
[17:26:09]Fetching Netboot Image
[17:26:15]
 Minimal BASH-like line editing is supported. For the first word, TAB
   lists possible command completions. Anywhere else TAB lists possible
  device or file completions.

Grub >

The GRUB being booted must support network boot. Boot commonly stops at the GRUB command line when there are GRUB issues.

Successful PXE Boot

[16:37:10]>>Start PXE over IPv4.
[16:37:13]  Station IP address is 192.168.100.2
[16:37:13]
[16:37:13]  Server IP address is 192.168.100.1
[16:37:13]  NBP filename is /shimaa64.efi
[16:37:13]  NBP filesize is 980056 Bytes
[16:37:13] Downloading NBP file...
[16:37:14]
[16:37:14]  NBP file downloaded successfully.
[16:37:14]Fetching Netboot Image
[16:37:22]
                             GNU GRUB  version 2.06
...

At this point, the GRUB menu should list the boot options defined in the GRUB configuration used for PXE boot.
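
The log patterns from the scenarios above can be summarized in a small classifier. This is only a sketch: the function name classify_pxe_log is hypothetical, and the mapping from each message to a cause follows the examples in this guide rather than a complete list of PXE error codes.

```shell
# Hypothetical helper: reads a saved PXE boot console log on stdin and prints
# the scenario from this guide that it most closely matches.
classify_pxe_log() {
    local log
    log=$(cat)
    case $log in
        *PXE-E18*) echo "DHCP server is not running" ;;
        *PXE-E99*) echo "TFTP server is not running" ;;
        *PXE-E23*) echo "PXE boot file does not exist" ;;
        *"Minimal BASH-like line editing"*) echo "GRUB does not boot" ;;
        *"GNU GRUB"*) echo "successful PXE boot" ;;
        *"NBP file downloaded successfully"*) echo "shim does not boot" ;;
        *) echo "unrecognized" ;;
    esac
}
```

The order of the patterns matters: the download message appears in several scenarios, so it is tested last and only identifies the shim case when nothing later in the boot was logged.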

DHCP Packet Inspection

It can often be helpful to look at the DHCP packets being sent over the network when troubleshooting PXE and HTTP boot issues. The sections below provide some examples for packet inspection using the Linux command line, but Wireshark is also a great alternative if supported.

IPv4

For IPv4-based PXE and HTTP boot, the tool dhcpdump can be installed on the DHCP host server and used to quickly parse different DHCP packets and options. The following is an example log taken from a BlueField PXE booting using the tmfifo_net0 interface:

root@bu-lab102:~# dhcpdump -i tmfifo_net0
  TIME: 2024-06-10 10:26:29.980
    IP: 0.0.0.0 (0:1a:ca:ff:ff:1) > 255.255.255.255 (ff:ff:ff:ff:ff:ff)
    OP: 1 (BOOTPREQUEST)
 HTYPE: 1 (Ethernet)
  HLEN: 6
  HOPS: 0
   XID: 22093441
  SECS: 0
 FLAGS: 7f80
CIADDR: 0.0.0.0
YIADDR: 0.0.0.0
SIADDR: 0.0.0.0
GIADDR: 0.0.0.0
CHADDR: 00:1a:ca:ff:ff:01:00:00:00:00:00:00:00:00:00:00
 SNAME: .
 FNAME: .
OPTION:  53 (  1) DHCP message type         1 (DHCPDISCOVER)
OPTION:  57 (  2) Maximum DHCP message size 1472
OPTION:  55 ( 35) Parameter Request List      1 (Subnet mask)
                                              2 (Time offset)
                                              3 (Routers)
                                              4 (Time server)
                                              5 (Name server)
                                              6 (DNS server)
                                             12 (Host name)
                                             13 (Boot file size)
                                             15 (Domainname)
                                             17 (Root path)
                                             18 (Extensions path)
                                             22 (Maximum datagram reassembly size)
                                             23 (Default IP TTL)
                                             28 (Broadcast address)
                                             40 (NIS domain)
                                             41 (NIS servers)
                                             42 (NTP servers)
                                             43 (Vendor specific info)
                                             50 (Request IP address)
                                             51 (IP address leasetime)
                                             54 (Server identifier)
                                             58 (T1)
                                             59 (T2)
                                             60 (Vendor class identifier)
                                             66 (TFTP server name)
                                             67 (Bootfile name)
                                             97 (UUID/GUID)
                                            128 (???)
                                            129 (???)
                                            130 (???)
                                            131 (???)
                                            132 (???)
                                            133 (???)
                                            134 (???)
                                            135 (???)

OPTION:  97 ( 17) UUID/GUID                 009c2debc0368611 ..-..6..
                                            ee8000a088c20ee8 ........
                                            18               .
OPTION:  94 (  3) Client NDI                010300           ...
OPTION:  93 (  2) Client System             000b             ..
OPTION:  60 ( 13) Vendor class identifier   NVIDIA/BF/PXE
OPTION:  43 (131) Vendor specific info      8005424633000081 ..BF3...
                                            30426c7565466965 0BlueFie
                                            6c643a342e382e30 ld:4.8.0
                                            2d322d6765373965 -2-ge79e
                                            3037662d64697274 07f-dirt
                                            7900000000000000 y.......
                                            0000000000000000 ........
                                            008248444f43415f ..HDOCA_
                                            322e352e305f4253 2.5.0_BS
                                            505f342e352e305f P_4.5.0_
                                            5562756e74755f32 Ubuntu_2
                                            322e30342d312e32 2.04-1.2
                                            3032333131303800 0231108.
                                            0000000000000000 ........
                                            0000000000000000 ........
                                            0000000000000000 ........
                                            000000           ...
---------------------------------------------------------------------------

  TIME: 2024-06-10 10:26:29.981
    IP: 192.168.100.1 (0:1a:ca:ff:ff:2) > 255.255.255.255 (ff:ff:ff:ff:ff:ff)
    OP: 2 (BOOTPREPLY)
 HTYPE: 1 (Ethernet)
  HLEN: 6
  HOPS: 0
   XID: 22093441
  SECS: 0
 FLAGS: 7f80
CIADDR: 0.0.0.0
YIADDR: 192.168.100.2
SIADDR: 192.168.100.1
GIADDR: 0.0.0.0
CHADDR: 00:1a:ca:ff:ff:01:00:00:00:00:00:00:00:00:00:00
 SNAME: .
 FNAME: /PXE-TEST.efi.
OPTION:  53 (  1) DHCP message type         2 (DHCPOFFER)
OPTION:  54 (  4) Server identifier         192.168.100.1
OPTION:  51 (  4) IP address leasetime      43200 (12h)
OPTION:   1 (  4) Subnet mask               255.255.255.0

The example shows the DHCP discover packet sent by the client (BlueField) and the offer packet sent by the server as part of the DHCP DORA process, including useful information such as the vendor class identifier and vendor-specific information. In this case, the DHCP server has been configured to serve a test file, PXE-TEST.efi, over TFTP. Inspecting the packet dump in this way helps verify the DHCP, TFTP, and HTTP server configuration.
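
When the dump is long, a short filter can reduce it to the fields of interest. The sketch below assumes the dhcpdump output layout shown in the example above; the function name dhcp_summary is hypothetical.

```shell
# Hypothetical helper: reads dhcpdump output on stdin and prints the DHCP
# message type (option 53) and vendor class identifier (option 60) per packet.
dhcp_summary() {
    awk '
        /^OPTION: +53 / { v = $NF; gsub(/[()]/, "", v); print "type=" v }
        /^OPTION: +60 / { print "vendor=" $NF }
    '
}
```

For example, dhcpdump -i tmfifo_net0 | dhcp_summary prints one type= line per packet, followed by a vendor= line when the packet carries option 60.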

An alternative to dhcpdump is to use tcpdump to look at all raw data sent over the network. For DHCP, only ports 67 and 68 need to be monitored:

# Monitor raw DHCP data
tcpdump -i tmfifo_net0 -n -vvv -xx port 67 or 68

# Convert packets to ASCII
tcpdump -i tmfifo_net0 -n -vvv -A port 67 or 68

IPv6

The dhcpdump tool does not currently support IPv6, but tcpdump can be used for monitoring raw and ASCII data by filtering on ports 546 and 547:

# Monitor raw DHCP data
tcpdump -i tmfifo_net0 -n -vvv -xx port 546 or 547

# Convert packets to ASCII
tcpdump -i tmfifo_net0 -n -vvv -A port 546 or 547
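
To avoid mistyping the port numbers, the capture filter can be generated from the protocol family. This is a minimal sketch; the function name dhcp_capture_filter is hypothetical, and the ports are the ones named in this section (67 and 68 for DHCPv4, 546 and 547 for DHCPv6).

```shell
# Hypothetical helper: prints the tcpdump filter expression for DHCP traffic
# of the requested family (ipv4 or ipv6).
dhcp_capture_filter() {
    case $1 in
        ipv4) echo "port 67 or port 68" ;;
        ipv6) echo "port 546 or port 547" ;;
        *) echo "usage: dhcp_capture_filter ipv4|ipv6" >&2; return 1 ;;
    esac
}
```

The result is meant to be expanded unquoted, for example tcpdump -i tmfifo_net0 -n -vvv -xx $(dhcp_capture_filter ipv4).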

Failure to Update BlueField Arm Bootloader

EFI Capsule Authentication Failed

If an EFI capsule update initiated from the OS fails, the following error may appear on the BlueField Arm console:

FmpDxe: EFI Capsule Authentication Failed, Status: Security Violation.

This indicates that the capsule signature verification failed. The system is operating in User Mode, meaning the UEFI Platform Key (PK) is present and Secure Boot enforcement is active. Capsule authentication is strictly enforced in this mode.

Common Causes

  • The capsule file is not signed; or

  • The key used to generate capsule file signature does not match the platform UEFI Secure Boot configuration

Verification Steps

From the BlueField Arm console, use one of the following commands to inspect the currently enrolled UEFI db certificates:

  • Using sbkeysync:

    # sbkeysync --verbose --dry-run
    Look for output resembling: 
    Filesystem keystore:
    firmware keys:
      ...
      db:
        ...
        /C=US/ST=MA/L=Westborough/O=NVIDIA Corporation/OU=BlueField Secure Boot/CN=NVIDIA BlueField Secure Boot UEFI db Signing 2021
    ...

  • Using mokutil:

    # mokutil --db | grep -i subject:
    Expected output: 
    Subject: C=US, ST=MA, L=Westborough, O=NVIDIA Corporation, OU=BlueField Secure Boot, CN=NVIDIA BlueField Secure Boot UEFI db Signing 2021

Resolution

If the appropriate certificate is not present, install the correct certificate that matches your platform’s signing authority.

If the certificate is present, the capsule file likely has a bad or mismatched signature. In that case:

  1. Download the correct EFI capsule that matches the BlueField platform and UEFI signing configuration.

  2. Reattempt the update using the validated capsule.
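
The certificate check from the verification steps can be sketched as a filter over the mokutil --db output. The function name db_has_cn is hypothetical, and the sketch assumes the Subject: line format shown in the expected output above.

```shell
# Hypothetical helper: reads "mokutil --db" output on stdin and succeeds if a
# Subject line contains the given common name (CN). This is a substring match,
# so a CN that is a prefix of another CN also matches.
db_has_cn() {
    grep -i 'subject:' | grep -F -q "CN=$1"
}
```

For example, mokutil --db | db_has_cn "NVIDIA BlueField Secure Boot UEFI db Signing 2021" || echo "certificate not enrolled" reports a missing db certificate before the capsule update is retried.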
