DOCA SDK Documentation

NVIDIA BlueField Reset and Reboot Procedures

BlueField System Reboot

This section describes the operations required to load new NIC firmware after an NVIDIA® BlueField® NIC firmware update. This procedure removes the need for a full server power cycle.

The following steps are executed in the BlueField OS:

  1. Issue a query command to ascertain whether BlueField system reboot is supported by your environment: 

    mlxfwreset -d 03:00.0 q 
    

    If the output includes the following lines, proceed to step 2: 

    3: Driver restart and PCI reset                                   -Supported (default)
    ...
    1: Driver is the owner                                            -Supported (default)
    

    If the output says Not Supported instead, follow the instructions in the section "BlueField System-level Reset".

  2. Issue a BlueField system reboot:

    mlxfwreset -d 03:00.0 -y -l 3 --sync 1 r
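
The check in step 1 can be scripted before issuing the reboot in step 2. The following is a minimal sketch, assuming the query output contains lines in exactly the form shown in step 1. The function name `fw_reset_supported` is hypothetical and is not part of the firmware tools.

```shell
# Hypothetical helper: reads `mlxfwreset ... q` output on stdin and returns 0
# only if both lines quoted in step 1 are reported as supported.
# Assumes the line format shown in step 1; "-Not Supported" does not match.
fw_reset_supported() {
    local out
    out=$(cat)
    printf '%s\n' "$out" |
        grep -Eq '^[[:space:]]*3: Driver restart and PCI reset[[:space:]]*-Supported' &&
    printf '%s\n' "$out" |
        grep -Eq '^[[:space:]]*1: Driver is the owner[[:space:]]*-Supported'
}

# Usage (device address taken from the steps above):
#   if mlxfwreset -d 03:00.0 q | fw_reset_supported; then
#       mlxfwreset -d 03:00.0 -y -l 3 --sync 1 r
#   fi
```

If the function returns non-zero, the system-level reset procedure is the fallback, as described in step 1.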
    

BlueField System-level Reset

This section describes how to perform a system-level reset (SLR), which is required for firmware configuration changes to take effect.

System-level Reset for BlueField in DPU Mode

The following is the high-level flow of the procedure:

  1. Graceful shutdown of BlueField Arm cores.

  2. Query BlueField state to confirm that shutdown has been reached. 

    In systems with multiple BlueField networking platforms, repeat steps 1 and 2 for all devices before proceeding.

  3. Warm reboot the server.

Step by step process:

Some of the following steps can be performed using different methods, depending on resource availability and support in the user's environment.

  1. Graceful shutdown of BlueField Arm cores.

    This operation is expected to finish within 15 seconds.

    Possible methods:

    • From the BlueField OS: 

      shutdown -h now
      

      Or: 

      mlxfwreset -d /dev/mst/mt*pciconf0 -l 1 -t 4 r
      
    • From the host OS: 

      Not relevant when the BlueField is operating in Zero-Trust Mode.

      mlxfwreset -d <mst-device> -l 1 -t 4 r
      
    • Using the BlueField BMC:

      ipmitool -C 17 -I lanplus -H <bmc_ip> -U root -P <password> power soft
      

      Or using Redfish (BlueField-3 and above):

      curl -k -u root:<password> -H "Content-Type: application/json" -X POST https://<bmc_ip>/redfish/v1/Systems/Bluefield/Actions/ComputerSystem.Reset -d '{"ResetType": "GracefulShutdown"}'
      
  2. Query BlueField state.
    Possible methods:

    • From the host OS: 

      Not relevant when the BlueField is operating in Zero-Trust Mode.

      echo DISPLAY_LEVEL 2 > /dev/rshim0/misc
      cat /dev/rshim0/misc
      

      Expected output:

      INFO[BL31]: System Off
      
    • Utilizing the BlueField BMC:

      ipmitool -C 17 -I lanplus -H <bmc_ip> -U root -P <password> raw 0x32 0xA3
      

      Expected output: 06.

  3. Warm reboot the server from the host OS:

    mlxfwreset -d <mst-device> -l 4 r
    

    If multiple BlueField devices are present in the host, run this command only once. In this case, the MST device can be that of any BlueField that requires the reset and was shut down in step 1.

    Or: 

    reboot
    

    For external hosts which do not toggle PERST# in their standard reboot command, use the mlxfwreset option.
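
The state query in step 2 lends itself to a polling loop, because the shutdown in step 1 is expected to take up to 15 seconds. The following is a minimal sketch. The function names are hypothetical, and only the codes 06 (shutdown reached) and 05 (operational) are taken from this document; any other value is treated as unknown.

```shell
# Hypothetical helper: maps a BMC state code quoted in this document to a label.
bf_state_label() {
    case "$(printf '%s' "$1" | tr -d '[:space:]')" in
        06) echo "shutdown" ;;
        05) echo "operational" ;;
        *)  echo "unknown" ;;
    esac
}

# Hypothetical helper: runs a caller-supplied query command until it prints
# the wanted code or the attempts run out.
# Arguments: wanted code, attempts, delay in seconds, query command and its arguments.
wait_for_bf_state() {
    local want="$1" tries="$2" delay="$3" got
    shift 3
    while [ "$tries" -gt 0 ]; do
        # Strip whitespace so that output such as " 06" compares cleanly.
        got=$("$@" 2>/dev/null | tr -d '[:space:]')
        if [ "$got" = "$want" ]; then
            return 0
        fi
        tries=$((tries - 1))
        if [ "$tries" -gt 0 ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Usage (placeholders as in the steps above):
#   wait_for_bf_state 06 15 1 \
#       ipmitool -C 17 -I lanplus -H <bmc_ip> -U root -P <password> raw 0x32 0xA3
```

Passing the query command as arguments keeps the loop independent of the access method, so the same helper can wrap any command that prints the state code.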

System-level Reset for BlueField in NIC Mode

Perform a warm reboot of the host OS:

    mlxfwreset -d <mst-device> -l 4 r

Or:

    reboot

For external hosts which do not toggle PERST# in their standard reboot command, use the mlxfwreset option.

System-level Reset for Host with Separate Power Control

This procedure is a special use case that applies only to host platforms with separate power control for the PCIe slot and the CPUs. On such platforms, the BlueField (running in DPU mode) remains powered and operational while the host OS/CPUs are shut down or in a similar standby state.

The following is the high-level flow of the procedure:

  1. Graceful shutdown of host OS or similar CPU standby.

  2. Graceful shutdown of BlueField Arm cores.

  3. Query BlueField state to confirm that shutdown has been reached.

  4. Full BlueField reset.

  5. Query BlueField state to confirm that the operational state has been reached. 

    In systems with multiple BlueField networking platforms, repeat steps 1 through 5 for all devices before proceeding.

  6. Power on the server.  

Step by step process:

Some of the following steps can be performed using different methods, depending on resource availability and support in the user's environment.

  1. Graceful shutdown of the host OS by any preferred method.

  2. Graceful shutdown of BlueField Arm cores.

    This step normally takes up to 15 seconds to complete. 

    • From the BlueField OS:

      shutdown -h now
      
    • Utilizing the BlueField BMC:

      • Using IPMI:

        ipmitool -C 17 -I lanplus -H <bmc_ip> -U root -P <password> power soft
        
      • Using Redfish (for BlueField-3 and above):

        curl -k -u root:<password> -H "Content-Type: application/json" -X POST https://<bmc_ip>/redfish/v1/Systems/Bluefield/Actions/ComputerSystem.Reset -d '{"ResetType": "GracefulShutdown"}'
        
  3. Query the BlueField's state utilizing the BlueField BMC:

    ipmitool -C 17 -I lanplus -H <bmc_ip> -U root -P <password> raw 0x32 0xA3
    

    Expected output: 06.

  4. Perform BlueField hard reset utilizing the BlueField BMC:

    This step takes up to 2 minutes to complete.

    • Using IPMI:

      ipmitool -C 17 -I lanplus -H <bmc_ip> -U root -P <password> power cycle
      
    • Using Redfish (for BlueField-3 and above):

      curl -k -u root:<password> -H "Content-Type: application/json" -X POST https://<bmc_ip>/redfish/v1/Systems/Bluefield/Actions/ComputerSystem.Reset -d '{"ResetType" : "PowerCycle"}'
      
  5. Query BlueField operational state utilizing the BlueField BMC:

    At this point, the BlueField is expected to be operational.

    ipmitool -C 17 -I lanplus -H <bmc_ip> -U root -P <password> raw 0x32 0xA3
    

    Expected output: 05.

  6. Power on/boot up the host OS.
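
The BMC side of steps 2 through 5 can be summarized as an ordered plan. The following is a dry-run sketch using the IPMI variants from the steps above: it only prints the commands and does not execute them. The function name `bf_separate_power_plan` is hypothetical, and `<password>` is left as a literal placeholder.

```shell
# Hypothetical dry-run helper: prints the IPMI commands for steps 2 to 5
# in order, for the BMC address given as the first argument.
bf_separate_power_plan() {
    local bmc_ip="$1"
    local ipmi="ipmitool -C 17 -I lanplus -H $bmc_ip -U root -P <password>"
    printf '%s\n' \
        "step 2 (graceful shutdown, up to 15 seconds): $ipmi power soft" \
        "step 3 (expect 06): $ipmi raw 0x32 0xA3" \
        "step 4 (hard reset, up to 2 minutes): $ipmi power cycle" \
        "step 5 (expect 05): $ipmi raw 0x32 0xA3"
}

# Usage:
#   bf_separate_power_plan <bmc_ip>
```

Steps 1 and 6 (host shutdown and host power-on) are outside the scope of this sketch because they depend on the host platform.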

BlueField DPU Reset Using a BMC Platform 

The BlueField DPU can also be reset from a BMC platform using an NC-SI command over I2C. This option is more common when no DPU BMC is present, when the BlueField DPU is running in NIC mode, or when it is used as a controller.

The reset is performed using the Reset BlueField DPU command (Command=0x12, Parameter=0x0B), which allows a BMC platform to reset the NVIDIA BlueField DPU device. This command is only applicable to BlueField-2 devices.

The Reset BlueField-3 DPU command is addressed to the package only. When the internal reset is complete, the BMC platform should reconfigure the device.

The Reset BlueField DPU command is supported on BlueField-2 and later devices.

Reset BlueField DPU Format

Bytes/Bits | 31:24            | 23:16            | 15:8           | 7:0
-----------+------------------+------------------+----------------+-----------
0:15       | NC-SI Header (OEM Command)
16:19      | NVIDIA Manufacturer ID (IANA) = 0x8119
20:23      | Command rev=0x00 | MLNX Cmd ID=0x12 | Parameter=0x0B | NICR, Mode
24:27      | Checksum 31:0

The parameter descriptions for Reset BlueField DPU command are provided below.

Reset BlueField DPU Parameters

Field | Description
------+---------------------------------------------------------------------
NICR  | • 0 - The NIC does not reset. Only the embedded CPU resets.
      | • 1 - Both the embedded CPU and the NIC reset.
Mode  | Defines the condition that must be met before the internal reset starts:
      | • 0 - The internal reset starts after the response to this command is sent.
      | • 1 - The internal reset starts only when all hosts assert their PERST# signals low.
      | • 2 - The internal reset starts only when all hosts have disabled their PCIe links. This may or may not include assertion of their respective PERST# signals low.
      | • Other - Reserved
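
The NICR and Mode values can be validated before a command is assembled. The following is a minimal sketch based only on the parameter descriptions above. The function name `describe_bf_reset` is hypothetical, and the sketch does not encode the fields into the packet because their bit positions within the byte are not given here.

```shell
# Hypothetical helper: describes a NICR/Mode pair using the parameter
# descriptions above and rejects reserved or undefined values.
describe_bf_reset() {
    local nicr="$1" mode="$2" scope trigger
    case "$nicr" in
        0) scope="embedded CPU only" ;;
        1) scope="embedded CPU and NIC" ;;
        *) echo "invalid NICR: $nicr" >&2; return 1 ;;
    esac
    case "$mode" in
        0) trigger="after the command response is sent" ;;
        1) trigger="when all hosts assert PERST# low" ;;
        2) trigger="when all hosts have disabled their PCIe links" ;;
        *) echo "reserved Mode: $mode" >&2; return 1 ;;
    esac
    echo "reset $scope, $trigger"
}

# Usage:
#   describe_bf_reset 1 0
```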

Reset BlueField DPU Response

The ConnectX adapter responds to a Reset BlueField DPU command when the package ID matches and there is no checksum error.

Reset BlueField DPU Response Format

Bytes/Bits | 31:24            | 23:16            | 15:8           | 7:0
-----------+------------------+------------------+----------------+-----------
0:15       | NC-SI Header (OEM Command)
16:19      | Response Code                       | Reason Code
20:23      | NVIDIA Manufacturer ID (IANA) = 0x8119
24:27      | Command rev=0x00 | MLNX Cmd ID=0x12 | Parameter=0x0B | NICR, Mode
28:31      | Checksum 31:0

Shutdown BlueField DPU OS (Command=0x12, Parameter=0x1A)

The Shutdown BlueField DPU OS command allows a trusted platform BMC to send an OS shutdown request to the embedded CPU on NVIDIA BlueField DPU devices.

The Shutdown BlueField DPU OS command format is shown below.

Shutdown BlueField DPU OS Command Format

Bytes/Bits | 31:24            | 23:16            | 15:8           | 7:0
-----------+------------------+------------------+----------------+-----------
0:15       | NC-SI Header (OEM Command)
16:19      | NVIDIA Manufacturer ID (IANA) = 0x8119
20:23      | Command rev=0x00 | MLNX Cmd ID=0x12 | Parameter=0x1A | Reserved
24:27      | Reserved
28:31      | Checksum 31:0

This command has no input parameters. This command is a package command.
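
Because the command takes no input parameters, bytes 16 to 27 of the packet are fixed. The following is a minimal sketch that prints them, assuming network (big-endian) byte order for the 32-bit IANA field and zero for all reserved bytes. The function name `bf_shutdown_payload` is hypothetical; the NC-SI header and checksum are left to the NC-SI stack on the BMC platform.

```shell
# Hypothetical helper: prints bytes 16 to 27 of the Shutdown BlueField DPU OS
# command as space-separated hex.
# Assumptions: big-endian IANA field, reserved bytes set to zero.
bf_shutdown_payload() {
    local iana=$((0x8119)) rev=$((0x00)) cmd=$((0x12)) param=$((0x1A))
    printf '%02x %02x %02x %02x %02x %02x %02x %02x %02x %02x %02x %02x\n' \
        $(( (iana >> 24) & 0xff )) $(( (iana >> 16) & 0xff )) \
        $(( (iana >> 8) & 0xff ))  $(( iana & 0xff )) \
        "$rev" "$cmd" "$param" 0 \
        0 0 0 0
}

# Usage:
#   bf_shutdown_payload
```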

Shutdown BlueField DPU OS Response

On supporting devices, the NVIDIA BlueField DPU always receives and responds to the Shutdown BlueField DPU OS command when the package ID matches. When the command is sent to a non-supporting device, or is received from an untrusted platform BMC, it fails with reason code 0x7FFF (Unsupported command).

Shutdown BlueField DPU OS Response Format

Bytes/Bits | 31:24            | 23:16            | 15:8           | 7:0
-----------+------------------+------------------+----------------+-----------
0:15       | NC-SI Header (OEM Command)
16:19      | Response Code                       | Reason Code
20:23      | NVIDIA Manufacturer ID (IANA) = 0x8119
24:27      | Command rev=0x00 | MLNX Cmd ID=0x12 | Parameter=0x1A | Reserved
28:31      | Reserved
32:35      | Checksum 31:0
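
A BMC-side script can read the response and reason codes from bytes 16 to 19 of the response. The following is a minimal sketch, assuming the packet is available as space-separated hex bytes with byte 0 first and that both 16-bit fields are big-endian. The function name `bf_response_codes` and its return-code convention are hypothetical.

```shell
# Hypothetical helper: prints the response and reason codes found in
# bytes 16 to 19 of a response packet given as space-separated hex bytes.
# Returns 2 when the reason code is 0x7FFF (Unsupported command), 1 when the
# packet is too short, and 0 otherwise.
bf_response_codes() {
    # Split the single argument into one positional parameter per byte.
    set -- $1
    [ "$#" -ge 20 ] || { echo "packet too short" >&2; return 1; }
    shift 16
    local response="$1$2" reason="$3$4"
    echo "response=0x$response reason=0x$reason"
    case "$reason" in
        7[fF][fF][fF]) return 2 ;;
    esac
    return 0
}

# Usage:
#   bf_response_codes "<hex bytes of the response packet>"
```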
