NVIDIA BlueField BSP

BlueField Boot Flow Configuration

Serialized DPU and Host Boot Sequence

Supported for NVIDIA BlueField-3 devices onward.

Following a server power cycle, the DPU and host operating systems begin booting at the same time. On some servers, the host OS may finish booting before DPU services are fully operational.

Serialized DPU and host boot puts these two boot flows in order. On every server power cycle, the DPU starts its boot sequence while holding the host in its BIOS execution phase until the BlueField Arm OS is up. Once the Arm OS is up, the host is released to load its OS.

This feature is disabled by default.

Prerequisite

This feature requires UEFI with the expansion ROM enabled (it is enabled by default). Verify the following:

  • If the host has an Arm CPU: make sure EXP_ROM_UEFI_ARM_ENABLE is set to True (1)

  • If the host has an x86 CPU: make sure EXP_ROM_UEFI_x86_ENABLE is set to True (1)
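The check above depends on the host CPU architecture. The following is a minimal sketch that selects the matching parameter from `uname -m` and queries it with mlxconfig. It assumes the MST device name (the `<device>` placeholder used in this section) is passed as the first argument. The function names are hypothetical, and the script does not parse the query output.

```shell
#!/usr/bin/env bash
# Sketch only: pick the UEFI expansion-ROM parameter for this host's CPU
# architecture and query it. Function names are hypothetical.

# Map a `uname -m` architecture string to the mlxconfig parameter to check.
exp_rom_param_for_arch() {
  case "$1" in
    x86_64|i?86)   echo "EXP_ROM_UEFI_x86_ENABLE" ;;
    aarch64|arm64) echo "EXP_ROM_UEFI_ARM_ENABLE" ;;
    *) echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

# Usage: check_exp_rom_prereq <device>
# Prints the current value; it should read True(1). Inspect it manually.
check_exp_rom_prereq() {
  local device="$1" param
  param="$(exp_rom_param_for_arch "$(uname -m)")" || return 1
  sudo mlxconfig -d "/dev/mst/${device}" q "${param}"
}
```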

Enabling Serialized DPU and Host Boot

From the host or the BlueField Arm OS console, run:

$ sudo mlxconfig -d /dev/mst/<device> s DELAY_HOST_OS_INIT=1

For this configuration to take effect, a BlueField reset must be applied.

Disabling Serialized DPU and Host Boot

From the host or the BlueField Arm OS console, run:

$ sudo mlxconfig -d /dev/mst/<device> s DELAY_HOST_OS_INIT=2

For this configuration to take effect, a BlueField reset must be applied.

Extended Serialized DPU and Host Boot

This mode is enabled by setting DELAY_HOST_OS_INIT to the user option ENABLE_USER (3). It strictly serializes the boot flow between the DPU and the host on a per-port basis.

To enable this mode, run the following mlxconfig command on the host:

$ sudo mlxconfig -d /dev/mst/<device> s DELAY_HOST_OS_INIT=3
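The three DELAY_HOST_OS_INIT values in this section can be wrapped in one helper. The following is a sketch that assumes hypothetical mode names (`enable`, `disable`, `per-port`) mapped to the documented values 1, 2, and 3. It supports a dry run through a hypothetical `DRY_RUN` variable.

```shell
#!/usr/bin/env bash
# Sketch of a wrapper around the mlxconfig commands in this section.
# Mode names and DRY_RUN are hypothetical; values 1/2/3 are documented.

# Translate a mode name into the DELAY_HOST_OS_INIT value.
delay_host_os_init_value() {
  case "$1" in
    enable)   echo 1 ;;  # serialized DPU and host boot
    disable)  echo 2 ;;  # serialized boot disabled
    per-port) echo 3 ;;  # extended per-port serialization (ENABLE_USER)
    *) echo "unknown mode: $1" >&2; return 1 ;;
  esac
}

# Usage: set_delay_host_os_init <device> <mode>
# With DRY_RUN set to any non-empty value, print the command instead of running it.
set_delay_host_os_init() {
  local device="$1" value
  value="$(delay_host_os_init_value "$2")" || return 1
  local cmd=(sudo mlxconfig -d "/dev/mst/${device}" s "DELAY_HOST_OS_INIT=${value}")
  if [[ -n "${DRY_RUN:-}" ]]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

The helper only writes the configuration. It does not apply the BlueField reset that the enable and disable procedures require.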

Boot sequence:

  1. Following a server power cycle, the DPU initiates its boot sequence.

  2. The DPU holds the corresponding host port attached to the BlueField device in the BIOS execution phase for up to 20 minutes while the DPU Arm OS initializes. During this interval, the host port is paused and does not proceed to the OS boot phase.

  3. Once the DPU Arm OS for that specific port is reported as "up", the DPU releases the host port, which then continues its boot sequence and loads the host OS.

Timeout Fallback

If the DPU does not become ready within the 20-minute timeout window, the host port is automatically released according to platform-defined fallback behavior. This prevents the host system from being permanently blocked by a non-responsive DPU.
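The per-port hold and timeout fallback can be summarized as a decision function. The following is a conceptual model only: the real decision is made by the platform, not by a user script, and the function name and output labels are hypothetical. It uses the 20-minute window from this section and labels a release without modeling the platform-defined fallback behavior that follows it. Each host port would be evaluated independently.

```shell
#!/usr/bin/env bash
# Conceptual model only; the platform, not a script, enforces this.
# Hold window per host port, taken from this section: 20 minutes.
readonly HOLD_TIMEOUT_SECONDS=$((20 * 60))

# Usage: host_port_action <elapsed_seconds> <dpu_arm_os_up: 0|1>
# Prints HOLD while the port must stay in the BIOS phase,
# RELEASE_DPU_UP once the DPU Arm OS for the port is up, or
# RELEASE_TIMEOUT when the window expires first (assumed at >= 20 min).
host_port_action() {
  local elapsed="$1" dpu_up="$2"
  if (( dpu_up == 1 )); then
    echo "RELEASE_DPU_UP"
  elif (( elapsed >= HOLD_TIMEOUT_SECONDS )); then
    echo "RELEASE_TIMEOUT"
  else
    echo "HOLD"
  fi
}
```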
