DOCA Platform Framework

DPU Provisioning via Redfish API

DPF supports out-of-band management of DPUs through the Redfish API.

Prerequisites

A DPU must satisfy the following requirements to be managed via Redfish:

  • The DPU's BMC firmware version must be 24.10 or higher

  • The DPU's BMC must be reset to factory defaults before DPF is installed

  • The DPU's OOB interface must be connected to the DPF control plane

Note: DOCA Perftest Bootstrap provides Ansible tasks for upgrading BMC firmware in batches and for resetting BMCs to factory defaults.
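
The firmware requirement above can be checked with a small helper before installing DPF. The following is a minimal sketch, not part of DPF: the function name bmc_version_ok is hypothetical, and the handling of a "BF-" prefix and "-BUILD" suffix is an assumption about how a BMC might report its version. How the version string is obtained from the BMC is outside the sketch.

```shell
# Minimum BMC firmware version required by the prerequisites above.
MIN_MAJOR=24
MIN_MINOR=10

# bmc_version_ok VERSION   (hypothetical helper, not a DPF command)
# Succeeds (exit status 0) when VERSION is 24.10 or higher.
# VERSION is expected as "MAJOR.MINOR"; an optional "BF-" prefix and
# "-BUILD" suffix are stripped. That decorated format is an assumption.
bmc_version_ok() {
  v="${1#BF-}"     # drop the optional prefix
  v="${v%%-*}"     # drop the optional build suffix
  case "$v" in
    *.*) ;;        # looks like MAJOR.MINOR, continue
    *) return 2 ;; # anything else is rejected as unparseable
  esac
  major="${v%%.*}"
  minor="${v#*.}"
  minor="${minor%%.*}"   # ignore any patch component
  if [ "$major" -gt "$MIN_MAJOR" ]; then
    return 0
  fi
  [ "$major" -eq "$MIN_MAJOR" ] && [ "$minor" -ge "$MIN_MINOR" ]
}
```

For example, `bmc_version_ok 24.10 && echo "BMC firmware is new enough"` prints the message, while a version such as 24.07 makes the function return a non-zero status.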

DPF System Installation

Follow the installation steps to install the DPF system.

DPF Operator Configuration

To enable provisioning via the Redfish interface, apply the following DPFOperatorConfig:

YAML
---
apiVersion: operator.dpu.nvidia.com/v1alpha1 
kind: DPFOperatorConfig
metadata:
  name: dpfoperatorconfig
  namespace: dpf-operator-system
  labels:
    app.kubernetes.io/name: dpf-operator
    app.kubernetes.io/instance: dpf-operator
spec:
  provisioningController:
    bfbPVCName: "bfb-pvc"
    installInterface:
      installViaRedfish:
        # Set this to the IP address of one of your control plane nodes, followed by port 8080
        bfbRegistryAddress: "192.168.49.2:8080"
  kamajiClusterManager:
    disable: false
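
One way to find a value for bfbRegistryAddress is to read the address of a control plane node from the cluster. The sketch below is illustrative only: the function name registry_address is hypothetical, and it assumes that control plane nodes carry the standard node-role.kubernetes.io/control-plane label and that the node's InternalIP is the address reachable from the DPU BMCs. Verify both assumptions against your network before using the result.

```shell
# registry_address   (hypothetical helper, not a DPF command)
# Prints "<InternalIP>:8080" for the first node labelled as control plane.
# Assumption: the InternalIP is the address the DPU BMCs can reach.
registry_address() {
  ip="$(kubectl get nodes \
    -l node-role.kubernetes.io/control-plane \
    -o jsonpath='{.items[0].status.addresses[?(@.type=="InternalIP")].address}')"
  # Fail instead of printing a bare ":8080" when no address was found.
  [ -n "$ip" ] || return 1
  printf '%s:8080\n' "$ip"
}
```

Calling `registry_address` against a running cluster prints a string in the same form as the bfbRegistryAddress value in the configuration above.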

Credentials

To authenticate with Redfish, provide a password for the BMC root user.

Note: Refer to the BlueField DPU Administrator Quick Start Guide for BMC password constraints.

Create the BMC password secret:

Bash
kubectl create secret generic -n dpf-operator-system bmc-shared-password --from-literal=password='ROOT_BMC_PASSWORD'

During DPU provisioning, DPF updates the password of every DPU to match the provided credential. The credential cannot be modified after it is created.

Create DPU Device

Create a DPUDevice resource for each DPU:

Note: The DPUDevice is immutable, and creating a DPUDevice will not trigger DPU provisioning.

YAML
---
apiVersion: provisioning.dpu.nvidia.com/v1alpha1
kind: DPUDevice 
metadata:
  name: dpu-device-1
  namespace: dpf-operator-system
spec:
  bmcIp: 10.0.110.122
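
Because one DPUDevice is needed per DPU, writing the manifests by hand becomes repetitive with many DPUs. The following sketch generates them from a list of BMC IP addresses. The function name dpu_device_manifests and the dpu-device-N naming scheme are illustrative assumptions; the manifest fields are the ones shown in the example above.

```shell
# dpu_device_manifests IP...   (hypothetical helper, not a DPF command)
# Prints one DPUDevice manifest per BMC IP address given as an argument.
# Resources are named dpu-device-1, dpu-device-2, ... in argument order.
dpu_device_manifests() {
  i=1
  for ip in "$@"; do
    cat <<EOF
---
apiVersion: provisioning.dpu.nvidia.com/v1alpha1
kind: DPUDevice
metadata:
  name: dpu-device-$i
  namespace: dpf-operator-system
spec:
  bmcIp: $ip
EOF
    i=$((i + 1))
  done
}
```

The output can be reviewed first and then piped to kubectl, for example `dpu_device_manifests 10.0.110.122 10.0.110.123 | kubectl apply -f -`, where the second address is a made-up example.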

Create DPU Node

Create a DPUNode resource for each host that has a DPU:

Note: The .spec.dpus field lists the name of each DPUDevice attached to the node. Currently, DPF supports only a single DPU per DPUNode.

YAML
---
apiVersion: provisioning.dpu.nvidia.com/v1alpha1
kind: DPUNode
metadata:
  labels:
    feature.node.kubernetes.io/dpu-enabled: "true"
    feature.node.kubernetes.io/dpu-oob-bridge-configured: ""
  name: worker1
  namespace: dpf-operator-system
spec:
  dpus:
  - name: dpu-device-1
  nodeRebootMethod:
    external: {}

DPU Provisioning

Use a DPUSet to deploy DPUs; refer to the DPUSet documentation for more detail. Example configuration:

YAML
---
apiVersion: provisioning.dpu.nvidia.com/v1alpha1
kind: DPUSet
metadata:
  name: dpuset
  namespace: dpf-operator-system
spec:
  dpuNodeSelector:
    matchLabels:
      feature.node.kubernetes.io/dpu-enabled: "true"
  strategy:
    rollingUpdate:
      maxUnavailable: "10%"
    type: RollingUpdate
  dpuTemplate:
    spec:
      dpuFlavor: dpf-provisioning-hbn-ovn
      bfb:
        name: bf-bundle-new
      nodeEffect:
        noEffect: true

External Host Reboot

In the Redfish scenario, DPF cannot manage the DPU's host machine. When the DPU CR reaches the rebooting phase during provisioning, the user must power-cycle the host manually. The power cycle must be completed within two hours; otherwise, the secret that the DPU uses to join the cluster expires, and the DPU CR remains pending in the DPU Cluster Config phase. After the worker node boots, the provisioning.dpu.nvidia.com/dpunode-external-reboot-required annotation must be manually removed from the DPUNode.
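
Removing the annotation can be done with kubectl annotate, which deletes an annotation when its key is followed by a trailing "-". The sketch below wraps that call in a function. The function name clear_reboot_annotation is hypothetical, and the resource name dpunode assumes the DPUNode CRD is registered under its default lowercase singular name.

```shell
# clear_reboot_annotation NODE   (hypothetical helper, not a DPF command)
# Removes the external-reboot annotation from the named DPUNode.
# The trailing "-" after the annotation key tells kubectl to delete it.
clear_reboot_annotation() {
  # Refuse to run without a DPUNode name.
  [ -n "${1:-}" ] || return 2
  kubectl annotate dpunode "$1" -n dpf-operator-system \
    provisioning.dpu.nvidia.com/dpunode-external-reboot-required-
}
```

For the DPUNode from the earlier example, run `clear_reboot_annotation worker1` once the host has finished booting.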

Deletion and clean up

Follow the Deletion and clean up steps to uninstall the DPF system.
