DOCA Platform Framework (DPF) Documentation

Zero Trust Advanced Configuration

This section includes advanced configuration and additional information for the Zero Trust use case.

DPU Discovery and DPUNode and DPUDevice Object Creation

DPF provides two approaches for discovering and creating DPU resources:

  1. Automated Discovery: Using DPUDiscovery to automatically scan for DPUs and create DPUDevice and DPUNode resources.

  2. Manual Creation: Manually creating DPUDevice and DPUNode resources for each DPU.

You can choose either approach based on your deployment requirements. Automated discovery is recommended for larger deployments, while manual creation provides more control for smaller or specific configurations.

Automated DPU Discovery

DPUDiscovery enables automatic discovery of DPU devices and nodes by scanning specified IP ranges. This approach automatically creates DPUDevice and DPUNode resources for any discovered DPUs.

1. First, create a YAML file for the DPUDiscovery resource. Let's call it dpudiscovery.yaml:

YAML
apiVersion: provisioning.dpu.nvidia.com/v1alpha1
kind: DPUDiscovery
metadata:
  name: dpu-discovery-192.168.1-10
  namespace: dpf-operator-system
spec:
  # Define the IP range to scan
  ipRangeSpec:
    ipRange:
      startIP: "10.0.110.120"    # Replace with your start IP
      endIP: "10.0.110.125"     # Replace with your end IP

  # Optional: Set scan interval
  scanInterval: "3m"
  # Optional: Set number of workers (default is 1 per 255 IPs)
  workers: 1

2. Apply the resource using kubectl:

Bash
kubectl apply -f dpudiscovery.yaml

3. Check the status of the DPUDiscovery resource:

Bash
kubectl get dpudiscovery dpu-discovery-192.168.1-10 -n dpf-operator-system -o yaml
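
When several IP ranges need to be scanned, the manifest from step 1 can be generated rather than edited by hand. The following is a minimal sketch, not part of DPF: the function names ip_to_int, ip_range_size and render_dpudiscovery are hypothetical helpers, and the manifest uses only the fields shown in the example above. The range-size check assumes IPv4 addresses without leading zeros.

```shell
# Sketch: render a DPUDiscovery manifest from a start and end IP.
# All function names here are illustrative helpers, not DPF tooling.

# Convert a dotted-quad IPv4 address to an integer.
# The body runs in a subshell so the IFS change does not leak.
ip_to_int() (
  IFS=.
  set -- $1
  echo $(( $1 * 16777216 + $2 * 65536 + $3 * 256 + $4 ))
)

# Number of addresses in the inclusive range start..end.
ip_range_size() {
  echo $(( $(ip_to_int "$2") - $(ip_to_int "$1") + 1 ))
}

# Print a DPUDiscovery manifest. Arguments: name, start IP, end IP.
render_dpudiscovery() {
  if [ "$(ip_range_size "$2" "$3")" -lt 1 ]; then
    echo "endIP must not be lower than startIP" >&2
    return 1
  fi
  cat <<EOF
apiVersion: provisioning.dpu.nvidia.com/v1alpha1
kind: DPUDiscovery
metadata:
  name: $1
  namespace: dpf-operator-system
spec:
  ipRangeSpec:
    ipRange:
      startIP: "$2"
      endIP: "$3"
  scanInterval: "3m"
EOF
}
```

For example, `render_dpudiscovery dpu-discovery-192.168.1-10 10.0.110.120 10.0.110.125 > dpudiscovery.yaml` writes a file equivalent to the one in step 1 (without the optional workers field), and `ip_range_size 10.0.110.120 10.0.110.125` reports how many addresses each scan covers.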

The DPU discovery will:

  1. Start scanning the specified IP range

  2. Create DPUDevice and DPUNode* resources for any discovered DPUs

  3. Continue scanning at the specified interval

  4. Update its status with the last scan time and found DPUs

You can monitor the discovered DPUs with:

Bash
# List discovered DPU devices
kubectl get dpudevices -n dpf-operator-system

# List discovered DPU nodes
kubectl get dpunodes -n dpf-operator-system

* DPUDiscovery skips the creation of a DPUNode if an existing DPUNode's spec.dpus field already contains the DPUDevice's serial number.
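
Because discovery runs at an interval, the resources may not exist immediately after the DPUDiscovery is applied. The sketch below polls until an expected number of DPUDevice objects is present. The function names, the default timeout and the polling interval are assumptions made for illustration; only `kubectl get ... -o name` is used to query the cluster.

```shell
# Sketch: wait until an expected number of DPUDevice objects exists.
# NAMESPACE, count_dpudevices and wait_for_dpudevices are illustrative names.
NAMESPACE=${NAMESPACE:-dpf-operator-system}

# Count DPUDevice objects; "-o name" prints one resource per line.
count_dpudevices() {
  kubectl get dpudevices -n "$NAMESPACE" -o name 2>/dev/null | wc -l | tr -d ' '
}

# Arguments: expected count, timeout in seconds (default 600),
# polling interval in seconds (default 10).
wait_for_dpudevices() {
  expected=$1
  timeout_s=${2:-600}
  interval_s=${3:-10}
  waited=0
  while [ "$(count_dpudevices)" -lt "$expected" ]; do
    if [ "$waited" -ge "$timeout_s" ]; then
      echo "timed out waiting for $expected DPUDevice objects" >&2
      return 1
    fi
    sleep "$interval_s"
    waited=$((waited + interval_s))
  done
  echo "found $(count_dpudevices) DPUDevice objects"
}
```

For the six-address range used earlier, `wait_for_dpudevices 6` would return once six devices have been created, or fail after ten minutes. The expected count depends on how many DPUs actually respond in the range.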

Limitations

  • When using autodiscovery for DPUNodes, the created DPUNodes are named dpunode-<DPU_SERIAL_NUMBER>. If the HBN DPUService is used in conjunction with this DPU provisioning mode, the HBN configuration must be adjusted to match the discovered node names.
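
To prepare the HBN configuration ahead of discovery, the expected node names can be derived from known serial numbers. This is a small sketch of the naming pattern stated above; the function name is hypothetical, and the serial number is used verbatim because the source does not describe any normalization (such as case changes) that DPF may apply.

```shell
# Sketch: print the DPUNode name expected for each DPU serial number.
# expected_dpunode_names is an illustrative helper, not a DPF command.
expected_dpunode_names() {
  for serial in "$@"; do
    printf 'dpunode-%s\n' "$serial"
  done
}
```

The output can be compared with `kubectl get dpunodes -n dpf-operator-system -o name` once discovery has run, to confirm the names that the HBN configuration must reference.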

Manual DPU Resource Creation

If you prefer to manually create DPU resources or need more control over the creation process, you can create DPUDevice and DPUNode resources manually.

Creating DPUDevice manually

Create a DPUDevice resource for each DPU. The DPUDevice is immutable, and creating a DPUDevice does not trigger DPU provisioning.

YAML
---
apiVersion: provisioning.dpu.nvidia.com/v1alpha1
kind: DPUDevice
metadata:
  name: dpu-device-1
  namespace: dpf-operator-system
spec:
  bmcIp: 10.0.110.122

Creating a DPUNode manually

Create a DPUNode resource for each host that has a DPU. The .spec.dpus field contains the name of each DPUDevice attached to the node.

YAML
---
apiVersion: provisioning.dpu.nvidia.com/v1alpha1
kind: DPUNode
metadata:
  labels:
    feature.node.kubernetes.io/dpu-enabled: "true"
  name: worker1
  namespace: dpf-operator-system
spec:
  dpus:
  - name: dpu-device-1
  nodeRebootMethod:
    external: {}
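
For more than a handful of hosts, the two manifests above can be rendered from a short list of names and BMC addresses. The following sketch assumes the same fields and namespace as the examples; render_dpudevice and render_dpunode are hypothetical helper names, and the external reboot method is hard-coded to match the example.

```shell
# Sketch: render DPUDevice and DPUNode manifests for manual creation.
# Both functions are illustrative helpers, not DPF tooling.

# Arguments: DPUDevice name, BMC IP address.
render_dpudevice() {
  cat <<EOF
---
apiVersion: provisioning.dpu.nvidia.com/v1alpha1
kind: DPUDevice
metadata:
  name: $1
  namespace: dpf-operator-system
spec:
  bmcIp: $2
EOF
}

# Arguments: DPUNode name, then one or more DPUDevice names.
render_dpunode() {
  node=$1
  shift
  cat <<EOF
---
apiVersion: provisioning.dpu.nvidia.com/v1alpha1
kind: DPUNode
metadata:
  labels:
    feature.node.kubernetes.io/dpu-enabled: "true"
  name: $node
  namespace: dpf-operator-system
spec:
  dpus:
EOF
  # One list entry per DPUDevice attached to this host.
  for dev in "$@"; do
    printf '  - name: %s\n' "$dev"
  done
  cat <<EOF
  nodeRebootMethod:
    external: {}
EOF
}
```

Running `{ render_dpudevice dpu-device-1 10.0.110.122; render_dpunode worker1 dpu-device-1; } | kubectl apply -f -` would create the same two objects as the examples above. Because a DPUDevice is immutable, it is worth reviewing the rendered output before applying it.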

Secure Boot

DPF supports configuring UEFI Secure Boot on DPUs during Zero Trust provisioning. When secureBoot is set in the DPUDeployment (or DPUSet), the controller detects the current hardware state via the BMC and configures it automatically, performing the required ARM force restarts.

For configuration details, mode-specific behavior, and the impact of changing this setting on existing DPUs, see Secure Boot.

For more information on BlueField Secure Boot, see Secure Boot in the NVIDIA documentation.

External Host Reboot

In the Zero Trust scenario, DPF cannot manage the DPU's host machine. During DPU provisioning, when the DPU CR reaches the rebooting phase, the user must power-cycle the host manually. The power cycle must be completed within two hours; otherwise, the secret that the DPU uses to join the cluster expires and the DPU CR remains pending in the DPU Cluster Config phase. After the worker node boots up, the provisioning.dpu.nvidia.com/dpunode-external-reboot-required annotation must be manually removed from the DPUNode.
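
The manual steps can be supported by a small helper. The sketch below computes the latest completion time for the power cycle and removes the annotation afterwards. The function names are hypothetical, and the source does not state exactly when the two-hour window starts, so the sketch assumes it starts at the time the DPU CR was observed entering the rebooting phase. The annotation is removed with the standard `kubectl annotate` syntax, where a trailing "-" after the key deletes it.

```shell
# Sketch: helpers for the external host reboot flow.
# reboot_deadline and clear_reboot_annotation are illustrative names.
NAMESPACE=${NAMESPACE:-dpf-operator-system}
REBOOT_ANNOTATION=provisioning.dpu.nvidia.com/dpunode-external-reboot-required
REBOOT_WINDOW_S=7200  # two hours, per the text above

# Argument: epoch seconds at which the rebooting phase was observed
# (assumed start of the window). Prints the deadline in epoch seconds.
reboot_deadline() {
  echo $(( $1 + REBOOT_WINDOW_S ))
}

# Argument: DPUNode name. Removes the external-reboot annotation.
clear_reboot_annotation() {
  kubectl annotate dpunodes "$1" -n "$NAMESPACE" "${REBOOT_ANNOTATION}-"
}
```

For example, `reboot_deadline "$(date +%s)"` prints the deadline if the rebooting phase was observed just now, and `clear_reboot_annotation worker1` removes the annotation from the DPUNode named worker1 once the host is back up.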

If you use script-based host reboot (nodeRebootMethod.script on the DPUNode) instead of external power cycle, see DPUNode: Script reboot job failures and recovery for how Jobs, DPU phase DPURebooting, and recovery interact.
