For better formatting of the code, follow this guide from the source GitHub repo at github.com/NVIDIA/doca-platform, in docs/public/user-guides/zero-trust/use-cases/hbn-snap/README.md.
This guide provides instructions for deploying the NVIDIA DOCA Platform Framework (DPF) on high-performance, bare-metal infrastructure in Zero Trust mode, using the DPU BMC and Redfish. It covers provisioning NVIDIA® BlueField®-3 DPUs with DPF, installing the HBN DPUService on those DPUs, and enabling SNAP Storage on them.
Prerequisites
To run this guide, clone the repo from github.com/NVIDIA/doca-platform and change to the docs/public/user-guides/zero-trust/use-cases/hbn-snap directory.
The system is set up as described in the prerequisites.
In addition, for this use case, the Top of Rack (ToR) switch must support BGP and EVPN. It should be configured to support unnumbered BGP towards the two ports of the DPU, where HBN acts as the peer, and to advertise routes over BGP to allow for ECMP from the DPU. The storage traffic between the DPU and the remote storage system goes through a VXLAN interface. Additional information about the required switch configuration can be found in the Technology Preview for DPF Zero Trust (DPF-ZT) with SNAP DPU Service in virtio-fs mode.
This guide includes examples for both SNAP Block (NVMe) and SNAP VirtioFS Storage. Depending on the storage type you want to deploy, you need to ensure that the following prerequisites are met:
SNAP Block (NVMe) Prerequisites
A remote SPDK target should be set up to provide persistent storage for SNAP Block Storage.
The SPDK target should be reachable from the DPUs.
The management interface of the SPDK target should be reachable from the control plane nodes.
The following tools must be installed on the machine where the commands contained in this guide run:
kubectl
helm
envsubst
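Before starting, it can help to confirm that these tools are available on the PATH. The following is a minimal sketch of such a check. The `missing_tools` helper is a hypothetical name introduced here for illustration and is not part of DPF.

```shell
# Hypothetical helper (not part of DPF): print every tool name passed as an
# argument that cannot be found on the PATH, one name per line.
missing_tools() {
  for tool in "$@"; do
    # `command -v` exits non-zero when the command cannot be resolved.
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Usage: an empty result means all three tools listed above were found.
# missing_tools kubectl helm envsubst
```

An empty result only shows that the binaries exist. It does not check their versions or whether `kubectl` can reach the target cluster.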
Installation Guide
This guide assumes that the setup includes only 2 workers with DPUs. If your setup has more than 2 workers, then you will need to set additional variables to enable the rest of the DPUs.
0. Required Variables
The following variables are required by this guide. A sensible default is provided where it makes sense, but many will be specific to the target infrastructure.
Commands in this guide are run in the same directory that contains this readme.
Environment variables file
## IP Address for the Kubernetes API server of the target cluster on which DPF is installed.
## This should never include a scheme or a port.
## e.g. 10.10.10.10
export TARGETCLUSTER_API_SERVER_HOST=
## Port for the Kubernetes API server of the target cluster on which DPF is installed.
## e.g. 6443
export TARGETCLUSTER_API_SERVER_PORT=
## Virtual IP used by the load balancer for the DPU Cluster. Must be a reserved IP from the management subnet and not
## allocated by DHCP.
export DPUCLUSTER_VIP=
## Interface on which the DPUCluster load balancer will listen. Should be the management interface of the control plane node.
export DPUCLUSTER_INTERFACE=
## The repository URL for the NVIDIA Helm chart registry.
## Usually this is the NVIDIA Helm NGC registry. For development purposes, this can be set to a different repository.
export HELM_REGISTRY_REPO_URL=https://helm.ngc.nvidia.com/nvidia/doca
## The repository URL for the HBN container image.
## Usually this is the NVIDIA NGC registry. For development purposes, this can be set to a different repository.
export HBN_NGC_IMAGE_URL=nvcr.io/nvidia/doca/doca_hbn
## The repository URL for the SNAP VFS container image.
## Usually this is the NVIDIA NGC registry. For development purposes, this can be set to a different repository.
export SNAP_NGC_IMAGE_URL=nvcr.io/nvidia/doca/doca_vfs
## The DPF REGISTRY is the Helm repository URL where the DPF Operator Chart resides.
## Usually this is the NVIDIA Helm NGC registry. For development purposes, this can be set to a different repository.
export REGISTRY=https://helm.ngc.nvidia.com/nvidia/doca
## The DPF TAG is the version of the DPF components which will be deployed in this guide.
export TAG=v26.4.0
## URL to the BFB used in the `bfb.yaml` and linked by the DPUSet.
export BFB_URL="https://content.mellanox.com/BlueField/BFBs/Ubuntu24.04/bf-bundle-3.4.0-92_26.04_ubuntu-24.04_64k_prod.bfb"
## IP_RANGE_START and IP_RANGE_END
## These define the IP range for DPU discovery via Redfish/BMC interfaces
## Example: If your DPUs have BMC IPs in range 192.168.1.100-110
## export IP_RANGE_START=192.168.1.100
## export IP_RANGE_END=192.168.1.110
export IP_RANGE_START=
export IP_RANGE_END=
## The password used for DPU BMC root login. Must be the same for all DPUs.
export BMC_ROOT_PASSWORD=
## Serial number of DPUs. If you have more than 2 DPUs, you will need to parameterize the system accordingly and expose
## additional variables.
## All serial numbers must be in lowercase.
export DPU1_SERIAL=
export DPU2_SERIAL=
Modify the variables in manifests/00-env-vars/envvars.env to fit your environment, then source the file:
source manifests/00-env-vars/envvars.env
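Several of the variables above have no default, and the serial numbers must be lowercase, so a quick check after sourcing the file may catch mistakes early. The sketch below assumes a POSIX shell. The helper names `unset_vars` and `is_lowercase` are hypothetical and are not provided by DPF.

```shell
# Hypothetical helpers (not part of DPF) for checking the sourced environment.

# Print the name of every listed variable that is unset or empty.
unset_vars() {
  for name in "$@"; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    [ -n "$value" ] || echo "$name"
  done
  return 0
}

# Succeed only if the argument contains no uppercase letters.
is_lowercase() {
  [ "$1" = "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" ]
}

# Usage, after sourcing manifests/00-env-vars/envvars.env:
# unset_vars TARGETCLUSTER_API_SERVER_HOST TARGETCLUSTER_API_SERVER_PORT \
#   DPUCLUSTER_VIP DPUCLUSTER_INTERFACE IP_RANGE_START IP_RANGE_END \
#   BMC_ROOT_PASSWORD DPU1_SERIAL DPU2_SERIAL
# is_lowercase "$DPU1_SERIAL" || echo "DPU1_SERIAL must be lowercase"
```

Any name printed by `unset_vars` still needs a value in the environment file before continuing.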
1. DPF Operator Installation
Create DPU BMC shared password secret
In Zero Trust mode, provisioning DPUs requires authentication with Redfish. For this to work, the BMC root password must be the same on all DPUs that DPF is going to manage.
These verification commands may need to be run multiple times to ensure the condition is met.
Verify the DPF Operator installation with:
## Ensure the DPF Operator deployment is available.
kubectl rollout status deployment --namespace dpf-operator-system dpf-operator-controller-manager
## Ensure all pods in the DPF Operator system are ready.
kubectl wait --for=condition=ready --namespace dpf-operator-system pods --all
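Because the verification commands in this guide may need to be run several times, a small retry wrapper can save manual repetition. The following is a sketch only. The `retry_until` helper, the attempt count, and the delay are assumptions made for illustration and are not part of DPF.

```shell
# Hypothetical helper (not part of DPF): rerun a command until it succeeds.
# Arguments: <max attempts> <seconds between attempts> <command...>
retry_until() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    # Run the command; stop as soon as it succeeds.
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
    # Sleep only if another attempt will follow.
    [ "$i" -le "$attempts" ] && sleep "$delay"
  done
  return 1
}

# Usage with one of the verification commands from this guide:
# retry_until 10 30 kubectl wait --for=condition=ready \
#   --namespace dpf-operator-system pods --all
```

The wrapper returns non-zero once all attempts are used up, so it can also be used in scripts that should stop when a condition is never met.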
2. DPF System Installation
This section involves creating the DPF system components and some basic infrastructure required for a functioning DPF-enabled cluster.
DPUCluster to serve as Kubernetes control plane for DPU nodes
YAML
---
apiVersion: provisioning.dpu.nvidia.com/v1alpha1
kind: DPUCluster
metadata:
  name: dpu-cplane-tenant1
  namespace: dpu-cplane-tenant1
spec:
  type: kamaji
  maxNodes: 1000
  clusterEndpoint:
    # deploy keepalived instances on the nodes that match the given nodeSelector.
    keepalived:
      # interface on which keepalived will listen. Should be the oob interface of the control plane node.
      interface: $DPUCLUSTER_INTERFACE
      # Virtual IP reserved for the DPU Cluster load balancer. Must not be allocatable by DHCP.
      vip: $DPUCLUSTER_VIP
      # virtualRouterID must be in range [1,255], make sure the given virtualRouterID does not duplicate with any existing keepalived process running on the host
      virtualRouterID: 126
      nodeSelector:
        node-role.kubernetes.io/control-plane: ""
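Manifests like this one reference environment variables such as `$DPUCLUSTER_INTERFACE` and `$DPUCLUSTER_VIP`. Since `envsubst` replaces an unset variable with an empty string without reporting an error, it may be worth listing the variables a template references before rendering it. The sketch below shows one way to do that. The `template_vars` helper and the file name in the usage comment are hypothetical.

```shell
# Hypothetical helper (not part of DPF): list the distinct $VAR or ${VAR}
# references found in a manifest template read from standard input.
template_vars() {
  grep -oE '\$\{?[A-Za-z_][A-Za-z0-9_]*\}?' | tr -d '${}' | sort -u
}

# Usage (dpucluster.yaml is a placeholder file name):
# for name in $(template_vars < dpucluster.yaml); do
#   eval "value=\${$name:-}"
#   [ -n "$value" ] || echo "unset: $name"
# done
```

Every name reported as unset would be rendered as an empty value in the applied manifest.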
These verification commands may need to be run multiple times to ensure the condition is met.
Verify the DPF System with:
## Ensure the provisioning and DPUService controller manager deployments are available.
kubectl rollout status deployment --namespace dpf-operator-system dpf-provisioning-controller-manager dpuservice-controller-manager
## Ensure all other deployments in the DPF Operator system are Available.
kubectl rollout status deployment --namespace dpf-operator-system
## Ensure bfb-registry pod is running.
kubectl wait --for=condition=ready --namespace dpf-operator-system pod/bfb-registry --timeout=600s
## Ensure bfb-registry service exists.
kubectl get svc bfb-registry --namespace dpf-operator-system
## Ensure the DPUCluster is ready for nodes to join.
kubectl wait --for=condition=ready --namespace dpu-cplane-tenant1 dpucluster --all
3. DPU Provisioning and Service Installation
In this section, you will provision your DPUs and deploy the required services. You will need to create a DPUDeployment object that defines which DPUServices should be installed on each selected DPU. This provides a flexible way to specify and manage the services that run on your DPUs.
This guide includes examples for both SNAP Block (NVMe) and SNAP VirtioFS Storage. Please refer to the relevant sections below and follow the instructions to deploy the desired storage type.
Storage use-cases set RDMA_SET_NETNS_EXCLUSIVE="no" in the DPUFlavor, putting the DPU in shared RDMA mode. The default SFC NAD (mybrsfc) enables RDMA for SF interfaces, which is not compatible with shared RDMA mode. All services deployed on a DPU provisioned with a storage flavor that use SF interfaces must reference a NAD without RDMA. A custom DPUServiceNAD (mybrsfc-storage) is included in the manifests below for this reason.
host:
  enabled: true
  config:
    targets:
      nodes:
        # name of the target
        - name: spdk-target
          # management address
          rpcURL: http://10.0.110.25:8000
          # type of the target, e.g. nvme-tcp, nvme-rdma
          targetType: nvme-rdma
          # target service IP
          targetAddr: 10.0.124.1
    # required parameter, name of the secret that contains connection
    # details to access the DPU cluster.
    # this secret should be created by the DPUServiceCredentialRequest API.
    dpuClusterSecret: spdk-csi-controller-dpu-cluster-credentials
Apply DPU-side Storage Resources
If a node has more than one DPU, apply the relevant selector in the DPUDeployment to select the appropriate DPU. See DPUDeployment - DPUs Configuration to understand more about the selectors.
DPUServiceConfiguration and DPUServiceTemplate for SPDK CSI Controller on DPU
YAML
---
apiVersion: svc.dpu.nvidia.com/v1alpha1
kind: DPUServiceConfiguration
metadata:
  name: spdk-csi-controller-dpu
  namespace: dpf-operator-system
spec:
  deploymentServiceName: spdk-csi-controller-dpu
  upgradePolicy:
    applyNodeEffect: false
  serviceConfiguration:
    helmChart:
      values:
        dpu:
          enabled: true
          storageClass:
            # the name of the storage class that will be created for spdk-csi,
            # this StorageClass name should be used in the StorageVendor settings
            name: spdkcsi-sc
            # name of the secret that contains credentials for the remote SPDK target,
            # content of the secret is injected during CreateVolume request
            secretName: spdkcsi-secret
            # namespace of the secret with credentials for the remote SPDK target
            secretNamespace: dpf-operator-system
          rbacRoles:
            spdkCsiController:
              # the name of the service account for spdk-csi-controller
              # this value must be aligned with the value from the DPUServiceCredentialRequest
              serviceAccount: spdk-csi-controller-sa
---
apiVersion: v1
kind: Secret
metadata:
  name: spdkcsi-secret
  namespace: dpf-operator-system
  labels:
    # this label enables replication of the secret from the host to the dpu cluster
    dpu.nvidia.com/image-pull-secret: ""
stringData:
  # name field in the "rpcTokens" list should match name of the
  # spdk target from DPUService.helmChart.values.host.config.targets.nodes
  secret.json: |-
    {
      "rpcTokens": [
        {
          "name": "spdk-target",
          "username": "exampleuser",
          "password": "examplepassword"
        }
      ]
    }
Verification
These verification commands may need to be run multiple times to ensure the condition is met.
Note that the DPUService name will have a random suffix. For example, doca-hbn-l2xsl.
Verify the DPU and Service installation with:
## Ensure the BFB is ready
kubectl wait --for=jsonpath='{.status.phase}'=Ready --namespace dpf-operator-system bfb bf-bundle-$TAG --timeout=600s
## Ensure the DPUServices are created and have been reconciled.
kubectl wait --for=condition=ApplicationsReconciled --namespace dpf-operator-system dpuservices -l svc.dpu.nvidia.com/owned-by-dpudeployment=dpf-operator-system_hbn-snap
## Ensure the DPUServiceIPAMs have been reconciled
kubectl wait --for=condition=DPUIPAMObjectReconciled --namespace dpf-operator-system dpuserviceipam --all
## Ensure the DPUServiceInterfaces have been reconciled
kubectl wait --for=condition=ServiceInterfaceSetReconciled --namespace dpf-operator-system dpuserviceinterface --all
## Ensure the DPUServiceChains have been reconciled
kubectl wait --for=condition=ServiceChainSetReconciled --namespace dpf-operator-system dpuservicechain --all
## Ensure the DPUs have the condition Initialized (this may take time)
kubectl wait --for=condition=Initialized --namespace dpf-operator-system dpu --all
Be sure to create DPUVolumeAttachments that are using static PFs before rebooting the hosts. If you reboot the hosts without creating these attachments, the hardware initialization process on the host can be significantly delayed, due to timeouts caused by partially initialized emulated NVMe controllers. Creating the attachments ahead of time ensures that SNAP services can complete initialization of the emulated NVMe controllers right after the DPUs become ready.
Releasing the Node Effect Hold
Since the DPUDeployment is configured with nodeEffect.hold: true, the DPUs will pause at the "Node Effect" phase and wait for external action before proceeding with provisioning. This gives the administrator control over when the node effect is applied.
To check that DPUNodeMaintenance objects have been created and are in the hold state:
kubectl get dpunodemaintenances -n dpf-operator-system
Once you are ready for provisioning to proceed, release the hold by setting the annotation on the DPUNodeMaintenance objects to "false". You can do this per-node or all at once:
After releasing the hold, the DPUs will proceed through the remaining provisioning phases (BFB installation, OS installation, etc.).
Making the DPUs Ready
To make the DPUs ready, the hosts must be power cycled manually. Do this as gracefully as possible to avoid corruption: shut down the host and DPU, power off the server, and then power it back on. Do this only once the DPU object signals that it is waiting for the power cycle, as checked by the command below. The admin can automate this flow, depending on the infrastructure. The following verification command may need to be run multiple times to ensure the condition is met.
## Ensure the DPUs are in the Rebooting phase and condition Rebooted is false with WaitingForManualPowerCycleOrReboot reason
kubectl wait --for=jsonpath='{.status.conditions[?(@.type=="Rebooted")].reason}'=WaitingForManualPowerCycleOrReboot --namespace dpf-operator-system dpu --all
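When automating the power cycle, a one-shot check that does not block may be more convenient than `kubectl wait`. The sketch below compares every line on standard input with an expected value. The `all_lines_equal` helper is hypothetical, and the `kubectl get` invocation in the usage comment is an untested assumption about how the same condition reason could be listed per DPU.

```shell
# Hypothetical helper (not part of DPF): succeed only if standard input has at
# least one line and every line equals the expected value.
all_lines_equal() {
  expected=$1
  count=0
  while IFS= read -r line; do
    # Fail on the first line that differs from the expected value.
    [ "$line" = "$expected" ] || return 1
    count=$((count + 1))
  done
  # An empty input (no DPUs listed) is treated as a failure.
  [ "$count" -gt 0 ]
}

# Usage (assumed jsonpath, printing one Rebooted reason per DPU):
# kubectl get dpu --namespace dpf-operator-system \
#   -o jsonpath='{range .items[*]}{.status.conditions[?(@.type=="Rebooted")].reason}{"\n"}{end}' \
#   | all_lines_equal WaitingForManualPowerCycleOrReboot \
#   && echo "all DPUs are waiting for a power cycle"
```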
For the SNAP Block (NVMe) scenario, you do not need to wait for the host to fully boot after a power cycle before removing the annotation below. On some server platforms, device initialization may wait a long time for the SNAP service on the DPU to become ready, so it is recommended to remove the annotation immediately after initiating the host power cycle.
host:
  enabled: true
  config:
    # required parameter, name of the secret that contains connection
    # details to access the DPU cluster.
    # this secret should be created by the DPUServiceCredentialRequest API.
    dpuClusterSecret: nfs-csi-controller-dpu-cluster-credentials
Apply DPU-side Storage Resources
If a node has more than one DPU, apply the relevant selector in the DPUDeployment to select the appropriate DPU. See DPUDeployment - DPUs Configuration to understand more about the selectors.
DPUServiceConfiguration and DPUServiceTemplate for NFS CSI Controller on DPU
YAML
---
apiVersion: svc.dpu.nvidia.com/v1alpha1
kind: DPUServiceConfiguration
metadata:
  name: nfs-csi-controller-dpu
  namespace: dpf-operator-system
spec:
  deploymentServiceName: nfs-csi-controller-dpu
  upgradePolicy:
    applyNodeEffect: false
  serviceConfiguration:
    helmChart:
      values:
        dpu:
          enabled: true
          storageClasses:
            # List of storage classes to be created for nfs-csi
            # These StorageClass names should be used in the StorageVendor settings
            - name: nfs-csi
              parameters:
                server: 10.0.124.1
                share: /srv/nfs/share
          rbacRoles:
            nfsCsiController:
              # the name of the service account for nfs-csi-controller
              # this value must be aligned with the value from the DPUServiceCredentialRequest
              serviceAccount: nfs-csi-controller-sa
These verification commands may need to be run multiple times to ensure the condition is met.
Note that the DPUService name will have a random suffix. For example, doca-hbn-l2xsl.
Verify the DPU and Service installation with:
## Ensure the BFB is ready
kubectl wait --for=jsonpath='{.status.phase}'=Ready --namespace dpf-operator-system bfb bf-bundle-$TAG --timeout=600s
## Ensure the DPUServices are created and have been reconciled.
kubectl wait --for=condition=ApplicationsReconciled --namespace dpf-operator-system dpuservices -l svc.dpu.nvidia.com/owned-by-dpudeployment=dpf-operator-system_hbn-snap
## Ensure the DPUServiceIPAMs have been reconciled
kubectl wait --for=condition=DPUIPAMObjectReconciled --namespace dpf-operator-system dpuserviceipam --all
## Ensure the DPUServiceInterfaces have been reconciled
kubectl wait --for=condition=ServiceInterfaceSetReconciled --namespace dpf-operator-system dpuserviceinterface --all
## Ensure the DPUServiceChains have been reconciled
kubectl wait --for=condition=ServiceChainSetReconciled --namespace dpf-operator-system dpuservicechain --all
## Ensure the DPUs have the condition Initialized (this may take time)
kubectl wait --for=condition=Initialized --namespace dpf-operator-system dpu --all
Since the DPUDeployment is configured with nodeEffect.hold: true, the DPUs will pause at the "Node Effect" phase and wait for external action before proceeding with provisioning. This gives the administrator control over when the node effect is applied.
To check that DPUNodeMaintenance objects have been created and are in the hold state:
kubectl get dpunodemaintenances -n dpf-operator-system
Once you are ready for provisioning to proceed, release the hold by setting the annotation on the DPUNodeMaintenance objects to "false". You can do this per-node or all at once:
After releasing the hold, the DPUs will proceed through the remaining provisioning phases (BFB installation, OS installation, etc.).
Making the DPUs Ready
To make the DPUs ready, the hosts must be power cycled manually. Do this as gracefully as possible to avoid corruption: shut down the host and DPU, power off the server, and then power it back on. Do this only once the DPU object signals that it is waiting for the power cycle, as checked by the command below. The admin can automate this flow, depending on the infrastructure. The following verification command may need to be run multiple times to ensure the condition is met.
## Ensure the DPUs are in the Rebooting phase and condition Rebooted is false with WaitingForManualPowerCycleOrReboot reason
kubectl wait --for=jsonpath='{.status.conditions[?(@.type=="Rebooted")].reason}'=WaitingForManualPowerCycleOrReboot --namespace dpf-operator-system dpu --all
Both Block and VirtioFS scenarios can be tested with the same steps.
After the DPUs are provisioned and the rest of the objects are Ready, we can test traffic by assigning an IP to PF0 on the host for each DPU and running a simple ping. Although the configuration enables both PFs, we focus on PF0 for testing traffic. Assuming PF0 is named ens5f0np0:
On the host with DPU with serial number DPU1_SERIAL:
ip link set dev ens5f0np0 up
ip addr add 10.0.121.1/29 dev ens5f0np0
ip route add 10.0.121.0/24 dev ens5f0np0 via 10.0.121.2
On the host with DPU with serial number DPU2_SERIAL:
ip link set dev ens5f0np0 up
ip addr add 10.0.121.9/29 dev ens5f0np0
ip route add 10.0.121.0/24 dev ens5f0np0 via 10.0.121.10
On the host with DPU with serial number DPU1_SERIAL:
$ ping 10.0.121.9 -c3
PING 10.0.121.9 (10.0.121.9) 56(84) bytes of data.
64 bytes from 10.0.121.9: icmp_seq=1 ttl=64 time=0.387 ms
64 bytes from 10.0.121.9: icmp_seq=2 ttl=64 time=0.344 ms
64 bytes from 10.0.121.9: icmp_seq=3 ttl=64 time=0.396 ms
--- 10.0.121.9 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2053ms
rtt min/avg/max/mdev = 0.344/0.375/0.396/0.022 ms
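The addresses used in this test sit in two different /29 subnets, which is why each host needs the explicit route to 10.0.121.0/24 through its next hop. As a worked illustration, the sketch below computes the network address for an IPv4 address and prefix length using shell arithmetic. The `network_of` helper is hypothetical and only serves to show the subnet arithmetic.

```shell
# Hypothetical helper (not part of DPF): print the network address for an
# IPv4 address and prefix length, e.g. `network_of 10.0.121.9 29`.
network_of() {
  ip=$1
  prefix=$2
  # Split the dotted quad into its four octets.
  IFS=. read -r o1 o2 o3 o4 <<EOF
$ip
EOF
  # Pack the octets into one 32-bit integer and apply the prefix mask.
  addr=$(( ($o1 << 24) | ($o2 << 16) | ($o3 << 8) | $o4 ))
  mask=$(( (0xFFFFFFFF << (32 - $prefix)) & 0xFFFFFFFF ))
  net=$(( addr & mask ))
  # Unpack the masked integer back into dotted-quad form.
  echo "$(( (net >> 24) & 255 )).$(( (net >> 16) & 255 )).$(( (net >> 8) & 255 )).$(( net & 255 ))"
}

# network_of 10.0.121.1 29    prints 10.0.121.0 (same subnet as next hop 10.0.121.2)
# network_of 10.0.121.9 29    prints 10.0.121.8 (same subnet as next hop 10.0.121.10)
```

Each host shares a /29 only with its own next hop, so the ping between 10.0.121.1 and 10.0.121.9 has to be routed rather than switched within one subnet.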
Uninstall
This section covers only the DPF-related components, not the prerequisites, which must be managed by the admin.
Delete storage resources
Be sure to unmount all volumes from the worker nodes before deleting the DPUVolumeAttachments, or the operation may fail. For NVMe attachments, it is recommended to unbind the device from the driver before deleting the DPUVolumeAttachment:
echo <pci_address> > /sys/bus/pci/drivers/nvme/unbind
Note: there can be a race condition when deleting the underlying Kamaji cluster, which runs the DPU cluster control plane in this guide. If that happens, it may be necessary to remove finalizers manually from the DPUCluster and Datastore objects.
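The unbind file expects the full PCI address, including the domain, in the form shown under /sys/bus/pci/devices (for example 0000:3b:00.0, an illustrative value). A minimal sketch of a format check before writing to the file is shown below. The `is_pci_address` helper is hypothetical and is not part of DPF.

```shell
# Hypothetical helper (not part of DPF): succeed only if the argument looks
# like a full PCI address of the form domain:bus:device.function.
is_pci_address() {
  printf '%s\n' "$1" |
    grep -qE '^[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]$'
}

# Usage (run as root on the worker node; addr is a placeholder variable):
# is_pci_address "$addr" && echo "$addr" > /sys/bus/pci/drivers/nvme/unbind
```

The check covers only the format of the address. It does not confirm that the device exists or that it is bound to the nvme driver.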