Technology Preview for DPF Zero Trust (DPF-ZT) with SNAP DPU Service in virtio-fs mode




Scope

This Technology Preview (TP) guide offers comprehensive instructions for deploying the NVIDIA DOCA Platform Framework (DPF) on high-performance, bare-metal infrastructure in Zero-Trust mode. The guide focuses on deploying the NVIDIA DOCA SNAP service in virtio-fs mode on NVIDIA® BlueField®-3 DPUs to deliver secure, isolated, and hardware-accelerated environments.

This guide is designed for experienced system administrators, system engineers, and solution architects who want to provision highly secure bare-metal environments backed by NFS storage. The deployment takes full advantage of NVIDIA DPU hardware acceleration and offload capabilities, maximizing data center workload efficiency and performance.


  • This reference implementation, as the name implies, is a specific, opinionated deployment example designed to address the use case described above.

  • While other approaches may exist to implement similar solutions, this document provides a detailed guide for this particular method.


Abbreviations and Acronyms

Term | Definition
BFB | BlueField Bootstream
BGP | Border Gateway Protocol
CNI | Container Network Interface
CRD | Custom Resource Definition
CSI | Container Storage Interface
DOCA | Data Center Infrastructure-on-a-Chip Architecture
DOCA SNAP | NVIDIA® DOCA™ Storage-Defined Network Accelerated Processing
DPF | DOCA Platform Framework
DPU | Data Processing Unit
HBN | Host Based Networking
IPAM | IP Address Management
K8S | Kubernetes
MAAS | Metal as a Service
NFS | Network File System
PVC | Persistent Volume Claim
RDG | Reference Deployment Guide
RDMA | Remote Direct Memory Access
SF | Scalable Function
SFC | Service Function Chaining
SR-IOV | Single Root Input/Output Virtualization
TOR | Top of Rack
VF | Virtual Function
VLAN | Virtual LAN (Local Area Network)
VRR | Virtual Router Redundancy
VTEP | Virtual Tunnel End Point
VXLAN | Virtual Extensible LAN

Introduction

The NVIDIA BlueField-3 Data Processing Unit is a powerful infrastructure compute platform designed for high-speed processing of software-defined networking, storage, and cybersecurity. With a capacity of 400 Gb/s, BlueField-3 combines robust computing, high-speed networking, and extensive programmability to deliver hardware-accelerated, software-defined solutions for demanding workloads.

Deploying and managing DPUs and their associated DOCA services, especially at scale, can be quite challenging. Without a proper provisioning and orchestration system, handling the DPU lifecycle and configuring DOCA services place a heavy operational burden on system administrators. The NVIDIA DOCA Platform Framework (DPF) addresses this challenge by streamlining and automating the lifecycle management of DOCA services.

NVIDIA DOCA unlocks the full potential of the BlueField platform, enabling rapid development of applications and services that offload, accelerate, and isolate data center workloads. One example is NVIDIA DOCA SNAP, a DPU storage service designed to accelerate and optimize storage protocols by leveraging the capabilities of NVIDIA BlueField DPUs. DOCA SNAP technology encompasses a family of services that enable hardware-accelerated virtualization of local storage running on NVIDIA BlueField products. In this guide, the DOCA SNAP service presents NFS-based networked storage to the host as a local volume, which the service emulates on the DPU. At its core, DOCA SNAP enables high-performance, low-latency access to storage by allowing applications to interact directly with the raw remote file system volume, bypassing traditional filesystem overhead. As part of the DPF deployment model, the DOCA SNAP solution is composed of multiple functional components packaged into containers, which are deployed across both the x86 Kubernetes management cluster and the DPU Kubernetes cluster.

This reference implementation leverages open-source components and provides an end-to-end walkthrough of the deployment process, including:

  • Infrastructure provisioning with MAAS

  • Integration with NVIDIA’s DPF

  • Deployment and orchestration of DPU-based services inside the Kubernetes cluster

  • Configuration of BlueField devices with NFS (virtio-fs) emulation enabled for the DOCA SNAP service

  • Management of DPU resources and workloads using Kubernetes-native constructs

This guide provides a comprehensive, practical example of installing the DPF system with NVIDIA DOCA SNAP service on a Kubernetes cluster according to the "Storage Development Guide".



This guide uses an NFS server as an example of a storage backend service.
This storage backend is used for demonstration purposes only and is not intended or supported for production use cases.


Solution Architecture

Key Components and Technologies

  • NVIDIA BlueField® Data Processing Unit (DPU)
    The NVIDIA® BlueField® data processing unit (DPU) ignites unprecedented innovation for modern data centers and supercomputing clusters. With its robust compute power and integrated software-defined hardware accelerators for networking, storage, and security, BlueField creates a secure and accelerated infrastructure for any workload in any environment, ushering in a new era of accelerated computing and AI.

  • NVIDIA DOCA Software Framework
    NVIDIA DOCA™ unlocks the potential of the NVIDIA® BlueField® networking platform. By harnessing the power of BlueField DPUs and SuperNICs, DOCA enables the rapid creation of applications and services that offload, accelerate, and isolate data center workloads. It lets developers create software-defined, cloud-native, DPU- and SuperNIC-accelerated services with zero-trust protection, addressing the performance and security demands of modern data centers.

  • NVIDIA ConnectX SmartNICs
    10/25/40/50/100/200 and 400G Ethernet Network Adapters
    The industry-leading NVIDIA® ConnectX® family of smart network interface cards (SmartNICs) offers advanced hardware offloads and accelerations.
    NVIDIA Ethernet adapters enable the highest ROI and lowest Total Cost of Ownership for hyperscale, public and private clouds, storage, machine learning, AI, big data, and telco platforms.

  • NVIDIA LinkX Cables 
    The NVIDIA® LinkX® product family of cables and transceivers provides the industry’s most complete line of 10, 25, 40, 50, 100, 200, and 400GbE Ethernet and 100, 200, and 400Gb/s InfiniBand products for cloud, HPC, hyperscale, enterprise, telco, storage, and artificial intelligence data center applications.

  • NVIDIA Spectrum Ethernet Switches
    Flexible form-factors with 16 to 128 physical ports, supporting 1GbE through 400GbE speeds.
    Based on a ground-breaking silicon technology optimized for performance and scalability, NVIDIA Spectrum switches are ideal for building high-performance, cost-effective, and efficient Cloud Data Center Networks, Ethernet Storage Fabric, and Deep Learning Interconnects. 
    NVIDIA combines the benefits of NVIDIA Spectrum switches, based on an industry-leading application-specific integrated circuit (ASIC) technology, with a wide variety of modern network operating system choices, including NVIDIA Cumulus® Linux, SONiC, and NVIDIA Onyx®.

  • NVIDIA Cumulus Linux 
    NVIDIA® Cumulus® Linux is the industry's most innovative open network operating system that allows you to automate, customize, and scale your data center network like no other.

  • Kubernetes
    Kubernetes is an open-source container orchestration platform for deployment automation, scaling, and management of containerized applications.

  • Kubespray 
    Kubespray is a composition of Ansible playbooks, inventory, provisioning tools, and domain knowledge for generic OS/Kubernetes clusters configuration management tasks and provides:

    • A highly available cluster

    • Composable attributes

    • Support for most popular Linux distributions

  • RDMA 
    RDMA is a technology that allows computers in a network to exchange data without involving the processor, cache or operating system of either computer.
    Like locally based DMA, RDMA improves throughput and performance and frees up compute resources.


Solution Design

Solution Logical Design

The logical design includes the following components: 

  • 1 x Hypervisor node (KVM based) with ConnectX-7

    • 1 x Firewall VM

    • 1 x Jump VM

    • 1 x MAAS VM 

    • 3 x VMs running all K8s management components for Host/DPU clusters

  • 2 x Worker nodes, each with 1 x BlueField-3 NIC

  • Storage Target Node with ConnectX-7 and NFS server apps

  • Single 200 GbE High-Speed (HS) switch

  • 1 GbE Host Management network
    image-2025-9-29_11-32-8.png


SFC Logical Diagram

The DOCA Platform Framework simplifies DPU management by providing orchestration through a K8s API. It handles the provisioning and lifecycle management of DPUs, orchestrates specialized DPU services, and automates service function chaining (SFC) tasks. This ensures seamless deployment of NVIDIA DOCA services, enabling efficient offloading and routing of traffic across the HBN data plane. The SFC logic diagram implemented in this guide is presented below.


image-2025-9-29_11-56-51.png

Volume Emulation Logical Diagram

The following logical diagram shows the main components involved in mounting a volume on the tenant host.

Upon receiving a new request for an emulated virtio-fs volume, the DOCA SNAP components bring an NFS volume to the required DPU K8s worker node through the SNAP NFS client. The DPU then emulates it as a virtio-fs volume, which is mounted manually on the x86 host of that DPU.

image-2025-9-29_12-1-27.png


Firewall Design

The pfSense firewall in this solution serves a dual purpose:

  • Firewall – Provides an isolated environment for the DPF system, ensuring secure operations

  • Router – Enables internet access and connectivity between the host management network and the high-speed network

Port-forwarding rules for SSH and RDP are configured on the firewall to route traffic to the jump node’s IP address in the host management network. From the jump node, administrators can manage and access various devices in the setup, as well as handle the deployment of the Kubernetes (K8s) cluster and DPF components.

The following diagram illustrates the firewall design used in this solution:

image-2025-5-6_11-12-39-1.png



Software Stack Components

image-2025-9-9_9-17-48-1.png

Make sure to use the exact same versions for the software stack as described above.

Bill of Materials


image-2025-10-20_11-0-32-1.png



Deployment and Configuration

Node and Switch Definitions

These are the definitions and parameters used for deploying the demonstrated fabric:


Switch Port Usage

Hostname | Rack ID | Ports
mgmt-switch | 1 | swp1-6
hs-switch | 1 | swp1,11-14,32

Hosts

Rack | Server Type | Server Name | Switch Port | IP and NICs | Default Gateway
Rack1 | Hypervisor Node | hypervisor | mgmt-switch: swp1; hs-switch: swp1 | lab-br (interface eno1): Trusted LAN IP; mgmt-br (interface eno2): -; hs-br (interface ens2f0np0): | Trusted LAN GW
Rack1 | Storage Target Node | target | mgmt-switch: swp4; hs-switch: swp32 | enp1s0f0: 10.0.110.25/24; enp144s0f0np0: 10.0.124.1/24 | 10.0.110.254
Rack1 | Worker Node | worker1 | mgmt-switch: swp2; hs-switch: swp11-swp12; mgmt-switch: swp5 | ens15f0: 10.0.110.21/24; ens5f0np0/ens5f1np1: 10.0.120.0/22; dpubmc: 10.0.110.201/24; dpuoob: 10.0.110.89/24 | 10.0.110.254
Rack1 | Worker Node | worker2 | mgmt-switch: swp3; hs-switch: swp13-swp14; mgmt-switch: swp6 | ens15f0: 10.0.110.22/24; ens5f0np0/ens5f1np1: 10.0.120.0/22; dpubmc: 10.0.110.202/24; dpuoob: 10.0.110.80/24 | 10.0.110.254
Rack1 | Firewall (Virtual) | fw | - | WAN (lab-br): Trusted LAN IP; LAN (mgmt-br): 10.0.110.254/24; OPT1 (hs-br): 172.169.50.1/30 | Trusted LAN GW
Rack1 | Jump Node (Virtual) | jump | - | enp1s0: 10.0.110.253/24 | 10.0.110.254
Rack1 | MAAS (Virtual) | maas | - | enp1s0: 10.0.110.252/24 | 10.0.110.254
Rack1 | Master Node (Virtual) | master1 | - | enp1s0: 10.0.110.1/24 | 10.0.110.254
Rack1 | Master Node (Virtual) | master2 | - | enp1s0: 10.0.110.2/24 | 10.0.110.254
Rack1 | Master Node (Virtual) | master3 | - | enp1s0: 10.0.110.3/24 | 10.0.110.254

Wiring

Hypervisor Node

image-2025-6-8_15-6-14-1.png


Bare Metal Worker Node

image-2025-9-29_10-18-58-1.png


Storage Target Node

image-2025-6-8_15-5-42-1.png



Fabric Configuration

Updating Cumulus Linux

As a best practice, make sure to use the latest released Cumulus Linux NOS version.

For information on how to upgrade Cumulus Linux, refer to the Cumulus Linux User Guide.

Configuring the Cumulus Linux Switch

The SN3700 switch (hs-switch) is configured as follows:

  • The following commands configure BGP unnumbered on hs-switch.

  • Cumulus Linux enables the BGP equal-cost multipathing (ECMP) option by default.


SN3700 Switch Console
nv set bridge domain br_default vlan 10 vni 10
nv set evpn enable on
nv set interface lo ip address 11.0.0.101/32
nv set interface lo type loopback
nv set interface swp1 ip address 172.169.50.2/30
nv set interface swp1 link speed auto
nv set interface swp1-32 type swp
nv set interface swp32 bridge domain br_default access 10
nv set nve vxlan enable on
nv set nve vxlan source address 11.0.0.101
nv set qos roce enable on
nv set qos roce mode lossless
nv set router bgp autonomous-system 65001
nv set router bgp enable on
nv set router bgp graceful-restart mode full
nv set router bgp router-id 11.0.0.101
nv set system hostname hs-switch
nv set vrf default router bgp address-family ipv4-unicast enable on
nv set vrf default router bgp address-family ipv4-unicast redistribute connected enable on
nv set vrf default router bgp address-family ipv4-unicast redistribute static enable on
nv set vrf default router bgp address-family ipv6-unicast enable on
nv set vrf default router bgp address-family ipv6-unicast redistribute connected enable on
nv set vrf default router bgp address-family l2vpn-evpn enable on
nv set vrf default router bgp enable on
nv set vrf default router bgp neighbor swp11 peer-group hbn
nv set vrf default router bgp neighbor swp11 type unnumbered
nv set vrf default router bgp neighbor swp12 peer-group hbn
nv set vrf default router bgp neighbor swp12 type unnumbered
nv set vrf default router bgp neighbor swp13 peer-group hbn
nv set vrf default router bgp neighbor swp13 type unnumbered
nv set vrf default router bgp neighbor swp14 peer-group hbn
nv set vrf default router bgp neighbor swp14 type unnumbered
nv set vrf default router bgp path-selection multipath aspath-ignore on
nv set vrf default router bgp peer-group hbn address-family l2vpn-evpn enable on
nv set vrf default router bgp peer-group hbn remote-as external
nv set vrf default router static 0.0.0.0/0 address-family ipv4-unicast
nv set vrf default router static 0.0.0.0/0 via 172.169.50.1 type ipv4-address
nv set vrf default router static 10.0.110.0/24 address-family ipv4-unicast
nv set vrf default router static 10.0.110.0/24 via 172.169.50.1 type ipv4-address

nv config apply -y

The SN2201 switch (mgmt-switch) is configured as follows:

SN2201 Switch Console
nv set bridge domain br_default untagged 1
nv set interface swp1-6 link state up
nv set interface swp1-6 type swp
nv set interface swp1-6 bridge domain br_default
nv config apply -y

Installation and Configuration

Make sure that the BIOS settings on the worker node servers have SR-IOV enabled and that the servers are tuned for maximum performance.

All worker nodes must have the same PCIe placement for the BlueField-3 NIC and must display the same interface name.

Make sure that you have the BMC and OOB MAC addresses of each DPU.

Use this Reference Deployment Guide (RDG) for:

  • Host Configuration

  • K8s Cluster Deployment and Configuration

  • DPF Installation


DPU Service Installation

Prerequisites

Before starting the deployment, a few adjustments are required.

  1. On each K8s control plane node, add a route to the Storage Target Node (the example below is for master1; adjust the MAC address and IP address for each node):

    network:
      version: 2
      ethernets:
        enp1s0:
          match:
            macaddress: "52:54:00:cf:1d:38"
          addresses:
          - "10.0.110.1/24"
          nameservers:
            addresses:
            - 10.0.110.252
            search:
            - dpf.rdg.local.domain
          set-name: "enp1s0"
          mtu: 1500
          routes:
          - to: default
            via: 10.0.110.254
            metric: 50        
          - to: 10.0.124.1
            via: 10.0.110.25     
    


    Apply the new configuration by running the following command:

         netplan apply
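
One way to confirm that the route took effect is to ask the kernel which next hop it would use for the data address of the Storage Target Node. The following is a minimal sketch, assuming the usual iproute2 `ip route get` output format; `next_hop` is a hypothetical helper name, and the sample line is illustrative rather than captured from the lab.

```shell
# Hypothetical helper: print the next hop (the address after "via")
# from the first line of `ip route get` output read on stdin.
next_hop() {
  read -r line || return 1
  # Split the line into words and look for the word that follows "via".
  set -- $line
  while [ "$#" -gt 1 ]; do
    if [ "$1" = "via" ]; then
      printf '%s\n' "$2"
      return 0
    fi
    shift
  done
  return 1
}

# Illustrative sample line; prints 10.0.110.25
printf '10.0.124.1 via 10.0.110.25 dev enp1s0 src 10.0.110.1\n' | next_hop

# On a control plane node, the real check would be:
#   ip route get 10.0.124.1 | next_hop
```

If the printed address is not 10.0.110.25, the traffic to the NFS server still follows the default route through the firewall.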

  2. Build and push the nfs-csi-controller Helm chart.

    Run the following commands from the DPF repo root (~/doca-platform).

    export HELM_REGISTRY=<your-registry>   # for example: export HELM_REGISTRY="oci://cr.example.com"
    export TAG="v0.1.0"
    make helm-package-nfs-csi-controller
    make helm-push-nfs-csi-controller

    Later, your Helm registry will be referenced in both dpuservicetemplate_nfs-csi-controller.yaml and dpuservicetemplate_nfs-csi-controller-dpu.yaml.

    Install Go

    The make targets above require Go on your build machine, so install it first if it is missing.
    On Ubuntu 24.04, Go (Golang) can be installed in two ways: with the apt package manager or with snap.

    • Installing Go using apt (from Ubuntu repositories):

      sudo apt install golang-go -y

    • Installing Go using snap (for potentially newer versions):

      sudo snap install go --classic

      The --classic argument grants the snap package classic confinement, allowing it broader access to the system.

    • Verify installation:

      go version


  3. Download the snap-virtiofs.zip file, which contains the YAML deployment files required for this guide, then extract it.

    Jump Node Console

    $ cd ~
    $ unzip snap-virtiofs.zip 
    $ cd snap-virtiofs
    
    $ ls -R manifests
    manifests:
    00-env-vars  
    00-high-speed-switch-configuration  
    04-dpudeployment-installation
    
    manifests/00-env-vars:
    envvars.env
    
    manifests/00-high-speed-switch-configuration:
    switch-hs.conf
    
    manifests/04-dpudeployment-installation:
    bfb.yaml                                             dpuserviceconfiguration_snap-node-driver.yaml        dpuservicetemplate_snap-node-driver.yaml
    dpudeployment.yaml                                   dpuservicecredentialrequest_nfs-csi-controller.yaml  dpustoragepolicy_policy-fs.yaml
    dpuflavor.yaml                                       dpuservicecredentialrequest_snap-controller.yaml     dpustoragevendor_nfs-csi.yaml
    dpuserviceconfiguration_doca-snap.yaml               dpuservicetemplate_doca-snap.yaml                    hbn-dpuserviceconfig.yaml
    dpuserviceconfiguration_fs-storage-dpu-plugin.yaml   dpuservicetemplate_fs-storage-dpu-plugin.yaml        hbn-dpuservicetemplate.yaml
    dpuserviceconfiguration_nfs-csi-controller-dpu.yaml  dpuservicetemplate_nfs-csi-controller-dpu.yaml       hbn-ipam.yaml
    dpuserviceconfiguration_nfs-csi-controller.yaml      dpuservicetemplate_nfs-csi-controller.yaml           hbn-loopback-ipam.yaml
    dpuserviceconfiguration_snap-configuration.yaml      dpuservicetemplate_snap-configuration.yaml           physical-ifaces.yaml
    dpuserviceconfiguration_snap-controller.yaml         dpuservicetemplate_snap-controller.yaml              storage-ipam.yaml
    dpuserviceconfiguration_snap-host-controller.yaml    dpuservicetemplate_snap-host-controller.yaml
    
    
    


  4. Modify the variables in manifests/00-env-vars/envvars.env to fit your environment, then source the file.

    Replace the values in the file with values that match your setup, paying particular attention to DPUCLUSTER_INTERFACE and BMC_ROOT_PASSWORD.

    These environment variables must be set before the manifests are applied in the following steps:

    $ source manifests/00-env-vars/envvars.env
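
Because envsubst silently replaces an unset variable with an empty string, it can help to confirm that the variables are populated before any manifest is applied. The following is a minimal sketch; `missing_vars` is a hypothetical helper, and only the two variable names called out above are taken from this guide.

```shell
# Hypothetical helper: print every name passed as an argument
# whose variable is unset or empty in the current shell.
missing_vars() {
  for name in "$@"; do
    eval "value=\${$name:-}"
    [ -n "$value" ] || printf '%s\n' "$name"
  done
}

# Run after sourcing envvars.env; no output means both variables are set.
missing_vars DPUCLUSTER_INTERFACE BMC_ROOT_PASSWORD
```

Any name that is printed needs a value in envvars.env before you continue.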

Adjust the DPUDeployment, DPUServiceConfiguration, DPUServiceTemplate, and other necessary objects.

  1. Before deploying the objects under the manifests/04-dpudeployment-installation directory, a few adjustments need to be made.


    1. Review dpudeployment.yaml and make sure that it references the DPUFlavor suited for SNAP:

      YAML
      ---
      apiVersion: svc.dpu.nvidia.com/v1alpha1
      kind: DPUDeployment
      metadata:
        name: hbn-storage
        namespace: dpf-operator-system
      spec:
        dpus:
          bfb: bf-bundle
          flavor: dpf-provisioning-hbn-storage
          nodeEffect:
            noEffect: true    
          dpuSets:
          - nameSuffix: "dpuset1"
            nodeSelector:
              matchLabels:
                feature.node.kubernetes.io/dpu-enabled: "true"
        services:
          doca-hbn:
            serviceTemplate: doca-hbn
            serviceConfiguration: doca-hbn
          snap-host-controller:
            serviceTemplate: snap-host-controller
            serviceConfiguration: snap-host-controller
          snap-controller:
            serviceTemplate: snap-controller
            serviceConfiguration: snap-controller
          snap-node-driver:
            serviceTemplate: snap-node-driver
            serviceConfiguration: snap-node-driver
          snap-configuration:
            serviceTemplate: snap-configuration
            serviceConfiguration: snap-configuration
          doca-snap:
            serviceTemplate: doca-snap
            serviceConfiguration: doca-snap
          fs-storage-dpu-plugin:
            serviceTemplate: fs-storage-dpu-plugin
            serviceConfiguration: fs-storage-dpu-plugin
          nfs-csi-controller:
            serviceTemplate: nfs-csi-controller
            serviceConfiguration: nfs-csi-controller
          nfs-csi-controller-dpu:
            serviceTemplate: nfs-csi-controller-dpu
            serviceConfiguration: nfs-csi-controller-dpu
        serviceChains:
          switches:
            - ports:
              - serviceInterface:
                  matchLabels:
                    uplink: p0
              - service:
                  name: doca-hbn
                  interface: p0_if
            - ports:
              - serviceInterface:
                  matchLabels:
                    uplink: p1
              - service:
                  name: doca-hbn
                  interface: p1_if
            - ports:
              - serviceInterface:
                  matchLabels:
                    interface: hostpf0
              - service:
                  interface: pf0hpf_if
                  name: doca-hbn              
            - ports:
              - service:
                  name: doca-snap
                  interface: app_sf
                  ipam:
                    matchLabels:
                      svc.dpu.nvidia.com/pool: storage-pool
              - service:
                  name: fs-storage-dpu-plugin
                  interface: app_sf
                  ipam:
                    matchLabels:
                      svc.dpu.nvidia.com/pool: storage-pool
              - service:
                  name: doca-hbn
                  interface: snap_if
      


    2. Set the ipv4Subnet values for storage-pool. Note that the gateway IP must be the address assigned to the data interface of the Storage Target Node (10.0.124.1 in this setup).

      The IPAM objects are defined in the following configuration files:
        hbn-ipam.yaml
        hbn-loopback-ipam.yaml
        storage-ipam.yaml

      YAML
      ---
      apiVersion: svc.dpu.nvidia.com/v1alpha1
      kind: DPUServiceIPAM
      metadata:
        name: pool1
        namespace: dpf-operator-system
      spec:
        ipv4Network:
          network: "10.0.120.0/22"
          gatewayIndex: 1
          prefixSize: 29
      
      ---
      apiVersion: svc.dpu.nvidia.com/v1alpha1
      kind: DPUServiceIPAM
      metadata:
        name: loopback
        namespace: dpf-operator-system
      spec:
        ipv4Network:
          network: "11.0.0.0/24"
          prefixSize: 32
      ---
      apiVersion: svc.dpu.nvidia.com/v1alpha1
      kind: DPUServiceIPAM
      metadata:
        name: storage-pool
        namespace: dpf-operator-system
      spec:
        metadata:
          labels:
            svc.dpu.nvidia.com/pool: storage-pool
        ipv4Subnet:
          subnet: "10.0.124.0/24"
          gateway: "10.0.124.1"
          perNodeIPCount: 8
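
As a rough sanity check on the pool sizes above, the arithmetic can be sketched in shell. This assumes that prefixSize is the prefix length of the block handed to each node and that perNodeIPCount is the number of addresses reserved per node, which is how the field names read. The helper names are hypothetical, and reserved addresses such as the gateway are not subtracted.

```shell
# Number of blocks when a network with prefix length $1
# is split into subnets with prefix length $2.
blocks_per_network() { echo $(( 1 << ($2 - $1) )); }

# Number of addresses in a subnet with prefix length $1.
addresses_per_block() { echo $(( 1 << (32 - $1) )); }

blocks_per_network 22 29   # pool1: 10.0.120.0/22 split into /29 blocks, prints 128
addresses_per_block 29     # addresses in each /29 block, prints 8
echo $(( 256 / 8 ))        # storage-pool: a /24 divided by perNodeIPCount 8, prints 32
```

Under these assumptions, both pools leave ample headroom for the two worker nodes in this setup.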
      
    3. Set the storageClasses values in dpuserviceconfiguration_nfs-csi-controller-dpu.yaml according to your environment. The corresponding NFS share directories must already exist on the Storage Target Node. You can configure more than one storage class:

      YAML
      ---
      apiVersion: svc.dpu.nvidia.com/v1alpha1
      kind: DPUServiceConfiguration
      metadata:
        name: nfs-csi-controller-dpu
        namespace: dpf-operator-system
      spec:
        deploymentServiceName: nfs-csi-controller-dpu
        serviceConfiguration:
          helmChart:
            values:
              dpu:
                enabled: true
                storageClasses:
                  # List of storage classes to be created for nfs-csi
                  # These StorageClass names should be used in the StorageVendor settings
                  - name: nfs-csi
                    parameters:
                      server: 10.0.124.1
                      share: /srv/nfs/share
                  - name: nfs-csi-nvme
                    parameters:
                      server: 10.0.124.1
                      share: /srv/nfs/nvmeshare                  
                rbacRoles:
                  nfsCsiController:
                    # the name of the service account for nfs-csi-controller
                    # this value must be aligned with the value from the DPUServiceCredentialRequest
                    serviceAccount: nfs-csi-controller-sa
      
    4. The rest of the configuration files in the manifests/04-dpudeployment-installation/ folder remain the same, including:

      • BFB provisioning YAML:
            bfb.yaml

      • DPUFlavor provisioning YAML:
            dpuflavor.yaml

      • DOCA-SNAP DPUService deployment and configuration YAMLs:
            dpuserviceconfiguration_doca-snap.yaml
            dpuservicetemplate_doca-snap.yaml

      • HBN DPUService deployment and configuration YAMLs:
            hbn-dpuserviceconfig.yaml
            hbn-dpuservicetemplate.yaml

      • SNAP configuration DPUService deployment and configuration YAMLs:
            dpuserviceconfiguration_snap-configuration.yaml
            dpuservicetemplate_snap-configuration.yaml

      • SNAP controller DPUService deployment and configuration YAMLs:
            dpuserviceconfiguration_snap-controller.yaml
            dpuservicetemplate_snap-controller.yaml

      • SNAP host controller DPUService deployment and configuration YAMLs:
            dpuserviceconfiguration_snap-host-controller.yaml
            dpuservicetemplate_snap-host-controller.yaml

      • SNAP node driver DPUService deployment and configuration YAMLs:
            dpuserviceconfiguration_snap-node-driver.yaml
            dpuservicetemplate_snap-node-driver.yaml

      • NFS CSI controller DPUService deployment and configuration YAMLs:
            dpuserviceconfiguration_nfs-csi-controller.yaml
            dpuservicetemplate_nfs-csi-controller.yaml (requires updating the Helm registry reference)

      • NFS CSI DPU controller DPUService deployment and configuration YAMLs:
            dpuserviceconfiguration_nfs-csi-controller-dpu.yaml
            dpuservicetemplate_nfs-csi-controller-dpu.yaml (requires updating the Helm registry reference)

      • Storage vendor DPU plugin DPUService deployment and configuration YAMLs:
            dpuserviceconfiguration_fs-storage-dpu-plugin.yaml
            dpuservicetemplate_fs-storage-dpu-plugin.yaml

      • DPUServiceIPAM deployment and configuration YAMLs:
            hbn-loopback-ipam.yaml
            storage-ipam.yaml
            hbn-ipam.yaml

      • DPUServiceInterfaces for physical ports on the DPU:
            physical-ifaces.yaml

      • DPUServiceCredentialRequests to allow cross-cluster communication:
            dpuservicecredentialrequest_nfs-csi-controller.yaml
            dpuservicecredentialrequest_snap-controller.yaml

      • StoragePolicy deployment and configuration YAMLs:
            dpustoragepolicy_policy-fs.yaml

      • StorageVendor deployment and configuration YAMLs:
            dpustoragevendor_nfs-csi.yaml

  2. Apply all of the YAML files mentioned above using the following command.

    $ cd manifests/04-dpudeployment-installation
    $ cat *.yaml | envsubst | kubectl apply -f -
    bfb.provisioning.dpu.nvidia.com/bf-bundle created
    dpudeployment.svc.dpu.nvidia.com/hbn-storage created
    dpuflavor.provisioning.dpu.nvidia.com/dpf-provisioning-hbn-storage created
    dpuserviceconfiguration.svc.dpu.nvidia.com/doca-snap created
    dpuserviceconfiguration.svc.dpu.nvidia.com/fs-storage-dpu-plugin created
    dpuserviceconfiguration.svc.dpu.nvidia.com/nfs-csi-controller-dpu created
    dpuserviceconfiguration.svc.dpu.nvidia.com/nfs-csi-controller created
    dpuserviceconfiguration.svc.dpu.nvidia.com/snap-configuration created
    dpuserviceconfiguration.svc.dpu.nvidia.com/snap-controller created
    dpuserviceconfiguration.svc.dpu.nvidia.com/snap-host-controller created
    dpuserviceconfiguration.svc.dpu.nvidia.com/snap-node-driver created
    dpuservicecredentialrequest.svc.dpu.nvidia.com/nfs-csi-controller-credentials created
    dpuservicecredentialrequest.svc.dpu.nvidia.com/snap-controller-credentials created
    dpuservicetemplate.svc.dpu.nvidia.com/doca-snap created
    dpuservicetemplate.svc.dpu.nvidia.com/fs-storage-dpu-plugin created
    dpuservicetemplate.svc.dpu.nvidia.com/nfs-csi-controller-dpu created
    dpuservicetemplate.svc.dpu.nvidia.com/nfs-csi-controller created
    dpuservicetemplate.svc.dpu.nvidia.com/snap-configuration created
    dpuservicetemplate.svc.dpu.nvidia.com/snap-controller created
    dpuservicetemplate.svc.dpu.nvidia.com/snap-host-controller created
    dpuservicetemplate.svc.dpu.nvidia.com/snap-node-driver created
    dpustoragepolicy.storage.dpu.nvidia.com/policy-fs created
    dpustoragepolicy.storage.dpu.nvidia.com/policy-fs-nvme created
    dpustoragevendor.storage.dpu.nvidia.com/nfs-csi created
    dpustoragevendor.storage.dpu.nvidia.com/nfs-csi-nvme created
    dpuserviceconfiguration.svc.dpu.nvidia.com/doca-hbn created
    dpuservicetemplate.svc.dpu.nvidia.com/doca-hbn created
    dpuserviceipam.svc.dpu.nvidia.com/pool1 created
    dpuserviceipam.svc.dpu.nvidia.com/loopback created
    dpuserviceinterface.svc.dpu.nvidia.com/p0 created
    dpuserviceinterface.svc.dpu.nvidia.com/p1 created
    dpuserviceinterface.svc.dpu.nvidia.com/hostpf0 created
    dpuserviceipam.svc.dpu.nvidia.com/storage-pool created
    


  3. To follow the progress of DPU provisioning, run the following command to check its current phase:

    Jump Node Console

    $ watch -n10 "kubectl describe dpu -n dpf-operator-system | grep 'Node Name\|Type\|Last\|Phase'"
    Every 10.0s: kubectl describe dpu -n dpf-operator-system | grep 'Node Name\|Type\|Last\|Phase'
      Dpu Node Name:                                                     dpu-node-mt2334xz09f0
        Last Transition Time:  2025-09-28T11:43:57Z
        Type:                  Initialized
        Last Transition Time:  2025-09-28T11:43:57Z
        Type:                  BFBReady
        Last Transition Time:  2025-09-28T11:43:57Z
        Type:                  NodeEffectReady
        Last Transition Time:  2025-09-28T11:43:59Z
        Type:                  InterfaceInitialized
        Last Transition Time:  2025-09-28T11:44:00Z
        Type:                  FWConfigured
        Last Transition Time:  2025-09-28T11:44:00Z
        Type:                  BFBPrepared
        Last Transition Time:  2025-09-28T11:56:32Z
        Type:                  OSInstalled
        Last Transition Time:  2025-09-28T11:59:32Z
        Type:                  Rebooted
      Phase:  Rebooting
      Dpu Node Name:                                                     dpu-node-mt2334xz09f1
        Last Transition Time:  2025-09-28T11:43:57Z
        Type:                  Initialized
        Last Transition Time:  2025-09-28T11:43:57Z
        Type:                  BFBReady
        Last Transition Time:  2025-09-28T11:43:57Z
        Type:                  NodeEffectReady
        Last Transition Time:  2025-09-28T11:43:58Z
        Type:                  InterfaceInitialized
        Last Transition Time:  2025-09-28T11:44:00Z
        Type:                  FWConfigured
        Last Transition Time:  2025-09-28T11:44:00Z
        Type:                  BFBPrepared
        Last Transition Time:  2025-09-28T11:56:46Z
        Type:                  OSInstalled
        Last Transition Time:  2025-09-28T11:59:49Z
        Type:                  Rebooted
      Phase:  Rebooting                             
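
The watch output above is verbose when only the phase matters. As a minimal sketch, assuming the `Dpu Node Name:` and `Phase:` labels appear exactly as in the output shown, the helper below condenses the same output to one line per DPU. The function name `summarize_dpu_phases` is hypothetical and is not part of DPF.

```shell
# Condense `kubectl describe dpu` output (read from stdin) to one
# "<dpu-node-name> <phase>" line per DPU.
# Assumes the "Dpu Node Name:" and "Phase:" labels shown in this guide.
summarize_dpu_phases() {
  awk '
    /Dpu Node Name:/       { node = $NF }        # remember the current DPU node
    /^[[:space:]]*Phase:/  { print node, $NF }   # emit node and its phase
  '
}

# Example (not executed here):
#   kubectl describe dpu -n dpf-operator-system | summarize_dpu_phases
```

Once every line reports `Rebooting`, the hosts are ready for the manual power cycle described in the next step.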
    
    


  4. Wait for the Rebooted stage, and then manually power cycle the bare-metal host.
    After the DPU is up, run the following command for each DPU worker:

    Jump Node Console

    kubectl annotate dpunodes dpu-node-mt2334xz09f0 -n dpf-operator-system  provisioning.dpu.nvidia.com/dpunode-external-reboot-required-
    kubectl annotate dpunodes dpu-node-mt2334xz09f1 -n dpf-operator-system  provisioning.dpu.nvidia.com/dpunode-external-reboot-required-
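
With more than a few DPU workers, the annotation can be removed in a loop. The sketch below assumes the same namespace and annotation key as the commands above; the function name `ack_external_reboot` and the `KUBECTL` override variable are hypothetical helpers introduced here for illustration. A trailing `-` after an annotation key tells `kubectl annotate` to remove that annotation.

```shell
# Remove the external-reboot-required annotation from every DPUNode
# passed as an argument.
# KUBECTL is a hypothetical override: set KUBECTL="echo kubectl" for a dry run.
ack_external_reboot() {
  for node in "$@"; do
    ${KUBECTL:-kubectl} annotate dpunodes "$node" -n dpf-operator-system \
      "provisioning.dpu.nvidia.com/dpunode-external-reboot-required-"
  done
}

# Example (not executed here):
#   ack_external_reboot dpu-node-mt2334xz09f0 dpu-node-mt2334xz09f1
```

Running the function first with `KUBECTL="echo kubectl"` prints the commands without touching the cluster, which is a convenient way to review the node list.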
    


  5. At this point, the DPU workers should be added to the cluster. As they are being added to the cluster, the DPUs are provisioned.

    Jump Node Console

    $ watch -n10 "kubectl describe dpu -n dpf-operator-system | grep 'Node Name\|Type\|Last\|Phase'"
    Every 10.0s: kubectl describe dpu -n dpf-operator-system | grep 'Node Name\|Type\|Last\|Phase'                                                                     setup5-jump: Wed May 21 10:45:44 2025
      Dpu Node Name:                                                     dpu-node-mt2334xz09f0
        Type:       InternalIP
        Type:       Hostname
        Last Transition Time:  2025-09-28T11:43:57Z
        Type:                  Initialized
        Last Transition Time:  2025-09-28T11:43:57Z
        Type:                  BFBReady
        Last Transition Time:  2025-09-28T11:43:57Z
        Type:                  NodeEffectReady
        Last Transition Time:  2025-09-28T11:43:59Z
        Type:                  InterfaceInitialized
        Last Transition Time:  2025-09-28T11:44:00Z
        Type:                  FWConfigured
        Last Transition Time:  2025-09-28T11:44:00Z
        Type:                  BFBPrepared
        Last Transition Time:  2025-09-28T11:56:32Z
        Type:                  OSInstalled
        Last Transition Time:  2025-09-28T12:37:26Z
        Type:                  Rebooted
        Last Transition Time:  2025-09-28T12:37:26Z
        Type:                  DPUClusterReady
        Last Transition Time:  2025-09-28T12:37:26Z
        Type:                  Ready
      Phase:  Ready
      Dpu Node Name:                                                     dpu-node-mt2334xz09f1
        Type:       InternalIP
        Type:       Hostname
        Last Transition Time:  2025-09-28T11:43:57Z
        Type:                  Initialized
        Last Transition Time:  2025-09-28T11:43:57Z
        Type:                  BFBReady
        Last Transition Time:  2025-09-28T11:43:57Z
        Type:                  NodeEffectReady
        Last Transition Time:  2025-09-28T11:43:58Z
        Type:                  InterfaceInitialized
        Last Transition Time:  2025-09-28T11:44:00Z
        Type:                  FWConfigured
        Last Transition Time:  2025-09-28T11:44:00Z
        Type:                  BFBPrepared
        Last Transition Time:  2025-09-28T11:56:46Z
        Type:                  OSInstalled
        Last Transition Time:  2025-09-28T12:37:26Z
        Type:                  Rebooted
        Last Transition Time:  2025-09-28T13:38:23Z
        Type:                  DPUClusterReady
        Last Transition Time:  2025-09-28T13:38:23Z
        Type:                  Ready
      Phase:  Ready
    


  6. Finally, validate that all the different DPU-related objects are now in the Ready state:

    Jump Node Console

    $ kubectl get secrets -n dpu-cplane-tenant1 dpu-cplane-tenant1-admin-kubeconfig -o json | jq -r '.data["admin.conf"]' | base64 --decode > /home/depuser/dpu-cluster.config
    $ echo "alias ki='KUBECONFIG=/home/depuser/dpu-cluster.config kubectl'" >> ~/.bashrc
    
    $ kubectl -n dpf-operator-system exec deploy/dpf-operator-controller-manager -- /dpfctl describe dpudeployments
    NAME                                                 NAMESPACE            STATUS       REASON    SINCE  MESSAGE                                                                                           
    DPFOperatorConfig/dpfoperatorconfig                  dpf-operator-system  Ready: True  Success   28m                                                                                                       
    └─DPUDeployments                                                                                                                                                                                           
      └─DPUDeployment/hbn-storage                        dpf-operator-system  Ready: True  Success   27m                                                                                                       
        ├─DPUServiceChains                                                                                                                                                                                     
        │ └─DPUServiceChain/hbn-storage-b782l            dpf-operator-system  Ready: True  Success   144m
        ├─DPUServiceInterfaces                                                                                                                                                                                 
        │ └─6 DPUServiceInterfaces...                    dpf-operator-system  Ready: True  Success   144m   See doca-hbn-p0-if-z2g6t, doca-hbn-p1-if-pbp47, doca-hbn-pf0hpf-if-qnzsw, doca-hbn-snap-if-qs22b,  
        │                                                                                                   doca-snap-app-sf-72zh7, fs-storage-dpu-plugin-app-sf-pcvc5                                         
        ├─DPUSets                                                                                                                                                                                              
        │ └─DPUSet/hbn-storage-dpuset1                   dpf-operator-system                                                                                                                                   
        │   └─BFB/bf-bundle                              dpf-operator-system  Ready: True  Ready     144m   File: bf-bundle-3.0.0-135_25.04_ubuntu-22.04_prod.bfb, DOCA: 3.0.0                                 
        │   └─DPUs                                                                                                                                                                                             
        │     └─2 DPUs...                                dpf-operator-system  Ready: True  DPUReady  90m    See dpu-node-mt2334xz09f0-mt2334xz09f0, dpu-node-mt2334xz09f1-mt2334xz09f1                         
        └─Services                                                                                                                                                                                             
          ├─DPUServiceTemplates                                                                                                                                                                                
          │ ├─DPUServiceTemplate/doca-hbn                dpf-operator-system  Ready: True  Success   144m                                                                                                      
          │ ├─DPUServiceTemplate/doca-snap               dpf-operator-system  Ready: True  Success   144m                                                                                                      
          │ ├─DPUServiceTemplate/fs-storage-dpu-plugin   dpf-operator-system  Ready: True  Success   144m                                                                                                      
          │ ├─DPUServiceTemplate/nfs-csi-controller      dpf-operator-system  Ready: True  Success   144m                                                                                                      
          │ ├─DPUServiceTemplate/nfs-csi-controller-dpu  dpf-operator-system  Ready: True  Success   144m                                                                                                      
          │ ├─DPUServiceTemplate/snap-configuration      dpf-operator-system  Ready: True  Success   144m                                                                                                      
          │ ├─DPUServiceTemplate/snap-controller         dpf-operator-system  Ready: True  Success   144m                                                                                                      
          │ ├─DPUServiceTemplate/snap-host-controller    dpf-operator-system  Ready: True  Success   144m                                                                                                      
          │ └─DPUServiceTemplate/snap-node-driver        dpf-operator-system  Ready: True  Success   144m                                                                                                      
          └─DPUServices                                                                                                                                                                                        
            └─9 DPUServices...                           dpf-operator-system  Ready: True  Success   144m   See doca-hbn-k9xdp, doca-snap-nnj2t, fs-storage-dpu-plugin-cqqfn, nfs-csi-controller-dpu-nzrqn,    
                                                                                                            nfs-csi-controller-w992c, snap-configuration-vz4gc, snap-controller-jr5c7,                         
                                                                                                            snap-host-controller-z7jds, snap-node-driver-gtsds       
    
    $ kubectl get dpu -A
    NAMESPACE             NAME                                 READY   PHASE   AGE
    dpf-operator-system   dpu-node-mt2334xz09f0-mt2334xz09f0   True    Ready   161m
    dpf-operator-system   dpu-node-mt2334xz09f1-mt2334xz09f1   True    Ready   161m
    
    $ kubectl wait --for=condition=ready --namespace dpf-operator-system dpu --all
    dpu.provisioning.dpu.nvidia.com/dpu-node-mt2334xz09f0-mt2334xz09f0 condition met
    dpu.provisioning.dpu.nvidia.com/dpu-node-mt2334xz09f1-mt2334xz09f1 condition met
    
    $ ki get pod -o wide -A
    NAMESPACE             NAME                                                              READY   STATUS    RESTARTS   AGE    IP            NODE                                 NOMINATED NODE   READINESS GATES
    dpf-operator-system   dpu-cplane-tenant1-doca-hbn-k9xdp-ds-bhtcx                        2/2     Running   0          49m    10.244.6.20   dpu-node-mt2334xz09f1-mt2334xz09f1   <none>           <none>
    dpf-operator-system   dpu-cplane-tenant1-doca-hbn-k9xdp-ds-q5kb5                        2/2     Running   0          110m   10.244.4.10   dpu-node-mt2334xz09f0-mt2334xz09f0   <none>           <none>
    dpf-operator-system   dpu-cplane-tenant1-doca-snap-nnj2t-8g9pb                          1/1     Running   0          49m    10.244.6.22   dpu-node-mt2334xz09f1-mt2334xz09f1   <none>           <none>
    dpf-operator-system   dpu-cplane-tenant1-doca-snap-nnj2t-zg8s8                          1/1     Running   0          110m   10.244.4.11   dpu-node-mt2334xz09f0-mt2334xz09f0   <none>           <none>
    dpf-operator-system   dpu-cplane-tenant1-fs-storage-dpu-plugin-cqqfn-fs-storage-79nb4   1/1     Running   0          110m   10.244.4.12   dpu-node-mt2334xz09f0-mt2334xz09f0   <none>           <none>
    dpf-operator-system   dpu-cplane-tenant1-fs-storage-dpu-plugin-cqqfn-fs-storage-vbr4j   1/1     Running   0          49m    10.244.6.21   dpu-node-mt2334xz09f1-mt2334xz09f1   <none>           <none>
    dpf-operator-system   dpu-cplane-tenant1-nvidia-k8s-ipam-controller-6cb8f65fc5-lnxcm    1/1     Running   0          7d1h   10.244.4.2    dpu-node-mt2334xz09f0-mt2334xz09f0   <none>           <none>
    dpf-operator-system   dpu-cplane-tenant1-nvidia-k8s-ipam-node-ds-c8drf                  1/1     Running   0          49m    10.244.6.8    dpu-node-mt2334xz09f1-mt2334xz09f1   <none>           <none>
    dpf-operator-system   dpu-cplane-tenant1-nvidia-k8s-ipam-node-ds-f7tkd                  1/1     Running   0          113m   10.244.4.6    dpu-node-mt2334xz09f0-mt2334xz09f0   <none>           <none>
    dpf-operator-system   dpu-cplane-tenant1-ovs-cni-arm64-5nhvq                            1/1     Running   0          49m    10.0.110.80   dpu-node-mt2334xz09f1-mt2334xz09f1   <none>           <none>
    dpf-operator-system   dpu-cplane-tenant1-ovs-cni-arm64-z97g8                            1/1     Running   0          113m   10.0.110.89   dpu-node-mt2334xz09f0-mt2334xz09f0   <none>           <none>
    dpf-operator-system   dpu-cplane-tenant1-sfc-controller-node-ds-69blq                   1/1     Running   0          49m    10.244.6.10   dpu-node-mt2334xz09f1-mt2334xz09f1   <none>           <none>
    dpf-operator-system   dpu-cplane-tenant1-sfc-controller-node-ds-w27fn                   1/1     Running   0          113m   10.244.4.4    dpu-node-mt2334xz09f0-mt2334xz09f0   <none>           <none>
    dpf-operator-system   dpu-cplane-tenant1-snap-node-driver-gtsds-5h8b7                   1/1     Running   0          49m    10.244.6.11   dpu-node-mt2334xz09f1-mt2334xz09f1   <none>           <none>
    dpf-operator-system   dpu-cplane-tenant1-snap-node-driver-gtsds-6h7z5                   1/1     Running   0          110m   10.244.4.7    dpu-node-mt2334xz09f0-mt2334xz09f0   <none>           <none>
    dpf-operator-system   kube-flannel-ds-fzplj                                             1/1     Running   0          113m   10.0.110.89   dpu-node-mt2334xz09f0-mt2334xz09f0   <none>           <none>
    dpf-operator-system   kube-flannel-ds-xdkpk                                             1/1     Running   0          49m    10.0.110.80   dpu-node-mt2334xz09f1-mt2334xz09f1   <none>           <none>
    dpf-operator-system   kube-multus-ds-djwr5                                              1/1     Running   0          49m    10.0.110.80   dpu-node-mt2334xz09f1-mt2334xz09f1   <none>           <none>
    dpf-operator-system   kube-multus-ds-lkh2w                                              1/1     Running   0          113m   10.0.110.89   dpu-node-mt2334xz09f0-mt2334xz09f0   <none>           <none>
    dpf-operator-system   kube-sriov-device-plugin-b5sm9                                    1/1     Running   0          49m    10.0.110.80   dpu-node-mt2334xz09f1-mt2334xz09f1   <none>           <none>
    dpf-operator-system   kube-sriov-device-plugin-dnl2s                                    1/1     Running   0          113m   10.0.110.89   dpu-node-mt2334xz09f0-mt2334xz09f0   <none>           <none>
    kube-system           coredns-796d84c46b-c2wrm                                          1/1     Running   0          7d1h   10.244.4.5    dpu-node-mt2334xz09f0-mt2334xz09f0   <none>           <none>
    kube-system           coredns-796d84c46b-dhtw6                                          1/1     Running   0          7d1h   10.244.4.3    dpu-node-mt2334xz09f0-mt2334xz09f0   <none>           <none>
    kube-system           kube-proxy-sf5dc                                                  1/1     Running   0          49m    10.0.110.80   dpu-node-mt2334xz09f1-mt2334xz09f1   <none>           <none>
    kube-system           kube-proxy-zg5mf                                                  1/1     Running   0          113m   10.0.110.89   dpu-node-mt2334xz09f0-mt2334xz09f0   <none>           <none>
    


    Congratulations, the DPF system with SNAP service has been successfully installed!

    Please reboot all Worker nodes in order to activate the latest BlueField firmware settings!
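
The readiness checks in step 6 can also be scripted rather than read by eye. The following is a minimal sketch that assumes the column layouts shown in the `kubectl get dpu -A` and `ki get pod -o wide -A` outputs above; the function names are hypothetical. Both helpers read the command output from stdin and print only the objects that need attention, so empty output means everything is healthy.

```shell
# Print every DPU whose READY column is not "True" or whose PHASE is not "Ready".
# Input: output of `kubectl get dpu -A`, header line included.
list_unready_dpus() {
  awk 'NR > 1 && ($3 != "True" || $4 != "Ready") { print $2 }'
}

# Print "<namespace>/<pod>" for every pod that is not fully ready and Running.
# Input: output of `kubectl get pod -A` (or the `ki` alias), header included.
# Note: pods in the Completed state are also reported by this simple check.
list_unhealthy_pods() {
  awk 'NR > 1 {
    split($3, ready, "/")                      # READY column, for example 2/2
    if (ready[1] != ready[2] || $4 != "Running") print $1 "/" $2
  }'
}

# Examples (not executed here):
#   kubectl get dpu -A | list_unready_dpus
#   ki get pod -A      | list_unhealthy_pods
```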



Deployment Validation


Verify the DPF deployment with DOCA SNAP storage services by using the following simple workload:

  1. Create a manifest for a simple DPUVolume with PVC storage provisioning:

    dpuvolume-nvme.yaml

    YAML
    ---
    apiVersion: storage.dpu.nvidia.com/v1alpha1
    kind: DPUVolume
    metadata:
      name: test-volume-virtiofs-hotplug-pf-nvme
      namespace: dpf-operator-system
    spec:
      dpuStoragePolicyName: policy-fs-nvme
      resources:
        requests:
          storage: 100Gi
      accessModes:
      - ReadWriteOnce
      volumeMode: Filesystem
    
    
    


  2. Create the volume:

    Jump console

    Bash
    $ kubectl apply -f dpuvolume-nvme.yaml
    dpuvolume.storage.dpu.nvidia.com/test-volume-virtiofs-hotplug-pf-nvme created
    
    Check the NFS share folder on the target node:
    root@target:~# ll /srv/nfs/nvmeshare
    total 4
    drwxr-xr-x 3 root root   54 Sep 28 14:40 ./
    drwxr-xr-x 4 root root 4096 Jul 22 08:30 ../
    drwxr-xr-x 2 root root    6 Sep 28 14:40 pvc-a988763f-9d84-4dc2-930b-86657a931895/
    


  3. Create a DPUVolumeAttachment manifest to attach the DPUVolume to both Worker nodes:

    dpuvolumeattachment-nvme.yaml

    YAML
    ---
    apiVersion: storage.dpu.nvidia.com/v1alpha1
    kind: DPUVolumeAttachment
    metadata:
      name: test-volume-attachment-virtiofs-hotplug-pf-nvme-w1
      namespace: dpf-operator-system
    spec:
      dpuNodeName: dpu-node-mt2334xz09f0 # change to actual worker node name
      dpuVolumeName: test-volume-virtiofs-hotplug-pf-nvme
      functionType: pf
      hotplugFunction: true
    
    ---
    apiVersion: storage.dpu.nvidia.com/v1alpha1
    kind: DPUVolumeAttachment
    metadata:
      name: test-volume-attachment-virtiofs-hotplug-pf-nvme-w2
      namespace: dpf-operator-system
    spec:
      dpuNodeName: dpu-node-mt2334xz09f1 # change to actual worker node name
      dpuVolumeName: test-volume-virtiofs-hotplug-pf-nvme
      functionType: pf
      hotplugFunction: true
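
The two attachment objects above differ only in their name suffix and `dpuNodeName`. As a sketch for larger setups, the manifest can be generated from a list of DPU node names. The function name `render_attachments` is hypothetical, and the field values are copied from the manifest above rather than taken from a DPF API reference.

```shell
# Print one DPUVolumeAttachment document per DPU node name.
# Usage: render_attachments <dpu-volume-name> <dpu-node-name>...
render_attachments() {
  volume="$1"
  shift
  i=0
  for node in "$@"; do
    i=$((i + 1))    # numeric suffix: w1, w2, ...
    cat <<EOF
---
apiVersion: storage.dpu.nvidia.com/v1alpha1
kind: DPUVolumeAttachment
metadata:
  name: test-volume-attachment-virtiofs-hotplug-pf-nvme-w${i}
  namespace: dpf-operator-system
spec:
  dpuNodeName: ${node}
  dpuVolumeName: ${volume}
  functionType: pf
  hotplugFunction: true
EOF
  done
}

# Example (not executed here):
#   render_attachments test-volume-virtiofs-hotplug-pf-nvme \
#     dpu-node-mt2334xz09f0 dpu-node-mt2334xz09f1 > dpuvolumeattachment-nvme.yaml
```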
    
    
    
  4. Attach the volume to both Worker nodes:

    Jump console

    Bash
    $ kubectl apply -f dpuvolumeattachment-nvme.yaml
    dpuvolumeattachment.storage.dpu.nvidia.com/test-volume-attachment-virtiofs-hotplug-pf-nvme-w1 created
    dpuvolumeattachment.storage.dpu.nvidia.com/test-volume-attachment-virtiofs-hotplug-pf-nvme-w2 created
    
    


  5. Check the volume attachment:

    Jump console

    $ kubectl get dpuvolumeattachment -n dpf-operator-system -o yaml | grep filesystemTag
            filesystemTag: d8dffc9d64fbf39btag
            filesystemTag: d8dffc9d64fbf39btag
    
    The output above indicates that the DPUVolume has been successfully mounted on the DPU and the VirtIO-FS 
    emulation has successfully published the TAG, making it available for mounting on the x86 Worker node.
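
Because both attachments refer to the same DPUVolume, they are expected to report the same tag. A small sketch, assuming the `filesystemTag:` key appears in the YAML output exactly as shown above, extracts the distinct tag values so that the result can be reused in the mount command. The function name `unique_fs_tags` is hypothetical.

```shell
# Print each distinct filesystemTag value found on stdin, one per line.
# Input: output of `kubectl get dpuvolumeattachment ... -o yaml`.
unique_fs_tags() {
  awk '$1 == "filesystemTag:" { print $2 }' | sort -u
}

# Example (not executed here):
#   TAG=$(kubectl get dpuvolumeattachment -n dpf-operator-system -o yaml | unique_fs_tags)
#   echo "$TAG"
```

If the helper prints more than one line, the attachments do not share a tag and each Worker node needs the tag of its own attachment.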
    
    
    
    
    


  6. Mount the VirtIO-FS volume on the x86 hosts:

    Jump console

    Bash
    For Worker1:
    
    $ ssh worker1
    depuser@worker1:~$ sudo mkdir /mnt/virtio
    depuser@worker1:~$ sudo mount -t virtiofs d8dffc9d64fbf39btag /mnt/virtio
    
    depuser@worker1:~$ mount | grep virtio
    d8dffc9d64fbf39btag on /mnt/virtio type virtiofs (rw,relatime)
    
    depuser@worker1:~$ cd /mnt/virtio/
    
    depuser@worker1:/mnt/virtio$ sudo touch worker1.file
    
    depuser@worker1:/mnt/virtio$ ls 
    worker1.file
    
    For Worker2:
    
    $ ssh worker2
    depuser@worker2:~$ sudo mkdir /mnt/virtio
    depuser@worker2:~$ sudo mount -t virtiofs d8dffc9d64fbf39btag /mnt/virtio
    
    depuser@worker2:~$ mount | grep virtio
    d8dffc9d64fbf39btag on /mnt/virtio type virtiofs (rw,relatime)
    
    depuser@worker2:~$ cd /mnt/virtio/
    
    depuser@worker2:/mnt/virtio$ sudo touch worker2.file
    
    
    depuser@worker1:/mnt/virtio$ ls 
    worker1.file worker2.file
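
When the mount step is repeated, for example after a host reboot, `sudo mkdir /mnt/virtio` fails if the directory already exists and a second `mount` call on an already mounted tag may report an error. The sketch below is one way to make the step repeatable: it checks the output of `mount` for the tag before mounting. The function name `virtiofs_mounted` is hypothetical, and the line format is assumed to match the `mount | grep virtio` output shown above.

```shell
# Succeed if stdin (the output of `mount`) shows <tag> mounted on <dir>
# with filesystem type virtiofs.
# Usage: mount | virtiofs_mounted <tag> <dir>
virtiofs_mounted() {
  awk -v tag="$1" -v dir="$2" '
    # Appending "" forces string comparison for tags that look numeric.
    ($1 "") == (tag "") && $2 == "on" && $3 == dir && $5 == "virtiofs" { found = 1 }
    END { exit (found ? 0 : 1) }
  '
}

# Example (not executed here):
#   sudo mkdir -p /mnt/virtio
#   mount | virtiofs_mounted d8dffc9d64fbf39btag /mnt/virtio ||
#     sudo mount -t virtiofs d8dffc9d64fbf39btag /mnt/virtio
```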
    


  7. Run an FIO performance test:

    Jump console

    Bash
    Create the FIO job file with the following content:
    
    $ cat /mnt/virtio/job-4k.fio
    
    [global]
    ioengine=libaio
    direct=1
    iodepth=32
    rw=read
    bs=4k
    size=10G
    numjobs=8
    runtime=60
    time_based
    group_reporting
    
    [job1]
    filename=/mnt/virtio/test.fio
    
    Run the FIO job:
    
    $ sudo fio /mnt/virtio/job-4k.fio
    job1: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=32
    ...
    fio-3.36
    Starting 8 processes
    job1: Laying out IO file (1 file / 10240MiB)
    Jobs: 8 (f=8): [R(8)][100.0%][r=491MiB/s][r=126k IOPS][eta 00m:00s]
    job1: (groupid=0, jobs=8): err= 0: pid=3636: Mon Sep 29 05:25:21 2025
    read: IOPS=126k, BW=492MiB/s (516MB/s)(28.8GiB/60006msec)
    slat (usec): min=2, max=538, avg=12.48, stdev=16.18
    clat (usec): min=82, max=432099, avg=2006.02, stdev=5859.85
    lat (usec): min=316, max=432109, avg=2018.50, stdev=5858.35
    clat percentiles (usec):
    | 1.00th=[ 420], 5.00th=[ 449], 10.00th=[ 465], 20.00th=[ 478],
    | 30.00th=[ 490], 40.00th=[ 498], 50.00th=[ 506], 60.00th=[ 519],
    | 70.00th=[ 537], 80.00th=[ 562], 90.00th=[ 644], 95.00th=[18744],
    | 99.00th=[27395], 99.50th=[30802], 99.90th=[44827], 99.95th=[49546],
    | 99.99th=[60031]
    bw ( KiB/s): min=57640, max=586624, per=100.00%, avg=504163.97, stdev=6096.69, samples=952
    iops : min=14410, max=146656, avg=126040.96, stdev=1524.17, samples=952
    lat (usec) : 100=0.01%, 250=0.01%, 500=42.67%, 750=49.16%, 1000=1.25%
    lat (msec) : 2=0.20%, 4=0.01%, 10=0.01%, 20=2.32%, 50=4.33%
    lat (msec) : 100=0.05%, 250=0.01%, 500=0.01%
    cpu : usr=2.47%, sys=25.21%, ctx=6479868, majf=0, minf=968
    IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
    submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
    complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
    issued rwts: total=7560139,0,0,0 short=0,0,0,0 dropped=0,0,0,0
    latency : target=0, window=0, percentile=100.00%, depth=32
    
    Run status group 0 (all jobs):
    READ: bw=492MiB/s (516MB/s), 492MiB/s-492MiB/s (516MB/s-516MB/s), io=28.8GiB (31.0GB), run=60006-60006msec
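
The reported figures can be cross-checked with simple arithmetic. Bandwidth should equal IOPS multiplied by the block size, and, assuming the queues stay full for the whole run, Little's law predicts a mean latency of outstanding I/Os (`numjobs` × `iodepth`) divided by IOPS. The helper names below are hypothetical; the inputs in the examples are taken from the job file and the output above.

```shell
# Bandwidth in MiB/s implied by an IOPS figure and a block size in bytes.
# Usage: iops_to_mibps <iops> <block-size-bytes>
iops_to_mibps() {
  awk -v iops="$1" -v bs="$2" 'BEGIN { printf "%.1f\n", iops * bs / 1048576 }'
}

# Mean latency in microseconds implied by Little's law:
# outstanding I/Os (numjobs * iodepth) divided by the completion rate (IOPS).
# Usage: littles_law_latency_us <numjobs> <iodepth> <iops>
littles_law_latency_us() {
  awk -v jobs="$1" -v depth="$2" -v iops="$3" \
    'BEGIN { printf "%.0f\n", jobs * depth / iops * 1000000 }'
}

# Examples with the values from this run (not executed here):
#   iops_to_mibps 126040.96 4096          # average IOPS, bs=4k
#   littles_law_latency_us 8 32 126040.96 # numjobs=8, iodepth=32
```

For this run the first call gives roughly 492 MiB/s, matching the reported bandwidth, and the second gives about 2.03 ms, close to the reported average latency of 2018.50 usec.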
    


Done.

Authors



Vitaliy Razinkov

Vitaliy Razinkov is a Solutions Architect on the NVIDIA Networking team, specializing in complex Kubernetes, OpenShift, and Microsoft solutions. With over 25 years of experience in senior technical roles, he brings deep expertise in designing and implementing advanced infrastructures. Vitaliy has authored several reference design guides on Microsoft technologies, RoCE/RDMA-accelerated machine learning in Kubernetes/OpenShift, and containerized solutions—all available on the NVIDIA Networking Documentation site.
















