Networking Solutions

RDG for DPF Host Trusted with OVN-Kubernetes

 Created on January  6, 2026

Scope

This Reference Deployment Guide (RDG) provides detailed instructions for deploying a Kubernetes (K8s) cluster using the DOCA Platform Framework (DPF). The guide focuses on setting up an accelerated OVN-Kubernetes service on NVIDIA® BlueField®-3 DPUs to deliver secure, isolated, and hardware-accelerated environments.

This guide is designed for experienced system administrators, system engineers, and solution architects who seek to deploy high-performance Kubernetes clusters with Host-Based Networking enabled on NVIDIA BlueField DPUs.

  • This reference implementation, as the name implies, is a specific, opinionated deployment example designed to address the use case described above. 

  • While other approaches may exist to implement similar solutions, this document provides a detailed guide for this particular method.

Abbreviations and Acronyms

Term

Definition

Term

Definition

BFB

BlueField Bootstream

MAAS

Metal as a Service

BGP

Border Gateway Protocol

OVN

Open Virtual Network

CNI

Container Network Interface

RDG

Reference Deployment Guide

CSI

Container Storage Interface 

RDMA

Remote Direct Memory Access

DOCA

Data Center Infrastructure-on-a-Chip Architecture

SFC

Service Function Chaining

DPF

DOCA Platform Framework

SR-IOV

Single Root Input/Output Virtualization

DPU

Data Processing Unit

TOR

Top of Rack

DTS

DOCA Telemetry Service

VLAN

Virtual LAN (Local Area Network)

GENEVE

Generic Network Virtualization Encapsulation 

VRR

Virtual Router Redundancy 

IPAM

IP Address Management 

VTEP

Virtual Tunnel End Point

K8S

Kubernetes



Introduction

The NVIDIA BlueField-3 Data Processing Unit (DPU) is a 400 Gb/s infrastructure compute platform designed for line-rate processing of software-defined networking, storage, and cybersecurity workloads. It combines powerful compute resources, high-speed networking, and advanced programmability to deliver hardware-accelerated, software-defined solutions for modern data centers.

NVIDIA DOCA unleashes the full potential of the BlueField platform by enabling rapid development of applications and services that offload, accelerate, and isolate data center workloads.

OVN-Kubernetes is a Kubernetes CNI network plugin that provides robust networking for Kubernetes clusters. Built on Open Virtual Network (OVN) and Open vSwitch (OVS), it supports hardware acceleration to offload OVS packet processing to NIC/DPU hardware. With OVS-DOCA, an extension of traditional OVS-DPDK and OVS-Kernel, accelerated OVN-Kubernetes delivers industry-leading performance, functionality, and efficiency. Running OVN-Kubernetes on the DPU reserves host CPUs exclusively for workloads, maximizing system resources.

Deploying and managing DPUs and their associated DOCA services, especially at scale, presents operational challenges. Without a robust provisioning and orchestration system, tasks such as lifecycle management, service deployment, and network configuration for service function chaining (SFC) can quickly become complex and error prone. This is where the DOCA Platform Framework (DPF) comes into play.

DPF automates the full DPU lifecycle, streamlines the deployment of DOCA services, and simplifies advanced network configurations. With DPF, services such as HBN can be deployed seamlessly, allowing for efficient offloading and intelligent routing of traffic through the DPU data plane.

By leveraging DPF, users can scale and automate DPU management across Kubernetes customer environments - optimizing performance while simplifying operations.

As part of the reference implementation, open-source components outside the scope of DPF (e.g., MAAS, pfSense, Kubespray) are used to simulate a realistic customer deployment environment.

The guide includes the full end-to-end deployment process, including:

  • Infrastructure provisioning

  • DPF deployment

  • DPU provisioning

  • Service configuration and deployment

  • Service chaining

It also demonstrates some performance optimizations, with results validated through standard RDMA and TCP workload tests.

We will deploy OVN-K8s over a simple bridged network, using a single highspeed uplink on each worker node


image-2025-5-27_10-32-5.png

If you are interested in a DPF deployment that incorporates DOCA's Host-Based Networking Service (HBN), utilizing ECMP-based dual uplinks and a large-scale BGP/EVPN fabric, please refer to this RDG that covers DPF with both the HBN and OVN-Kubernetes services and the deployment of additional DOCA Services.

References

Solution Architecture

Key Components and Technologies

  • NVIDIA BlueField® Data Processing Unit (DPU)
    The NVIDIA® BlueField® data processing unit (DPU) ignites unprecedented innovation for modern data centers and supercomputing clusters. With its robust compute power and integrated software-defined hardware accelerators for networking, storage, and security, BlueField creates a secure and accelerated infrastructure for any workload in any environment, ushering in a new era of accelerated computing and AI.

  • NVIDIA DOCA Software Framework
    NVIDIA DOCA™ unlocks the potential of the NVIDIA® BlueField® networking platform. By harnessing the power of BlueField DPUs and SuperNICs, DOCA enables the rapid creation of applications and services that offload, accelerate, and isolate data center workloads. It lets developers create software-defined, cloud-native, DPU- and SuperNIC-accelerated services with zero-trust protection, addressing the performance and security demands of modern data centers.

  • NVIDIA ConnectX SmartNICs
    10/25/40/50/100/200 and 400G Ethernet Network Adapters
    The industry-leading NVIDIA® ConnectX® family of smart network interface cards (SmartNICs) offer advanced hardware offloads and accelerations.
    NVIDIA Ethernet adapters enable the highest ROI and lowest Total Cost of Ownership for hyperscale, public and private clouds, storage, machine learning, AI, big data, and telco platforms.

  • NVIDIA LinkX Cables 
    The NVIDIA® LinkX® product family of cables and transceivers provides the industry’s most complete line of 10, 25, 40, 50, 100, 200, and 400GbE in Ethernet and 100, 200 and 400Gb/s InfiniBand products for Cloud, HPC, hyperscale, Enterprise, telco, storage and artificial intelligence, data center applications.

  • NVIDIA Spectrum Ethernet Switches
    Flexible form-factors with 16 to 128 physical ports, supporting 1GbE through 400GbE speeds.
    Based on a ground-breaking silicon technology optimized for performance and scalability, NVIDIA Spectrum switches are ideal for building high-performance, cost-effective, and efficient Cloud Data Center Networks, Ethernet Storage Fabric, and Deep Learning Interconnects. 
    NVIDIA combines the benefits of NVIDIA Spectrum switches, based on an industry-leading application-specific integrated circuit (ASIC) technology, with a wide variety of modern network operating system choices, including NVIDIA Cumulus® LinuxSONiC and NVIDIA Onyx®.

  • NVIDIA Cumulus Linux 
    NVIDIA® Cumulus® Linux is the industry's most innovative open network operating system that allows you to automate, customize, and scale your data center network like no other.

  • NVIDIA Network Operator
    The NVIDIA Network Operator simplifies the provisioning and management of NVIDIA networking resources in a Kubernetes cluster. The operator automatically installs the required host networking software - bringing together all the needed components to provide high-speed network connectivity. These components include the NVIDIA networking driver, Kubernetes device plugin, CNI plugins, IP address management (IPAM) plugin and others. The NVIDIA Network Operator works in conjunction with the NVIDIA GPU Operator to deliver high-throughput, low-latency networking for scale-out, GPU computing clusters.

  • Kubernetes
    Kubernetes is an open-source container orchestration platform for deployment automation, scaling, and management of containerized applications.

  • Kubespray 
    Kubespray is a composition of Ansible playbooks, inventory, provisioning tools, and domain knowledge for generic OS/Kubernetes clusters configuration management tasks and provides:A highly available clusterComposable attributesSupport for most popular Linux distributions

  • OVN-Kubernetes
    OVN-Kubernetes (Open Virtual Networking - Kubernetes) is an open-source project that provides a robust networking solution for Kubernetes clusters with OVN (Open Virtual Networking) and Open vSwitch (Open Virtual Switch) at its core. It is a Kubernetes networking conformant plugin written according to the CNI (Container Network Interface) specifications.

  • RDMA 
    RDMA is a technology that allows computers in a network to exchange data without involving the processor, cache or operating system of either computer.
    Like locally based DMA, RDMA improves throughput and performance and frees up compute resources.

Solution Design

Solution Logical Design

The logical design includes the following components: 

  • 1 x Hypervisor node (KVM based) with ConnectX-7:

    • 1 x Firewall VM

    • 1 x Jump VM

    • 1 X MaaS VM

    • 3 x K8s Master VMs running all K8s management components

  • 2 x Worker nodes (PCI Gen5), each with a 1 x BlueField-3 NIC 

  • Single High-Speed (HS) switch, 1 x L3 HS underlay network

  • 1 Gb Host Management network

image-2025-6-4_9-54-18.png

K8s Cluster Logical Design

The following K8s logical design illustration demonstrates the main components of the DPF system, among them:

  • 3 x K8s Master VMs running all K8s management components

  • 2 x K8s Worker nodes (x86)

  • 2 x K8s DPU Workers running the OVN-K8s DOCA service

  • 1 x Kamaji (K8s Control-Plane Manager)  

  • 1 x DPU Control Plane (Tenant Cluster)

  • Connectivity to High-Speed/1Gb networks

image-2025-5-18_16-24-32-1.png

Firewall Design

The pfSense firewall in this solution serves a dual purpose:

  • Firewall – provides an isolated environment for the DPF system, ensuring secure operations

  • Router – enables internet access and connectivity between the host management network and the high-speed network

  • DHCP Server for the highspeed network

Port-forwarding rules for SSH and RDP are configured on the firewall to route traffic to the jump node’s IP address in the host management network. From the jump node, administrators can manage and access various devices in the setup, as well as handle the deployment of the Kubernetes (K8s) cluster and DPF components.

The following diagram illustrates the firewall design used in this solution:

FW_Design_Updated_2.png

Software Stack Components 

SW_stack_OVN_only.png

Make sure to use the exact same versions for the software stack as described above.

Bill of Materials

image-2025-6-5_10-58-39-1.png

Deployment and Configuration

Node and Switch Definitions

These are the definitions and parameters used for deploying the demonstrated fabric:

Switches Ports Usage

Hostname

Rack ID

Ports

hs-switch

1

swp1-3

mgmt-switch

1

swp1-3

Hosts

Rack

Server Type

Server Name

Switch Port

IP and NICs

Default Gateway

Rack1


Hypervisor Node

hypervisor

mgmt-switch: swp1

hs-switch: swp1

mgmt-br (interface eno2): -

hs-br (interface ens2f0np0): -

lab-br (interface eno1): Trusted LAN IP

Trusted LAN GW

Rack1


Worker Node

worker1

mgmt-switch: swp2

hs-switch: swp2

ens15f0: 10.0.110.21/24

ens5f0np0: 10.0.120.0/22

10.0.110.254

Rack1


Worker Node

worker2

mgmt-switch: swp3

hs-switch: swp4

ens15f0: 10.0.110.22/24

ens5f0np0: 10.0.120.0/22

10.0.110.254

Rack1

Firewall (Virtual)

fw

-

LAN (mgmt-br): 10.0.110.254/24

OPT1 (hs-br): 10.0.123.254/22

WAN (lab-br): Trusted LAN IP

Trusted LAN GW

Rack1


Jump Node (Virtual)

jump

-

enp1s0: 10.0.110.253/24

10.0.110.254

Rack1


MaaS (Virtual)

maas

-

enp1s0: 10.0.110.252/24

10.0.110.254

Rack1


Master Node (Virtual)

master1

-

enp1s0: 10.0.110.1/24

10.0.110.254

Rack1


Master Node (Virtual)

master2

-

enp1s0: 10.0.110.2/24

10.0.110.254

Rack1


Master Node (Virtual)

master3

-

enp1s0: 10.0.110.3/24

10.0.110.254

Wiring

Hypervisor Node 

image-2025-5-18_16-17-13-1.png

K8s Worker Node

image-2025-6-5_12-14-8-1.png

Fabric Configuration

Updating Cumulus Linux

As a best practice, make sure to use the latest released Cumulus Linux NOS version.

For information on how to upgrade Cumulus Linux, refer to the Cumulus Linux User Guide.

Configuring the Cumulus Linux Switch

For the SN3700 switch (hs-switch), is configured as follows:

  • The following commands configure a plain bridge on hs-switch.


SN3700 Switch Console
nv set bridge domain br_default untagged 1
nv set interface swp1-3 link state up
nv set interface swp1-3 type swp
nv set interface swp1-3 bridge domain br_default
nv config apply -y

The SN2201 switch (mgmt-switch) is configured as follows:

SN2201 Switch Console
nv set bridge domain br_default untagged 1
nv set interface swp1-3 link state up
nv set interface swp1-3 type swp
nv set interface swp1-3 bridge domain br_default
nv config apply -y

Host Configuration

Make sure that the BIOS settings on the worker node servers have SR-IOV enabled and that the servers are tuned for maximum performance.

All worker nodes must have the same PCIe placement for the BlueField-3 NIC and must show the same interface name.

Hypervisor Installation and Configuration

The hypervisor used in this Reference Deployment Guide (RDG) is based on Ubuntu 24.04 with KVM.

While this document does not detail the KVM installation process, it is important to note that the setup requires the following ISOs to deploy the Firewall, Jump, and MaaS virtual machines (VMs):

  • Ubuntu 24.04

  • pfSense-CE-2.7.2

To implement the solution, three Linux bridges must be created on the hypervisor:

Ensure a DHCP record is configured for the lab-br bridge interface in your trusted LAN to assign it an IP address.

  • lab-br – connects the Firewall VM to the trusted LAN. 

  • mgmt-br – Connects the various VMs to the host management network.

  • hs-br – Connects the Firewall VM to the high-speed network.

Additionally, an MTU of 9000 must be configured on the management and high-speed bridges (mgmt-br and hs-br) as well as their uplink interfaces to ensure optimal performance.

Hypervisor netplan configuration
YAML
network:
    ethernets:
        eno1:
            dhcp4: false
        eno2:
            dhcp4: false
            mtu: 9000
        ens2f0np0:
            dhcp4: false
            mtu: 9000
    bridges:
      lab-br:
         interfaces: [eno1]
         dhcp4: true
      mgmt-br:
         interfaces: [eno2]
         dhcp4: false
         mtu: 9000
      hs-br:
         interfaces: [ens2f0np0]
         dhcp4: false
         mtu: 9000
    version: 2

Apply the configuration:

Hypervisor Console
$ sudo netplan apply 

Prepare Infrastructure Servers

Firewall VM - pfSense Installation and Interface Configuration

Download the pfSense CE (Community Edition) ISO to your hypervisor and proceed with the software installation.

Suggested spec:

  • vCPU: 2

  • RAM: 2GB

  • Storage: 10GB

  • Network interfaces

    • Bridge device connected to lab-br

    • Bridge device connected to mgmt-br

    • Bridge device connected to hs-br

The Firewall VM must be connected to all three Linux bridges on the hypervisor. Before beginning the installation, ensure that three virtual network interfaces of type "Bridge device" are configured. Each interface should be connected to a different bridge (lab-br, mgmt-br, and hs-br) as illustrated in the diagram below.

FW_VM_NIC.png

After completing the installation, the setup wizard displays a menu with several options, such as "Assign Interfaces" and "Reboot System." During this phase, you must configure the network interfaces for the Firewall VM.

  1. Select Option 2: "Set interface(s) IP address" and configure the interfaces as follows:

    • WAN – Trusted LAN IP (Static/DHCP)

    • LAN – Static IP 10.0.110.254/24

    • OPT1 – Static IP 10.0.123.254/22

  2. Once the interface configuration is complete, use a web browser within the host management network to access the Firewall web interface and finalize the configuration.

Next, proceed with the installation of the Jump VM. This VM will serve as a platform for running a browser to access the Firewall’s web interface for post-installation configuration.

Jump VM

Suggested specifications:

  • vCPU: 4

  • RAM: 8GB

  • Storage: 25GB

  • Network interface: Bridge device, connected to mgmt-br

Procedure:

  1. Proceed with a standard Ubuntu 24.04 installation. Use the following login credentials across all hosts in this setup:

    Username

    Password

    depuser

    user

  2. Enable internet connectivity and DNS resolution by creating the following Netplan configuration:

    Use 10.0.110.254 as a temporary DNS nameserver until the MaaS VM is installed and configured. After completing the MaaS installation, update the Netplan file to replace this address with the MaaS IP: 10.0.110.252.


    Jump Node netplan

    YAML
    network:
        ethernets:
            enp1s0:
                dhcp4: false
                addresses: [10.0.110.253/24]
                nameservers:
                  search: [dpf.rdg.local.domain]
                  addresses: [10.0.110.254]
                routes:
                  - to: default
                    via: 10.0.110.254
        version: 2
    
  3. Apply the configuration:

    Jump Node Console

    depuser@jump:~$ sudo netplan apply 
    
  4. Update and upgrade the system:

    Jump Node Console

    depuser@jump:~$ sudo apt update -y
    depuser@jump:~$ sudo apt upgrade -y
    
  5. Install and configure the Xfce desktop environment and XRDP (complementary packages for RDP):

    Jump Node Console

    depuser@jump:~$ sudo apt install -y xfce4 xfce4-goodies
    depuser@jump:~$ sudo apt install -y lightdm-gtk-greeter
    depuser@jump:~$ sudo apt install -y xrdp
    depuser@jump:~$ echo "xfce4-session" | tee .xsession
    depuser@jump:~$ sudo systemctl restart xrdp
    
  6. Install Firefox for accessing the Firewall web interface:

    Jump Node Console

    $ sudo apt install -y firefox
    
  7. Install and configure an NFS server with the /mnt/dpf_share directory:

    Jump Node Console

    $ sudo apt install -y nfs-server
    $ sudo mkdir -m 777 /mnt/dpf_share
    $ sudo vi /etc/exports
    
  8. Add the following line to /etc/exports:

    Jump Node Console

    /mnt/dpf_share 10.0.110.0/24(rw,sync,no_subtree_check)
    
  9. Restart the NFS server:

    Jump Node Console

    $ sudo systemctl restart nfs-server
    
  10. Create the directory bfb under /mnt/dpf_share with the same permissions as the parent directory:

    Jump Node Console

    $ sudo mkdir -m 777 /mnt/dpf_share/bfb
    
  11. Generate an SSH key pair for depuser in the jump node (later on will be imported to the admin user in MaaS to enable password-less login to the provisioned servers):

    Jump Node Console

    depuser@jump:~$ ssh-keygen -t rsa
    
  12. Reboot the jump node to display the graphical user interface:

    Jump Node Console

    depuser@jump:~$ sudo reboot
    

    After setting up port-forwarding rules on the firewall (next steps), remote login to the graphical interface of the Jump node will be available.

    Concurrent login to the local graphical console and using RDP isn't possible, make sure to first log out from the local console when switching to RDP connection.

Firewall VM – Web Configuration

From your Jump node, open Firefox web browser and go to the pfSense web UI (http://10.0.110.254, default credentials are admin/pfsense). You should see a page similar to the following:

The IP addresses from the trusted LAN network under "DNS servers" and "Interfaces - WAN" are blurred.

firewall_main_page_blur.png

Proceed with the following configurations: 

The following screenshots display only a part of the configuration view. Make sure to not miss any of the steps mentioned below!

  • Interfaces:

    • WAN (lab-br) – mark “Enable interface”, unmark “Block private networks and loopback addresses”

    • LAN (mgmt-br) – mark “Enable interface”, “IPv4 configuration type”: Static IPv4 ("IPv4 Address": 10.0.110.254/24, "IPv4 Upstream Gateway": None), “MTU”: 9000

    • OPT1 (hs-br) – mark “Enable interface”, “IPv4 configuration type”: Static IPv4 ("IPv4 Address": 10.0.123.254/22, "IPv4 Upstream Gateway": None), “MTU”: 9000
      Firewall_LAN_Interface.png

  • Firewall:

    • NAT -> Port Forward -> Add rule -> “Interface”: WAN, “Address Family”: IPv4, “Protocol”: TCP, “Destination”: WAN address, “Destination port range”: (“From port”: SSH, “To port”: SSH), “Redirect target IP”: (“Type”: Address or Alias, “Address”: 10.0.110.253), “Redirect target port”: SSH, “Description”: NAT SSH

    • NAT -> Port Forward -> Add rule -> “Interface”: WAN, “Address Family”: IPv4, “Protocol”: TCP, “Destination”: WAN address, “Destination port range”: (“From port”: MS RDP, “To port”: MS RDP), “Redirect target IP”: (“Type”: Address or Alias, “Address”: 10.0.110.253), “Redirect target port”: MS RDP, “Description”: NAT RDP
      pfsense_nat_forward_ssh.png
      Firewall_NAT_rules.png

    • Rules -> OPT1 -> Add rule -> “Action”: Pass, “Interface”: OPT1, “Address Family”: IPv4+IPv6, “Protocol”: Any, “Source”: Any, “Destination”: Any
      Firewall_OPT1_Rules.png

  • Services

    • DHCP Server -> OPT1: Enable DHCP Server, Set Address Pool Range: 10.0.120.1 - 10.0.123.253  image-2025-5-26_17-14-6.png
      Scroll down to "Other DHCP Options" -
        Gateway: "none" (we will not be sending a default gateway address)
        Domain Name: "dpf.rdg.local.domain"
      image-2025-5-26_17-22-21.png

      Scroll down to "custom DHCP Options":
      Add option number 121 with a String value of "20:a9:fe:63:64:0a:00:7b:fe".
      This value encodes the route entry "to 169.254.99.100/32 via 10.0.123.254" that will be used by DPF to internally assign a gateway through the high-speed network for OVN-K8s.
      Add option number 26 with an unsigned 16-bit integer value of "9000". This will set the MTU on the host interface to 9000. image-2025-5-27_13-46-54.png

MaaS VM

Suggested specifications:

  • vCPU: 4 

  • RAM: 4GB 

  • Storage: 50GB

  • Network interface: Bridge device, connected to mgmt-br

Procedure:

  1. Perform a regular Ubuntu installation on the MaaS VM.

  2. Create the following Netplan configuration to enable internet connectivity and DNS resolution:

    Use 10.0.110.254 as a temporary DNS nameserver. After the MaaS installation, replace this with the MaaS IP address (10.0.110.252) in both the Jump and MaaS VM Netplan files.


    MaaS netplan

    YAML
    network:
        ethernets:
            enp1s0:
                dhcp4: false
                addresses: [10.0.110.252/24]
                nameservers:
                  search: [dpf.rdg.local.domain]
                  addresses: [10.0.110.254]
                routes:
                  - to: default
                    via: 10.0.110.254
        version: 2
    
  3. Apply the netplan configuration:

    MaaS Console

    depuser@maas:~$ sudo netplan apply 
    
  4. Update and upgrade the system:

    MaaS Console

    depuser@maas:~$ sudo apt update -y
    depuser@maas:~$ sudo apt upgrade -y
    
  5. Install PostgreSQL and configure the database for MaaS: 

    MaaS Console

    $ sudo -i
    # apt install -y postgresql
    # systemctl enable --now postgresql
    # systemctl disable --now systemd-timesyncd
    # export MAAS_DBUSER=maasuser
    # export MAAS_DBPASS=maaspass
    # export MAAS_DBNAME=maas
    # sudo -i -u postgres psql -c "CREATE USER \"$MAAS_DBUSER\" WITH ENCRYPTED PASSWORD '$MAAS_DBPASS'"
    # sudo -i -u postgres createdb -O "$MAAS_DBUSER" "$MAAS_DBNAME"
    
  6. Install MaaS:

    MaaS Console

    # snap install --channel=3.5/stable maas
    
  7. Initialize MaaS:

    MaaS Console

    # maas init region+rack --maas-url http://10.0.110.252:5240/MAAS --database-uri "postgres://$MAAS_DBUSER:$MAAS_DBPASS@localhost/$MAAS_DBNAME"
    
  8. Create an admin account: 

    MaaS Console

    # maas createadmin --username admin --password admin --email admin@example.com
    
  9. Save the admin API key:

    MaaS Console

    # maas apikey --username admin > admin-apikey
    
  10. Log in to the MaaS server:

    MaaS Console

    # maas login admin http://localhost:5240/MAAS "$(cat admin-apikey)"
    
  11. Configure MaaS (Substitute <Trusted_LAN_NTP_IP> and <Trusted_LAN_DNS_IP> with the IP addresses in your environment):

    MaaS Console

    # maas admin domain update maas name="dpf.rdg.local.domain"
    # maas admin maas set-config name=ntp_servers value="<Trusted_LAN_NTP_IP>"
    # maas admin maas set-config name=network_discovery value="disabled"
    # maas admin maas set-config name=upstream_dns value="<Trusted_LAN_DNS_IP>"
    # maas admin maas set-config name=dnssec_validation value="no"
    # maas admin maas set-config name=default_osystem value="ubuntu"
    
  12. Define and configure IP ranges and subnets: 

    MaaS Console

    # maas admin ipranges create type=dynamic start_ip="10.0.110.51" end_ip="10.0.110.120"
    # maas admin ipranges create type=dynamic start_ip="10.0.110.21" end_ip="10.0.110.30"
    # maas admin ipranges create type=reserved start_ip="10.0.110.10" end_ip="10.0.110.10" comment="c-plane VIP"
    # maas admin ipranges create type=reserved start_ip="10.0.110.200" end_ip="10.0.110.200" comment="kamaji VIP"
    # maas admin ipranges create type=reserved start_ip="10.0.110.251" end_ip="10.0.110.254" comment="dpfmgmt"
    # maas admin vlan update 0 untagged dhcp_on=True primary_rack=maas mtu=9000
    # maas admin dnsresources create fqdn=kube-vip.dpf.rdg.local.domain ip_addresses=10.0.110.10
    # maas admin dnsresources create fqdn=jump.dpf.rdg.local.domain ip_addresses=10.0.110.253
    # maas admin dnsresources create fqdn=fw.dpf.rdg.local.domain ip_addresses=10.0.110.254
    
  13. Configure static DHCP leases for the worker nodes (replace MAC address as appropriate with your workers MGMT interface MAC):

    MaaS Console

    # maas admin reserved-ips create ip="10.0.110.21" mac_address="04:32:01:60:0d:da" comment="worker1" 
    # maas admin reserved-ips create ip="10.0.110.22" mac_address="04:32:01:5f:cb:e0" comment="worker2"
    


  14. Complete MaaS setup:

    1. Connect to the Jump node GUI and access the MaaS UI at http://10.0.110.252:5240/MAAS.

    2. On the first page, verify the "Region Name" and "DNS Forwarder," then continue.

    3. On the image selection page, select Ubuntu 24.04 LTS (amd64) and sync the image. maas_OS_Image_Mix_Good.png

    4. Import the previously generated SSH key (id_rsa.pub) for the depuser into the MaaS admin user profile and finalize the setup. import_sshkey.png

  15. Go to Settings → Deploy, set "Default OS release" to Ubuntu 24.04 LTS Noble Numbat, and save. maas_os-version_deployment.png

  16. Update the DNS nameserver IP address in both Jump and MaaS VM Netplan files from 10.0.110.254 to 10.0.110.252 and reapply the configuration.

K8s Master VMs

Suggested specifications:

  • vCPU: 8

  • RAM: 16GB

  • Storage: 100GB

  • Network interface: Bridge device, connected to mgmt-br

  1. Before provisioning the Kubernetes (K8s) Master VMs with MaaS, create the required virtual disks with empty storage. Use the following one-liner to create three 100 GB QCOW2 virtual disks:

    Hypervisor Console

    $ for i in $(seq 1 3); do qemu-img create -f qcow2 /var/lib/libvirt/images/master$i.qcow2 100G; done
    

     This command generates the following disks in the /var/lib/libvirt/images/ directory:

    • master1.qcow2

    • master2.qcow2

    • master3.qcow2

  2. Configure VMs in virt-manager:

    1. Open virt-manager and create three virtual machines:

      • Assign the corresponding virtual disk (master1.qcow2, master2.qcow2, or master3.qcow2) to each VM.

      • Configure each VM with the suggested specifications (vCPU, RAM, storage, and network interface).

    2. During the VM setup, ensure the NIC is selected under the Boot Options tab. This ensures the VMs can PXE boot for MaaS provisioning.

    3. Once the configuration is complete, shut down all the VMs.

  3. After the VMs are created and configured, proceed to provision them via the MaaS interface. MaaS will handle the OS installation and further setup as part of the deployment process.

Provision Master VMs and Worker Nodes Using MaaS

Master VMs
Install virsh and Set Up SSH Access
  1. SSH to the MaaS VM from the Jump node:

    MaaS Console

    depuser@jump:~$ ssh maas
    depuser@maas:~$ sudo -i
    
  2. Install the virsh client to communicate with the hypervisor:

    MaaS Console

    # apt install -y libvirt-clients
    
  3. Generate an SSH key for the root user and copy it to the hypervisor user in the libvirtd group:

    MaaS Console

    # ssh-keygen -t rsa
    # ssh-copy-id ubuntu@<hypervisor_MGMT_IP>
    
  4. Verify SSH access and virsh communication with the hypervisor:

    MaaS Console

    # virsh -c qemu+ssh://ubuntu@<hypervisor_MGMT_IP>/system list --all
    

    Expected output:

    MaaS Console

     Id   Name          State
    ------------------------------
     1    fw     running
     2    jump   running
     3    maas   running
     -    master1       shut off
     -    master2       shut off
     -    master3       shut off
    
  5. Copy the SSH key to the required MaaS directory (for snap-based installations):

    MaaS Console

    # mkdir -p /var/snap/maas/current/root/.ssh
    # cp .ssh/id_rsa* /var/snap/maas/current/root/.ssh/
    
Get MAC Addresses of the Master VMs

Retrieve the MAC addresses of the Master VMs:

MaaS Console
# for i in $(seq 1 3); do virsh -c qemu+ssh://ubuntu@<hypervisor_MGMT_IP>/system dumpxml master$i | grep 'mac address'; done


Example output:

MaaS Console
<mac address='52:54:00:a9:9c:ef'/>
<mac address='52:54:00:19:6b:4d'/>
<mac address='52:54:00:68:39:7f'/>
Add Master VMs to MaaS
  1. Add the Master VMs to MaaS:

    Once added, MaaS will automatically start the newly added VMs commissioning (discovery and introspection).


    MaaS Console

    # maas admin machines create hostname=master1 architecture=amd64/generic mac_addresses='52:54:00:a9:9c:ef' power_type=virsh power_parameters_power_address=qemu+ssh://ubuntu@<hypervisor_MGMT_IP>/system power_parameters_power_id=master1 skip_bmc_config=1 testing_scripts=none
    Success.
    Machine-readable output follows:
    {
        "description": "",
        "status_name": "Commissioning",
    ...
        "status": 1, 
    ...
        "system_id": "c3seyq",
    ...
        "fqdn": "master1.dpf.rdg.local.domain",
        "power_type": "virsh",
    ...
        "status_message": "Commissioning",
        "resource_uri": "/MAAS/api/2.0/machines/c3seyq/"
    }
    
    # maas admin machines create hostname=master2 architecture=amd64/generic mac_addresses='52:54:00:19:6b:4d' power_type=virsh power_parameters_power_address=qemu+ssh://ubuntu@<hypervisor_MGMT_IP>/system power_parameters_power_id=master2 skip_bmc_config=1 testing_scripts=none
    
    # maas admin machines create hostname=master3 architecture=amd64/generic mac_addresses='52:54:00:68:39:7f' power_type=virsh power_parameters_power_address=qemu+ssh://ubuntu@<hypervisor_MGMT_IP>/system power_parameters_power_id=master3 skip_bmc_config=1 testing_scripts=none
    


    Repeat the command for master2 and master3 with their respective MAC addresses.

  2. Verify commissioning by waiting for the status to change to "Ready" in MaaS. maas_masters_commission_virsh_updated.png
    After commissioning, the next phase is the deployment (OS provisioning).

Configure OVS Bridges on Master VMs

To be able to have persistency across reboots, create an OVS-bridge from each management interface of the master nodes and assign it a static IP address.

For each Master VM:

  1. Create an OVS bridge in the MaaS Network tab:

    1. Navigate to NetworkManagement InterfaceCreate Bridge.

    2. Configure as follows:

      1. Name: brenp1s0 (prefix br added to the interface name)

      2. Bridge Type: Open vSwitch (ovs)

      3. Subnet: 10.0.110.0/24

      4. IP Mode: Static Assign

      5. Address: Assign 10.0.110.1 for master1, 10.0.110.2 for master2, and 10.0.110.3 for master3. maas_master1_ovs_bridge_updated.png

  2. Save the interface settings for each VM.

Deploy Master VMs Using Cloud-Init
  1. Use the following cloud-init script to configure the necessary software and ensure OVS bridge persistency:

    Replace enp1s0 and brenp1s0 in the following cloud-init with your interface names as displayed in MaaS network tab.


    Master nodes cloud-init

    YAML
    #cloud-config
    system_info:
      default_user:
        name: depuser
        passwd: "$6$jOKPZPHD9XbG72lJ$evCabLvy1GEZ5OR1Rrece3NhWpZ2CnS0E3fu5P1VcZgcRO37e4es9gmriyh14b8Jx8gmGwHAJxs3ZEjB0s0kn/"
        lock_passwd: false
        groups: [adm, audio, cdrom, dialout, dip, floppy, lxd, netdev, plugdev, sudo, video]
        sudo: ["ALL=(ALL) NOPASSWD:ALL"]
        shell: /bin/bash
    ssh_pwauth: True
    package_upgrade: true
    runcmd:
        - apt-get update
        - apt-get -y install openvswitch-switch nfs-common
        - |
          UPLINK_MAC=$(cat /sys/class/net/enp1s0/address)
          ovs-vsctl set Bridge brenp1s0 other-config:hwaddr=$UPLINK_MAC
          ovs-vsctl br-set-external-id brenp1s0 bridge-id brenp1s0 -- br-set-external-id brenp1s0 bridge-uplink enp1s0
    
  2. Deploy the master VMs:

    1. Select all three Master VMs → ActionsDeploy.

    2. Toggle Cloud-init user-data and paste the cloud-init script.

    3. Start the deployment and wait for the status to change to "Ubuntu 24.04 LTS". maas_master_vms_deployment_before.png maas_master_vms_deployment_complete_updated.png

Verify Deployment
  • SSH into the Master VMs from the Jump node:

    Jump Node Console

    depuser@jump:~$ ssh master1
    depuser@master1:~$
    
  • Run sudo without password:

    Master1 Console

    depuser@master1:~$ sudo -i
    root@master1:~#
    
  • Verify installed packages:

    Master1 Console

    root@master1:~# apt list --installed | egrep 'openvswitch-switch|nfs-common'
    nfs-common/noble,now 1:2.6.4-3ubuntu5.1 amd64 [installed]
    openvswitch-switch/noble-updates,now 3.3.0-1ubuntu3.1 amd64 [installed]
    
  • Check OVS bridge attributes:  

    Master1 Console

    root@master1:~# ovs-vsctl list bridge brenp1s0
    

    Output example:

    Master1 Console

    ...
    external_ids        : {bridge-id=brenp1s0, bridge-uplink=enp1s0, netplan="true", "netplan/global/set-fail-mode"=standalone, "netplan/mcast_snooping_enable"="false", "netplan/rstp_enable"="false"}
    ...
    other_config        : {hwaddr="52:54:00:a9:9c:ef"}
    ...
    
  • Verify that enp1s0 and brenp1s0 are configured with 9000 MTU (replace enp1s0 and brenp1s0 with your interface names):

    Master1 Console

    root@master1:~# ip a show enp1s0; ip a show brenp1s0
    2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc pfifo_fast master ovs-system state UP group default qlen 1000
        link/ether 52:54:00:a9:9c:ef brd ff:ff:ff:ff:ff:ff
    4: brenp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UNKNOWN group default qlen 1000
        link/ether 52:54:00:a9:9c:ef brd ff:ff:ff:ff:ff:ff
        inet 10.0.110.1/24 brd 10.0.110.255 scope global brenp1s0
           valid_lft forever preferred_lft forever
        inet6 fe80::5054:ff:fea9:9cef/64 scope link
           valid_lft forever preferred_lft forever
    
Finalize Setup

Reboot the Master VMs to complete the provisioning. 

Master1 Console
root@master1:~# reboot
Worker Nodes
Create Worker Machines in MaaS
  1. Add the worker nodes to MaaS using ipmi as the power type. Replace placeholders with your specific IPMI credentials and IP addresses:

    MaaS Console

    # maas admin machines create hostname=worker1 architecture=amd64 power_type=ipmi power_parameters_power_driver=LAN_2_0 power_parameters_power_user=<IPMI_username_worker1> power_parameters_power_pass=<IPMI_password_worker1> power_parameters_power_address=<IPMI_address_worker1>
    

    Output example: 

    MaaS Console

    ...
    Success.
    Machine-readable output follows:
    {
        "description": "",
        "status_name": "Commissioning",
    ...
        "status": 1,
    ...
        "system_id": "pbskd3",
    ...
        "fqdn": "worker1.dpf.rdg.local.domain",
    ...
        "power_type": "ipmi",
    ...
        "resource_uri": "/MAAS/api/2.0/machines/pbskd3/"
    }
    
  2. Repeat the command for worker2 with its respective credentials:

    MaaS Console

    # maas admin machines create hostname=worker2 architecture=amd64 power_type=ipmi power_parameters_power_driver=LAN_2_0 power_parameters_power_user=<IPMI_username_worker2> power_parameters_power_pass=<IPMI_password_worker2> power_parameters_power_address=<IPMI_address_worker2>
    

Once added, MaaS will automatically start commissioning the worker nodes (discovery and introspection).

Create a Tag for Kernel Parameters

Create an entity called "Tag" to configure kernel parameters for the worker nodes.

  1. In the MaaS UI sidebar, go to Organization → Tags → Create New Tag and define

    • "Tag name": compute_performance

    • "Kernel options":

  2. Substitute the values for isolcpus, nohz_full, and rcu_nocbs to the CPU cores in the NUMA node which the BlueField-3 is connected to:

    If you are not sure in which NUMA node BlueField is connected to, you can later perform this step after the worker node is deployed (although redeployment would be necessary).


    Kernel options for worker nodes

    intel_iommu=on iommu=pt numa_balancing=disable processor.max_cstate=0 isolcpus=28-55,84-111 nohz_full=28-55,84-111 rcu_nocbs=28-55,84-111
    
  3. Apply the tag:

    1. Go to Machines → Select a worker node → ConfigurationEdit Tag → Select compute_performance → Save.

    2. Repeat for the other worker node.

Adjust Network Settings

For each worker node, configure the network interfaces:

  • Management Adapter:

    • Go to Network → Select the host management adapter (e.g., ens15f0) → Create Bridge

    • Name: br-dpu

    • Bridge Type: Standard

    • Subnet: 10.0.110.0/24

    • IP Mode: DHCP

    • Save the interface

Deploy Worker Nodes Using Cloud-Init
  1. Use the following cloud-init script for deployment. Replace ens5f0np0 with your actual interface name:

    Worker node cloud-init

    YAML
    #cloud-config
    system_info:
      default_user:
        name: depuser
        passwd: "$6$jOKPZPHD9XbG72lJ$evCabLvy1GEZ5OR1Rrece3NhWpZ2CnS0E3fu5P1VcZgcRO37e4es9gmriyh14b8Jx8gmGwHAJxs3ZEjB0s0kn/"
        lock_passwd: false
        groups: [adm, audio, cdrom, dialout, dip, floppy, lxd, netdev, plugdev, sudo, video]
        sudo: ["ALL=(ALL) NOPASSWD:ALL"]
        shell: /bin/bash
    ssh_pwauth: True
    package_upgrade: true
    runcmd:
      - apt-get update
      - apt-get -y install nfs-common
      - sysctl --system 
      - sed -i '/^\s*ens5f0np0:/,/^\s*mtu:/ { /^\s*mtu:/d }' /etc/netplan/*.yaml
      - netplan apply
    
  2. Deploy the worker nodes by selecting the worker nodes in MaaS → Actions → Deploy → Customize options → Enable Cloud-init user-data → Paste the cloud-init script → Deploy.

Verify Deployment

After the deployment is complete verify that the worker nodes have been deployed successfully with the following commands:

  • SSH without password from the jump node:

    Jump Node Console

    depuser@jump:~$ ssh worker1
    depuser@worker1:~$
    
  • Run sudo without password:

    Worker1 Console

    depuser@worker1:~$ sudo -i
    root@worker1:~#
    
  • Validate that nfs-common package was installed: 

    Worker1 Console

    root@worker1:~# apt list --installed | grep 'nfs-common'
    nfs-common/noble,now 1:2.6.4-3ubuntu5.1 amd64 [installed] 
    
  • /proc/cmdline is configured with the correct parameters and that IOMMU is indeed in passthrough mode:

    Worker1 Console

    root@worker1:~# cat /proc/cmdline
    BOOT_IMAGE=/boot/vmlinuz-6.8.0-90-generic root=UUID=60af5180-8c82-45cb-ba04-84a587d14317 ro intel_iommu=on iommu=pt numa_balancing=disable processor.max_cstate=0 isolcpus=28-55,84-111 nohz_full=28-55,84-111 rcu_nocbs=28-55,84-111
    
    root@worker1:~# dmesg | grep 'type: Passthrough'
    [    5.068360] iommu: Default domain type: Passthrough (set via kernel command line)
    
  • P0 interface has dhcp4 set to true and does not have mtu line in netplan configuration file.

    Worker1 Console

    root@worker1:~# cat /etc/netplan/50-cloud-init.yaml
    network:
    ...
    		ens5f0np0:
                dhcp4: true
                match:
                    macaddress: a0:88:c2:46:78:c4
                set-name: ens5f0np0
    ...
    
  • ens15f0 and br-dpu are with 9000 MTU (replace ens15f0 with your interface name):

    Worker1 Console

    root@worker1:~# ip a show ens15f0; ip a show br-dpu
    2: ens15f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq master br-dpu state UP group default qlen 1000
        link/ether 04:32:01:60:0d:da brd ff:ff:ff:ff:ff:ff
        altname enp53s0f0
    8: br-dpu: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP group default qlen 1000
        link/ether 04:32:01:60:0d:da brd ff:ff:ff:ff:ff:ff
        inet 10.0.110.21/24 metric 100 brd 10.0.110.255 scope global dynamic br-dpu
           valid_lft 403sec preferred_lft 403sec
        inet6 fe80::632:1ff:fe60:dda/64 scope link
           valid_lft forever preferred_lft forever
    
Finalize Deployment

Reboot the worker nodes:

Jump Node Console
root@worker1:~# reboot

The infrastructure is now ready for the K8s deployment.

maas_worker_nodes_after_deployment_updated_2.png

K8s Cluster Deployment and Configuration

Kubespray Deployment and Configuration

In this solution, the Kubernetes (K8s) cluster is deployed using a modified Kubespray (based on tag v2.28.1) with a non-root depuser account from the Jump Node. The modifications in Kubespray are designed to meet the DPF prerequisites as described in the User Manual and facilitate cluster deployment and scaling.

  1. Download the modified Kubespray archive: modified_kubespray_v2.28.1.tar.gz

  2. Extract the contents and navigate to the extracted directory:

    Jump Node Console

    $ tar -xzf /home/depuser/modified_kubespray_v2.28.1.tar.gz
    $ cd kubespray/
    depuser@jump:~/kubespray$
    
  3. Set the K8s API VIP address and DNS record. Replace it with your own IP address and DNS record if different:

    Jump Node Console

    depuser@jump:~/kubespray$ sed -i '/# kube_vip_address:/s/.*/kube_vip_address: 10.0.110.10/' inventory/mycluster/group_vars/k8s_cluster/addons.yml
    depuser@jump:~/kubespray$ sed -i '/apiserver_loadbalancer_domain_name:/s/.*/apiserver_loadbalancer_domain_name: "kube-vip.dpf.rdg.local.domain"/' roles/kubespray_defaults/defaults/main/main.yml
    
  4. Install the necessary dependencies and set up the Python virtual environment:

    Jump Node Console

    depuser@jump:~/kubespray$ sudo apt -y install python3-pip jq python3.12-venv
    depuser@jump:~/kubespray$ python3 -m venv .venv
    depuser@jump:~/kubespray$ source .venv/bin/activate
    (.venv) depuser@jump:~/kubespray$ python3 -m pip install --upgrade pip
    (.venv) depuser@jump:~/kubespray$ pip install -U -r requirements.txt
    (.venv) depuser@jump:~/kubespray$ pip install ruamel-yaml
    
  5. Review and edit the inventory/mycluster/hosts.yaml file to define the cluster nodes. The following is the configuration for this deployment:

    • All of the nodes are already labeled and annotated as per DPF user manual prerequisites.

    • The worker nodes include additional kubelet configuration which will be applied during their deployment to achieve best performance, allowing:

      • Containers in Guaranteed pods with integer CPU requests access to exclusive CPUs on the node.

      • Reserve some cores for the system using the reservedSystemCPUs option (kubelet requires a CPU reservation greater than zero to be made when the static policy is enabled), and make sure they belong to NUMA 0 (because the NIC in the example is wired to NUMA node 1, use cores from NUMA 1 if the NIC is wired to NUMA node 0).

      • Define the topology to be single-numa-node so it only allows a pod to be admitted if all requested CPUs and devices can be allocated from exactly one NUMA node.

    • The workers under thekube_node group are marked with # to only deploy the cluster with control plane nodes at the beginning (worker nodes will be added later on after the various components that are necessary for the DPF system are installed).


    inventory/mycluster/hosts.yaml

    YAML
    all:
      hosts:
        master1:
          ansible_host: 10.0.110.1
          ip: 10.0.110.1
          access_ip: 10.0.110.1
          node_labels:
            "k8s.ovn.org/zone-name": "master1"
        master2:
          ansible_host: 10.0.110.2
          ip: 10.0.110.2
          access_ip: 10.0.110.2
          node_labels:
            "k8s.ovn.org/zone-name": "master2"
        master3:
          ansible_host: 10.0.110.3
          ip: 10.0.110.3
          access_ip: 10.0.110.3
          node_labels:
            "k8s.ovn.org/zone-name": "master3"
        worker1:
          ansible_host: 10.0.110.21
          ip: 10.0.110.21
          access_ip: 10.0.110.21
          node_labels:
            "node-role.kubernetes.io/worker": ""
            "k8s.ovn.org/dpu-host": ""
            "k8s.ovn.org/zone-name": "worker1"
          node_annotations:
            "k8s.ovn.org/remote-zone-migrated": "worker1"
          kubelet_cpu_manager_policy: static
          kubelet_topology_manager_policy: single-numa-node
          kubelet_reservedSystemCPUs: 0-7
        worker2:
          ansible_host: 10.0.110.22
          ip: 10.0.110.22
          access_ip: 10.0.110.22
          node_labels:
            "node-role.kubernetes.io/worker": ""
            "k8s.ovn.org/dpu-host": ""
            "k8s.ovn.org/zone-name": "worker2"
          node_annotations:
            "k8s.ovn.org/remote-zone-migrated": "worker2"
          kubelet_cpu_manager_policy: static
          kubelet_topology_manager_policy: single-numa-node
          kubelet_reservedSystemCPUs: 0-7
      children:
        kube_control_plane:
          hosts:
            master1:
            master2:
            master3:
        kube_node:
          hosts:
    #       worker1:
    #       worker2:
        etcd:
          hosts:
            master1:
            master2:
            master3:
        k8s_cluster:
          children:
            kube_control_plane:
            kube_node:
    

Deploying Cluster Using Kubespray Ansible Playbook

  1. Run the following command from the Jump Node to initiate the deployment process:

    Ensure you are in the Python virtual environment (.venv) when running the command.

    Jump Node Console

    (.venv) depuser@jump:~/kubespray$ ansible-playbook -i inventory/mycluster/hosts.yaml --become --become-user=root cluster.yml
    
  2. It takes a while for this deployment to complete. Make sure there are no errors. Successful result example:
    image-2025-9-15_16-24-5.png

    It is recommended to keep the shell from which Kubespray has been running open, later on it will be useful when performing cluster scale out to add the worker nodes.

K8s Deployment Verification

To simplify managing the K8s cluster from the Jump Host, set up kubectl with bash auto-completion.

  1. Copy kubectl and the kubeconfig file from master1 to the Jump Host:

    Jump Node Console

    ## Connect to master1
    depuser@jump:~$ ssh master1
    depuser@master1:~$ cp /usr/local/bin/kubectl /tmp/
    depuser@master1:~$ sudo cp /root/.kube/config /tmp/kube-config
    depuser@master1:~$ sudo chmod 644 /tmp/kube-config
    
  2. In another terminal tab, copy the files to the Jump Host:

    Jump Node Console

    depuser@jump:~$ scp master1:/tmp/kubectl /tmp/
    depuser@jump:~$ sudo chown root:root /tmp/kubectl
    depuser@jump:~$ sudo mv /tmp/kubectl /usr/local/bin/
    depuser@jump:~$ mkdir -p ~/.kube
    depuser@jump:~$ scp master1:/tmp/kube-config ~/.kube/config
    depuser@jump:~$ chmod 600 ~/.kube/config
    
  3. Enable bash auto-completion for kubectl:

    1. Verify if bash-completion is installed:

      Jump Node Console

      depuser@jump:~$ type _init_completion
      

      If installed, the output will include:

      Jump Node Console

      _init_completion is a function
      
    2. If not installed, install it:

      Jump Node Console

      depuser@jump:~$ sudo apt install -y bash-completion
      
    3. Set up the kubectl completion script:

      Jump Node Console

      depuser@jump:~$ kubectl completion bash | sudo tee /etc/bash_completion.d/kubectl > /dev/null
      depuser@jump:~$ bash
      
  4. Check the status of the nodes in the cluster:

    Jump Node Console

    depuser@jump:~$ kubectl get nodes
    


    Expected output:

    Nodes will be in the NotReady state because the deployment did not include CNI components.

    Jump Node Console

    NAME      STATUS     ROLES           AGE   VERSION
    master1   NotReady   control-plane   42m   v1.31.12
    master2   NotReady   control-plane   41m   v1.31.12
    master3   NotReady   control-plane   41m   v1.31.12
    
  5. Check the pods in all namespaces:

    Jump Node Console

    depuser@jump:~$ kubectl get pods -A
    


    Expected output:

    coredns and dns-autoscaler pods will be in the Pending state due to the absence of CNI components.

    Jump Node Console

    NAMESPACE     NAME                              READY   STATUS    RESTARTS   AGE
    kube-system   coredns-776bb9db5d-ndr7j          0/1     Pending   0          41m
    kube-system   dns-autoscaler-6ffb84bd6-xj9bv    0/1     Pending   0          41m
    kube-system   kube-apiserver-master1            1/1     Running   0          43m
    kube-system   kube-apiserver-master2            1/1     Running   0          42m
    kube-system   kube-apiserver-master3            1/1     Running   0          42m
    kube-system   kube-controller-manager-master1   1/1     Running   1          43m
    kube-system   kube-controller-manager-master2   1/1     Running   1          42m
    kube-system   kube-controller-manager-master3   1/1     Running   1          42m
    kube-system   kube-scheduler-master1            1/1     Running   1          43m
    kube-system   kube-scheduler-master2            1/1     Running   1          42m
    kube-system   kube-scheduler-master3            1/1     Running   1          42m
    kube-system   kube-vip-master1                  1/1     Running   0          43m
    kube-system   kube-vip-master2                  1/1     Running   0          42m
    kube-system   kube-vip-master3                  1/1     Running   0          42m
    


DPF Installation

Software Prerequisites and Required Variables

  1. Start by installing the remaining software perquisites.

    Jump Node Console

    ## Connect to master1 to copy helm client utility that was installed during kubespray deployment
    $ depuser@jump:~$ ssh master1
    depuser@master1:~$ cp /usr/local/bin/helm /tmp/
    
    ## In another tab 
    depuser@jump:~$ scp master1:/tmp/helm /tmp/
    depuser@jump:~$ sudo chown root:root /tmp/helm
    depuser@jump:~$ sudo mv /tmp/helm /usr/local/bin/
    
    ## Verify that envsubst utility is installed 
    depuser@jump:~$ which envsubst
    /usr/bin/envsubst
    
  2. Proceed to clone the doca-platform Git repository:

    Jump Node Console

    $ git clone https://github.com/NVIDIA/doca-platform.git
    
  3. Change directory to doca-platform and checkout to tag v25.10.0:

    Jump Node Console

    $ cd doca-platform/
    $ git checkout v25.10.0
    
  4. Change directory to readme.md from where all the commands will be run:

    Jump Node Console

    $ cd docs/public/user-guides/host-trusted/use-cases/ovnk/
    
  5. Use the following file to define the required variables for the installation:

    • Replace the values for the variables in the following file with the values that fit your setup. Specifically, pay attention to DPU_P0 , DPU_P0_VF1 and DPUCLUSTER_INTERFACE.

    manifests/00-env-vars/envvars.env

    Bash
    ## IP Address for the Kubernetes API server of the target cluster on which DPF is installed.
    ## This should never include a scheme or a port.
    ## e.g. 10.10.10.10
    export TARGETCLUSTER_API_SERVER_HOST=10.0.110.10
    
    ## Port for the Kubernetes API server of the target cluster on which DPF is installed.
    export TARGETCLUSTER_API_SERVER_PORT=6443
    
    ## IP address range for hosts in the target cluster on which DPF is installed.
    ## This is a CIDR in the form e.g. 10.10.10.0/24
    export TARGETCLUSTER_NODE_CIDR=10.0.110.0/24
    
    ## IP address range for VTEPs used by OVN Kubernetes. This should align with the VTEP CIDR used in the DHCP server that
    ## serves the high speed fabric. In configurations where different ranges are used per rack, the value should be set to
    ## the superset CIDR that includes all these ranges.
    ## This is a CIDR in the form e.g. 10.0.120.0/22
    export VTEP_CIDR=10.0.120.0/22
    
    ## Virtual IP used by the load balancer for the DPU Cluster. Must be a reserved IP from the management subnet and not allocated by DHCP.
    export DPUCLUSTER_VIP=10.0.110.200
    
    ## DPU_P0 is the name of the first port of the DPU. This name must be the same on all worker nodes.
    export DPU_P0=ens5f0np0
    
    ## DPU_P0_VF1 is the name of the second Virtual Function (VF) of the first port of the DPU. This name must be the same on all worker nodes.
    ## Note: The VF will be created after the DPU is provisioned and the phase "Host Network Configuration" is completed.
    export DPU_P0_VF1=ens5f0v1
    
    ## Interface/bridge on which the DPUCluster load balancer will listen. Should be the management interface/bridge of the control plane node.
    export DPUCLUSTER_INTERFACE=brenp1s0
    
    ## IP address to the NFS server used as storage for the BFB.
    export NFS_SERVER_IP=10.0.110.253
    
    ## The repository URL for the NVIDIA Helm chart registry.
    ## Usually this is the NVIDIA Helm NGC registry. For development purposes, this can be set to a different repository.
    export HELM_REGISTRY_REPO_URL=https://helm.ngc.nvidia.com/nvidia/doca
    
    ## The repository URL for the OVN-Kubernetes Helm chart.
    ## Usually this is the NVIDIA GHCR repository. For development purposes, this can be set to a different repository.
    export OVN_KUBERNETES_REPO_URL=oci://ghcr.io/nvidia
    
    ## POD_CIDR is the CIDR used for pods in the target Kubernetes cluster.
    export POD_CIDR=10.233.64.0/18
    
    ## SERVICE_CIDR is the CIDR used for services in the target Kubernetes cluster.
    ## This is a CIDR in the form e.g. 10.10.10.0/24
    export SERVICE_CIDR=10.233.0.0/18
    
    ## The DPF REGISTRY is the Helm repository URL where the DPF Operator Chart resides.
    ## Usually this is the NVIDIA Helm NGC registry. For development purposes, this can be set to a different repository.
    export REGISTRY=https://helm.ngc.nvidia.com/nvidia/doca
    
    ## The DPF TAG is the version of the DPF components which will be deployed in this guide.
    export TAG=v25.10.0
    
    ## URL to the BFB used in the `bfb.yaml` and linked by the DPUSet.
    export BFB_URL="https://content.mellanox.com/BlueField/BFBs/Ubuntu24.04/bf-bundle-3.2.1-34_25.11_ubuntu-24.04_64k_prod.bfb"
    
    
    
  6. Export environment variables for the installation:

    Jump Node Console

    $ source manifests/00-env-vars/envvars.env
    

CNI Installation 

OVN Kubernetes is used as the primary CNI for the cluster. On worker nodes, the primary CNI will be accelerated by offloading work to the DPU. On control plane nodes, OVN Kubernetes will run without offloading.

  1. Create the NS for the CNI:

    Jump Node Console

    $ kubectl create ns ovn-kubernetes
    
  2. Install the OVN Kubernetes CNI components from the helm chart substituting the environment variables with the ones we defined before.

    Note that MTU field with value of 8940 has been added to the yaml to override the default value and to be able to achieve better performance results.

    YAML
    commonManifests:
      enabled: true
    nodeWithoutDPUManifests:
      enabled: true
    controlPlaneManifests:
      enabled: true
    nodeWithDPUManifests:
      enabled: true
      nodeMgmtPortNetdev: $DPU_P0_VF1
      dpuServiceAccountNamespace: dpf-operator-system
    gatewayOpts: --gateway-interface=$DPU_P0
    k8sAPIServer: https://$TARGETCLUSTER_API_SERVER_HOST:$TARGETCLUSTER_API_SERVER_PORT
    ## Note this CIDR is followed by a trailing /24 which informs OVN Kubernetes on how to split the CIDR per node.
    podNetwork: $POD_CIDR/24
    serviceNetwork: $SERVICE_CIDR
    mtu: 8940
    
    
  3. Run the following command: 

    Jump Node Console

    $ envsubst < manifests/01-cni-installation/helm-values/ovn-kubernetes.yml | helm upgrade --install -n ovn-kubernetes ovn-kubernetes ${OVN_KUBERNETES_REPO_URL}/ovn-kubernetes-chart --version $TAG --values - 
    
  4. Verify the CNI installation:

    The following verification commands may need to be run multiple times to ensure the condition is met.

    Jump Node Console

    $ kubectl wait --for=condition=ready --namespace ovn-kubernetes pods --all --timeout=300s
    pod/ovn-kubernetes-cluster-manager-54b48f96d4-9b29q condition met
    pod/ovn-kubernetes-node-7bg2p condition met
    pod/ovn-kubernetes-node-jfmbh condition met
    pod/ovn-kubernetes-node-pt75h condition met
     
    $ kubectl wait --for=condition=ready nodes --all
    node/master1 condition met
    node/master2 condition met
    node/master3 condition met
    
    
    

DPF Operator Installation 

Create Storage Required by the DPF Operator

  • YAML
    ---
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: bfb-pv
    spec:
      capacity:
        storage: 10Gi
      volumeMode: Filesystem
      accessModes:
        - ReadWriteMany
      nfs:
        path: /mnt/dpf_share/bfb
        server: $NFS_SERVER_IP
      persistentVolumeReclaimPolicy: Delete
    ---
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: bfb-pvc
      namespace: dpf-operator-system
    spec:
      accessModes:
      - ReadWriteMany
      resources:
        requests:
          storage: 10Gi
      volumeMode: Filesystem
      storageClassName: ""
    
    
    
  • Run the following command to substitute the environment variables using envsubst and apply the yaml files:

    Jump Node Console

    $ kubectl create ns dpf-operator-system
    $ cat manifests/02-dpf-operator-installation/*.yaml | envsubst | kubectl apply -f -
    

Additional Dependencies

  1. The DPF Operator requires several prerequisite components to function properly in a Kubernetes environment. Starting with DPF v25.7, all Helm dependencies have been removed from the DPF chart. This means that all dependencies must be installed manually before installing the DPF chart itself. The following commands describe an opiniated approach to install those dependencies (for more information, check: Helm Prerequisites - NVIDIA Docs). 

    1. Install helmfile binary:

      Jump Node Console

      $ wget https://github.com/helmfile/helmfile/releases/download/v1.1.2/helmfile_1.1.2_linux_amd64.tar.gz
      $ tar  -xvf helmfile_1.1.2_linux_amd64.tar.gz
      $ sudo mv ./helmfile /usr/local/bin/
      
    2. Change directory to doca-platform:

      Use another shell from the one where you run all the other installation commands for DPF.


      Jump Node Console

      $ cd doca-platform/
      
    3. Install Helm dependencies using the following command:

      Jump Node Console

      $ make HELMFILE_FILE=deploy/helmfiles/prereqs.yaml test-deploy-helmfile
      


  2. Ensure that the KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT environment variables are set in the node-feature-discovery-worker DaemonSet:

    Run this command from the previous shell where the environment variables were exported.


    Jump Node Console

    $ kubectl -n dpf-operator-system set env daemonset/node-feature-discovery-worker \
    KUBERNETES_SERVICE_HOST=$TARGETCLUSTER_API_SERVER_HOST \
    KUBERNETES_SERVICE_PORT=$TARGETCLUSTER_API_SERVER_PORT
    

DPF Operator Deployment 

  1. Run the following command to substitute the environment variables and install the DPF Operator:

    Jump Node Console

    $ helm repo add --force-update dpf-repository ${REGISTRY}
    $ helm repo update
    $ helm upgrade --install -n dpf-operator-system dpf-operator dpf-repository/dpf-operator --version=$TAG
    
    
  2. Verify the DPF Operator installation by ensuring the deployment is available and all the pods are ready:

    The following verification commands may need to be run multiple times to ensure the conditions are met.

    Jump Node Console

    $ kubectl rollout status deployment --namespace dpf-operator-system dpf-operator-controller-manager
    deployment "dpf-operator-controller-manager" successfully rolled out
    
    $ kubectl wait --for=condition=ready --namespace dpf-operator-system pods --all
    pod/argo-cd-argocd-application-controller-0 condition met
    pod/argo-cd-argocd-redis-6c6b84f6fb-xj5jg condition met
    pod/argo-cd-argocd-repo-server-65cfb96746-r2rmr condition met
    pod/argo-cd-argocd-server-5bbdb4b6b9-4dwhm condition met
    pod/dpf-operator-controller-manager-5dd7555c6d-dqmdt condition met
    pod/kamaji-95587fbc7-sn45q condition met
    pod/kamaji-etcd-0 condition met
    pod/kamaji-etcd-1 condition met
    pod/kamaji-etcd-2 condition met
    pod/maintenance-operator-74bd5774b7-lssgq condition met
    pod/node-feature-discovery-gc-6b48f49cc4-6mmsd condition met
    pod/node-feature-discovery-master-747d789485-d5x2s condition met
    


DPF System Installation 

This section involves creating the DPF system components and some basic infrastructure required for a functioning DPF-enabled cluster.

  1. The following YAML files define the DPFOperatorConfig to install the DPF System components and the DPUCluster to serve as Kubernetes control plane for DPU nodes.

    • Note that to achieve high performance results you need to adjust the operatorconfig.yaml to support MTU 9000.

    YAML
    ---
    apiVersion: operator.dpu.nvidia.com/v1alpha1
    kind: DPFOperatorConfig
    metadata:
      name: dpfoperatorconfig
      namespace: dpf-operator-system
    spec:
      overrides:
        kubernetesAPIServerVIP: $TARGETCLUSTER_API_SERVER_HOST
        kubernetesAPIServerPort: $TARGETCLUSTER_API_SERVER_PORT
      provisioningController:
        bfbPVCName: "bfb-pvc"
        dmsTimeout: 900
      kamajiClusterManager:
        disable: false
      networking:
        controlPlaneMTU: 9000
        highSpeedMTU: 9000
    
    
    
    YAML
    ---
    apiVersion: provisioning.dpu.nvidia.com/v1alpha1
    kind: DPUCluster
    metadata:
      name: dpu-cplane-tenant1
      namespace: dpu-cplane-tenant1
    spec:
      type: kamaji
      maxNodes: 10
      clusterEndpoint:
        # deploy keepalived instances on the nodes that match the given nodeSelector.
        keepalived:
          # interface on which keepalived will listen. Should be the oob interface of the control plane node.
          interface: $DPUCLUSTER_INTERFACE
          # Virtual IP reserved for the DPU Cluster load balancer. Must not be allocatable by DHCP.
          vip: $DPUCLUSTER_VIP
          # virtualRouterID must be in range [1,255], make sure the given virtualRouterID does not duplicate with any existing keepalived process running on the host
          virtualRouterID: 126
          nodeSelector:
            node-role.kubernetes.io/control-plane: ""
    
    
    
  2. Create NS for the Kubernetes control plane of the DPU nodes:

    Jump Node Console

    $ kubectl create ns dpu-cplane-tenant1
    
  3. Apply the previous YAML files:

    Jump Node Console

    $ cat manifests/03-dpf-system-installation/*.yaml | envsubst | kubectl apply -f -
    
  4. Verify the DPF system by ensuring that the provisioning and DPUService controller manager deployments are available, that all other deployments in the DPF Operator system are available, and that the DPUCluster is ready for nodes to join. 

    Jump Node Console

    $ kubectl rollout status deployment --namespace dpf-operator-system dpf-provisioning-controller-manager dpuservice-controller-manager
    deployment "dpf-provisioning-controller-manager" successfully rolled out
    deployment "dpuservice-controller-manager" successfully rolled out
    
    $ kubectl rollout status deployment --namespace dpf-operator-system
    deployment "argo-cd-argocd-applicationset-controller" successfully rolled out
    deployment "argo-cd-argocd-redis" successfully rolled out
    deployment "argo-cd-argocd-repo-server" successfully rolled out
    deployment "argo-cd-argocd-server" successfully rolled out
    deployment "dpf-operator-controller-manager" successfully rolled out
    deployment "dpf-provisioning-controller-manager" successfully rolled out
    deployment "dpuservice-controller-manager" successfully rolled out
    deployment "kamaji" successfully rolled out
    deployment "kamaji-cm-controller-manager" successfully rolled out
    deployment "maintenance-operator" successfully rolled out
    deployment "node-feature-discovery-gc" successfully rolled out
    deployment "node-feature-discovery-master" successfully rolled out
    deployment "servicechainset-controller-manager" successfully rolled out
    
    $ kubectl wait --for=condition=ready --namespace dpu-cplane-tenant1 dpucluster --all
    dpucluster.provisioning.dpu.nvidia.com/dpu-cplane-tenant1 condition met
    

Install Components to Enable Accelerated CNI Nodes

OVN Kubernetes accelerates traffic by attaching a VF to each pod using the primary CNI. This VF is used to offload flows to the DPU. This section details the components needed to connect pods to the offloaded OVN Kubernetes CNI.

 Install Multus and SRIOV Network Operator using NVIDIA Network Operator

  1. Add the NVIDIA Network Operator Helm repository:

    Jump Node Console

    $ helm repo add nvidia https://helm.ngc.nvidia.com/nvidia --force-update
    
  2. The following network-operator.yaml values file will be applied:

    YAML
    nfd:
      enabled: false
      deployNodeFeatureRules: false
    sriovNetworkOperator:
      enabled: true
    sriov-network-operator:
      operator:
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
                - matchExpressions:
                    - key: node-role.kubernetes.io/master
                      operator: Exists
                - matchExpressions:
                    - key: node-role.kubernetes.io/control-plane
                      operator: Exists
      crds:
        enabled: true
      sriovOperatorConfig:
        deploy: true
        configDaemonNodeSelector: null
    operator:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: node-role.kubernetes.io/master
                    operator: Exists
              - matchExpressions:
                  - key: node-role.kubernetes.io/control-plane
                    operator: Exists
    
    
    
    
    


    Deploy the operator:

    Jump Node Console

    $ helm upgrade --no-hooks --install --create-namespace --namespace nvidia-network-operator network-operator nvidia/network-operator --version 25.7.0 -f ./manifests/04-enable-accelerated-cni/helm-values/network-operator.yml
    
  3. Ensure all the pods in nvidia-network-operator namespace are ready:

    Jump Node Console

    $ kubectl wait --for=condition=ready --namespace nvidia-network-operator pods --all
    pod/network-operator-66b5cdbc79-ghjcn condition met
    pod/network-operator-sriov-network-operator-6b87b5cf96-tbkxm condition met
    

 Install OVN Kubernetes resource injection webhook

The OVN Kubernetes resource injection webhook is injected into each pod scheduled to a worker node with a request for a VF and a Network Attachment Definition. This webhook is part of the same helm chart as the other components of the OVN Kubernetes CNI. Here it is installed by adjusting the existing helm installation to add the webhook component to the installation.

  1. The following ovn-kubernetes.yaml values file will be applied:

    YAML
    ovn-kubernetes-resource-injector:
      ## Enable the ovn-kubernetes-resource-injector
      enabled: true
    
  2. Run the following commands:

    Jump Node Console

    $ envsubst < manifests/04-enable-accelerated-cni/helm-values/ovn-kubernetes.yml | helm upgrade --install -n ovn-kubernetes ovn-kubernetes-resource-injector ${OVN_KUBERNETES_REPO_URL}/ovn-kubernetes-chart --version $TAG --values -
    

     

  3. Verify that the resource injector deployment successfully rolled out.

    Jump Node Console

    $ kubectl rollout status deployment --namespace ovn-kubernetes ovn-kubernetes-resource-injector
    deployment "ovn-kubernetes-resource-injector" successfully rolled out
    

 Apply NicClusterPolicy and SriovNetworkNodePolicy

  1. The following NicClusterPolicy and SriovNetworkNodePolicy configuration files should be applied.

    YAML
    ---
    apiVersion: mellanox.com/v1alpha1
    kind: NicClusterPolicy
    metadata:
      name: nic-cluster-policy
    spec:
      secondaryNetwork:
        multus:
          image: multus-cni
          imagePullSecrets: []
          repository: ghcr.io/k8snetworkplumbingwg
          version: v3.9.3
    
    
    
    YAML
    ---
    apiVersion: sriovnetwork.openshift.io/v1
    kind: SriovNetworkNodePolicy
    metadata:
      name: bf3-p0-vfs
      namespace: nvidia-network-operator
    spec:
      nicSelector:
        deviceID: "a2dc"
        vendor: "15b3"
        pfNames:
        - $DPU_P0#2-45
      nodeSelector:
        node-role.kubernetes.io/worker: ""
      numVfs: 46
      resourceName: bf3-p0-vfs
      isRdma: true
      externallyManaged: true
      deviceType: netdevice
      linkType: eth
    
    
    


    Apply those configuration files:

    Jump Node Console

    $ cat manifests/04-enable-accelerated-cni/*.yaml | envsubst | kubectl apply -f -
    
  2. Verify the DPF system by ensuring that the following DaemonSets were successfully rolled out:

    Jump Node Console

    $ kubectl wait --for=condition=Ready --namespace nvidia-network-operator pods --all
    pod/network-operator-6cc6cfb48-nxj94 condition met
    pod/network-operator-sriov-network-operator-7b5f54db8c-4nfrg condition met
    $ kubectl rollout status daemonset --namespace nvidia-network-operator kube-multus-ds sriov-network-config-daemon sriov-device-plugin 
    daemon set "kube-multus-ds" successfully rolled out
    daemon set "sriov-network-config-daemon" successfully rolled out
    daemon set "sriov-device-plugin" successfully rolled out
    

     

DPU Provisioning and Service Installation  

  1. Before deploying the objects under manifests/05-dpudeployment-installationdirectory, few adjustments need to be made to later achieve better performance results. 

    1. Create a new DPUFlavor using the following YAML:

      • The parameter NUM_VF_MSIX is configured to be 48 in the provided example, which is suited for the servers that were used in this RDG.
        Set it to the physical number of cores in the NUMA node the NIC is located in. 

      • Hugepages amount is increased to 8072.

      YAML
      ---
      apiVersion: provisioning.dpu.nvidia.com/v1alpha1
      kind: DPUFlavor
      metadata:
      name: ovnk-$TAG
      namespace: dpf-operator-system
      spec:
      grub:
      kernelParameters:
      - console=hvc0
      - console=ttyAMA0
      - earlycon=pl011,0x13010000
      - fixrttc
      - net.ifnames=0
      - biosdevname=0
      - iommu.passthrough=1
      - cgroup_no_v1=net_prio,net_cls
      - hugepagesz=2048kB
      - hugepages=8072
      nvconfig:
      - device: "*"
      parameters:
      - PF_BAR2_ENABLE=0
      - PER_PF_NUM_SF=1
      - PF_TOTAL_SF=20
      - PF_SF_BAR_SIZE=10
      - NUM_PF_MSIX_VALID=0
      - PF_NUM_PF_MSIX_VALID=1
      - PF_NUM_PF_MSIX=228
      - INTERNAL_CPU_MODEL=1
      - INTERNAL_CPU_OFFLOAD_ENGINE=0
      - SRIOV_EN=1
      - NUM_OF_VFS=46
      - LAG_RESOURCE_ALLOCATION=1
      - LINK_TYPE_P1=ETH
      - LINK_TYPE_P2=ETH
      - NUM_VF_MSIX=48
      ovs:
      rawConfigScript: |
      _ovs-vsctl() {
      ovs-vsctl --no-wait --timeout 15 "$@"
      }
      
      _ovs-vsctl set Open_vSwitch . other_config:doca-init=true
      _ovs-vsctl set Open_vSwitch . other_config:dpdk-max-memzones=50000
      _ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
      _ovs-vsctl set Open_vSwitch . other_config:pmd-quiet-idle=true
      _ovs-vsctl set Open_vSwitch . other_config:max-idle=20000
      _ovs-vsctl set Open_vSwitch . other_config:max-revalidator=5000
      _ovs-vsctl set Open_vSwitch . other_config:ctl-pipe-size=1024
      _ovs-vsctl --if-exists del-br ovsbr1
      _ovs-vsctl --if-exists del-br ovsbr2
      _ovs-vsctl --may-exist add-br br-sfc
      _ovs-vsctl set bridge br-sfc datapath_type=netdev
      _ovs-vsctl set bridge br-sfc fail_mode=secure
      _ovs-vsctl --may-exist add-port br-sfc p0
      _ovs-vsctl set Interface p0 type=dpdk
      _ovs-vsctl set Interface p0 mtu_request=9216
      _ovs-vsctl set Port p0 external_ids:dpf-type=physical
      
      _ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-datapath-type=netdev
      _ovs-vsctl --may-exist add-br br-ovn
      _ovs-vsctl set bridge br-ovn datapath_type=netdev
      _ovs-vsctl br-set-external-id br-ovn bridge-id br-ovn
      _ovs-vsctl br-set-external-id br-ovn bridge-uplink puplinkbrovntobrsfc
      _ovs-vsctl set Interface br-ovn mtu_request=9216
      _ovs-vsctl --may-exist add-port br-ovn pf0hpf
      _ovs-vsctl set Interface pf0hpf type=dpdk
      _ovs-vsctl set Interface pf0hpf mtu_request=9216
      
      bfcfgParameters:
      - UPDATE_ATF_UEFI=yes
      - UPDATE_DPU_OS=yes
      - WITH_NIC_FW_UPDATE=yes
      
      hostNetworkInterfaceConfigs:
      - portNumber: 0
      dhcp: true
      mtu: 9000
      
      configFiles:
      - path: /etc/mellanox/mlnx-bf.conf
      operation: override
      raw: |
      ALLOW_SHARED_RQ="no"
      IPSEC_FULL_OFFLOAD="no"
      ENABLE_ESWITCH_MULTIPORT="yes"
      permissions: "0644"
      - path: /etc/mellanox/mlnx-ovs.conf
      operation: override
      raw: |
      CREATE_OVS_BRIDGES="no"
      OVS_DOCA="yes"
      permissions: "0644"
      - path: /etc/mellanox/mlnx-sf.conf
      operation: override
      raw: ""
      permissions: "0644"
      


    2. Set the mtu to 8940 for the OVN DPUServiceConfig (to deploy the OVN Kubernetes workloads on the DPU with the same MTU as in the host):

      YAML
      ---
      apiVersion: svc.dpu.nvidia.com/v1alpha1
      kind: DPUServiceConfiguration
      metadata:
        name: ovn
        namespace: dpf-operator-system
      spec:
        deploymentServiceName: "ovn"
        serviceConfiguration:
          helmChart:
            values:
              k8sAPIServer: https://$TARGETCLUSTER_API_SERVER_HOST:$TARGETCLUSTER_API_SERVER_PORT
              podNetwork: $POD_CIDR/24
              serviceNetwork: $SERVICE_CIDR
              mtu: 8940
              dpuManifests:
                kubernetesSecretName: "ovn-dpu" # user needs to populate based on DPUServiceCredentialRequest
                vtepCIDR: $VTEP_CIDR
                hostCIDR: $TARGETCLUSTER_NODE_CIDR
                externalDHCP: true
                gatewayDiscoveryNetwork: "169.254.99.100/32" # This is a "dummy" subnet used to get the default gateway address from DHCP server (via option 121)
      
      
      
  2. The rest of the configuration files remain the same, including:

    • OVN DPUServiceCredentialRequest to allow cross cluster communication.

      YAML
      ---
      apiVersion: svc.dpu.nvidia.com/v1alpha1
      kind: DPUServiceCredentialRequest
      metadata:
        name: ovn-dpu
        namespace: dpf-operator-system
      spec:
        serviceAccount:
          name: ovn-dpu
          namespace: dpf-operator-system
        duration: 24h
        type: tokenFile
        secret:
          name: ovn-dpu
          namespace: dpf-operator-system
        metadata:
          labels:
            dpu.nvidia.com/image-pull-secret: ""
      
      
      
    • DPUServiceInterfaces for physical ports on the DPU.

      YAML
      ---
      apiVersion: svc.dpu.nvidia.com/v1alpha1
      kind: DPUServiceInterface
      metadata:
        name: p0
        namespace: dpf-operator-system
      spec:
        template:
          spec:
            template:
              metadata:
                labels:
                  uplink: "p0"
              spec:
                interfaceType: physical
                physical:
                  interfaceName: p0
      
      
      
    • OVN DPUServiceInterface to define the ports attached to OVN workloads on the DPU.

      YAML
      ---
      apiVersion: svc.dpu.nvidia.com/v1alpha1
      kind: DPUServiceInterface
      metadata:
        name: ovn
        namespace: dpf-operator-system
      spec:
        template:
          spec:
            template:
              metadata:
                labels:
                  port: ovn
              spec:
                interfaceType: ovn
      
      
      
    • BFB to download BlueField Bitstream to a shared volume.

      YAML
      ---
      apiVersion: provisioning.dpu.nvidia.com/v1alpha1
      kind: BFB
      metadata:
        name: bf-bundle-$TAG
        namespace: dpf-operator-system
      spec:
        url: $BFB_URL
      
      
      
    • OVN DPUServiceTemplate to deploy OVN Kubernetes workloads to the DPUs.

      YAML
      ---
      apiVersion: svc.dpu.nvidia.com/v1alpha1
      kind: DPUServiceTemplate
      metadata:
        name: ovn
        namespace: dpf-operator-system
      spec:
        deploymentServiceName: "ovn"
        helmChart:
          source:
            repoURL: $OVN_KUBERNETES_REPO_URL
            chart: ovn-kubernetes-chart
            version: $TAG
          values:
            commonManifests:
              enabled: true
            dpuManifests:
              enabled: true
            leaseNamespace: "ovn-kubernetes"
            gatewayOpts: "--gateway-interface=br-ovn"
      
      
      
  3. Apply all of the YAML files mentioned above using the following command:

    Jump Node Console

    $ cat manifests/05-dpudeployment-installation/*.yaml | envsubst | kubectl apply -f - 
    
  4. Verify the DPUService installation by ensuring the DPUServices are created and have been reconciled, that the DPUServiceInterfaces have been reconciled, and that the DPUServiceChains have been reconciled:

    Notes
    • These verification commands may need to be run multiple times to ensure the conditions are met.

    • When using DPUDeployment, the DPUService name will have the DPUDeployment name added as prefix. For example, ovn-vwxyz.

    Jump Node Console

    $ kubectl wait --for=condition=ApplicationsReconciled --namespace dpf-operator-system dpuservices -l svc.dpu.nvidia.com/owned-by-dpudeployment=dpf-operator-system_ovn
    a.com/owned-by-dpudeployment=dpf-operator-system_ovn
    dpuservice.svc.dpu.nvidia.com/blueman-5bdx6 condition met
    dpuservice.svc.dpu.nvidia.com/dts-s7xsm condition met
    dpuservice.svc.dpu.nvidia.com/ovn-cpfjf condition met
    
    $ kubectl wait --for=condition=ServiceInterfaceSetReady --namespace dpf-operator-system dpuserviceinterface --all
    dpuserviceinterface.svc.dpu.nvidia.com/ovn condition met
    dpuserviceinterface.svc.dpu.nvidia.com/p0 condition met
    
    $ kubectl wait --for=condition=ServiceChainSetReady --namespace dpf-operator-system dpuservicechain --all
    dpuservicechain.svc.dpu.nvidia.com/ovn-wqq8h condition met
    

K8s Cluster Scale-out 

Add Worker Nodes to the Cluster 

At this point workers should be added to the cluster. As workers are added to the cluster, DPUs will be provisioned and DPUServices will begin to be spun up.

  1. Return to the shell where Kubespray was previously run to deploy the cluster, uncomment the workers under the kube_node group in the hosts.yaml file, and add the worker nodes to the cluster:

    Ensure you are in the Python virtual environment (.venv) when running the command.

    Jump Node Console

    (.venv) depuser@jump:~/kubespray$ cat inventory/mycluster/hosts.yaml
    ...
        kube_node:
          hosts:
            worker1:
            worker2: 
    ...
    
    (.venv) depuser@jump:~/kubespray$ ansible-playbook -i inventory/mycluster/hosts.yaml --become --become-user=root scale.yml
    
  2. The scale-out shouldn't take a long time, and a successful run should look similar to the following output: image-2025-9-16_10-45-46.png

 Verification

  1. To follow the progress of the DPU provisioning, run the following command to check in which phase it currently is:

    Jump Node Console

    $ watch -n10 "kubectl describe dpu -n dpf-operator-system | grep 'Node Name\|Type\|Last\|Phase'"
    Every 10.0s: kubectl describe dpu -n dpf-operator-system | grep 'Node Name\|Type\|Last\|Phase'                                                                                                          
    
    Type:       InternalIP
        Type:       Hostname
        Last Transition Time:  2026-01-04T12:14:24Z
        Type:                  Ready
        Last Transition Time:  2026-01-04T11:39:31Z
        Type:                  BFBPrepared
        Last Transition Time:  2026-01-04T11:39:05Z
        Type:                  BFBReady
        Last Transition Time:  2026-01-04T12:02:55Z
        Type:                  DPUClusterReady
        Last Transition Time:  2026-01-04T11:39:05Z
        Type:                  Initialized
        Last Transition Time:  2026-01-04T11:39:30Z
        Type:                  NodeEffectReady
        Last Transition Time:  2026-01-04T12:14:24Z
        Type:                  NodeEffectRemoved
        Last Transition Time:  2026-01-04T11:55:43Z
        Type:                  CheckedHostRebootNeed
        Last Transition Time:  2026-01-04T11:39:31Z
        Type:                  FWConfigured
        Last Transition Time:  2026-01-04T12:02:51Z
        Type:                  HostNetworkReady
        Last Transition Time:  2026-01-04T11:39:30Z
        Type:                  InterfaceInitialized
        Last Transition Time:  2026-01-04T11:55:42Z
        Type:                  OSInstalled
        Last Transition Time:  2026-01-04T12:01:23Z
        Type:                  Rebooted
      Phase:                Ready
      Dpu Node Name:                                      worker2
        Type:       InternalIP
        Type:       Hostname
        Last Transition Time:  2026-01-04T12:13:24Z
        Type:                  Ready
        Last Transition Time:  2026-01-04T11:39:26Z
        Type:                  BFBPrepared
        Last Transition Time:  2026-01-04T11:39:00Z
        Type:                  BFBReady
        Last Transition Time:  2026-01-04T12:01:45Z
        Type:                  DPUClusterReady
        Last Transition Time:  2026-01-04T11:38:59Z
        Type:                  Initialized
        Last Transition Time:  2026-01-04T11:39:24Z
        Type:                  NodeEffectReady
        Last Transition Time:  2026-01-04T12:13:23Z
        Type:                  NodeEffectRemoved
        Last Transition Time:  2026-01-04T11:54:58Z
        Type:                  CheckedHostRebootNeed
        Last Transition Time:  2026-01-04T11:39:26Z
        Type:                  FWConfigured
        Last Transition Time:  2026-01-04T12:01:39Z
        Type:                  HostNetworkReady
        Last Transition Time:  2026-01-04T11:39:25Z
        Type:                  InterfaceInitialized
        Last Transition Time:  2026-01-04T11:54:56Z
        Type:                  OSInstalled
        Last Transition Time:  2026-01-04T12:00:09Z
        Type:                  Rebooted
      Phase:                Ready
    
  2. Validate that the DPUs have been provisioned successfully by ensuring they're in ready state:

    Jump Node Console

    $ kubectl wait --for=condition=ready --namespace dpf-operator-system dpu --all
    dpu.provisioning.dpu.nvidia.com/worker1-mt2404xz0c98 condition met
    dpu.provisioning.dpu.nvidia.com/worker2-mt2333xz0xq3 condition met
    
  3. Ensure that the following DaemonSets have 2 ready replicas:

    Jump Node Console

    $ kubectl wait ds --for=jsonpath='{.status.numberReady}'=2 --namespace nvidia-network-operator kube-multus-ds sriov-network-config-daemon sriov-device-plugin
    daemonset.apps/kube-multus-ds condition met
    daemonset.apps/sriov-network-config-daemon condition met
    daemonset.apps/sriov-device-plugin condition met
    
    $ kubectl wait ds --for=jsonpath='{.status.numberReady}'=2 --namespace ovn-kubernetes ovn-kubernetes-node-dpu-host
    daemonset.apps/ovn-kubernetes-node-dpu-host condition met
    
  4. Validate that all the different DPUServices,  DPUServiceInterfaces and DPUServiceChains objects are now in ready state

    Jump Node Console

    $ kubectl wait --for=condition=ServiceInterfaceSetReady --namespace dpf-operator-system dpuserviceinterface --all
    dpuserviceinterface.svc.dpu.nvidia.com/ovn condition met
    dpuserviceinterface.svc.dpu.nvidia.com/p0 condition met
    
    $ kubectl wait --for=condition=ServiceChainSetReady --namespace dpf-operator-system dpuservicechain --all
    dpuservicechain.svc.dpu.nvidia.com/ovn-wqq8h condition met
    
    $ kubectl -n dpf-operator-system exec deployment/dpf-operator-controller-manager -- /dpfctl describe all --show-resources=dpu --show-conditions=dpu
    NAME NAMESPACE STATUS REASON SINCE MESSAGE
    DPFOperatorConfig/dpfoperatorconfig dpf-operator-system Ready: True Success 17m
    └─DPUs
    └─2 DPUs... dpf-operator-system Ready: True DPUReady 12m See worker1-mt2404xz0c98, worker2-mt2333xz0xq3
    

Congratulations, the DPF system has been successfully installed!

Infrastructure Bandwidth Validation 

Verify the deployment and that you can reach link-speed performance results on the DPF system by using various tests:

  1. RDMA bandwidth measurement

  2. Iperf TCP bandwidth measurement 

Each of the tests is described thoroughly. At the end of each test, you'll see the achieved performance. 

Make sure that the servers are tuned for maximum performance (not covered in this document).  

Performance Tests

RoCE Bandwidth Test 

  1. Apply the following NetworkPolicy to enable stateless traffic:

    stateless_netpolicy.yaml

    YAML
    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: multi-port-egress
      namespace: default
      annotations:
        k8s.ovn.org/acl-stateless: "true"
    spec:
      podSelector: {}
      policyTypes:
      - Egress
      - Ingress
      egress:
       - {}
      ingress:
       - {}
    

    Jump Node Console

    $ kubectl apply -f stateless_netpolicy.yaml
    
  2. Create a test Deployment using the following YAML to create 2 replicas on 2 different worker nodes:

    The container image specified below must include NVIDIA user space drivers and perftest 

    testapp-performance-test-deployment.yaml

    YAML
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: testapp-performance
      labels:
        app: testapp-performance
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: testapp-performance
      template:
        metadata:
          labels:
            app: testapp-performance
        spec:
          topologySpreadConstraints:
          - maxSkew: 1
            topologyKey: kubernetes.io/hostname
            whenUnsatisfiable: DoNotSchedule
            labelSelector:
              matchLabels:
                app: testapp-performance
          containers:
          - name: testapp-pod
            image: <container_image>
            imagePullPolicy: Always
            command: ['sh', '-c', 'trap : TERM INT; sleep infinity & wait']
            securityContext:
              capabilities:
                add: [ "IPC_LOCK" ]
            resources:
              requests:
                cpu: '24'
                memory: '8Gi'
              limits:
                cpu: '24'
                memory: '8Gi'
    
  3. Apply the resource:

    Jump Node Console

    $ kubectl apply -f testapp-performance-test-deployment.yaml
    
  4. Validate that the deployment is running successfully: 

    Jump Node Console

    $ kubectl get pods -o wide
    NAME                                 READY   STATUS    RESTARTS   AGE     IP            NODE      NOMINATED NODE   READINESS GATES
    testapp-performance-fd6954bd-9hq99   1/1     Running   0          2m39s   10.233.68.4   worker2   <none>           <none>
    testapp-performance-fd6954bd-s56m7   1/1     Running   0          53s     10.233.67.4   worker1   <none>           <none>
    
  5. Connect to one of the pods in the Deployment:

    Jump Node Console

    $ kubectl exec -it testapp-performance-fd6954bd-9hq99 -- bash
    
  6. From within the container, check its IP address on its interface and see that it is recognizable as an RDMA device:

    First Pod Console

    root@testapp-performance-fd6954bd-9hq99:/# ip a
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
        inet6 ::1/128 scope host 
           valid_lft forever preferred_lft forever
    130: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8940 qdisc mq state UP group default qlen 1000
        link/ether 0a:58:0a:e9:44:04 brd ff:ff:ff:ff:ff:ff permaddr c2:8e:07:eb:52:e5
        altname enp137s0f0v36
        inet 10.233.68.4/24 brd 10.233.68.255 scope global eth0
           valid_lft forever preferred_lft forever
        inet6 fe80::c08e:7ff:feeb:52e5/64 scope link 
           valid_lft forever preferred_lft forever
    
    
    root@testapp-performance-fd6954bd-9hq99:/# rdma link | grep eth0
    link mlx5_38/1 state ACTIVE physical_state LINK_UP netdev eth0
    
  7. Start the ib_write_bw server side:

    First Pod Console

    root@testapp-performance-fd6954bd-9hq99:/# ib_write_bw -a
    
    ************************************
    * Waiting for client to connect... *
    ************************************
    
  8. Using another console window, reconnect to the jump node and connect to the second pod in the deployment. 

    Jump Node Console

    $ kubectl exec -it testapp-performance-fd6954bd-s56m7 -- bash
    
  9. From within the container, start the ib_read_lat client (use the IP address from the server-side container) and check the bandwidth results:

    First Pod Console

    root@testapp-performance-fd6954bd-s56m7:/# ib_write_bw -a --report_gbits -F 10.233.68.4
    ---------------------------------------------------------------------------------------
                        RDMA_Write BW Test
     Dual-port       : OFF          Device         : mlx5_10
     Number of qps   : 1            Transport type : IB
     Connection type : RC           Using SRQ      : OFF
     PCIe relax order: ON
     ibv_wr* API     : ON
     TX depth        : 128
     CQ Moderation   : 100
     Mtu             : 4096[B]
     Link type       : Ethernet
     GID index       : 5
     Max inline data : 0[B]
     rdma_cm QPs     : OFF
     Data ex. method : Ethernet
    ---------------------------------------------------------------------------------------
     local address: LID 0000 QPN 0x01a8 PSN 0x9719ae RKey 0x058505 VAddr 0x007861d47cf000
     GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:233:67:05
     remote address: LID 0000 QPN 0x0968 PSN 0x6a9c2e RKey 0x04a505 VAddr 0x0079c10c637000
     GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:233:68:04
    ---------------------------------------------------------------------------------------
     #bytes     #iterations    BW peak[Gb/sec]    BW average[Gb/sec]   MsgRate[Mpps]
     2          5000           0.012810            0.009982            0.623879
     4          5000             0.20               0.19               5.990129
     8          5000             0.39               0.39               6.037097
     16         5000             0.80               0.78               6.060446
     32         5000             1.55               1.54               5.998464
     64         5000             3.07               3.06               5.976294
     128        5000             6.32               6.14               6.000758
     256        5000             12.72              12.34              6.027505
     512        5000             24.60              24.50              5.982084
     1024       5000             49.20              49.01              5.983012
     2048       5000             98.40              97.84              5.971414
     4096       5000             169.78             168.60             5.145195
     8192       5000             192.47             192.30             2.934319
     16384      5000             192.90             192.86             1.471428
     32768      5000             193.18             193.13             0.736721
     65536      5000             193.29             193.26             0.368611
     131072     5000             193.32             193.32             0.184363
     262144     5000             193.34             193.34             0.092190
     524288     5000             193.37             193.37             0.046103
     1048576    5000             193.38             193.38             0.023053
     2097152    5000             193.38             193.38             0.011526
     4194304    5000             193.39             193.39             0.005763
     8388608    5000             193.39             193.39             0.002882
    ---------------------------------------------------------------------------------------
    

iPerf TCP Bandwidth Test

  1. Create a test Deployment using the YAML from the previous example to create a pod on each worker that you can use to test TCP connectivity and performance.

    The container image specified in the test must include iperf.

  2. Connect to one of the pods in the deployment:

    Jump Node Console

    $ kubectl exec -it testapp-performance-fd6954bd-9hq99 -- bash
    
  3. Before starting the iperf3 server listeners, and to be able to achieve good results, check  the cores the pod is currently running on:

    To be able to bind to specific cores, make sure to schedule a pod in Guaranteed QoS class.

    Check on which worker node the pod is running on (In this example core: 28-51): 

    Jump Node Console

    root@testapp-performance-fd6954bd-9hq99: taskset -pc 1
    pid 1's current affinity list: 28-51
    
  4. Back within the container of the pod, use the following script to start multiple iperf3 servers (1 for each core) on different ports:

    iperf_server.sh

    Bash
    #!/bin/bash
    
    # Cores to bind the iperf3 server processes to
    CORES=$1
    
    # Calculate the first_core and last_core to provide the CPU range
    first_core=$(echo $CORES | cut -d "-" -f1)
    last_core=$(echo $CORES | cut -d "-" -f2)
    
    # Loop over the ports (5201 + i*2) for i in the given CPU range and run iperf3 servers
    for i in $(seq $first_core $last_core); do
       echo "Running iperf3 server on core $i"
       taskset -c $i iperf3 -s -p $((5201 + i * 2)) > /dev/null 2>&1 &
    done
    
  5. Start the script using the previous CPU range (leave 1 core as a buffer):

    First Pod Console

    root@testapp-performance-fd6954bd-9hq99:/# chmod +x iperf_server.sh
    root@testapp-performance-fd6954bd-9hq99:/# ./iperf_server.sh 28-50
    Running iperf3 server on core 28
    Running iperf3 server on core 29
    
    ...
    ...
    Running iperf3 server on core 49
    Running iperf3 server on core 50
    
    root@testapp-performance-fd6954bd-9hq99:/# ps -ef | grep iperf3
    root          38       1  0 14:39 pts/0    00:00:00 iperf3 -s -p 5257
    root          39       1  0 14:39 pts/0    00:00:00 iperf3 -s -p 5259
    ...
    ...
    root          59       1  0 14:39 pts/0    00:00:00 iperf3 -s -p 5299
    root          60       1  0 14:39 pts/0    00:00:00 iperf3 -s -p 5301
    
  6. Connect to the second pod:

    Jump Node Console

    $ kubectl exec -it testapp-performance-fd6954bd-s56m7 -- bash
    
  7. Follow the previously displayed method to identify the CPU cores the second pod is running on.

  8. Use the following script to start multiple iperf3 clients that will connect to each iperf3 server in the first pod:

    • The script receives 3 parameters: server IP to connect to, the cores it will spawn the iperf3 processes to, and the duration the iperf3 test will run. Make sure to pass all 3 when initiating the script and providing the CPU cores as a range (28-50 in this example).

    • jq and bc should be installed on the pod to properly run it. 

    iperf_client.sh

    Bash
    #!/bin/bash
    
    # IP address of the server where iperf3 servers are running
    SERVER_IP=$1  # Change to your server's IP
    
    # Cores to bind the iperf3 client processes to
    CORES=$2
    
    # Duration to run the iperf3 test
    DUR=$3
    
    # Variable to accumulate the total bandwidth in Gbit/sec
    total_bandwidth_Gbit=0
    
    # Calculate the first_core and last_core to provide the CPU range
    first_core=$(echo $CORES | cut -d "-" -f1)
    last_core=$(echo $CORES | cut -d "-" -f2)
    
    # Array to store the PIDs of background tasks
    pids=()
    
    # Loop over the ports (5201 + i*2) for i in the given CPU range
    for i in $(seq $first_core $last_core); do
        port=$((5201 + i * 2))
        cpu_core=$i  # Assign CPU core based on the value of i
        output_file="iperf3_client_results_$port.log"
    
        # Run the iperf3 client in the background with CPU core binding
        timeout $(( DUR +5 )) taskset -c $cpu_core iperf3 -c $SERVER_IP -p $port -t $DUR -J > $output_file &
        pid=$!
        pids+=("$pid")
    done
    
    # Wait for all background tasks to complete and check their status
    for pid in "${pids[@]}"; do
        wait $pid
        if [[ $? -ne 0 ]]; then
            echo "Process with PID $pid failed or timed out."
        fi
    done
    
    # Summarize the results from each log file
    echo "Summary of iperf3 client results:"
    for i in $(seq $first_core $last_core); do
        port=$((5201 + i * 2))
        output_file="iperf3_client_results_$port.log"
    
        if [[ -f $output_file ]]; then
            echo "Results for port $port:"
    
            # Parse the results and print a summary
            bandwidth_bps=$(jq '.end.sum_received.bits_per_second' $output_file)
    
            if [[ -n $bandwidth_bps ]]; then
               # Convert bandwidth from bps to Gbit/sec
               bandwidth_Gbit=$(echo "scale=3; $bandwidth_bps / 1000000000" | bc)
               echo "  Bandwidth: $bandwidth_Gbit Gbit/sec"
    
               # Accumulate the bandwidth for the total summary
               total_bandwidth_Gbit=$(echo "scale=3; $total_bandwidth_Gbit + $bandwidth_Gbit" | bc)
    
               # Delete current log file
               rm $output_file
            else
               echo "No bandwidth data found in $output_file"
            fi
    
        else
            echo "No results found for port $port"
        fi
    done
    
    # Print the total bandwidth summary
    echo "Total Bandwidth across all streams: $total_bandwidth_Gbit Gbit/sec"
    
  9. Run the script and check the performance results:

    Second Pod Console

    root@testapp-performance-fd6954bd-s56m7:/# chmod +x iperf_client.sh
    root@testapp-performance-fd6954bd-s56m7:/# ./iperf_client.sh 10.233.68.4 28-50 30
    Summary of iperf3 client results:
    Results for port 5257:
      Bandwidth: 3.843 Gbit/sec
    Results for port 5259:
      Bandwidth: 11.506 Gbit/sec
    Results for port 5261:
      Bandwidth: 11.492 Gbit/sec
    Results for port 5263:
      Bandwidth: 11.492 Gbit/sec
    Results for port 5265:
      Bandwidth: 5.734 Gbit/sec
    Results for port 5267:
      Bandwidth: 5.700 Gbit/sec
    Results for port 5269:
      Bandwidth: 5.769 Gbit/sec
    Results for port 5271:
      Bandwidth: 3.873 Gbit/sec
    Results for port 5273:
      Bandwidth: 5.772 Gbit/sec
    Results for port 5275:
      Bandwidth: 11.556 Gbit/sec
    Results for port 5277:
      Bandwidth: 11.513 Gbit/sec
    Results for port 5279:
      Bandwidth: 5.820 Gbit/sec
    Results for port 5281:
      Bandwidth: 5.816 Gbit/sec
    Results for port 5283:
      Bandwidth: 11.501 Gbit/sec
    Results for port 5285:
      Bandwidth: 3.820 Gbit/sec
    Results for port 5287:
      Bandwidth: 11.505 Gbit/sec
    Results for port 5289:
      Bandwidth: 5.815 Gbit/sec
    Results for port 5291:
      Bandwidth: 11.507 Gbit/sec
    Results for port 5293:
      Bandwidth: 11.559 Gbit/sec
    Results for port 5295:
      Bandwidth: 11.550 Gbit/sec
    Results for port 5297:
      Bandwidth: 11.541 Gbit/sec
    Results for port 5299:
      Bandwidth: 5.737 Gbit/sec
    Results for port 5301:
      Bandwidth: 5.822 Gbit/sec
    Results for port 5303:
      Bandwidth: 5.742 Gbit/sec
    Total Bandwidth across all streams: 195.985 Gbit/sec
    

Authors


GZ.jpg

Guy Zilberman

Guy Zilberman is a solution architect at NVIDIA's Networking Solutions Labs, bringing extensive experience from several leadership roles in cloud computing. He specializes in designing and implementing solutions for cloud and containerized workloads, leveraging NVIDIA's advanced networking technologies. His work primarily focuses on open-source cloud infrastructure, with expertise in platforms such as Kubernetes (K8s) and OpenStack.



SD.jpg

Shachar Dor

Shachar Dor joined the Solutions Lab team after working more than ten years as a software architect at NVIDIA Networking (previously Mellanox Technologies), where he was responsible for the architecture of network management products and solutions. Shachar's focus is on networking technologies, especially around fabric bring-up, configuration, monitoring, and life-cycle management. 

Shachar has a strong background in software architecture, design, and programming through his work on multiple projects and technologies also prior to joining the company. 



Last updated: