DOCA Platform Framework (DPF) Documentation

DPFOperatorConfig

Overview

The DPFOperatorConfig resource controls how DPF operates in your Kubernetes cluster. This guide explains the major configuration options. When the resource is applied, the DPF Operator deploys all necessary components and configures them accordingly.

Basic Configuration Example

This basic example enables the Kamaji cluster manager, disables the static cluster manager, and sets the name of the BFB PVC, which the provisioning controller requires in order to download the bf-bundle.

In the current implementation the DPFOperatorConfig resource is a singleton. This means that only one instance of this resource can exist in the cluster. If you try to create a second instance, the controllers will not work as expected.

You can find the full API documentation in the API Reference.

YAML
apiVersion: operator.doca-platform.nvidia.com/v1alpha1
kind: DPFOperatorConfig
metadata:
  name: dpf-operator-config
spec:
  staticClusterManager:
    disable: true
  kamajiClusterManager:
    disable: false
  provisioningController:
    bfbPVCName: bfb-pvc
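
The bfbPVCName value refers to a PersistentVolumeClaim that is expected to exist in the cluster. A minimal sketch of such a claim is shown below. The namespace, access mode, and storage size are assumptions chosen for illustration and are not taken from this guide; adjust them to your storage setup.

```yaml
# Hypothetical PVC backing spec.provisioningController.bfbPVCName.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: bfb-pvc                    # must match bfbPVCName
  namespace: dpf-operator-system   # assumed: same namespace as the DPF Operator
spec:
  accessModes:
    - ReadWriteMany                # assumed access mode
  resources:
    requests:
      storage: 10Gi                # assumed size
```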

To verify that the configuration was applied correctly, check the status of the DPFOperatorConfig resource:

$ kubectl -n dpf-operator-system get dpfoperatorconfig
NAME                READY   PHASE     AGE
dpfoperatorconfig   True    Success   1h

Alternatively, use dpfctl:

$ kubectl -n dpf-operator-system exec deployment/dpf-operator-controller-manager -- /dpfctl describe all
NAME                                      NAMESPACE            STATUS  REASON   SINCE  MESSAGE
DPFOperatorConfig/dpfoperatorconfig       dpf-operator-system
            ├─Ready                                            True    Success  1h
            ├─ImagePullSecretsReconciled                       True    Success  1h
            ├─SystemComponentsReady                            True    Success  1h
            └─SystemComponentsReconciled                       True    Success  1h

Configuration Options

Networking

The MTU of the control plane and high-speed interfaces is configurable. Both values default to 1500 and accept values from 1280 to 9216.

YAML
spec:
  networking:
    controlPlaneMTU: 1500    # Management network MTU (range: 1280-9216, default: 1500)
    highSpeedMTU: 1500       # High-speed interface MTU (range: 1280-9216, default: 1500)
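
As an illustration, the fragment below raises only the high-speed MTU to 9000, a common jumbo-frame size that falls within the documented 1280-9216 range. Whether the underlying fabric supports this MTU end to end is an assumption you would need to verify for your environment.

```yaml
spec:
  networking:
    controlPlaneMTU: 1500    # left at the default
    highSpeedMTU: 9000       # hypothetical jumbo-frame value, within 1280-9216
```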

Image Pull Secrets

Specify secrets for pulling container images. This is only necessary if your container registry requires authentication. If you are using the public GHCR registry, which is the default, you don't need to configure this.

YAML
spec:
  imagePullSecrets:
    - "my-registry-secret"
    - "another-secret"
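
Each entry is the name of a Kubernetes Secret. A sketch of one such Secret is shown below, assuming a Docker-config-style registry credential. The namespace and the placeholder payload are assumptions, not values from this guide.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: my-registry-secret
  namespace: dpf-operator-system   # assumed: same namespace as the DPF Operator
type: kubernetes.io/dockerconfigjson
data:
  # Placeholder: replace with the base64-encoded contents of a Docker config.json.
  .dockerconfigjson: "<base64-encoded-docker-config>"
```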

Resources

All system components deployed by the DPF Operator support standard Kubernetes resource requests and limits. Resources can be configured per component at the container level. Components may have multiple containers with different resource requirements that can be configured independently.

Below is an example of configuring resources for the SFC Controller component:

YAML
spec:
  sfcController:
    controller:
      resources:
        requests:
          cpu: 6
          memory: 2Gi
        limits:
          cpu: 8
          memory: 4Gi

This pattern applies to all components listed in the Optional Component Configurations section below.
For production deployments, it's recommended to set appropriate resource limits based on your cluster's workload.

Optional Component Configurations

The following components can be configured to enable/disable features or specify a different container image.
By default, all components are enabled with preconfigured images, and changes are usually only needed for development, testing, or specific deployments.

YAML
spec:
  cniInstaller: { }
  dpuDetector: { }
  dpuServiceController: { }
  flannel: { }
  kamajiClusterManager: { }
  multus: { }
  nvipam: { }
  ovsCNI: { }
  provisioningController: { }
  serviceSetController: { }
  sfcController: { }
  sriovDevicePlugin: { }
  staticClusterManager: { }

To disable a component or override its container image, use the following configuration:

YAML
spec:
  sriovDevicePlugin:
    disable: true
  dpuDetector:
    daemon:
      image: "my-registry/my-dpu-detector:latest"

Deprecated: Setting the image at component level (e.g., spec.dpuDetector.image) is deprecated. Use the sub-component specific image field instead (e.g., spec.dpuDetector.daemon.image).

For a detailed description of each component and its available configuration options, see the API Reference.

DPU Service Controller Configuration Options

  • spec.dpuServiceController.disableDPUReadyTaints: When set to true, disables the automatic tainting of DPU nodes when they're not ready.

YAML
spec:
  dpuServiceController:
    disableDPUReadyTaints: true

Flannel Configuration Options

  • spec.flannel.podCIDR: CIDR range for pod networking when using Flannel CNI.

YAML
spec:
  flannel:
    podCIDR: "10.244.0.0/16"

Component Deployment Configuration

Several components support additional deployment configuration options:

  • helmChart: Override the Helm chart repository/version for the component

YAML
spec:
  multus:
    helmChart: "custom-repo/multus:v1.0.0"

SFC Controller Configuration Options

  • spec.sfcController.SecureFlowDeletionTimeout: Controls the secure flow deletion feature.

    The default value is 0, which means that the feature is disabled.
    When set to a valid duration, the value is the API server unavailability threshold: if the API server is unavailable for longer than this duration, the SFC controller deletes all OpenFlow flows to prevent unintended packet leaks.
    The value must use units accepted by Go time.ParseDuration https://golang.org/pkg/time/#ParseDuration.

YAML
spec:
  sfcController:
    SecureFlowDeletionTimeout: 5m

Provisioning Controller Configuration Options

  • spec.provisioningController.bfbPVCName: (Required) Name of the PVC containing the BFB (BF Bundle) for provisioning DPUs.

  • spec.provisioningController.maxDPUParallelInstallations: Controls the maximum number of DPUs that can be provisioned concurrently. The default value is 50. The value must be at least 1.

  • spec.provisioningController.maxUnavailableDPUNodes: Maximum number of DPU nodes that can be unavailable during updates. The provisioning controller interacts with the maintenance-operator to implement the drain node effect. The number of nodes to which a node effect can be applied simultaneously is determined by maxUnavailableDPUNodes in the DPFOperatorConfig and maxParallelOperations in the NodeMaintenance-operator configuration. The NodeMaintenance-operator configuration takes priority over the value defined in the DPFOperatorConfig. The default value of maxUnavailableDPUNodes is 50. For the default MaintenanceOperatorConfig values, see the instructions in the helm prerequisites.

The maxDPUParallelInstallations and maxUnavailableDPUNodes options can be set together and can be combined with maxParallelOperations and maxUnavailable in the NVIDIA NodeMaintenance-operator configuration. The examples below show the expected behaviour.

| maxDPUParallelInstallations in DPFOperatorConfig | maxUnavailableDPUNodes in DPFOperatorConfig | maxParallelOperations in NVIDIA NodeMaintenanceConfig | maxUnavailable in NVIDIA NodeMaintenanceConfig | Max number of DPUs in provisioning | Max number of nodes under node effect in NodeMaintenanceOperator |
|---|---|---|---|---|---|
| 5 | 1 | 10 | 5 | up to 5 DPUs provisioning in parallel | up to 1 node under node effect |
| 1 | 5 | 10 | 10 | up to 1 DPU provisioning | up to 1 node under node effect |
| 5 | 5 | 1 | 5 | up to 5 DPUs provisioning in parallel | up to 1 node under node effect |
| 5 | 5 | 10 | 2 | up to 5 DPUs provisioning in parallel | up to 2 nodes under node effect |

  • spec.provisioningController.bfCFGTemplateConfigMap: Name of ConfigMap containing bf-cfg template for DPU configuration.

  • spec.provisioningController.customCASecretName: Name of Secret containing custom CA certificates for secure communication.

  • spec.provisioningController.dmsTimeout: Timeout in seconds for DMS (DPU Management Service) operations.

  • spec.provisioningController.multiDPUOperationsSyncWaitTime: Wait time for synchronizing operations across multiple DPUs. Value must be in units accepted by Go time.ParseDuration https://golang.org/pkg/time/#ParseDuration.

  • spec.provisioningController.registry: Configuration for the container registry used during provisioning.

    • address: Registry address

    • port: Registry port

  • spec.provisioningController.installInterface: Method for installing DPU firmware. Choose one:

    • installViaHostAgent: Install via host agent

    • installViaGNOI: Install via gNOI protocol

    • installViaRedfish: Install via Redfish API with additional options:

      • bfbRegistry.disable: Disable the BFB registry

      • bfbRegistry.port: Port for BFB registry

      • bfbRegistryAddress: Address of BFB registry

      • skipDpuNodeDiscovery: Skip automatic DPU node discovery

YAML
spec:
  provisioningController:
    bfbPVCName: bfb-pvc
    maxDPUParallelInstallations: 25  # Limit concurrent provisioning to 25 DPUs
    maxUnavailableDPUNodes: 5
    dmsTimeout: 600
    multiDPUOperationsSyncWaitTime: 30s
    customCASecretName: my-ca-secret
    registry:
      address: "registry.example.com"
      port: 5000
    installInterface:
      installViaRedfish:
        bfbRegistry:
          port: 8080
        skipDpuNodeDiscovery: false
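
To connect the concurrency options to the examples listed earlier in this section, the fragment below sketches the DPFOperatorConfig side of the last example: 5 parallel installations and 5 unavailable DPU nodes. With maxParallelOperations set to 10 and maxUnavailable set to 2 in the NVIDIA NodeMaintenance-operator configuration (not shown here, since its format is not covered by this guide), the expected result is up to 5 DPUs provisioning in parallel and up to 2 nodes under node effect.

```yaml
spec:
  provisioningController:
    bfbPVCName: bfb-pvc
    maxDPUParallelInstallations: 5   # up to 5 DPUs provisioning in parallel
    maxUnavailableDPUNodes: 5        # in this example the maintenance-operator's maxUnavailable of 2 is the lower bound
```

In each of the listed examples, the number of nodes under node effect equals the smallest of the four values. Treat that as an observation about these examples and not as a documented rule.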

Advanced Overrides

The overrides section allows customization of system-level paths and settings. These are typically only needed for non-standard deployments or testing scenarios.

YAML
spec:
  overrides:
    # Pause reconciliation of the DPFOperatorConfig
    paused: false

    # Kubernetes API server configuration
    kubernetesAPIServerVIP: "192.168.1.100"
    kubernetesAPIServerPort: 6443

    # DPU filesystem paths for CNI
    dpuCNIPath: "/etc/cni/net.d"
    dpuCNIBinPath: "/opt/cni/bin"

    # DPU OpenVSwitch paths
    dpuOpenvSwitchBinPath: "/usr/bin"
    dpuOpenvSwitchRunPath: "/var/run/openvswitch"
    dpuOpenvSwitchSystemSharedPath: "/usr/share/openvswitch"
    dpuOpenvSwitchSystemSharedLib64Path: "/usr/lib64"

    # Flannel-specific overrides
    flannelSkipCNIConfigInstallation: false

Override Options

  • paused: When set to true, pauses reconciliation of the DPFOperatorConfig resource.

  • kubernetesAPIServerVIP: Override the Kubernetes API server virtual IP address.

  • kubernetesAPIServerPort: Override the Kubernetes API server port (default: 6443).

  • dpuCNIPath: Path to CNI configuration directory on DPU nodes.

  • dpuCNIBinPath: Path to CNI binaries on DPU nodes.

  • dpuOpenvSwitchBinPath: Path to OpenvSwitch binaries on DPU nodes.

  • dpuOpenvSwitchRunPath: Path to OpenvSwitch runtime directory on DPU nodes.

  • dpuOpenvSwitchSystemSharedPath: Path to OpenvSwitch shared directory on DPU nodes.

  • dpuOpenvSwitchSystemSharedLib64Path: Path to OpenvSwitch 64-bit libraries on DPU nodes.

  • flannelSkipCNIConfigInstallation: Skip automatic CNI configuration installation for Flannel.
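
As a small illustration of a single override, the fragment below sets only paused, which pauses reconciliation of the DPFOperatorConfig resource. The scenario (for instance, temporarily halting reconciliation while debugging) is an illustrative assumption.

```yaml
spec:
  overrides:
    paused: true   # pauses reconciliation of the DPFOperatorConfig; set back to false to resume
```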
