DOCA SDK Documentation

DOCA Stream Receive Performance Application Guide

This guide outlines the implementation of the DOCA Stream Receive Performance application, built on top of the NVIDIA® BlueField® DPU.

Introduction

The Stream Receive Performance application is designed to measure and evaluate RX performance using the NVIDIA DOCA RMAX library. It leverages the capabilities of DOCA RMAX and NVIDIA Rivermax to support efficient, high-performance media and data streaming.

Key Technologies

  • DOCA RMAX API – A component of the NVIDIA DOCA framework, optimized for networking tasks in media streaming use cases.

  • NVIDIA Rivermax SDK – Built to exploit BlueField DPU hardware acceleration, enabling direct data transfers between the NIC and GPU, minimizing CPU load.

This architecture delivers high throughput, ultra-low latency, and minimal CPU utilization, making it well suited to demanding real-time streaming workloads.

Deployment Notes

  • DOCA Rivermax applications must run on BlueField target DPUs with root privileges or other additional permissions and capabilities

  • Ensure the DPU has a valid IP address configured

  • Allocate an appropriate number of huge pages for optimal performance. Refer to "Rivermax Performance-oriented Development Guidelines" for details. 

    To access this document, join the NVIDIA Rivermax SDK developers' program and open the documentation on the Rivermax Developer page.

  • Runtime configurations can be tuned even after the application starts, allowing dynamic performance optimization

For complete setup steps and advanced configurations, refer to DOCA RMAX documentation.

System Design

The application is designed to receive and process network packets using the DOCA library. It is structured around three core components:

  • Configuration management – Manages the initialization, parsing, validation, and cleanup of application configuration parameters

  • Global resources management – Handles the allocation and management of shared resources such as memory maps, buffer inventories, and progress engines

  • Stream management – Manages the lifecycle of data streams used for packet reception, including setup, execution, and teardown

Architecture

The architecture comprises several key modules and their responsibilities.

Main Application

  • Initialization – Sets up logging, parses command-line arguments, and initializes the configuration.

  • Device listing – If the --list flag is passed, it enumerates and prints available devices, then exits.

  • Stream processing – Initializes global resources, configures the stream, and enters the packet reception loop.

Configuration Management

  • Initialization – Applies default values and creates the CPU affinity mask.

  • Argument parsing – Parses command-line arguments and updates the configuration accordingly.

  • Validation – Verifies that all required parameters are provided.

  • Destruction – Frees any configuration-related resources.

Global Resources Management

  • Initialization – Sets up shared memory maps, buffer inventories, and progress engines required for data handling

  • Destruction – Cleans up and releases global resources

Stream Management

  • Initialization – Configures and starts the stream, allocates memory buffers, and attaches the necessary flows

  • Packet reception loop – Processes incoming packets, manages events, and collects runtime statistics

  • Destruction – Detaches flows, stops the stream, and releases associated buffers

Application Functions and Roles

Function(s)                          Role
main                                 Entry point of the application; handles overall flow control
init_config, destroy_config          Manage application configuration
register_argp_params                 Register command-line arguments
init_globals, destroy_globals        Manage global resources
init_stream, destroy_stream          Manage stream setup and teardown
run_recv_loop                        Main loop for receiving and processing packets
handle_completion, handle_error      Event handlers for packet reception and errors

Data Structures

  • app_config – Holds configuration parameters for the application

  • globals – Holds global resources required by the application

  • stream_data – Manages the state and data associated with streaming
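
To make the role of app_config more concrete, the following is a minimal sketch of what such a structure could look like, assuming one field per program flag. The structure name, field names, and field types are hypothetical and are not taken from the application source. The defaults come from the usage printout in this guide. The choice of which flags count as mandatory is also an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: names and types are illustrative only. */
struct app_config_sketch {
	bool list;             /* --list */
	bool dump;             /* --dump */
	char src_ip[16];       /* --src-ip, dotted IPv4 string */
	char dst_ip[16];       /* --dst-ip */
	char local_ip[16];     /* --local-ip */
	uint16_t dst_port;     /* --dst-port */
	uint32_t packets;      /* --packets */
	uint16_t payload_size; /* --payload-size */
	uint16_t app_hdr_size; /* --app-hdr-size */
	uint32_t sleep_us;     /* --sleep */
	uint32_t min_packets;  /* --min */
};

/* Apply the defaults listed in the usage printout. */
void config_set_defaults(struct app_config_sketch *cfg)
{
	assert(cfg != NULL);
	memset(cfg, 0, sizeof(*cfg));
	cfg->packets = 262144;
	cfg->payload_size = 1500;
	/* app_hdr_size, sleep_us and min_packets stay 0 after the memset. */
}

/* Assumption: the three addresses and the port are the mandatory flags. */
bool config_mandatory_args_set(const struct app_config_sketch *cfg)
{
	assert(cfg != NULL);
	return cfg->src_ip[0] != '\0' && cfg->dst_ip[0] != '\0' &&
	       cfg->local_ip[0] != '\0' && cfg->dst_port != 0;
}
```

Keeping defaults and validation next to the structure means the argument parser only has to overwrite the fields that the user actually supplies.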

Event Handling

  • Completion events – Handled by handle_completion, which updates statistics and optionally dumps packet content

  • Error events – Handled by handle_error, which logs the error and stops the receive loop
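
The division of labor between the two handlers can be sketched without any DOCA calls. In the sketch below, all type and function names are hypothetical, and a scripted array of events stands in for whatever the progress engine would deliver. It only illustrates the pattern described above: completions update counters, and an error clears the flag that keeps the loop running.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical state; the real stream_data layout is not shown in this guide. */
struct recv_stats_sketch {
	uint64_t packets; /* packets received so far */
	uint64_t bytes;   /* payload bytes received so far */
	bool run;         /* cleared to stop the receive loop */
	int last_error;   /* last reported error code, 0 if none */
};

/* Completion path: account for one completed batch of packets. */
void on_completion(struct recv_stats_sketch *st, uint32_t n_packets,
		   uint32_t bytes_per_packet)
{
	assert(st != NULL);
	st->packets += n_packets;
	st->bytes += (uint64_t)n_packets * bytes_per_packet;
}

/* Error path: remember the error and ask the loop to stop. */
void on_error(struct recv_stats_sketch *st, int error_code)
{
	assert(st != NULL);
	st->last_error = error_code;
	st->run = false;
}

enum event_kind { EVENT_COMPLETION, EVENT_ERROR };

/* A scripted event standing in for a real completion or error event. */
struct event_sketch {
	enum event_kind kind;
	uint32_t n_packets;
	int error_code;
};

/* Dispatch events until one reports an error or the script ends.
 * Returns the number of events consumed. */
size_t run_loop_sketch(struct recv_stats_sketch *st,
		       const struct event_sketch *events, size_t n_events,
		       uint32_t bytes_per_packet)
{
	size_t i = 0;

	assert(st != NULL && events != NULL);
	st->run = true;
	while (st->run && i < n_events) {
		if (events[i].kind == EVENT_COMPLETION)
			on_completion(st, events[i].n_packets, bytes_per_packet);
		else
			on_error(st, events[i].error_code);
		i++;
	}
	return i;
}
```

Because the error handler only clears a flag, the loop exits at a well-defined point and teardown can proceed in the normal order.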

Flow

  1. Initialization – Set up logging, configuration, and global resources.

  2. Device listing – Optionally list available devices.

  3. Stream setup – Configure and initialize the stream.

  4. Packet reception – Enter the main loop to receive and process packets.

  5. Teardown – Clean up resources and exit.


DOCA Libraries

This application leverages the following DOCA library:

  • DOCA RMAX

Dependencies

The DOCA RMAX library is required both to compile and to run this application. A Rivermax license is also required to run it, as is the case with every application using DOCA RMAX. Refer to the NVIDIA Rivermax SDK page to obtain a license.

Compiling the Application

Please refer to the DOCA Installation Guide for Linux for details on how to install BlueField-related software.

DOCA reference applications are installed with full source code and build instructions. This allows you to compile them as-is or modify the source code to create custom versions.

For more information about the applications as well as development and compilation tips, refer to the DOCA Reference Applications page.

The source code for the application is located in the following directory: 

/opt/mellanox/doca/applications/stream_receive_perf/

Compiling All Applications

All DOCA applications are defined under a single meson project. So, by default, the compilation includes all of them.

To build all the applications together, run: 

cd /opt/mellanox/doca/applications/
meson /tmp/build
ninja -C /tmp/build

doca_stream_receive_perf is created under /tmp/build/stream_receive_perf/.

Compiling Only the Current Application

  1. To directly build only the stream receive performance application:

    cd /opt/mellanox/doca/applications/
    meson /tmp/build -Denable_all_applications=false -Denable_stream_receive_perf=true
    ninja -C /tmp/build
    

    doca_stream_receive_perf is created under /tmp/build/stream_receive_perf/.

  2. Alternatively, one can set the desired flags in the meson_options.txt file instead of providing them in the compilation command line:

    1. Edit the following flags in /opt/mellanox/doca/applications/meson_options.txt:

      • Set enable_all_applications to false

      • Set enable_stream_receive_perf to true

    2. The same compilation commands should be used, as were shown in the previous section:

    cd /opt/mellanox/doca/applications/
    meson /tmp/build
    ninja -C /tmp/build

    doca_stream_receive_perf is created under /tmp/build/stream_receive_perf/.

Running the Application

Prerequisites

This application can run on the target DPU only.

This application must be run with root privileges or other additional permissions and capabilities.

For detailed instructions on running without root privileges, please refer to the Environment section in DOCA Rivermax.

  • An IP address must be configured on the device being used.

  • It is recommended to have at least 800 huge pages enabled to achieve maximum performance:

    dpu> echo 1000000000 > /proc/sys/kernel/shmmax
    dpu> echo 800 > /proc/sys/vm/nr_hugepages
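
To put the huge page recommendation in perspective, the sketch below gives a rough lower bound on the number of huge pages that the packet buffers alone would occupy. It assumes a 2 MiB huge page size (check Hugepagesize in /proc/meminfo on the target) and ignores any alignment padding or internal allocations made by the libraries, so the real requirement is higher. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Lower-bound estimate: bytes for all packet slots, rounded up to whole
 * huge pages. Ignores stride alignment and library-internal memory. */
uint64_t hugepages_needed(uint64_t packets, uint64_t payload_size,
			  uint64_t app_hdr_size, uint64_t hugepage_bytes)
{
	assert(hugepage_bytes > 0);
	uint64_t total = packets * (payload_size + app_hdr_size);

	return (total + hugepage_bytes - 1) / hugepage_bytes; /* round up */
}
```

With the default values (262144 packets of 1500 bytes), the buffers take 375 MiB, or 188 pages of 2 MiB under these assumptions, which leaves ample headroom within the recommended 800 pages.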
    

Application Execution

The stream receive performance application is provided in source form, hence a compilation is required before the application can be executed.

  • Application usage instructions

    Usage: doca_stream_receive_perf [DOCA Flags] [Program Flags]

    DOCA Flags:
      -h, --help                        Print a help synopsis
      -v, --version                     Print program version information
      -l, --log-level                   Set the (numeric) log level for the program <10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE>
      --sdk-log-level                   Set the SDK (numeric) log level for the program <10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE>
      -j, --json <path>                 Parse command line flags from an input json file

    Program Flags:
      --list                            List available devices
      --scatter-type                    Scattering type: RAW (default) or ULP
      --tstamp-format                   Timestamp format: raw (default), free-running or synced
      -s, --src-ip                      Source address to read from
      -d, --dst-ip                      Destination address to bind to
      -i, --local-ip                    IP of the local interface to receive data
      -p, --dst-port                    Destination port to read from
      -K, --packets                     Number of packets to allocate memory for (default 262144)
      -y, --payload-size                Packet's payload size (default 1500)
      -e, --app-hdr-size                Packet's application header size (default 0)
      -a, --cpu-affinity                Comma separated list of CPU affinity cores for the application main thread
      --sleep                           Amount of microseconds to sleep between requests (default 0)
      --min                             Block until at least this number of packets are received (default 0)
      --max                             Maximum number of packets to return in one completion
      --dump                            Dump packet content
    

    For additional information, refer to the "Command Line Flags" section below.

    The above usage printout can be printed to the command line using the -h (or --help) option:

    ./doca_stream_receive_perf -h
    
  • CLI example for listing available devices:

    ./doca_stream_receive_perf --list
    
  • CLI example for receiving a stream sent from 1.1.63.5 to the local NIC address 1.1.64.67 and port 7000:

    ./doca_stream_receive_perf --local-ip 1.1.64.67 --dst-ip 1.1.64.67 --src-ip 1.1.63.5 --dst-port 7000
    


  • CLI example for receiving a multicast stream sent from 1.1.63.5 to the group address 239.0.0.1 and port 7000, using the local NIC address 1.1.64.67:

    ./doca_stream_receive_perf --local-ip 1.1.64.67 --dst-ip 239.0.0.1 --src-ip 1.1.63.5 --dst-port 7000
    
  • CLI example for receiving a stream using header-data split mode. This example receives a stream sent from 1.1.63.5 to the local NIC address 1.1.64.67 and port 7000. The application header size is 20 bytes, and the payload size is 1200 bytes:

    ./doca_stream_receive_perf --local-ip 1.1.64.67 --dst-ip 1.1.64.67 --src-ip 1.1.63.5 --dst-port 7000 --app-hdr-size 20 --payload-size 1200
    

    Setting the application header size enables header-data split mode which separates the application header from the payload.
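
One way to picture header-data split is as two separate memory regions, one holding the application headers and one holding the payloads, with packet i occupying slot i in each. The sketch below illustrates that addressing only. The structure, field, and function names are hypothetical, and it assumes fixed strides equal to the configured sizes, whereas the library may pad strides for alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Two separate regions: one for application headers, one for payloads. */
struct hds_buffers_sketch {
	uint8_t *hdr_base;  /* start of the header region */
	uint8_t *data_base; /* start of the payload region */
	size_t hdr_stride;  /* bytes reserved per header slot */
	size_t data_stride; /* bytes reserved per payload slot */
	size_t n_packets;   /* number of slots in each region */
};

/* Address of the application header of packet i. */
uint8_t *hds_header_at(const struct hds_buffers_sketch *b, size_t i)
{
	assert(b != NULL && i < b->n_packets);
	return b->hdr_base + i * b->hdr_stride;
}

/* Address of the payload of packet i. */
uint8_t *hds_payload_at(const struct hds_buffers_sketch *b, size_t i)
{
	assert(b != NULL && i < b->n_packets);
	return b->data_base + i * b->data_stride;
}
```

With the values from the example above, hdr_stride would be 20 and data_stride 1200, so payloads sit back to back in their own region and can be consumed without skipping over headers.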

Command Line Flags

General Flags

Short Flag   Long Flag          Description
-h           --help             Prints a help synopsis and exits
-v           --version          Prints program version information and exits
-l           --log-level        Sets the numeric log level for the application: 10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE (requires compilation with TRACE support)
N/A          --sdk-log-level    Sets the SDK numeric log level using the same 10-70 scale as above
N/A          --log-filter       Filters logs from specific modules (comma-separated list)
-j           --json             Parses command-line flags from a specified input JSON file

Refer to DOCA Arg Parser for more information regarding the supported flags and execution modes.

Program Flags

Short Flag   Long Flag          Description
N/A          --list             Lists all available devices, dumps their IPv4 addresses, and reports whether the PTP clock is supported
N/A          --scatter-type     Scattering type: RAW (default) or ULP
N/A          --tstamp-format    Timestamp format: raw (default), free-running, or synced
-s           --src-ip           Source IP address to read from
-d           --dst-ip           Destination IP address to bind to
-i           --local-ip         IP of the local interface to receive data
-p           --dst-port         Destination port to read from
-K           --packets          Number of packets to allocate memory for (default 262144)
-y           --payload-size     Packet's payload size (default 1500)
-e           --app-hdr-size     Packet's application header size (default 0)
-a           --cpu-affinity     Comma-separated list of CPU affinity cores for the application main thread
N/A          --sleep            Amount of microseconds to sleep between requests (default 0)
N/A          --min              Block until at least this number of packets are received (default 0)
N/A          --max              Maximum number of packets to return in one completion
N/A          --dump             Dump packet content
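
The --cpu-affinity flag takes a comma-separated list of core numbers. As an illustration of how such a list could be turned into a mask, the sketch below parses it into a 64-bit bitmask. This is not the application's parser: the function name is hypothetical, the 64-core limit is a simplification, and a plain integer stands in for whatever mask type the library expects.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse a list such as "0,2,5" into a bitmask with one bit per core.
 * Supports cores 0..63 only. Returns false on malformed input. */
bool parse_cpu_list(const char *list, uint64_t *mask_out)
{
	assert(list != NULL && mask_out != NULL);
	uint64_t mask = 0;
	const char *p = list;

	for (;;) {
		char *end;

		errno = 0;
		long core = strtol(p, &end, 10);
		if (end == p || errno != 0 || core < 0 || core > 63)
			return false; /* not a number, or out of range */
		mask |= (uint64_t)1 << core;
		if (*end == '\0')
			break;        /* consumed the whole list */
		if (*end != ',')
			return false; /* unexpected separator */
		p = end + 1;
	}
	*mask_out = mask;
	return true;
}
```

For example, the list "0,2,5" yields the mask 0x25. Empty items, as in "1,,2", and an empty string are rejected rather than silently skipped.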

Troubleshooting

Refer to the NVIDIA BlueField Platform Software Troubleshooting Guide for any issue encountered with the compilation, installation, or execution of the DOCA applications.

Application Code Flow

    1. Parse application arguments.

      1. Initialize arg parser resources and register DOCA general parameters.

        init_config();
        
      2. Register stream receive performance application parameters.

        register_argp_params();
        
      3. Parse the arguments.

        doca_argp_start();
        
        1. Parse app parameters.

    2. Device listing. 

      If the list parameter is set to true, the application lists all available devices.

      1. Initializes the DOCA RMAX library.

        doca_rmax_init();

      2. Enumerates and lists all available devices.

        list_devices();
        
    3. Stream receive: if the list parameter is not set, the application proceeds to receive the stream.

      1. Mandatory Arguments Check.

        mandatory_args_set();

      2. CPU Affinity Mask (if it is set).

        doca_rmax_set_cpu_affinity_mask();

      3. Initializes the DOCA RMAX library.

        doca_rmax_init();

      4. Device opening.

        open_device();

      5. Global Resources Initialization.

        init_globals();

      6. Stream Initialization.

        init_stream();

    4. Main Loop.

      run_recv_loop();
      
    5. Clean-up.

      1. Cleans up and destroys the stream.

        destroy_stream();
        
      2. Releases and destroys global application resources.

        destroy_globals();

      3. Closes the device.

        doca_dev_close();

      4. Releases the DOCA RMAX library.

        doca_rmax_release();

      5. Destroys the ARGP resources.

        doca_argp_destroy();

      6. Releases resources allocated by the application configuration.

        destroy_config();
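
The setup and clean-up steps above mirror each other: resources are released in the reverse order of their acquisition. A common C idiom for this is a chain of goto labels, sketched below with stub steps that only record their names. The step names echo the functions listed in this section, but nothing here calls DOCA. The helper names and the failure injection parameter are hypothetical, and the argument parser and configuration steps are omitted for brevity.

```c
#include <assert.h>
#include <string.h>

/* Trace of executed steps, so the ordering can be inspected. */
char trace[256];

/* Stand-in for a real step: records its name and fails (returns -1)
 * only when the name matches fail_at. */
int step(const char *name, const char *fail_at)
{
	assert(strlen(trace) + strlen(name) + 2 <= sizeof(trace));
	strcat(trace, name);
	strcat(trace, ";");
	return strcmp(name, fail_at) == 0 ? -1 : 0;
}

/* fail_at names the setup step that should fail, or "" for none. */
int run_app_sketch(const char *fail_at)
{
	int ret;

	trace[0] = '\0';
	ret = step("rmax_init", fail_at);
	if (ret != 0)
		goto out;
	ret = step("open_device", fail_at);
	if (ret != 0)
		goto release_rmax;
	ret = step("init_globals", fail_at);
	if (ret != 0)
		goto close_device;
	ret = step("init_stream", fail_at);
	if (ret != 0)
		goto destroy_globals;

	ret = step("run_recv_loop", fail_at);

	/* Teardown runs in reverse order of setup. */
	step("destroy_stream", "");
destroy_globals:
	step("destroy_globals", "");
close_device:
	step("close_device", "");
release_rmax:
	step("rmax_release", "");
out:
	return ret;
}
```

If a setup step fails midway, for example init_globals, only the steps that already succeeded are undone: the device is closed and the library is released, while the stream and global teardown are skipped.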

References

  • /opt/mellanox/doca/applications/stream_receive_perf/
