DOCA SDK Documentation

DOCA Telemetry DPA

This guide provides instructions for building and developing applications that require telemetry data collection from NVIDIA® BlueField® and NVIDIA® ConnectX® networking platforms using the DOCA Telemetry DPA API.

DOCA Telemetry DPA is supported at alpha level.

Introduction

The DOCA Telemetry DPA library provides access to detailed telemetry data and performance statistics for the Data Path Accelerator (DPA) on supported NVIDIA networking platforms. With its API, developers can monitor and analyze DPA processes, threads, and profiling data for efficient application performance optimization.

Prerequisites

To utilize DOCA Telemetry DPA, your system must meet the following baseline requirements:

  • Firmware: Version 28.43.1000 or later for ConnectX-7, 32.43.1000 or later for BlueField-3, and 40.43.1000 or later for ConnectX-8.

  • Driver: The fwctl driver must be installed and loaded on the system.

Verifying the fwctl Driver

To verify that the fwctl driver is successfully loaded, check the device directories: 

$ ls /sys/class/fwctl/
$ ls /dev/fwctl

The expected output for a standard 2-port device is fwctl0 fwctl1.

Manually Loading the Driver

If the directories /sys/class/fwctl or /dev/fwctl do not exist or are empty, the module may be installed but inactive.

Check for the module's presence:

$ grep fwctl -R /lib/modules/$(uname -r)/

If the output confirms the presence of fwctl.ko and mlx5_fwctl.ko, manually load the module and verify its status:

$ sudo modprobe mlx5_fwctl
$ lsmod | grep fwctl

Reinstalling the DOCA Host Package

If the fwctl modules cannot be found, or if modprobe fails to load them, reinstall the DOCA Host package.

  1. Download the package (DOCA 3.3.0 example):

    $ wget https://www.mellanox.com/downloads/DOCA/DOCA_v3.3.0/host/doca-host_3.3.0-088000-26.01-ubuntu2204_amd64.deb

  2. Purge existing DOCA and OFED modules:

    $ for f in $(dpkg --list | grep doca | awk '{print $2}'); do echo $f; sudo apt remove --purge $f -y; done
    $ for f in $(dpkg --list | grep mlnx | awk '{print $2}'); do echo $f; sudo apt remove --purge $f -y; done
    $ for f in $(dpkg --list | grep dpdk | awk '{print $2}'); do echo $f; sudo apt remove --purge $f -y; done
    $ for f in $(dpkg --list | grep ofed | awk '{print $2}'); do echo $f; sudo apt remove --purge $f -y; done
    $ sudo /usr/sbin/ofed_uninstall.sh --force
    $ sudo apt-get autoremove

  3. Install the new package and restart services:

    $ sudo dpkg -i doca-host_3.3.0-088000-26.01-ubuntu2204_amd64.deb
    $ sudo apt-get update
    $ sudo apt-get -y install doca-all
    $ sudo /etc/init.d/openibd restart

Once the reinstallation is complete, confirm that the module is loaded as described in "DOCA Telemetry DPA | Verifying the fwctl Driver".

Environment

DOCA Telemetry DPA-based applications can run on:

  • Host machines – ConnectX-7 or BlueField-3 and newer

  • DPU targets – BlueField-3 and newer

Architecture

DOCA Telemetry DPA provides comprehensive profiling data, including:

  • Process and thread information – Monitor all active DPA processes and threads

  • Cumulative performance counters – Track cumulative performance metrics to evaluate application behavior

  • Event tracer data – Capture detailed event-based traces for in-depth analysis

To interact with a device, users must create a DOCA Telemetry DPA context using doca_telemetry_dpa_create(). Each context is independent of DOCA DPA contexts, meaning changes in DPA configurations are not automatically reflected in the telemetry context. A device typically corresponds to a specific port on a NIC. 

Performance counters are not owned by the doca_telemetry_dpa context. Other active telemetry contexts could manage the same counters. Ensure only a single context is used to profile the DPA to avoid conflicts.

Configuration Phase

Device Support

DOCA Telemetry DPA requires a device to operate. For details on selecting a device, refer to "DOCA Core Device Discovery".

As device capabilities may change (see "DOCA Core Device Support"), it is recommended to check your device with doca_telemetry_dpa_cap_is_supported() before using it.

Output Structure Format

The user application is responsible for allocating the output structures. To that end, DOCA Telemetry DPA provides helper methods that return the structure size in bytes (see section DOCA Telemetry DPA | Execution Phase for more details).

The doca_telemetry_dpa context supports the following layout structures for the profile data:

doca_telemetry_dpa_process_info

  • dpa_process_id – Global DPA process ID

  • num_of_threads – Number of threads in the process

  • process_name – Name of the process

doca_telemetry_dpa_thread_info

  • dpa_process_id – Global DPA process ID

  • dpa_thread_id – Global DPA thread ID

  • thread_name – Name of the thread

doca_telemetry_dpa_cumul_info

  • dpa_process_id – Global DPA process ID

  • dpa_thread_id – Global DPA thread ID

  • time – Total time, in DPA timer ticks, that the thread has been active

  • cycles – Total execution unit (EU) cycles used by the thread

  • instructions – Total number of instructions executed by the thread

  • num_executions – Total number of thread executions

doca_telemetry_dpa_event_sample

  • timestamp – Timestamp in µsec

  • cycles – Stamp of the total execution unit (EU) cycles

  • instructions – Stamp of the total number of instructions executed by this DPA EU

  • dpa_thread_id – Global DPA thread ID

  • eu_id – Execution unit ID

  • sample_id_in_eu – Running sample ID per EU. A single ID is assigned to both the schedule-in and the schedule-out samples of an execution.

  • type – Type of event sample, one of:

      • DOCA_TELEMETRY_DPA_EVENT_SAMPLE_TYPE_EMPTY_SAMPLE

      • DOCA_TELEMETRY_DPA_EVENT_SAMPLE_TYPE_SCHEDULE_IN

      • DOCA_TELEMETRY_DPA_EVENT_SAMPLE_TYPE_SCHEDULE_OUT

      • DOCA_TELEMETRY_DPA_EVENT_SAMPLE_TYPE_BUFFER_FULL

The user can retrieve the DPA timer tick frequency, in kHz, using doca_telemetry_dpa_get_dpa_timer_freq(). With this frequency, timer ticks can be converted to elapsed time using the formula clock_time = ticks / dpa_timer_frequency. Because the frequency is given in kHz (ticks per millisecond), the result is in milliseconds.
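The following C sketch applies this formula. It assumes, as stated above, that the frequency reported by doca_telemetry_dpa_get_dpa_timer_freq() is in kHz (ticks per millisecond). The helper names dpa_ticks_to_msec() and dpa_ticks_to_usec() are hypothetical and are not part of the DOCA API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: convert DPA timer ticks to elapsed time.
 * freq_khz is assumed to be the value reported by
 * doca_telemetry_dpa_get_dpa_timer_freq(), in kHz (ticks per millisecond). */

/* Elapsed time in milliseconds: ticks / freq_khz. */
static double dpa_ticks_to_msec(uint64_t ticks, uint64_t freq_khz)
{
	assert(freq_khz > 0);
	return (double)ticks / (double)freq_khz;
}

/* Elapsed time in whole microseconds: floor(ticks * 1000 / freq_khz).
 * Whole milliseconds and the remainder are handled separately so that
 * ticks * 1000 cannot overflow 64 bits for large tick counts. */
static uint64_t dpa_ticks_to_usec(uint64_t ticks, uint64_t freq_khz)
{
	assert(freq_khz > 0);
	uint64_t whole_ms = ticks / freq_khz;
	uint64_t rem_ticks = ticks % freq_khz;

	return whole_ms * 1000 + (rem_ticks * 1000) / freq_khz;
}
```

For example, with a reported frequency of 1,000 kHz, a value of 1,500 ticks corresponds to 1.5 ms.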

Setting the Maximum Number of Event Tracer Samples

The user must set the maximum number of event tracer samples to retrieve. This value can be set using doca_telemetry_dpa_set_max_perf_event_samples() and retrieved using doca_telemetry_dpa_get_max_perf_event_samples().

Addressing All Processes and Threads

  • To retrieve a unique ID to address all processes running on the DPA:

    uint32_t all_processes_id;
    doca_telemetry_dpa_get_all_process_id(&all_processes_id);

  • To retrieve a unique ID to address all threads running on the DPA:

    uint32_t all_threads_id;
    doca_telemetry_dpa_get_all_threads_id(&all_threads_id);

Execution Phase

Retrieving Running DPA Process Information

Information about a specific process or about all processes running on the DPA can be retrieved using these steps:

  1. Get memory size for process list allocation:

    uint32_t size;
    doca_telemetry_dpa_get_process_list_size(context, process_id, &size);

  2. Retrieve process list:

    doca_telemetry_dpa_read_processes_list(context, process_id, &process_num, &process_list);

Ensure memory is properly allocated using the retrieved size before calling doca_telemetry_dpa_read_processes_list().

Retrieving Running DPA Thread Information

Information about a specific thread or about all threads running on the DPA can be retrieved using these steps:

  1. Get memory size for thread list allocation:

    uint32_t size;
    doca_telemetry_dpa_get_thread_list_size(context, process_id, thread_id, &size);

  2. Retrieve thread list:

    doca_telemetry_dpa_read_thread_list(context, process_id, thread_id, &threads_num, &thread_list);

If the process ID is set to address all processes, the thread ID must also be set to address all threads.

Ensure memory is properly allocated using the retrieved size before calling doca_telemetry_dpa_read_thread_list().
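Because both the process list and the thread list carry dpa_process_id, the two results can be joined after they are read. The sketch below is illustrative only: struct process_entry and struct thread_entry are hypothetical local stand-ins that mirror just the ID and count fields of doca_telemetry_dpa_process_info and doca_telemetry_dpa_thread_info, and it assumes both lists have already been read into arrays together with their returned element counts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the DOCA output structures. Only the fields
 * used here are mirrored; the real definitions come from the DOCA headers
 * and also contain the process and thread names. */
struct process_entry {
	uint32_t dpa_process_id;
	uint32_t num_of_threads;
};

struct thread_entry {
	uint32_t dpa_process_id;
	uint32_t dpa_thread_id;
};

/* Count how many entries of a thread list belong to the given process. */
static uint32_t count_threads_of_process(const struct thread_entry *threads, size_t threads_num,
					 uint32_t process_id)
{
	uint32_t count = 0;

	assert(threads != NULL || threads_num == 0);
	for (size_t i = 0; i < threads_num; i++)
		if (threads[i].dpa_process_id == process_id)
			count++;
	return count;
}

/* Return 1 if each process's num_of_threads matches the thread list, else 0. */
static int thread_counts_consistent(const struct process_entry *procs, size_t procs_num,
				    const struct thread_entry *threads, size_t threads_num)
{
	for (size_t i = 0; i < procs_num; i++)
		if (count_threads_of_process(threads, threads_num, procs[i].dpa_process_id) !=
		    procs[i].num_of_threads)
			return 0;
	return 1;
}
```

Since the two lists are read by separate calls, a mismatch may simply mean that threads were created or destroyed between the reads.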

Retrieving Cumulative Performance Profile Information Samples

Cumulative performance samples can be retrieved for specific processes and threads or for all processes and threads (see "DOCA Telemetry DPA | Addressing All Processes and Threads") running on the DPA using these steps:

  1. Start the cumulative counters:

    doca_telemetry_dpa_counter_start(context, process_id, DOCA_TELEMETRY_DPA_COUNTER_TYPE_CUMULATIVE_EVENT);
    
  2. After the desired profiling period, stop the cumulative counters:

    doca_telemetry_dpa_counter_stop(context, process_id, DOCA_TELEMETRY_DPA_COUNTER_TYPE_CUMULATIVE_EVENT);

  3. Get memory size for cumulative samples list allocation:

    uint32_t size;
    doca_telemetry_dpa_get_cumul_samples_size(context, process_id, thread_id, &size);

  4. Retrieve cumulative sample list:

    doca_telemetry_dpa_read_cumul_info_list(context, process_id, thread_id, &cumul_samples_num, &cumul_info_list);

The current state and type of a counter can be retrieved using doca_telemetry_dpa_get_counter_state() and doca_telemetry_dpa_get_counter_type().

If the process ID is set to address all processes, the thread ID must also be set to address all threads.

Ensure memory is properly allocated using the retrieved size before calling doca_telemetry_dpa_read_cumul_info_list(). 

Reset counters using doca_telemetry_dpa_counter_restart() to clear previous data.
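Once the cumulative list has been read, derived metrics such as instructions per cycle (IPC) and average active time per execution can be computed from each entry. This sketch uses struct cumul_entry, a hypothetical stand-in whose field names follow doca_telemetry_dpa_cumul_info but whose field types are assumed. The time conversion applies the ticks / dpa_timer_frequency formula with the frequency in kHz, as described earlier.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for doca_telemetry_dpa_cumul_info: field names
 * follow the documented layout, field types are assumed. */
struct cumul_entry {
	uint32_t dpa_process_id;
	uint32_t dpa_thread_id;
	uint64_t time;           /* total active time, in DPA timer ticks */
	uint64_t cycles;         /* total EU cycles used */
	uint64_t instructions;   /* total instructions executed */
	uint64_t num_executions; /* total number of executions */
};

/* Instructions per cycle; returns 0 when no cycles were recorded. */
static double cumul_ipc(const struct cumul_entry *e)
{
	return e->cycles ? (double)e->instructions / (double)e->cycles : 0.0;
}

/* Average active time per execution, in milliseconds, given the DPA timer
 * frequency in kHz; returns 0 when the thread never executed. */
static double cumul_avg_exec_msec(const struct cumul_entry *e, uint64_t freq_khz)
{
	assert(freq_khz > 0);
	if (e->num_executions == 0)
		return 0.0;
	return (double)e->time / (double)freq_khz / (double)e->num_executions;
}
```

Guarding the divisions keeps threads that were listed but never ran from producing NaN or infinity in the output.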

Retrieving Event Tracer Profile Information Samples

Event tracer samples can be retrieved for specific processes and threads or for all processes and threads running on the DPA using these steps:

  1. Start the event tracer counters:

    doca_telemetry_dpa_counter_start(context, all_processes_id, DOCA_TELEMETRY_DPA_COUNTER_TYPE_EVENT_TRACER);
    
  2. After the desired profiling period, stop the event tracer counters:

    doca_telemetry_dpa_counter_stop(context, all_processes_id, DOCA_TELEMETRY_DPA_COUNTER_TYPE_EVENT_TRACER);

  3. Get memory size for event samples list allocation:

    uint32_t size;
    doca_telemetry_dpa_get_perf_event_samples_size(context, process_id, thread_id, &size);

  4. Retrieve event tracer sample list:

    doca_telemetry_dpa_read_perf_event_list(context, process_id, thread_id, &perf_event_samples_num, &event_info_list);

The current state and type of a counter can be retrieved using doca_telemetry_dpa_get_counter_state() and doca_telemetry_dpa_get_counter_type().

Set the maximum number of samples with doca_telemetry_dpa_set_max_perf_event_samples() before starting the event tracer counters.

If the process ID is set to address all processes, the thread ID must also be set to address all threads.

Ensure memory is properly allocated using the retrieved size before calling doca_telemetry_dpa_read_perf_event_list(). 

Reset counters using doca_telemetry_dpa_counter_restart() before reuse.
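Because a schedule-in sample and its matching schedule-out sample share an eu_id and a sample_id_in_eu, per-thread execution time can be reconstructed from the event tracer list. The sketch below assumes the list has been copied into an array of struct event_sample, a hypothetical stand-in for doca_telemetry_dpa_event_sample, and uses local enum values in place of the DOCA_TELEMETRY_DPA_EVENT_SAMPLE_TYPE_* constants. It also assumes the cycles and timestamp stamps do not wrap between a schedule-in and its schedule-out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical local stand-ins; the real type and enum values come from
 * the DOCA headers (DOCA_TELEMETRY_DPA_EVENT_SAMPLE_TYPE_*). */
enum sample_type { SAMPLE_EMPTY, SAMPLE_SCHEDULE_IN, SAMPLE_SCHEDULE_OUT, SAMPLE_BUFFER_FULL };

struct event_sample {
	uint64_t timestamp;       /* usec */
	uint64_t cycles;          /* running EU cycle stamp */
	uint64_t instructions;    /* running EU instruction stamp */
	uint32_t dpa_thread_id;
	uint32_t eu_id;
	uint32_t sample_id_in_eu; /* shared by a schedule-in/out pair */
	enum sample_type type;
};

struct thread_totals {
	uint64_t executions; /* matched schedule-in/out pairs */
	uint64_t cycles;     /* sum of out.cycles - in.cycles */
	uint64_t usec;       /* sum of out.timestamp - in.timestamp */
};

/* Sum the executions, cycles, and time of one thread by pairing each
 * SCHEDULE_OUT sample with the SCHEDULE_IN sample that has the same eu_id
 * and sample_id_in_eu. Unmatched samples and other types are skipped. */
static struct thread_totals event_totals_for_thread(const struct event_sample *s, size_t n,
						    uint32_t thread_id)
{
	struct thread_totals t = {0, 0, 0};

	assert(s != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (s[i].type != SAMPLE_SCHEDULE_OUT || s[i].dpa_thread_id != thread_id)
			continue;
		for (size_t j = 0; j < n; j++) {
			if (s[j].type == SAMPLE_SCHEDULE_IN && s[j].eu_id == s[i].eu_id &&
			    s[j].sample_id_in_eu == s[i].sample_id_in_eu) {
				t.executions++;
				t.cycles += s[i].cycles - s[j].cycles;
				t.usec += s[i].timestamp - s[j].timestamp;
				break;
			}
		}
	}
	return t;
}
```

The nested loop is quadratic in the number of samples, which keeps the sketch short; for large sample counts, indexing schedule-in samples by (eu_id, sample_id_in_eu) in a hash table would be the usual alternative.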

State Machine

The following section describes the different states the doca_telemetry_dpa context goes through, how to move between states and what is allowed in each state.

Idle

The context has been created and is idle.

In this state, the application is expected to:

  • Destroy the context

  • Start the context for processing

Allowed operations:

It is possible to reach this state as follows:

Previous State | Transition Action
None           | Create the context
Running        | Call stop

Running

In this state, the application is expected to:

  • Stop the context

  • Retrieve the process information list

  • Retrieve the thread information list

  • Start, stop, or reset counters for profiling

  • Retrieve profile samples for the cumulative performance counters

  • Retrieve profile samples for the event tracer

Allowed operations:

  • Calling stop, which moves the context to the Idle state

It is possible to reach this state as follows:

Previous State | Transition Action
Idle           | Successfully start the context

There are currently no state restrictions on the majority of API functions.
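The Idle and Running transitions described in this section can be modeled as a small state machine. This is a hypothetical sketch that does not call DOCA: the enums, the CTX_DESTROYED marker, and ctx_apply() are illustrative names, and the model only permits the transitions listed above. Since most API functions are not restricted by state, the real library may accept calls that this model rejects.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the documented context states. CTX_DESTROYED is a
 * terminal marker added for the model; it is not a documented state. */
enum ctx_state { CTX_IDLE, CTX_RUNNING, CTX_DESTROYED };
enum ctx_action { ACT_START, ACT_STOP, ACT_DESTROY };

/* Apply an action. Returns 0 and updates *state if the transition is one
 * listed in this section; returns -1 and leaves *state unchanged otherwise. */
static int ctx_apply(enum ctx_state *state, enum ctx_action action)
{
	assert(state != NULL);
	switch (action) {
	case ACT_START: /* Idle -> Running */
		if (*state != CTX_IDLE)
			return -1;
		*state = CTX_RUNNING;
		return 0;
	case ACT_STOP: /* Running -> Idle */
		if (*state != CTX_RUNNING)
			return -1;
		*state = CTX_IDLE;
		return 0;
	case ACT_DESTROY: /* expected from Idle */
		if (*state != CTX_IDLE)
			return -1;
		*state = CTX_DESTROYED;
		return 0;
	}
	return -1;
}
```

A freshly created context starts in CTX_IDLE, matching the "None / Create the context" transition.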

Alternative Datapath Options

DOCA Telemetry DPA supports only CPU-based datapaths.

Running the Sample

  1. Refer to the following documents:

    1. DOCA Installation Guide for Linux for details on how to install BlueField-related software.

    2. NVIDIA BlueField Platform Software Troubleshooting Guide for any issue you may encounter with the installation, compilation, or execution of DOCA samples.

  2. To build a given sample, run the following commands. If you downloaded the sample from GitHub, update the path in the first line to reflect the location of the sample file:

    cd /opt/mellanox/doca/samples/doca_telemetry/telemetry_dpa
    meson /tmp/build
    ninja -C /tmp/build
    

    The binary doca_telemetry_dpa is created under /tmp/build/.

Sample usage: 

Usage: doca_telemetry_dpa [DOCA Flags] [Program Flags]

DOCA Flags:
  -h, --help                        Print a help synopsis
  -v, --version                     Print program version information
  -l, --log-level                   Set the (numeric) log level for the program <10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE>
  --sdk-log-level                   Set the SDK (numeric) log level for the program <10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE>
  -j, --json <path>                 Parse command line flags from an input json file

Program Flags:
  -p, --pci-addr                    DOCA device PCI device address
  -rt, --sample-run-time            Total sample run time, in miliseconds
  -ct, --counter-type               Counter type, cumulus (0) or event (1)
  -es, --event-samples              Set the maximum number of perf event samples to retrieve
  -pi, --process-id                 Specific process id to address
  -ti, --thread-id                  Specific thread id to address

The sample includes:

  1. Locating and opening a DOCA device.

  2. Creating a doca_telemetry_dpa instance.

  3. Retrieving all processes or one specific process.

  4. Retrieving all threads or one specific thread.

  5. Starting counters for the selected profiling capability.

  6. Retrieving the profile samples for the selected profiling capability.

  7. Displaying the retrieved profile information.

  8. Destroying the doca_telemetry_dpa context.
