This guide provides instructions for building and developing applications that require telemetry data collection from NVIDIA® BlueField® and NVIDIA® ConnectX® networking platforms using the DOCA Telemetry DPA API.
DOCA Telemetry DPA is supported at alpha level.
Introduction
The DOCA Telemetry DPA library provides access to detailed telemetry data and performance statistics for the Data Path Accelerator (DPA) on supported NVIDIA networking platforms. With its API, developers can monitor and analyze DPA processes, threads, and profiling data for efficient application performance optimization.
Prerequisites
To utilize DOCA Telemetry DPA, your system must meet the following baseline requirements:
- Firmware: Version >=28/32/40.43.1000 is required for ConnectX-7, BlueField-3, and ConnectX-8 devices.
- Driver: The fwctl driver must be fully installed and actively loaded on the system.
Verifying the fwctl Driver
To verify that the fwctl driver is successfully loaded, check the device directories:
$ ls /sys/class/fwctl/
$ ls /dev/fwctl
The expected output for a standard 2-port device is fwctl0 fwctl1.
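An application can run the same check programmatically before creating a telemetry context. The following is a minimal sketch, assuming that device nodes appear as entries whose names start with fwctl, as in the expected output above. The helper name count_fwctl_devices is hypothetical and is not part of DOCA.

```c
#include <dirent.h>
#include <string.h>

/*
 * Count directory entries whose names start with "fwctl"
 * (for example fwctl0, fwctl1).
 * Returns -1 if the directory does not exist or cannot be opened.
 * Hypothetical helper for illustration; not a DOCA API.
 */
int count_fwctl_devices(const char *dir_path)
{
    DIR *dir = opendir(dir_path);
    if (dir == NULL)
        return -1;

    int count = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        if (strncmp(entry->d_name, "fwctl", 5) == 0)
            count++;
    }

    closedir(dir);
    return count;
}
```

Calling count_fwctl_devices("/sys/class/fwctl") and receiving -1 or 0 corresponds to the missing or empty directory case covered in the next section.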
Manually Loading the Driver
If the directories /sys/class/fwctl or /dev/fwctl do not exist or are empty, the module may be installed but inactive.
Check for the module's presence:
$ grep fwctl -R /lib/modules/$(uname -r)/
If the output confirms the presence of fwctl.ko and mlx5_fwctl.ko, manually load the module and verify its status:
$ sudo modprobe mlx5_fwctl
$ lsmod | grep fwctl
Reinstalling the DOCA Host Package
If the fwctl module cannot be found in the previous step, or if the modprobe command fails to load it, reinstall the DOCA Host package.
- Download the package (DOCA 3.3.0 example):
  $ wget https://www.mellanox.com/downloads/DOCA/DOCA_v3.3.0/host/doca-host_3.3.0-088000-26.01-ubuntu2204_amd64.deb
- Purge existing DOCA and OFED modules:
  $ for f in $( dpkg --list | grep doca | awk '{print $2}' ); do echo $f ; sudo apt remove --purge $f -y ; done
  $ for f in $( dpkg --list | grep mlnx | awk '{print $2}' ); do echo $f ; sudo apt remove --purge $f -y ; done
  $ for f in $( dpkg --list | grep dpdk | awk '{print $2}' ); do echo $f ; sudo apt remove --purge $f -y ; done
  $ for f in $( dpkg --list | grep ofed | awk '{print $2}' ); do echo $f ; sudo apt remove --purge $f -y ; done
  $ sudo /usr/sbin/ofed_uninstall.sh --force
  $ sudo apt-get autoremove
- Install the new package and restart services:
  $ sudo dpkg -i doca-host_3.3.0-088000-26.01-ubuntu2204_amd64.deb
  $ sudo apt-get update
  $ sudo apt-get -y install doca-all
  $ sudo /etc/init.d/openibd restart
Once the reinstallation is complete, confirm the module is successfully loaded according to section "DOCA Telemetry DPA | Verifying the fwctl Driver".
Environment
DOCA Telemetry DPA-based applications can run on:
- Host machines – ConnectX-7 or BlueField-3 and newer
- DPU targets – BlueField-3 and newer
Architecture
DOCA Telemetry DPA provides comprehensive profiling data, including:
- Process and thread information – Monitor all active DPA processes and threads
- Cumulative performance counters – Track cumulative performance metrics to evaluate application behavior
- Event tracer data – Capture detailed event-based traces for in-depth analysis
To interact with a device, users must create a DOCA Telemetry DPA context using doca_telemetry_dpa_create(). Each context is independent of DOCA DPA contexts, meaning changes in DPA configurations are not automatically reflected in the telemetry context. A device typically corresponds to a specific port on a NIC.
Performance counters are not owned by the doca_telemetry_dpa context. Other active telemetry contexts could manage the same counters. Ensure only a single context is used to profile the DPA to avoid conflicts.
Configuration Phase
Device Support
DOCA Telemetry DPA requires a device to operate. To select a device, refer to "DOCA Core Device Discovery".
Because device capabilities may change (see DOCA Core Device Support), it is recommended to check that your device supports DOCA Telemetry DPA using the doca_telemetry_dpa_cap_is_supported() method.
Output Structure Format
The user application is responsible for allocating the output structures. To that end, DOCA Telemetry DPA provides helper methods that return the structure size in bytes (see section DOCA Telemetry DPA | Execution Phase for more details).
The doca_telemetry_dpa context supports the following layout structures for the profile data:
| doca_telemetry_dpa_process_info | |
|---|---|
| | Global DPA process ID |
| | Number of threads in the process |
| | The name of the process |
| doca_telemetry_dpa_thread_info | |
|---|---|
| | Global DPA process ID |
| | Global DPA thread ID |
| | The name of the thread |
| doca_telemetry_dpa_cumul_info | |
|---|---|
| | Global DPA process ID |
| | Global DPA thread ID |
| | Total time in ticks the thread has been active |
| | Total execution unit cycles the thread used |
| | Total number of instructions the thread executed |
| | Total number of thread executions |
| doca_telemetry_dpa_event_sample | |
|---|---|
| | Timestamp in µsec |
| | Stamp of total execution unit (EU) cycles |
| | Stamp of total number of instructions of this DPA EU |
| | Global DPA thread ID |
| | Execution unit ID |
| | Running sample ID per EU. A single |
| | Type of event sample: |
The DPA timer tick frequency, given in kHz, can be retrieved using doca_telemetry_dpa_get_dpa_timer_freq(). With this frequency, timer ticks can be converted to elapsed time using the formula: clock_time = ticks / dpa_timer_frequency. Because the frequency is expressed in kHz (ticks per millisecond), the result is in milliseconds.
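The conversion can be sketched in C as follows. The helper names and the integer widths are assumptions made for illustration, since the parameter type of doca_telemetry_dpa_get_dpa_timer_freq() is not shown here. Only the formula itself comes from the text above.

```c
#include <stdint.h>

/*
 * Convert DPA timer ticks to milliseconds.
 * freq_khz is assumed to hold the value reported by
 * doca_telemetry_dpa_get_dpa_timer_freq(); 1 kHz equals one tick per
 * millisecond, so ticks / freq_khz yields milliseconds.
 * Returns 0.0 when freq_khz is 0 to avoid dividing by zero.
 * Hypothetical helper; not a DOCA API.
 */
double dpa_ticks_to_ms(uint64_t ticks, uint32_t freq_khz)
{
    if (freq_khz == 0)
        return 0.0;
    return (double)ticks / (double)freq_khz;
}

/* Same conversion expressed in microseconds. */
double dpa_ticks_to_us(uint64_t ticks, uint32_t freq_khz)
{
    return dpa_ticks_to_ms(ticks, freq_khz) * 1000.0;
}
```

For example, with a frequency of 1000 kHz, 1,000,000 ticks correspond to 1000 ms.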
Establishing the Amount of Event Tracer Samples
The user must set the maximum number of event tracer samples to retrieve. This value can be set using doca_telemetry_dpa_set_max_perf_event_samples() and retrieved using doca_telemetry_dpa_get_max_perf_event_samples().
Addressing All Processes and Threads
- To retrieve a unique ID to address all processes running on the DPA:
  uint32_t all_processes_id;
  doca_telemetry_dpa_get_all_process_id(&all_processes_id);
- To retrieve a unique ID to address all threads running on the DPA:
  uint32_t all_threads_id;
  doca_telemetry_dpa_get_all_threads_id(&all_threads_id);
Execution Phase
Retrieving Running DPA Process Information
Information about a specific process, or about all processes running on the DPA, can be retrieved using these steps:
- Get memory size for process list allocation:
  uint32_t size;
  doca_telemetry_dpa_get_process_list_size(context, process_id, &size);
- Retrieve process list:
  doca_telemetry_dpa_read_processes_list(context, process_id, &process_num, &process_list);
Ensure memory is properly allocated using the retrieved size before calling doca_telemetry_dpa_read_processes_list().
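The size-then-read pattern used here, and again for the thread, cumulative, and event tracer lists, can be sketched as follows. This is a minimal illustration only: the demo_* functions and the demo_process_info structure are hypothetical stand-ins that return fixed data so the flow can be compiled without the DOCA headers. They do not reproduce the DOCA signatures or structure layout.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record type; the real layout is doca_telemetry_dpa_process_info. */
struct demo_process_info {
    uint32_t process_id;
    uint32_t num_threads;
};

/* Stand-in for a "get list size" call: reports the number of bytes needed. */
static int demo_get_list_size(uint32_t *size)
{
    *size = 2 * (uint32_t)sizeof(struct demo_process_info);
    return 0;
}

/* Stand-in for a "read list" call: fills memory owned by the caller. */
static int demo_read_list(uint32_t *count, struct demo_process_info *list)
{
    list[0] = (struct demo_process_info){ .process_id = 1, .num_threads = 4 };
    list[1] = (struct demo_process_info){ .process_id = 2, .num_threads = 1 };
    *count = 2;
    return 0;
}

/*
 * Query the required size, allocate, then read.
 * On success returns 0 and the caller must free(*out_list).
 */
int demo_fetch_process_list(struct demo_process_info **out_list, uint32_t *out_count)
{
    uint32_t size = 0;

    if (demo_get_list_size(&size) != 0 || size == 0)
        return -1;

    struct demo_process_info *list = malloc(size);
    if (list == NULL)
        return -1;

    if (demo_read_list(out_count, list) != 0) {
        free(list); /* do not leak the buffer when the read fails */
        return -1;
    }

    *out_list = list;
    return 0;
}
```

Keeping the allocation in the application means the buffer can be reused across reads, as long as the size is queried again whenever the set of processes may have changed.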
Retrieving Running DPA Thread Information
Information about a specific thread, or about all threads running on the DPA, can be retrieved using these steps:
- Get memory size for thread list allocation:
  uint32_t size;
  doca_telemetry_dpa_get_thread_list_size(context, process_id, thread_id, &size);
- Retrieve thread list:
  doca_telemetry_dpa_read_thread_list(context, process_id, thread_id, &threads_num, &thread_list);
If the process ID is set to address all processes, the thread ID must also be set to address all threads.
Ensure memory is properly allocated using the retrieved size before calling doca_telemetry_dpa_read_thread_list().
Retrieving Cumulative Performance Profile Information Samples
Cumulative performance samples can be retrieved for specific processes and threads, or for all processes and threads running on the DPA (see DOCA Telemetry DPA | Addressing All Processes and Threads), using these steps:
- Start cumulative counter/s:
  doca_telemetry_dpa_counter_start(context, process_id, DOCA_TELEMETRY_DPA_COUNTER_TYPE_CUMULATIVE_EVENT);
- Stop cumulative counter/s:
  doca_telemetry_dpa_counter_stop(context, process_id, DOCA_TELEMETRY_DPA_COUNTER_TYPE_CUMULATIVE_EVENT);
- Get memory size for cumulative samples list allocation:
  uint32_t size;
  doca_telemetry_dpa_get_cumul_samples_size(context, process_id, thread_id, &size);
- Retrieve cumulative sample list:
  doca_telemetry_dpa_read_cumul_info_list(context, process_id, thread_id, &cumul_samples_num, &cumul_info_list);
The current state and type of a counter can be retrieved using doca_telemetry_dpa_get_counter_state() and doca_telemetry_dpa_get_counter_type().
If the process ID is set to address all processes, the thread ID must also be set to address all threads.
Ensure memory is properly allocated using the retrieved size before calling doca_telemetry_dpa_read_cumul_info_list().
Reset counters using doca_telemetry_dpa_counter_restart() to clear previous data.
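Once a cumulative sample has been read, simple ratios of its counters are often more informative than the raw totals. The sketch below derives instructions per cycle and average ticks per execution. The structure and its field names are hypothetical mirrors of the counters listed for doca_telemetry_dpa_cumul_info, since the actual field names are not reproduced in this guide, and the helper functions are not part of DOCA.

```c
#include <stdint.h>

/* Hypothetical mirror of the cumulative counters; field names are assumed. */
struct demo_cumul_counters {
    uint64_t time_ticks;   /* total time in ticks the thread has been active */
    uint64_t eu_cycles;    /* total execution unit cycles the thread used */
    uint64_t instructions; /* total number of instructions executed */
    uint64_t executions;   /* total number of thread executions */
};

/* Instructions per EU cycle; returns 0.0 when no cycles were recorded. */
double demo_instructions_per_cycle(const struct demo_cumul_counters *c)
{
    if (c->eu_cycles == 0)
        return 0.0;
    return (double)c->instructions / (double)c->eu_cycles;
}

/* Average active ticks per execution; returns 0.0 when the thread never ran. */
double demo_avg_ticks_per_execution(const struct demo_cumul_counters *c)
{
    if (c->executions == 0)
        return 0.0;
    return (double)c->time_ticks / (double)c->executions;
}
```

The average tick count can be turned into elapsed time with the timer frequency returned by doca_telemetry_dpa_get_dpa_timer_freq().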
Retrieving Event Tracer Profile Information Samples
Event tracer samples can be retrieved for specific processes and threads or for all processes and threads running on the DPA using these steps:
- Start event tracer counter/s:
  doca_telemetry_dpa_counter_start(context, all_processes_id, DOCA_TELEMETRY_DPA_COUNTER_TYPE_EVENT_TRACER);
- Stop event tracer counter/s:
  doca_telemetry_dpa_counter_stop(context, all_processes_id, DOCA_TELEMETRY_DPA_COUNTER_TYPE_EVENT_TRACER);
- Get memory size for event samples list allocation:
  uint32_t size;
  doca_telemetry_dpa_get_perf_event_samples_size(context, process_id, thread_id, &size);
- Retrieve event tracer sample list:
  doca_telemetry_dpa_read_perf_event_list(context, process_id, thread_id, &perf_event_samples_num, &event_info_list);
The current state and type of a counter can be retrieved using doca_telemetry_dpa_get_counter_state() and doca_telemetry_dpa_get_counter_type().
Ensure the maximum number of samples is set before starting counters using doca_telemetry_dpa_set_max_perf_event_samples().
If the process ID is set to address all processes, the thread ID must also be set to address all threads.
Ensure memory is properly allocated using the retrieved size before calling doca_telemetry_dpa_read_perf_event_list().
Reset counters using doca_telemetry_dpa_counter_restart() before reuse.
State Machine
This section describes the states the doca_telemetry_dpa context goes through, how to move between them, and what is allowed in each state.
Idle
The context has been created and is idle.
In this state, it is expected for the application to:
- Destroy the context
- Start the context for processing
Allowed operations:
- Configuring the context according to section "DOCA Telemetry DPA | Configuration Phase"
It is possible to reach this state as follows:
| Previous State | Transition Action |
|---|---|
| None | Create the context |
| Running | Call stop |
Running
In this state it is expected for the application to:
- Stop the context
- Retrieve process information list
- Retrieve thread information list
- Start/stop/reset counters for profiling capabilities
- Retrieve profile samples for cumulative performance counters
- Retrieve profile samples for event tracer
Allowed operations:
- Calling stop, moving the application to "Idle" state
It is possible to reach this state as follows:
| Previous State | Transition Action |
|---|---|
| Idle | Successfully start the context |
There are currently no state restrictions on the majority of API functions.
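The two documented transitions can be modeled with a small state tracker, which an application might keep alongside its context to catch ordering mistakes early. This is a hypothetical sketch: the demo_* names are not DOCA APIs, and rejecting a start from Running or a stop from Idle is an assumption of the model rather than documented library behavior.

```c
/* Hypothetical application-side model of the context states. */
enum demo_ctx_state {
    DEMO_CTX_IDLE,    /* created, or stopped after running */
    DEMO_CTX_RUNNING  /* successfully started */
};

/* Idle -> Running. Returns 0 on success, -1 if the context is not Idle. */
int demo_ctx_start(enum demo_ctx_state *state)
{
    if (*state != DEMO_CTX_IDLE)
        return -1;
    *state = DEMO_CTX_RUNNING;
    return 0;
}

/* Running -> Idle. Returns 0 on success, -1 if the context is not Running. */
int demo_ctx_stop(enum demo_ctx_state *state)
{
    if (*state != DEMO_CTX_RUNNING)
        return -1;
    *state = DEMO_CTX_IDLE;
    return 0;
}

/* Configuration is listed as an allowed operation in the Idle state. */
int demo_ctx_can_configure(enum demo_ctx_state state)
{
    return state == DEMO_CTX_IDLE;
}
```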
Alternative Datapath Options
DOCA Telemetry DPA supports only CPU-based datapaths.
Running the Sample
- Refer to the following documents:
  - DOCA Installation Guide for Linux for details on how to install BlueField-related software.
  - NVIDIA BlueField Platform Software Troubleshooting Guide for any issue you may encounter with the installation, compilation, or execution of DOCA samples.
- To build a given sample, run the following commands. If you downloaded the sample from GitHub, update the path in the first line to reflect the location of the sample file:
  cd /opt/mellanox/doca/samples/doca_telemetry/telemetry_dpa
  meson /tmp/build
  ninja -C /tmp/build
  The binary doca_telemetry_dpa is created under /tmp/build/.
Sample usage:
Usage: doca_telemetry_dpa [DOCA Flags] [Program Flags]
DOCA Flags:
-h, --help Print a help synopsis
-v, --version Print program version information
-l, --log-level Set the (numeric) log level for the program <10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE>
--sdk-log-level Set the SDK (numeric) log level for the program <10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE>
-j, --json <path> Parse command line flags from an input json file
Program Flags:
-p, --pci-addr DOCA device PCI device address
-rt, --sample-run-time Total sample run time, in miliseconds
-ct, --counter-type Counter type, cumulus (0) or event (1)
-es, --event-samples Set the maximum number of perf event samples to retrieve
-pi, --process-id Specific process id to address
-ti, --thread-id Specific thread id to address
The sample includes:
- Locating and opening a DOCA device.
- Creating a doca_telemetry_dpa instance.
- Retrieving all processes or one specific process.
- Retrieving all threads or one specific thread.
- Starting counters for the selected profile capability.
- Retrieving the profile samples for the selected profile capability.
- Displaying the retrieved profile information.
- Destroying the doca_telemetry_dpa context.