DOCA SDK Documentation

DOCA Time Sync Application Guide

This guide provides an example of a Time Sync implementation on top of the NVIDIA® BlueField® DPU.

Introduction

The DOCA Time Sync reference application demonstrates how to trigger and correlate events across processors operating on different clock domains within a BlueField DPU environment.

The application triggers events on the x86 host, the BlueField Arm cores, and the Data Path Accelerator (DPA) subsystem. It then leverages the DOCA Clock cross-timestamping functionality (available in DOCA Core) to correlate these events against the NVIDIA® ConnectX® real-time clock (RTC). This allows the application to determine the precise chronological ordering of events occurring across the different subsystems.

The application uses the common ConnectX clock to synchronize events. It will automatically detect whether the ConnectX device has the real-time clock or the free-running clock enabled.

To manually verify if the real-time clock is enabled on the device (if disabled, it defaults to free-running mode), use the following command, replacing <device> with your specific PCIe address or MST device:

sudo mlxconfig -d <device> --enable_verbosity q | grep REAL_TIME_CLOCK_ENABLE

System Design

The DOCA Time Sync architecture consists of two distinct applications:

  • DPU application (doca_time_sync_dpu) – Runs on the BlueField DPU Arm cores.

  • Host application (doca_time_sync_host) – Runs on the x86 host system.

Operational workflow:

  1. The DPU application must be started first. It establishes a DOCA Comch server to listen for incoming connections.

  2. The Host application connects to the Comch server on the DPU.

  3. Event triggering:

    • Host to Arm: The Host application sends a message via the Comch connection to the DPU Arm cores, triggering the first event.

    • Arm to DPA: The DPU application loads a DOCA DPA kernel and communicates with it to trigger a second event on the DPA subsystem.

  4. All event timestamps are relayed back to the Host application, where they are correlated and ordered based on the common DOCA Clock.

Application Architecture

The DOCA Time Sync application generates a sequence of four events across three distinct processors, inserting variable delays between each step to simulate real-world processing latency.

Event Sequence

The application tracks the following four events:

  1. x86 Host: Packages and sends a request message to the DPU

  2. BlueField Arm Cores: Receives the message from the Host

  3. DPA Subsystem: Executes a Remote Procedure Call (RPC) triggered by the Arm cores

  4. x86 Host: Receives the response message from the DPU

Clock Synchronization and Correlation

Each event records a timestamp using its processor's local clock:

  • x86 Host & Arm Cores: Use the system RTC.

  • DPA Subsystem: Uses its internal local timer.

To order these events chronologically, all timestamps are correlated against a single common reference clock: the ConnectX NIC clock embedded in the BlueField DPU.

Host and Arm Core Synchronization

Both the x86 and DPU applications utilize the DOCA Clock cross-timestamping library (part of DOCA Core). This library captures the local clock time and the common NIC clock time simultaneously to establish a precise correlation.
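This guide does not show the DOCA Clock API itself, so the following is only a conceptual sketch of what a cross-timestamp provides: a local time, a reference time captured at nearly the same instant, and an error bound. It assumes a simple bracketing approach, in which the reference clock is read between two reads of the local clock. This is not necessarily how DOCA Clock works internally. All names here (read_clock_fn, cross_timestamp, cross_timestamp_capture, stepping_clock) are hypothetical and are not part of DOCA.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical clock reader: returns the current time in nanoseconds. */
typedef uint64_t (*read_clock_fn)(void *ctx);

/* One correlated sample of the local clock and the reference (NIC) clock. */
struct cross_timestamp {
    uint64_t local_ns;    /* local clock time, midpoint of the bracket */
    uint64_t ref_ns;      /* reference clock time read inside the bracket */
    uint64_t accuracy_ns; /* half the bracket width, rounded up */
};

/*
 * Read the local clock, then the reference clock, then the local clock again.
 * The reference read happened somewhere inside [before, after], so pairing it
 * with the midpoint bounds the correlation error by half the bracket width.
 */
static struct cross_timestamp
cross_timestamp_capture(read_clock_fn read_local, void *local_ctx,
                        read_clock_fn read_ref, void *ref_ctx)
{
    struct cross_timestamp ts;
    uint64_t before, ref, after;

    assert(read_local != NULL && read_ref != NULL);

    before = read_local(local_ctx);
    ref = read_ref(ref_ctx);
    after = read_local(local_ctx);

    ts.local_ns = before + (after - before) / 2;
    ts.ref_ns = ref;
    ts.accuracy_ns = (after - before + 1) / 2;
    return ts;
}

/*
 * Stand-in clock for demonstration only: ctx points to a uint64_t that holds
 * the current time; every read returns it and then advances it by 10 ns.
 */
static uint64_t stepping_clock(void *ctx)
{
    uint64_t *now = ctx;
    uint64_t value = *now;

    *now += 10;
    return value;
}
```

In a real application the two readers would wrap the system clock and the device clock. The stand-in clock above exists only so that the arithmetic can be exercised without hardware.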

DPA Subsystem Synchronization

The DPA kernel can only capture time using its local timer. To synchronize this local DPA timestamp with the common NIC clock, the Host application performs a retrospective calculation:

  1. Uses DOCA Clock to capture the current relationship between the NIC clock and the DPA timer.

  2. Determines the duration (in seconds/nanoseconds) between the current DPA time and the recorded event time.

  3. Subtracts this delta from the current NIC time to derive the precise NIC time when the DPA event occurred.
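The three steps above reduce to a single subtraction once a correlated (NIC time, DPA time) pair is available. The sketch below assumes that both clocks are already expressed in nanoseconds and that the event precedes the correlated sample. The function name dpa_event_to_nic_ns is hypothetical and is not taken from the application source.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert a past DPA timer reading into NIC clock time.
 *
 * nic_now_ns, dpa_now_ns: a correlated pair captured at the same instant
 *                         (step 1).
 * dpa_event_ns:           DPA timer value recorded when the event occurred.
 *
 * Assumes all values are in nanoseconds. If the DPA timer counts ticks,
 * the readings must be converted to nanoseconds before calling this helper.
 */
static uint64_t dpa_event_to_nic_ns(uint64_t nic_now_ns,
                                    uint64_t dpa_now_ns,
                                    uint64_t dpa_event_ns)
{
    uint64_t delta_ns;

    /* The event must not be newer than the correlated sample. */
    assert(dpa_event_ns <= dpa_now_ns);

    /* Step 2: time elapsed on the DPA since the event. */
    delta_ns = dpa_now_ns - dpa_event_ns;

    /* Step 3: move the same distance back on the NIC clock. */
    assert(delta_ns <= nic_now_ns);
    return nic_now_ns - delta_ns;
}
```

For example, with a NIC time of 1,000,000 ns, a current DPA time of 5,000 ns, and a recorded DPA event time of 4,200 ns, the delta is 800 ns and the event maps to a NIC time of 999,200 ns.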

Output Logging

The Host application aggregates all data and outputs it to a log file (time_sync.log). Each entry includes:

  • Synchronized time: The calculated time on the common NIC clock.

  • Local time: The raw timestamp from the processor's local clock.

  • Accuracy: The margin of error for the synchronization.

  • Event description: A label identifying the specific event step.
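As an illustration of how such entries can be put in chronological order, the sketch below sorts events by their synchronized time and formats one line per event. The struct layout, the function names, and the line format are assumptions made for this example; the actual layout of time_sync.log is not specified in this guide.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical in-memory form of one log entry. */
struct sync_event {
    uint64_t sync_ns;        /* time on the common NIC clock */
    uint64_t local_ns;       /* raw time on the processor's local clock */
    uint64_t accuracy_ns;    /* margin of error of the synchronization */
    const char *description; /* label identifying the event step */
};

/* qsort comparator: ascending synchronized time, without overflow. */
static int compare_by_sync_time(const void *a, const void *b)
{
    const struct sync_event *ea = a;
    const struct sync_event *eb = b;

    return (ea->sync_ns > eb->sync_ns) - (ea->sync_ns < eb->sync_ns);
}

/* Order events chronologically on the common clock. */
static void sort_events(struct sync_event *events, size_t count)
{
    assert(events != NULL || count == 0);
    qsort(events, count, sizeof(events[0]), compare_by_sync_time);
}

/* Format one entry; returns the snprintf result (length or negative). */
static int format_event(char *buf, size_t len, const struct sync_event *ev)
{
    return snprintf(buf, len,
                    "sync=%" PRIu64 "ns local=%" PRIu64 "ns accuracy=+/-%" PRIu64 "ns %s",
                    ev->sync_ns, ev->local_ns, ev->accuracy_ns, ev->description);
}
```

Sorting on the synchronized time rather than the local time is what makes the ordering meaningful, because the local timestamps come from unrelated clock domains.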

DOCA Libraries

This application leverages the following DOCA libraries:

  • DOCA Core (DOCA Clock)

  • DOCA Comch

  • DOCA DPA

Refer to their respective programming guides for more information.

Compiling the Application

Please refer to the DOCA Installation Guide for Linux for details on how to install BlueField-related software.

DOCA reference applications are installed with full source code and build instructions. This allows you to compile them as-is or modify the source code to create custom versions.

For more information about the applications as well as development and compilation tips, refer to the DOCA Reference Applications page.

The source code for the application is located in the following directory: 

/opt/mellanox/doca/applications/time_sync/

Compiling All Applications

All DOCA applications are defined under a single Meson project. By default, the build process compiles all of them.

MPI is used for the compilation of this application. Make sure that MPI is installed on your setup (openmpi is provided with the DOCA installation as part of the doca-all and doca-ofed meta-packages).

Compiling the application requires updating the PATH and LD_LIBRARY_PATH environment variables to include MPI. For example, if openmpi is installed under /usr/mpi/gcc/openmpi-4.1.7rc1, update the environment variables as follows:

export PATH=/usr/mpi/gcc/openmpi-4.1.7rc1/bin:${PATH}
export LD_LIBRARY_PATH=/usr/mpi/gcc/openmpi-4.1.7rc1/lib:${LD_LIBRARY_PATH}


To build all applications:

cd /opt/mellanox/doca/applications/
meson /tmp/build
ninja -C /tmp/build

The build system automatically detects the platform architecture.

  • On x86 Host, it generates doca_time_sync_host

  • On BlueField DPU, it generates doca_time_sync_dpu

The binary is created in /tmp/build/time_sync/.

Compiling Only the Current Application

To reduce build time, you can configure Meson to build only the Time Sync application.

Regardless of the method used, the binary (doca_time_sync_host or doca_time_sync_dpu) is created in /tmp/build/time_sync/.

Option 1: Command Line Configuration

Run the following commands to disable all applications and explicitly enable Time Sync:

cd /opt/mellanox/doca/applications/
meson /tmp/build -Denable_all_applications=false -Denable_time_sync=true
ninja -C /tmp/build

Option 2: Configuration File

Alternatively, edit the configuration file directly:

  1. Edit /opt/mellanox/doca/applications/meson_options.txt:

    • Set enable_all_applications to false

    • Set enable_time_sync to true

  2. Run the standard compilation commands:

    cd /opt/mellanox/doca/applications/
    meson /tmp/build
    ninja -C /tmp/build
    

Running the Application

Application Execution

The Time Sync application is distributed as source code and must be compiled before execution.

Running on x86 Host

  • Usage syntax: 

    Usage: doca_time_sync_host [DOCA Flags] [Program Flags]
     
    DOCA Flags:
      -h, --help                        Print a help synopsis
      -v, --version                     Print program version information
      -l, --log-level                   Set the (numeric) log level for the program <10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE>
      --sdk-log-level                   Set the SDK (numeric) log level for the program <10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE>
      --log-filter                      Filter logs from specific modules, separated by comma
      -j, --json <path>                 Parse command line flags from an input json file
     
    Program Flags:
      -p, --pci-addr                    DOCA device PCI address
      -d, --delay                       Delay (msecs) to insert between event triggers (optional)

  • Example execution: 

    sudo ./doca_time_sync_host -p 3b:00.0 -d 1000

    Root Privileges

    The application requires sudo (root privileges) to access cross-timestamping system calls.

    PCIe Addresses

    Ensure 3b:00.0 matches your specific device's PCIe address.

Running on BlueField DPU

  • Usage syntax: 

    Usage: doca_time_sync_dpu [DOCA Flags] [Program Flags]
     
    DOCA Flags:
      -h, --help                        Print a help synopsis
      -v, --version                     Print program version information
      -l, --log-level                   Set the (numeric) log level for the program <10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE>
      --sdk-log-level                   Set the SDK (numeric) log level for the program <10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE>
      --log-filter                      Filter logs from specific modules, separated by comma
      -j, --json <path>                 Parse command line flags from an input json file
     
    Program Flags:
      -p, --pci-addr                    DOCA device PCI address
      -r, --repr-addr                   DOCA device representor PCI address (optional)

  • Example execution: 

    sudo ./doca_time_sync_dpu -p 03:00.0 -r 3b:00.0

    Root Privileges

    The application requires sudo (root privileges) to access cross-timestamping system calls.

    PCIe Addresses

    Ensure the device address (03:00.0) and representor address (3b:00.0) match your system configuration.

Command Line Flags

General Flags

  • -h, --help – Prints a help synopsis and exits

  • -v, --version – Prints program version information and exits

  • -l, --log-level – Sets the numeric log level for the application:

    • 10 – DISABLE

    • 20 – CRITICAL

    • 30 – ERROR

    • 40 – WARNING

    • 50 – INFO

    • 60 – DEBUG

    • 70 – TRACE (requires compilation with TRACE support)

  • --sdk-log-level (no short flag) – Sets the SDK numeric log level using the same 10-70 scale as above

  • --log-filter (no short flag) – Filters logs from specific modules (comma-separated list)

  • -j, --json – Parses command-line flags from a specified input JSON file

Refer to DOCA Arg Parser for more information regarding the supported flags and execution modes.

Host Program Flags

  • -p, --pci-addr – PCIe address of the device that the DOCA Comch client connects to and that is used for cross-timestamping. This flag is mandatory.

  • -d, --delay – Delay, in milliseconds, to insert between event triggers. This flag is optional; if omitted, a default of 1000 ms (1 second) is used.

DPU Program Flags

  • -p, --pci-addr – PCIe address of the device on which the DOCA Comch server is set up and that is used for cross-timestamping. This flag is mandatory.

  • -r, --repr-addr – Representor address of the DOCA Comch device to use. This flag is optional; if omitted, the first representor found for the given PCIe address is used.

Troubleshooting

Refer to the NVIDIA BlueField Platform Software Troubleshooting Guide for any issue encountered with the compilation, installation, or execution of the DOCA applications.

Application Code Flow

Common

  1. Parse application arguments.

    1. Initialize arg parser resources and register DOCA general parameters.

      doca_argp_init();
      
    2. Register Time Sync application parameters. 

      time_sync_common_reg_params();
      
    3. Parse the arguments. 

      doca_argp_start();
      
  2. Open DOCA devices for use in the application:

    time_sync_common_open_dev_with_caps();
    
    // DPU only
    time_sync_common_open_repr();
    
    1. Parse the PCIe address for the associated DOCA device

    2. Verify that the selected device has the required capabilities:

      • Comch support

      • DPA timer support

      • DOCA DPA support (DPU only)

      • Representor support (DPU only)

    3. Open the valid device

    4. On the DPU, parse and open the representor device

  3. Create a DOCA Clock Context:

    time_sync_common_create_clock();

  4. Run Host or DPU-specific code.

  5. Destroy DOCA Clock Context:

    time_sync_common_destroy_clock();

  6. Close DOCA devices:

    time_sync_common_close_devs();

  7. Destroy Arg Parser:

    doca_argp_destroy();

Host (x86) App

  1. Initialize Comch client. 

    time_sync_host_init_comch_client();
    
    1. Create progress engine

    2. Create DOCA Comch Client context

    3. Configure taskpool/callbacks for sending and receiving messages

    4. Start Comch Client

  2. Run main loop.

    time_sync_host_main_loop();
    
    1. Progress until Client is fully connected to Comch Server on DPU

    2. Get event time (on host and NIC) before sending a message

    3. Create and send a message to DPU containing the input delay time in milliseconds

    4. Wait to receive a response from the DPU containing DPU and DPA event times

    5. Get event time (on host and NIC) of message receive

    6. Convert received DPA time to NIC time using cross-timestamping functions

    7. Log the local and synchronized time of all events to 'time_sync.log'

  3. Clean up Comch Client

    time_sync_host_close_comch_client();
    
    1. Stop the DOCA Comch Client

    2. Progress until the connection is fully shut down and the context is IDLE

    3. Destroy the Client context and progress engine

DPU App

  1. Load DPA application.

    time_sync_dpu_load_dpa_app();
    
    1. Create a new DOCA DPA context

    2. Add an app to the context (the app with the given name is compiled alongside the DPU app; its source is in /opt/mellanox/doca/applications/time_sync/dpu_device/time_sync_dev.c)

    3. Start the DPA context

  2. Initiate Comch Server.

    time_sync_dpu_init_comch_server();
    
    1. Create progress engine

    2. Create DOCA Comch Server context

    3. Configure taskpool/callbacks for sending and receiving messages

    4. Configure callback for connection events

    5. Start Comch Server

  3. Run main loop.

    time_sync_dpu_main_loop();
    
    1. Progress until an x86 client has established a connection to the Comch Server

    2. Receive a message from the Client

      1. Extract the delay from the request message

      2. Sleep for 'delay' milliseconds

      3. Get event time (on Arm and NIC) of message receive

      4. Sleep for 'delay' milliseconds

      5. Trigger a remote procedure call to the loaded DPA app, which returns its local time from the running kernel

      6. Sleep for 'delay' milliseconds

      7. Package the event times from Arm and DPA and send a response message back to the host

    3. Progress until x86 has finished and closed Client Comch connection

  4. Clean up Comch Server

    time_sync_dpu_close_comch_server();
    
    1. Stop the DOCA Comch Server

    2. Progress until the context is IDLE

    3. Destroy the Server context and progress engine

  5. Clean up DPA application

    time_sync_dpu_unload_dpa_app();
    
    1. Stop the DOCA DPA context

    2. Destroy the DPA context

References

  • /opt/mellanox/doca/applications/time_sync/
