Overview
The DOCA DPA Device Verbs library provides RDMA operations support for DPA applications, enabling high-performance RDMA operations (Send, Receive, Read, Write, Atomic) directly on the DPA device without CPU involvement.
The library follows an RDMA-core/ibverbs-like API design pattern, making it intuitive for developers experienced with traditional RDMA programming. It keeps the same conceptual model of work requests (WRs), queue pairs (QPs), and completion queues (CQs) but is optimized for execution within DPA kernels.
The library uses a WR-based model for posting RDMA operations:
- Send WRs (doca_dpa_dev_verbs_send_wr): For outbound RDMA operations, including Send, RDMA Write, RDMA Read, and Atomic operations.
- Receive WRs (doca_dpa_dev_verbs_recv_wr): For posting receive buffers for incoming data.
Key features:
- DPA-native RDMA operations: Execute RDMA operations directly on the DPA device.
- RDMA-core compatible interface: Familiar API design following rdma-core/ibverbs patterns.
- WR-based model: Post send and receive WRs as in traditional RDMA programming.
- Support for QPs and SRQs: Work with both QPs and Shared Receive Queues (SRQs).
- Scatter-gather (SG) operations: Support complex memory layouts with multiple regions per operation.
Deployment
The DOCA DPA Verbs library and header files are:
- libdoca_dpa_dev_verbs.a (DOCA Libs)
- doca_dpa_dev_verbs.h (DOCA Includes)
Architecture
The DOCA DPA SDK does not implement multi-thread synchronization primitives, and all DOCA DPA objects are non-thread-safe. Developers must ensure that both the user program and kernels are designed to prevent race conditions.
Component breakdown:
- Host-side libraries:
  - DOCA RDMA Verbs Library: Handles QP/SRQ creation, configuration, and lifecycle management.
  - DOCA DPA Library: Manages DPA context creation, DPA completion contexts, threads, memory management, and overall DPA orchestration.
- DPA device libraries:
  - DPA Device Verbs Library: Device-side library for direct RDMA operations.
  - DPA Device Library: Device-side completion context processing and completion element handling.
- DPA hardware: The underlying DPA device that executes operations.
Core Concepts
DPA Device Handles
The DOCA DPA Verbs library uses specific handles to represent DPA device resources:
- doca_dpa_dev_verbs_qp_t: Handle for a QP on the DPA device.
- doca_dpa_dev_verbs_srq_t: Handle for an SRQ on the DPA device.
- doca_dpa_dev_completion_t: Handle for a DPA completion context (CQ) on the DPA device.
Work Request Structures
The library defines structures for WRs:
- doca_dpa_dev_verbs_send_wr: Structure for send WRs.
- doca_dpa_dev_verbs_recv_wr: Structure for receive WRs.
- doca_dpa_dev_verbs_sge: SG element for describing memory regions.
These structures are used to configure and post WRs to QPs or SRQs.
Operation Types
The DOCA DPA Verbs library supports various RDMA operation types:
- Send operations: SEND and SEND_WITH_IMM
- RDMA operations: RDMA_WRITE, RDMA_WRITE_WITH_IMM, and RDMA_READ
- Atomic operations: ATOMIC_FETCH_ADD
DPA Completion Context Management
The library provides a completion context management system that spans both host and device sides:
- Host-side completion contexts (struct doca_dpa_completion): Managed by the DOCA DPA Library, with configurable queue size, thread attachment, and lifecycle control.
- Device-side completion context processing: Uses the doca_dpa_dev_completion_t handle for processing completions on the DPA device.
- DPA completion elements (doca_dpa_dev_completion_element_t): Represent individual completion events containing metadata about completed operations.
- Supported completion types include send, receive (RDMA Write with Immediate, Send, Send with Immediate), and error completions.
API Reference
Enumerations and Constants
The doca_dpa_dev_verbs.h header file contains complete enum definitions and values.
Send Work Request Opcodes
The enum doca_dpa_dev_verbs_send_wr_opcode defines RDMA operation types (listed here without the DOCA_DPA_DEV_VERBS_SEND_WR_OPCODE_ prefix):
- WRITE: One-sided RDMA write operation.
- WRITE_WITH_IMM: RDMA write with immediate data, generating a completion on the receiver.
- SEND: Two-sided send operation requiring a posted receive buffer on the remote side.
- SEND_WITH_IMM: Send operation with immediate data attached.
- READ: One-sided RDMA read operation retrieving data from remote memory.
- ATOMIC_FETCH_ADD: Atomic fetch-and-add operation on a remote memory location.
Send Work Request Flags
The enum doca_dpa_dev_verbs_send_wr_flags controls WR behavior (listed here without the DOCA_DPA_DEV_VERBS_SEND_WR_FLAGS_ prefix):
- SIGNALED: Generates a CQ entry (CQE) when the operation completes.
- SOLICITED: Requests a solicited event on the remote side (used with send operations).
Fence Modes
The enum doca_dpa_dev_verbs_send_wr_fm controls ordering and synchronization between WRs (listed here without the DOCA_DPA_DEV_VERBS_SEND_WR_FM_ prefix):
- NO_FENCE: No ordering constraints, for maximum performance.
- INITIATOR_SMALL_FENCE: Light ordering constraint for local operations.
- FENCE: Standard fence ensuring previous operations complete before this one.
- STRONG_ORDERING_FENCE: Strongest ordering guarantee, for critical operations.
- FENCE_AND_INITIATOR_SMALL_FENCE: Combined fence modes for specific use cases.
SRQ Types
The enum doca_dpa_dev_verbs_srq_type defines SRQ implementation types (listed here without the DOCA_DPA_DEV_VERBS_SRQ_TYPE_ prefix):
- LINKED_LIST: SRQ implemented as a linked list structure.
- CONTIGUOUS: SRQ implemented as a contiguous memory buffer.
SG Element (SGE) Structure
The struct doca_dpa_dev_verbs_sge represents a memory region for data transfer:
- addr: Virtual address of the memory region (uint64_t).
- length: Length of the memory region in bytes (uint32_t).
- lkey: Local key for the memory region (uint32_t).
For non-fully occupied SG lists, set the last entry's lkey field to DOCA_DPA_DEV_VERBS_SGE_TERMINATING_LKEY (0x100) to indicate the end of valid entries.
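The terminator rule can be illustrated with a small host-side C model. This is not DOCA code: struct sge_model and the sg_list_* helpers are hypothetical names, and only the field layout and the 0x100 value come from the text above. The sketch interprets "the last entry" as the first unused slot after the valid entries (an assumption), and a fully occupied list carries no terminator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the documented value of DOCA_DPA_DEV_VERBS_SGE_TERMINATING_LKEY. */
#define SGE_TERMINATING_LKEY 0x100u

/* Hypothetical stand-in for struct doca_dpa_dev_verbs_sge with the three
 * documented fields; the real definition is in doca_dpa_dev_verbs.h. */
struct sge_model {
	uint64_t addr;   /* virtual address of the region */
	uint32_t length; /* length in bytes */
	uint32_t lkey;   /* local key, or the terminating value */
};

/* Marks the end of the valid entries in a list of capacity `cap` that holds
 * `used` valid entries. A full list needs no terminator. */
static void sg_list_terminate(struct sge_model *sg_list, size_t cap, size_t used)
{
	assert(used <= cap);
	if (used < cap)
		sg_list[used].lkey = SGE_TERMINATING_LKEY;
}

/* Counts valid entries: stops at the capacity or at the first terminator. */
static size_t sg_list_valid_entries(const struct sge_model *sg_list, size_t cap)
{
	size_t n = 0;

	while (n < cap && sg_list[n].lkey != SGE_TERMINATING_LKEY)
		n++;
	return n;
}

/* Total number of bytes described by the valid entries. */
static uint64_t sg_list_total_bytes(const struct sge_model *sg_list, size_t cap)
{
	uint64_t total = 0;
	size_t n = sg_list_valid_entries(sg_list, cap);

	for (size_t i = 0; i < n; i++)
		total += sg_list[i].length;
	return total;
}
```

For example, a four-slot list holding a 64-byte and a 128-byte buffer is terminated at slot 2 and describes 192 bytes in total.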
Send Work Request Configuration
See doca_dpa_dev_verbs_send_wr_set_* and doca_dpa_dev_verbs_send_wr_get_* functions in doca_dpa_dev_verbs.h for complete API signatures.
These APIs configure send WRs before posting them to a QP. Configuration must be completed before posting.
Operation Type
doca_dpa_dev_verbs_send_wr_set/get_opcode(): Sets/gets the RDMA operation type (for example, SEND, WRITE, READ, or ATOMIC_FETCH_ADD), which determines the WR's fundamental behavior.
Memory Settings
- doca_dpa_dev_verbs_send_wr_set/get_sg_list(): Sets/gets the SG list pointing to the local memory regions used by the operation (the source data for Send and Write, the destination buffer for Read)
- doca_dpa_dev_verbs_send_wr_set/get_sg_num_sge(): Sets/gets the number of SG elements in the list
Control and Synchronization
- doca_dpa_dev_verbs_send_wr_set/get_send_flags(): Sets/gets completion generation and solicited event control flags
- doca_dpa_dev_verbs_send_wr_set/get_fence_mode(): Sets/gets ordering constraints between WRs
Optional Data
- doca_dpa_dev_verbs_send_wr_set/get_imm_data(): Sets/gets immediate data for SEND_WITH_IMM and WRITE_WITH_IMM operations
- doca_dpa_dev_verbs_send_wr_set/get_invalidate_rkey(): Sets/gets the remote key to invalidate for SEND_WITH_INV operations
RDMA-specific Configuration
Required for RDMA Write (including WRITE_WITH_IMM) and RDMA Read operations:
- doca_dpa_dev_verbs_send_wr_set/get_rdma_remote_addr(): Sets/gets the target memory address on the remote node
- doca_dpa_dev_verbs_send_wr_set/get_rdma_rkey(): Sets/gets the remote memory key for accessing the target memory region
Atomic Operations Configuration
Required for atomic operations:
- doca_dpa_dev_verbs_send_wr_set/get_atomic_remote_addr(): Sets/gets the target memory address for the atomic operation
- doca_dpa_dev_verbs_send_wr_set/get_atomic_rkey(): Sets/gets the remote memory key for the atomic operation
- doca_dpa_dev_verbs_send_wr_set/get_atomic_compare_add(): Sets/gets the value to add in fetch-and-add operations
- doca_dpa_dev_verbs_send_wr_set/get_atomic_swap(): Sets/gets the swap value for compare-and-swap operations
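The result of ATOMIC_FETCH_ADD can be modeled on local memory with C11 atomics. The sketch below is an analogy, not DOCA code: model_fetch_add and take_ticket are hypothetical, and it assumes a 64-bit remote word (as with ibverbs atomics). The remote word is incremented by the operand configured through doca_dpa_dev_verbs_send_wr_set_atomic_compare_add(), and the value held before the add is returned; in ibverbs that pre-add value is written into the local buffer described by the WR's SG list, and the sketch assumes the same behavior here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Local analogue of ATOMIC_FETCH_ADD: `remote_word` stands in for the remote
 * memory location, and the return value stands in for the pre-add value the
 * initiator receives in its local SG buffer. */
static uint64_t model_fetch_add(_Atomic uint64_t *remote_word, uint64_t add_operand)
{
	/* atomic_fetch_add returns the value held immediately before the addition */
	return atomic_fetch_add(remote_word, add_operand);
}

/* Typical use: every caller receives a distinct ticket from a shared counter. */
static uint64_t take_ticket(_Atomic uint64_t *ticket_counter)
{
	return model_fetch_add(ticket_counter, 1);
}
```

Because each fetch-and-add returns a different pre-add value, concurrent initiators targeting the same counter never receive the same ticket.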
Receive Work Request Configuration
See doca_dpa_dev_verbs_recv_wr_set_* and doca_dpa_dev_verbs_recv_wr_get_* functions in doca_dpa_dev_verbs.h for complete API signatures.
These APIs configure and query receive WRs before posting them to a QP or SRQ. Receive WRs prepare buffers to receive incoming data from remote nodes. All configuration must be completed before calling the posting APIs.
Configuration
- doca_dpa_dev_verbs_recv_wr_set/get_sg_list(): Sets/gets the SG list pointing to local memory regions where incoming data will be stored
- doca_dpa_dev_verbs_recv_wr_set/get_sg_num_sge(): Sets/gets the number of SG elements in the list
Receive WR configuration is simpler than send WR configuration because receive operations are passive: they only specify where to store incoming data.
Work Request Posting
See doca_dpa_dev_verbs_qp_post_* functions in doca_dpa_dev_verbs.h for complete API signatures.
This section provides APIs for posting send and receive WRs to QPs. All posting functions return a WR counter that can be matched with completion events.
Standard Work Request Posting
- doca_dpa_dev_verbs_qp_post_send_wr(): Posts a configured send WR to the send queue. Returns the send WR counter for completion tracking.
- doca_dpa_dev_verbs_qp_post_recv_wr(): Posts a receive WR to the receive queue, preparing the queue to receive incoming data. Returns the receive WR counter for completion tracking.
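The returned WR counter can be compared with the WQE counter reported by a completion element (doca_dpa_dev_completion_element_get_wqe_counter()). The sketch below shows one way to track outstanding WRs with wrap-safe unsigned arithmetic. It rests on assumptions to verify against doca_dpa_dev_verbs.h: both counters are 32-bit, advance by one per posted WR, wrap modulo 2^32, and send completions arrive in posting order so one completion covers every earlier WR. struct wr_tracker and its helpers are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bookkeeping for one send queue. */
struct wr_tracker {
	uint32_t last_posted;    /* counter returned by the most recent post */
	uint32_t last_completed; /* counter reported by the most recent completion */
};

static void wr_tracker_on_post(struct wr_tracker *t, uint32_t wr_counter)
{
	t->last_posted = wr_counter;
}

static void wr_tracker_on_completion(struct wr_tracker *t, uint32_t wqe_counter)
{
	t->last_completed = wqe_counter;
}

/* WRs posted but not yet covered by a completion. Unsigned subtraction stays
 * correct after the 32-bit counter wraps past zero. */
static uint32_t wr_tracker_outstanding(const struct wr_tracker *t)
{
	return t->last_posted - t->last_completed;
}

/* Non-zero if `wr_counter` is at or before the last completed counter,
 * judged within half of the counter range. */
static int wr_tracker_is_done(const struct wr_tracker *t, uint32_t wr_counter)
{
	uint32_t ahead = wr_counter - t->last_completed;

	return ahead == 0 || ahead > UINT32_MAX / 2;
}
```

Comparing differences rather than raw values is what keeps the checks valid when the counter rolls over; if the completion reports a narrower counter than the post, both values would need masking to the same width first.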
Raw WQE Posting
- doca_dpa_dev_verbs_qp_post_send_raw_wqe(): Posts a custom-built send work queue element (WQE) directly to hardware, bypassing high-level WR processing.
- doca_dpa_dev_verbs_qp_post_recv_raw_wqe(): Posts a custom-built receive WQE of the specified size.
SG List Usage
When using an SG list (sg_list) that is not fully occupied, set the last entry's lkey field to DOCA_DPA_DEV_VERBS_SGE_TERMINATING_LKEY to indicate the end of valid entries.
Commit Operations
See doca_dpa_dev_verbs_qp_*commit* functions in doca_dpa_dev_verbs.h for complete API signatures.
This section provides APIs for committing posted WRs to hardware. Posting only places WRs in the work queue; the hardware processes them only after they are committed.
Standard Commit Operations
- doca_dpa_dev_verbs_qp_commit_send(): Commits all pending send WRs to hardware, with an internal memory fence for ordering guarantees.
- doca_dpa_dev_verbs_qp_commit_recv(): Commits all pending receive WRs to hardware, with an internal memory fence.
Lightweight Commit Operations
The caller must perform the required memory fence operations before calling these functions.
- doca_dpa_dev_verbs_qp_lw_commit_send(): Lightweight send commit without an internal memory fence. Offers higher performance but requires manual memory synchronization.
- doca_dpa_dev_verbs_qp_lw_commit_recv(): Lightweight receive commit without an internal memory fence. The caller is responsible for proper memory ordering.
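Why the fence matters can be sketched with a host-side C11 model, assuming a commit works as in mlx5-style queues: posting writes WQE contents into the ring, and committing publishes a new producer index through the doorbell record. If the index became visible before the WQE contents, a consumer could read a stale WQE. This is an analogy only: struct wq_model and the wq_* functions are hypothetical, and atomic_thread_fence stands in for whatever fence the DPA toolchain provides.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define WQ_DEPTH 8u /* hypothetical ring depth; no overflow check in this model */

struct wq_model {
	uint64_t wqe[WQ_DEPTH]; /* WQE slots the consumer ("hardware") reads */
	_Atomic uint32_t dbr;   /* published producer index (doorbell record) */
	uint32_t pi;            /* private producer index: posted, not committed */
};

/* Analogue of posting: write a WQE without making it visible. */
static void wq_post(struct wq_model *wq, uint64_t wqe)
{
	wq->wqe[wq->pi % WQ_DEPTH] = wqe;
	wq->pi++;
}

/* Analogue of a lightweight commit: only publishes the index, so the caller
 * must already have ordered the WQE writes before it. */
static void wq_lw_commit(struct wq_model *wq)
{
	atomic_store_explicit(&wq->dbr, wq->pi, memory_order_relaxed);
}

/* Analogue of a standard commit: issues the release fence itself. */
static void wq_commit(struct wq_model *wq)
{
	atomic_thread_fence(memory_order_release);
	wq_lw_commit(wq);
}

/* Consumer side: an acquire load of the index makes every WQE written before
 * the matching release fence visible. */
static uint32_t wq_consumer_visible(struct wq_model *wq)
{
	return atomic_load_explicit(&wq->dbr, memory_order_acquire);
}
```

With the lightweight variant, several posts can share one caller-issued fence followed by a single publish, which is where the performance advantage of the lightweight commit comes from.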
Shared Receive Queue Operations
Basic SRQ Operations
- doca_dpa_dev_verbs_srq_post_recv_wr(): Posts a receive WR to the SRQ, specifying the SRQ type and the receive WR structure.
- doca_dpa_dev_verbs_srq_commit_recv(): Commits pending receive WRs to the SRQ with an internal memory fence.
- doca_dpa_dev_verbs_srq_lw_commit_recv(): Lightweight commit for SRQ receive WRs without an internal memory fence. The caller is responsible for memory fence operations.
SRQ Raw WQE Operations
doca_dpa_dev_verbs_srq_post_recv_raw_wqe(): Posts a custom-built receive WQE to the SRQ with specified type and size. Follows same principles as QP raw WQE posting.
SG List Usage
When providing an SG list (sg_list) that is not fully occupied, the user must set the last entry's lkey field to DOCA_DPA_DEV_VERBS_SGE_TERMINATING_LKEY to indicate the end of the valid entries.
SRQ Management
This API applies only to linked-list SRQs.
doca_dpa_dev_verbs_srq_linked_list_ack_wr(): Acknowledges a processed receive WR in a linked-list SRQ, taking the receive WR counter to acknowledge (the value returned by doca_dpa_dev_completion_element_get_wqe_counter()).
Query APIs
The following APIs are for debug and inspection purposes only. Do not modify the returned WQ buffers or DBR addresses, as this can cause undefined behavior and system instability.
Queue Pair Query APIs
- doca_dpa_dev_verbs_qp_get_wq(): Retrieves work queue attributes, including SQ/RQ buffer addresses, entry counts, and receive WQE size. The returned buffers are read-only; modifying them can cause undefined behavior.
- doca_dpa_dev_verbs_qp_get_dbr_addr(): Returns the doorbell record address for the QP. The returned address is read-only; modifying it can cause undefined behavior.
- doca_dpa_dev_verbs_qp_get_qpn(): Gets the QP number.
- doca_dpa_dev_verbs_qp_get_user_index(): Retrieves the user-assigned index for the QP.
SRQ Query APIs
- doca_dpa_dev_verbs_srq_get_srqn(): Gets the SRQ number.
- doca_dpa_dev_verbs_srq_get_wq(): Retrieves SRQ work queue attributes, including buffer address, entry count, and WQE size. The returned buffers are read-only; modifying them can cause undefined behavior.
Integration with Host-side DOCA RDMA Verbs
The DPA Device Verbs library integrates with host-side DOCA RDMA Verbs to provide a complete RDMA solution.
QP/SRQ Configuration Modes
Two mutually exclusive modes are available for configuring QPs and SRQs with DPA integration.
Mode 1: DPA Context Integration (Basic)
This mode provides full DPA integration, where the DPA context manages all DPA-related resources:
- Set the DPA context using doca_verbs_qp_init_attr_set_dpa_ctx() for QPs or doca_verbs_srq_init_attr_set_dpa_ctx() for SRQs.
- The DPA context automatically handles memory allocation, doorbell records, and user access regions.
- Because resource management is simplified, this is the basic mode for DPA applications.
Mode 2: External Resource Management (Advanced)
This mode provides fine-grained control over DPA resources:
- Set external user memory (UMEM) for the work queue buffer using doca_verbs_qp_init_attr_set_external_umem() or doca_verbs_srq_init_attr_set_external_umem().
- Set the external doorbell record (DBR) using doca_verbs_qp_init_attr_set_external_dbr_umem() or doca_verbs_srq_init_attr_set_external_dbr_umem().
- Set the external user access region (UAR) using doca_verbs_qp_init_attr_set_external_uar().
- The application is responsible for manually managing all DPA resources.
- This mode requires a deep understanding of DPA resource management and memory allocation.
- It is used for scenarios requiring custom resource allocation strategies and full user control.
Notes:
- These two modes are mutually exclusive; you cannot mix DPA context mode with external resource mode.
- Mode 1 is the basic mode for applications.
- Mode 2 should only be used when custom resource management is required.
- All resources in Mode 2 must be properly aligned and configured according to DPA hardware requirements.
Host-side Setup Requirements
For Mode 1 (DPA Context Integration)
- Create a DOCA RDMA Verbs context and protection domain.
- Retrieve the DOCA device from the Verbs PD.
- Create a DPA context using doca_dpa_create().
- Configure QP initialization attributes.
- Set the DPA context for the QP/SRQ.
- Create and configure DPA completion contexts.
- Associate completion contexts with QPs.
For Mode 2 (External Resource Management)
- Create a DOCA RDMA Verbs context and protection domain.
- Retrieve the DOCA device from the Verbs PD.
- Allocate and configure external WQ UMEM.
- Allocate and configure external DBR UMEM.
- Allocate and configure external UAR.
- Configure QP initialization attributes.
- Set external resources for the QP/SRQ.
- Create and configure DPA completion contexts.
- Associate completion contexts with QPs.
Usage Patterns
Basic DPA QP Setup Pattern
```c
// Variable declarations and return-status checks are omitted for brevity.
// Create DOCA RDMA Verbs context and PD
doca_verbs_context_create(devinfo, 0, &verbs_ctx);
doca_verbs_pd_create(verbs_ctx, &pd);
// Retrieve DOCA device from Verbs PD
doca_verbs_pd_as_doca_dev(pd, &dev);
// Create DPA context and completion contexts (256 entries each)
doca_dpa_create(dev, &dpa_ctx);
doca_dpa_completion_create(dpa_ctx, 256, &send_completion);
doca_dpa_completion_create(dpa_ctx, 256, &recv_completion);
// The DPA context and completion contexts also need their remaining
// configuration (such as thread attachment) and lifecycle calls; not shown.
// Configure QP with DPA context (Mode 1) and DPA completions
doca_verbs_qp_init_attr_create(&qp_init_attr);
doca_verbs_qp_init_attr_set_pd(qp_init_attr, pd);
doca_verbs_qp_init_attr_set_external_datapath_en(qp_init_attr, 1);
doca_verbs_qp_init_attr_set_dpa_ctx(qp_init_attr, dpa_ctx);
doca_verbs_qp_init_attr_set_send_dpa_completion(qp_init_attr, send_completion);
doca_verbs_qp_init_attr_set_receive_dpa_completion(qp_init_attr, recv_completion);
// Create queue pair
doca_verbs_qp_create(verbs_ctx, qp_init_attr, &qp);
// Get DPA handles to pass to DPA kernels
doca_dpa_dev_verbs_qp_t dpa_qp;
doca_dpa_dev_completion_t send_comp, recv_comp;
doca_verbs_qp_get_dpa_handle(qp, dpa_ctx, &dpa_qp);
doca_dpa_completion_get_dpa_handle(send_completion, &send_comp);
doca_dpa_completion_get_dpa_handle(recv_completion, &recv_comp);
```
Advanced DPA QP Setup Pattern
```c
// Variable declarations and return-status checks are omitted for brevity.
// Create DOCA RDMA Verbs context and PD
doca_verbs_context_create(devinfo, 0, &verbs_ctx);
doca_verbs_pd_create(verbs_ctx, &pd);
// Retrieve DOCA device from Verbs PD
doca_verbs_pd_as_doca_dev(pd, &dev);
// Create DPA context and completion contexts
doca_dpa_create(dev, &dpa_ctx);
doca_dpa_completion_create(dpa_ctx, 256, &send_completion);
doca_dpa_completion_create(dpa_ctx, 256, &recv_completion);
// Allocate external UMEM for the QP work queue and DBR from the DPA heap
// (doca_dpa_mem_alloc() returns the DPA address through its last argument)
doca_dpa_dev_uintptr_t qp_wq_dpa_addr, dbr_dpa_addr;
doca_dpa_mem_alloc(dpa_ctx, qp_umem_size, &qp_wq_dpa_addr);
doca_umem_dpa_create(dpa_ctx, qp_wq_dpa_addr, &qp_umem);
doca_dpa_mem_alloc(dpa_ctx, qp_dbr_umem_size, &dbr_dpa_addr);
doca_umem_dpa_create(dpa_ctx, dbr_dpa_addr, &qp_dbr_umem);
// Create UAR for doorbell access
doca_uar_dpa_create(dpa_ctx, &dpa_uar);
// Configure QP with external resources (Mode 2: no DPA context is set)
doca_verbs_qp_init_attr_create(&qp_init_attr);
doca_verbs_qp_init_attr_set_pd(qp_init_attr, pd);
doca_verbs_qp_init_attr_set_external_datapath_en(qp_init_attr, 1);
doca_verbs_qp_init_attr_set_external_umem(qp_init_attr, qp_umem, 0);
doca_verbs_qp_init_attr_set_external_dbr_umem(qp_init_attr, qp_dbr_umem, 0);
doca_verbs_qp_init_attr_set_external_uar(qp_init_attr, dpa_uar);
doca_verbs_qp_init_attr_set_send_dpa_completion(qp_init_attr, send_completion);
doca_verbs_qp_init_attr_set_receive_dpa_completion(qp_init_attr, recv_completion);
// Create queue pair
doca_verbs_qp_create(verbs_ctx, qp_init_attr, &qp);
// Get DPA handles to pass to DPA kernels
doca_dpa_dev_verbs_qp_t dpa_qp;
doca_dpa_dev_completion_t send_comp, recv_comp;
doca_verbs_qp_get_dpa_handle(qp, dpa_ctx, &dpa_qp);
doca_dpa_completion_get_dpa_handle(send_completion, &send_comp);
doca_dpa_completion_get_dpa_handle(recv_completion, &recv_comp);
```
SRQ Setup Pattern
```c
// Create SRQ with DPA context (Mode 1)
doca_verbs_srq_init_attr_create(&srq_init_attr);
doca_verbs_srq_init_attr_set_pd(srq_init_attr, pd);
doca_verbs_srq_init_attr_set_dpa(srq_init_attr, dpa_ctx);
doca_verbs_srq_create(verbs_ctx, srq_init_attr, &srq);
// Get the SRQ DPA handle; receive WRs are then posted to it from DPA kernels
doca_dpa_dev_verbs_srq_t dpa_srq;
doca_verbs_srq_get_dpa_handle(srq, dpa_ctx, &dpa_srq);
```
RDMA Write Operation (Data path example)
```c
#include <doca_dpa_dev.h>
#include <doca_dpa_dev_verbs.h>

__dpa_global__ void dpa_rdma_write_example(doca_dpa_dev_verbs_qp_t qp_handle,
                                           uint64_t local_addr,
                                           uint32_t length,
                                           uint32_t lkey,
                                           uint64_t remote_addr,
                                           uint32_t rkey)
{
	struct doca_dpa_dev_verbs_send_wr send_wr;
	struct doca_dpa_dev_verbs_sge sge;

	// Describe the local source buffer with a single SG element
	sge.addr = local_addr;
	sge.length = length;
	sge.lkey = lkey;

	// Configure the RDMA Write WR (opcode and flags are compile-time constants)
	doca_dpa_dev_verbs_send_wr_set_opcode(&send_wr, DOCA_DPA_DEV_VERBS_SEND_WR_OPCODE_WRITE);
	doca_dpa_dev_verbs_send_wr_set_sg_list(&send_wr, &sge);
	doca_dpa_dev_verbs_send_wr_set_sg_num_sge(&send_wr, 1);
	doca_dpa_dev_verbs_send_wr_set_rdma_remote_addr(&send_wr, remote_addr);
	doca_dpa_dev_verbs_send_wr_set_rdma_rkey(&send_wr, rkey);
	doca_dpa_dev_verbs_send_wr_set_fence_mode(&send_wr, DOCA_DPA_DEV_VERBS_SEND_WR_FM_NO_FENCE);
	doca_dpa_dev_verbs_send_wr_set_send_flags(&send_wr, DOCA_DPA_DEV_VERBS_SEND_WR_FLAGS_SIGNALED);

	// Post the WR, then commit it so the hardware processes it.
	// wr_count can be matched against the WQE counter of the send completion.
	uint32_t wr_count = doca_dpa_dev_verbs_qp_post_send_wr(qp_handle, &send_wr);
	doca_dpa_dev_verbs_qp_commit_send(qp_handle);
	(void)wr_count; // not used further in this minimal example
}
```
Optimizing Performance with Compile-Time Constants
For better performance:
- Use compile-time constant values for the opcode parameter in doca_dpa_dev_verbs_send_wr_set_opcode(). This allows the compiler to optimize the code path.
- Use compile-time constant values for the srq_type parameter in doca_dpa_dev_verbs_srq_post_recv_wr(). This enables the compiler to generate more efficient code paths for SRQ operations.
Example: Use enum values like DOCA_DPA_DEV_VERBS_SEND_WR_OPCODE_WRITE, DOCA_DPA_DEV_VERBS_SRQ_TYPE_LINKED_LIST, or DOCA_DPA_DEV_VERBS_SRQ_TYPE_CONTIGUOUS as literal constants instead of passing them through variables.
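The effect of a constant opcode can be seen in plain C. The sketch does not reproduce DOCA internals: enum model_opcode, model_opcode_bits, and the returned values are hypothetical and arbitrary. It only shows the general mechanism, namely that a static inline function branching on its argument can be inlined and folded to a single case when the argument is a literal at the call site.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for an opcode enum such as
 * doca_dpa_dev_verbs_send_wr_opcode; values below are arbitrary. */
enum model_opcode { MODEL_OP_WRITE, MODEL_OP_SEND, MODEL_OP_READ };

/* Inline helper that branches on the opcode. */
static inline uint32_t model_opcode_bits(enum model_opcode op)
{
	switch (op) {
	case MODEL_OP_WRITE:
		return 0x08u;
	case MODEL_OP_SEND:
		return 0x0Au;
	case MODEL_OP_READ:
		return 0x10u;
	}
	return 0u; /* not reached for valid enum values */
}

/* Preferred: the opcode is a literal, so the switch can fold to a constant. */
static uint32_t build_write_bits(void)
{
	return model_opcode_bits(MODEL_OP_WRITE);
}

/* Less efficient: the opcode arrives in a variable, so the branch remains. */
static uint32_t build_bits_runtime(enum model_opcode op)
{
	return model_opcode_bits(op);
}
```

Compiling with optimization (for example, -O2) and inspecting the assembly typically shows build_write_bits() reduced to a constant return, while build_bits_runtime() keeps a branch or lookup table.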