DOCA SDK Documentation

DOCA Telemetry Diag

This guide provides instructions on building and developing applications which require collecting telemetry information provided by NVIDIA® BlueField® and NVIDIA® ConnectX® families of networking platforms.

Introduction

doca_telemetry_diag provides programmatic access to an on-device mechanism that samples diagnostic data (e.g., statistics and counters). It allows configuring parameters such as the requested data IDs and the sampling period, and retrieving the generated information in several formats.

Prerequisites

To utilize DOCA Telemetry Diag, your system must meet the following baseline requirements:

  • Firmware: Version 28.43.1000 or later is required for ConnectX-7, 32.43.1000 or later for BlueField-3, and 40.43.1000 or later for ConnectX-8 devices.

  • Driver: The fwctl driver must be installed and loaded on the system.

Verifying the fwctl Driver

To verify that the fwctl driver is successfully loaded, check the device directories: 

$ ls /sys/class/fwctl/
$ ls /dev/fwctl

For a standard 2-port device, both commands are expected to list fwctl0 and fwctl1.
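The same check can be performed from C, for example as part of an application's startup diagnostics. The following is a minimal sketch using POSIX opendir/readdir; the helper count_fwctl_devices is hypothetical and is not part of the DOCA API.

```c
#include <dirent.h>
#include <string.h>

/* Hypothetical helper: counts entries whose name starts with "fwctl" in
 * dir_path (e.g. "/sys/class/fwctl" or "/dev/fwctl").
 * Returns -1 if the directory cannot be opened. */
int count_fwctl_devices(const char *dir_path)
{
	static const char prefix[] = "fwctl";
	DIR *dir = opendir(dir_path);
	struct dirent *entry;
	int count = 0;

	if (dir == NULL)
		return -1; /* missing directory: the driver is likely not loaded */

	while ((entry = readdir(dir)) != NULL) {
		/* "." and ".." never match the prefix, so they are skipped. */
		if (strncmp(entry->d_name, prefix, sizeof(prefix) - 1) == 0)
			count++;
	}
	closedir(dir);
	return count;
}
```

On a standard 2-port device with the driver loaded, count_fwctl_devices("/sys/class/fwctl") would be expected to return 2.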

Manually Loading the Driver

If the directories /sys/class/fwctl or /dev/fwctl do not exist or are empty, the module may be installed but inactive.

Check for the module's presence:

$ grep fwctl -R /lib/modules/$(uname -r)/

If the output confirms the presence of fwctl.ko and mlx5_fwctl.ko, manually load the module and verify its status:

$ sudo modprobe mlx5_fwctl
$ lsmod | grep fwctl

Reinstalling the DOCA Host Package

If the fwctl modules cannot be found, or if the modprobe command fails to load them, reinstall the DOCA Host package.

  1. Download the package (DOCA 3.3.0 example):

    $ wget https://www.mellanox.com/downloads/DOCA/DOCA_v3.3.0/host/doca-host_3.3.0-088000-26.01-ubuntu2204_amd64.deb

  2. Purge existing DOCA and OFED modules:

    $ for f in $(dpkg --list | grep doca | awk '{print $2}'); do echo $f; sudo apt remove --purge $f -y; done
    $ for f in $(dpkg --list | grep mlnx | awk '{print $2}'); do echo $f; sudo apt remove --purge $f -y; done
    $ for f in $(dpkg --list | grep dpdk | awk '{print $2}'); do echo $f; sudo apt remove --purge $f -y; done
    $ for f in $(dpkg --list | grep ofed | awk '{print $2}'); do echo $f; sudo apt remove --purge $f -y; done
    $ sudo /usr/sbin/ofed_uninstall.sh --force
    $ sudo apt-get autoremove

  3. Install the new package and restart services:

    $ sudo dpkg -i doca-host_3.3.0-088000-26.01-ubuntu2204_amd64.deb
    $ sudo apt-get update
    $ sudo apt-get -y install doca-all
    $ sudo /etc/init.d/openibd restart

    Once the reinstallation is complete, confirm that the module is loaded as described in section "DOCA Telemetry Diag | Verifying the fwctl Driver".

Environment

DOCA Telemetry-based applications can run either on the host machine (ConnectX-7 or BlueField-3 and above) or on the DPU target (BlueField-3 and above).

When running inside the DPU (on Arm), the BlueField must be configured in DPU mode, as described in BlueField Modes of Operation.

Architecture

Diagnostic data is stored in firmware as a cyclic buffer of samples. Each sample represents the values of all the requested diagnostic data IDs and their corresponding sampling timestamps.

Device and Ownership

The DOCA telemetry library requires a ConnectX/BlueField DOCA device to sample from. The device can be accessed using any of its physical functions (PFs). 

If multiple devices exist in a setup, a doca_telemetry_diag context should be created for each.

The device supports two ownership models:

  1. Single domain

  2. Multiple domains

These models affect how the library can be used, but not the underlying hardware resources: the amount of available hardware resources is the same in both models.

The ownership model can be configured using DOCA Management.

Single Domain Ownership Model

In this ownership model, only one domain exists per device. Upon creation, the doca_telemetry_diag context takes ownership of the associated hardware resources to prevent conflicts and ensure accurate data sampling. In rare cases, ownership may need to be overridden (e.g., if a process crashed before releasing it).

The force_ownership parameter may be used when creating the context from a second process.

Once ownership is enforced for one PF, it cannot be claimed by a different PF. It is recommended to always use PF0 to prevent potential conflicts.

Multi Domain Ownership Model

In this ownership model, a device can support multiple domains, and each doca_telemetry_diag context uses one of the available domains. If the first domain does not use all the hardware resources, a second domain can be created to use the remaining resources. Upon creation, the doca_telemetry_diag context checks whether the device can support an additional active domain; if it cannot, the context reports a failure to acquire the resource. Multi-domain mode has some technical limitations, most notably:

  1. Synchronized start is not supported.

  2. Some data IDs require a hardware resource that cannot be shared between domains. If another domain attempts to use a data ID that requires an occupied hardware resource, the request is rejected.

  3. The force ownership flag is ignored.

  4. The achieved sampling period of a context can be affected by other contexts already started on the same device.

The multi-domain ownership model prevents domains from being left isolated when a process crashes or fails to clean up its domain before exiting. For this reason, and because multiple contexts are supported, forcefully taking ownership is not required and the flag is ignored.

It is recommended to always use PF0.

Configuration Phase

Configuration

The context can be configured to match the application use case.

To discover if a certain configuration is supported, or what the min/max value for it is, refer to section "DOCA Telemetry Diag | Device Support".

Sampling Mode

The DOCA Telemetry Diagnostics library supports the following operational sampling methods:

  • Single sampling – Samples are collected and stored until the firmware buffer is filled, at which point sampling is automatically terminated. 

    You can manually restart the context to collect new samples, which will overwrite the previous data in the buffer.

  • Repetitive sampling – Samples are continuously collected. When the firmware buffer fills, new samples automatically overwrite the oldest samples. This process continues indefinitely until the context is explicitly stopped.

  • On demand – The device does not collect samples in the background. Instead, it captures a single sample each time the application queries it.
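The difference between single and repetitive sampling can be pictured as a fixed-size cyclic buffer. The following is a simplified software model for illustration only; it does not reflect the actual firmware implementation, and MODEL_CAPACITY, struct sample_ring, and the ring_* helpers are hypothetical. "On demand" mode is not modeled because it keeps no background buffer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_CAPACITY 4 /* hypothetical buffer size, in samples */

enum model_mode { MODEL_SINGLE, MODEL_REPETITIVE };

struct sample_ring {
	uint64_t samples[MODEL_CAPACITY];
	size_t head;          /* index where the next sample is written */
	size_t count;         /* number of valid samples (<= MODEL_CAPACITY) */
	bool stopped;         /* set once single mode has filled the buffer */
	enum model_mode mode;
};

/* Records one sample. Returns false if sampling has already terminated. */
bool ring_record(struct sample_ring *r, uint64_t value)
{
	if (r->stopped)
		return false;
	r->samples[r->head] = value; /* in repetitive mode this overwrites the oldest */
	r->head = (r->head + 1) % MODEL_CAPACITY;
	if (r->count < MODEL_CAPACITY)
		r->count++;
	/* Single mode: terminate automatically once the buffer is full. */
	if (r->mode == MODEL_SINGLE && r->count == MODEL_CAPACITY)
		r->stopped = true;
	return true;
}

/* Returns the oldest sample still held in the buffer (count must be > 0). */
uint64_t ring_oldest(const struct sample_ring *r)
{
	size_t oldest = (r->head + MODEL_CAPACITY - r->count) % MODEL_CAPACITY;

	return r->samples[oldest];
}
```

Recording six samples in repetitive mode leaves samples 3 to 6 in the buffer, while single mode stops after the fourth sample and keeps samples 1 to 4.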

Considerations for Repetitive Sampling Mode

When configuring the DOCA Telemetry Diagnostics library for repetitive sampling, make sure the firmware buffer is large enough to absorb the samples produced between consecutive software retrievals. The firmware buffer size is determined by the log_max_num_samples property.

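  1. Determine the sampling and retrieval periods:

    • Hardware sampling period – the interval at which the hardware collects samples (e.g., every 100 µsec)

    • Software retrieval period – the average interval between successive data retrievals by the software (e.g., every 500 msec)

  2. Calculate AverageSamplesPerRetrieval using the following equation:

    $$\text{AverageSamplesPerRetrieval} = \frac{\text{SoftwareRetrievalPeriod}}{\text{HardwareSamplingPeriod}}$$

    For example:

    $$\text{AverageSamplesPerRetrieval} = \frac{500\ \text{msec}}{100\ \mu\text{sec}} = 5000$$

  3. To ensure smooth data processing and prevent data loss, the firmware buffer should be large enough to accommodate at least twice the average number of samples collected during the retrieval period:

    $$\text{FirmwareBufferSize} \geq 2 \times \text{AverageSamplesPerRetrieval}$$

    For example:

    $$\text{FirmwareBufferSize} \geq 2 \times 5000 = 10000\ \text{samples}$$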

The buffer should be enlarged further if the retrieval interval may occasionally spike. For example, if the time between retrieval calls can be up to 6 times the average, multiply AverageSamplesPerRetrieval by 6 + 1 = 7 (instead of 2).
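The sizing rule above can be turned into a small calculation. This is a minimal sketch under two assumptions: the firmware buffer holds 2^log_max_num_samples samples (inferred from the property name), and both periods are expressed in nanoseconds (as with the sample's --sample-period flag). The function names are hypothetical.

```c
#include <stdint.h>

/* Average number of samples produced between two retrievals, rounded up.
 * hw_sample_period_ns must be non-zero. */
uint64_t avg_samples_per_retrieval(uint64_t hw_sample_period_ns,
				   uint64_t sw_retrieval_period_ns)
{
	return (sw_retrieval_period_ns + hw_sample_period_ns - 1) /
	       hw_sample_period_ns;
}

/* Smallest log2 buffer size holding avg * (spike_factor + 1) samples.
 * spike_factor = 1 gives the 2x rule; spike_factor = 6 gives the 7x rule. */
unsigned required_log_num_samples(uint64_t hw_sample_period_ns,
				  uint64_t sw_retrieval_period_ns,
				  uint64_t spike_factor)
{
	uint64_t needed = avg_samples_per_retrieval(hw_sample_period_ns,
						    sw_retrieval_period_ns) *
			  (spike_factor + 1);
	unsigned log2_size = 0;

	/* Find the smallest power of two >= needed (capped at 2^63). */
	while (log2_size < 63 && ((uint64_t)1 << log2_size) < needed)
		log2_size++;
	return log2_size;
}
```

With the example values (100 µsec sampling, 500 msec retrieval), the 2x rule needs 10,000 samples, i.e. a log size of 14 (16,384 samples); the 7x rule needs 35,000 samples, i.e. a log size of 16.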

Sampling Period

The sampling period can be configured using doca_telemetry_diag_set_sample_period.

Depending on the number and type of configured data IDs, the actual sampling period may be longer than requested. The actual sampling period can be queried using doca_telemetry_diag_get_sample_period after configuring the data IDs.

After the context is started with doca_telemetry_diag_start, the actual sampling period may differ from the requested one. In this case, the sampling period can be adjusted dynamically using doca_telemetry_diag_reconfig_sample_period.

Synchronized Start

Diagnostics data is sampled by the device every given sampling period. When sampling this way, each data entry in a sample may be recorded at a slightly different time.

Synchronized start mode makes all diagnostic counters begin their measurements at the same time (i.e., in the same clock cycle), so the sample period is guaranteed to be identical for all samples. In synchronized start mode, counters can also be configured to be cleared at the beginning of each sampling period.

Not all data IDs can be sampled in synchronized start mode. See section "DOCA Telemetry Diag | Data IDs" for additional details.

Synchronized start is only supported when the device is configured to use the single domain ownership model.

The following diagrams illustrate how synchronized start affects the sampling timeline:

[Figure: Sampling timeline with and without synchronized start]

In synchronized start mode, counters are stopped during the collection time of each sample (illustrated in red in the diagram). If the application needs to normalize a counter to time, it should use the actual sample period.

For example, if the global_icmc_hit (GIH) counter is sampled and the sample period is 100 µsec, the number of global_icmc_hit events per second should be calculated as follows:

$$\text{GIH per second} = \frac{\text{GIH}}{100\ \mu\text{sec}} = 10^{4} \times \text{GIH}$$
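The per-second normalization described above can be sketched as follows. This assumes the actual sample period is the value queried with doca_telemetry_diag_get_sample_period and that it is expressed in nanoseconds; count_per_second and delta_per_second are hypothetical helpers.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000.0

/* Converts a per-sample count (e.g. a counter cleared at the start of each
 * sampling period) into events per second, using the actual sample period. */
double count_per_second(uint64_t count, uint64_t actual_period_ns)
{
	return (double)count * NSEC_PER_SEC / (double)actual_period_ns;
}

/* For counters that are not cleared, use the difference between two
 * consecutive samples; unsigned subtraction tolerates one 64-bit wraparound. */
double delta_per_second(uint64_t prev, uint64_t cur, uint64_t actual_period_ns)
{
	return count_per_second(cur - prev, actual_period_ns);
}
```

For instance, a GIH value of 5 over a 100 µsec period corresponds to 50,000 hits per second.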

Data IDs

The on-device mechanism provides the following diagnostic data classes:

  • Counter – a monotonically increasing count of a specific event in the device. If doca_telemetry_diag_set_data_clear is set, the counters are cleared at the beginning of each sampling period (valid only if synchronized start mode is used and the operational mode is set to single or repetitive sampling).

  • Statistic – other collected diagnostic data about the performance of the device. Statistic diagnostic data is cleared on each sample.

Each diagnostic data item is identified by a unique 64-bit identifier, its data ID. Appendix "List of Supported Data IDs" lists the currently supported data IDs.

After applying the configuration, apply the list of data IDs to be sampled by calling doca_telemetry_diag_apply_counters_list_by_id.

Data ID Configuration Constraints
  • Configuration failures – Not all combinations of data IDs can be configured simultaneously. If the doca_telemetry_diag_apply_counters_list_by_id operation fails, it will return the index of the faulty data ID and the reason for the failure. You can retry the operation after omitting the faulty ID.

  • Synchronized start support – Not all data IDs support synchronized start mode. If you configure synchronized start and the apply operation fails with DOCA_ERROR_BAD_CONFIG, the specified data ID does not support this mode.
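The retry procedure described above can be sketched as a loop that drops the rejected data ID and tries again. This is a hypothetical helper: apply_ids_fn stands in for a wrapper around doca_telemetry_diag_apply_counters_list_by_id, whose real signature and error reporting should be taken from the DOCA API reference, and mock_apply exists only to make the example testable.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical apply callback: returns 0 on success, or nonzero and stores
 * the index of the rejected data ID in *bad_index. */
typedef int (*apply_ids_fn)(const uint64_t *ids, size_t num_ids, size_t *bad_index);

/* Applies the list, removing each rejected data ID, until the apply succeeds
 * or the list becomes empty. Returns the number of data IDs finally applied
 * (0 also covers an out-of-range index, treated as giving up). */
size_t apply_dropping_rejected(apply_ids_fn apply, uint64_t *ids, size_t num_ids)
{
	size_t bad_index;

	while (num_ids > 0 && apply(ids, num_ids, &bad_index) != 0) {
		if (bad_index >= num_ids)
			return 0;
		/* Shift the remaining IDs left so their relative order is kept. */
		memmove(&ids[bad_index], &ids[bad_index + 1],
			(num_ids - bad_index - 1) * sizeof(ids[0]));
		num_ids--;
	}
	return num_ids;
}

/* Example mock: rejects any data ID whose low byte is 0xFF. */
static int mock_apply(const uint64_t *ids, size_t num_ids, size_t *bad_index)
{
	for (size_t i = 0; i < num_ids; i++) {
		if ((ids[i] & 0xFF) == 0xFF) {
			*bad_index = i;
			return -1;
		}
	}
	return 0;
}
```

Preserving the order of the remaining IDs matters because the output order of data values follows the order in which the data IDs were applied.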

Output Formats

The doca_telemetry_diag context supports the following layout modes for sampled data:

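| Layout Mode | Data ID Presence    | Data Size | Timestamp Granularity      |
|-------------|---------------------|-----------|----------------------------|
| Mode 0      | Included in output  | 64 bits   | Per data entry             |
| Mode 1      | Omitted from output | 64 bits   | Per sample (start and end) |
| Mode 2      | Omitted from output | 32 bits   | Per sample (start and end) |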

The order of the data IDs in the output strictly matches the order in which they were applied using doca_telemetry_diag_apply_counters_list_by_id.
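Mode 2 stores each data value in 32 bits. If an application needs long-running totals for counters in this mode, one option is to widen consecutive values into a 64-bit accumulator. This is a sketch under stated assumptions: it assumes 32-bit counter values wrap modulo 2^32 and that at most one wraparound occurs between two consecutive samples of the same data ID; struct counter32_acc and counter32_acc_update are hypothetical. It does not apply to statistics, which are cleared on each sample.

```c
#include <stdint.h>

/* Hypothetical accumulator widening a 32-bit counter stream to 64 bits. */
struct counter32_acc {
	uint64_t total;  /* accumulated increase since the first value */
	uint32_t last;   /* previous 32-bit value */
	int has_last;    /* 0 until the first value has been seen */
};

void counter32_acc_update(struct counter32_acc *acc, uint32_t value)
{
	if (acc->has_last)
		/* Modulo-2^32 subtraction yields the correct increase across a
		 * single wraparound between two consecutive samples. */
		acc->total += (uint32_t)(value - acc->last);
	acc->last = value;
	acc->has_last = 1;
}
```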

The sample layout of these modes is illustrated in the following diagrams:

[Figure: Sample layout of output modes 0, 1, and 2]

Device Support

DOCA Telemetry Diagnostics requires a device to operate. For information on selecting a device, refer to "DOCA Core Device Discovery".

As device capabilities may change (see DOCA Core Device Support), it is recommended to select your device using the following method:

  • doca_telemetry_diag_cap_is_supported

Capabilities that may differ between devices include:

  • The maximum number of data IDs

  • The maximum number of samples that can be held by the firmware

  • Support for data clear

  • Support for synchronized start

  • Support for different sample modes

  • Support for different timestamp sources

Execution Phase

Once the doca_telemetry_diag context is started, samples can be retrieved by calling the doca_telemetry_diag_query_counters function.

The library returns only new samples (no sample is returned twice), and returns fewer samples than requested if fewer new samples are available.

There is a slight difference in behavior between the different sampling modes:

  • In "On demand" sampling mode, each call to doca_telemetry_diag_query_counters captures and returns a new sample.

  • For "Single" and "Repetitive" sampling modes, multiple samples can be retrieved in a single call. The application defines the maximum number of samples it wishes to retrieve and supplies a buffer large enough to contain these samples.

    The sample size can be obtained using doca_telemetry_diag_get_sample_size.

    The user should limit the number of requested samples according to the log_max_num_samples property.

  • In case of "Single" sampling mode, once sampling is terminated, the user may call doca_telemetry_diag_restart to restart the sampling process, overwriting the previous results.

  • During the execution phase, you can dynamically adjust the sampling period using doca_telemetry_diag_reconfig_sample_period to change the sampling rate. Note that this function has no effect in "On demand" sampling mode. For other sampling modes, it's recommended to verify the actual sampling period using doca_telemetry_diag_get_sample_period after reconfiguration to ensure it matches the expected value.
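Preparing the retrieval buffer for "Single" and "Repetitive" modes can be sketched as follows. This assumes the firmware buffer holds 2^log_max_num_samples samples, that sample_size is the value reported by doca_telemetry_diag_get_sample_size, and that samples are stored back to back in the buffer; all three helpers are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Limits the number of samples requested per query to the assumed firmware
 * buffer capacity of 2^log_max_num_samples samples. */
uint32_t clamp_samples_per_query(uint32_t requested, uint8_t log_max_num_samples)
{
	uint64_t capacity;

	if (log_max_num_samples >= 32)
		return requested; /* capacity exceeds any uint32_t request */
	capacity = (uint64_t)1 << log_max_num_samples;
	return requested > capacity ? (uint32_t)capacity : requested;
}

/* Allocates a zeroed buffer for num_samples samples of sample_size bytes.
 * calloc() is used instead of malloc(n * size) so the allocator can reject
 * a multiplication overflow. */
void *alloc_sample_buffer(size_t num_samples, size_t sample_size)
{
	if (num_samples == 0 || sample_size == 0)
		return NULL;
	return calloc(num_samples, sample_size);
}

/* Returns a pointer to sample i, assuming samples are stored contiguously. */
const uint8_t *sample_at(const void *buf, size_t i, size_t sample_size)
{
	return (const uint8_t *)buf + i * sample_size;
}
```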

State Machine

The following section describes the different states the doca_telemetry_diag context goes through, how to move between states and what is allowed in each state.

Idle

The context is Idle and has ownership of a diagnostics data domain.

In this state, the application is expected to:

  • Destroy the context (releases the ownership).

  • Apply a configuration, moving the context to "Configured" state.

Allowed operations:

It is possible to reach this state as follows:

| Previous State | Transition Action  |
|----------------|--------------------|
| None           | Create the context |
| Configured     | Call stop          |
| Ready          | Call stop          |
| Running        | Call stop          |

Configured

In this state, the application is expected to:

  • Apply the list of data IDs configuration using doca_telemetry_diag_apply_counters_list_by_id, moving the context to "Ready" state.

Allowed operations:

  • Checking if a data ID is supported using doca_telemetry_diag_check_data_id 

  • Calling stop, moving it to "Idle" state

It is possible to reach this state as follows:

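| Previous State | Transition Action                                                          |
|----------------|----------------------------------------------------------------------------|
| Idle           | Successfully apply the configuration by calling doca_telemetry_diag_apply_config |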

Ready

All the necessary configuration was applied and the context is ready to start sampling.

In this state, the application is expected to:

  • Start the context, moving it to "Running" state.

Allowed operations:

  • Calling stop, moving it to "Idle" state.

It is possible to reach this state as follows:

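| Previous State | Transition Action                                                                            |
|----------------|----------------------------------------------------------------------------------------------|
| Configured     | Successfully apply the list of data IDs by calling doca_telemetry_diag_apply_counters_list_by_id |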

Running

In this state samples are generated and can be retrieved.

In this state, the application is expected to:

  • Query the counters.

Allowed operations:

  • For "Single" sample mode, restarting the context if needed.

  • Calling stop, moving it to "Idle" state

It is possible to reach this state as follows:

Previous State

Transition Action

Ready

Successfully start the context
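The documented transitions can be captured in a small model, for example to validate call sequences in application tests. This sketch covers only the states and transitions listed above and does not call the DOCA library; enum diag_state, enum diag_action, and diag_transition are hypothetical. Creating the context corresponds to starting in DIAG_IDLE.

```c
#include <stdbool.h>

enum diag_state { DIAG_IDLE, DIAG_CONFIGURED, DIAG_READY, DIAG_RUNNING };

enum diag_action {
	ACT_APPLY_CONFIG,   /* doca_telemetry_diag_apply_config */
	ACT_APPLY_COUNTERS, /* doca_telemetry_diag_apply_counters_list_by_id */
	ACT_START,          /* doca_telemetry_diag_start */
	ACT_RESTART,        /* doca_telemetry_diag_restart ("Single" mode only) */
	ACT_STOP,           /* stop, as described in the states above */
};

/* Applies an action to the modeled state. Returns false, leaving the state
 * unchanged, if the action is not a documented transition from it. */
bool diag_transition(enum diag_state *state, enum diag_action action)
{
	switch (action) {
	case ACT_APPLY_CONFIG:
		if (*state != DIAG_IDLE)
			return false;
		*state = DIAG_CONFIGURED;
		return true;
	case ACT_APPLY_COUNTERS:
		if (*state != DIAG_CONFIGURED)
			return false;
		*state = DIAG_READY;
		return true;
	case ACT_START:
		if (*state != DIAG_READY)
			return false;
		*state = DIAG_RUNNING;
		return true;
	case ACT_RESTART:
		return *state == DIAG_RUNNING; /* the context stays in Running */
	case ACT_STOP:
		if (*state == DIAG_IDLE)
			return false;
		*state = DIAG_IDLE;
		return true;
	}
	return false;
}
```

The model does not track the sampling mode, so it accepts a restart in any Running context; a fuller model would reject it outside "Single" mode.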

Alternative Datapath Options

DOCA Telemetry Diag supports only CPU-based datapaths.

DOCA Telemetry Diagnostics Sample

This section describes a telemetry diagnostics sample based on the doca_telemetry library.

The sample illustrates how to use the DOCA telemetry diagnostics APIs to initialize and configure the doca_telemetry_diag context, and to query and parse diagnostic counters.

Running the Sample

  1. Refer to the following documents:

    • DOCA Installation Guide for Linux (3.4.0) for details on how to install BlueField-related software.

    • NVIDIA BlueField Platform Software Troubleshooting Guide for any issue you may encounter with the installation, compilation, or execution of DOCA samples.

  2. To build a given sample, run the following command. If you downloaded the sample from GitHub, update the path in the first line to reflect the location of the sample file:

    cd /opt/mellanox/doca/samples/doca_telemetry/telemetry_diag
    meson /tmp/build
    ninja -C /tmp/build
    

    The binary doca_telemetry_diag is created under /tmp/build/.

Sample usage:

Usage: doca_telemetry_diag [DOCA Flags] [Program Flags]

DOCA Flags:
  -h, --help                        Print a help synopsis
  -v, --version                     Print program version information
  -l, --log-level                   Set the (numeric) log level for the program <10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE>
  --sdk-log-level                   Set the SDK (numeric) log level for the program <10=DISABLE, 20=CRITICAL, 30=ERROR, 40=WARNING, 50=INFO, 60=DEBUG, 70=TRACE>
  -j, --json <path>                 Parse command line flags from an input json file

Program Flags:
  -p, --pci-addr                    DOCA device PCI device address
  -di, --data-ids                   Path to data ids JSON file
  -o, --output                      Output CSV file - default: "/tmp/out.csv"
  -rt, --sample-run-time            Total sample run time, in seconds
  -sp, --sample-period              Sample period, in nanoseconds
  -ns, --log-num-samples            Log max number of samples
  -sr, --max-samples-per-read       Max num samples per read
  -sm, --sample-mode                sample mode (0 - single, 1 - repetitive, 2 - on demand)
  -of, --output-format              output format (0 - output_format_0, 1 - output_format_1, 2 - output_format_2)
  -sym, --sync-mode                 sync mode (0 - sync_mode_no_sync, 1 - sync_mode_sync_start)
  -tss, --timestamp-source          timestamp source (0 - timestamp_source_frc, 1 - timestamp_source_rtc)
  -f, --force-ownership             Force ownership when creating context
  -e, --example-json-path           Generate an example json file with the default data_ids to the given path and exit immediately. This file can be used as input later on. All other flags are ignored
  -rs, --reconfig-sample-period     Reconfig sample period, in nanoseconds. It means after the first sampling run with the sample_period, if the reconfig_sample_period is non-0, before the current diag instance is stopped and destroyed, the second sampling run will be done based on the reconfig_sample_period.

The sample logic includes:

  1. Locating a DOCA device.

  2. Initializing and configuring the doca_telemetry_diag instance.

  3. Applying a list of data IDs to sample (either from a source JSON file or the default data IDs).

  4. Starting the doca_telemetry_diag instance.

  5. Allocating a buffer according to the sample size and amount of desired samples.

  6. Querying the actual sampling period, after start.

  7. Retrieving samples and writing the retrieved data to a *.csv file (either once or periodically).

  8. If reconfig_sample_period is non-zero, retrieving samples and writing the sample file again based on the reconfigured sampling period.

  9. Stopping the data IDs sampling.

  10. Releasing all resources and destroying the context.

If the error message "cannot acquire ownership" is shown:

  • In the single domain ownership model, use the --force-ownership option to take ownership. This may reclaim an isolated context if a process previously failed to release it, or forcefully take control from an existing process, so use this option with caution.

  • In the multiple domain ownership model, the maximum number of domains for this device is already allocated. One or more of the other domains must be released before they can be reused.

The sample can use data IDs given by the user using a JSON file. An example of the JSON file format can be created by using the -e flag on the sample, to export an example JSON file containing the default data IDs to a given path. 

Visualization of Data IDs

The Ngauge tool is useful to visualize the data IDs prior to use or deployment in your application. Please see the Ngauge documentation for more information.

Appendix – List of Supported Data IDs

The following table lists the data IDs currently supported by DOCA:

| Name | Description | Data Class | Data ID | Fields |
|------|-------------|------------|---------|--------|
| port_rx_bytes | The number of received bytes on the physical port | Counter | 0x10200001000000XX | XX: local port |
| port_priority_rx_bytes | The number of received bytes on the physical port and priority | Counter | 0x1020000200000YXX | XX: local port; Y: local port priority |
| port_rx_packets | The number of received packets on the physical port | Counter | 0x10200003000000XX | XX: local port |
| port_priority_rx_packets | The number of received packets on the physical port and priority | Counter | 0x1020000400000YXX | XX: local port; Y: local port priority |
| port_rx_discard_buf_packets | The number of received packets dropped due to lack of buffers on the physical port | Counter | 0x10200005000000XX | XX: local port |
| port_priority_rx_pauses_packets | The number of link-layer pause packets received on the physical port and priority | Counter | 0x1020000600000YXX | XX: local port; Y: local port priority |
| host_rx_buffer_discards | The number of packets discarded due to no available data or descriptor buffers in the RX buffer, per host | Counter | 0x10400001000000XX | XX: host ID |
| host_rx_transport_pass_packets | The number of packets that pass from the RX Transport to the Scatter engine, per host | Counter | 0x10800001000000XX | XX: host ID |
| host_rx_transport_out_of_buffer_packets | The number of packets dropped due to a lack of WQEs for the associated QPs/RQs (excluding hairpin QPs/RQs) | Counter | 0x10800002000000XX | XX: host ID |
| host_rx_transport_out_of_buffer_hairpin_packets | The number of packets dropped due to a lack of WQEs for the associated hairpin QPs/RQs | Counter | 0x10800003000000XX | XX: host ID |
| port_rx_transport_ecn_packets | The number of RoCEv2 packets received by the Notification Point that were marked as experiencing congestion (i.e., ECN bits set to 11 on ingress RoCE traffic), per port | Counter | 0x10800004000000XX | XX: local port |
| port_rx_transport_cnp_handled_packets | The number of received CNP packets handled by the Reaction Point, per port | Counter | 0x10800005000000XX | XX: local port |
| port_tx_transport_cnp_sent_packets | The number of CNP packets sent by the Notification Point, per port | Counter | 0x11000001000000XX | XX: local port |
| tx_transport_done_due_to_cc_deschedule_events | The number of QP deschedule events due to congestion control rate limitation | Counter | 0x1100000200000000 | None |
| port_tx_bytes | The number of transmitted bytes on the physical port (excluding loopback traffic) | Counter | 0x11400001000000XX | XX: local port |
| port_priority_tx_bytes | The number of transmitted bytes on the physical port and priority (excluding loopback traffic) | Counter | 0x1140000200000YXX | XX: local port; Y: local port priority |
| port_tx_packets | The number of transmitted packets on the physical port (excluding loopback traffic) | Counter | 0x11400003000000XX | XX: local port |
| port_priority_tx_packets | The number of transmitted packets on the physical port and priority (excluding loopback traffic) | Counter | 0x1140000400000YXX | XX: local port; Y: local port priority |
| port_priority_tx_pauses_packets | The number of link-layer pause packets transmitted on the physical port and priority | Counter | 0x1140000500000YXX | XX: local port; Y: local port priority |
| pcie_link_inbound_bytes | The number of bytes received from PCIe toward the device, per PCIe link | Counter | 0x1160000100ZZYYXX | XX: node; YY: PCIe index; ZZ: depth (0–63) |
| pcie_link_outbound_bytes | The number of bytes transmitted from the device toward PCIe, per PCIe link | Counter | 0x1160000200ZZYYXX | XX: node; YY: PCIe index; ZZ: depth (0–63) |
| pcie_link_inbound_data_bytes | The number of data bytes (excluding headers) received from PCIe toward the device, per PCIe link | Counter | 0x1160000300ZZYYXX | XX: node; YY: PCIe index; ZZ: depth (0–63) |
| pcie_link_outbound_data_bytes | The number of data bytes (excluding headers) transmitted from the device toward PCIe, per PCIe link | Counter | 0x1160000400ZZYYXX | XX: node; YY: PCIe index; ZZ: depth (0–63) |
| pcie_link_write_stalled_time_no_posted_data_credits_ns | The time (in nanoseconds) during which the device had outbound posted write requests but was stalled due to insufficient data credits, per PCIe link | Counter | 0x1160000500ZZYYXX | XX: node; YY: PCIe index; ZZ: depth (0–63) |
| pcie_link_write_stalled_time_no_posted_header_credits_ns | The time (in nanoseconds) during which the device had outbound posted write requests but was stalled due to insufficient header credits, per PCIe link | Counter | 0x1160000600ZZYYXX | XX: node; YY: PCIe index; ZZ: depth (0–63) |
| pcie_link_read_stalled_time_no_non_posted_data_credits_ns | The time (in nanoseconds) during which the device had outbound non-posted read requests but was stalled due to insufficient data credits, per PCIe link | Counter | 0x1160000700ZZYYXX | XX: node; YY: PCIe index; ZZ: depth (0–63) |
| pcie_link_read_stalled_time_no_non_posted_header_credits_ns | The time (in nanoseconds) during which the device had outbound non-posted read requests but was stalled due to insufficient header credits, per PCIe link | Counter | 0x1160000800ZZYYXX | XX: node; YY: PCIe index; ZZ: depth (0–63) |
| pcie_link_read_stalled_time_no_completion_buffers_ns | The time (in nanoseconds) during which the device had outbound non-posted read requests but was stalled due to no available NIC completion buffers, per PCIe link | Counter | 0x1160000900ZZYYXX | XX: node; YY: PCIe index; ZZ: depth (0–63) |
| pcie_link_tclass_read_stalled_time_ordering_ns | The time (in nanoseconds) during which the device had outbound non-posted read requests but was stalled due to PCIe ordering semantics, per PCIe link and PCIe tclass | Counter | 0x1160000aZZZZYYXX | XX: node; YY: PCIe index; ZZZZ: (tclass << 6) + depth, where tclass is 0–7 and depth is 0–63 |
| pcie_link_latency_total_read_ns | The total latency (in nanoseconds) of all PCIe reads from the device, per PCIe link | Statistic | 0x1160000b00ZZYYXX | XX: node; YY: PCIe index; ZZ: depth (0–63) |
| pcie_link_latency_total_read_packets | The total number of packets used for the pcie_link_latency_total_read_ns calculation | Statistic | 0x1160000c00ZZYYXX | XX: node; YY: PCIe index; ZZ: depth (0–63) |
| pcie_link_latency_max_read_ns | The maximum latency (in nanoseconds) of a single PCIe read from the device, per PCIe link | Statistic | 0x1160000d00ZZYYXX | XX: node; YY: PCIe index; ZZ: depth (0–63) |
| pcie_link_latency_min_read_ns | The minimum latency (in nanoseconds) of a single PCIe read from the device, per PCIe link | Statistic | 0x1160000e00ZZYYXX | XX: node; YY: PCIe index; ZZ: depth (0–63) |
| mplhr_pcie_link_latency_read_hist_packets | The number of PCIe reads with latency within the range of the PCIe latency histogram bin, per PCIe link. The sampled value is the same as the corresponding MPLH bin entry. "Bin" is currently 0→9 (10 bins in total). | | 0x1160000fZZZZYYXX | XX: node; YY: PCIe index; ZZZZ: (bin << 6) + depth, where bin is 0–31 and depth is 0–63 |
| global_completion_engine_rx_cqes | Number of responder (RX) CQEs | Counter | 0x10c0000100000000 | None |
| function_completion_engine_rx_cqes | Number of RX CQEs per function | Counter | 0x10c000020000XXXX | XXXX: vhca_id |
| global_completion_engine_tx_cqes | Number of requestor (TX) CQEs | Counter | 0x10c0000400000000 | None |
| function_completion_engine_tx_cqes | Number of TX CQEs per function | Counter | 0x10c000050000XXXX | XXXX: vhca_id |
| global_icmc_request | Number of accesses to ICMC | Counter | 0x1180000100000000 | None |
| global_icmc_hit | Number of ICMC hits | Counter | 0x1180000200000000 | None |
| global_icmc_miss | Number of ICMC misses | Counter | 0x1180000300000000 | None |
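Data IDs with field placeholders are built by combining the field values with the base ID at the positions shown in the Data ID column. This is a minimal sketch, assuming each X, Y, or Z hex digit maps directly to the corresponding 4 bits of the 64-bit ID; the macro and function names are hypothetical and are not taken from the DOCA headers.

```c
#include <stdint.h>

/* Base values (placeholder digits set to 0) taken from the table above. */
#define DATA_ID_PORT_RX_BYTES           0x1020000100000000ULL
#define DATA_ID_PORT_PRIORITY_RX_BYTES  0x1020000200000000ULL
#define DATA_ID_PCIE_LINK_INBOUND_BYTES 0x1160000100000000ULL
#define DATA_ID_PCIE_TCLASS_READ_STALL  0x1160000a00000000ULL

/* ...XX: XX = local port (bits 0-7). */
uint64_t data_id_port(uint64_t base, uint8_t local_port)
{
	return base | local_port;
}

/* ...YXX: Y = priority (bits 8-11), XX = local port (bits 0-7). */
uint64_t data_id_port_priority(uint64_t base, uint8_t local_port, uint8_t priority)
{
	return base | ((uint64_t)(priority & 0xF) << 8) | local_port;
}

/* ...00ZZYYXX: ZZ = depth 0-63 (bits 16-23), YY = PCIe index, XX = node. */
uint64_t data_id_pcie_link(uint64_t base, uint8_t node, uint8_t pcie_index,
			   uint8_t depth)
{
	return base | ((uint64_t)(depth & 0x3F) << 16) |
	       ((uint64_t)pcie_index << 8) | node;
}

/* ...ZZZZYYXX: ZZZZ = (tclass << 6) + depth (bits 16-31). */
uint64_t data_id_pcie_link_tclass(uint64_t base, uint8_t node, uint8_t pcie_index,
				  uint8_t tclass, uint8_t depth)
{
	uint64_t zzzz = ((uint64_t)(tclass & 0x7) << 6) | (uint64_t)(depth & 0x3F);

	return base | (zzzz << 16) | ((uint64_t)pcie_index << 8) | node;
}
```

For example, port_priority_rx_bytes for local port 1 and priority 3 would be 0x1020000200000301 under this encoding.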

