NVIDIA Accelerated IO (XLIO) Documentation

XLIO Tuning Report

The XLIO Tuning Report is a post-run diagnostic report generated when an XLIO-enabled process exits. It summarizes the runtime environment, active XLIO profile, effective configuration, traffic counters, socket state, and derived performance indicators. The report also annotates common anomalies with # WARNING comments so that configuration or system-level problems can be identified quickly.

Use the tuning report when:

  • A workload has lower throughput or higher latency than expected.

  • XLIO logs report buffer allocation failures, hardware receive drops, transmit Work Queue Element (WQE) exhaustion, or other resource pressure.

  • You need to verify which JSON configuration values, profiles, or auto-corrections were active during a run.

  • You need a compact diagnostic artifact to share with NVIDIA support.

Enabling the Tuning Report

The report is controlled by monitor.report.mode, which maps to the legacy XLIO_PRINT_REPORT environment variable.

| JSON Configuration Value | Legacy Environment Variable | Default | Description |
|---|---|---|---|
| monitor.report.mode | XLIO_PRINT_REPORT | auto | Controls whether the tuning report is generated at process exit. |
| monitor.report.file_path | XLIO_REPORT_FILE | /tmp/xlio_report_%d.txt | Output path for the report. %d is replaced with the process ID. |
| monitor.stats.fd_num | XLIO_STATS_FD_NUM | 0 | Maximum number of sockets monitored by the XLIO statistics mechanism. Set this above zero for per-socket traffic details in the report. |

monitor.report.mode accepts the following values:

| Value | Behavior |
|---|---|
| auto or -1 | Generate a report only when selected anomalies are detected. This is the default. |
| disable or 0 | Never generate a report. |
| enable or 1 | Always generate a report when the process exits normally. |

JSON configuration example:

{
  "monitor": {
    "report": {
      "mode": "enable",
      "file_path": "/tmp/xlio_report_%d.txt"
    },
    "stats": {
      "fd_num": 1024
    }
  }
}

Equivalent legacy environment configuration:

XLIO_PRINT_REPORT=1 XLIO_REPORT_FILE=/tmp/xlio_report_%d.txt XLIO_STATS_FD_NUM=1024
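As a rough illustration, the following Python sketch builds the legacy environment shown above for a child process and computes where its report should land. It assumes that %d expands to the decimal PID of the XLIO-enabled process. The helper names xlio_report_env and expected_report_path are hypothetical and are not part of XLIO.

```python
import os

REPORT_TEMPLATE = "/tmp/xlio_report_%d.txt"


def xlio_report_env(base_env=None, fd_num=1024, template=REPORT_TEMPLATE):
    """Return a copy of the environment with the tuning report always enabled."""
    env = dict(os.environ if base_env is None else base_env)
    env["XLIO_PRINT_REPORT"] = "1"            # same as monitor.report.mode = enable
    env["XLIO_REPORT_FILE"] = template        # %d is left for XLIO to expand
    env["XLIO_STATS_FD_NUM"] = str(fd_num)    # above zero for per-socket details
    return env


def expected_report_path(pid, template=REPORT_TEMPLATE):
    """Assumption: XLIO substitutes the decimal PID for %d in the template."""
    return template.replace("%d", str(pid))
```

The returned dictionary can be passed as the env argument of subprocess.Popen, and the pid attribute of the resulting process object can be passed to expected_report_path. If the application forks worker processes, each worker has its own PID and therefore its own report path.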

In auto mode, XLIO generates the tuning report only when one of the following anomalies is detected:

  • Buffer allocation failures in XLIO buffer pools, such as rx_rwqe, rx_stride, tx, or zc.

  • Hardware receive (RX) packet drops.

  • TX WQE exhaustion, reported as ring_tx_dropped_wqes > 0.
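The three conditions above can be modeled as a simple check over a dictionary of counters. This is only a sketch of the documented behavior, not XLIO's implementation. The counter name ring_tx_dropped_wqes comes from this page; the buffer_pool_<pool>_alloc_failures pattern is extrapolated from the example report, and hw_rx_drops is a hypothetical name.

```python
# Buffer pools named in the list above.
BUFFER_POOLS = ("rx_rwqe", "rx_stride", "tx", "zc")


def auto_mode_triggers(counters):
    """Return the documented auto-mode anomalies present in a counter dict."""
    triggers = []
    for pool in BUFFER_POOLS:
        # Assumed naming pattern, based on buffer_pool_tx_alloc_failures.
        if counters.get(f"buffer_pool_{pool}_alloc_failures", 0) > 0:
            triggers.append(f"buffer allocation failures in pool {pool}")
    # Hypothetical counter name for hardware receive drops.
    if counters.get("hw_rx_drops", 0) > 0:
        triggers.append("hardware RX packet drops")
    if counters.get("ring_tx_dropped_wqes", 0) > 0:
        triggers.append("TX WQE exhaustion")
    return triggers
```

An empty result corresponds to a run for which auto mode would not be expected to write a report.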

When auto mode generates a report, XLIO also writes a warning-level log message with the report path:

XLIO detected performance anomalies. Diagnostic report written to: /tmp/xlio_report_<pid>.txt
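When many processes run under XLIO, it can be convenient to collect report paths from application logs. The sketch below matches only the message text shown above; any log prefix (timestamp, severity, module) is ignored because its exact format is not specified here. The function name find_report_paths is hypothetical.

```python
import re

# Matches the documented message and captures the path that follows it.
_REPORT_LOG = re.compile(r"Diagnostic report written to:\s*(\S+)")


def find_report_paths(log_text):
    """Return every report path announced in the given log text, in order."""
    return _REPORT_LOG.findall(log_text)
```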

Report Sections

The tuning report is written in plain text and contains the following sections:

| Section | Contents |
|---|---|
| Preamble | Report format version, timestamp, PID, process duration, and report status comments. |
| System Context | XLIO version, kernel, NIC device information, MTU, link speed, and hugepage state. |
| Active Profile | The active XLIO profile, such as none, latency, ultra_latency, nginx, nginx_dpu, or nvme_bf3. |
| Effective Config | Non-default configuration values, their defaults, and why each value changed. |
| Runtime Stats | Traffic counters, errors, drops, retransmits, ring diagnostics, and buffer pool state. |
| Socket Summary | Socket counts, offload status, listen state, and connection information. |
| Performance Indicators | Derived metrics such as polling hit rate, software RX drop rate, TX retransmit rate, and hardware RX drops. |

The final lines of a complete report are:

# End of XLIO Tuning Report
# Report generated successfully

If these lines are missing, the report may be incomplete or truncated. Re-run the workload and verify that the report file is complete before using it for tuning decisions.
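The completeness check can be automated before a report is used or shared. The following sketch assumes the two trailer lines are the last non-empty lines of the file, as described above; report_is_complete is a hypothetical helper name.

```python
TRAILER = (
    "# End of XLIO Tuning Report",
    "# Report generated successfully",
)


def report_is_complete(report_text):
    """Return True if the last two non-empty lines are the expected trailer."""
    lines = [line.strip() for line in report_text.splitlines() if line.strip()]
    return tuple(lines[-2:]) == TRAILER
```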

Effective Config

The Effective Config section shows only non-default parameters. Each entry includes the effective value, the default, and the reason the value differs from the default:

core.resources.memory_limit: 8 GB
  # default: 2 GB | reason: User-configured | Memory limit for XLIO resources

The reason field can include:

| Reason | Meaning |
|---|---|
| User-configured | The value was set explicitly in JSON configuration or XLIO_INLINE_CONFIG. |
| Profile | The value was set by an active XLIO profile. |
| Auto-corrected | XLIO adjusted the value to satisfy a runtime constraint. |

If the report shows # All parameters at default values, no configuration values differ from the default values. If the report shows # Config registry not available, the JSON configuration registry was not available for this run, often because the process used legacy environment variables without the new JSON configuration path.
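Effective Config entries can be parsed into records, for example to list every auto-corrected value. The sketch below assumes each entry follows the two-line layout shown earlier in this section (a name and value line followed by a comment line with default, reason, and description separated by | characters). The function name parse_effective_config is hypothetical.

```python
import re

# "name: value" line, for example "core.resources.memory_limit: 8 GB".
_ENTRY = re.compile(r"^\s*([A-Za-z0-9_.]+):\s*(.+?)\s*$")
# "# default: X | reason: Y | description" line that follows an entry.
_DETAIL = re.compile(
    r"^\s*#\s*default:\s*(.*?)\s*\|\s*reason:\s*(.*?)\s*\|\s*(.*?)\s*$"
)


def parse_effective_config(section_text):
    """Return a list of dicts, one per non-default parameter."""
    entries = []
    pending = None  # (name, value) waiting for its detail line
    for line in section_text.splitlines():
        detail = _DETAIL.match(line)
        if detail and pending:
            entries.append({
                "name": pending[0],
                "value": pending[1],
                "default": detail.group(1),
                "reason": detail.group(2),
                "description": detail.group(3),
            })
            pending = None
            continue
        entry = _ENTRY.match(line)
        if entry:
            pending = entry.groups()
    return entries
```

Filtering the result on the reason field, for example keeping only entries whose reason is Auto-corrected, gives a short list of values that XLIO changed on its own.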

Full Detail and Fallback Detail

The tuning report can include two levels of runtime detail.

Full detail includes per-socket traffic statistics, socket-level errors, software RX drops, polling hit rate, offloaded versus non-offloaded traffic split, and listen socket statistics. Use full detail for complete performance analysis.

Fallback detail is used when per-socket statistics are not available. It contains ring-level totals only, such as ring_total_rx_packets, ring_total_tx_packets, ring_total_rx_bytes, and ring_total_tx_bytes. The report marks this case with:

# Per-socket traffic stats require monitor.stats.fd_num > 0

If fallback detail is shown, re-run with monitor.stats.fd_num set to at least the expected number of active sockets:

{
  "monitor": {
    "stats": {
      "fd_num": 1024
    }
  }
}

If the stats pool is smaller than the number of sockets, the report may include:

# Note: per-socket traffic stats cover X/Y sockets (increase monitor.stats.fd_num for full coverage)

In that case, traffic split numbers are partial. Increase monitor.stats.fd_num to at least Y and re-run.
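The two markers described in this section can be detected programmatically. The sketch below classifies a report as fallback, partial, or full and extracts the socket total Y from the coverage note. Treating the absence of both markers as full detail is an assumption, and the function names are hypothetical.

```python
import re

_FALLBACK_MARKER = "Per-socket traffic stats require monitor.stats.fd_num > 0"
_COVERAGE = re.compile(r"per-socket traffic stats cover (\d+)/(\d+) sockets")


def detail_level(report_text):
    """Classify the report as 'fallback', 'partial', or 'full' detail."""
    if _FALLBACK_MARKER in report_text:
        return "fallback"
    if _COVERAGE.search(report_text):
        return "partial"
    return "full"  # assumption: no marker means per-socket stats were complete


def recommended_fd_num(report_text):
    """Return Y from the 'cover X/Y sockets' note when X < Y, else None."""
    match = _COVERAGE.search(report_text)
    if match is None:
        return None
    covered, total = int(match.group(1)), int(match.group(2))
    return total if covered < total else None
```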

Report Annotations

The tuning report uses inline annotations to make the output easier to interpret.

| Annotation | Meaning |
|---|---|
| # WARNING: | An anomaly was detected and should be investigated. |
| # ERROR: | Report generation failed for part of the report. Some data may be missing. |
| # Note: | Contextual information. This is not necessarily a problem. |

Common notes include short process duration, event-driven API behavior, partial per-socket stats coverage, and fallback detail mode.
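A quick way to triage a report is to list its annotations by type. The following sketch scans each line for the three annotation prefixes described above and records the line number and text. The function name collect_annotations is hypothetical.

```python
import re

# Matches "# WARNING: ...", "# ERROR: ...", or "# Note: ..." anywhere in a line.
_ANNOTATION = re.compile(r"#\s*(WARNING|ERROR|Note):\s*(.*)")


def collect_annotations(report_text):
    """Group annotated lines by type as (line_number, stripped_line) tuples."""
    found = {"WARNING": [], "ERROR": [], "Note": []}
    for lineno, line in enumerate(report_text.splitlines(), start=1):
        match = _ANNOTATION.search(line)
        if match:
            found[match.group(1)].append((lineno, line.strip()))
    return found
```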

Basic Analysis Workflow

Use the following workflow when reading a tuning report:

  1. Check report completeness and process duration. Be cautious with very short runs, as they typically do not provide enough data for reliable throughput analysis.

  2. Read the System Context section. Note the NIC speed, MTU, kernel, XLIO version, and hugepage state.

  3. Identify the Active Profile. Profiles can intentionally override various configuration parameters.

  4. Review the Effective Config section. Pay attention to reason: User-configured, reason: Profile, and reason: Auto-corrected.

  5. Scan Runtime Stats for # WARNING annotations.

  6. Check the Socket Summary. Verify that expected sockets were created and offloaded.

  7. Review Performance Indicators. Look for low polling hit rate, software RX drops, hardware RX drops, or high TX retransmit rate.

  8. Correlate warnings before changing configuration. Multiple warnings can share the same root cause.

Example Report Excerpt

# XLIO Tuning Report
# report_format_version: 1
# PID: 43210
# Duration: 6m 12s

## System Context
  nic_device: mlx5_0  speed: 100 Gbps  MTU: 9000
  hugepages_2048kB_free: 128

## Active Profile
  profile_spec: nginx

## Effective Config (non-default only)
  network.protocols.tcp.wmem: 2 MB
    # default: 1 MB | reason: Profile | Write buffer size
  monitor.stats.fd_num: 1024
    # default: 0 | reason: User-configured | Max tracked file descriptors

## Runtime Stats
  total_tx_packets: 48712340
  total_tx_bytes: 71159817440
  tx_throughput: 57.33 Gbps
  tx_errors: 23847  # WARNING: TX errors detected
  ring_tx_dropped_wqes: 23847  # WARNING: WQE exhaustion detected
  buffer_pool_tx_alloc_failures: 0

## Socket Summary
  total_sockets: 51200
  tcp_sockets: 51200
  offloaded_sockets: 51200
  non_offloaded_sockets: 0

## Performance Indicators
  poll_hit_rate: 91.4%

# End of XLIO Tuning Report
# Report generated successfully
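A report laid out like the excerpt above can be split into sections for scripted analysis. The sketch below assumes that section headings start with "## " and that counters appear as one "name: value" pair per line with an optional trailing # annotation, as in the Runtime Stats lines of the excerpt. Lines that carry several pairs, such as nic_device in System Context, are not split further. Both function names are hypothetical.

```python
def split_sections(report_text):
    """Map each '## ' heading to its lines; text before the first heading is 'Preamble'."""
    sections = {"Preamble": []}
    current = "Preamble"
    for line in report_text.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        else:
            sections[current].append(line)
    return sections


def parse_counters(lines):
    """Parse 'name: value' lines into a dict of strings, dropping inline annotations."""
    counters = {}
    for line in lines:
        body = line.split("#", 1)[0].strip()  # remove '# WARNING: ...' and comment lines
        key, sep, value = body.partition(":")
        if sep and value.strip():
            counters[key.strip()] = value.strip()
    return counters
```

Values are kept as strings because the report mixes plain integers with formatted values such as 57.33 Gbps and 91.4%.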

Example Interpretation

In this example, tx_errors exactly matches ring_tx_dropped_wqes, which indicates that the TX failures are caused by send queue exhaustion. The active profile is nginx, and the Effective Config section shows that the profile sets network.protocols.tcp.wmem to 2 MB.
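The counter comparison used in this interpretation can be expressed as a small check. This is a sketch of the reasoning only: it treats an exact match between the two counters as evidence that WQE exhaustion accounts for all TX errors. The function name is hypothetical, and the input is assumed to be a dictionary of Runtime Stats values.

```python
def wqe_exhaustion_explains_tx_errors(stats):
    """Return True when tx_errors is non-zero and equals ring_tx_dropped_wqes."""
    tx_errors = int(stats.get("tx_errors", 0))
    dropped_wqes = int(stats.get("ring_tx_dropped_wqes", 0))
    return tx_errors > 0 and tx_errors == dropped_wqes
```

If the function returns False while tx_errors is non-zero, at least some TX errors have a different cause and should be investigated separately.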

At high connection counts, large TCP send buffers can increase aggregate send queue demand when many connections share a transmit ring. A typical first tuning step is to reduce network.protocols.tcp.wmem to a smaller value, such as 128 KB or 256 KB, and rerun the workload. If core.resources.memory_limit was explicitly lowered below its default or profile value, restore memory headroom as well. If drops persist, review performance.rings.tx.ring_elements_count and verify that TSO is enabled and active for the workload.

Configuration change example:

{
  "network": {
    "protocols": {
      "tcp": {
        "wmem": "256 KB"
      }
    }
  }
}
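When an existing configuration file already contains other settings, the change can be applied as an overlay rather than by editing the file by hand. The following sketch performs a recursive merge of two dictionaries loaded from JSON; deep_merge is a hypothetical helper, and the "256 KB" value format follows the example above.

```python
def deep_merge(base, override):
    """Return a new dict where nested keys from override replace those in base."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# Overlay that carries only the tuning change from this section.
WMEM_OVERRIDE = {"network": {"protocols": {"tcp": {"wmem": "256 KB"}}}}
```

The base dictionary can be read with json.load and the merged result written back with json.dump, which keeps unrelated settings such as the monitor block unchanged.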

Sharing Reports

The report can contain hostnames, process command lines, NIC names, IP addresses, and configuration values. Review and redact sensitive information before sharing the report outside your organization.
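Part of the redaction can be scripted. The sketch below masks IPv4 addresses only. It does not handle IPv6 addresses, hostnames, or command lines, and it can also mask four-part version strings that look like addresses, so a manual review is still needed. The function name and placeholder text are hypothetical.

```python
import re

# Four dot-separated groups of 1 to 3 digits; no range validation is attempted.
_IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def redact_ipv4(text, placeholder="<redacted-ip>"):
    """Replace every IPv4-looking token in the text with the placeholder."""
    return _IPV4.sub(placeholder, text)
```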

When sharing with NVIDIA support, include:

  • The complete tuning report file.

  • The workload goal, such as expected throughput or latency.

  • The XLIO configuration file or XLIO_INLINE_CONFIG used for the run.

  • Whether the peer endpoint also uses XLIO.

  • Network context, such as NIC speed, MTU, and whether switches are in the path.
