The XLIO Tuning Report is a post-run diagnostic report generated when an XLIO-enabled process exits. It summarizes the runtime environment, active XLIO profile, effective configuration, traffic counters, socket state, and derived performance indicators. The report also annotates common anomalies with # WARNING comments so that configuration or system-level problems can be identified quickly.
Use the tuning report when:
- A workload has lower throughput or higher latency than expected.
- XLIO logs report buffer allocation failures, hardware receive drops, transmit Work Queue Element (WQE) exhaustion, or other resource pressure.
- You need to verify which JSON configuration values, profiles, or auto-corrections were active during a run.
- You need a compact diagnostic artifact to share with NVIDIA support.
Enabling the Tuning Report
The report is controlled by monitor.report.mode, which maps to the legacy XLIO_PRINT_REPORT environment variable.
| JSON Configuration Value | Legacy Environment Variable | Default | Description |
|---|---|---|---|
| monitor.report.mode | XLIO_PRINT_REPORT | auto | Controls whether the tuning report is generated at process exit. |
| monitor.report.file_path | XLIO_REPORT_FILE | | Output path for the report. |
| monitor.stats.fd_num | XLIO_STATS_FD_NUM | 0 | Maximum number of sockets monitored by the XLIO statistics mechanism. Set this above zero for per-socket traffic details in the report. |
monitor.report.mode accepts the following values:
| Value | Behavior |
|---|---|
| auto | Generate a report only when selected anomalies are detected. This is the default. |
| | Never generate a report. |
| enable | Always generate a report when the process runs normally. |
JSON configuration example:
{
  "monitor": {
    "report": {
      "mode": "enable",
      "file_path": "/tmp/xlio_report_%d.txt"
    },
    "stats": {
      "fd_num": 1024
    }
  }
}
Equivalent legacy environment configuration:
XLIO_PRINT_REPORT=1 XLIO_REPORT_FILE=/tmp/xlio_report_%d.txt XLIO_STATS_FD_NUM=1024
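The JSON fragment shown earlier can also be produced programmatically, which helps when the report path or socket count varies between test runs. The following is a minimal sketch: the helper name `build_report_config` is hypothetical, and only the three keys documented in this section are emitted. How the resulting file is passed to XLIO is outside the scope of this sketch.

```python
import json


def build_report_config(mode="enable",
                        file_path="/tmp/xlio_report_%d.txt",
                        fd_num=1024):
    """Return the monitor settings from this section as a nested dict.

    Hypothetical helper: only the keys documented above are used.
    """
    return {
        "monitor": {
            "report": {"mode": mode, "file_path": file_path},
            "stats": {"fd_num": fd_num},
        }
    }


if __name__ == "__main__":
    # Print the fragment so it can be merged into an existing config file.
    print(json.dumps(build_report_config(), indent=2))
```

Merge the printed fragment into your existing configuration rather than replacing it, so that unrelated settings are preserved.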
In auto mode, XLIO generates the tuning report only when one of the following anomalies is detected:
- Buffer allocation failures in XLIO buffer pools, such as rx_rwqe, rx_stride, tx, or zc.
- Hardware receive (RX) packet drops.
- TX WQE exhaustion, reported as ring_tx_dropped_wqes > 0.
When auto mode generates a report, XLIO also writes a warning-level log message with the report path:
XLIO detected performance anomalies. Diagnostic report written to: /tmp/xlio_report_<pid>.txt
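When application logs are collected centrally, the report path can be pulled out of this message automatically. The sketch below assumes the log line keeps the wording shown above; the function name `find_report_path` is hypothetical.

```python
import re
from typing import Optional

# Matches the path that follows the fixed text of the warning message.
_REPORT_LOG = re.compile(r"Diagnostic report written to:\s*(\S+)")


def find_report_path(log_text: str) -> Optional[str]:
    """Return the first report path mentioned in the log text, or None."""
    match = _REPORT_LOG.search(log_text)
    return match.group(1) if match else None
```

If the message wording differs in your XLIO version, adjust the pattern accordingly.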
Report Sections
The tuning report is written in plain text and contains the following sections:
| Section | Contents |
|---|---|
| Preamble | Report format version, timestamp, PID, process duration, and report status comments. |
| System Context | XLIO version, kernel, NIC device information, MTU, link speed, and hugepage state. |
| Active Profile | The active XLIO profile, such as nginx. |
| Effective Config | Non-default configuration values, their defaults, and why each value changed. |
| Runtime Stats | Traffic counters, errors, drops, retransmits, ring diagnostics, and buffer pool state. |
| Socket Summary | Socket counts, offload status, listen state, and connection information. |
| Performance Indicators | Derived metrics such as polling hit rate, software RX drop rate, TX retransmit rate, and hardware RX drops. |
The final lines of a complete report are:
# End of XLIO Tuning Report
# Report generated successfully
If these lines are missing, the report may be incomplete or truncated. Re-run the workload and verify that the report file is complete before using it for tuning decisions.
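The completeness check can be automated before a report is used or shared. This is a minimal sketch that compares the last two non-empty lines against the closing lines quoted above; the function name `report_is_complete` is hypothetical.

```python
# The two closing lines of a complete report, as quoted in this section.
END_MARKERS = (
    "# End of XLIO Tuning Report",
    "# Report generated successfully",
)


def report_is_complete(text: str) -> bool:
    """Return True if the last two non-empty lines are the closing markers."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return tuple(lines[-2:]) == END_MARKERS
```

A report that fails this check should be regenerated rather than analyzed.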
Effective Config
The Effective Config section shows only non-default parameters. Each entry includes the effective value, the default, and the reason the value differs from the default:
core.resources.memory_limit: 8 GB
# default: 2 GB | reason: User-configured | Memory limit for XLIO resources
The reason field can include:
| Reason | Meaning |
|---|---|
| User-configured | The value was set explicitly by the user, for example in the JSON configuration. |
| Profile | The value was set by an active XLIO profile. |
| Auto-corrected | XLIO adjusted the value to satisfy a runtime constraint. |
If the report shows # All parameters at default values, no configuration values differ from the default values. If the report shows # Config registry not available, the JSON configuration registry was not available for this run, often because the process used legacy environment variables without the new JSON configuration path.
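Because each Effective Config entry is a value line followed by a comment line, the entries are straightforward to extract for comparison across runs. The sketch below assumes the two-line layout shown in the example entry above; the function name and the dictionary keys are hypothetical.

```python
import re
from typing import Dict, List

# Matches the comment line: "# default: X | reason: Y | description".
_META = re.compile(
    r"^#\s*default:\s*(?P<default>.*?)\s*\|\s*reason:\s*(?P<reason>.*?)"
    r"\s*\|\s*(?P<description>.*)$"
)


def parse_effective_config(text: str) -> List[Dict[str, str]]:
    """Pair each 'name: value' line with the metadata comment that follows it."""
    entries = []
    previous = ""
    for raw in text.splitlines():
        line = raw.strip()
        meta = _META.match(line)
        if meta and ":" in previous and not previous.startswith("#"):
            name, value = previous.split(":", 1)
            entries.append(
                {"name": name.strip(), "value": value.strip(), **meta.groupdict()}
            )
        previous = line
    return entries
```

Filtering the result on the reason field, for example keeping only Auto-corrected entries, is a quick way to spot values that XLIO changed on its own.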
Full Detail and Fallback Detail
The tuning report can include two levels of runtime detail.
Full detail includes per-socket traffic statistics, socket-level errors, software RX drops, polling hit rate, offloaded versus non-offloaded traffic split, and listen socket statistics. Use full detail for complete performance analysis.
Fallback detail is used when per-socket statistics are not available. It contains ring-level totals only, such as ring_total_rx_packets, ring_total_tx_packets, ring_total_rx_bytes, and ring_total_tx_bytes. The report marks this case with:
# Per-socket traffic stats require monitor.stats.fd_num > 0
If fallback detail is shown, re-run with monitor.stats.fd_num set to at least the expected number of active sockets:
{
  "monitor": {
    "stats": {
      "fd_num": 1024
    }
  }
}
If the stats pool is smaller than the number of sockets, the report may include:
# Note: per-socket traffic stats cover X/Y sockets (increase monitor.stats.fd_num for full coverage)
In that case, traffic split numbers are partial. Increase monitor.stats.fd_num to at least Y and re-run.
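The two annotations described in this section can be detected automatically to decide whether a run needs to be repeated with a larger stats pool. This sketch assumes that X and Y appear as plain integers in a real report; the function name and the returned labels are hypothetical.

```python
import re
from typing import Optional, Tuple

_FALLBACK = "# Per-socket traffic stats require monitor.stats.fd_num > 0"
_PARTIAL = re.compile(r"per-socket traffic stats cover (\d+)/(\d+) sockets")


def stats_coverage(text: str) -> Tuple[str, Optional[int]]:
    """Classify the detail level and suggest a minimum fd_num where known.

    Returns ("fallback", None), ("partial", Y), or ("unmarked", None).
    """
    if _FALLBACK in text:
        return ("fallback", None)
    match = _PARTIAL.search(text)
    if match:
        covered, total = int(match.group(1)), int(match.group(2))
        if covered < total:
            return ("partial", total)
    return ("unmarked", None)
```

For a partial result, the second element is the socket count Y, which is the smallest monitor.stats.fd_num value that gives full coverage for the same workload.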
Report Annotations
The tuning report uses inline annotations to make the output easier to interpret.
| Annotation | Meaning |
|---|---|
| # WARNING | An anomaly was detected and should be investigated. |
| | Report generation failed for part of the report. Some data may be missing. |
| # Note | Contextual information. This is not necessarily a problem. |
Common notes include short process duration, event-driven API behavior, partial per-socket stats coverage, and fallback detail mode.
Basic Analysis Workflow
Use the following workflow when reading a tuning report:
- Check report completeness and process duration. Be cautious with very short runs, as they typically do not provide enough data for reliable throughput analysis.
- Read the System Context section. Note the NIC speed, MTU, kernel, XLIO version, and hugepage state.
- Identify the Active Profile. Profiles can intentionally override various configuration parameters.
- Review Effective Config. Pay attention to reason: User-configured, reason: Profile, and reason: Auto-corrected.
- Scan Runtime Stats for # WARNING annotations.
- Check the Socket Summary. Verify that expected sockets were created and offloaded.
- Review Performance Indicators. Look for low polling hit rate, software RX drops, hardware RX drops, or high TX retransmit rate.
- Correlate warnings before changing configuration. Multiple warnings can share the same root cause.
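The warning scan in this workflow is easy to script when many reports need triage. The following sketch lists every line that carries a # WARNING annotation, together with the counter text that precedes it; the function name `find_warnings` is hypothetical.

```python
from typing import List, Tuple


def find_warnings(text: str) -> List[Tuple[int, str, str]]:
    """Return (line number, counter text, warning text) for each # WARNING."""
    warnings = []
    for number, line in enumerate(text.splitlines(), start=1):
        marker = line.find("# WARNING")
        if marker != -1:
            counter = line[:marker].strip()
            message = line[marker + 1:].strip()  # drop the leading '#'
            warnings.append((number, counter, message))
    return warnings
```

Listing all warnings side by side makes it easier to see when several of them point to the same root cause.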
Example Report Excerpt
# XLIO Tuning Report
# report_format_version: 1
# PID: 43210
# Duration: 6m 12s
## System Context
nic_device: mlx5_0 speed: 100 Gbps MTU: 9000
hugepages_2048kB_free: 128
## Active Profile
profile_spec: nginx
## Effective Config (non-default only)
network.protocols.tcp.wmem: 2 MB
# default: 1 MB | reason: Profile | Write buffer size
monitor.stats.fd_num: 1024
# default: 0 | reason: User-configured | Max tracked file descriptors
## Runtime Stats
total_tx_packets: 48712340
total_tx_bytes: 71159817440
tx_throughput: 57.33 Gbps
tx_errors: 23847 # WARNING: TX errors detected
ring_tx_dropped_wqes: 23847 # WARNING: WQE exhaustion detected
buffer_pool_tx_alloc_failures: 0
## Socket Summary
total_sockets: 51200
tcp_sockets: 51200
offloaded_sockets: 51200
non_offloaded_sockets: 0
## Performance Indicators
poll_hit_rate: 91.4%
# End of XLIO Tuning Report
# Report generated successfully
Example Interpretation
In this example, tx_errors exactly matches ring_tx_dropped_wqes (both are 23847), which indicates that the TX failures are caused by send queue (WQE) exhaustion. The active profile is nginx, and the Effective Config section shows that the profile sets network.protocols.tcp.wmem to 2 MB, up from the 1 MB default.
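The counter comparison used in this interpretation can be expressed as a small check. The sketch below reads integer counters from the name: value lines of a report and tests whether every TX error is accounted for by a dropped WQE. Both function names are hypothetical, and the parser deliberately skips non-integer values such as throughput figures.

```python
import re
from typing import Dict

# Matches "name: 123" with an optional trailing "# ..." annotation.
_COUNTER = re.compile(r"^\s*([A-Za-z_][\w.]*):\s*(\d+)\s*(?:#.*)?$")


def read_counters(text: str) -> Dict[str, int]:
    """Collect integer counters; comment lines and unit values are skipped."""
    counters = {}
    for line in text.splitlines():
        match = _COUNTER.match(line)
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters


def tx_errors_match_wqe_drops(counters: Dict[str, int]) -> bool:
    """True when tx_errors is non-zero and equals ring_tx_dropped_wqes."""
    errors = counters.get("tx_errors", 0)
    return errors > 0 and errors == counters.get("ring_tx_dropped_wqes", 0)
```

A match is a strong hint rather than proof; if the two counters differ, some TX errors have another cause and should be investigated separately.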
At high connection counts, large TCP send buffers can increase aggregate send queue demand when many connections share a transmit ring. A typical first tuning step is to reduce network.protocols.tcp.wmem to a smaller value, such as 128 KB or 256 KB, and re-run the workload. If core.resources.memory_limit was explicitly lowered below its default or profile value, restore memory headroom as well. If drops persist, review performance.rings.tx.ring_elements_count and verify that TSO is enabled and active for the workload.
Configuration change example:
{
  "network": {
    "protocols": {
      "tcp": {
        "wmem": "256 KB"
      }
    }
  }
}
Sharing Reports
The report can contain hostnames, process command lines, NIC names, IP addresses, and configuration values. Review and redact sensitive information before sharing the report outside your organization.
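Part of the redaction can be scripted. The sketch below masks IPv4 addresses only; the function name and placeholder are hypothetical, and hostnames, command lines, and IPv6 addresses still need manual review.

```python
import re

# Four dot-separated groups of 1-3 digits; octet ranges are not validated.
_IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def redact_ipv4(text: str, placeholder: str = "<redacted-ip>") -> str:
    """Replace anything that looks like an IPv4 address with a placeholder."""
    return _IPV4.sub(placeholder, text)
```

Because the pattern is intentionally loose, four-part version strings are masked as well. Check the output before sending it.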
When sharing with NVIDIA support, include:
- The complete tuning report file.
- The workload goal, such as expected throughput or latency.
- The XLIO configuration file or XLIO_INLINE_CONFIG used for the run.
- Whether the peer endpoint also uses XLIO.
- Network context such as NIC speed, MTU, and whether switches are in the path.