This page guides network administrators and developers in fine-tuning the ZTR-RTT CC algorithm on NVIDIA network adapters. It details the specific configuration variables, hardware counters, and debug tools required for optimization.
Introduction
The Zero Touch RoCE Round-Trip Time Congestion Control (ZTR-RTT CC) algorithm utilizes hardware-based feedback loops to proactively manage network congestion. To ensure optimal performance across diverse workloads, the algorithm offers a highly configurable parameter set. This document provides a comprehensive reference for tuning ZTR-RTT CC, detailing its configuration parameters, fixed-point datatypes (such as fxpN), and the specific mlxreg commands used to apply these tunings. Additionally, it outlines the available performance counters and debugging modes necessary for monitoring congestion states and troubleshooting network flows.
ZTR-RTT CC Parameters
Datatypes
- All parameters are stored as integers in the algorithm, so tuning values must be given as integers.
- Some parameters have a datatype other than plain integer.
- For those parameters, the parameter table lists both the integer values and the corresponding values in the parameter's own datatype.
- Fixed-point datatype with N fraction bits (fxpN):
  - A 32-bit integer represents a fixed-point number with N bits in the fraction part: the upper 32-N bits hold the integer part and the lower N bits hold the fraction.
  - To convert the integer representation of an fxpN value to its real value, divide the integer by 2^N.
  - For example, 3932 in fxp16 represents the number 3932 / 2^16 ≈ 0.06.
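The conversion described above can be sketched as two small shell helpers. This is an illustrative sketch, not part of any NVIDIA tool: the function names `fxp_to_real` and `real_to_fxp` are hypothetical, and truncating toward zero when converting a real value to its integer form is an assumption, since the rounding direction is not stated here.

```bash
# Convert the stored integer of an fxpN parameter to its real value: int / 2^N.
# Usage: fxp_to_real <integer> <N>
fxp_to_real() {
    awk -v v="$1" -v n="$2" 'BEGIN { printf "%.6f\n", v / (2 ^ n) }'
}

# Convert a real value to the integer to configure: real * 2^N, truncated.
# Truncation is an assumption; the rounding rule is not documented here.
# Usage: real_to_fxp <real> <N>
real_to_fxp() {
    awk -v v="$1" -v n="$2" 'BEGIN { printf "%d\n", int(v * (2 ^ n)) }'
}
```

For instance, `fxp_to_real 3932 16` prints `0.059998`, and `real_to_fxp 0.1 16` prints `6553`.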
Parameter Table
| Index | Parameter Name | Description | Units | Datatype | Range | Default – RoCE | Recommended Tuning Range |
|---|---|---|---|---|---|---|---|
| 0 | | NIC port bandwidth | Gb/s | Integer | 100–800 | Auto-configured by device bandwidth | Auto-configured by device bandwidth |
| 1 | | ALPHA represents the linear connection between target RTT and The formula is: | None | fxp16 | As integer: 0–2^16; as fxp16: 0–1 | As integer: 6553; as fxp16: 0.1 | As integer: 0–2^16; as fxp16: 0–1 |
| 2 | | The maximal multiplicative decrement allowed in one update. The algorithm updates a congestion window ( For better stability, the algorithm limits the rate decrement by | None | fxp16 | As integer: 1–2^16; as fxp16: | As integer: 63570; as fxp16: 0.97 | As integer: 45875–64880; as fxp16: 0.7–0.99 |
| 3 | | The maximal multiplicative increment allowed in one update. The algorithm updates a congestion window ( For better stability, the algorithm limits the rate increment by | None | fxp16 | As integer: 2^16–2^20; as fxp16: 1–16 | As integer: 69468; as fxp16: 1.06 | As integer: 2^16–2^18; as fxp16: 1–4 |
| 4 | | Additive increase value in bytes per 100 Gb/s. That is, config X will give the value | Bytes per 100 Gb/s | Integer | 1–5000 | 9 | 5–100 |
| 5 | | Hyper additive increase value per 100 Gb/s. That is, config X will give the value | Bytes per 100 Gb/s | Integer | 1–5000 | 300 | 200–2000 |
| 6 | | After this period without any decrement, the algorithm moves from additive increase to hyper additive increase. | nsec | Integer | 1– | 7000000 (7ms) | 100µs–20ms |
| 7 | | The minimal value of | nsec | Integer | 1– | 15000 (15µs) | 2µs–15µs |
| 8 | | When RTT is above this value | nsec | Integer | 1–2^20 | 250000 (250µs) | 30µs–1ms |
| 9 | | At the first time RTT passed | None | fxp20 | As integer: 1–2^20; as fxp20: | As integer: 65536; as fxp20: 0.0625 | As integer: 10485–2^20; as fxp20: 0.01–1 |
| 10 | | Use only RTT as congestion indication. | None | Boolean | 0,1 | 0 | 0 |
| 11 | | Use CNP to validate RTT. When this parameter is set, an RTT measurement is ignored if no CNP was received during it. | None | Boolean | 0,1 | 0 | 0,1 |
| 12 | | When set, the rate in TX events is based on the delay measured until that time. | None | Boolean | 0,1 | 1 | 0,1 |
| 13 | | Available only in the debug version of the algorithm. Disables the algorithm's rate updates and sets a fixed rate. When the value is 0, the algorithm works as usual. | None | fxp20 | As integer: 1–2^23; as fxp20: | As integer: 0; as fxp20: 0 | As integer: 0; as fxp20: 0 |
| 14 | | The maximal rate of the NIC scheduler. Can be more than line rate to improve TX pipelining. | None | fxp20 | As integer: 1–2^23; as fxp20: | As integer: 2097152; as fxp20: 2 | As integer: 2^20–2^22; as fxp20: 1–4 |
| 15 | | The value 1 is available only with Evaluate the congestion state by comparing the RTT to the minimum measured RTT. | None | Boolean | 0,1 | 0 | 0,1 |
| 16 | | Enables an advanced feature of the algorithm that has not been fully tested. | None | Boolean | 0,1 | 0 | 0,1 |
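Before applying a value, it can help to check it against the Recommended Tuning Range column. The sketch below hard-codes the integer ranges transcribed from the table for the non-Boolean parameters, with time ranges converted to nsec. The function names are hypothetical, and the check is only a local sanity test: it does not query the device.

```bash
# Print "min max" (integer form) of the recommended tuning range for a
# parameter index, transcribed from the parameter table. Times are in nsec.
recommended_range() {
    case "$1" in
        1)  echo "0 65536" ;;           # fxp16: 0 to 1
        2)  echo "45875 64880" ;;       # fxp16: 0.7 to 0.99
        3)  echo "65536 262144" ;;      # fxp16: 1 to 4
        4)  echo "5 100" ;;
        5)  echo "200 2000" ;;
        6)  echo "100000 20000000" ;;   # 100 us to 20 ms
        7)  echo "2000 15000" ;;        # 2 us to 15 us
        8)  echo "30000 1000000" ;;     # 30 us to 1 ms
        9)  echo "10485 1048576" ;;     # fxp20: 0.01 to 1
        14) echo "1048576 4194304" ;;   # fxp20: 1 to 4
        *)  return 1 ;;                 # no numeric range listed for this index
    esac
}

# Exit status 0 if <value> is inside the recommended range of <index>,
# 1 if it is outside, 2 if the index has no numeric range in this sketch.
# Usage: in_recommended_range <index> <value>
in_recommended_range() {
    range=$(recommended_range "$1") || return 2
    set -- "$2" $range                  # now $1=value, $2=min, $3=max
    [ "$1" -ge "$2" ] && [ "$1" -le "$3" ]
}
```

For example, `in_recommended_range 2 63570` succeeds because the default 63570 lies within 45875–64880.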
Parameter Tuning Command
mlxreg -d <dev> -y --set "cmd_type=8,value=<parameter value>" --reg_name PPCC --indexes "local_port=1,pnat=0,lp_msb=0,algo_slot=1,algo_param_index=<parameter index>"
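As a convenience, the command template above can be wrapped in a helper that fills in the device, parameter index, and value. The following is a minimal sketch: `ppcc_set_param_cmd` is a hypothetical name, and the function only prints the command so that it can be reviewed before being run. The flags and index fields are copied from the template and nothing else is assumed about `mlxreg`.

```bash
# Print (do not execute) the PPCC command that sets one ZTR-RTT CC parameter.
# Usage: ppcc_set_param_cmd <dev> <parameter index> <parameter value>
ppcc_set_param_cmd() {
    printf 'mlxreg -d %s -y --set "cmd_type=8,value=%s" --reg_name PPCC --indexes "local_port=1,pnat=0,lp_msb=0,algo_slot=1,algo_param_index=%s"\n' \
        "$1" "$3" "$2"
}
```

For example, `ppcc_set_param_cmd /dev/mst/mt4129_pciconf0 1 6553` prints the command that would set parameter 1 to its default integer value 6553. The value must already be in integer form.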
ZTR-RTT CC Counters
| Index | Name | Description |
|---|---|---|
| 0 | | Number of CNPs handled by the algorithm. Active only if |
| 1 | | Number of NACKs handled by the algorithm. |
| 2 | | Number of additive increments. |
| 3 | | Number of hyper additive increments. |
| 4 | | Number of decrements. |
| 5 | | Number of hyper decrements. |
| 6 | | Number of decrements in TX event. Active only if |
| 7 | | Maximal RTT measured by the algorithm. |
| 8 | | Minimal RTT measured by the algorithm. |
| 9 | | Sum of RTTs measured by the algorithm. |
| 10 | | Number of RTTs measured by the algorithm. With |
| 11 | | Number of times an RTT measurement was not validated by CNP. Active only when |
| 12 | | The maximal output rate determined by the algorithm. |
| 13 | | The minimal output rate determined by the algorithm. |
| 14 | | Number of times the algorithm detected a global minimum RTT. |
| 15 | | Number of RTT probe timeouts. |
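Counters 9 and 10 together give an average RTT: the sum of measured RTTs divided by the number of measurements. The helper below is a small sketch of that arithmetic. `mean_rtt` is a hypothetical name, it expects both counters as decimal numbers, and the result is in whatever unit the RTT counters use, which the table does not state.

```bash
# Mean RTT = counter 9 (sum of RTTs) / counter 10 (number of RTTs).
# Prints "n/a" when no RTT has been measured yet.
# Usage: mean_rtt <counter 9, decimal> <counter 10, decimal>
mean_rtt() {
    awk -v sum="$1" -v cnt="$2" 'BEGIN {
        if (cnt == 0) { print "n/a"; exit }
        printf "%.1f\n", sum / cnt
    }'
}
```

Comparing this mean with counters 7 and 8 (maximal and minimal RTT) gives a rough picture of how widely the measured RTT varies.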
Debug Tools
Debug Mode
The ZTR-RTT CC algorithm has two working modes:
- Default (deployment) mode in algo_slot 0.
- Debug mode in algo_slot 1.
Counters are available only in debug mode.
Moving to debug mode requires the following steps:
- Disable deployment mode in algo_slot 0.
- Enable debug mode in algo_slot 1 and enable counters.
Example:
- Disable deployment mode:
  sudo mlxreg -d /dev/mst/mt4129_pciconf0 -y --set "cmd_type=2" --reg_name PPCC --indexes "local_port=1,pnat=0,lp_msb=0,algo_slot=0,algo_param_index=0"
- Enable debug mode and counters:
  sudo mlxreg -d /dev/mst/mt4129_pciconf0 -y --set "cmd_type=1,counter_en=1" --reg_name PPCC --indexes "local_port=1,pnat=0,lp_msb=0,algo_slot=1,algo_param_index=0"
Query Counters
Example:
- Reset counters:
  sudo mlxreg -d /dev/mst/mt4129_pciconf0 -y --set "cmd_type=13" --reg_name PPCC --indexes "local_port=1,pnat=0,lp_msb=0,algo_slot=1,algo_param_index=0"
- Query counters:
  mlxreg -d /dev/mst/mt4129_pciconf0 -y --set "cmd_type=12" --reg_name PPCC --indexes "local_port=1,pnat=0,lp_msb=0,algo_slot=1,algo_param_index=0"
In the output, text[i] holds the value of counter number i.
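To turn the query output into a list of counter values, the text[i] rows can be filtered and converted to decimal. The sketch below assumes each counter appears on its own row in the form `text[i] | 0x...` (field name, a `|` separator, then a hexadecimal value). That layout is an assumption about the `mlxreg` output and should be checked against a real query. `parse_ppcc_counters` is a hypothetical name.

```bash
# Read mlxreg query output on stdin and print "<counter index> <decimal value>"
# for every row of the assumed form "text[i] | 0x<hex>". Other rows are skipped.
parse_ppcc_counters() {
    sed -n 's/^[[:space:]]*text\[\([0-9][0-9]*\)][[:space:]]*|[[:space:]]*\([0-9a-fA-Fx]*\).*$/\1 \2/p' |
    while read -r idx val; do
        # printf accepts C-style hex constants such as 0x3a98 for %d.
        printf '%s %d\n' "$idx" "$val"
    done
}
```

The query command shown above can be piped straight into the function, for example `mlxreg ... "cmd_type=12" ... | parse_ppcc_counters`, and the resulting index/value pairs can then be matched against the counters table.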