DOCA SDK Documentation

ZTR-RTT CC Parameter and Counter Configuration

This page guides network administrators and developers in fine-tuning the ZTR-RTT CC algorithm on NVIDIA network adapters. It details the specific configuration variables, hardware counters, and debug tools required for optimization.

Introduction

The Zero Touch RoCE Round-Trip Time Congestion Control (ZTR-RTT CC) algorithm utilizes hardware-based feedback loops to proactively manage network congestion. To ensure optimal performance across diverse workloads, the algorithm offers a highly configurable parameter set. This document provides a comprehensive reference for tuning ZTR-RTT CC, detailing its configuration parameters, fixed-point datatypes (such as fxpN), and the specific mlxreg commands used to apply these tunings. Additionally, it outlines the available performance counters and debugging modes necessary for monitoring congestion states and troubleshooting network flows.

ZTR-RTT CC Parameters

Datatypes

  • The algorithm stores all parameters as integers, so tuning values must be supplied as integers.

  • Some parameters have a datatype other than integer.

    • For these parameters, the parameter table lists the valid values both as integers and in the parameter's own datatype.

  • Datatype fixed point N (fxpN):

    • A 32-bit integer represents a fixed-point number with N bits in the fractional part:

      [ integer part: 32-N bits ] . [ fractional part: N bits ]

    • To convert the integer representation of an fxpN value to its real value, divide the integer by 2^N.

      • For example, 3932 in fxp16 represents the number 3932 / 2^16 ≈ 0.06.
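The conversion described above can be sketched in a few lines of Python. The function names `fxp_to_real` and `real_to_fxp` are hypothetical helpers, not part of any NVIDIA tool, and rounding to the nearest integer when encoding is an assumption: the source does not say whether values should be rounded or truncated.

```python
def fxp_to_real(raw: int, frac_bits: int) -> float:
    """Interpret a stored integer as an fxpN value, where N = frac_bits."""
    return raw / (1 << frac_bits)


def real_to_fxp(value: float, frac_bits: int) -> int:
    """Encode a real value as an fxpN integer.

    Assumption: round to the nearest integer (the source does not
    specify rounding versus truncation).
    """
    return round(value * (1 << frac_bits))


# The example from the text: 3932 in fxp16 is approximately 0.06.
print(fxp_to_real(3932, 16))
# Going the other way, 0.06 encodes back to 3932.
print(real_to_fxp(0.06, 16))
```

Note that the parameter table lists 6553 as the integer form of ALPHA's 0.1 default, which corresponds to truncating 0.1 × 2^16 = 6553.6, so a rounded value can differ by one from a documented default.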

Parameter Table

Index | Parameter Name | Description | Units | Datatype | Range | Default – RoCE | Recommended Tuning Range
--- | --- | --- | --- | --- | --- | --- | ---
0 | BW_G | NIC port bandwidth. | Gb/s | Integer | 100–800 | Auto-configured from device bandwidth | Auto-configured from device bandwidth
1 | ALPHA | ZTR_RTTCC evaluates the congestion state by comparing the measured RTT to a target RTT. ALPHA is the coefficient of the linear relation that defines the target RTT. | None | fxp16 | As integer: 0–2^16; as fxp16: 0–1 | 6553 (as fxp16: 0.1) | As integer: 0–2^16; as fxp16: 0–1
2 | MAX_DEC | The maximal multiplicative decrement allowed in one update. The algorithm updates a congestion window (cwnd) in bytes and converts it to a rate. For better stability, the algorithm limits the rate decrement by MAX_DEC. | None | fxp16 | As integer: 1–2^16; as fxp16: 2^-16–1 | 63570 (as fxp16: 0.97) | As integer: 45875–64880; as fxp16: 0.7–0.99
3 | MAX_INC | The maximal multiplicative increment allowed in one update. The algorithm updates a congestion window (cwnd) in bytes and converts it to a rate. For better stability, the algorithm limits the rate increment by MAX_INC. | None | fxp16 | As integer: 2^16–2^20; as fxp16: 1–16 | 69468 (as fxp16: 1.06) | As integer: 2^16–2^18; as fxp16: 1–4
4 | AI | Additive increase value in bytes per 100 Gb/s. That is, a configured value X yields X bytes for each 100 Gb/s of port bandwidth. | Bytes per 100 Gb/s | Integer | 1–5000 | 9 | 5–100
5 | HAI | Hyper additive increase value in bytes per 100 Gb/s. That is, a configured value X yields X bytes for each 100 Gb/s of port bandwidth. | Bytes per 100 Gb/s | Integer | 1–5000 | 300 | 200–2000
6 | HAI_PERIOD_NS | After this period without any decrement, the algorithm moves from additive increase to hyper additive increase. | nsec | Integer | 1–UINT32_MAX | 7000000 (7 ms) | 100 µs–20 ms
7 | CONGESTION_DELAY_THRESHOLD | ZTR_RTTCC reacts only when the minimum RTT is above this value. The minimal value of the target RTT is CONGESTION_DELAY_THRESHOLD; see the ALPHA parameter. | nsec | Integer | 1–UINT32_MAX | 15000 (15 µs) | 2 µs–15 µs
8 | MAX_DELAY | When the RTT is above this value, ZTR_RTTCC reacts more aggressively. | nsec | Integer | 1–2^20 | 250000 (250 µs) | 30 µs–1 ms
9 | RATE_ON_FIRST_CONGESTION | The first time the RTT exceeds MAX_DELAY, ZTR_RTTCC sets the rate to this value. | None | fxp20 | As integer: 1–2^20; as fxp20: 2^-20–1 | 65536 (as fxp20: 0.0625) | As integer: 10485–2^20; as fxp20: 0.01–1
10 | DELAY_ONLY | Use only RTT as the congestion indication. | None | Boolean | 0, 1 | 0 | 0
11 | CNP_VLD_RTT | Use CNPs to validate RTT. When this parameter is set, an RTT measurement is ignored if no CNP arrived during it. | None | Boolean | 0, 1 | 0 | 0, 1
12 | TX_DEC | When set, ZTR_RTTCC does not wait for the RTT measurement to end. It decreases the rate in TX events based on the delay measured so far. | None | Boolean | 0, 1 | 1 | 0, 1
13 | FIXED_RATE | Available only in the debug version of the algorithm. Disables the algorithm's rate updates and sets a fixed rate. When the value is 0, the algorithm works as usual. | None | fxp20 | As integer: 1–2^23; as fxp20: 2^-20–8 | 0 | 0
14 | FAST_SCHED | The maximal rate of the NIC scheduler. Can exceed line rate to improve TX pipelining. | None | fxp20 | As integer: 1–2^23; as fxp20: 2^-20–8 | 2097152 (as fxp20: 2) | As integer: 2^20–2^22; as fxp20: 1–4
15 | TOPOLOGY_AWARE | The value 1 is available only with ADVANCED_FEATURES_EN=1. Evaluates the congestion state by comparing the RTT to the minimum measured RTT. | None | Boolean | 0, 1 | 0 | 0, 1
16 | ADVANCED_FEATURES_EN | Enables advanced features of the algorithm that have not been fully tested. | None | Boolean | 0, 1 | 0 | 0, 1

Parameter Tuning Command

mlxreg -d <dev> -y --set "cmd_type=8,value=<parameter value>" --reg_name PPCC --indexes "local_port=1,pnat=0,lp_msb=0,algo_slot=1,algo_param_index=<parameter index>"
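As an illustration, the command template above can be filled in programmatically. The sketch below only builds the argument list and prints it; it does not run `mlxreg`. The helper names `encode_value` and `ppcc_set_param_cmd` are hypothetical, the device path is the one used in this page's debug examples, and rounding fxp values to the nearest integer is an assumption.

```python
import shlex

# Fraction bits per fixed-point datatype, taken from the Datatype column.
FRACTION_BITS = {"fxp16": 16, "fxp20": 20}


def encode_value(value, datatype="Integer"):
    """Return the integer that mlxreg expects for a parameter value.

    Assumption: fxp values are rounded to the nearest integer.
    """
    if datatype in FRACTION_BITS:
        return round(value * (1 << FRACTION_BITS[datatype]))
    return int(value)


def ppcc_set_param_cmd(dev, param_index, value, datatype="Integer"):
    """Build the tuning command from this section as an argument list."""
    raw = encode_value(value, datatype)
    return [
        "mlxreg", "-d", dev, "-y",
        "--set", f"cmd_type=8,value={raw}",
        "--reg_name", "PPCC",
        "--indexes",
        "local_port=1,pnat=0,lp_msb=0,algo_slot=1,"
        f"algo_param_index={param_index}",
    ]


# Example: set MAX_DEC (index 2, fxp16) to 0.9, inside its recommended range.
cmd = ppcc_set_param_cmd("/dev/mst/mt4129_pciconf0", 2, 0.9, "fxp16")
print(shlex.join(cmd))
```

Building a list rather than a single string keeps each `mlxreg` argument intact if the list is later handed to `subprocess.run`.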

ZTR-RTT CC Counters

Index | Name | Description
--- | --- | ---
0 | ZTR_CC_CNP_HANDLE_COUNTER | Number of CNPs handled by the algorithm. Active only if the CNP_DEC or CNP_VLD_RTT parameter is set.
1 | ZTR_CC_NACK_HANDLE_COUNTER | Number of NACKs handled by the algorithm.
2 | ZTR_CC_AI_INC_COUNTER | Number of additive increments.
3 | ZTR_CC_HAI_INC_COUNTER | Number of hyper additive increments.
4 | ZTR_CC_DEC_COUNTER | Number of decrements.
5 | ZTR_CC_HYPER_DEC_COUNTER | Number of hyper decrements.
6 | ZTR_CC_TX_DEC_COUNTER | Number of decrements in TX events. Active only if the TX_DEC parameter is set.
7 | ZTR_CC_MAX_RTT | Maximal RTT measured by the algorithm.
8 | ZTR_CC_MIN_RTT | Minimal RTT measured by the algorithm.
9 | ZTR_CC_SUM_RTT | Sum of RTTs measured by the algorithm.
10 | ZTR_CC_NUM_RTT | Number of RTTs measured by the algorithm. With ZTR_CC_SUM_RTT and ZTR_CC_NUM_RTT, the average RTT can be calculated.
11 | ZTR_CC_NOT_VLD_RTT_COUNTER | Number of times an RTT measurement was not validated by a CNP. Active only when the CNP_VLD_RTT parameter is set.
12 | ZTR_CC_MAX_RATE | The maximal output rate determined by the algorithm.
13 | ZTR_CC_MIN_RATE | The minimal output rate determined by the algorithm.
14 | ZTR_CC_EMPTY_SYS_RTT_COUNTER | Number of times the algorithm detected a global minimum RTT.
15 | ZTR_CC_RTT_TIMEOUT_COUNTER | Number of RTT probe timeouts.
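The average RTT mentioned for ZTR_CC_NUM_RTT is the sum counter divided by the count counter. A minimal sketch follows, assuming the counter values have already been read into a list ordered by counter index. The function name `average_rtt` is hypothetical, and the result is in whatever unit the RTT counters report, which the source does not state.

```python
# Counter indexes from the counters table.
SUM_RTT_INDEX = 9    # ZTR_CC_SUM_RTT
NUM_RTT_INDEX = 10   # ZTR_CC_NUM_RTT


def average_rtt(counters):
    """Return SUM_RTT / NUM_RTT, or None if no RTT was measured yet."""
    num = counters[NUM_RTT_INDEX]
    if num == 0:
        return None  # avoid dividing by zero right after a counter reset
    return counters[SUM_RTT_INDEX] / num


# Made-up values for illustration: two measurements summing to 30000.
example = [0] * 16
example[SUM_RTT_INDEX] = 30000
example[NUM_RTT_INDEX] = 2
print(average_rtt(example))
```

Resetting the counters before a measurement window keeps the average representative of that window only.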

Debug Tools

Debug Mode

The ZTR-RTT CC algorithm has two working modes:

  • Default (deployment) mode, in algo_slot 0.

  • Debug mode, in algo_slot 1.

Counters are available only in debug mode.

Moving to debug mode requires the following steps:

  1. Disable deployment mode in algo_slot 0.

  2. Enable debug mode in algo_slot 1 and enable counters.

Example:

  • Disable deployment mode:

    sudo mlxreg -d /dev/mst/mt4129_pciconf0 -y --set "cmd_type=2" --reg_name PPCC --indexes "local_port=1,pnat=0,lp_msb=0,algo_slot=0,algo_param_index=0"
    
  • Enable debug mode and counters:

    sudo mlxreg -d /dev/mst/mt4129_pciconf0 -y --set "cmd_type=1,counter_en=1" --reg_name PPCC --indexes "local_port=1,pnat=0,lp_msb=0,algo_slot=1,algo_param_index=0"
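The two steps can also be generated for an arbitrary device, which helps when several adapters need the same treatment. This is a sketch that only builds and prints the commands shown above, in the required order. The helper names `ppcc_cmd` and `debug_mode_commands` are hypothetical, and `local_port=1` is carried over from the examples rather than derived from the device.

```python
import shlex


def ppcc_cmd(dev, set_fields, algo_slot):
    """Build one PPCC register command as an argument list."""
    return [
        "sudo", "mlxreg", "-d", dev, "-y",
        "--set", set_fields,
        "--reg_name", "PPCC",
        "--indexes",
        f"local_port=1,pnat=0,lp_msb=0,algo_slot={algo_slot},"
        "algo_param_index=0",
    ]


def debug_mode_commands(dev):
    """Return the commands that switch a device to debug mode, in order."""
    return [
        # Step 1: disable deployment mode in algo_slot 0.
        ppcc_cmd(dev, "cmd_type=2", 0),
        # Step 2: enable debug mode and counters in algo_slot 1.
        ppcc_cmd(dev, "cmd_type=1,counter_en=1", 1),
    ]


for command in debug_mode_commands("/dev/mst/mt4129_pciconf0"):
    print(shlex.join(command))
```

To execute the sequence rather than print it, each list could be passed to `subprocess.run(command, check=True)` so that a failure in step 1 stops step 2 from running.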
    

Query Counters 

Example:

  • Reset counters:

    sudo mlxreg -d /dev/mst/mt4129_pciconf0 -y --set "cmd_type=13" --reg_name PPCC --indexes "local_port=1,pnat=0,lp_msb=0,algo_slot=1,algo_param_index=0"
    
  • Query counters:

    sudo mlxreg -d /dev/mst/mt4129_pciconf0 -y --set "cmd_type=12" --reg_name PPCC --indexes "local_port=1,pnat=0,lp_msb=0,algo_slot=1,algo_param_index=0"
    

In the output, text[i] holds the value of counter number i.
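Mapping the text[i] fields back to counter names can be automated. The sketch below assumes each field appears on its own line in the form `text[i] | 0x...` (hexadecimal) or `text[i] | <decimal>`. That line format is an assumption about the tool's output and should be checked against a real query. The function name `parse_counters` and the sample text are made up for illustration.

```python
import re

# Counter names in index order, from the counters table.
COUNTER_NAMES = [
    "ZTR_CC_CNP_HANDLE_COUNTER", "ZTR_CC_NACK_HANDLE_COUNTER",
    "ZTR_CC_AI_INC_COUNTER", "ZTR_CC_HAI_INC_COUNTER",
    "ZTR_CC_DEC_COUNTER", "ZTR_CC_HYPER_DEC_COUNTER",
    "ZTR_CC_TX_DEC_COUNTER", "ZTR_CC_MAX_RTT", "ZTR_CC_MIN_RTT",
    "ZTR_CC_SUM_RTT", "ZTR_CC_NUM_RTT", "ZTR_CC_NOT_VLD_RTT_COUNTER",
    "ZTR_CC_MAX_RATE", "ZTR_CC_MIN_RATE",
    "ZTR_CC_EMPTY_SYS_RTT_COUNTER", "ZTR_CC_RTT_TIMEOUT_COUNTER",
]

# Assumed line format: "text[<index>] | <hex or decimal value>".
FIELD = re.compile(r"text\[(\d+)\]\s*\|\s*(0[xX][0-9a-fA-F]+|\d+)")


def parse_counters(output):
    """Map counter names to values found in the query output."""
    counters = {}
    for match in FIELD.finditer(output):
        index = int(match.group(1))
        raw = match.group(2)
        value = int(raw, 16) if raw.lower().startswith("0x") else int(raw)
        if index < len(COUNTER_NAMES):
            counters[COUNTER_NAMES[index]] = value
    return counters


# Made-up sample output, not captured from a real device.
sample = "text[0] | 0x00000005\ntext[9] | 0x00007530\ntext[10] | 0x00000002\n"
print(parse_counters(sample))
```

Fields with an index beyond the 16 documented counters are ignored rather than guessed at.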
