ZTR-RTT Congestion Control Algorithm

Congestion Control

Congestion control provides performance isolation when multiple applications run on the same cluster. It also prevents congestion from spreading when there is a slow receiver, reduces latency in the cluster, improves fairness, and prevents parking-lot effects and packet drops in lossy networks.

The diagram below shows an example of a head-of-line blocking scenario.

Head of the Line Blocking Scenario

image-2024-9-18_13-4-10.png
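
As a rough illustration of the effect in the diagram, the sketch below models a single shared FIFO queue that carries packets for one slow receiver and one fast receiver. The function name `drain_fifo`, the destination labels, and the service times are hypothetical values chosen for illustration and are not taken from the diagram. The point is only that packets for the fast receiver finish late because they wait behind packets for the slow one.

```python
from collections import deque

def drain_fifo(packets, service_time):
    """Drain a shared FIFO queue in order.

    packets: list of destination labels, head of the queue first.
    service_time: time units needed to deliver one packet to each destination.
    Returns a dict mapping each destination to its packet completion times.
    """
    clock = 0
    finish = {}
    queue = deque(packets)
    while queue:
        dest = queue.popleft()          # only the head of the line can be served
        clock += service_time[dest]     # everything behind it waits this long
        finish.setdefault(dest, []).append(clock)
    return finish

# Hypothetical service times: the slow receiver is 10x slower than the fast one.
service_time = {"slow": 10, "fast": 1}

# Packets for both receivers share one queue.
shared = drain_fifo(["slow", "fast", "slow", "fast"], service_time)

# The same fast-receiver packets with no slow traffic ahead of them.
alone = drain_fifo(["fast", "fast"], service_time)

print("fast receiver, shared queue:", shared["fast"])
print("fast receiver, alone:       ", alone["fast"])
```

In this toy model the fast receiver's packets complete much later in the shared queue than they do alone, even though the fast receiver itself is never the bottleneck.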

Datacenter Congestion Control Challenges

Developing a congestion control algorithm for datacenters presents the following challenges:

  • Several µ-sec of latency with hundreds of Gbps of bandwidth
      • Congestion buildup is fast, so the congestion loop should be short

  • A wide variety of traffic types, topologies and applications
      • Hard to develop an algorithm that suits all
      • Congestion Control algorithms are constantly being introduced with new congestion indications

  • Hardware implementation is not robust enough

  • Software implementation reacts too slowly
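
To put the first challenge in numbers, the following back-of-the-envelope sketch computes how much data is in flight during one round trip and how quickly a switch buffer fills once the arrival rate exceeds the drain rate. All figures (400 Gb/s, a 5 µs round trip, a 1 MB buffer) and both function names are assumptions chosen for illustration, not values from this document.

```python
def bytes_in_flight(bandwidth_gbps, rtt_us):
    """Bandwidth-delay product in bytes.

    Gb/s * µs = 1000 bits, and 1000 bits / 8 = 125 bytes.
    """
    return bandwidth_gbps * rtt_us * 125

def buffer_fill_time_us(buffer_bytes, excess_gbps):
    """Microseconds until a buffer fills when arrivals exceed the drain rate
    by excess_gbps (1 Gb/s equals 1000 bits per microsecond)."""
    return buffer_bytes * 8 / (excess_gbps * 1000)

# Assumed example link: 400 Gb/s with a 5 µs round-trip time.
bdp = bytes_in_flight(400, 5)

# Assumed example incast: two 400 Gb/s senders into one 400 Gb/s port,
# so the queue grows at 400 Gb/s into a hypothetical 1 MB buffer.
fill_us = buffer_fill_time_us(1_000_000, 400)

print(f"data in flight per RTT: {bdp} bytes")
print(f"time to fill buffer:    {fill_us} µs")
```

Under these assumptions the buffer fills within a handful of round trips, which is why the control loop has to react on a microsecond timescale.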

ZTR-RTT CC Infrastructure

To address the challenges above, the NVIDIA CC algorithm is developed on top of an infrastructure with the following characteristics:

ZTR RTTCC Infrastructure

image-2024-9-18_13-9-43.png

RTT Measurement Flow

RTT Measurement Flow

image-2024-9-18_13-11-54.png

ZTR RTTCC Algorithm

ZTR RTTCC Algorithm

image-2024-9-25_17-20-17.png

