Congestion control provides performance isolation when multiple applications are running on the same cluster. It also prevents congestion from spreading when there is a slow receiver, reduces latency in the cluster, improves fairness, and prevents parking-lot effects and packet drops in lossy networks.
The diagram below shows an example of a head-of-line blocking scenario.
Head of the Line Blocking Scenario
Datacenter Congestion Control Challenges
Developing a congestion control algorithm for datacenters presents the following challenges:
- Latency is only several µs while bandwidth reaches hundreds of Gbps. Congestion builds up quickly, so the congestion control loop must be short.
- Traffic types, topologies, and applications vary widely. It is hard to develop one algorithm that suits them all, and new congestion control algorithms with new congestion indications are constantly being introduced.
- A hardware implementation is not robust enough.
- A software implementation reacts too slowly.
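To make the first challenge concrete, the short calculation below estimates how many bytes pile up in a switch queue while a congestion control loop is still reacting. It is a back-of-the-envelope sketch, not part of the NVIDIA infrastructure: the function name and the example numbers (two 200 Gbps senders sharing one 200 Gbps egress port, so 200 Gbps of excess traffic) are hypothetical. It relies only on the identity 1 Gbps × 1 µs = 1,000 bits.

```python
def bytes_accumulated(excess_gbps: float, loop_us: float) -> float:
    """Bytes queued at a bottleneck when senders exceed its capacity by
    `excess_gbps` for `loop_us` microseconds (hypothetical helper).

    1 Gbps sustained for 1 us is 1e9 * 1e-6 = 1,000 bits.
    """
    bits = excess_gbps * loop_us * 1_000
    return bits / 8


# Two 200 Gbps senders into one 200 Gbps port: 200 Gbps of excess traffic.
for loop_us in (1, 10, 100):
    print(f"{loop_us:>4} us loop -> {bytes_accumulated(200, loop_us):>12,.0f} bytes queued")
```

At these rates a 10 µs reaction already leaves about 250 KB queued and a 100 µs reaction about 2.5 MB. That is why the loop has to be short, and why a purely software reaction path can be too slow.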
ZTR-RTT CC Infrastructure
To address the challenges above, the NVIDIA CC algorithm is built on top of an infrastructure with the following characteristics:
ZTR RTTCC Infrastructure
RTT Measurement Flow
RTT Measurement Flow
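The measurement flow itself appears only in the diagram. The code below is a hedged sketch of one common way to measure RTT with a probe and a reply. It assumes, without confirmation from this page, that four timestamps are collected: t1 when the sender transmits the probe, t2 when the receiver gets it, t3 when the receiver sends the reply, and t4 when the sender gets the reply. Under that assumption, the network RTT without the receiver's turnaround time is (t4 − t1) − (t3 − t2). The class, field names, and nanosecond units are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RttProbe:
    """Hypothetical probe record; the four-timestamp layout is an assumption."""
    t1_ns: int  # sender clock: probe transmitted
    t2_ns: int  # receiver clock: probe received
    t3_ns: int  # receiver clock: reply transmitted
    t4_ns: int  # sender clock: reply received


def network_rtt_ns(p: RttProbe) -> int:
    """Round-trip time on the wire, excluding receiver processing time.

    Each difference is taken on a single clock (t4 - t1 on the sender,
    t3 - t2 on the receiver), so the two clocks need not be synchronized.
    """
    total = p.t4_ns - p.t1_ns
    turnaround = p.t3_ns - p.t2_ns
    if total < 0 or turnaround < 0 or turnaround > total:
        raise ValueError("inconsistent timestamps")
    return total - turnaround


# Receiver clock is offset by ~500 us; only same-clock differences matter.
probe = RttProbe(t1_ns=1_000, t2_ns=500_000, t3_ns=500_300, t4_ns=4_300)
print(network_rtt_ns(probe))  # 3300 ns total minus 300 ns turnaround
```

Subtracting the turnaround keeps the measurement focused on queuing and propagation delay in the network, which is the signal an RTT-based algorithm reacts to.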
ZTR RTTCC Algorithm
ZTR RTTCC Algorithm
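This page does not spell out the algorithm's update rules. To show the general shape of an RTT-driven rate controller, the code below is a minimal sketch of a generic target-RTT scheme. It is not the NVIDIA ZTR RTTCC algorithm. When an RTT sample is at or below a target, the rate rises additively. When the sample is above the target, the rate is cut in proportion to how far the RTT exceeds it. The class name, the target RTT, the step size, `beta`, and the minimum rate are all hypothetical parameters.

```python
from dataclasses import dataclass, field


@dataclass
class RttRateController:
    """Generic RTT-target rate controller (illustrative; not NVIDIA's algorithm)."""
    line_rate_gbps: float         # NIC line rate, the upper bound on the rate
    target_rtt_ns: float          # hypothetical RTT target
    ai_step_gbps: float = 1.0     # hypothetical additive-increase step
    beta: float = 0.5             # hypothetical decrease aggressiveness, 0 < beta <= 1
    min_rate_gbps: float = 0.1    # hypothetical floor so a flow never stalls
    rate_gbps: float = field(init=False)

    def __post_init__(self) -> None:
        # Start at line rate, as RDMA-style senders commonly do.
        self.rate_gbps = self.line_rate_gbps

    def on_rtt_sample(self, rtt_ns: float) -> float:
        """Update and return the sending rate for one RTT measurement."""
        if rtt_ns <= self.target_rtt_ns:
            # Little or no queuing: probe for bandwidth.
            self.rate_gbps = min(self.line_rate_gbps, self.rate_gbps + self.ai_step_gbps)
        else:
            # Fraction of the RTT attributable to excess delay, in (0, 1).
            excess = (rtt_ns - self.target_rtt_ns) / rtt_ns
            self.rate_gbps = max(self.min_rate_gbps,
                                 self.rate_gbps * (1 - self.beta * excess))
        return self.rate_gbps


cc = RttRateController(line_rate_gbps=200, target_rtt_ns=5_000)
print(cc.on_rtt_sample(10_000))  # RTT is twice the target: 200 * (1 - 0.5 * 0.5)
print(cc.on_rtt_sample(4_000))   # below target: additive increase by 1 Gbps
```

The proportional decrease reacts harder to larger delays than to small ones, and the additive increase lets flows reclaim bandwidth gradually. This is a common way to trade fast congestion relief against fairness between flows.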