Introduction
The Data Center Quantized Congestion Notification (DCQCN) algorithm operates in a closed-loop system consisting of three primary roles:
- The Congestion Point (CP)
- The Notification Point (NP)
- The Reaction Point (RP)
Each NIC can have multiple flows transmitting to a CP, and each flow must react to congestion independently.
Congestion Point – The Switch
The CP is typically the network switch. It monitors its egress queue length and marks passing packets to signal congestion.
- ECN Marking: The CP marks the Explicit Congestion Notification (ECN) bits in the packet header based on the egress queue size.
- Marking Probability: As the queue size grows past a minimum threshold (Kmin), the probability of marking a packet increases linearly until it reaches a maximum probability (Pmax) at the full mark threshold (Kmax).
- Compatibility: This is standard functionality supported by commodity Ethernet switches and is also used for DCTCP.
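The marking curve described above can be sketched as a small function. This is a minimal illustration, not switch firmware: the function and argument names are hypothetical, and marking every packet once the queue exceeds Kmax is an assumption based on the term "full mark threshold".

```python
def marking_probability(queue_len: float, k_min: float, k_max: float, p_max: float) -> float:
    """Return the probability that the CP ECN-marks a packet.

    queue_len, k_min and k_max share one unit (for example KB of egress queue).
    """
    if k_max <= k_min:
        raise ValueError("k_max must be greater than k_min")
    if queue_len <= k_min:
        # Below the minimum threshold nothing is marked.
        return 0.0
    if queue_len <= k_max:
        # Linear ramp from 0 at Kmin up to Pmax at Kmax.
        return p_max * (queue_len - k_min) / (k_max - k_min)
    # Assumption: above Kmax every packet is marked.
    return 1.0
```

With Kmin = 100, Kmax = 400 and Pmax = 0.2, a queue of 250 sits halfway up the ramp and is marked with a probability of about 0.1.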
Notification Point – The Receiver
The NP is the receiving NIC. Its job is to alert the sender that congestion was encountered along the path.
- CNP Generation: When an ECN-marked packet arrives at the NP, a Congestion Notification Packet (CNP) is generated and sent back to the sender.
- Hardware Acceleration: CNP generation is implemented directly in the NIC's hardware, providing a very fast response.
- Flow Identification: The CNP specifically identifies the flow (Queue Pair) experiencing congestion.
- QoS Delivery: The CNP can be delivered via a guaranteed low-latency path to ensure the sender is notified as quickly as possible.
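The NP behavior can be pictured with a short software model. This is only a sketch of logic that the NIC performs in hardware: the `Packet` and `Cnp` types and their fields are hypothetical, and the only external fact used is that the 2-bit ECN field carries the value 0b11 for "Congestion Experienced" (RFC 3168). Any pacing of CNPs per flow is not modeled.

```python
from dataclasses import dataclass
from typing import Optional

ECN_CE = 0b11  # "Congestion Experienced" codepoint of the IP ECN field


@dataclass(frozen=True)
class Packet:
    src: str          # address of the sending NIC (hypothetical field)
    dst: str          # address of the receiving NIC (hypothetical field)
    queue_pair: int   # flow identifier
    ecn: int          # 2-bit ECN field from the IP header


@dataclass(frozen=True)
class Cnp:
    dst: str          # where the notification is sent: the original sender
    queue_pair: int   # the flow that experienced congestion


def notification_point(packet: Packet) -> Optional[Cnp]:
    """Return a CNP for an ECN-marked packet, or None if it was not marked."""
    if packet.ecn == ECN_CE:
        # The CNP goes back to the sender and names the congested flow.
        return Cnp(dst=packet.src, queue_pair=packet.queue_pair)
    return None
```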
Reaction Point – The Sender
The RP is the sending NIC. It uses the incoming CNPs (or lack thereof) to dynamically calculate and adjust its transmission rate. The RP manages its rate through three mechanisms: Alpha Update, Rate Decrease, and Rate Increase.
Alpha (α) Update
Alpha is a weighting factor that determines how aggressively the sending rate should be modified. It is updated every DCE_TCP_RTT microseconds.
- Incrementing Alpha: If a CNP was received during the update period, alpha is incremented.
- Decrementing Alpha: If no CNP was received during the update period, alpha is decremented.
- The g Parameter: g controls the aggressiveness of alpha's changes. The lower g is, the more aggressive the changes will be.
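As an illustration, the alpha update can be sketched with the exponentially weighted form from the published DCQCN algorithm. That form is an assumption here and not necessarily the exact formula the NIC applies. `gain` is a hypothetical name for a weight between 0 and 1; in this form a larger `gain` moves alpha faster, so the device parameter described above may be encoded differently (for example as a divisor or a shift).

```python
def update_alpha(alpha: float, cnp_received: bool, gain: float) -> float:
    """Run one alpha update, performed once per update period.

    With a CNP in the period:    alpha <- (1 - gain) * alpha + gain
    Without a CNP in the period: alpha <- (1 - gain) * alpha
    """
    if not 0.0 < gain <= 1.0:
        raise ValueError("gain must be in (0, 1]")
    decayed = (1.0 - gain) * alpha
    # A CNP pushes alpha toward 1; a quiet period lets it decay toward 0.
    return decayed + gain if cnp_received else decayed
```

Under sustained congestion alpha converges toward 1, and during quiet periods it decays toward 0, which is what makes later rate cuts proportionate to how persistent the congestion has been.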
Rate Decrease Logic
The RP monitors incoming CNPs over a defined window called RATE_REDUCE_MONITOR_PERIOD (in microseconds). After this time elapses, if CNPs were detected, the rate is cut.
- Remember Target Rate: The algorithm saves the current rate as the target rate, which serves as the recovery point. If CLAMP_TARGET_RATE is enabled, the algorithm only remembers this on the first decrement.
- Decrease Current Rate: The rate is reduced using the current alpha value and the RPG_GD parameter.
- Reset Counters: After a rate decrease, all increment timers and counters (Time, Byte Counter, etc.) are immediately reset.
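The three decrease steps can be sketched as follows. The cut uses the published DCQCN form, current rate multiplied by (1 - alpha / 2). `divisor` is a hypothetical stand-in for whatever RPG_GD configures, and reading "first decrement" as the first cut since the last rate increase is also an assumption, tracked by the hypothetical `in_decrease_run` flag.

```python
from dataclasses import dataclass


@dataclass
class ReactionPoint:
    current_rate: float            # for example in Mb/s
    target_rate: float
    alpha: float
    timer_events: int = 0          # timer expirations since the last decrease
    byte_events: int = 0           # byte-counter expirations since the last decrease
    in_decrease_run: bool = False  # hypothetical: True from a cut until the next increase


def rate_decrease(rp: ReactionPoint, clamp_target_rate: bool, divisor: float = 2.0) -> None:
    """Apply one rate cut at the end of a monitor period that saw CNPs."""
    # Step 1: remember the target rate. With clamping enabled, only the
    # first cut of a run of consecutive cuts records it (assumption).
    if not (clamp_target_rate and rp.in_decrease_run):
        rp.target_rate = rp.current_rate
    # Step 2: cut the current rate in proportion to alpha.
    rp.current_rate *= 1.0 - rp.alpha / divisor
    # Step 3: reset the increase timers and counters.
    rp.timer_events = 0
    rp.byte_events = 0
    # An increase routine (not shown) would clear this flag.
    rp.in_decrease_run = True
```

Starting from a current rate of 100 with alpha at 0.5, two consecutive cuts with clamping enabled bring the current rate to 75 and then 56.25, while the target rate stays at 100.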
Rate Increase Logic
If no CNPs are received, the RP will systematically increase its transmission rate based on two primary triggers:
- Timer: The rate increments after TIME_RESET microseconds.
- Byte Counter: The rate increments after BYTE_RESET bytes are transmitted.
The rate increase happens across three distinct stages:
Fast Recovery
This is the initial stage of rate recovery.
Additive Increase (AI)
This stage begins after RPG_THRESHOLD consecutive time resets or RPG_THRESHOLD consecutive byte resets.
- The Target Rate is increased by a fixed Additive Increase (AI) value.
- The Current Rate is then recalculated.
Hyper Increase (HAI)
This aggressive stage begins only after RPG_THRESHOLD consecutive time resets and RPG_THRESHOLD consecutive byte resets.
- The Target Rate is increased by a fixed Hyper Additive Increase (HAI) value.
- The Current Rate is then recalculated.
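The three increase stages can be tied together in one sketch. Several details are assumptions drawn from the published DCQCN algorithm rather than from this page: the current rate is recalculated as the midpoint of the target and current rates, Fast Recovery leaves the target rate unchanged, and a stage starts on the event that brings a counter up to RPG_THRESHOLD. The names `RateState`, `ai_rate` and `hai_rate` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RateState:
    current_rate: float
    target_rate: float
    timer_events: int = 0  # consecutive timer resets since the last decrease
    byte_events: int = 0   # consecutive byte-counter resets since the last decrease


def stage(state: RateState, rpg_threshold: int) -> str:
    """Pick the increase stage from the two consecutive-reset counters."""
    timer_done = state.timer_events >= rpg_threshold
    bytes_done = state.byte_events >= rpg_threshold
    if timer_done and bytes_done:
        return "hyper_increase"
    if timer_done or bytes_done:
        return "additive_increase"
    return "fast_recovery"


def rate_increase(state: RateState, trigger: str, rpg_threshold: int,
                  ai_rate: float, hai_rate: float) -> str:
    """Handle one timer or byte-counter expiration and return the stage applied."""
    if trigger == "timer":
        state.timer_events += 1
    elif trigger == "bytes":
        state.byte_events += 1
    else:
        raise ValueError("trigger must be 'timer' or 'bytes'")

    current_stage = stage(state, rpg_threshold)
    if current_stage == "additive_increase":
        state.target_rate += ai_rate
    elif current_stage == "hyper_increase":
        state.target_rate += hai_rate
    # In every stage the current rate moves halfway toward the target rate.
    state.current_rate = (state.target_rate + state.current_rate) / 2
    return current_stage
```

A rate decrease would zero both counters, which sends the flow back to Fast Recovery on the next trigger.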