NVIDIA FreeBSD for ConnectX-4 and above Adapter Cards

Quality of Service

Quality of Service (QoS) is a mechanism for assigning a priority to a network flow and managing its guarantees, limitations, and priority over other flows. This is accomplished by mapping the User Priority (UP) to a hardware Traffic Class (TC). Each TC is assigned QoS attributes, and the flows mapped to it behave accordingly.

The Packet Pacing and Quality of Service (QoS) features cannot be enabled at the same time.

To work with QoS, make sure to disable Packet Pacing in the firmware:

  1. Create a file with the following content. 

    # vim /tmp/disable_packet_pacing.txt
    MLNX_RAW_TLV_FILE
    0x00000004 0x0000010c 0x00000000 0x00000000
    


  2. Update the firmware configuration to disable Packet Pacing.

    mlxconfig -d pci0:<x>:0:0 -f /tmp/disable_packet_pacing.txt set_raw
    



  3. Reset the firmware. 

    mlxfwreset -d pci0:<x>:0:0 reset
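
The three steps above can be collected into one small script. This is only a sketch: PCI_DEV is a placeholder that must be replaced with the adapter's actual pci0:<x>:0:0 address, write_tlv_file is an illustrative helper name, and the two firmware commands are printed (dry run) rather than executed so that they can be reviewed first.

```shell
#!/bin/sh
# Sketch: prepare the raw TLV file and print the firmware commands from the
# steps above. PCI_DEV is a placeholder; substitute the adapter's address.
PCI_DEV="${PCI_DEV:-pci0:<x>:0:0}"
TLV_FILE="${TLV_FILE:-/tmp/disable_packet_pacing.txt}"

# Step 1: write the raw TLV content exactly as listed above.
write_tlv_file() {
    cat > "$1" <<'EOF'
MLNX_RAW_TLV_FILE
0x00000004 0x0000010c 0x00000000 0x00000000
EOF
}

write_tlv_file "$TLV_FILE"

# Steps 2 and 3: printed only; run them manually once PCI_DEV is filled in.
echo "mlxconfig -d $PCI_DEV -f $TLV_FILE set_raw"
echo "mlxfwreset -d $PCI_DEV reset"
```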
    



Priority Code Point (PCP)

PCP is used as a means for classifying and managing network traffic, and providing QoS in Layer 2 Ethernet networks. It uses the 3-bit PCP field in the VLAN header for the purpose of packet classification.
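
To make the position of the 3-bit PCP field concrete, the sketch below packs a priority and a VLAN ID into the 16-bit Tag Control Information (TCI) word of the 802.1Q header, where PCP occupies bits 15-13, DEI bit 12, and the VLAN ID bits 11-0. The helper name make_tci is illustrative and not part of the driver or of FreeBSD; DEI is left at zero for simplicity.

```shell
#!/bin/sh
# Sketch: build the 802.1Q TCI value from a PCP (0-7) and a VLAN ID (0-4095).
# make_tci is an illustrative helper, not a system utility. DEI is left at 0.
make_tci() {
    pcp=$1
    vid=$2
    if [ "$pcp" -lt 0 ] || [ "$pcp" -gt 7 ]; then
        echo "PCP must be 0-7 (3 bits)" >&2
        return 1
    fi
    if [ "$vid" -lt 0 ] || [ "$vid" -gt 4095 ]; then
        echo "VLAN ID must be 0-4095 (12 bits)" >&2
        return 1
    fi
    # PCP sits in the top three bits, the VLAN ID in the low twelve.
    printf '0x%04X\n' $(( (pcp << 13) | vid ))
}

make_tci 5 100   # prints 0xA064
```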

To create a VLAN interface and assign the desired priority to it:

# ifconfig mce<N>.<vlan> create
# ifconfig mce<N>.<vlan> vlanpcp <prio>

VLAN 0 Priority Tagging

The VLAN 0 Priority Tagging feature enables 802.1Q Ethernet frames to be transmitted with the VLAN ID set to zero.
Setting the VLAN ID to zero allows the VLAN tag itself to be ignored, so that the Ethernet frame is processed according to the priority configured in the 802.1p bits of the 802.1Q Ethernet frame header.

To enable VLAN 0 priority tagging on a specific interface:

# ifconfig mce<N> pcp <prio>

To disable VLAN 0 priority tagging on a specific interface:

# ifconfig mce<N> -pcp


The switch port must be configured to accept VLAN 0 priority-tagged packets. Otherwise, these packets may be dropped.


Differentiated Service Code Point (DSCP)

Differentiated services or DiffServ is a computer networking architecture that specifies a simple and scalable mechanism for classifying and managing network traffic and providing quality of service (QoS) on IP networks.
DiffServ uses a 6-bit DSCP in the 8-bit DS field in the IP header for packet classification purposes. The DS field replaces the outdated IPv4 TOS field.
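
As a small worked example of the field layout, the DSCP occupies the upper six bits of the DS byte, and the lower two bits carry ECN. The sketch below converts between the two representations; the helper names dscp_from_ds and ds_from_dscp are illustrative only.

```shell
#!/bin/sh
# Sketch: convert between the 8-bit DS byte and the 6-bit DSCP it carries.
# The two low-order bits of the DS byte are the ECN bits and are dropped here.
dscp_from_ds() {
    echo $(( ($1 >> 2) & 0x3f ))
}

# Build a DS byte from a DSCP value, leaving the ECN bits at zero.
ds_from_dscp() {
    echo $(( ($1 & 0x3f) << 2 ))
}

dscp_from_ds 184   # prints 46
ds_from_dscp 46    # prints 184
```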

Trust State

Trust state determines which packet field is used to prioritize sent and received packets.

The default trust state is PCP. Ethernet packets are prioritized based on the value of the trusted field: PCP, DSCP, or both.

To configure Trust State, use the following sysctl node:

# sysctl -d dev.mce.<N>.conf.qos.trust_state
dev.mce.<N>.conf.qos.trust_state: Set trust state, 1:PCP 2:DSCP 3:BOTH
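
The numeric values in the description above can be wrapped in a small lookup so that the setting is chosen by name. This is a sketch under assumptions: trust_state_value is an illustrative helper, device index 0 is just an example, and the sysctl command is printed rather than executed.

```shell
#!/bin/sh
# Sketch: translate a trust state name into the numeric value listed in the
# sysctl description (1:PCP 2:DSCP 3:BOTH). Illustrative helper only.
trust_state_value() {
    case "$1" in
        pcp|PCP)   echo 1 ;;
        dscp|DSCP) echo 2 ;;
        both|BOTH) echo 3 ;;
        *) echo "unknown trust state: $1" >&2; return 1 ;;
    esac
}

# Dry run: print the command that would switch mce0 to DSCP trust.
echo "sysctl dev.mce.0.conf.qos.trust_state=$(trust_state_value dscp)"
```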


QoS with RDMA

The RDMA application is responsible for setting QoS values.

  • In RDMA CM mode, QoS is set in the tos field of the rdma_id_private struct.
    Incoming RDMA CM connections always take precedence when setting the current priority.

  • In non-RDMA CM mode, priority values are set using a modify_qp command with an ibv_qp_attr parameter. The IPv4 type of service (“ToS”) and the IPv6 traffic class are set using the attr.ah_attr.grh.traffic_class field. The VLAN PCP is set using the attr.ah_attr.sl field.

Mapping User Priority to Traffic Class

This feature allows users to map a specific User Priority (UP) to a specific TC.

Note that this configuration is permanent and will not be reset to default unless manually changed.

Example

To map UP 5 to TC 4 on device mce0:


# sysctl dev.mce.0.conf.qos.prio_0_7_tc=1,0,2,3,4,4,6,7
dev.mce.0.conf.qos.prio_0_7_tc: 1 0 2 3 4 5 6 7 -> 1 0 2 3 4 4 6 7

Note: By default, UP 0 is mapped to TC 1, and UP 1 is mapped to TC 0: 

# sysctl dev.mce.0.conf.qos.prio_0_7_tc
dev.mce.0.conf.qos.prio_0_7_tc: 1 0 2 3 4 5 6 7
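
Because the node takes all eight values at once, changing a single UP means rewriting the whole list. The sketch below builds the new comma-separated list from the current space-separated one; remap_up_to_tc is an illustrative helper name, and the command is printed rather than executed. On a live system the current mapping could be read with sysctl -n instead of being hard-coded.

```shell
#!/bin/sh
# Sketch: replace the TC of one UP in the eight-entry mapping and emit the
# comma-separated form expected on the sysctl command line.
# Usage: remap_up_to_tc "<current mapping>" <up> <tc>
remap_up_to_tc() {
    current=$1
    up=$2
    tc=$3
    out=""
    i=0
    for v in $current; do
        # Substitute the new TC at the position of the requested UP.
        if [ "$i" -eq "$up" ]; then
            v=$tc
        fi
        out="${out:+$out,}$v"
        i=$(( i + 1 ))
    done
    echo "$out"
}

# Default mapping from the note above, with UP 5 moved to TC 4.
new_map=$(remap_up_to_tc "1 0 2 3 4 5 6 7" 5 4)
echo "sysctl dev.mce.0.conf.qos.prio_0_7_tc=$new_map"
```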


Mapping DSCP to Priority

Each DSCP value can be mapped to a priority using the following sysctl nodes: 

dev.mce.<N>.conf.qos.dscp_56_63_prio: 7 7 7 7 7 7 7 7
dev.mce.<N>.conf.qos.dscp_48_55_prio: 6 6 6 6 6 6 6 6
dev.mce.<N>.conf.qos.dscp_40_47_prio: 5 5 5 5 5 5 5 5
dev.mce.<N>.conf.qos.dscp_32_39_prio: 4 4 4 4 4 4 4 4
dev.mce.<N>.conf.qos.dscp_24_31_prio: 3 3 3 3 3 3 3 3
dev.mce.<N>.conf.qos.dscp_16_23_prio: 2 2 2 2 2 2 2 2
dev.mce.<N>.conf.qos.dscp_8_15_prio: 1 1 1 1 1 1 1 1
dev.mce.<N>.conf.qos.dscp_0_7_prio: 0 0 0 0 0 0 0 0

Example: 

# sysctl dev.mce.0.conf.qos.dscp_0_7_prio=1,1,1,1,1,1,1,1
dev.mce.0.conf.qos.dscp_0_7_prio: 0 0 0 0 0 0 0 0 -> 1 1 1 1 1 1 1 1
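
Since the 64 DSCP values are spread over eight nodes, it can help to compute which node and which position hold a given DSCP. The sketch below does this under the assumption that the eight values in each node are listed in ascending DSCP order; dscp_node and dscp_slot are illustrative helper names.

```shell
#!/bin/sh
# Sketch: locate the sysctl node and the zero-based position for a DSCP value.
# Assumption: values within a node are ordered by ascending DSCP.
# Usage: dscp_node <device index> <dscp>
dscp_node() {
    dev=$1
    dscp=$2
    if [ "$dscp" -lt 0 ] || [ "$dscp" -gt 63 ]; then
        echo "DSCP must be 0-63 (6 bits)" >&2
        return 1
    fi
    # Each node covers a block of eight consecutive DSCP values.
    base=$(( dscp / 8 * 8 ))
    echo "dev.mce.${dev}.conf.qos.dscp_${base}_$(( base + 7 ))_prio"
}

# Zero-based position of the DSCP value inside its node.
dscp_slot() {
    echo $(( $1 % 8 ))
}

dscp_node 0 46   # prints dev.mce.0.conf.qos.dscp_40_47_prio
dscp_slot 46     # prints 6
```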


Maximum Rate Limiting

This feature allows users to rate limit a specific TC. The rate limit defines the maximum bandwidth allowed for a TC. A deviation of up to 10% from the requested value is considered acceptable.

Rather than setting the maximum rate for a single priority, pass the maximum rates for all relevant priorities as a single input.

Notes:

  • This configuration is permanent and will not be set to default unless manually changed.

  • The rate is specified in kilobits per second, where kilo = 1000.

  • The rate must be divisible by 100,000, meaning that values must be given in units of 100 Mb/s.

  • Examples of valid values:
      200000 - 200 Mb/s
      1000000 - 1 Gb/s
      3400000 - 3.4 Gb/s

  • 0 value = unlimited rate

Example:

To rate limit TC 4 on device mce0 to 2.4 Gb/s:

# sysctl dev.mce.0.conf.qos.tc_max_rate=0,0,0,0,2400000,0,0,0
dev.mce.0.conf.qos.tc_max_rate: 0 0 0 0 0 0 0 0 -> 0 0 0 0 2400000 0 0 0
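
The unit and granularity rules above are easy to get wrong by a factor of 1000, so the sketch below converts a rate given in Mb/s into the kilobit value the node expects and rejects values that are not multiples of 100 Mb/s. The helper name mbps_to_rate is illustrative, and the sysctl command is printed rather than executed.

```shell
#!/bin/sh
# Sketch: convert a rate in Mb/s to the kilobit value used by tc_max_rate,
# enforcing the 100 Mb/s granularity. A value of 0 means unlimited.
mbps_to_rate() {
    mbps=$1
    if [ "$mbps" -lt 0 ] || [ $(( mbps % 100 )) -ne 0 ]; then
        echo "rate must be a non-negative multiple of 100 Mb/s" >&2
        return 1
    fi
    echo $(( mbps * 1000 ))
}

# Dry run: limit TC 4 to 2.4 Gb/s and leave the other TCs unlimited.
rate=$(mbps_to_rate 2400) &&
    echo "sysctl dev.mce.0.conf.qos.tc_max_rate=0,0,0,0,${rate},0,0,0"
```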


Enhanced Transmission Selection (ETS)

To fully utilize this feature, make sure the Priority Flow Control (PFC) feature is enabled.

The Enhanced Transmission Selection (ETS) standard exploits the time periods in which the offered load of a particular Traffic Class (TC) is less than its minimum allocated bandwidth by making the difference available to other traffic classes.

After servicing the strict priority TCs, the amount of bandwidth (BW) left on the wire may be split among other TCs according to a minimal guarantee policy.

If, for instance, TC0 is set to an 80% guarantee and TC1 to 20% (the TC shares must sum to 100), then the BW left after servicing all strict priority TCs will be split according to this ratio.

Since this is a minimal guarantee, there is no maximum enforcement. This means, in the same example, that if TC1 does not use its 20% share, the remainder will be used by TC0.

Example

# sysctl dev.mce.0.conf.qos.tc_rate_share=20,10,10,10,10,10,10,20

In this example, the first and last entries (TC 0 and TC 7) are each guaranteed 20% of the bandwidth, and each of the remaining TCs is guaranteed 10%.
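
Because the shares must add up to 100, a quick check before applying a new list can catch arithmetic slips. The sketch below sums a comma-separated share list and prints the sysctl command only when the total is 100; shares_sum is an illustrative helper name, and nothing is written to the device.

```shell
#!/bin/sh
# Sketch: verify that a comma-separated list of ETS shares sums to 100
# before it is passed to tc_rate_share. Illustrative helper only.
shares_sum() {
    total=0
    for s in $(echo "$1" | tr ',' ' '); do
        total=$(( total + s ))
    done
    echo "$total"
}

shares="20,10,10,10,10,10,10,20"
if [ "$(shares_sum "$shares")" -eq 100 ]; then
    # Dry run: print the command instead of executing it.
    echo "sysctl dev.mce.0.conf.qos.tc_rate_share=$shares"
else
    echo "shares must sum to 100" >&2
fi
```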


Priority Flow Control (PFC) Hardware Buffer Configuration

The hardware buffer configuration can be tuned for Priority Flow Control (PFC).

Parameter: dev.mce.<N>.conf.qos.buffers_size
Description: Sets the buffer sizes. The hardware allows up to eight buffer sizes to be configured. The sum of all buffer sizes must not exceed the hardware memory size; this limit is enforced automatically. The sysctl allows each buffer size to be set individually. Exhausting the buffer space causes the adapter card to send xoff to the other side of the link.

Parameter: dev.mce.<N>.conf.qos.buffers_prio
Description: Shows the mapping between priorities and buffers by mapping the buffer index into the hardware-defined priority. Note that the priority is the internal number after translation from the external QoS parameters.

Parameter: dev.mce.<N>.conf.qos.cable_length
Description: Specifies the cable length in meters. The length is used to calculate the signal propagation delay, which allows a more precise determination of the moment when xoff should be issued.


