NVIDIA FreeBSD for ConnectX-4 and above Adapter Cards

Packet Pacing

This feature is supported in firmware v12.17.1016 and above.

Packet pacing, also known as “rate limit,” defines the maximum bandwidth allowed for a TCP connection. The limit is enforced in hardware: each QP (transmit queue) has a rate limit value from which the hardware calculates the delay between consecutive packets.

To enable Packet Pacing in firmware:

  1. Create a file with the following content. 

    # vim /tmp/enable_packet_pacing.txt
    MLNX_RAW_TLV_FILE
    0x00000004 0x0000010c 0x00000000 0x00000001
    
  2. Update the firmware configuration to enable Packet Pacing: 

    mlxconfig -d pci0:<x>:0:0 -f /tmp/enable_packet_pacing.txt set_raw 
    
  3. Reset the firmware. 

    mlxfwreset -d pci0:<x>:0:0 reset
    

Packet Pacing and Quality of Service (QoS) features are mutually exclusive.


Setting Rates for Packet Pacing

Rates used with packet pacing must be defined in advance.

New Rates Configuration

  • Newly configured rates must be within a range determined by the firmware. The limits can be read through sysctl. For the minimum value, run: 

    sysctl dev.mce.<N>.rate_limit.tx_limit_min

    For the maximum value, run: 

    sysctl dev.mce.<N>.rate_limit.tx_limit_max

  • The maximum number of rates is also determined by the firmware. To check how many rates can be defined, run: 

    sysctl dev.mce.<N>.rate_limit.tx_rates_max
    
  • To add a new rate: 

    sysctl dev.mce.<N>.rate_limit.tx_limit_add=800000
    

    This adds the rate at the next available index. If all indexes are already in use, the new rate is not added. 

    Rates are configured and stored in bits per second, whereas rates requested for a socket are specified in bytes per second.

  • To remove a rate, run: 

    sysctl dev.mce.<N>.rate_limit.tx_limit_clr=800000
    

Deviation: The maximum allowed deviation between a requested rate and the rates in the table can be set via sysctl. If the rate limit table cannot satisfy a request within this deviation, rate limiting is disabled.

  • For the minimum value, run: 

    sysctl dev.mce.<N>.rate_limit.tx_allowed_deviation_min
    
  • For the maximum value, run: 

    sysctl dev.mce.<N>.rate_limit.tx_allowed_deviation_max
    
  • To change the deviation value, run: 

    sysctl dev.mce.<N>.rate_limit.tx_allowed_deviation=10000
    
  • To read the current deviation value, run: 

    sysctl dev.mce.<N>.rate_limit.tx_allowed_deviation
    



Limitation: Rate values must be multiples of 1000.

The burst size is determined by the hardware and can be configured via sysctl:

  • For the minimum value, run: 

    sysctl dev.mce.<N>.rate_limit.tx_burst_size_min
    
  • For the maximum value, run: 

    sysctl dev.mce.<N>.rate_limit.tx_burst_size_max
    
  • To change the burst size, run: 

    sysctl dev.mce.<N>.rate_limit.tx_burst_size=150
    
  • To read the configured burst size, run: 

    sysctl dev.mce.<N>.rate_limit.tx_burst_size
    
  • To display the packet pacing configuration, run: 

    sysctl dev.mce.<N>.rate_limit.tx_rate_show
    	ENTRY	BURST	RATE [bit/s]
    	------------------------------------
    	  0	150	800000
    	  1	150	40000
    	  2	150	1000000
    	  3	150	25000000000
    
    	ENTRY	BURST	RATE [bit/s]
    	------------------------------------
    	  0	3	800000
    	  1	3	40000
    	  2	3	1000000
    	  3	3	25000000000
    

    where:

    ENTRY    Rate limit table entry (index)
    BURST    Burst size configured for rate-limited traffic
    RATE     Rate configured for the relevant index

All rates are shown in bits per second.


Using Packet Pacing Sockets


1. Create a rate-limited socket by setting one of the rates defined as described in the previous section with the setsockopt() interface: 


setsockopt(s, SOL_SOCKET, SO_MAX_PACING_RATE, &pacing_rate, sizeof(pacing_rate));

where:

SO_MAX_PACING_RATE    Marks the socket as a rate-limited socket
pacing_rate           The rate in bytes per second, of type unsigned int. This is the same value entered via sysctl, expressed in bytes instead of bits (the sysctl value divided by 8).

  • A rate-limited ring corresponding to the requested rate is created and associated with the socket.

  • Data sent via the socket is then transmitted as rate-limited traffic.

2. To modify the rate, call setsockopt() again on the same socket with the new value.

3. Destroy the relevant ring upon TCP socket completion.


Error Detection

Failures can be detected by querying a specific socket with the getsockopt() interface.

Feature Characteristics

  • MLNX_OFED for FreeBSD supports up to 100,000 rate-limited TCP connections.

  • Each TCP connection is mapped to a specific SQ (send queue).

Limitations

  • Maximum number of rate-limited rings: 100,000

  • Minimum rate: 1 Kbps

  • Maximum rate: 100 Gbps 

    # sysctl -a | grep rate_limit
    dev.mce.<N>.rate_limit.tx_limit_min: 1000
    dev.mce.<N>.rate_limit.tx_limit_max: 100000000000
    

Performance Tuning

The following settings are recommended when handling a large number of connections, to reduce connection-processing overhead and to accommodate the increased use of network buffers.

  • Increase the size of the rate-limit send queue: 

    # sysctl dev.mce.<N>.rate_limit.tx_queue_size=1024
    
  • Reduce the number of completion events per rate-limit send queue: 

    # sysctl dev.mce.<N>.rate_limit.tx_completion_fact=-1
    
  • Increase the non-rate-limit send queue size: 

    # sysctl dev.mce.<N>.conf.tx_queue_size=16384
    
  • Reduce the number of completion events per send queue: 

    # sysctl dev.mce.<N>.conf.tx_completion_fact=-1
    
  • Increase the receive queue size and allow many packets to accumulate. 

    This gives better TX burst performance: 

    # sysctl dev.mce.<N>.conf.rx_queue_size=16384
    # sysctl dev.mce.<N>.conf.rx_coalesce_usecs=250
    # sysctl dev.mce.<N>.conf.rx_coalesce_pkts=4096
    
  • Note for production: allow a high number of connections to terminate simultaneously: 

    # sysctl net.inet.icmp.icmplim=-1
    
  • Increase the memory pool for network buffers: 

    # sysctl kern.ipc.nmbufs=100000000
    

